WO2024012003A1 - Data processing method, apparatus, device, storage medium and program product - Google Patents

Data processing method, apparatus, device, storage medium and program product

Info

Publication number
WO2024012003A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
target
response
statement
phenomenon
Prior art date
Application number
PCT/CN2023/090470
Inventor
曹琛
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP23838503.3A (EP4379554A1)
Publication of WO2024012003A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Definitions

  • This application relates to the field of computer technology, and in particular to data processing.
  • ANR Application Not Responding
  • Embodiments of the present application provide a data processing method, apparatus, device, storage medium and program product that can resolve non-response phenomena of target applications at the operating system level, reduce the probability of any kind of non-response phenomenon, and are highly versatile, which can improve overall stability.
  • embodiments of the present application provide a data processing method, including:
  • Any non-responsive phenomenon is generated during the process of running the target application based on the system code of the operating system; and any non-responsive phenomenon is
  • the on-site information is used to describe: the execution of the system code when the corresponding unresponsive phenomenon occurs;
  • performing commonality analysis on the on-site information of the various non-response phenomena to obtain commonality analysis results, and determining the fault point causing the non-response phenomenon from the system code based on the commonality analysis results; wherein the commonality analysis results include: the information shared among the on-site information of the various non-response phenomena;
  • a data processing device including:
  • the acquisition module is used to obtain on-site information of various non-response phenomena of the target application, where any non-response phenomenon is generated in the process of running the target application based on the system code of the operating system, and the on-site information of any non-response phenomenon is used to describe the execution of the system code when the corresponding non-response phenomenon occurs;
  • the processing module is used to perform commonality analysis on the on-site information of the various non-response phenomena, obtain commonality analysis results, and determine the fault point that causes the non-response phenomenon from the system code based on the commonality analysis results; wherein the commonality analysis results include: the information shared among the on-site information of the various non-response phenomena;
  • the processing module is used to repair the system code according to the fault point, so as to run the target application based on the repaired system code.
  • embodiments of the present application provide a computer device, including: a processor, a memory, and a network interface;
  • the processor is connected to a memory and a network interface, where the network interface is used to provide network communication functions, the memory is used to store computer programs, and the processor is used to call the computer program to execute the data processing method in the embodiment of the present application.
  • embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • When the computer program is executed by a processor, the data processing method in the embodiment of the present application is executed.
  • Embodiments of the present application provide a computer program product.
  • The computer program product includes a computer program or computer instructions.
  • When the computer program or computer instructions are executed by a processor, the data processing method of the embodiment of the present application is implemented.
  • On-site information of various non-response phenomena of the target application can be obtained. Any non-response phenomenon can occur in the process of running the target application based on the system code of the operating system, and the corresponding on-site information can be used to describe the execution of the system code when the non-response phenomenon occurs.
  • the fault point that causes unresponsiveness can be determined from the system code, and the system code can be repaired based on the fault point.
  • Running the target application based on the repaired system code can intercept, from the operating system side, situations in which the target application may become unresponsive, and avoid application unresponsiveness as far as possible. This method therefore starts from the operating system level: it collects the on-site information of the various non-response phenomena that actually occur, analyzes what they have in common, and repairs the system code of the native operating system based on the analysis results, which addresses application unresponsiveness on the operating system at its root. The method is highly versatile and can reduce the occurrence of any non-response phenomenon in target applications on the operating system, thereby effectively improving the stability of the system or application.
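The commonality analysis described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the on-site information of each non-response phenomenon is reduced to a call stack (a list of frame names) and that the shared information is the set of frames present in every stack; the frame names and the `common_frames` helper are hypothetical and do not come from the application.

```python
def common_frames(traces):
    """Return the frames present in every trace, in the order of the first trace."""
    if not traces:
        return []
    shared = set(traces[0])
    for trace in traces[1:]:
        shared &= set(trace)  # keep only frames seen in all traces so far
    return [frame for frame in traces[0] if frame in shared]


# Hypothetical on-site information: one call stack per non-response phenomenon.
service_anr = ["main", "handleCreateService", "loadLibrary", "waitForLock"]
broadcast_anr = ["main", "handleReceiver", "loadLibrary", "waitForLock"]
provider_anr = ["main", "installProvider", "loadLibrary", "waitForLock"]

# Frames shared by all stacks are candidates for the fault point.
fault_candidates = common_frames([service_anr, broadcast_anr, provider_anr])
print(fault_candidates)
```

Frames that differ between the stacks reflect the type of each phenomenon, while the shared frames point at system code that is involved regardless of type.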
  • Figure 1a is a schematic diagram of the classification of application unresponsiveness provided by an embodiment of the present application.
  • Figure 1b is a schematic diagram of a process for triggering application unresponsiveness provided by an embodiment of the present application.
  • Figure 1c is a schematic flowchart of a service startup provided by an embodiment of the present application.
  • Figure 1d is a schematic diagram of the generation process of a service-type non-response phenomenon provided by an embodiment of the present application.
  • Figure 1e is a schematic diagram of the generation process of a broadcast-type non-response phenomenon provided by an embodiment of the present application.
  • Figure 1f is a schematic diagram of the generation process of a content-providing-type non-response phenomenon provided by an embodiment of the present application.
  • Figure 1g is a schematic diagram of the generation process of an input-distribution-type non-response phenomenon provided by an embodiment of the present application.
  • Figure 2a is an architectural diagram of a cloud gaming system provided by an embodiment of the present application.
  • Figure 2b is a schematic diagram showing the non-response phenomenon in a cloud game provided by an embodiment of the present application.
  • Figure 2c is a schematic diagram of the underlying interaction logic of a system in a server in a cloud gaming scenario provided by an embodiment of the present application.
  • Figure 3a is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • Figure 3b is a schematic diagram of on-site information of unresponsiveness in a cloud gaming scenario provided by an embodiment of the present application.
  • Figure 4 is a schematic flowchart of another data processing method provided by an embodiment of the present application.
  • Figure 5 is a schematic flowchart of yet another data processing method provided by an embodiment of the present application.
  • Figure 6a is a schematic diagram of the result of a parent process call stack provided by an embodiment of the present application.
  • Figure 6b is a schematic diagram of the logic code of a target function provided by an embodiment of the present application.
  • Figure 7 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Operating system: abbreviated as OS.
  • An operating system is a computer program that manages computer hardware and software resources. It is the most basic system software in the computer system.
  • The operating system provides functions such as processor management (for example, process control and process synchronization), memory management (for example, memory allocation and reclamation, and address mapping), device management (for example, buffer management and virtual devices) and file management (for example, file storage space management, and file read and write management).
  • There are many types of operating systems, such as the common Android, Linux, Windows and iOS.
  • Kernel: the kernel is the core of the operating system.
  • The kernel can convert input commands into machine language that the computer hardware can understand; it interacts with the hardware directly and can pass requests initiated by application programs to the hardware.
  • the functions of the kernel include but are not limited to: process management, task scheduling, memory management, etc.
  • File management means that the kernel uses the file system to organize files, and monitors file data storage, file status and access settings through the file system.
  • Process management means that, in a multi-process environment, the kernel determines which process the CPU (Central Processing Unit) runs first and the length of the running time slice allocated to it.
  • Memory management means that the kernel can track the memory space and allocate or release memory to ensure that the application is executed correctly.
  • Hard coding refers to the software development practice of embedding data directly into the source code of a program or other executable object. Unlike data obtained from outside or generated at runtime, hard-coded data can usually only be modified by editing the source code and recompiling the executable. In computer programming or text editing, hard coding is a method of replacing a variable with a fixed value.
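As a small illustration of the difference, the sketch below contrasts a hard-coded value with one obtained from outside at runtime. The constant name, the `SERVICE_TIMEOUT_S` environment variable and both helper functions are hypothetical names chosen for this example.

```python
import os

# Hard-coded: the value is embedded in the source and changes only by editing it.
HARD_CODED_TIMEOUT_S = 20


def timeout_hard_coded():
    return HARD_CODED_TIMEOUT_S


def timeout_configurable(env=None):
    """Read the value from outside the program, falling back to a default."""
    env = os.environ if env is None else env
    return int(env.get("SERVICE_TIMEOUT_S", "20"))


print(timeout_hard_coded())
print(timeout_configurable({"SERVICE_TIMEOUT_S": "5"}))
```

In an interpreted language there is no recompilation step, but the point is the same: the first value can only change with the source, while the second can change per deployment.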
  • ANR: the full name is Application Not Responding (referred to below as non-response or unresponsiveness).
  • ANR is a very common phenomenon in operating systems (such as the Android system). For developers it may indicate a bug in the code, and for users it may cause a bad experience. The operating system needs to complete certain events within a certain period of time. If no effective response is obtained within the scheduled time, or the response time is too long, an ANR occurs. Generally, a prompt box pops up on the system interface to inform the user that the current application is not responding. The user can choose to continue waiting (Wait) or to force close the application (Force close); this is a self-protection mechanism of the operating system. Alternatively, when an application becomes unresponsive, the process of the unresponsive application can be closed directly without popping up a prompt box.
  • the target application refers to any application running in the computer device, such as an application program in a terminal or a service program in a server.
  • By installation mode, the target application can be an installation-free application (such as a mini program, or a web application such as a shopping website) or a third-party application installed on a computer device; by application function, the target application can be any one of a game application, an audio and video application, a social application, a shopping application, and so on.
  • the target application can run based on the environment provided by the operating system. During the process of running the target application based on the system code of the operating system, the target application may become unresponsive (ie, ANR).
  • the ANR generated by the target application may include the following categories: service type non-response phenomenon, broadcast type non-response phenomenon, content provision type non-response phenomenon, and input distribution type non-response phenomenon.
  • the service-type non-response phenomenon refers to the non-response phenomenon triggered when a service is not completed within the predetermined time period (for example, 20s) (Service Timeout);
  • the broadcast-type non-response phenomenon refers to the non-response phenomenon triggered when a broadcast is not completed within the predetermined time period (for example, 10s) (BroadcastQueue Timeout);
  • the content-providing-type non-response phenomenon refers to the non-response phenomenon triggered when a content provider times out after publishing (publish) (ContentProvider Timeout);
  • the input-distribution-type non-response phenomenon refers to the non-response phenomenon triggered when input event distribution times out (InputDispatching Timeout).
  • the above four types of ANR can be classified into the classification diagram of application unresponsiveness as shown in Figure 1a.
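The four categories can be written down as a small lookup structure. This is only an illustrative sketch: the `AnrType` enum and `example_threshold` helper are hypothetical names, and the thresholds are the example values quoted above for the service and broadcast types; no value is assumed for the other two types because none is given here.

```python
from enum import Enum


class AnrType(Enum):
    SERVICE = "Service Timeout"
    BROADCAST = "BroadcastQueue Timeout"
    CONTENT_PROVIDER = "ContentProvider Timeout"
    INPUT_DISPATCHING = "InputDispatching Timeout"


# Example thresholds in seconds, taken from the examples in the text above.
EXAMPLE_THRESHOLD_S = {AnrType.SERVICE: 20, AnrType.BROADCAST: 10}


def example_threshold(anr_type):
    """Return the example threshold for a type, or None when the text gives none."""
    return EXAMPLE_THRESHOLD_S.get(anr_type)


for anr_type in AnrType:
    print(anr_type.value, example_threshold(anr_type))
```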
  • The generation process (or triggering mechanism) of the various non-response phenomena of the above-mentioned applications is introduced one by one below.
  • The process of triggering an ANR can be divided into three steps, as shown in Figure 1b: setting the duration threshold, releasing the ANR trigger condition, and triggering the application non-response (i.e., ANR).
  • the duration threshold can be set based on the corresponding ANR.
  • the duration thresholds of different ANRs may be the same or different. If it times out, ANR will be triggered. If there is no timeout, ANR will not be triggered, and the ANR triggering condition will be released.
  • releasing the ANR triggering condition means canceling the originally set duration threshold.
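The three steps can be sketched with a timer that is armed when the threshold is set and cancelled when the work finishes in time. This is a minimal sketch under stated assumptions, not the operating system's mechanism: the `AnrWatchdog` class and its method names are hypothetical, and Python's `threading.Timer` stands in for the system's delayed timeout message.

```python
import threading


class AnrWatchdog:
    """Arms a timer for a duration threshold; fires unless released in time."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self.triggered = threading.Event()  # set when the ANR is triggered
        self._timer = None

    def arm(self):
        # Step 1: set the duration threshold.
        self._timer = threading.Timer(self.threshold_s, self.triggered.set)
        self._timer.daemon = True
        self._timer.start()

    def release(self):
        # Step 2: release the trigger condition by cancelling the threshold.
        if self._timer is not None:
            self._timer.cancel()


# Work finished in time: the condition is released and no ANR is triggered.
in_time = AnrWatchdog(threshold_s=0.5)
in_time.arm()
in_time.release()

# Work never reports back: step 3, the ANR is triggered at the threshold.
timed_out = AnrWatchdog(threshold_s=0.05)
timed_out.arm()
print(timed_out.triggered.wait(2.0))
```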
  • a target application is any application currently running on the computer device.
  • the application process of the target application can interact with the underlying processes of the operating system (such as system service processes) to complete corresponding events.
  • The system service process is used to start and manage the entire Java framework. Important services in the system can be started in the system service process, such as the component management service AMS (ActivityManagerService) and the window management service WMS (WindowManagerService).
  • The system service process and the application process of the target application can be forked from the Zygote process (a process used to spawn new processes, referred to here as the incubation process).
  • the following content is specifically described using the system service process in the operating system as the execution subject.
  • The four major components of the Android operating system are Activity, Service, BroadcastReceiver and ContentProvider.
  • (1) Activity: a visual interface for user operations, which provides a window in which the user can issue operation instructions.
  • (2) Service: an application component that can perform long-running operations in the background without a user interface. Since a Service usually runs in the background and generally does not need to interact with the user, the Service component has no graphical user interface. The Service component is usually used to provide background services for other components or to monitor the running status of other components.
  • (3) BroadcastReceiver: can be used to filter the external events that the application is interested in (such as an incoming phone call, or the data network becoming available) and respond to them.
  • A BroadcastReceiver can start an Activity or Service to respond to the information it receives, or use NotificationManager (the notification manager) to notify the user, for example by playing a sound or displaying a message in the status bar.
  • (4) ContentProvider (content provider): provides a specified data set of an application to other applications, which can obtain or store data from the content provider through the ContentResolver class.
  • The data published by a ContentProvider can be accessed through a ContentResolver object combined with a Uri (Uniform Resource Identifier).
  • Uri represents the address of data operation, and each ContentProvider will have a unique address when publishing data.
  • The process of generating the service-type non-response phenomenon includes: in response to a service creation request sent by the application process of the target application, setting the service duration threshold; sending a service creation message to the service process, so that the service process, based on the service creation message, calls one or more other processes to perform service creation and returns feedback information after the service is successfully created; if the feedback information returned by the service process is not received within the service duration threshold, the service-type non-response phenomenon occurs.
  • the service creation request sent by the application process of the target application can be used to request the system to create services required by the target application, such as listening services, notification services, etc.
  • The listening service is, for example, a service that listens for whether a player account is logged in, or a service that listens for whether players are online.
  • the system service process can set a service duration threshold in response to a service creation request initiated by an application process.
  • The service duration threshold can be used to detect whether service creation times out, for example 20s for a foreground service and 200s for a background service. Then, the system service process can send a service creation message to the service process, so that the service process calls one or more processes to perform the service creation work.
  • the service process can be pre-created by the component management service request in the system service process.
  • the main thread in the service process can call one or more other processes (that is, processes other than the service process) to perform service creation work.
  • These one or more processes can include processes with a parent-child relationship, and the address space can be shared between parent and child processes.
  • the main thread in the service process may send feedback information to the system service process.
  • the feedback information here may be a notification message indicating that the service creation is completed.
  • If the feedback information returned by the service process is not received within the service duration threshold, the time spent on creating the service is greater than or equal to the service duration threshold, and a service timeout occurs, resulting in the service-type non-response phenomenon.
  • If the feedback information returned by the service process is received within the service duration threshold, the time spent on creating the service is less than the service duration threshold, and the service-type non-response phenomenon does not occur.
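The exchange between the system service process and the service process can be simulated deterministically with a logical clock. The sketch below is a toy model under stated assumptions: the `SystemServer` class, its methods and the `simulate` helper are hypothetical, the timeout is modelled as a delayed message kept in a heap, and the 20s default is the example threshold from the text.

```python
import heapq

SERVICE_TIMEOUT_MSG = "SERVICE_TIMEOUT_MSG"


class SystemServer:
    """Toy stand-in for the system service process, driven by a logical clock."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self.now = 0.0
        self.delayed = []     # heap of (due_time, message, service_name)
        self.pending = set()  # services whose feedback has not arrived yet
        self.anr = []         # services that hit a service timeout

    def on_create_request(self, name):
        # Set the service duration threshold by scheduling a timeout message.
        self.pending.add(name)
        due = self.now + self.threshold_s
        heapq.heappush(self.delayed, (due, SERVICE_TIMEOUT_MSG, name))

    def on_feedback(self, name):
        # Feedback from the service process: creation finished.
        self.pending.discard(name)

    def advance(self, to_time):
        # Deliver every delayed message that is due by to_time.
        while self.delayed and self.delayed[0][0] <= to_time:
            due, _msg, name = heapq.heappop(self.delayed)
            self.now = due
            if name in self.pending:  # no feedback within the threshold
                self.pending.discard(name)
                self.anr.append(name)
        self.now = to_time


def simulate(creation_time_s, threshold_s=20.0):
    """Return True when creating the service leads to a service-type ANR."""
    server = SystemServer(threshold_s)
    server.on_create_request("login_listener")
    if creation_time_s < threshold_s:
        server.advance(creation_time_s)
        server.on_feedback("login_listener")
    server.advance(threshold_s)
    return bool(server.anr)


print(simulate(3.0), simulate(25.0))
```

A creation time equal to the threshold counts as a timeout, matching the "greater than or equal to" condition above.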
  • Service Timeout occurs during startService (starting the service).
  • For Service it can specifically include the following two categories: foreground service, the timeout time (i.e., service duration threshold) is 20s; background service, the timeout time is 200s.
  • Starting a Service in the target application can be achieved by calling the API startService with one line of code.
  • A simplified flow of startService is shown in Figure 1c.
  • the process of service creation and startup is mainly completed by the component management service (AMS, ActivityManagerService) in the system service process.
  • Through socket communication (a socket is a communication method), AMS requests Zygote to create the process (Create Process) that hosts the service, which includes a request to create a thread (ActivityThread, the main thread).
  • The service runs in a separately created process. For a local service, the service process does not need to be started.
  • ActivityThread plays the role of the main thread of the application. Zygote then generates a new process by copying the Zygote process through fork, and loads the ActivityThread-related resources into the new process.
  • AMS sends a request to create a service to the ActivityThread in the newly generated process through Binder communication (Binder is an inter-process communication mechanism).
  • ActivityThread starts the running service. Specifically, ActivityThread can call the onCreate method to create the service (createservice).
  • The system service process (i.e., the system_server process) allocates an idle thread binder_1 (i.e., communication thread 1) to receive the request, and then sends the service timeout message SERVICE_TIMEOUT_MSG to the component manager ActivityManager to set the service duration threshold (step 2 in the figure). Next, binder_1 notifies binder_3 (i.e., communication thread 3) of the service process to prepare for the processing work, that is, scheduling the creation of the service (scheduleCreateService) (step 3 in the figure). After binder_3 receives the notification, it hands the work over to the main thread (which can correspond to the main thread ActivityThread in Figure 1c), and the service creation event is added to the task queue of the main thread through sendMessage (step 4 in the figure). The main thread then performs a series of work, specifically the service creation work, which can include the process of creating the service by the ActivityThread thread in Figure 1c (step 5 in the figure), and completes the startup of the service life cycle (waiting for waitToFinish to complete). After completing this work, the main thread reports to the system_server process that the service creation work is done (serviceDoneExecuting), and the binder_2 thread (i.e., communication thread 2) in the system_server process receives the message (step 6 in the figure). If the service creation is completed within the service
  • The process of generating a broadcast-type non-response phenomenon includes: in response to a broadcast request initiated by the application process of the target application, setting the broadcast duration threshold; sending a broadcast registration message to the broadcast receiving process, so that the broadcast receiving process, based on the broadcast registration message, calls one or more other processes to perform the broadcast work and returns feedback information after the broadcast is completed; if the feedback information returned by the broadcast receiving process is not received within the broadcast duration threshold, the broadcast-type non-response phenomenon occurs.
  • the broadcast mechanism is used for communication between processes/threads. Broadcasting can be divided into broadcast sending and broadcast receiving. Broadcasting can include parallel broadcasting and serial broadcasting. Common application unresponsiveness occurs in serial broadcast scenarios. Similar to the generation process of service type ANR, the system service process can respond to the broadcast request initiated by the application process of the target application and set the broadcast duration threshold. The application process of the target application is the process where the broadcast sender is located. The system service process can send a broadcast registration message to the broadcast receiving process, so that the main thread in the broadcast receiving process can call one or more other processes to perform broadcast work. This one or more processes can include processes with a parent-child relationship, and the address space can be shared between parent-child processes.
  • the broadcast receiving process is the process where the broadcast receiving end is located and can be used to receive broadcast messages from other applications or systems. Broadcasting work can specifically broadcast various events, such as broadcasts of date changes, broadcasts of system startup completion, etc. In a cloud game scenario, the broadcast events are, for example, network switching broadcasts, network failure broadcasts, and so on.
  • a broadcast receiving queue can be created based on the broadcast registration message to process the received broadcast events in an orderly manner. After the main thread completes processing of the broadcast work, feedback information can be returned.
  • the feedback information is a notification message indicating the completion of the broadcast work.
  • If the feedback information returned by the broadcast receiving process is not received within the broadcast duration threshold, the time spent on the broadcast work is greater than or equal to the broadcast duration threshold, and a broadcast timeout occurs, resulting in the broadcast-type non-response phenomenon.
  • If the feedback information returned by the broadcast receiving process is received within the broadcast duration threshold, the time spent on the broadcast work is less than the broadcast duration threshold, and the broadcast-type non-response phenomenon does not occur.
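Because the non-response phenomenon is described for serial broadcasting, where broadcast events are processed in order, the check can be sketched as a loop over receivers with a per-receiver deadline. This is a simplified sketch: the `dispatch_serial` helper and the receiver names are hypothetical, handling times are simulated numbers rather than measured ones, the 10s default is the example threshold from the text, and stopping at the first timeout is a simplification made for this example.

```python
def dispatch_serial(receivers, threshold_s=10.0):
    """Deliver a broadcast to receivers one at a time.

    receivers: list of (name, simulated_handling_time_s) in delivery order.
    Returns (delivered_names, timed_out_name_or_None).
    """
    delivered = []
    for name, handling_time_s in receivers:
        # No feedback within the threshold: broadcast-type non-response.
        if handling_time_s >= threshold_s:
            return delivered, name
        delivered.append(name)  # feedback arrived in time, move to the next
    return delivered, None


queue = [("network_switch", 1.0), ("network_failure", 12.0), ("date_changed", 0.5)]
print(dispatch_serial(queue))
```

In this model a slow receiver also delays every receiver queued behind it, which is why the ordered (serial) case is the one where the timeout matters.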
  • The broadcast-type ANR is introduced in more detail below, in combination with the schematic diagram of its generation process shown in Figure 1e.
  • The system service process can allocate an idle thread binder_1 (i.e., communication thread 1) to receive the send-broadcast request, and then send the broadcast timeout message BROADCAST_TIMEOUT_MSG to the component manager ActivityManager (sendMessage) to set the broadcast duration threshold (steps 2-3 in the figure).
  • Through the broadcast registration message, binder_1 notifies the binder_3 thread (i.e., communication thread 3) of the broadcast receiving process (i.e., the Receiver process) to prepare for the processing work (step 4 in the figure). After receiving the notification, binder_3 sends a message to the main thread and adds the event to the task queue of the main thread (sendMessage, step 5 in the figure). Next, the main thread performs a series of work, here the broadcast work, including the start of the life cycle of broadcast reception. If the current process is found to also have SP (SharedPreferences, a data storage method) writing a file, the SP data persistence work must be waited for: the main thread sends a message to the queued-work-looper thread (the queued work loop thread) (sendMessage, step 6 in the figure), and once the persistence work is done the queued-work-looper thread reports to the system_server process that the broadcast work has been completed (step 7 in the figure). Otherwise, the main thread can directly report that the broadcast work has been completed (finishReceiver). The binder_2 thread (i.e., communication thread 2) in the system_server process then receives the message. If all work is completed within the broadcast duration threshold, the ANR trigger condition is released and no ANR occurs (step 8 in the figure); otherwise an ANR occurs.
  • The process of generating the content-providing-type non-response phenomenon includes: in response to a request to obtain the content providing object initiated by the application process of the target application, detecting the startup status of the content providing process corresponding to the content providing object; if the startup status indicates that the content providing process has not started, creating the content providing process, and notifying the content providing process to call one or more other processes to install the content providing object and to return feedback information after the content providing object is installed, where the content providing object is configured with an installation duration threshold; if the feedback information returned by the content providing process is not received within the installation duration threshold, the content-providing-type non-response phenomenon occurs.
  • Content provider objects are used to implement data sharing functions between different applications.
  • the content providing process is used to install and publish content providing objects to provide content data and implement data sharing functions.
  • The system service process can respond to a request to obtain a content providing object initiated by the application process of the target application and detect the startup status of the content providing process. If the content providing process has not started, it may not have been created yet, so the content providing process can be created and started. After the content providing process is created, it can register itself with the system service process, and the installation time threshold is set. The content providing process is then notified to install the content providing object.
  • The main thread in the content providing process can call one or more other processes to perform the installation. These one or more processes can include processes with a parent-child relationship, and the address space can be shared between parent and child processes.
  • the feedback information is a notification message used to indicate the completion of the installation of the content providing object. It can also be used to instruct the publishing of the content providing object to return the content data provided by the obtained content providing object.
  • the content provided object can be the address book in the system
  • the content data provided can be data related to the contacts in the address book. It can be understood that in the specific implementation of this application, related data such as address books are involved.
  • the installation time will be greater than or equal to the installation time threshold, and a content provision timeout will occur, resulting in content provision type unresponsiveness.
  • if the feedback information returned by the content providing process is received within the installation time threshold, the installation time of the content providing object is less than the installation time threshold, and no unresponsiveness of the content providing type occurs.
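The timeout rule described above can be sketched as a small simulation. This is only an illustrative model in Python, not Android code: the class name `PublishWatchdog`, its methods, and the threshold value used in the example are hypothetical. Only the rule itself comes from the text: feedback must arrive before the installation duration threshold elapses, and an installation time greater than or equal to the threshold counts as a timeout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PublishWatchdog:
    """Hypothetical model of the installation duration threshold check."""
    threshold_s: float                  # installation duration threshold (assumed value)
    started_at: Optional[float] = None  # when the install notification was sent
    feedback_at: Optional[float] = None  # when feedback information arrived

    def start(self, now: float) -> None:
        # The content providing process has been notified to install the object.
        self.started_at = now

    def on_feedback(self, now: float) -> None:
        # The content providing process reported that installation is complete.
        self.feedback_at = now

    def is_unresponsive(self, now: float) -> bool:
        # Non-response occurs when no feedback arrived before the deadline.
        if self.started_at is None:
            return False
        deadline = self.started_at + self.threshold_s
        if self.feedback_at is not None:
            return self.feedback_at >= deadline
        return now >= deadline
```

Timestamps are passed in explicitly so the rule can be checked without real waiting; a real system would rely on a delayed timeout message instead.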
  • A schematic diagram of the generation process of the content providing type ANR is shown in Figure 1f.
  • the system service process can allocate an idle thread binder_1 (ie, communication thread 1) to receive the request to obtain ContentProvider.
  • the communication thread binder_2 (i.e., communication thread 2) of the system_server process receives the registration message (step 3 in the figure), sends the content providing object publishing timeout message CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG to the ActivityManager inside the system_server process, and sets the installation time threshold (step 4 in the figure). The work is then handed over to the main thread (that is, a message is sent to the main thread via sendMessage, as shown in step 6 in the figure), and the event is added to the task queue; the main thread then performs a series of work, here the installation of the content providing object, which can optionally also include publishing the content providing object, and
  • the process of generating unresponsiveness of the input event distribution type includes: if an input event is received, the currently received input event is added to the input queue and the input distribution thread is awakened. The input distribution thread is used to distribute each input event in the input queue, in sequence, to the application process of the target application for processing; if the application process of the target application is processing other input events when the distribution turn of the currently received input event arrives, an unresponsive phenomenon of the input event distribution type occurs.
  • a thread in the system service process can listen for input events reported by the bottom layer and, on receiving an input event, add it to the input queue to wait for distribution and processing. At the same time, it can wake up the input distribution thread to distribute the input events in the input queue; the distribution start time can be set at this point.
  • when the distribution turn of the currently received input event arrives, for example, when its distribution time point is reached, if the application process of the target application is still processing other input events, the application process cannot process the currently received input event that is about to be distributed, and unresponsiveness of the input event distribution type occurs.
  • the input event can specifically be an operation event of the object in the game client. The operation event arrives in the server running the cloud game in the form of an operation stream, but cannot be processed, resulting in an ANR.
  • the currently received input event is distributed to the application process through the input distribution thread, so that the application process calls one or more other processes to process the currently received input event and returns feedback information after the processing is completed; if the feedback information returned by the application process is not received within the processing time threshold, unresponsiveness of the input event distribution type occurs.
  • the currently received input event can be distributed to the application process through the input distribution thread, specifically to the target window of the application process.
  • the application process can call one or more other processes to process the currently received input event, and feedback information is returned after the processing is completed.
  • the feedback information is a notification message indicating that processing of the input event is complete.
  • One or more processes called may include processes with a parent-child relationship, and the address space can be shared between parent-child processes.
  • if the feedback information returned by the application process is not received within the processing time threshold and a new input event is received, the currently received input event has not been processed within the specified processing time threshold, and the new input event has to wait for the currently received input event to finish processing, resulting in unresponsiveness of the input event distribution type. Conversely, as long as no new input event is received while the currently received input event is being processed, unresponsiveness of the input event distribution type does not occur, regardless of whether the feedback information returned by the application process is received within the processing time threshold.
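The two-part condition described above (the processing time threshold is exceeded and a new input event arrives while the current one is still being processed) can be written as a small predicate. The function and parameter names below are hypothetical, and treating an elapsed time equal to the threshold as a timeout is an assumption of this sketch.

```python
from typing import Optional


def input_dispatch_unresponsive(dispatch_start: float,
                                threshold_s: float,
                                feedback_at: Optional[float],
                                next_event_at: Optional[float]) -> bool:
    """Return True if the described input-dispatch non-response would occur.

    feedback_at:   time the application process returned feedback (None = never)
    next_event_at: arrival time of the next input event (None = no new event)
    """
    # Condition 1: feedback was not received within the processing time threshold.
    timed_out = (feedback_at is None
                 or (feedback_at - dispatch_start) >= threshold_s)
    # Condition 2: a new input event arrived while the current one was unfinished.
    new_event_while_busy = (next_event_at is not None
                            and (feedback_at is None or next_event_at < feedback_at))
    return timed_out and new_event_while_busy
```

A slow handler with no follow-up event therefore does not count as unresponsive in this model, which matches the "conversely" case in the text.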
  • the ANR of the input event distribution type is introduced in more detail. Specifically, it can be combined with the schematic diagram of the generation process of the input event distribution type ANR shown in Figure 1g.
  • the InputReader thread (i.e., the input reading thread)
  • EventHub (event center)
  • the InputDispatcher thread starts input event distribution and sets the distribution start time. It first detects whether any event is being processed; if not, it takes out the event at the head of the mInBoundQueue queue and then checks whether the window is ready. When the window is ready, it moves the event to the outBoundQueue queue (the output queue, used to store input events that will be distributed to the target window); at this time, if the application pipeline peer connection is normal, the data is taken out of the outBoundQueue.
  • InputDispatcher sends a message to inform the target application that it is ready to perform processing work; the main thread of the target application then receives the input events and forwards them layer by layer to the target window for processing. After completing this work, it sends a message to the system_server process to report completion (that is, it sends a completion signal, as shown in step 7 in the figure), and the system then removes the event from the waitQueue queue. If a time-consuming operation (such as a file operation) is being processed in the input system, each subsequent input event checks whether the previous input event has timed out; if it has, an ANR occurs.
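The queue hand-off described above can be modeled with three queues. This is a simplified sketch and not the Android implementation: the class `DispatcherModel` and its methods are hypothetical, the step that places a dispatched event into waitQueue is inferred from the later statement that the event is removed from waitQueue on completion, and the window check is done before dequeuing so that no event is lost in the model.

```python
from collections import deque


class DispatcherModel:
    """Hypothetical model of mInBoundQueue -> outBoundQueue -> waitQueue."""

    def __init__(self):
        self.in_bound = deque()   # stands in for mInBoundQueue
        self.out_bound = deque()  # stands in for outBoundQueue
        self.wait = deque()       # stands in for waitQueue

    def enqueue(self, event):
        # An input event arrives and waits for distribution.
        self.in_bound.append(event)

    def dispatch_once(self, window_ready=True):
        # Do nothing while an event is still being processed or none is pending.
        if self.wait or not self.in_bound:
            return None
        if not window_ready:
            return None
        # Move the head event to the output queue for the target window.
        self.out_bound.append(self.in_bound.popleft())
        # Connection assumed normal: send it and track it until completion.
        sent = self.out_bound.popleft()
        self.wait.append(sent)
        return sent

    def finish(self, event):
        # The application reported completion; stop tracking the event.
        self.wait.remove(event)
```

In this model a second event cannot be dispatched until `finish` is called for the first, which is the situation in which a later event would check whether the earlier one has timed out.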
  • this application provides a solution to repair the system code to solve the unresponsiveness of the application.
  • on-site information about various unresponsive phenomena of the target application can be collected. Based on this on-site information, the execution status of the system code when the target application produces unresponsive phenomena can be known.
  • Common analysis results can be determined by performing common analysis. The so-called common analysis refers to the analysis of finding common elements.
  • the fault point that causes the unresponsive phenomenon can be determined from the system code of the operating system, and the system code can be repaired based on the fault point, so that running the target application based on the repaired system code effectively reduces the probability of unresponsiveness and reduces crashes of the target application or operating system caused by unresponsiveness, which is conducive to the stable operation of the system and applications.
  • This solution starts from the perspective of the underlying system and fundamentally solves the problem of unresponsive applications on the operating system by repairing the system code of the operating system at the operating system level. This is a fundamental solution and is universal.
  • the target application may be a cloud gaming application.
  • the so-called cloud game refers to an online game technology based on cloud computing technology.
  • Cloud computing technology is a kind of cloud technology.
  • Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and network within a wide area network or local area network to realize data calculation, storage, processing, and sharing.
  • cloud computing is a computing model that distributes computing tasks across a resource pool composed of a large number of computers, enabling various application systems to obtain computing power, storage space and information services as needed.
  • the network that provides resources is called a "cloud.”
  • the resources in the "cloud” can be infinitely expanded from the user's point of view, and can be obtained at any time, used on demand, expanded at any time, and paid according to use.
  • a basic capability provider of cloud computing establishes a cloud computing resource pool (referred to as a cloud platform, generally called an IaaS (Infrastructure as a Service) platform) and deploys various types of virtual resources in the resource pool for external customers to choose and use.
  • the cloud computing resource pool mainly includes: computing equipment (virtualized machines, including operating systems), storage equipment, and network equipment.
  • cloud computing technology can also be used for common analysis of on-site information.
  • the game can be run on a cloud server, and the resource-intensive rendering calculations in the game can be placed on the cloud server; the images and sounds are transmitted to the player's game terminal over the network as audio and video streams, and the user's operation instructions are transmitted to the cloud server as an operation stream so that the corresponding calculations can be performed.
  • 5G (Fifth Generation Mobile Communication Technology)
  • the environment in which cloud games are run can be called a cloud gaming environment.
  • multiple operating systems can be run on one or more independent servers (such as servers using ARM, x86 or other architectures) by running system containers, and the related images can be passed as a video stream to a remote receiving program for processing.
  • the ARM architecture is a 32-bit or 64-bit reduced instruction set processor architecture
  • the x86 architecture is a computer instruction set architecture executed by microprocessors.
  • Containers refer to a type of operating system-level virtualization.
  • Containers can be used to host operating systems; this can be achieved through isolation mechanisms (such as namespaces): in kernel mode, multiple operating systems (i.e., the server operating system and the device operating systems) share the same kernel; in user mode, the operating systems remain independent of each other.
  • the server operating system here refers to the general operating system in the server, such as Linux operating system, Android operating system, etc.
  • the device operating system refers to the operating system in the container, such as the Android operating system, the iOS operating system, etc.
  • a system container refers to an instance of a container, which can run on a server operating system (such as a Linux operating system); for example, the system container can be an Android container running on a Linux operating system, and the Android container loads an Android image. An image is a form of file storage: merging multiple files into one image file facilitates the distribution and use of the files.
  • the system container mentioned in the embodiment of this application is not limited to the Android container; for example, if the iOS operating system supports open source research and development, the system container can also be an iOS container, and so on.
  • the cloud gaming environment can be supported by corresponding operating resources provided by the devices in the cloud gaming system.
  • the cloud gaming system may include at least one edge server 11, multiple game clients 12, and at least one analysis server 13.
  • the edge server is a server running cloud games.
  • At least one system container can be deployed in each edge server as shown in Figure 2a, and one or more game APPs (Applications) can be installed in the system container. These game apps can run one or more cloud games, so that cloud games can be run through the system container.
  • Each system container can be connected to at least one game client 12, thereby transmitting the game screen and sound of the cloud game to the connected game client 12.
  • a prompt can be displayed on the game client 12, as shown in Figure 2b.
  • the analysis server 13 is a server used to analyze the unresponsiveness of applications. While cloud game applications run in the system containers of the at least one edge server 11, different types of unresponsiveness may occur in applications on one or more edge servers. In that case, the analysis server 13 can collect, from the edge server 11, the on-site information generated when at least two kinds of non-response phenomena occur in a given cloud game application, and perform commonality analysis on the on-site information of the various non-response phenomena. Based on the commonality analysis results, it finds the common fault point causing the non-response phenomena, repairs the system code of the operating system based on that common fault point, and runs the cloud game application based on the repaired system code to solve the problem of application unresponsiveness.
  • the above work can also be performed by a target edge server; that is, any edge server among the at least one edge server 11 can obtain on-site information of multiple unresponsive phenomena and repair the system code of the operating system based on the same principle as above, to solve and avoid application unresponsiveness.
  • edge servers and analysis servers can be independent physical servers, server clusters or distributed systems composed of multiple physical servers, or cloud servers that provide basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, but are not limited to these.
  • the game client 12 can be a terminal device that provides basic capabilities such as streaming media playback capabilities, human-computer interaction capabilities, and communication capabilities, or it can be an application program running in the terminal device.
  • terminal devices include but are not limited to: mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, aircraft and other devices, which are not limited in this application.
  • Figure 2a is only an exemplary system architecture representing the cloud game system, and does not limit the specific architecture of the cloud game system; for example, in other embodiments, the cloud game system may also include a background server for scheduling, etc.
  • the underlying layer can be based on containerization technology and system and kernel technology; on this basis, audio and video technology, network optimization, computing resource management and other aspects can be combined to create a corresponding cloud game platform (which can provide an operating environment and services for cloud games), and a variety of cloud games can be run on this cloud game platform.
  • the following takes the cloud game bottom layer based on the Android native operating system and the Linux kernel as an example to illustrate the interaction logic related to the system level of the cloud game running process.
  • the underlying interaction logic of the system in the server specifically involves the interaction between the system container, the Linux kernel, and the hardware provided by the server.
  • the system container or the game APP in the system container can send an operation request to the operating system, and the operating system interacts with its Linux kernel. The Linux kernel receives the operation request and, based on it, can call related hardware (such as one or more of the central processing unit, graphics processor, memory, etc.) to complete the corresponding operation. The operation result corresponding to the operation request can be returned to the system container through the Linux kernel.
  • the graphics processor (GPU, Graphics Processing Unit) of the server can be called based on the rendering request to execute the rendering event corresponding to the rendering request and obtain the rendered game screen, and the Linux kernel returns the rendered game screen.
  • the encoding module in the system container can also be called to perform image compression on the returned game screen to obtain the compressed image.
  • the underlying hardware resources can be called through the Linux kernel for encoding to obtain the encoded data (that is, the compressed image), and the compressed image is transmitted to the game client in the form of a video stream.
  • this application provides a data processing method, which can be executed by a computer device, which can be a terminal or a server.
  • the computer device here may be, for example, any analysis server 13 in the cloud gaming system shown in Figure 2a.
  • the data processing method can also be executed by the terminal and the server, and there is no limitation on this; for ease of understanding, the following description takes the example of the data processing method being executed by a computer device.
  • the terminal can be a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, or other equipment, and this application does not limit this.
  • the server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, but is not limited to these.
  • Figure 3a is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • the data processing method may include the following S301-S303:
  • S301 Obtain on-site information of each of the multiple non-response phenomena of the target application.
  • Multiple unresponsive phenomena of the target application refer to at least two kinds of unresponsive phenomena of the target application, for example at least two of the following introduced above: the unresponsive phenomenon of the service type, of the broadcast type, of the content providing type, and of the input event distribution type.
  • the non-response phenomenon of the target application refers to the phenomenon that the target application does not respond. Specifically, some events fail to receive an effective response within a predetermined time or the response time is too long.
  • the unresponsiveness of the target application may be referred to as application unresponsiveness (ANR) or the ANR phenomenon.
  • the operating system (Operating System, OS) is used to control and manage the hardware and software resources of the entire computer system, and to reasonably organize and schedule the computer's work and resource allocation so as to provide convenient interfaces and environments for objects and other software. Categories of operating systems include but are not limited to: Android operating system, Windows operating system, Linux operating system, iOS operating system, etc.
  • the operating system can provide a running environment for applications, and applications can complete some events by calling functions provided by the operating system (i.e., system calls). Therefore, running the target application based on the system code of the operating system means that the system code of the operating system can be called during the process of running the target application. During this process, the target application may become unresponsive. For example, when the system code of the Android operating system is called to run the target application, the listening service required by the target application is not created within a predetermined time.
  • a target application refers to any application running on a computer device, for example, an application running in a terminal or a service program running in a server. The target application can specifically be any of game applications, audio and video applications, social applications, shopping applications, etc. Taking the cloud game scenario as an example, the target application can refer to a cloud gaming application or cloud gaming service program in the system container deployed in the server.
  • the non-response phenomenon of the target application can occur in different scenarios.
  • the non-responsiveness phenomenon generated in one scenario can correspond to a type of non-response phenomenon.
  • unresponsiveness will occur in the following scenarios: (1) service timeout (Service Timeout), including foreground service timeout and background service timeout, for example, the foreground service is not completed within 20s; (2) broadcast timeout (Broadcast Timeout), including foreground broadcast timeout and background broadcast timeout, for example, the foreground broadcast is not completed within 10s; (3) content provider timeout (ContentProvider Timeout), for example, the content provider's publish times out after 10s; (4) input event distribution timeout (InputDispatching Timeout), covering key and touch events, specifically key response distribution timeout (Key Dispatch Timeout) and touch event distribution timeout, for example, the default key response distribution time is 5s, and an unresponsive phenomenon occurs if it is exceeded.
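The four scenarios and their example thresholds can be collected into a small lookup table. The enum, the dictionary, and the helper below are hypothetical names introduced for illustration; the numeric values are only the examples quoted in the text (foreground service 20s, foreground broadcast 10s, content provider publish 10s, key dispatch 5s), and background thresholds are left out because the text does not state them.

```python
from enum import Enum


class AnrType(Enum):
    SERVICE = "Service Timeout"
    BROADCAST = "Broadcast Timeout"
    CONTENT_PROVIDER = "ContentProvider Timeout"
    INPUT_DISPATCHING = "InputDispatching Timeout"


# Example thresholds in seconds, taken from the examples in the text.
EXAMPLE_THRESHOLDS_S = {
    AnrType.SERVICE: 20,           # foreground service
    AnrType.BROADCAST: 10,         # foreground broadcast
    AnrType.CONTENT_PROVIDER: 10,  # content provider publish
    AnrType.INPUT_DISPATCHING: 5,  # key response distribution
}


def exceeds_threshold(anr_type: AnrType, elapsed_s: float) -> bool:
    # Assumption: reaching the threshold exactly already counts as a timeout.
    return elapsed_s >= EXAMPLE_THRESHOLDS_S[anr_type]
```

Keeping the thresholds in one table makes it easy to tag each piece of on-site information with the type of non-response phenomenon it belongs to.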
  • On-site information can be automatically generated and collected when each non-response phenomenon occurs, and one type of non-response phenomenon can correspond to a type of on-site information.
  • the on-site information of any kind of non-response phenomenon is used to describe: the execution of the system code when the corresponding non-response phenomenon occurs.
  • the on-site information of the unresponsive phenomenon refers to the relevant information captured by the operating system when the unresponsive phenomenon occurs.
  • This on-site information can be used to describe the execution of the system code of the operating system when this kind of unresponsiveness occurs.
  • the execution status of the system code includes, for example, at which step in the execution of the system code the unresponsiveness occurs, in which process or thread the event response times out, etc.
  • the on-site information may include but is not limited to the following: the time when the ANR occurs, a trace file recording the stack of each thread of the process before and after the ANR occurs, the ANR type, and the process information of each process when the ANR occurs (such as process number, process name, and process execution start time and end time). The on-site information of the various unresponsive phenomena can be recorded in relevant logs. During specific analysis, all or part of the on-site information can be obtained for further analysis.
  • the unresponsive scene of a cloud game application can be shown in Figure 3b.
  • docker exec -it b23 sh b2343985639e means entering the container with the container name b2343985639e and finding the location of the configuration file of the corresponding image.
  • grep Sgame means that the process containing the sgame keyword will be found in the container named b2343985639e and displayed.
  • the scene where the application becomes unresponsive occurs on the cloud gaming platform.
  • the obtained scene information includes the process information of each process in the container (only part of which is shown), for example: the process ID (PID, Process Identification) 1910, the ID of the parent process (PPID, Parent Process Identification) 110, the percentage of CPU resources used 16, the system startup time 12:54:18, the CPU usage time 00:03:15, the process name "com.ten.tmgp.sgame", etc.
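Process information of this kind can be turned into structured records before analysis. The sketch below assumes a `ps -ef`-style column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD); that layout is an assumption, since the text lists only some of the fields. The `ProcessInfo` class and `parse_ps_line` function are hypothetical, and the UID and TTY values in the sample line (`uid_x` and `?`) are placeholders, not values from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInfo:
    pid: int          # process ID
    ppid: int         # parent process ID
    cpu_percent: int  # percentage of CPU resources used
    start_time: str   # start time column, kept as text
    cpu_time: str     # accumulated CPU usage time, kept as text
    name: str         # process name / command


def parse_ps_line(line: str) -> ProcessInfo:
    # Assumed column order: UID PID PPID C STIME TTY TIME CMD
    fields = line.strip().split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    _uid, pid, ppid, cpu, stime, _tty, time_used, cmd = fields
    return ProcessInfo(int(pid), int(ppid), int(cpu), stime, time_used, cmd)


# Sample line: uid_x and ? are placeholders; the other values are from the text.
sample = "uid_x 1910 110 16 12:54:18 ? 00:03:15 com.ten.tmgp.sgame"
info = parse_ps_line(sample)
```

Records like `info` carry the process identifier and process name that a later commonality analysis can compare across on-site information.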
  • S302 Perform commonality analysis on the on-site information of various non-response phenomena, obtain commonality analysis results, and determine the fault point causing the non-response phenomenon from the system code based on the commonality analysis results.
  • Commonality analysis refers to the analysis method of finding common characteristics.
  • commonality analysis is performed on the on-site information of various non-response phenomena, which means to find the common characteristics that produce various non-response phenomena and obtain the commonality analysis results.
  • the commonality analysis results include common information in the on-site information of various unresponsive phenomena, such as the same process number in the on-site information of various unresponsive phenomena.
  • the fault point that causes the non-response phenomenon can be determined from the system code.
  • This fault point is a common fault point where various non-response phenomena occur.
  • the fault point can be a fault code in the system code that causes various unresponsive phenomena. For example, the unresponsiveness occurs when using system call functions. Through this fault point, we can further understand the common causes of application unresponsiveness, so as to facilitate the determination of how to repair the system code.
  • the specific analysis method for a specific ANR is limited to specific scenarios (such as the operating system version, application version, terminal, etc.), and a solution that resolves the ANR in one scenario may not be applicable in another. Therefore, its versatility across scenarios is not high.
  • the analysis mechanism provided in this solution is a commonality analysis mechanism. By looking for common characteristics among the on-site information of the various unresponsive phenomena, commonality analysis results are obtained, and based on these results the fault point causing the various unresponsive phenomena is determined. Subsequent repair processing can then overcome any of the various non-responsive phenomena of the target application, effectively reducing the probability of application non-responsiveness, with versatility and stability across scenarios.
  • the system code of the operating system can be repaired based on the fault point to obtain the repaired system code.
  • the repair process here may refer to the modification of the system code.
  • the repaired system code is an optimization of the original system code based on the fault point.
  • When the target application is run based on the repaired system code, the repaired system code can be called while the target application is running. Since the repaired system code optimizes the fault point where unresponsiveness occurs, unresponsiveness can be effectively avoided while the application runs.
  • the transformation of the native operating system is achieved by repairing the system code of the operating system. Since it is a modification of the system code, it can in principle solve the problems that arise when various ANRs occur, and the operating system can be deployed on any device or platform. In this way, this solution can be applied to cloud gaming platforms, effectively resolving the instability of cloud gaming platforms caused by ANR and improving their stability; it can also be applied to real terminals (such as mobile phones), emulator products (such as Android emulators), operating system platforms (such as other Android platforms), etc. It has strong scene versatility and can effectively avoid various ANR phenomena.
  • the data processing solution provided by the embodiment of the present application supports the collection of on-site information on various non-response phenomena of the target application. Since the on-site information describes the execution of the system code when the non-response phenomenon occurs, performing commonality analysis on the on-site information of multiple kinds of ANR makes it possible to analyze the execution of the system code at the operating system level, find from the bottom layer the common characteristics that cause the various non-response phenomena, and obtain the commonality analysis results. Based on the commonality analysis results, the fault point causing the non-response phenomena can be determined, the system code of the operating system can be repaired based on the fault point, and the target application can be run based on the repaired system code.
  • the repaired system operation can intercept the possible occurrence of unresponsiveness from the operating system side, which can well cope with most ANR scenarios and avoid most ANR problems. It can be seen that this solution can start from the underlying system code of the operating system and solve the ANR problem on the operating system from the root. It is highly versatile and can reduce the occurrence of any unresponsiveness in the operating system or the target application, thereby effectively improving System or application stability. When applied to a cloud gaming platform, it can improve the compatibility and stability of the cloud gaming platform and improve the player's gaming experience.
  • Figure 4 is a schematic flow chart of another data processing method provided by an embodiment of the present application.
  • the data processing method may include S401-S406:
  • S401 Obtain on-site information for each of the multiple non-response phenomena of the target application.
  • S402 Perform commonality analysis on the on-site information of various non-response phenomena, and obtain commonality analysis results.
  • The system code includes multiple processes and the code fragment executed by each process. The on-site information of any non-response phenomenon includes the process identifier of each process that is running in the system code when that non-response phenomenon occurs.
  • A process is a running instance of a program: once an executable program is run, it becomes a process. Processes execute code in the runtime environment.
  • The system code includes the code fragment executed by each of the multiple processes; a code fragment is the part of the system code that a given process executes. Specifically, when the system code is compiled and run, it runs as multiple process instances, each of which executes its corresponding code fragment.
  • The on-site information may include the process identifier of each running process. A process identifier is information used to mark a process; it can be a process name, a keyword of the process name, and so on.
  • An optional implementation of S402 is as follows. Take the on-site information of any one of the non-response phenomena and traverse each process identifier in it. For the process identifier currently being traversed, search for it in every other piece of on-site information. If it is found, add it to the commonality analysis result; if it is not found, continue traversing the remaining process identifiers in the selected on-site information.
  • Here, "any one piece of on-site information" means the on-site information of any one non-response phenomenon among those obtained. Each process identifier in it can be traversed, and the identifier currently being traversed is searched for in the on-site information of the other non-response phenomena. If it is found, the other on-site information contains the same process identifier, which means that the corresponding process was also running when the other non-response phenomena occurred; the identifier is therefore common to the various non-response phenomena and can be added to the commonality analysis result. If it is not found, no identical process identifier exists in the other on-site information, and traversal continues with the next process identifier in the selected on-site information.
  • For example, suppose on-site information has been obtained for four kinds of non-response phenomena: on-site information a for a type A non-response phenomenon, on-site information b for type B, on-site information c for type C, and on-site information d for type D, and the process identifiers contained in the on-site information are process names. If the process name game in on-site information a is currently being traversed, it is searched for in on-site information b, c, and d.
  • By taking the process identifier being traversed in one piece of on-site information as the benchmark and searching for the same identifier in the other on-site information, the common characteristics in the system code can be analyzed. Because a process identifier is a concise representation, such as a number or a short string, identical identifiers can be found efficiently, which improves the efficiency of the commonality analysis.
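  • The traversal described above can be sketched in Python as follows. This is a minimal illustration rather than the embodiment's implementation: the function name common_process_ids, the dictionary layout, and every process name other than game are assumptions introduced here, and an identifier is treated as shared only if it appears in every other piece of on-site information.

```python
from typing import Dict, List


def common_process_ids(scene_infos: Dict[str, List[str]]) -> List[str]:
    """Return the process identifiers present in every piece of on-site information."""
    if not scene_infos:
        return []
    infos = list(scene_infos.values())
    benchmark, others = infos[0], infos[1:]
    result: List[str] = []
    for identifier in benchmark:          # traverse each identifier in the benchmark
        if identifier in result:          # skip duplicates already recorded
            continue
        # search for the current identifier in every other piece of on-site information
        if all(identifier in other for other in others):
            result.append(identifier)
    return result


# Hypothetical on-site information a, b, c, d; only "game" comes from the text.
scene_infos = {
    "a": ["game", "system_server", "logd"],
    "b": ["game", "surfaceflinger", "system_server"],
    "c": ["system_server", "game"],
    "d": ["game", "system_server", "netd"],
}
common = common_process_ids(scene_infos)
print(common)
```

  • Storing each piece of on-site information as a set instead of a list would make each lookup constant-time, at the cost of losing the traversal order.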
  • The commonality analysis result includes the process identifiers shared among the on-site information of the various non-response phenomena. These shared process identifiers can be obtained using the optional implementation of S402 described above, or by other methods. Based on the commonality analysis result, S403 to S405 below can be used to determine the fault point from the system code.
  • S403 Determine M target processes based on each process identifier in the commonality analysis result.
  • the process corresponding to each process identifier in the commonality analysis result can be determined as the target process, thereby obtaining M target processes.
  • M is a positive integer greater than 1, and the value of M is equal to the number of process identifiers in the commonality analysis results.
  • For example, if the commonality analysis result includes two process identifiers, process1 and process2, the processes corresponding to process1 and process2 are both taken as target processes. In this way, 2 target processes are determined from the two process identifiers in the commonality analysis result.
  • S404 Determine the association between the M target processes, and obtain from the system code the code fragment executed by each target process.
  • The association between target processes describes the hierarchical relationship between at least two of the M target processes. It includes, but is not limited to, a parent-child relationship, a sibling relationship, and so on.
  • A process can be associated with one or more other processes. For example, process1 is the parent process of process2, and process1 is a child process of process4. That is, process1 and process2 have a parent-child relationship, and process1 and process4 also have a parent-child relationship; process1 simply plays a different role in each.
  • The M target processes include a first process and a second process, that is, 2 target processes. The first process and the second process are processes common to the various non-response phenomena; specifically, they are processes that run before a non-response phenomenon occurs.
  • Determining the association between the M target processes may include the following: obtain the attribute information of the first process and the attribute information of the second process from the on-site information of the various non-response phenomena, where the attribute information of any process includes the process number of that process and the process number of the process that calls it; if the attribute information of the second process includes the process number of the first process, or the attribute information of the first process includes the process number of the second process, determine that the association between the first process and the second process is a parent-child relationship.
  • the on-site information of any kind of non-response phenomenon may include: attribute information corresponding to the process being run in the system code when the non-response phenomenon occurs.
  • the attribute information of the process (that is, the target process) corresponding to the process identifier in the common analysis result can be obtained from the on-site information of various non-response phenomena.
  • the target process includes a first process and a second process
  • the attribute information of the process obtained from the scene information specifically includes attribute information of the first process and attribute information of the second process.
  • the attribute information of a process is information used to describe the characteristics of a process.
  • The attribute information of any process includes the process number of that process and the process number of the process that calls it. For example, if the process is process A and it is called by process B, the attribute information of process A can include the process number of process A (the process's own process number) and the process number of process B (the process number of the calling process). From the attribute information, one can learn the process number of the process itself and which process called it.
  • The process number is the process identification (PID). It can be represented as a natural number, such as 123 or 605, or in binary, such as 001 or 010. The representation of process numbers is not restricted here.
  • The attribute information of the first process includes the process number of the first process and the process number of the process that calls the first process. The attribute information of the second process includes the process number of the second process and the process number of the process that calls the second process. The attribute information of either process can then be selected for analysis, which covers the following cases:
  • If the attribute information of the second process includes the process number of the first process, then the calling-process number recorded in the second process's attribute information is the process number of the first process; that is, the second process is called by the first process. It can then be determined that the first process and the second process have a parent-child relationship, in which the first process is the parent process and the second process is its child process.
  • If the attribute information of the first process includes the process number of the second process, then the calling-process number recorded in the first process's attribute information is the process number of the second process; that is, the first process is called by the second process. It can then be determined that the first process and the second process have a parent-child relationship, in which the first process is the child process and the second process is its parent process.
  • For example, in a cloud gaming scenario, on-site information is collected whenever the various non-response phenomena (ANRs) occur, so the commonality analysis result may include the process-related content shown in Figure 3b. The operating system has two sgame processes at the same time: the process with process number 1910 and the process with process number 4697. The process numbers show that the parent process number of the second process (process number 4697) is 1910, which is the first sgame process. The two processes therefore have a parent-child relationship: the second process is a child process of the first process, and the first process is in turn a child process of the process with process number 110.
  • If the attribute information of the second process does not include the process number of the first process, and the attribute information of the first process does not include the process number of the second process, then the process that calls the second process is some process other than the first process, and the process that calls the first process is some process other than the second process. The association between the first process and the second process is then not a parent-child relationship but another relationship, such as a sibling relationship, in which the first process and the second process are both child processes of the same process.
  • For example, the first process with process number 1910 and the second process with process number 4906 are both sgame processes, and the parent process number of both is 110. That is, both are child processes of the process with process number 110, so the two processes are siblings.
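  • The three cases above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the ProcAttr record, its field names pid and ppid, and the returned labels are hypothetical; only the process numbers 1910, 4697, 4906, and 110 are taken from the examples in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcAttr:
    pid: int    # process number of the process itself
    ppid: int   # process number of the process that calls (creates) it


def relation(first: ProcAttr, second: ProcAttr) -> str:
    """Classify the association between two target processes."""
    if second.ppid == first.pid:
        return "first is parent of second"
    if first.ppid == second.pid:
        return "second is parent of first"
    if first.ppid == second.ppid:
        return "siblings"
    return "other"


# Process 4697 is called by process 1910: parent-child relationship.
r1 = relation(ProcAttr(pid=1910, ppid=110), ProcAttr(pid=4697, ppid=1910))
# Processes 1910 and 4906 are both called by process 110: sibling relationship.
r2 = relation(ProcAttr(pid=1910, ppid=110), ProcAttr(pid=4906, ppid=110))
print(r1)
print(r2)
```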
  • The code fragment executed by each target process can be obtained from the system code. Since there are M target processes, M code fragments are obtained, so that the fault point causing the non-response phenomenon can subsequently be determined from these code fragments. For details, see S405 below.
  • S405 Based on the correlation relationship, determine the fault point that causes the non-response phenomenon from the obtained M code fragments.
  • Each of the M code fragments is executed by a corresponding process among the M processes.
  • the fault point that causes the unresponsiveness can be determined from the obtained M code fragments.
  • The fault point may be a single code statement or a section of code in one of the M code fragments. Because these code fragments are executed when the various non-response phenomena occur, the fault point is common to all of them; while the fault point exists, any kind of non-response phenomenon may be triggered. Which kind is triggered can be determined from other information, such as the type of event the process is executing. For example, when the process is executing an input dispatch event, the phenomenon can be identified as an input dispatch timeout ANR.
  • In this way, the target processes can be determined from the process identifiers in the commonality analysis result, the association between the target processes can be determined, the code fragment executed by each target process can be obtained from the system code, and the fault point causing the non-response phenomenon can be analyzed from the association and the obtained code fragments.
  • This method starts from the system code of the underlying operating system. Commonality analysis of the various non-response phenomena yields a result that identifies the system code executed when those phenomena occur, and the fault point is located in that system code. The common cause of ANR is thus determined at the system level, so that it can be repaired at the root.
  • S406 Repair the system code according to the fault point to run the target application based on the repaired system code.
  • The solution provided by the embodiments of the present application performs commonality analysis on the on-site information obtained for the various non-response phenomena and finds the shared processes that are running when the target application stops responding (for example, the processes corresponding to the shared process identifiers). These shared processes are taken as target processes for further analysis, which returns to the system code of the operating system: the fault point causing the non-response phenomenon is determined from the association between the target processes and the code fragments they execute, and is located within those code fragments. The root cause of application non-response is thus identified at the bottom layer of the system, so that the problem can be solved at its root. The repaired system code can effectively reduce non-response phenomena while the target application runs, improving both application operation and overall system stability.
  • Figure 5 is a schematic flow chart of yet another data processing method provided by an embodiment of the present application.
  • the data processing method may include S501-S507:
  • S501 Obtain on-site information for each of the multiple non-response phenomena of the target application.
  • S502 Perform commonality analysis on the on-site information of various non-response phenomena, and obtain commonality analysis results.
  • S503 Determine M target processes based on each process identifier in the commonality analysis result.
  • S504 Determine the association between each of the M target processes, and obtain the code fragments executed by each target process from the system code.
  • The M target processes include a first process and a second process whose association is a parent-child relationship: the first process is the parent process of the second process, and the second process is a child process of the first process. As described in the above embodiment, this association can be determined from the process numbers contained in the attribute information of the processes.
  • A specific implementation of S505 may include the following. Based on the parent-child relationship, the obtained code fragment executed by the first process is taken as the benchmark code fragment, and a first code statement is determined from it; the first code statement is the code statement executed by the first process before the non-response phenomenon occurs. When the first code statement is a statement that implements a function call operation, the call stack of the first process, which includes the functions called by the first process, is analyzed along the first code statement within the code fragment executed by the first process. If the call stack contains a target function whose call failed, the logic code of the target function is determined from the benchmark code fragment. Then, based on the code fragment executed by the second process, the fault point causing the non-response phenomenon is determined from the logic code of the target function.
  • The code fragment executed by the first process contains many code statements. Before the non-response phenomenon occurs, the first process may have executed some of these statements and then stopped; the same holds for the second process. Because the first process and the second process have a parent-child relationship, the code executed by the first process, as the parent, may contain the code that creates the second process. The code fragment executed by the first process can therefore be taken first as the benchmark code fragment, that is, the code fragment used as the baseline for analysis.
  • The first code statement can be determined from the benchmark code fragment; it is the code statement executed by the first process before the non-response phenomenon occurs. The first code statement is checked against the analysis condition before further analysis is performed. Specifically:
  • When the first code statement is a statement that implements a function call operation, the call stack of the first process can be analyzed along the first code statement within the code fragment executed by the first process.
  • A function call operation can be implemented by calling a relevant executable program or system command. For example, the first code statement executed by the first process, as the parent process, may be a statement that executes Runtime.getRuntime().exec(“xxx.exe”). Runtime.getRuntime().exec() is used to call an external executable program or system command: Runtime.getRuntime() returns the Runtime object of the current application, and the exec() method of that object creates a child process to execute the specified executable program (here the program named "xxx.exe") and returns the Process object instance corresponding to the child process. Through the Process object, the execution of the child process can be controlled and information about it can be obtained.
  • The last code statement executed by the parent process before the ANR occurs is the first code statement.
  • The first code statement can then be analyzed further: within the code fragment executed by the first process, the call stack of the first process is analyzed along the first code statement. The call stack includes the functions called by the first process. To analyze it, a debugging tool suited to the type of code executed by the first process can be used. For example, if the code fragment executed by the first process is Java code, printStackTrace can be used (a method that prints exception information, including the location and cause of the error, to the command line). Another option is strace, a debugging tool that intercepts and records the system calls executed by a process and the signals the process receives. Such debugging tools can be used to analyze the call stack of the corresponding process.
  • The call stack can also be understood as the mechanism by which an interpreter (such as the JavaScript interpreter in a browser) tracks the flow of function execution. Through this mechanism, one can tell which function is currently executing and which functions are called within the body of that function.
  • The following takes an ANR in a cloud gaming scenario as an example. Analysis of the ANR on-site information on the cloud gaming platform determines that the operating system has two sgame processes at the same time and that the two have a parent-child relationship. The on-site state is captured with debugging tools, and the call stack shows that the parent process was executing the Runtime.getRuntime().exec() statement (that is, the first code statement) before the ANR. Following Runtime.getRuntime().exec(), the call stack of the parent process can be found, as shown in Figure 6a.
  • The Runtime class calls the exec method, which creates a ProcessBuilder; the ProcessBuilder.start() method creates a ProcessImpl, whose start() method creates the Unix process. The Unix process object is created through new Unixprocess. Then, based on the Unix process object, a new process is forked that executes program code different from that of the Unix process; this step reaches UnixProcess_md.c and finally calls the startChild function.
  • The above is the call stack of the parent process. The last function in it is the call to startChild, which is the approximate location of the program error.
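  • Locating the target function in a captured call stack can be sketched in Python as follows. This is only an illustration under stated assumptions: the Frame record, its failed flag, and the simplified frame names are hypothetical stand-ins for what a debugging tool would report; they are not the contents of Figure 6a.

```python
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    function: str   # name of the called function
    failed: bool    # True if this call did not complete (hypothetical flag)


def find_target_function(call_stack: List[Frame]) -> Optional[str]:
    """Follow the stack from the first code statement inward and return the
    first function whose call failed, or None if every call succeeded."""
    for frame in call_stack:
        if frame.failed:
            return frame.function
    return None


# Simplified parent-process call stack, outermost call first.
parent_stack = [
    Frame("Runtime.exec", False),
    Frame("ProcessBuilder.start", False),
    Frame("ProcessImpl.start", False),
    Frame("startChild", True),
]
target_function = find_target_function(parent_stack)
print(target_function)
```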
  • If the call stack contains a target function whose call failed, the target function can be analyzed further: the logic code of the target function is determined from the code fragment executed by the first process, the logic code being part of that code fragment. Then, based on the code fragment executed by the second process, the fault point causing the non-response phenomenon is determined from the logic code of the target function. In this case, the fault point is a code statement in the logic code of the target function, and the code statements related to it can be analyzed according to how that statement executes.
  • The logic code of the target function includes a process creation statement. A process creation statement is a statement used to create a child process, and a child process created through it shares the same address space with its parent process.
  • For example, Figure 6b shows the logic code of the target function, which includes a process creation statement. The target function is the startChild function identified from the parent process call stack in Figure 6a. vfork is a Linux system call, and a child process created through it shares the address space of its parent process. The child process therefore runs entirely in the parent's address space, and if the child process modifies a variable, the change directly affects the parent process.
  • the second process is the child process of the first process, that is, the first process is the parent process and the second process is the child process.
  • the first process can create a second process by executing the process creation statement in the target function. In this way, the first process and the second process share the same address space.
  • the address space is a collection of all available resources.
  • the shared address space here can be a physical address space or a virtual address space.
  • Determining the fault point causing the non-response phenomenon from the logic code of the target function may include the following: determine a second code statement from the code fragment executed by the second process; when the second code statement is a statement that implements a data reading operation, determine from it the target resource that the second process needs to read; and if the target resource is held by the first process, determine the process creation statement included in the logic code of the target function as the fault point causing the non-response phenomenon. Here the second process is a child process of the first process; if the target resource is held by the first process, the second process is blocked, and when the second process has been blocked for longer than a duration threshold, the non-response phenomenon is triggered.
  • The second code statement can be determined from the code fragment executed by the second process; it is the code statement executed by the second process before the non-response phenomenon occurs. Before an ANR occurs, the second process may execute multiple code statements in its code fragment, and some of them may not be suitable for subsequent analysis. The second code statement is therefore checked to determine whether it meets the analysis condition. Specifically:
  • If the second code statement is a statement that implements a data reading operation, it meets the analysis condition and can be analyzed further. During a data reading operation, the data being read is locked, that is, other processes cannot access it, so the target resource required by the second process to perform the data reading operation can be determined.
  • The statement of the data reading operation may be, for example, a code statement that reads a file resource. For example, the second process executes the readdir() function to read /proc/self/fd, where readdir() is commonly used to traverse directories and the entries under /proc/self/fd represent the file descriptors of the current process.
  • The target resource required by the second process to perform the data reading operation may be an available resource in the address space, such as the CPU, memory, or other hardware resources. If the target resource is held by the first process, the first process holds the resource that the second process needs in order to read the data. Because the first process and the second process share an address space, the second process is blocked until the first process releases the target resource. When the second process has been blocked for longer than the duration threshold, for example blocked for 10 seconds against a threshold of 8 seconds, the non-response phenomenon occurs. The analysis above shows that this is caused by the way the process is created.
  • The process creation statement in the logic code of the target function can therefore be determined as the fault point that produces the non-response phenomenon.
  • If the target resource is not held by the first process, the second process can use the target resource, and no non-response phenomenon occurs.
  • The target function is thus analyzed in combination with the code fragment executed by the second process. That code fragment contains a statement for a data reading operation; when the target resource required by the data reading operation is held by the first process, the shared address space between the two processes triggers the non-response phenomenon of the target application. By this logic, the non-response phenomenon is caused by the process creation statement in the logic code of the target function, which can therefore be identified as the fault point.
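  • The decision logic described above can be sketched in Python as follows. This is a hedged illustration, not the embodiment's code: the ChildObservation record, its field names, and both function names are assumptions introduced here; the 10-second block and 8-second threshold are the example values from the text, and 1910 is reused as the parent's process number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildObservation:
    reads_data: bool            # second code statement is a data reading operation
    holder_pid: Optional[int]   # process number holding the target resource, None if free
    blocked_seconds: float      # how long the second process has been blocked


def creation_statement_is_fault_point(obs: ChildObservation, parent_pid: int) -> bool:
    """The process creation statement is the fault point when the child's
    read needs a resource that the parent (first process) holds."""
    if not obs.reads_data:      # statement does not meet the analysis condition
        return False
    return obs.holder_pid == parent_pid


def non_response_triggered(obs: ChildObservation, parent_pid: int,
                           threshold_seconds: float) -> bool:
    """Non-response occurs once the blocked duration exceeds the threshold."""
    return (creation_statement_is_fault_point(obs, parent_pid)
            and obs.blocked_seconds > threshold_seconds)


obs = ChildObservation(reads_data=True, holder_pid=1910, blocked_seconds=10.0)
is_fault = creation_statement_is_fault_point(obs, parent_pid=1910)
triggered = non_response_triggered(obs, parent_pid=1910, threshold_seconds=8.0)
print(is_fault, triggered)
```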
  • the fault point includes the target function of the code fragment executed by the first process, specifically the process creation statement in the target function. Therefore, the implementation of the target function can be modified in the operating system.
  • the method of repairing the system code according to the fault point can be as described in S506 to S507 below.
  • The target statement for creating a child process is a code statement different from the process creation statement, although its function is similar: both can create a child process. A child process created by the target statement uses an address space that is separate from and independent of its parent process's; here, the parent process is the process that calls the target statement to create the child process. Because the address space is not shared between the child process and its parent process, when the child process performs a data reading operation, the required target resource is not held by the first process but lies in an independent address space, which can effectively avoid application non-response.
  • The process creation statement includes a function field that stores a first system call function, which may be provided by the operating system. The process creation statement creates a child process through the first system call function.
  • A specific implementation of S506 includes: modifying the first system call function in the function field of the process creation statement to a second system call function to obtain the target statement used to create a child process, where the target statement creates a child process through the second system call function.
  • the second system call function is a system call function different from the first system call function and is also provided by the operating system. Under the second system call function, the address spaces between the child process created through the second system call function and the corresponding parent process are independent of each other. Since the system call function is a kernel function provided by the operating system, the modification of the system call function is performed from the kernel level, which can solve the ANR problem on the operating system from the system kernel level. It can improve the compatibility and stability of the platform and effectively prevent and avoid ANR phenomena.
  • the process creation statement in the original system code can be replaced with the target statement.
  • the first system call function can be disabled and changed to the second system call function to realize the modification of the system code at the kernel level.
  • the second process created through the target statement uses a different address space independently from the first process.
  • the repair content of the system code of the operating system is a customized modification of the system code.
  • This modification involves changes to the kernel's system calls, which can intercept possible ANR situations on both the kernel side and the system side and can cope with most scenarios in which ANR may occur.
  • The modified content contains no policy that is strongly bound to a specific platform and no hard-coded parts, so it is loosely coupled to the device or platform itself. The repaired operating system can therefore be applied to any scenario, such as cloud gaming, real terminal devices, or simulators, which can effectively avoid application unresponsiveness and improve the stability and compatibility of the platform or device.
  • This quality inspection includes code review and security testing performed while the system code is being written. Conducting security checks during coding can identify security issues in the customized system code and ensure the security of the repaired system code.
  • Code review here refers to checking, by reading the code, whether the source code complies with coding standards and meets code quality requirements. Code review can improve code quality, find potential errors (bugs), and so on. Security checks and code reviews can be performed using corresponding analysis tools. In this way, it can be ensured that all data conforms to expectations, and the problem of application unresponsiveness can be solved while ensuring the compatibility and stability of the operating system.
  • The data processing solution provided by the embodiments of the present application can collect on-site information on the various non-response phenomena of the target application. Since the on-site information describes the execution of the system code when a non-response phenomenon occurs, performing commonality analysis on the on-site information of the various ANRs makes it possible to analyze the execution of the system code at the operating system level, find at the bottom layer the common characteristics that cause the various non-response phenomena, and obtain commonality analysis results. The commonality analysis results here include common process identifiers. Based on the common process identifiers, multiple common target processes are determined, and the code fragments executed by each target process are analyzed. Specifically, the execution of a code fragment by a process can be traced through the call stack.
  • FIG. 7 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the above-mentioned data processing device may be a computer program (including program code) running in a computer device.
  • the data processing device may be an application software; the data processing device may be used to execute corresponding steps in the method provided by the embodiments of the present application.
  • the data processing device 700 may include at least one of the following: an acquisition module 701 and a processing module 702.
  • the acquisition module 701 is used to obtain on-site information on various non-response phenomena among the multiple non-response phenomena of the target application. Any non-response phenomenon is generated in the process of running the target application based on the system code of the operating system; and The on-site information of any kind of non-response phenomenon is used to describe: the execution status of the system code when the corresponding non-response phenomenon occurs;
  • the processing module 702 is used to perform commonality analysis on the on-site information of various non-response phenomena, obtain commonality analysis results, and determine the fault point that causes the non-response phenomenon from the system code based on the commonality analysis results; wherein the commonality analysis results include : Common information in the on-site information of the various non-response phenomena;
  • the processing module 702 is used to repair the system code according to the fault point, so as to run the target application based on the repaired system code.
  • the system code includes multiple processes and the code fragments executed by each process;
  • The on-site information of any non-response phenomenon includes the process identifier of each process that is running in the system code when the corresponding non-response phenomenon occurs.
  • The processing module 702 is specifically used to: for any piece of on-site information among the on-site information of the various non-response phenomena, traverse each process identifier in that piece of on-site information; search for the currently traversed process identifier in each piece of on-site information other than that piece; if the currently traversed process identifier is found, add it to the commonality analysis results; and if it is not found, continue traversing the process identifiers in that piece of on-site information.
  • The commonality analysis results include the process identifiers common to the on-site information of the various non-response phenomena; the processing module 702 is specifically used to: determine M target processes based on the process identifiers in the commonality analysis results, where M equals the number of process identifiers in the commonality analysis results; determine the association between the target processes among the M target processes, and obtain the code fragment executed by each target process from the system code; and, based on the association, determine the fault point that causes the non-response phenomenon from the M obtained code fragments.
  • The M target processes include a first process and a second process; the processing module 702 is specifically used to: obtain the attribute information of the first process and the attribute information of the second process from the on-site information of the various non-response phenomena, where the attribute information of any process includes the process number of that process and the process number of the process that calls it; and if the attribute information of the second process includes the process number of the first process, or the attribute information of the first process includes the process number of the second process, determine that the association between the first process and the second process is a parent-child relationship.
  • The M target processes include a first process and a second process, and the association between the first process and the second process is a parent-child relationship, where the first process is the parent process of the second process; the processing module 702 is also specifically used to: based on the parent-child relationship, determine the obtained code fragment executed by the first process as the benchmark code fragment, and determine a first code statement from the benchmark code fragment.
  • The first code statement refers to the code statement executed by the first process before the non-response phenomenon occurs; when the first code statement is a statement used to implement a function call operation, the call stack of the first process is analyzed along the first code statement in the code fragment executed by the first process.
  • The call stack includes the functions called by the first process; if the call stack contains a target function whose call failed, the logic code of the target function is determined from the benchmark code fragment; and based on the code fragment executed by the second process, the fault point that causes the non-response phenomenon is determined from the logic code of the target function.
  • the logic code of the target function includes a process creation statement.
  • The process creation statement is a statement used to create a child process, and a child process created through the process creation statement shares the same address space with the corresponding parent process.
  • The second code statement refers to the code statement executed by the second process before the non-response phenomenon occurs; when the second code statement is a statement used to implement a data reading operation, the target resource required by the second process to perform the data reading operation is determined based on the second code statement; if the target resource is held by the first process, the process creation statement in the logic code of the target function is determined to be the fault point that causes the non-response phenomenon.
  • The second process is a child process of the first process; if the target resource is held by the first process, the second process is blocked, and when the second process has been blocked for longer than the duration threshold, the non-response phenomenon is triggered.
  • The processing module 702 is specifically used to: determine the target statement used to create a child process, and replace the process creation statement in the system code with the target statement to repair the system code.
  • A child process created by the target statement uses an address space independent of that of the corresponding parent process.
  • The process creation statement includes a function field, and the function field stores the first system call function; the process creation statement creates a child process through the first system call function.
  • The processing module 702 is specifically used to: modify the first system call function in the function field of the process creation statement to the second system call function to obtain a target statement for creating a child process, where the target statement creates a child process through the second system call function.
  • The various non-response phenomena of the target application include a service-type non-response phenomenon; the process of generating the service-type non-response phenomenon includes: setting a service duration threshold in response to a service creation request sent by the application process of the target application; sending a service creation message to the service process so that, based on the service creation message, the service process is notified to call one or more other processes to perform service creation work and to return feedback information after the service is successfully created; and if the feedback information returned by the service process is not received within the service duration threshold, the service-type non-response phenomenon occurs.
  • The various non-response phenomena of the target application include a broadcast-type non-response phenomenon; the process of generating the broadcast-type non-response phenomenon includes: setting a broadcast duration threshold in response to a send-broadcast request initiated by the application process of the target application; sending a broadcast registration message to the broadcast receiving process so that, based on the broadcast registration message, the broadcast receiving process is notified to call one or more other processes to perform broadcast work and to return feedback information after the broadcast is completed; and if the feedback information returned by the broadcast receiving process is not received within the broadcast duration threshold, the broadcast-type non-response phenomenon occurs.
  • The various non-response phenomena of the target application include a content-providing-type non-response phenomenon.
  • The process of generating the content-providing-type non-response phenomenon includes: in response to a request, initiated by the application process of the target application, to obtain a content providing object, detecting the startup status of the content providing process corresponding to the content providing object; if the startup status indicates that the content providing process has not been started, creating the content providing process, notifying the content providing process to call one or more other processes to install the content providing object, and returning feedback information after the content providing object is installed.
  • The content providing object is configured with an installation duration threshold; if the feedback information returned by the content providing process is not received within the installation duration threshold, the content-providing-type non-response phenomenon occurs.
  • The various non-response phenomena of the target application include an input-event-distribution-type non-response phenomenon; the process of generating it includes: if an input event is received, adding the currently received input event to the input queue and waking up the input distribution thread.
  • The input distribution thread is used to distribute the input events in the input queue, in order, to the application process of the target application for processing; if, when the distribution turn of the currently received input event arrives, the application process of the target application is processing other input events, the input-event-distribution-type non-response phenomenon occurs.
  • the computer device 800 may include an independent device (such as one or more servers, nodes, terminals, etc.), or may include components within the independent device (such as a chip, a software module or a hardware module, etc.).
  • the computer device 800 may include at least one processor 801 and a communication interface 802. Further optionally, the computer device 800 may also include at least one memory 803 and a bus 804. Among them, the processor 801, the communication interface 802 and the memory 803 are connected through the bus 804.
  • The processor 801 is a module that performs arithmetic operations and/or logical operations. Specifically, it can be one of, or a combination of, processing modules such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor unit (MPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), a coprocessor (which assists the central processing unit with corresponding processing and applications), and a microcontroller unit (MCU).
  • The communication interface 802 may be used to provide information input or output for the at least one processor, and/or to receive data sent from the outside and/or send data to the outside. It can be a wired link interface such as an Ethernet cable, or a wireless link interface (Wi-Fi, Bluetooth, general wireless transmission, in-vehicle short-range communication technology, other short-range wireless communication technologies, and so on). The communication interface 802 may serve as a network interface.
  • the memory 803 is used to provide storage space, and data such as operating systems and computer programs can be stored in the storage space.
  • The memory 803 may be one or a combination of a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a compact disc read-only memory (CD-ROM), and the like.
  • At least one processor 801 in the computer device 800 is used to call a computer program stored in at least one memory 803 to execute the data processing method described in the embodiment shown in this application.
  • The processor 801 in the computer device 800 is used to call the computer program stored in the at least one memory 803 to perform the following operations:
  • Obtaining on-site information on each of the various non-response phenomena of the target application, where any non-response phenomenon is generated while the target application runs based on the system code of the operating system, and the on-site information of any non-response phenomenon is used to describe the execution of the system code when the corresponding non-response phenomenon occurs;
  • Performing commonality analysis on the on-site information of the various non-response phenomena to obtain commonality analysis results, and determining the fault point that causes the non-response phenomenon from the system code based on the commonality analysis results, where the commonality analysis results include the common information in the on-site information of the various non-response phenomena; and performing repair processing on the system code according to the fault point, so as to run the target application based on the repaired system code.
  • The computer device 800 described in the embodiments of the present application can execute the data processing method described in the foregoing corresponding embodiments and can also implement the data processing device 700 described in the embodiment corresponding to Figure 7; details are not repeated here. The beneficial effects of using the same method are likewise not described again.
  • an exemplary embodiment of the present application also provides a storage medium in which the computer program of the aforementioned data processing method is stored.
  • When executed, the computer program can carry out the data processing method described in the foregoing embodiments; details are not repeated here, nor are the beneficial effects of using the same method. It will be appreciated that the program instructions may be deployed and executed on one computer device or on multiple computer devices capable of communicating with each other.
  • The above-mentioned computer-readable storage medium may be an internal storage unit of the data processing apparatus provided in any of the foregoing embodiments or of the above-mentioned computer device, such as a hard disk or memory of the computer device.
  • The computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
  • the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or is to be output.
  • a computer program product includes a computer program, and the computer program is stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program, so that the computer device executes the method in the embodiment of the present application.
  • Modules in the device of the embodiment of the present application can be merged, divided, and deleted according to actual needs.
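The parent-child determination described in the list above (comparing each process's own process number with the process number of the process that calls it) can be illustrated with a minimal sketch. The names `ProcAttr`, `pid`, `ppid`, and `parent_child` are hypothetical and chosen only for illustration; the embodiments do not prescribe any particular data layout.

```python
from typing import NamedTuple, Optional, Tuple


class ProcAttr(NamedTuple):
    """Hypothetical attribute record taken from the on-site information."""
    pid: int   # process number of this process
    ppid: int  # process number of the process that calls (creates) this process


def parent_child(first: ProcAttr, second: ProcAttr) -> Optional[Tuple[int, int]]:
    """Return (parent_pid, child_pid) if the two processes are parent and child, else None."""
    if second.ppid == first.pid:
        # The second process records the first process as its caller.
        return (first.pid, second.pid)
    if first.ppid == second.pid:
        # The first process records the second process as its caller.
        return (second.pid, first.pid)
    return None
```

For example, under these assumptions `parent_child(ProcAttr(100, 1), ProcAttr(200, 100))` identifies process 100 as the parent of process 200, after which the code fragment executed by the parent would serve as the benchmark code fragment.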

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Stored Programmes (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

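This application discloses a data processing method, apparatus, device, and storage medium. The method includes: obtaining on-site information of each of multiple non-response phenomena of a target application, where any non-response phenomenon is generated while the target application runs based on the system code of an operating system, and the on-site information describes the execution of the system code when the corresponding non-response phenomenon is generated; performing commonality analysis on the on-site information of the non-response phenomena to obtain a commonality analysis result, and determining, from the system code according to the commonality analysis result, a fault point that causes the non-response phenomena; and repairing the system code according to the fault point, so that the target application runs based on the repaired system code. This application can be executed by a cloud server. It can effectively resolve non-response phenomena of the target application at the operating system level and reduce the probability of any non-response phenomenon occurring; it is highly general and can improve overall stability.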

Description

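Data processing method, apparatus, device, storage medium, and program product
This application claims priority to Chinese patent application No. 202210819928.7, entitled "Data processing method, apparatus, device, and storage medium" and filed with the China Patent Office on July 13, 2022, the entire contents of which are incorporated herein by reference.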
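Technical Field
This application relates to the field of computer technology, and in particular to data processing.
Background
With the development of computer technology, the open-source nature of some operating systems allows them to be further developed and customized, and the growing number of interesting and practical functions that operating systems provide keeps bringing people convenient experiences.
An operating system supports the running of various applications, but if an application does not respond promptly enough, an ANR (Application Not Responding) phenomenon may occur, and such ANR phenomena degrade the user experience. However, most current solutions to ANR rely on other people's analysis experience to locate the ANR problem one has encountered and analyze each specific ANR problem case by case. Such solutions are limited to specific scenarios such as a particular operating system version or application version, cannot guarantee that ANR will not occur again for other reasons after the current ANR is resolved, and therefore lack generality.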
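Summary
The embodiments of this application provide a data processing method, apparatus, device, storage medium, and program product that can effectively resolve non-response phenomena of a target application at the operating system level and reduce the probability of any non-response phenomenon occurring; the solution is highly general and can improve overall stability.
In one aspect, an embodiment of this application provides a data processing method, including:
obtaining on-site information of each of multiple non-response phenomena of a target application, where any non-response phenomenon is generated while the target application runs based on the system code of an operating system, and the on-site information of any non-response phenomenon is used to describe the execution of the system code when the corresponding non-response phenomenon is generated;
performing commonality analysis on the on-site information of the non-response phenomena to obtain a commonality analysis result, and determining, from the system code according to the commonality analysis result, a fault point that causes the non-response phenomena, where the commonality analysis result includes the information common to the on-site information of the non-response phenomena; and
repairing the system code according to the fault point, so that the target application runs based on the repaired system code.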
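In one aspect, an embodiment of this application provides a data processing apparatus, including:
an acquisition module, configured to obtain on-site information of each of multiple non-response phenomena of a target application, where any non-response phenomenon is generated while the target application runs based on the system code of an operating system, and the on-site information of any non-response phenomenon is used to describe the execution of the system code when the corresponding non-response phenomenon is generated;
a processing module, configured to perform commonality analysis on the on-site information of the non-response phenomena to obtain a commonality analysis result, and to determine, from the system code according to the commonality analysis result, a fault point that causes the non-response phenomena, where the commonality analysis result includes the information common to the on-site information of the non-response phenomena;
the processing module being further configured to repair the system code according to the fault point, so that the target application runs based on the repaired system code.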
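Correspondingly, an embodiment of this application provides a computer device, including a processor, a memory, and a network interface. The processor is connected to the memory and the network interface, where the network interface is used to provide network communication functions, the memory is used to store a computer program, and the processor is used to call the computer program to execute the data processing method in the embodiments of this application.
Correspondingly, an embodiment of this application provides a computer-readable storage medium that stores a computer program; when executed by a processor, the computer program performs the data processing method in the embodiments of this application.
Correspondingly, an embodiment of this application provides a computer program product that includes a computer program or computer instructions; when executed by a processor, the computer program or computer instructions implement the data processing method in the embodiments of this application.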
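In the embodiments of this application, on-site information of multiple non-response phenomena of a target application can be obtained. Any non-response phenomenon may be generated while the target application runs based on the system code of the operating system, and the corresponding on-site information can be used to describe the execution of the system code when that non-response phenomenon is generated. By performing commonality analysis on the obtained on-site information, the execution of the system code can be analyzed at the operating system level, and the characteristics common to the target application's various non-response phenomena can be found at the bottom layer, yielding a commonality analysis result. Based on this result, the fault point that causes the non-response phenomena can be determined from the system code, and the system code can be repaired based on the fault point. Running the target application based on the repaired system code makes it possible to intercept, on the operating system side, the situations in which application non-response phenomena (that is, non-response phenomena of the target application) may occur, and to avoid such phenomena as far as possible. This approach thus starts at the operating system level: it collects on-site information of the non-response phenomena that actually occur, performs commonality analysis on it, and repairs the system code of the native operating system based on the analysis result. It can fundamentally solve the problem of application non-response on the operating system, is highly general, and can reduce the cases in which the target application experiences any non-response phenomenon on the operating system, thereby effectively improving the stability of the system or the application.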
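Brief Description of the Drawings
Figure 1a is a schematic diagram of a classification of application non-response provided by an embodiment of this application;
Figure 1b is a schematic diagram of a process of triggering an application non-response phenomenon provided by an embodiment of this application;
Figure 1c is a schematic flowchart of starting a service provided by an embodiment of this application;
Figure 1d is a schematic diagram of the generation process of a service-type non-response phenomenon provided by an embodiment of this application;
Figure 1e is a schematic diagram of the generation process of a broadcast-type non-response phenomenon provided by an embodiment of this application;
Figure 1f is a schematic diagram of the generation process of a content-providing-type non-response phenomenon provided by an embodiment of this application;
Figure 1g is a schematic diagram of the generation process of an input-distribution-type non-response phenomenon provided by an embodiment of this application;
Figure 2a is an architecture diagram of a cloud gaming system provided by an embodiment of this application;
Figure 2b is a schematic diagram of a prompt for a non-response phenomenon in a cloud game provided by an embodiment of this application;
Figure 2c is a schematic diagram of the underlying interaction logic of the system in a server in a cloud gaming scenario provided by an embodiment of this application;
Figure 3a is a schematic flowchart of a data processing method provided by an embodiment of this application;
Figure 3b is a schematic diagram of on-site information of a non-response phenomenon generated in a cloud gaming scenario provided by an embodiment of this application;
Figure 4 is a schematic flowchart of another data processing method provided by an embodiment of this application;
Figure 5 is a schematic flowchart of yet another data processing method provided by an embodiment of this application;
Figure 6a is a schematic diagram of the result of a parent process call stack provided by an embodiment of this application;
Figure 6b is a schematic diagram of the logic code of a target function provided by an embodiment of this application;
Figure 7 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application;
Figure 8 is a schematic structural diagram of a computer device provided by an embodiment of this application.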
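Detailed Description
To facilitate understanding of the solutions in the embodiments of this application, the relevant terms and concepts that may be involved are introduced first.
Operating system: abbreviated OS. An operating system is a computer program that manages computer hardware and software resources and is the most basic system software in a computer system. An operating system provides functions such as processor management (for example, process control and process synchronization), memory management (for example, memory allocation and reclamation, address mapping), device management (for example, file storage space management, file read/write management), and file management (for example, buffer management, virtual devices). There are many kinds of operating systems, such as the common Android (also called the Android system), Linux, Windows, and iOS.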
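Kernel: the kernel is the core of the operating system. The kernel can convert input commands into machine language that the computer hardware can understand; it interacts directly with the hardware and can inform the hardware of requests initiated by applications. The functions of the kernel include, but are not limited to, process management, task scheduling, and memory management. File management means that the kernel uses a file system to organize files and, through the file system, keeps track of file data storage, file status, and access settings. Process management means that, in a multi-process environment, the kernel decides which process the CPU (Central Processing Unit) runs first and how long a time slice it is allocated. Memory management means that the kernel can inspect the memory space and allocate or release memory to ensure that applications execute correctly.
Hard coding: the application development practice of embedding data directly in the source code of a program or other executable object. Unlike data obtained from external sources or generated at run time, hard-coded data can usually be changed only by editing the source code and recompiling the executable file. In computer programs or text editing, hard coding refers to replacing a variable with a fixed value.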
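ANR: Application Not Responding (non-response for short). ANR is a very common phenomenon in operating systems (such as the Android system). For developers it may be a bug in the code, but for users it may result in a poor experience. The operating system requires certain events to be completed within a certain time; if an event does not receive an effective response within the predetermined time, or the response takes too long, an ANR results. Generally, the system interface pops up a prompt box informing the user that the current application is not responding, and the user can choose to keep waiting (Wait) or force the application to close (Force close); this is a self-protection mechanism of the operating system. Alternatively, when an application non-response occurs, the unresponsive process can be closed directly without popping up a prompt box.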
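Target application: in this application, the target application refers to any application running on a computer device, such as an application program on a terminal or a service program on a server. Classified by installation method, the target application may be an installation-free application (for example, a mini program or a web application such as a shopping website) or a third-party application installed on the computer device; classified by function, it may be any of a game application, an audio/video application, a social application, a shopping application, and so on.
The target application can run in the environment provided by the operating system. While the target application runs based on the system code of the operating system, it may become unresponsive (that is, an ANR may occur). In the embodiments of this application, the ANRs generated by the target application may fall into the following categories: service-type, broadcast-type, content-providing-type, and input-distribution-type non-response phenomena. Taking the Android operating system as an example: a service-type non-response phenomenon is triggered when a service fails to finish executing within a predetermined duration, for example 20 s (Service Timeout); a broadcast-type non-response phenomenon is triggered when a broadcast fails to finish executing within a predetermined duration, for example 10 s (BroadcastQueue Timeout); a content-providing-type non-response phenomenon is triggered when a content provider times out after publishing (ContentProvider Timeout); and an input-distribution-type non-response phenomenon is triggered when input event dispatching times out (InputDispatching Timeout). These four categories of ANR are shown in the classification diagram of Figure 1a. The generation process (or triggering mechanism) of each category is introduced one by one below.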
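From the perspective of the operating system's source code, the process of triggering an ANR can be divided into three steps, as shown in Figure 1b: setting a duration threshold, releasing the trigger condition of the application non-response (that is, the non-response phenomenon of the application, ANR for short), and triggering the application non-response. Setting a duration threshold makes it possible to judge whether the actual processing duration of the corresponding event has timed out (that is, whether the actual processing duration exceeds the set duration threshold). The duration threshold can be set according to the corresponding ANR, and the thresholds of different ANRs may be the same or different. If a timeout occurs, the ANR is triggered; if not, the ANR is not triggered and the ANR trigger condition is released, which here means canceling the previously set duration threshold.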
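The three steps (set a duration threshold, release the trigger condition, trigger the ANR) can be illustrated with a minimal sketch. It is not the operating system's implementation: the class `AnrWatchdog`, its method names, and the use of a caller-supplied clock value in seconds are all assumptions made for illustration, and a timeout is taken to mean that the processing time exceeds the threshold.

```python
from typing import Dict, List


class AnrWatchdog:
    """Hypothetical tracker of per-event deadlines against a caller-supplied clock."""

    def __init__(self) -> None:
        self.deadlines: Dict[str, float] = {}

    def set_threshold(self, event_id: str, now: float, threshold_s: float) -> None:
        # Step 1: set the duration threshold when handling of the event starts.
        self.deadlines[event_id] = now + threshold_s

    def finish(self, event_id: str, now: float) -> bool:
        # Step 2: feedback arrived in time, so release the trigger condition
        # by canceling the previously set threshold.
        deadline = self.deadlines.get(event_id)
        if deadline is not None and now <= deadline:
            del self.deadlines[event_id]
            return True
        return False

    def check(self, now: float) -> List[str]:
        # Step 3: every event whose threshold is still set and already exceeded
        # triggers an ANR; the triggered events are returned and cleared.
        expired = sorted(e for e, d in self.deadlines.items() if now > d)
        for event_id in expired:
            del self.deadlines[event_id]
        return expired
```

With the example thresholds mentioned above (20 s for a service, 10 s for a broadcast), a service that reports completion after 5 s has its threshold canceled, whereas a broadcast that is still pending at 12 s is reported by `check` as having triggered an ANR.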
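The generation process of each kind of ANR is described in detail below. First, the common concepts and some terms involved in the following application non-response phenomena are introduced.
The target application refers to any application currently running on the computer device. The application process of the target application can interact with the underlying processes of the operating system (for example, the system service process) to complete the corresponding events.
The system service process is used to start and manage the entire Java framework. The important services in the system, such as the component management service AMS (ActivityManagerService) and the window management service WMS (WindowManagerService), can all be started in the system service process. In the Android operating system, the system service process and the application process of the target application can be spawned by the Zygote process (a process used to spawn new processes, referred to here as the incubation process). The following description takes the system service process in the operating system as the executing entity.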
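The four major components of the Android operating system are Activity, Service, BroadCast Receiver, and Content Provider. ① An Activity is the visual interface that the user operates and can provide the user with a window for issuing operation instructions. ② A Service is an application component that can perform long-running operations in the background without a user interface. Because a Service usually runs in the background and generally does not need to interact with the user, it has no graphical user interface; it is typically used to provide background services for other components or to monitor their running status. ③ A BroadCast Receiver can be used to filter out external events that the application is interested in (for example, an incoming phone call or the data network becoming available) and respond to them. A BroadCast Receiver can start an Activity or a Service in response to the information it receives, or use the NotificationManager to notify the user, for example with a sound notification or a message displayed in the status bar. ④ A ContentProvider makes a specified data set of one application available to other applications, which can read data from or store data in the content provider through the ContentResolver class. Specifically, a ContentProvider publishes data, which can be accessed through a ContentResolver object in combination with a Uri (Uniform Resource Identifier). The Uri represents the address of the data operation, and each ContentProvider has a unique address when it publishes data.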
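(1) Service-type non-response phenomenon.
The generation process of the service-type non-response phenomenon includes: in response to a service creation request sent by the application process of the target application, setting a service duration threshold; sending a service creation message to the service process so that, based on the service creation message, the service process is notified to call one or more other processes to perform the service creation work and to return feedback information after the service is successfully created; and if the feedback information returned by the service process is not received within the service duration threshold, generating a service-type non-response phenomenon.
The service creation request sent by the application process of the target application can be used to ask the system to create a service required by the target application, such as a listening service or a notification service. In a cloud gaming scenario, the listening service may, for example, listen for whether a player account is logged in or whether a player is online. The system service process can set a service duration threshold in response to the service creation request initiated by the application process; the threshold can be used to detect whether service creation has timed out. For example, the foreground service duration threshold and the background service duration threshold may both be 20 s. Next, the system service process can send a service creation message to the service process so that the service process calls one or more processes to perform the service creation work, through which the service required by the target application is created. The service process may have been created in advance at the request of the component management service in the system service process. Specifically, the main thread of the service process can call one or more other processes (that is, processes other than the service process) to perform the service creation work; these processes may include processes in a parent-child relationship, and the parent and child processes may share an address space. After the service creation work is finished, the main thread of the service process can send feedback information to the system service process; the feedback information here may be a notification message indicating that service creation is complete. If the feedback information returned by the service process is not received within the service duration threshold, the time spent on the service creation work is greater than or equal to the service duration threshold, a service timeout occurs, and a service-type non-response phenomenon is generated. Conversely, if the feedback information is received within the service duration threshold, the time spent is less than the threshold and no service-type non-response phenomenon is generated.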
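Taking ANR in the Android operating system as an example, the above is illustrated as follows. Service Timeout occurs during startService. Services fall into two categories: foreground services, whose timeout (that is, the service duration threshold) is 20 s, and background services, whose timeout is 200 s. Starting a Service in the target application can be done with a single line of code that calls the startService API. At the level of the Android operating system, the process follows the simplified startService flow shown in Figure 1c. The creation and startup of the service is mainly handled by the component management service (AMS, ActivityManagerService) in the system service process. AMS communicates with Zygote over a socket (a communication mechanism) to request creation of the process that will host the service (Create Process), which includes requesting creation of a thread (ActivityThread, the main thread). The service runs in this separately created process; for a local service, the step of starting a separate process may be skipped. ActivityThread plays the role of the application's main thread. Zygote then uses fork to copy the Zygote process into a new process and loads the ActivityThread-related resources into the new process. AMS sends a request to create the service to the ActivityThread in the newly created process via Binder (an inter-process communication mechanism). ActivityThread starts and runs the service; specifically, ActivityThread can call the onCreate method to create the service (createservice).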
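As shown in Figure 1d, when the application process (the APP process) initiates a request to create a service (startService) (step 1 in the figure), the system service process (the system_server process) allocates an idle thread binder_1 (communication thread 1) to receive the request and then sends the service timeout message SERVICE_TIMEOUT_MSG to the component manager ActivityManager to set the service duration threshold (step 2). Next, binder_1 notifies binder_3 (communication thread 3) of the service process to prepare to perform the processing work (scheduleCreateService) (step 3). After receiving the notification, binder_3 hands it over to the main thread (which may correspond to the main thread ActivityThread in Figure 1c) and adds the service creation event to the main thread's task queue (sendMessage) (step 4). The main thread then performs a series of tasks, here the service creation work, which may include the process by which the ActivityThread thread creates the service in Figure 1c (step 5), and completes the startup of the service life cycle (waitToFinish). After completing this work, the main thread reports to the system_server process that the work is done (serviceDoneExecuting), and the binder_2 thread (communication thread 2) in the system_server process receives the message (step 6). If the service creation work is completed within the service duration threshold, the ANR trigger condition can be released and no ANR occurs; otherwise, an ANR occurs.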
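(2) Broadcast-type non-response phenomenon.
The generation process of the broadcast-type non-response phenomenon includes: in response to a send-broadcast request initiated by the application process of the target application, setting a broadcast duration threshold; sending a broadcast registration message to the broadcast receiving process so that, based on the broadcast registration message, the broadcast receiving process is notified to call one or more other processes to perform the broadcast work and to return feedback information after the broadcast is completed; and if the feedback information returned by the broadcast receiving process is not received within the broadcast duration threshold, generating a broadcast-type non-response phenomenon.
The broadcast mechanism is used for communication between processes or threads and can be divided into broadcast sending and broadcast receiving. Broadcasts include parallel broadcasts and serial broadcasts, and application non-response phenomena usually occur in the serial broadcast scenario. Similar to the generation process of the service-type ANR, the system service process can set a broadcast duration threshold in response to the send-broadcast request initiated by the application process of the target application. The application process of the target application is the process in which the broadcast sender resides. The system service process can send a broadcast registration message to the broadcast receiving process so that the main thread of the broadcast receiving process can call one or more other processes to perform the broadcast work. These processes may include processes in a parent-child relationship, and the parent and child processes may share an address space.
The broadcast receiving process is the process in which the broadcast receiver resides and can be used to receive broadcast messages from other applications or from the system. The broadcast work may broadcast various events, such as a broadcast that the date has changed or that the system has finished booting; in a cloud gaming scenario, the broadcast event may be, for example, a network switch broadcast or a network failure broadcast. Before the broadcast work is performed, a broadcast receiving queue can be created based on the broadcast registration message so that received broadcast events are processed in order. After the main thread finishes the broadcast work, it can return feedback information, which is a notification message indicating that the broadcast work is complete. If the feedback information returned by the broadcast receiving process is not received within the broadcast duration threshold, the time spent on the broadcast work is greater than or equal to the broadcast duration threshold, a broadcast timeout occurs, and a broadcast-type non-response phenomenon is generated. Conversely, if the feedback information returned by the broadcast receiving process is received within the broadcast duration threshold, the time spent on the broadcast work is less than the threshold and no broadcast-type non-response phenomenon is generated.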
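Taking ANR in the Android operating system as an example, the broadcast-type ANR is described in more detail below with reference to the schematic diagram of its generation process shown in Figure 1e. When the application process of the target application initiates a send-broadcast request (sendBroadcast) (step 1 in the figure), the system service process can allocate an idle thread binder_1 (communication thread 1) to receive the request and then sends the broadcast timeout message BROADCAST_TIMEOUT_MSG (sendMessage) to the component manager ActivityManager to set the broadcast duration threshold (steps 2-3). Next, binder_1 uses the broadcast registration message to notify the binder_3 thread (communication thread 3) of the broadcast receiving process (the Receiver process) to prepare to perform the processing work (step 4). After receiving the notification, binder_3 sends a message to the main thread, adding the event to the main thread's task queue (communication thread 3 sends a message to the main thread, sendMessage, step 5). The main thread then performs a series of tasks; here the broadcast work includes starting the life cycle of the broadcast receiver. If the current process is found to still have SP (SharedPreferences, a data storage mechanism) data being written to a file, the SP data persistence work must finish first (the main thread sends a message to the queued-work-looper thread, sendMessage, step 6), after which the queued-work-looper thread reports to the system_server process that the broadcast work is complete (step 7); otherwise, the main thread can directly report that the broadcast work is complete (finishReceiver). The binder_2 thread (communication thread 2) in the system_server process then receives the message. If all the work is completed within the broadcast duration threshold, the ANR trigger condition can be released and no ANR occurs (step 8); otherwise, an ANR occurs.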
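(3) Content-providing-type non-response phenomenon.
The generation process of the content-providing-type non-response phenomenon includes: in response to a request, initiated by the application process of the target application, to obtain a content providing object, detecting the startup status of the content providing process corresponding to the content providing object; if the startup status indicates that the content providing process has not been started, creating the content providing process, notifying the content providing process to call one or more other processes to install the content providing object, and returning feedback information after the content providing object is installed, where the content providing object is configured with an installation duration threshold; and if the feedback information returned by the content providing process is not received within the installation duration threshold, generating a content-providing-type non-response phenomenon.
The content providing object is used to share data between different applications. The content providing process is used to install and publish the content providing object so as to provide content data and implement the data sharing function. The system service process can, in response to the request initiated by the application process of the target application to obtain the content providing object, detect the startup status of the content providing process. If the content providing process has not been started, which indicates that it may not have been created, the content providing process can be created and started. After being created, the content providing process can register itself with the system service process, and the installation duration threshold is set. The content providing process is then notified to perform the installation of the content providing object; specifically, the main thread of the content providing process can call one or more other processes to do so. These processes may include processes in a parent-child relationship, and the parent and child processes may share an address space.
After the installation is complete, feedback information can be returned. The feedback information is a notification message indicating that the installation of the content providing object is complete, and it can also indicate that the content providing object is to be published so that the content data provided by the obtained content providing object is returned. In a cloud gaming scenario, the content providing object may be the address book in the system, and the content data provided may be data related to the contacts in the address book. It can be understood that the specific implementations of this application involve data such as address books; when the above embodiments are applied to specific products or technologies, the user's permission or consent must be obtained, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
If the feedback information returned by the content providing process is not received within the installation duration threshold, the time spent on the installation work is greater than or equal to the installation duration threshold, a content providing timeout occurs, and a content-providing-type non-response phenomenon is generated. Conversely, if the feedback information returned by the content providing process is received within the installation duration threshold, the time spent on installing the content providing object is less than the threshold and no content-providing-type non-response phenomenon is generated.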
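Taking ANR in the Android operating system as an example, the content-providing-type ANR is described in more detail below with reference to the schematic diagram of its generation process shown in Figure 1f. When the application process of the target application initiates a request to obtain the content providing object ContentProvider (getContentProvider) (step 1 in the figure), the system service process can allocate an idle thread binder_1 (communication thread 1) to receive the request. If it detects that the ContentProvider has not yet been started, a new process is first forked through Zygote (step 2), and the new ContentProvider process registers itself with the system (attachApplicationLocked). The communication thread binder_2 (communication thread 2) of the system_server process receives the registration message (step 3) and sends the content providing object publish timeout message CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG to the ActivityManager inside the system_server process to set the installation duration threshold (step 4). Next, binder_2 notifies the binder_4 thread (communication thread 4) of the provider process (the content providing process) to prepare to perform the processing work (bindApplication, step 5). After receiving the notification, binder_4 hands it over to the main thread (that is, sends a message to the main thread, sendMessage, step 6) and adds the event to the task queue. The main thread then performs a series of tasks, here the installation of the content providing object and, optionally, the publishing of the content providing object, and then reports to the system_server process that the installation work is complete and returns the content data of the obtained content providing object (publishContentProvider, step 7). The binder_3 thread (communication thread 3) in the system_server process then receives the message. If all the work is completed within the installation duration threshold, the ANR trigger condition can be released and no ANR occurs (step 8); otherwise, an ANR occurs.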
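(4) Input-event-distribution-type non-response phenomenon.
The generation process of the input-event-distribution-type non-response phenomenon includes: if an input event is received, adding the currently received input event to the input queue and waking up the input distribution thread, which is used to distribute the input events in the input queue, in order, to the application process of the target application for processing; and if, when the distribution turn of the currently received input event arrives, the application process of the target application is processing other input events, generating an input-event-distribution-type non-response phenomenon.
Specifically, a thread in the system service process can listen for input events reported by the bottom layer. When an input event is received, it can be added to the input queue to wait to be distributed and processed; at the same time, the input distribution thread can be woken up to distribute the input events in the input queue, and a distribution start time can be set. When the distribution turn of the currently received input event arrives, for example when its distribution time point is reached, if the application process of the target application is still processing other input events, the application process cannot handle the currently received input event that is about to be distributed, and an input-event-distribution-type non-response phenomenon is generated. In a cloud gaming scenario, the input event may be an operation event performed by the user in the game client; the operation event reaches the server running the cloud game as part of an operation stream but cannot be processed, and an ANR occurs.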
在另一种实现方式中,在当前接收到的输入事件的分发顺序到达时,目标应用的应用进程未处理其他的输入事件,则通过输入分发线程将当前接收到的输入事件分发给应用进程,以使应用进程调用一个或多个其他进程对当前接收到的输入事件进行处理,并在处理完成后返回反馈信息;若在处理时长阈值内未接收到应用进程返回的反馈信息,则产生输入事件分发类型的无响应现象。
在当前接收到的输入事件到达分发顺序时,例如基于分发起始时间点和处理时长阈值 确定到达该输入事件的分发时间点,则可以通过输入分发线程将当前接收到的输入事件分发给应用进程,具体是分发至应用进程的目标窗口中,应用进程可以调用一个或多个其他进程对当前接收到的输入事件进行处理,并在处理完成后返回反馈信息,该反馈信息是用于指示输入事件完成的通知消息。其中调用的一个或多个进程中可以包括具备父子关系的进程,且父子进程之间可共享地址空间。
若在处理时长阈值内未接收到应用进程返回的反馈信息,并且接收到新的输入事件,说明当前接收到的输入事件未在规定的处理时长阈值内处理完,而下一个输入事件则会等待当前接收到的输入事件的处理,从而会产生输入事件分发类型的无响应现象。反之,在处理当前接收到的输入事件时只要没有接收新的输入事件,无论在处理时长阈值内未接收到应用进程返回的反馈信息,或者是在处理时长阈值内接收到应用进程返回的反馈信息,都不会发生输入事件分发类型的无响应现象。
以Android操作***的ANR为例,对输入事件分发类型的ANR进行更详细地介绍。具体可结合图1g所示的输入事件分发类型的ANR的产生过程的示意图。首先,InputReader线程(即输入读取线程)通过EventHub(事件中心)收听底层上报的输入事件,一旦收到输入事件就将其放入mInBoundQueue队列(输入队列,用于存储将分发的输入事件)并唤醒InputDispatcher线程(输入分发线程)(如图中的步骤1);InputDispatcher线程开始输入事件分发,设置分发起点时间,先检测是否有正处理的事件,若没有则取出mInBoundQueue队头的事件,然后检查窗口是否就绪;当窗口就绪,则将事件移到outBoundQueue队列(输出队列,用于存储即将要分发给目标窗口的输入事件);此时,若应用管道对端连接正常则将数据从outBoundQueue取出,放入waitQueue队列(等待队列,用于存储等待目标窗口处理的输入事件)(如图中的步骤2-4),然后InputDispatcher发消息告知目标应用准备执行处理工作;此时目标应用的main线程(即主线程)接收输入事件,并将接收到的输入事件层层转发到目标窗口处理;完成上述工作,会发消息向system_server进程汇报工作完成(即发送完成信号,如图中的步骤7),接下来***会将该事件从waitQueue队列中移除;若当前输入***中正在处理某个耗时操作(例如文件操作),后续的每一次输入事件都会检测前一个输入事件是否超时,若超时就ANR。
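The rule that a slow event only surfaces as an ANR when a later event arrives can be sketched as follows. This is a simplified model under stated assumptions: the class InputDispatcherSketch and its methods are hypothetical, the three Android queues are collapsed into one inbound queue plus a single in-flight slot, and time is passed in explicitly.

```python
from collections import deque


class InputDispatcherSketch:
    """Hypothetical, simplified model of the dispatch logic described above."""

    def __init__(self, threshold_s=5.0):
        self.threshold_s = threshold_s
        self.inbound = deque()   # events waiting to be dispatched
        self.waiting = None      # (event, dispatch_time) not yet acknowledged

    def on_event(self, event, now):
        """Queue a new event; return True if its arrival exposes an ANR."""
        if self.waiting is not None:
            _, dispatched_at = self.waiting
            if now - dispatched_at >= self.threshold_s:
                return True      # previous event timed out: ANR
        self.inbound.append(event)
        self._dispatch_next(now)
        return False

    def on_finished(self, now):
        """The application reported that the in-flight event is handled."""
        self.waiting = None
        self._dispatch_next(now)

    def _dispatch_next(self, now):
        # Dispatch the head of the queue only when nothing is in flight.
        if self.waiting is None and self.inbound:
            self.waiting = (self.inbound.popleft(), now)
```

In this model an unacknowledged event dispatched at time 0 does not raise an ANR by itself; the ANR is reported only when another event arrives at or after the 5 s threshold.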
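From the above description of how ANRs arise, and from analysis of the source code (the system code) of the operating system in which they occur, it can be seen that in the Service, BroadcastQueue, ContentProvider, and Input scenarios the application process always interacts with the system_server process and always sends requests to Zygote to create new processes (not all shown; new processes can be understood to be created through the Zygote process). All of this belongs to the underlying operating system.
To address ANR, this application provides a scheme that repairs the system code in order to resolve application non-response. Specifically, scene information is collected for the various non-response phenomena of a target application; the scene information reveals how the system code was executing when each phenomenon arose. Commonality analysis, meaning analysis that looks for shared elements, is performed on the collected scene information to obtain a commonality analysis result. Based on that result, the fault point that produces the non-response phenomena is located in the operating system's system code, and the system code is repaired at that fault point. Running the target application on the repaired system code effectively lowers the probability of non-response and reduces cases in which non-response crashes the target application or the operating system, which benefits the stable operation of both. The scheme starts from the underlying system: by repairing the operating system's own system code, it addresses application non-response at its root and is therefore general.
Thus, working at the operating-system level, one studies how ANRs arise, learns the trigger mechanism, and analyzes from the operating system's source code the underlying logic at work when several types of ANR occur. Changing the system call function then reworks the logic by which ANRs are triggered, addresses ANR at its root, reduces application and system crashes, and improves the compatibility and stability of the system and its applications. Because the native operating system itself is modified and no hard-coding is used, the scheme is not tightly bound to any particular platform. It applies to cloud gaming, terminals, emulators, and other scenarios, and it can cope with multiple types of ANR.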
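It should be understood that this scheme applies to any scenario in which applications stop responding, such as terminals, emulators, and cloud gaming. Applied to cloud gaming, it can resolve ANR in that scenario and improve the compatibility and stability of the cloud gaming platform and the user experience. The target application may be a cloud gaming application. Cloud gaming is an online gaming technology built on cloud computing.
Cloud computing is a form of cloud technology. Cloud technology is a hosting technology that unifies hardware, software, network, and other resources within a wide area or local area network to compute, store, process, and share data. Cloud computing is a computing model that distributes computing tasks over a resource pool made up of a large number of computers, so that application systems can obtain computing power, storage space, and information services as needed. The network that provides the resources is called the "cloud". To users, resources in the cloud appear infinitely expandable: they can be obtained at any time, used on demand, expanded at any time, and paid for by use. A provider of basic cloud computing capabilities builds a cloud computing resource pool (a cloud platform, generally called an IaaS (Infrastructure as a Service) platform) and deploys multiple types of virtual resources in it for external customers to choose from. The resource pool mainly comprises computing devices (virtualized machines, including operating systems), storage devices, and network devices. In this application, the commonality analysis of scene information may also use cloud computing.
In a cloud gaming scenario, the game runs on a cloud server, where the expensive rendering computation is performed. Pictures and sound are transmitted over the network to the player's game terminal as an audio/video stream, and user operation instructions are transmitted to the cloud server as an operation stream for the corresponding computation. The rapid development of mobile communication technology such as 5G (5th Generation Mobile Communication Technology) brings higher bandwidth, greater concurrency, and lower network latency, and with them more opportunities for cloud gaming.
The environment in which cloud games run may be called the cloud gaming environment. In this environment, system containers are used to run multiple operating systems on one or more independent servers (for example, servers with ARM or x86 architectures), and the resulting images are delivered as a video stream to a remote receiving program for processing. The ARM architecture is a 32-bit or 64-bit reduced instruction set processor architecture; the x86 architecture is a set of computer language instructions executed by microprocessors. A container is a type of operating-system-level virtualization and can host an operating system. Through isolation mechanisms such as namespaces, multiple operating systems (the server operating system and the device operating systems) share one kernel in kernel mode while remaining independent of one another in user mode. Here, the server operating system is the general-purpose operating system in the server, such as Linux or Android; the device operating system is the operating system inside a container, such as Android or iOS.
Accordingly, a system container is an instance of a container that runs on a server operating system such as Linux. For example, the system container may be an Android container running on top of Linux; the Android container loads an Android image. An image is a form of file storage: multiple files are merged into one image file to make them easy to distribute and use. The system containers mentioned in the embodiments of this application are not limited to Android containers; for example, if iOS supported open-source development, the system container could also be an iOS container. In the cloud gaming environment proposed here, deploying a large number of system containers on one independent server makes full use of the server's CPU (Central Processing Unit) and GPU (Graphics Processing Unit) capabilities, enables highly concurrent execution of system operations, and speeds up cloud games.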
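The cloud gaming environment is backed by the devices of a cloud gaming system. Referring to the architecture diagram in Figure 2a, the cloud gaming system may include at least one edge server 11, multiple game clients 12, and at least one analysis server 13. An edge server is a server that runs cloud games. At least one system container may be deployed in each edge server shown in Figure 2a, with one or more game apps (applications) installed in it; these apps run one or more cloud games, so cloud games run through the system container. Each system container may be connected to at least one game client 12 and transmits the cloud game's pictures and sound to the connected game client 12. In addition, when an ANR occurs while edge server 11 is running a cloud game, a prompt may be displayed in game client 12, as shown in Figure 2b.
Analysis server 13 is a server used to analyze application non-response. While cloud gaming applications run in the system containers of the at least one edge server 11, applications in one or more edge servers may experience different types of non-response. Analysis server 13 may then collect the scene information produced when at least two kinds of non-response occur in a cloud gaming application on edge server 11, perform commonality analysis on that scene information, find from the commonality analysis result the fault point shared by the non-response phenomena, repair the operating system's system code at that shared fault point, and run the cloud gaming application on the repaired system code, thereby resolving application non-response. In another embodiment, this work may instead be performed by a target edge server: any one of the at least one edge server 11 may obtain the scene information of multiple non-response phenomena and repair the operating system's system code on the same principle, so as to resolve and avoid application non-response.
It should be noted that the edge server and the analysis server may each be an independent physical server, a server cluster or distributed system made up of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, but are not limited to these.
Game client 12 may be a terminal device that provides basic capabilities such as streaming media playback, human-computer interaction, and communication, or an application running on such a terminal device. Terminal devices include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, in-vehicle terminals, and aircraft; this application places no limitation on this. It should be understood that Figure 2a only illustrates the system architecture of a cloud gaming system by way of example and does not limit its specific architecture; in other embodiments, for example, the cloud gaming system may also include a backend server for scheduling.
Based on the cloud gaming environment and system described above, a cloud game deployed on the server side may be built on containerization, system, and kernel technologies, combined with audio/video technology, network optimization, and computing resource management, to create a cloud gaming platform (which provides the runtime environment and services for cloud games) on which multiple cloud games can run. Taking a cloud game built on the native Android operating system and the Linux kernel as an example, the system-level interaction logic involved in running a cloud game is illustrated below. As shown in Figure 2c, the underlying interaction logic of the system in the server involves interaction among the system container, the Linux kernel, and the hardware provided by the server.
While a system container runs a cloud game, the container or a game app in it may send an operation request to the operating system, which interacts with its Linux kernel. The Linux kernel receives the request and may call the relevant hardware (for example, one or more of the CPU, GPU, and memory) to carry out the corresponding operation; once the hardware has finished, the result is returned to the system container through the Linux kernel. For example, when the operation request is a rendering request, the GPU (Graphics Processing Unit) provided by the server may be called to execute the corresponding rendering event, producing a rendered game picture that is returned through the Linux kernel. In one implementation, an encoding module in the system container may also be called to compress the returned game picture. During compression, the underlying hardware resources may be called through the Linux kernel to perform the encoding; the encoded data (the compressed image) is returned to the operating system through the Linux kernel and then transmitted to the game client as a video stream.
Because cloud games are also built on an operating system, and ANR is common in operating systems, running cloud games also runs into ANR. The scheme described above, which repairs the system code to resolve application non-response, can be used to address it.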
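Based on that scheme, this application provides a data processing method. The method may be performed by a computer device, which may be a terminal or a server. When the target application is a cloud gaming application, the computer device may be, for example, any analysis server 13 in the cloud gaming system of Figure 2a. The method may also be performed jointly by a terminal and a server; no limitation is placed on this. For ease of understanding, the description below takes the method as performed by a computer device.
It should be understood that the terminal may be a mobile phone, computer, intelligent voice interaction device, smart home appliance, in-vehicle terminal, aircraft, or similar device; this application places no limitation on this. The server may be an independent physical server, a server cluster or distributed system made up of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, but is not limited to these.
Referring to Figure 3a, a schematic flowchart of a data processing method provided by an embodiment of this application, the method may include the following S301-S303:
S301: Obtain the scene information of each of multiple non-response phenomena of a target application.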
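The multiple non-response phenomena of the target application are at least two of its non-response phenomena, for example at least two of the service-type, broadcast-type, content-provider-type, and input-event-dispatch-type non-response phenomena described above.
Each non-response phenomenon arises while the target application runs on the operating system's system code. A non-response phenomenon of the target application is one in which the application fails to respond; specifically, some event is not effectively responded to within a predetermined time, or the response takes too long. For convenience, it is referred to below as application not responding (ANR) or an ANR phenomenon.
An operating system (OS) controls and manages the hardware and software resources of the whole computer system, and organizes and schedules the computer's work and resource allocation, so as to provide convenient interfaces and environments to objects (users) and other software. Types of operating system include but are not limited to Android, Windows, Linux, and iOS.
The operating system provides the runtime environment for applications, which accomplish tasks by calling functions the operating system provides (system calls). Running the target application on the operating system's system code therefore means that the system code may be called while the target application runs. During this, the target application may stop responding; for example, when the target application runs by calling the Android system code, a listening service it needs may not be created within the predetermined time. The target application is any application running on the computer device, for example an application running on a terminal or a service program running on a server. It may be a game, audio/video, social, or shopping application, among others. In a cloud gaming scenario, it may be a cloud gaming application or cloud gaming service program deployed in a system container on the server.
Non-response phenomena of the target application can arise in different scenarios, each scenario corresponding to one type of non-response phenomenon. In the Android operating system, non-response arises in the following scenarios: ① Service Timeout, covering foreground and background services, for example a foreground service that does not finish within 20 s; ② Broadcast Timeout, covering foreground and background broadcasts, for example a foreground broadcast that does not finish within 10 s; ③ ContentProvider Timeout, for example a content provider that times out 10 s after publish; ④ InputDispatching Timeout, covering key and touch events, namely Key Dispatch Timeout and touch-event dispatch timeout, for example the default key dispatch duration of 5 s, beyond which non-response occurs.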
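Whenever a non-response phenomenon occurs, scene information can be generated and collected automatically; each kind of non-response phenomenon corresponds to one kind of scene information. The scene information of a non-response phenomenon describes how the system code was executing when that phenomenon arose.
The scene information of a non-response phenomenon is the information captured by the operating system when the phenomenon arises. It describes how the operating system's system code was executing at that moment, for example at which step of the system code the non-response occurred, or in which process or thread an event response timed out. As the ANR processes described with Figures 1d to 1g show, many processes are running when an ANR occurs; in a service-type ANR, for example, these include the system service process, the service process, and others not shown. The scene information may therefore include information about the processes run in the system code, such as their process identifiers. Optionally, the scene information may include but is not limited to: the time the ANR occurred; the trace file recording the stacks of each thread of the processes before and after the ANR; the ANR type; and information about each process at the time of the ANR (for example process number, process name, and execution start and end times). The scene information of the various non-response phenomena may be recorded in logs, and all or part of it may be retrieved for further analysis.
For example, in a cloud gaming scenario, the scene of an application not responding in a cloud game may be as shown in Figure 3b. There, docker exec -it b23 sh b2343985639e indicates entering the container named b2343985639e to look up the location of the configuration files of the corresponding image, and ps -ef | grep sgame lists the processes in that container whose entries contain the keyword sgame. For the ANR scene on the cloud gaming platform shown in Figure 3b, the scene information obtained includes the process information of the processes in the container (only some are shown). The first row, for example, includes the process ID (PID, Process Identification) 1910; the ID of the process's parent (PPID, Parent Process Identification) 110; the CPU utilization percentage, 16; the process start time (the STIME column), 12:54:18; the CPU time used, 00:03:15; and the process name, com.ten.tmgp.sgame.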
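As an illustration of how such a row could be turned into structured scene information, the sketch below parses one line in the usual ps -ef column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD). The PID, PPID, C, STIME, TIME, and CMD values are the ones quoted above; the UID and TTY values in the sample line are placeholders invented for the example, and ProcInfo and parse_ps_line are hypothetical names.

```python
from typing import NamedTuple


class ProcInfo(NamedTuple):
    uid: str
    pid: int
    ppid: int
    cpu: int
    stime: str
    tty: str
    time: str
    cmd: str


def parse_ps_line(line: str) -> ProcInfo:
    """Split one `ps -ef` row into its eight columns (CMD may contain spaces)."""
    uid, pid, ppid, cpu, stime, tty, time, cmd = line.split(None, 7)
    return ProcInfo(uid, int(pid), int(ppid), int(cpu), stime, tty, time, cmd)


# UID "u0_a52" and TTY "?" are placeholders, not values from Figure 3b.
sample = "u0_a52 1910 110 16 12:54:18 ? 00:03:15 com.ten.tmgp.sgame"
info = parse_ps_line(sample)
```

The resulting record exposes info.pid, info.ppid, and info.cmd, which are the fields used later to find shared processes and to decide how two processes are related.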
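Collecting the scene information of the various non-response phenomena reveals how the operating system's system code was executing when each phenomenon occurred. This supports the later analysis of what the phenomena have in common, so that application non-response on the operating system can be effectively avoided. See S302 and S303 below.
S302: Perform commonality analysis on the scene information of the various non-response phenomena to obtain a commonality analysis result, and determine from the system code, according to that result, the fault point that produces the non-response phenomena.
Commonality analysis is a method of analysis that looks for shared characteristics. Here, performing commonality analysis on the scene information of the various non-response phenomena means looking for what the phenomena have in common, which yields the commonality analysis result. The result includes the information shared across the scene information of the various phenomena, for example a process number that appears in all of them.
Based on the commonality analysis result, the fault point that produces the non-response phenomena can be determined from the system code; it is the fault point common to all of the phenomena. The fault point may be the faulty code in the system code that causes the various phenomena, for example when all of them occur while a particular system call function is in use. The fault point reveals the common cause of application non-response and thus helps determine how to repair the system code.
ANR has many possible causes, such as deadlock, insufficient I/O resources, and an infinite loop in the main thread. An approach that analyzes each ANR case by case is tied to the specific scenario (for example, operating system version, application version, or terminal), and a fix may stop working when the scenario changes, so it does not generalize well. The analysis mechanism provided here is instead a commonality analysis mechanism: it looks for characteristics shared across the scene information of the various non-response phenomena, obtains a commonality analysis result, and from it learns the fault point behind the phenomena. Subsequent repair can then overcome any one of the target application's non-response phenomena, effectively lowering the probability of application non-response, and it is general and stable across scenarios.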
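S303: Repair the system code according to the fault point, so that the target application runs on the repaired system code.
Because the fault point is determined from the underlying system code, the operating system's system code can be repaired at the fault point to obtain repaired system code. Repair here may mean modifying the system code; the repaired system code is the original system code optimized at the fault point. Running the target application on the repaired system code means that the repaired code is called while the application runs. Because the fault point behind the non-response phenomena has been optimized in the system code, non-response can be effectively avoided while the application runs.
Repairing the operating system's system code thus reworks the native operating system. Because the change is made to the system code, it can resolve in principle the problem behind multiple types of ANR, and the operating system can be deployed on any device or platform. The scheme therefore applies not only to cloud gaming platforms, where it removes the instability caused by ANR and improves platform stability, but also to real terminals (such as mobile phones), emulator products (such as Android emulators), and operating system platforms (such as other Android platforms). It is general across scenarios and effectively avoids the various ANR phenomena.
The data processing scheme provided by the embodiments of this application supports collecting the scene information of multiple non-response phenomena of a target application. Because the scene information describes how the system code was executing when each phenomenon arose, commonality analysis of the scene information of multiple ANRs analyzes system code execution at the operating-system level, finds from the bottom up what the phenomena have in common, and yields a commonality analysis result. The fault point that produces the phenomena is determined from that result, the operating system's system code is repaired at the fault point, and the target application can run on the repaired system code. While the target application runs, the repaired system code can intercept, on the operating-system side, the situations in which non-response might occur, and can handle the great majority of ANR scenarios and avoid the great majority of ANR problems. Starting from the operating system's underlying system code, the scheme addresses ANR on the operating system at its root, generalizes well, reduces the occurrence of any kind of non-response in the operating system or the target application, and so improves the stability of the system and its applications. Applied to a cloud gaming platform, it improves the platform's compatibility and stability and the players' experience.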
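Referring to Figure 4, a schematic flowchart of another data processing method provided by an embodiment of this application, the method may include S401-S406:
S401: Obtain the scene information of each of multiple non-response phenomena of a target application.
S402: Perform commonality analysis on the scene information of the various non-response phenomena to obtain a commonality analysis result.
In one embodiment, the system code includes multiple processes and the code fragment executed by each process; the scene information of any non-response phenomenon includes the process identifiers of the processes that were run in the system code when that phenomenon arose.
The system code may have multiple processes. A process is one execution of a program; an executable program becomes a process once it is running. A process executes code in a runtime environment. The system code includes, for each of the multiple processes, the code fragment that process executes; a code fragment is the part of the system code executed by a process. Specifically, when the system code is compiled and then run, it may run as multiple process instances, each executing its corresponding code fragment.
Non-response phenomena arise while the target application runs, and running the target application calls the operating system's system code. The processes included in the system code may be run so as to execute their corresponding code fragments. The scene information may therefore include the process identifiers of the processes that were run. A process identifier is information used to mark a process; it may be a process name, a keyword from a process name, and so on.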
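On this basis, commonality analysis can be performed on the process identifiers contained in the scene information. An optional implementation of S402 is as follows: for any one piece of scene information among those of the various non-response phenomena, traverse the process identifiers in it; search for the currently traversed process identifier in each of the other pieces of scene information; if it is found, add it to the commonality analysis result; if it is not found, continue traversing the process identifiers in the chosen piece of scene information.
The chosen piece of scene information is the scene information of any one of the obtained non-response phenomena. Its process identifiers are traversed, and for the identifier currently being traversed, the scene information of the other non-response phenomena is searched. If the identifier is found, the other scene information contains the same process identifier, meaning the corresponding process was also run when the other phenomena arose; the identifier is shared by the various phenomena and can be added to the commonality analysis result. If it is not found, the other scene information does not contain it, and traversal continues with the remaining identifiers in the chosen piece of scene information.
For example, suppose the obtained scene information covers four non-response phenomena: scene information a for type-A non-response, b for type B, c for type C, and d for type D. Suppose the process identifiers are process names and the name currently being traversed in a is game. Scene information b, c, and d is searched; when the name is found in all of b, c, and d, it is added to the analysis result. It should be understood that this is one optional way to perform commonality analysis; other ways may also be used, for example analysis based on threads, and no limitation is placed on this.
Thus, taking the identifier currently being traversed in one piece of scene information as the reference and searching the other pieces for the same identifier analyzes what the system code executions have in common. Process identifiers are usually compact, for example numbers or short strings, so the search for matching identifiers is efficient, which makes the commonality analysis efficient.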
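The traversal described above amounts to keeping the identifiers of one scene that also appear in every other scene. A minimal sketch, assuming each scene is represented as a list of process names; the function name common_process_ids and the sample process names other than game are made up for the illustration.

```python
def common_process_ids(scenes):
    """Return the process identifiers shared by every scene.

    scenes: dict mapping an ANR type to the list of process identifiers
    recorded in that type's scene information.
    """
    if not scenes:
        return []
    names = list(scenes)
    base = scenes[names[0]]                        # the scene being traversed
    others = [set(scenes[n]) for n in names[1:]]   # every other scene
    common = []
    for ident in base:
        if ident in common:
            continue                               # already recorded
        if all(ident in other for other in others):
            common.append(ident)                   # found everywhere: shared
    return common


scenes = {
    "a": ["system_server", "game", "logd"],
    "b": ["game", "system_server", "netd"],
    "c": ["system_server", "game"],
    "d": ["game", "surfaceflinger", "system_server"],
}
shared = common_process_ids(scenes)
```

Converting the other scenes to sets keeps each lookup constant-time, which matches the observation that compact identifiers make the search cheap.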
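In one implementation, the commonality analysis result includes the process identifiers shared across the scene information of the various non-response phenomena. On this basis, the fault point may be determined from the system code using S403 to S405 below. The shared process identifiers may be obtained with the optional implementation of S402 described above or in some other way.
S403: Determine M target processes based on the process identifiers in the commonality analysis result.
Because a process identifier marks a process, the process corresponding to each identifier in the commonality analysis result can be taken as a target process, giving M target processes. M is a positive integer greater than 1 and equals the number of process identifiers in the commonality analysis result. For example, if the result includes two process identifiers, process1 and process2, the processes corresponding to process1 and process2 can both serve as target processes, so two target processes are determined from the two identifiers.
S404: Determine the association relationships among the M target processes, and obtain from the system code the code fragment executed by each target process.
The association relationships among the target processes describe the hierarchical relationship between at least two of the M target processes. They include but are not limited to parent-child and sibling relationships. A process may be associated with one or more other processes. For example, process1 may be the parent of process2 and also the child of process4; process1 and process2 are then in a parent-child relationship, and so are process1 and process4, with process1 playing a different role in each.
In one embodiment, the M target processes include a first process and a second process, that is, two target processes. The first and second processes are processes common to the various non-response phenomena, specifically processes that are always run before a non-response phenomenon arises.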
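Determining the association relationships among the M target processes may include the following: obtain the attribute information of the first process and of the second process from the scene information of the various non-response phenomena, where the attribute information of any process includes that process's process number and the process number of the process that called it; if the attribute information of the second process includes the process number of the first process, or the attribute information of the first process includes the process number of the second process, determine that the association relationship between the first and second processes is a parent-child relationship.
The scene information of any non-response phenomenon may include the attribute information of the processes that were run in the system code when that phenomenon arose. From the scene information, the attribute information of the processes corresponding to the identifiers in the commonality analysis result (the target processes) can be obtained. Here the target processes include the first and second processes, so the attribute information obtained is that of the first process and that of the second process. The attribute information of a process describes its characteristics and includes its own process number and the process number of the process that called it. For example, if process B called process A, the attribute information of process A includes the process number of process A and the process number of process B. The attribute information thus reveals a process's own process number and which process called it. A process number (Process Identification, PID) is a unique identifier assigned to a process by the operating system when the process is created. It may be written as a natural number, such as 123 or 605, or in binary, such as 001 or 010; no limitation is placed on its representation.
The attribute information of the first process includes the process number of the first process and that of the process that called it; the attribute information of the second process includes the process number of the second process and that of the process that called it. The attribute information of either process can then be analyzed, with the following cases:
(1) If the attribute information of the second process includes the process number of the first process, then the calling process recorded for the second process is the first process; that is, the first process called the second process. The association relationship between them is determined to be a parent-child relationship in which the first process is the parent and the second process is the child.
(2) If the attribute information of the first process includes the process number of the second process, then the calling process recorded for the first process is the second process; that is, the second process called the first process. The association relationship is again a parent-child relationship, in which the first process is the child and the second process is the parent.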
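For example, the cloud gaming ANR scene in Figure 3b is one that every kind of non-response produces, and scene information is collected from it. The commonality analysis result may therefore include the process information shown in Figure 3b. As Figure 3b shows, the operating system has two sgame processes at the same time, with process numbers 1910 and 4697. The process numbers show that the parent process number of the second process (4697) is 1910, which is the first sgame process. The two processes are therefore in a parent-child relationship, the second being the child of the first, and the first is in turn the child of the process with process number 110.
(3) If the attribute information of the second process does not include the process number of the first process, and that of the first process does not include the process number of the second, then the second process was called by some process other than the first, and the first by some process other than the second. The association relationship between them is then not parent-child but something else, for example a sibling relationship in which both are children of the same process. For instance, suppose a first process with process number 1910 and a second process with process number 4906 are both sgame processes and both have parent process number 110; both are then children of process 110, and the two can be judged to be siblings.
Thus, a process's own process number together with the process number of its caller, as recorded in the attribute information, can be compared against the process numbers of other processes, which makes it easy to determine the association relationships between processes.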
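The three cases can be written as one comparison over (PID, PPID) pairs. The sketch below is illustrative only: the function name relation and the returned labels are assumptions, while the process numbers in the usage lines are the ones quoted in the examples above.

```python
def relation(first, second):
    """Classify how two processes are related.

    first, second: (pid, ppid) tuples, i.e. a process number together with
    the process number of the process that called (created) it.
    """
    pid1, ppid1 = first
    pid2, ppid2 = second
    if ppid2 == pid1:
        return "parent-child: first is parent"    # case (1)
    if ppid1 == pid2:
        return "parent-child: second is parent"   # case (2)
    if ppid1 == ppid2:
        return "sibling"                          # case (3), same parent
    return "other"


r1 = relation((1910, 110), (4697, 1910))   # the two sgame processes of Figure 3b
r2 = relation((1910, 110), (4906, 110))    # the sibling example
```

Here r1 reports that the first process is the parent, and r2 reports a sibling relationship because both processes record 110 as their caller.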
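Because a running process executes its corresponding code fragment in the system code, the code fragment executed by each target process can be obtained from the system code. With M target processes, M code fragments are obtained, from which the fault point that produces the non-response phenomena is later determined; see S405 below.
S405: Based on the association relationships, determine from the M obtained code fragments the fault point that produces the non-response phenomena.
Each of the M code fragments is executed by the corresponding one of the M processes. Using the association relationships among the M target processes, the fault point can be determined from the M code fragments. The fault point may be a single code statement or a section of code within one of the fragments. Because the fragments are executed whenever any of the non-response phenomena occurs, the fault point is common to all of them; its existence means that any of the phenomena may be triggered. Which one is actually triggered can be judged from other information, such as the type of event the process was executing; for example, if the process was executing an input dispatch event, the ANR can be judged to be an input dispatch timeout.
In S403 to S405 above, the target processes are determined from the process identifiers in the commonality analysis result, the association relationships among them are determined, the code fragments they execute are obtained from the system code, and the fault point is derived from the association relationships and the code fragments. This approach starts from the underlying operating system's system code: after commonality analysis of the various non-response phenomena, the result is used to identify the system code executed whenever any of them arises, and the fault point is determined from that code. The common cause of ANR is thereby identified at the system level, so that the problem can be repaired at its root.
S406: Repair the system code according to the fault point, so that the target application runs on the repaired system code.
The scheme provided by the embodiments of this application performs commonality analysis on the scene information of multiple non-response phenomena to find the shared processes that were run when the target application stopped responding (for example, the processes corresponding to the shared process identifiers). These shared processes are analyzed further as target processes: returning to the operating system's system code, the fault point is determined from the association relationships and from the code fragments executed by the target processes. In this way the root cause of application non-response is identified at the bottom of the system, so the problem can be resolved at its source. After the system code is repaired at the fault point, the repaired code effectively reduces non-response while the target application runs and improves the overall stability of the application and the system.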
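Referring to Figure 5, a schematic flowchart of yet another data processing method provided by an embodiment of this application, the method may include S501-S507:
S501: Obtain the scene information of each of multiple non-response phenomena of a target application.
S502: Perform commonality analysis on the scene information of the various non-response phenomena to obtain a commonality analysis result.
S503: Determine M target processes based on the process identifiers in the commonality analysis result.
S504: Determine the association relationships among the M target processes, and obtain from the system code the code fragment executed by each target process.
S505: Based on the association relationships, determine from the M obtained code fragments the fault point that produces the non-response phenomena.
In one embodiment, the M target processes include a first process and a second process whose association relationship is parent-child: the first process is the parent of the second process, and the second process is the child of the first. The relationship may be determined from the process numbers contained in the processes' attribute information, as described in the embodiment above.
Building on the relevant parts of the embodiment of Figure 4, S505 may be implemented as follows: based on the parent-child relationship, take the code fragment executed by the first process as the reference code fragment, and determine from it a first code statement, namely the code statement the first process executed before the non-response phenomenon occurred; when the first code statement implements a function call, follow it through the code fragment executed by the first process to derive the first process's call stack, which contains the functions the first process called; if the call stack contains a target function whose call failed, determine the logic code of the target function from the reference code fragment; and, based on the code fragment executed by the second process, determine from the logic code of the target function the fault point that produces the non-response phenomenon.
The code fragment executed by the first process contains many code statements, and the first process may have executed only some of them before stopping at the moment the non-response phenomenon occurred. The same holds for the second process. Because the two are parent and child, the code executed by the first process, as parent, may contain the code that creates the second process. The code fragment executed by the first process is therefore analyzed first, as the reference code fragment, that is, the fragment used as the basis for analysis. From it the first code statement is determined: the statement the first process executed before the non-response phenomenon occurred. The first code statement is checked against the analysis condition before further analysis is carried out. Specifically: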
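(1) When the first code statement implements a function call, the first process's call stack can be derived by following that statement through the code fragment executed by the first process.
The first code statement may call an executable program or system command, thereby implementing a function call. For example, the first code statement executed by the first process, as parent, may be a statement that executes Runtime.getRuntime().exec("xxx.exe"). Runtime.getRuntime().exec() is used to call an external executable program or a system command. Runtime.getRuntime() returns the Runtime object of the current application, and that object's exec() method creates a child process to execute the specified executable (here the program named "xxx.exe") and returns the Process object instance corresponding to the child process. Through the Process object, the child process's execution can be controlled and information about it obtained.
The last code statement the parent process executed before the ANR is the first code statement, which can be analyzed further: following it through the code fragment executed by the first process yields the first process's call stack, which contains the functions the first process called. The call stack can be analyzed with a debugging tool suited to the type of code the first process executes. For Java code, printStackTrace can be used (it prints, on the command line, where in the program an exception occurred and why); for native C code, strace can be used (a debugging tool that intercepts and records the system calls a process makes and the signals it receives). Both can be used to derive the call stack of the process concerned. A call stack can also be understood as the mechanism by which an interpreter (such as the JavaScript interpreter in a browser) keeps track of the flow of function execution: it shows which function is executing and which functions are called from within that function's body.
Taking an ANR in a cloud gaming scenario as an example, analysis of the ANR scene information on the cloud gaming platform shows that the operating system has two sgame processes at the same time and that they are parent and child. A debugging tool captures the call stack at the scene, and the call stack shows that before the ANR the parent process was executing the statement Runtime.getRuntime().exec() (the first code statement). Following Runtime.getRuntime().exec() gives the parent process call stack shown in Figure 6a. Specifically, the Runtime class calls the exec method; exec creates a ProcessBuilder; ProcessBuilder.start() creates a ProcessImpl; its start() method creates the Unix process by constructing a Unix process object (new Unixprocess); from that object a new process is forked that executes program code different from that of the Unix process, which leads into UnixProcess_md.c; and finally the startChild function is called. This is the parent process's call stack, and its last function call, startChild, is roughly where the program went wrong.
If the call stack contains a target function whose call failed, the code fragment executed by the first process went wrong at that target function. In the parent process call stack of Figure 6a, the function calls are shown only as far as startChild, meaning the program broke down when the first process called startChild. The target function can therefore be analyzed further: its logic code, which is part of the code fragment executed by the first process, is determined from that fragment, and, based on the code fragment executed by the second process, the fault point is determined from the logic code of the target function. The fault point here is a code statement in the logic code of the target function.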
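(2) When the first code statement does not implement a function call, for example when it is some other kind of statement, the code statements related to it can be analyzed according to how that statement executed.
Thus, analyzing the code fragments executed by processes in a parent-child relationship, specifically the call stack of the first process as parent, gives a rough picture of how the parent actually executed its code fragment. From this it can be decided that the fault point should be located within the logic code of the target function at which execution broke down, which narrows the search.
In one implementation, the logic code of the target function includes a process creation statement, which is a statement used to create a child process; a child process created by the process creation statement shares the same address space with its parent process.
For example, Figure 6b shows the logic code of the target function from the parent process call stack of Figure 6a, which contains the process creation statement. The target function is the startChild function identified from that call stack. Its logic, as shown in Figure 6b, is: if START_CHILD uses vfork (if START_CHILD_USE_VFORK), the process is created with the process creation statement volatile pid_t resultPid = vfork(). The code explains this approach as follows: the call to vfork is separated into its own startChild function to make sure the child's stack does not corrupt the parent's stack, as suggested by the gcc warning that variable 'foo' might be clobbered by 'longjmp' or 'vfork', where foo stands for a variable holding data, a function, or a command, and longjmp is a kind of jump. vfork is a Linux system call; a process it creates shares its address space with its parent. That is, a child process created by the vfork system call shares the same address space with its parent, so the child runs entirely in the parent's address space, and if the child modifies a variable the parent is directly affected.
For a first process and second process in a parent-child relationship, the second process is the child of the first: the first is the parent and the second is the child. The first process creates the second by executing the process creation statement in the target function, so the two share the same address space. The address space is the set of all available resources; the shared address space may be a physical or a virtual address space.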
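Determining the fault point from the logic code of the target function, based on the code fragment executed by the second process, may include the following: determine a second code statement from the code fragment executed by the second process; when the second code statement implements a data read operation, determine from it the target resource the second process needs to read; if the target resource is held by the first process, determine the process creation statement in the logic code of the target function to be the fault point that produces the non-response phenomenon. Here the second process is the child of the first process; if the target resource is held by the first process, the second process is blocked; and when the second process has been blocked for longer than a duration threshold, the non-response phenomenon is triggered.
First, the second code statement is determined from the code fragment executed by the second process; it is the code statement the second process executed before the non-response phenomenon occurred. Because the second process may have executed several statements of its code fragment before the ANR, and some statements may not suit the analysis that follows, the second code statement is checked against the analysis condition. Specifically:
(1) When the second code statement implements a data read operation, it meets the analysis condition and can be analyzed further. When the second code statement performs the read, the data being read is locked so that other processes cannot access it, and the target resource the second process needs for the read can be determined. The statement may be, for example, one that reads a file resource: the second process executes the readdir() function to read /proc/self/fd, where readdir() is commonly used to iterate over the files in a directory and /proc/self/fd represents the file descriptors in the current process's directory.
The target resource the second process needs for the read may be an available resource in the address space, such as a hardware resource like the CPU or memory. If the target resource is held by the first process, then the first process holds what the second process needs in order to read, and because the two share an address space, the second process is blocked waiting for the first process to release it. When the second process has been blocked for longer than the duration threshold, for example blocked for 10 s against a threshold of 8 s, a non-response phenomenon results. By this analysis, the cause lies in the way the process was created, so the process creation statement in the logic code of the target function can be determined to be the fault point. Conversely, if the target resource is not held by the first process, the second process can use it and no non-response phenomenon results.
(2) When the second code statement does not implement a data read operation, other analysis can be carried out according to what that statement indicates.
Thus, the target function is analyzed together with the code fragment executed by the second process. Because that fragment contains a data read statement, and the target resource the read needs is held by the first process while the two processes share an address space, the target application's non-response phenomenon is triggered. By this logic, the non-response can be attributed to the process creation statement in the logic code of the target function, which is therefore determined to be the fault point.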
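From the above account of how the fault point is determined, the fault point lies in the target function of the code fragment executed by the first process, specifically in the process creation statement within that function. The implementation of the target function can therefore be modified in the operating system. In one embodiment, the system code is repaired according to the fault point as described in S506 and S507 below.
S506: Determine a target statement used to create a child process.
The target statement is a code statement different from the process creation statement. It serves a similar purpose, in that both can create a child process, but a child process created by the target statement uses an address space independent of and different from that of its parent. The parent here is the process that invokes the target statement to create the child. Because the child and its parent do not share an address space, the target resource the child needs when it performs a data read is not held by the first process but lies in the child's own independent address space, which effectively avoids application non-response.
In one embodiment, the process creation statement includes a function field that stores a first system call function, and the process creation statement creates the child process through the first system call function. The first system call function may be provided by the operating system. In the operating system's system code, the process creation statement may create the child process through the first system call function. For example, in the process creation statement volatile pid_t resultPid = vfork() described above, the function field is resultPid and vfork() is the first system call function.
On this basis, S506 specifically includes: changing the first system call function in the function field of the process creation statement to a second system call function, to obtain the target statement used to create a child process, where the target statement creates the child process through the second system call function.
The second system call function is a system call function different from the first and is likewise provided by the operating system. A child process created through the second system call function has an address space independent of that of its parent. Because a system call function is a kernel function provided by the operating system, changing it is a change made at the kernel level. This resolves ANR on the operating system at the kernel level, improves the platform's compatibility and stability, and effectively prevents and avoids ANR.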
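S507: Replace the process creation statement in the system code with the target statement to repair the system code.
Once the target statement has been determined, the process creation statement in the original system code is replaced with it. This disables the first system call function in favor of the second and repairs the system code at the kernel level. With the first process as parent and the second as child, the second process created by the target statement uses an address space independent of and different from that of the first process, so when the target application runs on the repaired system code, the data read can use the target resource without blocking, and non-response is avoided.
For example, the process creation statement volatile pid_t resultPid = vfork() may be changed to the target statement volatile pid_t resultPid = fork(), and if START_CHILD_USE_VFORK may likewise be changed to if START_CHILD_USE_FORK, thereby disabling vfork() in favor of fork().
It should be understood that, because an operating system is an integrated whole, changing the process creation statement in the target function may affect parts of the system code outside the target function. Other parts of the system code may therefore also be adapted to fit the repair described above. In this way the system code is repaired, and the target application runs on the repaired system code.
Thus, the repair made to the operating system's system code is a customized modification. It involves changing a kernel system call, so it can intercept, on the kernel side and the system side, the situations in which ANR might occur, and it can handle the great majority of scenarios in which ANR might occur. In addition, the modification contains no policy tightly bound to a particular platform and no hard-coded parts, so it is loosely coupled to the device or platform. The repaired operating system can therefore be applied in any scenario, such as cloud gaming, real terminal devices, or emulators, where it effectively avoids application non-response and improves the stability and compatibility of the platform or device.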
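The property the repair relies on, namely that a child created with fork() works on its own copy of the address space, can be observed with a small POSIX-only sketch. It uses Python's os.fork rather than the C statement quoted above, and it does not reproduce the vfork() blocking scenario; the function name child_write_is_isolated and the state dictionary are illustrative assumptions.

```python
import os


def child_write_is_isolated() -> bool:
    """Fork a child that overwrites a variable, then check the parent's copy.

    Requires a POSIX system. Returns True when the parent still sees its
    original value, i.e. the child's write landed in a separate address space.
    """
    state = {"owner": "parent"}
    pid = os.fork()
    if pid == 0:
        # Child process: this write affects only the child's copy of memory.
        state["owner"] = "child"
        os._exit(0)              # exit immediately without running parent cleanup
    os.waitpid(pid, 0)           # parent waits for the child to finish
    return state["owner"] == "parent"


isolated = child_write_is_isolated()
```

With fork() the parent's value is unchanged after the child exits, in contrast to the shared address space described for vfork(), where a child's modification would be visible to the parent.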
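In one implementation, the customized system code may also undergo a quality check after repair. The quality check includes code review and security checks during the writing of the system code. Security checks during coding can uncover security problems in the customized system code and ensure that the repaired system code is secure. Code review here means reading the code to check the source code's conformance to coding standards and its quality; it improves code quality and uncovers potential bugs. Both security checks and code review can be carried out with suitable analysis tools. This ensures that the data meet expectations and that application non-response is resolved while the operating system's compatibility and stability are preserved.
With the above repair applied in a cloud gaming scenario, no ANR occurred on the cloud gaming platform either during testing of cloud games running on the repaired code or during actual online service, so the frequency of ANR can be reduced. It should be understood that, because the cloud gaming platform is an operating system, on the system side, and runs a large number of game apps and other apps, a small number of ANRs may still occur, but with very low probability.
The data processing scheme provided by the embodiments of this application collects the scene information of multiple non-response phenomena of a target application. Because the scene information describes how the system code was executing when each phenomenon arose, commonality analysis of the scene information of multiple ANRs analyzes system code execution at the operating-system level, finds from the bottom up what the phenomena have in common, and yields a commonality analysis result. Here the result includes the shared process identifiers, from which the shared target processes are determined. The code fragments executed by the target processes are then analyzed: the call stack is used to follow how a process executed its code fragment, and a more specific fault point is determined from the call stack and the execution logic of the other code fragments, with debugging tools used to trace and debug the problem along the way. Based on the fault point, the operating system is given a customized modification, specifically a kernel-level change to the system function called when a process is created, which effectively prevents application non-response. Because the problem is resolved by a repair at the kernel level of the system, compatibility and stability improve markedly.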
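Referring to Figure 7, a schematic structural diagram of a data processing apparatus provided by an embodiment of this application. The data processing apparatus may be a computer program (including program code) running on a computer device; for example, it may be application software. It may be used to perform the corresponding steps of the method provided by the embodiments of this application. As shown in Figure 7, data processing apparatus 700 may include at least one of the following: obtaining module 701 and processing module 702.
Obtaining module 701 is configured to obtain the scene information of each of multiple non-response phenomena of a target application, where each non-response phenomenon arises while the target application runs on the operating system's system code, and the scene information of a non-response phenomenon describes how the system code was executing when that phenomenon arose;
Processing module 702 is configured to perform commonality analysis on the scene information of the various non-response phenomena to obtain a commonality analysis result, and to determine from the system code, according to that result, the fault point that produces the non-response phenomena, where the commonality analysis result includes the information shared across the scene information of the various non-response phenomena;
Processing module 702 is further configured to repair the system code according to the fault point, so that the target application runs on the repaired system code.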
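In one embodiment, the system code includes multiple processes and the code fragment executed by each process; the scene information of any non-response phenomenon includes the process identifiers of the processes that were run in the system code when that phenomenon arose. Processing module 702 is specifically configured to: for any one piece of scene information among those of the various non-response phenomena, traverse the process identifiers in it; search for the currently traversed process identifier in each of the other pieces of scene information; if it is found, add it to the commonality analysis result; if it is not found, continue traversing the process identifiers in that piece of scene information.
In one embodiment, the commonality analysis result includes the process identifiers shared across the scene information of the various non-response phenomena. Processing module 702 is specifically configured to: determine M target processes based on the process identifiers in the commonality analysis result, M being equal to the number of process identifiers in the result; determine the association relationships among the M target processes, and obtain from the system code the code fragment executed by each target process; and, based on the association relationships, determine from the M obtained code fragments the fault point that produces the non-response phenomena.
In one embodiment, the M target processes include a first process and a second process. Processing module 702 is specifically configured to: obtain the attribute information of the first process and of the second process from the scene information of the various non-response phenomena, where the attribute information of any process includes that process's process number and the process number of the process that called it; and, if the attribute information of the second process includes the process number of the first process, or the attribute information of the first process includes the process number of the second process, determine that the association relationship between the first and second processes is a parent-child relationship.
In one embodiment, the M target processes include a first process and a second process whose association relationship is parent-child, the first process being the parent of the second. Processing module 702 is further specifically configured to: based on the parent-child relationship, take the code fragment executed by the first process as the reference code fragment, and determine from it a first code statement, namely the code statement the first process executed before the non-response phenomenon occurred; when the first code statement implements a function call, follow it through the code fragment executed by the first process to derive the first process's call stack, which contains the functions the first process called; if the call stack contains a target function whose call failed, determine the logic code of the target function from the reference code fragment; and, based on the code fragment executed by the second process, determine from the logic code of the target function the fault point that produces the non-response phenomenon.
In one embodiment, the logic code of the target function includes a process creation statement, which is a statement used to create a child process, and a child process created by the process creation statement shares the same address space with its parent process. Processing module 702 is specifically configured to: determine a second code statement from the code fragment executed by the second process, namely the code statement the second process executed before the non-response phenomenon occurred; when the second code statement implements a data read operation, determine from it the target resource the second process needs for the read; and, if the target resource is held by the first process, determine the process creation statement in the logic code of the target function to be the fault point that produces the non-response phenomenon. Here the second process is the child of the first process; if the target resource is held by the first process, the second process is blocked; and when the second process has been blocked for longer than a duration threshold, the non-response phenomenon is triggered.
In one embodiment, processing module 702 is specifically configured to: determine a target statement used to create a child process, where a child process created by the target statement uses an address space independent of and different from that of its parent; and replace the process creation statement in the system code with the target statement to repair the system code.
In one embodiment, the process creation statement includes a function field that stores a first system call function, and the process creation statement creates the child process through the first system call function. Processing module 702 is specifically configured to: change the first system call function in the function field of the process creation statement to a second system call function, to obtain the target statement used to create a child process, where the target statement creates the child process through the second system call function.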
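In one embodiment, the multiple non-response phenomena of the target application include a service-type non-response phenomenon, which arises as follows: in response to a service creation request sent by the application process of the target application, a service duration threshold is set; a service creation message is sent to the service process, so that the service process, based on the message, is notified to call one or more other processes to perform the service creation work and to return feedback information once the service has been created; if no feedback information is received from the service process within the service duration threshold, a service-type non-response phenomenon results.
In one embodiment, the multiple non-response phenomena of the target application include a broadcast-type non-response phenomenon, which arises as follows: in response to a send-broadcast request initiated by the application process of the target application, a broadcast duration threshold is set; a broadcast registration message is sent to the broadcast receiving process, so that the broadcast receiving process, based on the message, is notified to call one or more other processes to perform the broadcast work and to return feedback information once the broadcast is complete; if no feedback information is received from the broadcast receiving process within the broadcast duration threshold, a broadcast-type non-response phenomenon results.
In one embodiment, the multiple non-response phenomena of the target application include a content-provider-type non-response phenomenon, which arises as follows: in response to a request to obtain a content provider object initiated by the application process of the target application, the startup state of the content provider process corresponding to the content provider object is checked; if the startup state indicates that the content provider process has not been started, the content provider process is created and notified to call one or more other processes to install the content provider object and to return feedback information after installation, the content provider object being configured with an install duration threshold; if no feedback information is received from the content provider process within the install duration threshold, a content-provider-type non-response phenomenon results.
In one embodiment, the multiple non-response phenomena of the target application include an input-event-dispatch-type non-response phenomenon, which arises as follows: when an input event is received, it is added to an input queue and the input dispatch thread is woken, the input dispatch thread being used to dispatch the input events in the queue, in order, to the application process of the target application for processing; if, when the currently received input event's turn for dispatch arrives, the application process is processing another input event, an input-event-dispatch-type non-response phenomenon results.
It should be understood that the functions of the functional modules of the data processing apparatus described in the embodiments of this application can be implemented according to the methods of the method embodiments above; for the specific implementation, refer to the related descriptions there, which are not repeated here. The beneficial effects of using the same method are likewise not repeated.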
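Referring to Figure 8, a schematic structural diagram of a computer device provided by an embodiment of this application. Computer device 800 may comprise a standalone device (for example, one or more of a server, node, or terminal) or a component inside a standalone device (for example, a chip, software module, or hardware module). Computer device 800 may include at least one processor 801 and a communication interface 802 and, optionally, at least one memory 803 and a bus 804. Processor 801, communication interface 802, and memory 803 are connected by bus 804.
Processor 801 is a module that performs arithmetic and/or logical operations. It may be one or a combination of processing modules such as a central processing unit (CPU), graphics processing unit (GPU), microprocessor unit (MPU), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), complex programmable logic device (CPLD), coprocessor (which assists the central processing unit with the corresponding processing and applications), or microcontroller unit (MCU).
Communication interface 802 may be used to provide information input or output for the at least one processor, and/or to receive data sent from outside and/or send data to the outside. It may be a wired link interface such as an Ethernet cable, or a wireless link interface (Wi-Fi, Bluetooth, general wireless transmission, in-vehicle short-range communication technology, other short-range wireless communication technologies, and so on). Communication interface 802 may serve as a network interface.
Memory 803 provides storage space, in which data such as the operating system and computer programs may be stored. Memory 803 may be one or a combination of random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), and so on.
The at least one processor 801 of computer device 800 is configured to call the computer program stored in the at least one memory 803 to perform the data processing method described in the embodiments of this application.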
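In one possible implementation, processor 801 of computer device 800 is configured to call the computer program stored in the at least one memory 803 to perform the following operations:
In one embodiment, processor 801 is specifically configured to: obtain the scene information of each of multiple non-response phenomena of a target application, where each non-response phenomenon arises while the target application runs on the operating system's system code, and the scene information of a non-response phenomenon describes how the system code was executing when that phenomenon arose; perform commonality analysis on the scene information of the various non-response phenomena to obtain a commonality analysis result, and determine from the system code, according to that result, the fault point that produces the non-response phenomena; and repair the system code according to the fault point, so that the target application runs on the repaired system code; where the commonality analysis result includes the information shared across the scene information of the various non-response phenomena.
It should be understood that computer device 800 described in the embodiments of this application can perform the data processing method as described in the corresponding embodiments above, and can also carry out the functions of data processing apparatus 700 as described in the embodiment of Figure 7; details are not repeated here. The beneficial effects of using the same method are likewise not repeated.
In addition, an exemplary embodiment of this application also provides a storage medium that stores the computer program of the foregoing data processing method. When one or more processors load and execute the computer program, the data processing method described in the embodiments can be implemented; details, and the beneficial effects of using the same method, are not repeated here. It should be understood that the program instructions may be deployed for execution on one computer device or on multiple computer devices that can communicate with one another.
The computer-readable storage medium may be an internal storage unit of the data processing apparatus provided by any of the foregoing embodiments or of the computer device, such as the computer device's hard disk or memory. It may also be an external storage device of the computer device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card fitted to the computer device. Further, it may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium stores the computer program and the other programs and data the computer device needs, and may also temporarily store data that has been or is about to be output.
One aspect of this application provides a computer program product that includes a computer program stored in a computer-readable storage medium. A processor of a computer device reads the computer program from the computer-readable storage medium and executes it, causing the computer device to perform the method of the embodiments of this application.
The steps of the methods in the embodiments of this application may be reordered, combined, or deleted as actually needed.
The modules of the apparatus in the embodiments of this application may be combined, divided, or deleted as actually needed.
What is disclosed above is merely some embodiments of this application and of course cannot limit the scope of its claims. A person of ordinary skill in the art will understand that implementing all or part of the processes of the above embodiments, and making equivalent changes in accordance with the claims of this application, still falls within the scope covered by the invention.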

Claims (16)

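  1. A data processing method, performed by a computer device, the method comprising:
    obtaining scene information of each of multiple non-response phenomena of a target application, wherein any non-response phenomenon arises while the target application runs on system code of an operating system, and the scene information of any non-response phenomenon describes how the system code was executing when the corresponding non-response phenomenon arose;
    performing commonality analysis on the scene information of the various non-response phenomena to obtain a commonality analysis result, and determining from the system code, according to the commonality analysis result, a fault point that produces the non-response phenomena, wherein the commonality analysis result comprises information shared across the scene information of the various non-response phenomena;
    repairing the system code according to the fault point, so that the target application runs on the repaired system code.
  2. The method of claim 1, wherein the system code comprises multiple processes and the code fragment executed by each process; the scene information of any non-response phenomenon comprises the process identifiers of the processes run in the system code when the corresponding non-response phenomenon arose;
    the performing commonality analysis on the scene information of the various non-response phenomena to obtain a commonality analysis result comprises:
    for any one piece of scene information among the scene information of the various non-response phenomena, traversing the process identifiers in that piece of scene information;
    searching for the currently traversed process identifier in each piece of scene information other than that piece;
    if the currently traversed process identifier is found, adding it to the commonality analysis result;
    if the currently traversed process identifier is not found, continuing to traverse the process identifiers in that piece of scene information.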
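  3. The method of claim 1 or 2, wherein the commonality analysis result comprises the process identifiers shared across the scene information of the various non-response phenomena; the determining from the system code, according to the commonality analysis result, a fault point that produces the non-response phenomena comprises:
    determining M target processes based on the process identifiers in the commonality analysis result, M being equal to the number of process identifiers in the commonality analysis result;
    determining the association relationships among the M target processes, and obtaining from the system code the code fragment executed by each target process;
    based on the association relationships, determining from the M obtained code fragments the fault point that produces the non-response phenomena.
  4. The method of claim 3, wherein the M target processes comprise a first process and a second process; the determining the association relationships among the M target processes comprises:
    obtaining attribute information of the first process and attribute information of the second process from the scene information of the various non-response phenomena, wherein the attribute information of any process comprises the process number of that process and the process number of the process that called it;
    if the attribute information of the second process comprises the process number of the first process, or the attribute information of the first process comprises the process number of the second process, determining that the association relationship between the first process and the second process is a parent-child relationship.
  5. The method of claim 3, wherein the M target processes comprise a first process and a second process, and the association relationship between the first process and the second process is a parent-child relationship, the first process being the parent of the second process;
    the determining, based on the association relationships, from the M obtained code fragments the fault point that produces the non-response phenomena comprises:
    based on the parent-child relationship, taking the obtained code fragment executed by the first process as a reference code fragment, and determining a first code statement from the reference code fragment, the first code statement being the code statement executed by the first process before the non-response phenomenon occurred;
    when the first code statement is a statement that implements a function call operation, following the first code statement through the code fragment executed by the first process to derive a call stack of the first process, the call stack comprising the functions called by the first process;
    if the call stack contains a target function whose call failed, determining the logic code of the target function from the reference code fragment; and, based on the code fragment executed by the second process, determining from the logic code of the target function the fault point that produces the non-response phenomenon.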
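  6. The method of claim 5, wherein the logic code of the target function comprises a process creation statement, the process creation statement being a statement used to create a child process, and a child process created by the process creation statement shares the same address space with its parent process;
    the determining, based on the code fragment executed by the second process, from the logic code of the target function the fault point that produces the non-response phenomenon comprises:
    determining a second code statement from the code fragment executed by the second process, the second code statement being the code statement executed by the second process before the non-response phenomenon occurred;
    when the second code statement is a statement that implements a data read operation, determining from the second code statement the target resource the second process needs to perform the data read operation;
    if the target resource is held by the first process, determining the process creation statement in the logic code of the target function to be the fault point that produces the non-response phenomenon;
    wherein the second process is the child of the first process; if the target resource is held by the first process, the second process is blocked; and when the second process has been blocked for longer than a duration threshold, the non-response phenomenon is triggered.
  7. The method of claim 6, wherein the repairing the system code according to the fault point comprises:
    determining a target statement used to create a child process, wherein a child process created by the target statement uses an address space independent of and different from that of its parent process;
    replacing the process creation statement in the system code with the target statement to repair the system code.
  8. The method of claim 7, wherein the process creation statement comprises a function field, the function field stores a first system call function, and the process creation statement creates a child process through the first system call function;
    the determining a target statement used to create a child process comprises:
    changing the first system call function in the function field of the process creation statement to a second system call function, to obtain the target statement used to create a child process;
    wherein the target statement creates a child process through the second system call function.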
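  9. The method of claim 1, wherein the multiple non-response phenomena of the target application comprise a service-type non-response phenomenon, which arises as follows:
    in response to a service creation request sent by the application process of the target application, setting a service duration threshold;
    sending a service creation message to a service process, so that the service process, based on the service creation message, is notified to call one or more other processes to perform service creation work and to return feedback information after the service is successfully created;
    if no feedback information is received from the service process within the service duration threshold, producing the service-type non-response phenomenon.
  10. The method of claim 1, wherein the multiple non-response phenomena of the target application comprise a broadcast-type non-response phenomenon, which arises as follows:
    in response to a send-broadcast request initiated by the application process of the target application, setting a broadcast duration threshold;
    sending a broadcast registration message to a broadcast receiving process, so that the broadcast receiving process, based on the broadcast registration message, is notified to call one or more other processes to perform broadcast work and to return feedback information after the broadcast is complete;
    if no feedback information is received from the broadcast receiving process within the broadcast duration threshold, producing the broadcast-type non-response phenomenon.
  11. The method of claim 1, wherein the multiple non-response phenomena of the target application comprise a content-provider-type non-response phenomenon, which arises as follows:
    in response to a request to obtain a content provider object initiated by the application process of the target application, checking the startup state of the content provider process corresponding to the content provider object;
    if the startup state indicates that the content provider process has not been started, creating the content provider process and notifying it to call one or more other processes to install the content provider object and to return feedback information after the content provider object is installed, the content provider object being configured with an install duration threshold;
    if no feedback information is received from the content provider process within the install duration threshold, producing the content-provider-type non-response phenomenon.
  12. The method of claim 1, wherein the multiple non-response phenomena of the target application comprise an input-event-dispatch-type non-response phenomenon, which arises as follows:
    if an input event is received, adding the currently received input event to an input queue and waking an input dispatch thread, the input dispatch thread being used to dispatch the input events in the input queue, in order, to the application process of the target application for processing;
    when the currently received input event's turn for dispatch arrives and the application process of the target application is processing another input event, producing the input-event-dispatch-type non-response phenomenon.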
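  13. A data processing apparatus, comprising:
    an obtaining module, configured to obtain scene information of each of multiple non-response phenomena of a target application, wherein any non-response phenomenon arises while the target application runs on system code of an operating system, and the scene information of any non-response phenomenon describes how the system code was executing when the corresponding non-response phenomenon arose;
    a processing module, configured to perform commonality analysis on the scene information of the various non-response phenomena to obtain a commonality analysis result, and to determine from the system code, according to the commonality analysis result, a fault point that produces the non-response phenomena, wherein the commonality analysis result comprises information shared across the scene information of the various non-response phenomena;
    the processing module being further configured to repair the system code according to the fault point, so that the target application runs on the repaired system code.
  14. A computer device, comprising a processor, a memory, and a network interface;
    the processor being connected to the memory and the network interface, wherein the network interface is used to provide network communication functions, the memory is used to store a computer program, and the processor is used to call the computer program to perform the data processing method of any one of claims 1-12.
  15. A computer-readable storage medium storing a computer program which, when executed by a processor, performs the data processing method of any one of claims 1-12.
  16. A computer program product comprising a computer program which, when run on a computer, causes the computer to perform the data processing method of any one of claims 1-12.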
PCT/CN2023/090470 2022-07-13 2023-04-25 Data processing method and apparatus, and device, storage medium and program product WO2024012003A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP23838503.3A EP4379554A1 (en) 2022-07-13 2023-04-25 Data processing method and apparatus, and device, storage medium and program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210819928.7 2022-07-13
CN202210819928.7A CN114880159B (zh) 2022-07-13 2022-07-13 Data processing method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024012003A1 true WO2024012003A1 (zh) 2024-01-18

Family

ID=82682820

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/090470 WO2024012003A1 (zh) 2022-07-13 2023-04-25 Data processing method and apparatus, and device, storage medium and program product

Country Status (3)

Country Link
EP (1) EP4379554A1 (zh)
CN (1) CN114880159B (zh)
WO (1) WO2024012003A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880159B (zh) * 2022-07-13 2022-09-13 腾讯科技(深圳)有限公司 数据处理方法、装置、设备及存储介质
CN116708120B (zh) * 2023-04-12 2024-02-23 友帮信互联网技术(北京)有限公司 时间在线共享业务的服务方法和装置
CN117909070A (zh) * 2023-05-29 2024-04-19 荣耀终端有限公司 信息传输方法、电子设备、存储介质和芯片***

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106155741A (zh) * 2016-06-30 2016-11-23 努比亚技术有限公司 Processing apparatus and method for avoiding application not responding
CN109144852A (zh) * 2018-07-25 2019-01-04 百度在线网络技术(北京)有限公司 Static code scanning method and apparatus, computer device, and storage medium
US20190196937A1 (en) * 2016-09-09 2019-06-27 Microsoft Technology Licensing, Llc Automated Performance Debugging of Production Applications
CN114528184A (zh) * 2022-02-17 2022-05-24 中国平安人寿保险股份有限公司 Application stutter monitoring method and apparatus, computer device, and storage medium
CN114880159A (zh) * 2022-07-13 2022-08-09 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
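CN106293979B (zh) * 2015-06-25 2019-11-15 伊姆西公司 Method and apparatus for detecting an unresponsive process
CN108804299B (zh) * 2017-04-26 2023-04-07 腾讯科技(深圳)有限公司 Application exception handling method and apparatus
CN109992438A (zh) * 2017-12-29 2019-07-09 广东欧珀移动通信有限公司 Information processing method and apparatus, computer device, and computer-readable storage medium
CN109298960A (zh) * 2018-08-15 2019-02-01 中国平安人寿保险股份有限公司 Application crash handling method and apparatus, computer apparatus, and storage medium
CN109165114B (zh) * 2018-09-14 2022-07-12 Oppo广东移动通信有限公司 Method and apparatus for handling application not responding, storage medium, and smart terminal
CN110188016B (zh) * 2019-05-24 2022-11-01 山东多科科技有限公司 Method for detecting application-not-responding blocking, terminal, and storage medium
CN112860513A (zh) * 2021-01-29 2021-05-28 北京字跳网络技术有限公司 Application-not-responding monitoring method, apparatus, device, and storage medium
CN113419886B (zh) * 2021-06-21 2022-05-03 网易(杭州)网络有限公司 Method and device for handling program crashes, and computer-readable storage medium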

Also Published As

Publication number Publication date
CN114880159A (zh) 2022-08-09
CN114880159B (zh) 2022-09-13
EP4379554A1 (en) 2024-06-05

Similar Documents

Publication Publication Date Title
WO2024012003A1 (zh) Data processing method and apparatus, and device, storage medium and program product
US11068309B2 (en) Per request computer system instances
US10509665B2 (en) Fast-booting application image
CA2921180C (en) Request processing techniques
Ye et al. Droidfuzzer: Fuzzing the android apps with intent-filter tag
US8402318B2 (en) Systems and methods for recording and replaying application execution
US5933639A (en) System and method for debugging distributed programs
US8996925B2 (en) Managing error logs in a distributed network fabric
US7818623B2 (en) Kernel debugging in a cluster computing system
Viennot et al. Transparent mutable replay for multicore debugging and patch validation
US20120102462A1 (en) Parallel test execution
US7984332B2 (en) Distributed system checker
JP2004199330A (ja) Information processing apparatus, trace processing method, program, and recording medium
US20130067439A1 (en) Injecting faults into program for testing
EP2972881B1 (en) Diagnostics of state transitions
Shibanai et al. Actoverse: a reversible debugger for actors
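WO2023046141A1 (zh) Acceleration framework, acceleration method, and device for database network load performance
CN115694699A (zh) Latency parameter collection method and apparatus, electronic device, and storage medium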
Lee et al. Unified debugging of distributed systems with recon
CN110737564A (zh) VMware-based virtual machine performance monitoring method and system
Luo et al. DepFast: Orchestrating Code of Quorum Systems
Baldassari Design and evaluation of a public resource computing framework
CN114610416A (zh) Data processing method and apparatus based on configuration parameters
Joshi Analysis and Debugging of OEM's

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 2023838503; Country of ref document: EP
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 23838503; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2023838503; Country of ref document: EP; Effective date: 20240227