CN111916210A - Auxiliary diagnosis method, device, server and storage medium - Google Patents


Info

Publication number
CN111916210A
CN111916210A (application CN202011063862.0A)
Authority
CN
China
Prior art keywords
target
state
diagnosis
time
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011063862.0A
Other languages
Chinese (zh)
Inventor
张渊 (Zhang Yuan)
陈天歌 (Chen Tiange)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011063862.0A
Publication of CN111916210A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for calculating health indices; for individual health risk assessment
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 - ICT specially adapted for the handling or processing of medical references
    • G16H70/40 - ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Toxicology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiments of the present application provide an auxiliary diagnosis method, apparatus, server, and storage medium, applied in the field of medical technology. The server comprises a processor and a memory, the memory storing a computer program that comprises program instructions, and the processor being configured to call the program instructions to perform the following steps: receiving an auxiliary diagnosis request sent by a terminal device, the request comprising a target state of a target patient at a target time; calling a pre-trained reinforcement learning model to determine, according to the target state, a target action corresponding to the target state, the target action comprising target diagnosis reference data corresponding to the target state; and sending the target diagnosis reference data to the terminal device for display. The method and device can provide an objective auxiliary diagnosis mode to improve the reliability of the diagnosis process. The present application also relates to blockchain technology; for example, the target diagnosis reference data may be written into a blockchain.

Description

Auxiliary diagnosis method, device, server and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a server, and a storage medium for assisting diagnosis.
Background
Diseases, whether chronic or acute, can cause considerable harm to the patient's body. The complexity of the pathogenic factors of the patient also makes the diagnosis and treatment process difficult for the doctor.
For example, chronic kidney disease has a high prevalence of about 10.8%, and nearly 30 million patients will progress to end-stage renal disease. The course of chronic kidney disease is usually divided into stages one to five; stages four and five require long-term dialysis treatment, which depends on the patient's long-term adherence and close cooperation with medical staff.
Conventional treatments for patients with kidney disease include medication and dialysis. Dialysis in particular is closely tied to a patient's various indices, and because a patient's physical state can differ greatly from one time to another, it places high demands on the dialysis physician, whose judgment depends heavily on clinical experience, fatigue level, and similar factors. A fully objective auxiliary diagnosis and treatment method is therefore urgently needed to improve the reliability of the diagnosis process.
Disclosure of Invention
The embodiment of the application provides an auxiliary diagnosis method, an auxiliary diagnosis device, a server and a storage medium, and can provide an objective auxiliary diagnosis mode to improve the reliability of a diagnosis process.
In a first aspect, an embodiment of the present application provides a server, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions, and perform the following steps:
receiving an auxiliary diagnosis request sent by a terminal device, wherein the auxiliary diagnosis request comprises a target state of a target patient at a target time, and the target state comprises a target body state;
calling a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, wherein the target action comprises target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained by training according to the state of a sample patient at each time in at least one time;
and sending the target diagnosis reference data to the terminal equipment so that the terminal equipment can display the target diagnosis reference data.
Optionally, the processor is configured to call the program instruction, and further perform the following steps:
calling an original reinforcement learning model to determine an action corresponding to each state according to the state of the sample patient at each time in at least one time, and obtaining a delay reward value of the action corresponding to each state, wherein the state comprises a body state;
obtaining an accumulated delay reward value according to the delay reward value of the action corresponding to each state;
and iterating the original reinforcement learning model for multiple times until the accumulated delay reward value reaches the maximum accumulated delay reward value, and obtaining a pre-trained reinforcement learning model.
Optionally, when the original reinforcement learning model is invoked to determine, according to the state at each time, an action corresponding to each state, and obtain a delay reward value of the action corresponding to each state, the processor is configured to invoke the program instructions to perform the following steps:
calling an original reinforcement learning model to determine a first action corresponding to a first state according to the first state of the first time in the at least one time, and obtaining a delay reward value of the first action; the first time is any one of the at least one time;
calling the original reinforcement learning model to determine a second action corresponding to a second state according to the second state of the second time in the at least one time, and obtaining a delay reward value of the second action; the second time is a next time of the first time.
Optionally, in obtaining the delayed reward value of the first action, the processor is configured to invoke the program instructions to perform the steps of:
after the sample patient is transferred from the first state to the second state, determining a score corresponding to the second state according to the corresponding relation between the states and the scores;
and determining the value corresponding to the second state as the delay reward value of the first action.
Optionally, the processor is configured to call the program instruction, and further perform the following steps:
obtaining a diagnosis template;
filling the diagnosis template by using the target diagnosis reference data to obtain a filled diagnosis template as a diagnosis file;
and sending the diagnosis file to the terminal equipment so that the terminal equipment can display the diagnosis file.
Optionally, after sending the diagnostic file to the terminal device, the processor is configured to call the program instruction, and further perform the following steps:
receiving an editing request for the diagnostic file sent by the terminal equipment, wherein the editing request carries editing parameters;
updating the diagnostic file by using the editing parameters to obtain an updated diagnostic file;
and sending the updated diagnosis file to the terminal equipment so that the terminal equipment can display the updated diagnosis file.
Optionally, after invoking the pre-trained reinforcement learning model to determine the target action corresponding to the target state according to the target state, the processor is configured to invoke the program instructions and further perform the following steps:
determining the state of the target patient at the next time of the target time, and determining the delay reward value of the target action according to the state of the next time;
and updating the pre-trained reinforcement learning model by using the delayed reward value of the target action.
In a second aspect, an embodiment of the present application provides a diagnosis assisting method, including:
receiving an auxiliary diagnosis request sent by a terminal device, wherein the auxiliary diagnosis request comprises a target state of a target patient at a target time, and the target state comprises a target body state;
calling a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, wherein the target action comprises target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained by training according to the state of a sample patient at each time in at least one time;
and sending the target diagnosis reference data to the terminal equipment so that the terminal equipment can display the target diagnosis reference data.
In a third aspect, an embodiment of the present application provides a diagnosis assisting apparatus, including:
the communication module is used for receiving an auxiliary diagnosis request sent by the terminal equipment, wherein the auxiliary diagnosis request comprises a target state of a target patient at a target time, and the target state comprises a target body state;
the processing module is used for calling a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, the target action comprises target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained by training according to the state of a sample patient at each time in at least one time;
the communication module is further configured to send the target diagnosis reference data to the terminal device, so that the terminal device displays the target diagnosis reference data.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps performed by the server according to the first aspect.
In summary, the server may obtain a target state of a target patient at a target time, call a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, and send target diagnosis reference data corresponding to the target state included in the target action to the terminal device, so that the terminal device displays the target diagnosis reference data.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a diagnostic aid method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a reinforcement learning model training process provided by an embodiment of the present application;
FIG. 3 is a schematic flow chart of another auxiliary diagnosis method provided by an embodiment of the present application;
fig. 4 is a schematic network architecture diagram of an auxiliary diagnostic system according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an auxiliary diagnostic apparatus provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Please refer to fig. 1, which is a flowchart illustrating an auxiliary diagnosis method according to an embodiment of the present application. The method may be applied to a server, which may be a single server or a server cluster. Specifically, the method may comprise the following steps:
s101, receiving an auxiliary diagnosis request sent by terminal equipment, wherein the auxiliary diagnosis request comprises a target state of a target patient at a target time, and the target state comprises a target body state.
The terminal device includes, but is not limited to, an intelligent terminal such as a notebook computer or a desktop computer. The target patient may be any patient at a visit. The target time may be the time of the target patient's visit, or any time prior to it. The target state refers to the state of the target patient at the target time. In one embodiment, the target state may also include a psychological state and/or medical compliance, such as medication compliance.
In one application scenario, when patient A visits doctor B, doctor B may open an auxiliary diagnosis page on a desktop computer, enter patient A's current physical state on that page, and click the auxiliary diagnosis button the page includes. In response to the click, the desktop computer sends an auxiliary diagnosis request carrying patient A's current physical state to the server, and the server receives the request.
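As a concrete illustration of the request flow described above, the following sketch packages a patient's state into a request payload on the terminal side. The field names (`patient_id`, `target_state`, and so on) are assumptions made for illustration and are not specified by the application.

```python
import json

# Hypothetical payload for the auxiliary diagnosis request; field names are
# illustrative assumptions, not part of the patent text.
def build_aux_diagnosis_request(patient_id, target_time, body_state):
    """Package the target patient's state at the target time."""
    return {
        "patient_id": patient_id,
        "target_time": target_time,
        "target_state": {
            "body_state": body_state,  # e.g. lab indices entered by the doctor
        },
    }

request = build_aux_diagnosis_request(
    "patient-A", "2020-09-30T09:00:00", {"egfr": 45.0}
)
payload = json.dumps(request)  # serialized and sent from the terminal to the server
```

On the server side, the handler would parse this payload and pass `target_state` to the pre-trained model.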
S102, calling a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, wherein the target action comprises target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained according to the state training of the sample patient at each time in at least one time.
The target diagnosis reference data refers to diagnosis reference data corresponding to a target state. The diagnostic reference data may include at least one of: a medication recommendation for the target patient for the target state, and an examination mode recommendation for the target patient for the target state. For example, for chronic kidney disease, the target diagnosis reference data may include a medication recommendation for the target patient for a target state, a hemodialysis amount and a hemodialysis manner recommendation for the target state. The medication recommendation may include a drug name and a drug use method, the hemodialysis amount recommendation may include a hemodialysis amount value, and the hemodialysis regimen recommendation may include a name of a hemodialysis regimen.
In one embodiment, the pre-trained reinforcement learning model mentioned in the embodiments of the present application can be obtained as follows. The server calls an original reinforcement learning model to determine, according to the state of the sample patient at each time in at least one time, an action corresponding to each state, and obtains a delayed reward value of the action corresponding to each state, wherein the state comprises a body state. The server then obtains a cumulative delayed reward value from the delayed reward values of the actions corresponding to the states, and iterates the original reinforcement learning model multiple times until the cumulative delayed reward value reaches its maximum, thereby obtaining the pre-trained reinforcement learning model. That is, the original model is iterated repeatedly so that the cumulative delayed reward value gradually increases and finally converges to a certain value, which serves as the maximum cumulative delayed reward value. Training aims to maximize the sum of the reward values; it avoids actions that merely decrease a single delayed reward value, yet still explores them, because an action that decreases the current delayed reward value (e.g., switching to a different medicine) may substantially increase the next delayed reward value. After multiple iterations, the model has explored the possible action choices and converges to a stable policy, yielding the pre-trained reinforcement learning model.
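The training procedure described above can be sketched as minimal tabular Q-learning, under the assumption that patient states and candidate actions are discretized; the toy two-state environment below is illustrative only and is not the application's actual model.

```python
import random

# Minimal tabular Q-learning sketch of the training loop described above.
# States are discretized patient states and actions are candidate diagnosis
# reference data; the delayed reward arrives with the next state. The toy
# environment ("stable"/"worse") is an illustrative assumption.
def train(transitions, actions, episodes=200, alpha=0.5, gamma=0.9,
          epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated cumulative delayed reward
    for _ in range(episodes):
        for state, outcomes in transitions.items():
            # epsilon-greedy: occasionally try actions that look worse now,
            # since they may raise the next delayed reward value
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            next_state, reward = outcomes[action]
            best_next = max(q.get((next_state, a), 0.0) for a in actions)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q

actions = ["keep_medication", "change_medication"]
transitions = {  # state -> {action: (next state, delayed reward value)}
    "stable": {"keep_medication": ("stable", 1), "change_medication": ("worse", -1)},
    "worse": {"keep_medication": ("worse", -1), "change_medication": ("stable", 1)},
}
q = train(transitions, actions)
```

After training, the greedy action for each state is the one the converged model would recommend as the target action.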
In one embodiment, the process by which the server calls the original reinforcement learning model to determine, according to the state at each time, the action corresponding to each state and to obtain the delayed reward value of that action may be as follows. The server calls the original reinforcement learning model to determine, according to a first state at a first time in the at least one time, a first action corresponding to the first state, and obtains a delayed reward value of the first action; the first time is any one of the at least one time. The server then calls the original reinforcement learning model to determine, according to a second state at a second time in the at least one time, a second action corresponding to the second state, and obtains a delayed reward value of the second action; the second time is the time following the first time. The first state refers to the state of the sample patient at the first time, and the second state to the state of the sample patient at the second time. The first action corresponds to the first state and may include diagnosis reference data corresponding to the first state; likewise, the second action corresponds to the second state and may include diagnosis reference data corresponding to the second state. The manner in which the server determines the second action and its delayed reward value mirrors the manner in which it determines the first action and its delayed reward value, and is not repeated here.
For example, referring to fig. 2, the first state S1 shown in fig. 2 is the state of the sample patient at the first time. The server calls the original reinforcement learning model to determine, according to S1, the first action A1 corresponding to S1, and may determine the first delayed reward value corresponding to A1. Since the sample patient transfers from S1 to the second state S2, which is the state of the sample patient at the second time, the server may call the original reinforcement learning model to determine, according to S2, the second action A2 corresponding to S2, and may determine the second delayed reward value corresponding to A2. The server may add the first and second delayed reward values to obtain a cumulative delayed reward value R, and iterates the original reinforcement learning model multiple times until R converges to its maximum, at which point the pre-trained reinforcement learning model is obtained. If certain actions cause R to decrease during one round of training, the next round avoids those actions.
In one embodiment, the process by which the server obtains the delayed reward value of the first action may be as follows: after the sample patient transfers from the first state to the second state, the server determines the score corresponding to the second state according to the correspondence between states and scores, and takes that score as the delayed reward value of the first action. In this way, the first action is rewarded according to the state actually reached, so that subsequent action selection can be performed accurately.
For example, if the second state indicates that the patient's physical condition has not worsened (e.g., the estimated glomerular filtration rate (eGFR) is unchanged), the score corresponding to the second state may be determined as 1 according to the correspondence between states and scores, and the delayed reward value of the first action determined as 1. If the second state indicates that the condition has worsened (e.g., the eGFR value has decreased), the score corresponding to the second state may be determined as -1, and the delayed reward value of the first action as -1. If the second state indicates that the condition is continuously worsening (e.g., the eGFR value has decreased multiple times within a preset time period), the score may be determined as -2, and the delayed reward value of the first action as -2. If the second state indicates that the patient has died, the score may be determined as -3, and the delayed reward value of the first action as -3. The delayed reward values of other actions may be determined by the same process, which is not detailed here.
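The state-to-score correspondence described above can be sketched as a small reward function. Deriving the four states from a history of eGFR readings, and marking death with `None`, are assumptions made for this sketch.

```python
# Sketch of the state-to-score correspondence described above: the delayed
# reward for the first action is the score of the second (resulting) state.
# Reading the states off eGFR values is an illustrative assumption.
def delayed_reward(egfr_history):
    """Score the state reached after an action, given chronological eGFR readings.

    The last element is the current reading; None marks patient death
    (an assumption for this sketch).
    """
    if egfr_history[-1] is None:              # patient died
        return -3
    if len(egfr_history) >= 3 and all(
        later < earlier
        for earlier, later in zip(egfr_history[-3:], egfr_history[-2:])
    ):                                        # repeated decline in the window
        return -2
    if len(egfr_history) >= 2 and egfr_history[-1] < egfr_history[-2]:
        return -1                             # condition worsened
    return 1                                  # condition not worsened
```

Each branch mirrors one row of the correspondence table implied by the text: unchanged → 1, worsened → -1, continuously worsening → -2, death → -3.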
S103, sending the target diagnosis reference data to the terminal equipment so that the terminal equipment can display the target diagnosis reference data.
In the embodiment of the application, the server can send the target diagnosis reference data to the terminal device, and the terminal device can display the target diagnosis reference data.
It can be seen that, in the embodiment shown in fig. 1, the server may obtain a target state of a target patient at a target time, call a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, and send target diagnosis reference data corresponding to the target state included in the target action to the terminal device, so that the terminal device displays the target diagnosis reference data.
Please refer to fig. 3, which is a flowchart illustrating another auxiliary diagnosis method according to an embodiment of the present application. The method may be applied to a server, which may be a single server or a server cluster. Specifically, the method may comprise the following steps:
s301, receiving an auxiliary diagnosis request sent by a terminal device, wherein the auxiliary diagnosis request comprises a target state of a target patient at a target time, and the target state comprises a target body state.
S302, calling a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, wherein the target action comprises target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained according to the state training of the sample patient at each time in at least one time.
S303, sending the target diagnosis reference data to the terminal equipment so that the terminal equipment can display the target diagnosis reference data.
Steps S301 to S303 can refer to steps S101 to S103 in the embodiment of fig. 1, and details of the embodiment of the present application are not repeated herein.
In one embodiment, after the server invokes the pre-trained reinforcement learning model to determine the target action corresponding to the target state according to the target state, the server determines the state of the target patient at the next time of the target time, and determines the delay reward value of the target action according to the state of the next time, so as to update the pre-trained reinforcement learning model with the delay reward value of the target action. The process can optimize the pre-trained reinforcement learning model according to the actual feedback in the model application process, so that the model can select the optimal action more accurately according to the state.
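The online update described above can be sketched as a single temporal-difference step, under the assumption that the pre-trained model keeps tabular value estimates; the state and action names below are hypothetical.

```python
# Sketch of the online update described above: after the model recommends a
# target action, the patient's state at the next time yields a delayed reward
# that refines the pre-trained value estimates. Names are illustrative.
def update_model(q, state, action, reward, next_state, actions,
                 alpha=0.1, gamma=0.9):
    """One temporal-difference update of the (state, action) value estimate."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q

# One update: the target action led to a non-worsened state, so reward = 1
q = update_model({}, "stage3", "dialysis_plan_A", 1, "stage3",
                 ["dialysis_plan_A"])
```

Repeating this step with real feedback gradually aligns the pre-trained model with the patient population it serves.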
And S304, acquiring a diagnosis template.
In the embodiment of the application, the server may obtain the diagnosis template from a local or database.
In one embodiment, the server may obtain an identifier of a hospital where the target patient is located, and query the diagnosis template corresponding to the identifier of the hospital from a local or database according to the correspondence between the identifier of the hospital and the diagnosis template.
In one embodiment, the server may trigger execution of the step of acquiring the diagnostic template from a local or database upon receiving a diagnostic template acquisition instruction sent by the terminal device.
S305, filling the diagnosis template by using the target diagnosis reference data to obtain the filled diagnosis template as a diagnosis file.
S306, sending the diagnosis file to the terminal equipment so that the terminal equipment can display the diagnosis file.
In the embodiment of the application, the server may fill the target diagnosis reference data into the diagnosis template to obtain the filled diagnosis template as a diagnosis file, and send the diagnosis file to the terminal device, so that the terminal device displays the diagnosis file.
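Filling the diagnosis template with the target diagnosis reference data can be sketched as follows; the template placeholders and field values are hypothetical, since the application does not specify the template format.

```python
from string import Template

# Hypothetical diagnosis template; the placeholder names are assumptions used
# only to illustrate filling the template with target diagnosis reference data.
DIAGNOSIS_TEMPLATE = Template(
    "Patient: $patient\n"
    "Medication recommendation: $medication\n"
    "Hemodialysis amount: $hemodialysis_amount\n"
    "Hemodialysis mode: $hemodialysis_mode\n"
)

def fill_diagnosis_template(template, reference_data):
    """Fill the diagnosis template; the result serves as the diagnosis file."""
    return template.substitute(reference_data)

diagnosis_file = fill_diagnosis_template(DIAGNOSIS_TEMPLATE, {
    "patient": "patient-A",
    "medication": "drug X, twice daily",
    "hemodialysis_amount": "4 L",
    "hemodialysis_mode": "hemodiafiltration",
})
```

The filled string would then be sent to the terminal device for display as the diagnosis file.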
In one embodiment, the terminal device may send an editing request for the diagnosis file to the server. The server receives the editing request, which carries editing parameters; for example, the editing parameters may include content to be added to, modified in, or deleted from the diagnosis file. The server updates the diagnosis file using the editing parameters to obtain an updated diagnosis file, and sends the updated file to the terminal device so that the terminal device can display it.
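Applying an editing request to the diagnosis file can be sketched as below, assuming the editing parameters are grouped into fields to add, modify, or delete; this structure is an assumption, not specified by the application.

```python
# Sketch of applying an editing request to a diagnosis file; the parameter
# structure (add / modify / delete keyed fields) is an illustrative assumption.
def apply_edit_request(diagnosis_fields, edit_params):
    """Return an updated copy of the diagnosis file's fields."""
    updated = dict(diagnosis_fields)
    updated.update(edit_params.get("add", {}))
    updated.update(edit_params.get("modify", {}))
    for key in edit_params.get("delete", []):
        updated.pop(key, None)
    return updated

updated = apply_edit_request(
    {"medication": "drug X", "note": "draft"},
    {"modify": {"medication": "drug Y"}, "delete": ["note"]},
)
```

The server would re-render the diagnosis file from the updated fields before returning it to the terminal device.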
In the embodiment shown in fig. 3, the server may further fill the diagnostic template according to the target diagnostic reference data to obtain the filled diagnostic template as a diagnostic file, and the diagnostic file is displayed through the terminal device, so that an automatic intelligent generation process of the diagnostic file is realized, and the generation efficiency of the diagnostic file is improved.
The present application can be applied in the field of medical technology and relates to blockchain technology; for example, the target diagnosis reference data, or a hash value of the diagnosis file, may be written into a blockchain.
Please refer to fig. 4, which is a schematic diagram of a network architecture of an auxiliary diagnostic system according to an embodiment of the present application. The diagnosis assistance system shown in fig. 4 includes a server 10 and a terminal device 20. Wherein:
the terminal device 20 may transmit an auxiliary diagnosis request to the server 10. According to the auxiliary diagnosis request, the server may obtain a target action corresponding to the target state of the target patient at the target time by executing steps S101 and S102, and may display the target diagnosis reference data included in the target action through the terminal device 20 by executing step S103. Providing an objective auxiliary diagnosis manner in this way can improve the reliability of the diagnosis process.
Please refer to fig. 5, which is a schematic structural diagram of an auxiliary diagnostic apparatus according to an embodiment of the present disclosure. The diagnosis assisting apparatus may be applied to the aforementioned server. Specifically, the diagnosis assisting apparatus may include:
the communication module 501 is configured to receive an auxiliary diagnosis request sent by a terminal device, where the auxiliary diagnosis request includes a target state of a target patient at a target time, and the target state includes a target body state.
A processing module 502, configured to invoke a pre-trained reinforcement learning model to determine, according to the target state, a target action corresponding to the target state, where the target action includes target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained by training according to a state of a sample patient at each time in at least one time.
The communication module 501 is further configured to send the target diagnosis reference data to the terminal device, so that the terminal device displays the target diagnosis reference data.
In an optional embodiment, the processing module 502 is further configured to invoke an original reinforcement learning model to determine, according to a state of the sample patient at each time in at least one time, an action corresponding to each state, and obtain a delay reward value of the action corresponding to each state, where the state includes a body state; obtaining an accumulated delay reward value according to the delay reward value of the action corresponding to each state; and iterating the original reinforcement learning model for multiple times until the accumulated delay reward value reaches the maximum accumulated delay reward value, and obtaining a pre-trained reinforcement learning model.
In an optional implementation, when the processing module 502 invokes the original reinforcement learning model to determine, according to the state at each time, the action corresponding to each state and obtain the delay reward value of the action corresponding to each state, it specifically invokes the original reinforcement learning model to determine, according to a first state at a first time in the at least one time, a first action corresponding to the first state, and obtains a delay reward value of the first action, where the first time is any one of the at least one time; and invokes the original reinforcement learning model to determine, according to a second state at a second time in the at least one time, a second action corresponding to the second state, and obtains a delay reward value of the second action, where the second time is the next time after the first time.
In an optional embodiment, when the processing module 502 obtains the delay reward value of the first action, it specifically determines, after the sample patient is transferred from the first state to the second state, the score corresponding to the second state according to the correspondence between states and scores, and determines that score as the delay reward value of the first action.
In an alternative embodiment, the processing module 502 is further configured to obtain a diagnosis template; filling the diagnosis template by using the target diagnosis reference data to obtain a filled diagnosis template as a diagnosis file; the diagnostic file is sent to the terminal device through the communication module 501, so that the terminal device displays the diagnostic file.
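The template-filling step can be sketched as follows; the template text and the field names are invented for illustration and are not taken from this application.

```python
from string import Template

# Sketch of the template-filling step described above: the target diagnosis
# reference data is substituted into placeholder fields of a diagnosis
# template, and the filled template becomes the diagnosis file text.
DIAGNOSIS_TEMPLATE = Template(
    "Patient state: $state\n"
    "Diagnosis reference: $reference\n"
)

def fill_template(target_state, reference_data):
    """Fill the diagnosis template with the target diagnosis reference
    data to obtain the diagnosis file text."""
    return DIAGNOSIS_TEMPLATE.substitute(state=target_state,
                                         reference=reference_data)
```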
In an optional implementation manner, the processing module 502 is further configured to receive, through the communication module 501, an editing request for the diagnostic file sent by the terminal device after the diagnostic file is sent to the terminal device, where the editing request carries editing parameters; updating the diagnostic file by using the editing parameters to obtain an updated diagnostic file; the updated diagnosis file is sent to the terminal device through the communication module 501, so that the terminal device displays the updated diagnosis file.
In an optional implementation, the processing module 502 is further configured to, after invoking a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, determine a state of the target patient at a next time of the target time, and determine a delay reward value of the target action according to the state of the next time; and updating the pre-trained reinforcement learning model by using the delayed reward value of the target action.
It can be seen that, in the embodiment shown in fig. 5, the auxiliary diagnosis device may obtain the target state of the target patient at the target time, and call the pre-trained reinforcement learning model to determine the target action corresponding to the target state according to the target state, so as to send the target diagnosis reference data corresponding to the target state included in the target action to the terminal device, so that the terminal device displays the target diagnosis reference data.
Please refer to fig. 6, which is a schematic structural diagram of a server according to an embodiment of the present disclosure. The server described in this embodiment may include: one or more processors 1000 and memory 2000. The processor 1000 and the memory 2000 may be connected by a bus.
The processor 1000 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 2000 may be a high-speed RAM or a non-volatile memory (e.g., disk storage). The memory 2000 is used to store a computer program comprising program instructions, and the processor 1000 is configured to invoke the program instructions to perform the following steps:
receiving an auxiliary diagnosis request sent by a terminal device, wherein the auxiliary diagnosis request comprises a target state of a target patient at a target time, and the target state comprises a target body state;
calling a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, wherein the target action comprises target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained by training according to the state of a sample patient at each time in at least one time;
and sending the target diagnosis reference data to the terminal equipment so that the terminal equipment can display the target diagnosis reference data.
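The three steps above can be sketched as a lookup over learned action values: the target state selects the action with the highest value, and that action's diagnosis reference data is what the server sends to the terminal device. The learned values and reference data below are invented placeholders, not data from this application.

```python
# Minimal sketch of the inference flow: map a patient's target state to the
# action whose learned value is highest, then return that action's target
# diagnosis reference data. All entries are illustrative placeholders.
Q_VALUES = {
    "fever_high": {"order_blood_test": 0.9, "observe": 0.2},
    "fever_low": {"order_blood_test": 0.3, "observe": 0.7},
}
REFERENCE_DATA = {
    "order_blood_test": "suggest routine blood test",
    "observe": "suggest observation and follow-up",
}

def select_target_action(target_state):
    """Return (target action, target diagnosis reference data) for a state."""
    actions = Q_VALUES[target_state]
    target_action = max(actions, key=actions.get)   # highest learned value
    return target_action, REFERENCE_DATA[target_action]
```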
In one embodiment, the server may receive the auxiliary diagnosis request sent by the terminal device through an input device (not shown), and may send the target diagnosis reference data to the terminal device through an output device (not shown). The input and output devices may be standard wired or wireless interfaces.
In one embodiment, the processor 1000 is configured to invoke the program instructions and further perform the steps of:
calling an original reinforcement learning model to determine an action corresponding to each state according to the state of the sample patient at each time in at least one time, and obtaining a delay reward value of the action corresponding to each state, wherein the state comprises a body state;
obtaining an accumulated delay reward value according to the delay reward value of the action corresponding to each state;
and iterating the original reinforcement learning model for multiple times until the accumulated delay reward value reaches the maximum accumulated delay reward value, and obtaining a pre-trained reinforcement learning model.
In one embodiment, when the original reinforcement learning model is invoked to determine the action corresponding to each state according to the state at each time and obtain the delay reward value of the action corresponding to each state, the processor 1000 is configured to invoke the program instructions to perform the following steps:
calling an original reinforcement learning model to determine a first action corresponding to a first state according to the first state of the first time in the at least one time, and obtaining a delay reward value of the first action; the first time is any one of the at least one time;
calling the original reinforcement learning model to determine a second action corresponding to a second state according to the second state of the second time in the at least one time, and obtaining a delay reward value of the second action; the second time is a next time of the first time.
In one embodiment, in obtaining the delayed reward value for the first action, the processor 1000 is configured to invoke the program instructions to perform the steps of:
after the sample patient is transferred from the first state to the second state, determining a score corresponding to the second state according to the corresponding relation between the states and the scores;
and determining the score corresponding to the second state as the delay reward value of the first action.
In one embodiment, the processor 1000 is configured to invoke the program instructions and further perform the steps of:
obtaining a diagnosis template;
filling the diagnosis template by using the target diagnosis reference data to obtain a filled diagnosis template as a diagnosis file;
and sending the diagnosis file to the terminal equipment so that the terminal equipment can display the diagnosis file.
In an embodiment, the diagnostic file may be sent to the terminal device through an output device, so that the terminal device displays the diagnostic file.
In one embodiment, after sending the diagnostic file to the terminal device, the processor 1000 is configured to call the program instructions and further perform the following steps:
receiving an editing request for the diagnostic file sent by the terminal equipment, wherein the editing request carries editing parameters;
updating the diagnostic file by using the editing parameters to obtain an updated diagnostic file;
and sending the updated diagnosis file to the terminal equipment so that the terminal equipment can display the updated diagnosis file.
In an embodiment, an editing request for the diagnostic file sent by the terminal device may be received through an input device, where the editing request carries editing parameters. The updated diagnosis file can be sent to the terminal equipment through an output device.
In one embodiment, after invoking the pre-trained reinforcement learning model to determine the target action corresponding to the target state according to the target state, the processor 1000 is configured to invoke the program instructions and further perform the following steps:
determining the state of the target patient at the next time of the target time, and determining the delay reward value of the target action according to the state of the next time;
and updating the pre-trained reinforcement learning model by using the delayed reward value of the target action.
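The online-update step above can be sketched as a single temporal-difference update: after the target action is taken, the patient's state at the next time is observed, its score becomes the delayed reward of the target action, and the model is nudged accordingly. A tabular (state, action) -> value model is assumed here for illustration; the embodiments do not fix a model representation.

```python
# Sketch of updating the pre-trained model with the delayed reward of the
# target action, assuming a tabular (state, action) -> value model.
def update_model(q_values, state, action, next_state, state_score,
                 alpha=0.1, gamma=0.9):
    """Apply one update using the next state's score as the delayed reward
    of the target action taken in `state`."""
    reward = state_score[next_state]                  # delayed reward
    future = max((v for (s, _a), v in q_values.items() if s == next_state),
                 default=0.0)                         # best known follow-up
    old = q_values.get((state, action), 0.0)
    q_values[(state, action)] = old + alpha * (reward + gamma * future - old)
    return q_values
```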
In a specific implementation, the processor 1000 described in this embodiment of the present application may perform the implementations described in the embodiments of fig. 1 and fig. 3, as well as the implementation described for this embodiment; details are not repeated here.
The functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of software functional modules.
Those skilled in the art will understand that all or part of the processes of the methods in the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The computer-readable storage medium may be volatile or non-volatile; for example, it may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM). The computer-readable storage medium may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the blockchain node, and the like.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A server comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the steps of:
receiving an auxiliary diagnosis request sent by a terminal device, wherein the auxiliary diagnosis request comprises a target state of a target patient at a target time, and the target state comprises a target body state;
calling a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, wherein the target action comprises target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained by training according to the state of a sample patient at each time in at least one time;
and sending the target diagnosis reference data to the terminal equipment so that the terminal equipment can display the target diagnosis reference data.
2. The server according to claim 1, wherein the processor is configured to invoke the program instructions and further perform the steps of:
calling an original reinforcement learning model to determine an action corresponding to each state according to the state of the sample patient at each time in at least one time, and obtaining a delay reward value of the action corresponding to each state, wherein the state comprises a body state;
obtaining an accumulated delay reward value according to the delay reward value of the action corresponding to each state;
and iterating the original reinforcement learning model for multiple times until the accumulated delay reward value reaches the maximum accumulated delay reward value, and obtaining a pre-trained reinforcement learning model.
3. The server according to claim 2, wherein when invoking the original reinforcement learning model to determine the action corresponding to each state according to the state at each time and obtain the delay reward value of the action corresponding to each state, the processor is configured to invoke the program instructions to perform the following steps:
calling an original reinforcement learning model to determine a first action corresponding to a first state according to the first state of the first time in the at least one time, and obtaining a delay reward value of the first action; the first time is any one of the at least one time;
calling the original reinforcement learning model to determine a second action corresponding to a second state according to the second state of the second time in the at least one time, and obtaining a delay reward value of the second action; the second time is a next time of the first time.
4. The server according to claim 3, wherein in obtaining the delayed reward value for the first action, the processor is configured to invoke the program instructions to perform the steps of:
after the sample patient is transferred from the first state to the second state, determining a score corresponding to the second state according to the corresponding relation between the states and the scores;
and determining the score corresponding to the second state as the delay reward value of the first action.
5. The server according to claim 1, wherein the processor is configured to invoke the program instructions and further perform the steps of:
obtaining a diagnosis template;
filling the diagnosis template by using the target diagnosis reference data to obtain a filled diagnosis template as a diagnosis file;
and sending the diagnosis file to the terminal equipment so that the terminal equipment can display the diagnosis file.
6. The server according to claim 5, wherein after sending the diagnostic file to the terminal device, the processor is configured to invoke the program instructions and further perform the steps of:
receiving an editing request for the diagnostic file sent by the terminal equipment, wherein the editing request carries editing parameters;
updating the diagnostic file by using the editing parameters to obtain an updated diagnostic file;
and sending the updated diagnosis file to the terminal equipment so that the terminal equipment can display the updated diagnosis file.
7. The server according to claim 1, wherein after invoking the pre-trained reinforcement learning model to determine the target action corresponding to the target state according to the target state, the processor is configured to invoke the program instructions and further perform the following steps:
determining the state of the target patient at the next time of the target time, and determining the delay reward value of the target action according to the state of the next time;
and updating the pre-trained reinforcement learning model by using the delayed reward value of the target action.
8. A method of aiding diagnosis, comprising:
receiving an auxiliary diagnosis request sent by a terminal device, wherein the auxiliary diagnosis request comprises a target state of a target patient at a target time, and the target state comprises a target body state;
calling a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, wherein the target action comprises target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained by training according to the state of a sample patient at each time in at least one time;
and sending the target diagnosis reference data to the terminal equipment so that the terminal equipment can display the target diagnosis reference data.
9. A diagnostic aid, comprising:
the communication module is used for receiving an auxiliary diagnosis request sent by the terminal equipment, wherein the auxiliary diagnosis request comprises a target state of a target patient at a target time, and the target state comprises a target body state;
the processing module is used for calling a pre-trained reinforcement learning model to determine a target action corresponding to the target state according to the target state, the target action comprises target diagnosis reference data corresponding to the target state, and the pre-trained reinforcement learning model is obtained by training according to the state of a sample patient at each time in at least one time;
the communication module is further configured to send the target diagnosis reference data to the terminal device, so that the terminal device displays the target diagnosis reference data.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the steps performed by the server according to any of claims 1-7.
CN202011063862.0A 2020-09-30 2020-09-30 Auxiliary diagnosis method, device, server and storage medium Pending CN111916210A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011063862.0A CN111916210A (en) 2020-09-30 2020-09-30 Auxiliary diagnosis method, device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011063862.0A CN111916210A (en) 2020-09-30 2020-09-30 Auxiliary diagnosis method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN111916210A true CN111916210A (en) 2020-11-10

Family

ID=73265287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011063862.0A Pending CN111916210A (en) 2020-09-30 2020-09-30 Auxiliary diagnosis method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN111916210A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018209161A1 (en) * 2017-05-12 2018-11-15 The Regents Of The University Of Michigan Individual and cohort pharmacological phenotype prediction platform
CN109952582A (en) * 2018-09-29 2019-06-28 区链通网络有限公司 A kind of training method, node, system and the storage medium of intensified learning model
CN110335675A (en) * 2019-06-20 2019-10-15 北京科技大学 A kind of method of syndrome differentiation based on tcm knowledge picture library
EP3709236A1 (en) * 2019-03-05 2020-09-16 Honeywell International Inc. Systems and methods for cognitive services of a connected fms or avionics saas platform
CN111681767A (en) * 2020-06-12 2020-09-18 电子科技大学 Electronic medical record data processing method and system
CN111710425A (en) * 2020-06-19 2020-09-25 复旦大学附属中山医院 Method, system and device for evaluating cardiotoxicity of immune checkpoint inhibitor


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dong Chaojun, et al.: "Urban Regional Intelligent Traffic Control Models and Algorithms", 30 April 2015, South China University of Technology Press *

Similar Documents

Publication Publication Date Title
CN111753543B (en) Medicine recommendation method, device, electronic equipment and storage medium
US7705727B2 (en) System, method, and computer program for interfacing an expert system to a clinical information system
US10014080B2 (en) Evidence based medical record
US20160042134A1 (en) Method of calculating a score of a medical suggestion as a support in medical decision making
WO2012094542A2 (en) Devices, systems, and methods for the real-time and individualized prediction of health and economic outcomes
CN110660482A (en) Chinese medicine prescription intelligent recommendation system based on big data and control method thereof
WO2021179694A1 (en) Drug recommendation method, apparatus, computer device, and storage medium
US20170116169A1 (en) Managing data relationships of customizable forms
US8527297B2 (en) Methods, apparatuses, and computer program products for facilitating co-morbid care management
CN111785343A (en) Follow-up method and device, electronic equipment and storage medium
US20190189293A1 (en) System and method for remote provision of healthcare
US20230268043A1 (en) Therapeutic application distribution system, therapeutic application distribution method, therapeutic application distribution program, and terminal
CN111916210A (en) Auxiliary diagnosis method, device, server and storage medium
KR20210130179A (en) System architecture and methods for analyzing health data across geographic regions by priority using a distributed computing platform
CN111916170A (en) Medical history reminding method, management server and electronic medical record management system
CN111899844B (en) Sample generation method and device, server and storage medium
CN113066531B (en) Risk prediction method, risk prediction device, computer equipment and storage medium
CN113076486B (en) Drug information pushing method, device, computer equipment and storage medium
CN115862892A (en) Medicine education system based on mobile terminal and application method thereof
CN113421638B (en) Model generation method and device based on transfer learning and computer equipment
CN111739598A (en) Data processing method, device, medium and electronic equipment
WO2021151330A1 (en) User grouping method, apparatus and device, and computer-readable storage medium
US20140019159A1 (en) Method, apparatus, and computer program product for patient charting
CN113870973A (en) Information output method, device, computer equipment and medium based on artificial intelligence
CN113990422A (en) Follow-up data acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201110