CN113421638B - Model generation method and device based on transfer learning and computer equipment

Model generation method and device based on transfer learning and computer equipment

Info

Publication number
CN113421638B
CN113421638B
Authority
CN
China
Prior art keywords
type
model
diabetes
reinforcement learning
learning model
Prior art date
Legal status
Active
Application number
CN202110701188.2A
Other languages
Chinese (zh)
Other versions
CN113421638A (en)
Inventor
廖希洋
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110701188.2A
Publication of CN113421638A
Application granted
Publication of CN113421638B

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00: ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/60: ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application relates to the field of artificial intelligence and provides a model generation method and device based on transfer learning, a computer device and a storage medium. The method comprises the following steps: acquiring a preset type II diabetes intervention model; acquiring target model parameters of the type II diabetes intervention model; initializing the model parameters of a preset second reinforcement learning model with the target model parameters based on transfer learning to obtain an initialized second reinforcement learning model; acquiring pre-collected follow-up data of type I diabetes patients; and training the initialized second reinforcement learning model on the type I diabetes patient follow-up data to obtain a type I diabetes intervention model corresponding to that follow-up data. The method and device reduce the construction cost of the type I diabetes intervention model and improve its construction efficiency. The method can also be applied in the blockchain field, where the type I diabetes intervention model can be stored on a blockchain.

Description

Model generation method and device based on transfer learning and computer equipment
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a model generation method and device based on transfer learning and computer equipment.
Background
In recent years, the application of artificial intelligence technology to chronic disease management has developed rapidly. In chronic disease management scenarios, an existing chronic disease intervention model usually has to be obtained by training a reinforcement learning model with a large amount of follow-up sample data. However, because the population of type I diabetes patients is very small, follow-up sample data for type I diabetes is expensive and difficult to collect. Generating an intervention model for type I diabetes with the existing model generation approach therefore requires a large expenditure of manpower and material resources to collect the relevant follow-up sample data, and because data collection takes a long time and costs a great deal, the type I diabetes intervention model suffers from high construction cost and low construction efficiency.
Disclosure of Invention
The main purpose of the application is to provide a model generation method and device based on transfer learning, computer equipment and a storage medium, so as to solve the technical problems of high construction cost and low construction efficiency of the existing type I diabetes intervention model.
The application provides a model generation method based on transfer learning, which comprises the following steps:
acquiring a preset type II diabetes intervention model; the type II diabetes intervention model is obtained by training a preset first reinforcement learning model based on pre-collected type II diabetes patient follow-up sample data;
acquiring target model parameters of the type II diabetes intervention model;
initializing the model parameters of a preset second reinforcement learning model by using the target model parameters based on the transfer learning to obtain an initialized second reinforcement learning model;
acquiring pre-collected follow-up data of a type-I diabetic patient; wherein the type one diabetic patient follow-up sample data comprises the status of the type one diabetic patient at each time contained within a follow-up period;
training the initialized second reinforcement learning model based on the follow-up data of the type I diabetes patient to obtain a trained second reinforcement learning model, and taking the trained second reinforcement learning model as a type I diabetes intervention model corresponding to the follow-up data of the type I diabetes patient so as to perform diagnosis intervention processing on the data of the type I diabetes patient to be processed through the type I diabetes intervention model.
Optionally, the initializing the preset model parameters of the second reinforcement learning model by using the target model parameters based on the transfer learning to obtain the initialized second reinforcement learning model includes:
obtaining model parameters of the second reinforcement learning model;
setting the model parameters of the second reinforcement learning model as the target model parameters based on the transfer learning to obtain a set second reinforcement learning model;
and taking the set second reinforcement learning model as the initialized second reinforcement learning model.
Optionally, the step of training the initialized second reinforcement learning model based on the follow-up data of the type one diabetes patient to obtain a trained second reinforcement learning model, and using the trained second reinforcement learning model as the type one diabetes intervention model corresponding to the follow-up data of the type one diabetes patient includes:
calling the initialized second reinforcement learning model to ensure that the initialized second reinforcement learning model determines the action corresponding to each state according to the state of the type-I diabetes patient at each time in the follow-up period, and acquiring the delay reward of the action corresponding to each state;
calculating a target delay reward based on the delay reward of the action corresponding to each piece of state information;
performing multiple times of iterative processing on the initialized second reinforcement learning model until the target delay reward reaches a preset maximum accumulated delay reward, and obtaining a trained second reinforcement learning model;
and taking the trained second reinforcement learning model as the type I diabetes intervention model.
Optionally, the step of calling the initialized second reinforcement learning model to determine, by the initialized second reinforcement learning model, an action corresponding to each state according to the state of the type one diabetes patient at each time included in the follow-up period, and acquiring a delay reward of the action corresponding to each state includes:
calling the initialized second reinforcement learning model to enable the initialized second reinforcement learning model to determine a first action corresponding to a first state according to the first state at a first time; wherein the first time is any one of all times contained in the follow-up visit period;
calling the initialized second reinforcement learning model to enable the initialized second reinforcement learning model to determine a second action corresponding to a second state according to the second state of second time; wherein the second time is a next time within the follow-up period adjacent to the first time;
after the type-I diabetic patient is transferred from the first state to the second state, inquiring a target score corresponding to the second state according to a preset state-score relation mapping table;
and taking the target score as a delay reward of the first action.
Optionally, the step of calculating a target delay reward based on the delay reward of the action corresponding to each piece of state information includes:
summing the delay rewards of the actions corresponding to each state to obtain corresponding sum values;
and using the sum as the target delay reward.
Optionally, after the step of training the initialized second reinforcement learning model based on the follow-up data of the type one diabetes patient to obtain a trained second reinforcement learning model, and using the trained second reinforcement learning model as a type one diabetes intervention model corresponding to the follow-up data of the type one diabetes patient, the method includes:
storing the type one diabetes intervention model;
acquiring a Python-based Flask framework;
executing a deployment process corresponding to the type one diabetes intervention model based on the Flask framework and a web service.
Optionally, after the step of training the initialized second reinforcement learning model based on the follow-up data of the type one diabetic to obtain a trained second reinforcement learning model, and using the trained second reinforcement learning model as a type one diabetes intervention model corresponding to the follow-up data of the type one diabetic, the method includes:
acquiring input data of a specified type I diabetes patient to be processed; wherein the designated type one diabetic patient data comprises a designated status of the designated type one diabetic patient over a designated time;
calling an interface corresponding to the web service, and calling the type I diabetes intervention model through the interface;
inputting the data of the appointed type I diabetes mellitus patient into the type I diabetes mellitus intervention model, and receiving a diagnosis intervention result output after the type I diabetes mellitus intervention model processes the data of the appointed type I diabetes mellitus patient;
and displaying the diagnosis intervention result.
The present application also provides a model generation device based on transfer learning, including:
the first acquisition module is used for acquiring a preset type II diabetes intervention model; the type II diabetes intervention model is obtained by training a preset first reinforcement learning model based on pre-collected type II diabetes patient follow-up sample data;
the second acquisition module is used for acquiring target model parameters of the type II diabetes intervention model;
the processing module is used for initializing the model parameters of a preset second reinforcement learning model by using the target model parameters based on the transfer learning to obtain an initialized second reinforcement learning model;
the third acquisition module is used for acquiring pre-acquired follow-up data of the type-I diabetic patient; wherein the type one diabetic patient follow-up sample data comprises the status of the type one diabetic patient at each time contained within a follow-up period;
and the generation module is used for training the initialized second reinforcement learning model based on the type-one diabetes patient follow-up data to obtain a trained second reinforcement learning model, and taking the trained second reinforcement learning model as a type-one diabetes intervention model corresponding to the type-one diabetes patient follow-up data so as to diagnose and intervene the type-one diabetes patient data to be processed through the type-one diabetes intervention model.
The present application further provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method.
The model generation method, the model generation device, the computer equipment and the storage medium based on the transfer learning provided by the application have the following beneficial effects:
according to the model generation method and device based on the transfer learning, the target model parameters of the pre-created type II diabetes intervention model are obtained, and then the target model parameters are used for initializing the model parameters of the preset second reinforcement learning model based on the transfer learning so as to obtain the initialized second reinforcement learning model, so that the initialized second reinforcement learning model can be trained by adopting pre-collected type I diabetes patient follow-up data, and the type I diabetes intervention model corresponding to the type I diabetes patient follow-up data is conveniently and quickly generated. As only the currently acquired follow-up data of the type I diabetes patients is needed to train the second reinforcement learning model which carries out initialization of the target model parameters of the type II diabetes intervention model, the corresponding type I diabetes intervention model can be quickly generated, the acquisition cost of model training data is effectively reduced, the construction cost of generating the type I diabetes intervention model is greatly reduced, and the construction efficiency of the type I diabetes intervention model is improved.
Drawings
FIG. 1 is a flow chart of a model generation method based on transfer learning according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a model generation apparatus based on transfer learning according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are intended merely to illustrate the present application and not to limit it.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring to fig. 1, a model generation method based on transfer learning according to an embodiment of the present application includes:
s1: acquiring a preset type II diabetes intervention model; the type II diabetes intervention model is obtained by training a preset first reinforcement learning model based on pre-collected type II diabetes patient follow-up sample data;
s2: acquiring target model parameters of the type II diabetes intervention model;
s3: initializing the model parameters of a preset second reinforcement learning model by using the target model parameters based on the transfer learning to obtain an initialized second reinforcement learning model;
s4: acquiring pre-acquired follow-up data of a type-I diabetic; wherein the type one diabetic patient follow-up sample data comprises the status of a type one diabetic patient at each time encompassed within a follow-up period;
s5: training the initialized second reinforcement learning model based on the follow-up data of the type one diabetes patient to obtain a trained second reinforcement learning model, and taking the trained second reinforcement learning model as a type one diabetes intervention model corresponding to the follow-up data of the type one diabetes patient so as to diagnose and intervene the data of the type one diabetes patient to be processed through the type one diabetes intervention model.
As described in steps S1 to S5 above, the execution subject of this method embodiment is a model generation apparatus based on transfer learning. In practical applications, the model generation apparatus based on transfer learning may be implemented as a virtual apparatus, such as software code, or as a physical apparatus in which the relevant execution code is written or integrated, and it may interact with a user through a keyboard, a mouse, a remote controller, a touch panel or a voice control device. The model generation apparatus based on transfer learning in this embodiment can effectively reduce the acquisition cost of model training data, greatly reduce the construction cost of generating a type I diabetes intervention model, and improve the construction efficiency of the type I diabetes intervention model. Specifically, a preset type II diabetes intervention model is obtained first. The type II diabetes intervention model is obtained by training a preset first reinforcement learning model on pre-collected follow-up sample data of type II diabetes patients. The follow-up sample data of a type II diabetes patient comprises the status of that patient at each time contained in the corresponding follow-up period. Because the base number of type II diabetes patients is large, it is not difficult to obtain massive follow-up sample data of type II diabetes patients for constructing the type II diabetes intervention model. After the follow-up sample data of type II diabetes patients is obtained, it may be further preprocessed, and the type II diabetes intervention model is trained and generated on the preprocessed data. The process of training the type II diabetes intervention model on the type II follow-up sample data is analogous to the subsequent process of training the type I diabetes intervention model on the type I follow-up sample data and is not repeated here. Both the first reinforcement learning model and the second reinforcement learning model are DQN models, and the network structure used by the DQN model may be three convolutional layers followed by two fully connected layers. Target model parameters of the type II diabetes intervention model are then acquired. The target model parameters may include the neuron coefficients of the hidden layers of the type II diabetes intervention model. After the creation of the type II diabetes intervention model is completed, the corresponding target model parameters can be derived from it based on pre-written parameter acquisition code. The parameter acquisition code may be a code file written by the relevant developers according to the actual requirements for deriving model parameters, and it may be stored in the apparatus.
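For illustration only, the following is a minimal sketch of such a DQN (three convolutional layers followed by two fully connected layers) together with the parameter acquisition step; PyTorch, the one-dimensional convolutions, all layer sizes and the file name are assumptions, since the application does not name an implementation framework.

```python
# Illustrative sketch only: the application specifies a DQN with three
# convolutional layers plus two fully connected layers, but names no framework
# or layer sizes. PyTorch, Conv1d and every dimension below are assumptions.
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, in_channels: int, num_actions: int):
        super().__init__()
        # Three convolutional layers over the patient-state sequence
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Two fully connected layers mapping the features to Q-values per action
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, in_channels, sequence_length)
        return self.head(self.features(state))

# "Parameter acquisition code" (step S2): export the target model parameters
# (including the hidden-layer neuron coefficients) of the trained type II
# diabetes intervention model.
type2_intervention_model = DQN(in_channels=1, num_actions=5)  # trained elsewhere
target_model_parameters = type2_intervention_model.state_dict()
torch.save(target_model_parameters, "type2_intervention_params.pt")
```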
The model parameters of a preset second reinforcement learning model are then initialized with the target model parameters based on transfer learning, to obtain the initialized second reinforcement learning model. Transfer learning is a machine learning method in which a model pre-trained for one task is reused for another task. The second reinforcement learning model requires initialization parameters at the beginning of training; the usual approach is random initialization, in which the corresponding model parameters are generated randomly. The initialization here is as follows: the model parameters of the second reinforcement learning model are obtained and, based on transfer learning, set to the target model parameters, so that the target model parameters replace the random initialization of the second reinforcement learning model and the initialized second reinforcement learning model is obtained. Although there are some differences in treatment between type II diabetes and type I diabetes, type II diabetes patients also need to be treated with insulin as their condition progresses, and the use of insulin often brings safety problems such as hypoglycemic events and other sudden complications. The two diseases are therefore related, so type II diabetes can serve as the source domain and type I diabetes as the target domain. Through transfer learning, the model parameters of the trained type II diabetes intervention model (which can also be understood as the source-domain knowledge learned by that model) can be migrated to the original intervention model of the target domain, namely the second reinforcement learning model, to assist its training and thereby accelerate and optimize the generation of the type I diabetes intervention model. This effectively reduces the need for massive sample data during training of the type I diabetes intervention model and alleviates the high cost of the training task.
Pre-collected follow-up data of type I diabetes patients is then acquired. The type I diabetes follow-up sample data comprises the status of a type I diabetes patient at each time contained in a follow-up period. Because the base number of type I diabetes patients is small, their follow-up sample data is very difficult to acquire; only a preset amount of type I diabetes follow-up sample data needs to be collected here, and the preset amount can be set according to actual usage requirements. Finally, the initialized second reinforcement learning model is trained on the type I diabetes follow-up data to obtain a trained second reinforcement learning model, which is used as the type I diabetes intervention model corresponding to that follow-up data, so that diagnosis intervention processing can be performed on type I diabetes patient data to be processed through the type I diabetes intervention model. The specific training process of the type I diabetes intervention model is described in the corresponding embodiments below. The intervention model is not constructed directly from the type I diabetes follow-up data alone because the data volume is extremely small: the intervention model could not be fully constructed and would easily overfit. Training the second reinforcement learning model initialized with the transferred parameters allows the knowledge learned by the type II diabetes intervention model to reduce the overfitting of the generated type I diabetes intervention model. In addition, the type I diabetes follow-up data contains new scenarios, i.e. new knowledge of the target domain, so the type II diabetes intervention model cannot be used directly as the type I diabetes intervention model; instead, the type I diabetes follow-up data is used to further train the second reinforcement learning model after its parameters have been initialized from the type II diabetes intervention model, thereby generating a final type I diabetes intervention model suited to the type I diabetes follow-up data.
In the embodiment, the target model parameters of the pre-created type two diabetes intervention model are acquired, and then the target model parameters are used for initializing the model parameters of the preset second reinforcement learning model based on the transfer learning so as to obtain the initialized second reinforcement learning model, so that the initialized second reinforcement learning model can be trained by adopting pre-acquired type one diabetes patient follow-up data, and the type one diabetes intervention model corresponding to the type one diabetes patient follow-up data can be conveniently and quickly generated. As only the currently acquired follow-up data of the type I diabetes patients is needed to train the second reinforcement learning model which carries out initialization of the target model parameters of the type II diabetes intervention model, the corresponding type I diabetes intervention model can be quickly generated, the acquisition cost of model training data is effectively reduced, the construction cost of generating the type I diabetes intervention model is greatly reduced, and the construction efficiency of the type I diabetes intervention model is improved.
Further, in an embodiment of the present application, the step S3 includes:
s300: obtaining model parameters of the second reinforcement learning model;
s301: setting the model parameters of the second reinforcement learning model as the target model parameters based on the transfer learning to obtain a set second reinforcement learning model;
s302: and taking the set second reinforcement learning model as the initialized second reinforcement learning model.
As described in the foregoing steps S300 to S302, the initializing the preset model parameters of the second reinforcement learning model by using the target model parameters based on the transfer learning to obtain the initialized second reinforcement learning model may specifically include: firstly, obtaining model parameters of the second reinforcement learning model. After the pre-creation of the second reinforcement learning model and the type two diabetes intervention model is completed, corresponding model parameters can be derived from the second reinforcement learning model and the type two diabetes intervention model based on a parameter acquisition code which is written in advance. And then setting the model parameters of the second reinforcement learning model as the target model parameters based on the transfer learning to obtain the set second reinforcement learning model. The second reinforcement learning model and the type II diabetes intervention model have the same model network structure, so the corresponding hidden layers, neurons and the like of the second reinforcement learning model and the type II diabetes intervention model have the same quantity. In addition, the model parameters of the trained type two diabetes intervention model are used as the initialization model parameters of the second reinforcement learning model, because the selection of the initialization model parameters often influences the final effect of the model in the reinforcement learning training process. Because the trained type II diabetes intervention model contains all knowledge of all source fields corresponding to type II diabetes, the type II diabetes intervention model can be learned to the knowledge and applied to the target field, namely, the type I diabetes intervention model corresponding to type I diabetes can be constructed. Therefore, when the type I diabetes intervention model corresponding to the type I diabetes patient follow-up data is generated through training, the initial model parameters of the type I diabetes intervention model can be equal to the target model parameters of the type II diabetes intervention model. And finally, taking the set second reinforcement learning model as the initialized second reinforcement learning model. In this embodiment, the model parameters of the second reinforcement learning model are set as the target model parameters based on the transfer learning, so that the initialized second reinforcement learning model can be quickly obtained, and the initialized second reinforcement learning model can be subsequently trained based on the follow-up visit data of the type one diabetes patient to generate the required type one diabetes intervention model, thereby effectively reducing the requirement on a large amount of sample data in the training and generating process of the type one diabetes intervention model, and solving the problem of high cost of the training task.
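Continuing the sketch above (the DQN class and the saved parameter file from it are assumed), steps S300 to S302 amount to overwriting the randomly initialized parameters of the second reinforcement learning model with the exported target model parameters:

```python
# Continuation of the previous sketch: the DQN class and the saved parameter
# file are assumed from there. Steps S300-S302 replace the random
# initialization of the second reinforcement learning model with the target
# model parameters of the type II diabetes intervention model.
import torch

# S300: the second reinforcement learning model with randomly generated parameters
second_rl_model = DQN(in_channels=1, num_actions=5)

# S301: based on transfer learning, set its model parameters to the target
# model parameters (possible only because both models share the same network
# structure, i.e. the same hidden layers and neuron counts).
target_model_parameters = torch.load("type2_intervention_params.pt")
second_rl_model.load_state_dict(target_model_parameters)

# S302: second_rl_model is now the initialized second reinforcement learning
# model, ready to be fine-tuned on type I diabetes follow-up data.
```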
Further, in an embodiment of the present application, the step S5 includes:
s500: calling the initialized second reinforcement learning model to ensure that the initialized second reinforcement learning model determines the action corresponding to each state according to the state of the type I diabetes patient at each time in the follow-up period, and acquiring the delay reward of the action corresponding to each state;
s501: calculating a target delay reward based on the delay reward of the action corresponding to each piece of state information;
s503: performing multiple iterative processing on the initialized second reinforcement learning model until the target delay reward reaches a preset maximum accumulated delay reward, and obtaining a trained second reinforcement learning model;
s504: and taking the trained second reinforcement learning model as the type I diabetes intervention model.
As described in steps S500 to S504 above, training the initialized second reinforcement learning model on the type I diabetes follow-up data to obtain a trained second reinforcement learning model, and using the trained model as the type I diabetes intervention model corresponding to that follow-up data, may specifically include the following. First, the initialized second reinforcement learning model is called so that it determines, according to the state of the type I diabetes patient at each time in the follow-up period, the action corresponding to each state, and the delay reward of each action is acquired. The state includes at least a blood glucose content state. The action comprises the diagnostic intervention data corresponding to the state, which may include a medication suggestion and a suggestion on allocating healthcare resources: the medication suggestion may comprise a drug name (for example insulin) and a way of using the drug, and the healthcare-resource suggestion may comprise whether healthcare resources, i.e. medical staff, need to be allocated and in what quantity. A target delay reward is then calculated from the delay rewards of the actions corresponding to each state: the delay rewards can be summed to obtain a corresponding sum, and this sum is used as the target delay reward, i.e. the target delay reward is the sum of all delay rewards. The initialized second reinforcement learning model is then iterated repeatedly until the target delay reward reaches a preset maximum cumulative delay reward, which yields the trained second reinforcement learning model; finally, the trained second reinforcement learning model is used as the type I diabetes intervention model. Concretely, the initialized second reinforcement learning model can be iterated many times so that the target delay reward is maximized: the target delay reward increases gradually and eventually converges to a certain value, which can be taken as the maximum cumulative delay reward. Model training of the initialized second reinforcement learning model maximizes the sum of delay rewards; it tends to avoid actions that lower a single delay reward, but still tries them, because an action that lowers the current delay reward may sharply raise the next one. After many iterations the model has explored the possible choices and converges to a stable behavior policy, giving the trained second reinforcement learning model. Because only the currently collected type I diabetes follow-up data is needed to train the second reinforcement learning model initialized with the target model parameters of the type II diabetes intervention model, the corresponding type I diabetes intervention model can be generated quickly, which effectively reduces the acquisition cost of model training data, greatly reduces the construction cost of the type I diabetes intervention model and improves its construction efficiency.
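A highly simplified sketch of this training loop is given below. It only reproduces the loop structure described here (choose an action per state, accumulate the delayed rewards, iterate until the target delay reward reaches the preset maximum cumulative delay reward); the exploration, replay buffer and gradient updates of a full DQN fine-tuning procedure are omitted, and all function signatures are assumptions.

```python
# Simplified fine-tuning loop for steps S500-S504. Only the loop structure
# described in the text is shown; exploration, replay buffer and the DQN
# gradient update are omitted, and every signature here is an assumption.
import torch

def fine_tune(model, follow_up_episodes, delayed_reward_fn,
              max_cumulative_delayed_reward, max_iterations=1000):
    """follow_up_episodes: one state sequence per type I patient's follow-up
    period (each state shaped for the DQN); delayed_reward_fn(prev, next) -> float."""
    for _ in range(max_iterations):
        target_delayed_reward = 0.0                       # S501: running sum
        for states in follow_up_episodes:
            for t in range(len(states) - 1):
                # S500: determine the action corresponding to the current state
                q_values = model(states[t].unsqueeze(0))
                action = q_values.argmax(dim=-1)          # intervention action
                # The delay reward of that action is the score of the next
                # state observed in the follow-up data
                target_delayed_reward += delayed_reward_fn(states[t], states[t + 1])
                # (Bellman-target gradient update on `action` omitted for brevity)
        # S503: stop once the target delay reward reaches the preset maximum
        if target_delayed_reward >= max_cumulative_delayed_reward:
            break
    return model                                          # S504: type I intervention model
```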
Further, in an embodiment of the present application, the step S500 includes:
s5000: calling the initialized second reinforcement learning model to enable the initialized second reinforcement learning model to determine a first action corresponding to a first state according to the first state at a first time; wherein the first time is any one of all times contained in the follow-up visit period;
s5001: calling the initialized second reinforcement learning model to enable the initialized second reinforcement learning model to determine a second action corresponding to a second state according to the second state of second time; wherein the second time is a next time within the follow-up period that is adjacent to the first time;
s5002: after the type-I diabetic patient is transferred from the first state to the second state, inquiring a target score corresponding to the second state according to a preset state-score relation mapping table;
s5003: and taking the target score as a delay reward of the first action.
As described in the foregoing steps S5000 to S5003, the step of calling the initialized second reinforcement learning model so that the initialized second reinforcement learning model determines, according to the state of the type one diabetes patient at each time included in the follow-up period, the action corresponding to each state, and acquiring the delay reward of the action corresponding to each state may specifically include: the method comprises the steps of firstly calling the initialized second reinforcement learning model so as to enable the initialized second reinforcement learning model to determine a first action corresponding to a first state according to the first state at a first time. Wherein the first time is any one of all times included in the follow-up period. Additionally, the first state is a state of a type I diabetic at a first time, and the first action is an action corresponding to the first state, which may include diagnostic intervention data corresponding to the first state. And then calling the initialized second reinforcement learning model to enable the initialized second reinforcement learning model to determine a second action corresponding to a second state according to the second state of a second time. Wherein the second time is a next time within the follow-up period that is adjacent to the first time. Additionally, the second state is a state of the type one diabetic patient at a second time, and the second action is an action corresponding to the second state, which may include diagnostic intervention data corresponding to the second state. And then after the type-I diabetes patient is transferred from the first state to the second state, inquiring a target score corresponding to the second state according to a preset state-score relation mapping table. The state-score relation mapping table is a data table which is created in advance by combining actual use conditions and stores states and corresponding scores, and the corresponding scores in any state can be quickly inquired through the data table. For example, if the second state indicates that the blood glucose content of the patient is unchanged compared with the first state, the score corresponding to the second state may be determined to be 2 points according to the mapping relationship between the states and the scores; if the second state indicates that the blood sugar content of the patient is reduced by a lower amplitude compared with the first state, determining that the score corresponding to the second state is-1 according to the mapping relation between the states and the scores; if the second state indicates that the patient has a higher level of blood glucose level reduction than the first state, the score corresponding to the second state may be determined to be-2, based on the mapping between the states and the scores, and so on. And finally, taking the target value as the delay reward of the first action. In addition, the process of determining the delay rewards of other actions may refer to the generation process of the delay reward of the first action, and is not described herein again. 
In the embodiment, the state of each time included in the follow-up period of the type-one diabetes patient is determined, the action corresponding to each state is determined, the value corresponding to each state can be obtained by inquiring the preset state-value relation mapping table, and then the delay reward of each action can be quickly determined based on the value, so that the initialized second reinforcement learning model can be subjected to multiple times of iterative processing based on the delay reward and the preset maximum accumulated delay reward, and the initialized second reinforcement learning model can be successfully generated and used as the type-one diabetes intervention model.
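The following sketch shows one possible form of the state-score relation mapping table, reproducing the example scores above; the blood-glucose categories and thresholds are assumptions, and a function of this kind, applied to the blood-glucose component of the states, could serve as the delayed_reward_fn in the training sketch earlier.

```python
# One possible state-score relation mapping table; the scores reproduce the
# example values in the text, while the categories and thresholds (in mmol/L)
# are assumptions.
STATE_SCORE_TABLE = {
    "unchanged": 2,        # blood glucose unchanged versus the previous state
    "small_decrease": -1,  # blood glucose reduced by a small amplitude
    "large_decrease": -2,  # blood glucose reduced by a large amplitude
}

def delayed_reward(first_state_glucose: float, second_state_glucose: float) -> int:
    """Delay reward of the first action once the patient has moved from the
    first state to the second state (threshold values are assumptions)."""
    change = second_state_glucose - first_state_glucose
    if abs(change) < 0.1:
        return STATE_SCORE_TABLE["unchanged"]
    if -1.0 < change <= -0.1:
        return STATE_SCORE_TABLE["small_decrease"]
    if change <= -1.0:
        return STATE_SCORE_TABLE["large_decrease"]
    return 0  # the text gives no score for an increase; 0 is a placeholder
```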
Further, in an embodiment of the present application, the step S501 includes:
s5010: summing the delay rewards of the actions corresponding to each state to obtain corresponding sum values;
s5011: and using the sum as the target delay reward.
As described in the above steps S5010 to S5011, the step of calculating a target delay award based on the delay award of the action corresponding to each piece of status information may specifically include: firstly, the delay rewards of the actions corresponding to each state are summed to obtain a corresponding sum value. And after the sum value is obtained, taking the sum value as the target delay reward. In the embodiment, the corresponding target delay reward is quickly generated by calculating the sum of the delay rewards of the actions corresponding to each state, so that the initialized second reinforcement learning model can be accurately subjected to multiple times of iterative processing based on the target delay reward, and a type-one diabetes intervention model corresponding to the follow-up visit data of the type-one diabetes patient can be quickly generated.
Further, in an embodiment of the application, after the step S5, the method includes:
s510: storing the type one diabetes intervention model;
s511: acquiring a Python-based Flask framework;
s512: executing deployment processing corresponding to the type one diabetes intervention model based on the Flask framework and a web service.
As described in steps S510 to S512 above, after the initialized second reinforcement learning model has been trained on the type I diabetes follow-up data and the trained model has been taken as the type I diabetes intervention model corresponding to that data, a storage process and a deployment process for the type I diabetes intervention model may follow. Specifically, the type I diabetes intervention model is stored first. The storage location can be chosen according to the device's available memory: if the currently available memory is smaller than a preset memory threshold, the type I diabetes intervention model is stored on a blockchain, which avoids slowing the device down by storing the model locally when memory is scarce and keeps the device running smoothly. Using a blockchain to store and manage the type I diabetes intervention model also effectively guarantees its security and tamper resistance. If the currently available memory is larger than the memory threshold, the relevant user can choose between local storage and blockchain storage, improving the user experience. A Python-based Flask framework is then obtained. Flask is a web development micro-framework for Python built on Werkzeug and Jinja2; it is extremely concise, flexible, easy to learn and easy to apply, which makes it a good choice for starting web development quickly. Another benefit of using Flask is that Python-based machine learning or data analysis algorithms can be integrated into web applications very easily. Finally, the deployment processing corresponding to the type I diabetes intervention model is executed based on the Flask framework and a web service: the model is deployed by building a web service with Python's Flask framework. The device can subsequently invoke the type I diabetes intervention model by accessing the interface corresponding to the web service and passing the input data in JSON format. Storing and deploying the type I diabetes intervention model after it has been obtained makes it convenient to call the model quickly through the corresponding interface whenever it is needed later.
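A minimal sketch of such a Flask deployment is shown below; the route name, JSON fields, model file name and the reuse of the DQN class from the first sketch are all assumptions, since the application only states that the model is deployed as a web service built with Python's Flask framework.

```python
# Minimal Flask deployment sketch. The route name, JSON fields, model file
# name and the DQN class from the first sketch are assumptions; the application
# only states that the model is deployed as a web service built with Flask.
import torch
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the stored type I diabetes intervention model once at start-up.
model = DQN(in_channels=1, num_actions=5)
model.load_state_dict(torch.load("type1_intervention_model.pt"))
model.eval()

@app.route("/type1-intervention", methods=["POST"])
def type1_intervention():
    payload = request.get_json()                              # JSON input
    state = torch.tensor(payload["state"], dtype=torch.float32).view(1, 1, -1)
    with torch.no_grad():
        action = int(model(state).argmax(dim=-1))             # diagnosis intervention result
    return jsonify({"diagnosis_intervention": action})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```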
Further, in an embodiment of the present application, after the step S5, the method includes:
s520: acquiring input data of a specified type I diabetes patient to be processed; wherein the designated type one diabetic patient data comprises a designated status of the designated type one diabetic patient over a designated time;
s521: calling an interface corresponding to the web service, and calling the type I diabetes intervention model through the interface;
s522: inputting the data of the appointed type I diabetes mellitus patient into the type I diabetes mellitus intervention model, and receiving a diagnosis intervention result output after the type I diabetes mellitus intervention model processes the data of the appointed type I diabetes mellitus patient;
s523: and displaying the diagnosis intervention result.
As described in the foregoing steps S520 to S523, after the step of training the initialized second reinforcement learning model based on the follow-up visit data of the type one diabetes patient is performed to obtain the trained second reinforcement learning model, and using the trained second reinforcement learning model as the type one diabetes intervention model corresponding to the follow-up visit data of the type one diabetes patient, the method may further include a process of performing a diagnostic intervention process on the input data of the specified type one diabetes patient to be processed by using the type one diabetes intervention model and outputting a corresponding diagnostic intervention result. Specifically, input data specifying type one diabetes patients to be processed is first acquired. Wherein the data structure of the data specifying type one diabetes mellitus patient is the same as the data structure of the follow-up data of the type one diabetes mellitus patient, and the data specifying type one diabetes mellitus patient comprises the specified state of the specified type one diabetes mellitus patient in the specified time. And then calling an interface corresponding to the web service, and calling the type-one diabetes intervention model through the interface. And then inputting the data of the specified type one diabetes mellitus patient into the type one diabetes mellitus intervention model, and receiving a diagnosis intervention result output after the type one diabetes mellitus intervention model processes the data of the specified type one diabetes mellitus patient. And finally displaying the diagnosis intervention result. The display mode of the diagnosis intervention result is not particularly limited, and the display mode can be set according to actual requirements so as to fit the data viewing experience of related users. According to the method and the device, after the input data of the specified type one diabetes mellitus patient to be processed is acquired, the type one diabetes mellitus intervention model can be called out quickly through the interface corresponding to the web service, and then the diagnosis intervention result corresponding to the data of the specified type one diabetes mellitus patient can be generated and displayed quickly through the type one diabetes mellitus intervention model, so that related medical workers can know the specific diagnosis condition of the specified type one diabetes mellitus patient timely and clearly based on the diagnosis intervention result, and can implement corresponding intervention treatment on the specified type one diabetes mellitus patient timely according to the diagnosis intervention result, and guarantee of certain positive effect on the body health of the specified type one diabetes mellitus patient is achieved.
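For illustration, a client-side call to the web-service interface might look as follows; the URL and JSON fields simply match the assumed Flask route in the previous sketch and are not specified by the application.

```python
# Client-side call to the assumed web-service interface (steps S520-S523).
import requests

# Assumed structure: the specified state of the specified type I diabetes
# patient at the specified time (e.g. blood-glucose readings).
patient_data = {"state": [6.8, 7.1, 7.4]}

response = requests.post("http://localhost:5000/type1-intervention",
                         json=patient_data, timeout=10)
result = response.json()
print("Diagnosis intervention result:", result["diagnosis_intervention"])  # display (S523)
```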
The model generation method based on the transfer learning in the embodiment of the present application may also be applied to the field of blockchains, for example, data such as the type i diabetes intervention model described above is stored on a blockchain. By storing and managing the type one diabetes intervention model by using the block chain, the security and the non-tamper property of the type one diabetes intervention model can be effectively ensured.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The block chain underlying platform can comprise processing modules such as user management, basic service, intelligent contract and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, and comprises public and private key generation maintenance (account management), key management, user real identity and blockchain address corresponding relation maintenance (authority management) and the like, and under the authorization condition, the user management module supervises and audits the transaction condition of certain real identities and provides rule configuration (wind control audit) of risk control; the basic service module is deployed on all block chain node equipment and used for verifying the validity of the service request, recording the service request to storage after consensus on the valid request is completed, for a new service request, the basic service firstly performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts service information (consensus management) through a consensus algorithm, transmits the service information to a shared account (network communication) completely and consistently after encryption, and performs recording and storage; the intelligent contract module is responsible for registering and issuing contracts, triggering the contracts and executing the contracts, developers can define contract logics through a certain programming language, issue the contract logics to a block chain (contract registration), call keys or other event triggering execution according to the logics of contract clauses, complete the contract logics and simultaneously provide the function of canceling contract upgrading logout; the operation monitoring module is mainly responsible for deployment, configuration modification, contract setting, cloud adaptation in the product release process and visual output of real-time states in product operation, such as: alarm, monitoring network conditions, monitoring node equipment health status, and the like.
Referring to fig. 2, an embodiment of the present application further provides a model generation apparatus based on transfer learning, including:
the first acquisition module 1 is used for acquiring a preset type II diabetes intervention model; the type II diabetes intervention model is obtained by training a preset first reinforcement learning model based on pre-collected type II diabetes patient follow-up sample data;
a second obtaining module 2, configured to obtain target model parameters of the type ii diabetes intervention model;
the processing module 3 is configured to initialize a preset model parameter of a second reinforcement learning model by using the target model parameter based on the transfer learning, so as to obtain an initialized second reinforcement learning model;
the third acquisition module 4 is used for acquiring pre-acquired follow-up visit data of the type I diabetes patient; wherein the type one diabetic patient follow-up sample data comprises the status of a type one diabetic patient at each time encompassed within a follow-up period;
the generating module 5 is configured to train the initialized second reinforcement learning model based on the type one diabetes mellitus patient follow-up data to obtain a trained second reinforcement learning model, and use the trained second reinforcement learning model as a type one diabetes mellitus intervention model corresponding to the type one diabetes mellitus patient follow-up data to perform diagnosis intervention processing on the type one diabetes mellitus patient data to be processed through the type one diabetes mellitus intervention model.
In this embodiment, the operations that the modules or units are respectively configured to execute correspond to the steps of the model generation method based on the transfer learning in the foregoing embodiment one to one, and are not described herein again.
Further, in an embodiment of the present application, the processing module 3 includes:
a first obtaining unit, configured to obtain a model parameter of the second reinforcement learning model;
the setting unit is used for setting the model parameters of the second reinforcement learning model as the target model parameters based on the transfer learning to obtain a set second reinforcement learning model;
a first determining unit, configured to use the set second reinforcement learning model as the initialized second reinforcement learning model.
In this embodiment, the operations that the modules or units are respectively configured to execute correspond to the steps of the model generation method based on the transfer learning in the foregoing embodiment one to one, and are not described herein again.
Further, in an embodiment of the present application, the generating module 5 includes:
The first processing unit is configured to invoke the initialized second reinforcement learning model, so that the initialized second reinforcement learning model determines the action corresponding to each state according to the state of the type I diabetes patient at each time in the follow-up period, and to obtain the delayed reward of the action corresponding to each state;
The calculating unit is configured to calculate a target delayed reward based on the delayed reward of the action corresponding to each state;
The second processing unit is configured to iterate the initialized second reinforcement learning model repeatedly until the target delayed reward reaches a preset maximum accumulated delayed reward, so as to obtain a trained second reinforcement learning model;
The second determining unit is configured to use the trained second reinforcement learning model as the type I diabetes intervention model.
In this embodiment, the operations that the modules or units are respectively configured to execute correspond to the steps of the model generation method based on transfer learning in the foregoing embodiment one to one, and are not described herein again.
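As an illustration of the generating module's units, the sketch below runs repeated passes over the follow-up states, picks an action per state, looks up delayed rewards, sums them into the target delayed reward, and stops once a preset maximum accumulated delayed reward (or an iteration cap, added here so the loop always terminates) is reached. The epsilon-greedy choice and the Q-learning update rule are assumptions, not details taken from the patent.

    import random

    def train_second_model(agent, follow_up_states, state_score, alpha=0.1, gamma=0.9,
                           epsilon=0.1, max_accumulated_reward=100.0, max_iterations=1000):
        # follow_up_states: list of patient states ordered by follow-up time.
        # state_score: the preset state-score relation mapping table.
        for _ in range(max_iterations):
            target_delayed_reward = 0.0
            for t in range(len(follow_up_states) - 1):
                state, next_state = follow_up_states[t], follow_up_states[t + 1]
                q = agent.q_table.setdefault(state, {a: 0.0 for a in agent.actions})
                # Determine the action corresponding to the state (epsilon-greedy, assumed).
                if random.random() < epsilon:
                    action = random.choice(agent.actions)
                else:
                    action = max(q, key=q.get)
                # Delayed reward of the action: score of the state reached at the next time.
                reward = state_score.get(next_state, 0.0)
                next_q = agent.q_table.setdefault(next_state, {a: 0.0 for a in agent.actions})
                # Q-learning update (assumed update rule).
                q[action] += alpha * (reward + gamma * max(next_q.values()) - q[action])
                # Calculating unit: accumulate into the target delayed reward.
                target_delayed_reward += reward
            # Second processing unit: stop once the preset maximum accumulated delayed reward is reached.
            if target_delayed_reward >= max_accumulated_reward:
                break
        # Second determining unit: the trained agent serves as the type I diabetes intervention model.
        return agent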
Further, in an embodiment of the present application, the first processing unit includes:
The first processing subunit is configured to invoke the initialized second reinforcement learning model, so that the initialized second reinforcement learning model determines, according to a first state at a first time, a first action corresponding to the first state; wherein the first time is any one of the times included in the follow-up period;
The second processing subunit is configured to invoke the initialized second reinforcement learning model, so that the initialized second reinforcement learning model determines, according to a second state at a second time, a second action corresponding to the second state; wherein the second time is the next time within the follow-up period adjacent to the first time;
The querying subunit is configured to query, after the type I diabetes patient has transitioned from the first state to the second state, a target score corresponding to the second state in a preset state-score relation mapping table;
The first determining subunit is configured to use the target score as the delayed reward of the first action.
In this embodiment, the operations that the modules or units are respectively configured to execute correspond to the steps of the model generation method based on the transfer learning in the foregoing embodiment one to one, and are not described herein again.
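In other words, the delayed reward of the first action is simply the score of the second state looked up in the preset state-score relation mapping table once the patient has moved from the first state to the second state. A minimal sketch follows; the example states and scores in STATE_SCORE are invented purely for illustration.

    # Hypothetical state-score relation mapping table.
    STATE_SCORE = {
        "glycaemia_controlled": 10.0,
        "glycaemia_elevated": -5.0,
        "hypoglycaemic_event": -10.0,
    }

    def delayed_reward_of_first_action(second_state, state_score=STATE_SCORE):
        # Querying subunit: target score corresponding to the second state.
        # First determining subunit: that target score is the delayed reward of the first action.
        return state_score.get(second_state, 0.0)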
Further, in an embodiment of the present application, the calculating unit includes:
The calculating subunit is configured to sum the delayed rewards of the actions corresponding to the states to obtain a corresponding sum;
The second determining subunit is configured to use the sum as the target delayed reward.
In this embodiment, the operations that the modules or units are respectively configured to execute correspond to the steps of the model generation method based on the transfer learning in the foregoing embodiment one to one, and are not described herein again.
Further, in an embodiment of the present application, the model generation apparatus based on transfer learning further includes:
The storage module is configured to store the type I diabetes intervention model;
The fourth acquisition module is configured to acquire a Python-based Flask framework;
The deployment module is configured to execute deployment processing corresponding to the type I diabetes intervention model based on the Flask framework and a web service.
In this embodiment, the operations that the modules or units are respectively configured to execute correspond to the steps of the model generation method based on the transfer learning in the foregoing embodiment one to one, and are not described herein again.
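Since the patent only states that the model is deployed as a web service with the Python Flask framework, the following is a minimal sketch of such a deployment; the pickle file name, the route path and the request/response fields are all assumptions.

    import pickle
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # Storage module: the trained type I diabetes intervention model is assumed to have
    # been serialized beforehand to this hypothetical file.
    with open("type1_intervention_model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/type1-intervention", methods=["POST"])
    def intervene():
        # Illustrative payload: {"state": "<patient state at a specified time>"}
        payload = request.get_json(force=True)
        state = payload["state"]
        q_values = model.q_table.get(state, {})
        action = max(q_values, key=q_values.get) if q_values else "no_recommendation"
        # The returned action stands in for the diagnosis intervention result.
        return jsonify({"state": state, "recommended_action": action})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)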
Further, in an embodiment of the present application, the model generation apparatus based on transfer learning further includes:
The fifth acquisition module is configured to acquire input data of a specified type I diabetes patient to be processed; wherein the specified type I diabetes patient data comprises a specified status of the specified type I diabetes patient at a specified time;
The calling module is configured to call an interface corresponding to the web service and to call the type I diabetes intervention model through the interface;
The receiving module is configured to input the specified type I diabetes patient data into the type I diabetes intervention model and to receive a diagnosis intervention result output after the type I diabetes intervention model processes the specified type I diabetes patient data;
The display module is configured to display the diagnosis intervention result.
In this embodiment, the operations that the modules or units are respectively configured to execute correspond to the steps of the model generation method based on transfer learning in the foregoing embodiment one to one, and are not described herein again.
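On the client side, the fifth acquisition, calling, receiving and display modules amount to posting the specified type I diabetes patient data to the web-service interface and showing the returned diagnosis intervention result. The sketch below uses the requests library and reuses the hypothetical URL and field names from the Flask example above.

    import requests

    def request_intervention(specified_state,
                             url="http://localhost:5000/type1-intervention"):
        # Calling module: call the interface corresponding to the web service.
        response = requests.post(url, json={"state": specified_state}, timeout=10)
        response.raise_for_status()
        # Receiving module: diagnosis intervention result output by the intervention model.
        result = response.json()
        # Display module: printed here; a real front end would render it instead.
        print("State:", result["state"], "-> recommended action:", result["recommended_action"])
        return result

    if __name__ == "__main__":
        request_intervention("glycaemia_elevated")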
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device comprises a processor, a memory, a network interface, a display screen, an input device and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a storage medium and an internal memory. The storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the storage medium. The database of the computer device is used for storing the type II diabetes intervention model, the type II diabetes patient follow-up sample data, the target model parameters, the initialized second reinforcement learning model, the type I diabetes patient follow-up data and the type I diabetes intervention model. The network interface of the computer device is used for communicating with an external terminal through a network connection. The display screen of the computer device is an image-text output device that converts digital signals into optical signals so that characters and figures can be displayed on the screen. The input device of the computer device is the main means by which a user or other equipment exchanges information with the computer, and is used for transmitting data, instructions, certain flag information and the like to the computer. The computer program, when executed by the processor, implements the model generation method based on transfer learning.
When the processor executes the computer program, the model generation method based on transfer learning is implemented, comprising the following steps:
acquiring a preset type II diabetes intervention model; the type II diabetes intervention model is obtained by training a preset first reinforcement learning model based on pre-collected type II diabetes patient follow-up sample data;
acquiring target model parameters of the type II diabetes intervention model;
initializing model parameters of a preset second reinforcement learning model by using the target model parameters based on the transfer learning to obtain an initialized second reinforcement learning model;
acquiring pre-collected type I diabetes patient follow-up data; wherein the type I diabetes patient follow-up data comprises the status of the type I diabetes patient at each time included in a follow-up period;
training the initialized second reinforcement learning model based on the type I diabetes patient follow-up data to obtain a trained second reinforcement learning model, and taking the trained second reinforcement learning model as the type I diabetes intervention model corresponding to the type I diabetes patient follow-up data, so that diagnosis intervention processing can be performed on to-be-processed type I diabetes patient data through the type I diabetes intervention model.
It will be understood by those skilled in the art that the structure shown in fig. 3 is only a block diagram of part of the structure related to the present application, and does not constitute a limitation on the apparatus or the computer device to which the present application is applied.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a model generation method based on transfer learning, and specifically includes:
acquiring a preset type II diabetes intervention model; the type II diabetes intervention model is obtained by training a preset first reinforcement learning model based on pre-collected type II diabetes patient follow-up sample data;
acquiring target model parameters of the type II diabetes intervention model;
initializing the model parameters of a preset second reinforcement learning model by using the target model parameters based on the transfer learning to obtain an initialized second reinforcement learning model;
acquiring pre-collected type I diabetes patient follow-up data; wherein the type I diabetes patient follow-up data comprises the status of the type I diabetes patient at each time included in a follow-up period;
training the initialized second reinforcement learning model based on the type I diabetes patient follow-up data to obtain a trained second reinforcement learning model, and taking the trained second reinforcement learning model as the type I diabetes intervention model corresponding to the type I diabetes patient follow-up data, so that diagnosis intervention processing can be performed on to-be-processed type I diabetes patient data through the type I diabetes intervention model.
In summary, according to the model generation method, apparatus, computer device and storage medium based on transfer learning provided in the embodiments of the present application, the target model parameters of the pre-created type II diabetes intervention model are obtained, and the model parameters of the preset second reinforcement learning model are then initialized with these target model parameters based on transfer learning to obtain an initialized second reinforcement learning model, so that the initialized second reinforcement learning model can be trained with the pre-collected type I diabetes patient follow-up data and the type I diabetes intervention model corresponding to that follow-up data can be generated conveniently and quickly. Because only the currently collected type I diabetes patient follow-up data is needed to train the second reinforcement learning model that has been initialized with the target model parameters of the type II diabetes intervention model, the corresponding type I diabetes intervention model can be generated rapidly, which effectively reduces the cost of acquiring model training data, greatly reduces the construction cost of the type I diabetes intervention model and improves its construction efficiency.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article or method that comprises that element.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. Any equivalent structure or equivalent process transformation made by using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present application.

Claims (8)

1. A model generation method based on transfer learning is characterized by comprising the following steps:
acquiring a preset type II diabetes intervention model; the type II diabetes intervention model is obtained by training a preset first reinforcement learning model based on pre-collected type II diabetes patient follow-up sample data;
acquiring target model parameters of the type II diabetes intervention model;
initializing model parameters of a preset second reinforcement learning model by using the target model parameters based on the transfer learning to obtain an initialized second reinforcement learning model;
acquiring pre-collected type I diabetes patient follow-up data; wherein the type I diabetes patient follow-up data comprises the status of the type I diabetes patient at each time included in a follow-up period;
training the initialized second reinforcement learning model based on the follow-up data of the type I diabetes patient to obtain a trained second reinforcement learning model, and taking the trained second reinforcement learning model as a type I diabetes intervention model corresponding to the follow-up data of the type I diabetes patient so as to perform diagnosis intervention processing on the data of the type I diabetes patient to be processed through the type I diabetes intervention model;
the step of training the initialized second reinforcement learning model based on the type I diabetes patient follow-up data to obtain a trained second reinforcement learning model, and using the trained second reinforcement learning model as the type I diabetes intervention model corresponding to the type I diabetes patient follow-up data comprises:
calling the initialized second reinforcement learning model, so that the initialized second reinforcement learning model determines the action corresponding to each state according to the state of the type I diabetes patient at each time in the follow-up period, and obtaining the delayed reward of the action corresponding to each state;
summing the delayed rewards of the actions corresponding to the states to obtain a corresponding sum;
using the sum as a target delayed reward;
iterating the initialized second reinforcement learning model repeatedly until the target delayed reward reaches a preset maximum accumulated delayed reward, so as to obtain the trained second reinforcement learning model;
and using the trained second reinforcement learning model as the type I diabetes intervention model.
2. The model generation method based on the transfer learning of claim 1, wherein the step of initializing the model parameters of the second reinforcement learning model by using the target model parameters based on the transfer learning to obtain the initialized second reinforcement learning model comprises:
obtaining model parameters of the second reinforcement learning model;
setting the model parameters of the second reinforcement learning model as the target model parameters based on the transfer learning to obtain a set second reinforcement learning model;
and taking the set second reinforcement learning model as the initialized second reinforcement learning model.
3. The model generation method based on transfer learning according to claim 1, wherein the step of calling the initialized second reinforcement learning model so that the initialized second reinforcement learning model determines the action corresponding to each state according to the state of the type I diabetes patient at each time in the follow-up period, and obtaining the delayed reward of the action corresponding to each state comprises:
calling the initialized second reinforcement learning model, so that the initialized second reinforcement learning model determines, according to a first state at a first time, a first action corresponding to the first state; wherein the first time is any one of the times included in the follow-up period;
calling the initialized second reinforcement learning model, so that the initialized second reinforcement learning model determines, according to a second state at a second time, a second action corresponding to the second state; wherein the second time is the next time within the follow-up period adjacent to the first time;
after the type I diabetes patient has transitioned from the first state to the second state, querying a target score corresponding to the second state in a preset state-score relation mapping table;
and using the target score as the delayed reward of the first action.
4. The model generation method based on transfer learning according to claim 1, wherein after the step of training the initialized second reinforcement learning model based on the type I diabetes patient follow-up data to obtain a trained second reinforcement learning model, and using the trained second reinforcement learning model as the type I diabetes intervention model corresponding to the type I diabetes patient follow-up data, the method further comprises:
storing the type I diabetes intervention model;
acquiring a Python-based Flask framework;
and executing deployment processing corresponding to the type I diabetes intervention model based on the Flask framework and a web service.
5. The model generation method based on transfer learning according to claim 4, wherein after the step of training the initialized second reinforcement learning model based on the type I diabetes patient follow-up data to obtain a trained second reinforcement learning model, and using the trained second reinforcement learning model as the type I diabetes intervention model corresponding to the type I diabetes patient follow-up data, the method further comprises:
acquiring input data of a specified type I diabetes patient to be processed; wherein the specified type I diabetes patient data comprises a specified status of the specified type I diabetes patient at a specified time;
calling an interface corresponding to the web service, and calling the type I diabetes intervention model through the interface;
inputting the specified type I diabetes patient data into the type I diabetes intervention model, and receiving a diagnosis intervention result output by the type I diabetes intervention model after it processes the specified type I diabetes patient data;
and displaying the diagnosis intervention result.
6. A model generation device based on transfer learning is characterized by comprising:
the first acquisition module is used for acquiring a preset type II diabetes intervention model; the type II diabetes intervention model is obtained by training a preset first reinforcement learning model based on pre-collected type II diabetes patient follow-up sample data;
the second acquisition module is used for acquiring target model parameters of the type II diabetes intervention model;
the processing module is used for initializing the model parameters of a preset second reinforcement learning model by using the target model parameters based on the transfer learning to obtain an initialized second reinforcement learning model;
the third acquisition module is used for acquiring pre-collected type I diabetes patient follow-up data; wherein the type I diabetes patient follow-up data comprises the status of the type I diabetes patient at each time included in a follow-up period;
the generation module is used for training the initialized second reinforcement learning model based on the type I diabetes patient follow-up data to obtain a trained second reinforcement learning model, and for using the trained second reinforcement learning model as the type I diabetes intervention model corresponding to the type I diabetes patient follow-up data, so that diagnosis intervention processing can be performed on to-be-processed type I diabetes patient data through the type I diabetes intervention model;
the step of training the initialized second reinforcement learning model based on the type I diabetes patient follow-up data to obtain a trained second reinforcement learning model, and using the trained second reinforcement learning model as the type I diabetes intervention model corresponding to the type I diabetes patient follow-up data comprises:
calling the initialized second reinforcement learning model, so that the initialized second reinforcement learning model determines the action corresponding to each state according to the state of the type I diabetes patient at each time in the follow-up period, and obtaining the delayed reward of the action corresponding to each state;
summing the delayed rewards of the actions corresponding to the states to obtain a corresponding sum;
using the sum as a target delayed reward;
iterating the initialized second reinforcement learning model repeatedly until the target delayed reward reaches a preset maximum accumulated delayed reward, so as to obtain the trained second reinforcement learning model;
and using the trained second reinforcement learning model as the type I diabetes intervention model.
7. A computer device comprising a memory and a processor, the memory having a computer program stored therein, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN202110701188.2A 2021-06-22 2021-06-22 Model generation method and device based on transfer learning and computer equipment Active CN113421638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110701188.2A CN113421638B (en) 2021-06-22 2021-06-22 Model generation method and device based on transfer learning and computer equipment

Publications (2)

Publication Number Publication Date
CN113421638A (en) 2021-09-21
CN113421638B (en) 2022-07-15

Family

ID=77716393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110701188.2A Active CN113421638B (en) 2021-06-22 2021-06-22 Model generation method and device based on transfer learning and computer equipment

Country Status (1)

Country Link
CN (1) CN113421638B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114300091B (en) * 2021-12-07 2022-12-02 姜京池 Self-adaptive adjustment method and device for insulin infusion scheme and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110444263A (en) * 2019-08-21 2019-11-12 深圳前海微众银行股份有限公司 Disease data processing method, device, equipment and medium based on federation's study
CN110766169A (en) * 2019-10-31 2020-02-07 深圳前海微众银行股份有限公司 Transfer training optimization method and device for reinforcement learning, terminal and storage medium
CN112017742A (en) * 2020-09-08 2020-12-01 平安科技(深圳)有限公司 Triage data processing method and device, computer equipment and storage medium
CN112201342A (en) * 2020-09-27 2021-01-08 博雅正链(北京)科技有限公司 Medical auxiliary diagnosis method, device, equipment and storage medium based on federal learning
CN112446860A (en) * 2020-11-23 2021-03-05 中山大学中山眼科中心 Automatic screening method for diabetic macular edema based on transfer learning
CN112908445A (en) * 2021-02-20 2021-06-04 上海市第四人民医院 Diabetes patient blood sugar management method, system, medium and terminal based on reinforcement learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8583455B2 (en) * 2008-09-19 2013-11-12 Roche Diagnostics Operations, Inc. Patient diabetes data interchange with electronic medical records
US11636340B2 (en) * 2018-04-17 2023-04-25 Bgi Shenzhen Modeling method and apparatus for diagnosing ophthalmic disease based on artificial intelligence, and storage medium
CN112508873A (en) * 2020-11-23 2021-03-16 西安科锐盛创新科技有限公司 Method for establishing intracranial vascular simulation three-dimensional narrowing model based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant