CN117174232A - Electronic medical record generation method and device, electronic equipment and storage medium

Info

Publication number: CN117174232A
Application number: CN202311155779.XA
Authority: CN
Inventors: 欧阳逸
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2023-09-07
Filing date: 2023-09-07
Publication date: 2023-12-05

Abstract

The present application relates to the field of data processing technologies, and in particular to a method and apparatus for generating an electronic medical record, an electronic device, and a storage medium. The method includes: acquiring diagnostic data of each stage associated with a target object, where the diagnostic data of each stage includes at least one data title and at least one type of data content respectively associated with the at least one data title; aggregating the data content associated with the same data title in different diagnostic data to obtain corresponding aggregation results, and obtaining data to be integrated based on the aggregation results, where, in the data to be integrated, corresponding identifiers are added to each data title and to the different types of data content belonging to each data title; and using a target medical record generation model to perform content integration processing on the data to be integrated according to preset medical record content labels, to obtain a corresponding electronic medical record. In this way, the generation efficiency and the content quality of the electronic medical record can be improved.

Description

Electronic medical record generation method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method and apparatus for generating an electronic medical record, an electronic device, and a storage medium.

Background

In the field of medical health, after a relevant object completes a course of diagnosis and treatment, a corresponding electronic medical record generally needs to be created for that object, so that medical staff can use the electronic medical record to confirm the object's diagnosis and treatment status. The electronic medical record contains the diagnosis and treatment data of the relevant object at each stage of the diagnosis and treatment process.

In the related art, when an electronic medical record is generated for a relevant object, medical staff generally integrate the diagnosis and treatment data of each stage manually to obtain the corresponding electronic medical record.

However, because the electronic medical record is compiled manually, summarizing the data consumes a large amount of time and cost and takes up much of the diagnosis time of medical staff. This lowers the efficiency of integrating the content of the electronic medical record, and the quality of the integrated medical record content cannot be guaranteed.

In view of this, a new method for generating electronic medical records is needed to solve the problems of low efficiency and poor content quality of electronic medical records.

Disclosure of Invention

The embodiment of the application provides a method and a device for generating an electronic medical record, electronic equipment and a storage medium, which are used for improving the generation efficiency and the content quality of the electronic medical record.

In a first aspect, a method for generating an electronic medical record is provided, including:

acquiring diagnostic data of each stage associated with a target object, where the diagnostic data of each stage includes: at least one data title, and at least one type of data content respectively associated with the at least one data title;

aggregating the data content associated with the same data title in different diagnostic data to obtain corresponding aggregation results, and obtaining data to be integrated based on the aggregation results, where, in the data to be integrated, corresponding identifiers are added to each data title and to the different types of data content belonging to each data title;

and using a target medical record generation model to perform content integration processing on the data to be integrated according to preset medical record content labels, to obtain the electronic medical record associated with the target object.

In a second aspect, a device for generating an electronic medical record is provided, including:

an acquisition unit for acquiring diagnostic data of each stage associated with a target object, where the diagnostic data of each stage includes: at least one data title, and at least one type of data content respectively associated with the at least one data title;

an aggregation unit for aggregating the data content associated with the same data title in different diagnostic data to obtain corresponding aggregation results, and obtaining data to be integrated based on the aggregation results, where, in the data to be integrated, corresponding identifiers are added to each data title and to the different types of data content belonging to each data title;

and a processing unit for using a target medical record generation model to perform content integration processing on the data to be integrated according to preset medical record content labels, to obtain the electronic medical record associated with the target object.

Optionally, when obtaining the data to be integrated based on each aggregation result, the aggregation unit is configured to perform either of the following operations:

for each aggregation result, performing the following operations: identifying the data title in the aggregation result using a preset title indicator, separating the different types of data content in the aggregation result using a preset type separator, and, within each type of data content, separating the keyword that identifies the data type from the corresponding data content using a preset content separator;

and splicing the processed aggregation results to obtain the corresponding data to be integrated;

or, splicing the aggregation results to obtain a corresponding total aggregation result;

identifying the different data titles in the total aggregation result using a preset title indicator;

separating the different types of data content in the total aggregation result using a preset type separator;

within each type of data content, separating the keyword that identifies the data type from the corresponding data content using a preset content separator;

and determining the processed total aggregation result as the data to be integrated.
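As a rough illustration of the identifiers described here, the following sketch marks each aggregation result and then splices the marked results into one string (the mark-then-splice variant). The concrete symbols (`#` as title indicator, `;` as type separator, `:` as content separator), the dict layout, and the sample values are all assumptions made up for the example; the application only states that such identifiers are preset.

```python
from typing import Dict, List, Tuple

TITLE_INDICATOR = "#"    # assumed symbol that identifies a data title
TYPE_SEPARATOR = ";"     # assumed symbol between different types of data content
CONTENT_SEPARATOR = ":"  # assumed symbol between a type keyword and its content

# Hypothetical layout: data title -> list of (type keyword, data content) pairs.
AggregationResults = Dict[str, List[Tuple[str, str]]]


def mark_result(title: str, typed_contents: List[Tuple[str, str]]) -> str:
    """Mark one aggregation result with the three identifiers."""
    body = TYPE_SEPARATOR.join(
        f"{keyword}{CONTENT_SEPARATOR}{content}" for keyword, content in typed_contents
    )
    return f"{TITLE_INDICATOR}{title}\n{body}"


def build_data_to_integrate(results: AggregationResults) -> str:
    """Splice the marked aggregation results into the data to be integrated."""
    return "\n".join(mark_result(title, items) for title, items in results.items())


# Made-up example values, for illustration only.
example = {
    "Admission record": [("chief complaint", "cough for 3 days"), ("temperature", "38.2 C")],
    "Progress note": [("treatment", "antibiotics started")],
}
marked = build_data_to_integrate(example)
print(marked)
```

Keeping the three symbols as named constants makes it easy to swap in markers that cannot collide with characters found in real clinical text.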

Optionally, when acquiring the diagnostic data of each stage associated with the target object, the acquisition unit is configured to perform any one of the following operations:

in response to a medical record generation instruction triggered by a diagnosis object for the target object, acquiring the diagnostic data of each stage associated with the target object that is stored at a preset target address;

in response to a medical record generation instruction triggered by a diagnosis object for the target object, acquiring the diagnostic data of each stage associated with the target object that is uploaded by the diagnosis object.

Optionally, when acquiring the diagnostic data of each stage associated with the target object that is stored at the preset target address, the acquisition unit is configured to:

acquire data positioning information corresponding to the target object, where the data positioning information is obtained by performing privacy processing on the identification information of the target object;

and access the preset target address based on the data positioning information, and acquire the diagnostic data of each stage stored in association with the data positioning information.
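The lookup just described can be sketched as follows. The text does not say which privacy processing is applied, so a salted SHA-256 hash is used purely as an assumption, and an in-memory dict stands in for the preset target address; `locate_key`, `target_store`, and `fetch_diagnostic_data` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def locate_key(identifier: str, salt: str = "demo-salt") -> str:
    """Derive data positioning information from an identifier (assumed: salted SHA-256)."""
    return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()


# Hypothetical stand-in for the preset target address:
# data positioning information -> diagnostic data of each stage.
target_store: Dict[str, List[dict]] = {
    locate_key("patient-001"): [
        {"stage": "admission", "title": "Admission record"},
        {"stage": "discharge", "title": "Discharge record"},
    ]
}


def fetch_diagnostic_data(identifier: str) -> List[dict]:
    """Look up stage data by positioning information; the raw identifier is never a key."""
    return target_store.get(locate_key(identifier), [])


print(len(fetch_diagnostic_data("patient-001")))
```

In this arrangement the store holds only derived keys, so someone reading the store alone cannot read off the original identifier.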

Optionally, before responding to the medical record generation instruction triggered by the diagnosis object for the target object, the acquisition unit is further configured to:

authenticate the diagnosis object using a preset authentication condition, and determine that the diagnosis object has the right to trigger a medical record generation instruction for the target object;

wherein the authentication condition comprises any one or a combination of the following:

the device information of the terminal device used by the diagnosis object is in a preset set of secure device information;

the IP address of the terminal device used by the diagnosis object is in a pre-stored set of secure IP addresses;

a doctor-patient diagnosis and treatment relationship exists between the diagnosis object and the target object.
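A minimal sketch of such an authentication check, assuming the secure device information, secure IP addresses, and doctor-patient relationships are available as in-memory sets. `AuthPolicy`, its fields, and the identifiers in the usage lines are hypothetical; the `enabled` tuple models "any one or a combination" of the conditions.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class AuthPolicy:
    # All values are hypothetical; which conditions are enabled is a deployment choice.
    safe_devices: Set[str] = field(default_factory=set)
    safe_ips: Set[str] = field(default_factory=set)
    # (diagnosis object, target object) pairs with a diagnosis and treatment relationship
    care_relations: Set[Tuple[str, str]] = field(default_factory=set)
    enabled: Tuple[str, ...] = ("device", "ip", "relation")


def is_authorized(policy: AuthPolicy, device_id: str, ip: str,
                  doctor_id: str, patient_id: str) -> bool:
    """Return True only if every enabled authentication condition holds."""
    if not policy.enabled:
        return False  # no condition configured: refuse rather than allow by default
    checks = {
        "device": device_id in policy.safe_devices,
        "ip": ip in policy.safe_ips,
        "relation": (doctor_id, patient_id) in policy.care_relations,
    }
    return all(checks[name] for name in policy.enabled)


policy = AuthPolicy({"dev-1"}, {"10.0.0.5"}, {("dr-a", "pt-1")})
print(is_authorized(policy, "dev-1", "10.0.0.5", "dr-a", "pt-1"))
```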

Optionally, when aggregating the data content associated with the same data title in different diagnostic data to obtain corresponding aggregation results, the aggregation unit is configured to:

for each data title, perform the following operations:

determine, in each of the different diagnostic data, the data content belonging to the data title;

and splice the data contents in order of their generation time to obtain the corresponding aggregation result.
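The per-title aggregation can be sketched as follows. The `Entry` record, the ISO 8601 timestamp strings, the single-space join, and the sample notes are assumptions for illustration; the text only requires that content under the same data title be spliced in order of generation time.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    title: str       # data title
    created_at: str  # generation time; ISO 8601 so string order equals time order
    content: str     # data content


def aggregate_by_title(stages: List[List[Entry]]) -> Dict[str, str]:
    """Collect entries with the same title across stages and splice them by time."""
    grouped: Dict[str, List[Entry]] = defaultdict(list)
    for stage in stages:
        for entry in stage:
            grouped[entry.title].append(entry)
    return {
        title: " ".join(e.content for e in sorted(entries, key=lambda e: e.created_at))
        for title, entries in grouped.items()
    }


# Made-up stage data; the later note is deliberately listed first.
stages = [
    [Entry("Progress note", "2023-01-03T09:00", "Fever resolved.")],
    [Entry("Progress note", "2023-01-02T09:00", "Fever persists."),
     Entry("Orders", "2023-01-02T10:00", "Start antibiotics.")],
]
print(aggregate_by_title(stages))
```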

Optionally, the target medical record generation model is obtained by training in the following manner:

performing multiple rounds of iterative training on the initial medical record generation model using preset training samples to obtain the trained target medical record generation model, where a training sample includes: sample input data, and a sample medical record obtained by performing content integration on the sample input data according to the preset medical record content labels;

wherein, in one round of iterative training, the following operations are performed:

inputting the acquired sample input data into the medical record generation model to be trained to obtain a predicted integration result, that is, a predicted medical record, where the medical record generation model to be trained includes: the pre-trained initial medical record generation model, and a network to be trained that is added to the initial medical record generation model according to a preset fine-tuning training mode;

and calculating a model loss value according to the content difference between the predicted medical record and the corresponding sample medical record, and adjusting the network parameters of the network to be trained according to the model loss value.
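The idea of keeping the pre-trained part fixed while adjusting only the added network can be shown with a deliberately tiny numeric toy. This is not the application's model: a scalar regression replaces the language model, squared error replaces the text-level content difference, and the names `w_base` and `delta` are hypothetical. It only shows that the loss drives updates to the added parameter and nothing else.

```python
from typing import List, Tuple


def train_added_network(
    w_base: float,
    samples: List[Tuple[float, float]],
    rounds: int = 200,
    lr: float = 0.05,
) -> Tuple[float, List[float]]:
    """Toy fine-tuning: w_base stays frozen, only the added parameter delta is trained."""
    delta = 0.0  # the "network to be trained", initialised so it changes nothing
    losses: List[float] = []
    for _ in range(rounds):
        # Forward pass of the model to be trained (frozen base plus added part).
        errors = [(w_base + delta) * x - y for x, y in samples]
        # Model loss value: mean squared difference between prediction and target.
        losses.append(sum(e * e for e in errors) / len(samples))
        # Gradient with respect to delta only; w_base receives no update.
        grad = sum(2 * e * x for e, (x, _) in zip(errors, samples)) / len(samples)
        delta -= lr * grad
    return delta, losses


samples = [(1.0, 3.0), (2.0, 6.0)]  # made-up data generated by y = 3x
delta, losses = train_added_network(w_base=2.0, samples=samples)
print(round(delta, 6), losses[0], losses[-1] < losses[0])
```

Because the base weight is 2 and the data follow y = 3x, the added parameter has to learn the remaining difference of 1, which is what the loop converges to.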

Optionally, each training sample is constructed in any one of the following manners:

acquiring sample data to be integrated, and performing content integration on each piece of sample data to be integrated according to the preset medical record content labels to obtain a corresponding sample medical record; then, for each piece of sample data to be integrated, organizing it into sample input data whose length does not exceed a specified length, and generating a training sample based on the sample input data and the corresponding sample medical record;

acquiring sample data to be integrated, and performing content integration on each piece of sample data to be integrated according to the preset medical record content labels to obtain a corresponding sample medical record; then, for each piece of sample data to be integrated, splicing it with a preset processing prompt text, organizing the splicing result into sample input data whose length does not exceed a specified length, and generating a training sample based on the sample input data and the corresponding sample medical record.
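Both construction manners can be covered by one small helper, sketched below. Several choices are assumptions: the prompt is placed before the data, length is counted in characters (a real system would more likely count tokens), and plain truncation is used as the simplest way to keep the input within the specified length. `build_training_sample` and its field names are hypothetical.

```python
from typing import Dict


def build_training_sample(
    data_to_integrate: str,
    sample_record: str,
    prompt: str = "",
    max_length: int = 2048,
) -> Dict[str, str]:
    """Optionally splice a processing prompt with the sample data, then cap the length."""
    spliced = f"{prompt}\n{data_to_integrate}" if prompt else data_to_integrate
    # Truncation is only one possible way to respect the specified length (assumption).
    return {"input": spliced[:max_length], "target": sample_record}


# First manner: no prompt text. Second manner: prompt text spliced in front.
plain = build_training_sample("#Title\nkey:value", "sample record text")
prompted = build_training_sample(
    "#Title\nkey:value", "sample record text", prompt="Summarize:", max_length=12
)
print(plain["input"], prompted["input"], sep="\n---\n")
```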

In a third aspect, an electronic device is presented comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above method when executing the computer program.

In a fourth aspect, a computer readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, implements the above method.

In a fifth aspect, a computer program product is proposed, comprising a computer program which, when executed by a processor, implements the above method.

The application has the following beneficial effects:

In the embodiments of the present application, in the process of generating an electronic medical record for a target object, the diagnostic data of each stage associated with the target object, on which generation of the electronic medical record is based, is acquired. From the acquired diagnostic data of each stage, the at least one data title included in each piece of diagnostic data, and the at least one type of data content respectively associated with the at least one data title, can be determined. This amounts to determining the range of data content on which the electronic medical record is based and the form of the data content in the diagnostic data of each stage, namely that data titles, data types, and association relationships among data contents exist in the diagnostic data.

Then, the data contents associated with the same data title in different diagnostic data are aggregated to obtain corresponding aggregation results. This amounts to sorting and organizing the content of the different diagnostic data from the perspective of data titles, so that data contents under the same data title are gathered together and each aggregation result can include data contents that were generated at different stages but belong to the same data title. Based on the aggregation results, data to be integrated is obtained in which corresponding identifiers are added to each data title and to the different types of data content belonging to each data title. This amounts to marking and aggregating the data contents belonging to different data titles, so that the data to be integrated covers all of the diagnostic data while distinguishing the different types of data content and the title to which each belongs, which makes the data easier to process.

Then, a target medical record generation model is used to perform content integration processing on the data to be integrated according to preset medical record content labels, to obtain the electronic medical record associated with the target object. With the help of the target medical record generation model, the medical record content corresponding to each medical record content label can be extracted from the data to be integrated, and an electronic medical record comprising the various medical record contents is finally obtained. This safeguards the quality of the generated electronic medical record, greatly improves generation efficiency, and reduces the difficulty of generation. In addition, because the data titles and the different types of data content belonging to each data title are marked differently in the data to be integrated, processing by the target medical record generation model is facilitated: the model can more easily recognize the different types of data content, which reduces its processing difficulty.

Drawings

FIG. 1 is a schematic diagram of a possible application scenario in an embodiment of the present application;

FIG. 2A is a schematic diagram of a flow chart for generating an electronic medical record according to an embodiment of the present application;

FIG. 2B is a schematic diagram of a target address pre-stored in a server device according to an embodiment of the present application;

FIG. 2C is a schematic diagram of a possible process for determining the result of the numbering according to an embodiment of the present application;

FIG. 2D is a schematic diagram of another possible process for determining the result of the numbering according to an embodiment of the present application;

FIG. 2E is a schematic diagram of a process for obtaining data to be integrated according to an embodiment of the present application;

FIG. 2F is a schematic diagram of another process for obtaining data to be integrated according to an embodiment of the present application;

FIG. 2G is a schematic diagram illustrating a process of a round of fine-tuning training according to an embodiment of the present application;

FIG. 2H is a schematic diagram of another process of a round of fine-tuning training according to an embodiment of the present application;

FIG. 3A is a schematic diagram illustrating a process of each stage of generating an electronic medical record according to an embodiment of the present application;

FIG. 3B is a schematic diagram of a process for obtaining an electronic medical record according to an embodiment of the present application;

FIG. 4 is a schematic logic structure diagram of a generating device of an electronic medical record according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a hardware composition structure of an electronic device to which the embodiment of the present application is applied.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.

The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be capable of operation in sequences other than those illustrated or otherwise described.

Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.

Initial medical record generation model: in the embodiments of the present application, the initial medical record generation model may be a pre-trained language model. Pre-training of such a model relies on a large-scale corpus and does not require manually labeled training samples. Through pre-training, the language model learns general language representations and knowledge from the large-scale corpus, so that it can better handle specific downstream tasks later. Downstream tasks that can be accomplished with the language model include, but are not limited to: natural language understanding, natural language generation, natural language translation, and natural language classification.

Fine-tuning training: further training of a pre-trained model using training samples constructed for a specific task, so that the model adapts better to that task.

Natural language generation: generating natural language text that conforms to grammatical and semantic rules by means of natural language processing and related technologies. Natural language generation covers various tasks, such as automatic summarization, machine translation, dialogue generation, and code generation.

Generating an electronic medical record: in the embodiment of the application, the generation of the electronic medical record refers to the generation of medical record text meeting the preset specification requirements based on the diagnosis data of each stage associated with the target object by using a natural language generation technology, wherein the preset specification requirements comprise preset medical record content labels.

Medical record content label: in the embodiment of the application, after the content types in the electronic medical records are determined according to the actual electronic medical record generation needs, the medical record content labels are set for representing different content types, for example, the medical record content labels may be admission conditions, admission diagnosis, discharge diagnosis and the like.

Artificial intelligence (AI): the theory, methods, technologies, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

The following briefly describes the design concept of the embodiment of the present application:

In the related art, in the process of generating an electronic medical record for a related object, the medical staff typically generates the electronic medical record manually.

In the process of specifically generating the electronic medical record, medical staff refers to various records of related objects during hospitalization, and content is manually re-integrated to finally generate the corresponding electronic medical record.

Generating the electronic medical record manually therefore lowers generation efficiency and cannot guarantee the quality of the result. It also takes up the diagnosis time of medical staff, who must summarize electronic medical records for the relevant objects while also communicating with and treating other objects. When electronic medical records must be issued for a large number of relevant objects, the summary work inevitably interferes with the communication and diagnosis work of the medical staff; because the staff cannot attend to several tasks at once, both working efficiency and quality decline.

In view of this, the embodiments of the present application provide a method, an apparatus, an electronic device, and a storage medium for generating an electronic medical record. In the process of generating an electronic medical record for a target object, the diagnostic data of each stage associated with the target object, on which generation of the electronic medical record is based, is acquired. From the acquired diagnostic data of each stage, the at least one data title included in each piece of diagnostic data and the at least one type of data content respectively associated with the at least one data title can be determined. This amounts to determining the range of data content on which the electronic medical record is based and the form of the data content in the diagnostic data of each stage, namely that data titles, data types, and association relationships among data contents exist in the diagnostic data.

The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and that the embodiments of the present application and the features of the embodiments may be combined with each other without conflict.

FIG. 1 is a schematic diagram of a possible application scenario in an embodiment of the present application. The application scenario includes a server device 110 and a client device 120.

In some possible embodiments of the present application, the server device 110 may train to obtain a target medical record generation model, and implement generation of an electronic medical record; specifically, the server device 110 obtains diagnostic data of each stage associated with the target object, and aggregates and marks the diagnostic data of each stage to obtain corresponding data to be integrated; and then, adopting a target medical record generation model, and carrying out content integration processing on the data to be integrated according to preset medical record content labels to finally obtain the electronic medical record associated with the target object.

In other possible embodiments of the present application, when the client device 120 has sufficient processing capability, the client device 120 may itself train the target medical record generation model and generate the electronic medical record; alternatively, the client device 120 may obtain the target medical record generation model trained by the server device 110 and generate the electronic medical record with the obtained model. When generating the electronic medical record based on the target medical record generation model, the client device 120 may, in response to an electronic medical record generation operation triggered by the diagnosis object for the target object, aggregate and mark the diagnostic data of each stage associated with the target object to obtain data to be integrated, and then use the target medical record generation model to perform content integration processing on the data to be integrated according to the preset medical record content labels, finally obtaining the electronic medical record associated with the target object.

When the target medical record generation model is obtained by fine-tuning a pre-trained initial medical record generation model with a large number of parameters, model training and model processing are preferably implemented on the server device 110, so that the device resources required for model training are available.

The server device 110 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.

The client device 120 may be a personal computer, a mobile phone, a tablet computer, a notebook computer, an e-book reader, a smart home device, a vehicle-mounted terminal, or the like.

It should be noted that, in the embodiments of the present application, the client device 120 may receive the medical record generation instruction initiated by the diagnosis object for the target object by means of a target application, where the target application may be an applet, a client application, or a web application; the present application is not limited in this respect.

In the embodiments of the present application, the server device 110 and the client device 120 may communicate through a wired network or a wireless network. The following description takes as an example only the case in which the server device 110 both trains the target medical record generation model and generates the electronic medical record, and describes the related processing schematically.

The following describes the process of generating an electronic medical record in combination with several possible application scenarios:

Scenario 1: generating an electronic medical record for a target object whose privacy must be protected.

Specifically, in the processing scenario corresponding to scenario 1, the server device may, in response to a medical record generation instruction issued by medical staff for a patient, acquire the diagnostic data of each stage associated with the patient; then perform classified aggregation and identification processing on the diagnostic data of each stage to obtain processed data to be integrated; and then use the trained target medical record generation model to perform content integration processing on the data to be integrated according to preset medical record content labels, to obtain the electronic medical record associated with the patient.

It should be noted that, because this scenario generates an electronic medical record for a human patient, the diagnostic data of each stage acquired by the server device in actual processing does not include information from which the identity of the patient can be determined, in order to protect the patient's privacy.

Scenario 2: generating an electronic medical record for a target object whose privacy does not need to be protected.

Specifically, in the processing scenario corresponding to scenario 2, the server device may, in response to a medical record generation instruction issued by a veterinarian for a patient animal, acquire the diagnostic data of each stage associated with the patient animal; then perform classified aggregation and identification processing on the diagnostic data of each stage to obtain processed data to be integrated; and then use the trained target medical record generation model to perform content integration processing on the data to be integrated according to preset medical record content labels, to obtain the electronic medical record associated with the patient animal.

It should be noted that, because scenario 2 generates an electronic medical record for a patient animal, the diagnostic data of each stage acquired by the server device in actual processing may, depending on actual processing requirements, include information from which the identity of the patient animal can be determined.

In addition, it should be understood that the specific embodiments of the present application involve related data such as the diagnostic data of each stage of the target object. When the embodiments described in the present application are applied to specific products or technologies, permission or consent of the related object must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.

In the fine-tuning training process, the diagnostic data used in the present application comes from a public medical record data set in which the patient information is encrypted, and the data set is used only for scientific research.

The following describes a process of generating an electronic medical record from the perspective of a server device with reference to the accompanying drawings:

Referring to fig. 2A, which is a schematic flowchart of electronic medical record generation according to an embodiment of the present application, the generation process of the electronic medical record is described below with reference to fig. 2A:

Step 201: the server device obtains diagnostic data of each stage associated with the target object.

In the embodiment of the present application, there are two possible data acquisition modes for obtaining the diagnostic data of each stage associated with the target object, where the diagnostic data of each stage includes: at least one data title, and at least one type of data content respectively associated with the at least one data title.

Data acquisition mode 1: acquiring data from a preset target address.

In the data acquisition process corresponding to the first data acquisition mode, the server device can respond to the medical record generation instruction triggered by the diagnosis object aiming at the target object to acquire the diagnosis data of each stage stored in the preset target address and associated with the target object.

Specifically, after the diagnostic object initiates a medical record generation instruction for the target object from the terminal in use, the server device may, in response to the instruction, obtain the diagnostic data of each stage associated with the target object and stored at a preset target address. The storage location indicated by the target address stores the diagnostic data of each stage corresponding to each diagnosed object, and the target object is one of the diagnosed objects. The diagnosed objects may be the objects for which one diagnostic object has processing authority, or the objects who visit the same medical institution.

It should be noted that, in a possible implementation manner, the server device may configure a common storage location for each diagnostic object in advance, and determine a target address corresponding to the storage location.

Optionally, in order to ensure the storage security of the data, the server device may configure corresponding storage locations for each diagnostic object in advance, and determine respective target addresses corresponding to the storage locations, so that when data acquisition is performed in response to a medical record generation instruction triggered by one diagnostic object, the data can be directly acquired from the corresponding target addresses.

For example, referring to fig. 2B, which is a schematic diagram of a target address pre-stored in a server device in an embodiment of the present application, according to fig. 2B, for a diagnostic object, there is at least one diagnosed object attributed to the diagnostic object, where the diagnostic object may determine a target object in the diagnosed objects attributed to the diagnostic object, where a relationship between the diagnosed object and the diagnostic object may specifically refer to a doctor-patient relationship, such as a relationship between an attending physician and a patient.

Continuing with fig. 2B, assuming that the medical institutions served by the server device have m diagnostic objects in total, i.e., diagnostic objects 1 to m, the server device may pre-store a corresponding target address for each of diagnostic objects 1 to m, e.g., diagnostic object 1 corresponds to target address 1, and diagnostic object 2 corresponds to target address 2.

Further, in the process of acquiring the diagnostic data of each stage that is associated with the target object and stored at the preset target address, the server device acquires the data positioning information corresponding to the target object, where the data positioning information is obtained by performing privacy processing on the identification information of the target object; the server device then accesses the preset target address based on the data positioning information and acquires the diagnostic data of each stage stored in association with the data positioning information.

Specifically, in the case that the server device needs to acquire data from the target address, in order to ensure the privacy security of the target object, for the diagnostic data of each stage of the target object stored at the target address, the data associated with the target object is identified by using the data positioning information obtained after privacy processing of the identification information, where the identification information of the diagnosed object refers to information capable of describing the identity of the diagnosed object.

In the embodiment of the present application, when privacy processing is performed on the identification information of the diagnosed object, numbering processing may be performed on the diagnosed object to obtain a numbering result uniquely corresponding to the identification information of the diagnosed object; and then, determining the numbering result as the data positioning information after the privacy processing.

When numbering the diagnosed object, in one feasible implementation the numbering can be performed at a global scope, i.e., a numbering server determines the numbering result corresponding to the diagnosed object based on the identification information of the diagnosed object within the scope of the medical institution to which the object belongs. Alternatively, in other possible embodiments, the numbering server may construct the numbering result based on the relationship between the diagnosed object and the diagnostic object that provides diagnosis and treatment services for the diagnosed object.

For example, in one possible numbering mode, the numbering result may be determined jointly from the visit time, the medical institution, and the identification information of the diagnosed object. Suppose the visit time is 2021-3-26, the number of the medical institution is XX012, and the ID information associated with the diagnosed object is L. A preset encryption algorithm is first used to generate a corresponding encryption result S from L, where the encryption algorithm may be a function constructed according to actual processing needs or a specified hash algorithm. The encryption result is then spliced with the visit time and the number of the medical institution, giving the numbering result 20210326XX012S.

Therefore, the numbering result combines several factors, namely the visit time, the medical institution, and the encryption result generated from the identification information, so that it both meets the privacy requirement and uniquely identifies the diagnosed object.
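As an illustration of the numbering example above, the following minimal sketch splices the visit date, the institution number, and a hash of the identification information. The function name `build_numbering_result`, the choice of SHA-256, and the 16-character truncation are assumptions made for illustration only; the application merely states that a custom function or a specified hash algorithm may be used.

```python
import hashlib
from datetime import date


def build_numbering_result(visit_date: date, institution_code: str,
                           identity_info: str, digest_chars: int = 16) -> str:
    """Splice visit date + institution number + hash of the identity info."""
    # SHA-256 is an assumed stand-in for the "specified hash algorithm".
    digest = hashlib.sha256(identity_info.encode("utf-8")).hexdigest()
    # Truncation keeps the number short; it is an illustrative choice that
    # trades some collision resistance for brevity.
    encrypted = digest[:digest_chars]
    return f"{visit_date:%Y%m%d}{institution_code}{encrypted}"


numbering_result = build_numbering_result(date(2021, 3, 26), "XX012", "L")
```

The result starts with `20210326XX012`, followed by the hashed part that plays the role of S. In practice, a keyed hash (for example HMAC with a secret key) would resist guessing of low-entropy identifiers better than a bare hash.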

For another example, referring to fig. 2C, which is a schematic diagram of one possible process of determining a numbering result in an embodiment of the present application: in a possible implementation, after a medical staff member receives object A for a visit, the personal information of object A is obtained. The medical staff member then sends a numbering result acquisition request to a numbering server from the terminal device in use, where the request carries the personal information of object A. After the terminal device obtains the numbering result fed back by the numbering server, it applies to the server device, based on the numbering result of object A, for a target address for storing the diagnostic data of object A, where the numbering result is the data positioning information corresponding to object A. The server device then feeds back the target address corresponding to object A to the terminal device. It should be noted that, according to actual configuration requirements, the server device and the numbering server may be the same device or different devices.

For another example, referring to fig. 2D, which is a schematic diagram of another possible process of determining a numbering result in an embodiment of the present application: in a possible implementation, after a medical staff member receives object A for a visit, the personal information of object A is obtained. The medical staff member then sends a numbering result acquisition request to the corresponding server device from the terminal device in use, where the request carries the personal information of object A. The server device forwards the request to a numbering server and sends the numbering result fed back by the numbering server to the terminal device, where the numbering result is the data positioning information corresponding to object A. The terminal device then applies to the server device, based on the numbering result of object A, for a target address for storing the diagnostic data of object A, and the server device feeds back the target address corresponding to object A to the terminal device. It should be noted that, according to actual configuration requirements, the server device and the numbering server may be the same device or different devices.

In this way, when the diagnostic data of each stage associated with the target object is acquired, the diagnostic data can be directly acquired from a preset target address according to the data positioning information corresponding to the target object; in addition, because the data is searched according to the data positioning information in the data acquisition process, the related operation of directly using the identification information of the target object can be avoided, the privacy protection of the target object is enhanced, and the data security is improved.
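The lookup described above can be sketched as a store keyed first by target address and then by data positioning information, so that the identification information itself never appears as a key. The class `DiagnosticStore` and its method names are hypothetical, and an in-memory dictionary stands in for whatever storage the server device actually uses.

```python
from collections import defaultdict


class DiagnosticStore:
    """In-memory stand-in for the storage behind the preset target addresses."""

    def __init__(self):
        # target address -> data positioning info -> list of per-stage records
        self._data = defaultdict(lambda: defaultdict(list))

    def save(self, target_address, locator, stage_record):
        """Store one stage of diagnostic data under the positioning info."""
        self._data[target_address][locator].append(stage_record)

    def fetch(self, target_address, locator):
        """Return all stages stored for the positioning info (empty if none)."""
        # .get() avoids creating empty entries for unknown keys.
        return list(self._data.get(target_address, {}).get(locator, []))


store = DiagnosticStore()
store.save("target-address-1", "20210326XX012S", {"stage": 1})
store.save("target-address-1", "20210326XX012S", {"stage": 2})
stages = store.fetch("target-address-1", "20210326XX012S")
```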

Data acquisition mode 2: acquiring the diagnostic data of each stage of the target object uploaded by the diagnostic object.

In the data acquisition process corresponding to the second data acquisition mode, the server side equipment responds to the medical record generation indication triggered by the diagnosis object aiming at the target object to acquire diagnosis data of each stage uploaded by the diagnosis object and associated with the target object.

Specifically, in a feasible implementation manner, in a process of providing services for the terminal device where the diagnostic object is located, the service end device may present a service function for directly uploading diagnostic data on the terminal device, so that the diagnostic object can upload diagnostic data of each stage of the target object by means of the service function on the terminal device.

In this way, the first data acquisition mode and the second data acquisition mode can be used for acquiring diagnostic data of each stage associated with the target object, and the diversified data acquisition mode can be used for acquiring the diagnostic data, so that a processing basis is provided for the generation of subsequent electronic medical records.

In addition, in the process of selecting either the first data acquisition mode or the second data acquisition mode to perform data acquisition, in order to improve the security of data acquisition, after receiving the medical record generation instruction of the target object, before performing data acquisition in response to the medical record generation instruction, permission identification may be performed on the diagnostic object to determine whether the diagnostic object has permission to trigger generation of a corresponding electronic medical record for the target object.

Specifically, before data acquisition is performed in response to the medical record generation instruction, the server device authenticates the diagnostic object using preset authentication conditions and determines that the diagnostic object has the authority to trigger the medical record generation instruction for the target object.

Wherein the preset authentication conditions comprise any one or combination of the following:

Authentication condition 1: the device information of the terminal device used by the diagnostic object is in a preset secure device information set.

Specifically, the server device may maintain a secure device information set in advance, where the set includes the device information of each terminal device that belongs to the medical institution where the diagnostic object works and is authorized to interact with the server device.

Based on this, after the diagnostic object sends a medical record generation instruction to the server device from the terminal device in use, the server device can obtain the device information of that terminal device; if the device information is included in the preset secure device information set, the server device may determine that the diagnostic object using the terminal device satisfies authentication condition 1.

Authentication condition 2: the IP address of the terminal device used by the diagnostic object is in a pre-stored secure IP address set.

Specifically, the server device may maintain a set of secure Internet Protocol (IP) addresses in advance, where the set includes each IP address that belongs to the medical institution where the diagnostic object works and is authorized to interact with the server device.

Based on this, after the diagnostic object sends a medical record generation instruction to the server device from the terminal device in use, the server device can obtain the IP address of that terminal device; if the IP address is included in the pre-stored secure IP address set, the server device may determine that the diagnostic object using the terminal device satisfies authentication condition 2.

Authentication condition 3: a diagnosis and treatment relationship exists between the diagnostic object and the target object.

Specifically, after the diagnostic object sends a medical record generation instruction to the server device from the terminal device in use, the server device judges whether a doctor-patient diagnosis and treatment relationship exists between the diagnostic object and the target object; after determining that such a relationship exists, the server device may determine that the diagnostic object satisfies authentication condition 3.

It should be noted that, in the embodiment of the present application, whether a doctor-patient diagnosis and treatment relationship exists between the diagnostic object and the target object may be judged as follows: determining whether the object that requested construction of the data positioning information for the target object, and that requested determination of the corresponding target address for the target object, is the diagnostic object.

Therefore, by authenticating the diagnosis object, the authority verification of the diagnosis object can be realized before the medical record generation instruction of the diagnosis object to the target object is responded, the diagnosis object is ensured to have the authority for triggering and generating the electronic medical record aiming at the target object, and the safety of the interaction process is improved.
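The three authentication conditions, applied singly or in combination, could be checked along the following lines. This is a minimal sketch: the function `is_authorized`, its parameters, and the representation of the diagnosis and treatment relationship as (diagnostic object, data positioning information) pairs are all assumptions for illustration.

```python
def is_authorized(device_info, ip_address, diagnostic_id, locator,
                  secure_devices, secure_ips, care_relations,
                  required=("device", "ip", "relation")):
    """Return True if every condition named in `required` is satisfied."""
    checks = {
        # Condition 1: terminal device is in the secure device set.
        "device": device_info in secure_devices,
        # Condition 2: terminal IP address is in the secure IP set.
        "ip": ip_address in secure_ips,
        # Condition 3: the diagnostic object is recorded as the one that
        # requested the positioning info / target address for this target.
        "relation": (diagnostic_id, locator) in care_relations,
    }
    return all(checks[name] for name in required)


secure_devices = {"device-001"}
secure_ips = {"10.0.0.8"}
care_relations = {("doctor-1", "20210326XX012S")}

ok_all = is_authorized("device-001", "10.0.0.8", "doctor-1", "20210326XX012S",
                       secure_devices, secure_ips, care_relations)
ok_ip_only = is_authorized("device-999", "10.0.0.8", "doctor-2", "20210326XX012S",
                           secure_devices, secure_ips, care_relations,
                           required=("ip",))
```

The `required` tuple reflects the "any one or combination" wording: a deployment may enforce one condition or several.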

Step 202: the server equipment aggregates the data content of the same data title associated with different diagnostic data to respectively obtain corresponding aggregation results, and obtains data to be integrated based on the aggregation results, wherein corresponding identifiers are respectively added in the data to be integrated for each data title and the data content of different types respectively belonging to each data title.

In the embodiment of the application, in the process of aggregating the data content of the same associated data title in different diagnostic data and respectively obtaining the corresponding aggregation result, the server device respectively executes the following operations for each data title: determining the data content belonging to a data title in different diagnostic data respectively; and according to the generation time sequence of each data content, performing content splicing on each determined data content to obtain a corresponding aggregation result.

It should be noted that, in the embodiment of the present application, the diagnostic data of each stage may specifically refer to diagnostic data generated at different time stages, where the diagnostic data of each stage includes: at least one data title, and at least one type of data content respectively associated with the at least one data title, wherein it is understood that the data title is equivalent to a limitation of a wide range of data content, and the at least one type of data content is covered under the one data title.

Specifically, after the server device obtains the diagnostic data of each stage associated with the target object, the diagnostic data of different stages may include data content belonging to the same data title. The server device can therefore aggregate the data content belonging to the same data title across the diagnostic data of different stages, and finally obtains one aggregation result for each data title.

Taking the aggregation for one data title as an example, when the diagnostic data of different stages is aggregated, the different data content belonging to that data title can be spliced according to the generation time order of each piece of data content in the diagnostic data of different stages, to obtain the corresponding aggregation result.

Therefore, from the perspective of the data title to which each piece of data content belongs, the data content of different stages is merged and spliced, and the aggregation result corresponding to each data title is determined. This is equivalent to clustering the data content of different stages by data title, which ensures that the data content within each aggregation result is related.
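The per-title aggregation can be sketched as a group-then-sort operation. In this sketch each record is assumed to be a `(data_title, generated_at, content)` tuple with an ISO-format timestamp string, and contents are joined with a newline; both the record layout and the joiner are illustrative assumptions.

```python
from collections import defaultdict


def aggregate_by_title(records, joiner="\n"):
    """Group data content by data title and splice it in generation-time order."""
    grouped = defaultdict(list)
    for title, generated_at, content in records:
        grouped[title].append((generated_at, content))
    return {
        # sorted() is stable, so items with equal timestamps keep input order.
        title: joiner.join(text for _, text in sorted(items, key=lambda it: it[0]))
        for title, items in grouped.items()
    }


records = [
    ("course record", "2023-01-03", "Day 3: fever resolved."),
    ("admission condition", "2023-01-01", "Main complaint: cough."),
    ("course record", "2023-01-02", "Day 2: fever persists."),
]
aggregation_results = aggregate_by_title(records)
```

Here the two "course record" entries end up in one aggregation result, ordered by date, regardless of the order in which the stages were supplied.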

After the aggregation results are obtained, the server device obtains the data to be integrated based on them. According to actual processing requirements, either corresponding identifiers are first added, within each aggregation result, to the data title and to the different types of data content belonging to it, and the processed aggregation results are then integrated to obtain the data to be integrated; or the aggregation results are first integrated, and corresponding identifiers are then added for the different data titles and the different types of data content belonging to them, to obtain the data to be integrated. The two possible integration modes are described below:

Integration mode 1: corresponding identifiers are added to the data title and to the different types of data content belonging to the data title in each aggregation result, and the processed aggregation results are then integrated to obtain the data to be integrated.

Specifically, the server device performs the following operations for each aggregation result: identifying data titles in an aggregation result by adopting a preset title indicator, separating different types of data contents in the aggregation result by adopting a preset type separator, and separating keywords used for identifying the data types and corresponding data contents in the different types of data contents by adopting a preset content separator; and then, content splicing is carried out on the processed aggregation results, and corresponding data to be integrated is obtained.

For example, referring to fig. 2E, which is a schematic diagram of a process for obtaining data to be integrated according to an embodiment of the present application: in the process shown in fig. 2E, the content type and the data content are separated by "#", i.e., the content separator is "#", and different types of data content are separated by "x", i.e., the type separator is "x". For example, "main complaint", "current medical history", and the like are key fields describing the type of data content. "content 1.I" means the text content corresponding to key field i. "#" separates a key field from its content, and the type separator separates multiple key fields. "…" is an ellipsis indicating other key fields and content not listed.

Similarly, continuing with the description of fig. 2E, for data title-admission diagnosis, the process results in: content [ admission diagnosis ]; for data title-image report, the processing results in: inspection/inspection of single chinese name # content inspection/inspection of single english name # content inspection report result-objective view # content inspection report result-subjective prompt # content …; for a data title-laboratory test sheet, the processing results in: check/examine single chinese name # content & check index result # content & reference value # content …, wherein "&" represents a separator within a check/examine for separating the "check/examine single chinese name", "check index result" from "reference value"; for the data title-course record, the processing results in: content …. And then, after the processing process is finished, respectively corresponding processing results of the admission condition, the admission diagnosis, the image report, the laboratory test report and the disease course record are spliced in sequence to obtain final data to be integrated.

Therefore, identification of different data titles and various data contents belonging to different data titles can be realized, and different types of data contents can be separated in the finally obtained data to be integrated in a differential mode.
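A minimal sketch of the first integration mode follows: each aggregation result is marked on its own, and the marked results are then spliced. The concrete marker characters (`[...]` around the title, `#` between key field and content, `|` between types) and the exact layout are placeholders chosen for illustration, and each aggregation result is assumed to be available as a list of `(key_field, content)` pairs.

```python
TITLE_FMT = "[{}]"   # assumed title indicator
CONTENT_SEP = "#"    # separates a key field from its content
TYPE_SEP = "|"       # placeholder separator between types of data content


def mark_aggregation_result(title, typed_contents):
    """Add the title indicator and separators to one aggregation result."""
    body = TYPE_SEP.join(f"{key}{CONTENT_SEP}{text}" for key, text in typed_contents)
    return TITLE_FMT.format(title) + body


def integrate_mode_one(aggregation_results):
    """Mark every aggregation result first, then splice them in order."""
    return "".join(
        mark_aggregation_result(title, contents)
        for title, contents in aggregation_results.items()
    )


to_integrate = integrate_mode_one({
    "admission condition": [("main complaint", "cough for 3 days"),
                            ("current medical history", "no fever")],
    "admission diagnosis": [("diagnosis", "bronchitis")],
})
```

A real implementation would also need to handle marker characters that occur inside the content itself, for example by escaping them.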

Integration mode 2: the aggregation results are spliced to obtain a total aggregation result, and corresponding identifiers are then added in the total aggregation result for the different data titles and for the different types of data content belonging to each data title, to obtain the data to be integrated.

Specifically, in the processing process corresponding to the second integration mode, the server side equipment performs content splicing on each aggregation result to obtain a corresponding total aggregation result; then adopting a preset title indicator to identify different data titles in the total aggregation result; then, adopting a preset type separator to separate different types of data contents in the total aggregation result; then adopting a preset content separator to separate keywords for identifying the data types and corresponding data contents from different types of data contents respectively; and further, determining the processed total aggregation result as data to be integrated.

For example, referring to fig. 2F, which is a schematic diagram of another process for obtaining data to be integrated in the embodiment of the present application, as can be seen from the process corresponding to fig. 2F, after obtaining each aggregation result, content stitching is performed on each aggregation result to obtain a total aggregation result; and adding a title indicator, a type separator and a content separator to the total aggregation result respectively.

Specifically, continuing to refer to fig. 2F, "[ ]" is the title indicator; the content type and the data content are separated by "#", i.e., the content separator is "#"; and different types of data content are separated by "x", i.e., the type separator is "x". For example, "main complaint", "current medical history", and the like are key fields describing the type of data content. "content 1.I" means the text content corresponding to key field i. "#" separates a key field from its content, and the type separator separates multiple key fields. "…" is an ellipsis indicating other key fields and content not listed.

Similarly, continuing with the description of fig. 2E, for data title-admission diagnosis, the process results in: content [ admission diagnosis ]; for data title-image report, the processing results in: inspection/inspection of single chinese name # content inspection/inspection of single english name # content inspection report result-objective view # content inspection report result-subjective prompt # content …; for a data title-laboratory test sheet, the processing results in: check/examine single chinese name # content & check index result # content & reference value # content …, wherein "&" represents a separator within a check/examine for separating the "check/examine single chinese name", "check index result" from "reference value"; for the data title-course record, the processing results in: content …. And finally, after the processing is completed, obtaining the final data to be integrated.

In this way, in the process of specifically processing and obtaining the data to be integrated, a large-scale total aggregation result is obtained by integrating, and then, identification of different data titles and various data contents belonging to the different data titles is realized according to the total aggregation result, so that different types of data contents can be separated in the finally obtained data to be integrated in a differentiated manner.

Step 203: the server device uses a target medical record generation model to perform content integration processing on the data to be integrated according to the preset medical record content labels, obtaining the electronic medical record associated with the target object.

In the embodiment of the present application, before executing step 203, the server device needs to train to obtain the target medical record generating model, and the process of training to obtain the target medical record generating model is described below:

It should be noted that the target medical record generation model is obtained by performing multiple rounds of iterative training with preset training samples. A training sample includes sample input data and a corresponding sample medical record; the sample medical record is obtained by integrating the content of the corresponding sample input data according to the preset medical record content labels.

To train the medical record generation model, the server device needs to construct the training samples used for model training. When the training samples are generated, prompt content of different degrees may be added to the sample input data according to actual processing requirements; accordingly, the training samples may be generated in any one of the following modes, which are not exhaustive.

Generation mode 1: generating sample to-be-integrated data based on desensitized open-source diagnostic data, and directly constructing the sample input data of a training sample from the sample to-be-integrated data.

In the processing corresponding to generation mode 1, the server device acquires each piece of sample to-be-integrated data and integrates its content according to the preset medical record content labels to obtain a corresponding sample medical record. For each piece of sample to-be-integrated data, the following operations are performed: the sample to-be-integrated data is organized into sample input data whose length does not exceed a specified length, and a training sample is generated based on the sample input data and the corresponding sample medical record.

Specifically, the server device acquires the open source diagnosis data of each sample object after desensitization, and determines the data title belonging to each medical record content label according to the preset medical record content label; then, according to the processing mode described in step 202, open source diagnostic data of each sample object are processed respectively to obtain sample data to be integrated corresponding to each sample object; and according to the corresponding relation between the data title and the medical record content label, respectively aiming at the data to be integrated of each sample, and sorting to obtain the corresponding sample medical record.

Further, when the training sample is constructed, a training sample is constructed based on the sample to-be-integrated data and the sample medical record with the corresponding relation, wherein the sample to-be-integrated data is taken as: sample input data for input into the model, and sample medical records are taken as: and the sample output result is used for carrying out model output comparison.

It should be noted that, due to the limitation of the computing resource, the sample input data and the sample medical record with too long lengths may exceed the upper limit of the computing resource, so the length constraint may be performed on the sample input data used as the model input according to the limitation of the computing resource, that is, the sample input data to be integrated is sorted into the sample input data with not more than the specified length.

Specifically, in the process of sorting one sample to-be-integrated data into sample input data with a length not exceeding a specified length, the content of the data with the length exceeding the specified length in the sample to-be-integrated data can be truncated; or when the data to be integrated of the sample is determined to exceed the specified length, determining a data title with the lowest priority according to the preset data priority sequence, and deleting the data content belonging to the data title with the low priority from the data to be integrated of the sample until the processed data to be integrated of the sample does not exceed the specified length; or, the designated length of the input model is directly set to be a larger value, so that the data to be integrated of the sample can be avoided from exceeding the designated length as much as possible.
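The first two length-constraint strategies can be sketched as follows. Length is measured in characters here purely for simplicity (a real model would more likely count tokens), and the function names, the per-title dictionary layout, and the priority list are assumptions for illustration.

```python
def truncate(text, max_len):
    """Strategy 1: cut off everything beyond the specified length."""
    return text[:max_len]


def drop_low_priority(sections, priority, max_len):
    """Strategy 2: delete whole data titles, lowest priority first.

    sections: dict mapping data title -> marked content, in splice order.
    priority: data titles ordered from highest to lowest priority.
    Titles missing from `priority` are never dropped in this sketch.
    """
    kept = dict(sections)
    for title in reversed(priority):
        if sum(len(text) for text in kept.values()) <= max_len:
            break
        kept.pop(title, None)
    return "".join(kept.values())


sections = {
    "admission condition": "a" * 10,
    "image report": "b" * 10,
    "course record": "c" * 10,
}
priority = ["admission condition", "course record", "image report"]
shortened = drop_low_priority(sections, priority, max_len=20)
```

With a limit of 20 characters, only the lowest-priority title ("image report") is removed, and the remaining content keeps its original splice order.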

Generation mode 2: sample data to be integrated is generated based on the desensitized open-source diagnostic data, and the sample input data in a training sample is constructed based on the sample data to be integrated and a preset processing prompt text.

In the processing corresponding to generation mode 2, the server device acquires each sample data to be integrated, and integrates its content according to the preset medical record content labels to obtain the corresponding sample medical record; for each sample data to be integrated, the following operations are performed: the sample data to be integrated is spliced with a preset processing prompt text, the splicing result is organized into sample input data that does not exceed a specified length, and a training sample is generated based on the sample input data and the corresponding sample medical record.

Specifically, the server device acquires the desensitized open-source diagnostic data of each sample object, and determines, according to the preset medical record content labels, the data titles belonging to each medical record content label; then, according to the processing mode described in step 202, the open-source diagnostic data of each sample object are processed respectively to obtain the sample data to be integrated corresponding to each sample object; and, according to the correspondence between the data titles and the medical record content labels, each sample data to be integrated is organized into the corresponding sample medical record.

Further, when the training sample is constructed, a training sample is constructed based on the sample to-be-integrated data and the sample medical record with the corresponding relation.

Specifically, when the sample input data in a training sample is constructed, the sample data to be integrated is spliced with the preset processing prompt text and the splicing result is organized into sample input data that does not exceed the specified length; meanwhile, the sample medical record serves as the sample output result against which the model output is compared.

For example, after the preset processing prompt text is added to the sample data to be integrated, a training sample input to the model takes the following form: "The input of the task is admission information, and the output is a discharge summary. Please generate a discharge summary according to the input admission condition, admission diagnosis, image reports, laboratory test reports and course records. The output must include the contents of 'admission condition', 'admission diagnosis', 'diagnosis and treatment process', 'discharge condition' and 'discharge diagnosis'. Input: [sample input data]. Output: [sample medical record]."
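The two generation modes differ only in whether a processing prompt text is spliced in front of the sample data to be integrated. A minimal sketch is given below; the prompt wording, the function names, the dictionary layout of a training sample, and the character-based length limit are assumptions chosen for illustration.

```python
# Hypothetical processing prompt text, paraphrasing the example above.
PROMPT = (
    "The input of the task is admission information, and the output is a "
    "discharge summary. The output must include 'admission condition', "
    "'admission diagnosis', 'diagnosis and treatment process', "
    "'discharge condition' and 'discharge diagnosis'."
)


def build_sample_input(data_to_integrate: str, prompt: str = "",
                       max_len: int = 2048) -> str:
    """Generation mode 1 (empty prompt) uses the data as is;
    generation mode 2 splices the prompt in front of it.
    The result is cut to max_len characters (an assumed length unit)."""
    if prompt:
        text = f"{prompt} Input: {data_to_integrate}"
    else:
        text = data_to_integrate
    return text[:max_len]


def build_training_sample(data_to_integrate: str, sample_record: str,
                          prompt: str = "", max_len: int = 2048) -> dict:
    """Pair the sample input data with the sample medical record."""
    return {
        "input": build_sample_input(data_to_integrate, prompt, max_len),
        "output": sample_record,
    }


mode_1 = build_training_sample("DATA", "RECORD")
mode_2 = build_training_sample("DATA", "RECORD", prompt=PROMPT)
```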

In this way, with the training sample generation modes indicated by generation mode 1 and generation mode 2, prompt content can be selectively blended into the sample input data according to actual processing requirements. Compared with generation mode 2, the processing corresponding to generation mode 1 reduces the amount of data in the sample input data and, to a certain extent, the processing pressure on the model; compared with generation mode 1, the processing corresponding to generation mode 2 blends additional prompt text into the sample input data and tells the model, in textual form, the result it is expected to produce, so that the model can better learn the required data-organizing capability and model training efficiency can be improved.

Further, after the construction of each training sample is completed, the server side equipment carries out multiple rounds of iterative training on the medical record generating model to be trained by means of each training sample, and the trained medical record generating model is obtained.

It should be noted that, in the embodiment of the present application, the medical record generation model to be trained may be obtained by adding a corresponding network to be trained to the initial medical record generation model according to a preset fine-tuning training mode. The initial medical record generation model may be an open-source large language model, for example any one of the following: the Chat General Language Model (ChatGLM), the Meta AI large language model (Large Language Model Meta AI, LLaMA), the BigScience large language model (BigScience Large Open-science Open-access Multilingual Language Model, BLOOM), and so on, all of which achieve good results in natural language generation tasks.

The training process by which the present application obtains the target medical record generation model can be understood as follows: in order to make the large language model better suited to the electronic medical record generation task, the constructed training samples are used to further fine-tune the large language model so that it generates high-quality electronic medical records.

Taking one round of iterative training of the medical record generation model to be trained as an example, the processing involved is described as follows:

the server device inputs the acquired sample input data into the medical record generation model to be trained and obtains a prediction integration result, where the medical record generation model to be trained comprises the pre-trained initial medical record generation model and a network to be trained that is added to the initial medical record generation model according to a preset fine-tuning training mode; a model loss value is then calculated according to the content difference between the prediction integration result and the corresponding sample medical record, and the network parameters of the network to be trained are adjusted according to the model loss value.

The following describes a round of iterative training process for a medical record generation model to be trained by taking the following two fine tuning training modes as examples:

in the technical solution provided by the present application, after the target medical record generation model is obtained by fine-tuning a large language model, electronic medical records can be generated by means of the target medical record generation model. The type of the large language model (i.e., the base model) is not limited, and the medical record generation model to be trained may be constructed based on any large language model such as ChatGLM, LLaMA or BLOOM.

Fine-tuning training mode 1: fine-tuning is performed using P-Tuning v2.

P-Tuning v2 is a prompt-based fine-tuning method that introduces a learnable prompt to encode knowledge specific to the electronic medical record generation task into the model's input, thereby guiding the model to generate the correct output.

Specifically, at each layer of the initial medical record generation model, several trainable pseudo tokens (virtual tokens) are inserted in front of the real sentence tokens; that is, the learnable prompt is spliced with the original input of each layer of the model to obtain a new input. The learnable prompt is typically placed in front of the original input, so that each layer of the network inside the model first attends to the learnable prompt (i.e., the network to be trained) when processing the input, and then processes the original input (i.e., information such as the admission condition, admission diagnosis, image reports and laboratory test reports), thereby improving the model's performance on the electronic medical record generation task.

The formula of P-Tuning v2 is as follows:

$$x' = [p \,\|\, x]$$

where $x \in \mathbb{R}^{b \times l \times d}$ denotes the original input of the model, $p \in \mathbb{R}^{b \times p_l \times d}$ denotes the learnable prompt, $\|$ denotes the splicing (concatenation) operation, and $x' \in \mathbb{R}^{b \times (p_l + l) \times d}$ denotes the result of splicing the original input with the learnable prompt. Here $b$ is the training batch size, $l$ is the length of the original input, $d$ is the dimension of the original input, and $p_l$ is the length of the learnable prompt. During fine-tuning, training the learnable prompt gives the model the capability of generating high-quality electronic medical records.
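The splicing x' = [p || x] can be illustrated with plain nested lists, following the description given here (the prompt is placed in front of the original input along the sequence dimension). The helper names and the concrete sizes are assumptions for the example; a real implementation would use tensors and would train the prompt values by gradient descent.

```python
def prepend_prompt(prompt, x):
    """Concatenate along the sequence axis, batch element by batch element:
    x' = [p || x]."""
    return [p_seq + x_seq for p_seq, x_seq in zip(prompt, x)]


def shape(tensor):
    """Return (batch, length, dimension) of a nested-list tensor."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


b, l, d, p_l = 2, 5, 4, 3  # batch size, input length, dimension, prompt length

# Original input x with shape (b, l, d); values are placeholders.
x = [[[0.0] * d for _ in range(l)] for _ in range(b)]
# Learnable prompt p with shape (b, p_l, d); these would be the trained values.
p = [[[1.0] * d for _ in range(p_l)] for _ in range(b)]

x_new = prepend_prompt(p, x)
```

The spliced input has length p_l + l, so a longer prompt adds trainable capacity but also lengthens every sequence the model must process.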

For example, referring to fig. 2G, which is a schematic diagram of one round of fine-tuning training in the embodiment of the present application: after the server device constructs the medical record generation model to be trained using the P-Tuning v2 fine-tuning training mode, in one round of iterative training, assuming that the batch fed to the model contains 1 sample input data, 1 predicted medical record output by the model is obtained; the loss value of the model is then calculated as the cross-entropy loss between the content of the predicted medical record and that of the corresponding sample medical record, and the network to be trained is adjusted based on the calculated loss value.

Fine-tuning training mode 2: fine-tuning is performed using Low-Rank Adaptation (LoRA).

It should be noted that LoRA aims to improve the generalization ability of the model through low-rank matrix decomposition. During fine-tuning, LoRA uses a matrix to learn knowledge specific to the electronic medical record generation task and expresses that matrix as the product of two low-rank matrices, which reduces the number of parameters to be trained and the computational complexity. In this way, LoRA improves the model's performance on the electronic medical record generation task by optimizing only the low-rank matrices while the original model parameters remain unchanged, so that the parameter count and computational cost of fine-tuning are reduced while the model's performance is preserved and its generalization ability is improved.

The fine-tuning logic of LoRA is as follows: a bypass is added beside the initial medical record generation model, and a dimension-reduction operation followed by a dimension-increase operation is performed in the bypass to approximate the intrinsic rank; during training, the parameters of the initial medical record generation model are fixed, and only the dimension-reduction matrix and the dimension-increase matrix are trained.

The formula for LoRA is as follows:

$$h = W_0 x + \Delta W x$$

$$\Delta W = BA$$

$$h = W_0 x + BAx$$

where $W_0 \in \mathbb{R}^{d \times d}$ denotes the parameters of the original model and $\Delta W$ denotes the network parameters to be updated; $x$ denotes the model input and $h$ denotes the model output; $A \in \mathbb{R}^{r \times d}$ is the dimension-reduction matrix and $B \in \mathbb{R}^{d \times r}$ is the dimension-increase matrix, since in $BAx$ the matrix $A$ first maps the $d$-dimensional input to $r$ dimensions and $B$ then maps it back to $d$ dimensions. $d$ denotes the dimension of the input and $r$ denotes the rank. During fine-tuning, training the matrices $A$ and $B$ gives the model the capability of generating high-quality electronic medical records.
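The bypass computation h = W₀x + BAx can be sketched in pure Python as below. The sizes, the random initialization of A and the zero initialization of B are assumptions for illustration (zero-initializing B is a common LoRA convention, not something stated in this application); only A and B would be updated during training, while W₀ stays frozen.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(w0, a, b, x):
    """Compute h = W0 x + B (A x). W0 is frozen; A and B are trainable."""
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))  # x (d) -> A x (r) -> B A x (d)
    return [u + v for u, v in zip(base, update)]


d, r = 8, 2  # input dimension and rank (hypothetical values)
rng = random.Random(0)

w0 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]    # d x d, frozen
a = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # r x d
b = [[0.0] * r for _ in range(d)]                               # d x r, zeros
x = [rng.gauss(0, 1) for _ in range(d)]

h = lora_forward(w0, a, b, x)

full_params = d * d      # parameters touched by full fine-tuning of W0
lora_params = 2 * d * r  # parameters in A and B together
```

With B initialized to zero the bypass contributes nothing at the start of training, so the model initially behaves exactly like the pre-trained one; the trainable parameter count grows linearly in r instead of quadratically in d.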

In summary, when the server device fine-tunes to obtain the target medical record generation model, either of the above fine-tuning training modes may be used, and the objective function of the fine-tuning stage may be expressed as:

$$\max_{\Delta W} \sum_{(x, y) \in D} \sum_{t=1}^{|y|} \log P_{W + \Delta W}\left(y_t \mid x, y_{<t}\right)$$

where $W$ denotes the original parameters of the large language model (i.e., the initial medical record generation model); $\Delta W$ denotes the parameters related to the electronic medical record generation task; $D$ denotes the set of training samples; $x$ denotes the input sequence, i.e., the sample input data; $y$ denotes the output sequence, i.e., the sample medical record; and $|y|$ denotes the length of the output sequence. $P_{W + \Delta W}(y_t \mid x, y_{<t})$ denotes the probability that the $t$-th token is $y_t$, given the input sequence $x$ and the first $t-1$ tokens of the output sequence.
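Maximizing the log-likelihood of the sample medical record tokens is equivalent to minimizing a cross-entropy loss. A minimal sketch follows, assuming the model has already produced, for each position t, the probability it assigns to the correct token y_t; the function names and the per-token averaging are illustrative choices rather than details from the application.

```python
import math


def sequence_log_likelihood(token_probs):
    """Sum over t of log P(y_t | x, y_<t) for one output sequence."""
    return sum(math.log(p) for p in token_probs)


def cross_entropy_loss(batch):
    """Negative log-likelihood averaged over all tokens in the batch.

    `batch` is a list of sequences; each sequence lists the probability
    the model assigned to the correct token at every position.
    """
    total_tokens = sum(len(seq) for seq in batch)
    total_ll = sum(sequence_log_likelihood(seq) for seq in batch)
    return -total_ll / total_tokens


perfect = cross_entropy_loss([[1.0, 1.0, 1.0]])    # model is always certain
uncertain = cross_entropy_loss([[0.5, 0.5], [0.5]])  # coin-flip on each token
```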

For example, referring to fig. 2H, which is a schematic diagram of another round of fine-tuning training in the embodiment of the present application: after the server device constructs the medical record generation model to be trained using the LoRA fine-tuning training mode, in one round of iterative training, assuming that the batch fed to the model contains 1 sample input data, 1 predicted medical record output by the model is obtained; the loss value of the model is then calculated as the cross-entropy loss between the content of the predicted medical record and that of the corresponding sample medical record, and the network to be trained is adjusted based on the calculated loss value.

It should be understood that, for both P-Tuning v2 and LoRA as employed in the present application, the goal of fine-tuning is to improve the model's performance on the discharge medical record generation task by optimizing additional learnable parameters while keeping the original parameters of the large language model unchanged. They differ in that P-Tuning v2 introduces a learnable prompt into the input and guides the model's output by adjusting the input representation, whereas LoRA introduces a low-rank matrix decomposition and adjusts the model's output through the dimension-reduction and dimension-increase matrices. The number of parameters of P-Tuning v2 depends mainly on the dimension of the learnable prompt; the number of parameters of LoRA depends mainly on the rank of the dimension-reduction and dimension-increase matrices.

In addition, in the model fine-tuning stage, the length of the learnable prompt in P-Tuning v2 and the rank of the low-rank matrices in LoRA can each be adjusted according to actual processing requirements.

Therefore, with the above fine-tuning training modes, the adaptively constructed model can be fine-tuned for the actual downstream task without changing the model parameters of the pre-trained initial medical record generation model, which preserves the processing effect achievable by a language model with a large number of parameters.

Similarly, the server device may perform multiple rounds of iterative training in the above manner until a preset convergence condition is satisfied, so as to obtain the trained target medical record generation model. The preset convergence condition may be: the number of consecutive times that the loss value is lower than a first set value reaches a second set value, or the number of training rounds reaches a third set value. The first, second and third set values are set according to actual processing requirements, which is not specifically limited in the present application.
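The convergence condition can be checked with a small helper such as the one below. The function and parameter names are hypothetical; `loss_threshold`, `patience` and `max_rounds` stand for the first, second and third set values respectively.

```python
def should_stop(loss_history, loss_threshold, patience, max_rounds):
    """Return True when training should end.

    loss_history   : loss value of each completed round, oldest first
    loss_threshold : first set value (loss must be below it)
    patience       : second set value (required consecutive rounds below it)
    max_rounds     : third set value (upper limit on training rounds)
    """
    if len(loss_history) >= max_rounds:
        return True
    streak = 0
    # Count how many of the most recent rounds were below the threshold.
    for loss in reversed(loss_history):
        if loss < loss_threshold:
            streak += 1
        else:
            break
    return streak >= patience


converged = should_stop([0.9, 0.4, 0.3, 0.2], 0.5, 3, 100)
interrupted = should_stop([0.4, 0.9, 0.3], 0.5, 2, 100)
out_of_rounds = should_stop([0.9] * 5, 0.5, 3, 5)
```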

Then, after the server device obtains the target medical record generation model through training, it adaptively processes the data to be integrated into the same content form as the sample input data, and then uses the target medical record generation model to output the corresponding electronic medical record based on the processed data to be integrated.

In the stage of generating electronic medical records with the target medical record generation model, considering that the temperature hyperparameter affects the generation quality of the electronic medical record, the temperature hyperparameter of the model can be adjusted according to service requirements, so as to generate an electronic medical record that meets those requirements.
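The effect of the temperature hyperparameter can be illustrated with the usual temperature-scaled softmax over next-token scores. This is a general sketch of how temperature is commonly applied in language model decoding, assumed here for illustration; the application does not specify the decoding procedure, and the logits below are made-up values.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert scores to probabilities after dividing them by the temperature.

    Lower temperatures sharpen the distribution (more deterministic text);
    higher temperatures flatten it (more varied text).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
p_low = softmax_with_temperature(logits, 0.5)
p_mid = softmax_with_temperature(logits, 1.0)
p_high = softmax_with_temperature(logits, 2.0)
```

For a medical record, where factual consistency matters more than variety, a lower temperature would usually be the more conservative choice.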

In some possible implementation scenarios, the sample input data may include preset prompt content, so that during adaptive processing, corresponding prompt content may be added to the data to be integrated.

The following gives a comprehensive description, with reference to the accompanying drawings, of the processing involved in a specific scenario in which a discharge record (i.e., an electronic medical record) is generated for a patient at discharge:

first, in the discharge record generation scenario, the training samples used for fine-tuning training need to be constructed specifically for the discharge record generation task.

Specifically, since the downstream task in this scenario is to generate discharge records, the sample discharge record of each sample patient serves as the basis for comparing output differences. In addition, the chapters (i.e., the medical record content labels) included in the sample discharge record need to be preset, such as: admission condition, admission diagnosis, diagnosis and treatment process, discharge condition, discharge diagnosis, and the like. In the following description, the generation of the discharge record is described taking only these five medical record content labels (admission condition, admission diagnosis, diagnosis and treatment process, discharge condition and discharge diagnosis) as examples; the sample discharge record may include other chapters, whose content can be generated in the same way.

In addition, to help the model better understand the sample discharge records, each sample discharge record is typically split by chapter to yield a sample discharge record of the following form: the title of each chapter (i.e., each medical record content label: admission condition, admission diagnosis, diagnosis and treatment process, discharge condition, discharge diagnosis, ...) is followed by the text content of that chapter, a separator is placed between adjacent chapters, and the ellipsis stands for other chapters not mentioned.

After the content form of the discharge record is determined, the sample input data used as the model input can be constructed according to the content form of the sample discharge record. A discharge record can be regarded as a summary of the admission record, image reports, laboratory test reports and other records. To generate high-quality discharge records, the sample input data is therefore typically generated from the data content under data titles such as the admission record, image report and laboratory test report.

It should be appreciated that the quality of the sample input data has a considerable impact on the model's ability to generate discharge records: good sample input data enables the language model to better understand the discharge medical record (or discharge record) generation task. The sample input data of the discharge medical record generation task contains multiple items (i.e., multiple data titles), namely admission records, image reports, laboratory test reports and course records, and each item contains multiple key fields; for example, the admission record includes the chief complaint, history of present illness, past history, personal history, family history, physical examination, auxiliary examination, specialty examination and so on. Distinguishing identifiers are therefore needed in the sample input data, both for the different items and for the different types of data content belonging to each item.

Wherein each record of the sample patient during hospitalization may contain other records in addition to the admission record, the image report, the laboratory test report, the course record, and each record may not be limited to the key fields mentioned in this patent. The present patent only takes the example of generating sample input data based on the mentioned multiple contents and key fields, and illustrates schematically the process of fine tuning a large language model, and in other possible scenarios, other item data may be selected for processing.

Furthermore, the processing procedure related to the step 203 may be adopted to obtain sample input data, and a training sample may be constructed in a targeted manner; and then, performing fine tuning training on the pre-trained large language model by adopting each constructed training sample to obtain a trained target medical record generation model.

In a specific application process, the server device processes the diagnosis data of each stage of the patient during the hospitalization period to obtain the data to be integrated, wherein the obtained diagnosis data of each stage does not include the privacy data of the patient, and the patient can be a sample patient in an open source data set or can be a patient in an actual business scene.

Referring to fig. 3A, which is a schematic diagram of the processing at each stage of generating an electronic medical record in an embodiment of the present application, the processing involved before and during the generation of the electronic medical record is summarized below with reference to fig. 3A:

steps 301 to 302 are the processing procedure of the server device before the electronic medical record is generated.

Step 301: the server side equipment adopts preset training samples to perform multi-round iterative training on the medical record generating model to be trained, and a target medical record generating model is obtained.

Step 302: the server device adopts preset authentication conditions to authenticate the diagnosis object initiating the medical record generation instruction, and determines that the diagnosis object has the right of triggering the medical record generation instruction aiming at the target object.

After the authentication in step 302, under the condition that it is determined that the generation of the electronic medical record is required to be performed for the target object in response to the medical record generation instruction, the server device adopts the processing procedures illustrated in steps 303-306 to generate the electronic medical record associated with the target object.

Step 303: the server device obtains diagnostic data of each stage associated with the target object.

Step 304: the server device aggregates the data contents associated with the same data title in different diagnostic data to respectively obtain corresponding aggregation results.

Specifically, when step 304 is executed, after the server device acquires the diagnostic data of each stage associated with the target object, the following operations are executed for each data title in the diagnostic data of each stage:

step 3041: the server device determines the data content belonging to a data title in different diagnostic data.

Step 3042: the server device splices the data contents in the order of their generation times to obtain the corresponding aggregation result.

Step 305: the server device obtains the data to be integrated based on the aggregation results.
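Steps 3041 and 3042 amount to grouping data contents by data title and concatenating each group in chronological order. A minimal sketch is shown below; the record layout (generation time, data title, data content), the ISO date strings, the joining character and the example contents are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (generation_time, data_title, data_content); the ISO-formatted time
# strings used here sort correctly as plain text.
Record = Tuple[str, str, str]


def aggregate_by_title(records: List[Record], joiner: str = " ") -> Dict[str, str]:
    """Group contents by data title (step 3041) and splice each group
    in order of generation time (step 3042)."""
    grouped = defaultdict(list)
    for generated_at, title, content in records:
        grouped[title].append((generated_at, content))
    return {
        title: joiner.join(
            content for _, content in sorted(items, key=lambda item: item[0])
        )
        for title, items in grouped.items()
    }


records = [
    ("2023-09-03", "course record", "Day 3: stable."),
    ("2023-09-01", "admission record", "Fever for 3 days."),
    ("2023-09-02", "course record", "Day 2: improving."),
]
aggregated = aggregate_by_title(records)
```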

Specifically, when step 305 is executed, the server device may aggregate each aggregation result to obtain a total aggregation result, and then add an identifier to the total aggregation result to obtain data to be integrated, or may add identifiers to each aggregation result, and then aggregate each aggregation result after processing to obtain data to be integrated.

In the process of adding the identifier, the following operations are performed:

step 3051: the server device adopts a preset title indicator to identify the data title.

Step 3052: the server device adopts a preset type separator to separate different types of data contents.

Step 3053: the server device adopts a preset content separator to separate keywords for identifying the data types and corresponding data contents in different types of data contents.
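Steps 3051 to 3053 can be sketched as a simple serialization routine. The three marker characters below are placeholders chosen for the example; the application only states that a preset title indicator, type separator and content separator are used, and the function names and sample fields are likewise hypothetical.

```python
from typing import Dict

TITLE_INDICATOR = "#"    # hypothetical marker wrapped around a data title
TYPE_SEPARATOR = ";"     # hypothetical separator between types of data content
CONTENT_SEPARATOR = ":"  # hypothetical separator between keyword and content


def mark_aggregation(title: str, typed_contents: Dict[str, str]) -> str:
    """Add identifiers to one aggregation result.

    `typed_contents` maps the keyword naming a data type
    (e.g. "chief complaint") to the corresponding data content.
    """
    body = TYPE_SEPARATOR.join(
        f"{keyword}{CONTENT_SEPARATOR}{content}"
        for keyword, content in typed_contents.items()
    )
    return f"{TITLE_INDICATOR}{title}{TITLE_INDICATOR}{body}"


def build_data_to_integrate(aggregations: Dict[str, Dict[str, str]]) -> str:
    """Mark every aggregation result and splice them into one string."""
    return "".join(
        mark_aggregation(title, contents)
        for title, contents in aggregations.items()
    )


marked = mark_aggregation(
    "admission record",
    {"chief complaint": "fever", "past history": "none"},
)
combined = build_data_to_integrate({
    "admission record": {"chief complaint": "fever"},
    "image report": {"chest CT": "no abnormality"},
})
```

This sketch adds identifiers to each aggregation result first and then splices the marked results, which corresponds to the second of the two orderings described for step 305.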

Step 306: and the server side equipment adopts a target medical record generation model, and performs content integration processing on the data to be integrated according to preset medical record content labels to obtain an electronic medical record associated with the target object.

Further, referring to fig. 3B, a schematic process diagram of processing to obtain an electronic medical record in the embodiment of the present application is shown, where a server device inputs processed data to be integrated into a target medical record generation model to obtain an electronic medical record output by the model.

It should be noted that, in the process of aggregating the data contents of the same data title in different diagnostic data and finally generating the data to be integrated, content irrelevant to the data content preset for the electronic medical record can be deleted according to actual processing requirements; for example, irrelevant content is left out when an aggregation result is generated, or irrelevant content is deleted from the data to be integrated after it is obtained.

In this way, the discharge record of a patient can be generated from information such as the patient's admission record and course records, without manual compilation by doctors. A large language model has strong natural language generation capability, and the present application fine-tunes the large language model with information such as the admission records, course records and discharge records of sample patients, so that it acquires the capability of generating discharge records and is better suited to the discharge record generation task. In an actual business scenario, the corresponding discharge records can be generated by means of the target medical record generation model obtained through fine-tuning, which improves the generation quality of the discharge records. Moreover, because discharge records are generated efficiently and accurately and require only a simple check by the doctor, the doctor's workload in summarizing discharge records is reduced and time is saved, so that the doctor can better communicate with and treat other inpatients, improving working efficiency and quality.

Based on the same inventive concept, referring to fig. 4, which is a schematic logic structure diagram of an electronic medical record generating device according to an embodiment of the present application, the electronic medical record generating device 400 includes an obtaining unit 401, an aggregation unit 402, and a processing unit 403, where,

an acquisition unit 401 for acquiring diagnostic data of each stage associated with a target object; the diagnostic data for each phase includes: at least one data title, and at least one type of data content respectively associated with the at least one data title;

an aggregation unit 402, configured to aggregate the data contents associated with the same data title in different diagnostic data to respectively obtain corresponding aggregation results, and to obtain data to be integrated based on the aggregation results, where corresponding identifiers are respectively added, in the data to be integrated, to each data title and to the different types of data content belonging to each data title;

the processing unit 403 is configured to perform content integration processing on the data to be integrated according to preset content labels of each medical record by using the target medical record generation model, so as to obtain an electronic medical record associated with the target object.

Optionally, based on each aggregation result, when obtaining the data to be integrated, the aggregation unit 402 is configured to:

For each aggregation result, the following operations are performed: identifying data titles in an aggregation result by adopting a preset title indicator, separating different types of data contents in the aggregation result by adopting a preset type separator, and separating keywords used for identifying the data types and corresponding data contents in the different types of data contents by adopting a preset content separator;


Optionally, when acquiring diagnostic data of each stage associated with the target object, the acquiring unit 401 is configured to perform any one of the following operations:

and responding to the medical record generation indication triggered by the diagnosis object aiming at the target object, and acquiring diagnosis data of each stage which is uploaded by the diagnosis object and is associated with the target object.

Optionally, when acquiring the diagnostic data of each stage associated with the target object that is stored at a preset target address, the acquiring unit 401 is configured to:

acquiring data positioning information corresponding to a target object, wherein the data positioning information is obtained by privacy processing of identification information of the target object;

and accessing the preset target address based on the data positioning information, and acquiring the diagnostic data of each stage stored in association with the data positioning information.

Optionally, before responding to the medical record generation indication triggered by the diagnosis object for the target object, the obtaining unit 401 is further configured to:

carrying out authentication on the diagnosis object by adopting a preset authentication condition, and determining that the diagnosis object has the right of triggering the medical record generation indication aiming at the target object;

the IP address of the terminal device used by the diagnosis object is in a pre-stored set of secure IP addresses;

there is a doctor-patient diagnosis relationship between the diagnosis object and the target object.

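Optionally, when the data contents associated with the same data title in different diagnostic data are aggregated to obtain corresponding aggregation results, the aggregation unit 402 is configured to:

for each data title, perform the following operations: determining the data contents belonging to the data title in the different diagnostic data, and splicing those data contents in the order of their generation times to obtain the corresponding aggregation result.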

Optionally, the apparatus further includes a training unit 404, and the target medical record generation model is obtained through training by the training unit 404 in the following manner:

performing multiple rounds of iterative training on the initial medical record generating model by adopting preset training samples to obtain a trained target medical record generating model; a training sample comprising: sample input data and sample medical records obtained by integrating the content of the sample input data according to preset content labels of each medical record;

inputting the acquired sample input data into a medical record to be trained to generate a model, and obtaining a prediction integration result; the medical record generation model to be trained comprises the following steps: the method comprises the steps of generating a model of an initial medical record after pre-training, and adding a network to be trained in the model of the initial medical record generation according to a preset fine-tuning training mode;

calculating a model loss value according to the content difference between the predicted medical record and the corresponding sample medical record, and adjusting the network parameters of the network to be trained according to the model loss value.
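The training procedure above keeps the pre-trained model fixed and adjusts only the added network according to the loss. The following toy Python sketch shows that pattern on a single scalar weight. It is not a language model and makes no claim about the fine-tuning mode actually used. The function name `train_adapter`, the squared-error loss, and the plain gradient-descent update are all assumptions made for illustration.

```python
def train_adapter(samples, base_weight, lr=0.01, rounds=200):
    """Fit a trainable offset on top of a frozen base weight.

    samples: list of (x, y) pairs, standing in for (sample input data,
    sample medical record). The toy model predicts (base_weight + delta) * x.
    base_weight plays the role of the pre-trained parameters and is never
    updated; delta plays the role of the added network to be trained.
    """
    delta = 0.0
    for _ in range(rounds):              # multiple rounds of iterative training
        for x, y in samples:
            pred = (base_weight + delta) * x
            # Assumed loss: squared error (pred - y) ** 2.
            # Its gradient with respect to delta is 2 * (pred - y) * x.
            grad = 2.0 * (pred - y) * x
            delta -= lr * grad           # adjust only the added parameter
    return delta
```

For example, with `base_weight=2.0` and samples drawn from `y = 3 * x`, the learned `delta` approaches 1.0 while the base weight stays untouched.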

Optionally, each training sample is constructed by training unit 404 in any one of the following ways:

acquiring each sample data to be integrated, and integrating the content of each sample data to be integrated according to the preset medical record content labels to obtain a corresponding sample medical record; and performing the following operations for each sample data to be integrated: organizing the sample data to be integrated into sample input data whose length does not exceed a specified length, and generating a training sample based on the sample input data and the corresponding sample medical record;

acquiring each sample data to be integrated, and integrating the content of each sample data to be integrated according to the preset medical record content labels to obtain a corresponding sample medical record; and performing the following operations for each sample data to be integrated: splicing the sample data to be integrated with a preset processing prompt text, organizing the resulting spliced text into sample input data whose length does not exceed a specified length, and generating a training sample based on the sample input data and the corresponding sample medical record.
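The two construction modes above differ only in whether a processing prompt text is spliced in, so a minimal Python sketch can cover both. The function name `build_training_sample`, measuring length in characters, placing the prompt before the data, and enforcing the length limit by simple truncation are all assumptions; the source states only that the sample input data must not exceed a specified length.

```python
def build_training_sample(data_to_integrate, sample_record, max_length,
                          prompt_text=None):
    """Pair length-limited sample input data with its sample medical record.

    data_to_integrate: the sample data to be integrated, as text.
    sample_record: the sample medical record integrated from that data.
    max_length: the specified maximum length (assumed to be in characters).
    prompt_text: optional preset processing prompt text (second mode).
    """
    if prompt_text is None:
        text = data_to_integrate                 # first mode: data only
    else:
        text = prompt_text + data_to_integrate   # second mode: spliced text
    sample_input = text[:max_length]             # assumed: plain truncation
    return {"input": sample_input, "target": sample_record}
```

A real pipeline would more likely count tokens than characters and might trim the data rather than the prompt; those choices are left open here.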

Having described the method and apparatus for generating an electronic medical record according to an exemplary embodiment of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.

Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the application may take the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."

Based on the same inventive concept as the above embodiments of the present application, where the electronic device in an embodiment of the present application corresponds to a server device, refer to fig. 5, which is a schematic diagram of the hardware structure of an electronic device to which an embodiment of the present application is applied. The electronic device 500 may include at least a processor 501 and a memory 502. The memory 502 stores a computer program that, when executed by the processor 501, causes the processor 501 to perform the steps of any one of the above methods for generating an electronic medical record.

In some possible embodiments, an electronic device according to the application may comprise at least one processor, and at least one memory. The memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of generating an electronic medical record according to various exemplary embodiments of the application described above in this specification. For example, the processor may perform the steps as shown in fig. 2A.

Based on the same inventive concept as the above method embodiments, various aspects of the electronic medical record generation method provided by the present application may also be implemented in the form of a program product comprising a computer program. When the program product runs on an electronic device, the computer program causes the electronic device to perform the steps of the electronic medical record generation method according to the various exemplary embodiments of the present application described above; for example, the electronic device may perform the steps shown in fig. 2A.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A method for generating an electronic medical record, comprising:

2. The method of claim 1, wherein the obtaining the data to be integrated based on each aggregation result comprises:

3. The method of claim 1, wherein the obtaining the data to be integrated based on each aggregation result comprises:

4. The method of claim 1, wherein the obtaining diagnostic data for each stage associated with the target object comprises any one of:

5. The method of claim 4, wherein the obtaining diagnostic data stored at the predetermined target address for each stage associated with the target object comprises:

6. The method of claim 4, further comprising, before responding to the medical record generation indication triggered by the diagnosis object for the target object:

7. The method according to any one of claims 1-6, wherein aggregating the data content associated with the same data title in different diagnostic data to obtain corresponding aggregation results comprises:

for each data title, the following operations are performed:

8. The method of any of claims 1-6, wherein the target medical record generation model is trained by:

performing multiple rounds of iterative training on the medical record generation model to be trained using preset training samples to obtain the target medical record generation model; each training sample comprises: sample input data, and a sample medical record obtained by integrating the content of the sample input data according to the preset medical record content labels;

9. The method of claim 8, wherein each training sample is constructed in any one of the following ways:

10. An apparatus for generating an electronic medical record, comprising:

11. The apparatus of claim 10, wherein the aggregation unit is configured to, when obtaining the data to be integrated based on each aggregation result:

12. The apparatus of claim 10, wherein the aggregation unit is configured to, when obtaining the data to be integrated based on each aggregation result:

13. The apparatus of claim 10, wherein the acquisition unit is configured to perform any one of the following operations when acquiring diagnostic data for each stage associated with a target object:

14. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of any one of claims 1-9.

15. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-9.

16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.