CN114334049B - Method, device and equipment for structuring electronic medical record

Info

Publication number: CN114334049B
Application number: CN202011619887.4A
Authority: CN
Inventors: 程龙龙; 黄硕; 袁丁; 江正义
Original assignee: Zhongdian Yunnao Tianjin Technology Co ltd
Current assignee: Zhongdian Yunnao Tianjin Technology Co ltd
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2024-06-07
Anticipated expiration: 2040-12-31
Also published as: CN114334049A

Abstract

The invention provides a method, a device and equipment for structuring electronic medical records. The method comprises the following steps: acquiring an electronic medical record set comprising a plurality of medical record texts; analyzing the topics and their associated contents in the medical record texts according to the characteristics of preset topics and their associated contents, and dividing each analyzed topic and its associated contents into paragraphs; performing clause division on each paragraph, performing dependency syntax analysis on each clause obtained by division, and determining the entities in each clause and their dependency relationship characteristics; and extracting the entities that conform to the preset dependency relationship in each clause according to the dependency relationship characteristics of the entities, and filling them into the corresponding entity positions of a preset entity structure to obtain the structured entity corresponding to each clause, wherein the preset entity structure comprises different entity positions and the preset dependency relationship exists among the different entity positions. With the method provided by the invention, electronic medical records can be structured to obtain structured entities with preset relationships, which makes data mining on medical record text possible.

Description

Method, device and equipment for structuring electronic medical record

Technical Field

The present invention relates to the field of natural language analysis, and in particular, to a method, an apparatus, and a device for structured processing of an electronic medical record.

Background

A medical record is the original record of the whole process of a patient's diagnosis and treatment in a hospital, and comprises a first page, course records, examination results, doctor's orders, operation records, nursing records and the like. An electronic medical record is an electronically managed record of an individual's life-long health status and healthcare behavior, and it covers all process information involved in collecting, storing, transmitting, processing and using patient information.

To analyze core data and retrieve data from electronic medical records, the records must be structurally analyzed and the key information in them extracted efficiently. However, existing electronic medical record structuring schemes only perform text analysis on the record and extract preset features to identify related entities such as diseases, symptoms and medicines; the entity information obtained in this way cannot be correlated. For example, when four entities are extracted from a medical record, the prior-art solutions cannot establish the relationships between them and therefore cannot determine whether the diastolic pressure is 120 or 80. As a result, existing schemes for structural analysis of electronic medical records cannot apply the extracted information to diagnostic reasoning and cannot support further data mining. A scheme that performs deep structuring of electronic medical records to obtain entity information with determined relationships is therefore urgently needed.

Disclosure of Invention

The invention provides a method, a device and equipment for structuring an electronic medical record, which solve the problem that the existing scheme for structuring the electronic medical record can only realize the identification of related entities in the medical record and can only obtain entity information which cannot be correlated.

In a first aspect, the present invention provides a method for structuring an electronic medical record, where the method includes:

acquiring an electronic medical record set comprising a plurality of medical record texts;

analyzing the topics and the associated contents in the medical record text according to the preset topics and the characteristics of the contents associated with the topics, and dividing each analyzed topic and the contents associated with the topics into paragraphs;

performing clause division on each paragraph, performing dependency syntax analysis on each clause obtained by division, and determining an entity in each clause and dependency characteristics of the entity;

extracting the entities that conform to the preset dependency relationship in each clause according to the dependency relationship characteristics of the entities, and filling them into the corresponding entity positions of a preset entity structure to obtain the structured entity corresponding to each clause, wherein the preset entity structure comprises different entity positions and the preset dependency relationship exists among the different entity positions.

Optionally, according to the features of the preset theme and the content associated with the theme, analyzing the theme and the content associated with the theme in the medical record text, and dividing each analyzed theme and the content associated with the theme into paragraphs, including:

Analyzing corresponding topics in the medical record text according to characteristics of topics mapped by the slots in a slot group structure, wherein the slot group structure is a structure which is determined according to structures corresponding to different topics and related contents in a medical record template and comprises slots for mapping different topics and corresponding structural relations among the slots;

determining content associated with the parsed subject in the medical record text according to the parsed subject;

Dividing each analyzed theme and associated content into paragraphs, and filling the paragraphs into corresponding slots to obtain corresponding structured data sets.

Optionally, determining the slot group structure according to structures corresponding to different topics and associated contents in the medical record template includes:

mining the themes in a medical record template and the structural relations among the themes, and determining the corresponding slots and the structural relations among the slots according to the themes and their structural relations, wherein the structural relations include, but are not limited to, parallel relations, inclusion relations and selection relations;

and constructing a slot group structure body of a tree structure according to the structural relation between the slot positions.

Optionally, determining the slot group structure according to the structures corresponding to different topics and associated contents in the medical record template, and further includes:

dividing the medical record texts into medical record template types according to the content types of the medical record texts in the electronic medical record set;

and determining a corresponding slot group structure body according to structures corresponding to different topics and associated contents in different medical record templates.

Optionally, after obtaining the structured entity corresponding to each clause, the method further includes:

extracting key-value pairs having an association relationship from the structured entity to obtain entity keywords and the values corresponding to the entity keywords.

Optionally, performing dependency syntax analysis on each clause obtained by division, and determining an entity in each clause and a dependency characteristic of the entity, including:

performing word segmentation on each clause obtained by division according to a pre-trained word segmentation model, to obtain the sub-words corresponding to each clause;

performing part-of-speech tagging on the sub-words according to a pre-trained part-of-speech tagging model to obtain sub-words tagged with part of speech;

performing dependency syntax analysis on the part-of-speech-tagged sub-words according to a pre-trained dependency syntax analysis model, and determining the entities in each clause and the dependency characteristics of the entities, wherein the dependency characteristics comprise the part-of-speech characteristics of the entities and the association characteristics among the entities.
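The three-stage analysis described above can be sketched as a simple composition of functions. This is a minimal illustration only: the names `analyse_clause`, `toy_segment`, `toy_tag` and `toy_parse`, the tag `"n"` and the relation label `"dep"` are all assumptions introduced here, and the toy functions merely stand in for the pre-trained models, which the text does not specify.

```python
from typing import Callable, List, Tuple

Tagged = Tuple[str, str]    # (sub-word, part-of-speech tag)
Arc = Tuple[int, int, str]  # (head index, dependent index, relation label)


def analyse_clause(
    clause: str,
    segment: Callable[[str], List[str]],
    tag: Callable[[List[str]], List[Tagged]],
    parse: Callable[[List[Tagged]], List[Arc]],
) -> Tuple[List[Tagged], List[Arc]]:
    """Run word segmentation, part-of-speech tagging and dependency parsing in order."""
    sub_words = segment(clause)  # stage 1: clause -> sub-words
    tagged = tag(sub_words)      # stage 2: sub-words -> tagged sub-words
    arcs = parse(tagged)         # stage 3: tagged sub-words -> dependency arcs
    return tagged, arcs


# Toy stand-ins for the pre-trained models (assumptions for illustration only).
def toy_segment(clause: str) -> List[str]:
    return clause.split()


def toy_tag(words: List[str]) -> List[Tagged]:
    return [(word, "n") for word in words]


def toy_parse(tagged: List[Tagged]) -> List[Arc]:
    # Attach every later sub-word to the first one with a generic label.
    return [(0, i, "dep") for i in range(1, len(tagged))]


tagged, arcs = analyse_clause("patient waist pain", toy_segment, toy_tag, toy_parse)
```

Passing the three models in as callables keeps the sketch independent of any particular segmentation, tagging or parsing library.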

Optionally, extracting an entity meeting a preset dependency relationship in each clause according to the dependency relationship characteristic of the entity, and filling the entity into a corresponding entity position of a preset entity structure to obtain a structured entity corresponding to each clause, including:

determining the preset dependency relationship of the entity matched with different entity positions in the preset entity structure;

Extracting entities conforming to the preset dependency relationship in each clause according to the dependency relationship characteristics of the entities;

filling the extracted entity into the corresponding entity position of the preset entity structure to obtain the structured entity corresponding to each clause.
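One way to picture the filling step is shown below. It is a sketch under stated assumptions, not the patented implementation: dependency arcs are assumed to be available as (head word, relation label, dependent word) tuples, the preset entity structure is modeled as a mapping from entity position to the relation label that links the core word to that position, and the labels `"num"` and `"adv"`, the position names `"core"` and `"value"`, and the function `fill_entity_structure` are hypothetical.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str, str]  # (head word, relation label, dependent word)


def fill_entity_structure(arcs: List[Arc], preset: Dict[str, str]) -> List[Dict[str, str]]:
    """Fill entities into the positions of a preset entity structure.

    `preset` maps an entity position name to the relation label that must
    connect the core word to the entity placed in that position.
    """
    position_of = {label: position for position, label in preset.items()}
    by_head: Dict[str, Dict[str, str]] = {}
    for head, label, dependent in arcs:
        position = position_of.get(label)
        if position is None:
            continue  # this relation is not part of the preset structure
        by_head.setdefault(head, {"core": head})[position] = dependent
    # Keep only the structures in which every preset position was filled.
    return [entity for entity in by_head.values() if len(entity) == len(preset) + 1]


# Hypothetical arcs for a clause reporting two blood pressure readings.
arcs = [
    ("systolic pressure", "num", "120"),
    ("diastolic pressure", "num", "80"),
    ("patient", "adv", "immediately"),
]
entities = fill_entity_structure(arcs, {"value": "num"})
```

Because each value is attached to its own core word, the resulting structured entities keep the link between a reading and the quantity it belongs to.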

In a second aspect, the present invention provides an electronic medical record structured processing device, including a memory and a processor, wherein:

the memory is used for storing a computer program;

The processor is used for reading the program in the memory and executing the following steps:

Optionally, the processor analyzes the topics and the associated content in the medical record text according to the preset topics and the characteristics of the content associated with the topics, and divides each analyzed topic and the content associated with the topic into paragraphs, including:

Optionally, the processor determines a slot group structure according to structures corresponding to different topics and associated contents in the medical record template, including:

Optionally, the processor determines a slot group structure according to structures corresponding to different topics and associated contents in the medical record template, and further includes:

Optionally, after obtaining the structured entity corresponding to each clause, the processor is further configured to:

Optionally, the processor performs dependency syntax analysis on each clause obtained by division, and determines an entity in each clause and a dependency characteristic of the entity, including:

Optionally, the processor extracts an entity meeting a preset dependency relationship in each clause according to the dependency relationship characteristic of the entity, and fills the entity into a corresponding entity position of a preset entity structure to obtain a structured entity corresponding to each clause, including:

In a third aspect, the present invention provides a device for structuring an electronic medical record, including:

A medical record obtaining unit, configured to obtain an electronic medical record set including a plurality of medical record texts;

The paragraph dividing unit is used for analyzing the topics and the related contents in the medical record text according to the characteristics of the preset topics and the related contents of the topics, and dividing each analyzed topic and the related contents of the topics into paragraphs;

The clause processing unit is used for dividing each paragraph into clauses, performing dependency syntax analysis on each clause obtained by division, and determining an entity in each clause and dependency relationship characteristics of the entity;

The structure extraction unit is used for extracting the entity which accords with the preset dependency relationship in each clause according to the dependency relationship characteristics of the entity, and filling the entity into the corresponding entity position of the preset entity structure to obtain the structured entity corresponding to each clause, wherein the preset entity structure comprises different entity positions and the preset dependency relationship exists among the different entity positions.

Optionally, the paragraph dividing unit analyzes the topics and the related content in the medical record text according to the preset topics and the characteristics of the content related to the topics, and divides each analyzed topic and the content related to the topic into paragraphs, including:

Optionally, the paragraph dividing unit determines a slot group structure body according to structures corresponding to different topics and associated contents in the medical record template, including:

Optionally, the paragraph dividing unit determines a slot group structure body according to structures corresponding to different topics and associated contents in the medical record template, and further includes:

Optionally, after obtaining the structured entities corresponding to the clauses, the structure extraction unit is further configured to:

Optionally, the clause processing unit performs dependency syntax analysis on each clause obtained by dividing, and determines an entity in each clause and a dependency relationship feature of the entity, including:

Optionally, the structure extracting unit extracts an entity meeting a preset dependency relationship in each clause according to the dependency relationship characteristic of the entity, and fills the entity into a corresponding entity position of a preset entity structure to obtain a structured entity corresponding to each clause, including:

In a fourth aspect, the present invention provides a computer program medium having a computer program stored thereon, which when executed by a processor implements the steps of a method for structuring an electronic medical record as provided in the first aspect above.

The method, the device and the equipment for structuring the electronic medical record have the following beneficial effects:

The electronic medical record is structured to obtain structured entities with preset relationships, which facilitates data mining on medical record texts and makes diagnostic reasoning possible.

Drawings

FIG. 1 is a flowchart of a method for structuring an electronic medical record according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a scanned medical record according to an embodiment of the present invention;

FIG. 3 is a flowchart of a paragraph dividing method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a slot group structure according to an embodiment of the present invention;

FIG. 5 is a diagram of dependency characteristics of entities in clauses provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of a preset entity structure according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a structured entity according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an electronic medical record structuring device according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of an electronic medical record structuring device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. The term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, that A and B exist together, or that B exists alone. In addition, in the description of the embodiments of the present application, "plural" means two or more, and other quantifiers are to be understood similarly. It is to be understood that the preferred embodiments described herein are for illustration and explanation of the present application only and are not intended to limit it, and that the embodiments of the present application and the features in the embodiments may be combined with each other without conflict.

It should be noted that the embodiments described in the following exemplary examples do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

In the following, some terms in the embodiments of the present disclosure are explained for easy understanding by those skilled in the art.

(1) The term "dependency syntax" in embodiments of the present disclosure refers to a framework in natural language processing that describes language structure in terms of word-to-word dependencies. Here "dependency" refers to a relationship between words in which one word governs and the other is governed; the relationship is directional, i.e., one dependency connects a core word (head) and a dependent word (dependent).

(2) The term "dependency syntax analysis", also called dependency analysis, is a process of analyzing an input text sentence to obtain a syntax structure of the sentence, and is used to identify interdependencies between words in the sentence.

(3) The term "triplet" in the presently disclosed embodiments refers to a collection of the form ((x, y), z), often abbreviated as (x, y, z).
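As a small illustration of this notation, the nested form ((x, y), z) and the abbreviated form (x, y, z) carry the same three components. The helper name `flatten_triplet` below is an assumption introduced for this sketch only.

```python
from typing import Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")
Z = TypeVar("Z")


def flatten_triplet(nested: Tuple[Tuple[X, Y], Z]) -> Tuple[X, Y, Z]:
    """Rewrite the nested form ((x, y), z) as the abbreviated form (x, y, z)."""
    (x, y), z = nested
    return (x, y, z)


flat = flatten_triplet((("x", "y"), "z"))
```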

(4) The term "corpus" in embodiments of the present disclosure refers to a basic resource that carries language knowledge on an electronic computer. It stores language material that has actually appeared in real language use, and the raw corpus becomes a useful resource only after analysis and processing.

(5) The term "Chinese word segmentation" in embodiments of the present disclosure is the process of recombining a continuous sequence of characters into a sequence of words according to a certain specification.

(6) The term "part of speech tagging," also known as grammatical tagging or part of speech disambiguation, in embodiments of the present disclosure is a text data processing technique that tags the parts of speech of words in a corpus by their meaning and context.

(7) The term "optical character recognition" (Optical Character Recognition, OCR) in embodiments of the present disclosure refers to the process of an electronic device, such as a scanner or digital camera, inspecting characters printed on paper, determining their shape by detecting dark and light patterns, and then translating the shape into computer text using a character recognition method.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In view of the problem that the existing electronic medical record structuring scheme can only realize the identification of related entities such as diseases, symptoms and medicines in medical records, but cannot obtain entity information with association relations, the application provides a method, a device and equipment for structuring electronic medical records.

The following describes a method, a device and equipment for structuring an electronic medical record in the embodiment of the application in detail with reference to the accompanying drawings.

Example 1

An embodiment of the present invention provides a flowchart of a method for structuring an electronic medical record, as shown in fig. 1, including:

Step S101, acquiring an electronic medical record set comprising a plurality of medical record texts;

Mode 1: acquiring the electronic medical record set from pre-acquired paper medical records.

The pre-acquired paper medical records are scanned, and optical character recognition is then performed to acquire an electronic medical record set comprising a plurality of medical record texts.

At present, many medical records in hospitals are paper, and in order to mine patient data in the medical records, the paper medical records need to be scanned and converted into texts by using an OCR technology so as to be subjected to subsequent processing.

Mode 2: acquiring the electronic medical record set from pre-acquired scanned medical records.

Optical character recognition is performed on the pre-acquired scanned medical records in picture format to acquire an electronic medical record set comprising a plurality of medical record texts.

As shown in fig. 2, an embodiment of the present invention provides a schematic diagram of a scanned medical record.

The scanned medical record may be in any picture format, for example, bitmap (BMP) format, Tagged Image File Format (TIF), Joint Photographic Experts Group (JPEG) format, and the like.

By performing batch OCR on the scanned medical records, the scanned medical records are recognized as text files and stored, yielding an electronic medical record set comprising a plurality of medical record texts.
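The batch conversion can be sketched as a loop over image files. This is a minimal sketch under assumptions: the function name `build_record_set`, the suffix list, and the idea of passing the OCR engine in as a callable named `ocr` are introduced here for illustration, and no particular OCR library is implied by the text.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Picture formats accepted by this sketch (an assumed, non-exhaustive list).
IMAGE_SUFFIXES = {".bmp", ".tif", ".tiff", ".jpg", ".jpeg"}


def build_record_set(
    image_paths: Iterable[Path],
    ocr: Callable[[Path], str],
) -> Dict[str, str]:
    """Recognize each scanned medical record and collect the resulting texts.

    `ocr` is any function that turns one image file into text; the OCR engine
    itself is outside the scope of this sketch.
    """
    record_set: Dict[str, str] = {}
    for path in image_paths:
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip files that are not scanned pictures
        record_set[path.stem] = ocr(path)
    return record_set
```

In practice the returned texts would be stored as text files; here they are kept in a dictionary keyed by file name so that the later steps can iterate over them.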

Step S102, analyzing the topics and the associated contents thereof in the medical record text according to the characteristics of the preset topics and the contents associated with the topics, and dividing each analyzed topic and the contents associated with the topics into paragraphs;

as shown in fig. 3, an embodiment of the present invention provides a flowchart of a paragraph dividing method, including:

Step S301, analyzing corresponding topics in the medical record text according to characteristics of topics mapped by the slots in a slot group structure, wherein the slot group structure is a structure which is determined according to structures corresponding to different topics and related contents in a medical record template and comprises slots mapped with different topics and corresponding structural relations among the slots;

the method for determining the slot group structure body in advance according to structures corresponding to different topics and associated contents in the medical record template comprises the following steps:

The medical record templates are preset according to national standards and requirements in specific implementation, and various different types of medical record templates exist.

As an alternative embodiment, different types of medical record templates are preset by:

Mining data elements and data group structure information in clinical standards of medical records;

The clinical standards of medical records are the relevant national standards for medical records, such as the Basic Specification for Medical Record Writing.

The data elements are the topics in the medical record template, and the data group structure information is the structure corresponding to different topics and their associated contents in the medical record template.

Constructing a basic medical record template according to the mined data elements and the data group structure information;

on the basis of the basic medical record template, the medical record template is expanded according to the requirements of hospitals.

The above-mentioned extension includes: (1) lateral expansion: content of the same topic is incorporated under the same topic name. (2) longitudinal expansion: adding a new theme.

For example, the basic medical record templates can be expanded according to the department of the hospital to form different types of medical record templates.

By the mode, different types of medical record templates are preset, and the method can adapt to classification and structuring systems of the electronic medical record clinical documents of the current medical standard; for different electronic medical record formats and writing specifications among different hospitals, the method has strong configurability and expandability, and can adapt to the requirements of different hospitals and the post-structuring scenes of different electronic medical records.

As an optional implementation manner, determining the slot group structure according to structures corresponding to different topics and associated contents in the medical record template, and further includes:

The electronic medical record set comprises a plurality of medical record texts. The content type of each medical record text is determined, the medical record template of the corresponding type is then determined according to the content type of the medical record text, and the corresponding slot group structure is constructed according to the determined type of medical record template.

As an alternative embodiment, the medical records are categorized according to their associated business records, such as medical history, physical examination, inspection reports, etc.

FIG. 4 is a schematic diagram of a slot group structure according to an embodiment of the present invention.

The slot group structure in fig. 4 has a tree structure, and includes a plurality of slots, and the slots have a fixed structural relationship.

The slot group structure of fig. 4 includes slots such as a medical record template 1, a medical record 1 to a medical record n, wherein the medical record 1 includes two sub slots for medical history and physical examination, the medical history includes four sub slots for main complaints, current medical history, past history, and system review, and the system review includes two sub slots for five sense organs and respiratory system.

The slots in FIG. 4 map the different topics in the medical record template, and the structure between the slots maps the structural relationships between the topics, where the structural relationships include, but are not limited to, a parallel relationship, an inclusion relationship, and a selection relationship.

For example, in FIG. 4, the medical history and the physical examination are in a parallel relationship; the system review includes the five sense organs and the respiratory system, so the system review and the five sense organs, and the system review and the respiratory system, are each in an inclusion relationship.

FIG. 4 expands only the sub-slots of medical record 1; medical records 2 to n include similar sub-slots, which are not described in detail here.

The slot group structure provided in FIG. 4 is only an example and does not limit the embodiments of the present invention; slots may be added, reduced or deleted according to the specific implementation.
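The tree described for FIG. 4 can be sketched with a small recursive data structure. This is an illustrative sketch only: the class name `Slot`, its fields, and the function `build_example_tree` are assumptions, a parent-child link stands for the inclusion relationship, siblings stand for the parallel relationship, and the selection relationship is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    """One slot of the slot group structure; it maps a single topic."""
    topic: str
    children: List["Slot"] = field(default_factory=list)
    content: Optional[str] = None  # paragraph filled in later, if any

    def add(self, child: "Slot") -> "Slot":
        """Attach a sub-slot (inclusion relationship) and return it."""
        self.children.append(child)
        return child

    def find(self, topic: str) -> Optional["Slot"]:
        """Depth-first search for the first slot mapping `topic`."""
        if self.topic == topic:
            return self
        for child in self.children:
            hit = child.find(topic)
            if hit is not None:
                return hit
        return None


def build_example_tree() -> Slot:
    """Build the part of the tree that is expanded for medical record 1."""
    root = Slot("medical record template 1")
    record = root.add(Slot("medical record 1"))
    history = record.add(Slot("medical history"))
    record.add(Slot("physical examination"))
    for topic in ("main complaints", "current medical history", "past history"):
        history.add(Slot(topic))
    review = history.add(Slot("system review"))
    review.add(Slot("five sense organs"))
    review.add(Slot("respiratory system"))
    return root


tree = build_example_tree()
```

Adding or deleting a slot is then a matter of editing the `children` list of the relevant parent.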

Step S302, determining the content associated with the parsed subject in the medical record text according to the parsed subject;

It should be noted that any implementation that can determine the content associated with a topic may be applied to the embodiments of the present invention; for example, the content associated with the parsed topic may be determined by regular-expression matching.

In the medical records presented in fig. 2, the content associated with the current medical history topic is determined as follows: 1 hour before admission, patients carelessly strain the waist when going down stairs, and immediately have severe waist pain, can not stand and walk, and has obviously limited activity. The family members of the patients hold the patients aside the body, but the symptoms are not improved, and then the patients have pale complexion, listlessness and no dizziness, headache, vomiting and heart vomiting. Thus, the doctor of my department. I check in the department: clear mind, normal blood pressure, stable vital sign, tension and swelling of waist muscles, obvious wide tenderness, lateral bending and stretching, limited movement, and enhanced test (-) of right straight leg high test (-). Therefore, the patient is admitted to the department of I and I with the acute lumbar sprain. The patients have poor mental rest and general food intake, and the patients have no abnormal stool.

The content associated with the other topics is determined in a similar way and is not described again here.

Step S303, dividing each analyzed theme and associated content into paragraphs, and filling the paragraphs into corresponding slots to obtain corresponding structured data sets.

The current medical history slot after filling the paragraphs is: { present history: 1 hour before admission, patients carelessly strain the waist when going down stairs, and immediately have severe waist pain, can not stand and walk, and has obviously limited activity. The family members of the patients hold the patients aside the body, but the symptoms are not improved, and then the patients have pale complexion, listlessness and no dizziness, headache, vomiting and heart vomiting. Thus, the doctor of my department. I check in the department: clear mind, normal blood pressure, stable vital sign, tension and swelling of waist muscles, obvious wide tenderness, lateral bending and stretching, limited movement, and enhanced test (-) of right straight leg high test (-). Therefore, the patient is admitted to the department of I and I with the acute lumbar sprain. The patients have poor mental rest and general food intake, and the patients have no abnormal stool. }.
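Paragraph division and slot filling by regular-expression matching can be sketched as follows. This is a sketch under assumptions: it supposes that each topic appears in the text as its name followed by a colon, the function name `split_by_topics` is hypothetical, and the sample text is made up for illustration rather than taken from FIG. 2.

```python
import re
from typing import Dict, List


def split_by_topics(text: str, topics: List[str]) -> Dict[str, str]:
    """Cut `text` into paragraphs, one per recognized topic heading.

    A heading is assumed to be a known topic name followed by an ASCII or
    full-width colon; the paragraph runs until the next heading.
    """
    if not topics:
        return {}
    names = "|".join(re.escape(topic) for topic in topics)
    pattern = re.compile("(" + names + r")\s*[:：]")
    matches = list(pattern.finditer(text))
    slots: Dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        slots[match.group(1)] = text[match.end():end].strip()
    return slots


# Made-up sample text used only to exercise the sketch.
sample = (
    "Main complaints: waist pain for 1 hour. "
    "Current medical history: the patient strained the waist on the stairs. "
    "Past history: none."
)
filled = split_by_topics(sample, ["Main complaints", "Current medical history", "Past history"])
```

Each key of the returned dictionary plays the role of a slot, and its value is the paragraph filled into that slot.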

The structured data set comprises at least one clause.

It should be noted that the above structured data set may be invoked by keyword or tag retrieval.

For example, all structured data sets of the current medical history in the slot group structure can be retrieved using the keyword "current medical history".
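Keyword retrieval over the filled structure can be sketched as a recursive lookup. The nested-dictionary representation, the function name `retrieve` and the placeholder paragraph values are assumptions made for this illustration.

```python
from typing import Any, Dict, List


def retrieve(structure: Dict[str, Any], keyword: str) -> List[Any]:
    """Collect every value stored under `keyword`, at any depth."""
    hits: List[Any] = []
    for key, value in structure.items():
        if key == keyword:
            hits.append(value)
        if isinstance(value, dict):
            hits.extend(retrieve(value, keyword))  # descend into sub-slots
    return hits


# Placeholder contents; real slots would hold the filled paragraphs.
records = {
    "medical record 1": {"medical history": {"current medical history": "paragraph 1"}},
    "medical record 2": {"medical history": {"current medical history": "paragraph 2"}},
}
found = retrieve(records, "current medical history")
```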

Using slot-filling technology in combination with the slot group structure, medical records stored in text format are filled in a structured manner, so that unstructured text is turned into structured text and standardized, structured document paragraphs are finally formed. This enables retrieval and finer-grained mining and analysis of medical records.

Through the scheme of paragraph division, document-level structuring of massive medical record documents is achieved, and a retrievable data set is formed. The method solves the problems that in the existing structuring scheme of the electronic medical record, no systematic data group classification and marking exist, and the identification, positioning and management are inconvenient when clinical documents are exchanged and shared across institutions.

Step S103, performing clause division on each paragraph, performing dependency syntax analysis on each clause obtained by division, and determining an entity in each clause and dependency relationship characteristics of the entity;

Clause division is performed on each paragraph. For example, dividing the paragraph of the current medical history yields a plurality of clauses:

Clause 1: One hour before admission, the patient accidentally sprained the waist while going down stairs and immediately developed severe waist pain, was unable to stand or walk, and had markedly limited mobility.

Clause 2: The patient's family members helped the patient aside, but the symptoms did not improve; the patient then developed a pale complexion and listlessness, with no dizziness, headache, nausea or vomiting.

Clause 3: The patient therefore came to our department for treatment.

Clause 4: Examination in our department: clear mind, normal blood pressure, stable vital signs, tense and swollen waist muscles, obvious and extensive tenderness, limited lateral bending, flexion and extension, right straight leg raising test (-), reinforcement test (-).

Clause 5: The patient was therefore admitted to our department with "acute lumbar sprain".

Clause 6: The patient has poor spirits and rest and an ordinary appetite, with no abnormality of stool.
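The clause division above can be sketched as splitting a paragraph at sentence-final punctuation. This is only an illustrative sketch: the embodiment does not list the delimiters it uses, so the set of terminators and the function name split_clauses are assumptions. A Western full stop is treated as a terminator only when followed by whitespace or the end of the text, so that decimal values such as 38.5 are not broken apart.

```python
import re
from typing import List

# Assumed terminators: Chinese sentence-final marks always end a clause;
# Western marks end a clause only before whitespace or at the end of the text.
_BOUNDARY = re.compile(r"(?<=[。！？；])|(?<=[.!?;])(?=\s|$)")


def split_clauses(paragraph: str) -> List[str]:
    """Split a paragraph into clauses, keeping the closing punctuation."""
    parts = _BOUNDARY.split(paragraph)
    return [part.strip() for part in parts if part.strip()]


clauses = split_clauses("Temp 38.5 today. No cough。无发热！")
```

Each returned clause keeps its terminator, which allows punctuation to be annotated later together with the subwords.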

Performing dependency syntax analysis on each clause obtained by the division, and determining the entities in each clause and the dependency relationship characteristics of the entities, includes:

Taking the clause 1 as an example, after word segmentation, the obtained corresponding subwords of the clause 1 are as follows: admission, first 1 hour, patient, cause, stair descent, time, carelessness, handling, waist, sprain, immediate, appearance, waist, pain, strenuous, inability, standing, and walking, activities, conspicuous, limited.

The parts of speech follow the part-of-speech classification of modern Chinese, which includes content words (nouns, verbs, adjectives, numerals and measure words, and pronouns) and function words (adverbs, prepositions, conjunctions, auxiliary words, interjections and onomatopoeia).

Taking the subwords of clause 1 as an example, "patient" is a noun, "standing" is a verb, and "strenuous" is an adverb.

The above-mentioned dependency relationships include: the subject-verb relationship, verb-object relationship, indirect-object relationship, fronted object, pivotal (double-role) construction, attributive relationship, adverbial structure, verb-complement structure, coordinate relationship, preposition-object relationship, left adjunct relationship, right adjunct relationship, independent structure and core relationship.

The above-mentioned dependency relationship is an existing definition, and its specific meaning is not described herein.

As an alternative implementation, when performing dependency syntax analysis on clauses in a data set, a subject supplement algorithm is used to supplement a sentence lacking a subject with the subject so as to make the sentence structure complete.

In customary language use, when several successive sentences share the same subject, the subject is often omitted, for example: "His face is unsightly. Also has a fever." In this case, the missing subject is supplemented using the subject supplement algorithm. For example, the clause "also has a fever" is supplemented to "he also has a fever".
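One simple way to realise subject supplementation is to carry the most recent subject forward into any clause that has none. The sketch below assumes this carry-forward heuristic; the embodiment does not disclose the actual algorithm. The (word, label) token layout, the label "SBV" for a subject and the function names are hypothetical, and the first clause is simplified so that its grammatical subject is "he".

```python
from typing import List, Optional, Tuple

# (word, dependency label); "SBV" is an assumed label marking the subject.
Token = Tuple[str, str]


def find_subject(clause: List[Token]) -> Optional[str]:
    """Return the first word labelled as subject, or None if there is none."""
    for word, label in clause:
        if label == "SBV":
            return word
    return None


def supplement_subjects(clauses: List[List[Token]]) -> List[List[Token]]:
    """Prepend the most recently seen subject to clauses lacking a subject."""
    last_subject: Optional[str] = None
    result: List[List[Token]] = []
    for clause in clauses:
        subject = find_subject(clause)
        if subject is not None:
            last_subject = subject
            result.append(list(clause))
        elif last_subject is not None:
            result.append([(last_subject, "SBV")] + list(clause))
        else:
            result.append(list(clause))  # nothing to carry forward yet
    return result


parsed = [
    [("he", "SBV"), ("has", "HED"), ("headache", "VOB")],
    [("also", "ADV"), ("has", "HED"), ("fever", "VOB")],
]
completed = supplement_subjects(parsed)
```

In this example the second clause becomes "he also has fever", while the first clause is left unchanged.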

As shown in FIG. 5, an embodiment of the present invention provides a schematic diagram of the dependency characteristics of entities in clauses.

The clause in FIG. 5 is: "Now his face is unsightly, as if he were ill."

Word segmentation yields a plurality of subwords: now, he, face, unsightly, as if, ill. In addition, punctuation marks in the clause can also be annotated.

In FIG. 5, the part of speech is annotated below each subword; for example, "he" is a pronoun and "unsightly" is an adjective.

The dependencies between the sub-words are annotated using a dependency arc.

A dependency structure has no non-terminal nodes; dependency relationships hold directly between words and form dependency pairs, in each of which one word is the core word, also called the dominant word, and the other is the modifier, also called the subordinate word.

For example, in FIG. 5, "unsightly" is the core word and "face" is its modifier.

A dependency relationship is represented by a directed arc, called a dependency arc. The direction of the dependency arc is from the subordinate word to the dominant word.

For example, in FIG. 5, the relationship between "unsightly" and "face" is a subject-verb relationship, and the dependency arc is directed from "unsightly" to "face".
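Dependency pairs of this kind can be held in a flat list in which every subword records the index of its dominant word. The sketch below is illustrative only: the Token layout and the function name dependency_pairs are hypothetical, and of the annotations shown only the subject-verb pair between "unsightly" and "face" and the parts of speech of "he" and "unsightly" come from the description of FIG. 5. All other labels, heads and parts of speech are assumed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    word: str
    pos: str   # part of speech
    head: int  # index of the dominant word; -1 marks the core word of the clause
    rel: str   # dependency relationship to the dominant word


# Assumed analysis of the FIG. 5 clause (see the note above on what is assumed).
tokens: List[Token] = [
    Token("now", "noun", 3, "adverbial"),
    Token("he", "pronoun", 2, "attributive"),
    Token("face", "noun", 3, "subject-verb"),
    Token("unsightly", "adjective", -1, "core"),
    Token("as if", "verb", 3, "coordinate"),
    Token("ill", "noun", 4, "verb-object"),
]


def dependency_pairs(items: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (dominant word, subordinate word, relationship) for every arc."""
    return [
        (items[t.head].word, t.word, t.rel)
        for t in items
        if t.head >= 0
    ]


pairs = dependency_pairs(tokens)
```

Because there are no non-terminal nodes, a clause of six subwords with one core word yields exactly five dependency pairs.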

Pre-training the word segmentation model/part-of-speech tagging model/dependency syntactic analysis model includes:

Training the word segmentation model/part-of-speech tagging model/dependency syntactic analysis model according to a general Chinese corpus, and primarily adjusting parameters of the word segmentation model/part-of-speech tagging model/dependency syntactic analysis model;

Training the word segmentation model/part of speech tagging model/dependency syntactic analysis model according to a pre-acquired electronic medical record sample set, and adjusting parameters of the preliminarily adjusted word segmentation model/part of speech tagging model/dependency syntactic analysis model.

Training a Chinese word segmentation model, a part-of-speech tagging model and a dependency syntax analysis model by using a general Chinese corpus, and performing fine tuning on a medical record text to obtain the word segmentation model/the part-of-speech tagging model/the dependency syntax analysis model.
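The two training stages can be illustrated with a deliberately small stand-in model. The sketch below uses a toy most-frequent-tag tagger rather than the models of the embodiment, whose architecture is not disclosed; the class name, the sample corpora and the weight given to the medical record samples are all assumptions. It shows only the order of operations: parameters are first set from a general corpus and then adjusted with medical record samples.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]


class MostFrequentTagger:
    """Toy tagger: assigns each word the tag it was most often seen with."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, corpus: List[TaggedSentence], weight: int = 1) -> None:
        """Add tag counts from a corpus; later calls refine earlier ones."""
        for sentence in corpus:
            for word, tag in sentence:
                self.counts[word][tag] += weight

    def tag(self, words: List[str]) -> List[Tuple[str, str]]:
        result = []
        for word in words:
            if word in self.counts:
                result.append((word, self.counts[word].most_common(1)[0][0]))
            else:
                result.append((word, "unknown"))
        return result


# Hypothetical corpora: "discharge" is a verb in general text, a noun in records.
general_corpus = [
    [("discharge", "verb"), ("the", "determiner"), ("battery", "noun")],
    [("discharge", "verb"), ("water", "noun")],
]
medical_samples = [[("wound", "noun"), ("discharge", "noun")]]

tagger = MostFrequentTagger()
tagger.train(general_corpus)              # stage 1: general Chinese corpus
before = tagger.tag(["discharge"])
tagger.train(medical_samples, weight=3)   # stage 2: medical record sample set
after = tagger.tag(["discharge"])
```

After the first stage "discharge" is tagged as a verb; after the second stage the medical usage dominates and it is tagged as a noun.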

On the basis of paragraph segmentation and clause segmentation of a text medical record, a Chinese word segmentation, part-of-speech tagging and dependency syntactic analysis technology and a syntactic structure complement algorithm based on a pre-training model are used for structuring the segmented clauses, and entities in the clauses and dependency relationship characteristics of the entities are determined.

Step S104, extracting the entity which accords with the preset dependency relationship in each clause according to the dependency relationship characteristics of the entity, and filling the entity into the corresponding entity position of the preset entity structure to obtain the structured entity corresponding to each clause, wherein the preset entity structure comprises different entity positions and the preset dependency relationship exists among the different entity positions.

As shown in FIG. 6, an embodiment of the present invention provides a schematic diagram of a preset entity structure.

The preset entity structure shown in FIG. 6 defines three entity positions, namely entity position 1, entity position 2 and entity position 3. The parts of speech of the three entity positions are noun, verb and noun respectively; a subject-verb relationship exists between entity position 1 and entity position 2, and a verb-object relationship exists between entity position 2 and entity position 3.

It should be noted that the above preset entity structure is only an example and does not limit the embodiments of the present invention; its specific structure may be changed according to the specific implementation, for example by adjusting the number of entity positions or the relationships between the entity positions.

It should be noted that, according to the language structure characteristics of each data set, a corresponding syntax rule is formulated, and the preset dependency relationship between different entity positions and corresponding matched entities is set in the syntax rule.

For example, in fig. 6, a syntax rule is formulated, and an entity conforming to the structured entity in fig. 6 is extracted.

Specifically, the rule specifies that subwords having a subject-predicate-object relationship are extracted from a clause, with the subject matched to entity position 1, the predicate matched to entity position 2, and the object matched to entity position 3. The parts of speech of the subject and the object are nouns, and the part of speech of the predicate is a verb.

Extracting entities conforming to the preset dependency relationship in each clause according to the dependency relationship characteristics of the entities; it should be noted that, when extracting the entity meeting the preset dependency relationship in each clause, the extraction may succeed or fail for any clause, or only a part of the entity meeting the requirement may be extracted.

For example, according to the above syntax rule, "arm", "scratched" and "ten minutes" are extracted from the clause "his arm is scratched for ten minutes"; this is a case of successful extraction.

For example, with the entity structure according to FIG. 7, only the entities "as if" and "ill", which meet the requirements of entity position 2 and entity position 3, can be extracted from the clause given in FIG. 6; this is a case of extracting only part of the entities.

If there are no satisfactory entities, then the extraction fails.
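The three outcomes described above (success, partial extraction and failure) can be sketched with a rule that fills the three entity positions from a dependency analysis. The sketch is a simplified illustration: the Token layout, the labels "SBV", "VOB" and "HED", the function name extract and the analyses of the sample clauses are assumptions and are not the parser output of the embodiment.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    word: str
    pos: str
    head: int  # index of the dominant word, -1 for the core word
    rel: str   # assumed labels: "SBV" subject-verb, "VOB" verb-object


Triple = Tuple[Optional[str], Optional[str], Optional[str]]


def extract(tokens: List[Token]) -> Tuple[Triple, str]:
    """Fill positions 1-3 (noun, verb, noun) and report how many were filled."""
    best: Triple = (None, None, None)
    for i, verb in enumerate(tokens):
        if verb.pos != "verb":
            continue
        subj = next((t.word for t in tokens
                     if t.head == i and t.rel == "SBV" and t.pos == "noun"), None)
        obj = next((t.word for t in tokens
                    if t.head == i and t.rel == "VOB" and t.pos == "noun"), None)
        if subj is None and obj is None:
            continue  # a bare verb satisfies none of the preset relationships
        candidate: Triple = (subj, verb.word, obj)
        if sum(x is not None for x in candidate) > sum(x is not None for x in best):
            best = candidate
    filled = sum(x is not None for x in best)
    status = "success" if filled == 3 else "partial" if filled > 0 else "failure"
    return best, status


full = extract([
    Token("his", "pronoun", 1, "ATT"),
    Token("arm", "noun", 2, "SBV"),
    Token("scratched", "verb", -1, "HED"),
    Token("ten minutes", "noun", 2, "VOB"),
])
partial = extract([Token("as if", "verb", -1, "HED"), Token("ill", "noun", 0, "VOB")])
failed = extract([Token("pain", "noun", -1, "HED")])
```

With these assumed analyses, the first clause fills all three positions, the second fills only entity positions 2 and 3, and the third fills none.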

As an alternative embodiment, the above structured entity is a triplet structure.

Specifically, the above structured entity is a triplet structure of <site, symptom, manifestation>.

The site refers to a part of the human body, such as the arm, heart or skin.

The symptom refers to the type of abnormality at the site, such as fever, bleeding or redness.

The manifestation refers to the degree of abnormality of the symptom, for example 38 degrees Celsius, three times a day, or one week.

After the above structured entities are extracted, simple diagnosis reasoning can be performed according to the structured entities, for example, the structured entities related to diseases and complications are extracted, and reasoning of the diseases and complications can be realized; the extraction of structured entities associated with symptoms and diseases allows for simple diagnostic reasoning of disease types.

Fig. 7 is a schematic diagram of a structured entity according to an embodiment of the present invention.

The structured entity is < arm, scratch, ten minutes >.

After obtaining the structuring entity corresponding to each clause, the method further comprises the following steps:

Quantifying some of the entities in the structured entity and extracting key-value pairs having an association relationship, wherein the key-value pairs comprise numerical values.

For example, the symptom and the manifestation in the above <site, symptom, manifestation> triplet structure are further quantified to obtain a key-value pair in the form <key, value>.

For example, extracting the associated key-value pair from the structured entity <human body, fever, 39 ℃> yields the key-value pair <fever, 39 ℃>.
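The quantification step can be sketched as follows, assuming that a key-value pair is produced only when the manifestation contains a numeral written in digits. The function name to_key_value and the digit-based test are assumptions; the embodiment does not state how numerical values are recognised.

```python
import re
from typing import Optional, Tuple

# Assumed test for a numerical value: an integer or decimal written in digits.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def to_key_value(entity: Tuple[str, str, str]) -> Optional[Tuple[str, str]]:
    """Map a (site, symptom, manifestation) triplet to (symptom, manifestation)
    when the manifestation carries a numerical value; otherwise return None."""
    _site, symptom, manifestation = entity
    if _NUMBER.search(manifestation):
        return (symptom, manifestation)
    return None


fever_pair = to_key_value(("human body", "fever", "39 ℃"))
no_pair = to_key_value(("arm", "scratch", "ten minutes"))
```

Under this assumption the triplet with "39 ℃" yields the pair (fever, 39 ℃), whereas "ten minutes", being spelled out in words, yields no pair; a fuller implementation would also need to normalise numerals written as words.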

It should be noted that after obtaining the structured entity corresponding to each clause, the method further includes:

and storing the structured entity into a data structure library.

And when the structured entity is stored, the structured entity is sent to a corresponding index item for subsequent data statistics mining.
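Storing structured entities under index items can be sketched with an in-memory inverted index. The class name EntityStore, its methods and the choice of indexing every component of the triplet are hypothetical; the embodiment does not describe the layout of the data structure library.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Entity = Tuple[Optional[str], Optional[str], Optional[str]]


class EntityStore:
    """Minimal in-memory stand-in for a data structure library with index items."""

    def __init__(self) -> None:
        self.entities: List[Entity] = []
        self.index: Dict[str, List[Entity]] = defaultdict(list)

    def add(self, entity: Entity) -> None:
        """Store the entity and register it under each filled component."""
        self.entities.append(entity)
        for component in entity:
            if component is not None:
                self.index[component].append(entity)

    def lookup(self, term: str) -> List[Entity]:
        """Return every stored entity that contains `term` as a component."""
        return list(self.index.get(term, []))


store = EntityStore()
store.add(("arm", "scratch", "ten minutes"))
store.add(("human body", "fever", "39 ℃"))
fever_entities = store.lookup("fever")
```

Subsequent statistics, such as counting how often a symptom occurs, can then be computed from the index items without rescanning the medical record text.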

Example 2

An embodiment of the present invention provides a schematic diagram of an electronic medical record structured processing device 800, including a memory 801 and a processor 802, as shown in fig. 8, wherein:

the memory is used for storing a computer program;

An embodiment of the present invention provides a schematic diagram of an electronic medical record structuring processing device, as shown in fig. 9, including:

A medical record obtaining unit 901, configured to obtain an electronic medical record set including a plurality of medical record texts;

The paragraph dividing unit 902 is configured to parse the topics and the associated contents thereof in the medical record text according to the preset topics and the characteristics of the contents associated with the topics, and divide each parsed topic and the contents associated with the topic into paragraphs;

A clause processing unit 903, configured to divide each paragraph into clauses, and perform dependency syntax analysis on each clause obtained by the division, to determine an entity in each clause and a dependency characteristic of the entity;

The structure extracting unit 904 is configured to extract, according to the dependency characteristics of the entities, entities that conform to a preset dependency relationship in each clause, and fill the entities to corresponding entity positions of a preset entity structure, so as to obtain a structured entity corresponding to each clause, where the preset entity structure includes different entity positions and preset dependency relationships exist between the different entity positions.

The present invention also provides a computer program medium having a computer program stored thereon, which when executed by a processor, implements the steps of a method for structuring an electronic medical record provided in the above-mentioned embodiment 1.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be stored by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Drive (SSD)).

The above description has been made in detail for the technical solutions provided by the present application, and specific examples are applied in the present application to illustrate the principles and embodiments of the present application, and the above examples are only used to help understand the method and core ideas of the present application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. The method for structuring the electronic medical record is characterized by comprising the following steps of:

Dividing each analyzed theme and associated content into paragraphs, and filling the paragraphs into corresponding slots to obtain corresponding structured data sets;

Extracting entities which accord with preset dependency relations in each clause according to the dependency relation characteristics of the entities, and filling the entities into corresponding entity positions of a preset entity structure to obtain structured entities corresponding to each clause, wherein the preset entity structure comprises different entity positions and preset dependency relations exist among the different entity positions;

determining the slot group structure body according to the structures corresponding to different topics and associated contents in the medical record template comprises the following steps:

Mining the topics in the medical record template and the structural relations among the topics, and determining the corresponding slots and the structural relations among the slots according to the topics and the structural relations, wherein the structural relations comprise parallel relations, containing relations and selecting relations;

Constructing a slot group structure body of a tree structure according to the structural relation between the slot positions;

dividing the medical record text into medical record template types according to the content types of the medical record text in the electronic medical record set;

2. The method according to claim 1, further comprising, after obtaining the structured entity corresponding to each clause:

3. The method of claim 1, wherein performing dependency syntax analysis on each clause obtained by partitioning to determine entities in each clause and dependency characteristics of the entities, comprises:

4. The method according to claim 1, wherein extracting the entity conforming to the preset dependency relationship in each clause according to the dependency relationship characteristic of the entity, and filling the entity into the corresponding entity position of the preset entity structure, to obtain the structured entity corresponding to each clause, comprises:

5. An electronic medical record structured processing device, comprising a memory and a processor, wherein:

the memory is used for storing a computer program;

The processor is configured to read the program in the memory and execute the method for structuring an electronic medical record according to any one of claims 1 to 4.

6. An electronic medical record structured processing device, comprising:

the paragraph dividing unit is used for analyzing the corresponding topics in the medical record text according to the characteristics of the topics mapped by the slots in the slot group structure body, wherein the slot group structure body is a structure which is determined according to the structures corresponding to different topics and the related contents in the medical record template and comprises the slots for mapping the different topics and the corresponding structural relations among the slots; determining content associated with the parsed subject in the medical record text according to the parsed subject; dividing each analyzed theme and associated content into paragraphs, and filling the paragraphs into corresponding slots to obtain corresponding structured data sets;

The structure extraction unit is used for extracting the entity which accords with the preset dependency relationship in each clause according to the dependency relationship characteristics of the entity, and filling the entity into the corresponding entity position of the preset entity structure to obtain the structured entity corresponding to each clause, wherein the preset entity structure comprises different entity positions and the preset dependency relationship exists among the different entity positions;

7. A computer program medium, characterized in that a computer program is stored thereon, which program, when being executed by a processor, realizes the steps of a method for structuring an electronic medical record according to any of claims 1-4.