CN109388807B - Method, device and storage medium for identifying named entities of electronic medical records

Info

Publication number
CN109388807B
Authority
CN
China
Prior art keywords
vector matrix
electronic medical
processing
character
medical record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811282557.3A
Other languages
Chinese (zh)
Other versions
CN109388807A (en
Inventor
任江涛
殷明旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201811282557.3A priority Critical patent/CN109388807B/en
Publication of CN109388807A publication Critical patent/CN109388807A/en
Application granted granted Critical
Publication of CN109388807B publication Critical patent/CN109388807B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Biophysics (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method for identifying named entities in electronic medical records, comprising the following steps: generating a word vector matrix and a radical vector matrix corresponding to the character sequence of an electronic medical record whose named entities are to be identified; inputting the radical vector matrix into a convolutional neural network layer for processing to obtain a radical convolution vector matrix corresponding to the character sequence; generating a word feature vector matrix from the word vector matrix and the radical convolution vector matrix; and inputting the word feature vector matrix into a bidirectional long-short term memory network for processing to obtain the named entity identification result of the electronic medical record. The invention also discloses an electronic medical record named entity recognition device and a storage medium. By extracting the internal morphological features of the characters of the electronic medical record and feeding both the character features and these internal morphological features into a deep neural network to predict the character labels, the invention provides a method for identifying named entities in electronic medical records with high identification accuracy.

Description

Method, device and storage medium for identifying named entities of electronic medical records
Technical Field
The invention relates to the technical field of computers, in particular to a method for identifying named entities of electronic medical records, a device for identifying the named entities of the electronic medical records and a computer storage medium.
Background
With the vigorous socioeconomic development of China and the steady improvement of living standards, people's health awareness keeps growing, and how to build an intelligent medical system from the large amount of available medical data has become an urgent social need. The electronic medical record is the medical text that contains the most medical data and the most information. It is highly specialized: it is written by a professional doctor for an individual patient, records in detail the symptoms observed from admission to discharge, the diseases diagnosed by the doctor and the corresponding treatments, and also contains a great deal of other medical information such as the results of examination reports. Many intelligent medical information systems are therefore built on the information in electronic medical records. When such a system is constructed, named entity recognition is the foundation of information extraction from large amounts of medical data, and it is very important for information processing and management systems across the medical field.
The prior art includes deep-learning-based named entity recognition methods for the medical field, in which a neural network model extracts context information between characters or words and outputs a probability distribution over entity categories. However, these methods rely only on character vectors or word vectors, so the representation of each character or word is incomplete and the deeper information hidden inside the character or word is not considered; as a result, the recognition performance is poor.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The main object of the invention is to provide an electronic medical record named entity identification method, an electronic medical record named entity identification device and a computer storage medium, so as to solve the technical problem that deep-learning-based methods in the prior art rely only on character vectors or word vectors, do not consider the deep information hidden inside the characters or words, and therefore achieve poor recognition performance.
In order to achieve the above object, the present invention provides a method for identifying named entities of electronic medical records, which comprises the following steps:
generating a word vector matrix corresponding to the character sequence of the electronic medical record of the named entity to be identified;
generating a radical vector matrix corresponding to the character sequence;
inputting the radical vector matrix into a first neural network for processing to obtain a radical convolution vector matrix corresponding to the character sequence, wherein the first neural network comprises a convolutional neural network layer;
generating a word feature vector matrix according to the word vector matrix and the radical convolution vector matrix;
inputting the word feature vector matrix into a second neural network for processing to obtain a named entity recognition result of the electronic medical record, wherein the second neural network comprises a bidirectional long-short term memory network layer;
and the parameters of the first neural network and the second neural network are obtained by training according to the electronic medical record of the identified named entity.
Preferably, the step of generating the radical vector matrix corresponding to the character sequence includes:
acquiring Chinese character components of each character in the character sequence;
generating radical vectors of the characters according to the Chinese character components;
and generating a radical vector matrix corresponding to the character sequence according to the radical vector of each character.
Preferably, the second neural network further includes a fully connected layer, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the named entity identification result of the electronic medical record includes:
inputting the word feature vector matrix into the bidirectional long-short term memory network layer for processing to obtain a hidden vector matrix corresponding to the character sequence;
and inputting the hidden vector matrix into the fully connected layer for processing to obtain the named entity recognition result of the electronic medical record.
Preferably, the second neural network further includes a self-attention mechanism layer, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the named entity identification result of the electronic medical record includes:
inputting the word feature vector matrix into the bidirectional long-short term memory network layer for processing to obtain a hidden vector matrix corresponding to the character sequence;
and inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain the named entity recognition result of the electronic medical record.
Preferably, the second neural network further includes a self-attention mechanism layer and a conditional random field model, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic medical record includes:
inputting the word feature vector matrix into the bidirectional long-short term memory network layer for processing to obtain a hidden vector matrix corresponding to the character sequence;
inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain a prediction matrix corresponding to the character sequence;
and inputting the prediction matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic medical record.
Preferably, the self-attention mechanism layer includes a fully connected layer, and the step of inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain the prediction matrix corresponding to the character sequence includes:
calculating attention weights of the hidden vectors in the hidden vector matrix;
generating an attention vector matrix according to the attention weights and the hidden vectors;
generating an attention hidden vector matrix according to the hidden vector matrix and the attention vector matrix;
and inputting the attention hidden vector matrix into the fully connected layer for processing to obtain the prediction matrix corresponding to the character sequence.
Preferably, the step of calculating attention weights of hidden vectors in the hidden vector matrix comprises:
calculating the dependency relationship between the hidden vectors in the hidden vector matrix according to the following formula:
$$f_{t,t'} = \sigma\left(w_a \tanh\left(w_t h_t + w_{t'} h_{t'}\right)\right),$$
where t and t' denote different time steps, $w_a$, $w_t$ and $w_{t'}$ are weight vectors, $\sigma$ is the sigmoid function, and $h_t$ and $h_{t'}$ are the hidden vectors at time steps t and t';
calculating, according to the following formulas, the attention weight corresponding to each hidden vector $h_k$ in the hidden vector matrix:
Figure BDA0001846805320000031
Figure BDA0001846805320000032
Wherein e is an exponential function, N is the number of the hidden vectors,
Figure BDA0001846805320000033
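The dependency score f between two hidden vectors can be illustrated with a short sketch. It covers only the first formula above; the attention-weight formulas are not reproduced in the text and are therefore not implemented here. The names `dependency` and `dependency_matrix` are hypothetical, and treating w_t and w_t' as small matrices (so that the tanh argument is a vector that w_a reduces to a scalar) is an assumption about shapes that the text does not state.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dependency(h_t, h_u, W_t, W_u, w_a):
    """f = sigmoid(w_a . tanh(W_t h_t + W_u h_u)); shapes are an assumption."""
    a = matvec(W_t, h_t)
    b = matvec(W_u, h_u)
    hidden = [math.tanh(p + q) for p, q in zip(a, b)]
    return sigmoid(sum(w * x for w, x in zip(w_a, hidden)))

def dependency_matrix(H, W_t, W_u, w_a):
    """Score every pair of time steps (t, t') in the hidden vector matrix H."""
    return [[dependency(h_t, h_u, W_t, W_u, w_a) for h_u in H] for h_t in H]

# Toy values chosen only for illustration.
H = [[1.0, 0.0], [0.0, 1.0]]
W_t = [[0.5, -0.5], [0.25, 0.25]]
W_u = [[0.1, 0.2], [-0.3, 0.4]]
w_a = [1.0, -1.0]
F = dependency_matrix(H, W_t, W_u, w_a)
```

Because of the sigmoid, every score lies strictly between 0 and 1, and the score is exactly 0.5 when all weights are zero.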
Preferably, the second neural network further includes a conditional random field model, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic medical record includes:
inputting the word feature vector matrix into the bidirectional long-short term memory network layer for processing to obtain a hidden vector matrix corresponding to the character sequence;
and inputting the hidden vector matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic medical record.
In addition, in order to achieve the above object, the present invention further provides an electronic medical record named entity recognition apparatus, including: a memory, a processor, and an electronic medical record named entity recognition processing program that is stored in the memory and executable on the processor, wherein the processing program, when executed by the processor, implements the steps of the method for identifying named entities of electronic medical records described above.
In addition, in order to achieve the above object, the present invention further provides a computer storage medium, wherein the computer storage medium stores a processing program for named entity identification of an electronic medical record, and the processing program for named entity identification of an electronic medical record is executed by a processor to implement the steps of the method for named entity identification of an electronic medical record as described above.
The method for identifying named entities of electronic medical records, the device for identifying named entities of electronic medical records and the computer storage medium provided by the embodiments of the invention generate a word vector matrix and a radical vector matrix corresponding to the character sequence of the electronic medical record of the named entity to be identified, input the radical vector matrix into a convolutional neural network layer for processing to obtain the radical convolution vector matrix corresponding to the character sequence, generate a word feature vector matrix from the word vector matrix and the radical convolution vector matrix, and input the word feature vector matrix into a bidirectional long-short term memory network for processing to obtain the named entity identification result of the electronic medical record. By extracting the internal morphological features of the characters of the electronic medical record and feeding both the character features and these internal morphological features into a deep neural network to predict the character labels, the invention provides a method for identifying named entities in electronic medical records with high identification accuracy.
Drawings
FIG. 1 is a schematic diagram of an apparatus in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for identifying named entities in an electronic medical record according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a convolutional neural network processing procedure according to a first embodiment of the method for identifying named entities in electronic medical records of the present invention;
FIG. 4 is a schematic diagram of a neural network system processing procedure according to a first embodiment of the method for identifying named entities in electronic medical records of the present invention;
FIG. 5 is a flowchart illustrating a method for identifying named entities in an electronic medical record according to a second embodiment of the present invention;
FIG. 6 is a schematic diagram of a neural network system processing procedure according to a second embodiment of the method for identifying named entities in electronic medical records of the present invention;
FIG. 7 is a flowchart illustrating a method for identifying named entities in an electronic medical record according to a third embodiment of the present invention;
FIG. 8 is a diagram illustrating a processing procedure of a neural network system according to a third embodiment of the method for identifying named entities in electronic medical records of the present invention;
FIG. 9 is a flowchart illustrating a method for identifying named entities in an electronic medical record according to a fourth embodiment of the invention;
FIG. 10 is a diagram illustrating a processing procedure of a neural network system according to a fourth embodiment of the method for identifying named entities in electronic medical records of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention.
The terminal of the embodiment of the invention may be a PC, or a mobile terminal device with a display function, such as a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, or a portable computer.
As shown in fig. 1, the terminal may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module, and the like. The sensors are, for example, light sensors, motion sensors and other sensors. Specifically, the light sensors may include an ambient light sensor, which can adjust the brightness of the display screen according to the brightness of the ambient light, and a proximity sensor, which can turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and can detect the magnitude and direction of gravity when the mobile terminal is stationary; it can be used for applications that recognize the attitude of the mobile terminal (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer and tap detection). Of course, the mobile terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, which are not described here again.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a type of computer storage medium, can include an operating system, a network communication module, a user interface module, and an electronic medical record named entity recognition processing program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke the electronic medical record named entity recognition processing program stored in the memory 1005 and execute the steps of the electronic medical record named entity recognition method.
Referring to fig. 2, a first embodiment of the present invention provides a method for identifying named entities of electronic medical records, where the method includes:
Step S10: generating a word vector matrix corresponding to the character sequence of the electronic medical record of the named entity to be identified.
First, the character sequence in the current medical history content of the electronic medical record of the named entity to be identified is acquired. The method for identifying named entities of electronic medical records provided by this embodiment combines a convolutional neural network model (CNN) and a bidirectional long-short term memory network model (Bi-LSTM), and these network models can only process numerical input; the acquired character sequence therefore needs to be converted into vector form.
The word vectors corresponding to a character sequence can be obtained from word vectors trained in advance, for example with Google's word2vec vector representation method, which projects characters into a low-dimensional space in which characters or words with similar semantics lie relatively close together. For example, in this low-dimensional space the distance between the words "China" and "Guangzhou" is much smaller than the distance between "China" and "computer".
To obtain accurate word vectors with the word2vec vector representation method, 10,000 electronic medical records are used as the training corpus, and the Skip-Gram model in word2vec is used for training. Although the Skip-Gram model trains more slowly than the CBOW model, it performs better than CBOW on corpora containing rare characters, so the resulting word vectors match the character sequences of electronic medical records better.
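As a rough illustration of what Skip-Gram trains on, the sketch below generates the (center, context) character pairs that such a model learns to predict. It shows only the pair generation, not the training itself; the function name `skipgram_pairs` and the window size are assumptions made for this example and do not come from the patent.

```python
def skipgram_pairs(chars, window=2):
    """Return (center, context) pairs: each character paired with its neighbours
    that lie within `window` positions on either side."""
    pairs = []
    for i, center in enumerate(chars):
        lo = max(0, i - window)
        hi = min(len(chars), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, chars[j]))
    return pairs

# Toy sequence; a real corpus would be the characters of the medical records.
pairs = skipgram_pairs(list("abcd"), window=1)
```

In Skip-Gram the center character is the input and each context character is a prediction target, whereas CBOW predicts the center from the combined context.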
Specifically, when the word vectors corresponding to a character sequence are obtained with the word2vec vector representation method, this can be implemented by indexing. For example, if the character sequence of the electronic medical record is C = (C1, C2 … Cn), where n is the length of the input character sequence, a character index is generated according to the position of each character in the sequence. Once the pre-trained word vectors are available, the word vector corresponding to each character is obtained by looking up the character index in a table, which yields the word vector sequence x = (x1, x2 … xn), where $x \in \mathbb{R}^{n \times d}$ and d is the dimension of the word vector space.
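The index-based lookup can be sketched as follows. This is a minimal illustration under stated assumptions: the vocabulary index, the reserved `<UNK>` row for unseen characters, and the randomly filled table standing in for pre-trained word2vec vectors are all hypothetical.

```python
import random

def build_index(corpus_chars):
    """Map each distinct character to an integer index; 0 is reserved for unknown."""
    index = {"<UNK>": 0}
    for ch in corpus_chars:
        if ch not in index:
            index[ch] = len(index)
    return index

def lookup(chars, index, table):
    """Return the n x d word vector sequence for a character sequence."""
    return [table[index.get(ch, 0)] for ch in chars]

random.seed(0)
d = 4  # word vector dimension (illustrative)
index = build_index("头痛三天")
# Stand-in for a pre-trained embedding table: one d-dimensional row per index.
table = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(len(index))]
x = lookup("头痛", index, table)  # 2 characters -> 2 x d matrix
```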
Step S20: generating a radical vector matrix corresponding to the character sequence.
In common neural-network-based named entity recognition methods, the character vectors or word vectors corresponding to the text to be recognized are usually input into a neural network model for label prediction. However, the amount of information that a character or word expresses by itself is limited, so relying only on character vectors or word vectors limits how far the accuracy of named entity recognition can be improved.
In view of this drawback of the prior art, the inventive concept of the present invention is to mine the deeper information that may exist inside a character or word. Because Chinese characters developed from pictographs, many characters still retain their original meanings, and many characters with similar shapes have similar meanings, for example two similarly shaped characters that both mean "disease", or two that both mean "pain". The morphological information of the characters can therefore be taken as the input of a neural network, which extracts features from it and supplies the deeper information inside the characters or words to the subsequent label prediction.
Intuitively, the radicals that make up a character reflect its form to a certain extent, so the radical composition of a character can be acquired as its morphological information; for example, the radical composition of the character "和" ("and") is acquired as the morphological information of that character.
Specifically, when the radical composition of a character is acquired, each radical is regarded as an independent component of the character; for example, "禾" (hé) and "口" (kǒu) are respectively the left and right components of the character "和" (hé). A corresponding radical vector is generated for each radical, so that the radical sequence of a character, which comprises several radicals, corresponds to a radical vector sequence, and this radical vector sequence is equivalent to a two-dimensional radical vector matrix. For the character sequence of the named entity to be recognized, the two-dimensional radical vector matrices of the individual characters together form a three-dimensional radical vector matrix corresponding to the character sequence.
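The construction of the three-dimensional radical vector matrix can be sketched as below. The tiny decomposition table `RADICALS`, the padding symbol, the fixed length `max_len` and the random initialization are assumptions for illustration; a real system would load a complete decomposition dictionary.

```python
import random

# Hypothetical decomposition table with a few entries only.
RADICALS = {"和": ["禾", "口"], "痛": ["疒", "甬"], "口": ["口"]}
PAD = "<PAD>"

def radical_tensor(chars, max_len, dim, seed=0):
    """Return (tensor, vocab): tensor has shape n x max_len x dim, where each
    character contributes a max_len x dim matrix of radical vectors."""
    rng = random.Random(seed)
    vocab = {PAD: [0.0] * dim}  # padding maps to the zero vector
    tensor = []
    for ch in chars:
        parts = RADICALS.get(ch, [ch])[:max_len]      # unknown char: treat as its own radical
        parts = parts + [PAD] * (max_len - len(parts))  # pad to the fixed length
        rows = []
        for p in parts:
            if p not in vocab:
                # Randomly initialized radical vector, shared by all occurrences.
                vocab[p] = [rng.uniform(-1, 1) for _ in range(dim)]
            rows.append(vocab[p])
        tensor.append(rows)
    return tensor, vocab

tensor, vocab = radical_tensor(list("和口"), max_len=3, dim=5)
```

The same radical always maps to the same vector, so "口" as a component of "和" and "口" as a standalone character share one row of parameters.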
Step S30: inputting the radical vector matrix into a first neural network for processing to obtain a radical convolution vector matrix corresponding to the character sequence, wherein the first neural network includes a convolutional neural network layer.
In this embodiment, a convolutional neural network (CNN) is used for feature extraction. For example, as shown in fig. 3, for the radical sequence of the character "pain" to be recognized, a fixed-length storage space is allocated for the radical sequence during processing, so the radical sequence includes radical padding. After the radical vectors of the radical sequence of the character to be recognized are obtained, the radical vector matrix is input into the CNN layer of the first neural network, and after convolution by the convolutional layers, pooling by the pooling layers and processing by the fully connected layers, a radical convolution vector matrix containing the internal morphological information of the characters is output. It should be noted that the CNN may include several convolutional layers, several pooling layers and several fully connected layers; this embodiment does not limit the structure.
It is understood that the radical vector may be initialized randomly without using a trained vector, and the radical vector is also trained as a parameter in the first neural network.
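A minimal sketch of the convolution and pooling over one character's radical vector matrix is given below. It assumes a single convolutional layer with ReLU followed by max pooling over positions and omits the fully connected layer; the kernel values, the ReLU activation and the function name `conv1d_maxpool` are illustrative assumptions, not details given in the patent.

```python
def conv1d_maxpool(radical_matrix, filters, bias):
    """radical_matrix: L x d list of radical vectors for one character.
    filters: list of kernels, each k x d. Returns one pooled feature per kernel."""
    length = len(radical_matrix)
    features = []
    for kernel, b in zip(filters, bias):
        k = len(kernel)
        responses = []
        for start in range(length - k + 1):
            s = b
            # Dot product between the kernel and a window of k radical vectors.
            for i in range(k):
                for x, w in zip(radical_matrix[start + i], kernel[i]):
                    s += x * w
            responses.append(max(0.0, s))  # ReLU
        features.append(max(responses))    # max pooling over window positions
    return features

# Three radical slots (the last one is padding), radical dimension 2.
radical_matrix = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
features = conv1d_maxpool(radical_matrix, filters, [0.0, 0.0])
```

Applying this to every character's radical matrix yields one feature vector per character, which together play the role of the radical convolution vector matrix.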
Step S40: generating a word feature vector matrix according to the word vector matrix and the radical convolution vector matrix.
The above steps process the character sequence in the electronic medical record of the named entity to be identified to obtain the corresponding word vector matrix and radical convolution vector matrix. Because both matrices contain feature information of the character sequence, an overall word feature vector matrix needs to be generated from the two.
Specifically, the vectors in the word vector matrix and the vectors in the radical convolution vector matrix are concatenated. For example, a character sequence C (C1, C2 … Cn) corresponds to a word vector matrix X (X1, X2 … Xn) and a radical convolution vector matrix Y (Y1, Y2 … Yn), where X1, X2 … Xn and Y1, Y2 … Yn are vectors. The word vector and the radical convolution vector corresponding to character Ci are Xi and Yi respectively, and concatenating Xi and Yi gives a new vector Zi. Concatenating the word vector and the radical convolution vector of every character in C in this way yields the corresponding word feature vector matrix Z (Z1, Z2 … Zn).
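The concatenation step can be sketched in a few lines. The function name `concat_features` and the toy numbers are hypothetical; the only point illustrated is that Zi is Xi followed by Yi for each character.

```python
def concat_features(X, Y):
    """Concatenate the word vector Xi and the radical convolution vector Yi
    of each character into one feature vector Zi."""
    if len(X) != len(Y):
        raise ValueError("X and Y must describe the same number of characters")
    return [list(x) + list(y) for x, y in zip(X, Y)]

X = [[1.0, 2.0], [3.0, 4.0]]  # word vectors, d = 2
Y = [[5.0], [6.0]]            # radical convolution vectors, 1 feature each
Z = concat_features(X, Y)     # each Zi has dimension 2 + 1 = 3
```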
Step S50: inputting the word feature vector matrix into a second neural network for processing to obtain the named entity recognition result of the electronic medical record, wherein the second neural network includes a bidirectional long-short term memory network layer.
Since named entity recognition is a sequence labeling problem, the second neural network in this embodiment uses a bidirectional long-short term memory network (Bi-LSTM) to extract the context information of the sequence. The long-short term memory network (LSTM) is a kind of recurrent neural network (RNN); it addresses the vanishing/exploding gradient problem of the plain RNN as well as the RNN's inability to capture long-term dependencies in a sequence.
The Bi-LSTM employed in this embodiment comprises a forward LSTM network and a backward LSTM network. The word feature vector matrix Z (Z1, Z2 … Zn) generated from the word vector matrix and the radical convolution vector matrix contains the feature vectors of the n characters in the character sequence. The feature vectors of the n characters are input into the forward LSTM network from left to right, and the hidden vector corresponding to the feature vector of each character is output in sequence
Figure BDA0001846805320000091
Similarly, the feature vectors of the n characters are sequentially input into the backward LSTM network from right to left, and another hidden vector corresponding to the feature vector of each character is sequentially output
Figure BDA0001846805320000092
It can be understood that the feature vectors of n characters are processed by the bidirectional long-short term memory network, and context information of the sequence can be acquired, which is more comprehensive than information acquired by the unidirectional long-short term memory network. Splicing two hidden vectors corresponding to the feature vector of each character to obtain a bidirectional hidden vector
Figure BDA0001846805320000093
And the bidirectional hidden vectors corresponding to the feature vectors of all the characters are put into the same matrix to generate a total hidden vector matrix.
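The following sketch illustrates only how the forward and backward passes are aligned and spliced per character. It does not implement an LSTM cell: `toy_step` is a hypothetical stand-in (a running sum) used in place of the LSTM update so that the example stays short, and all names are assumptions.

```python
from typing import Callable, List

Vector = List[float]
Step = Callable[[Vector, Vector], Vector]  # (previous state, input) -> new state


def run_direction(step: Step, inputs: List[Vector], state_size: int,
                  reverse: bool = False) -> List[Vector]:
    """Run a recurrent step over the sequence and store one state per position."""
    n = len(inputs)
    order = range(n - 1, -1, -1) if reverse else range(n)
    state = [0.0] * state_size
    outputs: List[Vector] = [[] for _ in range(n)]
    for t in order:
        state = step(state, inputs[t])
        outputs[t] = state  # stored at position t regardless of reading direction
    return outputs


def bidirectional(step_fw: Step, step_bw: Step, inputs: List[Vector],
                  state_size: int) -> List[Vector]:
    """Splice the forward and backward hidden vectors of each character."""
    forward = run_direction(step_fw, inputs, state_size)
    backward = run_direction(step_bw, inputs, state_size, reverse=True)
    return [f + b for f, b in zip(forward, backward)]


def toy_step(state: Vector, x: Vector) -> Vector:
    # Stand-in for an LSTM cell: accumulates the sum of everything read so far.
    return [state[0] + sum(x)]


Z = [[1.0], [2.0], [3.0]]
H = bidirectional(toy_step, toy_step, Z, state_size=1)
```

In `H`, the first component of each row summarizes the characters to the left (inclusive) and the second component summarizes the characters to the right (inclusive), which is the sense in which the bidirectional hidden vector carries context from both sides.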
Further, the second neural network also comprises a full connection layer which is used for processing a total hidden vector matrix output by the bidirectional long-short term memory network and finally obtaining a probability matrix corresponding to the character sequence of the named entity to be identified. How to obtain the final named entity recognition result according to the probability matrix is explained next.
Named entity recognition, also called named recognition, refers to recognizing entities with specific meaning in text. For the electronic medical records to be recognized in this embodiment, such entities include body parts, examinations, diseases, symptoms, treatments, and the like.
Named entity recognition typically requires solving two problems: the first is entity boundary identification, namely word segmentation; the second is determining the entity class. Both problems can be solved by training a neural network on labeled data and then using the network to predict a label for each character of the text to be recognized. Various labeling schemes can be adopted, such as the IOB labeling scheme or the BIOES labeling scheme.
In this embodiment, the BIOES labeling scheme is used when identifying the named entities of an electronic medical record, and 15 tags are defined: B-BodyPart, I-BodyPart, E-BodyPart, B-Check, I-Check, E-Check, B-Disease, I-Disease, E-Disease, B-Symptom, I-Symptom, E-Symptom, B-Treatment, I-Treatment, and E-Treatment. The prefix of a tag gives the position of the character within an entity: B- indicates the beginning of the entity, I- its interior, and E- its end. The suffix gives the entity class: BodyPart denotes a "body part" entity, Check a "test examination" entity, Disease a "disease" entity, Symptom a "symptom" entity, and Treatment a "treatment" entity. For example, the B-BodyPart tag indicates the beginning of a "body part" entity, the I-BodyPart tag its interior, and the E-BodyPart tag its end.
Each probability value in the probability matrix obtained in the above step is a label classification probability predicted for the character sequence. For example, when the above 15 tags are defined, each character in the character sequence has 15 probability values, one for each tag, and the tag with the highest probability value is selected as the predicted tag of that character. After the predicted tag of every character in the character sequence is determined, the character sequence can be segmented into words and the entity classes determined according to the meaning of the tags, which completes named entity recognition.
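The selection of the most probable tag and the subsequent grouping of characters into entities can be sketched as below. This is an illustrative sketch under stated assumptions: the helper names, the `peaked` toy probability rows, and the rule that an entity is emitted only when an E- tag of the same class closes a span opened by a B- tag are all assumptions made for the example.

```python
from typing import List, Tuple

CLASSES = ["BodyPart", "Check", "Disease", "Symptom", "Treatment"]
TAGS = [f"{prefix}-{cls}" for cls in CLASSES for prefix in ("B", "I", "E")]  # 15 tags


def argmax_tags(prob_matrix: List[List[float]]) -> List[str]:
    """Pick, for every character, the tag with the highest probability."""
    return [TAGS[max(range(len(row)), key=row.__getitem__)] for row in prob_matrix]


def extract_entities(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start index, end index, class) for each B ... E span of one class."""
    entities = []
    start, kind = None, None
    for i, tag in enumerate(tags):
        prefix, _, cls = tag.partition("-")
        if prefix == "B":
            start, kind = i, cls
        elif prefix in ("I", "E") and start is not None and cls == kind:
            if prefix == "E":
                entities.append((start, i, cls))
                start, kind = None, None
        else:
            start, kind = None, None  # inconsistent tag: drop the open span
    return entities


def peaked(tag: str) -> List[float]:
    """Toy probability row that puts 0.9 on one tag (hypothetical data)."""
    row = [0.1 / (len(TAGS) - 1)] * len(TAGS)
    row[TAGS.index(tag)] = 0.9
    return row


probs = [peaked("B-BodyPart"), peaked("E-BodyPart"),
         peaked("B-Symptom"), peaked("E-Symptom")]
predicted = argmax_tags(probs)
entities = extract_entities(predicted)
```

Here `entities` contains one "BodyPart" span covering characters 0 to 1 and one "Symptom" span covering characters 2 to 3, which corresponds to performing word segmentation and entity class determination from the predicted tags.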
Understandably, the parameters of the first neural network and the second neural network need to be trained with back propagation and a gradient descent algorithm on electronic medical records whose named entities have already been identified, so as to obtain better parameters and improve the accuracy of named entity recognition.
The character sequences of such electronic medical records are acquired by steps that include, but are not limited to: running a script program to extract the history of present illness part of each electronic medical record and convert it into an xml file; importing the xml files into a labeling tool, in which professional doctors label a portion of the files; carrying out consistency detection on the labeling results; if the detection result meets the expected threshold, having the professional doctors label the remaining files; and running a script program to convert the files labeled with named entities into the training text required by the neural network.
For further explanation of the method for identifying named entities in electronic medical records in this embodiment, fig. 4 shows an illustration of a processing procedure of a neural network system in this embodiment. As shown in fig. 4, the neural network system includes a character embedding layer, a first neural network including a convolutional network layer, and a second neural network including a forward long-short term memory network layer and a backward long-short term memory network layer, and the process of the system for identifying the named entity of the electronic medical record is as follows:
1. The text of the electronic medical record is acquired and input into the character embedding layer in groups of 10 sentences. The sentence length is set to the maximum sentence length K among the 10 sentences, the radical sequence size of each character is fixed at 10, the dimension of the pre-trained word vectors is 100, and the dimension of the radical vectors is set to 50. After processing by the character embedding layer, a group of 10 sentences therefore forms a 10 × K × 100 word vector matrix and a 10 × K × 10 × 50 radical vector matrix.
2. The radical vector matrix obtained in step 1 is input into the convolutional network layer for processing, where the window size of the convolution kernel is 3, the number of convolution kernels is 30, and the pooling window is 2. The data obtained from the convolutional network layer is a 10 × K × 30 radical convolution vector matrix; that is, the extracted internal form information of each character is represented by a 30-dimensional radical convolution vector. The radical convolution vectors in this matrix are spliced with the word vectors in the word vector matrix to obtain a 10 × K × 130 word feature vector matrix.
3. The word feature vectors obtained in step 2 are processed by a dropout layer to prevent the model from over-fitting, with the dropout rate set to 0.5, and are then input into the forward and backward long-short term memory networks. The hidden unit size of each long-short term memory network is set to 64, and the outputs of the forward and backward networks at each time step are spliced to obtain a 10 × K × 128 hidden vector matrix.
4. The hidden vector matrix obtained in step 3 is passed through the full connection layer, whose size is the number N of labels in the training samples, to obtain a 10 × K × N probability matrix.
5. Since the output 10 × K × N matrix gives, for each character, the probabilities of being marked with each of the N labels, the label with the highest of the N probabilities may be selected as the label of that character. For example, the character sequence "neck, head, pain, pain" in FIG. 4 is labeled "B-BodyPart (B-BOD in the figure), I-BodyPart (I-BOD in the figure), B-Symptom (B-SYM in the figure), I-Symptom (I-SYM in the figure)" in this order.
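The tensor sizes quoted in steps 1 to 5 follow from simple arithmetic, which the sketch below traces. The function name `layer_shapes` and the example values K = 20 and N = 15 are assumptions for illustration; the remaining defaults are the sizes stated in the steps above.

```python
from typing import Dict, Tuple


def layer_shapes(batch: int, max_len: int, n_labels: int,
                 word_dim: int = 100, radical_len: int = 10, radical_dim: int = 50,
                 n_kernels: int = 30, lstm_hidden: int = 64) -> Dict[str, Tuple[int, ...]]:
    """Trace the shape of the data after each layer of the system."""
    return {
        # Step 1: character embedding layer
        "word_vectors": (batch, max_len, word_dim),
        "radical_vectors": (batch, max_len, radical_len, radical_dim),
        # Step 2: one n_kernels-dimensional vector per character after the convolution layer
        "radical_conv": (batch, max_len, n_kernels),
        # Step 2: splicing adds the two feature dimensions
        "word_features": (batch, max_len, word_dim + n_kernels),
        # Step 3: forward and backward outputs are spliced
        "hidden": (batch, max_len, 2 * lstm_hidden),
        # Step 4: full connection layer with one output per label
        "probabilities": (batch, max_len, n_labels),
    }


shapes = layer_shapes(batch=10, max_len=20, n_labels=15)
```

With these settings the word feature dimension is 100 + 30 = 130 and the hidden dimension is 2 × 64 = 128, matching the matrices described in steps 2 and 3.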
In the embodiment, the method for identifying the named entity of the electronic medical record with high identification accuracy is provided by extracting the morphological characteristics in the characters of the electronic medical record and sequentially inputting the characteristics of the characters and the morphological characteristics in the characters into the deep neural network to predict the character label.
Further, referring to fig. 5, a second embodiment of the present invention provides a method for identifying named entities of electronic medical records based on the first embodiment. In this embodiment, step S50 includes:
Step S60, inputting the word feature vector matrix into the bidirectional long-short term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence.
Step S70, inputting the hidden vector matrix into a self-attention mechanism layer for processing to obtain a prediction matrix corresponding to the character sequence.
In research on methods for identifying the named entities of electronic medical records, it was found that some entities depend on one another. Consider the following text from an electronic medical record: "Over 10 years the symptoms have appeared repeatedly and worsened year by year, appearing in winter and spring and after catching a cold; the patient went to a local hospital, where the doctor diagnosed chronic bronchitis and recurrent cough and expectoration." In this text, "winter and spring" and "catching a cold" are cause-type entities, "chronic bronchitis" is a disease-type entity, and "cough" and "expectoration" are symptom-type entities. In an ordinary sentence "winter and spring" simply denotes time, but in a medical history used as a training sample of the neural network in this embodiment it denotes a cause, because the arrival of winter and spring induces the recurrence of seasonal diseases, and a professional doctor therefore labels it as a cause. When deciding the entity type of "winter and spring", the neural network should thus rely mainly on the information in "chronic bronchitis", "cough", and "expectoration". For this reason, the present embodiment adopts a self-attention mechanism, which calculates the dependency relationship between entities directly, regardless of the distance between them.
Calculating the dependency relationship between the hidden vectors in the hidden vector matrix according to the following formula:
$f_{t,t'} = \sigma\left(w_a \tanh\left(w_t h_t + w_{t'} h_{t'}\right)\right)$,
where t and t′ represent different time steps, $w_a$, $w_t$, and $w_{t'}$ are weight vectors, σ is the sigmoid function, and $h_t$ and $h_{t'}$ are the hidden vectors at different time steps;
the attention weight corresponding to each hidden vector $h_k$ is calculated according to the following formula:
Figure BDA0001846805320000121
Figure BDA0001846805320000122
Wherein e is an exponential function, N is the number of the hidden vectors,
Figure BDA0001846805320000123
the calculation of the attention weight is described in detail below with reference to fig. 6.
As shown in fig. 6, if the character sequence processed this time is {neck, head, pain, pain}, the character sequence length is 4, the hidden vectors input to the self-attention mechanism layer are h1, h2, h3, and h4, and N in the corresponding formula takes the value 4.
Since the character sequence is input to the neural network system in time order, each character in the character sequence corresponds to a different time step. In this example, the time steps corresponding to the four characters "neck, head, pain, pain" may be labeled t1, t2, t3, and t4, and the hidden vector corresponding to each character corresponds one-to-one with these time steps.
For each character to be recognized in the character sequence, a hidden vector is output at the corresponding time step, and the attention weights with respect to the time steps other than that time step need to be calculated. For example, the "neck" character is input at time step t1 and the hidden vector h1 is output for t1; the time steps other than this one are t2, t3, and t4. According to the predetermined rule
Figure BDA0001846805320000128
The weight vector to be calculated is
Figure BDA0001846805320000125
The formula for calculating the attention weight then becomes the following, where k ranges over t2, t3, and t4:
Figure BDA0001846805320000126
According to the following formula, each attention weight obtained above is multiplied by the corresponding hidden vector to obtain the final attention vector:
Figure BDA0001846805320000127
Finally, the attention vectors corresponding to the plurality of hidden vectors together form an attention vector matrix:
Figure BDA0001846805320000131
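The attention computation can be sketched as follows. Several points are assumptions made for this sketch rather than statements of the formulas above: the products $w_t h_t$ and $w_{t'} h_{t'}$ are read as element-wise products of vectors of equal dimension, the attention weights are obtained by exponentiating the dependency values and normalizing over the other time steps (suggested by the mention of the exponential function e and the example in which k ranges over t2, t3, and t4), and the attention vector is taken to be the weighted sum of the other hidden vectors. All function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dependency(h_t: Vector, h_k: Vector, w_a: Vector, w_t: Vector, w_k: Vector) -> float:
    """f = sigmoid(w_a . tanh(w_t * h_t + w_k * h_k)), with element-wise products (assumed)."""
    inner = [math.tanh(a * x + b * y) for a, x, b, y in zip(w_t, h_t, w_k, h_k)]
    return sigmoid(sum(w * v for w, v in zip(w_a, inner)))


def attention_weights(H: List[Vector], t: int, w_a: Vector, w_t: Vector,
                      w_k: Vector) -> Dict[int, float]:
    """Normalized weights of every other time step k with respect to time step t."""
    others = [k for k in range(len(H)) if k != t]
    if not others:
        raise ValueError("at least two hidden vectors are required")
    exps = {k: math.exp(dependency(H[t], H[k], w_a, w_t, w_k)) for k in others}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def attention_vector(H: List[Vector], t: int, w_a: Vector, w_t: Vector,
                     w_k: Vector) -> Vector:
    """Weighted sum of the other hidden vectors (assumed form of the attention vector)."""
    weights = attention_weights(H, t, w_a, w_t, w_k)
    return [sum(w * H[k][d] for k, w in weights.items()) for d in range(len(H[t]))]


# Four hidden vectors, as in the four-character example; the weights are toy values.
H = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
w = [0.5, 0.5]
A = [attention_vector(H, t, w, w, w) for t in range(len(H))]
# Splice each hidden vector with its attention vector.
H_prime = [h + a for h, a in zip(H, A)]
```

Because the weights for each time step sum to 1 and do not depend on the distance between time steps, a distant character can contribute as much to the attention vector as an adjacent one.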
Since the hidden vector matrix and the attention vector matrix both contain the prediction information of the character sequence of the named entity to be identified, the attention hidden vector matrix including the total information needs to be generated according to the two vector matrices.
Specifically, the hidden vector in the hidden vector matrix and the attention vector in the attention vector matrix are subjected to vector splicing according to the following formula.
Figure BDA0001846805320000132
For example, given a hidden vector matrix H (H1, H2 … Hn) and an attention vector matrix
Figure BDA0001846805320000133
where H1, H2 … Hn and
Figure BDA0001846805320000134
are all vectors, Hi and
Figure BDA0001846805320000135
are spliced to obtain a new vector H′i. Splicing every hidden vector with its attention vector in this way yields the corresponding attention hidden vector matrix H′ (H′1, H′2 … H′n).
And inputting the obtained attention hidden vector matrix into the full-connection layer for processing to obtain a prediction matrix corresponding to the character sequence.
Step S80, inputting the prediction matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic medical record.
If the label of each character is predicted independently, directly from the hidden vectors obtained by the Bi-LSTM network layer or the self-attention mechanism layer, the dependency relationship among the labels is not considered, and the accuracy of the prediction result runs into a bottleneck. For example, the label predicted after I-Symptom might be I-Disease, which is clearly an erroneous label sequence. In the named entity recognition task, labels usually have strong dependency relationships: for example, the label following B-Symptom cannot be I-Disease, and B-Symptom can only be followed by a label of the same Symptom entity, such as I-Symptom.
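The kind of label dependency described above can be written down as a simple check. The rule set below is an assumption made for illustration (a tag that opens or continues an entity must be followed by an I- or E- tag of the same class, and an I- or E- tag may not appear without an open entity); in the embodiment such dependencies are learned by the model rather than hard-coded.

```python
def is_valid_transition(prev_tag: str, next_tag: str) -> bool:
    """Check whether next_tag may directly follow prev_tag under the assumed rules."""
    prev_prefix, _, prev_cls = prev_tag.partition("-")
    next_prefix, _, next_cls = next_tag.partition("-")
    if prev_prefix in ("B", "I"):
        # An open entity must be continued or closed by a tag of the same class.
        return next_prefix in ("I", "E") and next_cls == prev_cls
    # After an entity has ended, no tag may continue or close an entity.
    return next_prefix not in ("I", "E")


checks = {
    ("B-Symptom", "I-Symptom"): is_valid_transition("B-Symptom", "I-Symptom"),
    ("B-Symptom", "I-Disease"): is_valid_transition("B-Symptom", "I-Disease"),
    ("I-Symptom", "I-Disease"): is_valid_transition("I-Symptom", "I-Disease"),
    ("E-Symptom", "B-Disease"): is_valid_transition("E-Symptom", "B-Disease"),
}
```

A model that predicts each character's label independently has no way to apply such a check, which is why label sequences like I-Symptom followed by I-Disease can appear in its output.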
Therefore, in order to further improve the accuracy of named entity recognition, a Conditional Random Field (CRF) model is used in the present embodiment for the final character label prediction. The CRF model overcomes the independence assumption of the Hidden Markov Model and the label bias problem of the Maximum Entropy Markov Model. The working principle of the CRF model is explained below.
For an input sequence x (x1, x2 … xn), let P be the matrix obtained after the attention network, $P \in \mathbb{R}^{n \times s}$, where s is the number of labels and $P_{ij}$ is the score of predicting the j-th label for the i-th character in the input sequence. For a predicted sequence y (y1, y2 … yn), its score is defined as:
$$s(x, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$
where A is the transition matrix, $A \in \mathbb{R}^{(s+2) \times (s+2)}$, and $A_{ij}$ represents the score of a transition from label i to label j. The two additional labels are the start and end labels of a sequence, $y_0$ and $y_{n+1}$. Applying softmax over all possible label sequences then yields the probability of sequence y:
$$p(y \mid x) = \frac{e^{s(x, y)}}{\sum_{\tilde{y} \in Y_x} e^{s(x, \tilde{y})}}$$
The log probability of the correct label sequence is maximized during the training process:
$$\log p(y \mid x) = s(x, y) - \log \sum_{\tilde{y} \in Y_x} e^{s(x, \tilde{y})}$$
where $Y_x$ denotes all possible label sequences, including those that do not satisfy the constraints of the BIOES labeling scheme. In decoding, the output sequence with the maximum score is predicted:
$$y^{*} = \arg\max_{\tilde{y} \in Y_x} s(x, \tilde{y})$$
Because the CRF model only scores transitions between adjacent labels, it can be trained and decoded efficiently with dynamic programming, the decoding being carried out by the Viterbi algorithm.
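The sequence score and Viterbi decoding can be sketched as below. This is a minimal illustration, not the implementation of the embodiment: labels are represented by integer indices, the start and end labels are taken to be the indices s and s + 1 of the transition matrix, and the toy matrices P and A are invented values chosen so that the transition scores change the result.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sequence_score(P: Matrix, A: Matrix, y: List[int], start: int, end: int) -> float:
    """Sum of transition scores (including start and end) and emission scores."""
    path = [start] + list(y) + [end]
    transition = sum(A[a][b] for a, b in zip(path, path[1:]))
    emission = sum(P[i][tag] for i, tag in enumerate(y))
    return transition + emission


def viterbi(P: Matrix, A: Matrix, start: int, end: int) -> Tuple[List[int], float]:
    """Return the label sequence with the maximum score and that score."""
    n, s = len(P), len(P[0])
    best = [A[start][j] + P[0][j] for j in range(s)]  # best score ending in label j
    back: List[List[int]] = []                        # back-pointers per position
    for i in range(1, n):
        pointers, new_best = [], []
        for j in range(s):
            k = max(range(s), key=lambda k: best[k] + A[k][j])
            pointers.append(k)
            new_best.append(best[k] + A[k][j] + P[i][j])
        back.append(pointers)
        best = new_best
    last = max(range(s), key=lambda j: best[j] + A[j][end])
    score = best[last] + A[last][end]
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score


# Two labels (0 and 1); index 2 is the start label and index 3 the end label.
P = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
A = [[0.0, 1.0, 0.0, 0.0],
     [-5.0, 0.0, 0.0, 0.0],   # the transition 1 -> 0 is heavily penalized
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]
best_path, best_score = viterbi(P, A, start=2, end=3)
greedy_path = [max(range(2), key=row.__getitem__) for row in P]
```

In this toy case the per-character choice `greedy_path` is [0, 1, 0], whereas `best_path` is [0, 1, 1] because the penalized transition from label 1 to label 0 outweighs the emission score of the last character. The decoded path costs O(n · s²) operations instead of enumerating all sⁿ sequences.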
Finally, the method for identifying the named entity of the electronic medical record in the embodiment is further described with reference to fig. 6. Fig. 6 shows a schematic structure of a neural network system according to this embodiment, where the neural network system includes a character embedding layer, a first neural network including a radical CNN convolutional layer, and a second neural network including a bidirectional LSTM layer, a self-attention mechanism layer, and a conditional random field model, and a process of identifying a named entity in an electronic medical record by the system is as follows:
1. The text of the electronic medical record is acquired and input into the character embedding layer in groups of 10 sentences. The sentence length is set to the maximum sentence length K among the 10 sentences, the radical sequence size of each character is fixed at 10, the dimension of the pre-trained word vectors is 100, and the dimension of the radical vectors is set to 50. After processing by the character embedding layer, a group of 10 sentences therefore forms a 10 × K × 100 word vector matrix and a 10 × K × 10 × 50 radical vector matrix.
2. The radical vector matrix obtained in step 1 is input into the radical CNN convolution network layer for processing, where the window size of the convolution kernel is 3, the number of convolution kernels is 30, and the pooling window is 2. The data obtained from the radical CNN convolution network layer is a 10 × K × 30 radical convolution vector matrix; that is, the extracted internal form information of each character is represented by a 30-dimensional radical convolution vector. The radical convolution vectors in this matrix are spliced with the word vectors in the word vector matrix to obtain a 10 × K × 130 word feature vector matrix.
3. The word feature vectors obtained in step 2 are processed by a dropout layer to prevent the model from over-fitting, with the dropout rate set to 0.5, and are then input into the bidirectional LSTM network. The hidden unit size of the LSTM network is set to 64, and the outputs of the bidirectional LSTM at each time step are spliced to obtain a 10 × K × 128 hidden vector matrix.
4. The hidden vector matrix obtained in step 3 is processed in turn by the self-attention mechanism layer and the conditional random field model to obtain a 10 × K × N prediction probability matrix.
5. Since the output 10 × K × N matrix gives, for each character, the probabilities of being marked with each of the N labels, the label with the highest of the N probabilities may be selected as the label of that character.
In the embodiment, the method for identifying the named entity of the electronic medical record with high identification accuracy is provided by extracting the morphological characteristics in the characters of the electronic medical record and sequentially inputting the characteristics of the characters and the morphological characteristics in the characters into the deep neural network to predict the character label.
Further, referring to fig. 7, a third embodiment of the present invention provides a method for identifying named entities of electronic medical records based on the first embodiment, where the step S50 includes:
Step S90, inputting the word feature vector matrix into the bidirectional long-short term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence.
Step S100, inputting the hidden vector matrix into a self-attention mechanism layer for processing to obtain a named entity recognition result of the electronic medical record.
It is understood that, based on the first embodiment, in consideration of different application scenarios or processing resources, the difference from the second embodiment is that, as shown in fig. 8, the second neural network in the present embodiment includes only a self-attention mechanism layer and does not include a conditional random field model.
After the hidden vector matrix output by the bidirectional long and short term memory network is input into the self-attention mechanism layer, the attention weight of the hidden vector in the hidden vector matrix is calculated, an attention vector matrix is generated according to the attention weight and the hidden vector, an attention hidden vector matrix is generated according to the hidden vector matrix and the attention vector matrix, and finally the attention hidden vector matrix is input into the full-connection layer to be processed to obtain a prediction probability matrix corresponding to the character sequence to be recognized.
Each probability value in the prediction probability matrix obtained in the above step is a label classification probability predicted for the character sequence to be recognized, and the label with the highest probability value is selected as the predicted label of the corresponding character. After the predicted label of every character in the character sequence is determined, the character sequence can be segmented into words and the entity classes determined according to the meaning of the labels, which completes named entity recognition.
In the embodiment, morphological features inside the electronic medical record characters are extracted through the convolutional neural network, the features of the characters and the morphological features inside the characters are sequentially input into the bidirectional long-short term memory network layer and the self-attention mechanism layer in the deep neural network to predict the character tags, and the accurate and efficient electronic medical record named entity recognition method is provided.
Further, referring to fig. 9, a fourth embodiment of the present invention provides a method for identifying named entities of electronic medical records based on the first embodiment, where the step S50 includes:
Step S110, inputting the word feature vector matrix into the bidirectional long-short term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence.
Step S120, inputting the hidden vector matrix into a conditional random field model for processing to obtain a named entity recognition result of the electronic medical record.
It is understood that, based on the first embodiment, due to the consideration of different application scenarios or processing resources, the difference from the second embodiment is that, as shown in fig. 10, the second neural network in the present embodiment only includes the conditional random field model and does not include the self-attention mechanism layer.
After the hidden vector matrix output by the bidirectional long-short term memory network is input into the conditional random field model, it is processed to obtain the prediction probability matrix corresponding to the character sequence to be recognized.
Each probability value in the prediction probability matrix obtained in the above step is a label classification probability predicted for the character sequence to be recognized, and the label with the highest probability value is selected as the predicted label of the corresponding character. After the predicted label of every character in the character sequence is determined, the character sequence can be segmented into words and the entity classes determined according to the meaning of the labels, which completes named entity recognition.
In the embodiment, morphological features inside the electronic medical record characters are extracted through the convolutional neural network, the features of the characters and the morphological features inside the characters are sequentially input into the bidirectional long-short term memory network layer and the conditional random field model in the deep neural network to predict the character tags, and the accurate and efficient electronic medical record named entity recognition method is provided.
The invention also provides an electronic medical record named entity recognition device, which comprises: a memory, a processor, and an electronic medical record named entity recognition processing program that is stored on the memory and can run on the processor, wherein the program, when executed by the processor, realizes the steps of the method for recognizing the named entities of electronic medical records described above.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where an electronic medical record named entity identification processing program is stored on the computer-readable storage medium, and when the electronic medical record named entity identification processing program is executed by a processor, the steps of the method for identifying an electronic medical record named entity are implemented.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for identifying named entities of electronic medical records is characterized by comprising the following steps:
generating a word vector matrix corresponding to the character sequence of the electronic medical record of the named entity to be identified;
generating a radical vector matrix corresponding to the character sequence;
inputting the component vector matrix into a first neural network for processing to obtain a component convolution vector matrix corresponding to the character sequence, wherein the first neural network comprises a convolution neural network layer;
splicing the word vectors in the word vector matrix with the radical convolution vectors in the radical convolution vector matrix to obtain corresponding word characteristic vectors, and generating a word characteristic vector matrix based on the word characteristic vectors;
inputting the word feature vector matrix into a second neural network for processing to obtain a named entity recognition result of the electronic medical record, wherein the second neural network comprises a bidirectional long-short term memory network layer;
and the parameters of the first neural network and the second neural network are obtained by training according to the electronic medical record of the identified named entity.
2. The method for identifying named entities in electronic medical records according to claim 1, wherein the step of generating the radical vector matrix corresponding to the character sequence comprises:
acquiring Chinese character components of each character in the character sequence;
generating radical vectors of the characters according to the Chinese character components;
and generating a radical vector matrix corresponding to the character sequence according to the radical vector of each character.
3. The method for identifying named entities in electronic medical records according to claim 2, wherein the second neural network further comprises a full connection layer, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the identifying result of the named entities in the electronic medical records comprises:
inputting the character feature vector matrix into the bidirectional long and short term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
and inputting the hidden vector matrix into the full connection layer for processing to obtain a named entity recognition result of the electronic medical record.
4. The method for identifying named entities in electronic medical records according to claim 2, wherein the second neural network further comprises a self-attention mechanism layer, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the identifying result of the named entities in the electronic medical records comprises:
inputting the character feature vector matrix into the bidirectional long and short term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
and inputting the hidden vector matrix into a self-attention mechanism layer for processing to obtain a named entity recognition result of the electronic medical record.
5. The method for identifying named entities in electronic medical records according to claim 2, wherein the second neural network further comprises a self-attention mechanism layer and a conditional random field model, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the identifying result of the named entities in the electronic medical records comprises:
inputting the character feature vector matrix into the bidirectional long and short term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
inputting the hidden vector matrix into a self-attention mechanism layer for processing to obtain a prediction matrix corresponding to the character sequence;
and inputting the prediction matrix into the conditional random field model for processing to obtain a named entity recognition result of the electronic medical record.
6. The method for identifying named entities in electronic medical records according to claim 5, wherein the self-attention mechanism layer comprises a fully connected layer, and the step of inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain the prediction matrix corresponding to the character sequence comprises:
calculating attention weights of the hidden vectors in the hidden vector matrix;
generating an attention vector matrix according to the attention weights and the hidden vectors;
generating an attention hidden vector matrix according to the hidden vector matrix and the attention vector matrix;
and inputting the attention hidden vector matrix into the fully connected layer for processing to obtain the prediction matrix corresponding to the character sequence.
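The last three steps of claim 6 can be sketched as follows. The claim does not specify the operations, so the code makes two assumptions: the attention vector for each position is the hidden vector scaled by its attention weight, and the attention hidden vector matrix is the row-wise concatenation of the hidden and attention vectors. The attention weights are taken as given inputs. All names (`attention_layer`, `fc_weight`, `fc_bias`) are hypothetical.

```python
def attention_layer(hidden, weights, fc_weight, fc_bias):
    """Sketch of the self-attention layer steps, under stated assumptions.

    hidden:    hidden vector matrix, one row per character.
    weights:   one precomputed attention weight per hidden vector.
    fc_weight: one weight row per output tag (hypothetical parameters),
               each of length 2 * len(hidden[0]).
    fc_bias:   one bias per output tag (hypothetical parameters).
    """
    # Attention vector matrix: each hidden vector scaled by its weight
    # (assumption; the claim only says "according to" both).
    attention = [[w * x for x in h] for h, w in zip(hidden, weights)]
    # Attention hidden vector matrix: concatenate each hidden vector
    # with its attention vector (assumption).
    attention_hidden = [h + a for h, a in zip(hidden, attention)]
    # Fully connected layer: one score per tag for every character.
    prediction = []
    for row in attention_hidden:
        prediction.append([
            sum(x * w for x, w in zip(row, tag_weights)) + b
            for tag_weights, b in zip(fc_weight, fc_bias)
        ])
    return prediction


hidden = [[1.0, 2.0], [0.0, 1.0]]
weights = [0.25, 0.75]
fc_weight = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
fc_bias = [0.0, 1.0]
print(attention_layer(hidden, weights, fc_weight, fc_bias))
# [[1.0, 2.0], [0.0, 2.5]]
```

The result has one row per character and one column per tag, which is the shape a conditional random field model would consume as its prediction matrix.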
7. The method for identifying named entities in electronic medical records according to claim 6, wherein the step of calculating the attention weights of the hidden vectors in the hidden vector matrix comprises:
calculating the dependency relationship between the hidden vectors in the hidden vector matrix according to the following formula:
$f_{t,t'} = \sigma\left(w_a \tanh\left(w_t h_t + w_{t'} h_{t'}\right)\right)$,
where $t$ and $t'$ denote different time steps, $w_a$, $w_t$ and $w_{t'}$ are weight vectors, $\sigma$ is the sigmoid function, and $h_t$ and $h_{t'}$ are the hidden vectors at different time steps;
calculating, according to the following formula, the corresponding attention weight for each hidden vector $h_k$ in the hidden vector matrix:
Figure FDA0003184788540000031
Figure FDA0003184788540000032
where e denotes the exponential function, N is the number of hidden vectors, and
Figure FDA0003184788540000033
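The dependency formula in claim 7 can be evaluated as in the sketch below. The claim calls w_a, w_t and w_t' weight vectors without fixing the products involved, so the code assumes one possible reading: w_t and w_t' multiply the hidden vectors elementwise, and w_a is applied as a dot product, giving a scalar score. The attention-weight formula itself appears only as figure images in the source and is not reconstructed here. The function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dependency_score(h_t, h_u, w_a, w_t, w_u):
    """f_{t,t'} = sigmoid(w_a . tanh(w_t * h_t + w_t' * h_t')).

    h_u and w_u stand for h_t' and w_t'. Elementwise products inside
    tanh and a dot product with w_a are assumptions of this sketch.
    """
    inner = [math.tanh(a * x + b * y)
             for a, x, b, y in zip(w_t, h_t, w_u, h_u)]
    return sigmoid(sum(a * g for a, g in zip(w_a, inner)))


def dependency_matrix(hidden, w_a, w_t, w_u):
    """Scores for every pair of time steps.

    The claim defines f for different time steps; the diagonal (t = t')
    is filled in here only to keep the sketch simple.
    """
    return [[dependency_score(h_t, h_u, w_a, w_t, w_u) for h_u in hidden]
            for h_t in hidden]


hidden = [[0.1, 0.2], [0.3, -0.4], [0.0, 0.5]]
scores = dependency_matrix(hidden, [0.5, -0.5], [1.0, 1.0], [1.0, 1.0])
for row in scores:
    print([round(v, 3) for v in row])
```

Because of the sigmoid, every score lies strictly between 0 and 1, which makes the scores convenient inputs for a later normalization over the N hidden vectors.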
8. The method for identifying named entities in electronic medical records according to claim 2, wherein the second neural network further comprises a conditional random field model, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic medical record comprises:
inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
and inputting the hidden vector matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic medical record.
9. An electronic medical record named entity recognition device, characterized in that the device comprises: a memory, a processor, and an electronic medical record named entity recognition processing program that is stored in the memory and executable on the processor, wherein the processing program, when executed by the processor, implements the steps of the electronic medical record named entity recognition method according to any one of claims 1 to 8.
10. A storage medium, characterized in that the storage medium stores an electronic medical record named entity recognition processing program, and the processing program, when executed by a processor, implements the steps of the electronic medical record named entity recognition method according to any one of claims 1 to 8.
CN201811282557.3A 2018-10-30 2018-10-30 Method, device and storage medium for identifying named entities of electronic medical records Active CN109388807B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811282557.3A CN109388807B (en) 2018-10-30 2018-10-30 Method, device and storage medium for identifying named entities of electronic medical records

Publications (2)

Publication Number Publication Date
CN109388807A (en) 2019-02-26
CN109388807B (en) 2021-09-21

Family

ID=65427746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811282557.3A Active CN109388807B (en) 2018-10-30 2018-10-30 Method, device and storage medium for identifying named entities of electronic medical records

Country Status (1)

Country Link
CN (1) CN109388807B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960728B (en) * 2019-03-11 2021-01-22 北京市科学技术情报研究所(北京市科学技术信息中心) Method and system for identifying named entities of open domain conference information
CN111797626B (en) * 2019-03-21 2024-06-21 阿里巴巴集团控股有限公司 Named entity recognition method and device
CN109933801B (en) * 2019-03-25 2022-03-29 北京理工大学 Bidirectional LSTM named entity identification method based on predicted position attention
CN109871544B (en) * 2019-03-25 2023-04-25 平安科技(深圳)有限公司 Entity identification method, device, equipment and storage medium based on Chinese medical record
CN110046349A (en) * 2019-03-26 2019-07-23 平安科技(深圳)有限公司 Information identifying method, device, equipment and storage medium based on Chinese case history
CN110135427B (en) * 2019-04-11 2021-07-27 北京百度网讯科技有限公司 Method, apparatus, device and medium for recognizing characters in image
CN110162782B (en) * 2019-04-17 2022-04-01 平安科技(深圳)有限公司 Entity extraction method, device and equipment based on medical dictionary and storage medium
CN110162784B (en) * 2019-04-19 2023-10-27 平安科技(深圳)有限公司 Entity identification method, device and equipment for Chinese medical record and storage medium
WO2019137562A2 (en) 2019-04-25 2019-07-18 Alibaba Group Holding Limited Identifying entities in electronic medical records
CN110287483B (en) * 2019-06-06 2023-12-05 广东技术师范大学 Unregistered word recognition method and system utilizing five-stroke character root deep learning
CN110516228A (en) * 2019-07-04 2019-11-29 湖南星汉数智科技有限公司 Name entity recognition method, device, computer installation and computer readable storage medium
CN110427493B (en) * 2019-07-11 2022-04-08 新华三大数据技术有限公司 Electronic medical record processing method, model training method and related device
CN112329465B (en) * 2019-07-18 2024-06-25 株式会社理光 Named entity recognition method, named entity recognition device and computer readable storage medium
CN110334357A (en) * 2019-07-18 2019-10-15 北京香侬慧语科技有限责任公司 A kind of method, apparatus, storage medium and electronic equipment for naming Entity recognition
CN110555441A (en) * 2019-09-10 2019-12-10 杭州橙鹰数据技术有限公司 character recognition method and device
CN111339764A (en) * 2019-09-18 2020-06-26 华为技术有限公司 Chinese named entity recognition method and device
CN110688855A (en) * 2019-09-29 2020-01-14 山东师范大学 Chinese medical entity identification method and system based on machine learning
CN110929749B (en) * 2019-10-15 2022-04-29 平安科技(深圳)有限公司 Text recognition method, text recognition device, text recognition medium and electronic equipment
CN111178074B (en) * 2019-12-12 2023-08-25 天津大学 Chinese named entity recognition method based on deep learning
CN111143534A (en) * 2019-12-26 2020-05-12 腾讯云计算(北京)有限责任公司 Method and device for extracting brand name based on artificial intelligence and storage medium
CN111192692B (en) * 2020-01-02 2023-12-08 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
CN111352977B (en) * 2020-03-10 2022-06-17 浙江大学 Time sequence data monitoring method based on self-attention bidirectional long-short term memory network
CN112434520A (en) * 2020-11-11 2021-03-02 北京工业大学 Named entity recognition method and device and readable storage medium
CN113408289B (en) * 2021-06-29 2024-04-16 广东工业大学 Multi-feature fusion supply chain management entity knowledge extraction method and system
CN115201904B (en) * 2022-07-18 2023-03-03 北京石油化工学院 Microseism data compression and event detection method based on edge intelligence

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106980608A (en) * 2017-03-16 2017-07-25 四川大学 A kind of Chinese electronic health record participle and name entity recognition method and system
CN107977361A (en) * 2017-12-06 2018-05-01 哈尔滨工业大学深圳研究生院 The Chinese clinical treatment entity recognition method represented based on deep semantic information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106933809A (en) * 2017-03-27 2017-07-07 三角兽(北京)科技有限公司 Information processor and information processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
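Research on Chinese organization name recognition based on deep learning: a Chinese-character-level recurrent neural network method; Zhu Danhao, Yang Lei, Wang Dongbo; 《现代图书情报技术》; 2016-12-31; Vol. 37 (No. 12); pp. 1-8 *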

Also Published As

Publication number Publication date
CN109388807A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN109388807B (en) Method, device and storage medium for identifying named entities of electronic medical records
CN109471945B (en) Deep learning-based medical text classification method and device and storage medium
CN106980683B (en) Blog text abstract generating method based on deep learning
CN112470160B (en) Device and method for personalized natural language understanding
CN108959482B (en) Single-round dialogue data classification method and device based on deep learning and electronic equipment
JP2021516398A (en) Music recommendation methods, equipment, computing equipment and media
KR101754473B1 (en) Method and system for automatically summarizing documents to images and providing the image-based contents
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
US11954881B2 (en) Semi-supervised learning using clustering as an additional constraint
CN111950596A (en) Training method for neural network and related equipment
CN110851641B (en) Cross-modal retrieval method and device and readable storage medium
CN109948735B (en) Multi-label classification method, system, device and storage medium
CN114676234A (en) Model training method and related equipment
CN114358203A (en) Training method and device for image description sentence generation module and electronic equipment
CN111898704B (en) Method and device for clustering content samples
CN112000778A (en) Natural language processing method, device and system based on semantic recognition
CN109284497B (en) Method and apparatus for identifying medical entities in medical text in natural language
CN114416995A (en) Information recommendation method, device and equipment
CN112529149A (en) Data processing method and related device
CN115062134A (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
CN109543187B (en) Method and device for generating electronic medical record characteristics and storage medium
WO2023134085A1 (en) Question answer prediction method and prediction apparatus, electronic device, and storage medium
CN116306612A (en) Word and sentence generation method and related equipment
CN115408599A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant