CN109388807A - Method, apparatus and storage medium for named entity recognition in electronic health records - Google Patents
- Publication number
- CN109388807A (application number CN201811282557.3A)
- Authority
- CN
- China
- Prior art keywords
- electronic health
- health record
- matrix
- name entity
- entity recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Public Health (AREA)
- Biophysics (AREA)
- Primary Health Care (AREA)
- Medical Informatics (AREA)
- Epidemiology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses a method for named entity recognition in electronic health records, comprising: generating a character vector matrix and a radical vector matrix corresponding to the character sequence of an electronic health record whose named entities are to be recognized; inputting the radical vector matrix into a convolutional neural network layer for processing to obtain a radical convolution vector matrix corresponding to the character sequence; generating a character feature vector matrix from the character vector matrix and the radical convolution vector matrix; and inputting the character feature vector matrix into a bidirectional long short-term memory network for processing to obtain the named entity recognition result for the electronic health record. The invention also discloses an apparatus for named entity recognition in electronic health records and a storage medium. By extracting the morphological features inside the characters of the electronic health record and feeding both the features of each character itself and its internal morphological features into a deep neural network to predict character labels, the invention provides a method for named entity recognition in electronic health records with high recognition accuracy.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method for named entity recognition in electronic health records, an apparatus for named entity recognition in electronic health records, and a computer storage medium.
Background
With the rapid development of China's society and economy and the steady rise in living standards, people's health awareness is also growing, and building intelligent medical systems from large volumes of medical data has become an urgent need of society. Electronic health records are the most numerous and most information-rich type of medical text, and they have their own specialized character. They are written by medical practitioners for patients and record in detail the symptoms that appear during hospitalization, the diagnosed diseases and the corresponding treatments, as well as the results of various examination reports, so they contain a large amount of medical information. Many intelligent medical information systems are therefore built on the information in electronic health records. When such systems are built, named entity recognition is a key task underlying information extraction from large volumes of medical data, and it is essential to information processing and management systems across medical fields.
The prior art includes deep-learning-based named entity recognition methods for the medical field, which use a neural network model to extract contextual information between characters or words and output a probability distribution over entity categories. However, the information carried by a character or word is represented incompletely: these methods rely only on character vectors or word vectors and ignore the deeper information hidden inside characters or words, so recognition performance is poor.
The above content is provided only to aid understanding of the technical solution of the present invention and does not constitute an admission that it is prior art.
Summary of the invention
The main purpose of the present invention is to provide a method for named entity recognition in electronic health records, an apparatus for named entity recognition in electronic health records, a device for named entity recognition in electronic health records, and a computer storage medium, so as to solve the technical problem that the deep-learning-based implementations adopted in the prior art rely only on character vectors or word vectors, ignore the deeper information hidden inside characters or words, and therefore recognize entities poorly.
To achieve the above object, the present invention provides a method for named entity recognition in electronic health records, the method comprising the following steps:
Generating a character vector matrix corresponding to the character sequence of an electronic health record whose named entities are to be recognized;
Generating a radical vector matrix corresponding to the character sequence;
Inputting the radical vector matrix into a first neural network for processing to obtain a radical convolution vector matrix corresponding to the character sequence, wherein the first neural network includes a convolutional neural network layer;
Generating a character feature vector matrix from the character vector matrix and the radical convolution vector matrix;
Inputting the character feature vector matrix into a second neural network for processing to obtain the named entity recognition result for the electronic health record, wherein the second neural network includes a bidirectional long short-term memory network layer;
Wherein the parameters of the first neural network and the second neural network are obtained by training on electronic health records whose named entities have already been identified.
Preferably, the step of generating the radical vector matrix corresponding to the character sequence includes:
Obtaining the Chinese character components of each character in the character sequence;
Generating the radical vectors of each character from its Chinese character components;
Generating the radical vector matrix corresponding to the character sequence from the radical vectors of each character.
Preferably, the second neural network further includes a fully connected layer, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result for the electronic health record includes:
Inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
Inputting the hidden vector matrix into the fully connected layer for processing to obtain the named entity recognition result for the electronic health record.
Preferably, the second neural network further includes a self-attention mechanism layer, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result for the electronic health record includes:
Inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
Inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain the named entity recognition result for the electronic health record.
Preferably, the second neural network further includes a self-attention mechanism layer and a conditional random field model, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result for the electronic health record includes:
Inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
Inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain a prediction matrix corresponding to the character sequence;
Inputting the prediction matrix into the conditional random field model for processing to obtain the named entity recognition result for the electronic health record.
Preferably, the self-attention mechanism layer includes a fully connected layer, and the step of inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain the prediction matrix corresponding to the character sequence includes:
Calculating the attention weights of the hidden vectors in the hidden vector matrix;
Generating an attention vector matrix from the attention weights and the hidden vectors;
Generating an attention hidden vector matrix from the hidden vector matrix and the attention vector matrix;
Inputting the attention hidden vector matrix into the fully connected layer for processing to obtain the prediction matrix corresponding to the character sequence.
Preferably, the step of calculating the attention weights of the hidden vectors in the hidden vector matrix includes:
Calculating the dependency between hidden vectors in the hidden vector matrix according to the following formula:
$$f_{t,t'} = \sigma\left(w_a \tanh\left(w_t h_t + w_{t'} h_{t'}\right)\right),$$
Wherein $t$ and $t'$ denote different time steps, $w_a$, $w_t$ and $w_{t'}$ are weight vectors, $\sigma$ is the sigmoid function, and $h_t$ and $h_{t'}$ are the hidden vectors at the different time steps;
Calculating the corresponding attention weight for each hidden vector $h_k$ in the hidden vector matrix according to a second formula (the formula itself is not reproduced in this text), wherein $e$ is the exponential function and $N$ is the number of hidden vectors.
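The dependency formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `w_t h_t` is read as a dot product between a weight vector and a hidden vector, `w_a` is treated as a scalar, and the normalization into attention weights is assumed to be a softmax over the N hidden vectors, since the weight formula itself is missing from the text. The function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dependency(h_t, h_tp, w_a, w_t, w_tp):
    """f_{t,t'} = sigmoid(w_a * tanh(w_t . h_t + w_t' . h_t'))."""
    return sigmoid(w_a * math.tanh(dot(w_t, h_t) + dot(w_tp, h_tp)))

def attention_weights(hidden, t, w_a, w_t, w_tp):
    """Assumed softmax over the dependencies between step t and every step k."""
    scores = [dependency(hidden[t], h_k, w_a, w_t, w_tp) for h_k in hidden]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy hidden vectors and weights (hypothetical values).
hidden = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
weights = attention_weights(hidden, 0, w_a=1.0, w_t=[0.5, 0.5], w_tp=[0.2, -0.3])
```

Under the softmax assumption the weights for a given time step are positive and sum to 1, so they can be used directly to form a weighted combination of the hidden vectors.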
Preferably, the second neural network further includes a conditional random field model, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result for the electronic health record includes:
Inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
Inputting the hidden vector matrix into the conditional random field model for processing to obtain the named entity recognition result for the electronic health record.
In addition, to achieve the above object, the present invention also provides an apparatus for named entity recognition in electronic health records, the apparatus comprising: a memory, a processor, and an electronic health record named entity recognition program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for named entity recognition in electronic health records described above.
In addition, to achieve the above object, the present invention also provides a computer storage medium on which an electronic health record named entity recognition program is stored, wherein the program, when executed by a processor, implements the steps of the method for named entity recognition in electronic health records described above.
In the method for named entity recognition in electronic health records, the apparatus for named entity recognition in electronic health records and the computer storage medium proposed by the embodiments of the present invention, a character vector matrix and a radical vector matrix are generated for the character sequence of an electronic health record whose named entities are to be recognized; the radical vector matrix is input to a convolutional neural network layer for processing to obtain the radical convolution vector matrix corresponding to the character sequence; a character feature vector matrix is generated from the character vector matrix and the radical convolution vector matrix; and the character feature vector matrix is input to a bidirectional long short-term memory network for processing to obtain the named entity recognition result for the electronic health record. By extracting the morphological features inside the characters of the electronic health record and feeding both the features of each character itself and its internal morphological features into a deep neural network to predict character labels, the present invention provides a method for named entity recognition in electronic health records with high recognition accuracy.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in the embodiments of the present invention;
Fig. 2 is a flow diagram of the first embodiment of the electronic health record named entity recognition method of the present invention;
Fig. 3 is a schematic diagram of the convolutional neural network processing in the first embodiment of the electronic health record named entity recognition method of the present invention;
Fig. 4 is a schematic diagram of the neural network system processing in the first embodiment of the electronic health record named entity recognition method of the present invention;
Fig. 5 is a flow diagram of the second embodiment of the electronic health record named entity recognition method of the present invention;
Fig. 6 is a schematic diagram of the neural network system processing in the second embodiment of the electronic health record named entity recognition method of the present invention;
Fig. 7 is a flow diagram of the third embodiment of the electronic health record named entity recognition method of the present invention;
Fig. 8 is a schematic diagram of the neural network system processing in the third embodiment of the electronic health record named entity recognition method of the present invention;
Fig. 9 is a flow diagram of the fourth embodiment of the electronic health record named entity recognition method of the present invention;
Fig. 10 is a schematic diagram of the neural network system processing in the fourth embodiment of the electronic health record named entity recognition method of the present invention.
The realization of the object, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of the terminal in the hardware operating environment involved in the embodiments of the present invention.
The terminal in the embodiments of the present invention may be a PC, or a portable terminal device with a display function such as a smartphone, tablet computer, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player, or portable computer.
As shown in Fig. 1, the terminal may include a processor 1001 (for example a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 provides the connections and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory such as disk storage, and may optionally be a storage device separate from the processor 1001.
Optionally, the terminal may also include a camera, an RF (radio frequency) circuit, sensors, an audio circuit, a Wi-Fi module and so on. The sensors may include light sensors, motion sensors and other sensors. Specifically, the light sensors may include an ambient light sensor, which adjusts the display brightness according to the ambient light, and a proximity sensor, which turns off the display and/or backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the posture of the mobile terminal (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as pedometers and tap detection). The mobile terminal may of course also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer and infrared sensor, which are not described further here.
Those skilled in the art will understand that the terminal structure shown in Fig. 1 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a kind of computer storage medium, may contain an operating system, a network communication module, a user interface module and an electronic health record named entity recognition program.
In the terminal shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it; and the processor 1001 can be used to call the electronic health record named entity recognition program stored in the memory 1005 and execute the steps of the electronic health record named entity recognition method.
Referring to Fig. 2, the first embodiment of the present invention provides a method for named entity recognition in electronic health records, the method including:
Step S10: generate the character vector matrix corresponding to the character sequence of the electronic health record whose named entities are to be recognized.
First, the character sequence in the history of present illness section of the electronic health record whose named entities are to be recognized is obtained. The method of this embodiment is implemented by combining a convolutional neural network (CNN) model with a bidirectional long short-term memory (Bi-LSTM) network model, and these network models can only process numerical input, so the character sequence must be converted into vector form once it has been obtained.
Pre-trained character vectors can usually be used to obtain the character vectors for the sequence, for example with Google's word2vec vector representation method. This method projects characters into a low-dimensional space in which semantically similar characters or words lie close together. For example, given the two word pairs "China" and "Guangzhou", and "China" and "computer", the distance between the first pair in this space is much smaller than the distance between the second pair.
To obtain accurate character vectors with word2vec, 10,000 electronic health records are used as the training corpus, and the Skip-Gram model in word2vec is used for training. Although Skip-Gram trains more slowly than CBOW, it performs better than CBOW on corpora containing rare characters, and the resulting character vectors match the character sequences of electronic health records more closely.
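As a rough illustration of what the Skip-Gram objective trains on, the sketch below enumerates (center, context) character pairs within a fixed window. The function name `skipgram_pairs`, the window size and the toy text are assumptions made for illustration; the source does not state the word2vec hyperparameters that were used.

```python
def skipgram_pairs(chars, window=2):
    """Return (center, context) pairs for each character and its neighbours
    within `window` positions; Skip-Gram learns to predict context from center."""
    pairs = []
    for i, center in enumerate(chars):
        lo = max(0, i - window)
        hi = min(len(chars), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, chars[j]))
    return pairs

# Toy fragment (hypothetical, not real record data): "cough for three days".
pairs = skipgram_pairs("咳嗽三天", window=1)
```

Each rare character still appears as a center in its own pairs, which is one common explanation for why Skip-Gram handles rare tokens better than CBOW, where context vectors are averaged.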
Specifically, the character vectors for the character sequence can be obtained from the word2vec representation by index lookup. For example, let the character sequence of the electronic health record be C = (C1, C2, ..., Cn), where n is the length of the input sequence, and generate a character index for each character according to its position in the sequence. Once the pre-trained character vectors are available, looking up each character index in the table yields the corresponding character vector, giving the character vector sequence x = (x1, x2, ..., xn), $x \in \mathbb{R}^{n \times d}$, where d is the dimension of the character vector space.
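The index lookup can be sketched as follows. This is a minimal sketch, not the patented implementation: it assumes a vocabulary-level character index (with index 0 reserved for unknown characters), and a randomly filled table stands in for the pre-trained word2vec vectors. The names `build_char_index` and `lookup_char_vectors`, the dimension `d = 4` and the toy corpus are all hypothetical.

```python
import random

def build_char_index(corpus):
    """Assign an integer index to every distinct character; 0 is kept for unknowns."""
    index = {"<UNK>": 0}
    for text in corpus:
        for ch in text:
            index.setdefault(ch, len(index))
    return index

def lookup_char_vectors(text, char_index, table):
    """Return the n x d matrix x = (x1, ..., xn) for a character sequence."""
    return [table[char_index.get(ch, 0)] for ch in text]

random.seed(0)
d = 4  # character vector dimension d (hypothetical value)
corpus = ["咳嗽三天", "慢性支气管炎"]  # toy corpus, not real record data
char_index = build_char_index(corpus)
# Random rows stand in for pre-trained word2vec character vectors.
table = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(len(char_index))]
x = lookup_char_vectors("咳嗽", char_index, table)
```

Here `x` has one row per input character and `d` columns, matching the shape n × d described above.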
Step S20: generate the radical vector matrix corresponding to the character sequence.
In common neural-network-based named entity recognition methods, the character vectors or word vectors of the text to be recognized are usually input to a neural network model for label prediction. However, the amount of information a character or word vector expresses is limited, so relying on character vectors or word vectors alone yields only limited gains in named entity recognition accuracy.
In view of the above shortcoming of the prior art, the inventive concept of the present invention arises from mining the deeper information that may exist inside characters or words. Chinese characters evolved from pictographs, and many still preserve their original meaning. Characters with similar shapes often have similar meanings, for example pairs of characters that both mean "disease" or both mean "pain". The shape information of a character can therefore also be used as input to the neural network, which extracts features from it and supplies the subsequent label prediction with the deeper information inside the character or word.
Intuitively, the components of a character reflect its form to some extent, so the component structure of a Chinese character can be used as its shape information. For example, the components "禾" (grain) and "口" (mouth) of the character "和" ("and") serve as the shape information of that character.
Specifically, once the component structure of a character has been obtained, each component is treated as an independent radical of the character. For example, "禾" (grain) and "口" (mouth) are the left-hand and right-hand radicals of the character "和" ("and") respectively. A radical vector is generated for each radical, so that a character's radical sequence, which contains several radicals, corresponds to a radical vector sequence, and this sequence is equivalent to a two-dimensional radical vector matrix. For the character sequence whose named entities are to be recognized, the two-dimensional radical vector matrices of its characters together form the three-dimensional radical vector matrix corresponding to the sequence.
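The construction of the three-dimensional radical vector matrix can be sketched as below. This is a sketch under assumptions: the small `COMPONENTS` table is a hypothetical stand-in for a full character decomposition dictionary, radical sequences are padded to a fixed length with a `<PAD>` radical whose vector is all zeros, and the radical vectors are randomly initialized. None of these names or values come from the source.

```python
import random

PAD = "<PAD>"
# Hypothetical component table; a real system would use a full decomposition dictionary.
COMPONENTS = {"和": ["禾", "口"], "咳": ["口", "亥"], "嗽": ["口", "束", "欠"]}

def radical_sequence(ch, max_len):
    """Components of `ch`, truncated or padded to exactly `max_len` entries."""
    comps = COMPONENTS.get(ch, [ch])[:max_len]
    return comps + [PAD] * (max_len - len(comps))

def init_radical_vectors(dim, seed=0):
    """One random vector per known radical; the padding radical maps to zeros."""
    rng = random.Random(seed)
    vectors = {PAD: [0.0] * dim}
    for comps in COMPONENTS.values():
        for r in comps:
            if r not in vectors:
                vectors[r] = [rng.uniform(-1, 1) for _ in range(dim)]
    return vectors

def radical_tensor(text, vectors, max_len):
    """Stack per-character radical matrices into an n x max_len x dim nested list.
    Radicals without a vector fall back to the padding vector."""
    return [[vectors.get(r, vectors[PAD]) for r in radical_sequence(ch, max_len)]
            for ch in text]

vectors = init_radical_vectors(dim=3)
tensor = radical_tensor("咳嗽和", vectors, max_len=3)
```

The result has one `max_len × dim` matrix per character, so a sequence of n characters yields an n × max_len × dim structure.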
Step S30: input the radical vector matrix into the first neural network for processing to obtain the radical convolution vector matrix corresponding to the character sequence, wherein the first neural network includes a convolutional neural network layer.
In this embodiment a convolutional neural network (CNN) is used for feature extraction. Fig. 3 is a schematic diagram of the convolutional neural network processing; it shows, for example, the radical sequence of the character for "pain". Because a fixed length is allocated to each radical sequence during processing, the sequence includes padding radicals. After the radical vectors of the character's radical sequence have been obtained, the radical vector matrix is input to the CNN layer in the first neural network, where it passes in turn through convolution in the convolutional layer, pooling in the pooling layer and processing in the fully connected layer, which outputs a radical convolution vector matrix containing the shape information inside the character. Note that the CNN may contain multiple convolutional layers, multiple pooling layers and multiple fully connected layers; this embodiment does not limit the structure.
It should be understood that the radical vectors need not be pre-trained: they can be randomly initialized and then trained as parameters of the first neural network.
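The convolution and pooling stages described above can be illustrated with a small pure-Python sketch for a single character. It assumes a one-dimensional convolution over windows of adjacent radicals followed by max pooling over positions; the nonlinearity and the fully connected layer are omitted for brevity, and the function name, window width and filter values are hypothetical.

```python
def conv1d_max_pool(radicals, filters, width=2):
    """radicals: m x d list of radical vectors for one character.
    filters: list of weight vectors, each of length width * d.
    Returns one max-pooled activation per filter."""
    m = len(radicals)
    pooled = []
    for w in filters:
        best = float("-inf")
        for start in range(m - width + 1):
            # Flatten the window of `width` adjacent radical vectors.
            window = [v for r in radicals[start:start + width] for v in r]
            activation = sum(a * b for a, b in zip(w, window))
            best = max(best, activation)
        pooled.append(best)
    return pooled

# Toy input: three radicals (the last one is padding), dimension d = 2.
radicals = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
filters = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
radical_conv_vector = conv1d_max_pool(radicals, filters, width=2)
```

Applying this to every character in the sequence gives one fixed-size vector per character, which together play the role of the radical convolution vector matrix.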
Step S40: generate the character feature vector matrix from the character vector matrix and the radical convolution vector matrix.
After the above steps, the character sequence of the electronic health record whose named entities are to be recognized has a corresponding character vector matrix and a corresponding radical convolution vector matrix. Both matrices contain feature information about the character sequence, so an overall character feature vector matrix needs to be generated from the two.
Specifically, each vector in the character vector matrix is concatenated with the corresponding vector in the radical convolution vector matrix, which carries the character shape information. For example, the character sequence C = (C1, C2, ..., Cn) corresponds to the character vector matrix X = (X1, X2, ..., Xn) and the radical convolution vector matrix Y = (Y1, Y2, ..., Yn), where X1, X2, ..., Xn and Y1, Y2, ..., Yn are vectors. The character vector and radical convolution vector of the i-th character Ci are Xi and Yi respectively, and concatenating Xi and Yi gives a new vector Zi. Concatenating the character vector and radical convolution vector of every character in C in this way gives the corresponding character feature vector matrix Z = (Z1, Z2, ..., Zn).
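The concatenation step can be sketched as follows, using nested Python lists for the matrices. The function name `concat_features` and the toy values are illustrative assumptions.

```python
def concat_features(char_vectors, radical_conv_vectors):
    """Concatenate Xi and Yi for each character to obtain Zi."""
    if len(char_vectors) != len(radical_conv_vectors):
        raise ValueError("both matrices must have one row per character")
    return [x + y for x, y in zip(char_vectors, radical_conv_vectors)]

# Toy matrices for a two-character sequence (hypothetical values).
X = [[0.1, 0.2], [0.3, 0.4]]   # character vectors, dimension 2
Y = [[1.0], [2.0]]             # radical convolution vectors, dimension 1
Z = concat_features(X, Y)
```

Each row of `Z` has the dimension of Xi plus the dimension of Yi, so the character and radical features can have different sizes.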
Step S50: input the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result for the electronic health record, wherein the second neural network includes a bidirectional long short-term memory network layer.
Because named entity recognition is a sequence labeling problem, the second neural network in this embodiment uses a bidirectional long short-term memory network (Bi-LSTM) to extract the contextual information of the sequence. A long short-term memory network (LSTM) is a kind of RNN; it addresses the vanishing/exploding gradient problem of RNNs as well as their inability to capture long-range dependencies in a sequence.
The Bi-LSTM used in this embodiment comprises LSTM networks in two directions, forward and backward. The word feature vector matrix Z (Z1, Z2 … Zn), generated from the word vector matrix and the radical convolution vector matrix, contains the feature vectors of the n characters in the character sequence. The feature vectors of these n characters are input into the forward LSTM network in order from left to right, and the network outputs in turn a forward hidden vector corresponding to each character's feature vector. Similarly, the feature vectors of the n characters are input into the backward LSTM network in order from right to left, and the network outputs in turn a second, backward hidden vector corresponding to each character's feature vector.
It should be understood that processing the feature vectors of the n characters through the bidirectional long short-term memory network captures the context on both sides of each position in the sequence, which is more comprehensive than the information captured by a unidirectional network. The two hidden vectors corresponding to each character's feature vector are concatenated into a bidirectional hidden vector, and the bidirectional hidden vectors of all characters are placed in one matrix to form the overall hidden vector matrix.
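The two-direction pass and the per-character concatenation can be sketched as follows. This is a minimal illustration rather than the embodiment's network: `toy_step` is a hypothetical stand-in for an LSTM cell (a real cell also carries a memory state and gates), and all function names are assumptions.

```python
def run_direction(features, step, initial_state):
    """Feed feature vectors one by one and collect the hidden vector at each time step."""
    state, outputs = initial_state, []
    for z in features:
        state = step(state, z)
        outputs.append(state)
    return outputs


def bidirectional_hidden(features, step, initial_state):
    """Concatenate the forward and backward hidden vectors of every character."""
    forward = run_direction(features, step, initial_state)
    # Run right to left, then reverse so that index i again refers to character i.
    backward = run_direction(features[::-1], step, initial_state)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def toy_step(state, z):
    """Hypothetical stand-in for an LSTM cell: average the old state with the input."""
    return [0.5 * s + 0.5 * x for s, x in zip(state, z)]
```

Each row of the result has twice the hidden size, which matches the later statement that two 64-unit directions give 128-dimensional hidden vectors.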
Further, the second neural network also includes a fully connected layer, which processes the overall hidden vector matrix output by the bidirectional long short-term memory network and finally yields the probability matrix corresponding to the character sequence whose named entities are to be recognized. How the final named entity recognition result is obtained from the probability matrix is described next.
Named entity recognition, also called proper name recognition, refers to identifying entities with specific meaning in text. For the electronic health records to be recognized in this embodiment, these entities are body parts, examinations and tests, diseases, symptoms, treatments, and so on.
Named entity recognition usually has to solve two problems: first, identifying entity boundaries, that is, segmentation; second, determining the entity category. Both can be solved by training a neural network on data that has been annotated with labels and having the network predict a label for each character of the text to be recognized. Various labeling schemes can be used, such as the IOB scheme or the BIOES scheme.
In this embodiment, when the BIOES labeling scheme is used for named entity recognition of electronic health records, 15 labels are defined: B-BodyPart, I-BodyPart, E-BodyPart, B-Check, I-Check, E-Check, B-Disease, I-Disease, E-Disease, B-Symptom, I-Symptom, E-Symptom, B-Treatment, I-Treatment, E-Treatment. The B-, I- and E- prefixes mark the beginning, the inside and the end of an entity, respectively. B-BodyPart, I-BodyPart and E-BodyPart mark the beginning, inside and end of a "body part" entity; B-Check, I-Check and E-Check mark those of an "examination and test" entity; B-Disease, I-Disease and E-Disease mark those of a "disease" entity; B-Symptom, I-Symptom and E-Symptom mark those of a "symptom" entity; and B-Treatment, I-Treatment and E-Treatment mark those of a "treatment" entity.
Each probability value in the probability matrix obtained in the above steps is the probability of a label predicted for a character of the sequence. For example, with the 15 labels defined above, each character in the sequence has 15 probability values, one for each label, and the label with the highest probability is selected as the predicted label of that character. Once the predicted label of every character in the sequence has been determined, the sequence can be segmented and the entity categories determined according to the meaning of the labels, which completes named entity recognition.
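The two stages just described, picking the most probable label per character and then reading entities off the label sequence, can be sketched as below. This is a hedged illustration: the function names are assumptions, the four Chinese characters are an assumed rendering of the "neck, portion, pain, pain" example used elsewhere in this description, and the probability values are invented toy numbers.

```python
def argmax_labels(prob_rows, labels):
    """For each character, select the label with the highest probability."""
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in prob_rows]


def decode_entities(chars, tags):
    """Group characters into (text, entity type) pairs using B-/I-/E- prefixes."""
    entities = []
    current, current_type = [], None

    def flush():
        nonlocal current, current_type
        if current:
            entities.append(("".join(current), current_type))
        current, current_type = [], None

    for ch, tag in zip(chars, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B":
            flush()  # a new entity starts; close any open one
            current, current_type = [ch], entity_type
        elif prefix in ("I", "E") and entity_type == current_type:
            current.append(ch)
            if prefix == "E":
                flush()  # explicit end of the entity
        else:
            flush()  # inconsistent or unknown tag: close the open entity
    flush()
    return entities


labels = ["B-BodyPart", "I-BodyPart", "B-Symptom", "I-Symptom"]
chars = ["颈", "部", "疼", "痛"]  # assumed original characters of the example
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
tags = argmax_labels(probs, labels)
entities = decode_entities(chars, tags)
```

The decoder also closes an entity when the next B- label starts, because the worked example in this description labels entities with B- and I- only.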
It should be understood that the parameters of the first neural network and the second neural network are trained with backpropagation and gradient descent on electronic health records whose named entities have already been identified, in order to obtain better parameters and improve the accuracy of named entity recognition.
The character sequences of electronic health records with identified named entities are obtained by steps including but not limited to the following: running a script to extract the history-of-present-illness section from the electronic health records and convert it into XML files; importing the XML files into an annotation tool, where medical practitioners first annotate a portion of the XML files; checking the consistency of the annotation results; if the check result meets the expected threshold, having the medical practitioners annotate the remaining files; and running a script to convert the files with annotated named entities into the training text required by the neural network.
To further describe the method of recognizing named entities in electronic health records according to this embodiment, Fig. 4 shows a schematic of the processing flow of the neural network system of this embodiment. As shown in Fig. 4, the neural network system includes a character embedding layer, a first neural network comprising a convolutional network layer, and a second neural network comprising a forward long short-term memory network layer and a backward long short-term memory network layer. The system recognizes the named entities of an electronic health record as follows:
1. Obtain the electronic health record text and input it into the character embedding layer in groups of 10 sentences. The sentence length is set to the maximum sentence length K among the 10 sentences, the radical sequence size of each character is fixed at 10, the pre-trained word vectors have 100 dimensions, and the radical vectors have 50 dimensions. After processing by the character embedding layer, a group of 10 sentences therefore forms a word vector matrix of 10 × K × 100 and a radical vector matrix of 10 × K × 10 × 50.
2. Input the radical vector matrix obtained in step 1 into the convolutional network layer, with a convolution kernel window size of 3, 30 convolution kernels, and a pooling window of 2. The data output by the convolutional network layer is a radical convolution vector matrix of 10 × K × 30; that is, the extracted morphological information of each character is represented by a 30-dimensional radical convolution vector. Concatenating the vectors in the radical convolution vector matrix with the word vectors in the word vector matrix gives a word feature vector matrix of 10 × K × 130.
3. The word feature vectors obtained in step 2 first pass through a dropout layer to prevent the model from overfitting, with the dropout rate set to 0.5, and are then input into the forward long short-term memory network and the backward long short-term memory network. With the hidden unit size of the long short-term memory network set to 64, the outputs of the forward and backward networks at each time step are concatenated, giving a hidden vector matrix of 10 × K × 128.
4. The hidden vector matrix obtained in step 3 passes through a fully connected layer whose size is the number of labels N in the training samples, giving a probability matrix of 10 × K × N.
5. Because the output 10 × K × N matrix gives, for each character, the probability of being marked with each of the N labels, the label with the highest of the N probabilities is selected as the label of the character. For example, in Fig. 4 the labels determined for the character sequence "neck, portion, pain, pain" are, in order, "B-BodyPart (B-BOD in the figure), I-BodyPart (I-BOD in the figure), B-Symptom (B-SYM in the figure), I-Symptom (I-SYM in the figure)".
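The tensor shapes in steps 1 to 5 follow from simple arithmetic, which the sketch below makes explicit. It only tracks shapes and runs no network; the function name, the parameter names, and the example values K = 40 and N = 15 are assumptions chosen for illustration.

```python
def pipeline_shapes(batch=10, max_len=40, radicals=10, word_dim=100,
                    radical_dim=50, kernels=30, lstm_hidden=64, num_labels=15):
    """Return the shape of each intermediate matrix in the described pipeline."""
    return {
        # Step 1: character embedding layer.
        "word_vectors": (batch, max_len, word_dim),
        "radical_vectors": (batch, max_len, radicals, radical_dim),
        # Step 2: one value per convolution kernel for each character.
        "radical_conv": (batch, max_len, kernels),
        # Step 2: concatenation of word vector and radical convolution vector.
        "word_features": (batch, max_len, word_dim + kernels),
        # Step 3: forward and backward hidden vectors concatenated.
        "hidden": (batch, max_len, 2 * lstm_hidden),
        # Step 4: one score per label for each character.
        "probabilities": (batch, max_len, num_labels),
    }


shapes = pipeline_shapes()
```

Checking these shapes against the text: 100 + 30 = 130 for the word feature vectors and 2 × 64 = 128 for the hidden vectors.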
In this embodiment, the morphological features inside the characters of the electronic health record are extracted, and the features of the characters themselves together with their internal morphological features are input in turn into a deep neural network to predict the character labels, which provides a method of named entity recognition for electronic health records with high recognition accuracy.
Further, referring to Fig. 5, a second embodiment of the invention provides, based on the first embodiment, a method of named entity recognition for electronic health records. In this embodiment, step S50 includes:
Step S60: input the word feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence.
Step S70: input the hidden vector matrix into the self-attention mechanism layer for processing to obtain the prediction matrix corresponding to the character sequence.
Research on named entity recognition methods for electronic health records has found that dependencies exist between some entities. Consider this passage from an electronic health record: "Over the past 10 years the above symptoms have recurred and worsened year by year, appearing in winter and spring and after catching cold; the patient visited a local hospital, was diagnosed with chronic bronchitis, and has repeatedly developed cough and expectoration." In this text, "winter and spring" and "catching cold" are inducement entities, "chronic bronchitis" is a disease entity, and "cough" and "expectoration" are symptom entities. Clearly, "winter and spring" denotes a time in an ordinary sentence, but in the medical records used as training samples for the neural network in this embodiment it denotes an inducement, because the arrival of winter and spring induces the recurrence of seasonal diseases, and professional doctors annotate it as an inducement. When deciding the entity type of "winter and spring", the neural network should therefore rely mainly on the information from "chronic bronchitis", "cough" and "expectoration". For this reason, this embodiment uses a self-attention mechanism, which ignores the distance between entities and computes the dependencies between them directly.
The dependencies between the hidden vectors in the hidden vector matrix are calculated according to the following formula:
$$f_{t,t'} = \sigma\left(w_a \tanh\left(w_t h_t + w_{t'} h_{t'}\right)\right)$$
where t and t' denote different time steps, $w_a$, $w_t$ and $w_{t'}$ are weight vectors, σ is the sigmoid function, and $h_t$ and $h_{t'}$ are the hidden vectors at the different time steps.
The corresponding attention weight is then calculated for each hidden vector $h_k$, using a formula in which e is the exponential function and N is the number of hidden vectors.
The calculation of the attention weights is described in detail below with reference to Fig. 6.
As shown in Fig. 6, if the character sequence being processed is {neck, portion, pain, pain}, the length of the character sequence is 4, the hidden vectors input to the self-attention mechanism layer are h1, h2, h3 and h4, and the value of N in the corresponding formula is 4.
Because the character sequence is input into the neural network system for processing in temporal order, each character in the sequence corresponds to one of a series of consecutive time steps. In this example, the time steps corresponding to the four characters "neck, portion, pain, pain" can be labeled t1, t2, t3 and t4, and the hidden vector corresponding to each character is in one-to-one correspondence with these time steps.
Each character to be recognized in the character sequence has an output hidden vector corresponding to its time step, and the attention weights relating this time step to the other time steps must be calculated. For example, the character "neck" is input at time t1 and has the output hidden vector h1 corresponding to t1; the time steps other than this one are t2, t3 and t4. According to the preset rule, the weight vectors to be calculated are those relating t1 to t2, t3 and t4, and in the attention weight formula k then ranges over t2, t3 and t4.
The obtained attention weights are multiplied by the corresponding hidden vectors to give the final attention vector, and the attention vectors corresponding to the hidden vectors together form the attention vector matrix.
Because the hidden vector matrix and the attention vector matrix both contain prediction information about the character sequence whose named entities are to be recognized, an attention hidden vector matrix containing the overall information is generated from the two.
Specifically, each hidden vector in the hidden vector matrix is concatenated with the corresponding attention vector in the attention vector matrix.
For example, given a hidden vector matrix H (H1, H2 … Hn) and an attention vector matrix whose n elements are likewise vectors, Hi is concatenated with the i-th attention vector to obtain a new vector H′i. Concatenating every hidden vector with its attention vector in this way yields the corresponding attention hidden vector matrix H′ (H′1, H′2 … H′n).
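The self-attention computation described above can be sketched as follows, under several loudly stated assumptions that the text does not confirm: the attention weights are taken to be a softmax over the dependency scores of the other time steps, the attention vector is taken to be the weighted sum of the other hidden vectors, and, so that the shapes compose, `W_t` and `W_k` are treated as matrices and `w_a` as a vector even though the description calls all three weight vectors. All function names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dependency(h_t, h_k, W_t, W_k, w_a):
    """f = sigmoid(w_a . tanh(W_t h_t + W_k h_k)) for two time steps."""
    pre = [math.tanh(a + b) for a, b in zip(matvec(W_t, h_t), matvec(W_k, h_k))]
    z = sum(w * p for w, p in zip(w_a, pre))
    return 1.0 / (1.0 + math.exp(-z))


def attention_hidden_matrix(H, W_t, W_k, w_a):
    """Return one row per time step: the hidden vector followed by its attention vector."""
    rows = []
    for t, h_t in enumerate(H):
        others = [k for k in range(len(H)) if k != t]
        if others:
            scores = [dependency(h_t, H[k], W_t, W_k, w_a) for k in others]
            # Assumed normalisation: softmax over the other time steps.
            total = sum(math.exp(s) for s in scores)
            weights = [math.exp(s) / total for s in scores]
            # Assumed attention vector: weighted sum of the other hidden vectors.
            attention = [sum(w * H[k][d] for w, k in zip(weights, others))
                         for d in range(len(h_t))]
        else:
            attention = [0.0] * len(h_t)  # a single character has no context
        rows.append(list(h_t) + attention)
    return rows
```

For a sequence of four 128-dimensional hidden vectors this would give four 256-dimensional rows, which would then be passed to the fully connected layer.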
The resulting attention hidden vector matrix is input into the fully connected layer for processing to obtain the prediction matrix corresponding to the character sequence.
Step S80: input the prediction matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic health record.
If the character labels are predicted individually, directly from the hidden vectors produced by the Bi-LSTM network layer or the self-attention mechanism layer, the dependencies between labels are not taken into account, which can become a bottleneck when trying to improve prediction accuracy. For example, the label following I-Symptom might be predicted as I-Disease, which is clearly an invalid label sequence. In named entity recognition tasks there are usually strong dependencies between labels: for example, the label following B-Symptom cannot be I-Disease, and I-Symptom can only appear after B-Symptom.
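The label dependencies in the examples above can be written down as a small validity check. This is an illustrative sketch, not part of the described method: the function name is hypothetical, and the rule is generalised slightly beyond the text by assuming that an I- or E- label may follow either a B- or an I- label of the same entity type.

```python
def is_valid_transition(prev_tag, next_tag):
    """Return True if next_tag may directly follow prev_tag."""
    prefix, _, entity_type = next_tag.partition("-")
    if prefix in ("I", "E"):
        prev_prefix, _, prev_type = prev_tag.partition("-")
        # An inside or end label must continue an entity of the same type.
        return prev_prefix in ("B", "I") and prev_type == entity_type
    return True  # a B- label can start after any label


def invalid_positions(tags):
    """Indices i where tags[i] cannot follow tags[i - 1]."""
    return [i for i in range(1, len(tags))
            if not is_valid_transition(tags[i - 1], tags[i])]
```

A per-character argmax has no way to rule out such positions, which is the motivation for scoring whole label sequences.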
Therefore, to further improve the accuracy of named entity recognition, this embodiment uses a conditional random field (CRF) model for the final prediction of the character labels. The CRF model overcomes the independence assumption of the hidden Markov model and solves the label bias problem of the maximum entropy Markov model. The working principle of the CRF model is explained below.
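For an input sequence x (x1, x2 … xn), let P be the matrix obtained from the self-attention network, $P \in \mathbb{R}^{n \times s}$, where s is the number of labels and $P_{ij}$ is the score of the i-th character of the input sequence being predicted as the j-th label. For a predicted sequence y (y1, y2 … yn), its score is defined as:
$$\mathrm{score}(x, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$
A denotes the transfer matrix, $A \in \mathbb{R}^{(s+2) \times (s+2)}$, and $A_{ij}$ is the score of transferring from label i to label j; the two additional labels are the start label $y_0$ and the end label $y_{n+1}$. A softmax over all possible label sequences then gives the probability of the sequence y:
$$p(y \mid x) = \frac{e^{\mathrm{score}(x, y)}}{\sum_{\tilde{y} \in Y_x} e^{\mathrm{score}(x, \tilde{y})}}$$
During training, the log probability of the correct label sequence is maximized:
$$\log p(y \mid x) = \mathrm{score}(x, y) - \log \sum_{\tilde{y} \in Y_x} e^{\mathrm{score}(x, \tilde{y})}$$
$Y_x$ denotes all possible label sequences, including those that do not satisfy the constraints of the BIOES labeling scheme. When decoding, the predicted output sequence is the one that obtains the largest score:
$$y^{*} = \arg\max_{\tilde{y} \in Y_x} \mathrm{score}(x, \tilde{y})$$
For the CRF model, the Viterbi algorithm can be used for efficient training and decoding.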
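Decoding with the Viterbi algorithm can be sketched as below. The sketch assumes the standard linear-chain form, in which the score of a label path is the sum of its transition scores from A (including a start and an end label) and its per-character scores from P; the function names, the label indexing, and the toy numbers are assumptions made for illustration.

```python
def sequence_score(P, A, y, start, end):
    """Score of label path y: transition scores plus per-character scores."""
    path = [start] + list(y) + [end]
    transitions = sum(A[a][b] for a, b in zip(path, path[1:]))
    emissions = sum(P[i][tag] for i, tag in enumerate(y))
    return transitions + emissions


def viterbi_decode(P, A, start, end):
    """Return the highest-scoring label path and its score by dynamic programming."""
    n, s = len(P), len(P[0])
    # best[j]: best score of any path for the characters so far that ends in label j.
    best = [A[start][j] + P[0][j] for j in range(s)]
    backpointers = []
    for i in range(1, n):
        pointers, scores = [], []
        for j in range(s):
            prev = max(range(s), key=lambda k: best[k] + A[k][j])
            pointers.append(prev)
            scores.append(best[prev] + A[prev][j] + P[i][j])
        backpointers.append(pointers)
        best = scores
    last = max(range(s), key=lambda j: best[j] + A[j][end])
    total = best[last] + A[last][end]
    # Follow the back-pointers from the last character to the first.
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, total


# Toy example: 3 characters, 2 real labels (0, 1), start label 2, end label 3.
P = [[1.0, 0.2], [0.3, 0.9], [0.8, 0.1]]
A = [[0.5, -0.2, 0.0, 0.1],
     [-0.4, 0.6, 0.0, 0.3],
     [0.2, 0.1, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]
best_path, best_score = viterbi_decode(P, A, start=2, end=3)
```

The dynamic program considers every label path implicitly, so its cost grows with n × s² rather than with the sⁿ paths an exhaustive search would enumerate.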
Finally, the method of recognizing named entities in electronic health records according to this embodiment is further described with reference to Fig. 6. Fig. 6 shows a schematic of the structure of the neural network system of this embodiment. The system includes a character embedding layer, a first neural network comprising a radical CNN convolutional layer, and a second neural network comprising a bidirectional LSTM layer, a self-attention mechanism layer, and a conditional random field model. The system recognizes named entities as follows:
1. Obtain the electronic health record text and input it into the character embedding layer in groups of 10 sentences. The sentence length is set to the maximum sentence length K among the 10 sentences, the radical sequence size of each character is fixed at 10, the pre-trained word vectors have 100 dimensions, and the radical vectors have 50 dimensions. After processing by the character embedding layer, a group of 10 sentences therefore forms a word vector matrix of 10 × K × 100 and a radical vector matrix of 10 × K × 10 × 50.
2. Input the radical vector matrix obtained in step 1 into the radical CNN convolutional network layer, with a convolution kernel window size of 3, 30 convolution kernels, and a pooling window of 2. The data output by the radical CNN convolutional network layer is a radical convolution vector matrix of 10 × K × 30; that is, the extracted morphological information of each character is represented by a 30-dimensional radical convolution vector. Concatenating the vectors in the radical convolution vector matrix with the word vectors in the word vector matrix gives a word feature vector matrix of 10 × K × 130.
3. The word feature vectors obtained in step 2 first pass through a dropout layer to prevent the model from overfitting, with the dropout rate set to 0.5, and are then input into the bidirectional LSTM network. With the hidden unit size of the LSTM network set to 64, the outputs of the bidirectional LSTM at each time step are concatenated, giving a hidden vector matrix of 10 × K × 128.
4. The hidden vector matrix obtained in step 3 is processed in turn by the self-attention mechanism layer and the conditional random field model, giving a prediction probability matrix of 10 × K × N.
5. Because the output 10 × K × N matrix gives, for each character, the probability of being marked with each of the N labels, the label with the highest of the N probabilities is selected as the label of the character.
In this embodiment, the morphological features inside the characters of the electronic health record are extracted, and the features of the characters themselves together with their internal morphological features are input in turn into a deep neural network to predict the character labels, which provides a named entity recognition method for electronic health records with high recognition accuracy.
Further, referring to Fig. 7, a third embodiment of the invention provides, based on the first embodiment, a method of named entity recognition for electronic health records. In this embodiment, step S50 includes:
Step S90: input the word feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence.
Step S100: input the hidden vector matrix into the self-attention mechanism layer for processing to obtain the named entity recognition result of the electronic health record.
It should be understood that this embodiment is based on the first embodiment and, in consideration of different application scenarios or processing resources, differs from the second embodiment: as shown in Fig. 8, the second neural network in this embodiment includes only the self-attention mechanism layer and does not include the conditional random field model.
After the hidden vector matrix output by the bidirectional long short-term memory network is input into the self-attention mechanism layer, the attention weights of the hidden vectors in the hidden vector matrix are calculated first; an attention vector matrix is then generated from the attention weights and the hidden vectors; an attention hidden vector matrix is next generated from the hidden vector matrix and the attention vector matrix; and finally the attention hidden vector matrix is input into the fully connected layer for processing to obtain the prediction probability matrix corresponding to the character sequence to be recognized.
Each probability value in the prediction probability matrix obtained in the above steps is the probability of a label predicted for a character of the sequence to be recognized, and the label with the highest probability is selected as the predicted label of the corresponding character. Once the predicted label of every character in the sequence has been determined, the sequence can be segmented and the entity categories determined according to the meaning of the labels, which completes named entity recognition.
In this embodiment, the morphological features inside the characters of the electronic health record are extracted by a convolutional neural network, and the features of the characters themselves together with their internal morphological features are input in turn into the bidirectional long short-term memory network layer and the self-attention mechanism layer of a deep neural network to predict the character labels, which provides an accurate and efficient named entity recognition method for electronic health records.
Further, referring to Fig. 9, a fourth embodiment of the invention provides, based on the first embodiment, a method of named entity recognition for electronic health records. In this embodiment, step S50 includes:
Step S110: input the word feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence.
Step S120: input the hidden vector matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic health record.
It should be understood that this embodiment is based on the first embodiment and, in consideration of different application scenarios or processing resources, differs from the second embodiment: as shown in Fig. 10, the second neural network in this embodiment includes only the conditional random field model and does not include the self-attention mechanism layer.
After the hidden vector matrix output by the bidirectional long short-term memory network is input into the conditional random field model, processing yields the prediction probability matrix corresponding to the character sequence to be recognized.
Each probability value in the prediction probability matrix obtained in the above steps is the probability of a label predicted for a character of the sequence to be recognized, and the label with the highest probability is selected as the predicted label of the corresponding character. Once the predicted label of every character in the sequence has been determined, the sequence can be segmented and the entity categories determined according to the meaning of the labels, which completes named entity recognition.
In this embodiment, the morphological features inside the characters of the electronic health record are extracted by a convolutional neural network, and the features of the characters themselves together with their internal morphological features are input in turn into the bidirectional long short-term memory network layer and the conditional random field model of a deep neural network to predict the character labels, which provides an accurate and efficient named entity recognition method for electronic health records.
The present invention also provides a named entity recognition apparatus for electronic health records. The apparatus includes a memory, a processor, and a named entity recognition processing program for electronic health records that is stored in the memory and can run on the processor. When executed by the processor, the program implements the steps of the method of named entity recognition for electronic health records.
In addition, an embodiment of the present invention also provides a computer-readable storage medium on which a named entity recognition processing program for electronic health records is stored. When executed by a processor, the program implements the steps of the method of named entity recognition for electronic health records.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or system that includes the element.
The serial numbers of the above embodiments of the invention are for description only and do not represent the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.
Claims (10)
1. A method of named entity recognition for electronic health records, characterized in that the method comprises the following steps:
generating a word vector matrix corresponding to the character sequence of an electronic health record whose named entities are to be recognized;
generating a radical vector matrix corresponding to the character sequence;
inputting the radical vector matrix into a first neural network for processing to obtain a radical convolution vector matrix corresponding to the character sequence, wherein the first neural network includes a convolutional neural network layer;
generating a word feature vector matrix according to the word vector matrix and the radical convolution vector matrix;
inputting the word feature vector matrix into a second neural network for processing to obtain the named entity recognition result of the electronic health record, wherein the second neural network includes a bidirectional long short-term memory network layer;
wherein the parameters of the first neural network and the second neural network are obtained by training on electronic health records whose named entities have been identified.
2. The method of named entity recognition for electronic health records as claimed in claim 1, characterized in that the step of generating the radical vector matrix corresponding to the character sequence comprises:
obtaining the Chinese character components of each character in the character sequence;
generating the radical vector of each character according to the Chinese character components;
generating the radical vector matrix corresponding to the character sequence according to the radical vector of each character.
3. The method of named entity recognition for electronic health records as claimed in claim 2, characterized in that the second neural network further includes a fully connected layer, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic health record comprises:
inputting the word feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence;
inputting the hidden vector matrix into the fully connected layer for processing to obtain the named entity recognition result of the electronic health record.
4. The method of named entity recognition for electronic health records as claimed in claim 2, characterized in that the second neural network further includes a self-attention mechanism layer, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic health record comprises:
inputting the word feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence;
inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain the named entity recognition result of the electronic health record.
5. The method of named entity recognition for electronic health records as claimed in claim 2, characterized in that the second neural network further includes a self-attention mechanism layer and a conditional random field model, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic health record comprises:
inputting the word feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence;
inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain the prediction matrix corresponding to the character sequence;
inputting the prediction matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic health record.
6. The method for electronic health record named entity recognition as claimed in claim 5, characterized in that the self-attention mechanism layer comprises a fully connected layer, and the step of inputting the hidden vector matrix into the self-attention mechanism layer for processing to obtain the prediction matrix corresponding to the character string comprises:
Calculating the attention weights of the hidden vectors in the hidden vector matrix;
Generating an attention vector matrix according to the attention weights and the hidden vectors;
Generating an attention hidden vector matrix according to the hidden vector matrix and the attention vector matrix;
Inputting the attention hidden vector matrix into the fully connected layer for processing, to obtain the prediction matrix corresponding to the character string.
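The steps of this claim can be sketched as follows. The claim does not specify how the matrices are combined, so two assumptions are made here: each attention vector is the weighted sum of all hidden vectors, and the attention hidden vector is the concatenation of a hidden vector with its attention vector. All function names and numbers are hypothetical.

```python
def attention_vectors(weights, hidden):
    """Row t is the weighted sum of all hidden vectors, using weights[t]."""
    dim = len(hidden[0])
    return [[sum(w * h[d] for w, h in zip(row, hidden)) for d in range(dim)]
            for row in weights]


def attention_hidden(hidden, attended):
    """Combine each hidden vector with its attention vector (concatenation assumed)."""
    return [h + a for h, a in zip(hidden, attended)]


def fully_connected(rows, weight, bias):
    """Map each row to one score per tag: out = row @ weight + bias.

    weight has shape (input_dim, num_tags); bias has length num_tags.
    """
    return [[sum(x * w for x, w in zip(row, col)) + b
             for col, b in zip(zip(*weight), bias)]
            for row in rows]


# Two hidden vectors of size 2 and hypothetical attention weights.
hidden = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.5, 0.5], [1.0, 0.0]]
attended = attention_vectors(weights, hidden)
combined = attention_hidden(hidden, attended)
fc_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
prediction = fully_connected(combined, fc_weight, [0.0, 0.0])
```

Each row of `prediction` holds one score per tag for the corresponding character, which is the shape a downstream conditional random field model would consume.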
7. The method for electronic health record named entity recognition as claimed in claim 6, characterized in that the step of calculating the attention weights of the hidden vectors in the hidden vector matrix comprises:
Calculating the dependence between hidden vectors in the hidden vector matrix according to the following formula:
$$f_{t,t'} = \sigma\left(w_a \tanh\left(w_t h_t + w_{t'} h_{t'}\right)\right),$$
Wherein t and t' denote different time steps, $w_a$, $w_t$ and $w_{t'}$ are weight vectors, $\sigma$ is the sigmoid function, and $h_t$ and $h_{t'}$ are the hidden vectors at different time steps;
Calculating, according to the following formula, the corresponding attention weight for each hidden vector $h_k$ in the hidden vector matrix
Wherein e is the exponential function and N is the number of the hidden vectors,
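A minimal sketch of the dependence score is given below. Because the claim calls $w_a$, $w_t$ and $w_{t'}$ weight vectors, the sketch assumes that $w_t$ and $w_{t'}$ are applied element-wise and that $w_a$ is applied as a dot product. The attention weight formula itself is not reproduced in the text, so the softmax-style normalisation in `attention_weights` is only an assumption that is consistent with the mention of the exponential function and of N hidden vectors. All names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dependency(h_t, h_t_prime, w_a, w_t, w_t_prime):
    """f_{t,t'} = sigmoid(w_a . tanh(w_t * h_t + w_t' * h_t')).

    Assumption: w_t and w_t' act element-wise, w_a acts as a dot product.
    """
    mixed = [math.tanh(a * x + b * y)
             for a, x, b, y in zip(w_t, h_t, w_t_prime, h_t_prime)]
    return sigmoid(sum(wa * m for wa, m in zip(w_a, mixed)))


def attention_weights(hidden, w_a, w_t, w_t_prime):
    """Row t holds weights over all N hidden vectors (assumed softmax of f)."""
    weights = []
    for h_t in hidden:
        scores = [dependency(h_t, h_u, w_a, w_t, w_t_prime) for h_u in hidden]
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Three hypothetical hidden vectors of size 2.
hidden = [[0.1, -0.2], [0.4, 0.3], [0.0, 0.5]]
w_a, w_t, w_t_prime = [0.5, -0.3], [1.0, 0.2], [0.7, -1.1]
weights = attention_weights(hidden, w_a, w_t, w_t_prime)
```

Each row of `weights` sums to 1, so it can be used directly to form a weighted combination of the hidden vectors.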
8. The method for electronic health record named entity recognition as claimed in claim 2, characterized in that the second neural network further comprises a conditional random field model, and the step of inputting the word feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic health record comprises:
Inputting the word feature vector matrix into the bidirectional long short-term memory network for processing, to obtain the hidden vector matrix corresponding to the character string;
Inputting the hidden vector matrix into the conditional random field model for processing, to obtain the named entity recognition result of the electronic health record.
9. An electronic health record named entity recognition device, characterized in that the electronic health record named entity recognition device comprises: a memory, a processor, a camera, and a processing program for electronic health record named entity recognition that is stored in the memory and executable on the processor, wherein the processing program for electronic health record named entity recognition, when executed by the processor, implements the steps of the method for electronic health record named entity recognition as claimed in any one of claims 1 to 8.
10. A storage medium, characterized in that a processing program for electronic health record named entity recognition is stored on the storage medium, wherein the processing program for electronic health record named entity recognition, when executed by a processor, implements the steps of the method for electronic health record named entity recognition as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811282557.3A CN109388807B (en) | 2018-10-30 | 2018-10-30 | Method, device and storage medium for identifying named entities of electronic medical records |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811282557.3A CN109388807B (en) | 2018-10-30 | 2018-10-30 | Method, device and storage medium for identifying named entities of electronic medical records |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109388807A (en) | 2019-02-26 |
CN109388807B (en) | 2021-09-21 |
Family
ID=65427746
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811282557.3A Active CN109388807B (en) | 2018-10-30 | 2018-10-30 | Method, device and storage medium for identifying named entities of electronic medical records |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109388807B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN106980608A (en) * | 2017-03-16 | 2017-07-25 | 四川大学 | A kind of Chinese electronic health record participle and name entity recognition method and system |
US20180275817A1 (en) * | 2017-03-27 | 2018-09-27 | Tricorn (Beijing) Technology Co., Ltd. | Information processing apparatus, information processing method and computer-readable storage medium |
CN107977361A (en) * | 2017-12-06 | 2018-05-01 | 哈尔滨工业大学深圳研究生院 | The Chinese clinical treatment entity recognition method represented based on deep semantic information |
Non-Patent Citations (1)
Title |
---|
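Zhu Danhao, Yang Lei, Wang Dongbo: "Research on Chinese Organization Name Recognition Based on Deep Learning: A Recurrent Neural Network Method at the Chinese Character Level", 《现代图书情报技术》 (New Technology of Library and Information Service) * |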
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960728A (en) * | 2019-03-11 | 2019-07-02 | 北京市科学技术情报研究所(北京市科学技术信息中心) | A kind of open field conferencing information name entity recognition method and system |
CN109960728B (en) * | 2019-03-11 | 2021-01-22 | 北京市科学技术情报研究所(北京市科学技术信息中心) | Method and system for identifying named entities of open domain conference information |
CN111797626A (en) * | 2019-03-21 | 2020-10-20 | 阿里巴巴集团控股有限公司 | Named entity identification method and device |
CN109871544A (en) * | 2019-03-25 | 2019-06-11 | 平安科技(深圳)有限公司 | Entity recognition method, device, equipment and storage medium based on Chinese case history |
CN109933801A (en) * | 2019-03-25 | 2019-06-25 | 北京理工大学 | Two-way LSTM based on predicted position attention names entity recognition method |
CN110046349A (en) * | 2019-03-26 | 2019-07-23 | 平安科技(深圳)有限公司 | Information identifying method, device, equipment and storage medium based on Chinese case history |
CN110135427B (en) * | 2019-04-11 | 2021-07-27 | 北京百度网讯科技有限公司 | Method, apparatus, device and medium for recognizing characters in image |
CN110135427A (en) * | 2019-04-11 | 2019-08-16 | 北京百度网讯科技有限公司 | The method, apparatus, equipment and medium of character in image for identification |
CN110162782B (en) * | 2019-04-17 | 2022-04-01 | 平安科技(深圳)有限公司 | Entity extraction method, device and equipment based on medical dictionary and storage medium |
CN110162782A (en) * | 2019-04-17 | 2019-08-23 | 平安科技(深圳)有限公司 | Entity extraction method, apparatus, equipment and storage medium based on Medical Dictionary |
WO2020211250A1 (en) * | 2019-04-19 | 2020-10-22 | 平安科技(深圳)有限公司 | Entity recognition method and apparatus for chinese medical record, device and storage medium |
US10740561B1 (en) | 2019-04-25 | 2020-08-11 | Alibaba Group Holding Limited | Identifying entities in electronic medical records |
CN110287483A (en) * | 2019-06-06 | 2019-09-27 | 广东技术师范大学 | A kind of unknown word identification method and system using five-stroke etymon deep learning |
CN110287483B (en) * | 2019-06-06 | 2023-12-05 | 广东技术师范大学 | Unregistered word recognition method and system utilizing five-stroke character root deep learning |
CN110516228A (en) * | 2019-07-04 | 2019-11-29 | 湖南星汉数智科技有限公司 | Name entity recognition method, device, computer installation and computer readable storage medium |
CN110427493A (en) * | 2019-07-11 | 2019-11-08 | 新华三大数据技术有限公司 | Electronic health record processing method, model training method and relevant apparatus |
CN110334357A (en) * | 2019-07-18 | 2019-10-15 | 北京香侬慧语科技有限责任公司 | A kind of method, apparatus, storage medium and electronic equipment for naming Entity recognition |
EP3767516A1 (en) * | 2019-07-18 | 2021-01-20 | Ricoh Company, Ltd. | Named entity recognition method, apparatus, and computer-readable recording medium |
CN112329465A (en) * | 2019-07-18 | 2021-02-05 | 株式会社理光 | Named entity identification method and device and computer readable storage medium |
CN110555441A (en) * | 2019-09-10 | 2019-12-10 | 杭州橙鹰数据技术有限公司 | character recognition method and device |
CN111339764A (en) * | 2019-09-18 | 2020-06-26 | 华为技术有限公司 | Chinese named entity recognition method and device |
CN110688855A (en) * | 2019-09-29 | 2020-01-14 | 山东师范大学 | Chinese medical entity identification method and system based on machine learning |
CN110929749A (en) * | 2019-10-15 | 2020-03-27 | 平安科技(深圳)有限公司 | Text recognition method, text recognition device, text recognition medium and electronic equipment |
CN110929749B (en) * | 2019-10-15 | 2022-04-29 | 平安科技(深圳)有限公司 | Text recognition method, text recognition device, text recognition medium and electronic equipment |
CN111178074A (en) * | 2019-12-12 | 2020-05-19 | 天津大学 | Deep learning-based Chinese named entity recognition method |
CN111178074B (en) * | 2019-12-12 | 2023-08-25 | 天津大学 | Chinese named entity recognition method based on deep learning |
CN111143534A (en) * | 2019-12-26 | 2020-05-12 | 腾讯云计算(北京)有限责任公司 | Method and device for extracting brand name based on artificial intelligence and storage medium |
CN111192692B (en) * | 2020-01-02 | 2023-12-08 | 上海联影智能医疗科技有限公司 | Entity relationship determination method and device, electronic equipment and storage medium |
CN111192692A (en) * | 2020-01-02 | 2020-05-22 | 上海联影智能医疗科技有限公司 | Entity relationship determination method and device, electronic equipment and storage medium |
CN111352977B (en) * | 2020-03-10 | 2022-06-17 | 浙江大学 | Time sequence data monitoring method based on self-attention bidirectional long-short term memory network |
CN111352977A (en) * | 2020-03-10 | 2020-06-30 | 浙江大学 | Time sequence data monitoring method based on self-attention bidirectional long-short term memory network |
CN112836514A (en) * | 2020-06-19 | 2021-05-25 | 合肥量圳建筑科技有限公司 | Nested entity recognition method and device, electronic equipment and storage medium |
CN112836514B (en) * | 2020-06-19 | 2024-07-02 | 合肥量圳建筑科技有限公司 | Nested entity identification method, apparatus, electronic device and storage medium |
CN112434520A (en) * | 2020-11-11 | 2021-03-02 | 北京工业大学 | Named entity recognition method and device and readable storage medium |
CN113408289A (en) * | 2021-06-29 | 2021-09-17 | 广东工业大学 | Multi-feature fusion supply chain management entity knowledge extraction method and system |
CN113408289B (en) * | 2021-06-29 | 2024-04-16 | 广东工业大学 | Multi-feature fusion supply chain management entity knowledge extraction method and system |
CN115201904A (en) * | 2022-07-18 | 2022-10-18 | 北京石油化工学院 | Microseism data compression and event detection method based on edge intelligence |
CN115201904B (en) * | 2022-07-18 | 2023-03-03 | 北京石油化工学院 | Microseism data compression and event detection method based on edge intelligence |
Also Published As
Publication number | Publication date |
---|---|
CN109388807B (en) | 2021-09-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109388807A (en) | The method, apparatus and storage medium of electronic health record name Entity recognition | |
CN106980683B (en) | Blog text abstract generating method based on deep learning | |
CN110472229A (en) | Sequence labelling model training method, electronic health record processing method and relevant apparatus | |
CN109471945B (en) | Deep learning-based medical text classification method and device and storage medium | |
CN107679447A (en) | Facial characteristics point detecting method, device and storage medium | |
CN110490213A (en) | Image-recognizing method, device and storage medium | |
CN110503076B (en) | Video classification method, device, equipment and medium based on artificial intelligence | |
CN110517785A (en) | Lookup method, device and the equipment of similar case | |
CN113095415B (en) | Cross-modal hashing method and system based on multi-modal attention mechanism | |
CN110459282A (en) | Sequence labelling model training method, electronic health record processing method and relevant apparatus | |
CN107168992A (en) | Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence | |
US20200074280A1 (en) | Semi-supervised learning using clustering as an additional constraint | |
CN110442840A (en) | Sequence labelling network update method, electronic health record processing method and relevant apparatus | |
CN111709398A (en) | Image recognition method, and training method and device of image recognition model | |
CN111368536A (en) | Natural language processing method, apparatus and storage medium therefor | |
CN112000778A (en) | Natural language processing method, device and system based on semantic recognition | |
CN114419351A (en) | Image-text pre-training model training method and device and image-text prediction model training method and device | |
CN110210540A (en) | Across social media method for identifying ID and system based on attention mechanism | |
CN114648032B (en) | Training method and device of semantic understanding model and computer equipment | |
CN115758282A (en) | Cross-modal sensitive information identification method, system and terminal | |
Wang et al. | Computer vision for lifelogging: Characterizing everyday activities based on visual semantics | |
CN116578738B (en) | Graph-text retrieval method and device based on graph attention and generating countermeasure network | |
CN111445545B (en) | Text transfer mapping method and device, storage medium and electronic equipment | |
CN113761188A (en) | Text label determination method and device, computer equipment and storage medium | |
CN116701635A (en) | Training video text classification method, training video text classification device, training video text classification equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||