CN109388807A - Method, apparatus and storage medium for named entity recognition in electronic medical records - Google Patents

Method, apparatus and storage medium for named entity recognition in electronic medical records

Info

Publication number
CN109388807A
Authority
CN
China
Prior art keywords
electronic health
health record
matrix
name entity
entity recognition
Legal status
Granted
Application number
CN201811282557.3A
Other languages
Chinese (zh)
Other versions
CN109388807B (en)
Inventor
任江涛
殷明旺
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Application filed by Sun Yat Sen University
Priority to CN201811282557.3A
Publication of CN109388807A
Application granted
Publication of CN109388807B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Biophysics (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method for named entity recognition in electronic medical records, comprising: generating a character vector matrix and a radical vector matrix corresponding to the character sequence of an electronic medical record whose named entities are to be recognized; inputting the radical vector matrix into a convolutional neural network layer for processing to obtain a radical convolution vector matrix corresponding to the character sequence; generating a character feature vector matrix from the character vector matrix and the radical convolution vector matrix; and inputting the character feature vector matrix into a bidirectional long short-term memory network for processing to obtain the named entity recognition result of the electronic medical record. The invention also discloses an apparatus for named entity recognition in electronic medical records and a storage medium. By extracting the morphological features inside the characters of an electronic medical record and feeding both the features of each character itself and its internal morphological features into a deep neural network to predict character labels, the invention provides a named entity recognition method for electronic medical records with high recognition accuracy.

Description

Method, apparatus and storage medium for named entity recognition in electronic medical records
Technical field
The present invention relates to the field of computer technology, and in particular to a method for named entity recognition in electronic medical records, an apparatus for named entity recognition in electronic medical records, and a computer storage medium.
Background art
With the rapid development of China's economy and the steady rise in living standards, people's health awareness keeps growing, and using large amounts of medical data to build intelligent medical systems has become an urgent need. Electronic medical records are the most abundant type of medical data and the text richest in information. They are highly specialized: written by medical practitioners for individual patients, they record in detail the symptoms presented during hospitalization, the diagnosed diseases and the corresponding treatments, as well as the results of various examination reports, and therefore contain a large amount of medical information. For this reason, many intelligent medical information systems are built on information from electronic medical records. In building such systems, named entity recognition is a key task of information extraction and the foundation for processing large amounts of medical data, and it is particularly important for information processing and management systems across medical fields.
The prior art includes deep-learning-based named entity recognition methods for the medical domain, which use neural network models to extract contextual information between characters or words and output a probability distribution over entity classes. However, these methods represent characters or words incompletely: they rely only on character vectors or word vectors and ignore the deeper information hidden inside characters or words, so their recognition performance is poor.
The above content is provided only to assist in understanding the technical solution of the present invention and does not constitute an admission that it is prior art.
Summary of the invention
The main purpose of the present invention is to provide a method, an apparatus and a device for named entity recognition in electronic medical records, and a computer storage medium, aiming to solve the technical problem that existing deep-learning-based implementations rely only on character vectors or word vectors, ignore the deeper information hidden inside characters or words, and therefore achieve poor recognition performance.
To achieve the above object, the present invention provides a method for named entity recognition in electronic medical records, comprising the following steps:
Generating a character vector matrix corresponding to the character sequence of an electronic medical record whose named entities are to be recognized;
Generating a radical vector matrix corresponding to the character sequence;
Inputting the radical vector matrix into a first neural network for processing to obtain a radical convolution vector matrix corresponding to the character sequence, wherein the first neural network comprises a convolutional neural network layer;
Generating a character feature vector matrix from the character vector matrix and the radical convolution vector matrix;
Inputting the character feature vector matrix into a second neural network for processing to obtain the named entity recognition result of the electronic medical record, wherein the second neural network comprises a bidirectional long short-term memory network layer;
wherein the parameters of the first neural network and the second neural network are obtained by training on electronic medical records in which named entities have been labeled.
Preferably, the step of generating the radical vector matrix corresponding to the character sequence comprises:
Obtaining the Chinese-character components of each character in the character sequence;
Generating the radical vectors of each character from its components;
Generating the radical vector matrix corresponding to the character sequence from the radical vectors of all characters.
Preferably, the second neural network further comprises a fully connected layer, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic medical record comprises:
Inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
Inputting the hidden vector matrix into the fully connected layer for processing to obtain the named entity recognition result of the electronic medical record.
Preferably, the second neural network further comprises a self-attention layer, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic medical record comprises:
Inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
Inputting the hidden vector matrix into the self-attention layer for processing to obtain the named entity recognition result of the electronic medical record.
Preferably, the second neural network further comprises a self-attention layer and a conditional random field model, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic medical record comprises:
Inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
Inputting the hidden vector matrix into the self-attention layer for processing to obtain a prediction matrix corresponding to the character sequence;
Inputting the prediction matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic medical record.
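How the conditional random field model turns the prediction matrix into labels is not detailed in the text. A common choice for linear-chain CRFs is Viterbi decoding, which picks the label sequence maximizing the sum of per-character scores and label-to-label transition scores. The sketch below assumes that the prediction matrix supplies the per-character (emission) scores and that a learned transition matrix is available; the function name and score layout are illustrative, and CRF training is not shown.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[i][y]:   score of label y at character i (the prediction matrix).
    transitions[y][z]: score of moving from label y to label z.
    """
    n, num_labels = len(emissions), len(emissions[0])
    score = list(emissions[0])          # best path score ending in each label
    backptr: List[List[int]] = []       # best previous label, per position
    for i in range(1, n):
        new_score, ptrs = [], []
        for z in range(num_labels):
            best_y = max(range(num_labels),
                         key=lambda y: score[y] + transitions[y][z])
            ptrs.append(best_y)
            new_score.append(score[best_y] + transitions[best_y][z] + emissions[i][z])
        score = new_score
        backptr.append(ptrs)
    # Trace the best path backwards from the best final label.
    path = [max(range(num_labels), key=lambda y: score[y])]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    return path[::-1]
```

Unlike choosing the most probable label independently per character, the transition scores let the model penalize inconsistent sequences, such as an "end" label of one entity type following a "begin" label of another.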
Preferably, the self-attention layer comprises a fully connected layer, and the step of inputting the hidden vector matrix into the self-attention layer for processing to obtain the prediction matrix corresponding to the character sequence comprises:
Calculating the attention weights of the hidden vectors in the hidden vector matrix;
Generating an attention vector matrix from the attention weights and the hidden vectors;
Generating an attention hidden vector matrix from the hidden vector matrix and the attention vector matrix;
Inputting the attention hidden vector matrix into the fully connected layer for processing to obtain the prediction matrix corresponding to the character sequence.
Preferably, the step of calculating the attention weights of the hidden vectors in the hidden vector matrix comprises:
Calculating the dependence between hidden vectors in the hidden vector matrix according to the following formula:
$$f_{t,t'} = \sigma\left(w_a \tanh\left(w_t h_t + w_{t'} h_{t'}\right)\right),$$
where $t$ and $t'$ denote different time steps, $w_a$, $w_t$ and $w_{t'}$ are weight vectors, $\sigma$ is the sigmoid function, and $h_t$ and $h_{t'}$ are the hidden vectors at time steps $t$ and $t'$;
Calculating, according to the following formula, the corresponding attention weight for each hidden vector $h_k$ in the hidden vector matrix:
where $e$ is the exponential function and $N$ is the number of hidden vectors.
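The attention steps can be sketched as follows. Several parts are assumptions: the attention-weight formula itself is not reproduced in the text, so a softmax of $f_{t,t'}$ over the $N$ time steps is assumed from the mention of $e$ and $N$; $w_t$ and $w_{t'}$ are treated as matrices and $w_a$ as a vector so that $f_{t,t'}$ is a scalar; and the "attention hidden vector" is assumed to be the concatenation of $h_t$ with its attention vector $a_t$. The fully connected layer that follows is omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def self_attention(H: Matrix, w_a: Vector, W_t: Matrix, W_tp: Matrix) -> Matrix:
    """Return the attention hidden vector matrix, one row [h_t ; a_t] per step t.

    f[t][t'] = sigmoid(w_a . tanh(W_t h_t + W_tp h_t')) scores how much step t
    depends on step t'. The scores are softmax-normalised over the N steps
    (assumed form) and used to take a weighted average a_t of the hidden vectors.
    """
    N = len(H)
    proj_t = [matvec(W_t, h) for h in H]    # W_t h_t for every step
    proj_tp = [matvec(W_tp, h) for h in H]  # W_t' h_t' for every step
    out = []
    for t in range(N):
        f = [sigmoid(sum(w * math.tanh(p + q)
                         for w, p, q in zip(w_a, proj_t[t], proj_tp[tp])))
             for tp in range(N)]
        exps = [math.exp(v) for v in f]     # f lies in (0, 1): no overflow risk
        total = sum(exps)
        alpha = [e / total for e in exps]   # attention weights for step t
        a_t = [sum(alpha[tp] * H[tp][k] for tp in range(N))
               for k in range(len(H[0]))]
        out.append(H[t] + a_t)              # concatenate h_t and a_t
    return out
```

Because the weights for each step sum to 1, every attention vector is a weighted average of the hidden vectors, which lets each character draw on context from anywhere in the sequence.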
Preferably, the second neural network further comprises a conditional random field model, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic medical record comprises:
Inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain a hidden vector matrix corresponding to the character sequence;
Inputting the hidden vector matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic medical record.
In addition, to achieve the above object, the present invention also provides an apparatus for named entity recognition in electronic medical records, comprising a memory, a processor, and an electronic medical record named entity recognition program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for named entity recognition in electronic medical records described above.
In addition, to achieve the above object, the present invention also provides a computer storage medium storing an electronic medical record named entity recognition program which, when executed by a processor, implements the steps of the method for named entity recognition in electronic medical records described above.
With the method, apparatus and computer storage medium for named entity recognition in electronic medical records proposed in the embodiments of the present invention, a character vector matrix and a radical vector matrix are generated for the character sequence of an electronic medical record whose named entities are to be recognized; the radical vector matrix is input into a convolutional neural network layer to obtain a radical convolution vector matrix corresponding to the character sequence; a character feature vector matrix is generated from the character vector matrix and the radical convolution vector matrix; and the character feature vector matrix is input into a bidirectional long short-term memory network to obtain the named entity recognition result of the electronic medical record. By extracting the morphological features inside the characters of electronic medical records and feeding both the features of each character itself and its internal morphological features into a deep neural network to predict character labels, the present invention provides a named entity recognition method for electronic medical records with high recognition accuracy.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in the embodiments of the present invention;
Fig. 2 is a flow diagram of a first embodiment of the method for named entity recognition in electronic medical records according to the present invention;
Fig. 3 is a schematic diagram of the convolutional neural network processing in the first embodiment of the method for named entity recognition in electronic medical records according to the present invention;
Fig. 4 is a schematic diagram of the neural network system processing in the first embodiment of the method for named entity recognition in electronic medical records according to the present invention;
Fig. 5 is a flow diagram of a second embodiment of the method for named entity recognition in electronic medical records according to the present invention;
Fig. 6 is a schematic diagram of the neural network system processing in the second embodiment of the method for named entity recognition in electronic medical records according to the present invention;
Fig. 7 is a flow diagram of a third embodiment of the method for named entity recognition in electronic medical records according to the present invention;
Fig. 8 is a schematic diagram of the neural network system processing in the third embodiment of the method for named entity recognition in electronic medical records according to the present invention;
Fig. 9 is a flow diagram of a fourth embodiment of the method for named entity recognition in electronic medical records according to the present invention;
Fig. 10 is a schematic diagram of the neural network system processing in the fourth embodiment of the method for named entity recognition in electronic medical records according to the present invention.
The realization of the objects, the functional features and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are merely intended to explain the present invention and are not intended to limit it.
Fig. 1 is a schematic structural diagram of the terminal in the hardware operating environment involved in the embodiments of the present invention.
The terminal in the embodiments of the present invention may be a PC, or a mobile terminal device with a display function, such as a smartphone, tablet computer, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player or portable computer.
As shown in Fig. 1, the terminal may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 provides connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Optionally, the terminal may further include a camera, an RF (radio frequency) circuit, sensors, an audio circuit, a Wi-Fi module and so on. The sensors include, for example, light sensors, motion sensors and other sensors. Specifically, the light sensors may include an ambient light sensor, which adjusts the brightness of the display according to the ambient light, and a proximity sensor, which turns off the display and/or backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the attitude of the mobile terminal (such as landscape/portrait switching, related games and magnetometer attitude calibration) and in vibration-recognition functions (such as pedometers and tap detection). The mobile terminal may of course also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer and infrared sensor, which are not described further here.
Those skilled in the art will understand that the terminal structure shown in Fig. 1 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and an electronic medical record named entity recognition program.
In the terminal shown in Fig. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it; and the processor 1001 may be used to call the electronic medical record named entity recognition program stored in the memory 1005 and execute the steps of the named entity recognition method.
Referring to Fig. 2, a first embodiment of the present invention provides a method for named entity recognition in electronic medical records, the method comprising:
Step S10: generate the character vector matrix corresponding to the character sequence of the electronic medical record whose named entities are to be recognized.
First, the character sequence in the history-of-present-illness section of the electronic medical record is obtained. The method provided in this embodiment combines a convolutional neural network (CNN) with a bidirectional long short-term memory network (Bi-LSTM), and these network models accept only numerical input, so the obtained character sequence must be converted into vector form.
The character vectors for the character sequence can usually be obtained from pretrained character vectors, for example using Google's word2vec representation. word2vec projects characters into a low-dimensional space in which characters or words with similar meanings lie close together. For example, of the two word pairs "China"/"Guangzhou" and "China"/"computer", the distance between the first pair in this space is much smaller than the distance between the second.
To obtain accurate vectors with word2vec, 10,000 electronic medical records are used as the corpus for training the character vectors, using the Skip-Gram model of word2vec. Although Skip-Gram trains more slowly than the CBOW model, it performs better on corpora containing rare characters, so the resulting character vectors match the character sequences of electronic medical records more closely.
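The Skip-Gram objective trains each character to predict its neighbours within a context window, whereas CBOW predicts a character from its neighbours. As a minimal sketch, assuming a symmetric window (the window size is a hypothetical choice, not given in the text), the (center, context) training pairs could be extracted from a character sequence as follows; the training loop itself is not shown.

```python
from typing import List, Tuple


def skipgram_pairs(text: str, window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) character pairs for Skip-Gram training.

    Each character is paired with every other character at most
    `window` positions away on either side.
    """
    pairs = []
    for i, center in enumerate(text):
        lo = max(0, i - window)
        hi = min(len(text), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, text[j]))
    return pairs


if __name__ == "__main__":
    # "头痛发热" (headache and fever) with a window of 1 yields 6 pairs.
    print(skipgram_pairs("头痛发热", window=1))
```

In practice a library would handle pair generation and training; for example, gensim's `Word2Vec` selects the Skip-Gram architecture with `sg=1`.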
Specifically, the character vectors for the character sequence can be obtained from word2vec by indexing. Suppose the character sequence of an electronic medical record is C = (C_1, C_2, ..., C_n), where n is the length of the input sequence; a character index is generated for each character in the sequence. Once the pretrained character vectors are available, looking up the table by character index yields the vector of each character, giving the character vector sequence x = (x_1, x_2, ..., x_n), $x \in \mathbb{R}^{n \times d}$, where d is the dimension of the character vector space.
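The index-and-lookup step can be sketched as below. This is a minimal illustration, assuming that pretrained vectors are stored as rows of a table and that characters absent from the vocabulary fall back to a reserved `<UNK>` row at index 0; that fallback and the function names are hypothetical, not taken from the text.

```python
from typing import Dict, List


def build_char_index(vocab: List[str]) -> Dict[str, int]:
    """Map each character to an integer index; index 0 is reserved for <UNK>."""
    index = {"<UNK>": 0}
    for ch in vocab:
        if ch not in index:
            index[ch] = len(index)
    return index


def lookup_embeddings(sequence: str,
                      char_index: Dict[str, int],
                      table: List[List[float]]) -> List[List[float]]:
    """Return the n x d character vector matrix x for a character sequence.

    table[k] holds the pretrained d-dimensional vector for index k.
    Characters missing from the index use the <UNK> row.
    """
    ids = [char_index.get(ch, 0) for ch in sequence]
    return [table[k] for k in ids]
```

For example, with the vocabulary `["头", "痛"]` and a three-row table, the sequence "头痛咳" maps to the rows for "头", "痛" and `<UNK>`, giving a 3 x d matrix.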
Step S20: generate the radical vector matrix corresponding to the character sequence.
Common neural-network-based named entity recognition methods feed the character vectors or word vectors of the text into a neural network model for label prediction. However, the information expressed by a character or word alone is limited, so relying only on character or word vectors offers limited room to improve recognition accuracy.
To address this shortcoming of the prior art, the inventive concept of the present invention is to mine the deeper information that may exist inside characters or words. Chinese characters evolved from pictographs, and many characters still retain their original meanings; characters with similar shapes often have similar meanings, for example different characters that both mean "disease", or different characters that both mean "pain". The shape information of characters can therefore also serve as input to a neural network, which extracts features from it and supplies the subsequent label prediction with the deeper information inside characters or words.
Intuitively, the components of a character reflect its form to some extent, so the component composition of a character can be used as its shape information. For example, the component composition "禾, 口" ("grain" and "mouth") of the character "和" serves as the shape information of "和".
Specifically, once the component composition of a character is obtained, each component is treated as an independent radical of that character; for example, "禾" and "口" are the left-side and right-side components of "和", respectively. A radical vector is generated for each radical, so the character's radical sequence (containing multiple radicals) corresponds to a radical vector sequence, which is equivalent to a two-dimensional radical vector matrix. For a character sequence whose named entities are to be recognized, the two-dimensional radical vector matrices of its characters together form the three-dimensional radical vector matrix corresponding to the character sequence.
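Building the three-dimensional radical vector matrix can be sketched as follows. The decomposition table below is a hypothetical two-entry stand-in for a full character-to-component dictionary, the `<PAD>` token and the fixed length `max_radicals` are assumptions (the text only says radical sequences are padded to a fixed length), and the vectors are fixed here for illustration, whereas in the embodiment they are trained.

```python
from typing import Dict, List

PAD = "<PAD>"

# Hypothetical decomposition table: 和 = 禾 + 口, 痛 = 疒 + 甬.
COMPONENTS: Dict[str, List[str]] = {
    "和": ["禾", "口"],
    "痛": ["疒", "甬"],
}


def radical_vector_tensor(sequence: str,
                          radical_vectors: Dict[str, List[float]],
                          max_radicals: int,
                          components: Dict[str, List[str]] = COMPONENTS
                          ) -> List[List[List[float]]]:
    """Build the n x max_radicals x r radical tensor for a character sequence.

    Each character's component list is truncated or padded with PAD to a
    fixed length, then every component is replaced by its vector. Unknown
    characters become all padding; unknown components use the PAD vector.
    """
    tensor = []
    for ch in sequence:
        comps = components.get(ch, [])[:max_radicals]
        comps = comps + [PAD] * (max_radicals - len(comps))
        tensor.append([list(radical_vectors.get(c, radical_vectors[PAD]))
                       for c in comps])
    return tensor
```

Each character contributes one max_radicals x r slice (its two-dimensional radical vector matrix), and stacking the slices gives the n x max_radicals x r tensor for the whole sequence.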
Step S30: input the radical vector matrix into the first neural network for processing to obtain the radical convolution vector matrix corresponding to the character sequence, wherein the first neural network comprises a convolutional neural network layer.
In this embodiment, a CNN is used for feature extraction. Fig. 3 illustrates the convolutional neural network processing, using the radical sequence of a character meaning "pain" as the example. Because radical sequences are padded to a fixed length during processing, the radical sequence includes padding. After the radical vectors of the character are obtained, the radical vector matrix is input into the CNN layer of the first neural network and passes through convolution in the convolutional layer, pooling in the pooling layer and processing in the fully connected layer, producing a radical convolution vector matrix that contains the internal shape information of the characters. Note that the CNN may include multiple convolutional layers, pooling layers and fully connected layers; this embodiment does not limit its structure.
It should be understood that the radical vectors need not be pretrained: they can be randomly initialized and then trained as parameters of the first neural network.
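As a minimal sketch of the convolution and pooling stages, assuming a single convolutional layer with ReLU activation and max pooling over radical positions (the fully connected layer is omitted, and the kernel shapes are hypothetical), each character's padded radical matrix can be reduced to one fixed-length radical convolution vector:

```python
from typing import List

Matrix = List[List[float]]


def conv1d_maxpool(radicals: Matrix,
                   filters: List[Matrix],
                   biases: List[float]) -> List[float]:
    """Convolve one character's radical matrix and max-pool each feature map.

    radicals: m x r matrix, one row per (possibly padded) radical.
    filters:  F kernels, each k x r, sliding over the radical axis (k <= m).
    Returns an F-dimensional radical convolution vector for the character.
    """
    m = len(radicals)
    out = []
    for kernel, b in zip(filters, biases):
        k = len(kernel)
        feature_map = []
        for start in range(m - k + 1):
            s = b
            for i in range(k):
                for a, w in zip(radicals[start + i], kernel[i]):
                    s += a * w
            feature_map.append(max(0.0, s))  # ReLU activation
        out.append(max(feature_map))         # max pooling over positions
    return out


def radical_conv_matrix(tensor: List[Matrix],
                        filters: List[Matrix],
                        biases: List[float]) -> Matrix:
    """Apply conv1d_maxpool to every character: n x m x r -> n x F."""
    return [conv1d_maxpool(ch, filters, biases) for ch in tensor]
```

Max pooling makes the output size depend only on the number of filters F, so every character yields a vector of the same length regardless of how many of its radical slots are padding.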
Step S40: generate the character feature vector matrix from the character vector matrix and the radical convolution vector matrix.
After the above steps, the character sequence of the electronic medical record yields a character vector matrix and a radical convolution vector matrix. Since both matrices carry feature information about the same character sequence, an overall character feature vector matrix is generated from the two.
Specifically, each vector in the character vector matrix is concatenated with the corresponding vector in the radical convolution vector matrix (the character shape information). For example, the character sequence C = (C_1, C_2, ..., C_n) corresponds to the character vector matrix X = (X_1, X_2, ..., X_n) and the radical convolution vector matrix Y = (Y_1, Y_2, ..., Y_n), where X_1, ..., X_n and Y_1, ..., Y_n are vectors. The character vector and radical convolution vector of the i-th character C_i are X_i and Y_i; concatenating them gives a new vector Z_i. Doing this for every character in C yields the character feature vector matrix Z = (Z_1, Z_2, ..., Z_n).
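The splicing step is a row-wise concatenation. A minimal sketch, with a hypothetical function name, where each row Z_i ends up with d + F entries (d for the character vector, F for the radical convolution vector):

```python
from typing import List


def concat_features(X: List[List[float]], Y: List[List[float]]) -> List[List[float]]:
    """Splice each character vector X_i with its radical convolution vector Y_i.

    Returns Z with Z_i = X_i followed by Y_i.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must describe the same number of characters")
    return [x + y for x, y in zip(X, Y)]
```

Concatenation keeps both sources of information intact and leaves it to the next network to learn how to weigh them.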
The word eigenvectors matrix is input in nervus opticus network and handles, obtains the electronics by step S50 The name Entity recognition result of case history, wherein the nervus opticus network includes two-way shot and long term memory network layer.
Because naming Entity recognition is sequence labelling problem, the nervus opticus network in the present embodiment uses two-way length Short-term memory network (Bi-LSTM) carrys out the contextual information of abstraction sequence, and shot and long term memory network (LSTM) is a kind of net of RNN Network, LSTM solve the disappearance of gradient present in RNN/explosion issues, also solve RNN cannot capture sequence it is long when rely on ask Topic.
Bi-LSTM employed in the present embodiment includes the LSTM network in the two directions of forward and backward.According to word to The above-mentioned word eigenvectors matrix Z (Z1, Z2 ... Zn) that moment matrix and radical convolution vector matrix generate contains n in character string The feature vector of a character is sequentially output to LSTM network before being sequentially inputted to the feature vector of this n character from left to right Hidden vector corresponding with the feature vector of each characterSimilarly, the feature vector of this n character is successively defeated from right to left Enter to rear to LSTM network, is sequentially output another hidden vector corresponding with the feature vector of each character
It is to be appreciated that the feature vector of n character is passed through to the processing of two-way shot and long term memory network, it is available to arrive The context information of sequence, the information captured compared to unidirectional shot and long term memory network is more comprehensively.By the spy of each character The corresponding two hidden vectors of sign vector splice to obtain two-way hidden vectorAnd by the feature vector pair of all characters The two-way hidden vector answered is put into the same matrix, to generate total hidden vector matrix.
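The forward/backward pass and concatenation can be sketched with a minimal pure-Python LSTM. This is an illustration under stated assumptions, not the embodiment's network: the hidden size, random initialization range and seeds are hypothetical, and training is not shown. The key detail is that the backward outputs are reversed again so that position i pairs $\overrightarrow{h_i}$ with $\overleftarrow{h_i}$.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Minimal LSTM with input (i), forget (f), output (o) gates and candidate (c)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size  # each gate reads [x_t ; h_{t-1}]
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _affine(self, g: str, v: Vector) -> Vector:
        return [sum(w * x for w, x in zip(row, v)) + b
                for row, b in zip(self.W[g], self.b[g])]

    def step(self, x: Vector, h: Vector, c: Vector):
        v = x + h
        i = [sigmoid(z) for z in self._affine("i", v)]
        f = [sigmoid(z) for z in self._affine("f", v)]
        o = [sigmoid(z) for z in self._affine("o", v)]
        g = [math.tanh(z) for z in self._affine("c", v)]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs: List[Vector]) -> List[Vector]:
        h, c = [0.0] * self.hidden_size, [0.0] * self.hidden_size
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(Z: List[Vector], forward: LSTMCell, backward: LSTMCell) -> List[Vector]:
    """Return the hidden vector matrix with row i = [forward h_i ; backward h_i]."""
    fwd = forward.run(Z)
    bwd = backward.run(Z[::-1])[::-1]  # read right-to-left, then realign positions
    return [hf + hb for hf, hb in zip(fwd, bwd)]
```

Using two independently parameterized cells is what makes the network bidirectional; frameworks such as PyTorch expose the same idea through the `bidirectional=True` option of `torch.nn.LSTM`.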
Further, the second neural network also includes a fully connected layer, which processes the overall hidden vector matrix output by the Bi-LSTM and finally yields the probability matrix corresponding to the character sequence. How the final named entity recognition result is obtained from the probability matrix is described below.
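A minimal sketch of the fully connected layer follows. The text only says the output is a probability matrix; a softmax over the label scores is assumed here as the normalization, and the weight shapes are illustrative.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def probability_matrix(H: List[List[float]],
                       W: List[List[float]],
                       b: List[float]) -> List[List[float]]:
    """Map each hidden vector to a probability distribution over L labels.

    H: n x h hidden vector matrix; W: L x h weights; b: L biases.
    Returns an n x L matrix whose rows each sum to 1.
    """
    return [softmax([sum(w * x for w, x in zip(row, h_i)) + bias
                     for row, bias in zip(W, b)])
            for h_i in H]
```

With the 15 labels defined for this embodiment, L would be 15 and each row would hold one character's label probabilities.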
Name Entity recognition is also referred to as proper name identification, refers to the entity with certain sense in identification text, for this reality For applying the electronic health record for needing to identify in example, it is physical feeling, examines inspection, disease, symptom, treatment etc..
Name Entity recognition usually requires to solve two problems: first is that entity Boundary Recognition, that is, segment;Second is that determining entity Classification.The solution of both of these problems can be by being used to train neural network, in neural network by the data for having marked label Tag Estimation is carried out to realize to the character of name entity to be identified, wherein a variety of label for labelling methods, such as IOB can be used Label for labelling method or BIOES label for labelling method.
In this embodiment, when BIOES tagging is used for named entity recognition in electronic health records, 15 tags are defined: B-BodyPart, I-BodyPart, E-BodyPart, B-Check, I-Check, E-Check, B-Disease, I-Disease, E-Disease, B-Symptom, I-Symptom, E-Symptom, B-Treatment, I-Treatment, E-Treatment. For each entity type, the B- tag marks the beginning of the entity, the I- tag marks its interior, and the E- tag marks its end. BodyPart denotes a "body part" entity, Check a "test and examination" entity, Disease a "disease" entity, Symptom a "symptom" entity, and Treatment a "treatment" entity; for example, B-BodyPart marks the beginning of a body-part entity, I-BodyPart its interior, and E-BodyPart its end.
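The encoding direction of the scheme can be sketched as follows: annotated entity spans are turned into one tag per character, as is needed when building training text. The standard BIOES scheme also has S- (single-character entity) and O (outside any entity) tags, which the 15-tag list above does not include. The sketch uses them so that every character receives a tag. The function name and span format are hypothetical.

```python
ENTITY_TYPES = ["BodyPart", "Check", "Disease", "Symptom", "Treatment"]
TAGS = [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I", "E")]  # the 15 tags listed above

def spans_to_bioes(n_chars, spans):
    """spans: list of (start, end_exclusive, type). Returns one tag per character."""
    tags = ["O"] * n_chars
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"           # single-character entity
            continue
        tags[start] = f"B-{etype}"
        for k in range(start + 1, end - 1):
            tags[k] = f"I-{etype}"
        tags[end - 1] = f"E-{etype}"
    return tags

# Hypothetical 6-character sentence: chars 0-1 are a body part, chars 2-4 a symptom.
print(len(TAGS))  # 15
print(spans_to_bioes(6, [(0, 2, "BodyPart"), (2, 5, "Symptom")]))
# ['B-BodyPart', 'E-BodyPart', 'B-Symptom', 'I-Symptom', 'E-Symptom', 'O']
```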
Each value in the probability matrix obtained above is the predicted probability of a tag for a character in the sequence. For example, with the 15 tags defined above, each character has 15 probability values, one per tag, and the tag with the highest probability is taken as the character's predicted tag. Once every character's predicted tag has been determined, the sequence can be segmented and the entity categories determined from the meaning of the tags, which completes named entity recognition.
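The selection and segmentation steps above can be sketched in plain Python. The probability values are hypothetical, and the tag pattern mirrors the Fig. 4 example. The decoder is deliberately lenient (an assumption, not a rule stated in the patent) so that it accepts both B-I-E runs and the B-I-only runs shown in Fig. 4.

```python
def argmax_tags(prob_rows, tag_names):
    """Pick, for each character, the tag with the highest probability."""
    return [tag_names[max(range(len(row)), key=row.__getitem__)] for row in prob_rows]

def decode_entities(tags):
    """Group B-/I-/E-/S- tags into (start, end_exclusive, type) entities.

    Lenient: an entity opened by B- also closes at the next B-/S-/O tag, so
    B-I-only runs are decoded too.
    """
    entities = []
    start = etype = None

    def close(end):
        nonlocal start, etype
        if start is not None:
            entities.append((start, end, etype))
        start = etype = None

    for i, tag in enumerate(tags):
        prefix, _, t = tag.partition("-")
        if prefix in ("B", "S"):
            close(i)
            start, etype = i, t
            if prefix == "S":
                close(i + 1)
        elif prefix in ("I", "E") and start is not None and t == etype:
            if prefix == "E":
                close(i + 1)
        else:  # O, or an I-/E- tag that does not continue the open entity
            close(i)
    close(len(tags))
    return entities

tag_names = ["B-BodyPart", "I-BodyPart", "B-Symptom", "I-Symptom"]  # toy subset of the tag set
probs = [[0.7, 0.1, 0.1, 0.1],      # hypothetical probabilities for 4 characters
         [0.1, 0.8, 0.05, 0.05],
         [0.1, 0.1, 0.6, 0.2],
         [0.05, 0.05, 0.2, 0.7]]
tags = argmax_tags(probs, tag_names)
print(tags)                   # ['B-BodyPart', 'I-BodyPart', 'B-Symptom', 'I-Symptom']
print(decode_entities(tags))  # [(0, 2, 'BodyPart'), (2, 4, 'Symptom')]
```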
It should be understood that the parameters of the first and second neural networks must be trained with backpropagation and gradient descent on electronic health records whose named entities have already been identified, so as to obtain good parameter values and improve the accuracy of named entity recognition.
The character sequences of electronic health records with identified named entities can be obtained in ways including, but not limited to, the following: run a script that extracts the history-of-present-illness section of each electronic health record and converts it into an XML file; import the XML files into an annotation tool, where medical practitioners first annotate a subset of them; check the annotation results for consistency; if the result meets the expected threshold, have the medical practitioners annotate the remaining files; and run a script that converts the files with annotated named entities into the training text required by the neural network.
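The patent names neither the consistency measure nor the threshold. One common choice, assumed here, is entity-level F1 between two annotators who labeled the same files, where an entity counts as matched only if its boundaries and type agree exactly. The threshold value and all names are hypothetical.

```python
def entity_agreement(ann_a, ann_b):
    """Pairwise entity-level F1 between two annotators.

    ann_a, ann_b: sets of (doc_id, start, end_exclusive, type) tuples.
    """
    if not ann_a and not ann_b:
        return 1.0
    matched = len(ann_a & ann_b)
    precision = matched / len(ann_b) if ann_b else 0.0
    recall = matched / len(ann_a) if ann_a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

AGREEMENT_THRESHOLD = 0.8  # hypothetical; the patent does not give a value

a = {("doc1", 0, 2, "BodyPart"), ("doc1", 2, 4, "Symptom"), ("doc2", 5, 11, "Disease")}
b = {("doc1", 0, 2, "BodyPart"), ("doc1", 2, 4, "Symptom"), ("doc2", 5, 9, "Disease")}
score = entity_agreement(a, b)
print(round(score, 3), score >= AGREEMENT_THRESHOLD)  # 0.667 False
```

Here the two annotators disagree on the boundary of one disease entity, so agreement falls below the hypothetical threshold and the first batch would be re-examined before the remaining files are annotated.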
To further illustrate the method of this embodiment for recognizing named entities in electronic health records, Fig. 4 shows schematically how the neural network system of this embodiment processes its input. As shown in Fig. 4, the system comprises a character embedding layer; the first neural network, which contains a convolutional layer; and the second neural network, which contains a forward LSTM layer and a backward LSTM layer. The system recognizes named entities in electronic health records as follows:
1. Obtain the electronic health record text and feed it to the character embedding layer in batches of 10 sentences. The sentence length is set to K, the length of the longest of the 10 sentences; each character's radical sequence has a fixed length of 10; the pre-trained character vectors are 100-dimensional; and the radical vectors are 50-dimensional. After the character embedding layer, a batch of 10 sentences therefore yields a 10 × K × 100 character vector matrix and a 10 × K × 10 × 50 character radical vector matrix.
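A sketch of the batching and lookup in step 1. It assumes each character has already been mapped to a list of component (radical) indices and that index 0 is reserved for padding; the decomposition table, vocabulary sizes, and function name are hypothetical. Sentences are padded to K, the longest sentence in the batch, and each radical list is padded or truncated to length 10.

```python
import numpy as np

RADICAL_LEN, CHAR_DIM, RAD_DIM = 10, 100, 50

def embed_batch(char_ids, radical_ids, char_table, radical_table, pad_id=0):
    """char_ids: one list of character indices per sentence.
    radical_ids: same nesting, with a list of component indices per character.
    Returns (batch x K x 100, batch x K x 10 x 50) with K = longest sentence."""
    K = max(len(sent) for sent in char_ids)
    c = np.full((len(char_ids), K), pad_id)
    r = np.full((len(char_ids), K, RADICAL_LEN), pad_id)
    for i, sent in enumerate(char_ids):
        c[i, :len(sent)] = sent
        for j, comps in enumerate(radical_ids[i]):
            comps = comps[:RADICAL_LEN]   # truncate to the fixed length of 10
            r[i, j, :len(comps)] = comps  # remaining slots keep the padding index
    return char_table[c], radical_table[r]

rng = np.random.default_rng(2)
char_table = rng.normal(size=(500, CHAR_DIM))    # stand-in for pre-trained 100-dim vectors
radical_table = rng.normal(size=(300, RAD_DIM))  # 50-dim radical vectors (random stand-ins)
lengths = [3, 5, 7, 4, 6, 2, 7, 5, 3, 6]         # a batch of 10 sentences
char_ids = [list(rng.integers(1, 500, size=n)) for n in lengths]
radical_ids = [[list(rng.integers(1, 300, size=3)) for _ in range(n)] for n in lengths]
X_char, X_rad = embed_batch(char_ids, radical_ids, char_table, radical_table)
print(X_char.shape, X_rad.shape)  # (10, 7, 100) (10, 7, 10, 50)
```

In a real model the padding rows would typically be zeroed or masked so that they do not contribute to later layers.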
2. The radical vector matrix obtained in step 1 is fed into the convolutional layer, with a convolution window of 3, 30 convolution kernels, and a pooling window of 2. The convolutional layer outputs a 10 × K × 30 radical convolution vector matrix; that is, the extracted external morphological information of each character is represented by a 30-dimensional radical vector. Concatenating the radical vectors in the radical convolution vector matrix with the character vectors in the character vector matrix yields a 10 × K × 130 character feature vector matrix.
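Step 2 leaves open how pooling with a window of 2 reduces the per-character convolution output to a single 30-dimensional vector. The sketch assumes a "valid" convolution over the 10 radical positions, a ReLU, non-overlapping max pooling with window 2, and then a max over the remaining positions; all of these choices are assumptions. The result is concatenated with the 100-dimensional character vectors to reach 130 dimensions.

```python
import numpy as np

def radical_cnn(R, kernels, bias, pool=2):
    """R: B x K x L x E radical embeddings; kernels: F x w x E; bias: (F,).

    Returns B x K x F: one F-dimensional radical convolution vector per character.
    """
    B, K, L, E = R.shape
    F, w, _ = kernels.shape
    out_len = L - w + 1                                  # 'valid' convolution positions
    windows = np.stack([R[:, :, p:p + w, :] for p in range(out_len)], axis=2)  # B x K x out_len x w x E
    conv = np.einsum("bkpwe,fwe->bkpf", windows, kernels) + bias
    conv = np.maximum(conv, 0.0)                         # ReLU (assumed)
    n_pool = out_len // pool
    pooled = conv[:, :, :n_pool * pool].reshape(B, K, n_pool, pool, F).max(axis=3)  # window-2 max pooling
    return pooled.max(axis=2)                            # assumed max over the remaining positions

rng = np.random.default_rng(3)
K = 7
X_char = rng.normal(size=(10, K, 100))      # character vectors (stand-in values)
X_rad = rng.normal(size=(10, K, 10, 50))    # radical vectors (stand-in values)
kern = rng.normal(0, 0.1, (30, 3, 50))      # 30 kernels, window 3
rad_conv = radical_cnn(X_rad, kern, np.zeros(30))
features = np.concatenate([X_char, rad_conv], axis=-1)
print(rad_conv.shape, features.shape)  # (10, 7, 30) (10, 7, 130)
```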
3. The character feature vectors obtained in step 2 first pass through a dropout layer with a rate of 0.5 to prevent overfitting, and are then fed into the forward LSTM and the backward LSTM. With an LSTM hidden size of 64, the outputs of the forward and backward LSTMs at each time step are concatenated, giving a 10 × K × 128 hidden vector matrix.
4. The hidden vector matrix obtained in step 3 is passed through a fully connected layer whose size equals the number of tags N in the training samples, giving a 10 × K × N probability matrix.
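The dimension bookkeeping of steps 1 to 4 can be checked with a few lines of arithmetic. Dropout does not change shapes, so it does not appear. The defaults are the values given in the steps; the function name and the example values of K and N are illustrative.

```python
def trace_shapes(K, n_tags, batch=10, char_dim=100, radical_len=10,
                 radical_dim=50, n_kernels=30, lstm_hidden=64):
    """Tensor shapes after each of steps 1-4 for one batch of sentences."""
    return {
        "char_vectors": (batch, K, char_dim),
        "radical_vectors": (batch, K, radical_len, radical_dim),
        "radical_conv": (batch, K, n_kernels),
        "char_features": (batch, K, char_dim + n_kernels),  # 100 + 30 = 130
        "hidden": (batch, K, 2 * lstm_hidden),               # forward + backward = 128
        "probabilities": (batch, K, n_tags),
    }

for name, shape in trace_shapes(K=7, n_tags=15).items():
    print(f"{name:16s} {shape}")
```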
5. Since each length-N vector along the last axis of the 10 × K × N output holds the probabilities of one character being assigned each of the N tags, the tag with the highest of the N probabilities is selected as that character's tag. For example, in Fig. 4 the tags determined for the four-character sequence {neck, part, pain, pain} (glossed character by character; together, "neck pain") are, in order, B-BodyPart (B-BOD in the figure), I-BodyPart (I-BOD), B-Symptom (B-SYM), and I-Symptom (I-SYM).
In this embodiment, the morphological features inside the characters of electronic health records are extracted, and the features of the characters themselves, together with their internal morphological features, are fed into a deep neural network to predict character tags. This provides a method for named entity recognition in electronic health records with high recognition accuracy.
Further, referring to Fig. 5, the second embodiment of the invention provides, on the basis of the first embodiment, a method for named entity recognition in electronic health records in which step S50 comprises:
Step S60: input the character feature vector matrix into the bidirectional LSTM for processing, obtaining the hidden vector matrix corresponding to the character sequence.
Step S70: input the hidden vector matrix into the self-attention layer for processing, obtaining the prediction matrix corresponding to the character sequence.
Research on named entity recognition in electronic health records has found that dependencies exist between some entities. Consider this passage from an electronic health record: "The above symptoms have recurred and worsened year by year over the past 10 years, appearing in winter and spring and after catching cold; the patient sought treatment at a local hospital, was diagnosed with chronic bronchitis, and has had recurrent cough and expectoration." Here "winter and spring" and "catching cold" are inducement entities, "chronic bronchitis" is a disease entity, and "cough" and "expectoration" are symptom entities. In ordinary text "winter and spring" denotes a time, but in the medical records used as training samples in this embodiment it denotes an inducement: the arrival of winter and spring triggers the recurrence of seasonal diseases, so professional doctors annotate it as an inducement. Therefore, when deciding the entity type of "winter and spring", the neural network should rely mainly on the information in "chronic bronchitis", "cough", and "expectoration". For this reason, this embodiment uses a self-attention mechanism, which ignores the distance between entities and computes the dependencies between them directly.
The dependency between hidden vectors in the hidden vector matrix is computed as follows:
$$f_{t,t'} = \sigma\left(w_a \tanh\left(w_t h_t + w_{t'} h_{t'}\right)\right),$$
where t and t' denote different time steps; $w_a$, $w_t$, and $w_{t'}$ are weight vectors; σ is the sigmoid function; and $h_t$ and $h_{t'}$ are the hidden vectors at those time steps.
It is each hidden vector h according to following formulakCalculate corresponding attention weight
Wherein, e is exponential function, and N is the number of the hidden vector,
The computation of the attention weights is described in detail below with reference to Fig. 6.
As shown in Fig. 6, if the character sequence being processed is {neck, part, pain, pain}, its length is 4, the hidden vectors input to the self-attention layer are h1, h2, h3, and h4, and N in the formula above equals 4.
Because the characters of the sequence are fed into the neural network system in temporal order, each character corresponds to a successive time step. In this example, the time steps of the four characters "neck", "part", "pain", "pain" can be labeled t1, t2, t3, t4, and the hidden vector of each character corresponds one-to-one with these time steps.
For each character to be recognized in the sequence, there is an output hidden vector corresponding to its time step, and attention weights must be computed between that time step and every other time step. For example, the character "neck" is input at time t1 and has the output hidden vector h1. The other time steps are t2, t3, and t4, so under the preset rule the weights to be computed are $a_{1,2}$, $a_{1,3}$, and $a_{1,4}$, and the attention weight formula becomes
$$a_{1,k} = \frac{e^{f_{1,k}}}{\sum_{j \in \{2,3,4\}} e^{f_{1,j}}},$$
where k ranges over t2, t3, t4.
Each hidden vector is then weighted by the corresponding attention weight and the results are summed to obtain the final attention vector
$$C_t = \sum_{k \neq t} a_{t,k} h_k.$$
The attention vectors corresponding to all hidden vectors together form the attention vector matrix.
Because both the hidden vector matrix and the attention vector matrix contain predictive information about the character sequence whose named entities are to be recognized, an attention hidden vector matrix containing the complete information is generated from the two.
Specifically, each hidden vector $H_i$ in the hidden vector matrix is concatenated with the corresponding attention vector $C_i$ in the attention vector matrix:
$$H'_i = [H_i; C_i].$$
For example, given the hidden vector matrix H = (H1, H2, ..., Hn) and the attention vector matrix C = (C1, C2, ..., Cn), where H1, ..., Hn and C1, ..., Cn are vectors, Hi and Ci are concatenated into a new vector H'i. The new vectors obtained by concatenating every hidden vector with its attention vector form the attention hidden vector matrix H' = (H'1, H'2, ..., H'n).
The resulting attention hidden vector matrix is fed into the fully connected layer to obtain the prediction matrix corresponding to the character sequence.
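A NumPy sketch of the self-attention layer as described above. The formulas call $w_t$ and $w_{t'}$ weight vectors; here they are treated as d_a x d matrices shared across positions, so that a d-dimensional hidden vector maps to a vector inside tanh, with d_a a hypothetical attention size. Following the t1 example, each position attends only to the other positions (so n must be at least 2). The final matrix product stands in for the fully connected layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_attention(Hmat, W_t, W_tp, w_a):
    """Hmat: n x d hidden vectors (n >= 2). W_t, W_tp: d_a x d. w_a: (d_a,).

    Returns (n x 2d attention hidden vector matrix, n x n attention weights).
    """
    q = Hmat @ W_t.T                                          # row t holds W_t h_t
    k = Hmat @ W_tp.T                                         # row t' holds W_t' h_t'
    f = sigmoid(np.tanh(q[:, None, :] + k[None, :, :]) @ w_a)  # f[t, t'], n x n
    scores = np.exp(f)
    np.fill_diagonal(scores, 0.0)                             # exclude k == t, as in the t1 example
    a = scores / scores.sum(axis=1, keepdims=True)            # attention weights a[t, k]
    C = a @ Hmat                                              # C_t = sum over k of a[t, k] h_k
    return np.concatenate([Hmat, C], axis=1), a               # H'_t = [H_t; C_t]

rng = np.random.default_rng(4)
n, d, d_a, s = 4, 128, 32, 15   # 4 characters as in the Fig. 6 example; d_a is hypothetical
Hmat = rng.normal(size=(n, d))
W_t, W_tp = rng.normal(0, 0.1, (d_a, d)), rng.normal(0, 0.1, (d_a, d))
H_att, a = self_attention(Hmat, W_t, W_tp, rng.normal(size=d_a))
P = H_att @ rng.normal(0, 0.1, (2 * d, s))   # fully connected layer (bias omitted)
print(H_att.shape, P.shape)  # (4, 256) (4, 15)
```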
Step S80: input the prediction matrix into the conditional random field model for processing, obtaining the named entity recognition result of the electronic health record.
If character tags are predicted independently from the hidden vectors produced by the Bi-LSTM layer or the self-attention layer, the dependencies between tags are not taken into account, which can become a bottleneck when trying to improve prediction accuracy. For example, I-Symptom might be followed by I-Disease, which is clearly an invalid tag sequence. In named entity recognition there are usually strong dependencies between tags: for instance, the tag after B-Symptom cannot be I-Disease, and I-Symptom can only follow B-Symptom (or another I-Symptom).
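The constraints above can be written as a transition rule. The sketch below assumes strict BIOES (including the standard S- and O tags) and turns the rule into a 0 / minus-infinity mask that could be added to CRF transition scores, so that invalid sequences score minus infinity. Under this strict rule an entity must close with an E- tag, so the B-/I-only pattern shown for Fig. 4 would need a looser, IOB-style rule. An analogous end-of-sequence check (the last tag may not be B- or I-) is omitted for brevity; all names are illustrative.

```python
NEG_INF = float("-inf")

def allowed(prev, cur):
    """Strict BIOES transition rule; prev is None at the start of the sequence."""
    p_prefix, _, p_type = (prev or "O").partition("-")
    c_prefix, _, c_type = cur.partition("-")
    if c_prefix in ("I", "E"):
        # inside/end tags must continue an open entity of the same type
        return p_prefix in ("B", "I") and p_type == c_type
    # B-, S- and O may not follow an open (unfinished) entity
    return p_prefix not in ("B", "I")

def transition_mask(tags):
    """Matrix to add to transition scores: 0.0 if allowed, -inf if not."""
    return [[0.0 if allowed(p, c) else NEG_INF for c in tags] for p in tags]

print(allowed("B-Symptom", "I-Disease"))  # False
print(allowed("B-Symptom", "I-Symptom"))  # True
print(allowed("I-Symptom", "I-Disease"))  # False
```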
Therefore, to further improve the accuracy of named entity recognition, this embodiment uses a conditional random field (CRF) model for the final character tag prediction. The CRF model avoids the independence assumption of the hidden Markov model (HMM) and resolves the label bias problem of the maximum entropy Markov model (MEMM). Its principle is explained below.
For an input sequence $x = (x_1, x_2, \ldots, x_n)$, let P be the matrix output by the self-attention network, $P \in \mathbb{R}^{n \times s}$, where s is the number of tags and $P_{ij}$ is the score of the i-th character of the input sequence being assigned the j-th tag. For a predicted sequence $y = (y_1, y_2, \ldots, y_n)$, its score is defined as
$$S(x, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}.$$
Here A is the transition matrix, $A \in \mathbb{R}^{(s+2) \times (s+2)}$, and $A_{ij}$ is the score of a transition from tag i to tag j; the two extra rows and columns correspond to the start tag $y_0$ and the end tag $y_{n+1}$ added to the sequence. A softmax over all possible tag sequences then gives the probability of sequence y:
$$p(y \mid x) = \frac{e^{S(x, y)}}{\sum_{\tilde{y} \in Y_x} e^{S(x, \tilde{y})}}.$$
During training, the log probability of the correct tag sequence is maximized:
$$\log p(y \mid x) = S(x, y) - \log \sum_{\tilde{y} \in Y_x} e^{S(x, \tilde{y})}.$$
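A plain-Python sketch of the score S(x, y) and the training objective. It assumes tags are numbered 0..s-1 and that the start and end tags are indices s and s+1 of A. The normalizing sum over all sequences is computed with the forward algorithm and checked against brute-force enumeration on a tiny random example; the values are random stand-ins, not model outputs.

```python
import itertools
import math
import random

def logsumexp(xs):
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def sequence_score(P, A, y, start, end):
    """S(x, y): transition scores (including start/end) plus emission scores P[i][y_i]."""
    path = [start] + list(y) + [end]
    trans = sum(A[path[i]][path[i + 1]] for i in range(len(path) - 1))
    emit = sum(P[i][tag] for i, tag in enumerate(y))
    return trans + emit

def log_partition(P, A, start, end):
    """log of the sum over all tag sequences of exp(S(x, y~)), via the forward algorithm."""
    s = len(P[0])
    alpha = [A[start][j] + P[0][j] for j in range(s)]
    for i in range(1, len(P)):
        alpha = [logsumexp([alpha[k] + A[k][j] for k in range(s)]) + P[i][j] for j in range(s)]
    return logsumexp([alpha[k] + A[k][end] for k in range(s)])

def log_likelihood(P, A, y, start, end):
    """log p(y | x) = S(x, y) - log Z(x); training maximizes this for the gold y."""
    return sequence_score(P, A, y, start, end) - log_partition(P, A, start, end)

random.seed(0)
s, n = 3, 4
START, END = s, s + 1
P = [[random.gauss(0, 1) for _ in range(s)] for _ in range(n)]
A = [[random.gauss(0, 1) for _ in range(s + 2)] for _ in range(s + 2)]
brute = logsumexp([sequence_score(P, A, y, START, END)
                   for y in itertools.product(range(s), repeat=n)])
print(abs(brute - log_partition(P, A, START, END)) < 1e-9)  # True
```

The forward algorithm costs O(n s^2) instead of the O(s^n) of enumeration, which is what makes the objective practical for real sentence lengths.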
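$Y_x$ denotes all possible tag sequences, including those that violate the constraints of the BIOES tagging scheme. At decoding time, the predicted output sequence is the one with the maximum score:
$$y^* = \arg\max_{\tilde{y} \in Y_x} S(x, \tilde{y}).$$
Both training and decoding of the CRF model can be carried out efficiently by dynamic programming: the sum over all sequences in the training objective is computed with the forward algorithm, and the maximum-score sequence is found with the Viterbi algorithm.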
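A sketch of Viterbi decoding under the same indexing assumptions (tags 0..s-1, start tag s, end tag s+1). It returns the maximum-score sequence y*, which the usage below checks against brute-force enumeration on random stand-in scores. A minus-infinity transition mask can be added to A beforehand to rule out invalid tag sequences.

```python
import itertools
import random

def sequence_score(P, A, y, start, end):
    """S(x, y) = sum of transition scores along start, y_1..y_n, end plus sum of P[i][y_i]."""
    path = [start] + list(y) + [end]
    return (sum(A[p][q] for p, q in zip(path, path[1:]))
            + sum(P[i][t] for i, t in enumerate(y)))

def viterbi(P, A, start, end):
    """Return (best_score, best_tags), the maximum of S(x, y~) over all tag sequences."""
    s = len(P[0])
    delta = [A[start][j] + P[0][j] for j in range(s)]  # best score of a prefix ending in tag j
    backptrs = []
    for i in range(1, len(P)):
        new_delta, ptr = [], []
        for j in range(s):
            k_best = max(range(s), key=lambda k: delta[k] + A[k][j])
            new_delta.append(delta[k_best] + A[k_best][j] + P[i][j])
            ptr.append(k_best)
        delta = new_delta
        backptrs.append(ptr)
    last = max(range(s), key=lambda k: delta[k] + A[k][end])
    best_score = delta[last] + A[last][end]
    tags = [last]
    for ptr in reversed(backptrs):                     # follow back-pointers to position 0
        tags.append(ptr[tags[-1]])
    return best_score, tags[::-1]

random.seed(1)
s, n = 3, 5
START, END = s, s + 1
P = [[random.gauss(0, 1) for _ in range(s)] for _ in range(n)]
A = [[random.gauss(0, 1) for _ in range(s + 2)] for _ in range(s + 2)]
score, tags = viterbi(P, A, START, END)
print(tags)
```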
Finally, the method of this embodiment is further described with reference to Fig. 6, which shows schematically the structure of the neural network system of this embodiment. The system comprises a character embedding layer; the first neural network, which contains a radical CNN convolutional layer; and the second neural network, which contains a bidirectional LSTM layer, a self-attention layer, and a conditional random field model. The system recognizes named entities in electronic health records as follows:
1. Obtain the electronic health record text and feed it to the character embedding layer in batches of 10 sentences. The sentence length is set to K, the length of the longest of the 10 sentences; each character's radical sequence has a fixed length of 10; the pre-trained character vectors are 100-dimensional; and the radical vectors are 50-dimensional. After the character embedding layer, a batch of 10 sentences therefore yields a 10 × K × 100 character vector matrix and a 10 × K × 10 × 50 character radical vector matrix.
2. The radical vector matrix obtained in step 1 is fed into the radical CNN convolutional layer, with a convolution window of 3, 30 convolution kernels, and a pooling window of 2. The radical CNN layer outputs a 10 × K × 30 radical convolution vector matrix; that is, the extracted external morphological information of each character is represented by a 30-dimensional radical vector. Concatenating the radical vectors in the radical convolution vector matrix with the character vectors in the character vector matrix yields a 10 × K × 130 character feature vector matrix.
3. The character feature vectors obtained in step 2 first pass through a dropout layer with a rate of 0.5 to prevent overfitting, and are then fed into the bidirectional LSTM. With an LSTM hidden size of 64, the outputs of the two directions at each time step are concatenated, giving a 10 × K × 128 hidden vector matrix.
4. The hidden vector matrix obtained in step 3 passes in turn through the self-attention layer and the conditional random field model, giving a 10 × K × N prediction probability matrix.
5. Since each length-N vector along the last axis of the 10 × K × N output holds the probabilities of one character being assigned each of the N tags, the tag with the highest of the N probabilities is selected as that character's tag.
In this embodiment, the morphological features inside the characters of electronic health records are extracted, and the features of the characters themselves, together with their internal morphological features, are fed into a deep neural network to predict character tags. This provides a named entity recognition method for electronic health records with high recognition accuracy.
Further, referring to Fig. 7, the third embodiment of the invention provides, on the basis of the first embodiment, a method for named entity recognition in electronic health records in which step S50 comprises:
Step S90: input the character feature vector matrix into the bidirectional LSTM for processing, obtaining the hidden vector matrix corresponding to the character sequence.
Step S100: input the hidden vector matrix into the self-attention layer for processing, obtaining the named entity recognition result of the electronic health record.
It should be understood that this embodiment is based on the first embodiment and, to suit different application scenarios or processing resources, differs from the second embodiment: as shown in Fig. 8, the second neural network in this embodiment includes only the self-attention layer and does not include the conditional random field model.
After the hidden vector matrix output by the bidirectional LSTM is input to the self-attention layer, the attention weights of the hidden vectors in the hidden vector matrix are computed first; an attention vector matrix is then generated from the attention weights and the hidden vectors; an attention hidden vector matrix is then generated from the hidden vector matrix and the attention vector matrix; and finally the attention hidden vector matrix is fed into the fully connected layer, giving the prediction probability matrix for the character sequence to be recognized.
Each value in the prediction probability matrix obtained above is a predicted tag probability for a character of the sequence to be recognized, and the tag with the highest probability is taken as the predicted tag of the corresponding character. Once the predicted tag of every character has been determined, the sequence can be segmented and the entity categories determined from the meaning of the tags, completing named entity recognition.
In this embodiment, the morphological features inside the characters of electronic health records are extracted by a convolutional neural network, and the features of the characters themselves, together with their internal morphological features, are passed in turn through the bidirectional LSTM layer and the self-attention layer of a deep neural network to predict character tags. This provides an accurate and efficient method for named entity recognition in electronic health records.
Further, referring to Fig. 9, the fourth embodiment of the invention provides, on the basis of the first embodiment, a method for named entity recognition in electronic health records in which step S50 comprises:
Step S110: input the character feature vector matrix into the bidirectional LSTM for processing, obtaining the hidden vector matrix corresponding to the character sequence.
Step S120: input the hidden vector matrix into the conditional random field model for processing, obtaining the named entity recognition result of the electronic health record.
It should be understood that this embodiment is based on the first embodiment and, to suit different application scenarios or processing resources, differs from the second embodiment: as shown in Fig. 10, the second neural network in this embodiment includes only the conditional random field model and does not include the self-attention layer.
After the hidden vector matrix output by the bidirectional LSTM is input to the conditional random field model, processing yields the prediction probability matrix for the character sequence to be recognized.
Each value in the prediction probability matrix obtained above is a predicted tag probability for a character of the sequence to be recognized, and the tag with the highest probability is taken as the predicted tag of the corresponding character. Once the predicted tag of every character has been determined, the sequence can be segmented and the entity categories determined from the meaning of the tags, completing named entity recognition.
In this embodiment, the morphological features inside the characters of electronic health records are extracted by a convolutional neural network, and the features of the characters themselves, together with their internal morphological features, are passed in turn through the bidirectional LSTM layer and the conditional random field model of a deep neural network to predict character tags. This provides an accurate and efficient method for named entity recognition in electronic health records.
The invention also provides a named entity recognition device for electronic health records, comprising a memory, a processor, and a named entity recognition processing program for electronic health records that is stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the method for named entity recognition in electronic health records described above.
In addition, an embodiment of the invention also provides a computer-readable storage medium storing a named entity recognition processing program for electronic health records which, when executed by a processor, implements the steps of the method for named entity recognition in electronic health records described above.
It should be noted that, herein, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.
The serial numbers of the above embodiments of the invention are for description only and do not indicate that any embodiment is better or worse than another.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the invention, in essence or in the part that contributes over the prior art, can be embodied as a software product stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc), including instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of the invention.
The above are only preferred embodiments of the invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the invention, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the invention.

Claims (10)

1. A method for named entity recognition in electronic health records, characterized in that the method comprises the following steps:
generating a character vector matrix corresponding to the character sequence of an electronic health record whose named entities are to be recognized;
generating a radical vector matrix corresponding to the character sequence;
inputting the radical vector matrix into a first neural network for processing to obtain a radical convolution vector matrix corresponding to the character sequence, wherein the first neural network comprises a convolutional neural network layer;
generating a character feature vector matrix from the character vector matrix and the radical convolution vector matrix;
inputting the character feature vector matrix into a second neural network for processing to obtain the named entity recognition result of the electronic health record, wherein the second neural network comprises a bidirectional long short-term memory network layer;
wherein the parameters of the first neural network and the second neural network are obtained by training on electronic health records whose named entities have been identified.
2. The method for named entity recognition in electronic health records according to claim 1, characterized in that the step of generating the radical vector matrix corresponding to the character sequence comprises:
obtaining the Chinese character components of each character in the character sequence;
generating the radical vector of each character from the Chinese character components;
generating the radical vector matrix corresponding to the character sequence from the radical vectors of the characters.
3. The method for named entity recognition in electronic health records according to claim 2, characterized in that the second neural network further comprises a fully connected layer, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic health record comprises:
inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence;
inputting the hidden vector matrix into the fully connected layer for processing to obtain the named entity recognition result of the electronic health record.
4. The method for named entity recognition in electronic health records according to claim 2, characterized in that the second neural network further comprises a self-attention layer, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic health record comprises:
inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence;
inputting the hidden vector matrix into the self-attention layer for processing to obtain the named entity recognition result of the electronic health record.
5. The method for named entity recognition in electronic health records according to claim 2, characterized in that the second neural network further comprises a self-attention layer and a conditional random field model, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic health record comprises:
inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence;
inputting the hidden vector matrix into the self-attention layer for processing to obtain the prediction matrix corresponding to the character sequence;
inputting the prediction matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic health record.
6. The method for named entity recognition in electronic health records according to claim 5, characterized in that the self-attention layer comprises a fully connected layer, and the step of inputting the hidden vector matrix into the self-attention layer for processing to obtain the prediction matrix corresponding to the character sequence comprises:
calculating the attention weights of the hidden vectors in the hidden vector matrix;
generating an attention vector matrix from the attention weights and the hidden vectors;
generating an attention hidden vector matrix from the hidden vector matrix and the attention vector matrix;
inputting the attention hidden vector matrix into the fully connected layer for processing to obtain the prediction matrix corresponding to the character sequence.
7. The method for named entity recognition in electronic health records according to claim 6, characterized in that the step of calculating the attention weights of the hidden vectors in the hidden vector matrix comprises:
calculating the dependency between hidden vectors in the hidden vector matrix according to the following formula:
$$f_{t,t'} = \sigma\left(w_a \tanh\left(w_t h_t + w_{t'} h_{t'}\right)\right),$$
wherein t and t' denote different time steps; $w_a$, $w_t$, and $w_{t'}$ are weight vectors; σ is the sigmoid function; and $h_t$ and $h_{t'}$ are the hidden vectors at different time steps;
calculating, for each hidden vector $h_k$ in the hidden vector matrix, the corresponding attention weight according to the following formula:
$$a_{t,k} = \frac{e^{f_{t,k}}}{\sum_{j=1,\, j \neq t}^{N} e^{f_{t,j}}}, \quad k \neq t,$$
wherein e is the exponential function and N is the number of hidden vectors.
8. The method for named entity recognition in electronic health records according to claim 2, characterized in that the second neural network further comprises a conditional random field model, and the step of inputting the character feature vector matrix into the second neural network for processing to obtain the named entity recognition result of the electronic health record comprises:
inputting the character feature vector matrix into the bidirectional long short-term memory network for processing to obtain the hidden vector matrix corresponding to the character sequence;
inputting the hidden vector matrix into the conditional random field model for processing to obtain the named entity recognition result of the electronic health record.
9. A named entity recognition device for electronic health records, characterized in that the device comprises: a memory, a processor, a camera, and a named entity recognition processing program for electronic health records that is stored in the memory and executable on the processor, wherein the processing program, when executed by the processor, implements the steps of the method for named entity recognition in electronic health records according to any one of claims 1 to 8.
10. A storage medium, characterized in that a named entity recognition processing program for electronic health records is stored on the storage medium, and the processing program, when executed by a processor, implements the steps of the method for named entity recognition in electronic health records according to any one of claims 1 to 8.
CN201811282557.3A 2018-10-30 2018-10-30 Method, device and storage medium for identifying named entities of electronic medical records Active CN109388807B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811282557.3A CN109388807B (en) 2018-10-30 2018-10-30 Method, device and storage medium for identifying named entities of electronic medical records

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811282557.3A CN109388807B (en) 2018-10-30 2018-10-30 Method, device and storage medium for identifying named entities of electronic medical records

Publications (2)

Publication Number Publication Date
CN109388807A (en) 2019-02-26
CN109388807B (en) 2021-09-21

Family

ID=65427746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811282557.3A Active CN109388807B (en) 2018-10-30 2018-10-30 Method, device and storage medium for identifying named entities of electronic medical records

Country Status (1)

Country Link
CN (1) CN109388807B (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871544A (en) * 2019-03-25 2019-06-11 平安科技(深圳)有限公司 Entity recognition method, device, equipment and storage medium based on Chinese medical records
CN109933801A (en) * 2019-03-25 2019-06-25 北京理工大学 Named entity recognition method based on bidirectional LSTM with predicted-position attention
CN109960728A (en) * 2019-03-11 2019-07-02 北京市科学技术情报研究所(北京市科学技术信息中心) Method and system for identifying named entities of open domain conference information
CN110046349A (en) * 2019-03-26 2019-07-23 平安科技(深圳)有限公司 Information identification method, device, equipment and storage medium based on Chinese medical records
CN110135427A (en) * 2019-04-11 2019-08-16 北京百度网讯科技有限公司 Method, apparatus, device and medium for recognizing characters in an image
CN110162782A (en) * 2019-04-17 2019-08-23 平安科技(深圳)有限公司 Entity extraction method, apparatus, equipment and storage medium based on Medical Dictionary
CN110287483A (en) * 2019-06-06 2019-09-27 广东技术师范大学 Unregistered word recognition method and system utilizing five-stroke character root deep learning
CN110334357A (en) * 2019-07-18 2019-10-15 北京香侬慧语科技有限责任公司 Method, apparatus, storage medium and electronic device for named entity recognition
CN110427493A (en) * 2019-07-11 2019-11-08 新华三大数据技术有限公司 Electronic health record processing method, model training method and related apparatus
CN110516228A (en) * 2019-07-04 2019-11-29 湖南星汉数智科技有限公司 Named entity recognition method and device, computer apparatus and computer-readable storage medium
CN110555441A (en) * 2019-09-10 2019-12-10 杭州橙鹰数据技术有限公司 Character recognition method and device
CN110688855A (en) * 2019-09-29 2020-01-14 山东师范大学 Chinese medical entity identification method and system based on machine learning
CN110929749A (en) * 2019-10-15 2020-03-27 平安科技(深圳)有限公司 Text recognition method, text recognition device, text recognition medium and electronic equipment
CN111143534A (en) * 2019-12-26 2020-05-12 腾讯云计算(北京)有限责任公司 Method and device for extracting brand name based on artificial intelligence and storage medium
CN111178074A (en) * 2019-12-12 2020-05-19 天津大学 Deep learning-based Chinese named entity recognition method
CN111192692A (en) * 2020-01-02 2020-05-22 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
CN111339764A (en) * 2019-09-18 2020-06-26 华为技术有限公司 Chinese named entity recognition method and device
CN111352977A (en) * 2020-03-10 2020-06-30 浙江大学 Time sequence data monitoring method based on self-attention bidirectional long-short term memory network
US10740561B1 (en) 2019-04-25 2020-08-11 Alibaba Group Holding Limited Identifying entities in electronic medical records
CN111797626A (en) * 2019-03-21 2020-10-20 阿里巴巴集团控股有限公司 Named entity identification method and device
WO2020211250A1 (en) * 2019-04-19 2020-10-22 平安科技(深圳)有限公司 Entity recognition method and apparatus for chinese medical record, device and storage medium
EP3767516A1 (en) * 2019-07-18 2021-01-20 Ricoh Company, Ltd. Named entity recognition method, apparatus, and computer-readable recording medium
CN112434520A (en) * 2020-11-11 2021-03-02 北京工业大学 Named entity recognition method and device and readable storage medium
CN112836514A (en) * 2020-06-19 2021-05-25 合肥量圳建筑科技有限公司 Nested entity recognition method and device, electronic equipment and storage medium
CN113408289A (en) * 2021-06-29 2021-09-17 广东工业大学 Multi-feature fusion supply chain management entity knowledge extraction method and system
CN115201904A (en) * 2022-07-18 2022-10-18 北京石油化工学院 Microseism data compression and event detection method based on edge intelligence
CN112836514B (en) * 2020-06-19 2024-07-02 合肥量圳建筑科技有限公司 Nested entity identification method, apparatus, electronic device and storage medium

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106980608A (en) * 2017-03-16 2017-07-25 四川大学 Chinese electronic health record word segmentation and named entity recognition method and system
US20180275817A1 (en) * 2017-03-27 2018-09-27 Tricorn (Beijing) Technology Co., Ltd. Information processing apparatus, information processing method and computer-readable storage medium
CN107977361A (en) * 2017-12-06 2018-05-01 哈尔滨工业大学深圳研究生院 Chinese clinical medical entity recognition method based on deep semantic information representation

Non-Patent Citations (1)

Title
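Zhu Danhao, Yang Lei, Wang Dongbo: "Research on Chinese Organization Name Recognition Based on Deep Learning: A Character-Level Recurrent Neural Network Method", New Technology of Library and Information Service (现代图书情报技术) *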

Cited By (38)

Publication number Priority date Publication date Assignee Title
CN109960728A (en) * 2019-03-11 2019-07-02 北京市科学技术情报研究所(北京市科学技术信息中心) Method and system for identifying named entities of open domain conference information
CN109960728B (en) * 2019-03-11 2021-01-22 北京市科学技术情报研究所(北京市科学技术信息中心) Method and system for identifying named entities of open domain conference information
CN111797626A (en) * 2019-03-21 2020-10-20 阿里巴巴集团控股有限公司 Named entity identification method and device
CN109871544A (en) * 2019-03-25 2019-06-11 平安科技(深圳)有限公司 Entity recognition method, device, equipment and storage medium based on Chinese medical records
CN109933801A (en) * 2019-03-25 2019-06-25 北京理工大学 Named entity recognition method based on bidirectional LSTM with predicted-position attention
CN110046349A (en) * 2019-03-26 2019-07-23 平安科技(深圳)有限公司 Information identification method, device, equipment and storage medium based on Chinese medical records
CN110135427B (en) * 2019-04-11 2021-07-27 北京百度网讯科技有限公司 Method, apparatus, device and medium for recognizing characters in image
CN110135427A (en) * 2019-04-11 2019-08-16 北京百度网讯科技有限公司 Method, apparatus, device and medium for recognizing characters in an image
CN110162782B (en) * 2019-04-17 2022-04-01 平安科技(深圳)有限公司 Entity extraction method, device and equipment based on medical dictionary and storage medium
CN110162782A (en) * 2019-04-17 2019-08-23 平安科技(深圳)有限公司 Entity extraction method, apparatus, equipment and storage medium based on Medical Dictionary
WO2020211250A1 (en) * 2019-04-19 2020-10-22 平安科技(深圳)有限公司 Entity recognition method and apparatus for chinese medical record, device and storage medium
US10740561B1 (en) 2019-04-25 2020-08-11 Alibaba Group Holding Limited Identifying entities in electronic medical records
CN110287483A (en) * 2019-06-06 2019-09-27 广东技术师范大学 Unregistered word recognition method and system utilizing five-stroke character root deep learning
CN110287483B (en) * 2019-06-06 2023-12-05 广东技术师范大学 Unregistered word recognition method and system utilizing five-stroke character root deep learning
CN110516228A (en) * 2019-07-04 2019-11-29 湖南星汉数智科技有限公司 Named entity recognition method and device, computer apparatus and computer-readable storage medium
CN110427493A (en) * 2019-07-11 2019-11-08 新华三大数据技术有限公司 Electronic health record processing method, model training method and related apparatus
CN110334357A (en) * 2019-07-18 2019-10-15 北京香侬慧语科技有限责任公司 Method, apparatus, storage medium and electronic device for named entity recognition
EP3767516A1 (en) * 2019-07-18 2021-01-20 Ricoh Company, Ltd. Named entity recognition method, apparatus, and computer-readable recording medium
CN112329465A (en) * 2019-07-18 2021-02-05 株式会社理光 Named entity identification method and device and computer readable storage medium
CN110555441A (en) * 2019-09-10 2019-12-10 杭州橙鹰数据技术有限公司 Character recognition method and device
CN111339764A (en) * 2019-09-18 2020-06-26 华为技术有限公司 Chinese named entity recognition method and device
CN110688855A (en) * 2019-09-29 2020-01-14 山东师范大学 Chinese medical entity identification method and system based on machine learning
CN110929749A (en) * 2019-10-15 2020-03-27 平安科技(深圳)有限公司 Text recognition method, text recognition device, text recognition medium and electronic equipment
CN110929749B (en) * 2019-10-15 2022-04-29 平安科技(深圳)有限公司 Text recognition method, text recognition device, text recognition medium and electronic equipment
CN111178074A (en) * 2019-12-12 2020-05-19 天津大学 Deep learning-based Chinese named entity recognition method
CN111178074B (en) * 2019-12-12 2023-08-25 天津大学 Chinese named entity recognition method based on deep learning
CN111143534A (en) * 2019-12-26 2020-05-12 腾讯云计算(北京)有限责任公司 Method and device for extracting brand name based on artificial intelligence and storage medium
CN111192692B (en) * 2020-01-02 2023-12-08 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
CN111192692A (en) * 2020-01-02 2020-05-22 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
CN111352977B (en) * 2020-03-10 2022-06-17 浙江大学 Time sequence data monitoring method based on self-attention bidirectional long-short term memory network
CN111352977A (en) * 2020-03-10 2020-06-30 浙江大学 Time sequence data monitoring method based on self-attention bidirectional long-short term memory network
CN112836514A (en) * 2020-06-19 2021-05-25 合肥量圳建筑科技有限公司 Nested entity recognition method and device, electronic equipment and storage medium
CN112836514B (en) * 2020-06-19 2024-07-02 合肥量圳建筑科技有限公司 Nested entity identification method, apparatus, electronic device and storage medium
CN112434520A (en) * 2020-11-11 2021-03-02 北京工业大学 Named entity recognition method and device and readable storage medium
CN113408289A (en) * 2021-06-29 2021-09-17 广东工业大学 Multi-feature fusion supply chain management entity knowledge extraction method and system
CN113408289B (en) * 2021-06-29 2024-04-16 广东工业大学 Multi-feature fusion supply chain management entity knowledge extraction method and system
CN115201904A (en) * 2022-07-18 2022-10-18 北京石油化工学院 Microseism data compression and event detection method based on edge intelligence
CN115201904B (en) * 2022-07-18 2023-03-03 北京石油化工学院 Microseism data compression and event detection method based on edge intelligence

Also Published As

Publication number Publication date
CN109388807B (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN109388807A (en) Method, apparatus and storage medium for named entity recognition of electronic health records
CN106980683B (en) Blog text abstract generating method based on deep learning
CN110472229A (en) Sequence labelling model training method, electronic health record processing method and related apparatus
CN109471945B (en) Deep learning-based medical text classification method and device and storage medium
CN107679447A (en) Facial feature point detection method, device and storage medium
CN110490213A (en) Image-recognizing method, device and storage medium
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
CN110517785A (en) Method, device and equipment for searching for similar cases
CN113095415B (en) Cross-modal hashing method and system based on multi-modal attention mechanism
CN110459282A (en) Sequence labelling model training method, electronic health record processing method and related apparatus
CN107168992A (en) Article classification method and device, equipment and computer-readable recording medium based on artificial intelligence
US20200074280A1 (en) Semi-supervised learning using clustering as an additional constraint
CN110442840A (en) Sequence labelling network update method, electronic health record processing method and related apparatus
CN111709398A (en) Image recognition method, and training method and device of image recognition model
CN111368536A (en) Natural language processing method, apparatus and storage medium therefor
CN112000778A (en) Natural language processing method, device and system based on semantic recognition
CN114419351A (en) Image-text pre-training model training method and device and image-text prediction model training method and device
CN110210540A (en) Cross-social-media user identity identification method and system based on attention mechanism
CN114648032B (en) Training method and device of semantic understanding model and computer equipment
CN115758282A (en) Cross-modal sensitive information identification method, system and terminal
Wang et al. Computer vision for lifelogging: Characterizing everyday activities based on visual semantics
CN116578738B (en) Graph-text retrieval method and device based on graph attention and generative adversarial network
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
CN113761188A (en) Text label determination method and device, computer equipment and storage medium
CN116701635A (en) Training video text classification method, training video text classification device, training video text classification equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant