CN106407183A - Method and device for generating medical named entity recognition system - Google Patents

Info

Publication number
CN106407183A
Authority
CN
China
Prior art keywords
medical
named entity
medical named entity
recognition system
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610864046.7A
Other languages
Chinese (zh)
Other versions
CN106407183B (en)
Inventor
陈成
康波
稽可睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Medical Cross Cloud (beijing) Technology Co Ltd
Yidu Cloud Beijing Technology Co Ltd
Original Assignee
Medical Cross Cloud (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Medical Cross Cloud (beijing) Technology Co Ltd
Priority to CN201610864046.7A
Publication of CN106407183A
Application granted
Publication of CN106407183B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F19/326

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and device for generating a medical named entity recognition system. The method for generating the medical named entity recognition system comprises the steps of receiving a plurality of medical text samples through a medical named entity recognition system and obtaining a plurality of candidate medical named entities from the plurality of medical text samples; labeling the plurality of candidate medical named entities to obtain a plurality of recommended medical named entities; calculating the quantity ratio of the recommended medical named entities to the candidate medical named entities and judging whether the quantity ratio is smaller than a first preset value or not; if the quantity ratio is smaller than the first preset value, inputting the recommended medical named entities into the medical named entity recognition system, obtaining the plurality of candidate medical named entities from the plurality of medical text samples and going to the step of labeling the plurality of candidate medical named entities; and if the quantity ratio is not smaller than the first preset value, taking the current medical named entity recognition system as a target medical named entity recognition system.

Description

Method and device for generating a medical named entity recognition system
Technical field
The present disclosure relates to the technical field of medical big data, and in particular to a method and a device for generating a medical named entity recognition system.
Background
Medical care produces large amounts of data, mainly patient medical records, physician orders, nursing documents, examination findings and examination conclusions. These data reflect the patient's basic information, clinical diagnosis, course of treatment and outcome. As medical information systems are established and improved, more and more medical data is entered electronically rather than recorded by hand. At present, clinical information such as medical records, physician orders, nursing documents and examination reports is mostly written by medical staff in natural language, so its information structure is complex. How to process, analyze and mine this large volume of unstructured data is therefore a major problem in building medical information systems, and medical named entity recognition is an indispensable part of it.
In the prior art, named entity recognition methods generally fall into three kinds: dictionary-based methods, methods based on heuristic rules, and methods based on machine learning. The first two depend heavily on dictionaries or rules, and the resources available for Chinese are relatively scarce. In addition, in massive volumes of medical natural-language text, different medical staff write differently, so the same medical named entity often appears in many written forms. Machine-learning methods, for their part, are usually supervised and need a large amount of manual annotation to reach acceptable results. How to quickly mine meaningful medical named entities from a large amount of natural-language text is therefore a technical problem in urgent need of a solution.
The information disclosed in this Background section is only intended to aid understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the present disclosure is to provide a method and a device for generating a medical named entity recognition system, thereby overcoming, at least to some extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the present disclosure, a method for generating a medical named entity recognition system is provided, including:
Receiving a plurality of medical text samples through a medical named entity recognition system, and obtaining a plurality of candidate medical named entities from the plurality of medical text samples using machine learning;
Labeling the plurality of candidate medical named entities to obtain a plurality of recommended medical named entities;
Calculating the quantity ratio of the recommended medical named entities to the candidate medical named entities, and judging whether the quantity ratio is less than a first preset value;
When the quantity ratio is judged to be less than the first preset value, inputting the recommended medical named entities into the medical named entity recognition system, obtaining a plurality of candidate medical named entities from the plurality of medical text samples on that basis, and returning to the step of labeling the plurality of candidate medical named entities;
When the quantity ratio is judged to be not less than the first preset value, taking the current medical named entity recognition system as the target medical named entity recognition system.
In an exemplary embodiment of the present disclosure, obtaining a plurality of candidate medical named entities from the plurality of medical text samples using machine learning includes:
Calculating the weight value of each named entity in the plurality of medical text samples;
Selecting the named entities with the highest weight values as the candidate medical named entities.
In an exemplary embodiment of the present disclosure, calculating the weight value of each named entity in the plurality of medical text samples includes:
In a Spark environment, calculating the weight value of each named entity in the plurality of medical text samples with the N-Gram algorithm and the tf-idf algorithm.
In an exemplary embodiment of the present disclosure, inputting the plurality of recommended medical named entities into the medical named entity recognition system and obtaining a plurality of candidate medical named entities from the plurality of medical text samples on that basis includes:
Obtaining, from the plurality of medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities, as supplementary medical named entities;
Increasing the weight values of the supplementary medical named entities in the plurality of medical text samples;
Selecting the named entities with the highest weight values as the candidate medical named entities.
In an exemplary embodiment of the present disclosure, obtaining, from the plurality of medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities as supplementary medical named entities includes:
Segmenting the plurality of medical text samples into words according to a preset model to obtain a plurality of segmentation units;
Obtaining the contextual features of the plurality of recommended medical named entities, and representing the contextual feature of each recommended medical named entity as a first vector;
Obtaining the contextual features of the plurality of segmentation units, and representing the contextual feature of each segmentation unit as a second vector;
Calculating the similarity between the first vector and the second vector, and judging whether the similarity is less than a second preset value;
Selecting the second vectors whose similarity to the first vector is not less than the second preset value, and taking the segmentation units whose contextual features those second vectors represent as the candidate medical named entities.
In an exemplary embodiment of the present disclosure, the preset model is a hidden Markov model (HMM).
In an exemplary embodiment of the present disclosure, word2vec is used to represent the contextual feature of each recommended medical named entity as a first vector and the contextual feature of each segmentation unit as a second vector.
In an exemplary embodiment of the present disclosure, the first preset value is 85% to 90%.
In an exemplary embodiment of the present disclosure, while the plurality of candidate medical named entities are being labeled, the labeled recommended medical named entities are also classified;
While the candidate medical named entities are being obtained from the plurality of medical text samples, a classification is recommended for each candidate medical named entity according to the classification of the recommended medical named entity that is similar to it.
According to one aspect of the present disclosure, a device for generating a medical named entity recognition system is provided, including:
A cold start unit, configured to receive a plurality of medical text samples through a medical named entity recognition system and to obtain a plurality of candidate medical named entities from the plurality of medical text samples using machine learning;
A labeling unit, configured to label the plurality of candidate medical named entities to obtain a plurality of recommended medical named entities;
An evaluation unit, configured to calculate the quantity ratio of the recommended medical named entities to the candidate medical named entities and to judge whether the quantity ratio is less than a first preset value;
A feedback unit, configured to, when the quantity ratio is judged to be less than the first preset value, input the recommended medical named entities into the medical named entity recognition system, obtain a plurality of candidate medical named entities from the plurality of medical text samples on that basis, and feed them back to the labeling unit;
An output unit, configured to, when the quantity ratio is judged to be not less than the first preset value, take the current medical named entity recognition system as the target medical named entity recognition system.
With the method and device of the present disclosure, a large number of natural-language medical text samples are input to a medical named entity recognition system, and a plurality of candidate medical named entities are obtained using machine learning. The candidates are then labeled to obtain a plurality of recommended medical named entities. Next, the quantity ratio of recommended to candidate medical named entities is calculated and compared with the first preset value. When the ratio is not less than the first preset value, the performance of the system already meets the requirement, and the system can be output directly as the target medical named entity recognition system. When the ratio is less than the first preset value, the performance does not yet meet the requirement; the recommended medical named entities are input to the system, a plurality of candidate medical named entities are obtained again from the medical text samples on that basis and labeled again, yielding more recommended medical named entities. The iteration continues in this way until the ratio is not less than the first preset value, that is, until the performance of the system meets the requirement, at which point the system is output as the target medical named entity recognition system.
This process combines machine learning with manual annotation, that is, unsupervised with supervised algorithms, to quickly generate a medical named entity recognition system whose performance meets the requirement. Medical named entities can thus be produced quickly at minimal manual annotation cost, while a good recognition rate on massive data sets is still ensured.
Brief description of the drawings
The above and other features and advantages of the present disclosure will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings.
Fig. 1 is a flow chart of the method for generating a medical named entity recognition system according to an embodiment of the present disclosure;
Fig. 2 is a flow chart of obtaining a plurality of candidate medical named entities from the plurality of medical text samples using machine learning in the method according to an embodiment of the present disclosure;
Fig. 3 is a flow chart of inputting the plurality of recommended medical named entities into the medical named entity recognition system and obtaining a plurality of candidate medical named entities from the plurality of medical text samples on that basis in the method according to an embodiment of the present disclosure;
Fig. 4 is a block diagram of the device for generating a medical named entity recognition system according to an embodiment of the present disclosure.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. The example embodiments can, however, be implemented in many forms and should not be understood as limited to the examples set forth here; rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. Those skilled in the art will appreciate, however, that the technical solution of the present disclosure may be practiced while omitting one or more of the specific details, or that other methods, components, devices, steps and so on may be adopted. In other cases, well-known solutions are not shown or described in detail, so that they do not distract from the main points and obscure aspects of the present disclosure.
In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated description of them is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a method for generating a medical named entity recognition system. Referring to Fig. 1, the method may include the following steps:
Step S11: receive a plurality of medical text samples through a medical named entity recognition system, and obtain a plurality of candidate medical named entities from the plurality of medical text samples using machine learning. For example, a large number of medical text samples may be input to the medical named entity recognition system. The samples contain many medical named entities and non-medical named entities, and machine learning is used to filter a plurality of named entities out of the samples as candidate medical named entities.
Step S12: label the plurality of candidate medical named entities to obtain a plurality of recommended medical named entities. In this example embodiment, labeling marks whether each candidate medical named entity is a real medical named entity; a candidate that is a real medical named entity is taken as a recommended medical named entity.
Step S13: calculate the quantity ratio of the recommended medical named entities to the candidate medical named entities, and judge whether the quantity ratio is less than the first preset value. The quantity ratio is the proportion of recommended medical named entities among the candidate medical named entities, and the first preset value can be regarded as a threshold for that ratio. The higher the first preset value, the larger the proportion of recommended medical named entities among the candidates must be, and the higher the recognition rate of the resulting medical named entity recognition system. For example, in this example embodiment the first preset value is 85% to 90%, such as 86% or 88%, but it is not limited to this range; the first preset value may also be lower than 85% or higher than 90%.
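The check in step S13 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names and the default threshold of 0.85 (the lower end of the 85% to 90% range given above) are assumptions made for the example.

```python
def quantity_ratio(candidates, recommended):
    """Share of the candidate entities that labeling confirmed as real medical entities."""
    unique_candidates = set(candidates)
    if not unique_candidates:
        raise ValueError("at least one candidate entity is required")
    # Only entities that were actually proposed as candidates count as confirmed.
    confirmed = unique_candidates & set(recommended)
    return len(confirmed) / len(unique_candidates)


def needs_another_round(ratio, first_preset=0.85):
    """True when the ratio is below the first preset value (step S14), else False (step S15)."""
    return ratio < first_preset
```

With four candidates of which three are confirmed, the ratio is 0.75, which is below 0.85, so another labeling round would be triggered.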
Step S14: when the quantity ratio is judged to be less than the first preset value, the proportion of recommended medical named entities among the candidate medical named entities has not reached the preset level, that is, the recognition rate of the medical named entity recognition system is too low. In this case, the recommended medical named entities may be input to the medical named entity recognition system, a plurality of candidate medical named entities are obtained from the plurality of medical text samples on that basis, and the method returns to the step of labeling the plurality of candidate medical named entities. Iterating in this loop keeps raising the recognition rate of the medical named entity recognition system until the quantity ratio is not less than the first preset value, after which the method proceeds to step S15.
Step S15: when the quantity ratio is judged to be not less than the first preset value, the recognition rate of the medical named entity recognition system meets the requirement, and the current medical named entity recognition system can be taken as the target medical named entity recognition system.
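Steps S11 to S15 together form a feedback loop, sketched below under stated assumptions. The `extract` and `annotate` callables are hypothetical stand-ins for the machine-learning extraction and the manual labeling, the set `known` stands in for the state of the recognition system, and `max_rounds` is a safety cap that the source does not mention.

```python
from typing import Callable, List, Set, Tuple


def build_recognizer(
    samples: List[str],
    extract: Callable[[List[str], Set[str]], List[str]],
    annotate: Callable[[List[str]], Set[str]],
    first_preset: float = 0.85,
    max_rounds: int = 20,  # assumption: a cap so the sketch always terminates
) -> Tuple[Set[str], float, int]:
    """Iterate extraction and labeling until the quantity ratio reaches the first preset value."""
    known: Set[str] = set()  # recommended entities fed back to the system so far
    for round_no in range(1, max_rounds + 1):
        # S11 in the first round (cold start), S14 afterwards (uses the feedback in `known`).
        candidates = set(extract(samples, known))
        if not candidates:
            raise ValueError("extraction produced no candidates")
        recommended = annotate(list(candidates)) & candidates  # S12: labeling
        ratio = len(recommended) / len(candidates)              # S13: quantity ratio
        known |= recommended                                    # feedback for the next round
        if ratio >= first_preset:                               # S15: requirement met
            return known, ratio, round_no
    raise RuntimeError("quantity ratio did not reach the first preset value")
```

Passing the extraction and labeling steps in as functions keeps the loop independent of how candidates are actually mined, which the later paragraphs describe in more detail.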
Further, referring to Fig. 2, in this example embodiment, obtaining a plurality of candidate medical named entities from the plurality of medical text samples using machine learning in step S11 may include:
Step S111: calculate the weight value of each named entity in the plurality of medical text samples; the weight value may specifically be the term frequency of each named entity; and
Step S112: select a plurality of named entities as the candidate medical named entities, where the weight value of every selected named entity is higher than that of every named entity not selected. Named entities with higher weight values are selected because the higher the weight value, the more likely the named entity is a medical named entity. For example, in this example embodiment the candidate medical named entities may be chosen in the following ways:
For example, the named entities may be sorted by weight value, and the named entities with the larger weight values are then selected as the candidate medical named entities. As another example, a predetermined weight value may be set in advance; the weight value of each named entity is compared with the predetermined weight value, and the named entities whose weight value is not less than the predetermined weight value are selected as the candidate medical named entities.
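The two selection strategies just described can be sketched as follows. The function names are illustrative, and the weights are assumed to be held in a dictionary that maps each named entity to its weight value.

```python
import heapq


def select_top_k(weights, k):
    """Sort-based selection: keep the k named entities with the largest weight values."""
    return heapq.nlargest(k, weights, key=weights.get)


def select_by_threshold(weights, predetermined_weight):
    """Threshold-based selection: keep every entity whose weight is not below the threshold."""
    return [entity for entity, w in weights.items() if w >= predetermined_weight]
```

The sort-based variant fixes the labeling workload per round, while the threshold variant lets the number of candidates vary with the data.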
In addition, in this example embodiment, calculating the weight value of each named entity in the plurality of medical text samples may include:
In a Spark environment, calculating the weight value of each named entity in the plurality of medical text samples with an N-Gram model and the tf-idf algorithm. In this process the window value may be less than 6, that is, a named entity is at most 5 characters long. Those skilled in the art will readily understand, however, that in other exemplary embodiments of the present disclosure, depending on the computing environment and the requirements, the weight values may be calculated in other ways, or the plurality of candidate medical named entities may be obtained by other machine-learning methods; all of these also fall within the protection scope of the present disclosure.
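The weighting step can be illustrated on a single machine with plain Python. This sketch is a stand-in for the Spark job, not a reproduction of it. It assumes character n-grams of up to 5 characters (matching the window value below 6) and one common smoothed tf-idf variant, since the source does not give the exact formula.

```python
import math
from collections import Counter


def char_ngrams(text, max_len=5):
    """Yield every character n-gram of length 1..max_len in the text."""
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def ngram_tfidf(samples, max_len=5):
    """Weight each n-gram by corpus term frequency times a smoothed inverse document frequency."""
    term_freq = Counter()  # total occurrences across all samples
    doc_freq = Counter()   # number of samples containing the n-gram
    for text in samples:
        grams = list(char_ngrams(text, max_len))
        term_freq.update(grams)
        doc_freq.update(set(grams))
    n_docs = len(samples)
    return {
        gram: term_freq[gram] * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1)
        for gram in term_freq
    }


# Toy corpus, invented for this example.
samples = ["患者确诊白血病", "白血病患者发热", "患者无发热"]
weights = ngram_tfidf(samples)
```

On real clinical text the raw n-gram list is dominated by fragments that straddle word boundaries, which is one reason the method relies on the labeling and feedback steps rather than on these weights alone.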
Further, referring to Fig. 3, in this example embodiment, inputting the plurality of recommended medical named entities into the medical named entity recognition system and obtaining a plurality of candidate medical named entities from the plurality of medical text samples on that basis in step S14 may include the following steps:
Step S141: obtain, from the plurality of medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities, as supplementary medical named entities. For example, for each named entity in the medical text samples other than the recommended medical named entities, its contextual feature can be compared with the contextual features of the recommended medical named entities, and the named entities with similar contextual features are taken as supplementary medical named entities. Because the contextual feature of a supplementary medical named entity is similar to that of a recommended medical named entity, it can be inferred that the supplementary medical named entity is similar to the recommended one and is therefore likely to be a real medical named entity.
Step S142: increase the weight values of the supplementary medical named entities in the plurality of medical text samples. The candidate medical named entities can then be obtained again; because the weight values of the supplementary medical named entities have been increased, they are more likely to be selected as candidate medical named entities.
Step S143: select the named entities whose weight values are higher than those of the other named entities as the candidate medical named entities. The candidate medical named entities now include supplementary medical named entities, so the next round of labeling is likely to produce more recommended medical named entities from them.
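Steps S142 and S143 amount to re-ranking after a boost, which can be sketched as below. The multiplicative `factor` is an assumption: the source only says that the weight is increased, not by how much or in what way.

```python
def boost_supplementary(weights, supplementary, factor=2.0):
    """Return a copy of the weights with supplementary entities scaled up (step S142)."""
    boosted = dict(weights)  # leave the original weights untouched
    for entity in supplementary:
        if entity in boosted:
            boosted[entity] *= factor
    return boosted


def reselect_candidates(weights, supplementary, k, factor=2.0):
    """Re-rank after boosting and keep the k highest-weighted entities (step S143)."""
    boosted = boost_supplementary(weights, supplementary, factor)
    return sorted(boosted, key=boosted.get, reverse=True)[:k]
```

An entity that narrowly missed the cut in the previous round can enter the candidate list this way once its context resembles that of a confirmed entity.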
Further, in this example embodiment, obtaining, from the plurality of medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities as supplementary medical named entities may include the following steps:
Segment the plurality of medical text samples into words according to a preset model to obtain a plurality of segmentation units; a segmentation unit may be an entity word obtained by word segmentation. In this example embodiment the preset model may be a hidden Markov model (HMM), a maximum entropy model, a conditional random field model or the like; this exemplary embodiment places no particular limitation on it.
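One common way to segment Chinese text with an HMM is to tag each character as B (begin), M (middle), E (end) or S (single-character word) and decode the tags with the Viterbi algorithm. The sketch below follows that convention. The tag set, the probability tables and all names are toy assumptions made for illustration, since the source does not specify how the preset model is parameterized.

```python
import math

STATES = "BMES"
NEG_INF = float("-inf")

# Toy parameters, invented for this example; a real model would estimate them from data.
START = {"B": 0.6, "S": 0.4}
TRANS = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.4, "E": 0.6},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT = {
    "B": {"白": 0.3, "患": 0.3, "发": 0.3},
    "M": {"血": 0.5},
    "E": {"病": 0.3, "者": 0.3, "热": 0.3},
    "S": {},
}


def _log(p):
    return math.log(p) if p > 0 else NEG_INF


def viterbi(text, start=START, trans=TRANS, emit=EMIT, floor=1e-6):
    """Return the most probable B/M/E/S tag for each character of the text."""
    if not text:
        return []
    # scores[i][s]: best log-probability of any tag path ending in state s at position i
    scores = [{s: _log(start.get(s, 0)) + _log(emit[s].get(text[0], floor)) for s in STATES}]
    back = [{}]
    for ch in text[1:]:
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[-1][p] + _log(trans[p].get(s, 0)))
            row[s] = scores[-1][prev] + _log(trans[prev].get(s, 0)) + _log(emit[s].get(ch, floor))
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)
    last = max("ES", key=lambda s: scores[-1][s])  # a word can only end in E or S
    tags = [last]
    for ptr in reversed(back[1:]):
        tags.append(ptr[tags[-1]])
    return tags[::-1]


def segment(text):
    """Cut the text after every character tagged E or S."""
    words, current = [], ""
    for ch, tag in zip(text, viterbi(text)):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    return words
```

With these toy tables, `segment("白血病患者发热")` splits the text into 白血病, 患者 and 发热; characters missing from the emission tables fall back to the small `floor` probability.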
Obtain the contextual features of the plurality of recommended medical named entities, and represent the contextual feature of each recommended medical named entity as a first vector. Vectorizing the contextual features of the recommended medical named entities makes quantitative comparison easier. For example, the word2vec tool may be used for this in this example embodiment, but the disclosure is not limited to it.
Obtain the contextual features of the plurality of segmentation units, and represent the contextual feature of each segmentation unit as a second vector. Vectorizing each segmentation unit likewise makes quantitative comparison easier. For example, the word2vec tool may be used for this in this example embodiment, but the disclosure is not limited to it.
Calculate the similarity between the first vector and the second vector, and judge whether the similarity is less than a second preset value. The second preset value can be set by the user; the larger the second preset value, the higher the similarity required between the first vector and the second vector, and vice versa.
Select the second vectors whose similarity to the first vector is not less than the second preset value, and take the segmentation units whose contextual features those second vectors represent as the candidate medical named entities. Comparing vector similarity in this way yields the similarity between a segmentation unit and a recommended medical named entity.
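The vector comparison can be sketched as follows. Several choices here are assumptions, not taken from the source: a context vector is formed by averaging the embeddings of neighbouring tokens, cosine similarity is the similarity measure, the `embeddings` dictionary stands in for a trained word2vec model, and 0.8 is an arbitrary second preset value.

```python
import math


def context_vector(tokens, index, embeddings, window=2):
    """Average the embeddings of the tokens within `window` positions of tokens[index]."""
    neighbours = tokens[max(0, index - window):index] + tokens[index + 1:index + 1 + window]
    vectors = [embeddings[t] for t in neighbours if t in embeddings]
    if not vectors:
        return None  # no usable context
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_similar_units(first_vectors, second_vectors, second_preset=0.8):
    """Keep segmentation units whose context vector is close to that of any recommended entity.

    first_vectors maps recommended entities to first vectors;
    second_vectors maps segmentation units to second vectors.
    """
    return [
        unit
        for unit, v2 in second_vectors.items()
        if unit not in first_vectors
        and any(cosine(v1, v2) >= second_preset for v1 in first_vectors.values())
    ]
```

The units selected this way correspond to what step S141 calls supplementary medical named entities.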
Further, in the method for generating a medical named entity recognition system, while the plurality of candidate medical named entities are being labeled, they may also be classified. For example, leukemia is classified as a disease and fever as a symptom; if a candidate medical named entity is a meaningless word, it may be assigned to a "meaningless" class, and so on.
While a plurality of candidate medical named entities are being obtained from the plurality of medical text samples, a classification may be recommended for each of them according to the classification of the recommended medical named entity similar to it, that is, of a medical named entity that has already been labeled. The candidate medical named entities are thereby mapped to different classifications, so that the medical named entities can also be classified conveniently while the medical named entity recognition system is being generated.
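Recommending a classification can be sketched as a nearest-neighbour lookup over the entities that have already been labeled. Representing entities by vectors and comparing them with cosine similarity are assumptions made for this example, and the function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_category(candidate_vector, labeled, min_similarity=0.0):
    """Suggest the category of the most similar labeled entity, or None if nothing is close enough.

    labeled maps each recommended entity to a (vector, category) pair.
    """
    best_category, best_score = None, min_similarity
    for vector, category in labeled.values():
        score = cosine(candidate_vector, vector)
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```

The suggestion is only a recommendation; the annotator can still confirm or correct it during the next labeling round.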
In sum, the medical treatment name entity recognition system generation method of the embodiment of the present disclosure, can be to described medical treatment name The entity recognition system input medical samples of text based on natural language in a large number, obtains multiple candidate's medical treatment lives by machine learning Name entity;Then the plurality of candidate medical treatment name entity is labeled, obtains the plurality of recommendation medical treatment name entity;With Afterwards, described ratio of number can be calculated be compared it with described first preset value, when described ratio of number is not less than described the During one preset value, illustrate that the quantity of described recommendation medical treatment name entity reaches requirement, now, can directly described medical treatment be named in fact Body identifying system is exported as target medical treatment name entity recognition system;Preset when described ratio of number is less than described first During value, then illustrate that the described quantity recommending medical treatment name entity is not up to and require, medical treatment name entity can be recommended by the plurality of Input to described medical treatment name entity recognition system and recommend medical treatment name entity from the plurality of medical treatment literary composition according to the plurality of Obtain multiple described candidate's medical treatment name entities in this sample and be labeled again, obtain more described recommendation medical treatment names Entity, iteration that the rest may be inferred, until when described ratio of number is not less than described first preset value, that is, described recommendation medical treatment name is real When the quantity of body not up to requires, can be using described medical treatment name entity recognition system as target medical treatment name entity recognition system Exported.
In the above process, medical named entities that meet the quantity requirement, that is, meaningful medical named entities, can be mined automatically from the medical text samples. This reduces manual labeling and labor cost, and the process can be iterated continuously with less manual operation. Meaningful medical named entities can therefore be mined quickly from a large volume of natural-language text.
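The iteration summarized above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function name `generate_ner_system`, the callables `extract` and `annotate`, the `max_rounds` safety limit, and the choice to compute the quantity ratio per round are all hypothetical. The default threshold of 0.85 is taken from the 85%-90% range given for the first preset value in the claims.

```python
from typing import Callable, List, Set, Tuple


def generate_ner_system(
    samples: List[str],
    extract: Callable[[List[str], Set[str]], List[str]],
    annotate: Callable[[List[str]], Set[str]],
    first_preset: float = 0.85,
    max_rounds: int = 20,
) -> Tuple[Set[str], int]:
    """Iterate extraction and labeling until enough candidates are accepted.

    extract(samples, recommended) -> candidate entities for this round
    annotate(candidates)          -> subset of candidates marked as recommended
    Returns the accumulated recommended entities and the number of rounds run.
    """
    recommended: Set[str] = set()
    rounds = 0
    while rounds < max_rounds:  # hypothetical guard against endless iteration
        rounds += 1
        candidates = extract(samples, recommended)
        if not candidates:
            break
        accepted = annotate(candidates)
        recommended |= accepted
        # Quantity ratio of recommended entities to candidate entities
        # (computed per round here, which is an assumption of this sketch).
        ratio = len(accepted) / len(candidates)
        if ratio >= first_preset:
            break  # requirement met: the current system becomes the target system
    return recommended, rounds
```

In this sketch the "system" is represented only by the set of recommended entities it has accumulated; a real system would also carry the trained model state.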
According to another aspect of the embodiments of the present disclosure, a device for generating a medical named entity recognition system is provided. Referring to Fig. 4, the device includes a cold-start unit 10, a labeling unit 20, an evaluation unit 30, a feedback unit 40 and an output unit 50, wherein:
The cold-start unit 10 may be used to receive a plurality of medical text samples through a medical named entity recognition system, and to obtain a plurality of candidate medical named entities from the plurality of medical text samples using machine learning.
The labeling unit 20 may be used to label the plurality of candidate medical named entities to obtain a plurality of recommended medical named entities.
The evaluation unit 30 may be used to calculate the quantity ratio of the recommended medical named entities to the candidate medical named entities, and to judge whether the quantity ratio is less than the first preset value.
The feedback unit 40 may be used, when the quantity ratio is judged to be less than the first preset value, to input the recommended medical named entities to the medical named entity recognition system, to obtain a plurality of candidate medical named entities from the plurality of medical text samples accordingly, and to feed them back to the labeling unit.
The output unit 50 may be used, when the quantity ratio is judged to be not less than the first preset value, to take the current medical named entity recognition system as the target medical named entity recognition system.
It should be noted that the implementation details and beneficial effects of the device described above have already been described in detail for the corresponding method for generating a medical named entity recognition system, and are therefore not repeated here.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for generating a medical named entity recognition system, characterized by comprising:
receiving a plurality of medical text samples through a medical named entity recognition system, and obtaining a plurality of candidate medical named entities from the plurality of medical text samples using machine learning;
labeling the plurality of candidate medical named entities to obtain a plurality of recommended medical named entities;
calculating the quantity ratio of the recommended medical named entities to the candidate medical named entities, and judging whether the quantity ratio is less than a first preset value;
when the quantity ratio is judged to be less than the first preset value, inputting the recommended medical named entities to the medical named entity recognition system, obtaining a plurality of candidate medical named entities from the plurality of medical text samples accordingly, and returning to the step of labeling the plurality of candidate medical named entities;
when the quantity ratio is judged to be not less than the first preset value, taking the current medical named entity recognition system as a target medical named entity recognition system.
2. The method for generating a medical named entity recognition system according to claim 1, characterized in that obtaining a plurality of candidate medical named entities from the plurality of medical text samples using machine learning comprises:
calculating a weight value of each named entity in the plurality of medical text samples;
selecting the plurality of named entities with the highest weight values as the candidate medical named entities.
3. The method for generating a medical named entity recognition system according to claim 2, characterized in that calculating the weight value of each named entity in the plurality of medical text samples comprises:
calculating, in a Spark environment, the weight value of each named entity in the plurality of medical text samples by an N-Gram algorithm and a tf-idf algorithm.
4. The method for generating a medical named entity recognition system according to claim 2 or …, characterized in that inputting the plurality of recommended medical named entities to the medical named entity recognition system and obtaining a plurality of candidate medical named entities from the plurality of medical text samples accordingly comprises:
obtaining, from the plurality of medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities as supplementary medical named entities;
increasing the weight values of the supplementary medical named entities in the plurality of medical text samples;
selecting the plurality of named entities with the highest weight values as the candidate medical named entities.
5. The method for generating a medical named entity recognition system according to claim 4, characterized in that obtaining, from the plurality of medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities as supplementary medical named entities comprises:
performing word segmentation on the plurality of medical text samples according to a preset model to obtain a plurality of segmentation units;
obtaining the contextual features of the plurality of recommended medical named entities, and representing the contextual feature of each recommended medical named entity as a first vector;
obtaining the contextual features of the plurality of segmentation units, and representing the contextual feature of each segmentation unit as a second vector;
calculating the similarity between the first vector and the second vector, and judging whether the similarity is less than a second preset value;
selecting the second vectors whose similarity to the first vector is not less than the second preset value, and taking the segmentation units corresponding to the contextual features represented by those second vectors as the candidate medical named entities.
6. The method for generating a medical named entity recognition system according to claim 5, characterized in that the preset model is a hidden Markov model (HMM).
7. The method for generating a medical named entity recognition system according to claim 5, characterized in that the contextual feature of each recommended medical named entity is represented as the first vector, and the contextual feature of each segmentation unit is represented as the second vector, by word2vec.
8. The method for generating a medical named entity recognition system according to any one of claims 1-6, characterized in that the first preset value is 85%-90%.
9. The method for generating a medical named entity recognition system according to any one of claims 1-6, characterized in that, while the plurality of candidate medical named entities are labeled, the labeled recommended medical named entities are classified;
and, while the candidate medical named entities are obtained from the plurality of medical text samples, a category is recommended for each candidate medical named entity according to the category of the recommended medical named entity that is similar to that candidate medical named entity.
10. A device for generating a medical named entity recognition system, characterized by comprising:
a cold-start unit, for receiving a plurality of medical text samples through a medical named entity recognition system, and obtaining a plurality of candidate medical named entities from the plurality of medical text samples using machine learning;
a labeling unit, for labeling the plurality of candidate medical named entities to obtain a plurality of recommended medical named entities;
an evaluation unit, for calculating the quantity ratio of the recommended medical named entities to the candidate medical named entities, and judging whether the quantity ratio is less than a first preset value;
a feedback unit, for inputting the recommended medical named entities to the medical named entity recognition system when the quantity ratio is judged to be less than the first preset value, obtaining a plurality of candidate medical named entities from the plurality of medical text samples accordingly, and feeding them back to the labeling unit;
an output unit, for taking the current medical named entity recognition system as a target medical named entity recognition system when the quantity ratio is judged to be not less than the first preset value.
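The weighting and feedback steps recited in claims 2 to 5 can be illustrated with a small single-machine Python sketch. It is a simplified stand-in rather than the claimed implementation, and several things are assumptions made only for illustration: character n-grams replace HMM word segmentation, bag-of-context character counts replace word2vec vectors, plain Python replaces Spark, and the names `char_ngrams`, `tfidf_weights`, `context_vector`, `cosine`, `boost_similar`, `top_k`, the smoothed idf formula, the additive `bonus`, and the default `second_preset` of 0.9 are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence


def char_ngrams(text: str, n_max: int = 3) -> List[str]:
    """All character n-grams of length 1..n_max in the text."""
    return [text[i:i + n] for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_weights(samples: Sequence[str], n_max: int = 3) -> Dict[str, float]:
    """Weight of each n-gram: tf-idf summed over all samples (smoothed idf)."""
    doc_counts = [Counter(char_ngrams(s, n_max)) for s in samples]
    df = Counter(g for counts in doc_counts for g in counts)
    weights: Dict[str, float] = defaultdict(float)
    for counts in doc_counts:
        total = sum(counts.values())
        for gram, k in counts.items():
            idf = math.log((1 + len(samples)) / (1 + df[gram]))
            weights[gram] += (k / total) * idf
    return dict(weights)


def context_vector(term: str, samples: Sequence[str], window: int = 2) -> Counter:
    """Counts of the characters found within `window` positions of the term."""
    vec: Counter = Counter()
    for s in samples:
        start = s.find(term)
        while start != -1:
            end = start + len(term)
            vec.update(s[max(0, start - window):start] + s[end:end + window])
            start = s.find(term, start + 1)
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def boost_similar(weights: Dict[str, float], samples: Sequence[str],
                  seeds: Iterable[str], second_preset: float = 0.9,
                  bonus: float = 1.0) -> Dict[str, float]:
    """Raise the weight of terms whose context resembles that of any seed."""
    seed_set = set(seeds)
    seed_vecs = [context_vector(s, samples) for s in seed_set]
    boosted = dict(weights)
    for term in weights:
        if term in seed_set:
            continue  # seeds are already recommended entities
        vec = context_vector(term, samples)
        if any(cosine(vec, sv) >= second_preset for sv in seed_vecs):
            boosted[term] += bonus
    return boosted


def top_k(weights: Dict[str, float], k: int) -> List[str]:
    """The k terms with the highest weight values."""
    return sorted(weights, key=weights.get, reverse=True)[:k]
```

A typical call sequence would be `tfidf_weights` for the cold start, then `boost_similar` with the recommended entities as seeds, then `top_k` to pick the next round of candidates. The additive bonus is one possible way to "increase the weight value"; the claims do not specify how the increase is computed.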
CN201610864046.7A 2016-09-28 2016-09-28 Medical treatment name entity recognition system generation method and device Active CN106407183B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610864046.7A CN106407183B (en) 2016-09-28 2016-09-28 Medical treatment name entity recognition system generation method and device

Publications (2)

Publication Number Publication Date
CN106407183A true CN106407183A (en) 2017-02-15
CN106407183B CN106407183B (en) 2019-06-28

Family

ID=59229294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610864046.7A Active CN106407183B (en) 2016-09-28 2016-09-28 Medical treatment name entity recognition system generation method and device

Country Status (1)

Country Link
CN (1) CN106407183B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101576910A (en) * 2009-05-31 2009-11-11 北京学之途网络科技有限公司 Method and device for identifying product naming entity automatically
CN102033950A (en) * 2010-12-23 2011-04-27 哈尔滨工业大学 Construction method and identification method of automatic electronic product named entity identification system
CN102314417A (en) * 2011-09-22 2012-01-11 西安电子科技大学 Method for identifying Web named entity based on statistical model
CN103136361A (en) * 2013-03-07 2013-06-05 陈一飞 Semi-supervised extracting method for protein interrelation in biological text
CN103268339A (en) * 2013-05-17 2013-08-28 中国科学院计算技术研究所 Recognition method and system of named entities in microblog messages
CN104615589A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Named-entity recognition model training method and named-entity recognition method and device
CN105938495A (en) * 2016-04-29 2016-09-14 乐视控股(北京)有限公司 Entity relationship recognition method and apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
任育伟 等: "搜索日志中命名实体识别", 《现代图书情报技术》 *
黄诗琳 等: "针对产品命名实体识别的半监督学习方法", 《北京邮电大学学报》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919793B (en) * 2017-02-24 2019-12-06 黑龙江特士信息技术有限公司 Data standardization processing method and device for medical big data
CN106919793A (en) * 2017-02-24 2017-07-04 黑龙江特士信息技术有限公司 A kind of data standardization processing method and device of medical big data
CN106897559A (en) * 2017-02-24 2017-06-27 黑龙江特士信息技术有限公司 A kind of symptom and sign class entity recognition method and device towards multi-data source
CN107168946A (en) * 2017-04-14 2017-09-15 北京化工大学 A kind of name entity recognition method of medical text data
CN107992511A (en) * 2017-10-18 2018-05-04 东软集团股份有限公司 Index establishing method, device, storage medium and the electronic equipment of medical data table
CN108763348B (en) * 2018-05-15 2022-05-03 南京邮电大学 Classification improvement method for feature vectors of extended short text words
CN108763348A (en) * 2018-05-15 2018-11-06 南京邮电大学 A kind of classification improved method of extension short text word feature vector
CN112487195A (en) * 2019-09-12 2021-03-12 医渡云(北京)技术有限公司 Entity sorting method, device, medium and electronic equipment
CN112487195B (en) * 2019-09-12 2023-06-27 医渡云(北京)技术有限公司 Entity ordering method, entity ordering device, entity ordering medium and electronic equipment
CN112949306A (en) * 2019-12-10 2021-06-11 医渡云(北京)技术有限公司 Named entity recognition model creation method, device, equipment and readable storage medium
CN112949306B (en) * 2019-12-10 2024-04-30 医渡云(北京)技术有限公司 Named entity recognition model creation method, device, equipment and readable storage medium
CN111090338A (en) * 2019-12-11 2020-05-01 心医国际数字医疗***(大连)有限公司 Training method of HMM (hidden Markov model) input method model of medical document, input method model and input method
CN111090338B (en) * 2019-12-11 2021-08-27 心医国际数字医疗***(大连)有限公司 Training method of HMM (hidden Markov model) input method model of medical document, input method model and input method
CN111462913A (en) * 2020-03-11 2020-07-28 云知声智能科技股份有限公司 Automatic segmentation method and device for disease diagnosis in case document
CN111462913B (en) * 2020-03-11 2023-08-15 云知声智能科技股份有限公司 Automatic segmentation method and device for disease diagnosis in case document
CN111814447A (en) * 2020-06-24 2020-10-23 平安科技(深圳)有限公司 Electronic case duplicate checking method and device based on word segmentation text and computer equipment
CN111814447B (en) * 2020-06-24 2022-05-27 平安科技(深圳)有限公司 Electronic case duplicate checking method and device based on word segmentation text and computer equipment

Also Published As

Publication number Publication date
CN106407183B (en) 2019-06-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant