CN106407183B - Method and device for generating a medical named entity recognition system - Google Patents

Method and device for generating a medical named entity recognition system

Info

Publication number
CN106407183B
Authority
CN
China
Prior art keywords
medical treatment
name entity
treatment name
medical
recognition system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610864046.7A
Other languages
Chinese (zh)
Other versions
CN106407183A (en)
Inventor
陈成
康波
稽可睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Medical Cross Cloud (beijing) Technology Co Ltd
Original Assignee
Medical Cross Cloud (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Medical Cross Cloud (beijing) Technology Co Ltd
Priority to CN201610864046.7A
Publication of CN106407183A
Application granted
Publication of CN106407183B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F19/326

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method and a device for generating a medical named entity recognition system. The method includes: receiving multiple medical text samples through a medical named entity recognition system and obtaining multiple candidate medical named entities from the samples; annotating the candidate medical named entities to obtain multiple recommended medical named entities; calculating the ratio of the number of recommended medical named entities to the number of candidate medical named entities, and judging whether the ratio is less than a first preset value; if the ratio is less than the first preset value, inputting the recommended medical named entities into the medical named entity recognition system, obtaining multiple candidate medical named entities from the medical text samples accordingly, and returning to the annotation step; and if the ratio is not less than the first preset value, taking the current medical named entity recognition system as the target medical named entity recognition system.

Description

Method and device for generating a medical named entity recognition system
Technical field
The present disclosure relates to the technical field of medical big data, and in particular to a method and a device for generating a medical named entity recognition system.
Background
Medical care generates a large amount of data, mainly patients' medical records, doctor's orders, nursing documents, examination findings and examination conclusions. These data reflect the patient's basic information, clinical diagnosis, course of treatment and outcome. As medical information systems are established and improved, more and more medical data is entered electronically rather than recorded by hand. At present, clinical information such as medical records, doctor's orders, nursing documents and examination reports is mainly written by medical staff in natural language, so its information structure is complex. How to process, analyze and mine this large volume of unstructured data is therefore a major issue in building medical information systems, and medical named entity recognition is an essential part of it.
In the prior art, named entities are generally recognized in one of three ways: dictionary-based methods, methods based on heuristic rules, and methods based on machine learning. The first two depend heavily on dictionaries or rules, and for Chinese the available resources are relatively scarce. In addition, in massive medical natural language text, different medical staff write differently, so the same medical named entity often has several written forms. Methods based on machine learning are usually supervised and need a large amount of manual annotation to achieve a reasonable effect. How to quickly mine meaningful medical named entities from a large amount of natural language text is therefore a technical problem to be solved urgently.
The above information is provided only to aid understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the disclosure is to provide a method and a device for generating a medical named entity recognition system, thereby overcoming, at least to some extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the disclosure, a method for generating a medical named entity recognition system is provided, comprising:
receiving multiple medical text samples through a medical named entity recognition system, and obtaining multiple candidate medical named entities from the multiple medical text samples using machine learning;
annotating the multiple candidate medical named entities to obtain multiple recommended medical named entities;
calculating the ratio of the number of recommended medical named entities to the number of candidate medical named entities, and judging whether the ratio is less than a first preset value;
when the ratio is judged to be less than the first preset value, inputting the recommended medical named entities into the medical named entity recognition system, obtaining multiple candidate medical named entities from the multiple medical text samples accordingly, and returning to the step of annotating the multiple candidate medical named entities;
when the ratio is judged to be not less than the first preset value, taking the current medical named entity recognition system as the target medical named entity recognition system.
In an exemplary embodiment of the disclosure, obtaining multiple candidate medical named entities from the multiple medical text samples using machine learning includes:
calculating a weight for each named entity in the multiple medical text samples;
selecting the multiple named entities with the highest weights as the candidate medical named entities.
In an exemplary embodiment of the disclosure, calculating the weight of each named entity in the multiple medical text samples includes:
in a Spark environment, calculating the weight of each named entity in the multiple medical text samples using the N-Gram algorithm and the tf-idf algorithm.
In an exemplary embodiment of the disclosure, inputting the multiple recommended medical named entities into the medical named entity recognition system and obtaining multiple candidate medical named entities from the multiple medical text samples accordingly includes:
obtaining, from the multiple medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities, as supplementary medical named entities;
increasing the weights of the supplementary medical named entities in the multiple medical text samples;
selecting the multiple named entities with the highest weights as the candidate medical named entities.
In an exemplary embodiment of the disclosure, obtaining, from the multiple medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities as supplementary medical named entities includes:
segmenting the multiple medical text samples according to a preset model to obtain multiple segmentation units;
obtaining the contextual features of the multiple recommended medical named entities, and representing the contextual feature of each recommended medical named entity as a first vector;
obtaining the contextual features of the multiple segmentation units, and representing the contextual feature of each segmentation unit as a second vector;
calculating the similarity between the first vector and the second vector, and judging whether the similarity is less than a second preset value;
selecting the second vectors whose similarity to the first vector is not less than the second preset value, and taking the segmentation units whose contextual features those second vectors represent as the candidate medical named entities.
In an exemplary embodiment of the disclosure, the preset model is a hidden Markov model.
In an exemplary embodiment of the disclosure, word2vec is used to represent the contextual feature of each recommended medical named entity as a first vector and the contextual feature of each segmentation unit as a second vector.
In an exemplary embodiment of the disclosure, the first preset value is 85% to 90%.
In an exemplary embodiment of the disclosure, while the multiple candidate medical named entities are annotated, the annotated recommended medical named entities are also classified;
while the candidate medical named entities are obtained from the multiple medical text samples, a category is recommended for each candidate medical named entity according to the category of the recommended medical named entity that is similar to it.
According to one aspect of the disclosure, a device for generating a medical named entity recognition system is provided, comprising:
a cold-start unit, configured to receive multiple medical text samples through a medical named entity recognition system and to obtain multiple candidate medical named entities from the multiple medical text samples using machine learning;
an annotation unit, configured to annotate the multiple candidate medical named entities to obtain multiple recommended medical named entities;
an evaluation unit, configured to calculate the ratio of the number of recommended medical named entities to the number of candidate medical named entities, and to judge whether the ratio is less than a first preset value;
a feedback unit, configured to, when the ratio is judged to be less than the first preset value, input the recommended medical named entities into the medical named entity recognition system, obtain multiple candidate medical named entities from the multiple medical text samples accordingly, and feed them back to the annotation unit;
an output unit, configured to, when the ratio is judged to be not less than the first preset value, take the current medical named entity recognition system as the target medical named entity recognition system.
In the method and device of the disclosure, a large number of natural-language medical text samples are input into the medical named entity recognition system, and machine learning is used to obtain multiple candidate medical named entities. The candidate medical named entities are then annotated to obtain multiple recommended medical named entities. Next, the ratio of the number of recommended medical named entities to the number of candidate medical named entities is calculated and compared with the first preset value. When the ratio is not less than the first preset value, the performance of the medical named entity recognition system already meets the requirement, and the system can be output directly as the target medical named entity recognition system. When the ratio is less than the first preset value, the performance of the system does not yet meet the requirement; the recommended medical named entities can then be input into the system, multiple candidate medical named entities can be obtained from the medical text samples according to the recommended medical named entities and annotated again, and more recommended medical named entities are obtained. This iteration repeats until the ratio is not less than the first preset value, that is, until the performance of the system meets the requirement, at which point the system can be output as the target medical named entity recognition system.
The above process combines machine learning with manual annotation, that is, unsupervised and supervised algorithms, to quickly generate a medical named entity recognition system whose performance meets the requirement. Medical named entities can therefore be produced quickly at minimal manual annotation cost, while a good recognition rate can still be achieved on massive data sets.
Brief description of the drawings
The above and other features and advantages of the disclosure will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings.
Fig. 1 is a flowchart of the method for generating a medical named entity recognition system according to an embodiment of the disclosure;
Fig. 2 is a flowchart of obtaining multiple candidate medical named entities from the multiple medical text samples using machine learning, in the method according to an embodiment of the disclosure;
Fig. 3 is a flowchart of inputting the multiple recommended medical named entities into the medical named entity recognition system and obtaining multiple candidate medical named entities from the multiple medical text samples accordingly, in the method according to an embodiment of the disclosure;
Fig. 4 is a functional block diagram of the device for generating a medical named entity recognition system according to an embodiment of the disclosure.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the disclosure may be practiced with one or more of these specific details omitted, or with other methods, components, devices, steps and so on. In other cases, well-known solutions are not shown or described in detail, so that they do not distract from and obscure aspects of the disclosure.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. Identical reference numerals in the drawings denote identical or similar parts, so repeated descriptions of them are omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a method for generating a medical named entity recognition system. As shown in Fig. 1, the method may include the following steps:
Step S11: receive multiple medical text samples through a medical named entity recognition system, and obtain multiple candidate medical named entities from the multiple medical text samples using machine learning. For example, a large number of medical text samples can be input into the medical named entity recognition system. The samples contain a large number of medical named entities and non-medical named entities, and multiple medical named entities are screened out of the samples by machine learning as the candidate medical named entities.
Step S12: annotate the multiple candidate medical named entities to obtain multiple recommended medical named entities. In this example embodiment, annotating a candidate medical named entity means marking whether it is a real medical named entity; a candidate that is a real medical named entity is taken as a recommended medical named entity.
Step S13: calculate the ratio of the number of recommended medical named entities to the number of candidate medical named entities, and judge whether the ratio is less than a first preset value. The ratio is the proportion of recommended medical named entities among the candidate medical named entities, and the first preset value can be regarded as a threshold on that ratio. The higher the first preset value, the larger the proportion of recommended medical named entities among the candidates must be, and correspondingly the higher the recognition rate of the finally obtained medical named entity recognition system. For example, in this example embodiment the first preset value is 85% to 90%, specifically 86% or 88%, but it is not limited to this range; the first preset value may also be lower than 85% or higher than 90%.
Step S14: when the ratio is judged to be less than the first preset value, the proportion of recommended medical named entities among the candidate medical named entities has not reached the preset level, that is, the recognition rate of the medical named entity recognition system is too low. The recommended medical named entities can then be input into the medical named entity recognition system, multiple candidate medical named entities can be obtained from the multiple medical text samples accordingly, and the method returns to the step of annotating the candidate medical named entities. This loop repeats, so that the recognition rate of the system keeps improving, until the ratio is not less than the first preset value, which leads to step S15.
Step S15: when the ratio is judged to be not less than the first preset value, the recognition rate of the medical named entity recognition system meets the requirement, and the current medical named entity recognition system can be taken as the target medical named entity recognition system.
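The loop formed by steps S11 to S15 can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `generate_recognizer`, the callables `extract_candidates` and `annotate`, and the `max_rounds` safety cap are hypothetical and do not appear in the disclosure, which leaves the extraction and annotation mechanisms to the embodiments described here.

```python
from typing import Callable, List, Set


def generate_recognizer(
    samples: List[str],
    extract_candidates: Callable[[List[str], Set[str]], List[str]],
    annotate: Callable[[List[str]], Set[str]],
    first_preset_value: float = 0.85,
    max_rounds: int = 20,  # hypothetical safety cap, not part of the disclosure
) -> Set[str]:
    """Iterate steps S11 to S15 and return all recommended entities found."""
    recommended: Set[str] = set()
    for _ in range(max_rounds):
        # S11 (first round) / S14 (later rounds): obtain candidates,
        # guided by the entities recommended so far.
        candidates = extract_candidates(samples, recommended)
        if not candidates:
            break
        # S12: annotation keeps only candidates that are real medical entities.
        confirmed = annotate(candidates)
        recommended |= confirmed
        # S13: ratio of recommended to candidate entities in this round.
        ratio = len(confirmed) / len(candidates)
        if ratio >= first_preset_value:
            break  # S15: the current system is taken as the target system.
    return recommended
```

In practice `annotate` would stand for manual labelling, and `extract_candidates` for the weight-based selection and context-similarity feedback described in the following paragraphs.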
Further, referring to Fig. 2, in this example embodiment, obtaining multiple candidate medical named entities from the multiple medical text samples using machine learning in step S11 may include:
Step S111: calculate a weight for each named entity in the multiple medical text samples; the weight may specifically be the word frequency of each named entity; and
Step S112: select multiple named entities as the candidate medical named entities, where the weight of each selected named entity is higher than the weight of every unselected named entity. Multiple named entities with higher weights are thus selected, and a named entity with a higher weight has a higher probability of being a medical named entity. In this example embodiment, the candidate medical named entities can be selected in the following ways:
For example, the named entities can be sorted by weight, and the multiple named entities with the larger weights selected as the candidate medical named entities. As another example, a predetermined weight can be set in advance, the weight of each named entity compared with it, and the named entities whose weights are not less than the predetermined weight selected as the candidate medical named entities.
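The two selection strategies just described (ranking by weight, or comparing against a predetermined weight) can be sketched as below. The function names and the dictionary representation of weights are assumptions made for illustration only.

```python
from typing import Dict, List


def select_top_k(weights: Dict[str, float], k: int) -> List[str]:
    """Sort entities by weight (descending) and keep the k heaviest."""
    ranked = sorted(weights, key=lambda name: weights[name], reverse=True)
    return ranked[:k]


def select_by_threshold(weights: Dict[str, float], preset_weight: float) -> List[str]:
    """Keep every entity whose weight is not less than the preset weight."""
    return [name for name, w in weights.items() if w >= preset_weight]
```

Ranking fixes the number of candidates sent to annotation in each round, while the threshold variant lets that number vary with the data.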
In addition, in this example embodiment, calculating the weight of each named entity in the multiple medical text samples may include:
In a Spark environment, calculating the weight of each named entity in the multiple medical text samples using the N-Gram model and the tf-idf algorithm. In this process the window value may be less than 6, that is, the length of a named entity is within 5 characters. Those skilled in the art will readily understand, however, that in other exemplary embodiments of the disclosure, depending on the computing environment and the requirements, the weights may also be calculated in other ways, or the candidate medical named entities may be obtained by other machine learning approaches; these also fall within the protection scope of the disclosure.
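A single-machine sketch of the weighting idea is given below. It is not the Spark implementation the embodiment refers to: it uses plain Python, treats every character n-gram of length 1 to 5 as a possible named entity, and sums a per-document tf-idf score. The function names, the summation over documents, and the `+ 1.0` smoothing of the idf term are all assumptions, since the disclosure does not specify the exact formula.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, max_len: int = 5) -> List[str]:
    """All contiguous character n-grams of length 1..max_len (window value < 6)."""
    return [text[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(text) - n + 1)]


def tfidf_weights(samples: List[str], max_len: int = 5) -> Dict[str, float]:
    """Sum of per-document tf * idf for every n-gram in the corpus."""
    per_doc = [Counter(char_ngrams(s, max_len)) for s in samples]
    # Document frequency: in how many samples each n-gram occurs.
    doc_freq = Counter(g for counts in per_doc for g in counts)
    n_docs = len(samples)
    weights: Dict[str, float] = {}
    for counts in per_doc:
        total = sum(counts.values())
        for gram, count in counts.items():
            idf = math.log(n_docs / doc_freq[gram]) + 1.0  # smoothing is an assumption
            weights[gram] = weights.get(gram, 0.0) + (count / total) * idf
    return weights
```

The resulting dictionary can be passed to whichever selection rule is used to pick the candidate medical named entities.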
Further, referring to Fig. 3, in this example embodiment, inputting the multiple recommended medical named entities into the medical named entity recognition system and obtaining multiple candidate medical named entities from the multiple medical text samples accordingly in step S14 may include the following steps:
Step S141: obtain, from the multiple medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities, as supplementary medical named entities. For example, for each named entity in the medical text samples other than the recommended medical named entities, its contextual features can be compared with those of the recommended medical named entities, and the named entities whose contextual features are similar are taken as supplementary medical named entities. Because the contextual features of a supplementary medical named entity are similar to those of a recommended medical named entity, it can be inferred that the two are similar, and hence that the supplementary medical named entity may be a real medical named entity.
Step S142: increase the weights of the supplementary medical named entities in the multiple medical text samples, after which the candidate medical named entities can be obtained again. Because the weights of the supplementary medical named entities have been increased, they are more likely to be selected as candidate medical named entities.
Step S143: select, as the candidate medical named entities, multiple named entities whose weights are higher than those of the other named entities. The candidate medical named entities now include the supplementary medical named entities, so the next annotation round may yield more recommended medical named entities from them.
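Steps S142 and S143 can be illustrated with the short sketch below. The disclosure only states that the weights of supplementary entities are increased; the multiplicative `boost` factor, the function name, and the top-k re-selection are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List


def boost_and_reselect(
    weights: Dict[str, float],
    supplementary: Iterable[str],
    boost: float,
    k: int,
) -> List[str]:
    """S142: raise supplementary weights; S143: re-select the k heaviest entities."""
    boosted = dict(weights)  # copy, so the caller's weights stay untouched
    for name in supplementary:
        if name in boosted:
            boosted[name] *= boost  # multiplicative boost is an assumption
    return sorted(boosted, key=lambda name: boosted[name], reverse=True)[:k]
```

With a boost factor greater than 1, an entity that previously fell just outside the selection can displace a frequent but meaningless token in the next round.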
Further, in this example embodiment, obtaining, from the multiple medical text samples, named entities whose contextual features are similar to those of the recommended medical named entities as supplementary medical named entities may include the following steps:
Segment the multiple medical text samples according to a preset model to obtain multiple segmentation units; a segmentation unit may be an entity word obtained after segmentation. In this example embodiment, the preset model may be a hidden Markov model, a maximum entropy model, a conditional random field model or the like, and no particular limitation is placed on it.
Obtain the contextual features of the multiple recommended medical named entities, and represent the contextual feature of each recommended medical named entity as a first vector. Vectorizing the contextual features of the recommended medical named entities makes quantitative comparison easier. For example, the word2vec tool can be used for this in this example embodiment, but the disclosure is not limited to it.
Obtain the contextual features of the multiple segmentation units, and represent the contextual feature of each segmentation unit as a second vector. Vectorizing each segmentation unit likewise makes quantitative comparison easier. For example, the word2vec tool can be used for this in this example embodiment, but the disclosure is not limited to it.
Calculate the similarity between the first vector and the second vector, and judge whether the similarity is less than a second preset value. The second preset value can be set by the user: the larger the second preset value, the higher the similarity between the first vector and the second vector that is required, and the smaller the second preset value, the lower the required similarity.
Select the second vectors whose similarity to the first vector is not less than the second preset value, and take the segmentation units whose contextual features those second vectors represent as the candidate medical named entities. Comparing the vectors in this way gives the similarity between a segmentation unit and a recommended medical named entity.
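The vector comparison above can be sketched as follows. Two substitutions are made for the sake of a self-contained example, and both are assumptions: contextual features are represented as simple neighbour-count vectors instead of word2vec embeddings, and cosine similarity is used as the similarity measure, which the disclosure does not name. The input is assumed to be already segmented into tokens; all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def context_vector(tokens: List[str], target: str, window: int = 2) -> Counter:
    """Count the tokens that appear within `window` positions of `target`."""
    context: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            context.update(neighbours)
    return context


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similar_units(tokens: List[str], recommended: List[str],
                  second_preset_value: float) -> List[str]:
    """Segmentation units whose context is similar enough to a recommended entity."""
    first_vectors = [context_vector(tokens, r) for r in recommended]
    result = []
    for unit in sorted(set(tokens) - set(recommended)):
        second_vector = context_vector(tokens, unit)
        if any(cosine(second_vector, fv) >= second_preset_value for fv in first_vectors):
            result.append(unit)
    return result
```

For instance, in the token sequence "patient has fever today . patient has cough today ." the unit "cough" shares exactly the same neighbours as the recommended entity "fever", so it passes a second preset value of 0.9 while the function words do not.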
Further, in the method for generating a medical named entity recognition system, while the multiple candidate medical named entities are annotated, they can also be classified. For example, leukemia is classified as a disease and fever as a symptom; if a candidate medical named entity is a meaningless word, its category can be a meaningless class, and so on.
While multiple candidate medical named entities are obtained from the multiple medical text samples, categories can be recommended for them according to the categories of the recommended medical named entities (that is, the annotated medical named entities) that are similar to them. The candidate medical named entities are thereby matched with different categories, so that the medical named entities can also be classified conveniently while the medical named entity recognition system is being generated.
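The category recommendation can be illustrated with a nearest-neighbour lookup, sketched below. The function name, the dictionary of annotated entities, and the pluggable `similarity` callable are assumptions; in the embodiment the similarity would come from comparing contextual feature vectors.

```python
from typing import Callable, Dict, Optional


def recommend_category(
    candidate: str,
    labelled: Dict[str, str],
    similarity: Callable[[str, str], float],
) -> Optional[str]:
    """Return the category of the annotated entity most similar to `candidate`."""
    if not labelled:
        return None  # nothing annotated yet, so no category can be recommended
    nearest = max(labelled, key=lambda entity: similarity(candidate, entity))
    return labelled[nearest]
```

The recommended category is only a suggestion for the annotator, who can confirm or correct it in the next annotation round.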
In conclusion the medical treatment name entity recognition system generation method of the embodiment of the present disclosure, it can be to the medical treatment name Entity recognition system inputs the largely medical samples of text based on natural language, obtains multiple candidate medical treatment lives by machine learning Name entity;Then the multiple candidate medical treatment name entity is labeled, obtains the multiple recommendation medical treatment name entity;With Afterwards, the ratio of number can be calculated and be compared it with first preset value, when the ratio of number is not less than described the When one preset value, illustrate that the quantity for recommending medical treatment name entity reaches requirement, at this point, can be directly real by the medical treatment name Body identifying system is exported as target medical treatment name entity recognition system;It is preset when the ratio of number is less than described first When value, then illustrate that the quantity for recommending medical treatment name entity not up to requires, it can be by the multiple recommendation medical treatment name entity It is input to the medical treatment name entity recognition system and names entity from the multiple medical treatment text according to the multiple recommendation medical treatment Multiple candidate medical treatment name entities are obtained in this sample and are labeled again, more described recommendation medical treatment names are obtained Entity, and so on iteration, until that is, described recommendation medical treatment name is real when the ratio of number is not less than first preset value When the quantity of body not up to requires, entity recognition system can be named using the medical treatment name entity recognition system as target medical treatment It is exported.
In the above process, medical treatment name entities whose quantity meets the requirement, that is, meaningful medical treatment name entities, can be mined automatically from the medical text samples. This reduces manual annotation and labor cost, and the process can iterate continuously with less manual operation. Meaningful medical treatment name entities can thus be mined quickly from a large volume of natural-language text and output.
According to another aspect of the embodiments of the present disclosure, a medical treatment name entity recognition system generating device is provided. Referring to Fig. 4, the medical treatment name entity recognition system generating device includes a cold-start unit 10, an annotation unit 20, an assessment unit 30, a feedback unit 40 and an output unit 50, wherein:
The cold-start unit 10 can be used to receive multiple medical text samples through a medical treatment name entity recognition system, and to obtain multiple candidate medical treatment name entities from the multiple medical text samples using machine learning.
The annotation unit 20 can be used to annotate the multiple candidate medical treatment name entities to obtain multiple recommended medical treatment name entities.
The assessment unit 30 can be used to calculate the quantity ratio of the recommended medical treatment name entities to the candidate medical treatment name entities, and to judge whether the quantity ratio is less than the first preset value.
The feedback unit 40 can be used, when the quantity ratio is judged to be less than the first preset value, to input the recommended medical treatment name entities to the medical treatment name entity recognition system, to obtain multiple candidate medical treatment name entities from the multiple medical text samples accordingly, and to feed them back to the annotation unit.
The output unit 50 can be used, when the quantity ratio is judged to be not less than the first preset value, to take the current medical treatment name entity recognition system as the target medical treatment name entity recognition system.
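As an illustration of what the cold-start unit might compute, the following sketch weights character n-grams by tf-idf and keeps the highest-weighted ones as candidates, in the spirit of claim 3. It is a single-machine approximation in plain Python rather than a Spark job, and the function names, the n-gram length range, the smoothed idf formula and the choice to sum tf-idf over samples are all assumptions made here for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> List[str]:
    """All contiguous character n-grams of length n_min..n_max."""
    return [
        text[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def ngram_tfidf_weights(samples: List[str]) -> Dict[str, float]:
    """Weight each n-gram by its tf-idf summed over all samples."""
    per_doc = [Counter(char_ngrams(s)) for s in samples]
    # Document frequency: in how many samples each n-gram appears.
    doc_freq = Counter(g for counts in per_doc for g in counts)
    n_docs = len(samples)
    weights: Dict[str, float] = {}
    for counts in per_doc:
        total = sum(counts.values())
        for gram, count in counts.items():
            tf = count / total
            # Smoothed idf (an assumed variant), always positive.
            idf = math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0
            weights[gram] = weights.get(gram, 0.0) + tf * idf
    return weights


def top_candidates(samples: List[str], k: int) -> List[str]:
    """The k highest-weighted n-grams, ties broken alphabetically."""
    weights = ngram_tfidf_weights(samples)
    return sorted(weights, key=lambda g: (-weights[g], g))[:k]
```

Character-level n-grams are used because Chinese clinical text has no whitespace word boundaries; a word-level variant would only need a different `char_ngrams` function.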
It should be noted that the specific implementation details and beneficial effects of the medical treatment name entity recognition system generating device described above have already been described in detail for the corresponding medical treatment name entity recognition system generation method, and are therefore not repeated here.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The description and examples are to be regarded as illustrative only, and the true scope and spirit of the present disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A medical treatment name entity recognition system generation method, characterized by comprising:
receiving multiple medical text samples through a medical treatment name entity recognition system, and obtaining multiple candidate medical treatment name entities from the multiple medical text samples using machine learning;
annotating the multiple candidate medical treatment name entities to obtain multiple recommended medical treatment name entities;
calculating the quantity ratio of the recommended medical treatment name entities to the candidate medical treatment name entities, and judging whether the quantity ratio is less than a first preset value;
when the quantity ratio is judged to be less than the first preset value, inputting the recommended medical treatment name entities to the medical treatment name entity recognition system, obtaining multiple candidate medical treatment name entities from the multiple medical text samples accordingly, and returning to the step of annotating the multiple candidate medical treatment name entities;
when the quantity ratio is judged to be not less than the first preset value, taking the current medical treatment name entity recognition system as the target medical treatment name entity recognition system.
2. The medical treatment name entity recognition system generation method according to claim 1, characterized in that obtaining multiple candidate medical treatment name entities from the multiple medical text samples using machine learning comprises:
calculating the weight value of each name entity in the multiple medical text samples;
selecting the multiple name entities with the highest weight values as the candidate medical treatment name entities.
3. The medical treatment name entity recognition system generation method according to claim 2, characterized in that calculating the weight value of each name entity in the multiple medical text samples comprises:
in a Spark environment, calculating the weight value of each name entity in the multiple medical text samples by the N-Gram algorithm and the tf-idf algorithm.
4. The medical treatment name entity recognition system generation method according to claim 2, characterized in that inputting the multiple recommended medical treatment name entities to the medical treatment name entity recognition system and obtaining multiple candidate medical treatment name entities from the multiple medical text samples accordingly comprises:
obtaining, from the multiple medical text samples, name entities whose contextual features are similar to those of the recommended medical treatment name entities, as supplementary medical treatment name entities;
increasing the weight values of the supplementary medical treatment name entities in the multiple medical text samples;
selecting the multiple name entities with the highest weight values as the candidate medical treatment name entities.
5. The medical treatment name entity recognition system generation method according to claim 4, characterized in that obtaining, from the multiple medical text samples, name entities whose contextual features are similar to those of the recommended medical treatment name entities as supplementary medical treatment name entities comprises:
segmenting the multiple medical text samples according to a preset model to obtain multiple segmentation units;
obtaining the contextual features of the multiple recommended medical treatment name entities, and expressing the contextual feature of each recommended medical treatment name entity as a first vector;
obtaining the contextual features of the multiple segmentation units, and expressing the contextual feature of each segmentation unit as a second vector;
calculating the similarity between the first vector and the second vector, and judging whether the similarity is less than a second preset value;
selecting the second vectors whose similarity to the first vector is not less than the second preset value, and taking the segmentation units corresponding to the contextual features represented by those second vectors as the candidate medical treatment name entities.
6. The medical treatment name entity recognition system generation method according to claim 5, characterized in that the preset model is a Hidden Markov Model.
7. The medical treatment name entity recognition system generation method according to claim 5, characterized in that the contextual feature of each recommended medical treatment name entity is expressed as the first vector, and the contextual feature of each segmentation unit is expressed as the second vector, by word2vec.
8. The medical treatment name entity recognition system generation method according to any one of claims 1-6, characterized in that the first preset value is 85%-90%.
9. The medical treatment name entity recognition system generation method according to any one of claims 1-6, characterized in that, while the multiple candidate medical treatment name entities are annotated, the annotated recommended medical treatment name entities are classified;
and while the candidate medical treatment name entities are obtained from the multiple medical text samples, categories are recommended for each candidate medical treatment name entity according to the categories of the recommended medical treatment name entities similar to that candidate medical treatment name entity.
10. A medical treatment name entity recognition system generating device, characterized by comprising:
a cold-start unit, for receiving multiple medical text samples through a medical treatment name entity recognition system, and obtaining multiple candidate medical treatment name entities from the multiple medical text samples using machine learning;
an annotation unit, for annotating the multiple candidate medical treatment name entities to obtain multiple recommended medical treatment name entities;
an assessment unit, for calculating the quantity ratio of the recommended medical treatment name entities to the candidate medical treatment name entities, and judging whether the quantity ratio is less than a first preset value;
a feedback unit, for inputting the recommended medical treatment name entities to the medical treatment name entity recognition system when the quantity ratio is judged to be less than the first preset value, obtaining multiple candidate medical treatment name entities from the multiple medical text samples accordingly, and feeding them back to the annotation unit;
an output unit, for taking the current medical treatment name entity recognition system as the target medical treatment name entity recognition system when the quantity ratio is judged to be not less than the first preset value.
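The feedback step of claims 4 and 5 (compare context vectors, then raise the weights of similar units) can be illustrated with the following sketch. It is not the claimed implementation: bag-of-context-word count vectors stand in for word2vec embeddings, the input is assumed to be already segmented into tokens rather than segmented by a Hidden Markov Model, and the names `context_vector`, `cosine`, `boost_similar`, the window size, the threshold of 0.5 and the boost factor are hypothetical values chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def context_vector(
    segmented: Sequence[Sequence[str]], target: str, window: int = 2
) -> Counter:
    """Count the tokens occurring within `window` positions of `target`."""
    vec: Counter = Counter()
    for tokens in segmented:
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), i + window + 1
                vec.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boost_similar(
    segmented: Sequence[Sequence[str]],
    recommended: Sequence[str],
    weights: Dict[str, float],
    second_preset_value: float = 0.5,  # assumed threshold
    boost: float = 2.0,  # assumed boost factor
) -> Dict[str, float]:
    """Return a copy of `weights` with supplementary entities boosted."""
    rec_vecs = [context_vector(segmented, r) for r in recommended]
    boosted = dict(weights)
    units = {t for tokens in segmented for t in tokens} - set(recommended)
    for unit in units:
        if unit not in boosted:
            continue
        unit_vec = context_vector(segmented, unit)
        # A unit is supplementary if its context resembles that of any
        # recommended entity closely enough.
        if any(cosine(unit_vec, rv) >= second_preset_value for rv in rec_vecs):
            boosted[unit] *= boost
    return boosted
```

After boosting, the highest-weighted units would be selected again as candidates, which is how entities that share contexts with already-accepted ones rise to the top in the next round.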
CN201610864046.7A 2016-09-28 2016-09-28 Medical treatment name entity recognition system generation method and device Active CN106407183B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610864046.7A CN106407183B (en) 2016-09-28 2016-09-28 Medical treatment name entity recognition system generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610864046.7A CN106407183B (en) 2016-09-28 2016-09-28 Medical treatment name entity recognition system generation method and device

Publications (2)

Publication Number Publication Date
CN106407183A CN106407183A (en) 2017-02-15
CN106407183B true CN106407183B (en) 2019-06-28

Family

ID=59229294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610864046.7A Active CN106407183B (en) 2016-09-28 2016-09-28 Medical treatment name entity recognition system generation method and device

Country Status (1)

Country Link
CN (1) CN106407183B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897559B (en) * 2017-02-24 2019-09-17 黑龙江特士信息技术有限公司 A kind of symptom and sign class entity recognition method and device towards multi-data source
CN106919793B (en) * 2017-02-24 2019-12-06 黑龙江特士信息技术有限公司 Data standardization processing method and device for medical big data
CN107168946A (en) * 2017-04-14 2017-09-15 北京化工大学 A kind of name entity recognition method of medical text data
CN107992511A (en) * 2017-10-18 2018-05-04 东软集团股份有限公司 Index establishing method, device, storage medium and the electronic equipment of medical data table
CN108763348B (en) * 2018-05-15 2022-05-03 南京邮电大学 Classification improvement method for feature vectors of extended short text words
CN112487195B (en) * 2019-09-12 2023-06-27 医渡云(北京)技术有限公司 Entity ordering method, entity ordering device, entity ordering medium and electronic equipment
CN112949306B (en) * 2019-12-10 2024-04-30 医渡云(北京)技术有限公司 Named entity recognition model creation method, device, equipment and readable storage medium
CN111090338B (en) * 2019-12-11 2021-08-27 心医国际数字医疗***(大连)有限公司 Training method of HMM (hidden Markov model) input method model of medical document, input method model and input method
CN111462913B (en) * 2020-03-11 2023-08-15 云知声智能科技股份有限公司 Automatic segmentation method and device for disease diagnosis in case document
CN111814447B (en) * 2020-06-24 2022-05-27 平安科技(深圳)有限公司 Electronic case duplicate checking method and device based on word segmentation text and computer equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101576910A (en) * 2009-05-31 2009-11-11 北京学之途网络科技有限公司 Method and device for identifying product naming entity automatically
CN102033950A (en) * 2010-12-23 2011-04-27 哈尔滨工业大学 Construction method and identification method of automatic electronic product named entity identification system
CN102314417A (en) * 2011-09-22 2012-01-11 西安电子科技大学 Method for identifying Web named entity based on statistical model
CN103136361A (en) * 2013-03-07 2013-06-05 陈一飞 Semi-supervised extracting method for protein interrelation in biological text
CN103268339A (en) * 2013-05-17 2013-08-28 中国科学院计算技术研究所 Recognition method and system of named entities in microblog messages
CN104615589A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Named-entity recognition model training method and named-entity recognition method and device
CN105938495A (en) * 2016-04-29 2016-09-14 乐视控股(北京)有限公司 Entity relationship recognition method and apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
搜索日志中命名实体识别;任育伟 等;《现代图书情报技术》;20150630;正文第4.1-4.2节
针对产品命名实体识别的半监督学习方法;黄诗琳 等;《北京邮电大学学报》;20130430;第36卷(第2期);正文第2-3节

Also Published As

Publication number Publication date
CN106407183A (en) 2017-02-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant