CN108922633A - Disease name standardization method and standardization system - Google Patents

Info

Publication number
CN108922633A
Authority
CN
China
Prior art keywords
disease
name
colloquial style
disease name
standard convention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810647287.5A
Other languages
Chinese (zh)
Inventor
华明
陈欣然
那日苏
秦其昌
范军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Haide Health Mdt Infotech Ltd
Original Assignee
Beijing Haide Health Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Haide Health Mdt Infotech Ltd
Priority to CN201810647287.5A
Publication of CN108922633A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a disease name standardization method and standardization system. The method includes the following steps: looking up the corresponding standardized disease name according to the disease name query element in a query request; calling a disease name standardization database and matching the disease name query element against the colloquial disease names pre-stored in that database; and, according to the colloquial disease name that matches the disease name query element, extracting the corresponding standardized disease name and sending it to a terminal. By means of the disease name standardization database, the method and system translate colloquial disease names into standardized disease names, which supports consistency services for medical information and enables efficient, fast and accurate queries.

Description

Disease name standardization method and standardization system
Technical field
The present invention relates to the technical field of medical information, and in particular to a disease name standardization method and standardization system.
Background
At present, because medical staff carry heavy clinical workloads, they cannot label or code diagnostic results with standardized disease names. Moreover, owing to personal habits, different medical staff describe the same disease differently and may even abbreviate diagnostic results. As a result, the name of the same disease often differs between medical institutions or between individual medical staff, which burdens the institutions and personnel who need to use the diagnostic results. Requiring medical staff to standardize disease names themselves would in turn burden the medical staff.
The International Classification of Diseases (ICD) is a system that classifies diseases by rule according to certain of their characteristics and represents them with codes. However, the ICD is a mapping table between standardized disease names and disease codes; it does not cover colloquial disease names.
With the rapid development of information technology, Internet data has surged, and the real-time storage and processing of massive numbers of small Internet files has become a problem faced by more and more Internet applications. Compared with large files, real-time access to massive numbers of small files puts enormous pressure on the file system. Traditional file systems struggle to access large volumes of small files quickly, which seriously degrades the real-time behavior of Internet applications. In-memory database technology, with its advantage in data processing speed, offers a new approach to the real-time storage and processing of massive numbers of small Internet files. The limitations of relational databases severely restrict their performance: concurrency is low, and it is difficult to meet the growing needs of the public. Therefore, improving the real-time response speed of vector data services and meeting their requirements for high concurrency and high throughput is a key problem that urgently needs to be solved.
The Internet has produced a vast ocean of information. Information entities come from many sources, are updated quickly, and vary in structure and form, and these characteristics pose varied and complex problems for search techniques based on web information mining. To find information for users that is both as comprehensive and as valuable as possible, general-purpose search engines have each taken their own approach, which can be summarized as optimizing web page de-noising, full-text relevance matching, and page importance ranking. Related results include algorithms such as PageRank and HITS. However, as user information overload becomes more serious, users increasingly tend to visit only the pages that truly interest them. General-purpose search engines such as Google emphasize search for all users; their search sources are very broad and their topics diverse, so search results inevitably suffer from problems such as low topical relevance and cluttered results.
The medical data received in real time is massive in volume. How to obtain the matching data efficiently, quickly and accurately is currently the most critical problem to be solved in medical data matching.
Therefore, a disease name standardization method and standardization system are needed.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a disease name standardization method and standardization system that overcome the above problems or at least partially solve them. The invention can translate the colloquial name of a disease into a standardized disease name, with fast real-time response, high concurrency and high throughput.
According to one aspect of the present invention, a disease name standardization method is provided, including the following steps:
looking up the corresponding standardized disease name according to the disease name query element in a query request;
calling a disease name standardization database, and matching the disease name query element against the colloquial disease names pre-stored in the disease name standardization database;
according to the colloquial disease name that matches the disease name query element, extracting the corresponding standardized disease name and sending it to a terminal.
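The three steps above can be sketched as a plain table lookup. This is only a minimal illustration, not the patented implementation: the `COLLOQUIAL_TO_STANDARD` table, its sample entries and the `standardize` function are hypothetical, and the trimming and lower-casing of the query is an added assumption.

```python
# Hypothetical sample data: colloquial disease name -> standardized disease name.
COLLOQUIAL_TO_STANDARD = {
    "heart attack": "acute myocardial infarction",
    "high blood pressure": "essential hypertension",
    "kidney stone": "renal calculus",
}


def standardize(query_element, table=COLLOQUIAL_TO_STANDARD):
    """Return the standardized name for a query element, or None if unmatched."""
    # Assumed normalization: ignore surrounding whitespace and letter case.
    key = query_element.strip().lower()
    # Match against the pre-stored colloquial names and extract the result.
    return table.get(key)


if __name__ == "__main__":
    print(standardize(" Heart Attack "))
    print(standardize("flu"))
```

In a deployed system the returned name would be sent back to the requesting terminal; here it is simply returned to the caller.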
The disease name standardization database includes a single-disease tag master library, which includes multiple single-disease sub-library tag names, and multiple single-disease sub-libraries, each linked to one of the single-disease sub-library tag names. Each single-disease sub-library includes a colloquial disease name storage area, a standardized disease name storage area, and a disease name standardization translation area. The colloquial disease name storage area stores multiple colloquial disease names, the standardized disease name storage area stores one standardized disease name, and each colloquial disease name is associated with the standardized disease name in the disease name standardization translation area. The disease name query element is matched against the colloquial disease names pre-stored in the colloquial disease name storage area. When a colloquial disease name matching the disease name query element exists, the corresponding standardized disease name is extracted in the disease name standardization translation area according to that colloquial disease name and sent to the terminal. When no colloquial disease name matching the disease name query element exists, a synonym of the disease name query element is looked up on a synonym searcher, the synonym is matched against the colloquial disease names pre-stored in the colloquial disease name storage area, and the corresponding standardized disease name is extracted in the disease name standardization translation area according to the matching colloquial disease name and sent to the terminal.
The above disease name standardization method further includes:
performing maximal word segmentation on the disease name query element, associating synonyms with the resulting segments, generating a synonym family from the associated synonyms, and matching each element of the synonym family against the colloquial disease names pre-stored in the colloquial disease name storage area.
The above disease name standardization method further includes:
removing stop words from the disease name query element.
The above disease name standardization method further includes:
receiving the disease name to be queried and storing it together with the corresponding standardized name, to provide a reference for the next query.
According to another aspect of the present invention, a disease name standardization system is provided, including:
a disease name query module, configured to look up the corresponding standardized disease name according to the disease name query element in a query request; a disease name matching module, configured to call the disease name standardization database and match the disease name query element against the colloquial disease names pre-stored in the disease name standardization database; and a disease name extraction module, configured to extract, according to the colloquial disease name that matches the disease name query element, the corresponding standardized disease name and send it to a terminal.
The above disease name standardization system further includes: a synonym searcher, configured to look up a synonym of the disease name query element when no colloquial disease name matching the disease name query element exists.
The disease name standardization database includes a single-disease tag master library, which includes multiple single-disease sub-library tag names, and multiple single-disease sub-libraries, each linked to one of the single-disease sub-library tag names. Each single-disease sub-library includes a colloquial disease name storage area, a standardized disease name storage area, and a disease name standardization translation area. The colloquial disease name storage area stores multiple colloquial disease names, the standardized disease name storage area stores one standardized disease name, and each colloquial disease name is associated with the standardized disease name in the disease name standardization translation area.
The disease name matching module is further configured to match the disease name query element against the colloquial disease names pre-stored in the colloquial disease name storage area, and to match the synonym against the colloquial disease names pre-stored in the colloquial disease name storage area.
The disease name extraction module is further configured to, when a colloquial disease name matching the disease name query element exists, extract the corresponding standardized disease name in the disease name standardization translation area according to that colloquial disease name and send it to the terminal; and, in the synonym case, to extract the corresponding standardized disease name in the disease name standardization translation area according to the colloquial disease name matched by the synonym and send it to the terminal.
The disease name matching module is further configured to perform maximal word segmentation on the disease name query element, associate synonyms with the resulting segments, generate a synonym family from the associated synonyms, and match each element of the synonym family against the colloquial disease names pre-stored in the disease name standardization database.
The disease name query module is further configured to remove stop words from the disease name query element.
The above disease name standardization system further includes: a disease name cache module, configured to receive the disease name to be queried and store it together with the corresponding standardized name, to provide a reference for the next query.
Compared with the prior art, the present invention has the following advantages:
1. The disease name standardization method and system of the invention use the disease name standardization database to translate colloquial disease names into standardized disease names, which supports consistency services for medical information and enables efficient, fast and accurate queries.
2. The disease name standardization method and system of the invention remove stop words from the disease name query element, so that stop words do not prevent the corresponding standardized disease name from being found.
3. The disease name standardization method and system of the invention receive the disease name to be queried and store it together with the corresponding standardized name, providing a reference for the next query. This keeps the disease name standardization database updated in real time and makes queries for standardized names more efficient.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a step diagram of the disease name standardization method of the invention;
Fig. 2 is a system block diagram of the disease name standardization system of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural. It should further be understood that the word "comprising" used in the specification of the invention indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless specifically defined, will not be interpreted in an idealized or overly formal sense.
Fig. 1 is a step diagram of the disease name standardization method of the invention. As shown in Fig. 1, the disease name standardization method provided by the invention includes the following steps: looking up the corresponding standardized disease name according to the disease name query element in a query request; calling a disease name standardization database, and matching the disease name query element against the colloquial disease names pre-stored in the disease name standardization database; and, according to the colloquial disease name that matches the disease name query element, extracting the corresponding standardized disease name and sending it to a terminal. The method uses the disease name standardization database to translate colloquial disease names into standardized disease names, which supports consistency services for medical information and enables efficient, fast and accurate queries.
The disease name standardization database includes a single-disease tag master library, which includes multiple single-disease sub-library tag names, and multiple single-disease sub-libraries, each linked to one of the single-disease sub-library tag names. Each single-disease sub-library includes a colloquial disease name storage area, a standardized disease name storage area, and a disease name standardization translation area. The colloquial disease name storage area stores multiple colloquial disease names, the standardized disease name storage area stores one standardized disease name, and each colloquial disease name is associated with the standardized disease name in the disease name standardization translation area. The disease name query element is matched against the colloquial disease names pre-stored in the colloquial disease name storage area. When a colloquial disease name matching the disease name query element exists, the corresponding standardized disease name is extracted in the disease name standardization translation area according to that colloquial disease name and sent to the terminal. When no colloquial disease name matching the disease name query element exists, a synonym of the disease name query element is looked up on a synonym searcher, the synonym is matched against the colloquial disease names pre-stored in the colloquial disease name storage area, and the corresponding standardized disease name is extracted in the disease name standardization translation area according to the matching colloquial disease name and sent to the terminal. This synonym fallback makes it more likely that a colloquial disease name can be standardized. The synonym searcher may be a third-party search engine.
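The database layout and the synonym fallback described above might be modelled as follows. This is a sketch under stated assumptions: the class `SingleDiseaseSubLibrary`, the functions `match` and `lookup`, and the sample data are all hypothetical names introduced for illustration, and the synonym searcher is represented by a caller-supplied function because the text leaves its implementation open (it may be a third-party search engine).

```python
from dataclasses import dataclass, field


@dataclass
class SingleDiseaseSubLibrary:
    """One sub-library: many colloquial names, exactly one standardized name."""
    standard_name: str
    colloquial_names: set = field(default_factory=set)

    def translate(self, name):
        # Plays the role of the translation area: every stored colloquial
        # name is associated with the single standardized name.
        return self.standard_name if name in self.colloquial_names else None


def match(name, sub_libraries):
    """Search every sub-library of the master library for a colloquial name."""
    for library in sub_libraries.values():
        result = library.translate(name)
        if result is not None:
            return result
    return None


def lookup(query, sub_libraries, find_synonyms):
    """Direct match first; on failure, retry with synonyms of the query."""
    result = match(query, sub_libraries)
    if result is not None:
        return result
    for synonym in find_synonyms(query):
        result = match(synonym, sub_libraries)
        if result is not None:
            return result
    return None


# Hypothetical master library keyed by sub-library tag name.
LIBRARIES = {
    "hypertension": SingleDiseaseSubLibrary(
        "essential hypertension", {"high blood pressure"}
    ),
}
# Hypothetical stand-in for the synonym searcher.
SYNONYMS = {"raised blood pressure": ["high blood pressure"]}


def find_synonyms(query):
    return SYNONYMS.get(query, [])
```

Passing the synonym source in as a function keeps the lookup logic independent of whichever searcher is actually used.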
The method for building the disease name standardization database includes: establishing the single-disease tag master library according to the disease types in a disease classification such as ICD-10, the master library including multiple single-disease sub-library tag names, each of which corresponds to one single-disease sub-library; establishing the colloquial disease name storage area, the standardized disease name storage area and the disease name standardization translation area, and aggregating them into a single-disease sub-library folder; and filling the colloquial disease name storage area with multiple colloquial disease names, filling the standardized disease name storage area with one standardized disease name, and filling the disease name standardization translation area with a mapping association model, thereby generating the single-disease sub-library.
The method for building the disease name standardization database further includes: receiving, in real time, colloquial disease names that have not yet been stored and adding them to the corresponding colloquial disease name storage area, so that the single-disease sub-library is updated in real time.
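The build and update steps above can be sketched with plain dictionaries. The function names `build_database` and `add_colloquial_name`, the dictionary keys, and the sample row are hypothetical; the "mapping association model" is represented here by an ordinary dictionary, which is an assumption, since the text does not say what form the model takes.

```python
def build_database(disease_types):
    """Build the master library from (tag_name, standard_name, colloquial_names) rows."""
    database = {}
    for tag_name, standard_name, colloquial_names in disease_types:
        colloquial = set(colloquial_names)
        database[tag_name] = {
            # Colloquial disease name storage area (many names).
            "colloquial_names": colloquial,
            # Standardized disease name storage area (exactly one name).
            "standard_name": standard_name,
            # Translation area: assumed to be a simple name-to-name mapping.
            "translation": {name: standard_name for name in colloquial},
        }
    return database


def add_colloquial_name(database, tag_name, new_name):
    """Add a not-yet-stored colloquial name to its sub-library."""
    sub_library = database[tag_name]
    sub_library["colloquial_names"].add(new_name)
    sub_library["translation"][new_name] = sub_library["standard_name"]


if __name__ == "__main__":
    # "I10" is the ICD-10 code for essential (primary) hypertension.
    db = build_database([("I10", "essential hypertension", ["high blood pressure"])])
    add_colloquial_name(db, "I10", "raised blood pressure")
    print(sorted(db["I10"]["translation"]))
```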
The method for building the disease name standardization database further includes: vertically splitting the description data in each single-disease sub-library and, along the time dimension, storing the data in shards using the Sqoop tool (Sqoop is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional databases such as MySQL and PostgreSQL).
The data is stored in shards using Hive partitioning.
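The idea of sharding by the time dimension can be illustrated without Sqoop or Hive. The sketch below only groups in-memory records by month; the `created` field, the `partition_by_month` function and the choice of monthly granularity are assumptions. The `month=YYYY-MM` keys imitate the `column=value` naming that Hive uses for partition directories.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(records):
    """Group records into partitions keyed by the month of their 'created' date."""
    partitions = defaultdict(list)
    for record in records:
        key = f"month={record['created']:%Y-%m}"
        partitions[key].append(record)
    return dict(partitions)


if __name__ == "__main__":
    sample = [
        {"name": "high blood pressure", "created": date(2018, 6, 1)},
        {"name": "heart attack", "created": date(2018, 6, 21)},
        {"name": "kidney stone", "created": date(2018, 7, 2)},
    ]
    for key, rows in sorted(partition_by_month(sample).items()):
        print(key, len(rows))
```

A query restricted to one month then only needs to read the matching partition rather than the whole data set.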
The method for building the disease name standardization database further includes: de-duplicating the description data in each single-disease sub-library.
The method for building the disease name standardization database further includes: converting the description data in each single-disease sub-library to a uniform format.
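The de-duplication and format-unification steps can be sketched together, since entries that differ only in formatting become duplicates once unified. The text does not specify the format rules, so the ones used in the hypothetical `unify_format` function (trim, collapse whitespace, lower-case) are assumptions.

```python
def unify_format(name):
    """Assumed rules: trim, collapse internal whitespace, lower-case."""
    return " ".join(name.split()).lower()


def deduplicate(names):
    """Unify each name, then drop repeats while keeping first-seen order."""
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(unify_format(name) for name in names))


if __name__ == "__main__":
    print(deduplicate(["Heart  attack", "heart attack ", "Stroke"]))
```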
The method for building the disease name standardization database further includes: applying full-width/half-width conversion to the description data in each single-disease sub-library.
The full-width/half-width conversion uses Unicode code point values: the full-width space and the half-width space differ by 12256, and every other full-width character differs from its half-width counterpart by 65248.
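The two offsets quoted above follow directly from the Unicode code points: the full-width space is U+3000 (12288) and the ordinary space is U+0020 (32), and the full-width forms U+FF01 to U+FF5E sit 0xFEE0 (65248) above ASCII U+0021 to U+007E. A minimal sketch of the full-width to half-width direction, with the hypothetical function name `to_halfwidth`:

```python
FULLWIDTH_SPACE = 0x3000   # 12288
HALFWIDTH_SPACE = 0x0020   # 32, so the difference is 12256
FULLWIDTH_OFFSET = 0xFEE0  # 65248


def to_halfwidth(text):
    """Convert full-width ASCII variants and the full-width space to half-width."""
    chars = []
    for ch in text:
        code = ord(ch)
        if code == FULLWIDTH_SPACE:
            code = HALFWIDTH_SPACE
        elif 0xFF01 <= code <= 0xFF5E:
            # Full-width '!' .. '~' map onto ASCII '!' .. '~'.
            code -= FULLWIDTH_OFFSET
        chars.append(chr(code))
    return "".join(chars)


if __name__ == "__main__":
    print(to_halfwidth("\uff12\u578b\u7cd6\u5c3f\u75c5\uff08\uff29\uff29\uff09"))
```

Characters outside the two ranges, including Chinese characters, pass through unchanged.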
The method for building the disease name standardization database further includes: performing NLP training on the colloquial disease names so that each colloquial disease name is associated with its corresponding word segments.
NLP (Natural Language Processing), also referred to as NLU (Natural Language Understanding), is a branch of language information processing and a core topic of artificial intelligence. Put simply, it aims to make computers understand natural language.
The content and scope of NLP research are broad; the main areas are as follows.
Machine translation (Machine Translation, MT): using a machine to translate text from one language into another;
Automatic summarization (Automatic Summarizing): first understanding the content and meaning of the original text, then condensing it, and finally expressing it in brief language;
Information retrieval (Information Retrieval): using a computer system to find, in a large body of text, the texts that meet the user's needs; when two or more languages are involved, this is called cross-language information retrieval;
Text classification (Document Categorization): using a computer to assign a given text to the corresponding category according to certain principles;
Question answering (Question-Answering System): using a computer to receive a question, understand its meaning, find a solution, and answer it;
Information filtering (Information Filtering): mainly filtering and identifying harmful information on the network;
Information extraction (Information Extraction): extracting specific events or factual information from text; an information extraction system usually takes the output of an information retrieval system as its input and can in turn improve the performance of the retrieval system;
Text mining (Text Mining): also called text data mining, the process of obtaining high-quality information from text;
Public opinion analysis (Public Opinion Analysis): analyzing the attitudes the public expresses online toward administrators around a social event or statement; it is highly complex and draws on many integrated techniques;
Metaphorical computation (Metaphorical Computation): handling the linguistic phenomenon in which one thing, or certain of its features, is used to describe another thing;
Automatic error correction and proofreading (Automatic Proofreading): checking the content of a text and correcting its errors;
Automated essay scoring (Automated Essay Scoring): automatically evaluating and grading the quality of an essay and the level of its writing;
Optical character recognition (Optical Character Recognition, OCR): recognizing handwritten or printed text and converting it into electronic text;
Speech recognition: converting input speech into the corresponding written-text representation, also called automatic speech recognition (ASR);
Text-to-speech conversion (Text-To-Speech Conversion): converting text data into speech data;
Speaker recognition, identification and verification (Speaker Recognition/Identification/Verification): acoustically analyzing a speaker's speech samples in order to determine the speaker's identity.
The research areas above are very broad and generally involve many aspects of natural language, such as morphology, syntax, pragmatics and semantics. Ultimately, the most critical problems to solve in natural language processing are ambiguity resolution and the handling of unknown linguistic phenomena.
The above disease name standardization method further includes: performing maximal word segmentation on the disease name query element, associating synonyms with the resulting segments, generating a synonym family from the associated synonyms, and matching each element of the synonym family against the colloquial disease names pre-stored in the colloquial disease name storage area.
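One way to read the synonym-family step is as the set of all rewrites obtained by substituting synonyms segment by segment. The sketch below takes that reading, which is an interpretation rather than something the text states; `synonym_family`, `first_match`, the `joiner` parameter and the sample vocabulary are hypothetical, and the input is assumed to be already segmented.

```python
from itertools import product


def synonym_family(segments, synonyms, joiner=""):
    """Build every candidate name by swapping each segment for its synonyms."""
    options = [[segment] + synonyms.get(segment, []) for segment in segments]
    return [joiner.join(combo) for combo in product(*options)]


def first_match(family, colloquial_names):
    """Return the first family member found among the stored colloquial names."""
    for candidate in family:
        if candidate in colloquial_names:
            return candidate
    return None


if __name__ == "__main__":
    family = synonym_family(
        ["kidney", "stone"],
        {"kidney": ["renal"], "stone": ["calculus"]},
        joiner=" ",
    )
    print(family)
    print(first_match(family, {"renal calculus"}))
```

The default empty `joiner` suits Chinese text, where segments are concatenated without spaces; the example passes a space for English.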
The above disease name standardization method further includes: removing stop words from the disease name query element. Removing stop words prevents spaces in the query element or formatting problems from making it impossible to find the corresponding standardized disease name.
The above disease name standardization method further includes: receiving the disease name to be queried and storing it together with the corresponding standardized name, to provide a reference for the next query. This keeps the disease name standardization database updated in real time, continually adds the newest colloquial disease names for each standardized name, and increases the probability that a query for a standardized name succeeds.
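Storing each query with its resolved standardized name amounts to a lookup cache. A minimal sketch, assuming an in-memory dictionary and a caller-supplied `resolve` function that performs the full database lookup; the names `QueryCache` and `cached_lookup` are hypothetical.

```python
class QueryCache:
    """Remembers which standardized name each past query resolved to."""

    def __init__(self):
        self._store = {}

    def get(self, query):
        return self._store.get(query)

    def put(self, query, standard_name):
        self._store[query] = standard_name


def cached_lookup(query, cache, resolve):
    """Serve repeated queries from the cache; otherwise resolve and remember."""
    hit = cache.get(query)
    if hit is not None:
        return hit
    result = resolve(query)
    if result is not None:
        # Only successful resolutions are stored for the next query.
        cache.put(query, result)
    return result
```

Unresolved queries are deliberately not cached here, so that a name added to the database later can still be found.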
Fig. 2 is disease name standard convention method system block diagram of the invention, as shown in Fig. 2, disease provided by the invention Name of disease claims standard convention system, including:Disease name enquiry module, for inquiring member according to the disease name in inquiry request Element is searched corresponding disease criterion assumed name and is claimed;Disease name matching module, for calling disease name standard convention database, And disease name is inquired into the disease colloquial style title prestored in element and disease name standard convention database and is matched; Disease name extraction module extracts corresponding disease according to the disease colloquial style title with disease name inquiry Match of elemental composition Standardized name, and it is sent to terminal.Disease name standard convention system of the invention passes through disease name standard convention Database claims the disease colloquial style name translation of disease name at disease criterion assumed name, mentions for the Consistency service of medical information Effective support is supplied, it is ensured that efficiently, fast and accurately inquire.
The above disease name standardization system further comprises a synonym search module, for looking up synonyms of the disease name query element when no colloquial disease name matches the disease name query element. The disease name standardization database comprises a single-disease label master library, which contains multiple single-disease sub-library label names, and multiple single-disease sub-libraries, each linked to one of those label names. Each single-disease sub-library comprises a colloquial disease name storage area, a standardized disease name storage area, and a disease name standardization translation area. The colloquial disease name storage area stores multiple colloquial disease names, the standardized disease name storage area stores one standardized disease name, and each colloquial disease name is associated with the standardized disease name in the translation area. The disease name matching module is further used to match the disease name query element against the colloquial disease names pre-stored in the colloquial disease name storage area, and to match the synonym against those pre-stored colloquial disease names. The disease name extraction module is further used, when a colloquial disease name matches the disease name query element, to extract the corresponding standardized disease name in the translation area according to that colloquial disease name and send it to the terminal; and, when the match is obtained through a synonym, to extract the corresponding standardized disease name in the translation area according to the matched colloquial disease name and send it to the terminal.
In the disease name standardization system of the present invention, when no colloquial disease name matches the disease name query element, the synonym search module looks up synonyms of the query element, each synonym is matched against the colloquial disease names pre-stored in the colloquial disease name storage area, and the corresponding standardized disease name is extracted in the translation area and sent to the terminal. This makes it more likely that a colloquial disease name can be standardized.
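The database layout and the synonym fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names, the `lookup` method, and the in-memory dictionaries are hypothetical, and the sample values are borrowed from Embodiment Two.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class SingleDiseaseSublibrary:
    """One sub-library: many colloquial names, exactly one standardized name."""
    label: str                      # label name listed in the master label library
    standardized_name: str          # standardized name storage area (one entry)
    colloquial_names: Set[str] = field(default_factory=set)  # colloquial name storage area


class StandardizationDatabase:
    """Hypothetical in-memory stand-in for the standardization database."""

    def __init__(self, sublibraries: Iterable[SingleDiseaseSublibrary]):
        # Master label library: label name -> sub-library.
        self.sublibraries = {s.label: s for s in sublibraries}
        # Translation area: colloquial name -> standardized name.
        self.translation: Dict[str, str] = {
            name: s.standardized_name
            for s in self.sublibraries.values()
            for name in s.colloquial_names
        }

    def lookup(self, query: str,
               synonyms: Optional[Dict[str, List[str]]] = None) -> Optional[str]:
        """Try a direct match first, then fall back to synonyms of the query."""
        if query in self.translation:
            return self.translation[query]
        for candidate in (synonyms or {}).get(query, []):
            if candidate in self.translation:
                return self.translation[candidate]
        return None
```

A flat `translation` dictionary keeps lookups constant-time; a real deployment would presumably keep the sub-libraries in persistent storage.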
The disease name matching module is further used to perform maximal word segmentation on the disease name query element, associate synonyms with the resulting tokens, generate a synonym family from the associated synonyms, and match each element of the synonym family against the colloquial disease names pre-stored in the disease name standardization database. For example, the keyword entered by the user is segmented and its synonyms are processed in order to produce the tokens needed for building the full-text index. Once the keyword has been parsed, it is segmented and its synonyms are processed; the Chinese word segmenter uses the forward maximum matching algorithm, and the resulting tokens are sent to the disease name cache module.
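Forward maximum matching can be sketched as below: at each position the longest dictionary word is taken, falling back to a single character when nothing matches. The function names, the sample dictionary, and the synonym table are illustrative assumptions; the text does not specify the segmenter's dictionary or how synonym families are stored.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` by greedily taking the longest dictionary word at each position."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(1, max_len)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def synonym_family(token, synonyms):
    """Return the token followed by its associated synonyms, without duplicates."""
    family = [token]
    for candidate in synonyms.get(token, []):
        if candidate not in family:
            family.append(candidate)
    return family
```

Each element of every synonym family would then be matched against the pre-stored colloquial names.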
The disease name query module is further used to remove stop words from the disease name query element. For example, when the query module extracts the keyword entered by the user, it applies natural language processing, combined with the semantic structure of medical terminology, to extract the useful information from the user's text according to medical logic. Stop word removal is one such step. Stop words are words that occur frequently in the text but give no guidance for text classification, such as the nouns "type", "property", "sign", and "phase", numeral-classifier expressions such as "two phase", and punctuation such as "()" and "[]". These words should be removed before features are selected for subsequent classification, so that they do not affect the classification results. By removing stop words from the disease name query element, the disease name standardization system of the present invention avoids failing to find the corresponding standardized disease name because the query element contains spaces or has formatting problems.
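A minimal sketch of this cleaning step is shown below, assuming whitespace-separated tokens. The stop word set reuses the translated single-word examples from the paragraph above; the function name and the bracket pattern are hypothetical, and a production system would work on segmented Chinese tokens instead.

```python
import re

# Translated examples from the text; a real list would be curated for the domain.
STOP_WORDS = {"type", "property", "sign", "phase"}
# Half-width and full-width brackets are treated as removable punctuation.
BRACKETS = re.compile(r"[()\[\]（）【】]")


def clean_query(text, stop_words=STOP_WORDS):
    """Drop brackets, stop words, and redundant spaces from a query element."""
    text = BRACKETS.sub(" ", text)
    tokens = [t for t in text.split() if t.lower() not in stop_words]
    return " ".join(tokens)
```

Because the same cleaning would have to be applied to the stored colloquial names, a query such as "diabetes (type II)" and a stored "diabetes II" end up in the same form.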
The above disease name standardization system further comprises a disease name cache module, for receiving the disease name to be queried and storing it together with the corresponding standardized name, to provide a reference for the next query. Because each queried disease name is stored with its standardized name, the disease name standardization database is kept up to date in real time: the newest colloquial disease names for a given standardized name are added as they appear, which raises the probability that a query finds its standardized name. For example, the disease name cache module first searches the disease name standardization database for each token and, if a standardized name is found, sends it to the terminal. The module mainly serves queries on the tokens obtained by parsing the keyword entered by the user; the parsed token search requests are dispatched to each shard for distributed query. The storage system is the third-party Redis. The set of standardized names found is sent to the terminal, and the set of tokens for which no standardized name was found is sent to the synonym search module.
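The hit/miss split performed by the cache module might look like the sketch below. A plain dictionary stands in for Redis so that the example is self-contained; the function names are hypothetical.

```python
def split_hits_and_misses(tokens, cache):
    """Return (hits, misses): hits go to the terminal, misses to the synonym search."""
    hits, misses = {}, []
    for token in dict.fromkeys(tokens):      # de-duplicate while keeping order
        name = cache.get(token)
        if name is None:
            misses.append(token)
        else:
            hits[token] = name
    return hits, misses


def remember(cache, queried_name, standardized_name):
    """Store a queried name with its standardized name for the next query."""
    cache[queried_name] = standardized_name
```

After `remember` has been called for a resolved name, the next query for the same name is answered from the cache without reaching the search engine.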
Redis is an open-source in-memory key-value database first released in 2009. It is written in C, and client interfaces are available for many languages, such as C++, C#, Java, JavaScript, and Python. Redis loads the entire database into memory for operation and periodically flushes the data to disk asynchronously, so that the saved data survives a server restart.
In the synonym search module, if no standardized name was found, the tokens are merged and deduplicated, and the set of tokens for which no standardized name was found is traversed and searched. The synonym search module collects the tokens that the disease name cache module could not resolve and adds them to the dictionary of the disease name standardization database, which gives the segmenter new tokens. It supports retrieving one or more indexes, by category, across multiple index columns: a Query object is generated for multi-field search, the index files are searched in a distributed manner on each ES node, and the qualifying results from each node are merged and sorted. The full-text search engine is the third-party Elasticsearch. The result set is mapped to its tokens and written to Redis as a backfill, and the result set is sent to the result output module.
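The multi-field search for unresolved tokens can be sketched as a request body in the Elasticsearch query DSL. The `multi_match` query is part of that DSL, but the field names, the `size` default, and the `search` callable (a stand-in for a real client call) are assumptions made for illustration.

```python
def build_multi_field_query(token, fields, size=10):
    """Build a search request body that matches `token` across several fields."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": token,
                "fields": list(fields),
            }
        },
    }


def search_unresolved(tokens, fields, search):
    """Search each unresolved token once and map token -> non-empty result list."""
    results = {}
    for token in dict.fromkeys(tokens):      # merge and de-duplicate tokens
        hits = search(build_multi_field_query(token, fields))
        if hits:
            results[token] = hits
    return results
```

The returned mapping is what would be written back to the cache and forwarded to the result output module.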
The synonym search module uses a vertical search engine.
Vertical search engines arose mainly to serve a specific field, a specific group of users, or a specific need. Their search strategy is rigorous and, building on general search engine technology, emphasizes specialization, convenience, and personalization.
Compared with general search engines, vertical search engines have the following characteristics and advantages. A vertical search engine focuses only on the resources of a specific field, so it can mine those resources in depth and discover their correlations and potential value. Likewise, because it concentrates on retrieval within a specific field, its users tend to share many industry traits, so it can better understand their search queries and search expectations. Users have an explicit search process and search history within the field and develop a degree of user stickiness, on the basis of which the vertical search engine can better understand and discover their actual search intent. In result feedback, in addition to general search engine strategies such as PageRank and the HITS authority model, a vertical search engine pays more attention, when ranking results, to topic relevance, the characteristics of users in the field, and the influence of users' selection feedback.
The core of a vertical search engine is abundant topic-relevant data in its field. This data comes mainly from two sources: the engine's own accumulated data, and relevant data crawled from the various information on the Internet.
Elasticsearch is a software tool that can be used to build search engines. It is an open-source, Lucene-based search engine that supports distributed, multi-tenant operation with a RESTful design, and it makes full-text search simple. More importantly, it provides distributed real-time document storage in which every field is indexed and searchable, and it can scale to hundreds of servers to handle petabytes of structured or unstructured data. Elasticsearch has a clear speed advantage when building indexes and retrieving in real time. Its basic features are as follows:
Index: Elasticsearch stores its data in one or more indexes. By analogy with SQL terminology, an index is like a database: documents can be written to an index or read from it. Internally, Elasticsearch uses Lucene to write data to an index and to retrieve data from it. An Elasticsearch index may consist of one or more Lucene indexes; the details are determined by the Elasticsearch shard and replica mechanisms and their configuration.
Document: The document is the main entity in the Elasticsearch world (and in Lucene as well). Every use case of Elasticsearch ultimately comes down to searching documents. A document consists of fields, and each field has a name and one or more values (a field with several values is called multi-valued, that is, the document contains several fields with the same name). Documents may have different sets of fields, and there is no fixed schema or mandatory structure for a document.
Mapping: Every document is analyzed before it is written to an index. The user can set parameters that determine how the input text is split into terms, which terms should be filtered out, and which additional processing should be applied (such as removing HTML tags). Elasticsearch also provides various other features, such as the field content information needed for sorting.
Type: Each document in Elasticsearch has a corresponding type definition. This allows several document types to be stored, with a different mapping for each document type.
Node: A single Elasticsearch service instance is called a node. In many cases one Elasticsearch node is enough for most simple applications, but when fault tolerance is required or the data grows beyond what a single machine can handle, a multi-node Elasticsearch cluster is preferred.
Another problem faced in real-time access to massive numbers of small Internet files is real-time retrieval. First, in Internet applications the database must continually store new Internet documents, which puts pressure on updating and maintaining the index structure; if index maintenance is too expensive, system performance suffers. In addition, Redis is a typical key-value database with outstanding performance on primary-key queries. This application therefore proposes a lightweight search strategy that combines the Elasticsearch search engine with the Redis database.
The above disease name standardization system further comprises a result output module, for merging the result sets obtained by the disease name cache module and the synonym search module, sorting them according to rules, and sending them to the terminal.
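The text does not state the sorting rules, so the sketch below assumes one plausible ordering: names found by the cache module first, then names supported by more tokens, then alphabetical order. The function name, the record layout, and the placeholder names STD-A and STD-B are hypothetical.

```python
def merge_results(cache_hits, search_hits):
    """Merge two token -> standardized-name mappings into one ranked list."""
    merged = {}
    for source_rank, mapping in enumerate((cache_hits, search_hits)):
        for token, name in mapping.items():
            # The first source that yields a name fixes its rank.
            entry = merged.setdefault(
                name, {"name": name, "tokens": [], "rank": source_rank}
            )
            entry["tokens"].append(token)
    # Assumed rule: cache hits first, then more supporting tokens, then by name.
    return sorted(
        merged.values(),
        key=lambda e: (e["rank"], -len(e["tokens"]), e["name"]),
    )
```

Duplicate standardized names from the two modules collapse into one entry that records every token that led to it.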
The above disease name standardization system further comprises a disease name correction module, for checking whether the user's input contains wrongly written characters and, if so, correcting them. For example, the raw natural-language expression entered by the user is analyzed and processed into structured, usable data. At this stage a data set of known medical tokens is processed, and for specialized vocabulary in the health field, automated correction is performed according to a new-word algorithm.
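The new-word algorithm is not described in the text. As a stand-in, the sketch below corrects a term by picking the closest entry of a known vocabulary with the similarity matching in Python's standard `difflib` module; the function name, the vocabulary, and the cutoff value are assumptions.

```python
import difflib


def correct_term(term, vocabulary, cutoff=0.8):
    """Return the closest known term, or the input unchanged if nothing is close."""
    if term in vocabulary:
        return term
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else term
```

A high cutoff keeps the correction conservative, which matters for medical terms where two valid names can differ by a single character.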
With the disease name standardization method and system of the present invention, standardized names are extracted intelligently and medical staff can freely choose a standardized name, which prevents clerical errors in diagnostic results and prevents the situation where a person reviewing a diagnostic result cannot determine the type of disease.
Using the disease name standardization method and system of the present invention greatly improves the efficiency of recognizing diagnostic results. Take an insurance company as an example: the number of disease types it needs to assess ranges from several thousand to tens of thousands. If staff must manually look up the standardized name of a disease for every assessment, a great deal of time is wasted and finding the standardized name becomes laborious; moreover, because the staff are not medical professionals, misjudgments are likely. Querying standardized disease names with the method and system of the present invention greatly improves the efficiency of the insurance company's staff.
In the present invention, standardized name matching is completed automatically by means of technologies such as cloud-based big data analysis, intelligent word segmentation, a complete specialized dictionary, and a fast search engine, replacing manual recognition of diagnostic results and manual table lookup. The present invention can therefore assist in recognizing diagnostic results with high accuracy.
Embodiment one
When the present invention handles an external API (Application Programming Interface) call, the detailed process is as follows:
S11: Obtain the keyword entered by the user and process the special characters and spaces in it to generate the keyword;
S12: Perform correction on the processed keyword, correcting wrongly written words to the right spelling, for example changing keywood to keyword;
S13: Segment the corrected keyword using maximal word segmentation, associate synonyms with the tokens, and generate a synonym family from the associated synonyms; segmenting the keyword yields multiple tokens, labeled keyword1, keyword2...keywordN;
S14: After segmentation, traverse the token set and search the disease name standardization database; if a standardized name is found, return it to the result output module, with the results labeled result1, result2...resultN; if no standardized name is found, save the token in the not-found set, labeled NotHit1, NotHit2...NotHitN, to await the synonym search module;
S15: In the synonym search module, search Elasticsearch for the keywords that were not found, that is, traverse NotHit1, NotHit2...NotHitN, and obtain the results result1, result2...resultN;
S16: Traverse result1, result2...resultN in the obtained result set, cache them in Redis, and set an expiration time to prevent the cache from overflowing.
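Steps S11 to S16 can be sketched end to end as follows. Everything external is replaced by a stand-in: `TTLCache` imitates a Redis cache with an expiration time, `database` is a plain dictionary, and `correct`, `segment`, and `search` are caller-supplied functions. All of these names are hypothetical, and reading the cache before calling the search in S15 is an assumption that the steps do not spell out.

```python
import re
import time


class TTLCache:
    """In-memory stand-in for the Redis cache; entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl, self.clock, self._data = ttl, clock, {}

    def get(self, key):
        value, expires_at = self._data.get(key, (None, 0.0))
        if value is not None and self.clock() >= expires_at:
            del self._data[key]              # drop the expired entry
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def standardize(raw, *, correct, segment, database, search, cache):
    """Map each token of a raw query to a standardized name where one is found."""
    # S11: replace special characters with spaces and collapse whitespace.
    keyword = " ".join(re.sub(r"[^\w\s]", " ", raw).split())
    # S12: correct wrongly written words.
    keyword = correct(keyword)
    # S13: segment the corrected keyword into tokens.
    tokens = segment(keyword)
    # S14: look each token up in the standardization database.
    results, not_hit = {}, []
    for token in tokens:
        if token in database:
            results[token] = database[token]
        else:
            not_hit.append(token)
    # S15: resolve the remaining tokens through the cache or the search engine.
    for token in not_hit:
        name = cache.get(token)
        if name is None:
            name = search(token)
            if name is not None:
                cache.set(token, name)       # S16: cache with an expiration time
        if name is not None:
            results[token] = name
    return results
```

Passing the clock into `TTLCache` makes the expiration behaviour easy to exercise without waiting for real time to pass.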
Embodiment two
DM II is entered into the name query module. The name correction module unifies the format of DM II into a reference format, for example a half-width form without spaces. The name matching module matches the reference-format DM II against the colloquial disease names pre-stored in the disease name standardization database, for example in each single-disease sub-library. In the single-disease sub-library that contains colloquial disease names such as type 2 diabetes, NDM, and DM II and has the standardized name E71.000, it finds the colloquial disease name DM II that matches the disease name query element DM II. The name extraction module then extracts the standardized name E71.000 and sends it to the terminal.
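The conversion to a half-width form without spaces can be sketched with Unicode NFKC normalization, which folds full-width letters, the ideographic space, and Roman numeral characters into their ASCII equivalents. The function names are hypothetical, and applying the same conversion to the stored colloquial names is an assumption needed for the match to succeed.

```python
import unicodedata


def to_reference_format(name):
    """Convert to half-width characters and remove all whitespace."""
    half_width = unicodedata.normalize("NFKC", name)
    return "".join(half_width.split())


def build_index(colloquial_to_standard):
    """Index the stored colloquial names by their reference-format form."""
    return {
        to_reference_format(colloquial): standard
        for colloquial, standard in colloquial_to_standard.items()
    }
```

With the index built this way, inputs that differ only in character width or spacing resolve to the same standardized name.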
Therefore, in the method for the invention, real-time disease diagnostic information is obtained;Application layer uses Redis as application layer Data set cache is then based on maximum forward matching algorithm and carries out word segmentation processing, carries out participle search based on Elasticsearch To use for front-end application, the pass for inputting obtained active user by the disease identification model in preset single disease library Keyword feature is analyzed, to determine whether the diagnosis content of obtained real-time disease title is specification, standardization disease name;If really The diagnosis content for recognizing obtained real-time disease title is lack of standardization, then is sent to third party's identifying system and again identifies that and sort out. The keyword of retrieval can be passed through layered shaping, accurate recognition and the standardization for handling medical diagnosis on disease, data in time by the present invention Relatively independent, fault-tolerance is high, and dates back is also stronger, can take into account high concurrent data processing and the low of front end applications is prolonged Slow interaction demand.
Those skilled in the art will understand that, although some embodiments herein include certain features that are included in other embodiments and not others, combinations of features from different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A disease name standardization method, characterized in that it comprises the following steps:
looking up the corresponding standardized disease name according to the disease name query element in a query request;
calling a disease name standardization database, and matching the disease name query element against the colloquial disease names pre-stored in the disease name standardization database;
extracting the corresponding standardized disease name according to the colloquial disease name that matches the disease name query element, and sending it to a terminal.
2. The disease name standardization method according to claim 1, characterized in that the disease name standardization database comprises a single-disease label master library, the single-disease label master library containing multiple single-disease sub-library label names, and multiple single-disease sub-libraries each linked to one of the single-disease sub-library label names; each single-disease sub-library comprises a colloquial disease name storage area, a standardized disease name storage area, and a disease name standardization translation area; the colloquial disease name storage area is used to store multiple colloquial disease names, the standardized disease name storage area is used to store one standardized disease name, and each colloquial disease name is associated with the standardized disease name in the disease name standardization translation area;
the disease name query element is matched against the colloquial disease names pre-stored in the colloquial disease name storage area;
when a colloquial disease name matches the disease name query element, the corresponding standardized disease name is extracted in the disease name standardization translation area according to that colloquial disease name and sent to the terminal;
when no colloquial disease name matches the disease name query element, a synonym of the disease name query element is looked up in the synonym search module, the synonym is matched against the colloquial disease names pre-stored in the colloquial disease name storage area, and the corresponding standardized disease name is extracted in the disease name standardization translation area according to the matched colloquial disease name and sent to the terminal.
3. The disease name standardization method according to claim 2, characterized in that it further comprises:
performing maximal word segmentation on the disease name query element, associating synonyms with the resulting tokens, generating a synonym family from the associated synonyms, and matching each element of the synonym family against the colloquial disease names pre-stored in the colloquial disease name storage area.
4. The disease name standardization method according to claim 3, characterized in that it further comprises:
removing stop words from the disease name query element.
5. The disease name standardization method according to claim 4, characterized in that it further comprises:
receiving the disease name to be queried and storing it together with the corresponding standardized name, to provide a reference for the next query.
6. A disease name standardization system, characterized in that it comprises:
a disease name query module, for looking up the corresponding standardized disease name according to the disease name query element in a query request;
a disease name matching module, for calling a disease name standardization database and matching the disease name query element against the colloquial disease names pre-stored in the disease name standardization database;
a disease name extraction module, for extracting the corresponding standardized disease name according to the colloquial disease name that matches the disease name query element, and sending it to a terminal.
7. The disease name standardization system according to claim 6, characterized in that it further comprises: a synonym search module, for looking up a synonym of the disease name query element when no colloquial disease name matches the disease name query element;
the disease name standardization database comprises a single-disease label master library, the single-disease label master library containing multiple single-disease sub-library label names, and multiple single-disease sub-libraries each linked to one of the single-disease sub-library label names; each single-disease sub-library comprises a colloquial disease name storage area, a standardized disease name storage area, and a disease name standardization translation area; the colloquial disease name storage area is used to store multiple colloquial disease names, the standardized disease name storage area is used to store one standardized disease name, and each colloquial disease name is associated with the standardized disease name in the disease name standardization translation area;
the disease name matching module is further used to match the disease name query element against the colloquial disease names pre-stored in the colloquial disease name storage area, and to match the synonym against the colloquial disease names pre-stored in the colloquial disease name storage area;
the disease name extraction module is further used, when a colloquial disease name matches the disease name query element, to extract the corresponding standardized disease name in the disease name standardization translation area according to that colloquial disease name and send it to the terminal, and, when the match is obtained through the synonym, to extract the corresponding standardized disease name in the disease name standardization translation area according to the matched colloquial disease name and send it to the terminal.
8. The disease name standardization system according to claim 7, characterized in that the disease name matching module is further used to perform maximal word segmentation on the disease name query element, associate synonyms with the resulting tokens, generate a synonym family from the associated synonyms, and match each element of the synonym family against the colloquial disease names pre-stored in the disease name standardization database.
9. The disease name standardization system according to claim 8, characterized in that the disease name query module is further used to remove stop words from the disease name query element.
10. The disease name standardization system according to claim 9, characterized in that it further comprises: a disease name cache module, for receiving the disease name to be queried and storing it together with the corresponding standardized name, to provide a reference for the next query.
CN201810647287.5A 2018-06-22 2018-06-22 A kind of disease name standard convention method and canonical system Pending CN108922633A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810647287.5A CN108922633A (en) 2018-06-22 2018-06-22 A kind of disease name standard convention method and canonical system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810647287.5A CN108922633A (en) 2018-06-22 2018-06-22 A kind of disease name standard convention method and canonical system

Publications (1)

Publication Number Publication Date
CN108922633A true CN108922633A (en) 2018-11-30

Family

ID=64420959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810647287.5A Pending CN108922633A (en) 2018-06-22 2018-06-22 A kind of disease name standard convention method and canonical system

Country Status (1)

Country Link
CN (1) CN108922633A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002089004A2 (en) * 2001-04-27 2002-11-07 In2Itive Business Group Ltd Search data management
CN1839404A (en) * 2003-07-14 2006-09-27 理智医疗情报技术株式会社 Method for computerising and standardizing medical information
CN106845058A (en) * 2015-12-04 2017-06-13 北大医疗信息技术有限公司 The standardized method of disease data and modular station
CN107145511A (en) * 2017-03-31 2017-09-08 上海森亿医疗科技有限公司 Structured medical data library generating method and system based on medical science text message

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222709B (en) * 2019-04-29 2022-01-25 上海暖哇科技有限公司 Multi-label intelligent marking method and system
CN110222709A (en) * 2019-04-29 2019-09-10 上海暖哇科技有限公司 A kind of multi-tag intelligence marking method and system
CN110321351A (en) * 2019-07-05 2019-10-11 云南电网有限责任公司电力科学研究院 A kind of vendor name method for normalizing based on fuzzy matching
CN110442672A (en) * 2019-08-13 2019-11-12 天津云数嘉合科技有限公司 A method of for data analysis and data mining case dictionary
CN110956043A (en) * 2019-12-17 2020-04-03 人和未来生物科技(长沙)有限公司 Domain professional vocabulary word embedding vector training method, system and medium based on alias standardization
CN111428029A (en) * 2020-03-05 2020-07-17 云知声智能科技股份有限公司 Operation name standardization method and device
CN111428029B (en) * 2020-03-05 2023-04-18 云知声智能科技股份有限公司 Operation name standardization method and device
CN111652737A (en) * 2020-04-17 2020-09-11 世纪保众(北京)网络科技有限公司 Insurance underwriting method and device based on text semantic processing
CN111652737B (en) * 2020-04-17 2023-12-22 世纪保众(北京)网络科技有限公司 Insurance verification method and apparatus based on text semantic processing
CN111563142A (en) * 2020-07-14 2020-08-21 成都四方伟业软件股份有限公司 SQL automatic benchmarking matching method and device
CN111899829A (en) * 2020-07-31 2020-11-06 青岛百洋智能科技股份有限公司 Full-text retrieval matching engine based on ICD9/10 participle lexicon
CN111933244A (en) * 2020-08-17 2020-11-13 医渡云(北京)技术有限公司 Medicine data encoding method and device, computer readable medium and electronic equipment
CN112668280A (en) * 2020-12-29 2021-04-16 杭州依图医疗技术有限公司 Medical data processing method and device and storage medium
CN112733528A (en) * 2020-12-31 2021-04-30 平安医疗健康管理股份有限公司 Code matching method, device and equipment for medical data and storage medium
CN112800317A (en) * 2021-02-04 2021-05-14 北京易车互联信息技术有限公司 Search platform architecture for automobile vertical field
CN113722429A (en) * 2021-08-11 2021-11-30 上海保链科技有限公司 Data normalization processing method, device and equipment and computer readable storage medium
CN113823404A (en) * 2021-08-26 2021-12-21 山东健康医疗大数据有限公司 Medical big data-based method for standardizing medical terms for construction of specific diseases
CN114242262A (en) * 2022-02-28 2022-03-25 台州市中心医院(台州学院附属医院) Medical scientific research information rapid processing system based on big data record

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181130