CN115391494B - Intelligent traditional Chinese medicine syndrome identification method and device - Google Patents

Intelligent traditional Chinese medicine syndrome identification method and device

Info

Publication number
CN115391494B
Authority
CN
China
Prior art keywords
word
chinese medicine
traditional chinese
vector
symptom
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211323785.7A
Other languages
Chinese (zh)
Other versions
CN115391494A (en)
Inventor
雷亮
贺跃杰
丁宇
申冠生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yuanzhi Chuangzhi Technology Co ltd
Original Assignee
Beijing Yuanzhi Chuangzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yuanzhi Chuangzhi Technology Co ltd
Priority to CN202211323785.7A
Publication of CN115391494A
Application granted
Publication of CN115391494B
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application provides a method and a device for intelligently identifying traditional Chinese medicine syndromes. The method comprises: for each target symptom word corresponding to a target traditional Chinese medicine medical record, obtaining the corresponding target word vector by matching in a word vector database that stores the correspondence between symptom words and word vectors, wherein the symptom words are generated in advance by performing dictionary matching and text word segmentation on traditional Chinese medicine medical record description text, and the word vectors are generated in advance by segmenting each symptom word using N-gram language features and training based on a CBOW or skip-gram model; generating a target traditional Chinese medicine medical record vector; and identifying the syndrome information of the target traditional Chinese medicine medical record according to the target traditional Chinese medicine medical record vector and a traditional Chinese medicine syndrome intelligent identification model. The method and the device can fully extract the semantic features in a traditional Chinese medicine medical record, and can effectively improve the accuracy and effectiveness of traditional Chinese medicine syndrome classification and identification while keeping that classification and identification automated and intelligent.

Description

Intelligent traditional Chinese medicine syndrome identification method and device
Technical Field
The application relates to the technical field of natural language processing, in particular to a method and a device for intelligently identifying traditional Chinese medicine syndromes.
Background
Traditional Chinese medicine is a traditional system of medicine validated by long-term practice. A practitioner collects and analyzes the patient's symptoms through the four diagnostic methods of inspection, listening and smelling, inquiry, and palpation, and then draws on practical experience to determine the patient's syndrome from those symptoms, so that the most suitable treatment plan for the syndrome can be settled and a prescription made. However, in this traditional mode of syndrome determination, syndrome differentiation depends heavily on the experience of the practitioner, and the language used to describe symptoms lacks a unified standard, which poses obstacles to automating traditional Chinese medicine syndrome identification. Research on automatically assisting doctors or patients in identifying traditional Chinese medicine syndromes is therefore receiving increasing attention.
At present, automatic traditional Chinese medicine syndrome identification is generally implemented with a support vector machine, a neural network, a Bayesian statistical algorithm, a BERT model, or the like. However, these methods are affected to some extent by the shortage of samples such as traditional Chinese medicine medical record document data, which limits the accuracy of automatic syndrome identification and in turn affects the application reliability of the identification results.
Disclosure of Invention
In view of this, the embodiments of the present application provide a method and an apparatus for intelligently identifying chinese medicine syndromes, so as to eliminate or improve one or more defects existing in the prior art.
One aspect of the application provides an intelligent recognition method for traditional Chinese medicine syndromes, which comprises the following steps:
for each target symptom word corresponding to a target traditional Chinese medicine medical record, obtaining the corresponding target word vector by matching in a word vector database that stores the correspondence between symptom words and word vectors, wherein the symptom words are generated in advance by performing dictionary matching and text word segmentation on historical traditional Chinese medicine medical record document data; and each word vector is generated in advance by segmenting each symptom word using N-gram language features and then training based on a CBOW or skip-gram word vector neural network model;
generating a target traditional Chinese medicine medical record vector corresponding to each target word vector;
and identifying according to the target traditional Chinese medicine medical record vector and a preset traditional Chinese medicine syndrome intelligent identification model to obtain syndrome information corresponding to the target traditional Chinese medicine medical record.
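The first two steps above can be sketched roughly as follows. This is a minimal illustration only: the in-memory dictionary `word_vector_db`, the helper names, and the choice to skip words missing from the database are assumptions made here, and combining word vectors by summation is borrowed from the way this application later builds medical record vectors for historical records.

```python
from typing import Dict, List, Sequence


def lookup_word_vectors(words: Sequence[str],
                        word_vector_db: Dict[str, List[float]]) -> List[List[float]]:
    """Match each target symptom word in the word vector database.

    Words that are not in the database are skipped (an assumption of this sketch).
    """
    return [word_vector_db[w] for w in words if w in word_vector_db]


def build_record_vector(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Sum the matched word vectors element-wise into one medical record vector."""
    record = [0.0] * dim
    for vec in vectors:
        for i, value in enumerate(vec):
            record[i] += value
    return record


# Hypothetical toy database and target symptom words.
word_vector_db = {"fever": [0.2, 0.1, 0.0], "cough": [0.0, 0.3, 0.1]}
target_words = ["fever", "cough", "unknown-term"]
record_vector = build_record_vector(
    lookup_word_vectors(target_words, word_vector_db), dim=3)
```

The resulting `record_vector` is what the third step would pass to the identification model.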
In some embodiments of the present application, before obtaining a target word vector corresponding to each target symptom word by respectively matching in the word vector database for storing correspondence between each symptom word and each word vector, the method further includes:
acquiring data of a plurality of historical traditional Chinese medicine medical record documents;
performing dictionary matching and text word segmentation processing on the historical traditional Chinese medicine medical record document data respectively to obtain corresponding word segmentation results of the historical traditional Chinese medicine medical record containing various symptom words;
segmenting the word segmentation result of the historical traditional Chinese medicine medical record based on N-gram language features to obtain N-gram feature words corresponding to the symptom words respectively;
performing vector initialization operation on each symptom word and each N-gram feature word according to preset word vector dimensions to obtain initialization vectors corresponding to each symptom word and each N-gram feature word respectively;
model training is carried out on the initialization vector based on a CBOW or skip-gram word vector neural network model to obtain word vectors corresponding to each symptom word and each N-gram feature word, and the corresponding relation between each symptom word and each N-gram feature word and each word vector is stored in a word vector database.
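As a rough illustration of the segmentation and initialization steps above, the sketch below cuts each symptom word into character-level N-grams and assigns every symptom word and N-gram feature word a random initialization vector of a preset dimension. The function names, the English toy words, and the small uniform initialization range are assumptions of this sketch; the subsequent CBOW or skip-gram training is not shown.

```python
import random
from typing import Dict, List, Sequence


def char_ngrams(word: str, n: int) -> List[str]:
    """Slide a fixed-size character window over the word.

    Returns an empty list when the word is shorter than n.
    """
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def init_vectors(symptom_words: Sequence[str], n: int, dim: int,
                 seed: int = 0) -> Dict[str, List[float]]:
    """Create one initialization vector per symptom word and per N-gram feature word."""
    rng = random.Random(seed)
    vectors: Dict[str, List[float]] = {}
    for word in symptom_words:
        for token in [word] + char_ngrams(word, n):
            if token not in vectors:
                # Small uniform range: an assumed, commonly used initialization.
                vectors[token] = [rng.uniform(-0.5 / dim, 0.5 / dim)
                                  for _ in range(dim)]
    return vectors


# Hypothetical symptom words with 3-character windows and 4-dimensional vectors.
vectors = init_vectors(["fever", "headache"], n=3, dim=4)
```

Sharing N-gram vectors between words is what lets differently worded but overlapping symptom descriptions end up with related representations after training.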
In some embodiments of the present application, the intelligent recognition model of chinese medical syndrome comprises: a syndrome vector database for storing each syndrome vector and a similarity calculation formula;
correspondingly, the identifying according to the target traditional Chinese medicine medical record vector and the preset traditional Chinese medicine syndrome intelligent identification model to obtain the syndrome information corresponding to the target traditional Chinese medicine medical record comprises:
obtaining the similarity between the target traditional Chinese medicine medical record vector and each syndrome vector in the syndrome vector database;
selecting the largest of the similarities as a target similarity, and determining the syndrome vector corresponding to the target similarity as the syndrome vector corresponding to the target traditional Chinese medicine medical record vector;
and outputting the syndrome information corresponding to that syndrome vector.
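The similarity-based identification just described might look like the following sketch. The text only speaks of "a similarity calculation formula"; cosine similarity is used here as one plausible choice, and the syndrome names and vectors are hypothetical placeholders.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_syndrome(record_vector: Sequence[float],
                      syndrome_vectors: Dict[str, Sequence[float]]) -> str:
    """Return the syndrome whose vector is most similar to the medical record vector."""
    return max(syndrome_vectors,
               key=lambda name: cosine_similarity(record_vector,
                                                  syndrome_vectors[name]))


# Hypothetical syndrome vector database with two entries.
syndrome_db = {"syndrome_A": [0.9, 0.2], "syndrome_B": [0.1, 1.0]}
result = identify_syndrome([1.0, 0.1], syndrome_db)
```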
In some embodiments of the present application, before the identifying according to the target traditional Chinese medicine medical record vector and the preset traditional Chinese medicine syndrome intelligent identification model to obtain the syndrome information corresponding to the target traditional Chinese medicine medical record, the method further includes:
classifying the historical traditional Chinese medicine medical record document data by preset syndrome, and acquiring the symptom words corresponding to each piece of historical traditional Chinese medicine medical record document data;
matching in the word vector database to obtain the word vector corresponding to each symptom word;
adding the word vectors corresponding to each piece of historical traditional Chinese medicine medical record document data to obtain the medical record vector corresponding to that piece of document data;
averaging the medical record vectors classified under each syndrome to obtain an initial syndrome vector corresponding to each syndrome;
obtaining the Euclidean distance or cosine similarity between each medical record vector under each syndrome and the corresponding initial syndrome vector;
sorting the medical record vectors under each syndrome in ascending order of the Euclidean distance or cosine similarity to obtain a first medical record vector sorting sequence corresponding to each syndrome;
deleting a preset percentage of medical record vectors from the tail of each first medical record vector sorting sequence to form a second medical record vector sorting sequence corresponding to each syndrome;
averaging the medical record vectors in the second medical record vector sorting sequence corresponding to each syndrome to obtain the syndrome vector corresponding to each syndrome, and storing each syndrome vector in the syndrome vector database.
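For a single syndrome, the averaging, ranking, trimming, and re-averaging steps above can be sketched as below. The sketch ranks by Euclidean distance only; the function names, the toy vectors, and the rounding-down of the number of trimmed vectors are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def trimmed_syndrome_vector(record_vectors: Sequence[Sequence[float]],
                            trim_fraction: float) -> List[float]:
    """Drop the records farthest from the initial mean, then average the rest.

    trim_fraction must be below 1 so that at least one record remains.
    """
    initial = mean_vector(record_vectors)
    # Ascending distance: the farthest (least typical) records end up at the tail.
    ranked = sorted(record_vectors, key=lambda v: euclidean(v, initial))
    n_drop = int(len(ranked) * trim_fraction)
    kept = ranked[:len(ranked) - n_drop]
    return mean_vector(kept)


# Hypothetical medical record vectors for one syndrome; the last one is an outlier.
records = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [10.0, 10.0]]
syndrome_vector = trimmed_syndrome_vector(records, trim_fraction=0.25)
```

Trimming before the final average keeps atypical or mislabeled records from pulling the syndrome vector away from the bulk of its class.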
In some embodiments of the present application, the traditional Chinese medicine syndrome intelligent identification model comprises: a Huffman tree whose leaf nodes are syndromes;
correspondingly, the identifying according to the target traditional Chinese medicine medical record vector and the preset traditional Chinese medicine syndrome intelligent identification model to obtain the syndrome information corresponding to the target traditional Chinese medicine medical record comprises:
inputting the target traditional Chinese medicine medical record vector into the Huffman tree whose leaf nodes are syndromes, so that the Huffman tree outputs the syndrome information corresponding to the target node.
In some embodiments of the present application, before the inputting the target traditional Chinese medicine medical record vector into the Huffman tree whose leaf nodes are syndromes, the method further comprises:
classifying the historical traditional Chinese medicine medical record document data by preset syndrome, and acquiring the symptom words corresponding to each piece of historical traditional Chinese medicine medical record document data;
matching in the word vector database to obtain the word vector corresponding to each symptom word, so as to obtain a corresponding training data set;
adding the word vectors corresponding to each piece of historical traditional Chinese medicine medical record document data to obtain the medical record vector corresponding to that piece of document data;
and constructing the Huffman tree whose leaf nodes are syndromes according to the frequency with which each syndrome appears in the training data set, and iteratively updating the Huffman tree based on each medical record vector.
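The construction of a Huffman tree from syndrome frequencies can be illustrated with the standard greedy algorithm, sketched below with Python's `heapq`. Only the tree construction and the resulting root-to-leaf codes are shown; the iterative update of the tree's parameters from medical record vectors is not covered, and the syndrome labels are hypothetical.

```python
import heapq
import itertools
from collections import Counter
from typing import Dict, Sequence


def build_huffman_codes(labels: Sequence[str]) -> Dict[str, str]:
    """Build a Huffman tree over syndrome labels and return each leaf's binary code."""
    freq = Counter(labels)
    tie = itertools.count()  # tie-breaker so heap entries never compare nodes
    heap = [(count, next(tie), name) for name, count in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two least frequent nodes into a new internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    root = heap[0][2]

    codes: Dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):   # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf node: a syndrome label
            codes[node] = prefix

    walk(root, "")
    return codes


# Hypothetical training labels: syndrome "A" is the most frequent.
codes = build_huffman_codes(["A"] * 5 + ["B"] * 2 + ["C"])
```

Frequent syndromes sit closer to the root and get shorter codes, so fewer internal-node decisions are needed to reach them.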
Another aspect of the present application provides a traditional Chinese medicine syndrome intelligent identification device, comprising:
a word vector matching module, configured to obtain, for each target symptom word corresponding to a target traditional Chinese medicine medical record, the corresponding target word vector by matching in a word vector database that stores the correspondence between symptom words and word vectors, wherein the symptom words are generated in advance by performing dictionary matching and text word segmentation on historical traditional Chinese medicine medical record document data; and each word vector is generated in advance by segmenting each symptom word using N-gram language features and then training based on a CBOW or skip-gram word vector neural network model;
a medical record vector generating module, configured to generate a target traditional Chinese medicine medical record vector corresponding to each target word vector;
and a syndrome identification module, configured to identify and obtain the syndrome information corresponding to the target traditional Chinese medicine medical record according to the target traditional Chinese medicine medical record vector and a preset traditional Chinese medicine syndrome intelligent identification model.
In some embodiments of the present application, further comprising: a word vector database construction module;
the word vector database construction module is used for executing the following contents:
acquiring data of a plurality of historical traditional Chinese medicine medical record documents;
performing dictionary matching and text word segmentation processing on the historical traditional Chinese medicine medical record document data respectively to obtain corresponding word segmentation results of the historical traditional Chinese medicine medical record containing various symptom words;
segmenting the word segmentation result of the historical traditional Chinese medicine medical record based on N-gram language features to obtain N-gram feature words corresponding to the symptom words respectively;
performing vector initialization operation on each symptom word and each N-gram feature word according to preset word vector dimensions to obtain initialization vectors corresponding to each symptom word and each N-gram feature word respectively;
model training is carried out on the initialization vector based on a CBOW or skip-gram word vector neural network model to obtain word vectors corresponding to each symptom word and each N-gram feature word, and the corresponding relation between each symptom word and each N-gram feature word and each word vector is stored in a word vector database.
Another aspect of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the intelligent recognition method for chinese medical syndrome when executing the computer program.
Another aspect of the present application provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the intelligent recognition method of chinese medical syndrome.
According to the intelligent traditional Chinese medicine syndrome identification method provided by the present application, for each target symptom word corresponding to a target traditional Chinese medicine medical record, the corresponding target word vector is obtained by matching in a word vector database that stores the correspondence between symptom words and word vectors, wherein the symptom words are generated in advance by performing dictionary matching and text word segmentation on historical traditional Chinese medicine medical record document data; each word vector is generated in advance by segmenting each symptom word using N-gram language features and then training based on a CBOW or skip-gram word vector neural network model; a target traditional Chinese medicine medical record vector corresponding to each target word vector is generated; and the syndrome information corresponding to the target traditional Chinese medicine medical record is obtained by identification according to the target traditional Chinese medicine medical record vector and a preset traditional Chinese medicine syndrome intelligent identification model. Because the word segmentation results of the historical traditional Chinese medicine medical records are further segmented using N-gram language features before each symptom word is expressed as a word vector, the word vectors in the word vector database express the semantic context of each symptom word at a finer granularity, so that the semantic features in traditional Chinese medicine medical record documents can be fully extracted when the word vector database is used for intelligent syndrome identification. This addresses the low identification accuracy that existing intelligent traditional Chinese medicine syndrome identification methods suffer from when the number of samples is insufficient, and effectively improves the accuracy and effectiveness of traditional Chinese medicine syndrome classification and identification while keeping that classification and identification automated and intelligent. The application reliability of the identification results can be further improved, as can the user experience of doctors, patients, and others who apply the method.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present application are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present application will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, are incorporated in and constitute a part of this application, and are not intended to limit the application. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the application. For purposes of illustrating and describing certain portions of the present application, the drawings may have been enlarged, i.e., may be larger, relative to other features of the exemplary devices actually made in accordance with the present application. In the drawings:
Fig. 1 is a schematic flow chart of an intelligent traditional Chinese medicine syndrome identification method in an embodiment of the present application.
Fig. 2 is another schematic flow chart of an intelligent traditional Chinese medicine syndrome identification method in an embodiment of the present application.
Fig. 3 is a flowchart illustrating a specific process of step 300 in the intelligent traditional Chinese medicine syndrome identification method in an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an intelligent traditional Chinese medicine syndrome identification device in another embodiment of the present application.
Fig. 5 is another schematic structural diagram of an intelligent traditional Chinese medicine syndrome identification device in another embodiment of the present application.
Fig. 6 is a schematic processing procedure diagram of a word vector representation method provided in an application example of the present application.
Fig. 7 is an exemplary schematic diagram of a piece of historical medical record document data provided in an application example of the present application.
Fig. 8 is an exemplary diagram of correspondence relationship between physician-medical record-symptom combination provided in the application example of the present application.
Fig. 9 is a diagram illustrating a process of updating a syndrome vector according to an exemplary application of the present application.
Fig. 10 is a schematic diagram illustrating an exemplary CBOW model provided in an application example of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present application are provided to explain the present application and not to limit the present application.
Here, it should be further noted that, in order to avoid obscuring the present application with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present application are shown in the drawings, and other details not so relevant to the present application are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted that, unless otherwise specified, the term "coupled" is used herein to refer not only to a direct connection, but also to an indirect connection with an intermediate.
Hereinafter, embodiments of the present application will be described with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar parts, or the same or similar steps.
The traditional approach to classifying or identifying traditional Chinese medicine syndromes has the following limitations:
(1) Syndrome differentiation and prescription depend heavily on the experience of the traditional Chinese medicine practitioner. Compared with more experienced physicians, less experienced physicians may have difficulty determining the most appropriate clinical manifestations for the patient. However, experienced traditional Chinese medicine physicians are scarce, which limits traditional Chinese medicine resources, so some form of automatic syndrome identification is needed for assistance.
(2) The language used to describe symptoms lacks a uniform standard. That is, the symptoms that different physicians collect and record when examining and questioning highly similar patients may be worded differently. For example, one physician may record "fever" while another records the same symptom using a different but synonymous term; likewise, "soreness of limbs" may be written down in different wordings by different physicians. Such differing descriptions of highly similar symptoms make existing automatic identification of traditional Chinese medicine syndromes difficult.
Existing automatic traditional Chinese medicine syndrome identification methods include the following modes:
mode 1: classifying the traditional Chinese medicine syndromes of cardiovascular diseases using a support vector machine and a neural network, with an accuracy of over 60 percent;
mode 2: classifying the established symptoms of patients with consumptive lung disease together with expert-group symptom classification data using a Bayesian statistical algorithm, and taking the ten highest-scoring symptoms as the symptom classification rule, thereby improving the robustness of the syndrome differentiation rule;
mode 3: extracting the text features of symptoms with a BERT model, and reintegrating multi-scale semantic features by means such as pyramid pooling to construct a syndrome classification model.
However, modes 1 and 2 each target only a particular sub-field, and when traditional statistical machine learning algorithms face a limited number of samples, the generalization ability of the resulting syndrome classification model is weak and the prediction accuracy drops. Mode 3 uses a BERT model of higher algorithmic complexity, which improves identification accuracy but lowers identification efficiency in real scenarios, which in turn affects the application reliability of the traditional Chinese medicine syndrome identification results.
Therefore, the method for intelligently identifying the traditional Chinese medicine symptoms can fully extract semantic features in a traditional Chinese medicine case, and can effectively improve the accuracy and the effectiveness of the traditional Chinese medicine symptoms classification identification on the basis of ensuring the automation and the intellectualization of the traditional Chinese medicine symptoms classification identification.
In one or more embodiments of the present application, a traditional Chinese medicine syndrome may be referred to simply as a syndrome. A syndrome is the general term in traditional Chinese medicine for a series of interrelated symptoms; it reflects, at the level of the whole body, the reaction state of the body and its movement and change during the course of disease, as learned through the four diagnostic methods of inspection, listening and smelling, inquiry, and palpation.
The details are explained by the following examples.
In order to effectively improve the accuracy and effectiveness of the classification and identification of the traditional Chinese medicine syndromes on the basis of ensuring the automation and the intelligence of the classification and identification of the traditional Chinese medicine syndromes, the embodiment of the application provides an intelligent identification method of the traditional Chinese medicine syndromes, and referring to fig. 1, the intelligent identification method of the traditional Chinese medicine syndromes, which can be executed by an intelligent identification device of the traditional Chinese medicine syndromes, specifically comprises the following contents:
Step 100: for each target symptom word corresponding to a target traditional Chinese medicine medical record, obtaining the corresponding target word vector by matching in a word vector database that stores the correspondence between symptom words and word vectors, wherein the symptom words are generated in advance by performing dictionary matching and text word segmentation on historical traditional Chinese medicine medical record document data; and each word vector is generated in advance by segmenting each symptom word using N-gram language features and then training based on a CBOW or skip-gram word vector neural network model.
In one or more embodiments of the present application, the target traditional Chinese medicine medical record refers to the medical record data currently to be subjected to traditional Chinese medicine syndrome identification; the target symptom words refer to the symptom words identified from the target traditional Chinese medicine medical record; and a target word vector is the word vector obtained by matching in the word vector database for a target symptom word of the target traditional Chinese medicine medical record.
In step 100, the intelligent traditional Chinese medicine syndrome identification device may receive image data of a target medical record document from a client device or the like, recognize the image data by OCR character recognition or the like to obtain the corresponding text data, pre-process the text data, and extract each target symptom word from the text data based on a preset traditional Chinese medicine syndrome word bank.
It is understood that the target or historical traditional Chinese medicine medical record may include at least one of Western medicine diagnosis, clinical manifestations, treatment, and prescription. To further improve the accuracy of intelligent traditional Chinese medicine syndrome identification, in one specific implementation the data of a target or historical traditional Chinese medicine medical record includes five types of data: Western medicine diagnosis, traditional Chinese medicine diagnosis, clinical manifestations, treatment, and prescription.
In one or more embodiments of the present application, the model that introduces N-gram language features on the basis of CBOW or skip-gram is a language model (LM) built on a neural network and a Huffman tree in natural language processing, and this language model is a probability-based discriminative model.
In step 100, the dictionary matching refers to performing symptom professional word matching on the historical medical record document data in preset dictionary data, and using the matched data as a symptom data matching result corresponding to the historical medical record document data.
Based on this, examples of performing dictionary matching and text word segmentation on the historical traditional Chinese medicine medical record document data in advance are as follows: the method can be used for establishing a symptom professional vocabulary table in the traditional Chinese medicine field as dictionary data in advance, firstly matching symptom professional words from the data of the historical traditional Chinese medicine medical record to obtain a symptom data matching result, then performing word segmentation on the matching result to obtain each symptom word, and taking each obtained symptom word as a word segmentation result of the historical traditional Chinese medicine medical record.
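The dictionary matching described above can be illustrated with a minimal Python sketch. It assumes forward maximum matching (at each position the longest dictionary entry wins), which the text does not specify; the function name `match_symptom_words`, the sample vocabulary and the sample record text are all hypothetical.

```python
def match_symptom_words(text, dictionary):
    """Extract symptom words by forward maximum matching (an assumed strategy)."""
    max_len = max((len(word) for word in dictionary), default=0)
    found = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                found.append(candidate)
                i += size
                break
        else:
            i += 1  # no dictionary entry starts here; move one character on
    return found


# Hypothetical symptom vocabulary and medical record text, for illustration only.
symptom_dictionary = {"头痛", "恶寒", "咳嗽", "咳"}
record_text = "患者恶寒，头痛，咳嗽三日"
print(match_symptom_words(record_text, symptom_dictionary))
```

Because the longest entry is tried first, "咳嗽" is returned as one symptom word rather than being cut at the shorter entry "咳".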
In one or more implementations of the present application, the N-gram language features refer not to an N-gram language model but to a word-cutting means based on N-grams. An application example is as follows: for each symptom word in the word segmentation result of the historical traditional Chinese medicine medical record, the symptom word is cut with a fixed-size character-level sliding window to obtain the N-gram feature words corresponding to that symptom word, and the initialization vectors corresponding to each symptom word and each N-gram feature word are then obtained.
Step 200: generating the target traditional Chinese medicine medical record vector corresponding to the target word vectors.
It can be understood that the target traditional Chinese medicine medical record vector is generated from the target word vectors. Specifically, it may be obtained by directly summing the target word vectors, or by a weighted summation of the target word vectors based on a preset weight value corresponding to each word vector, and so on; any manner of accumulating or superimposing vectors is applicable to step 200 in the embodiments of the present application. In a preferred example of step 200, the target traditional Chinese medicine medical record vector P is obtained by directly summing the target word vectors, which effectively reduces the computational complexity of the target traditional Chinese medicine medical record vector while ensuring its application reliability.
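As a minimal sketch of the two accumulation options mentioned above (direct summation and weighted summation), the following Python function builds a medical record vector from a list of word vectors. The function name `record_vector` and the sample values are hypothetical; how the preset weights would be chosen is not stated in the text.

```python
def record_vector(word_vectors, weights=None):
    """Sum word vectors component-wise; with weights, compute a weighted sum."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    dim = len(word_vectors[0])
    if weights is None:
        weights = [1.0] * len(word_vectors)  # direct summation
    total = [0.0] * dim
    for vec, weight in zip(word_vectors, weights):
        for k in range(dim):
            total[k] += weight * vec[k]
    return total


# Hypothetical 2-dimensional target word vectors.
vectors = [[1.0, 2.0], [3.0, 4.0]]
print(record_vector(vectors))               # direct sum
print(record_vector(vectors, [0.5, 2.0]))   # weighted sum
```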
Step 300: and identifying according to the target traditional Chinese medicine medical record vector and a preset traditional Chinese medicine syndrome intelligent identification model to obtain syndrome information corresponding to the target traditional Chinese medicine medical record.
In step 300, the intelligent recognition device of traditional Chinese medicine syndrome can input the target traditional Chinese medicine medical case vector into a preset machine learning model or other data structures for intelligent recognition of traditional Chinese medicine syndrome, so as to output the syndrome vector corresponding to the target traditional Chinese medicine medical case by using the machine learning model or other data structures for intelligent recognition of traditional Chinese medicine syndrome, and take the syndrome text information corresponding to the syndrome vector as the intelligent recognition result of traditional Chinese medicine syndrome.
From the above description, it can be seen that the intelligent recognition method for traditional Chinese medicine symptoms provided in the embodiments of the present application segments the word segmentation results of the historical traditional Chinese medicine cases by using N-gram linguistic features, and represents each symptom word by using a word vector, so that the word vector of the word vector database can more finely express the semantic environment thereof, and further, when the word vector database is used for intelligent recognition of traditional Chinese medicine symptoms, the semantic features in the traditional Chinese medicine case documents can be fully extracted, the problem of low recognition accuracy of the existing intelligent recognition method for traditional Chinese medicine symptoms due to insufficient sample number can be solved, the accuracy and effectiveness of the classification recognition of traditional Chinese medicine symptoms can be effectively improved on the basis of ensuring the automation and intellectualization of the classification recognition of traditional Chinese medicine symptoms, and further, the application reliability of the recognition results of traditional Chinese medicine symptoms can be improved, and the user experience of doctors, patients and the like using the intelligent recognition method for traditional Chinese medicine symptoms can be improved.
In order to further improve the accuracy of performing intelligent recognition of chinese medical syndrome by using a word vector database, in the intelligent recognition method of chinese medical syndrome provided in the embodiment of the present application, referring to fig. 2, the following contents are also specifically included before step 100 in the intelligent recognition method of chinese medical syndrome:
step 010: acquiring data of a plurality of historical traditional Chinese medicine medical records.
In step 010, document data of a large-scale medical record of traditional Chinese medicine is obtained, wherein the medical record document comprises western medicine diagnosis, traditional Chinese medicine diagnosis, clinical manifestation, treatment and prescription data.
Step 020: and performing dictionary matching and text word segmentation processing on the historical traditional Chinese medicine medical record document data respectively to obtain corresponding word segmentation results of the historical traditional Chinese medicine medical record containing various symptom words.
Step 030: segmenting the word segmentation result of the historical traditional Chinese medicine medical record based on the N-gram language characteristics to obtain N-gram characteristic words corresponding to the symptom words respectively.
Step 040: and performing vector initialization operation on each symptom word and each N-gram feature word according to a preset word vector dimension to obtain initialization vectors corresponding to each symptom word and each N-gram feature word respectively.
Step 050: model training is carried out on the initialization vector based on a CBOW or skip-gram word vector neural network model to obtain word vectors corresponding to each symptom word and each N-gram feature word, and the corresponding relation between each symptom word and each N-gram feature word and each word vector is stored in a word vector database.
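Steps 030 and 040 can be illustrated with a minimal Python sketch: each symptom word is cut with a fixed-size character-level sliding window, and every symptom word and N-gram feature word receives an initialization vector of the preset dimension. The function names, the window size and the uniform initialization range are assumptions made for illustration; the text does not specify an initialization distribution.

```python
import random


def char_ngrams(word, n):
    """Cut a word with a character-level sliding window of fixed size n."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def init_vectors(symptom_words, n, dim, seed=0):
    """Assign a random initialization vector to each symptom word and N-gram."""
    rng = random.Random(seed)
    vocab = {}
    for word in symptom_words:
        for token in [word] + char_ngrams(word, n):
            if token not in vocab:
                # Small uniform range; an assumed, commonly used choice.
                vocab[token] = [rng.uniform(-0.5 / dim, 0.5 / dim)
                                for _ in range(dim)]
    return vocab


# Hypothetical symptom words with a window of one character and d = 4.
vocab = init_vectors(["头痛", "咳嗽"], n=1, dim=4)
print(sorted(vocab))
```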
In step 050, the word vector representation may adopt the CBOW or the Skip-gram word vector representation in Word2Vec. Word2Vec is a class of neural network models that, given an unlabeled corpus, generates for each word in the corpus a vector capable of expressing its semantics. CBOW refers to the continuous bag-of-words model; both the CBOW and the Skip-gram word vector neural network models are methods for the vector representation of text in Word2Vec. During training, each word in turn serves as the central word and the word vectors of the surrounding words are adjusted, so that the word vectors of all words in the whole text are obtained.
The Skip-gram word vector neural network model performs more predictions than CBOW, because each word, when it serves as the central word, is used to predict each of its surrounding words in turn. This amounts to K times as many predictions as the CBOW approach (where K is the window size), so the time complexity is O(KV) and training takes longer than with CBOW. Therefore, in order to further improve the efficiency of word vector representation, and thereby the efficiency of the intelligent traditional Chinese medicine syndrome recognition process, the CBOW word vector representation may be selected as a specific implementation of the word vector representation in step 050 of the method provided in the embodiments of the present application, in which case step 050 specifically includes the following contents:
Step 051: initializing, with the preset word vector dimension, the word vectors of the symptom words and of their N-gram feature words;
Step 052: according to a preset fixed window and the CBOW word vector training mode, masking the middle word and predicting it from the context within the window; traversing the initialized word vectors of the words appearing in the context and accumulating these vectors of dimension V to obtain a new vector, which is called the projection layer.
The projection-layer vector of dimension V is then fed into a binary tree (Huffman tree) whose leaf nodes are all the known words in the word bank. Each non-leaf node carries a Sigmoid function, whose value is computed from the dot product of the input vector and the parameter vector theta. The left subtree is selected if the Sigmoid value is greater than or equal to 0.5, and the right subtree is selected if it is less than 0.5. In this way the input vector of dimension V is recursively directed to a leaf node, i.e. the middle word to be predicted. This binary tree is called the output layer.
And then outputting a target word w according to the given context (w), determining a maximum likelihood function, iteratively updating a node parameter vector theta and a word vector X by using a gradient ascending process until the gradient is converged, and finally obtaining all word vectors to form a word vector database.
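The CBOW training step described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the Sigmoid value at each node is treated as the probability of taking the left subtree, the path to the target word is assumed to be known from the Huffman tree, and the names `cbow_hs_step`, `path`, `thetas` and the learning rate `lr` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cbow_hs_step(context_vecs, path, thetas, lr=0.1):
    """One gradient-ascent step on the log-likelihood of a target word.

    context_vecs: context word vectors (updated in place).
    path: (node_id, go_left) pairs from the root to the target leaf.
    thetas: node_id -> parameter vector theta (updated in place).
    Returns the log-likelihood measured before the update.
    """
    dim = len(context_vecs[0])
    # Projection layer: component-wise sum of the context word vectors.
    z = [sum(vec[k] for vec in context_vecs) for k in range(dim)]
    err = [0.0] * dim
    log_lik = 0.0
    for node, go_left in path:
        theta = thetas[node]
        p_left = sigmoid(sum(z[k] * theta[k] for k in range(dim)))
        log_lik += math.log(p_left if go_left else 1.0 - p_left)
        g = lr * ((1.0 if go_left else 0.0) - p_left)
        for k in range(dim):
            err[k] += g * theta[k]  # uses theta before it is updated
            theta[k] += g * z[k]
    # Propagate the accumulated gradient to every context word vector.
    for vec in context_vecs:
        for k in range(dim):
            vec[k] += err[k]
    return log_lik


# Hypothetical two-word context and a two-node path (left, then right).
context = [[0.1, -0.2], [0.05, 0.3]]
params = {0: [0.0, 0.0], 1: [0.0, 0.0]}
route = [(0, True), (1, False)]
for step in range(3):
    print(step, cbow_hs_step(context, route, params))
```

Repeating the step raises the log-likelihood of the target word, which is the behaviour the gradient ascent process in the text relies on.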
From the above description, it can be known that the intelligent recognition method for the traditional Chinese medicine syndrome provided in the embodiments of the present application can effectively improve the efficiency of the expression of the word vector in the traditional Chinese medicine on the aspect of ensuring the accuracy of the expression result of the word vector by adopting the CBOW word vector expression mode.
In another example of step 050, a Skip-gram word vector neural network model may be selected to perform model training as a specific implementation of the word vector representation in step 050, and therefore step 050 in the method for intelligent recognition of chinese medical syndrome may further include the following steps:
Step 053: initializing, with the preset word vector dimension, the word vectors of the symptom words and of their N-gram feature words;
Step 054: according to a preset fixed window and the Skip-gram word vector training mode, masking the words other than the middle word and predicting the context words from the middle word within the window; traversing the initialized word vectors of the words appearing in the context, traversing each combination formed by a context word and the middle word, and taking each context word vector as the projection layer.
The projection-layer vector of dimension V is then fed into a binary tree (Huffman tree) whose leaf nodes are all the known words in the word bank. Each non-leaf node carries a Sigmoid function, whose value is computed from the dot product of the input vector and the parameter vector theta. The left subtree is selected if the Sigmoid value is greater than or equal to 0.5, and the right subtree is selected if it is less than 0.5. In this way the input vector of dimension V is recursively directed to a leaf node, i.e. the corresponding word to be predicted. This binary tree is called the output layer.
And then outputting a target word w according to the given context (w), determining a maximum likelihood function, iteratively updating a node parameter vector theta and a word vector X by using a gradient ascending process until the gradient is converged, and finally obtaining all word vectors to form a word vector database.
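The way Skip-gram traverses each combination of a middle word and a context word can be sketched as follows. The function name `skipgram_pairs` and the sample word sequence are hypothetical, and `window` is taken here to mean the number of words considered on each side of the middle word, which is an assumption about how the fixed window is defined.

```python
def skipgram_pairs(tokens, window):
    """List every (middle word, context word) combination inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical segmented medical record.
tokens = ["恶寒", "头痛", "咳嗽"]
for pair in skipgram_pairs(tokens, window=1):
    print(pair)
```

Each pair is a separate prediction, which is why Skip-gram performs more predictions per window than CBOW, where one window yields a single prediction of the middle word.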
In order to further solve the problems of high computational complexity and low efficiency commonly existing in the existing automatic recognition mode of traditional Chinese medicine syndrome, in the intelligent recognition method of traditional Chinese medicine syndrome provided by the embodiment of the application, the intelligent recognition model of traditional Chinese medicine syndrome comprises: a syndrome vector database for storing each syndrome vector and a similarity calculation formula; referring to fig. 3, one implementation manner of step 300 in the intelligent recognition method for chinese medical syndrome specifically includes the following contents:
Step 310: obtaining the similarity between the target traditional Chinese medicine medical record vector and each syndrome vector in the syndrome vector database.
It can be understood that the similarity between the target traditional Chinese medicine medical record vector and each syndrome vector in the syndrome vector database may be computed with a similarity calculation formula, for which the cosine similarity may be selected:

$$\mathrm{sim}(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

where $A_i$ represents the i-th component of the target traditional Chinese medicine medical record vector and $B_i$ represents the i-th component of the syndrome vector.
Step 320: and selecting one with the largest value from the similarity as a target similarity, and determining a syndrome vector corresponding to the target similarity as a syndrome vector corresponding to the target Chinese medical record vector.
Step 330: and outputting the symptom information corresponding to the symptom vector.
As can be seen from the above description, the intelligent recognition method for traditional Chinese medicine syndrome provided in the embodiments of the present application can effectively reduce the computational complexity of intelligent recognition for traditional Chinese medicine syndrome and improve the recognition efficiency by using the similarity between each syndrome vector in the syndrome vector database.
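Steps 310 to 330 can be sketched in Python as a nearest-neighbour lookup by cosine similarity. The function names, the syndrome names and the vector values are hypothetical, and the syndrome vector database is represented here simply as a dictionary, which is an assumption made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_syndrome(record_vec, syndrome_vectors):
    """Return the syndrome whose vector is most similar to the record vector."""
    return max(syndrome_vectors,
               key=lambda name: cosine_similarity(record_vec,
                                                  syndrome_vectors[name]))


# Hypothetical syndrome vector database and target medical record vector.
syndrome_db = {"wind-cold": [1.0, 0.0], "wind-heat": [0.0, 1.0]}
print(identify_syndrome([0.9, 0.1], syndrome_db))
```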
In order to improve the application reliability and effectiveness of the syndrome vector database for storing each syndrome vector, in the method for intelligently identifying a chinese medical syndrome provided in the embodiment of the present application, the following contents are further included before step 310 in the method for intelligently identifying a chinese medical syndrome:
Step 061: classifying the historical traditional Chinese medicine medical record document data under each preset syndrome, and acquiring the symptom words corresponding to each piece of historical traditional Chinese medicine medical record document data;
Step 062: matching in the word vector database to obtain the word vector corresponding to each symptom word;
Step 063: adding the word vectors corresponding to each piece of historical traditional Chinese medicine medical record document data to obtain the medical record vector corresponding to that data;
Step 064: obtaining the average value of the medical record vectors classified under each syndrome to obtain an initial syndrome vector corresponding to each syndrome;
Step 065: obtaining the Euclidean distance or cosine similarity between each medical record vector under each syndrome and the corresponding initial syndrome vector;
Step 066: sorting the medical record vectors under each syndrome in ascending order of Euclidean distance or cosine similarity to obtain a first medical record vector sequence corresponding to each syndrome;
Step 067: deleting a preset percentage of medical record vectors from the tail of each first medical record vector sequence to form a second medical record vector sequence corresponding to each syndrome;
Step 068: obtaining the average value of the medical record vectors in the second medical record vector sequence corresponding to each syndrome to obtain the syndrome vector corresponding to each syndrome, and storing each syndrome vector in the syndrome vector database.
In one example, the preset percentage may be set according to specific application requirements and may be selected between 1% and 10%; for example, it may be 5%.
Specifically, the medical records are classified by syndrome, and all symptoms of each medical record are traversed; for each symptom, the word vector database is searched to obtain the word vector representation of the symptom. All the word vectors under each medical record are added to represent the medical record. All medical record vectors under a syndrome are summed and averaged to represent the syndrome. The medical records under each syndrome are sorted in ascending order of their Euclidean distance from the center point, and the medical records in the last 5% of the sequence are removed. The center point is then recalculated from the remaining 95% of the medical records, the syndrome vector is updated to this center point, and the updated syndrome vector is stored locally.
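The trimmed-centroid procedure for one syndrome can be sketched in Python as follows, using the Euclidean distance variant. The function names and sample data are hypothetical, and the trimming amount is passed as an integer percentage so that the number of removed records is computed with integer arithmetic.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(vec[k] for vec in vectors) / len(vectors) for k in range(dim)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def syndrome_vector(record_vectors, trim_percent=5):
    """Average the record vectors after dropping the farthest trim_percent."""
    center = mean_vector(record_vectors)              # initial syndrome vector
    ranked = sorted(record_vectors, key=lambda vec: euclidean(vec, center))
    n_drop = len(ranked) * trim_percent // 100        # records cut from the tail
    kept = ranked[:len(ranked) - n_drop]
    return mean_vector(kept)                          # recomputed center point


# Hypothetical medical record vectors under one syndrome, with one outlier.
records = [[1.0, 1.0]] * 19 + [[101.0, 1.0]]
print(syndrome_vector(records, trim_percent=5))
```

With 20 records and a 5% trim, the single outlier is removed before the center point is recomputed.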
In order to further solve the problems of high computational complexity and low efficiency commonly existing in the existing automatic recognition mode of traditional Chinese medicine syndrome, in the intelligent recognition method of traditional Chinese medicine syndrome provided by the embodiment of the application, the intelligent recognition model of traditional Chinese medicine syndrome comprises: the leaf nodes are a Huffman tree of the syndrome vector; another implementation manner of step 300 in the intelligent recognition method of traditional Chinese medicine syndrome specifically includes the following contents:
step 340: inputting the target Chinese medical record vector into a Huffman tree with leaf nodes as syndrome vectors so that the Huffman tree outputs syndrome information corresponding to the target nodes.
As can be seen from the above description, the intelligent recognition method for traditional Chinese medicine syndromes provided in the embodiments of the present application can effectively reduce the computational complexity of intelligent recognition for traditional Chinese medicine syndromes and improve the recognition efficiency by using the huffman tree with the leaf nodes as the syndrome vectors.
In order to improve the application reliability and effectiveness of the huffman tree with leaf nodes as syndrome vectors, in the method for intelligently identifying a chinese medical syndrome provided in the embodiment of the present application, the following contents are further included before step 340 in the method for intelligently identifying a chinese medical syndrome:
Step 071: classifying the historical traditional Chinese medicine medical record document data under each preset syndrome, and acquiring the symptom words corresponding to each piece of historical traditional Chinese medicine medical record document data;
Step 072: matching in the word vector database to obtain the word vector corresponding to each symptom word, so as to obtain a corresponding training data set;
Step 073: adding the word vectors corresponding to each piece of historical traditional Chinese medicine medical record document data to obtain the medical record vector corresponding to that data;
Step 074: constructing a Huffman tree whose leaf nodes are syndromes according to the frequency with which each syndrome appears in the training data set, and iteratively updating the Huffman tree based on each medical record vector.
Specifically, training data of physician-medical record-symptom combinations may first be established. All symptoms under each medical record are traversed, and each symptom is matched in the trained word vector database to obtain its vectorized representation; all the word vectors are then added. A Huffman tree is constructed according to the frequency with which the syndromes occur in the training samples. The sum of the word vectors is input into the Huffman tree, a likelihood function is constructed according to the target syndrome, and, following the probability maximization principle, a gradient ascent method is used to iteratively update the Huffman tree parameter vector theta and the input word vector X.
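The frequency-based construction of the Huffman tree can be sketched in Python with the standard `heapq` module. The sketch only builds the tree and returns the path code of each leaf ("0" for the left child, "1" for the right child, an assumed convention); the trainable parameter vectors at the non-leaf nodes are omitted. The function name and the syndrome frequencies are hypothetical.

```python
import heapq
from itertools import count


def build_huffman_codes(freqs):
    """Build a Huffman tree from leaf frequencies and return leaf -> path code."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tiebreak), name) for name, freq in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a syndrome name
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Hypothetical syndrome frequencies counted from a training data set.
print(build_huffman_codes({"wind-cold": 5, "wind-heat": 2, "phlegm-damp": 1}))
```

Frequent syndromes end up closer to the root, so fewer Sigmoid evaluations are needed to reach them.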
From the software aspect, the present application further provides an intelligent recognition apparatus for chinese medical syndrome, which is used for executing all or part of the intelligent recognition method for chinese medical syndrome, and referring to fig. 4, the intelligent recognition apparatus for chinese medical syndrome specifically includes the following contents:
a word vector matching module 10, configured to match, according to each target symptom word corresponding to a target medical record, a word vector database for storing a correspondence between each symptom word and each word vector to obtain a target word vector corresponding to each target symptom word, where each symptom word is generated by performing dictionary matching and text word segmentation on historical medical record data in advance; each word vector is generated by pre-adopting N-gram language characteristics to cut words of each symptom word and then training the words based on a CBOW or skip-gram word vector neural network model;
a medical record vector generation module 20, configured to generate the target traditional Chinese medicine medical record vector corresponding to the target word vectors;
and a syndrome identification module 30, configured to identify, according to the target traditional Chinese medicine medical record vector and a preset intelligent traditional Chinese medicine syndrome identification model, the syndrome information corresponding to the target traditional Chinese medicine medical record.
Referring to fig. 5, the intelligent recognition apparatus for chinese medical syndrome further specifically includes: a word vector database construction module 01;
the word vector database construction module 01 is configured to perform the following:
acquiring document data of a plurality of historical traditional Chinese medicine medical records;
performing dictionary matching and text word segmentation processing on the historical traditional Chinese medicine medical record document data respectively to obtain corresponding word segmentation results of the historical traditional Chinese medicine medical record containing various symptom words;
segmenting word segmentation results of the historical traditional Chinese medicine medical record based on N-gram language features to obtain N-gram feature words corresponding to the symptom words respectively;
performing vector initialization operation on each symptom word and each N-gram feature word according to preset word vector dimensions to obtain initialization vectors corresponding to each symptom word and each N-gram feature word respectively;
model training is carried out on the initialization vector based on a CBOW or skip-gram word vector neural network model to obtain word vectors corresponding to each symptom word and each N-gram feature word, and the corresponding relation between each symptom word and each N-gram feature word and each word vector is stored in a word vector database.
The embodiment of the intelligent recognition device for Chinese medicine syndrome provided by the application can be specifically used for executing the processing flow of the embodiment of the intelligent recognition method for Chinese medicine syndrome in the above embodiment, and the functions of the device are not repeated herein, and reference can be made to the detailed description of the embodiment of the intelligent recognition method for Chinese medicine syndrome.
The part of the intelligent Chinese medicine syndrome recognition device for performing intelligent Chinese medicine syndrome recognition can be executed in a server, and in another practical application situation, all the operations can be completed in a client device. The selection may be specifically performed according to the processing capability of the client device, the limitation of the user usage scenario, and the like. This is not a limitation of the present application. If all operations are completed in the client device, the client device may further include a processor for performing specific processing of intelligent recognition of chinese medical symptoms.
The client device may have a communication module (i.e., a communication unit) and may be communicatively connected to a remote server to implement data transmission with the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.
The server and the client device may communicate using any suitable network protocol, including a network protocol that has not been developed at the filing date of the present application. The network protocol may include, for example, a TCP/IP protocol, a UDP/IP protocol, an HTTP protocol, an HTTPS protocol, or the like. Of course, the network Protocol may also include, for example, an RPC Protocol (Remote Procedure Call Protocol), a REST Protocol (Representational State Transfer Protocol), and the like used above the above Protocol.
From the above description, it can be seen that the intelligent recognition device for traditional Chinese medicine syndrome provided in the embodiments of the present application segments the word segmentation result of the historical traditional Chinese medicine case by using the N-gram linguistic features, and performs word vector representation on each symptom word and its N-gram feature word, so that the word vector of the word vector database can more finely express the semantic environment thereof, and further, when the word vector database is used for intelligent recognition of traditional Chinese medicine syndrome, the semantic features in the traditional Chinese medicine case document can be fully extracted, the problem of recognition accuracy caused by insufficient sample number in the existing intelligent recognition method for traditional Chinese medicine syndrome can be solved, and the accuracy and effectiveness of the classification recognition of traditional Chinese medicine syndrome can be effectively improved on the basis of ensuring the automation and intelligence of the classification recognition of traditional Chinese medicine syndrome, and further, the application reliability of the recognition result of traditional Chinese medicine syndrome can be improved, and the user experience of doctors, patients and the like using the recognition method for traditional Chinese medicine syndrome can be improved.
In order to further explain the scheme, the application also provides a specific application example of the intelligent traditional Chinese medicine syndrome recognition method, and particularly relates to the training and using process of an intelligent traditional Chinese medicine syndrome recognition model based on a word vector technology.
The technical scheme adopted by the application example of the application comprises the following steps:
(I) Symptom word vector representation
Referring to FIG. 6, the symptom word vector representation method includes processing procedures such as symptom description, N-gram processing and initialization, wherein d is greater than or equal to 4, and:
[X_11, X_12, X_13, …, X_1d] refers to the d-dimensional code of "bad";
[X_21, X_22, X_23, …, X_2d] refers to the d-dimensional code of "chilliness";
[X_31, X_32, X_33, …, X_3d] refers to the d-dimensional code of "cold";
[X_41, X_42, X_43, …, X_4d] refers to the d-dimensional code of "cough";
[X_51, X_52, X_53, …, X_5d] refers to the d-dimensional code of "cough";
[X_61, X_62, X_63, …, X_6d] refers to the d-dimensional code of "cough and asthma";
[X_71, X_72, X_73, …, X_7d] refers to the d-dimensional code of "more than";
[X_81, X_82, X_83, …, X_8d] refers to the d-dimensional code of "header";
[X_91, X_92, X_93, …, X_9d] refers to the d-dimensional code of "headache";
[X_n1, X_n2, X_n3, …, X_nd] refers to the d-dimensional code corresponding to the n-th symptom word, "pain".
The concrete description is as follows:
Step 11: acquiring the document data of all historical medical records, each of which contains traditional Chinese medicine diagnosis, clinical manifestation, treatment and prescription; see fig. 7.
Step 12: performing dictionary matching on the medical record text, then segmenting the content, further cutting the segmentation result according to the specified N-gram method, and putting all results, including the symptoms and the N-gram feature words, into a word bank.
Step 13: initializing, with the preset word vector dimension, the word vectors of the symptoms and of their N-gram feature words.
Step 14: specifying a fixed-size context window over all medical record texts and, following the CBOW word vector training principle, predicting the middle word from the context words and maximizing its probability. Each window is traversed, the initialization vectors of all words other than the word to be predicted are obtained, and all word vectors of the context within the window are added to obtain an accumulated vector; see fig. 10 for the CBOW model.
Step 15: inputting the projection-layer vector from step 14 into a Huffman tree generated according to word frequency, acquiring the Huffman path of the word to be predicted in the window, determining the likelihood function that maximizes the probability, and iteratively updating the parameter vectors theta in the Huffman tree and the word vectors of the input layer by gradient ascent.
Specifically, the vectors of dimension V are accumulated to obtain a new vector, which is called the projection layer. The projection layer uses the formula:

$$Z = \sum_{i} X_i$$

where $X_i$ represents each context word vector and $Z$ represents the vector generated by accumulating the word vectors.
The new vector of dimension V is fed into a binary tree (Huffman tree) whose leaf nodes are all the words in the known vocabulary. Each non-leaf node carries a Sigmoid function, and the Sigmoid value is computed from the dot product of the input vector and the node's parameter vector theta. The left subtree is selected if the Sigmoid value is greater than or equal to 0.5, and the right subtree is selected if it is less than 0.5. Applying this rule recursively routes the input vector of dimension V to a leaf node, i.e. the corresponding middle word to be predicted. This binary tree is referred to as the output layer. The formula of the Sigmoid function is as follows:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The formula for selecting the left or right subtree using the Sigmoid function is as follows:
$$\text{branch}(Z, \theta) = \begin{cases} \text{left}, & \sigma(Z^{\top}\theta) \ge 0.5 \\ \text{right}, & \sigma(Z^{\top}\theta) < 0.5 \end{cases}$$
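The routing rule described above (left when the Sigmoid value is at least 0.5, right otherwise) can be sketched as a simple loop. The `Node` class and the tiny hand-built tree are hypothetical and only serve to show the traversal.

```python
import math

def sigmoid(x):
    """Numerically stable Sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

class Node:
    """Inner nodes hold a parameter vector theta; leaves hold a label."""
    def __init__(self, theta=None, left=None, right=None, label=None):
        self.theta = theta
        self.left = left
        self.right = right
        self.label = label

def descend(root, z):
    """Route the projection vector z from the root to a leaf and return its label."""
    node = root
    while node.label is None:
        score = sigmoid(sum(a * b for a, b in zip(z, node.theta)))
        node = node.left if score >= 0.5 else node.right
    return node.label

# Hypothetical three-leaf tree.
tree = Node(theta=[1.0, 0.0],
            left=Node(label="A"),
            right=Node(theta=[0.0, 1.0], left=Node(label="B"), right=Node(label="C")))
print(descend(tree, [1.0, 0.0]))
```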
Given the context Context(w), the target word w is output, the maximum likelihood function is determined, and the node parameter vectors theta and the word vectors X are iteratively updated by gradient ascent until convergence. All the word vectors finally obtained form a word vector database, which serves as the input of the next step.
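A minimal sketch of one gradient-ascent update along a Huffman path is given below. It assumes the standard hierarchical-softmax log-likelihood, in which going left at a node has probability sigmoid(Z·theta) and going right has probability 1 minus that value; the text does not spell out the likelihood, so this form, the learning rate, and the name `train_step` are assumptions.

```python
import math

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_step(context_vectors, path, lr=0.05):
    """One gradient-ascent step.

    path: list of (theta, go_left) pairs from the root to the target leaf.
    Updates each theta and each context vector in place and returns the
    log-likelihood measured before the update.
    """
    dim = len(context_vectors[0])
    z = [sum(x[k] for x in context_vectors) for k in range(dim)]
    grad_z = [0.0] * dim
    log_likelihood = 0.0
    for theta, go_left in path:
        q = sigmoid(dot(z, theta))
        log_likelihood += math.log(q if go_left else 1.0 - q)
        g = lr * ((1.0 if go_left else 0.0) - q)
        for k in range(dim):
            grad_z[k] += g * theta[k]  # uses theta before it is updated
            theta[k] += g * z[k]
    for x in context_vectors:
        for k in range(dim):
            x[k] += grad_z[k]
    return log_likelihood

context = [[0.1, 0.2], [0.3, -0.1]]
path = [([0.0, 0.0], True), ([0.0, 0.0], False)]
history = [train_step(context, path) for _ in range(50)]
print(history[0], history[-1])
```

Repeating the step on the same window should raise the log-likelihood, which is the behavior the iterative update relies on.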
(II) The first intelligent traditional Chinese medicine syndrome identification method: implemented using a syndrome vector database that stores the individual syndrome vectors, together with a similarity calculation formula
Step 21: Classify the medical cases by syndrome, traverse all symptoms of each medical case, and for each symptom, look it up in the word vector database to obtain the word vector representation of the symptom.
Specifically, referring to FIG. 8, the correspondences among physician, medical case and symptom combination are established. Each physician diagnoses a plurality of medical cases, each medical case corresponds to the symptom combination of one patient, each symptom combination comprises a plurality of symptoms, and each symptom combination corresponds to one syndrome.
While the correspondence between symptom combinations and syndromes is being established, the existing medical cases are cleaned, each symptom is extracted one by one, and the symptoms are matched against the word vector database from step (I) to obtain the word vector of each symptom.
Step 22: Add all symptom word vectors of each medical case to represent that case.
Specifically, for each medical case, the word vectors of its symptoms are accumulated to obtain a vector Z_i, which serves as the overall vector representation of all symptoms in that medical case.
Step 23: Sum the vectors of all medical cases under a syndrome and take the average to represent the syndrome.
Specifically, all medical cases under a syndrome are taken, and the overall symptom vectors Z_i of these cases are accumulated and averaged to obtain the center point of all medical case vectors under the syndrome, which is provisionally used as the syndrome vector. The center point of the medical case vectors is calculated as follows:
$$C = \frac{1}{m} \sum_{i=1}^{m} Z_i$$
where m represents the total number of medical cases recorded under the syndrome, and C represents the center-point vector, i.e. the syndrome vector.
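Steps 22 and 23 can be sketched as two small helpers: one that sums symptom word vectors into a case vector Z_i, and one that averages the case vectors into the provisional syndrome vector C. Skipping symptoms that are missing from the word vector database is an assumption of this sketch, as are the function names.

```python
def case_vector(symptoms, word_vectors):
    """Z_i: sum of the word vectors of all symptoms in one medical case."""
    dim = len(next(iter(word_vectors.values())))
    z = [0.0] * dim
    for symptom in symptoms:
        vec = word_vectors.get(symptom)
        if vec is None:
            continue  # assumption: symptoms not in the database are ignored
        for k in range(dim):
            z[k] += vec[k]
    return z

def syndrome_centroid(case_vectors):
    """C: mean of the m case vectors recorded under one syndrome."""
    m = len(case_vectors)
    dim = len(case_vectors[0])
    return [sum(z[k] for z in case_vectors) / m for k in range(dim)]

# Hypothetical two-dimensional word vectors.
word_vectors = {"chills": [1.0, 0.0], "headache": [0.0, 2.0]}
z1 = case_vector(["chills", "headache"], word_vectors)
z2 = case_vector(["chills"], word_vectors)
print(syndrome_centroid([z1, z2]))
```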
Step 24: The medical cases under each syndrome are sorted in ascending order of Euclidean distance from the center point, and the medical cases in the last 5% of the ordering (a value selectable in the range of 1% to 10%) are removed. The center point is then recalculated from the retained 95% of medical cases, as in the update process of the syndrome vector C shown in FIG. 9; the syndrome vector is updated to the new center point, and the updated syndrome vector is stored locally, where V_1 and V_2 are the word vectors corresponding to different symptom words.
Specifically, the Euclidean distance between each medical case and the center point under each syndrome is calculated one by one; the Euclidean distance is calculated as follows:
$$d(Z_i, C) = \sqrt{\sum_{k=1}^{V} \left(Z_{i,k} - C_k\right)^2}$$
where Z_{i,k} and C_k denote the k-th components of the medical case vector Z_i and of the center point C, and V is the vector dimension.
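The trimming of step 24 can be sketched as follows: rank the case vectors by Euclidean distance to the current center, drop the farthest fraction, and recompute the center from the rest. The function names and the rounding of the number of dropped cases (rounded down) are assumptions of this sketch.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def trimmed_centroid(case_vectors, drop_fraction=0.05):
    """Recompute the syndrome vector after removing the farthest cases."""
    center = mean_vector(case_vectors)
    ranked = sorted(case_vectors, key=lambda z: euclidean(z, center))
    # Assumption: the number of dropped cases is rounded down.
    n_drop = int(len(ranked) * drop_fraction + 1e-9)
    kept = ranked[:len(ranked) - n_drop]
    return mean_vector(kept)

# Nineteen identical cases plus one outlier.
cases = [[0.0, 0.0] for _ in range(19)] + [[100.0, 0.0]]
print(mean_vector(cases), trimmed_centroid(cases))
```

In this toy input the single outlier pulls the first center to [5.0, 0.0]; after trimming, the center returns to the cluster.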
step 25: a prediction stage: adding all the symptom word vectors of the input medical case, making similarity with each symptom expression vector, and taking the symptom with the highest similarity for output.
Specifically, after the model is trained, the prediction task can be executed. Scanning requires forehandAnd (4) the symptom descriptions of the measured medical records are obtained one by one, and the symptoms included in the symptom descriptions are recorded in the symptom bank to form a symptom combination. Using the symptom combination as prediction data, after N-gram language feature processing, removing word vector database to match word vectors of each symptom, superposing the word vectors to obtain P, calling all symptom vectors { C } from the database, calculating vector P of predicted medical case and output symptom vector C i Cosine similarity of (c). The cosine similarity is calculated as follows:
Figure DEST_PATH_IMAGE009
wherein, P represents the vector of the predicted medical case, and C represents the output symptom vector.
And sequencing the similarity of the predicted medical case and all symptoms, finding out the symptom result with the highest similarity, outputting the result and ending the use stage of the model.
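The prediction stage of the first method can be sketched as a cosine-similarity lookup. The word vectors, syndrome names, and function names below are hypothetical; returning 0.0 for a zero-length vector is an assumption made to avoid division by zero.

```python
import math

def cosine(p, c):
    """Cosine similarity between a case vector p and a syndrome vector c."""
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in c))
    if norm == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return sum(x * y for x, y in zip(p, c)) / norm

def predict_syndrome(symptoms, word_vectors, syndrome_vectors):
    """Sum the symptom word vectors into P and return the most similar syndrome."""
    dim = len(next(iter(word_vectors.values())))
    p = [0.0] * dim
    for symptom in symptoms:
        for k, value in enumerate(word_vectors.get(symptom, [0.0] * dim)):
            p[k] += value
    return max(syndrome_vectors, key=lambda name: cosine(p, syndrome_vectors[name]))

# Hypothetical data for illustration only.
word_vectors = {"chills": [1.0, 0.0], "fever": [0.0, 1.0]}
syndrome_vectors = {"syndrome_A": [1.0, 0.1], "syndrome_B": [0.1, 1.0]}
print(predict_syndrome(["chills"], word_vectors, syndrome_vectors))
```

Because only a similarity against the stored syndrome vectors is computed at prediction time, no tree traversal or retraining is needed for this method.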
(III) The second intelligent traditional Chinese medicine syndrome identification method: implemented using a Huffman tree whose leaf nodes are syndromes
Step 31: Establish training data of physician, medical case and symptom combination. Traverse all symptoms of each medical case; each symptom can be matched in the trained word vector database to obtain its vectorized representation. Add all symptom word vectors.
Specifically, the correspondences among physician, medical case and symptom combination are established. Each physician diagnoses a plurality of medical cases, each medical case corresponds to the symptom combination of one patient, each symptom combination comprises a plurality of symptoms, and each symptom combination corresponds to one syndrome.
For each medical case, all its symptoms are taken and looked up in the word vector database from step (I) to obtain the word vector representation of each symptom; these are summed, and the sum is output as the projection layer to the Huffman tree.
Step 32: Count the frequency of each syndrome according to how often it occurs in the training samples, and build a Huffman tree whose leaf nodes are syndromes according to these frequencies, as shown in the output layer in FIG. 10.
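Building the syndrome Huffman tree of step 32 can be sketched with a priority queue. Which child is placed on the left is a convention; placing the more frequent subtree on the left is an assumption of this sketch, as are the dictionary-based node layout and the function names.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman(syndrome_labels):
    """Build a Huffman tree whose leaves are syndromes, weighted by frequency."""
    freq = Counter(syndrome_labels)
    tie = count()  # tie-breaker so that node dicts are never compared
    heap = [(f, next(tie), {"label": s}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f_low, _, low = heapq.heappop(heap)
        f_high, _, high = heapq.heappop(heap)
        # Assumed convention: more frequent subtree on the left.
        heapq.heappush(heap, (f_low + f_high, next(tie), {"left": high, "right": low}))
    return heap[0][2]

def leaf_paths(node, prefix=()):
    """Map every syndrome to its sequence of left/right decisions."""
    if "label" in node:
        return {node["label"]: prefix}
    paths = {}
    paths.update(leaf_paths(node["left"], prefix + ("left",)))
    paths.update(leaf_paths(node["right"], prefix + ("right",)))
    return paths

# Hypothetical training labels: A occurs 5 times, B twice, C once.
tree = build_huffman(["A"] * 5 + ["B"] * 2 + ["C"])
paths = leaf_paths(tree)
print(paths)
```

Frequent syndromes end up close to the root, so their predictions need fewer node evaluations.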
Step 33: Input the sum of the symptom word vectors into the Huffman tree, construct a likelihood function according to the target syndrome, and, following the probability maximization principle, use gradient ascent to iteratively update the Huffman tree parameter vectors theta and the input word vectors X.
Specifically, according to the target syndrome to be predicted, a maximum likelihood function Confidence is constructed, and the node parameter vectors theta of the Huffman tree and the word vectors X are iteratively updated by gradient ascent.
Based on FIG. 10, Confidence(wind-cold fettering the exterior) = Confidence(n_0, left) × Confidence(n_1, left) × Confidence(n_2, left)
where V_1 + V_2 + … + V_n in FIG. 10 represents the sum of the word vectors corresponding to the symptom words; n_0, n_1 and n_2 represent nodes on the binary tree; Confidence on the left-hand side of the above formula is the confidence that the syndrome is wind-cold fettering the exterior; Confidence(n_0, left) represents the confidence of going left at node n_0; Confidence(n_1, left) represents the confidence of going left at node n_1; and Confidence(n_2, left) represents the confidence of going left at node n_2.
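The confidence product above can be sketched as follows. The sketch assumes that the confidence of going left at a node is the Sigmoid of the dot product between the projection vector and that node's theta, and that going right has the complementary confidence; the text gives the product form but not this per-node definition, so it is an assumption, as is the name `path_confidence`.

```python
import math

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def path_confidence(z, path):
    """Multiply the per-node confidences along a root-to-leaf path.

    path: list of (theta, direction) pairs, direction being "left" or "right".
    """
    confidence = 1.0
    for theta, direction in path:
        p_left = sigmoid(sum(a * b for a, b in zip(z, theta)))
        confidence *= p_left if direction == "left" else 1.0 - p_left
    return confidence

# Three left turns, as in the n_0, n_1, n_2 example; with a zero input
# vector every node is undecided, so each factor is 0.5.
three_lefts = [([1.0, 1.0], "left")] * 3
print(path_confidence([0.0, 0.0], three_lefts))
```

Under this definition the confidences of all leaves of a tree sum to 1, so the value can be read as a probability over syndromes.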
Step 34: a prediction stage: and adding all the symptom word vectors of the input medical plan, inputting the trained Hoffman tree, and outputting the final result of the leaf node symptom information to realize the intelligent syndrome differentiation learning method.
Specifically, each symptom is input in the prediction stage, each symptom word vector is obtained by matching the word vector database output in the step (I), all the symptom vectors are accumulated and input to the Huffman tree with the trained parameter vector theta, the symptom information of the final leaf node is output, the prediction is completed, and the intelligent syndrome differentiation is finished.
In summary, the intelligent recognition method for traditional Chinese medicine syndromes provided by the application example of the application is based on large-scale real medical record document data, words are further segmented by using N-gram language features, a CBOW word vector principle is adopted, and the trained word vectors can express semantic environments of the words in a more detailed manner and are input into a syndrome classification model; the invention provides two symptom prediction methods, wherein the first method is based on the similarity of a symptom vector and a prediction vector, and the second method is based on a neural network and a Hoffman tree, so that semantic features in medical case documents can be fully extracted, and the online prediction efficiency in a production environment is remarkably improved on the premise of ensuring the classification accuracy.
The embodiment of the present application further provides an electronic device, which may include a processor, a memory, a receiver, and a transmitter. The processor is configured to execute the intelligent traditional Chinese medicine syndrome identification method of the foregoing embodiments. The processor and the memory may be connected through a bus or in another manner; connection through a bus is taken as an example. The receiver can be connected with the processor and the memory in a wired or wireless manner. The electronic device may receive real-time motion data from sensors in the wireless multimedia sensor network and receive an original video sequence from the video capture device.
The processor may be a Central Processing Unit (CPU). The processor may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the intelligent traditional Chinese medicine syndrome identification method in the embodiments of the present application. The processor executes its various functional applications and data processing by running the non-transitory software programs, instructions and modules stored in the memory, thereby implementing the intelligent traditional Chinese medicine syndrome identification method of the above method embodiments.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory and, when executed by the processor, perform the intelligent traditional Chinese medicine syndrome identification method in the embodiments.
In some embodiments of the present application, the user equipment may include a processor, a memory, and a transceiving unit. The transceiving unit may include a receiver and a transmitter. The processor, the memory, the receiver, and the transmitter may be connected through a bus system; the memory is configured to store computer instructions, and the processor is configured to execute the computer instructions stored in the memory to control the transceiving unit to transmit and receive signals.
As an implementation manner, the functions of the receiver and the transmitter in the present application may be implemented by a transceiver circuit or a dedicated chip for transceiving, and the processor may be implemented by a dedicated processing chip, a processing circuit or a general-purpose chip.
As another implementation manner, a manner of using a general-purpose computer to implement the server provided in the embodiment of the present application may be considered. That is, program code that implements the functions of the processor, receiver, and transmitter is stored in the memory, and a general-purpose processor implements the functions of the processor, receiver, and transmitter by executing the code in the memory.
The embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the foregoing intelligent traditional Chinese medicine syndrome identification method. The computer-readable storage medium may be a tangible storage medium such as Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, floppy disks, hard disks, removable storage disks, CD-ROMs, or any other form of storage medium known in the art.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments can be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made to the embodiment of the present application for those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (7)

1. An intelligent traditional Chinese medicine syndrome identification method, characterized by comprising the following steps:
according to the target symptom words corresponding to a target traditional Chinese medicine medical record, matching the target word vector corresponding to each target symptom word from a word vector database that stores the correspondence between symptom words and word vectors, wherein the symptom words are generated in advance by performing dictionary matching and text word segmentation on historical traditional Chinese medicine medical record document data; each word vector is generated in advance by segmenting each symptom word using N-gram language features and then training based on a CBOW or skip-gram word vector neural network model;
generating a target traditional Chinese medicine medical record vector from the target word vectors;
performing identification according to the target traditional Chinese medicine medical record vector and a preset intelligent traditional Chinese medicine syndrome identification model to obtain the syndrome information corresponding to the target traditional Chinese medicine medical record;
wherein the intelligent traditional Chinese medicine syndrome identification model comprises an input layer, a projection layer and an output layer, and the output layer comprises a Huffman tree whose leaf nodes are syndromes;
correspondingly, the generating of a target traditional Chinese medicine medical record vector from the target word vectors comprises:
inputting each target word vector into the input layer, and adding the target word vectors in the projection layer connected with the input layer to obtain the target traditional Chinese medicine medical record vector;
correspondingly, the performing identification according to the target traditional Chinese medicine medical record vector and a preset intelligent traditional Chinese medicine syndrome identification model to obtain the syndrome information corresponding to the target traditional Chinese medicine medical record comprises:
inputting the target traditional Chinese medicine medical record vector into the Huffman tree in the output layer whose leaf nodes are syndromes, so that the Huffman tree, which is iteratively updated by gradient ascent according to a maximum likelihood function representing the confidence of the corresponding syndrome, outputs the corresponding syndrome information.
2. The intelligent traditional Chinese medicine syndrome identification method according to claim 1, wherein before the matching of the target word vector corresponding to each target symptom word from the word vector database that stores the correspondence between symptom words and word vectors, the method further comprises:
acquiring data of a plurality of historical traditional Chinese medicine medical record documents;
performing dictionary matching and text word segmentation processing on the historical traditional Chinese medicine medical record document data respectively to obtain corresponding word segmentation results of the historical traditional Chinese medicine medical record containing various symptom words;
segmenting the word segmentation result of the historical traditional Chinese medicine medical record based on N-gram language features to obtain N-gram feature words corresponding to the symptom words respectively;
performing vector initialization operation on each symptom word and each N-gram feature word according to preset word vector dimensions to obtain initialization vectors corresponding to each symptom word and each N-gram feature word respectively;
model training is carried out on the initialization vector based on a CBOW or skip-gram word vector neural network model to obtain word vectors corresponding to each symptom word and each N-gram feature word, and the corresponding relation between each symptom word and each N-gram feature word and each word vector is stored in a word vector database.
3. The intelligent traditional Chinese medicine syndrome identification method according to claim 1, wherein before the inputting of the target traditional Chinese medicine medical record vector into the Huffman tree whose leaf nodes are syndromes, the method further comprises:
classifying the historical traditional Chinese medicine medical record document data by preset syndrome, and acquiring the symptom words corresponding to each piece of historical traditional Chinese medicine medical record document data;
matching the word vector corresponding to each symptom word in the word vector database to obtain a corresponding training data set;
inputting the word vectors into the input layer, and, in the projection layer connected with the input layer, adding the word vectors corresponding to each piece of historical traditional Chinese medicine medical record document data to obtain the medical record vector corresponding to each piece of historical traditional Chinese medicine medical record document data;
constructing a likelihood function according to the syndrome information to be predicted, selecting the gradient ascent method according to the probability maximization principle, and iteratively updating, based on each medical record vector, the Huffman tree in the output layer connected with the projection layer, so that the leaf nodes of the Huffman tree output the corresponding syndrome information.
4. An intelligent traditional Chinese medicine syndrome identification device, characterized by comprising:
a word vector matching module, configured to match, according to the target symptom words corresponding to a target traditional Chinese medicine medical record, the target word vector corresponding to each target symptom word from a word vector database that stores the correspondence between symptom words and word vectors, wherein the symptom words are generated in advance by performing dictionary matching and text word segmentation on historical traditional Chinese medicine medical record document data; each word vector is generated in advance by segmenting each symptom word using N-gram language features and then training based on a CBOW or skip-gram word vector neural network model;
a medical record vector generating module, configured to generate a target traditional Chinese medicine medical record vector from the target word vectors;
a syndrome identification module, configured to perform identification according to the target traditional Chinese medicine medical record vector and a preset intelligent traditional Chinese medicine syndrome identification model to obtain the syndrome information corresponding to the target traditional Chinese medicine medical record;
wherein the intelligent traditional Chinese medicine syndrome identification model comprises an input layer, a projection layer and an output layer, and the output layer comprises a Huffman tree whose leaf nodes are syndromes;
correspondingly, the medical record vector generating module is specifically configured to execute the following:
inputting each target word vector into the input layer, and adding the target word vectors in the projection layer connected with the input layer to obtain the target traditional Chinese medicine medical record vector;
the syndrome identification module is specifically configured to execute the following:
inputting the target traditional Chinese medicine medical record vector into the Huffman tree in the output layer whose leaf nodes are syndromes, so that the Huffman tree, which is iteratively updated by gradient ascent according to a maximum likelihood function representing the confidence of the corresponding syndrome, outputs the corresponding syndrome information.
5. The intelligent traditional Chinese medicine syndrome identification device according to claim 4, further comprising: a word vector database construction module;
the word vector database construction module is used for executing the following contents:
acquiring data of a plurality of historical traditional Chinese medicine medical record documents;
performing dictionary matching and text word segmentation processing on the historical traditional Chinese medicine medical record document data respectively to obtain corresponding word segmentation results of the historical traditional Chinese medicine medical record containing various symptom words;
segmenting the word segmentation result of the historical traditional Chinese medicine medical record based on N-gram language features to obtain N-gram feature words corresponding to the symptom words respectively;
performing vector initialization operation on each symptom word and each N-gram feature word according to preset word vector dimensions to obtain initialization vectors corresponding to each symptom word and each N-gram feature word respectively;
model training is carried out on the initialization vector based on a CBOW or skip-gram word vector neural network model to obtain word vectors corresponding to each symptom word and each N-gram feature word, and the corresponding relation between each symptom word and each N-gram feature word and each word vector is stored in a word vector database.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the intelligent traditional Chinese medicine syndrome identification method according to any one of claims 1 to 3 when executing the computer program.
7. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the intelligent traditional Chinese medicine syndrome identification method according to any one of claims 1 to 3.
CN202211323785.7A 2022-10-27 2022-10-27 Intelligent traditional Chinese medicine syndrome identification method and device Active CN115391494B (en)


Publications (2)

Publication Number Publication Date
CN115391494A (en) 2022-11-25
CN115391494B (en) 2023-02-17

Family

ID=84129130


Country Status (1)

Country Link
CN (1) CN115391494B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116525100A (en) * 2023-04-26 2023-08-01 脉景(杭州)健康管理有限公司 Traditional Chinese medicine prescription reverse verification method and system based on label system
CN117854713B (en) * 2024-03-06 2024-06-04 之江实验室 Method for training traditional Chinese medicine syndrome waiting diagnosis model and method for recommending information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199797A (en) * 2019-12-31 2020-05-26 中国中医科学院中医药信息研究所 Auxiliary diagnosis model establishing and auxiliary diagnosis method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021439B (en) * 2019-03-07 2023-01-24 平安科技(深圳)有限公司 Medical data classification method and device based on machine learning and computer equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
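Research on automatic classification of medical record symptoms based on Word2vec and BP neural networks; 叶辉 et al.; 《医学信息杂志》; 2018-12-31; Vol. 39, No. 11; pp. 59-62 *
Research on classification and recognition of traditional Chinese medicine syndromes of cardiovascular disease based on support vector machines and artificial neural networks; 许朝霞 et al.; 《北京中医药大学学报》; 2011-08-31; Vol. 34, No. 8; pp. 539-543 *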



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant