CN114358001A - Method for standardizing diagnosis result, and related device, equipment and storage medium thereof - Google Patents

Method for standardizing diagnosis result, and related device, equipment and storage medium thereof

Info

Publication number
CN114358001A
Authority
CN
China
Prior art keywords
standard
diagnostic
original
diagnosis
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111356719.5A
Other languages
Chinese (zh)
Inventor
葛健聪
肖飞
赵景鹤
贺志阳
鹿晓亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Iflytek Medical Information Technology Co ltd
Original Assignee
Anhui Iflytek Medical Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Iflytek Medical Information Technology Co ltd
Priority to CN202111356719.5A
Publication of CN114358001A
Legal status: Pending

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application discloses a method for standardizing diagnosis results, and a related device, equipment and storage medium, wherein the method comprises: obtaining a standard diagnosis result set from an original diagnosis result based on a standard diagnosis database, wherein the standard diagnosis result set comprises at least two standard diagnosis results; and reordering the standard diagnosis results in the standard diagnosis result set by using the similarity between the original diagnosis result and each standard diagnosis result, wherein the similarity comprises at least semantic similarity and text similarity. In this way, the accuracy and reliability of reordering the standard diagnosis results can be improved.

Description

Method for standardizing diagnosis result, and related device, equipment and storage medium thereof
Technical Field
The present application relates to the field of intelligent medical information technology, and in particular, to a method for standardizing diagnostic results, and related apparatus, devices, and storage media.
Background
With the continuing informatization of the medical and health field, more and more medical and health organizations use electronic medical records as the carrier for writing medical records in daily work. However, because doctors' computer skills vary, or because the electronic medical record system makes recognition and generation errors on what the doctor writes, the diagnoses in electronic medical records are often not standardized.
In addition, because of differences in writing habits, regions and the like, different doctors write the same diagnosis in different, non-standard ways. These problems seriously hinder the analysis, retrieval and management of medical information.
Disclosure of Invention
The technical problem mainly solved by the application is to provide a diagnostic result standardization method and a related device, equipment and storage medium thereof, which can improve the accuracy and reliability of standard diagnostic result reordering.
In order to solve the technical problem, the application adopts a technical scheme that: a method of normalizing diagnostic results is provided, the method comprising: based on the standard diagnosis database, obtaining a standard diagnosis result set according to the original diagnosis result; wherein the set of standard diagnostic results comprises at least two standard diagnostic results; reordering each standard diagnosis result in the standard diagnosis result set by using the similarity between the original diagnosis result and each standard diagnosis result; wherein the similarity at least comprises semantic similarity and text similarity.
Wherein, based on the standard diagnosis database, obtaining a standard diagnosis result according to the original diagnosis result, comprising: performing word segmentation processing on an original diagnosis result to obtain a plurality of original diagnosis words and the sequence of each original diagnosis word; determining a standard diagnostic word corresponding to each original diagnostic word in a standard diagnostic database according to the sequence of each original diagnostic word; and determining a standard diagnosis result according to the determined standard diagnosis word.
The standard diagnosis database comprises a standard diagnosis dictionary tree which is a tree-shaped connection relation diagram of standard diagnosis words in a standard diagnosis result; determining a standard diagnostic word corresponding to each original diagnostic word in a standard diagnostic database in accordance with the sequence of the original diagnostic words, comprising: and according to the sequence of the original diagnostic words, determining the standard diagnostic word corresponding to each original diagnostic word in the level of the standard diagnostic dictionary tree corresponding to the sequence of each original diagnostic word.
Wherein, in the level of the standard diagnosis dictionary tree corresponding to the sequence of each original diagnosis word, determining the standard diagnosis word corresponding to each original diagnosis word comprises: and in the level in the standard diagnosis dictionary tree corresponding to the sequence of each original diagnosis word, determining the probability of each standard diagnosis word in the corresponding level compared with the original diagnosis word, and selecting the standard diagnosis word corresponding to the original diagnosis word according to the probability.
Before the standard diagnosis result set is obtained from the original diagnosis result based on the standard diagnosis database, the method for standardizing the diagnosis result further comprises: preprocessing the original diagnosis result; wherein the preprocessing comprises at least one of: deleting at least one of blank characters, drug information, and person names; deleting English characters and/or numeric characters that precede a preset word; converting bracket characters into half-width (English) parentheses; and converting full-width English characters into half-width English characters.
The semantic similarity obtaining step comprises the following steps: semantic feature vector extraction is respectively carried out on the original diagnosis result and the standard diagnosis result to obtain a first semantic feature vector of the original diagnosis result and a second semantic feature vector of the standard diagnosis result; and acquiring the similarity between the first semantic feature vector and the second semantic feature vector as the semantic similarity.
The semantic similarity is a cosine distance between the first semantic feature vector and the second semantic feature vector; and/or the calculation method of the text similarity comprises one of an edit distance method and a longest common subsequence method.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an apparatus for normalizing a diagnosis result, the apparatus comprising: the acquisition module is used for obtaining a standard diagnosis result set according to the original diagnosis result based on the standard diagnosis database; wherein the set of standard diagnostic results comprises at least two standard diagnostic results; the sequencing module is used for reordering the standard diagnosis results in the standard diagnosis result set by utilizing the similarity between the original diagnosis result and each standard diagnosis result; wherein the similarity at least comprises semantic similarity and text similarity.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an electronic device comprising a memory and a processor coupled to each other, wherein the memory stores program instructions, and the processor is configured to execute the program instructions to implement the above-mentioned method for standardizing diagnostic results.
In order to solve the above technical problem, another technical solution adopted by the present application is: a computer-readable storage medium is provided for storing program instructions that can be executed to implement the above-described method of normalizing diagnostic results.
According to this scheme, the similarity between the original diagnosis result and each standard diagnosis result comprises at least two dimensions, semantic similarity and text similarity. By fusing this multi-dimensional information, the standard diagnosis results in the standard diagnosis result set are reordered in a way that relies on the rich text and semantic information implied by the diagnosis data itself and reduces the share of manually formulated ordering strategies. This improves the accuracy and reliability of the reordering, and also improves the efficiency of generating the reordered standard diagnosis result set.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a method for normalizing diagnostic results provided herein;
FIG. 2 is a schematic flow chart diagram illustrating one embodiment of obtaining a standard diagnostic result from a raw diagnostic result provided herein;
FIG. 3 is a schematic diagram of one embodiment of a standard diagnostic result generation model provided herein;
FIG. 4 is a schematic diagram of one embodiment of a canonical diagnostic dictionary tree provided herein;
FIG. 5 is a flowchart illustrating an embodiment of obtaining semantic similarity provided herein;
FIG. 6 is a block diagram of an embodiment of a device for normalizing diagnostic results provided herein;
FIG. 7 is a block diagram of an embodiment of an electronic device provided herein;
FIG. 8 is a block diagram of an embodiment of a computer-readable storage medium provided herein.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an embodiment of the method for standardizing a diagnosis result provided by the present application. It should be noted that, provided substantially the same result is obtained, this embodiment is not limited to the order of steps shown in FIG. 1. As shown in FIG. 1, this embodiment includes:
Step S11: Obtain a standard diagnosis result set from the original diagnosis result based on the standard diagnosis database.
In one embodiment, the standard diagnosis database includes a plurality of standard diagnoses and a synonymous diagnosis set corresponding to each standard diagnosis, so that standard diagnosis results are generated based on the standard diagnoses and their synonymous diagnosis sets. For example, as shown in Table 1, the standard diagnosis database includes "type 1 diabetes", "hypertension", "acute upper respiratory infection", and "cervical spondylosis". The synonymous diagnosis set for the standard diagnosis "type 1 diabetes" is "type one diabetes | type I diabetes | type 1 diabetes"; the synonymous diagnosis set for "hypertension" is "hypertensive disorders | hypertensive diseases"; the synonymous diagnosis set for "acute upper respiratory infection" is "upper respiratory infection of acute upper respiratory type"; and the synonymous diagnosis set for "cervical spondylosis" is "cervical spondylitis". Within a synonymous diagnosis set, the synonymous diagnoses are separated by a symbol such as "|". For example, when the diagnosis written by the doctor is "hypertension", the standard diagnosis "hypertension" is finally generated based on the standard diagnoses in the standard diagnosis database and the corresponding synonymous diagnosis sets.
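The database layout described above can be sketched as a simple mapping. This is only an illustration of the "standard diagnosis plus `|`-separated synonym set" structure; the names `STANDARD_DIAGNOSIS_DB`, `build_synonym_index` and `lookup_standard` are hypothetical, the synonym strings are simplified placeholders, and exact-match lookup is a stand-in for the generation model the application actually describes.

```python
# Hypothetical in-memory form of the standard diagnosis database:
# each standard diagnosis maps to its "|"-separated synonymous diagnosis set.
STANDARD_DIAGNOSIS_DB = {
    "type 1 diabetes": "type one diabetes|type I diabetes",
    "hypertension": "hypertensive disorder|hypertensive disease",
    "cervical spondylosis": "cervical spondylitis",
}


def build_synonym_index(db):
    """Map every standard diagnosis and every synonym to its standard diagnosis."""
    index = {}
    for standard, synonyms in db.items():
        index[standard] = standard
        for synonym in synonyms.split("|"):
            synonym = synonym.strip()
            if synonym:
                index[synonym] = standard
    return index


def lookup_standard(raw, index):
    """Return the standard diagnosis for an exact match, or None if unknown."""
    return index.get(raw.strip())
```

An exact-match index like this only covers spellings already listed in the database, which is presumably why the application generates candidates with a model instead.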
TABLE 1 Standard diagnostic database
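[Table 1 is an image in the original publication and is not reproduced here; its rows are the standard diagnoses and synonymous diagnosis sets listed in the preceding paragraph.]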
In one embodiment, the standard diagnosis database may be constructed from international and national standard diagnostic systems, for example ICD-10 (International Classification of Diseases, 10th Revision). It is to be understood that, in other embodiments, the standard diagnosis database may also be constructed from a standard diagnostic system built by another medical and health institution, which may be chosen according to the actual usage scenario and is not specifically limited here. Because the medical data in these standard diagnostic systems may be updated iteratively, in a specific implementation the standard diagnosis database may also be updated periodically; the update interval is not limited and may be set according to actual needs.
In this embodiment, a standard diagnosis result set is obtained from the original diagnosis result based on the standard diagnosis database. If only a single standard diagnosis were generated from the original diagnosis result, that single result might be wrong. This embodiment therefore generates a set of standard diagnosis results from the original diagnosis result for the user to choose from.
The standard diagnosis result set comprises at least two standard diagnosis results, which can be 2, 3, 4 or more, and the like, for the user to select. In an embodiment, the plurality of standard diagnostic results may be sorted from high to low according to the similarity of the standard diagnostic results and the original diagnostic result, and of course, the plurality of standard diagnostic results may also be sorted in other manners, which is not limited herein.
Because doctors' computer skills vary, or because the electronic medical record system misrecognizes what the doctor wrote, the original diagnosis result in an electronic medical record may contain blank characters, person names, drug information, and other text unrelated to the diagnosis. Therefore, in one embodiment, the original diagnosis result is preprocessed before the standard diagnosis result set is obtained from it based on the standard diagnosis database, so that meaningless text or character strings are deleted and the standard diagnosis result set matches the original diagnosis result more closely.
In one embodiment, the preprocessing deletes at least one of blank characters, drug information, and person names from the original diagnosis result. For example, deleting blank characters: "level 2 hypertension" → "level 2 hypertension"; deleting drug information: "diabetes medicine fetching" → "diabetes"; deleting a person name: "hypertension (diagnosis group)" → "hypertension".
In one embodiment, the preprocessing deletes the English characters and/or numeric characters that precede a preset word in the original diagnosis result. The preset word is not specifically limited and may be set according to actual needs, for example a word meaning "pregnancy". For example, if the preset word is "pregnancy" and the original diagnosis result is "G2P0 pregnancy 16+1 week", the English and numeric characters before "pregnancy" (that is, "G2P0") are deleted, leaving "pregnancy 16+1 week".
In one embodiment, the preprocessing replaces bracket characters in the original diagnosis result with half-width (English) parentheses: if a bracket character is present and is not already a half-width parenthesis, it is converted to one. For example, "hypertension（grade 2）" is converted to "hypertension (grade 2)"; "{" is converted to "("; "]" is converted to ")".
In one embodiment, the preprocessing converts full-width English characters in the original diagnosis result into half-width English characters. For example, "Ａ" is converted to "A".
It is understood that, in other embodiments, the preprocessing may combine any two or more of the above approaches, or all of them: deleting at least one of blank characters, drug information, and person names; deleting English characters and/or numeric characters before a preset word; replacing bracket characters with half-width parentheses; and converting full-width English characters into half-width English characters. The preprocessing may also include approaches other than those mentioned above, set according to actual needs, which is not specifically limited here.
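The preprocessing options above can be combined in one small function. The following is a minimal sketch, not the application's implementation: the lists `DRUG_TERMS`, `PERSON_NAMES` and `PRESET_WORDS` are hypothetical placeholders (the source does not enumerate them), and deleting all whitespace assumes Chinese-style text in which spaces carry no meaning.

```python
import re

# Hypothetical resources; the source does not list drug terms, names, or preset words.
DRUG_TERMS = ["medicine fetching"]
PERSON_NAMES = []
PRESET_WORDS = ["pregnancy"]

# Map full-width and non-round brackets to half-width parentheses.
BRACKET_MAP = str.maketrans({
    "（": "(", "）": ")", "[": "(", "]": ")",
    "{": "(", "}": ")", "【": "(", "】": ")",
})


def to_half_width(text):
    """Convert full-width ASCII variants (U+FF01..U+FF5E) to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        out.append(chr(code - 0xFEE0) if 0xFF01 <= code <= 0xFF5E else ch)
    return "".join(out)


def preprocess(raw):
    """Apply all four preprocessing approaches to one original diagnosis string."""
    text = raw.translate(BRACKET_MAP)
    text = to_half_width(text)
    for term in DRUG_TERMS + PERSON_NAMES:
        text = text.replace(term, "")
    for word in PRESET_WORDS:
        # Drop the run of letters, digits and spaces directly before the preset word.
        text = re.sub(r"[A-Za-z0-9\s]+(?=" + re.escape(word) + ")", "", text)
    # Delete blank characters last, so term matching above still sees the spaces.
    return re.sub(r"\s+", "", text)
```

With English test strings the whitespace step glues words together (for instance `"pregnancy16+1week"`), and the preset-word rule would also strip an English word placed directly before the preset word; both effects are artifacts of demonstrating a Chinese-text rule on translated examples.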
Step S12: Reorder the standard diagnosis results in the standard diagnosis result set by using the similarity between the original diagnosis result and each standard diagnosis result.
In the present embodiment, the standard diagnostic results in the standard diagnostic result set are reordered by using the similarity between the original diagnostic result and each standard diagnostic result. Wherein the similarity at least comprises semantic similarity and text similarity. That is to say, the method of this embodiment obtains the correlation score between the original diagnostic result and each standard diagnostic result in the standard diagnostic result set by at least two dimensions of semantic similarity and text similarity, that is, by fusing multi-dimensional information such as at least semantic similarity and text similarity, and further realizes the reordering of each standard diagnostic result in the standard diagnostic result set. Through the mode, the reordering of the standard diagnostic results in the standard diagnostic result set is more dependent on the abundant text information and semantic information of the diagnostic data, and the proportion of manually making the ordering strategy is weakened, so that the accuracy and reliability of the reordering results of the standard diagnostic results in the standard diagnostic result set are improved, and the efficiency of generating the reordered standard diagnostic result set is improved.
In a specific embodiment, the similarity includes semantic similarity and text similarity. The semantic similarity and the text similarity may be processed with a sigmoid function to obtain a correlation score between the original diagnosis result and each standard diagnosis result, and the standard diagnosis results in the standard diagnosis result set are reordered by correlation score to obtain the final reordered result. Assume that the semantic similarity set is:
$$Sim_{Semantic} = \{sem_1, sem_2, \ldots, sem_n\}$$
where $sem_i$ is the i-th semantic similarity, represented as a k-dimensional vector in which each dimension is the semantic similarity between the original diagnosis result and one standard diagnosis result.
Assume that the text similarity set is:
$$Sim_{context} = \{con_1, con_2, \ldots, con_m\}$$
where $con_i$ is the i-th text similarity, represented as a k-dimensional vector in which each dimension is the text similarity between the original diagnosis result and one standard diagnosis result.
Thus, the correlation score between the original diagnosis result and standard diagnosis result i is calculated as:
$$y_i = \mathrm{sigmoid}\left(W_{Semantic} \cdot Sim_{Semantic}[0:n][i] + W_{context} \cdot Sim_{context}[0:m][i]\right)$$
where $y_i$ is the correlation score between the original diagnosis result and standard diagnosis result i; $Sim_{Semantic}[0:n][i]$ denotes the n semantic similarities between the original diagnosis result and standard diagnosis result i; $Sim_{context}[0:m][i]$ denotes the m text similarities between the original diagnosis result and standard diagnosis result i; $W_{Semantic}$ is the weight of the semantic similarity set, with dimension n; $W_{context}$ is the weight of the text similarity set, with dimension m; and sigmoid is a nonlinear function whose output is limited to [0, 1] and therefore has a probabilistic meaning:
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
It is to be understood that, in other specific embodiments, other functions or algorithms may also be used to process the semantic similarity and the text similarity, so as to obtain a correlation score between the original diagnosis result and the standard diagnosis result obtained based on the semantic similarity and the text similarity, thereby implementing a re-ranking of each standard diagnosis in the standard diagnosis results.
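The score fusion and reordering can be sketched as follows. This assumes the score is a sigmoid over a weighted sum of the n semantic and m text similarities for each candidate; the function names, the weight arguments `w_sem` and `w_con`, and the absence of a bias term are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def correlation_scores(sim_semantic, sim_context, w_sem, w_con):
    """Score each of k candidates.

    sim_semantic holds n lists and sim_context holds m lists, each of length k;
    w_sem and w_con hold one weight per similarity measure.
    """
    k = len(sim_semantic[0])
    scores = []
    for i in range(k):
        z = sum(w * sims[i] for w, sims in zip(w_sem, sim_semantic))
        z += sum(w * sims[i] for w, sims in zip(w_con, sim_context))
        scores.append(sigmoid(z))
    return scores


def rerank(candidates, scores):
    """Return the candidates sorted from highest to lowest score."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order]
```

Because sigmoid is monotonic, the ranking equals the ranking of the weighted sums; the sigmoid matters mainly when the score is read as a probability or trained with a cross-entropy loss.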
In one embodiment, the text similarity is calculated by an Edit Distance (Edit Distance) algorithm. It is to be understood that, in other embodiments, the text similarity calculation method may also be a longest common subsequence method or other algorithms, and may be specifically set according to actual use needs, and is not specifically limited herein.
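Both text similarity options named above are standard dynamic-programming algorithms. The sketch below shows Levenshtein edit distance and longest-common-subsequence length; turning the distance into a similarity by dividing by the longer string's length is an assumption, since the source does not state a normalization.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```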
In one embodiment, the semantic similarity may be taken as the similarity between the feature vectors of the original diagnosis result and the standard diagnosis result. For example, the cosine distance between the feature vectors is used as the semantic similarity.
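A minimal sketch of the vector comparison follows. It computes cosine similarity (the cosine of the angle between the two vectors), which is how the "cosine distance" wording is read here; returning 0.0 for a zero vector is a convention chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for this sketch: no direction, no similarity
    return dot / (norm_u * norm_v)
```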
In addition, it should be noted that, in order to improve the efficiency and accuracy of reordering the standard diagnostic results in the standard diagnostic result set, a reordering model may be trained in advance, and the reordering model may be used to reorder the standard diagnostic results in the standard diagnostic result set. Specifically, a large number of original diagnostic results and a large number of corresponding standard diagnostic results can be collected in advance, the sample similarity between the original diagnostic results and the corresponding standard diagnostic results is obtained by using the reordering model, and the sample similarity is processed by using a loss function such as cross entropy to obtain a loss value. Further, network parameters of the reordering model can be adjusted based on the loss value, so that the reordering model can reorder each standard diagnostic result in the standard diagnostic result set more and more accurately in a multi-round training process.
In the above embodiment, the similarity between the original diagnostic result and the standard diagnostic result includes at least two dimensions, such as semantic similarity and text similarity, that is, at least multi-dimensional information, such as semantic similarity and text similarity, is fused to reorder the standard diagnostic results in the standard diagnostic result set, so that reordering of the standard diagnostic results in the standard diagnostic result set depends more on abundant text information and semantic information implied by the diagnostic data itself, and the proportion of manually formulating the ordering policy is reduced, thereby improving accuracy and reliability of reordering results of the standard diagnostic results in the standard diagnostic result set, and improving the efficiency of generating the reordered standard diagnostic result set.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of an embodiment of obtaining a standard diagnosis result from an original diagnosis result. It should be noted that, provided substantially the same result is obtained, this embodiment is not limited to the order of steps shown in FIG. 2. As shown in FIG. 2, this embodiment includes:
Step S111: Perform word segmentation on the original diagnosis result to obtain a plurality of original diagnostic words and the order of the original diagnostic words.
In this embodiment, word segmentation is performed on the original diagnosis result to obtain a plurality of original diagnostic words and their order. In one embodiment, the original diagnosis result is processed by word segmentation followed by word embedding, so that it is represented numerically and each original diagnostic word is represented by a vector of fixed dimension. Specifically, taking "cervical spondylitis" as an example, word segmentation and word embedding convert it into three original diagnostic words, represented by the vectors $X = \{\text{'neck'}: x_1, \text{'vertebra'}: x_2, \text{'inflammation'}: x_3\}$, where the order of the original diagnostic words is "neck" → "vertebra" → "inflammation".
To facilitate the subsequent determination of the standard diagnostic word corresponding to each original diagnostic word in the standard diagnosis database, a semantic vector representing the original diagnosis result is generated. Specifically, FIG. 3 is a schematic diagram of an embodiment of the standard diagnosis result generation model provided in the present application. First, the vectors $x_t$ representing the original diagnostic words are input to an encoder in the order of the original diagnostic words to obtain implicit (hidden) states $h_t$. Each implicit state depends on the state at the previous step and on the current input, i.e. $h_t = f(h_{t-1}, x_t)$. After the encoding stage is finished, the semantic vector c of the original diagnosis result is obtained. The encoder may be a recurrent neural network (RNN) or the like, for example a long short-term memory (LSTM) network, which is not specifically limited here.
In one embodiment, the implicit state of the last input original diagnostic word may be used directly as the semantic vector c. In other embodiments, the semantic vector c may be obtained by transforming the implicit state $h_t$ of the last input original diagnostic word, or by transforming the implicit states $h_t$ of all input original diagnostic words. This may be set according to actual needs and is not specifically limited here.
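The recurrence $h_t = f(h_{t-1}, x_t)$ can be illustrated with a plain tanh recurrent cell. This is a sketch only: the source allows any RNN (for example an LSTM), the weight matrices `w_h` and `w_x` are random placeholders rather than trained parameters, and taking the last state as c is the first of the three options described above.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(h_prev, x_t, w_h, w_x):
    """One recurrence h_t = f(h_{t-1}, x_t), with f = tanh(W_h h + W_x x)."""
    return [math.tanh(a + b)
            for a, b in zip(matvec(w_h, h_prev), matvec(w_x, x_t))]


def encode(word_vectors, w_h, w_x):
    """Run the cell over a non-empty word sequence; return (c, all states)."""
    h = [0.0] * len(w_h)
    states = []
    for x_t in word_vectors:
        h = rnn_step(h, x_t, w_h, w_x)
        states.append(h)
    return states[-1], states  # c is the last implicit state
```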
Step S112: the standard diagnostic word corresponding to each original diagnostic word is determined in a standard diagnostic database in accordance with the order of the original diagnostic words.
In this embodiment, the standard diagnostic word corresponding to each original diagnostic word is determined in the standard diagnosis database in accordance with the order of the original diagnostic words. Specifically, as shown in FIG. 3, the semantic vector c representing the original diagnosis result is input into the decoder. The first input word vector in the decoding stage is the vector $v_1$ of the decoding start mark <S>. The decoder produces an implicit (hidden) state $H_1 = f(c, v_1)$, which is input to the softmax layer to obtain a probability vector $S_1$. The word in the standard diagnosis database at the index position of the maximum value of $S_1$ is taken as the output $Y_1$ of this decoding step; $Y_1$ is the standard diagnostic word determined in the standard diagnosis database for the original diagnostic word. $Y_1$ is then used as the decoder input at the next step, and the decoding operation is repeated until the decoder outputs the decoding end mark </S>, which indicates that decoding of the semantic vector c is finished. At that point, the standard diagnostic word corresponding to each original diagnostic word has been determined in the standard diagnosis database. The decoder may be a recurrent neural network (RNN) or the like, for example a long short-term memory (LSTM) network, which is not specifically limited here.
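The decoding loop described above is a greedy argmax decode. The sketch below separates the loop from the network: `step` is a hypothetical callable standing in for "decoder plus softmax", and `toy_step` is a scripted stand-in used only to exercise the loop, not a trained model.

```python
def greedy_decode(step, c, id_to_word, start_id, end_id, max_len=20):
    """Feed each argmax output back in until the end mark is produced.

    step(c, prev_id, state) -> (probs, state) is a hypothetical decoder step
    that returns a probability vector over the dictionary.
    """
    words = []
    prev_id, state = start_id, None
    for _ in range(max_len):
        probs, state = step(c, prev_id, state)
        prev_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if prev_id == end_id:
            break
        words.append(id_to_word[prev_id])
    return words


def toy_step(c, prev_id, state):
    """Scripted stand-in: <S>(0) -> neck(2) -> vertebra(3) -> </S>(1)."""
    next_id = {0: 2, 2: 3, 3: 1}[prev_id]
    probs = [0.0, 0.0, 0.0, 0.0]
    probs[next_id] = 1.0
    return probs, state


decoded = greedy_decode(toy_step, None, {2: "neck", 3: "vertebra"}, 0, 1)
```

The `max_len` guard is an addition for safety in this sketch, so that a decoder that never emits the end mark cannot loop forever.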
In one embodiment, before the semantic vector c representing the original diagnosis result is input into the decoder for decoding, a standard diagnosis dictionary may be constructed from the standard diagnosis database, so that the standard diagnostic word corresponding to the original diagnostic word can subsequently be determined from the probability vector $S_1$. In one embodiment, each standard diagnosis in the standard diagnosis database is segmented into words and the words are de-duplicated to construct the standard diagnosis dictionary, to which the decoding start mark <S> and the decoding end mark </S> are added.
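Building the dictionary amounts to segmenting, de-duplicating, and reserving two indices for the decoding marks. In this sketch the `segment` argument is a hypothetical tokenizer (whitespace splitting is used for the English example), and placing the two marks at indices 0 and 1 is an arbitrary choice.

```python
START, END = "<S>", "</S>"


def build_dictionary(standard_diagnoses, segment=str.split):
    """Return a word -> index mapping that includes the two decoding marks."""
    vocab = [START, END]
    seen = set(vocab)
    for diagnosis in standard_diagnoses:
        for word in segment(diagnosis):
            if word not in seen:  # de-duplicate across all diagnoses
                seen.add(word)
                vocab.append(word)
    return {word: idx for idx, word in enumerate(vocab)}
```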
In theory, once the probability vector $S_1$ is obtained, the standard diagnostic word corresponding to the original diagnostic word can be determined from the index position of the maximum value of $S_1$. However, decoding errors may occur in the decoding stage, so the standard diagnostic word determined in this way may be incorrect. Therefore, in one implementation, a standard diagnosis dictionary tree is constructed from the standard diagnosis database. FIG. 4 is a schematic diagram of an example of the standard diagnosis dictionary tree provided in the present application. The standard diagnosis dictionary tree is a tree-shaped connection diagram of the standard diagnostic words in the standard diagnosis results, and it is used to ensure that the standard diagnostic word selected according to $S_1$ is reasonable. Specifically, following the order of the original diagnostic words, the standard diagnostic word corresponding to each original diagnostic word is determined within the level of the standard diagnosis dictionary tree that corresponds to that word's position in the order.
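A dictionary tree of this kind is a trie over word sequences. The sketch below builds one as nested dictionaries and returns the words allowed after a given prefix; the function names are hypothetical, and storing the end mark as a child node is a design choice of this example.

```python
def build_trie(standard_diagnoses, segment=str.split):
    """Build a nested-dict trie; each path from the root spells one diagnosis."""
    root = {}
    for diagnosis in standard_diagnoses:
        node = root
        for word in segment(diagnosis):
            node = node.setdefault(word, {})
        node["</S>"] = {}  # mark that a complete diagnosis ends here
    return root


def allowed_next_words(trie, prefix):
    """Words that may follow the already decoded prefix; empty if off the tree."""
    node = trie
    for word in prefix:
        node = node.get(word)
        if node is None:
            return set()
    return set(node)
```

For example, with the diagnoses "high blood pressure", "brain infarction" and "neck vertebra inflammation", `allowed_next_words(trie, [])` yields the first level `{"high", "brain", "neck"}`, which is the set used for the first decoding step.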
As shown in fig. 3, to ensure that the standard diagnostic word determined from the probability vector S1 is reasonable, the probability vector S1 is input into a mask layer after it is obtained, yielding a target probability vector S2. When the target probability vector S2 is used to decode the first output result, the mask processing sets every component of S2 to 0 except those corresponding to words in the first level of the standard diagnosis dictionary tree, so that the standard diagnostic word corresponding to the original diagnostic word is determined within the level of the standard diagnosis dictionary tree that corresponds to that original diagnostic word.
For example, take the original diagnosis result "cervical spondylitis", with the standard diagnosis dictionary tree shown in fig. 4. At the first decoding step, the predicted output must be limited to the first level of the standard diagnosis dictionary tree, i.e., to the three words {high, brain, neck}. The vector representation of the mask layer is therefore first set according to the standard diagnosis dictionary tree. Assume that the dimension of the standard diagnosis dictionary is n+2, where n is the number of words appearing in all standard diagnoses and 2 accounts for the two decoding flags <S> and </S>, and assume that the dictionary index positions of {high, brain, neck} are {a, b, c}, respectively. The vector of the mask layer is then expressed as:
$$M = [m_1, m_2, \ldots, m_i, \ldots, m_n]$$
where
$$m_i = \begin{cases} 1, & i \in \{a, b, c\} \\ 0, & \text{otherwise} \end{cases}$$
Then, the probability vector S1 is input into the mask layer, where it is multiplied element-wise with the mask vector M to obtain the target probability vector S2:
$$S_2 = S_1 \odot M$$
where ⊙ denotes the element-wise (point) multiplication of two vectors.
That is, the mask component is 1 when i is the index of "high", "brain", or "neck", and 0 otherwise. After the probability vector S1 is multiplied element-wise with the mask vector, the components of the target probability vector S2 corresponding to "high", "brain", and "neck" remain unchanged and all other components are 0. Only these three words retain a non-zero probability, so the prediction result is limited to the first level of the standard diagnosis dictionary tree.
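The masking step can be illustrated with a few lines of plain Python. The vocabulary order, the index positions {2, 3, 4}, and the probability values are invented for the example; they are chosen so that the unmasked maximum falls on a word outside the first level.

```python
def build_mask(vocab_size, allowed_indices):
    """Mask vector M: 1 at allowed index positions, 0 elsewhere."""
    return [1.0 if i in allowed_indices else 0.0 for i in range(vocab_size)]

def apply_mask(probs, mask):
    """Element-wise product S2 = S1 * M."""
    return [p * m for p, m in zip(probs, mask)]

# Hypothetical dictionary and probability vector S1.
vocab = ["<S>", "</S>", "high", "brain", "neck", "vertebra", "disease"]
s1 = [0.05, 0.05, 0.10, 0.15, 0.25, 0.30, 0.10]

# Index positions of {high, brain, neck} in this toy dictionary.
mask = build_mask(len(vocab), {2, 3, 4})
s2 = apply_mask(s1, mask)

unmasked_best = vocab[s1.index(max(s1))]   # choice without the mask
masked_best = vocab[s2.index(max(s2))]     # choice restricted to level one
```

In this toy case the unmasked argmax would pick "vertebra", which cannot start a standard diagnosis, whereas the masked vector selects "neck".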
Further, in one embodiment, within the level of the standard diagnosis dictionary tree corresponding to the order of each original diagnostic word, the probability of each standard diagnostic word in that level with respect to the original diagnostic word is determined, and the standard diagnostic word corresponding to the original diagnostic word is selected according to the probability. That is, the standard diagnostic word with the highest probability in the determined level is selected as the standard diagnostic word corresponding to the original diagnostic word.
For example, after the probability vector S1 is multiplied element-wise with the mask vector to obtain the target probability vector S2, the decoding result is determined to be "neck" according to the highest-probability value in S2.
In the same manner, after the first decoding result is obtained, the second decoding step is performed: according to the prior information of the standard diagnosis dictionary tree, the decoding result of the second step is limited to "vertebra"; the mask-layer operation is repeated and the decoding result is "vertebra". In the third decoding step, according to the prior information of the standard diagnosis dictionary tree, the decoding result is limited to "disease" and "off"; the mask-layer operation is repeated and the decoding result is "disease". In the fourth decoding step, because "disease" is a leaf node of the standard diagnosis dictionary tree, the decoding result is limited to </S>; the mask-layer operation is repeated, the decoding result </S> is obtained, and the decoding stage ends.
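The four-step walk-through above can be reproduced with a small tree-constrained decoding loop. The sketch below is an assumption-laden illustration: `toy_score` is a hypothetical replacement for the decoder and softmax layer that returns the same probability vector at every step, and the vocabulary entries other than those named in the text are invented.

```python
END = "</S>"

def build_trie(sequences):
    """Nested-dict trie; every standard diagnosis ends with END."""
    root = {}
    for words in sequences:
        node = root
        for w in words:
            node = node.setdefault(w, {})
        node[END] = {}
    return root

def constrained_decode(score, trie, vocab, max_steps=10):
    """At each step, zero out every word that is not a child of the
    current trie node, then take the argmax of the masked vector."""
    node, output = trie, []
    for _ in range(max_steps):
        probs = score(output)                                  # S1
        masked = [p if w in node else 0.0
                  for p, w in zip(probs, vocab)]               # S2
        word = vocab[masked.index(max(masked))]
        if word == END:
            break
        output.append(word)
        node = node[word]                                      # descend one level
    return output

VOCAB = ["<S>", END, "high", "blood", "pressure", "brain",
         "infarction", "neck", "vertebra", "disease", "off"]
# Hypothetical decoder output, identical at every step.
RAW = {"vertebra": 0.30, "neck": 0.20, "disease": 0.15, "brain": 0.14}

def toy_score(prefix):
    return [RAW.get(w, 0.03) for w in VOCAB]

trie = build_trie([
    ["neck", "vertebra", "disease"],
    ["neck", "vertebra", "off"],
    ["high", "blood", "pressure"],
    ["brain", "infarction"],
])
result = constrained_decode(toy_score, trie, VOCAB)
```

Although the raw scores always favour "vertebra", the masked argmax follows the tree level by level and stops when only the end flag remains allowed.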
In other embodiments, to obtain the standard diagnosis result set, a beam search algorithm may be used: the K words with the highest probability in the target probability vector S2 are selected as the standard diagnostic words output by the first decoding step; at each subsequent decoding step, based on the decoding results output in the previous step, the K candidates with the highest probability among all combinations are selected as the output, until decoding ends, yielding the standard diagnosis result set R = {r1, r2, ……, rk}.
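A compact beam search over already masked probability vectors might look like the sketch below. It is one common variant (log-probabilities are summed, and a hypothesis leaves the beam once it emits the end flag), not necessarily the variant used in the application; the table of masked probabilities is invented.

```python
import math

def beam_search(score, vocab, k, end="</S>", max_steps=10):
    """Keep the k most probable partial sequences at every step."""
    beams = [([], 0.0)]            # (decoded words, log-probability)
    finished = []
    for _ in range(max_steps):
        candidates = []
        for words, logp in beams:
            for w, p in zip(vocab, score(words)):
                if p > 0.0:        # masked-out words are skipped
                    candidates.append((words + [w], logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for words, logp in candidates[:k]:
            if words[-1] == end:
                finished.append((words[:-1], logp))
            else:
                beams.append((words, logp))
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:k]

VOCAB = ["</S>", "neck", "vertebra", "disease", "off", "brain", "infarction"]
# Hypothetical masked probability vectors S2, keyed by decoded prefix.
TABLE = {
    (): {"neck": 0.6, "brain": 0.4},
    ("neck",): {"vertebra": 1.0},
    ("neck", "vertebra"): {"disease": 0.7, "off": 0.3},
    ("neck", "vertebra", "disease"): {"</S>": 1.0},
    ("neck", "vertebra", "off"): {"</S>": 1.0},
    ("brain",): {"infarction": 1.0},
    ("brain", "infarction"): {"</S>": 1.0},
}

def toy_score(words):
    return [TABLE[tuple(words)].get(w, 0.0) for w in VOCAB]

results = beam_search(toy_score, VOCAB, k=2)
ranked = [words for words, _ in results]
```

With K = 2 the search returns two complete candidates, which together form a small standard diagnosis result set ready for reordering.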
Step S113: determining a standard diagnosis result according to the determined standard diagnostic words.
In the present embodiment, the standard diagnosis result is finally determined based on the determined standard diagnostic words. For example, the standard diagnostic words determined in the above manner are "neck", "vertebra", and "disease", respectively, so the standard diagnosis result finally determined is "cervical spondylosis".
Referring to fig. 5, fig. 5 is a flowchart illustrating an embodiment of obtaining the semantic similarity according to the present application. It should be noted that, provided substantially the same result is obtained, this embodiment is not limited to the flow sequence shown in fig. 5. As shown in fig. 5, the present embodiment includes:
Step S51: performing semantic feature vector extraction on the original diagnosis result and the standard diagnosis result, respectively, to obtain a first semantic feature vector of the original diagnosis result and a second semantic feature vector of the standard diagnosis result.
In this embodiment, semantic feature vector extraction is performed on the original diagnostic result and the standard diagnostic result, respectively, to obtain a first semantic feature vector of the original diagnostic result and a second semantic feature vector of the standard diagnostic result. In one embodiment, the seq2seq model can be used to perform semantic feature vector extraction on the original diagnosis result and the standard diagnosis result. It is to be understood that in other embodiments, other network models may also be used to perform semantic feature vector extraction on the original diagnosis result and the standard diagnosis result, which is not specifically limited herein.
Step S52: acquiring the similarity between the first semantic feature vector and the second semantic feature vector as the semantic similarity.
In this embodiment, the similarity between the first semantic feature vector and the second semantic feature vector may be a cosine distance between the first semantic feature vector and the second semantic feature vector, and the cosine distance between the first semantic feature vector and the second semantic feature vector is used as the semantic similarity. It is to be understood that, in other embodiments, the similarity between the first semantic feature vector and the second semantic feature vector may also be calculated in other manners, and is not limited in particular herein.
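For illustration, the cosine measure between two feature vectors can be computed as below. The text speaks of a "cosine distance" used as the similarity; the sketch assumes that the intended quantity is the cosine of the angle between the vectors (larger means more similar), and the zero-vector handling is also an assumption.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0                 # assumption: a zero vector matches nothing
    return dot / (norm_u * norm_v)

# Hypothetical first and second semantic feature vectors.
first_vector = [1.0, 2.0, 0.0]
second_vector = [2.0, 4.0, 0.0]
semantic_similarity = cosine_similarity(first_vector, second_vector)
```

Parallel vectors score close to 1 and orthogonal vectors score 0, regardless of their lengths.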
Referring to fig. 6, fig. 6 is a schematic diagram of a framework of an embodiment of a device for normalizing a diagnosis result provided by the present application. The device 60 for normalizing a diagnosis result comprises an acquisition module 61 and a sorting module 62. The acquisition module 61 is configured to obtain a standard diagnosis result set according to the original diagnosis result based on the standard diagnosis database, wherein the standard diagnosis result set comprises at least two standard diagnosis results. The sorting module 62 is configured to reorder the standard diagnosis results in the standard diagnosis result set by using the similarity between the original diagnosis result and each standard diagnosis result, wherein the similarity at least comprises semantic similarity and text similarity.
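The reordering performed by the sorting module can be sketched as a sort on a combined score. How the semantic and text similarities are combined is not stated in this section, so the weighted sum, the `weight` parameter, the word-overlap text measure, and the lookup table of semantic scores below are all assumptions made for the example.

```python
def rerank(original, candidates, semantic_sim, text_sim, weight=0.5):
    """Sort candidates by a weighted sum of two similarity scores
    (the weighted sum is an assumption, not taken from the text)."""
    def combined(candidate):
        return (weight * semantic_sim(original, candidate)
                + (1.0 - weight) * text_sim(original, candidate))
    return sorted(candidates, key=combined, reverse=True)

def word_overlap(a, b):
    """Toy text similarity: Jaccard overlap of whitespace-separated words."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Hypothetical semantic scores standing in for a trained model.
SEMANTIC = {"cervical spondylosis": 0.9, "cervical fracture": 0.4}

def toy_semantic(original, candidate):
    return SEMANTIC[candidate]

reordered = rerank(
    "cervical spondylitis",
    ["cervical fracture", "cervical spondylosis"],
    toy_semantic,
    word_overlap,
)
```

Both candidates share one word with the original, so the semantic score decides the order in this toy case.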
The obtaining module 61 is configured to obtain a standard diagnosis result according to an original diagnosis result based on a standard diagnosis database, and specifically includes: performing word segmentation processing on an original diagnosis result to obtain a plurality of original diagnosis words and the sequence of each original diagnosis word; determining a standard diagnostic word corresponding to each original diagnostic word in a standard diagnostic database according to the sequence of each original diagnostic word; and determining a standard diagnosis result according to the determined standard diagnosis word.
The standard diagnosis database comprises a standard diagnosis dictionary tree which is a tree-shaped connection relation diagram of standard diagnosis words in a standard diagnosis result; the obtaining module 61 is configured to determine a standard diagnostic word corresponding to each original diagnostic word in a standard diagnostic database according to the sequence of each original diagnostic word, and specifically includes: and according to the sequence of the original diagnostic words, determining the standard diagnostic word corresponding to each original diagnostic word in the level of the standard diagnostic dictionary tree corresponding to the sequence of each original diagnostic word.
The obtaining module 61 is configured to determine, in a hierarchy in the standard diagnostic dictionary tree corresponding to the order of each original diagnostic word, a standard diagnostic word corresponding to each original diagnostic word, and specifically includes: and in the level in the standard diagnosis dictionary tree corresponding to the sequence of each original diagnosis word, determining the probability of each standard diagnosis word in the corresponding level compared with the original diagnosis word, and selecting the standard diagnosis word corresponding to the original diagnosis word according to the probability.
The diagnostic result normalization device 60 further includes a preprocessing module 63. Before the standard diagnosis result set is obtained from the original diagnosis result based on the standard diagnosis database, the preprocessing module 63 is configured to preprocess the original diagnosis result, wherein the preprocessing comprises at least one of: deleting at least one of blank characters, medicine information, and personal names; deleting English characters and/or numeric characters that precede a preset vocabulary word; converting bracket characters into English (half-width) parentheses; and converting full-width English characters into half-width English characters.
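Three of the listed preprocessing operations can be sketched with the Python standard library. This is an approximation under stated assumptions: NFKC normalization is used for the full-width to half-width conversion and also rewrites other compatibility characters, the bracket table is a hypothetical selection, and the removal of medicine information, personal names, and characters before preset vocabulary words is omitted because it depends on resources not described here.

```python
import re
import unicodedata

# Hypothetical set of bracket characters mapped to English parentheses.
BRACKETS = str.maketrans({
    "[": "(", "]": ")",
    "{": "(", "}": ")",
    "【": "(", "】": ")",
    "〔": "(", "〕": ")",
})

def preprocess(text):
    """Normalize an original diagnosis string before decoding."""
    # Full-width letters, digits, and brackets become half-width ASCII.
    text = unicodedata.normalize("NFKC", text)
    # Remaining bracket variants become English parentheses.
    text = text.translate(BRACKETS)
    # Delete blank characters (spaces, tabs, line breaks).
    return re.sub(r"\s+", "", text)

cleaned = preprocess("ＡＢ　【c】 ［d］")
```

Normalizing first and translating afterwards lets full-width square brackets pass through both steps and end up as ordinary parentheses.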
The semantic similarity obtaining step comprises the following steps: semantic feature vector extraction is respectively carried out on the original diagnosis result and the standard diagnosis result to obtain a first semantic feature vector of the original diagnosis result and a second semantic feature vector of the standard diagnosis result; and acquiring the similarity between the first semantic feature vector and the second semantic feature vector as the semantic similarity.
The semantic similarity is a cosine distance between the first semantic feature vector and the second semantic feature vector; and/or the text similarity calculation method comprises one of an edit distance method and a longest common subsequence method.
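The two text similarity methods named above can be sketched with standard dynamic programming. The functions compute the Levenshtein edit distance and the length of the longest common subsequence; turning the edit distance into a score in [0, 1] by dividing by the longer length is an assumption, since the text does not specify a normalization.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def edit_similarity(a, b):
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

Either measure can serve as the text similarity that is combined with the semantic similarity during reordering.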
Referring to fig. 7, fig. 7 is a schematic diagram of a frame of an embodiment of an electronic device provided in the present application. The electronic device 70 comprises a memory 71 and a processor 72 coupled to each other, the memory 71 having stored therein program instructions, the processor 72 being configured to execute the program instructions to implement the steps of any of the above-described standardized method embodiments of diagnostic results. Specifically, the electronic device 70 may include, but is not limited to: desktop computers, notebook computers, servers, mobile phones, tablet computers, and the like, without limitation.
In particular, the processor 72 is adapted to control itself and the memory 71 to implement the steps in any of the above described standardized method embodiments of diagnostic results. The processor 72 may also be referred to as a CPU (Central Processing Unit). The processor 72 may be an integrated circuit chip having signal processing capabilities. The Processor 72 may also be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. Additionally, the processor 72 may be collectively implemented by an integrated circuit chip.
Referring to fig. 8, fig. 8 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 80 stores program instructions 81 executable by the processor, the program instructions 81 for implementing the steps in any of the above described embodiments of the method for normalizing diagnostic results.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program code.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method of normalizing diagnostic results, the method comprising:
based on the standard diagnosis database, obtaining a standard diagnosis result set according to the original diagnosis result; wherein the set of standard diagnostic results comprises at least two standard diagnostic results;
reordering each standard diagnostic result in the standard diagnostic result set by using the similarity between the original diagnostic result and each standard diagnostic result; wherein the similarity at least comprises semantic similarity and text similarity.
2. The method of claim 1, wherein obtaining the standard diagnostic result from the raw diagnostic result based on a standard diagnostic database comprises:
performing word segmentation processing on the original diagnosis result to obtain a plurality of original diagnosis words and the sequence of each original diagnosis word;
determining a standard diagnostic word corresponding to each original diagnostic word in the standard diagnostic database according to the sequence of each original diagnostic word;
and determining the standard diagnosis result according to the determined standard diagnosis word.
3. The method of claim 2 wherein said standard diagnostic database comprises a standard diagnostic dictionary tree that is a tree-connected relational graph of standard diagnostic words in said standard diagnostic results;
determining a standard diagnostic word corresponding to each of the original diagnostic words in the standard diagnostic database according to the sequence of the original diagnostic words, comprising:
and according to the sequence of the original diagnostic words, determining the standard diagnostic word corresponding to each original diagnostic word in the level in the standard diagnostic dictionary tree corresponding to the sequence of each original diagnostic word.
4. The method of claim 3, wherein said determining a standard diagnostic word for each said original diagnostic word at a level in said standard diagnostic dictionary tree corresponding to the order of each said original diagnostic word comprises:
in the level of the standard diagnosis dictionary tree corresponding to the sequence of each original diagnosis word, determining the probability of each standard diagnosis word in the corresponding level compared with the original diagnosis word, and selecting the standard diagnosis word corresponding to the original diagnosis word according to the probability.
5. The method of claim 1, wherein prior to said deriving a set of standard diagnostic results from raw diagnostic results based on a standard diagnostic database, the method further comprises:
preprocessing the original diagnosis result;
wherein the preprocessing comprises at least one of:
deleting at least one of blank characters, medicine information and personal names;
deleting English characters and/or numeric characters that precede a preset vocabulary word;
converting bracket characters into English (half-width) parentheses;
and converting full-width English characters into half-width English characters.
6. The method according to claim 1, wherein the step of obtaining semantic similarity comprises:
semantic feature vector extraction is respectively carried out on the original diagnosis result and the standard diagnosis result to obtain a first semantic feature vector of the original diagnosis result and a second semantic feature vector of the standard diagnosis result;
and acquiring the similarity between the first semantic feature vector and the second semantic feature vector as the semantic similarity.
7. The method of claim 6,
the semantic similarity is a cosine distance between the first semantic feature vector and the second semantic feature vector;
and/or the text similarity calculation method comprises one of an edit distance method and a longest common subsequence method.
8. A device for standardizing diagnostic results, characterized by comprising:
the acquisition module is used for obtaining a standard diagnosis result set according to the original diagnosis result based on the standard diagnosis database; wherein the set of standard diagnostic results comprises at least two standard diagnostic results;
the sequencing module is used for reordering each standard diagnostic result in the standard diagnostic result set by utilizing the similarity between the original diagnostic result and each standard diagnostic result; wherein the similarity at least comprises semantic similarity and text similarity.
9. An electronic device, comprising a memory and a processor coupled to each other, the memory having stored therein program instructions, the processor being configured to execute the program instructions to implement the method of normalizing diagnostic results of any one of claims 1 to 7.
10. A computer-readable storage medium for storing program instructions executable to implement the method of normalizing diagnostic results of any one of claims 1-7.
CN202111356719.5A 2021-11-16 2021-11-16 Method for standardizing diagnosis result, and related device, equipment and storage medium thereof Pending CN114358001A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111356719.5A CN114358001A (en) 2021-11-16 2021-11-16 Method for standardizing diagnosis result, and related device, equipment and storage medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111356719.5A CN114358001A (en) 2021-11-16 2021-11-16 Method for standardizing diagnosis result, and related device, equipment and storage medium thereof

Publications (1)

Publication Number Publication Date
CN114358001A true CN114358001A (en) 2022-04-15

Family

ID=81095337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111356719.5A Pending CN114358001A (en) 2021-11-16 2021-11-16 Method for standardizing diagnosis result, and related device, equipment and storage medium thereof

Country Status (1)

Country Link
CN (1) CN114358001A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079026A (en) * 2007-07-02 2007-11-28 北京百问百答网络技术有限公司 Text similarity, acceptation similarity calculating method and system and application system
CN109065157A (en) * 2018-08-01 2018-12-21 中国人民解放军第二军医大学 A kind of Disease Diagnosis Standard coded Recommendation list determines method and system
CN111696635A (en) * 2020-05-13 2020-09-22 平安科技(深圳)有限公司 Disease name standardization method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114783596A (en) * 2022-06-17 2022-07-22 天津幸福生命科技有限公司 Diagnostic data identification method and device, electronic equipment and storage medium
CN114822865A (en) * 2022-06-27 2022-07-29 天津幸福生命科技有限公司 Diagnostic data identification method and device, electronic equipment and storage medium
CN114822865B (en) * 2022-06-27 2022-11-11 天津幸福生命科技有限公司 Diagnostic data identification method and device, electronic equipment and storage medium
CN116631550A (en) * 2023-07-26 2023-08-22 深圳爱递医药科技有限公司 Data management and logic checking method for clinical trial and medical system thereof
CN116631550B (en) * 2023-07-26 2023-11-28 深圳爱递医药科技有限公司 Data management and logic checking method for clinical trial and medical system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Floor 18, building A5, NO.666, Wangjiang West Road, high tech Zone, Hefei City, Anhui Province 230000

Applicant after: Anhui Xunfei Medical Co.,Ltd.

Address before: Floor 18, building A5, NO.666, Wangjiang West Road, high tech Zone, Hefei City, Anhui Province 230000

Applicant before: ANHUI IFLYTEK MEDICAL INFORMATION TECHNOLOGY CO.,LTD.

CB02 Change of applicant information

Address after: Floor 18, building A5, NO.666, Wangjiang West Road, high tech Zone, Hefei City, Anhui Province 230000

Applicant after: IFLYTEK Medical Technology Co.,Ltd.

Address before: Floor 18, building A5, NO.666, Wangjiang West Road, high tech Zone, Hefei City, Anhui Province 230000

Applicant before: Anhui Xunfei Medical Co.,Ltd.