CN117574896B - Surgical fee identification method, device and storage medium based on electronic medical record text - Google Patents

Surgical fee identification method, device and storage medium based on electronic medical record text

Info

Publication number
CN117574896B
Authority
CN
China
Prior art keywords
word
words
original
operation word
original operation
Prior art date
Legal status
Active
Application number
CN202410056657.3A
Other languages
Chinese (zh)
Other versions
CN117574896A (en)
Inventor
李劲松
马骁勇
杨宗峰
周天舒
田雨
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202410056657.3A priority Critical patent/CN117574896B/en
Publication of CN117574896A publication Critical patent/CN117574896A/en
Application granted granted Critical
Publication of CN117574896B publication Critical patent/CN117574896B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H 10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a surgical fee identification method, a device and a storage medium based on electronic medical record text, wherein the method comprises the following steps: step S1: acquiring a standard operation word set, and constructing a superset of the standard operation word set; step S2: preprocessing unstructured electronic medical record text to obtain fragments related to surgery, and segmenting each fragment to obtain word sequences related to each fragment respectively; step S3: inputting the word sequence into a trained operation word generation model to generate an original operation word sequence; step S4: merging all the original operation words to obtain a plurality of original operation word sets; step S5: constructing an operation word tree based on the original operation word set; step S6: and mapping the nodes in the operation word tree to standard operation words, and taking the sum of the costs of the standard operation words mapped by all the first-level child nodes of the root node as a recognition result. Compared with the prior art, the method has the advantages of high accuracy and the like.

Description

Surgical fee identification method, device and storage medium based on electronic medical record text
Technical Field
The present invention relates to processing of medical data, and in particular, to a surgical fee recognition method, apparatus and storage medium based on electronic medical record text.
Background
Surgery is an important part of the medical process, and the costs incurred around surgery are an important component of medical costs, for example in the cost research of medical insurance.
In this regard, those skilled in the art typically use the less sensitive electronic medical record text to identify surgical costs. As a patient's diagnosis record, the electronic medical record text covers many aspects, including the chief complaint, medication records and surgical records; the surgical record part consists of a passage of unstructured record text and a title. The title of a surgical record indicates the surgical items, while the record text describes the detailed operations completed during the surgery; both are manual records. Usually the specific items and cost of the surgery can be determined from the title. However, the content of the surgical title is often incomplete or even lost during the creation and transmission of the electronic medical record, for various reasons, such as:
1. charge items conventionally omitted from the title, such as routine monitoring items like "electrocardiogram";
2. emergencies during surgery change the surgical items, while the surgical title is not updated accordingly;
3. complex surgical content is difficult for the recorder to summarize into standard surgical items;
4. manual recording inevitably introduces non-standard wording and omissions;
5. surgical records suffer information loss during paper or electronic transmission.
In the prior art, the item names in surgical titles are mapped to the standard operation words of a standard price list through term normalization, and the surgical cost is then calculated from the prices of the standard operation words. This approach is technically easy to implement, but it still has the following drawback: it does not use the surgical record text and therefore cannot solve the problem of missing titles. The surgical record text is a detailed description of the surgical process and contains detailed information such as the surgical site, the instruments used, the specific operations and intraoperative emergencies; identifying the cost from the surgical title alone cannot cope with title information that may be incomplete or missing, which leads to cost recognition errors or even makes the cost unrecognizable.
In this regard, some skilled in the art have proposed applying the same processing to the text portion of the electronic medical record. However, a surgical procedure may involve several surgical items in a parallel relationship, or one item may include several separate sub-items; because this scheme does not judge the relationships among the operation words obtained from the record, it may count the cost repeatedly.
Disclosure of Invention
The invention aims to provide a surgical expense recognition method, device and storage medium based on electronic medical record text. The method is implemented using the surgical record text, which removes the dependence on title information, so it can operate normally even when the surgical item information is missing or written irregularly; compared with common approaches based on electronic medical record data, it also avoids repeated calculation of surgical expenses, thereby improving the accuracy of surgical expense recognition.
The aim of the invention can be achieved by the following technical scheme:
an operation expense identification method based on electronic medical record text comprises the following steps:
step S1: obtaining a standard operation word set and constructing a superset of the standard operation word set, wherein the superset of the standard operation word set comprises nonstandard operation words synonymous with each standard operation word;
step S2: preprocessing unstructured electronic medical record text to obtain fragments related to surgery, and segmenting each fragment to obtain word sequences related to each fragment respectively;
step S3: inputting a word sequence into a trained operation word generation model to generate an original operation word sequence, wherein the operation word generation model is input into the word sequence and output into the original operation word sequence composed of a plurality of original operation words, and the original operation words are elements in the superset;
step S4: merging all the original operation words to obtain a plurality of original operation word sets, wherein in each original operation word set, if two or more operation words exist in the original operation word set, for any original operation word, the relationship between at least one other original operation word and any original operation word exists in parallel, equivalent, modified or modified;
step S5: constructing an operation word tree based on an original operation word set, wherein each node except a root node in the operation word tree corresponds to one original operation word set, and any node is contained by a parent node of the upper level and contains all child nodes of the lower level;
step S6: and mapping the nodes in the operation word tree to standard operation words, and taking the sum of the costs of the standard operation words mapped by all the first-level child nodes of the root node as a recognition result.
The training process of the operation word generation model comprises the following steps:
step S3-1-1: generating a dictionary tree based on the superset, wherein a root node in the dictionary tree is an empty node, each other node represents a nonstandard operation word, and all operation words in the superset correspond to a path from the root node to a child node;
step S3-1-2: constructing training set data, wherein training samples in the training set data are word sequences and operation words corresponding to the word sequences, and the operation words are standard operation words or non-standard operation words;
step S3-1-3: and training the operation word generation model by using the training set data.
The operation word generation model generates operation words in a way of generating words one by one.
The process of generating the original operation word sequence by the operation word generation model specifically comprises the following steps:
Step S3-2-1: setting the start pointer of a sliding window start = 1, the end pointer end = |t| and the position pointer p = start, where t is the word sequence;
Step S3-2-2: moving the position pointer p from the start position to the end position along the word sequence, taking the sub-sequence t_start:p = [t_start, …, t_p] at each position, and recording the position of the position pointer at which the probability of the generated operation word is maximal;
Step S3-2-3: taking the operation word generated from the sub-sequence t_start:p as one original operation word;
Step S3-2-4: cutting the word sequence at the position of the current position pointer to obtain two new word sequences, and repeating steps S3-2-1 to S3-2-3 on all the obtained word sequences to generate a plurality of original operation words;
Step S3-2-5: splicing all the original operation words to form the original operation word sequence.
In the step S3-2-2, the probability of the operation word is specifically:
p = argmax_{start ≤ p ≤ end} P(q | t_start:p)
q = Model_g(t_start:p)
where p is the position pointer, P denotes the probability, Model_g is the operation word generation model, and q is the operation word generated by the operation word generation model with the sub-sequence from the start pointer to the position pointer of the word sequence as input.
The step S4 includes:
Step S41: reordering the original operation word sequence according to the length of the word sequence corresponding to each original operation word and its starting position, such that operation words with longer corresponding word sequences are ranked earlier and, among those, operation words with earlier starting positions are ranked earlier;
Step S42: selecting the first original operation word in the reordered original operation word sequence, establishing a first original operation word set, and taking the selected original operation word as an element of this first original operation word set;
Step S43: selecting the next original operation word q'_i in the reordered original operation word sequence and calculating the relationship between q'_i and each original operation word in every existing original operation word set; if there exists an original operation word set Q_n,j containing an original operation word whose relationship with q'_i is parallel, equivalent, modifying or modified, adding q'_i to the original operation word set Q_n,j; otherwise, creating a new original operation word set and taking q'_i as an element of the newly created original operation word set;
Step S44: judging whether the traversal of all original operation words in the reordered original operation word sequence is complete; if so, ending; otherwise, returning to step S43.
In the step S43, the relationships between the operation words are obtained from an operation word relation model, whose input consists of two operation words and the word sequences respectively corresponding to the two operation words.
The step S5 includes:
Step S51: creating an initial operation word tree, wherein the initial operation word tree only comprises one empty root node;
Step S52: creating a first-level child node for the root node and selecting an original operation word set as this first-level child node;
Step S53: selecting the next non-traversed original operation word set and taking the root node as the target comparison node Q_target;
Step S54: judging the inclusion relationship between the original operation word set and each child node to be compared Q_s, where Q_s ∈ Sub(Q_target) and Sub(Q_target) is the set of all descendants (child nodes at every level) of the target comparison node:
if there exists a child node to be compared Q_s that contains the original operation word set, executing step S55;
if there exists a child node to be compared Q_s that is contained by the original operation word set, inserting the original operation word set at the position of the child node to be compared Q_s, moving Q_s and its child nodes down one level, and executing step S56;
if no child node to be compared Q_s has an inclusion relationship with the original operation word set, creating a new next-level child node under the target comparison node Q_target, taking the original operation word set as the newly created child node, and executing step S56;
Step S55: judging whether the child node to be compared Q_s has child nodes; if so, taking Q_s as the new target comparison node and repeating step S54; otherwise, creating a new next-level child node under Q_s and taking the original operation word set as the newly created child node;
Step S56: judging whether the traversal of all the original operation word sets is complete; if so, outputting the operation word tree; otherwise, returning to step S53.
An operation expense recognition device based on electronic medical record text comprises a memory, a processor and a program stored in the memory, wherein the processor realizes the method when executing the program.
A storage medium having stored thereon a program which when executed performs a method as described above.
Compared with the prior art, the invention has the following beneficial effects:
1. by using the operation record text, the dependence on the title information can be eliminated, and even if operation item information is missing or the item information is not written normally, the operation can be normally realized.
2. The mapping to standard operation words occurs only in the final cost generation step, while the earlier semantic processing uses the original operation words; this allows more operation words to be mined before they are reduced through the operation word relations, which greatly improves accuracy.
3. A plurality of fragments are extracted from the operation record text through the combination of the sliding window and the generation model, and related operation words are generated for each fragment, thereby avoiding omission of original operation words.
4. The relationships among the operation words are judged by a deep learning model, and the collection is structured into an operation word tree according to these relationships, so that repeated calculation of the cost is avoided.
Drawings
FIG. 1 is a schematic flow chart of main steps of the method of the invention;
FIG. 2 is a schematic diagram of the technical route of the present invention.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
The surgical expense recognition method based on the electronic medical record text mainly comprises three parts, namely surgical word generation, surgical word tree construction and surgical expense recognition, as shown in fig. 2, and specifically comprises the following steps of:
step S1: obtaining a standard operation word set, and constructing a superset of the standard operation word set, wherein the superset of the standard operation word set comprises nonstandard operation words synonymous with each standard operation word;
the standard operation word set is obtained from the price list, but because the standard operation words in the price list have smaller scale, a large number of non-standard operation words are actually available, in order to improve the coverage degree of the operation record text, the reasonable operation words can be generated by the operation record text to be identified, so that the non-standard operation words need to be expanded perfectly to obtain the standard operation word set.
Non-standard operation words can be obtained from various sources, such as existing operation records, electronic medical records and collection from doctors. For an existing operation record or electronic medical record, if a title related to the operation items exists, the title can be split at separators such as the plus sign to obtain non-standard operation words directly; for a text paragraph, non-standard operation words can be identified by training a CRF- or BERT-based operation word recognition model. For collection from doctors, doctors can be given the standard operation words in the price list and asked to fill in directly the non-standard ways in which those standard words are written.
Step S2: preprocessing unstructured electronic medical record text to obtain fragments related to surgery, and segmenting each fragment to obtain word sequences related to each fragment respectively;
the electronic medical record text corresponding to the surgical record is usually a complete surgical detail record and generally comprises various contents such as instruments, operations, phenomena in the operation and the like. For the surgical record to be identified, the preprocessing part performs basic text processing on the whole recorded text, including unifying case, full half angle, deleting redundant blank spaces, line feed and the like. And performing word segmentation operation on the text which is processed to obtain a word sequence.
Step S3: inputting the word sequence into a trained operation word generation model to generate an original operation word sequence, wherein the operation word generation model is input into the word sequence and output into an original operation word sequence consisting of a plurality of original operation words, and the original operation words are elements in a superset;
the training process of the operation word generation model comprises the following steps:
step S3-1-1: generating a dictionary tree based on the superset, wherein a root node in the dictionary tree is an empty node, each other node represents a nonstandard operation word, and all operation words in the superset correspond to a path from the root node to a child node;
step S3-1-2: constructing training set data, wherein training samples in the training set data are word sequences and operation words corresponding to the word sequences, and the operation words are standard operation words or non-standard operation words;
specifically, the existing operation records are collected, text fragments related to the operation are manually extracted, the fragments are segmented into word sequences, and the word sequences are marked as operation words in a superset and serve as training data of a generated model. The training sample is in the form of binary groupt,q) Wherein, the method comprises the steps of, wherein,tfor word sequences derived from the surgical recording segment,qis the operative word corresponding to the word sequence.
Step S3-1-3: and training the operation word generation model by using the training set data.
The operation word generation model generates operation words in a one-by-one generation mode, and the formula is as follows:
P(w_k | w_1:k-1, t) = Decoder(h_k)
h_k = Encoder(t ⊕ w_1:k-1)
W_k = {w_k | w_k ∈ Children(w_1:k-1), P(w_1:k | t) > θ}, |W_k| < N
where w_k is the k-th generated word; w_1:k is the sequence composed of the 1st to the k-th generated words; W_k is the result set for generating the k-th word; Encoder and Decoder are the encoder and the decoder respectively, which may be implemented as stacks of multi-layer Transformer blocks; Children(w_1:k) is the set of child nodes of the node w_k when the path on the dictionary tree is w_1, …, w_k; θ is the generation probability threshold; N is the threshold on the number of candidates kept during generation; h_k is the intermediate representation obtained by encoding the word sequence t of the surgical record fragment together with the already generated (k-1) words w_1:k-1; P(w_k | w_1:k-1, t) is the probability of generating w_k given the input word sequence t and the already generated w_1:k-1; P(w_1:K | t) is the probability of generating w_1:K given the input word sequence t, where K is the total number of words; and ⊕ denotes the concatenation of vectors.
The generation model generates words one by one according to the above formulas: the word generated at step k is restricted to the set W_k, so the generated subsequence w_1:k is always a path starting from the root node of the dictionary tree; generation stops when the end point of the path is a leaf node, and w_1:k is then a generation result. If the candidate set W_k is empty at some step k, the current surgical fragment t has no related operation word.
The generation process yields the set W of qualifying operation words, and the operation word with the highest probability is taken as the operation word related to the surgical record fragment t, i.e. the output of the generation model, as given by:
Model_g(t) = argmax_{w ∈ W} P(w_1:len(w) | t)
where Model_g is the operation word generation model, argmax takes the argument that maximizes the expression, len(w) is the length of the operation word w, and w_1:len(w) is the word sequence corresponding to the operation word w.
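The constrained, word-by-word decoding described by the above formulas could be sketched as follows, reusing the trie helpers from the previous sketch; model.next_token_probs is an assumed interface returning P(w_k | w_1:k-1, t) over the vocabulary, and theta and beam_limit stand in for the thresholds θ and N:

```python
def is_complete_path(root, prefix_tokens):
    # True when the prefix ends at a leaf of the dictionary tree,
    # i.e. a full operation word has been generated.
    node = root
    for tok in prefix_tokens:
        node = node.children.get(tok)
        if node is None:
            return False
    return node.is_end and not node.children


def generate_surgical_word(model, trie_root, t_tokens, theta=0.05, beam_limit=8):
    """Constrained generation: every candidate prefix must follow a path in the
    dictionary tree, prefixes whose cumulative probability is not above theta
    are dropped, and at most beam_limit candidates (the set W_k) are kept."""
    candidates = [([], 1.0)]   # (generated prefix w_1:k, probability P(w_1:k | t))
    finished = []
    while candidates:
        next_candidates = []
        for prefix, prob in candidates:
            step_probs = model.next_token_probs(t_tokens, prefix)  # assumed interface
            for tok in children_of(trie_root, prefix):
                p = prob * step_probs.get(tok, 0.0)
                if p <= theta:
                    continue
                new_prefix = prefix + [tok]
                if is_complete_path(trie_root, new_prefix):
                    finished.append((new_prefix, p))   # a complete operation word
                else:
                    next_candidates.append((new_prefix, p))
        next_candidates.sort(key=lambda c: -c[1])
        candidates = next_candidates[:beam_limit]
    if not finished:
        return None, 0.0       # the fragment has no related operation word
    best_prefix, best_prob = max(finished, key=lambda c: c[1])
    return "".join(best_prefix), best_prob
```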
The process of generating the original operation word sequence by the operation word generation model specifically comprises the following steps:
Step S3-2-1: setting the start pointer of a sliding window start = 1, the end pointer end = |t| and the position pointer p = start, where t is the word sequence, t = [t_1, t_2, …], and t_1, t_2, … are words;
Step S3-2-2: moving the position pointer p from the start position to the end position along the word sequence, taking the sub-sequence t_start:p = [t_start, …, t_p] at each position, and recording the position of the position pointer at which the probability of the generated operation word is maximal;
Step S3-2-3: taking the operation word generated from the sub-sequence t_start:p as one original operation word;
Step S3-2-4: cutting the word sequence at the position of the current position pointer to obtain two new word sequences, and repeating steps S3-2-1 to S3-2-3 on all the obtained word sequences to generate a plurality of original operation words;
Step S3-2-5: splicing all the original operation words to form the original operation word sequence.
In the step S3-2-2, the probability of the operation word is specifically:
p = argmax_{start ≤ p ≤ end} P(q | t_start:p)
q = Model_g(t_start:p)
where p is the position pointer, P denotes the probability, Model_g is the operation word generation model, and q is the operation word generated by the operation word generation model with the sub-sequence from the start pointer to the position pointer of the word sequence as input.
In one embodiment, the above generation process yields the set Q of all original operation words, where each original operation word has a corresponding word sequence; the word sequence that generated the original operation word q is denoted t(q).
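A sketch of this recursive sliding-window extraction, reusing generate_surgical_word from the previous sketch; how the two fragments produced by the cut are subsequently handled is left implicit in the patent, so recursing only on the remaining right-hand fragment is an assumption made here to guarantee termination:

```python
def extract_original_words(model, trie_root, t_tokens):
    """Return a list of (operation word, fragment tokens, start offset) triples,
    one per extracted original operation word (steps S3-2-1 to S3-2-5)."""
    results = []

    def process(tokens, offset):
        if not tokens:
            return
        best = None   # (p, word, prob)
        for p in range(1, len(tokens) + 1):        # move the position pointer
            word, prob = generate_surgical_word(model, trie_root, tokens[:p])
            if word is not None and (best is None or prob > best[2]):
                best = (p, word, prob)
        if best is None:
            return                                  # no operation word in this fragment
        p, word, _ = best
        results.append((word, tokens[:p], offset))
        # Cut at the position pointer; the left fragment is exactly the span
        # that produced the word, so only the remainder is processed further.
        if p < len(tokens):
            process(tokens[p:], offset + p)

    process(list(t_tokens), 0)
    return results
```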
A case of the operation word generation step. The current surgical record text is:
"after the patient takes the supine position … … into the abdominal cavity, see: … … intestinal canal, uterus and accessories are widely and densely adhered; the … … bladder is provided with thickened flaky tumor nodules at the return position, so that the uterine rectal fossa can reach the thickened flaky tumor-like nodules and invade the surface of the rectum; the double accessories shrink, a large amount of fine sand-like tumor nodules are visible on the surface, and the part of the large omentum with the tumor-like nodules … … has smooth liver surface, and the pelvic cavity can be enlarged and lymph nodes are several.
Separating adhesion, exposing the operation field … … by the laying open tube, horizontally cutting the fornix part of vagina at the external orifice of the cervix, taking down the whole uterus … …, opening the right back peritoneum, sequentially cleaning the total ilium, the external ilium, the internal ilium, the obturator foramen and the deep inguinal lymph node … … on the right side, cutting off the sheet tumor nodules at the reverse fold part of the uterus bladder, and cutting off the large omentum tumor body and most of the large omentum … …
Iodophor liquid is used for washing abdominal cavity … … to resect whole uterus, double accessories, large omentum and pelvic lymph node to send conventional pathology … …'
The original operation words generated are as follows: ovarian cancer radical surgery, ovarian tumor cytoreduction surgery, total uterus + double annex excision, hysterectomy, laparoscopic pelvic lymph node dissection, and laparoscopic pelvic adhesion separation.
Step S4: merging all the original operation words to obtain a plurality of original operation word sets, wherein in each original operation word set containing two or more operation words, for any original operation word there exists at least one other original operation word whose relationship with it is parallel, equivalent, modifying or modified. This step specifically comprises:
Step S41: reordering the original operation word sequence according to the length of the word sequence corresponding to each original operation word and its starting position: the longer the corresponding record fragment and the earlier its starting position, the earlier the operation word is ranked;
Step S42: selecting the first original operation word in the reordered original operation word sequence, establishing a first original operation word set, and taking the selected original operation word as an element of this first original operation word set;
Step S43: selecting the next original operation word q'_i in the reordered original operation word sequence and calculating the relationship between q'_i and each original operation word in every existing original operation word set; if there exists an original operation word set Q_n,j containing an original operation word whose relationship with q'_i is parallel, equivalent, modifying or modified, adding q'_i to the original operation word set Q_n,j; otherwise, creating a new original operation word set and taking q'_i as an element of the newly created original operation word set;
Step S44: judging whether the traversal of all original operation words in the reordered original operation word sequence is complete; if so, ending; otherwise, returning to step S43.
Specifically, in step S43, the relationships between the operation words are obtained from an operation word relation model, whose input consists of two operation words and the word sequences respectively corresponding to the two operation words. Training samples of the relation model are quintuples, and the relation category is determined by a multi-class classification model:
r(t(q_1), t(q_2), q_1, q_2) = Model_r(emb(t(q_1)) ⊕ emb(t(q_2)) ⊕ emb(q_1) ⊕ emb(q_2))
where r(t(q_1), t(q_2), q_1, q_2) is the relationship between operation word q_1 and operation word q_2, taking one of the values 0, 1, 2, 3, 4 and 5, which represent parallel, equivalent, modifying, modified, including and included respectively; Model_r is the operation word relation model; and emb(·) is the vector representation of a text.
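A sketch of this merging procedure (steps S41 to S44), assuming a relation_model.relation(...) wrapper around Model_r; the numeric label ordering below follows the reading of the formula above and is therefore an assumption:

```python
PARALLEL, EQUIVALENT, MODIFYING, MODIFIED, INCLUDING, INCLUDED = range(6)
MERGEABLE = {PARALLEL, EQUIVALENT, MODIFYING, MODIFIED}


def merge_into_sets(originals, relation_model):
    """originals: list of (word, fragment_tokens, start) triples from the
    generation step; relation_model.relation(t1, t2, q1, q2) is an assumed
    interface returning one of the six labels above."""
    # Step S41: longer corresponding fragments first, earlier starts first.
    ordered = sorted(originals, key=lambda o: (-len(o[1]), o[2]))
    if not ordered:
        return []
    word_sets = [[ordered[0]]]                              # step S42
    for item in ordered[1:]:                                # steps S43 and S44
        word, fragment, _ = item
        target_set = None
        for ws in word_sets:
            for other_word, other_fragment, _ in ws:
                rel = relation_model.relation(fragment, other_fragment,
                                              word, other_word)
                if rel in MERGEABLE:
                    target_set = ws
                    break
            if target_set is not None:
                break
        if target_set is not None:
            target_set.append(item)
        else:
            word_sets.append([item])
    return word_sets
```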
Step S5: constructing an operation word tree based on the original operation word sets, wherein each node except the root node in the operation word tree corresponds to one original operation word set, and any node is contained by its parent node at the level above and contains all of its child nodes at the level below. This step specifically comprises:
Step S51: creating an initial operation word tree, wherein the initial operation word tree only comprises one empty root node;
Step S52: creating a first-level child node for the root node and selecting an original operation word set as this first-level child node;
Step S53: selecting the next non-traversed original operation word set and taking the root node as the target comparison node Q_target;
Step S54: judging the inclusion relationship between the original operation word set and each child node to be compared Q_s, where Q_s ∈ Sub(Q_target) and Sub(Q_target) is the set of all descendants (child nodes at every level) of the target comparison node:
if there exists a child node to be compared Q_s that contains the original operation word set, executing step S55;
if there exists a child node to be compared Q_s that is contained by the original operation word set, inserting the original operation word set at the position of the child node to be compared Q_s, moving Q_s and its child nodes down one level, and executing step S56;
if no child node to be compared Q_s has an inclusion relationship with the original operation word set, creating a new next-level child node under the target comparison node Q_target, taking the original operation word set as the newly created child node, and executing step S56;
Step S55: judging whether the child node to be compared Q_s has child nodes; if so, taking Q_s as the new target comparison node and repeating step S54; otherwise, creating a new next-level child node under Q_s and taking the original operation word set as the newly created child node;
Step S56: judging whether the traversal of all the original operation word sets is complete; if so, outputting the operation word tree; otherwise, returning to step S53.
Specifically, in some embodiments, the containment relationships are obtained through the operation word relation model. For any nodes Q_(n,i) = {q_(n,i,1), q_(n,i,2), …} and Q_(n,j) = {q_(n,j,1), q_(n,j,2), …}, the word sequence that generated an original operation word q ∈ Q is denoted t(q). The relationship type between the nodes is judged by the operation word relation model:
r(Q_(n,i), Q_(n,j)) = Model_r(emb'(t(Q_(n,i))) ⊕ emb'(t(Q_(n,j))) ⊕ emb'(Q_(n,i)) ⊕ emb'(Q_(n,j)))
where r(Q_(n,i), Q_(n,j)) is the relationship between the original operation word set Q_(n,i) and the original operation word set Q_(n,j); emb'(·) is a vector representation of a set, obtained from BERT(q), the vector produced by encoding the operation word q with a BERT encoding model; and BERT may be replaced with other text encoding models.
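A sketch of the tree construction (steps S51 to S56), assuming a set_relation(a, b) helper derived from the set-level relation model above that answers whether set a includes set b, is included by it, or neither; walking the tree level by level rather than comparing against all of Sub(Q_target) at once is an implementation choice of this sketch:

```python
class SurgicalWordTreeNode:
    def __init__(self, word_set=None):
        self.word_set = word_set   # None only for the empty root node
        self.children = []


def build_surgical_word_tree(word_sets, set_relation):
    """set_relation(a, b) is an assumed interface returning 'includes' if a
    contains b, 'included_by' if a is contained by b, and 'unrelated' otherwise."""
    root = SurgicalWordTreeNode()                                 # step S51
    if not word_sets:
        return root
    root.children.append(SurgicalWordTreeNode(word_sets[0]))      # step S52
    for ws in word_sets[1:]:                                      # steps S53 to S56
        target = root
        placed = False
        while not placed:
            descended = False
            for child in list(target.children):
                rel = set_relation(child.word_set, ws)
                if rel == "includes":            # child contains ws: descend (step S55)
                    target = child
                    descended = True
                    break
                if rel == "included_by":         # ws contains child: insert above it
                    new_node = SurgicalWordTreeNode(ws)
                    target.children.remove(child)
                    new_node.children.append(child)
                    target.children.append(new_node)
                    placed = True
                    break
            if not descended and not placed:
                # No inclusion relation at this level: add ws as a new child node.
                target.children.append(SurgicalWordTreeNode(ws))
                placed = True
    return root
```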
A case of the operation word tree construction step. Based on the case of the operation word generation step, the generated operation words comprise: ovarian cancer radical treatment, ovarian tumor cytoreduction, total uterus + double annex excision, total hysterectomy, laparoscopic pelvic lymph node cleaning, laparoscopic pelvic adhesion separation.
The step of merging operation words generates nodes:
1. ovarian cancer radical cure, ovarian tumor cell debulking, wherein "ovarian cancer radical cure" is modified by "ovarian tumor cell debulking".
2. Total uterus + double annex resection.
3. Hysterectomy.
4. Laparoscopic pelvic lymph node sweeping.
5. Laparoscopic pelvic adhesion separation.
As a result of the construction of the operation word tree, the node 1 is the radical operation of ovarian cancer and the reduction of ovarian tumor cells, the node 2, the node 4 and the node 5 are the child nodes of the node 1, and the node 3 is the child node of the node 2.
Step S6: and mapping the nodes in the operation word tree to standard operation words, and taking the sum of the costs of the standard operation words mapped by all the first-level child nodes of the root node as a recognition result.
Surgical word normalization relies on a surgical word mapping model that maps an original surgical word to a standard surgical word carrying a cost tag. The non-standard word list collected in the operation word generation step is used; since its data sources are diverse and its scale is large, a portion of the non-standard words is taken from each source, standard words synonymous with them are manually labeled as positive samples, and unrelated standard words of the same category are sampled as negative samples, forming a training set whose samples are triples (a, b, r), where a is the original word, b is the standard word, and r indicates whether the two are synonymous: r = 1 means they are synonymous and r = 0 means they are not.
The operation word mapping model is divided into an encoding part and a retrieval part. The encoding part trains a text representation model such as BERT using a twin-network (Siamese) approach: two representation models with shared parameters perform vector encoding and pooling on the non-standard word and the standard word respectively, and the cosine similarity between the pooled vectors is used to predict whether the two are synonymous. Of course, in some embodiments, the BERT model may be replaced with other text representation models.
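A sketch of the retrieval part of the mapping, assuming an encode(text) function wrapping the trained twin-network encoder (for example a pooled BERT representation) and an illustrative similarity threshold; neither the interface nor the threshold value is specified by the patent:

```python
import numpy as np


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def map_to_standard(original_word, standard_words, encode, threshold=0.8):
    """Return the most similar standard operation word, or None when the best
    cosine similarity falls below the (assumed) threshold, in which case the
    node is left for manual verification."""
    query_vec = encode(original_word)
    best_word, best_score = None, -1.0
    for std in standard_words:
        score = cosine(query_vec, encode(std))
        if score > best_score:
            best_word, best_score = std, score
    return best_word if best_score >= threshold else None
```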
The operation cost identification step obtains the operation cost from the operation word tree constructed in the preceding steps and the trained operation word mapping model. The operation word tree generated in the construction step takes original operation word sets as nodes, and the cost identification step maps the original operation words in each node to standard operation words one by one through the operation word mapping model. For each node, the standard operation words of the mapping result are taken as the operation items represented by that node.
When several standard operation words with different costs appear in a node's mapping result, the standard operation word with the lowest cost is taken and the node is submitted for manual verification.
When the original operation words in a node cannot be mapped to any standard operation word, the sum of the costs of its child nodes is taken as the cost reference of the node and the node is submitted for manual verification.
For a given surgical record, the cost of the root node of the operation word tree, obtained by summing the costs of the root node's child nodes, is output as the total cost identified by the cost identification step.
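Combining the tree and the mapping, the fee aggregation could be sketched as follows; price_table (standard word to price) and mapped (node id to standard word, produced by the mapping step) are assumed inputs:

```python
def node_cost(node, price_table, mapped):
    """Cost of one tree node: the price of its mapped standard operation word,
    or, when the node could not be mapped, the sum of its child costs (which
    the patent additionally submits for manual verification)."""
    std_word = mapped.get(id(node))
    if std_word is not None:
        return price_table[std_word]
    return sum(node_cost(child, price_table, mapped) for child in node.children)


def total_surgical_fee(root, price_table, mapped):
    # The recognition result is the sum over the root's first-level child nodes.
    return sum(node_cost(child, price_table, mapped) for child in root.children)
```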
For example, in one embodiment, based on cases in the procedure word generation and procedure word tree construction steps, the nodes and mapping results are generated:
1. ovarian cancer radical surgery, ovarian tumor cytoreduction, all mapped to "ovarian cancer radical surgery".
2. Total uterus + double annex resection, mapped to the same standard surgical word.
3. Hysterectomy, mapped to "total hysterectomy".
4. Laparoscopic pelvic lymph node dissection, mapped to "laparoscopic pelvic lymph node dissection".
5. Laparoscopic pelvic adhesions separation, mapped to "laparoscopic pelvic adhesions separation".
In the operation word tree, node 1 is the parent node of nodes 2, 4 and 5, and node 2 is the parent node of node 3, so the cost of node 1 covers the costs of the other nodes, and the cost of "ovarian cancer radical surgery" is the automatically identified operation cost.
The above functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

Claims (8)

1. An operation expense identification method based on electronic medical record text is characterized by comprising the following steps:
step S1: obtaining a standard operation word set and constructing a superset of the standard operation word set, wherein the superset of the standard operation word set comprises nonstandard operation words synonymous with each standard operation word;
step S2: preprocessing unstructured electronic medical record text to obtain fragments related to surgery, and segmenting each fragment to obtain word sequences related to each fragment respectively;
step S3: inputting a word sequence into a trained operation word generation model to generate an original operation word sequence, wherein the input of the operation word generation model is the word sequence and its output is the original operation word sequence composed of a plurality of original operation words, and the original operation words are elements in the superset;
step S4: merging all the original operation words to obtain a plurality of original operation word sets, wherein in each original operation word set containing two or more operation words, for any original operation word there exists at least one other original operation word whose relationship with it is parallel, equivalent, modifying or modified;
step S5: constructing an operation word tree based on the original operation word sets, wherein each node except the root node in the operation word tree corresponds to one original operation word set, and any node is contained by its parent node at the level above and contains all of its child nodes at the level below;
step S6: mapping the nodes in the operation word tree to standard operation words, and taking the sum of the costs of the standard operation words mapped by all the first-level child nodes of the root node as the recognition result;
the process of generating the original operation word sequence by the operation word generation model specifically comprises the following steps:
step S3-2-1: setting the start pointer of a sliding window start = 1, the end pointer end = |t| and the position pointer p = start, where t is the word sequence;
step S3-2-2: moving the position pointer p from the start position to the end position along the word sequence, taking the sub-sequence t_start:p = [t_start, …, t_p] at each position, and recording the position of the position pointer at which the probability of the generated operation word is maximal;
step S3-2-3: taking the operation word generated from the sub-sequence t_start:p as one original operation word;
step S3-2-4: cutting the word sequence at the position of the current position pointer to obtain two new word sequences, and repeating steps S3-2-1 to S3-2-3 on all the obtained word sequences to generate a plurality of original operation words;
step S3-2-5: splicing all the original operation words to form the original operation word sequence;
in the step S3-2-2, the probability of the operation word is specifically:
p = argmax_{start ≤ p ≤ end} P(q | t_start:p)
q = Model_g(t_start:p)
where p is the position pointer, P denotes the probability, Model_g is the operation word generation model, and q is the operation word generated by the operation word generation model with the sub-sequence from the start pointer to the position pointer of the word sequence as input.
2. The method for identifying surgical fees based on text of an electronic medical record according to claim 1, wherein the training process of the model for generating the surgical words comprises:
step S3-1-1: generating a dictionary tree based on the superset, wherein a root node in the dictionary tree is an empty node, each other node represents a nonstandard operation word, and all operation words in the superset correspond to a path from the root node to a child node;
step S3-1-2: constructing training set data, wherein training samples in the training set data are word sequences and operation words corresponding to the word sequences, and the operation words are standard operation words or non-standard operation words;
step S3-1-3: and training the operation word generation model by using the training set data.
3. The method for identifying surgical expense based on the text of the electronic medical record according to claim 2, wherein the surgical word generation model generates the surgical words in a way of generating the words one by one.
4. The method for identifying surgical fees based on text of an electronic medical record according to claim 1, wherein said step S4 comprises:
step S41: reordering the original operation word sequence according to the length of the word sequence corresponding to each original operation word and its starting position, such that operation words with longer corresponding word sequences are ranked earlier and, among those, operation words with earlier starting positions are ranked earlier;
step S42: selecting the first original operation word in the reordered original operation word sequence, establishing a first original operation word set, and taking the selected original operation word as an element of this first original operation word set;
step S43: selecting the next original operation word q'_i in the reordered original operation word sequence and calculating the relationship between q'_i and each original operation word in every existing original operation word set; if there exists an original operation word set Q_n,j containing an original operation word whose relationship with q'_i is parallel, equivalent, modifying or modified, adding q'_i to the original operation word set Q_n,j; otherwise, creating a new original operation word set and taking q'_i as an element of the newly created original operation word set;
step S44: judging whether the traversal of all original operation words in the reordered original operation word sequence is complete; if so, ending; otherwise, returning to step S43.
5. The method according to claim 4, wherein in the step S43, the relationships between the operation words are obtained by an operation word relation model, wherein the input of the operation word relation model consists of two operation words and the word sequences respectively corresponding to the two operation words.
6. The method for identifying surgical fees based on text of an electronic medical record according to claim 1, wherein said step S5 comprises:
step S51: creating an initial operation word tree, wherein the initial operation word tree only comprises one empty root node;
step S52: creating a first-level child node for the root node and selecting an original operation word set as this first-level child node;
step S53: selecting the next non-traversed original operation word set and taking the root node as the target comparison node Q_target;
step S54: judging the inclusion relationship between the original operation word set and each child node to be compared Q_s, where Q_s ∈ Sub(Q_target) and Sub(Q_target) is the set of all descendants (child nodes at every level) of the target comparison node:
if there exists a child node to be compared Q_s that contains the original operation word set, executing step S55;
if there exists a child node to be compared Q_s that is contained by the original operation word set, inserting the original operation word set at the position of the child node to be compared Q_s, moving Q_s and its child nodes down one level, and executing step S56;
if no child node to be compared Q_s has an inclusion relationship with the original operation word set, creating a new next-level child node under the target comparison node Q_target, taking the original operation word set as the newly created child node, and executing step S56;
step S55: judging whether the child node to be compared Q_s has child nodes; if so, taking Q_s as the new target comparison node and repeating step S54; otherwise, creating a new next-level child node under Q_s and taking the original operation word set as the newly created child node;
step S56: judging whether the traversal of all the original operation word sets is complete; if so, outputting the operation word tree; otherwise, returning to step S53.
7. An electronic medical record text-based surgical fee recognition device comprising a memory, a processor, and a program stored in the memory, wherein the processor implements the method of any one of claims 1-6 when executing the program.
8. A storage medium having a program stored thereon, wherein the program, when executed, implements the method of any of claims 1-6.
CN202410056657.3A 2024-01-16 2024-01-16 Surgical fee identification method, device and storage medium based on electronic medical record text Active CN117574896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410056657.3A CN117574896B (en) 2024-01-16 2024-01-16 Surgical fee identification method, device and storage medium based on electronic medical record text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410056657.3A CN117574896B (en) 2024-01-16 2024-01-16 Surgical fee identification method, device and storage medium based on electronic medical record text

Publications (2)

Publication Number Publication Date
CN117574896A CN117574896A (en) 2024-02-20
CN117574896B true CN117574896B (en) 2024-04-09

Family

ID=89884851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410056657.3A Active CN117574896B (en) 2024-01-16 2024-01-16 Surgical fee identification method, device and storage medium based on electronic medical record text

Country Status (1)

Country Link
CN (1) CN117574896B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111357015B (en) * 2019-12-31 2023-05-02 深圳市优必选科技股份有限公司 Text conversion method, apparatus, computer device, and computer-readable storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471895A (en) * 2018-10-29 2019-03-15 清华大学 The extraction of electronic health record phenotype, phenotype name authority method and system
CN111125355A (en) * 2018-10-31 2020-05-08 北京国双科技有限公司 Information processing method and related equipment
CN112002411A (en) * 2020-08-20 2020-11-27 杭州电子科技大学 Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
CN112149414A (en) * 2020-09-23 2020-12-29 腾讯科技(深圳)有限公司 Text similarity determination method, device, equipment and storage medium
CN112966068A (en) * 2020-11-09 2021-06-15 袭明科技(广东)有限公司 Resume identification method and device based on webpage information
CN112861535A (en) * 2021-01-18 2021-05-28 山东众阳健康科技集团有限公司 Surgery classification coding method and system based on diagnosis and treatment data
CN114638228A (en) * 2022-03-14 2022-06-17 南京航空航天大学 Chinese named entity recognition method based on word set self-attention
CN114724167A (en) * 2022-05-09 2022-07-08 济南大学 Marketing text recognition method and system
CN115309899A (en) * 2022-08-09 2022-11-08 烟台中科网络技术研究所 Method and system for identifying and storing specific content in text
CN116386800A (en) * 2023-06-06 2023-07-04 神州医疗科技股份有限公司 Medical record data segmentation method and system based on pre-training language model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Structured processing method for pathology reports based on dependency syntax analysis; 田驰远; 陈德华; 王梅; 乐嘉锦; Journal of Computer Research and Development; 2016-12-15 (12); full text *

Also Published As

Publication number Publication date
CN117574896A (en) 2024-02-20

Similar Documents

Publication Publication Date Title
WO2021139424A1 (en) Text content quality evaluation method, apparatus and device, and storage medium
CN107731269B (en) Disease coding method and system based on original diagnosis data and medical record file data
CN110459287B (en) Structured report data from medical text reports
US20220301670A1 (en) Automated information extraction and enrichment in pathology report using natural language processing
CN105069123B (en) A kind of automatic coding and system of Chinese surgical procedure information
CN109871538A (en) A kind of Chinese electronic health record name entity recognition method
CN109710925A (en) Name entity recognition method and device
CN112541066B (en) Text-structured-based medical and technical report detection method and related equipment
EP4113358A1 (en) Tagging method, relationship extraction method, storage medium and operation apparatus
CN112735544B (en) Medical record data processing method, device and storage medium
CN112560450B (en) Text error correction method and device
CN113539409B (en) Treatment scheme recommendation method, device, equipment and storage medium
CN114358001A (en) Method for standardizing diagnosis result, and related device, equipment and storage medium thereof
CN111104481B (en) Method, device and equipment for identifying matching field
CN115983233A (en) Electronic medical record duplication rate estimation method based on data stream matching
CN113704415A (en) Vector representation generation method and device for medical text
CN117574896B (en) Surgical fee identification method, device and storage medium based on electronic medical record text
CN113658720A (en) Method, apparatus, electronic device and storage medium for matching diagnostic name and ICD code
CN112632910A (en) Operation encoding method, electronic device and storage device
CN117422074A (en) Method, device, equipment and medium for standardizing clinical information text
CN112749277A (en) Medical data processing method and device and storage medium
CN112101030A (en) Method, device and equipment for establishing term mapping model and realizing standard word mapping
CN110060749B (en) Intelligent electronic medical record diagnosis method based on SEV-SDG-CNN
TWI285849B (en) Optical character recognition device, document searching system, and document searching program
CN115374787B (en) Model training method and device for continuous learning based on medical named entity recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant