CN111581987A - Disease classification code recognition method, device and storage medium - Google Patents


Info

Publication number: CN111581987A
Application number: CN202010285122.5A
Authority: CN (China)
Prior art keywords: layer, input, vector, decoding, attention layer
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 陈逸龙 (Chen Yilong)
Original and current assignee: Guangzhou Tianpeng Computer Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Guangzhou Tianpeng Computer Technology Co., Ltd.
Priority: CN202010285122.5A (the priority date is an assumption and is not a legal conclusion)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. ICT SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H 10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records


Abstract

Embodiments of the invention disclose a disease classification code recognition method, an apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring diagnostic data; inputting the diagnostic data as a source language into a machine translation model, the machine translation model comprising an encoding network and a decoding network; performing feature extraction on the source language through the encoding network to obtain features of the source language; and inputting the features of the source language into the decoding network for decoding, so that the source language is translated into a target language, the target language being a disease classification code matched with the diagnostic data. The disease classification code recognition method, apparatus, computer device, and storage medium provided by the invention solve the problem of low accuracy of disease classification code recognition in the prior art.

Description

Disease classification code recognition method, device and storage medium
Technical Field
The invention relates to the field of computer technology, and in particular to a disease classification code recognition method, apparatus, and storage medium.
Background
With the development of medical technology, International Classification of Diseases (ICD) codes have come into wide use to describe patient conditions, such as etiology, injury, and cause of death. It is therefore important to be able to quickly convert a physician's non-standardized description of a patient's condition into a standardized disease classification code.
Generally, this non-standardized-to-standardized conversion is performed by dedicated hospital coders, who assign disease classification codes to the data a physician provides about a patient's condition. This demands substantial specialized knowledge from the coders (medical knowledge, coding rules, medical terminology, and so on), which in turn means that manual identification of disease classification codes is not only labor-intensive but also inefficient.
For this reason, disease classification code recognition based on computer equipment has been attempted. However, whether based on dictionary retrieval or on classification learning, such approaches, although they relieve the burden of manual implementation to some extent, suffer from data sparseness, which makes the accuracy of disease classification code recognition difficult to guarantee.
From the above, existing disease classification code recognition suffers from low accuracy.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, a computer device, and a storage medium for identifying a disease classification code, so as to solve the problem of low accuracy of identifying a disease classification code in the related art.
The technical solution adopted by the invention is as follows:
according to an aspect of an embodiment of the present invention, a disease classification code recognition method includes: acquiring diagnostic data; inputting the diagnostic data as a source language into a machine translation model, the machine translation model comprising an encoding network and a decoding network; extracting the features of the source language through the coding network to obtain the features of the source language; inputting the features of the source language into the decoding network for decoding, so that the source language is translated into a target language, and the target language is a disease classification code matched with the diagnosis data.
According to an aspect of an embodiment of the present invention, a disease classification code recognition apparatus includes: a data acquisition module, for acquiring diagnostic data; a data input module, for inputting the diagnostic data as a source language into a machine translation model, the machine translation model comprising an encoding network and a decoding network; an encoding module, for performing feature extraction on the source language through the encoding network to obtain features of the source language; and a decoding module, for inputting the features of the source language into the decoding network for decoding, so that the source language is translated into a target language, the target language being a disease classification code matched with the diagnostic data.
In one embodiment, the encoding network comprises a first embedding layer and several encoding sublayers. The encoding module includes: an encoding mapping unit, configured to map each token of the source language into a vector to be encoded in the first embedding layer; and a feature extraction unit, configured to perform feature extraction on the vectors to be encoded through the several encoding sublayers to obtain the features of the source language.
In one embodiment, each encoding sublayer includes a first multi-head attention layer, a first fully connected layer, and first residual connection layers. The feature extraction unit includes: a first input subunit, configured to take, for each encoding sublayer, the input vector of that sublayer as the input vector of the first multi-head attention layer, fed in at the input ends of the first multi-head attention layer, the vector to be encoded serving as the input vector of the first encoding sublayer; a first fusion subunit, configured to fuse the input vector and the output vector of the first multi-head attention layer through the first residual connection layer attached to the first multi-head attention layer, and pass the result to the first fully connected layer; a second fusion subunit, configured to fuse the input vector and the output vector of the first fully connected layer through the first residual connection layer attached to the first fully connected layer, obtaining the output vector of the encoding sublayer, which serves as the input vector of the next encoding sublayer; and a first output subunit, configured to take the output vector of the last encoding sublayer as the features of the source language.
In one embodiment, the inputs of the first multi-head attention layer include a Q1 input end, a K1 input end, and a V1 input end. The first input subunit includes: a first vector input subunit, configured to feed, in the encoding sublayer, the input vector of the sublayer into the first multi-head attention layer through the Q1, K1, and V1 input ends, as the Q-end, K-end, and V-end input vectors of the first multi-head attention layer, respectively.
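As a toy numerical illustration of the residual fusion described above (output = input + layer(input), applied first around the attention layer and then around the fully connected layer), the `toy_attention` and `toy_fully_connected` functions below are hypothetical stand-ins, not the patent's actual layers:

```python
# Residual "fuse the input vector and output vector" pattern of one
# encoding sublayer, on plain lists of floats.

def residual(layer, x):
    y = layer(x)
    return [a + b for a, b in zip(x, y)]   # fuse input and output vectors

def toy_attention(x):
    m = sum(x) / len(x)                    # stand-in for multi-head attention
    return [m for _ in x]

def toy_fully_connected(x):
    return [2 * v for v in x]              # stand-in for the fully connected layer

def coding_sublayer(x):
    h = residual(toy_attention, x)         # attention + first residual fusion
    return residual(toy_fully_connected, h)  # FC + second residual fusion

print(coding_sublayer([1.0, 2.0, 3.0]))
```

Stacking several such sublayers, with each output vector feeding the next sublayer, gives the encoder structure the claims describe.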
In one embodiment, the decoding network comprises a second embedding layer and several decoding sublayers. The decoding module includes: a decoding mapping unit, configured to input the features of the source language into the second embedding layer for mapping to obtain a vector to be decoded; and a feature decoding unit, configured to decode the vector to be decoded through the several decoding sublayers to obtain the target language.
In one embodiment, each decoding sublayer includes a second multi-head attention layer, a third multi-head attention layer, a second fully connected layer, and second and third residual connection layers. The feature decoding unit includes: a second input subunit, configured to take, for each decoding sublayer, the input vector of that sublayer as the input vector of the second multi-head attention layer, fed in at the input ends of the second multi-head attention layer, the vector to be decoded serving as the input vector of the first decoding sublayer; a third fusion subunit, configured to fuse the input vector and the output vector of the second multi-head attention layer through the second residual connection layer attached to the second multi-head attention layer, and pass the result to the third multi-head attention layer; a fourth fusion subunit, configured to take, through the third residual connection layer, the output vector of the encoding sublayer corresponding to the decoding sublayer as an input vector of the third multi-head attention layer fed in at its input ends, fuse the input vector and the output vector of the third multi-head attention layer through the second residual connection layer attached to it, and pass the result to the second fully connected layer; a fifth fusion subunit, configured to fuse the input vector and the output vector of the second fully connected layer through the second residual connection layer attached to the second fully connected layer, obtaining the output vector of the decoding sublayer, which serves as the input vector of the next decoding sublayer; and a second output subunit, configured to obtain the target language from the output vector of the last decoding sublayer.
In one embodiment, the inputs of the second multi-head attention layer include a Q2 input end, a K2 input end, and a V2 input end. The second input subunit includes: a second vector input subunit, configured to feed, in the decoding sublayer, the input vector of the sublayer into the second multi-head attention layer through the Q2, K2, and V2 input ends, as the Q-end, K-end, and V-end input vectors of the second multi-head attention layer, respectively.
In one embodiment, the inputs of the third multi-head attention layer include a Q3 input end, a K3 input end, and a V3 input end. The fourth fusion subunit includes: a third vector input subunit, configured to feed the output vector of the encoding sublayer corresponding to the decoding sublayer into the third multi-head attention layer through the K3 and V3 input ends, as the K-end and V-end input vectors of the third multi-head attention layer, respectively; and a fourth vector input subunit, configured to feed the result of fusing the input vector and the output vector of the second multi-head attention layer into the third multi-head attention layer through the Q3 input end, as the Q-end input vector of the third multi-head attention layer. The fourth fusion subunit further includes: a vector fusion subunit, configured to fuse the Q-end input vector and the output vector of the third multi-head attention layer through the second residual connection layer attached to the third multi-head attention layer.
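The Q/K/V wiring can be illustrated with single-head scaled dot-product attention (a simplification of the multi-head layers above; the toy vectors are hypothetical). Self-attention feeds the same vectors to all three input ends, while the decoder's third attention layer takes K and V from the corresponding encoding sublayer's output and Q from the decoder side:

```python
import math

def attention(Q, K, V):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]                       # softmax
        out.append([sum(w * v[i] for w, v in zip(weights, V))
                    for i in range(len(V[0]))])
    return out

enc_out = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical encoder-sublayer output
dec_in = [[1.0, 1.0]]                # hypothetical decoder-side vector

self_out = attention(dec_in, dec_in, dec_in)      # Q = K = V (self-attention)
cross_out = attention(dec_in, enc_out, enc_out)   # K, V from the encoder side
print(self_out, cross_out)
```

The second line of the example mirrors the K3/V3 wiring in the claim: the decoder queries the encoder's representation of the source language when producing the target language.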
According to an aspect of the embodiments of the present invention, a computer device includes a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, implement the disease classification code recognition method described above.
According to an aspect of an embodiment of the present invention, a storage medium has a computer program stored thereon which, when executed by a processor, implements the disease classification code recognition method described above.
In the above technical solution, a machine translation model is used to translate the diagnostic data, serving as the source language, into a target language, and the target language serves as the disease classification code matched with the diagnostic data.
Specifically, diagnostic data is acquired and input into a machine translation model as the source language. Feature extraction is performed on the source language by the encoding network in the machine translation model to obtain features of the source language, and these features are then input into the decoding network in the machine translation model for decoding, yielding the target language, i.e., the disease classification code matched with the diagnostic data. Translation from source language to target language is thereby realized without dictionary retrieval or classification learning, so data sparseness need not be considered, the accuracy of disease classification code recognition can be guaranteed, and the problem of low recognition accuracy in the prior art is effectively solved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic illustration of an implementation environment in accordance with the present invention.
FIG. 2 is a diagram illustrating a hardware configuration of a computer device, according to an example embodiment.
FIG. 3 is a flow chart illustrating a disease classification code identification method according to an example embodiment.
Fig. 4 is a schematic diagram of a structure of a machine translation model according to a corresponding embodiment of fig. 3.
Fig. 5 is a schematic diagram illustrating the structure of a coding network according to an example embodiment.
FIG. 6 is a flow diagram of one embodiment of step 350 of the corresponding embodiment of FIG. 3.
Fig. 7 is a schematic diagram of a first multi-headed attention layer according to the corresponding embodiment of fig. 5.
Fig. 8 is a schematic diagram illustrating the structure of a decoding network according to an example embodiment.
FIG. 9 is a flow chart of one embodiment of step 370 of the corresponding embodiment of FIG. 3.
Fig. 10 is a schematic diagram of the structures of a second multi-headed attention layer and a third multi-headed attention layer according to the corresponding embodiment of fig. 8.
Fig. 11 is a block diagram illustrating a disease classification code recognition apparatus according to an example embodiment.
FIG. 12 is a block diagram illustrating a computer device according to an example embodiment.
FIG. 13 is a block diagram illustrating a storage medium in accordance with an exemplary embodiment.
While specific embodiments of the invention have been shown by way of example in the drawings and will be described in detail hereinafter, such drawings and description are not intended to limit the scope of the inventive concepts in any way, but rather to explain the inventive concepts to those skilled in the art by reference to the particular embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Fig. 1 is a schematic diagram of an implementation environment related to the disease classification code recognition method. The implementation environment includes a client 110 and a server 130.
Specifically, the client 110 is used by a physician to provide non-standardized data describing a patient's condition, i.e., diagnostic data. The client 110 may be any electronic device with a communication function, such as a desktop computer, notebook computer, tablet computer, smartphone, palmtop computer, or other portable mobile terminal, which is not limited here.
The server 130 may be a desktop computer, a notebook computer, a server, or other computer devices, or may be a server cluster formed by a plurality of servers, or even a cloud computing center formed by a plurality of servers. The server is a computer device providing background services for users, for example, the background services include, but are not limited to, a disease classification code recognition service, and the like.
The server 130 and the client 110 are connected in advance in a wired or wireless manner, and data transmission between the server 130 and the client 110 is realized through the communication connection. The data transmitted includes, but is not limited to: disease classification code, diagnostic data, and the like.
Through interaction between the client 110 and the server 130, the client 110 uploads collected diagnostic data to the server 130, so that the server 130 can provide the disease classification code recognition service based on that data.
After receiving the diagnostic data uploaded by the client 110, the server 130 can invoke the disease classification code recognition service, obtain the disease classification code matched with the diagnostic data, and return it to the client 110.
Based on this process, efficient and accurate disease classification code recognition is realized.
Fig. 2 is a block diagram illustrating a hardware configuration of a computer device according to an example embodiment. Such a computer device is suitable for use in the server 130 of the implementation environment shown in fig. 1.
It should be noted that this computer device is only one example adapted to the present invention and should not be considered as providing any limitation to the scope of use of the present invention. Nor should such a computer device be interpreted as having a need to rely on or have to have one or more components of the exemplary computer device 200 shown in fig. 2.
The hardware structure of the computer device 200 may vary greatly depending on its configuration or performance. As shown in fig. 2, the computer device 200 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
Specifically, the power supply 210 is used to provide operating voltages for various hardware devices on the computer device 200.
The interface 230 includes at least one wired or wireless network interface for interacting with external devices. For example, the interaction between the user terminal 110 and the service terminal 130 in the implementation environment shown in fig. 1 is performed.
Of course, in other examples of the present invention, the interface 230 may further include at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, at least one USB interface 237, etc., as shown in fig. 2, which is not limited herein.
The memory 250, as a carrier of resource storage, may be a read-only memory, random access memory, magnetic disk, optical disk, or the like; the resources stored on it include an operating system 251, applications 253, and data 255, and the storage may be transient or persistent.
The operating system 251 manages and controls the hardware devices and the applications 253 on the computer device 200, so that the central processing unit 270 can operate on and process the mass data 255 in the memory 250; it may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The application 253 is a computer program that performs at least one specific task on the operating system 251, and may include at least one module (not shown in fig. 2), each of which may contain a series of computer-readable instructions for the computer device 200. For example, the disease classification code identification means may be considered as an application 253 deployed at the computer device 200.
The data 255 may be photographs, pictures, etc. stored in a disk, and may also be diagnostic data, disease classification codes, etc. stored in the memory 250.
The central processor 270 may include one or more processors and is configured to communicate with the memory 250 through at least one communication bus to read computer-readable instructions stored in the memory 250, and further implement operations and processing of the mass data 255 in the memory 250. The disease classification code identification method is accomplished, for example, by the central processor 270 reading a series of computer readable instructions stored in the memory 250.
Furthermore, the present invention can be implemented by hardware circuits or by a combination of hardware circuits and software, and thus, the implementation of the present invention is not limited to any specific hardware circuits, software, or a combination of both.
Referring to fig. 3, in an exemplary embodiment, a disease classification code recognition method is applied to a computer device, for example, the server 130 of the implementation environment shown in fig. 1, and the structure of the computer device may be as shown in fig. 2.
The disease classification code recognition method may be executed by a computer device, and may also be understood as being executed by an application program (e.g., a disease classification code recognition apparatus) running in the computer device. In the following method embodiments, for convenience of description, the execution subject of each step is described as a computer device, but the present invention is not limited thereto.
The disease classification code identification method can comprise the following steps:
at step 310, diagnostic data is acquired.
The diagnostic data refers to the non-standardized data with which a physician describes a patient's condition.
Referring back to fig. 1, the diagnostic data may come from data the client collects in real time, or from data collected over a historical period and stored at the client. In other words, for the server, acquiring diagnostic data may mean receiving data that the client collected and actively reported, or actively reading data stored at the client, i.e., data the client collected within a historical period; this is not limited here.
Accordingly, after the diagnostic data is acquired, the computer device may process it in real time, or may store it and process it later, for example when the memory occupancy of the computer device is low, or when instructed by a coder, thereby improving the efficiency of disease classification code recognition.
Step 330, inputting the diagnostic data as a source language into a machine translation model.
First, it should be explained that the machine translation model essentially establishes a mathematical mapping between a source language and a target language. Thus, when diagnostic data is input into the machine translation model as the source language, that mapping allows the source language to be translated into the target language, i.e., the disease classification code regarded as matching the diagnostic data.
Here, the machine translation model is generated by training a base model using a plurality of training samples. Wherein, the base model includes but is not limited to: neural network models, and the like.
A training sample is, in essence, generated by matching and correction so that diagnostic data is paired with its disease classification code. In other words, a training sample is diagnostic data carrying an ICD-10 label, the label indicating the disease classification code matched to that diagnostic data.
For example, if the diagnostic data is "right lung emphysema (with formation of large lung blebs)", and the ICD-10 labels matched to it by matching and correction are J43.901 and J43.900, then the training sample can be expressed as: right lung emphysema (with formation of large lung blebs) - J43.901, J43.900.
As for training, the parameters of the base model are optimized over the several training samples so that a specified function converges, thereby optimizing the mathematical mapping between source language and target language; the converged base model is the machine translation model. The specified function includes, but is not limited to: a maximum likelihood function, an activation loss function, an expectation function, and the like.
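Such a sample might be represented as a simple text/labels pair. This data layout is illustrative only; the codes J43.901 and J43.900 are taken from the example above:

```python
# Hypothetical in-memory representation of one training sample:
# (diagnostic data, matching ICD-10 disease classification codes).
training_samples = [
    ("right lung emphysema (with formation of large lung blebs)",
     ["J43.901", "J43.900"]),
]

text, labels = training_samples[0]
print(labels)
```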
The following describes an example of the training process of the machine translation model, taking an activation loss function as the specified function.
Specifically, the parameters of the base model are initialized randomly, and the loss value of the constructed activation loss function is calculated from the randomly initialized parameters and one of the training samples; the construction of the activation loss function depends on the randomly initialized parameters.
If the loss value of the activation loss function reaches its minimum, the activation loss function is regarded as having converged, and the machine translation model is obtained from the converged base model. That is, the machine translation model is the base model carrying the randomly initialized parameters.
Otherwise, if the loss value of the activation loss function has not reached its minimum, the parameters of the base model are updated, and the loss value of the reconstructed activation loss function is calculated again from the updated parameters and the next training sample; the reconstruction of the activation loss function depends on the updated parameters.
Optimization proceeds in this loop until the loss value of the activation loss function reaches its minimum, at which point the base model has converged into the machine translation model. That is, the machine translation model is the base model carrying the updated parameters.
Of course, for training efficiency, a number of iterations may also be set and flexibly adjusted to the actual needs of the application scenario; for example, an application scenario with a high requirement on the accuracy of disease classification code recognition would set a larger number of iterations.
In that case, once the number of iterations reaches its maximum, training is stopped even if the loss value of the activation loss function has not reached its minimum; the activation loss function is regarded as having converged, and the machine translation model is obtained from the converged base model.
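The loop just described (compute the loss, stop once it reaches a minimum or the iteration cap is hit, otherwise update the parameters) can be sketched with ordinary gradient descent on a one-dimensional quadratic; the toy loss below is a hypothetical stand-in for the patent's activation loss function:

```python
def train(lr=0.1, max_iters=100, tol=1e-9):
    theta = 5.0                       # "randomly initialized" parameter
    for _ in range(max_iters):        # iteration cap, as described above
        loss = (theta - 2.0) ** 2     # toy loss; its minimum is at theta = 2
        if loss < tol:                # loss (near-)minimal: treated as converged
            break
        grad = 2.0 * (theta - 2.0)
        theta -= lr * grad            # update the parameter, then recompute
    return theta

print(train())
```

With `lr=0.1` the parameter converges to the loss minimum well within the 100-iteration cap; lowering `max_iters` demonstrates the early-stop branch.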
Therefore, when training is completed, the machine translation model has translation capability, so that translation from a source language to a target language can be realized, and disease classification coding recognition is further realized.
Second, regarding structure, the machine translation model is of the encoder-decoder type; that is, the machine translation model includes an encoding network and a decoding network.
In one embodiment, the encoding network and the decoding network employ an RNN-RNN structure.
Specifically, as shown in fig. 4(a), the coding network includes an embedding layer, a bidirectional long short-term memory layer (BiLSTM, Bidirectional Long Short-Term Memory), and a self-attention layer (Self-Attention). The embedding layer adopts character-level embedding, which avoids the inaccurate word segmentation caused by the large number of medical terms in diagnosis data in the medical field. The bidirectional long short-term memory layer has excellent sequence modeling capability, which facilitates feature extraction from the diagnosis data serving as the source language and also avoids the insufficient model training caused by vanishing gradients. The self-attention layer further considers the contextual semantic relationship of each token in the source language and the positional relationship of each token, which more effectively improves how well the features represent the source language.
The decoding network decodes the features of the source language into the target language, namely the disease classification code matched with the diagnosis data. To ensure that the decoder can more accurately distinguish the different target languages decoded from the features of different source languages, the decoder is implemented with a unidirectional long short-term memory layer (LSTM, Long Short-Term Memory).
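The RNN-RNN coding network above (character-level embedding followed by a bidirectional recurrent pass) can be sketched as follows. A plain tanh RNN cell stands in for the BiLSTM purely to keep the sketch short; all sizes and variable names are hypothetical, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(4)
d_emb, d_hid = 4, 3
embed = rng.normal(size=(10, d_emb))            # toy character vocabulary
w_in = rng.normal(size=(d_emb, d_hid)) * 0.1
w_rec = rng.normal(size=(d_hid, d_hid)) * 0.1

def rnn_pass(xs):
    # one directional recurrent pass over the character embeddings
    h = np.zeros(d_hid)
    states = []
    for x in xs:
        h = np.tanh(x @ w_in + h @ w_rec)       # recurrent state update
        states.append(h)
    return states

char_ids = [1, 4, 2, 7]                         # characters of the diagnosis text
xs = embed[char_ids]
forward = rnn_pass(xs)                          # left-to-right pass
backward = rnn_pass(xs[::-1])[::-1]             # right-to-left pass
# bidirectional feature: concatenate both directions per character
features = np.concatenate([forward, backward], axis=-1)
print(features.shape)
```

A real BiLSTM additionally carries gated cell states, which is what mitigates the vanishing-gradient issue mentioned above; the bidirectional wiring is the same.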
In another embodiment, the encoding network and the decoding network employ a Transformer-Transformer structure.
Specifically, as shown in fig. 4(b), the coding network includes a first embedding layer and several coding sublayers. The first embedding layer adopts character-level embedding and extracts the contextual semantic features (word embedding) and positional features (position embedding) of each token in the source language, so that the computer device can recognize the source language expressed in natural-language form. The features of the source language are then extracted through the several coding sublayers to uniquely and accurately represent the source language, which fully guarantees the accuracy of disease classification code recognition.
The decoding network includes a second embedding layer and several decoding sublayers. The second embedding layer also adopts character-level embedding and essentially extracts the contextual semantic features and positional features of each token, so that the computer device can better recognize the source language expressed in natural-language form. Decoding through the several decoding sublayers finally translates the source language into the target language.
And 350, extracting the features of the source language through the coding network to obtain the features of the source language.
In this embodiment, the coding network adopts the Transformer-Transformer structure shown in fig. 4(b), that is, the coding network includes a first embedding layer and several coding sublayers.
Specifically, in the first embedding layer, the tokens in the source language are mapped into a vector to be coded, so that the source language expressed in natural-language form is converted into a machine language recognizable by the computer device. Feature extraction is then performed on the vector to be coded through the several coding sublayers to obtain the features of the source language, which uniquely and accurately represent it.
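The character-level mapping performed by the first embedding layer can be sketched as a simple table lookup. The toy vocabulary, the embedding dimension, and the name `embed` are hypothetical, chosen only to illustrate the mapping from text to a machine-readable matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {ch: i for i, ch in enumerate("abcdefgh ")}   # toy character vocabulary
d_model = 4                                           # embedding dimension
embedding_table = rng.normal(size=(len(vocab), d_model))

def embed(text):
    ids = [vocab[ch] for ch in text]                  # char -> index
    return embedding_table[ids]                       # index -> vector

vec_to_encode = embed("bad cafe")
print(vec_to_encode.shape)   # one d_model-dim vector per character
```

In a trained model the table entries are learned parameters, and a position embedding is typically added to each row before the coding sublayers.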
Step 370, inputting the characteristics of the source language into the decoding network for decoding, so that the source language is translated into a target language, and the target language is a disease classification code matched with the diagnosis data.
In this embodiment, the decoding network adopts the Transformer-Transformer structure shown in fig. 4(b), that is, the decoding network includes a second embedding layer and several decoding sublayers.
Specifically, the features of the source language are input into the second embedding layer for mapping to obtain a vector to be decoded, so that the source language expressed in natural-language form is further converted into a machine language identifiable by the computer device. The vector to be decoded is then decoded through the several decoding sublayers to obtain the target language, namely the disease classification code matched with the diagnosis data.
Through the above process, disease classification code recognition based on a machine translation model is realized: the diagnosis data serves as the source language, the matched disease classification code serves as the target language, and the translation from source language to target language involves neither dictionary retrieval nor classification learning. This avoids the low recognition accuracy caused by data sparsity and effectively improves the accuracy of disease classification code recognition.
Referring to fig. 5, in an exemplary embodiment, the coding sub-layers include a first multi-headed attention layer, a first fully connected layer, and a first residual connected layer. The first multi-head attention layer is used for extracting local features, the first full-connection layer is used for extracting global features, and the first residual connection layer is used for carrying out feature propagation inside the coding sub-layer.
Optionally, the coding sublayer further comprises a first normalization layer for performing layer normalization.
The following describes the source-language feature extraction process in detail with reference to the coding sublayer structure shown in fig. 5 and the steps shown in fig. 6.
As shown in fig. 6, step 350 may include the steps of:
step 351, for each coding sublayer, receiving an input vector of the coding sublayer as an input vector of the first multi-head attention layer, and inputting the input vector by an input end of the first multi-head attention layer.
And the vector to be coded is used as an input vector of the first coding sublayer.
And 353, fusing the input vector and the output vector of the first multi-head attention layer through a first residual connecting layer connected to the first multi-head attention layer, and transmitting the fused input vector and the output vector to the first full connecting layer.
Step 355, fusing the input vector and the output vector of the first full connection layer through the first residual connection layer connected to the first full connection layer to obtain the output vector of the coding sublayer.
Wherein the output vector of the coding sub-layer is used as the input vector of the next coding sub-layer.
Step 357, using the output vector of the last coding sublayer as the feature of the source language.
Specifically, as shown in fig. 5, in each coding sublayer, an input vector of the coding sublayer is input through an input end of a first multi-headed attention layer in the coding sublayer, and a local vector of the coding sublayer is output. And the vector to be coded is used as an input vector of the first coding sublayer.
And transmitting the input vector of the coding sublayer to a first normalization layer connected to the first multi-head attention layer in the coding sublayer through a first residual connecting layer connected to the first multi-head attention layer in the coding sublayer, and performing feature fusion and layer normalization on the input vector and the local vector of the coding sublayer to obtain an intermediate vector of the coding sublayer.
And inputting the intermediate vector of the coding sublayer into a first full-connection layer in the coding sublayer, and outputting the intermediate vector through the first full-connection layer to obtain a global vector of the coding sublayer.
And transmitting the intermediate vector of the coding sublayer to a first normalization layer connected to the first full connection layer in the coding sublayer through a first residual connection layer connected to the first full connection layer in the coding sublayer, and performing feature fusion and layer normalization on the intermediate vector and the global vector of the coding sublayer to obtain an output vector of the coding sublayer.
And traversing each coding sublayer in the coding network, and correspondingly obtaining the output vector of each coding sublayer, wherein the output vector of the previous coding sublayer is used as the input vector of the next coding sublayer.
And when each coding sublayer in the coding network finishes traversing, taking the output vector of the last coding sublayer as the characteristic of the source language.
In the above process, feature propagation between the layers within the coding sublayer is realized through the first residual connection layer without convolution operations, which avoids the bottleneck problem of feature propagation.
In addition, through the feature fusion between layers within the coding sublayer performed in the first normalization layer, features of different resolutions and scales are correlated rather than isolated, which helps improve the accuracy of disease classification code recognition.
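The coding sublayer steps above (attention, residual fusion, layer normalization, fully connected layer, second residual fusion) can be sketched as one forward pass. Single-head attention stands in for the first multi-head attention layer, and all sizes and names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # layer normalization, as performed by the first normalization layer
    mu, sigma = x.mean(-1, keepdims=True), x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def self_attention(x):
    scores = x @ x.T / np.sqrt(x.shape[-1])          # Q = K = V = x
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    return weights @ x

def feed_forward(x, w1, w2):
    return np.maximum(0, x @ w1) @ w2                # global feature extraction

def coding_sublayer(x, w1, w2):
    local = self_attention(x)                        # local features
    mid = layer_norm(x + local)                      # residual fusion + norm
    glob = feed_forward(mid, w1, w2)
    return layer_norm(mid + glob)                    # residual fusion + norm

rng = np.random.default_rng(1)
d = 4
x = rng.normal(size=(6, d))                          # 6 tokens to encode
w1, w2 = rng.normal(size=(d, 8)), rng.normal(size=(8, d))
out = coding_sublayer(x, w1, w2)
print(out.shape)      # output vector feeds the next coding sublayer
```

Stacking several such sublayers and taking the last output reproduces the traversal described in steps 351 through 357.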
Further, referring to fig. 7, in an exemplary embodiment, the inputs of the first multi-head attention layer include a Q1 input, a K1 input, and a V1 input.
In the coding sublayer, the input vectors of the coding sublayer are input into the first multi-headed attention layer through a Q1 input terminal, a K1 input terminal and a V1 input terminal as a Q-terminal input vector, a K-terminal input vector and a V-terminal input vector of the first multi-headed attention layer, respectively.
It should be noted that Q represents queries, K represents keys, and V represents values, so that the first multi-head attention layer can focus on the contextual semantic relationship of each token in the source language and its positional relationship, further ensuring the accuracy of disease classification code recognition.
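The three-terminal wiring can be sketched as follows: the same sublayer input enters the Q1, K1, and V1 ends and is projected separately per head. The head count, dimensions, and projection matrices here are hypothetical.

```python
import numpy as np

def multi_head_attention(q_in, k_in, v_in, wq, wk, wv, n_heads=2):
    d = q_in.shape[-1]
    dh = d // n_heads
    out = []
    for h in range(n_heads):
        sl = slice(h * dh, (h + 1) * dh)             # this head's columns
        Q, K, V = q_in @ wq[:, sl], k_in @ wk[:, sl], v_in @ wv[:, sl]
        scores = Q @ K.T / np.sqrt(dh)               # scaled dot product
        weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
        out.append(weights @ V)                      # per-head attention
    return np.concatenate(out, axis=-1)

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 4))                          # coding sublayer input
wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))
# self-attention: the same input enters the Q1, K1 and V1 terminals
y = multi_head_attention(x, x, x, wq, wk, wv)
print(y.shape)
```

Feeding different tensors into the three terminals, as the decoding network does later, reuses exactly this function with `q_in != k_in`.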
Referring to fig. 8, in an exemplary embodiment, the decoding sub-layers include a second multi-head attention layer, a third multi-head attention layer, a second fully-connected layer, a second residual connected layer, and a third residual connected layer. The second multi-head attention layer and the third multi-head attention layer are used for extracting local features, the second full connection layer is used for extracting global features, the second residual connection layer is used for carrying out feature propagation in the decoding sub-layer, and the third residual connection layer is used for carrying out feature propagation between the decoding sub-layer and the coding sub-layer corresponding to the decoding sub-layer.
Optionally, the decoding sublayer further comprises a second normalization layer for performing layer normalization.
The following describes the decoding process of the source-language features in detail with reference to the decoding sublayer structure shown in fig. 8 and the steps shown in fig. 9.
As shown in fig. 9, step 370 may include the steps of:
step 371, for each decoded sublayer, receiving an input vector of the decoded sublayer as an input vector of the second multi-headed attention layer, which is input by an input end of the second multi-headed attention layer.
And the vector to be decoded is used as an input vector of the first decoding sublayer.
In step 373, the input vector and the output vector of the second multi-headed attention layer are fused by the second residual connection layer connected to the second multi-headed attention layer, and are transmitted to the third multi-headed attention layer.
Step 375, using the output vector of the coding sublayer corresponding to the decoding sublayer as the input vector of the third multi-headed attention layer through the third residual connection layer, and inputting the input vector from the input end of the third multi-headed attention layer.
In step 376, the input vector and the output vector of the third multi-headed attention layer are fused by the second residual connection layer connected to the third multi-headed attention layer, and are transmitted to the second full connection layer.
Step 377, fusing the input vector and the output vector of the second full connection layer through the second residual connection layer connected to the second full connection layer to obtain the output vector of the decoding sublayer.
Wherein the output vector of the decoding sub-layer is used as the input vector of the next decoding sub-layer.
Step 379, the target language is obtained from the output vector of the last decoding sublayer.
Specifically, as shown in fig. 8, in each decoding sublayer, the input vector of the decoding sublayer is input via the input end of the second multi-headed attention layer in the decoding sublayer, and the local vector of the decoding sublayer is output. The vector to be decoded is used as the input vector of the first decoding sublayer.
And transmitting the input vector of the decoding sublayer to a second normalization layer connected to the second multi-head attention layer in the decoding sublayer through a second residual connecting layer connected to the second multi-head attention layer in the decoding sublayer, performing feature fusion and layer normalization on the input vector and the local vector of the decoding sublayer to obtain a first intermediate vector of the decoding sublayer, and transmitting the first intermediate vector to a third multi-head attention layer to serve as the input vector of the third multi-head attention layer.
And transmitting the output vector of the coding sublayer corresponding to the decoding sublayer to a third multi-head attention layer as the input vector of the third multi-head attention layer through a third residual connecting layer connected between the decoding sublayer and the coding sublayer corresponding to the decoding sublayer.
And transmitting the input vector of the third multi-head attention layer to a second normalization layer connected to the third multi-head attention layer in the decoding sublayer through a second residual connection layer connected to the third multi-head attention layer in the decoding sublayer, and performing feature fusion and layer normalization on the input vector and the output vector of the third multi-head attention layer to obtain a second intermediate vector of the decoding sublayer.
And inputting the second intermediate vector of the decoding sublayer into a second full-connection layer in the decoding sublayer, and outputting the second intermediate vector through the second full-connection layer to obtain a global vector of the decoding sublayer.
And transmitting the second intermediate vector of the decoding sublayer to a second standardized layer connected to the second fully-connected layer in the decoding sublayer through a second residual connecting layer connected to the second fully-connected layer in the decoding sublayer, and performing feature fusion and layer standardization on the second intermediate vector and the global vector of the decoding sublayer to obtain an output vector of the decoding sublayer.
And traversing each decoding sublayer in the decoding network, and correspondingly obtaining the output vector of each decoding sublayer, wherein the output vector of the previous decoding sublayer is used as the input vector of the next decoding sublayer.
And obtaining the target language by the output vector of the last decoding sublayer until each decoding sublayer in the decoding network finishes traversing.
In the above process, feature propagation between the layers within the decoding sublayer is realized through the second residual connection layer, and feature propagation between the decoding sublayer and its corresponding coding sublayer is realized through the third residual connection layer, again without convolution operations, which avoids the bottleneck problem of feature propagation.
In addition, through the feature fusion between layers within the decoding sublayer performed in the second normalization layer, and the feature fusion between the decoding sublayer and its corresponding coding sublayer, features of different resolutions and scales are correlated rather than isolated, which helps further improve the accuracy of disease classification code recognition.
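The decoding sublayer steps above can likewise be sketched as one forward pass: self-attention, then attention over the corresponding coding sublayer's output, then the fully connected layer, with residual fusion and layer normalization after each. Single-head attention stands in for the multi-head layers; all sizes and names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu, sigma = x.mean(-1, keepdims=True), x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    return weights @ v

def decoding_sublayer(x, enc_out, w1, w2):
    local = attention(x, x, x)                 # second multi-head attention
    mid1 = layer_norm(x + local)               # first intermediate vector
    cross = attention(mid1, enc_out, enc_out)  # third layer: K, V from encoder
    mid2 = layer_norm(mid1 + cross)            # second intermediate vector
    glob = np.maximum(0, mid2 @ w1) @ w2       # second fully connected layer
    return layer_norm(mid2 + glob)             # output vector of the sublayer

rng = np.random.default_rng(3)
d = 4
x = rng.normal(size=(3, d))                    # vector to be decoded
enc_out = rng.normal(size=(6, d))              # matching coding sublayer output
w1, w2 = rng.normal(size=(d, 8)), rng.normal(size=(8, d))
out = decoding_sublayer(x, enc_out, w1, w2)
print(out.shape)
```

The `cross` step is the third-residual-connection path described above: the encoder output reaches the decoding sublayer there and nowhere else.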
Further, referring to FIG. 10, in an exemplary embodiment, the inputs of the second multi-headed attention layer include a Q2 input, a K2 input, and a V2 input.
In the decoding sublayer, the input vector of the decoding sublayer is input to the second multi-headed attention layer through a Q2 input terminal, a K2 input terminal, and a V2 input terminal as a Q-terminal input vector, a K-terminal input vector, and a V-terminal input vector of the second multi-headed attention layer, respectively.
Still further, with continued reference to FIG. 10, in an exemplary embodiment, the inputs of the third multi-head attention layer include a Q3 input, a K3 input, and a V3 input.
In the third multi-headed attention layer, an output vector of the coding sublayer corresponding to the decoding sublayer is input to the third multi-headed attention layer through a K3 input terminal and a V3 input terminal as a K-side input vector and a V-side input vector of the third multi-headed attention layer, respectively.
Meanwhile, the result of the fusion of the input vector and the output vector of the second multi-headed attention layer is input into the third multi-headed attention layer through the Q3 input end as the Q-end input vector of the third multi-headed attention layer.
Accordingly, the Q-terminal input vector and the output vector of the third multi-head attention layer are fused through the second residual connecting layer connected to the third multi-head attention layer.
It should be noted that, as with the first multi-head attention layer, Q represents queries, K represents keys, and V represents values, so that the second and third multi-head attention layers can further focus on the contextual semantic relationship and positional relationship of each token in the source language, further ensuring the accuracy of disease classification code recognition.
Through the cooperation of the above embodiments, disease classification code recognition based on the machine translation model meets the precision requirements of the task, with an accuracy of 97.86%. Convenient feature propagation is realized inside the machine translation model through the different residual connection layers, which facilitates the fusion of features of different resolutions and scales and further ensures recognition accuracy. Compared with existing disease classification coding approaches, the F value (F1-score) improves from 82.88% to 87.58%, and the recall improves from 79.38% to 87.19%.
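The reported figures can be related through the standard definition: the F value (F1-score) is the harmonic mean of precision and recall. The precision value below is an assumed illustration only; the patent does not report it.

```python
def f1_score(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.88, 0.8719   # precision here is a hypothetical value
print(round(f1_score(precision, recall), 4))
```

With the reported recall of 87.19%, a precision near 88% is consistent with the reported F1 of roughly 87.58%.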
The following is an embodiment of the apparatus of the present invention, which can be used to execute the disease classification code recognition method of the present invention. For details not disclosed in the embodiments of the apparatus of the present invention, please refer to the method embodiments of the disease classification code recognition method of the present invention.
Referring to fig. 11, in an exemplary embodiment, a disease classification code identification apparatus 900 includes, but is not limited to: a data acquisition module 910, a data input module 930, an encoding module 950, and a decoding module 970.
The data acquiring module 910 is configured to acquire diagnostic data.
A data input module 930 for inputting the diagnostic data as a source language into a machine translation model, the machine translation model comprising an encoding network and a decoding network.
And the encoding module 950 is configured to perform feature extraction of the source language through the encoding network to obtain features of the source language.
A decoding module 970, configured to input the features of the source language into the decoding network for decoding, so that the source language is translated into a target language, and the target language is a disease classification code matched with the diagnosis data.
It should be noted that the disease classification code recognition apparatus provided in the above embodiment is described with the above division of functional modules only as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
In addition, the disease classification code recognition apparatus provided in the above embodiments and the embodiments of the disease classification code recognition method belong to the same concept, wherein the specific manner in which each module performs operations has been described in detail in the method embodiments, and is not described herein again.
Referring to fig. 12, in an exemplary embodiment, a computer device 1000 includes at least one processor 1001, at least one memory 1002, and at least one communication bus 1003.
Wherein the memory 1002 has computer readable instructions stored thereon, the processor 1001 reads the computer readable instructions stored in the memory 1002 through the communication bus 1003.
The computer readable instructions, when executed by the processor 1001, implement the disease classification code recognition method in the embodiments described above.
Referring to fig. 13, in an exemplary embodiment, a storage medium 1100 stores a computer program 1101 thereon, and the computer program 1101 is executed by a processor to implement the disease classification code recognition method in the above embodiments.
The above-mentioned embodiments are merely preferred examples of the present invention and are not intended to limit its embodiments. Those skilled in the art can conveniently make changes and modifications according to the main concept and spirit of the present invention, so the protection scope of the present invention shall be subject to the claims.

Claims (10)

1. A method for identifying a disease classification code, comprising:
acquiring diagnostic data;
inputting the diagnostic data as a source language into a machine translation model, the machine translation model comprising an encoding network and a decoding network;
extracting the features of the source language through the coding network to obtain the features of the source language;
inputting the features of the source language into the decoding network for decoding, so that the source language is translated into a target language, and the target language is a disease classification code matched with the diagnosis data.
2. The method of claim 1, wherein the coding network comprises a first embedding layer and a number of coding sub-layers;
the extracting the features of the source language through the coding network to obtain the features of the source language comprises:
in the first embedding layer, mapping the tokens in the source language into a vector to be coded;
and performing feature extraction on the vector to be coded through a plurality of coding sublayers to obtain the features of the source language.
3. The method of claim 2, wherein the coding sub-layers include a first multi-headed attention layer, a first fully-connected layer, and a first residual connected layer;
the extracting the features of the vector to be coded through a plurality of coding sublayers to obtain the features of the source language comprises the following steps:
for each coding sublayer, receiving an input vector of the coding sublayer as an input vector of the first multi-head attention layer, and inputting the input vector by an input end of the first multi-head attention layer; the vector to be coded is used as an input vector of a first coding sublayer;
fusing an input vector and an output vector of the first multi-head attention layer through a first residual connecting layer connected to the first multi-head attention layer, and transmitting the fused input vector and the output vector to the first full connecting layer;
fusing an input vector and an output vector of the first full connection layer through a first residual connection layer connected to the first full connection layer to obtain an output vector of the coding sublayer; the output vector of the coding sub-layer is used as the input vector of the next coding sub-layer;
and taking the output vector of the last coding sub-layer as the characteristic of the source language.
4. The method of claim 3, wherein the inputs of the first multi-tap attention layer comprise a Q1 input, a K1 input, and a V1 input;
for each coding sublayer, receiving an input vector of the coding sublayer as an input vector of the first multi-headed attention layer, which is input by an input end of the first multi-headed attention layer, including:
in the coding sublayer, the input vectors of the coding sublayer are input into the first multi-headed attention layer through a Q1 input terminal, a K1 input terminal and a V1 input terminal as a Q-terminal input vector, a K-terminal input vector and a V-terminal input vector of the first multi-headed attention layer, respectively.
5. The method of claim 1, wherein the decoding network comprises a second embedding layer and a number of decoding sublayers;
inputting the features of the source language into the decoding network for decoding so that the source language is translated into a target language, comprising:
inputting the characteristics of the source language into the second embedded layer for mapping to obtain a vector to be decoded;
and decoding the vector to be decoded through a plurality of decoding sublayers to obtain the target language.
6. The method of claim 5, wherein the decoding sub-layers include a second multi-headed attention layer, a third multi-headed attention layer, a second fully connected layer, a second residual connected layer, and a third residual connected layer;
the decoding of the vector to be decoded by the plurality of decoding sublayers to obtain the target language includes:
for each decoding sublayer, receiving an input vector of the decoding sublayer as an input vector of the second multi-headed attention layer, and inputting the input vector by an input end of the second multi-headed attention layer; the vector to be decoded is used as an input vector of a first decoding sublayer;
fusing an input vector and an output vector of the second multi-headed attention layer through a second residual connecting layer connected to the second multi-headed attention layer, and transmitting to the third multi-headed attention layer;
using the third residual connecting layer, using the output vector of the coding sublayer corresponding to the decoding sublayer as the input vector of the third multi-headed attention layer, and inputting the input vector from the input end of the third multi-headed attention layer, and using the second residual connecting layer connected to the third multi-headed attention layer, so that the input vector and the output vector of the third multi-headed attention layer are fused and transmitted to the second full connecting layer;
fusing the input vector and the output vector of the second full connection layer through a second residual connection layer connected to the second full connection layer to obtain an output vector of the decoding sublayer; the output vector of the decoding sub-layer is used as the input vector of the next decoding sub-layer;
and obtaining the target language from the output vector of the last decoding sub-layer.
7. The method of claim 6, wherein the inputs of the second multi-tap attention layer comprise a Q2 input, a K2 input, and a V2 input;
for each decoding sublayer, receiving an input vector of the decoding sublayer as an input vector of the second multi-headed attention layer, which is input by an input terminal of the second multi-headed attention layer, including:
in the decoding sublayer, the input vector of the decoding sublayer is input to the second multi-headed attention layer through a Q2 input terminal, a K2 input terminal, and a V2 input terminal as a Q-terminal input vector, a K-terminal input vector, and a V-terminal input vector of the second multi-headed attention layer, respectively.
8. The method of claim 6, wherein the inputs of the third multi-headed attention layer include a Q3 input, a K3 input, and a V3 input;
the said using the output vector of the coding sub-layer corresponding to the decoding sub-layer as the input vector of the said third multi-headed attention layer through the said third residual connecting layer, and input by the input end of the said third multi-headed attention layer, includes:
in the third multi-headed attention layer, inputting an output vector of an encoding sub-layer corresponding to the decoding sub-layer into the third multi-headed attention layer through a K3 input terminal and a V3 input terminal as a K-terminal input vector and a V-terminal input vector of the third multi-headed attention layer, respectively; and
inputting a result of fusing the input vector and the output vector of the second multi-headed attention layer into the third multi-headed attention layer through a Q3 input end as a Q-end input vector of the third multi-headed attention layer;
the fusing the input vector and the output vector of the third multi-headed attention layer by the second residual connection layer connected to the third multi-headed attention layer, comprising:
and fusing the Q-end input vector and the output vector of the third multi-head attention layer through a second residual connecting layer connected to the third multi-head attention layer.
9. A disease classification code recognition apparatus, comprising:
a data acquisition module for acquiring diagnostic data;
a data input module for inputting the diagnostic data as a source language into a machine translation model, the machine translation model comprising an encoding network and a decoding network;
the coding module is used for extracting the characteristics of the source language through the coding network to obtain the characteristics of the source language;
and the decoding module is used for inputting the characteristics of the source language into the decoding network for decoding, so that the source language is translated into a target language, and the target language is a disease classification code matched with the diagnosis data.
10. A storage medium on which a computer program is stored, which computer program, when executed by a processor, implements a disease classification code recognition method according to any one of claims 1 to 8.
CN202010285122.5A 2020-04-13 2020-04-13 Disease classification code recognition method, device and storage medium Pending CN111581987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010285122.5A CN111581987A (en) 2020-04-13 2020-04-13 Disease classification code recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010285122.5A CN111581987A (en) 2020-04-13 2020-04-13 Disease classification code recognition method, device and storage medium

Publications (1)

Publication Number Publication Date
CN111581987A true CN111581987A (en) 2020-08-25

Family

ID=72124401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010285122.5A Pending CN111581987A (en) 2020-04-13 2020-04-13 Disease classification code recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111581987A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183104A (en) * 2020-08-26 2021-01-05 Wanghai Kangxin (Beijing) Technology Co., Ltd. Code recommendation method, system and corresponding equipment and storage medium
CN113658720A (en) * 2021-08-23 2021-11-16 Peking Union Medical College Hospital, Chinese Academy of Medical Sciences Method, apparatus, electronic device and storage medium for matching diagnosis names with ICD codes

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101571890A (en) * 2008-04-28 2009-11-04 International Business Machines Corporation Method and system for automatically evaluating the quality of medical records
CN109492232A (en) * 2018-10-22 2019-03-19 Inner Mongolia University of Technology Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information
CN110827929A (en) * 2019-11-05 2020-02-21 Sun Yat-sen University Disease classification code recognition method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
STAYGOLD: "Transformer and Its Applications", pages 1 - 7, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/91183294> *
WANG Guoliang et al.: "An End-to-End Chinese Speech Synthesis Scheme Based on Tacotron 2", Journal of East China Normal University (Natural Science Edition), no. 4, pages 111 - 119 *

Similar Documents

Publication Publication Date Title
CN110827929B (en) Disease classification code recognition method and device, computer equipment and storage medium
CN108509915B (en) Method and device for generating face recognition model
WO2023160472A1 (en) Model training method and related device
WO2021238333A1 (en) Text processing network, neural network training method, and related device
CN110851546B (en) Verification method, model training method, model sharing method, system and medium
WO2024098533A1 (en) Image-text bidirectional search method, apparatus and device, and non-volatile readable storage medium
WO2021179693A1 (en) Medical text translation method and device, and storage medium
CN110867231A (en) Disease prediction method, device, computer equipment and medium based on text classification
EP4361843A1 (en) Neural network searching method and related device
WO2023039942A1 (en) Element information extraction method and apparatus based on text recognition, device, and medium
CN111581987A (en) Disease classification code recognition method, device and storage medium
CN115391494B (en) Intelligent traditional Chinese medicine syndrome identification method and device
CN115512005A (en) Data processing method and device
US20230215203A1 (en) Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium
WO2022134357A1 (en) Triage data processing method and apparatus, and computer device and storage medium
CN116776872A (en) Medical data structured archiving system
CN114220505A (en) Information extraction method of medical record data, terminal equipment and readable storage medium
CN116484878B (en) Semantic association method, device, equipment and storage medium of power heterogeneous data
CN116956954A (en) Text translation method, device, electronic equipment and storage medium
WO2020215682A1 (en) Fundus image sample expansion method and apparatus, electronic device, and computer non-volatile readable storage medium
CN116703659A (en) Data processing method and device applied to engineering consultation and electronic equipment
CN116665878A (en) Intelligent inquiry method, device, equipment and storage medium for improving accumulated errors
CN108932231B (en) Machine translation method and device
CN115132372A (en) Term processing method, apparatus, electronic device, storage medium, and program product
CN112749532A (en) Address text processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200825