CN116341546A - Medical natural language processing method based on pre-training model - Google Patents
Medical natural language processing method based on pre-training model
- Publication number
- CN116341546A (application number CN202310123797.3A)
- Authority
- CN
- China
- Prior art keywords
- medical
- natural language
- text
- model
- language processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Epidemiology (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The application belongs to the technical field of computers and provides a medical natural language processing method, a medical natural language processing device and a computer-readable storage medium. The medical natural language processing method comprises the following steps: acquiring a medical text whose semantic information is to be mined; preprocessing the text using text data preprocessing techniques such as word segmentation and vocabulary construction; loading the weights of the pre-training model and fine-tuning them for the specified classification or retrieval task; adjusting the model hyperparameters according to the fine-tuning result; and performing semantic mining and extraction on the input text with the fine-tuned model, and then executing downstream tasks such as classification and recommendation. The scheme uses the rich knowledge in the pre-training model to fine-tune parameters for downstream tasks such as medical text information extraction, medical term normalization, medical text classification and medical knowledge question answering, thereby greatly improving the accuracy of downstream medical natural language processing tasks such as classification, retrieval and recommendation.
Description
Technical Field
The invention relates to medical natural language processing and deep learning technology, and in particular to a medical natural language processing method based on a pre-training model. The method can be used in intelligent medical scenarios covering four general medical natural language processing tasks: medical text information extraction (entity recognition and relation extraction), medical term normalization, medical text classification and medical question answering.
Background
With recent advances in biomedical language understanding, artificial intelligence is gradually changing medical practice. As biomedical language understanding benchmarks have developed, artificial intelligence applications have come into wide use in the medical field. Biomedical natural language processing has enabled widespread applications, such as biomedical text mining, that use the text data in electronic health records. For example, biomedical natural language processing methods can provide specialized medical advice for high-risk groups based on the text and information in electronic medical records. In addition, natural language processing technology has important applications in medical speech recognition, clinical documentation, clinical trial matching, computer-assisted coding and similar areas.
The medical field has a large number of natural language documents, such as medical textbooks, medical encyclopedias, clinical pathways and examination reports, which contain a large amount of expertise and abundant medical information. Named entity recognition in the medical field refers to extracting important medical entities, such as diseases and symptoms, from medical texts; this step is also the basis of other tasks such as medical relation extraction. However, the limited size of currently shared medical corpora greatly hinders progress on the various medical text processing tasks. How to distinguish different medical entity categories, how to define the overlap between different entities, and how to classify the intent of different medical sentences all pose great challenges to researchers.
Natural language processing (NLP) is one of the research hot spots in the field of artificial intelligence, and enabling a computer to read and understand human language is a central goal of NLP technology. With increased research and development effort, NLP technology has made breakthrough progress and can now be found in numerous subfields such as intelligent question answering, machine translation and spam filtering. NLP technology generally depends on an NLP model. BERT (Bidirectional Encoder Representations from Transformers), developed by a Google research team, is the most widely used NLP model in recent years and performs well. How to make reasonable use of the BERT model to understand medical natural language datasets with limited resources is a relatively challenging problem.
The medical natural language processing technology based on the pre-training model can be applied to four general medical natural language processing tasks, namely medical text information extraction (entity recognition and relation extraction), medical term normalization, medical text classification and medical question answering, and achieves good results on them.
Disclosure of Invention
The technical problem to be solved by the invention is how to model and represent medical natural language text effectively, so that knowledge mining tasks such as medical text information extraction, medical term normalization, medical text classification and medical question answering can be completed with high quality. The invention provides a medical natural language processing method based on a pre-training model, which addresses the shortage of medical text training samples and efficiently performs medical natural language information extraction and knowledge mining.
The technical scheme of the invention is as follows. First, a data acquisition module, a data preprocessing module, a pre-training word segmentation module, a pre-training model architecture module, a downstream task model fine-tuning module, a hyperparameter setting module, a downstream task performance evaluation module and a log recording module are initialized. The data acquisition module exchanges medical natural language data with the environment, such as medical texts from which entities are to be extracted; the data preprocessing module performs feature preprocessing on the medical text to obtain a better hidden vector representation; the pre-training word segmentation module interacts with the data acquisition module to segment the input medical natural language text and build a vocabulary for subsequent vectorization; the pre-training model architecture module defines the network architecture of the pre-training model so that parameters can be read from a pre-trained weight file and fine-tuned on a designated downstream task; the downstream task model fine-tuning module interacts with the pre-training model architecture module, initializes the downstream task network parameters with the trained weights of the pre-training model, and receives the data produced by the pre-training word segmentation module to fine-tune the weights of the downstream task model, which is then used for downstream tasks such as medical text information extraction and medical term normalization. 
The hyperparameter setting module sets hyperparameters, such as the learning rate and the loss function, for the downstream task model fine-tuning module; the downstream task performance evaluation module interacts with the downstream task fine-tuning module and evaluates the performance of the corresponding downstream task; the log recording module records how the loss function changes during fine-tuning, how accuracy changes across training epochs, and so on.
The invention comprises the following steps:
In the first step, a pre-training model and downstream task fine-tuning environment is built and initialized. The environment runs the Ubuntu 18.04 operating system and the PyTorch deep learning framework, and consists of a data acquisition module, a data preprocessing module, a pre-training word segmentation module, a pre-training model architecture module, a downstream task model fine-tuning module, a hyperparameter setting module, a downstream task performance evaluation module and a log recording module. The data acquisition module connects to the data samples through a database; the pre-training word segmentation module loads the jieba library for segmenting Chinese text; the data preprocessing module loads the word-frequency statistics, stop-word and vocabulary construction algorithms; the hyperparameter setting module receives the user-defined hyperparameter configuration entered by the user.
In the second step, the data acquisition module obtains medical natural language samples for training from the database and sends them to the pre-training word segmentation module.
In the third step, the pre-training word segmentation module receives the medical natural language sample from the database and segments it using the jieba library. For example, if the received medical natural language text is "My stomach is uncomfortable today", segmentation yields "I / today / stomach / uncomfortable". The pre-training word segmentation module then passes the segmented data to the data preprocessing module.
In the fourth step, the data preprocessing module receives the segmented data from the pre-training word segmentation module, counts the frequency of every word in the data, and builds a vocabulary from the N most frequent words. Each word in the segmented list is then encoded by looking up its position in the vocabulary: that position is set to 1 and all other positions are set to 0. For example, the word list is [ "me", "yes", "today", "belly", "heart", "uncomfortable" ], the vector of "me" in the medical natural language text is denoted as [1,0,0,0,0,0,0,0], and the vector of "comfort" is denoted as [0,0,0,0,0,0,1,0]. In this way a word vector is obtained for every word in the medical natural language text. This embodiment processes the medical natural language text with one-hot encoding, which expands the feature space to some extent and suits pre-training models with many parameters such as BERT. After all the data sent by the pre-training word segmentation module has been preprocessed, the data preprocessing module sends the preprocessed data to the downstream task model fine-tuning module.
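The vocabulary construction and one-hot encoding of the fourth step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation: the function names `build_vocab` and `one_hot`, the sample corpus, and the choice to encode out-of-vocabulary words as all-zero vectors are assumptions made for the example.

```python
from collections import Counter


def build_vocab(segmented_texts, max_size):
    """Return the max_size most frequent tokens, most frequent first."""
    counts = Counter(token for tokens in segmented_texts for token in tokens)
    return [token for token, _ in counts.most_common(max_size)]


def one_hot(token, vocab):
    """Encode a token as a 0/1 vector with a single 1 at its vocabulary index."""
    vector = [0] * len(vocab)
    if token in vocab:  # out-of-vocabulary tokens stay all-zero (an assumption)
        vector[vocab.index(token)] = 1
    return vector


# Hypothetical pre-segmented corpus, shaped like the output of a word segmenter.
corpus = [
    ["I", "today", "stomach", "uncomfortable"],
    ["I", "stomach", "uncomfortable"],
    ["I", "stomach"],
    ["I"],
]
vocab = build_vocab(corpus, max_size=3)  # N = 3 most frequent words
encoded = [one_hot(token, vocab) for token in corpus[0]]
```

Here "today" occurs only once and falls outside the three-word vocabulary, so it illustrates how the choice of N decides which words receive a distinct vector.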
In the fifth step, after the downstream task model fine-tuning module receives the preprocessed data from the data preprocessing module, it interacts with the pre-training model architecture module to obtain the network framework and initialization weights of the pre-training model, selects the fine-tuning model that matches the downstream task, and sets the corresponding loss function. Specifically, for the medical entity recognition task, the fine-tuning model adds a word-level classification head on top of the pre-training model architecture; for the medical relation extraction task, the fine-tuning model adds a word-level classification layer on top of the pre-training model architecture to classify subjects and objects, and the relation of each subject-object pair is determined in the same way; for the medical clinical term normalization task, a regression stage and a ranking stage are applied: the fine-tuning model first regresses a preset number of candidate standardized terms on top of the pre-training model architecture, and a classification layer is then added on top of the pre-training model architecture for similarity prediction; for the clinical trial screening criteria classification task and the user query intent recognition task, the fine-tuning model adds a sequence classification layer on top of the pre-training model architecture. After the downstream task fine-tuning module has selected a specific fine-tuning model, it sends a request for hyperparameters and a loss function to the hyperparameter setting module.
In the sixth step, after receiving the hyperparameter and loss function request from the downstream task fine-tuning module, the hyperparameter setting module sends the loss function and hyperparameters for the corresponding task to the downstream task fine-tuning module. Specifically, a cross-entropy function is used for classification tasks, and a twin (Siamese) network loss function is used for ranking tasks. Once the loss function is determined, the hyperparameter setting module sends the hyperparameter settings for the corresponding downstream task, such as the number of fine-tuning epochs, the initial learning rate and the optimizer type, together with the loss function type, to the downstream task fine-tuning module.
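The pairing of downstream tasks with fine-tuning heads and loss functions in the fifth and sixth steps can be pictured as a lookup table. The sketch below is illustrative only: the task keys, the `FineTuneSpec` class and the string labels are hypothetical names, and the assignment of cross-entropy to relation extraction and of the twin-network loss to term normalization is an assumed reading of "classification tasks" and "ranking tasks" in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneSpec:
    head: str  # layer added on top of the pre-trained encoder
    loss: str  # loss function requested from the hyperparameter module


# Hypothetical task keys; head and loss labels paraphrase steps five and six.
TASK_SPECS = {
    "entity_recognition": FineTuneSpec("token_classification", "cross_entropy"),
    "relation_extraction": FineTuneSpec("token_classification", "cross_entropy"),
    "term_normalization": FineTuneSpec("candidates_then_similarity", "siamese"),
    "trial_criteria_classification": FineTuneSpec("sequence_classification", "cross_entropy"),
    "query_intent_recognition": FineTuneSpec("sequence_classification", "cross_entropy"),
}


def select_spec(task):
    """Look up the fine-tuning head and loss for a downstream task."""
    try:
        return TASK_SPECS[task]
    except KeyError:
        raise ValueError(f"unsupported downstream task: {task!r}") from None
```

Keeping the mapping in one table means a new downstream task only needs a new entry, which matches the modular split between the fine-tuning module and the hyperparameter setting module.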
In the seventh step, after receiving the loss function type and hyperparameter settings from the hyperparameter setting module, the downstream task fine-tuning module starts the fine-tuning stage of the downstream task. Specifically, the downstream task fine-tuning module runs program code that computes the attention vectors in a single parallel pass. The code fetches a query matrix, a key matrix and a value matrix from memory and loads them into the GPU, so that the GPU obtains the first attention feature of the medical natural language text by parallel processing based on the query matrix, the key matrix and the value matrix. It should be understood that the GPU calls the Transformer encoder to compute the self-attention feature from the previously obtained query matrix Q, key matrix K and value matrix V in the BERT model, typically using the following formula:
e = Score(Q, K) = QK^T / sqrt(d);
a = softmax(e);
Attention Values = aV;
where Score(Q, K) represents the attention score, d represents the dimension of the key vector, softmax normalizes the attention scores over all words, and Attention Values represents the calculated attention features. Loading the query matrix, the key matrix and the value matrix into the graphics processor so that it obtains the first attention feature of the natural language text by parallel processing comprises: loading the query matrix, the key matrix and the value matrix into the graphics processor, so that the graphics processor computes the attention weights in parallel based on the three matrices; and multiplying the attention weights by the value matrix to obtain the first attention feature of the natural language text. The query matrix, the key matrix and the value matrix are the inputs of each attention mechanism in the multi-head attention mechanism. The first attention features output by the individual attention mechanisms are spliced (concatenated) to obtain the attention splicing feature, and a second attention feature of the natural language text is obtained from the attention splicing feature. The second attention feature is obtained either by linearly mapping the attention splicing feature directly, or by first smoothing the spliced portions of the attention splicing feature and then linearly mapping the smoothed attention splicing feature. 
The second attention feature is the final output feature of the self-attention layer in the Transformer encoder and is usually used as the input of the feedforward neural network; both the second attention feature and the first attention feature are matrices. Assuming the multi-head attention mechanism uses 8 attention heads, the attention results computed by the heads are denoted the first attention feature, the second attention feature, ..., and the eighth attention feature, respectively. Splicing all of these features yields the attention splicing feature. The linear mapping may be performed by multiplying by a preset additional weight matrix, which is obtained through joint training in the model. Finally, the attention splicing feature is fed into the network layer newly added for the downstream task, and the weights are fine-tuned using the configured loss function. During fine-tuning, the downstream task fine-tuning module sends the loss function values, accuracy information and so on to the log recording module, which records state information during the run.
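The attention computation of the seventh step can be sketched in plain Python as follows. This is an illustrative sketch rather than the patent's GPU code: it assumes the attention score is the standard scaled dot product QK^T / sqrt(d) used in BERT, it runs sequentially on small lists instead of in parallel on a GPU, it omits the optional smoothing of the spliced portions, and it uses two heads instead of eight to keep the example short. The helper names `matmul`, `attention_head` and `multi_head` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def attention_head(q, k, v):
    """One head: a = softmax(QK^T / sqrt(d)), output = aV."""
    d = len(k[0])  # dimension of each key vector
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def multi_head(heads, w_out):
    """Splice per-head outputs along the feature axis, then map linearly."""
    outputs = [attention_head(q, k, v) for q, k, v in heads]
    spliced = [sum((out[i] for out in outputs), []) for i in range(len(outputs[0]))]
    return matmul(spliced, w_out)


# Two tokens, two heads of width 2; an identity matrix stands in for the
# jointly trained output weight matrix (an assumption for illustration).
q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
second_feature = multi_head([(q, k, v), (q, k, v)], identity)
```

Each row of `second_feature` has width 4, the sum of the two head widths, which is the splicing described above before the result is passed on to the feedforward network.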
In the eighth step, the user selects from the log recording module the point in training at which model accuracy was highest, and uses the model at that point as the final model for prediction in knowledge mining tasks such as medical text information extraction, medical term normalization, medical text classification and medical question answering.
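Selecting the final model in the eighth step amounts to taking the maximum over the logged accuracy values. The following sketch assumes the log is available as a list of per-epoch records; the record fields and the numbers in `log` are made up for illustration and are not results from the patent.

```python
def best_checkpoint(log_records):
    """Return the logged record with the highest accuracy."""
    if not log_records:
        raise ValueError("no log records to choose from")
    return max(log_records, key=lambda record: record["accuracy"])


# Hypothetical log entries; field names and values are illustrative only.
log = [
    {"epoch": 1, "loss": 0.92, "accuracy": 0.71},
    {"epoch": 2, "loss": 0.55, "accuracy": 0.83},
    {"epoch": 3, "loss": 0.49, "accuracy": 0.80},
]
chosen = best_checkpoint(log)
```

In this made-up log the lowest loss (epoch 3) and the highest accuracy (epoch 2) fall on different epochs, which is why the selection criterion has to be stated explicitly.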
The invention has the following beneficial effects:
1. The invention fine-tunes different medical natural language processing tasks on the basis of the pre-training model, and achieves good robustness and generalization in downstream medical natural language processing tasks with few training samples;
2. The invention is applicable to any pre-training model in the current natural language processing field and to any downstream medical text knowledge mining task, so it has strong transferability and a wide range of application;
3. The invention maintains inference speed at run time and provides a good user experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a diagram of the first step of the invention, the deployment of the pre-training model-based medical natural language processing software.
FIG. 2 is a flow chart of the business logic of the present invention.
FIG. 3 is an example of attention head splicing for a pre-trained model query.
FIG. 4 is a schematic diagram of a Transformer encoder.
FIG. 5 is a diagram of the module decomposition structure of the present invention.
Detailed Description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application.
In the first step, a pre-training model and downstream task fine-tuning environment is built and initialized. The environment runs the Ubuntu 18.04 operating system, the PyTorch 1.7.0 deep learning framework and the transformers 4.5.1 pre-training model library. The environment further comprises a data acquisition module, a data preprocessing module, a pre-training word segmentation module, a pre-training model architecture module, a downstream task model fine-tuning module, a hyperparameter setting module, a downstream task performance evaluation module and a log recording module. The data acquisition module connects to the data samples through a database; the pre-training word segmentation module loads the jieba Chinese word segmentation library to segment Chinese text, which is then fed into the model to obtain a vectorized representation; the data preprocessing module loads the word-frequency statistics, stop-word and vocabulary construction algorithms; the hyperparameter setting module receives the user-defined hyperparameter configuration entered by the user.
In the second step, the data acquisition module obtains from the database a medical natural language sample for testing, namely "Patients with urinary retention are prone to secondary urinary system infection", together with its medical entity recognition task label, and sends the sample and the label to the pre-training word segmentation module.
In the third step, the pre-training word segmentation module receives the medical natural language sample from the database and segments it using the jieba library. Specifically, the segmentation result is "urinary retention / patient / prone to / secondary / urinary system / infection". The pre-training word segmentation module then passes the segmented data to the data preprocessing module.
In the fourth step, the data preprocessing module receives the segmented data from the pre-training word segmentation module, counts the frequency of every word in the data, and builds a vocabulary from the N most frequent words. Each word in the segmented list is then encoded by looking up its position in the vocabulary: that position is set to 1 and all other positions are set to 0. In this way a word vector is obtained for every word in the medical natural language text. This embodiment processes the medical natural language text with one-hot encoding, which expands the feature space to some extent and suits pre-training models with many parameters such as BERT. After all the data sent by the pre-training word segmentation module has been preprocessed, the data preprocessing module sends the preprocessed data to the downstream task model fine-tuning module.
In the fifth step, after the downstream task model fine-tuning module receives the preprocessed data from the data preprocessing module, it interacts with the pre-training model architecture module to obtain the network framework and initialization weights of the pre-training model, selects the fine-tuning model for the medical entity recognition downstream task, and sets the corresponding loss function. Specifically, since this example is a medical entity recognition task, the fine-tuning model adds a word-level classification head on top of the pre-training model architecture. After the downstream task fine-tuning module has selected the specific fine-tuning model, it sends a request for hyperparameters and a loss function to the hyperparameter setting module.
In the sixth step, after receiving the hyperparameter and loss function request from the downstream task fine-tuning module, the hyperparameter setting module sends the loss function and hyperparameters for the corresponding task to the downstream task fine-tuning module. Specifically, since this example is a classification task, a cross-entropy function is used. Once the loss function is determined, the hyperparameter setting module sends the hyperparameter settings for the downstream task, namely 10 fine-tuning epochs, an initial learning rate of 0.01 and the Adam optimizer, together with the loss function type (cross-entropy loss), to the downstream task fine-tuning module.
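The settings exchanged in the sixth step of this embodiment, and the cross-entropy loss they select, can be sketched as follows. The `HyperParams` class and its field names are hypothetical; only the values (10 epochs, learning rate 0.01, Adam, cross-entropy) come from the text. The `cross_entropy` function is the standard definition for a single prediction, written with the standard library rather than a deep learning framework.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParams:
    epochs: int = 10             # fine-tuning epochs stated in the embodiment
    learning_rate: float = 0.01  # initial learning rate stated in the embodiment
    optimizer: str = "Adam"
    loss: str = "cross_entropy"


def cross_entropy(logits, target_index):
    """Cross-entropy of one word's class logits against its gold label index."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target_index]


params = HyperParams()
uniform_loss = cross_entropy([0.0, 0.0, 0.0], target_index=1)  # equals ln(3)
```

With equal logits over three classes the loss is ln(3), the value an untrained classification head would produce on average; fine-tuning lowers it by raising the logit of the gold entity label.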
In the seventh step, after receiving the loss function type and hyperparameter settings from the hyperparameter setting module, the downstream task fine-tuning module starts the fine-tuning stage of the downstream task. Specifically, the downstream task fine-tuning module runs program code that computes the attention vectors in a single parallel pass. The code fetches a query matrix, a key matrix and a value matrix from memory and loads them into the GPU, so that the GPU obtains the first attention feature of the medical natural language text by parallel processing based on the query matrix, the key matrix and the value matrix. It should be understood that the GPU calls the Transformer encoder to compute the self-attention feature from the previously obtained query matrix Q, key matrix K and value matrix V in the BERT model, typically using the following formula:
e=Score(Q,K);
a=softmax(e);
Attention Values=aV;
where Score(Q, K) represents the attention score, d represents the dimension of the key vector (in the common scaled dot-product form, the score is divided by the square root of d), softmax normalizes the attention scores over all words, a represents the weight of the relationship between Q and K, that is, the weight Q should receive when the current K is modeled, and Attention Values represents the computed attention feature. Loading the query matrix, the key matrix, and the value matrix into the graphics processor, so that the graphics processor obtains the first attention feature of the natural language text by parallel processing, comprises the following steps: loading the three matrices into the graphics processor so that it computes the attention weights in parallel; and multiplying the attention weights by the value matrix to obtain the first attention feature of the natural language text.
The query matrix, the key matrix, and the value matrix are the inputs of each attention mechanism (head) in the multi-head attention mechanism. The first attention features output by the individual heads are spliced to obtain the attention splicing feature, and a second attention feature of the natural language text is obtained from the attention splicing feature. To obtain the second attention feature, either the attention splicing feature is linearly mapped directly, or the spliced boundaries of the attention splicing feature are first smoothed and the smoothed attention splicing feature is then linearly mapped. The second attention feature is the final output of the self-attention layer in the Transformer encoder and is usually used as the input of the feedforward neural network; both the first and the second attention features are matrices. For example, if the multi-head attention mechanism uses 8 attention heads, the heads produce eight attention results (the first through the eighth), and splicing all of them yields the attention splicing feature.
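The attention computation and the splicing of head outputs described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: Score(Q, K) is taken to be the scaled dot product (the source does not give its form), the optional smoothing step is omitted, two heads are used instead of eight, and the helper names and toy matrices are hypothetical. A real implementation would run the same matrix operations in parallel on the GPU.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(matrix):
    """Normalize each row so that its entries are positive and sum to 1."""
    result = []
    for row in matrix:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def attention(q, k, v):
    """Return (attention values, attention weights) for one head."""
    d = len(k[0])  # dimension of the key vectors
    k_t = [list(col) for col in zip(*k)]
    # e = Score(Q, K), assumed here to be the scaled dot product
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    weights = softmax_rows(scores)  # a = softmax(e)
    return matmul(weights, v), weights  # Attention Values = aV


def multi_head(heads, w_o):
    """Splice the per-head outputs token by token, then apply the linear mapping."""
    outputs = [attention(q, k, v)[0] for q, k, v in heads]
    spliced = [[x for out in outputs for x in out[i]]
               for i in range(len(outputs[0]))]
    return matmul(spliced, w_o)


# Toy input: 3 tokens, 2 heads, head dimension 2 (all values hypothetical).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
heads = [(x, x, x), (x, x, x)]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

values, weights = attention(x, x, x)          # first attention feature of one head
second_feature = multi_head(heads, identity)  # spliced and linearly mapped
```

With an identity matrix as the linear mapping, the result is just the spliced head outputs; in a trained model the mapping matrix is learned jointly with the other weights.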
The linear mapping may be implemented by multiplying by an additional weight matrix, which is obtained through joint training with the model. Finally, the attention splicing features are input into the network layer newly added for the downstream task, and the weights are fine-tuned using the configured loss function. During fine-tuning, the downstream task fine-tuning module sends the loss function values, precision information, and similar metrics to the log recording module, which records state information during the run.
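The logging described here can be pictured as an append-only record of per-epoch metrics from which the best-performing state can later be looked up. The class and field names below are hypothetical, and the metric values are made-up placeholders rather than results from the source.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """One entry sent by the fine-tuning module (field names are assumed)."""
    epoch: int
    loss: float
    precision: float


class TrainingLog:
    """Minimal stand-in for the log recording module."""

    def __init__(self):
        self.records = []

    def record(self, epoch, loss, precision):
        self.records.append(LogRecord(epoch, loss, precision))

    def best(self):
        """Return the record with the highest precision."""
        return max(self.records, key=lambda r: r.precision)


log = TrainingLog()
# Placeholder metrics for three epochs; not measured values.
for epoch, (loss, precision) in enumerate([(0.92, 0.71), (0.55, 0.83), (0.48, 0.81)], start=1):
    log.record(epoch, loss, precision)

best = log.best()
```

Note that the lowest loss and the highest precision need not occur in the same epoch, which is why the selection is made on the recorded precision.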
Eighth, the user selects from the log recording module the checkpoint with the highest model precision as the final model, which is used for knowledge mining tasks such as medical text information extraction, medical term normalization, medical text classification, and medical question-answer prediction. For this example, the entity recognition result is: {"entity": "urinary retention", "start_idx": 0, "end_idx": 2, "entity_type": "dis"}, {"entity": "urinary infection", "start_idx": 7, "end_idx": 11, "entity_type": "dis"}. That is, the input medical natural language text contains 2 entities: the first is "urinary retention", located at positions 0-2 of the sentence, of type disease; the second is "urinary infection", located at positions 7-11 of the sentence, of type disease.
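To show how per-token predictions could be turned into records of the form above, the following sketch decodes a label sequence into entity dictionaries with inclusive start and end indices. The BIO labeling scheme, the function name, and the placeholder characters are assumptions; the source does not state how the labels are encoded, and the placeholder sentence only mirrors the span positions of the example.

```python
def decode_entities(tokens, labels):
    """Convert BIO labels into entity records with inclusive end indices."""
    entities = []
    start = None
    entity_type = None
    # A trailing "O" sentinel closes an entity that runs to the end of the text.
    for i, label in enumerate(labels + ["O"]):
        continues = (label.startswith("I-") and start is not None
                     and label[2:] == entity_type)
        if continues:
            continue
        if start is not None:
            entities.append({
                "entity": "".join(tokens[start:i]),
                "start_idx": start,
                "end_idx": i - 1,
                "entity_type": entity_type,
            })
            start, entity_type = None, None
        if label.startswith("B-"):
            start, entity_type = i, label[2:]
    return entities


# Placeholder characters standing in for the sentence; spans match the example.
tokens = list("AAAxxxxBBBBB")
labels = (["B-dis", "I-dis", "I-dis"] + ["O"] * 4
          + ["B-dis", "I-dis", "I-dis", "I-dis", "I-dis"])
result = decode_entities(tokens, labels)
```

Here `result` holds two records, one spanning positions 0-2 and one spanning positions 7-11, both of type `dis`.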
Claims (7)
1. A medical natural language processing method, comprising:
acquiring medical natural language text whose semantics are to be mined, and acquiring a medical natural language processing downstream task to be executed, wherein the medical natural language processing downstream task comprises medical text information extraction, medical term normalization, medical text classification, and medical question answering;
determining the architecture of the downstream task fine-tuning model according to the category of the medical natural language processing downstream task;
fine-tuning the parameters of the downstream task model architecture, using the network architecture and parameters of the pre-trained model as the initialization weights of the downstream task;
performing preprocessing operations, such as word segmentation and vocabulary construction, on the acquired medical natural language text to obtain an initial vector representation corresponding to the medical natural language text;
and inputting the initial vector representation of the medical natural language text into the trained fine-tuned network to obtain a semantic extraction vector representation, and finally applying the semantic extraction vector representation to the designated downstream task.
2. The medical natural language processing method of claim 1, wherein the text is natural language text containing medical entities;
and wherein preprocessing the medical text comprises:
segmenting the medical text into words, removing stop words, building a vocabulary according to word frequency, and encoding the text with one-hot encoding to obtain an initial vectorized representation.
3. The medical natural language processing method of claim 2, wherein the network architecture required by the specified downstream task is appended to the output layer of the pre-trained model, for example by applying a cross-entropy loss function to a fully connected layer so as to meet the requirement of classifying the segmented words from the vocabulary.
4. The medical natural language processing method of claim 1, wherein the existing weights of the pre-trained model are used as the initialization for fine-tuning the downstream task weights, to ensure that the model converges to an optimal solution.
5. The medical natural language processing method of claim 1, wherein during training the parameters of the whole network are tuned according to the loss function value, and the model with the highest classification precision on the validation set is selected as the final released model.
6. The medical natural language processing method of claim 5, wherein the medical natural language text whose semantics are to be mined is input into the fine-tuned model, the fine-tuned model outputs a probability distribution over the classification labels for the current text, and the label with the highest probability is taken as the final classification result.
7. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.
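The preprocessing recited in claim 2 (word segmentation, stop-word removal, a frequency-ordered vocabulary, and one-hot encoding) can be sketched as follows. This is an illustrative sketch only: whitespace splitting stands in for a real medical word segmenter, and the stop-word list, function names, and sample sentences are assumptions that do not come from the source.

```python
from collections import Counter

STOP_WORDS = {"the", "of", "and", "with"}  # assumed stop-word list


def segment(text):
    """Whitespace split as a stand-in for a real word segmenter."""
    return text.lower().split()


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocab(token_lists):
    """Assign indices by descending frequency, breaking ties alphabetically."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {token: index for index, (token, _) in enumerate(ordered)}


def one_hot(tokens, vocab):
    """Encode each token as a vector with a single 1 at its vocabulary index."""
    return [[1 if i == vocab[t] else 0 for i in range(len(vocab))] for t in tokens]


# Hypothetical sample corpus.
corpus = ["urinary retention with urinary infection", "retention of urine"]
token_lists = [remove_stop_words(segment(text)) for text in corpus]
vocab = build_vocab(token_lists)
encoded = one_hot(token_lists[0], vocab)
```

One-hot vectors grow with the vocabulary size, so in practice they typically serve as indices into an embedding table rather than being fed to the network directly.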
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310123797.3A CN116341546A (en) | 2023-02-15 | 2023-02-15 | Medical natural language processing method based on pre-training model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116341546A (en) | 2023-06-27 |
Family
ID=86884837
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310123797.3A Pending CN116341546A (en) | 2023-02-15 | 2023-02-15 | Medical natural language processing method based on pre-training model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116341546A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118070907A (en) * | 2024-04-18 | 2024-05-24 | 卓世智星(天津)科技有限公司 | Traditional Chinese medicine customer service recovery system based on large language model |
CN118070907B (en) * | 2024-04-18 | 2024-07-09 | 卓世智星(天津)科技有限公司 | Traditional Chinese medicine customer service recovery system based on large language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109472024B (en) | Text classification method based on bidirectional circulation attention neural network | |
CN113011533B (en) | Text classification method, apparatus, computer device and storage medium | |
CN109871538A (en) | A kind of Chinese electronic health record name entity recognition method | |
CN109508459B (en) | Method for extracting theme and key information from news | |
CN112257449B (en) | Named entity recognition method and device, computer equipment and storage medium | |
CN111949759A (en) | Method and system for retrieving medical record text similarity and computer equipment | |
CN110287337A (en) | The system and method for medicine synonym is obtained based on deep learning and knowledge mapping | |
CN115048447B (en) | Database natural language interface system based on intelligent semantic completion | |
CN113408430B (en) | Image Chinese description system and method based on multi-level strategy and deep reinforcement learning framework | |
CN111881292B (en) | Text classification method and device | |
CN113204611A (en) | Method for establishing reading understanding model, reading understanding method and corresponding device | |
CN113220865B (en) | Text similar vocabulary retrieval method, system, medium and electronic equipment | |
CN111859938B (en) | Electronic medical record entity relation extraction method based on position vector noise reduction and rich semantics | |
CN111581364B (en) | Chinese intelligent question-answer short text similarity calculation method oriented to medical field | |
CN113836896A (en) | Patent text abstract generation method and device based on deep learning | |
CN111145914B (en) | Method and device for determining text entity of lung cancer clinical disease seed bank | |
CN116341546A (en) | Medical natural language processing method based on pre-training model | |
CN115545021A (en) | Clinical term identification method and device based on deep learning | |
CN116881336A (en) | Efficient multi-mode contrast depth hash retrieval method for medical big data | |
CN116680407A (en) | Knowledge graph construction method and device | |
CN114139531B (en) | Medical entity prediction method and system based on deep learning | |
CN116595170A (en) | Medical text classification method based on soft prompt | |
CN111813927A (en) | Sentence similarity calculation method based on topic model and LSTM | |
CN115510230A (en) | Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism | |
CN112100382B (en) | Clustering method and device, computer readable storage medium and processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||