CN112417241B - Method for mining topic learning pipeline based on neuroimaging literature of event - Google Patents


Info

Publication number
CN112417241B
Authority
CN
China
Prior art keywords
word
topic
event
vector
model
Prior art date
Legal status
Active
Application number
CN202011226838.4A
Other languages
Chinese (zh)
Other versions
CN112417241A (en)
Inventor
闫健卓
陈丽红
陈建辉
于涌川
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202011226838.4A
Publication of CN112417241A
Application granted
Publication of CN112417241B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for mining a topic learning pipeline from neuroimaging literature based on events. An event-based topic learning task is designed to obtain semantically rich neuroimaging research topics, improving their interpretability and accuracy. A novel topic learning method is provided by fusing deep learning and domain knowledge with a probabilistic topic model, realizing event-based topic learning over full-text neuroimaging documents. Finally, for the two core aspects of topic learning, topic coherence and KL divergence are selected as evaluation metrics. A set of experiments on real data compares the proposed method with four mainstream topic learning methods. The experimental results show that the neuroimaging Event-BTM significantly improves the accuracy and completeness of topics mined from neuroimaging literature.

Description

Method for mining topic learning pipeline based on neuroimaging literature of event
Technical Field
The invention belongs to the field of computer science and relates to a method for mining a topic learning pipeline from neuroimaging literature based on events.
Background
Neuroimaging text mining extracts knowledge from neuroimaging text and has received extensive attention; topic learning is an important focus of neuroimaging text mining. However, current neuroimaging topic learning studies mainly use traditional probabilistic topic models to extract topics from documents and cannot obtain high-quality neuroimaging topics. Existing topic learning methods cannot meet the requirements of topic learning over full-text neuroimaging documents. Therefore, the invention provides a method for mining a topic learning pipeline from neuroimaging literature based on events.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a method for mining a topic learning pipeline from neuroimaging literature based on events. The method determines three types of neuroimaging research topic events by analyzing the neuroimaging research process and the information availability of neuroimaging documents, and then provides a novel topic learning method that fuses deep learning and domain knowledge with a probabilistic topic model to realize event-based topic learning for full-text neuroimaging documents.
In order to solve the problems, the invention adopts the following technical scheme:
the method for mining the subject learning pipeline based on the neuroimaging literature of the event comprises the following steps:
and step 1, preprocessing data.
The stop word processing is performed on paper data crawled from the PLoS One website.
And 2, expressing a predefined event.
By analyzing the course and results of the neuro-imaging study and the availability of relevant information in the neuro-imaging literature, a set of neuro-imaging study events is determined.
And 3, training an LSTM-CNN model.
Firstly, converting words into word vectors through an Embedding Layer, inputting LSTM for semantic feature extraction, and finally, taking the output of the LSTM as the input of CNN for further feature extraction.
And 4, training a PCNN model.
By vector representation, convolution, max pooling, classifying four parts, and obtaining a vector representation of the relationship.
And 5, constructing a neuroimaging Event-BTM topic learning pipeline.
And inputting functional neuroimage document data, and acquiring a theme representation result of the document.
And 6, evaluating the model.
Model performance was evaluated using model evaluation indicators.
Drawings
FIG. 1 is a diagram of a neuroimaging Event-BTM topic learning pipeline framework;
FIG. 2 is an Event-BTM topic learning model;
Detailed Description
The invention is further described below with reference to the accompanying drawings:
as shown in fig. 1, the method of the present invention mainly comprises the following steps:
step 1, data preprocessing
Stop-word removal is performed on the document data crawled from the PLoS ONE website.
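A minimal sketch of the stop-word preprocessing step. The patent only names "the", "a", and "an" explicitly; the rest of the stop-word list below is an illustrative assumption, as is the tokenization rule.

```python
# Minimal stop-word preprocessing sketch. The stop-word list beyond
# "the"/"a"/"an" and the tokenizer are assumptions, not the patent's exact list.
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}

def preprocess(text: str) -> list:
    """Lowercase, tokenize on letter runs, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The fMRI task evokes a response in the visual cortex"))
# → ['fmri', 'task', 'evokes', 'response', 'visual', 'cortex']
```

In practice the crawled full-text documents would be streamed through this filter before any vectorization.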
Step 2, expression of predefined event
By analyzing the process and results of neuroimaging studies and the availability of relevant information in the neuroimaging literature, a set of neuroimaging study events is determined and divided into three topic events, "cognitive response", "experiment", and "analysis", which describe the results of a neuroimaging study, the experimental process, and the analytical process, respectively. Each topic event contains several meta-events designed for the event extraction task. The specific form is as follows:
Event_deduce-results = [trigger, &lt;argument1, role1&gt;?, &lt;argument2, role2&gt;?] = [{evoke, indicate, reveal, ...}, &lt;{EXPERIMENT TASK, COGNITIVE FUNCTION, MEDICAL PROBLEM}, research object&gt;+, &lt;{FEATURES OF PHYSIOLOGY AND PSYCHOLOGY}, biological mechanism&gt;+]
wherein Event_deduce-results is an inference event; trigger denotes the trigger word; argument1 and argument2 denote the first and second arguments, respectively; and role1 and role2 denote the roles of the first and second arguments, respectively.
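A hypothetical data structure for a deduce-results meta-event following the "trigger + <argument, role>" template above. The class and field names are assumptions for illustration, not part of the patent.

```python
# Hypothetical representation of a meta-event as "trigger + (argument, role)" pairs.
# Class and field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class MetaEvent:
    trigger: str                                    # e.g. "evoke", "indicate", "reveal"
    arguments: list = field(default_factory=list)   # list of (argument, role) tuples

event = MetaEvent(
    trigger="evoke",
    arguments=[
        ("visual task", "research object"),              # an EXPERIMENT TASK argument
        ("occipital activation", "biological mechanism"),# a PHYSIOLOGY FEATURE argument
    ],
)
print(event.trigger, len(event.arguments))
```

Downstream, the event extraction models would fill such structures from sentences before topic learning.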
Step 3, training LSTM-CNN model
Event recognition includes trigger word recognition, argument recognition and trigger word type recognition. BiLSTM-CNN is used to model text features for event recognition.
And 3.1, vectorizing the text data.
v_word = [v_w, v_c, v_t, v_char]
wherein v_word is the combined vector of word_i in the sentence, and v_w, v_c, v_t, and v_char are the word vector, case vector, term-dictionary vector, and character vector, respectively.
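The combined vector is a simple concatenation, which can be sketched as follows; the vector dimensions are arbitrary illustrative choices, not values from the patent.

```python
# Sketch of the combined word vector v_word = [v_w, v_c, v_t, v_char].
# All dimensions below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
v_w = rng.normal(size=100)       # pretrained word vector
v_c = np.array([1.0, 0.0])       # case vector (e.g. one-hot: capitalized or not)
v_t = np.array([0.0, 1.0, 0.0])  # term-dictionary vector (domain term type)
v_char = rng.normal(size=30)     # character-level vector from a char encoder

v_word = np.concatenate([v_w, v_c, v_t, v_char])
print(v_word.shape)  # → (135,)
```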
Step 3.2: event element recognition. The BiLSTM-based feature modeling process is described as follows: each word_i in the sentence is represented by its combined vector v_word, from which the word representation f_i is obtained, and h_i is the output of the LSTM hidden layer. Based on the BiLSTM output, a log-softmax function is used to obtain the log probability of each trigger word or argument.
Step 4: training the PCNN model.
The output of the BiLSTM-CNN model is taken as the input of the PCNN model; the specific process is as follows:
and 4.1, feature vector.
V lf =[E 1t ,E 2t ,E 1tf ,E 1tb ,E 2tf ,E 2tb ,r]
Wherein V is lf Is the feature vector E 1t Word vector, E, which is a trigger word 2t Word vector being argument, E 1tf Is the word vector of the word preceding the trigger, E 1tb Is the word vector of the next word of the trigger, E 2tf Is a word vector of a word preceding the parameter, E 2tb Is the word vector of the word following the parameter, r is the index of the event role type.
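The lexical feature vector can be assembled as below. The toy embedding table, sentence, and role index are assumptions for illustration only.

```python
# Sketch of the PCNN lexical feature vector
# V_lf = [E_1t, E_2t, E_1tf, E_1tb, E_2tf, E_2tb, r].
# The embedding table, sentence, and indices are illustrative assumptions.
import numpy as np

dim = 4
vocab = ["<pad>", "task", "evokes", "activation", "strong"]
emb = {w: np.full(dim, float(i)) for i, w in enumerate(vocab)}

sent = ["strong", "task", "evokes", "activation", "<pad>"]
trig_idx, arg_idx = 2, 3      # "evokes" is the trigger, "activation" the argument
role_index = np.array([1.0])  # r: index of the event role type

V_lf = np.concatenate([
    emb[sent[trig_idx]],      # E_1t: trigger word vector
    emb[sent[arg_idx]],       # E_2t: argument word vector
    emb[sent[trig_idx - 1]],  # E_1tf: word before the trigger
    emb[sent[trig_idx + 1]],  # E_1tb: word after the trigger
    emb[sent[arg_idx - 1]],   # E_2tf: word before the argument
    emb[sent[arg_idx + 1]],   # E_2tb: word after the argument
    role_index,               # r
])
print(V_lf.shape)  # → (25,)
```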
Step 4.2: word representation.
v_wp = [v_wf, d_pft, d_pfa]
wherein v_wp is the word representation vector; v_wf is the word vector of the current word; d_pft is the distance vector between the current word and the trigger word; and d_pfa is the distance vector between the current word and the argument.
Step 4.3: the CNN extracts sentence-level global features to predict roles.
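The sentence-level feature of step 4.3 follows the form n = max(M_1 · v_wp) with V_sf = tanh(W_2 · n), as also given in the claims. A NumPy sketch under assumed (illustrative) matrix sizes:

```python
# Numpy sketch of the sentence feature: per-position linear map (convolution),
# max pooling over positions, then a tanh hidden layer.
# All dimensions and random matrices are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
seq_len, wp_dim, n_filters, sf_dim = 6, 8, 5, 3

v_wp = rng.normal(size=(seq_len, wp_dim))  # word representation at each position
M1 = rng.normal(size=(n_filters, wp_dim))  # convolution filters (linear map per position)
W2 = rng.normal(size=(sf_dim, n_filters))  # hidden-layer transformation

conv = v_wp @ M1.T         # (seq_len, n_filters): filter responses per position
n = conv.max(axis=0)       # max pooling over the sentence
V_sf = np.tanh(W2 @ n)     # sentence feature, bounded in (-1, 1)
print(V_sf.shape)  # → (3,)
```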
Step 5: constructing the neuroimaging Event-BTM topic learning pipeline.
Step 5.1: pre-train the BiLSTM-CNN model. An input sentence first passes through an embedding layer, which maps each word or character to a word vector or character vector; these vectors are fed into the BiLSTM layer to obtain the forward and backward vectors of the sentence, which are then concatenated as the hidden state vector of the current word or character. Meanwhile, the CNN layer is used to extract local features of the current word.
Step 5.2: the feature vector output by the BiLSTM-CNN model is taken as input, and the PCNN model is used to identify the role of each argument.
Step 5.3: the results identified in step 5.2 are fed into the topic model Event-BTM to learn the topic results of the documents.
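In the Event-BTM stage, topics are inferred over event pairs in the standard biterm-topic-model form: with θ_z the topic distribution and φ_z(e) the per-topic event distribution, P(b) = Σ_z θ_z φ_z(e_i) φ_z(e_j). The sketch below computes these probabilities; the parameter values are illustrative, not learned from data.

```python
# Sketch of Event-BTM probabilities (standard biterm-topic-model form).
# theta and phi below are illustrative numbers, not learned parameters.
import numpy as np

theta = np.array([0.6, 0.4])       # θ: distribution over 2 topics
phi = np.array([[0.5, 0.3, 0.2],   # φ_z(e): topic 0 over 3 events
                [0.1, 0.2, 0.7]])  # topic 1

def p_pair(i, j):
    """P(b) = Σ_z θ_z φ_z(e_i) φ_z(e_j) for event pair (e_i, e_j)."""
    return float(np.sum(theta * phi[:, i] * phi[:, j]))

def p_topic_given_pair(i, j):
    """P(z | b) by Bayes' rule: normalize the per-topic joint."""
    joint = theta * phi[:, i] * phi[:, j]
    return joint / joint.sum()

print(round(p_pair(0, 2), 4))  # → 0.088
print(p_topic_given_pair(0, 2))
```

A document's topic distribution then aggregates P(z|b) over its event pairs, weighted by how often each pair occurs in the document.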
Step 6: model evaluation.
The performance of the model is evaluated using the following evaluation metrics:
Coherence(V) = Σ_{i=2}^{|V|} Σ_{j=1}^{i-1} log((D(v_i, v_j) + ε) / D(v_j))
KL(p‖q) = Σ_x p(x) log(p(x) / q(x))
wherein Coherence is the degree of topic aggregation; KL is the divergence; V is the word set of a topic; ε is a smoothing factor (usually taken as 1); D(v_i, v_j) is the number of documents containing both words v_i and v_j; D(v_j) is the number of documents containing v_j; p is a topic distribution and p(x) is the probability of topic word x in p; q is another topic distribution and q(x) is the probability of topic word x in q.
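The two metrics can be sketched in a few lines. Since the patent's exact formula images are not reproduced here, the coherence below uses the common UMass-style form consistent with the definitions in the text (ε = 1); the toy corpus is an assumption.

```python
# Sketch of the evaluation metrics: UMass-style topic coherence and KL divergence.
# The toy document corpus is an illustrative assumption.
import math

docs = [{"brain", "fmri", "task"},
        {"brain", "task", "memory"},
        {"fmri", "memory"}]

def coherence(topic_words, docs, eps=1.0):
    """Sum over ordered word pairs of log((D(v_i, v_j) + eps) / D(v_j))."""
    score = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            vi, vj = topic_words[i], topic_words[j]
            d_ij = sum(1 for d in docs if vi in d and vj in d)  # co-document count
            d_j = sum(1 for d in docs if vj in d)               # document count of v_j
            score += math.log((d_ij + eps) / d_j)
    return score

def kl(p, q):
    """KL(p || q) = Σ p(x) log(p(x)/q(x)) between two topic distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(round(coherence(["brain", "task", "fmri"], docs), 4))
print(round(kl([0.7, 0.3], [0.5, 0.5]), 4))
```

Higher coherence indicates more interpretable topics; a larger KL divergence between two topic distributions indicates better topic separation.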
As described above, the present invention has the advantages that:
1. Three types of neuroimaging study topic events were determined by analyzing the course of neuroimaging studies and the availability of information in the neuroimaging literature. On this basis, an event-based topic learning task is designed to acquire semantically rich neuroimaging research topics, improving their interpretability and accuracy.
2. By fusing deep learning and domain knowledge with a probabilistic topic model, a novel topic learning method is provided to realize event-based topic learning for full-text neuroimaging documents.
The above embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, the scope of which is defined by the claims. Various modifications and equivalent arrangements of this invention will occur to those skilled in the art, and are intended to be within the spirit and scope of the invention.

Claims (4)

1. A method for mining a subject learning pipeline based on neuroimaging literature of events, comprising the steps of:
step 1: preprocessing data; performing stop word processing on paper data crawled from a PLoS One website;
step 2: expression of predefined events;
step 3: training an LSTM-CNN model; firstly, converting words into word vectors through an embedding layer and inputting them into the LSTM (long short-term memory) network for semantic feature extraction; finally, taking the output of the LSTM as the input of the CNN for further feature extraction;
step 4: training a PCNN model; obtaining a vector representation of the relation through four parts: vector representation, convolution, max pooling, and classification;
step 5: constructing a neuroimaging Event-BTM topic learning pipeline, inputting functional neuroimage literature data, and obtaining topic representation results of the literature;
step 6: evaluating the model; evaluating model performance using model evaluation indicators;
the expression of the predefined event in the step 2 is specifically: the subject event is constructed by using meta event, which is expressed as a structure of "trigger word + argument";
the PCNN model is trained in the step 4, and the specific method comprises the following steps: taking the output of the BiLSTM-CNN model as the input of the PCNN model;
step one: a feature vector;
V_lf = [E_1t, E_2t, E_1tf, E_1tb, E_2tf, E_2tb, r]
wherein V_lf is the feature vector; E_1t is the word vector of the trigger word; E_2t is the word vector of the argument; E_1tf and E_1tb are the word vectors of the words immediately before and after the trigger; E_2tf and E_2tb are the word vectors of the words immediately before and after the argument; and r is the index of the event role type;
step two: word representation;
v_wp = [v_wf, d_pft, d_pfa]
wherein v_wp is the word representation vector; v_wf is the word vector of the current word; d_pft is the distance vector between the current word and the trigger word; and d_pfa is the distance vector between the current word and the argument;
step three: the CNN extracts sentence-level global features to predict roles, as follows:
n = max(M_1 v_wp)
V_sf = tanh(W_2 n)
wherein n represents the most useful feature extracted from each convolution kernel by max pooling; max denotes the maximization operation; v_wp is the word representation vector; M_1 and W_2 are linear transformation matrices of the hidden layer; tanh is the activation function; and V_sf is the sentence feature;
the Event-BTM topic learning model in step 5 specifically comprises the following steps:
step one: the probability of a single event pair b is:
P(b) = Σ_z θ_z · φ_{z,e_i} · φ_{z,e_j}
step two: the probability of the whole event-pair set B is:
P(B) = Π_{b∈B} Σ_z θ_z · φ_{z,e_i} · φ_{z,e_j}
step three: the topic distribution probability of a document d is:
P(z|d) = Σ_b P(z|b) · P(b|d)
wherein b(e_i, e_j) is an event pair consisting of the two events e_i and e_j; z is the topic of the event pair; θ_z is the topic distribution; and φ_{z,e_i} is the distribution of event e_i under topic z;
the neuroimaging Event-BTM topic learning pipeline of step 5 comprises the following specific steps:
step 5.1, pre-training a BiLSTM-CNN model, wherein an input sentence first passes through an embedding layer that maps each word or character into a word vector or character vector; the vectors are then fed into the BiLSTM layer to obtain the forward and backward vectors of the sentence, which are concatenated as the hidden state vector of the current word or character; and the CNN layer is used to extract local features of the current word;
step 5.2, using the feature vector output by the BiLSTM-CNN model as input, and identifying the role of the argument by using the PCNN model;
and 5.3, applying the result identified in the step 5.2 to a topic model Event-BTM, and learning topic results of documents.
2. The method for mining a topic learning pipeline based on neuroimaging literature of claim 1, wherein: in the data preprocessing of step 1, stop words, including "the", "a", and "an", are removed from the functional neuroimaging document data.
3. The method for mining a topic learning pipeline based on neuroimaging literature of claim 1, wherein: the LSTM-CNN model training step in the step 3 comprises the following steps:
step one: text vectorization;
v_word = [v_w, v_c, v_t, v_char]
wherein v_word is the combined vector of a word in the sentence, and v_w, v_c, v_t, and v_char are the word vector, case vector, term-dictionary vector, and character vector, respectively;
step two: event element recognition; for a sentence S = [word_1, word_2, …, word_i, …, word_n], the BiLSTM-based feature modeling process is described as follows:
wherein f_i is the word representation, word_i is the i-th word in the sentence, and h_i is the output of the LSTM hidden layer.
4. The method for mining a topic learning pipeline based on neuroimaging literature of claim 1, wherein the performance metrics of the evaluation model in step 6 are as follows:
Coherence(V) = Σ_{i=2}^{|V|} Σ_{j=1}^{i-1} log((D(v_i, v_j) + ε) / D(v_j))
KL(p‖q) = Σ_x p(x) log(p(x) / q(x))
wherein Coherence is the degree of topic aggregation; KL is the divergence; V is the word set of a topic; ε is the smoothing factor; D(v_i, v_j) is the number of documents containing both words v_i and v_j; D(v_j) is the number of documents containing v_j; p is a topic distribution and p(x) is the probability of topic word x in p; q is another topic distribution and q(x) is the probability of topic word x in q.
CN202011226838.4A 2020-11-06 2020-11-06 Method for mining topic learning pipeline based on neuroimaging literature of event Active CN112417241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011226838.4A CN112417241B (en) 2020-11-06 2020-11-06 Method for mining topic learning pipeline based on neuroimaging literature of event


Publications (2)

Publication Number Publication Date
CN112417241A CN112417241A (en) 2021-02-26
CN112417241B true CN112417241B (en) 2024-03-12

Family

ID=74827018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011226838.4A Active CN112417241B (en) 2020-11-06 2020-11-06 Method for mining topic learning pipeline based on neuroimaging literature of event

Country Status (1)

Country Link
CN (1) CN112417241B (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674298A (en) * 2019-09-29 2020-01-10 安徽信息工程学院 Deep learning mixed topic model construction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Topic Learning Pipeline for Curating Brain Cognitive Researches; YING SHENG et al.; IEEE; 2020-10-19; full text *

Also Published As

Publication number Publication date
CN112417241A (en) 2021-02-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant