CN116561591B - Training method for semantic feature extraction model of scientific and technological literature, feature extraction method and device - Google Patents

Training method for semantic feature extraction model of scientific and technological literature, feature extraction method and device

Info

Publication number
CN116561591B
CN116561591B (application CN202310836743.1A)
Authority
CN
China
Prior art keywords
representing
semantic
features
feature extraction
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310836743.1A
Other languages
Chinese (zh)
Other versions
CN116561591A (en)
Inventor
李雅文
高鸿睿
梁美玉
薛哲
李洪波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Beijing Jizhijia Technology Co Ltd
Original Assignee
Beijing University of Posts and Telecommunications
Beijing Jizhijia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications, Beijing Jizhijia Technology Co Ltd filed Critical Beijing University of Posts and Telecommunications
Priority to CN202310836743.1A priority Critical patent/CN116561591B/en
Publication of CN116561591A publication Critical patent/CN116561591A/en
Application granted granted Critical
Publication of CN116561591B publication Critical patent/CN116561591B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a training method, a feature extraction method and a device for a semantic feature extraction model of scientific and technological literature. Adaptive feature processing combines global information and local information to enrich the features of the scientific and technological literature; a multi-head attention mechanism is introduced to focus on the relationships among the literature features; positive and negative samples are constructed, the mutual information between the positive and negative sample semantic representations and the global graph-level summary vector is calculated and compared in the latent space, and the parameters of the graph attention mechanism encoder are updated by constructing a loss, which improves the semantic representation learning capability.

Description

Training method for semantic feature extraction model of scientific and technological literature, feature extraction method and device
Technical Field
The application relates to the technical field of natural language processing, in particular to a training method for a semantic feature extraction model of scientific and technical literature, a feature extraction method and a device.
Background
Semantic representation learning of scientific and technological literature can be applied to many tasks, such as text classification, summary generation, entity recognition and similarity calculation. Through semantic representation learning of scientific and technological literature, a computer can better understand and process the information in the literature, the degree of automation and the accuracy of literature processing are improved, and powerful support is provided for scientific and technological research and development.
Existing methods for semantic representation learning of scientific and technological literature usually target only one kind of information, such as the title, the abstract or the keywords. The resulting semantic features are not rich enough, influential key text features cannot be accurately captured, and a large amount of redundant literature information remains, so that irrelevant information degrades the semantic representation learning capability.
An unsupervised graph neural network method can extract the reference relations between documents from the scientific and technological literature relation graph and fuse them with the semantic features of the document text, thereby constructing semantic representation vectors for the literature. However, most existing studies use a supervised graph neural network approach to learn the semantic representation, which requires a large amount of labeled data for a specific task. This leads to a heavy training burden and tightly couples the obtained feature representation to that specific task, making it difficult to transfer directly to other tasks and resulting in poor versatility of the semantic representation of scientific and technological literature.
Prior-art random-walk-based semantic learning methods for scientific and technological literature depend heavily on parameter selection to make neighboring nodes have similar representations. Such methods generally assume that all nodes have the same importance in the graph; in practice, however, node degrees may differ greatly, and the methods can be affected by node-degree bias, so that some node embeddings are inaccurate. They also need to perform many random walks, each of which may visit many nodes, which leads to high computational complexity, especially for large-scale graphs; random-walk-based semantic representation learning may therefore not be applicable to some large-scale graphs.
Another approach, semantic representation learning based on graph convolutional networks (GCNs), can only pass information between neighboring nodes and cannot exploit information from distant nodes. This may cause GCNs to perform poorly on large graphs, where nodes are far apart and information transfer is limited. GCNs are also very sensitive to the network architecture: even on the same task, different architectures may yield different performance, which makes GCN design and tuning difficult.
Therefore, a new semantic feature extraction method is needed for the scientific literature.
Disclosure of Invention
In view of this, embodiments of the application provide a training method for a semantic feature extraction model of scientific and technological literature, a feature extraction method and a device, so as to eliminate or mitigate one or more defects in the prior art, and to solve the problems that the extracted semantic features of scientific and technological literature are not rich or accurate enough, that representations highly coupled to specific tasks are difficult to transfer and reuse, and that the processing requirements of large-scale graphs cannot be met.
One aspect of the application provides a training method for a semantic feature extraction model of a scientific literature, which comprises the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of samples, each sample comprises scientific and technological literature features in the form of graph structure data, each node in the graph structure data is the text feature of a single scientific and technological literature, and each edge represents a reference relation between the corresponding documents;
acquiring an initial semantic feature extraction model, wherein the initial semantic feature extraction model comprises an adaptive feature module and a graph attention mechanism encoder; the adaptive feature extraction module obtains corresponding adaptive features by performing a weighted summation over the original text features, the average pooling features and the maximum pooling features of the graph structure data; the graph attention mechanism encoder adopts a multi-head attention mechanism to process the adaptive features of the graph structure data to obtain semantic features;
in the training process, each sample is taken as a positive sample, and the edges of each node in each sample are randomly perturbed to form a negative sample; the positive sample is processed by the initial semantic feature extraction model to obtain positive sample semantic features, and the negative sample is processed by the initial semantic feature extraction model to obtain negative sample semantic features; the positive sample semantic features are processed with a readout function to obtain a graph-level summary vector; a preset discriminator calculates a first mutual information measure between the positive sample semantic features and the graph-level summary vector and a second mutual information measure between the negative sample semantic features and the graph-level summary vector; a loss over the first mutual information measure and the second mutual information measure is calculated using binary cross entropy; and the parameters of the graph attention mechanism encoder in the initial semantic feature extraction model are updated with the acquired training sample set by minimizing the loss, obtaining the target scientific and technological literature semantic feature extraction model.
In some embodiments, the adaptive feature extraction module obtains the adaptive feature by performing weighted summation on the original text feature, the average pooling feature and the maximum pooling feature of each node of the graph structure data, where the calculation formula is as follows:
$F = \alpha_{1} F_{original} + \alpha_{2} F_{max} + \alpha_{3} F_{avg}$, with $\alpha_{i} = \frac{e^{W_i}}{\sum_{j} e^{W_j}}$,

wherein $\alpha_i$ is the normalized weight, $W_i$ is the feature weight, $F_{original}$ represents the original text features of each node, $F_{max}$ represents the maximum pooling feature, $F_{avg}$ represents the average pooling feature, and $e$ represents the natural base.
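For illustration only, the following is a minimal sketch of how such an adaptive feature combination could be written; the tensor shapes, the softmax-style weight normalization and the broadcasting of the pooled values back to the feature dimension are assumptions, not the patent's reference implementation.

```python
import torch

def adaptive_features(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Softmax-weighted sum of original, max-pooled and average-pooled features.

    x: node feature matrix, shape (num_nodes, feat_dim)
    w: three feature weights (may be preset; no learning required), shape (3,)
    """
    # Per-node max / average over the feature dimension, broadcast back to
    # feat_dim so the three terms can be summed (the broadcasting is an assumption).
    f_max = x.max(dim=1, keepdim=True).values.expand_as(x)
    f_avg = x.mean(dim=1, keepdim=True).expand_as(x)
    alpha = torch.softmax(w, dim=0)  # normalized weights e^{W_i} / sum_j e^{W_j}
    return alpha[0] * x + alpha[1] * f_max + alpha[2] * f_avg
```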
In some embodiments, the graph attention mechanism encoder extracts semantic features from the adaptive features of each node in the graph structure data, further comprising:
and (3) carrying out linear rectification on the semantic features by adopting a PReLU activation function, wherein the calculation formula is as follows:
$\vec{h}_i = \mathrm{PReLU}\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j\in N_i}\alpha_{ij}^{k}\,W^{k}\vec{x}_j\right)$

wherein $\vec{h}_i$ represents the semantic features of the i-th node in the graph structure data, K represents the number of attention heads, $W^{k}$ represents the weight matrix under the k-th attention mechanism, $\alpha_{ij}^{k}$ represents the influence of the neighbor node j on node i after normalization processing, $N_i$ represents the neighbor nodes of node i, and $\vec{x}_j$ represents the adaptive feature of the j-th neighbor node in the graph structure data;

$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W\vec{x}_i \,\Vert\, W\vec{x}_j\right]\right)\right)}{\sum_{m\in N_i}\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W\vec{x}_i \,\Vert\, W\vec{x}_m\right]\right)\right)}$

wherein $\vec{x}_i$ represents the adaptive feature of the i-th node in the graph structure data, $\vec{x}_j$ represents the adaptive feature of the j-th neighbor node in the graph structure data, $\vec{x}_m$ represents the adaptive feature of the m-th node in the graph structure data, and $\vec{a}$ represents the attention coefficient vector.
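As a sketch of how such a multi-head graph attention layer could look in PyTorch, the class below averages the K heads and applies PReLU as in the formula above; the class name, initialization scheme, dense-adjacency masking and LeakyReLU slope are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionEncoder(nn.Module):
    """Multi-head graph attention layer with PReLU output (illustrative sketch)."""

    def __init__(self, in_dim: int, out_dim: int, num_heads: int = 6):
        super().__init__()
        self.W = nn.Parameter(torch.empty(num_heads, in_dim, out_dim))
        self.a = nn.Parameter(torch.empty(num_heads, 2 * out_dim))
        nn.init.xavier_uniform_(self.W)
        nn.init.xavier_uniform_(self.a)
        self.prelu = nn.PReLU()

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) adaptive node features; adj: (N, N) adjacency with self-loops
        out_dim = self.a.shape[1] // 2
        head_outputs = []
        for k in range(self.W.shape[0]):
            h = x @ self.W[k]                                        # (N, out_dim)
            src = (h * self.a[k, :out_dim]).sum(-1, keepdim=True)    # a^T [W x_i || .]
            dst = (h * self.a[k, out_dim:]).sum(-1, keepdim=True)    # a^T [. || W x_j]
            e = F.leaky_relu(src + dst.T, negative_slope=0.2)        # e_ij for all pairs
            e = e.masked_fill(adj == 0, float("-inf"))               # attend to neighbors only
            alpha = torch.softmax(e, dim=1)                          # normalize over j in N_i
            head_outputs.append(alpha @ h)                           # weighted neighbor sum
        # Average the K heads, then apply PReLU, matching the formula above
        return self.prelu(torch.stack(head_outputs).mean(dim=0))
```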
In some embodiments, the preset discriminator solves the mutual information metric using a bilinear function, and the calculation formula is:
$D\left(\vec{h}_i, \vec{s}\right) = \sigma\left(\vec{h}_i^{\mathrm{T}} W \vec{s}\right)$

wherein $D(\vec{h}_i, \vec{s})$ represents the mutual information measure, $\vec{h}_i$ represents the semantic features of the i-th node in the graph structure data, W represents a weight matrix, $\vec{s}$ represents the graph-level summary vector, and $\sigma(\cdot)$ turns the bilinear score into a probability.
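A minimal sketch of such a bilinear discriminator is shown below; the class name and dimensions are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class BilinearDiscriminator(nn.Module):
    """Scores node representations against the graph-level summary vector
    with a bilinear form sigma(h_i^T W s) — illustrative sketch."""

    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, h: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) node semantic features, s: (dim,) graph-level summary vector
        return torch.sigmoid(h @ self.W @ s)  # (N,) mutual-information scores
```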
In some embodiments, a loss between the first mutual information metric and the second mutual information metric is calculated using binary cross entropy. The calculation formula is:
$\mathcal{L} = -\frac{1}{N+M}\left(\sum_{i=1}^{N}\mathbb{E}_{(X,A)}\left[\log D\left(\vec{h}_i,\vec{s}\right)\right] + \sum_{j=1}^{M}\mathbb{E}_{(\tilde{X},\tilde{A})}\left[\log\left(1 - D\left(\tilde{h}_j,\vec{s}\right)\right)\right]\right)$

wherein $(X, A)$ represents the graph structure data of the positive samples, $X$ represents the text features of the positive samples, and $A$ represents the reference relations of the positive samples; $\vec{h}_i$ represents the semantic features of the i-th positive sample; $D(\vec{h}_i,\vec{s})$ represents the first mutual information measure between the i-th positive sample and the graph-level summary vector; $(\tilde{X}, \tilde{A})$ represents the graph structure data of the negative samples, $\tilde{X}$ represents the text features of the negative samples, and $\tilde{A}$ represents the reference relations of the negative samples; $\tilde{h}_j$ represents the semantic features of the j-th negative sample; $D(\tilde{h}_j,\vec{s})$ represents the second mutual information measure between the j-th negative sample and the graph-level summary vector; N represents the number of positive samples and M represents the number of negative samples.
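For illustration, the contrastive binary cross-entropy objective could be computed as in the sketch below; it assumes the positive and negative scores come from a discriminator such as the one sketched above, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over positive and negative mutual-information scores.

    pos_scores: D(h_i, s) for the N positive nodes, values in (0, 1)
    neg_scores: D(h~_j, s) for the M negative (corrupted) nodes, values in (0, 1)
    """
    pos_loss = F.binary_cross_entropy(pos_scores, torch.ones_like(pos_scores))
    neg_loss = F.binary_cross_entropy(neg_scores, torch.zeros_like(neg_scores))
    # Average over N + M samples, matching the 1/(N+M) normalization in the formula
    n, m = pos_scores.numel(), neg_scores.numel()
    return (pos_loss * n + neg_loss * m) / (n + m)
```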
In some embodiments, during the training process, the learning rate is set to 0.0001, the activation parameter is set to 0.2, the dropout is set to 0.8, and the number of heads of the multi-head attention mechanism is at least 6.
On the other hand, the application also provides a scientific literature semantic feature extraction method based on the graph structure data, which comprises the following steps:
acquiring scientific and technological literature feature data in the form of graph structure data, wherein each node in the graph structure data is the text feature of a single scientific and technological literature, and each edge represents a reference relation between the corresponding documents;
inputting the technical literature feature data into a target technical literature semantic feature extraction model in the technical literature semantic feature extraction model training method so as to output semantic features corresponding to the technical literature feature data.
In some embodiments, the text features of a single scientific and technological literature are extracted using a pre-trained TF-IDF or N-gram model, a recurrent neural network, or a long short-term memory network.
On the other hand, the application also provides a scientific literature semantic feature extraction device based on the graph structure data, which comprises a processor and a memory, wherein the memory is stored with computer instructions, the processor is used for executing the computer instructions stored in the memory, and the device realizes the steps of the method when the computer instructions are executed by the processor.
In another aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.
The application has the advantages that:
the training method, the feature extraction method and the device for the semantic feature extraction model of the scientific literature combine global information and local information through self-adaptive feature processing to enrich the features of the scientific literature; by introducing a multi-head attention mechanism, focusing on the relationship among the characteristics of scientific and technological literature; by constructing positive samples and negative samples, mutual information between positive and negative sample semantic representations and global graph level summarization vectors in potential space is compared and calculated, parameters of a graph annotation mechanism encoder are updated by constructing loss, and semantic representation learning capacity is improved.
Furthermore, in the multi-head attention mechanism, the influence on the semantic features is restricted to the attended neighbor nodes: the features of documents that have a reference relation are weighted and summed, so the global literature graph information does not need to be received, which improves the capability of processing large graphs.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present application are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present application will be more clearly understood from the following detailed description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate and together with the description serve to explain the application. In the drawings:
FIG. 1 is a logic diagram of a training method for semantic feature extraction model of scientific literature according to an embodiment of the present application.
FIG. 2 is a logic diagram of a training method for semantic feature extraction models of scientific literature according to another embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present application more apparent. The exemplary embodiments of the present application and the descriptions thereof are used herein to explain the present application, but are not intended to limit the application.
It should be noted here that, in order to avoid obscuring the present application due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present application are shown in the drawings, while other details not greatly related to the present application are omitted.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted herein that the term "coupled" may refer to not only a direct connection, but also an indirect connection in which an intermediate is present, unless otherwise specified.
Hereinafter, embodiments of the present application will be described with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar components, or the same or similar steps.
In order to solve the problem of unlabeled data on large-scale graphs, the weights of the features of literature nodes that have a reference relation are made to depend entirely on the node features themselves and to be independent of the literature graph structure. The application provides a training method for a semantic feature extraction model of scientific and technological literature, a feature extraction method and a device. A graph attention mechanism performs a weighted summation over the features of documents that have a reference relation, giving each document a different feature weight, which better reflects the correlations between the features of different documents. The unlabeled problem on large-scale graphs is solved by an unsupervised graph neural network semantic representation learning method. By comparing, in the latent space, the mutual information between positive and negative local literature semantic representations and the global graph semantic representation, the graph neural network can capture both local and global information, thereby improving the semantic representation learning capability for scientific and technological literature.
Specifically, the application provides a training method for a semantic feature extraction model of a scientific literature, as shown in fig. 1, comprising the following steps of S101-S103:
step S101: a training sample set is obtained, wherein the training sample set comprises a plurality of samples, each sample comprises technical literature features of graph structure data, each node in the graph structure data is a text feature of a single technical literature, and each side represents a reference relation between corresponding technical literatures.
Step S102: acquiring an initial semantic feature extraction model, wherein the initial semantic feature extraction model comprises an adaptive feature module and a graph attention mechanism encoder; the adaptive feature extraction module obtains corresponding adaptive features by performing a weighted summation over the original text features, the average pooling features and the maximum pooling features of the graph structure data; the graph attention mechanism encoder adopts a multi-head attention mechanism to process the adaptive features of the graph structure data to obtain semantic features.
Step S103: in the training process, each sample is taken as a positive sample, and the edges of each node in each sample are randomly perturbed to form a negative sample; the positive sample is processed by the initial semantic feature extraction model to obtain positive sample semantic features, and the negative sample is processed by the initial semantic feature extraction model to obtain negative sample semantic features; the positive sample semantic features are processed with a readout function to obtain a graph-level summary vector; a preset discriminator calculates a first mutual information measure between the positive sample semantic features and the graph-level summary vector and a second mutual information measure between the negative sample semantic features and the graph-level summary vector; a loss over the first and second mutual information measures is calculated using binary cross entropy; and the parameters of the graph attention mechanism encoder in the initial semantic feature extraction model are updated with the acquired training sample set by minimizing the loss, obtaining the target scientific and technological literature semantic feature extraction model.
In step S101, the scientific and technological literature text features of the graph structure data are processed; the text features are extracted using a pre-trained TF-IDF or N-gram model, a recurrent neural network, or a long short-term memory network.
TF-IDF (Term Frequency-Inverse Document Frequency) measures the importance of words by calculating the frequency of each word in the text and its inverse document frequency over the whole corpus. The N-gram model takes N consecutive words in the text as one feature and is used to capture grammatical structure and contextual information. A recurrent neural network (RNN) or a long short-term memory network (LSTM) can be used to process sequence data, capturing timing information and context in the text.
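As an illustration of this step, the following sketch builds node text features with scikit-learn's TfidfVectorizer; the library choice, the toy documents and the unigram/bigram setting are assumptions rather than the patent's prescribed pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Titles/abstracts of the scientific and technological documents (toy example)
documents = [
    "Graph neural networks for citation analysis",
    "Attention mechanisms in natural language processing",
    "Unsupervised representation learning on large graphs",
]

# Word-level TF-IDF with unigrams and bigrams (the N-gram setting is illustrative)
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
node_features = vectorizer.fit_transform(documents).toarray()  # (num_docs, vocab_size)
print(node_features.shape)
```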
In step S102, the adaptive feature module enhances the generalization ability and performance of the method by automatically learning and selecting the features that are most useful for the current task. In machine learning, adaptively adjusting the features lets the model better adapt to different data distributions. The graph attention mechanism encoder is trained on the basis of a graph neural network, which is a neural network able to process graph structure data: it takes the feature vectors of each node and its neighbors as input and shares weights over the graph structure to learn and reason over the entire graph. In the application, the graph attention mechanism encoder adopts a multi-head attention mechanism (Multi-Head Attention), an extension of the attention mechanism that extracts information more effectively when processing sequence data. The input is divided into multiple heads, attention is calculated for each head, and the results of the multiple heads are spliced to obtain the final output.
Specifically, in the implementation process, the self-adaptive feature extraction module obtains the self-adaptive feature by carrying out weighted summation on the original text feature, the average pooling feature and the maximum pooling feature of each node of the graph structure data, and the calculation formula is as follows:
$F = \alpha_{1} F_{original} + \alpha_{2} F_{max} + \alpha_{3} F_{avg}$, with $\alpha_{i} = \frac{e^{W_i}}{\sum_{j} e^{W_j}}$,

wherein $\alpha_i$ is the normalized weight, $W_i$ is the feature weight, $F_{original}$ represents the original text features of each node, $F_{max}$ represents the maximum pooling feature, $F_{avg}$ represents the average pooling feature, and $e$ represents the natural base. The feature weights may be preset.
wherein $F_{original}$ retains the original text features of the scientific and technological literature, $F_{max}$ extracts local features based on maximum pooling, and $F_{avg}$ extracts global features based on average pooling. Maximum pooling divides the input feature map into several rectangular regions and outputs the maximum value of each sub-region; this reduces the feature dimensionality and the amount of computation while retaining the most significant features. Maximum pooling can be divided into overlapping and non-overlapping pooling, the difference being whether the stride of the pooling window is smaller than the window size. Average pooling divides the input image or feature map into several rectangular regions and outputs the average value of each sub-region; this reduces the feature dimensionality and the amount of computation while smoothing the variation of the features. It should be noted that the adaptive feature extraction module in the present application does not need to learn parameters.
In some embodiments, the graph attention mechanism encoder extracts semantic features from the adaptive features of each node in the graph structure data, further comprising:
and (3) carrying out linear rectification on the semantic features by adopting a PReLU activation function, wherein the calculation formula is as follows:
$\vec{h}_i = \mathrm{PReLU}\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j\in N_i}\alpha_{ij}^{k}\,W^{k}\vec{x}_j\right)$

wherein $\vec{h}_i$ represents the semantic features of the i-th node in the graph structure data, K represents the number of attention heads, $W^{k}$ represents the weight matrix under the k-th attention mechanism, $\alpha_{ij}^{k}$ represents the influence of the neighbor node j on node i after normalization processing, $N_i$ represents the neighbor nodes of node i, and $\vec{x}_j$ represents the adaptive feature of the j-th neighbor node in the graph structure data;

$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W\vec{x}_i \,\Vert\, W\vec{x}_j\right]\right)\right)}{\sum_{m\in N_i}\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W\vec{x}_i \,\Vert\, W\vec{x}_m\right]\right)\right)}$

wherein $\vec{x}_i$ represents the adaptive feature of the i-th node in the graph structure data, $\vec{x}_j$ represents the adaptive feature of the j-th neighbor node in the graph structure data, $\vec{x}_m$ represents the adaptive feature of the m-th node in the graph structure data, and $\vec{a}$ represents the attention coefficient vector.
In step S103, positive and negative samples are constructed, and their mutual information with the graph-level representation is calculated in the latent space, so as to improve the learning ability of the semantic representation of scientific and technological literature. Specifically, in this embodiment, the negative samples are obtained by randomly perturbing the edges between the nodes of the positive samples; negative samples of this form are used during training to promote the association between the local representation and the global representation and to mine the reference relations between nodes. The positive and negative samples are processed by the adaptive feature module and the graph attention mechanism encoder of the initial semantic feature extraction model to output the positive sample semantic features and the negative sample semantic features.
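A minimal sketch of one way to build such a negative sample by randomly disturbing edges is given below; the particular perturbation (permuting the columns of the adjacency matrix) is an illustrative choice, not the patent's prescribed operation.

```python
import torch

def corrupt_edges(adj: torch.Tensor) -> torch.Tensor:
    """Build a negative sample by randomly disturbing the edges of each node.

    adj: (N, N) adjacency matrix of the positive sample.
    Permuting the columns rewires every node to a random set of neighbors;
    this specific perturbation is an assumption made for illustration.
    """
    perm = torch.randperm(adj.shape[0])
    return adj[:, perm]
```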
The readout function employed by the present application is an operation for graph neural networks that aggregates the node features in a graph into a graph-level representation, computing the graph-level summary vector from the positive sample semantic features. The readout function usually needs to satisfy permutation invariance, i.e. it is not affected by the order of the nodes in the graph. Common readout functions are summation, averaging, maximum, and so on. Instead of these simple functions, a neural-network-based readout function such as Janossy pooling may be used; its main idea is to use a neural network to process each permutation of the graph nodes and then average the resulting representations, which approximately satisfies permutation invariance. Janossy readout may use a multi-layer perceptron (MLP) or a gated recurrent unit (GRU) as the underlying structure, for example a plain feedforward or recurrent network. Such functions can extract graph information more effectively, capture the dependencies between nodes, and improve the expressive capacity of the model.
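For illustration, a simple permutation-invariant readout could look like the sketch below; the mean-plus-sigmoid form is a common choice in mutual-information-based graph learning and is an assumption here, not necessarily the readout prescribed by the application.

```python
import torch

def mean_readout(h: torch.Tensor) -> torch.Tensor:
    """Aggregate node semantic features into a graph-level summary vector.

    h: (N, dim) positive-sample node semantic features.
    Averaging is permutation-invariant; the sigmoid squashing follows common
    practice in mutual-information-based graph learning (an assumption).
    """
    return torch.sigmoid(h.mean(dim=0))  # (dim,) graph-level summary vector
```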
Further, the graph-level summary vector calculates a first mutual information measure with positive sample semantic features and a second mutual information measure with negative sample semantic features.
Mutual information is a measure of the degree of dependency between two random variables. It is understood as the amount of information that two variables share, or the amount by which the uncertainty of one variable is reduced if the other variable is known. The mutual information may be calculated using the following calculation formula:
$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$

wherein $I(X;Y)$ represents the mutual information of the variables X and Y, $p(x,y)$ represents the joint probability distribution of the two variables, and $p(x)$ and $p(y)$ represent the marginal probability distributions of the variables.
Mutual information can also be calculated based on entropy, and the calculation formula is as follows:
$I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y)$

wherein $I(X;Y)$ represents the mutual information of the variables X and Y, $H(X \mid Y)$ and $H(Y \mid X)$ represent the conditional entropies of the two variables, and $H(X,Y)$ is the joint entropy of the two variables.
In this embodiment, the preset discriminator solves the mutual information metric using a bilinear function, and the calculation formula is as follows:
$D\left(\vec{h}_i, \vec{s}\right) = \sigma\left(\vec{h}_i^{\mathrm{T}} W \vec{s}\right)$

wherein $D(\vec{h}_i, \vec{s})$ represents the mutual information measure, $\vec{h}_i$ represents the semantic features of the i-th node in the graph structure data, W represents a weight matrix, $\vec{s}$ represents the graph-level summary vector, and $\sigma(\cdot)$ turns the bilinear score into a probability.
In some embodiments, the loss between the first mutual information measure and the second mutual information measure is calculated using binary cross entropy. The calculation formula is:
$\mathcal{L} = -\frac{1}{N+M}\left(\sum_{i=1}^{N}\mathbb{E}_{(X,A)}\left[\log D\left(\vec{h}_i,\vec{s}\right)\right] + \sum_{j=1}^{M}\mathbb{E}_{(\tilde{X},\tilde{A})}\left[\log\left(1 - D\left(\tilde{h}_j,\vec{s}\right)\right)\right]\right)$

wherein $(X, A)$ represents the graph structure data of the positive samples, $X$ represents the text features of the positive samples, and $A$ represents the reference relations of the positive samples; $\vec{h}_i$ represents the semantic features of the i-th positive sample; $D(\vec{h}_i,\vec{s})$ represents the first mutual information measure between the i-th positive sample and the graph-level summary vector; $(\tilde{X}, \tilde{A})$ represents the graph structure data of the negative samples, $\tilde{X}$ represents the text features of the negative samples, and $\tilde{A}$ represents the reference relations of the negative samples; $\tilde{h}_j$ represents the semantic features of the j-th negative sample; $D(\tilde{h}_j,\vec{s})$ represents the second mutual information measure between the j-th negative sample and the graph-level summary vector; N represents the number of positive samples and M represents the number of negative samples.
In the training process, the parameters of the graph attention mechanism encoder are updated by minimizing the loss, and the training stop condition is set according to a preset number of iterations or a preset loss value.
In some embodiments, the learning rate is set to 0.0001, the activation parameter is set to 0.2, the dropout is set to 0.8, and the number of heads of the multi-head attention mechanism is at least 6.
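The sketch below places these stated hyperparameters into a training setup; it relies on the encoder and discriminator classes sketched earlier, and the optimizer choice (Adam), the feature dimensions and the places where dropout and the LeakyReLU slope are applied are assumptions made for illustration.

```python
import torch

# Hyperparameters stated in this embodiment
LEARNING_RATE = 1e-4   # learning rate 0.0001
LEAKY_SLOPE   = 0.2    # activation parameter of LeakyReLU
DROPOUT       = 0.8    # dropout rate
NUM_HEADS     = 6      # at least 6 attention heads

# Encoder / discriminator as sketched above (hypothetical names and dimensions)
encoder = GraphAttentionEncoder(in_dim=5000, out_dim=512, num_heads=NUM_HEADS)
discriminator = BilinearDiscriminator(dim=512)
dropout = torch.nn.Dropout(p=DROPOUT)

params = list(encoder.parameters()) + list(discriminator.parameters())
optimizer = torch.optim.Adam(params, lr=LEARNING_RATE)  # Adam is an assumption
```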
On the other hand, the application also provides a scientific literature semantic feature extraction method based on the graph structure data, which comprises the following steps of S201 to S202:
step S201: and acquiring technical literature characteristic data in the form of graph structure data, wherein each node in the graph structure data is a text characteristic of a single technical literature, and each side represents a reference relation between the corresponding technical literatures.
Step S202: inputting the technical literature feature data into a target technical literature semantic feature extraction model in the technical literature semantic feature extraction model training method in the steps S101-S103 so as to output semantic features corresponding to the technical literature feature data.
In some embodiments, the text features of a single scientific and technological literature are extracted using a pre-trained TF-IDF or N-gram model, a recurrent neural network, or a long short-term memory network.
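An end-to-end usage sketch of steps S201–S202 is given below; it assumes the `adaptive_features` helper, the vectorizer and the trained encoder from the earlier sketches, and all names here are hypothetical.

```python
import torch

def extract_semantics(texts, adjacency, vectorizer, encoder, feature_weights):
    """Return semantic features for documents given their reference-relation graph.

    texts: list of document texts; adjacency: (N, N) torch tensor of citations.
    """
    x = torch.tensor(vectorizer.transform(texts).toarray(), dtype=torch.float32)
    x = adaptive_features(x, feature_weights)   # adaptive feature processing (step S101/S102)
    with torch.no_grad():
        return encoder(x, adjacency)            # (num_docs, out_dim) semantic features
```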
On the other hand, the application also provides a scientific literature semantic feature extraction device based on the graph structure data, which comprises a processor and a memory, wherein the memory is stored with computer instructions, the processor is used for executing the computer instructions stored in the memory, and the device realizes the steps of the method when the computer instructions are executed by the processor.
In another aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.
The application is described below in connection with a specific embodiment:
This embodiment provides a training method for a semantic feature extraction model of scientific and technological literature, which processes literature features in the form of graph structure data, wherein each node in the graph structure data is the text feature of a single scientific and technological literature and each edge represents a reference relation between the corresponding documents. As shown in fig. 2, the method comprises three parts: an adaptive feature module, a graph attention mechanism encoder, and scientific and technological literature semantic representation learning based on maximizing mutual information. The adaptive features take both the global information and the local information of the literature text features into account, enriching the feature information of the literature text. The graph attention encoder is used to learn the representations of the literature nodes, and unsupervised graph neural network semantic representation learning is used to solve the unlabeled problem on large-scale graphs.
The self-adaptive feature module part adds feature weights in order to fully utilize feature information of the scientific literature, so that the method processes the text features of the scientific literature according to feature distribution of the data of the scientific literature by using the specific weights, and the self-adaptive feature processing part does not need to learn parameters. The feature processing formula is as follows:
$F = \alpha_{1} F_{original} + \alpha_{2} F_{max} + \alpha_{3} F_{avg}$, with $\alpha_{i} = \frac{e^{W_i}}{\sum_{j} e^{W_j}}$,

wherein $\alpha_i$ is the normalized weight, $W_i$ is the feature weight, $F_{original}$ represents the original text features of each node, $F_{max}$ represents the maximum pooling feature, $F_{avg}$ represents the average pooling feature, and $e$ represents the natural base. The feature weights may be preset.
The average pooling feature $F_{avg}$ is calculated as $F_{avg} = \mathrm{avg}(X)$, wherein $\mathrm{avg}(X)$ is a function that returns the average feature value of the scientific and technological literature and X is the text feature of the scientific and technological literature.
The maximum pooling feature $F_{max}$ is used to extract the local features of the scientific and technological literature and is calculated as $F_{max} = \mathrm{max}(X)$, wherein $\mathrm{max}(X)$ is a function that returns the maximum feature value of the scientific and technological literature and X is the text feature of the scientific and technological literature.
$F_{original}$ retains the original features of the scientific and technological literature, the maximum pooling part focuses on the local information of the literature, and the average pooling part focuses on the global information. The literature features are processed with specific weights according to the feature distribution of the literature data.
In the graph attention mechanism encoder part, a new feature is generated for each node. Assuming that the dimension of the node feature vector is $F'$, the input adaptive features can be expressed as $h = \{\vec{x}_1, \vec{x}_2, \dots, \vec{x}_N\}$, wherein $\vec{x}_i \in \mathbb{R}^{F'}$.
In the graph attention layer, a weight matrix $W$ is applied to the scientific and technological literature nodes and the attention coefficients are calculated with a self-attention mechanism. The attention coefficient is expressed as $e_{ij} = a\left(W\vec{x}_i, W\vec{x}_j\right)$,
wherein $e_{ij}$ is the influence of the features of document j on document i; a softmax over the neighbor nodes j of node i is introduced to normalize this influence, with the expression:

$\alpha_{ij} = \mathrm{softmax}_j\left(e_{ij}\right) = \frac{\exp\left(e_{ij}\right)}{\sum_{m\in N_i}\exp\left(e_{im}\right)}$

wherein $N_i$ represents the neighbor nodes of node i, and $\alpha_{ij}$ is $e_{ij}$ normalized by the softmax.
After the weight matrix between the neural network connection layers is obtained, the output layer of the feedforward neural network is processed using a LeakyReLU function.
The complete attention mechanism formula is expressed as:
$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W\vec{x}_i \,\Vert\, W\vec{x}_j\right]\right)\right)}{\sum_{m\in N_i}\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W\vec{x}_i \,\Vert\, W\vec{x}_m\right]\right)\right)}$

wherein $\vec{x}_i$ represents the adaptive feature of the i-th node in the graph structure data, $\vec{x}_j$ represents the adaptive feature of the j-th neighbor node, $\vec{x}_m$ represents the adaptive feature of the m-th node, and $\vec{a}$ represents the attention coefficient vector.
In this way, the normalized attention coefficients between the scientific and technological literature nodes are obtained; they can be used to predict the features of each document:
w is a matrix of weights and,all neighbor nodes of i are represented. The PReLU activation function is adopted to adaptively learn the parameters of the rectification linear unit, and the precision is improved under the condition of negligible additional calculation cost.
In order to stabilize the self-attention learning process, this embodiment uses a multi-head attention mechanism: the above equation is converted by introducing K independent attention mechanisms, the resulting features are averaged over the K heads, and the PReLU function is applied. The calculation formula is as follows:
$\vec{h}_i = \mathrm{PReLU}\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j\in N_i}\alpha_{ij}^{k}\,W^{k}\vec{x}_j\right)$

wherein $\vec{h}_i$ represents the semantic features of the i-th node in the graph structure data, K represents the number of attention heads, $W^{k}$ represents the weight matrix under the k-th attention mechanism, $\alpha_{ij}^{k}$ represents the influence of the neighbor node j on node i after normalization processing, $N_i$ represents the neighbor nodes of node i, and $\vec{x}_j$ represents the adaptive feature of the j-th neighbor node in the graph structure data.
Accordingly, under the multi-head attention mechanism, the effect of node j on node i is expressed as:
$\alpha_{ij}^{k} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W^{k}\vec{x}_i \,\Vert\, W^{k}\vec{x}_j\right]\right)\right)}{\sum_{m\in N_i}\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W^{k}\vec{x}_i \,\Vert\, W^{k}\vec{x}_m\right]\right)\right)}$

wherein $\vec{x}_i$ represents the adaptive feature of the i-th node in the graph structure data, $\vec{x}_j$ represents the adaptive feature of the j-th neighbor node, $\vec{x}_m$ represents the adaptive feature of the m-th node, and $\vec{a}$ represents the attention coefficient vector.
The semantic representation learning part based on maximizing mutual information uses an unsupervised graph contrastive learning strategy to solve the problems of missing labels and scalability on large-scale graphs. By comparing, in the latent space, the mutual information between positive and negative local literature node representations and the global literature graph representation, the graph neural network can capture both local and global information, which improves the semantic representation learning effect. The node embedding $\vec{h}_i$ generated by the graph attention layer carries global information centered on node i rather than only the node itself. To obtain the graph-level summary vector $\vec{s}$, a readout function $R(\cdot)$ is used to aggregate the node features into a feature representation of the whole graph. The process is expressed as $\vec{s} = R\left(\{\vec{h}_1, \dots, \vec{h}_N\}\right)$.
a arbiter is used as a measure of local mutual information maximization,,/>the probability score given to this summary information, i.e., the expression for mutual information, is:
$D\left(\vec{h}_i, \vec{s}\right) = \sigma\left(\vec{h}_i^{\mathrm{T}} W \vec{s}\right)$

wherein $D(\vec{h}_i, \vec{s})$ represents the mutual information measure, $\vec{h}_i$ represents the semantic features of the i-th node in the graph structure data, W represents a weight matrix, $\vec{s}$ represents the graph-level summary vector, and $\sigma(\cdot)$ turns the bilinear score into a probability.
In this embodiment, positive and negative samples are introduced for training. Negative samples are formed by randomly perturbing the edges of each node in the scientific and technological literature features of the original graph structure data. A random corruption function may also be used to generate negative literature samples; in that case the adjacency matrix representing the reference relations is left unchanged and the features of the literature nodes are randomized. A larger Jensen-Shannon (JS) divergence between the joint distribution and the product of the marginal distributions of the local and global features corresponds to larger mutual information: the JS divergence is positively correlated with mutual information, and the larger the mutual information between the local and global features, the stronger their correlation. The corrupted (noise) samples are compared by the discriminator and the loss is calculated using binary cross entropy. A discriminator that accurately distinguishes negative from positive samples enhances the JS divergence and increases the mutual information between the local feature representations and the global representation.
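For the alternative corruption mentioned here (keep the reference-relation adjacency matrix unchanged and randomize the node features), a minimal sketch could look as follows; the feature-shuffling choice is an assumption made for illustration.

```python
import torch

def corrupt_features(x: torch.Tensor) -> torch.Tensor:
    """Negative sample that keeps the reference-relation adjacency matrix
    unchanged and randomly permutes the node (document) features."""
    perm = torch.randperm(x.shape[0])
    return x[perm]
```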
Finally, the loss function is constructed as:
$\mathcal{L} = -\frac{1}{N+M}\left(\sum_{i=1}^{N}\mathbb{E}_{(X,A)}\left[\log D\left(\vec{h}_i,\vec{s}\right)\right] + \sum_{j=1}^{M}\mathbb{E}_{(\tilde{X},\tilde{A})}\left[\log\left(1 - D\left(\tilde{h}_j,\vec{s}\right)\right)\right]\right)$

wherein $(X, A)$ represents the graph structure data of the positive samples, $X$ represents the text features of the positive samples, and $A$ represents the reference relations of the positive samples; $\vec{h}_i$ represents the semantic features of the i-th positive sample; $D(\vec{h}_i,\vec{s})$ represents the first mutual information measure between the i-th positive sample and the graph-level summary vector; $(\tilde{X}, \tilde{A})$ represents the graph structure data of the negative samples, $\tilde{X}$ represents the text features of the negative samples, and $\tilde{A}$ represents the reference relations of the negative samples; $\tilde{h}_j$ represents the semantic features of the j-th negative sample; $D(\tilde{h}_j,\vec{s})$ represents the second mutual information measure between the j-th negative sample and the graph-level summary vector; N represents the number of positive samples and M represents the number of negative samples.
This embodiment retains the original features of the scientific and technological literature while attending to both its local information and its global information, and processes the literature features with specific weights according to the feature distribution of the literature data. By comparing, in the latent space, the mutual information between positive and negative local literature semantic representations and the global graph semantic representation, the graph neural network can capture both local and global information, thereby improving the semantic representation learning capability. By introducing a graph attention mechanism, the relationships between the literature node features can be better integrated into the method, and the node representations depend only on the neighbor nodes, so the method can be applied directly to inductive learning without acquiring the whole graph.
In summary, the training method, the feature extraction method and the device for the semantic feature extraction model of scientific and technological literature combine global information and local information through adaptive feature processing to enrich the literature features; introduce a multi-head attention mechanism to focus on the relationships among the literature features; and construct positive and negative samples, compare the mutual information between the positive and negative sample semantic representations and the global graph-level summary vector in the latent space, and update the parameters of the graph attention mechanism encoder by constructing a loss, thereby improving the semantic representation learning capability.
Accordingly, the present application also provides an apparatus/system comprising a computer device including a processor and a memory, the memory having stored therein computer instructions for executing the computer instructions stored in the memory, the apparatus/system implementing the steps of the method as described above when the computer instructions are executed by the processor.
The embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the scientific and technological literature semantic feature extraction model training method or feature extraction method described above. The computer readable storage medium may be a tangible storage medium such as a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, registers, a floppy disk, a hard disk, a removable storage disk, a CD-ROM, or any other form of storage medium known in the art.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. The particular implementation is hardware or software dependent on the specific application of the solution and the design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.
It should be understood that the application is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present application.
In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, and various modifications and variations can be made to the embodiments of the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (8)

1. A training method for a semantic feature extraction model of a scientific literature is characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of samples, each sample comprises scientific and technological literature features in the form of graph structure data, each node in the graph structure data is the text feature of a single scientific and technological literature, and each edge represents a reference relation between the corresponding documents;
acquiring an initial semantic feature extraction model, wherein the initial semantic feature extraction model comprises an adaptive feature module and a graph attention mechanism encoder; the adaptive feature extraction module obtains corresponding adaptive features by performing a weighted summation over the original text features, the average pooling features and the maximum pooling features of the graph structure data; the graph attention mechanism encoder adopts a multi-head attention mechanism to process the adaptive features of the graph structure data to obtain semantic features;
in the training process, each sample is taken as a positive sample, and the edges of each node in each sample are randomly perturbed to form a negative sample; the positive sample is processed by the initial semantic feature extraction model to obtain positive sample semantic features, and the negative sample is processed by the initial semantic feature extraction model to obtain negative sample semantic features; the positive sample semantic features are processed with a readout function to obtain a graph-level summary vector; a preset discriminator calculates a first mutual information measure between the positive sample semantic features and the graph-level summary vector and a second mutual information measure between the negative sample semantic features and the graph-level summary vector; a loss over the first mutual information measure and the second mutual information measure is calculated using binary cross entropy; and the parameters of the graph attention mechanism encoder in the initial semantic feature extraction model are updated with the acquired training sample set by minimizing the loss, obtaining a target scientific and technological literature semantic feature extraction model;
the self-adaptive feature extraction module obtains self-adaptive features by carrying out weighted summation on the original text features, the average pooling features and the maximum pooling features of each node of the graph structure data, and the calculation formula is as follows:
$F = \alpha_{1} F_{original} + \alpha_{2} F_{max} + \alpha_{3} F_{avg}$, with $\alpha_{i} = \frac{e^{W_i}}{\sum_{j} e^{W_j}}$,

wherein $\alpha_i$ is the normalized weight, $W_i$ is the feature weight, $F_{original}$ represents the original text features of each node, $F_{max}$ represents the maximum pooling feature, $F_{avg}$ represents the average pooling feature, and $e$ represents the natural base;
the graph annotation mechanism encoder extracts semantic features of each node self-adaptive feature in the graph structure data, and further comprises:
and (3) carrying out linear rectification on the semantic features by adopting a PReLU activation function, wherein the calculation formula is as follows:
$\vec{h}_i = \mathrm{PReLU}\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j\in N_i}\alpha_{ij}^{k}\,W^{k}\vec{x}_j\right)$

wherein $\vec{h}_i$ represents the semantic features of the i-th node in the graph structure data, K represents the number of attention heads, $W^{k}$ represents the weight matrix under the k-th attention mechanism, $\alpha_{ij}^{k}$ represents the influence of the neighbor node j on node i after normalization processing, $N_i$ represents the neighbor nodes of node i, and $\vec{x}_j$ represents the adaptive feature of the j-th neighbor node in the graph structure data;
wherein, for the input adaptive features, the graph attention mechanism encoder applies the weight matrix $W^{k}$ to the scientific and technological literature nodes, introduces a linear transformation to perform normalization processing, and processes the output layer with a LeakyReLU function;
$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W\vec{x}_i \,\Vert\, W\vec{x}_j\right]\right)\right)}{\sum_{m\in N_i}\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\mathrm{T}}\left[W\vec{x}_i \,\Vert\, W\vec{x}_m\right]\right)\right)}$

wherein $\vec{x}_i$ represents the adaptive feature of the i-th node in the graph structure data, $\vec{x}_j$ represents the adaptive feature of the j-th neighbor node in the graph structure data, $\vec{x}_m$ represents the adaptive feature of the m-th node in the graph structure data, and $\vec{a}$ represents the attention coefficient vector;
the input of the graph attention mechanism encoder is the adaptive features $h = \{\vec{x}_1, \vec{x}_2, \dots, \vec{x}_N\}$ of dimension $F'$, wherein $\vec{x}_i \in \mathbb{R}^{F'}$.
2. the training method of the semantic feature extraction model of the scientific literature according to claim 1, wherein the preset discriminator adopts a bilinear function to solve mutual information measurement, and the calculation formula is as follows:
$D\left(\vec{h}_i, \vec{s}\right) = \sigma\left(\vec{h}_i^{\mathrm{T}} W \vec{s}\right)$

wherein $D(\vec{h}_i, \vec{s})$ represents the mutual information measure, $\vec{h}_i$ represents the semantic features of the i-th node in the graph structure data, W represents a weight matrix, $\vec{s}$ represents the graph-level summary vector, and $\sigma(\cdot)$ turns the bilinear score into a probability.
3. The scientific literature semantic feature extraction model training method of claim 2, wherein a loss between the first mutual information metric and the second mutual information metric is calculated using binary cross entropy. The calculation formula is:
wherein ,graph structure data representing the positive samples, < >>Text feature representing the positive sample, +.>Representing the application relation of said positive samples, +.>Representing semantic features of the ith positive sample; />Representing the first mutual information metric of an ith positive sample and the level summary vector; />Graph structure data representing said negative samples, a ∈>Text feature representing the negative sample, +.>Representing the application relation of said positive samples, +.>Representing semantic features of the jth negative sample; />Representing the j-th negative sample with the second mutual information measure of the graph-level summary vector; n represents a positive number of samples and M represents a negative number of samples.
4. The scientific and technological literature semantic feature extraction model training method according to claim 1, wherein in the training process, the learning rate is set to 0.0001, the activation parameter is set to 0.2, the dropout is set to 0.8, and the number of heads of the multi-head attention mechanism is at least 6.
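The hyperparameters in claim 4 might be collected into a configuration such as the following; reading the activation parameter as the LeakyReLU negative slope and the optimizer choice are assumptions.

```python
config = {
    "learning_rate": 1e-4,   # learning rate 0.0001
    "negative_slope": 0.2,   # "activation parameter", read here as the LeakyReLU slope
    "dropout": 0.8,          # dropout value as stated in the claim
    "num_heads": 6,          # at least 6 attention heads
}

# e.g. optimizer = torch.optim.Adam(encoder.parameters(), lr=config["learning_rate"])
```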
5. A scientific and technological literature semantic feature extraction method based on graph structure data, characterized by comprising the following steps:

acquiring scientific and technological literature feature data in the form of graph structure data, wherein each node in the graph structure data is the text features of a single scientific and technological literature, and each edge represents a citation relation between the corresponding scientific and technological literatures;

inputting the scientific and technological literature feature data into the target scientific and technological literature semantic feature extraction model obtained by the scientific and technological literature semantic feature extraction model training method according to any one of claims 1 to 4, so as to output the semantic features corresponding to the scientific and technological literature feature data.
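For claim 5, a hypothetical extraction call could build the adjacency matrix from citation pairs and run the trained encoder; the function signature, the symmetric treatment of citation edges, and the omission of the fusion step are assumptions.

```python
import torch


def extract_semantic_features(encoder, node_features, citation_edges):
    # node_features: (N, F) adaptive features, one row per document (fusion step elided).
    # citation_edges: iterable of (citing_index, cited_index) pairs.
    n = node_features.size(0)
    adj = torch.eye(n)                      # self-loops keep the attention softmax well-defined
    for i, j in citation_edges:
        adj[i, j] = adj[j, i] = 1.0         # each edge encodes a citation relation
    with torch.no_grad():
        return encoder(node_features, adj)  # (N, dim) semantic features per document
```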
6. The graph-structure-data-based scientific and technological literature semantic feature extraction method according to claim 5, wherein the text features of a single scientific and technological literature are extracted by using a pre-trained TF-IDF model, an N-gram model, a recurrent neural network, or a long short-term memory network.
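Claim 6 lists several ways to obtain per-document text features; a TF-IDF sketch using scikit-learn (one of the listed options) is shown below, with the example documents and the vocabulary cap as assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import torch

documents = [
    "graph attention networks for citation analysis",               # placeholder abstracts
    "mutual information maximization for representation learning",
]

vectorizer = TfidfVectorizer(max_features=1024)  # cap the vocabulary size
text_features = torch.tensor(
    vectorizer.fit_transform(documents).toarray(), dtype=torch.float32
)  # (num_documents, vocabulary_size) node features for the graph structure data
```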
7. A scientific and technological literature semantic feature extraction device based on graph structure data, comprising a processor and a memory, wherein computer instructions are stored in the memory, the processor is configured to execute the computer instructions stored in the memory, and when the computer instructions are executed by the processor, the device implements the steps of the method according to any one of claims 1 to 6.
8. A computer readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202310836743.1A 2023-07-10 2023-07-10 Training method for semantic feature extraction model of scientific and technological literature, feature extraction method and device Active CN116561591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310836743.1A CN116561591B (en) 2023-07-10 2023-07-10 Training method for semantic feature extraction model of scientific and technological literature, feature extraction method and device

Publications (2)

Publication Number Publication Date
CN116561591A CN116561591A (en) 2023-08-08
CN116561591B true CN116561591B (en) 2023-10-31

Family

ID=87503872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310836743.1A Active CN116561591B (en) 2023-07-10 2023-07-10 Training method for semantic feature extraction model of scientific and technological literature, feature extraction method and device

Country Status (1)

Country Link
CN (1) CN116561591B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067057A (en) * 2021-11-22 2022-02-18 安徽大学 Human body reconstruction method, model and device based on attention mechanism
CN114663857A (en) * 2022-03-22 2022-06-24 深圳海星智驾科技有限公司 Point cloud target detection method and device and domain controller
CN114817578A (en) * 2022-06-29 2022-07-29 北京邮电大学 Scientific and technological thesis citation relation representation learning method, system and storage medium
CN115690522A (en) * 2022-12-29 2023-02-03 湖北工业大学 Target detection method based on multi-pooling fusion channel attention and application thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Unsupervised Semantic Representation Learning of Scientific Literature Based on Graph Attention Mechanism and Maximum Mutual Information; Hongrui Gao et al.; 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS); pp. 496-500 *

Also Published As

Publication number Publication date
CN116561591A (en) 2023-08-08

Similar Documents

Publication Publication Date Title
Chen et al. Order-free rnn with visual attention for multi-label classification
CN114067160B (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
Deng et al. Active transfer learning network: A unified deep joint spectral–spatial feature learning model for hyperspectral image classification
Dai et al. Multilayer one-class extreme learning machine
Rizve et al. Openldn: Learning to discover novel classes for open-world semi-supervised learning
CN111914156A (en) Cross-modal retrieval method and system for self-adaptive label perception graph convolution network
US20220076074A1 (en) Multi-source domain adaptation with mutual learning
Choi et al. Face video retrieval based on the deep CNN with RBF loss
Wang et al. Unsupervised selective labeling for more effective semi-supervised learning
CN116263785A (en) Training method, classification method and device of cross-domain text classification model
US20230134508A1 (en) Electronic device and method with machine learning training
Lee et al. Learning in the wild: When, how, and what to learn for on-device dataset adaptation
CN114255371A (en) Small sample image classification method based on component supervision network
Berlin et al. Spiking neural network based on joint entropy of optical flow features for human action recognition
CN116089648A (en) File management system and method based on artificial intelligence
CN111598712A (en) Training and searching method for data feature generator in social media cross-modal search
Wang et al. Learning Domain‐Independent Deep Representations by Mutual Information Minimization
Tang et al. Adaptive pedestrian detection using convolutional neural network with dynamically adjusted classifier
CN117523295A (en) Passive domain adaptive image classification method based on class guide element learning
Hua et al. Robust and sparse label propagation for graph-based semi-supervised classification
Passalis et al. Deep temporal logistic bag-of-features for forecasting high frequency limit order book time series
CN116561591B (en) Training method for semantic feature extraction model of scientific and technological literature, feature extraction method and device
US20240119743A1 (en) Pre-training for scene text detection
Wang et al. Domain adaptation network based on hypergraph regularized denoising autoencoder
Tian et al. Partial domain adaptation by progressive sample learning of shared classes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant