CN115481247A - Author name disambiguation method based on contrastive learning and heterogeneous graph attention network - Google Patents

Author name disambiguation method based on contrastive learning and heterogeneous graph attention network

Info

Publication number
CN115481247A
CN115481247A (application CN202211151607.0A)
Authority
CN
China
Prior art keywords
paper
author
learning
disambiguation
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211151607.0A
Other languages
Chinese (zh)
Inventor
宫继兵
房小涵
彭吉全
赵祎
赵金烨
王成龙
黄朝园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University
Priority to CN202211151607.0A
Publication of CN115481247A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network, belonging to the technical field of entity disambiguation in knowledge graph construction. The method comprises: using MongoDB to access information such as paper titles, authors and organizations, cleaning the data with a python text-processing library to remove noise and obtain standardized text suitable for the subsequent steps; performing representation learning on the papers with contrastive learning to obtain a unified embedding for each paper; clustering the papers under a purity-first principle to mitigate the over-merging problem and obtain paper clusters; aligning the resulting paper clusters with a heterogeneous graph attention network; and providing an over-splitting detection and over-splitting alignment algorithm to guarantee disambiguation quality. The invention better solves disambiguation of same-name authors and alleviates the paper over-merging and over-splitting problems to a certain extent.

Description

Author name disambiguation method based on contrastive learning and heterogeneous graph attention network
Technical Field
The invention relates to the technical field of entity disambiguation in knowledge graph construction, and in particular to an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network.
Background
From today's big data to the recently popular metaverse, how to disambiguate same-name entities during knowledge informatization is an important and challenging problem. It arises widely in academic database construction, information retrieval, automatic question answering, recommendation systems and other fields, and has significant research value. Author name disambiguation is particularly important in academic database construction, and in recent years a large number of scholars have joined related research. Disambiguation in academic database construction mainly concerns same-name authors: a large number of papers in current systems are wrongly assigned, and ambiguity in the English names of Chinese scholars is especially severe. Many of these are historical errors produced while an author name disambiguation system runs, and they grow as the number of papers in the system increases.
In surveying academic database construction, historical errors can be divided into two sub-scenarios: paper over-merging and paper over-splitting. Over-merging means that an expert's profile in the database contains papers by other same-name experts; over-splitting means that one expert's papers are split across multiple clusters. Both phenomena arise widely while an AND (author name disambiguation) algorithm runs, and if they are not taken seriously and resolved, these errors seriously affect the stable execution of subsequent algorithms, which is a major challenge in current AND research.
Disclosure of Invention
The invention provides an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network. By preliminarily clustering papers with techniques such as heterogeneous graph neural networks, clustering and contrastive learning, it converts the disambiguation problem into an alignment problem, better solves same-name author disambiguation, and alleviates the paper over-merging and over-splitting problems to a certain extent.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
An author name disambiguation method based on contrastive learning and a heterogeneous graph attention network comprises the following steps:
S1, data preprocessing: using MongoDB to access paper title, author and organization information, cleaning the data with a python text-processing library to remove noise and obtain standardized text suitable for the subsequent steps;
S2, paper representation learning: performing representation learning on the papers with contrastive learning to obtain a unified embedding for each paper;
S3, preliminary paper clustering: clustering the papers under a purity-first principle to mitigate the over-merging problem and obtain paper clusters;
S4, paper cluster alignment: aligning the paper clusters obtained in the previous step with a heterogeneous graph attention network;
S5, obtaining the paper disambiguation result: providing an over-splitting detection and over-splitting alignment algorithm to guarantee paper disambiguation quality.
The technical scheme of the invention is further improved as follows: S2 specifically comprises the following steps:
S21, obtaining the preliminary paper representation with the pre-trained language model BERT, described as:

v_i^a = BERT(p_i^a)

where p_i^a is the i-th paper of author a, and v_i^a is the characterization vector corresponding to paper p_i^a;
S22, constructing positive example pairs (x_i, x_i^+) and negative example pairs (x_i, x_i^-), and combining the positive and negative examples;
S23, introducing the training objective function h = f(BERT(x)); the training loss l_i is described as:

l_i = -log [ exp(sim(h_i, h_i^+)/τ) / Σ_{j=1}^{N} exp(sim(h_i, h_j)/τ) ]

where N is the mini-batch size, τ is the temperature hyperparameter, and sim(h_1, h_2) is the cosine similarity h_1ᵀh_2 / (‖h_1‖·‖h_2‖);
S24, after training, finally obtaining the representation vector v_i of each paper.
The technical scheme of the invention is further improved as follows: s3, specifically comprising:
S31, treating the clustering process as an intermediate stage of disambiguation, dividing the papers into more clusters according to rules, thereby reducing occurrences of different authors in the same cluster;
S32, clustering with LightGBM and a hierarchical clustering model, fitting a new decision tree with the negative gradient of the loss function as an approximation of the residual of the current decision tree;
S33, proposing the index Recall_over-merge to describe the over-merging phenomenon of the clustering result, defined as:

Recall_over-merge = TP / (TP + FN)

where TP is the number of cases in which two papers of the same author are in the same cluster, FN is the number of cases in which two papers of the same author fall in two different clusters, M is the number of ideal clusters, and N is the number of actual clusters; a higher Recall_over-merge means a lower degree of over-splitting.
The technical scheme of the invention is further improved as follows: s4, specifically comprising:
s41, generating candidate pairs for author entities with the same name;
s42, constructing a heteromorphic graph for each author entity, and if the mechanism and the co-author names between the candidate pairs are the same or the papers are similar, connecting the entities with each other to obtain a heteromorphic graph G (V, E);
and S43, determining author matching by using the heteromorphic image attention network.
The technical scheme of the invention is further improved as follows: in S43, the method specifically includes:
s431, obtaining semantic embedding of each thesis entity through the representation learning model of S2, and training the heterogeneous graph constructed in S42 through a LINE model to obtain structure embedding of each entity;
s432, combining the two kinds of embedding together as an input feature f, and finding out the importance among different author entities e through self-attribute, wherein the process is described as follows:
t ij =self-attention(Wf i ,Wf j )
Figure BDA0003856679240000041
wherein W is a shared weight matrix for each
Figure BDA0003856679240000042
Figure BDA0003856679240000043
Refer to e i All neighbor nodes of (1).
The technical scheme of the invention is further improved as follows: s5, specifically comprising:
s51, generating a non-repeated pair < name: cid1 and name: cid2> according to the rule of permutation and combination to construct a heteromorphic graph;
s52, detecting whether a group of pair belongs to an author or not by using a pre-trained HGAT;
s53, aligning the paper clusters by giving an alignment rule;
s54, the process needs to be carried out for multiple times, the times are defined as loops, and the finally obtained cluster _ pubs is the final disambiguation result.
The technical scheme of the invention is further improved as follows: in S53, specifically, the method includes:
s531, calculating adjacent edge nodes of each node, and connecting a group of edges with highest similarity scores of aligned two nodes;
s532, after all the nodes are judged, the dfs is used for realizing the connected subgraph algorithm, the alignment rule is obtained, and the nodes are combined.
Due to the adoption of the technical scheme, the invention makes the following technical progress:
1. The invention fine-tunes the BERT-based paper representation with contrastive learning, so the learned paper representation is better suited to the author name disambiguation task.
2. The method computes similarities among papers from the representations obtained in the previous step and preliminarily clusters the papers into fine-grained clusters, converting the disambiguation problem into an alignment problem; the textual semantic information of the papers is fully used for clustering, producing high-purity fine-grained paper clusters.
3. To obtain the final disambiguation result, the fine-grained paper clusters must be aligned. A heterogeneous graph is constructed from the attributes of each paper cluster, the representation of each cluster is learned with a heterogeneous graph neural network, and the pairwise similarities between clusters are computed; the most similar pairs of clusters are aligned. This process takes the structural information of the papers into account, yielding the final paper disambiguation result.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts;
FIG. 1 is an algorithm flow chart of the author name disambiguation method based on contrastive learning and a heterogeneous graph attention network provided by the present invention;
FIG. 2 is a diagram of the algorithm model framework of the author name disambiguation method based on contrastive learning and a heterogeneous graph attention network provided by the present invention.
Detailed Description
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention and the above-described drawings are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the application provides an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network. It addresses the prior-art problem that paper over-merging and paper over-splitting, which arise widely while an AND algorithm runs, seriously affect the stable execution of subsequent algorithms; it explicitly considers the two error scenarios that may occur during AND, and proposes an AND algorithm for this problem together with how to apply it in a big data scenario.
Part of the technical term interpretation:
Author name disambiguation: Author Name Disambiguation (AND), i.e., correctly matching papers of the same author in an academic database and disambiguating same-name authors.
The invention is further described in detail below with reference to the drawings and examples:
As shown in fig. 1 and 2, an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network includes the following steps:
s1, preprocessing data;
using MongoDB to access information such as paper titles, authors and organizations, cleaning the data with a python text-processing library to remove noise, obtaining standardized text suitable for the subsequent steps;
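A minimal sketch of the cleaning in S1 (the MongoDB access itself is elided; the field names, the `clean_text` helper and the exact normalization rules are assumptions, since the text only names "a python character processing library"):

```python
import re
import unicodedata

def clean_text(raw: str) -> str:
    """Normalize one paper-title / author / organization string.

    Assumed rules: unicode NFKC normalization, lowercasing,
    punctuation removal, whitespace collapsing.
    """
    text = unicodedata.normalize("NFKC", raw).lower()
    text = re.sub(r"[^\w\s]", " ", text)      # strip punctuation noise
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

# A record as it might come out of MongoDB (illustrative values).
record = {"title": "  Deep   Learning, for NLP!! ",
          "org": "Yanshan Univ.\u00a0(Hebei)"}
cleaned = {k: clean_text(v) for k, v in record.items()}
print(cleaned["title"])  # deep learning for nlp
```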
s2, performing paper characterization learning;
performing representation learning on the papers with contrastive learning to obtain a unified embedding for each paper;
s3, performing primary clustering on the thesis;
clustering the papers under the purity-first principle, mitigating the paper over-merging problem;
s4, aligning the paper clusters;
aligning the paper clusters obtained in the previous step with the heterogeneous graph attention network;
s5, obtaining a paper disambiguation result;
providing an over-splitting detection and over-splitting alignment algorithm to guarantee paper disambiguation quality;
The specific implementation process is as follows:
S1, addressing the data noise in the dataset and the factors that may affect disambiguation quality: the dataset is first preprocessed; the data are cleaned and analyzed, including cleaning abnormal data and analyzing samples from different feature angles; feature engineering is then performed on the data, and the processed data provide the input for subsequent model training;
S2, in paper representation learning, a preliminary paper representation is first obtained with a pre-trained language model, positive and negative example pairs are then constructed following contrastive learning, and the characterization vector of each paper is obtained after training; the method specifically comprises the following steps:
S21, first obtaining a preliminary characterization of the paper through the pre-trained language model BERT, which can be described as:

v_i^a = BERT(p_i^a)

where p_i^a is the i-th paper of author a, and v_i^a is the characterization vector corresponding to paper p_i^a;
S22, using the contrastive learning method SimCSE to pull papers with high similarity together and push papers with low similarity apart, constructing positive and negative example pairs and combining them; the method specifically comprises:
S221, positive example construction: for a given paper x_i^a under author name a, two BERT encoders are used to obtain h_i^a and (h_i^a)^+; the vectors generated by the two BERT passes are not identical, but their semantics are, thereby forming a positive example pair (x_i^a, x_i^{a+}). In addition, to bring papers of the same author closer in the resulting vector space, a different paper x_j^a of the same author is also regarded as a positive sample, constituting a positive example pair (x_i^a, x_j^a);
S222, negative example construction: to push papers of different same-name authors farther apart, a paper x_k^b of a different same-name author b is treated as a negative example, obtaining a negative example pair (x_i^a, x_k^b);
S23, combining p_pos and p_neg into triples (x_i, x_i^+, x_i^-), where x_i is the anchor, x_i^+ is the positive example, and x_i^- is the negative example. To train the implicit relationship between them, a training objective function h = f(BERT(x)) is introduced after the BERT encoder, where f is a linear layer. The training loss l_i is shown in the following equation:

l_i = -log [ exp(sim(h_i, h_i^+)/τ) / Σ_{j=1}^{N} exp(sim(h_i, h_j)/τ) ]

where N is the mini-batch size, τ is the temperature hyperparameter, and sim(h_1, h_2) is the cosine similarity h_1ᵀh_2 / (‖h_1‖·‖h_2‖). After training, the representation vector v_i of each paper is obtained.
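The training loss l_i of S23 can be sketched in plain Python as follows (an InfoNCE-style objective in the spirit of SimCSE; the BERT encoding that would produce the vectors h is elided, and the temperature value is an illustrative assumption):

```python
import math

def cosine(a, b):
    """sim(h1, h2): cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce_loss(anchor, candidates, pos_index, tau=0.05):
    """l_i = -log( exp(sim(h_i, h_i+)/tau) / sum_j exp(sim(h_i, h_j)/tau) ).

    `candidates` holds the positive (row pos_index) plus the in-batch
    negatives; a low loss means the anchor sits far closer to its
    positive than to any negative.
    """
    logits = [cosine(anchor, c) / tau for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[pos_index] - m - math.log(denom))

anchor = [1.0, 0.0]
cands = [[1.0, 0.0], [0.0, 1.0]]  # positive first, then one negative
print(info_nce_loss(anchor, cands, 0) < info_nce_loss(anchor, cands, 1))  # True
```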
S3, in the preliminary clustering of paper clusters, clustering is performed with a clustering model under the purity-first principle, generating a suitable number of clusters as far as possible, and the clustering is then adjusted reasonably according to the over-merging index; the method specifically comprises:
S31, to deal with the paper over-merging problem, the clustering process is treated as an intermediate stage of disambiguation. During clustering, the papers are divided into as many clusters as certain rules allow, which effectively reduces cases of papers by different authors appearing in the same cluster.
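The purity-first idea of S31 can be illustrated with a deliberately strict merge rule: two papers join one cluster only when their embeddings are almost identical, so clusters stay pure at the cost of over-splitting. This is a single-linkage union-find sketch, not the LightGBM-plus-hierarchical-clustering model actually used; the threshold value is an assumption:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def purity_first_clusters(vectors, threshold=0.95):
    """Union papers whose pairwise similarity clears a strict threshold."""
    parent = list(range(len(vectors)))

    def find(i):  # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Two near-duplicate embeddings merge; the orthogonal one stays alone.
print(purity_first_clusters([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]))  # [[0, 1], [2]]
```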
S32, proposing the index Recall_over-merge to describe the over-merging phenomenon of the clustering result, defined as:

Recall_over-merge = TP / (TP + FN)

where TP is the number of cases in which two papers of the same author are in the same cluster, FN is the number of cases in which two papers of the same author fall in two different clusters, M is the number of ideal clusters, and N is the number of actual clusters; a higher Recall_over-merge means a lower degree of over-splitting.
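Counting TP and FN over all same-author paper pairs, the index can be sketched as follows (the return value for the degenerate case with no same-author pairs is an assumption):

```python
from itertools import combinations

def over_merge_recall(author_of, cluster_of):
    """Recall_over-merge = TP / (TP + FN).

    author_of[i]: true author id of paper i;
    cluster_of[i]: predicted cluster id of paper i.
    TP: same-author pair kept in one cluster;
    FN: same-author pair split across clusters.
    A higher value means less over-splitting.
    """
    tp = fn = 0
    for i, j in combinations(range(len(author_of)), 2):
        if author_of[i] == author_of[j]:
            if cluster_of[i] == cluster_of[j]:
                tp += 1
            else:
                fn += 1
    return tp / (tp + fn) if tp + fn else 1.0

print(over_merge_recall([0, 0, 1], [5, 5, 7]))  # 1.0 (no same-author pair split)
print(over_merge_recall([0, 0, 1], [5, 6, 7]))  # 0.0 (the only pair is split)
```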
S4, in the paper cluster alignment process, author entities are first connected to obtain a heterogeneous graph, and author matching is then determined with the heterogeneous graph attention network; the method specifically comprises:
S41, generating candidate pairs for same-name author entities (clusters);
S42, constructing a heterogeneous graph for each author entity: if the organizations or co-author names of a candidate pair are the same, or their papers are similar, the entities are connected with each other, obtaining a heterogeneous graph G(V, E);
S43, determining author matching with the heterogeneous graph attention network;
S431, obtaining the semantic embedding of each paper entity through the representation learning model of S2, and training the heterogeneous graph constructed in S42 with the LINE model to obtain the structure embedding of each entity;
S432, combining the two embeddings together as the input feature f; the importance t_ij of node e_i to node e_j is obtained with self-attention among the different author entities e, with the formula:

t_ij = self-attention(Wf_i, Wf_j)

where W is a shared weight matrix, for each j ∈ N_i, and N_i denotes all neighbor nodes of e_i; the normalized attention coefficient is:

α_ij = exp(t_ij) / Σ_{k∈N_i} exp(t_ik)
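Given the raw self-attention scores t_ij, the normalized coefficient over the neighbor set N_i is a softmax; a small sketch (computing t_ij from Wf_i and Wf_j is elided, and the dict-based graph encoding is an assumption):

```python
import math

def attention_coefficients(t, neighbors, i):
    """alpha_ij = exp(t_ij) / sum_{k in N_i} exp(t_ik).

    t: dict mapping (i, j) to the raw self-attention score t_ij;
    neighbors[i]: the neighbor node ids N_i of entity e_i.
    """
    denom = sum(math.exp(t[(i, k)]) for k in neighbors[i])
    return {j: math.exp(t[(i, j)]) / denom for j in neighbors[i]}

t = {(0, 1): 1.0, (0, 2): 1.0}  # equal raw scores -> equal attention
alpha = attention_coefficients(t, {0: [1, 2]}, 0)
print(alpha[1], alpha[2])  # 0.5 0.5
```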
S5, finally, the paper clusters are aligned over multiple rounds of the alignment rule to obtain the final disambiguation result; the method comprises the following steps:
S51, generating non-repeating pairs <name:cid1, name:cid2> according to permutation and combination rules, and constructing the heterogeneous graph;
S52, detecting with a pre-trained HGAT whether a candidate pair belongs to the same author;
S53, aligning the paper clusters according to a given alignment rule;
S531, for each node, examining its adjacent edges and connecting the pair of nodes joined by the edge with the highest similarity score for alignment;
S532, after all nodes are judged, finding connected subgraphs with DFS to obtain the alignment rule, and merging;
S54, repeating the process several times (the number of repetitions is defined as loops); the final cluster_pubs is the final disambiguation result.
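The DFS connected-subgraph merge of S531 and S532 can be sketched as follows (clusters are indexed by integer id here, which is an assumption; each resulting connected component becomes one entry of cluster_pubs):

```python
def merge_by_alignment(n_clusters, aligned_edges):
    """Group paper-cluster nodes into connected subgraphs via DFS.

    aligned_edges: pairs of cluster ids judged (e.g. by the HGAT) to
    belong to the same author. Each connected component is merged
    into one final author profile.
    """
    adj = {i: [] for i in range(n_clusters)}
    for a, b in aligned_edges:
        adj[a].append(b)
        adj[b].append(a)

    seen, groups = set(), []
    for start in range(n_clusters):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:  # iterative DFS
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.append(node)
            stack.extend(adj[node])
        groups.append(sorted(comp))
    return groups

# Clusters 0-1 and 1-2 are aligned, so 0, 1, 2 merge; 3 stays alone.
print(merge_by_alignment(4, [(0, 1), (1, 2)]))  # [[0, 1, 2], [3]]
```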
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. An author name disambiguation method based on contrastive learning and a heterogeneous graph attention network, characterized in that the method comprises the following steps:
S1, data preprocessing: using MongoDB to access paper title, author and organization information, cleaning the data with a python text-processing library to remove noise and obtain standardized text suitable for the subsequent steps;
S2, paper representation learning: performing representation learning on the papers with contrastive learning to obtain a unified embedding for each paper;
S3, preliminary paper clustering: clustering the papers under a purity-first principle to mitigate the over-merging problem and obtain paper clusters;
S4, paper cluster alignment: aligning the paper clusters obtained in the previous step with a heterogeneous graph attention network;
S5, obtaining the paper disambiguation result: providing an over-splitting detection and over-splitting alignment algorithm to guarantee paper disambiguation quality.
2. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network as recited in claim 1, wherein S2 specifically comprises:
S21, obtaining a paper representation with the pre-trained language model BERT, described as:

v_i^a = BERT(p_i^a)

where p_i^a is the i-th paper of author a, and v_i^a is the characterization vector corresponding to paper p_i^a;
S22, constructing positive example pairs (x_i, x_i^+) and negative example pairs (x_i, x_i^-), and combining the positive and negative examples;
S23, introducing the training objective function h = f(BERT(x)); the training loss l_i is described as:

l_i = -log [ exp(sim(h_i, h_i^+)/τ) / Σ_{j=1}^{N} exp(sim(h_i, h_j)/τ) ]

where N is the mini-batch size, τ is the temperature hyperparameter, and sim(h_1, h_2) is the cosine similarity h_1ᵀh_2 / (‖h_1‖·‖h_2‖);
S24, after training, finally obtaining the representation vector v_i of each paper.
3. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network as recited in claim 1, wherein S3 specifically comprises:
S31, treating the clustering process as an intermediate stage of disambiguation, dividing the papers into more clusters according to rules, thereby reducing occurrences of different authors in the same cluster;
S32, clustering with LightGBM and a hierarchical clustering model, fitting a new decision tree with the negative gradient of the loss function as an approximation of the residual of the current decision tree;
S33, proposing the index Recall_over-merge to describe the over-merging phenomenon of the clustering result, defined as:

Recall_over-merge = TP / (TP + FN)

where TP is the number of cases in which two papers of the same author are in the same cluster, FN is the number of cases in which two papers of the same author fall in two different clusters, M is the number of ideal clusters, and N is the number of actual clusters; a higher Recall_over-merge means a lower degree of over-splitting.
4. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network as recited in claim 1, wherein S4 specifically comprises:
S41, generating candidate pairs for same-name author entities;
S42, constructing a heterogeneous graph for each author entity: if the organizations or co-author names of a candidate pair are the same, or their papers are similar, the entities are connected with each other, obtaining a heterogeneous graph G(V, E);
S43, determining author matching with the heterogeneous graph attention network.
5. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network as recited in claim 4, wherein S43 specifically comprises:
S431, obtaining the semantic embedding of each paper entity through the representation learning model of S2, and training the heterogeneous graph constructed in S42 with the LINE model to obtain the structure embedding of each entity;
S432, combining the two embeddings together as the input feature f, and finding the importance among different author entities e through self-attention, described as:

t_ij = self-attention(Wf_i, Wf_j)

α_ij = exp(t_ij) / Σ_{k∈N_i} exp(t_ik)

where W is a shared weight matrix, for each j ∈ N_i, and N_i denotes all neighbor nodes of e_i.
6. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network as recited in claim 1, wherein S5 specifically comprises:
S51, generating non-repeating pairs <name:cid1, name:cid2> according to permutation and combination rules, and constructing the heterogeneous graph;
S52, detecting with a pre-trained HGAT whether a candidate pair belongs to the same author;
S53, aligning the paper clusters according to a given alignment rule;
S54, repeating the process several times (the number of repetitions is defined as loops); the finally obtained cluster_pubs is the final disambiguation result.
7. The author name disambiguation method based on comparative learning and heterogeneous graph attention networks of claim 6, further comprising: in S53, specifically, the method includes:
S531, for each node, computing its adjacent edge nodes and connecting the pair of aligned nodes whose edge has the highest similarity score;
S532, after all nodes have been judged, running a DFS-based connected-subgraph algorithm to obtain the alignment rule and merge the nodes.
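The connected-subgraph merge of S532 can be sketched with an iterative depth-first search; function and variable names here are illustrative assumptions, and the edges passed in are assumed to be the highest-score alignment edges retained in S531:

```python
def merge_by_components(nodes, edges):
    """Find connected subgraphs via DFS and merge each component's
    nodes into one cluster (a sketch of S531-S532)."""
    adj = {n: [] for n in nodes}
    for u, v in edges:                 # S531: one best edge per aligned pair
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), []
    for start in nodes:                # S532: DFS connected-subgraph search
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            comp.append(n)
            stack.extend(adj[n])       # visit all reachable neighbors
        components.append(sorted(comp))
    return components

# nodes 1-2-3 are chained by alignment edges, 4-5 form a second component
print(merge_by_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# [[1, 2, 3], [4, 5]]
```

An explicit stack avoids Python's recursion limit on long alignment chains; each component then corresponds to one merged paper cluster.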
CN202211151607.0A 2022-09-21 2022-09-21 Author name disambiguation method based on comparative learning and heterogeneous graph attention network Pending CN115481247A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211151607.0A CN115481247A (en) 2022-09-21 2022-09-21 Author name disambiguation method based on comparative learning and heterogeneous graph attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211151607.0A CN115481247A (en) 2022-09-21 2022-09-21 Author name disambiguation method based on comparative learning and heterogeneous graph attention network

Publications (1)

Publication Number Publication Date
CN115481247A true CN115481247A (en) 2022-12-16

Family

ID=84424288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211151607.0A Pending CN115481247A (en) 2022-09-21 2022-09-21 Author name disambiguation method based on comparative learning and heterogeneous graph attention network

Country Status (1)

Country Link
CN (1) CN115481247A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556058A (en) * 2024-01-11 2024-02-13 安徽大学 Knowledge graph enhanced network embedded author name disambiguation method and device
CN117556058B (en) * 2024-01-11 2024-05-24 安徽大学 Knowledge graph enhanced network embedded author name disambiguation method and device

Similar Documents

Publication Publication Date Title
Saxena et al. Sequence-to-sequence knowledge graph completion and question answering
Cai et al. Generative adversarial network based heterogeneous bibliographic network representation for personalized citation recommendation
Dong et al. From data fusion to knowledge fusion
He et al. Learning entity representation for entity disambiguation
US20240013055A1 (en) Adversarial pretraining of machine learning models
US20210303783A1 (en) Multi-layer graph-based categorization
Hühn et al. FR3: A fuzzy rule learner for inducing reliable classifiers
Zhang et al. Autoblock: A hands-off blocking framework for entity matching
Liu et al. DAGOBAH: an end-to-end context-free tabular data semantic annotation system
CN109299263B (en) Text classification method and electronic equipment
US9009029B1 (en) Semantic hashing in entity resolution
CN109635157A (en) Model generating method, video searching method, device, terminal and storage medium
Gao et al. Building a large-scale, accurate and fresh knowledge graph
Dong et al. Data-anonymous encoding for text-to-SQL generation
CN115481247A (en) Author name disambiguation method based on comparative learning and heterogeneous graph attention network
CN114676346A (en) News event processing method and device, computer equipment and storage medium
CN113361270B (en) Short text optimization topic model method for service data clustering
Zimmermann et al. Incremental active opinion learning over a stream of opinionated documents
CN115391548A (en) Retrieval knowledge graph library generation method based on combination of scene graph and concept network
CN113535967B (en) Chinese universal concept map error correction device
CN113128224B (en) Chinese error correction method, device, equipment and readable storage medium
Bhowmick et al. Globally Aware Contextual Embeddings for Named Entity Recognition in Social Media Streams
Song et al. Metric sentiment learning for label representation
Im et al. Multilayer CARU model for text summarization
Xie et al. Author name disambiguation via heterogeneous network embedding from structural and semantic perspectives

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination