CN115481247A - Author name disambiguation method based on contrastive learning and heterogeneous graph attention network - Google Patents
Author name disambiguation method based on contrastive learning and heterogeneous graph attention network
- Publication number
- CN115481247A (application CN202211151607.0A)
- Authority
- CN
- China
- Prior art keywords
- paper
- author
- learning
- disambiguation
- clustering
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network, belonging to the technical field of entity disambiguation in knowledge graph construction. The method comprises: using MongoDB to store and access information such as paper titles, authors and affiliations, and cleaning the data with a Python string-processing library to remove noise and obtain standardized text suitable for the subsequent steps; performing representation learning on the papers with contrastive learning to obtain unified paper embeddings; clustering the papers under a purity-first principle to obtain paper clusters and alleviate the paper over-merging problem; aligning the resulting paper clusters with a heterogeneous graph attention network; and providing over-splitting detection and over-splitting alignment algorithms to guarantee disambiguation quality. The invention better realizes disambiguation of same-name authors and mitigates the paper over-merging and over-splitting problems to a certain extent.
Description
Technical Field
The invention relates to the technical field of entity disambiguation in knowledge graph construction, and in particular to an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network.
Background
Whether in today's big data systems or the recently popular metaverse, disambiguating same-name entities during knowledge informatization is an important and challenging problem. It arises widely in academic database construction, information retrieval, automatic question answering, recommendation systems and other fields, and therefore has significant research value. Author name disambiguation is particularly important in academic database construction, and many scholars have joined related research in recent years. Disambiguation in academic databases mainly concerns same-name authors: a large number of papers in current systems are assigned incorrectly, and ambiguity in the romanized names of Chinese scholars is especially severe. Many of these are historical errors accumulated while an author-name-disambiguation system runs, and they grow as the number of papers in the system increases.
In surveying academic database construction, historical errors fall into two sub-scenarios: paper over-merging and paper over-splitting. The over-merging problem refers to papers by other experts appearing in a given expert's profile; the over-splitting problem refers to papers of the same expert being split into multiple clusters. Both phenomena occur widely while an AND algorithm runs, and if not emphasized and solved these errors seriously affect the stable execution of subsequent algorithms, which is a major challenge in current AND research.
Disclosure of Invention
The invention provides an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network. By preliminarily clustering papers with techniques such as heterogeneous graph neural networks, clustering and contrastive learning, the disambiguation problem is converted into an alignment problem, so that same-name author disambiguation is better realized and the paper over-merging and over-splitting problems are mitigated to a certain extent.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
An author name disambiguation method based on contrastive learning and a heterogeneous graph attention network comprises the following steps:
S1, data preprocessing: MongoDB is used to store and access paper titles, authors and affiliations; a Python string-processing library cleans the data, removing noise to obtain standardized text suitable for the subsequent steps;
S2, paper representation learning: representation learning is performed on papers using contrastive learning to obtain unified paper embeddings;
S3, preliminary paper clustering: papers are clustered under a purity-first principle to obtain paper clusters and alleviate the over-merging problem;
S4, paper cluster alignment: the paper clusters obtained in the previous step are aligned with a heterogeneous graph attention network;
S5, obtaining the paper disambiguation result: over-splitting detection and over-splitting alignment algorithms are provided to guarantee disambiguation quality.
The technical scheme of the invention is further improved as follows: s2, the method specifically comprises the following steps:
S21, the paper representation is obtained with the language pre-training model BERT; the process is described as v_i^a = BERT(p_i^a), where p_i^a is the i-th paper of author a and v_i^a is its corresponding representation vector;
S22, positive example pairs (x_i, x_i^+) and negative example pairs (x_i, x_i^-) are constructed and combined;
S23, a training objective function h = f(BERT(x)) is introduced, with training loss l_i described as:
l_i = -log[ exp(sim(h_i, h_i^+)/τ) / Σ_{j=1}^{N} ( exp(sim(h_i, h_j^+)/τ) + exp(sim(h_i, h_j^-)/τ) ) ]
where N is the mini-batch size (batch_size), τ is the temperature hyperparameter, and sim(h_1, h_2) is cosine similarity;
S24, after training, the representation vector v_i of each paper is finally obtained.
The technical scheme of the invention is further improved as follows: S3 specifically comprises:
S31, the clustering process is treated as an intermediate step of disambiguation: papers are divided into more clusters according to rules, reducing the occurrence of different authors in the same cluster;
S32, clustering is performed with LightGBM and a hierarchical clustering model, fitting each new decision tree to the negative gradient of the loss function as an approximation of the residual of the current tree;
S33, an index Recall_over-merge is proposed to describe the over-merging phenomenon of the clustering result, defined as Recall_over-merge = TP / (TP + FN), where TP is the number of cases in which two papers of the same author fall in the same cluster, FN is the number of cases in which two papers of the same author fall in two different clusters, M is the ideal number of clusters, and N is the actual number of clusters; a higher Recall_over-merge indicates a lower degree of over-splitting.
The technical scheme of the invention is further improved as follows: S4 specifically comprises:
S41, candidate pairs are generated for same-name author entities;
S42, a heterogeneous graph is constructed for each author entity: if the affiliations or co-author names between a candidate pair are the same, or their papers are similar, the entities are connected, yielding a heterogeneous graph G(V, E);
S43, author matching is determined with the heterogeneous graph attention network.
The technical scheme of the invention is further improved as follows: S43 specifically comprises:
S431, the semantic embedding of each paper entity is obtained through the representation learning model of S2, and the heterogeneous graph constructed in S42 is trained with a LINE model to obtain the structure embedding of each entity;
S432, the two embeddings are concatenated as the input feature f, and the importance between different author entities e is computed through self-attention; the process is described as:
t_ij = self-attention(W f_i, W f_j)
The technical scheme of the invention is further improved as follows: S5 specifically comprises:
S51, non-repeating pairs <name:cid1, name:cid2> are generated according to permutation-and-combination rules to construct a heterogeneous graph;
S52, a pre-trained HGAT detects whether a pair of clusters belongs to the same author;
S53, the paper clusters are aligned according to a given alignment rule;
S54, this process is repeated multiple times (the number of repetitions is defined as loops), and the finally obtained cluster_pubs is the final disambiguation result.
The technical scheme of the invention is further improved as follows: S53 specifically comprises:
S531, the neighboring nodes of each node are computed, and for each node the edge with the highest similarity score between its two endpoints is kept for alignment;
S532, after all nodes have been judged, a connected-subgraph algorithm implemented with DFS yields the alignment rule, and the nodes are merged.
Due to the adoption of the above technical scheme, the invention achieves the following technical progress:
1. By fine-tuning the BERT-based paper representation with contrastive learning, the learned representation is better suited to the author name disambiguation task.
2. The similarity between papers is computed from the representations obtained in the previous step, so that papers are preliminarily clustered into fine-grained, high-purity paper clusters; the disambiguation problem is thereby converted into an alignment problem while the textual semantics of the papers are fully exploited.
3. To obtain the final disambiguation result, the fine-grained paper clusters must be aligned: a heterogeneous graph is constructed from the attributes of each paper cluster, the representation of each cluster is learned with a heterogeneous graph neural network, the pairwise similarity between clusters is computed, and the most similar clusters are aligned. Structural information within the papers is thus taken into account, and the final paper disambiguation result is obtained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the following drawings show some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is an algorithm flow chart of the author name disambiguation method based on contrastive learning and a heterogeneous graph attention network provided by the present invention;
FIG. 2 is an algorithm model framework diagram of the author name disambiguation method based on contrastive learning and a heterogeneous graph attention network provided by the present invention.
Detailed Description
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention and the above-described drawings are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the application provides an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network. It addresses the paper over-merging and paper over-splitting phenomena that widely arise while prior-art AND algorithms run and that seriously affect the stable execution of subsequent algorithms; it explicitly considers the two error scenarios that may arise in the AND process, and proposes an AND algorithm for the problem together with how to apply it in big-data scenarios.
Interpretation of some technical terms:
Author name disambiguation (AND): correctly matching papers to the same author in an academic database, i.e., disambiguating same-name authors.
The invention is further described in detail below with reference to the drawings and examples:
As shown in figs. 1 and 2, an author name disambiguation method based on contrastive learning and a heterogeneous graph attention network includes the following steps:
S1, data preprocessing;
MongoDB is used to store and access information such as paper titles, authors and affiliations; a Python string-processing library cleans the data, removing noise to obtain standardized text suitable for the subsequent steps;
S2, paper representation learning;
representation learning is performed on papers using contrastive learning to obtain unified paper embeddings;
S3, preliminary paper clustering;
papers are clustered under the purity-first principle, alleviating the paper over-merging problem;
S4, paper cluster alignment;
the paper clusters obtained in the previous step are aligned with a heterogeneous graph attention network;
S5, obtaining the paper disambiguation result;
over-splitting detection and over-splitting alignment algorithms are provided to guarantee paper disambiguation quality;
The specific implementation process is as follows:
S1, for the data-noise problems in the data set and the factors that may affect disambiguation quality, the data set is first preprocessed: the data are cleaned and analyzed, including removing abnormal data and analyzing samples from different feature perspectives; feature engineering is then performed, and the processed data serve as input for subsequent model training;
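As an illustration of the cleaning in S1, the following sketch normalizes the title, author, and affiliation fields with standard Python string tooling. The field names are hypothetical, and the record is assumed to have already been fetched from MongoDB:

```python
import re

def clean_text(s: str) -> str:
    """Normalize one raw field (title, author, or org): lowercase,
    strip punctuation noise, collapse runs of whitespace."""
    s = s.lower()
    s = re.sub(r"[^\w\s]", " ", s)      # drop punctuation/markup debris
    s = re.sub(r"\s+", " ", s).strip()  # collapse whitespace
    return s

def preprocess(record: dict) -> dict:
    """record: a document assumed fetched from MongoDB with illustrative
    'title', 'authors', 'org' fields."""
    return {
        "title": clean_text(record.get("title", "")),
        "authors": [clean_text(a) for a in record.get("authors", [])],
        "org": clean_text(record.get("org", "")),
    }
```

A cleaned record then feeds the representation-learning step directly, with every field in a uniform lowercase, punctuation-free form.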
S2, in paper representation learning, a preliminary paper representation is first obtained with a language pre-training model; positive and negative example pairs are then constructed following contrastive learning, and the representation vector of each paper is obtained after training. The step specifically comprises:
S21, a preliminary representation of each paper is first obtained through the language pre-training model BERT; the process can be described as v_i^a = BERT(p_i^a), where p_i^a is the i-th paper of author a and v_i^a is its corresponding representation vector;
S22, following the contrastive learning method SimCSE, papers with high similarity are pulled together and papers with low similarity are pushed apart by constructing and combining positive and negative examples; specifically:
S221, positive example construction: for a given paper x_i with author name a, two BERT encoder passes yield h_i and h_i^+; the vectors produced by the two passes are not identical, yet their semantics are, thereby forming a positive pair (x_i, x_i^+). In addition, to bring papers of the same author closer in the resulting vector space, a different paper of the same author is also regarded as a positive sample, likewise forming a positive pair;
S222, negative example construction: to push papers of different same-name authors farther apart, a paper of a different same-name author is treated as a negative sample, yielding a negative pair (x_i, x_i^-);
S23, the positive pairs p_pos and negative pairs p_neg are combined into triples (x_i, x_i^+, x_i^-), where x_i is the anchor, x_i^+ a positive example and x_i^- a negative example. To train the implicit relationship between them, a training objective function h = f(BERT(x)) is introduced after the BERT encoder, where f is a linear layer. The training loss l_i is:
l_i = -log[ exp(sim(h_i, h_i^+)/τ) / Σ_{j=1}^{N} ( exp(sim(h_i, h_j^+)/τ) + exp(sim(h_i, h_j^-)/τ) ) ]
where N is the mini-batch size (batch_size), τ is the temperature hyperparameter, and sim(h_1, h_2) is cosine similarity.
After training, the representation vector v_i of each paper is obtained.
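The loss l_i above can be sketched numerically. The snippet below implements a SimCSE-style objective over precomputed embeddings; the NumPy arrays stand in for the f(BERT(x)) outputs, and batch construction and back-propagation are omitted:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity sim(h1, h2)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(h, h_pos, h_neg, tau=0.05):
    """SimCSE-style loss averaged over a mini-batch.
    h, h_pos, h_neg: (N, d) arrays of anchor, positive, and negative
    embeddings; tau is the temperature hyperparameter."""
    N = h.shape[0]
    losses = []
    for i in range(N):
        num = np.exp(cosine(h[i], h_pos[i]) / tau)
        den = sum(np.exp(cosine(h[i], h_pos[j]) / tau)
                  + np.exp(cosine(h[i], h_neg[j]) / tau)
                  for j in range(N))
        losses.append(-np.log(num / den))
    return float(np.mean(losses))
```

With positives close to their anchors and negatives far away, the loss is small; swapping the roles of positives and negatives makes it large, which is exactly the gradient signal that pulls same-author papers together.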
S3, in the preliminary clustering of paper clusters, clustering is performed with a clustering model under the purity-first principle, generating an appropriate number of clusters, after which the clustering is adjusted according to the over-merging index. The step specifically comprises:
S31, to cope with the paper over-merging problem, the clustering process is treated as an intermediate step of disambiguation. During clustering, papers are divided into more clusters according to certain rules, which effectively reduces the occurrence of different authors' papers in the same cluster.
S32, an index Recall_over-merge is proposed to describe the over-merging phenomenon of the clustering result, defined as Recall_over-merge = TP / (TP + FN), where TP is the number of cases in which two papers of the same author fall in the same cluster, FN is the number of cases in which two papers of the same author fall in two different clusters, M is the ideal number of clusters, and N is the actual number of clusters; a higher Recall_over-merge indicates a lower degree of over-splitting.
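The pair counting behind Recall_over-merge can be illustrated as follows. This sketch computes TP and FN from ground-truth author ids and predicted cluster ids and returns the recall core TP/(TP+FN); the cluster-count terms M and N of the full index are left out since their exact combination is not recoverable from the garbled formula:

```python
from itertools import combinations

def recall_over_merge(true_labels, pred_labels):
    """Pair-recall core of the over-merge index.
    true_labels[i]: ground-truth author id of paper i
    pred_labels[i]: predicted cluster id of paper i
    TP: same-author pairs placed in the same cluster
    FN: same-author pairs split across clusters"""
    tp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        if true_labels[i] == true_labels[j]:
            if pred_labels[i] == pred_labels[j]:
                tp += 1
            else:
                fn += 1
    return tp / (tp + fn) if (tp + fn) else 1.0
```

A value of 1.0 means no same-author pair was split, i.e., no over-splitting at all; lower values expose how aggressively the purity-first clustering fragmented each author.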
S4, in the paper-cluster alignment process, author entities are first connected to obtain a heterogeneous graph, and author matching is then determined with the heterogeneous graph attention network. The step specifically comprises:
S41, candidate pairs are generated for same-name author entities (clusters);
S42, a heterogeneous graph is constructed for each author entity: if the affiliations or co-author names between a candidate pair are the same, or their papers are similar, the entities are connected, yielding a heterogeneous graph G(V, E);
S43, author matching is determined with the heterogeneous graph attention network:
S431, the semantic embedding of each paper entity is obtained through the representation learning model of S2, and the heterogeneous graph constructed in S42 is trained with a LINE model to obtain the structure embedding of each entity;
S432, the two embeddings are concatenated as the input feature f, and self-attention is applied between different author entities e: the importance t_ij of node e_i to e_j is
t_ij = self-attention(W f_i, W f_j)
where W is a shared weight matrix and N_i denotes all neighbor nodes of e_i; the normalized attention coefficient is
α_ij = softmax_j(t_ij) = exp(t_ij) / Σ_{k∈N_i} exp(t_ik)
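The scores t_ij and their normalization over each neighborhood N_i can be sketched as below. The concatenation-based scoring vector `a` is an illustrative, GAT-style parameterization of the self-attention term, not necessarily the patent's exact form:

```python
import numpy as np

def attention_scores(f, W, a, neighbors):
    """GAT-style sketch of t_ij = self-attention(W f_i, W f_j).
    f: (n, d) node features (semantic + structure embeddings concatenated)
    W: (d, dp) shared projection matrix
    a: (2*dp,) scoring vector (illustrative parameterization)
    neighbors[i]: list of neighbor indices N_i of node i"""
    h = f @ W  # project all node features
    alpha = {}
    for i, nbrs in neighbors.items():
        # raw importance of each neighbor j to node i
        t = np.array([np.concatenate([h[i], h[j]]) @ a for j in nbrs])
        e = np.exp(t - t.max())          # numerically stable softmax
        alpha[i] = dict(zip(nbrs, e / e.sum()))
    return alpha
```

Each node's coefficients sum to one over its neighborhood, so they can directly weight the neighbor embeddings when aggregating cluster representations.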
S5, the paper clusters are finally aligned through repeated application of the alignment rule to obtain the final disambiguation result.
The step specifically comprises:
S51, non-repeating pairs <name:cid1, name:cid2> are generated according to permutation-and-combination rules, and a heterogeneous graph is constructed;
S52, a pre-trained HGAT detects whether a pair of clusters belongs to the same author;
S53, the paper clusters are aligned according to the given alignment rule:
S531, the neighboring nodes of each node are computed, and for each node the edge with the highest similarity score between its two endpoints is kept for alignment;
S532, after all nodes have been judged, a connected-subgraph algorithm implemented with DFS yields the alignment rule, and merging is carried out;
S54, this process is repeated multiple times (the number of repetitions is defined as loops); the final cluster_pubs is the final disambiguation result.
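The merging of aligned cluster pairs into connected subgraphs (S531 to S532) can be sketched with an iterative DFS; each aligned pair is an edge, and every connected component becomes one final author cluster:

```python
def merge_clusters(num_clusters, aligned_pairs):
    """Merge fine-grained paper clusters: each aligned pair (i, j) is an
    edge; DFS over the resulting graph yields connected subgraphs, and
    every subgraph becomes one final author cluster."""
    adj = {i: [] for i in range(num_clusters)}
    for i, j in aligned_pairs:
        adj[i].append(j)
        adj[j].append(i)
    seen, groups = set(), []
    for start in range(num_clusters):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:                     # iterative DFS
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(adj[v])
        groups.append(sorted(comp))
    return groups
```

Running this once corresponds to one loop of S54; feeding the merged clusters back through S51 to S53 and repeating until no new pairs are aligned yields the final cluster_pubs.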
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.
Claims (7)
1. An author name disambiguation method based on contrastive learning and a heterogeneous graph attention network, characterized by comprising the following steps:
S1, data preprocessing: MongoDB is used to store and access paper titles, authors and affiliations; a Python string-processing library cleans the data, removing noise to obtain standardized text suitable for the subsequent steps;
S2, paper representation learning: representation learning is performed on papers using contrastive learning to obtain unified paper embeddings;
S3, preliminary paper clustering: papers are clustered under a purity-first principle to obtain paper clusters and alleviate the over-merging problem;
S4, paper cluster alignment: the paper clusters obtained in the previous step are aligned with a heterogeneous graph attention network;
S5, obtaining the paper disambiguation result: over-splitting detection and over-splitting alignment algorithms are provided to guarantee disambiguation quality.
2. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network according to claim 1, characterized in that S2 specifically comprises:
S21, the paper representation is obtained with the language pre-training model BERT; the process is described as v_i^a = BERT(p_i^a), where p_i^a is the i-th paper of author a and v_i^a is its corresponding representation vector;
S22, positive example pairs (x_i, x_i^+) and negative example pairs (x_i, x_i^-) are constructed and combined;
S23, a training objective function h = f(BERT(x)) is introduced, with training loss
l_i = -log[ exp(sim(h_i, h_i^+)/τ) / Σ_{j=1}^{N} ( exp(sim(h_i, h_j^+)/τ) + exp(sim(h_i, h_j^-)/τ) ) ]
where N is the mini-batch size (batch_size), τ is the temperature hyperparameter, and sim(h_1, h_2) is cosine similarity;
S24, after training, the representation vector v_i of each paper is finally obtained.
3. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network according to claim 1, characterized in that S3 specifically comprises:
S31, the clustering process is treated as an intermediate step of disambiguation: papers are divided into more clusters according to rules, reducing the occurrence of different authors in the same cluster;
S32, clustering is performed with LightGBM and a hierarchical clustering model, fitting each new decision tree to the negative gradient of the loss function as an approximation of the residual of the current tree;
S33, an index Recall_over-merge is proposed to describe the over-merging phenomenon of the clustering result, defined as Recall_over-merge = TP / (TP + FN), where TP is the number of cases in which two papers of the same author fall in the same cluster, FN is the number of cases in which two papers of the same author fall in two different clusters, M is the ideal number of clusters, and N is the actual number of clusters; a higher Recall_over-merge indicates a lower degree of over-splitting.
4. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network according to claim 1, characterized in that S4 specifically comprises:
S41, candidate pairs are generated for same-name author entities;
S42, a heterogeneous graph is constructed for each author entity: if the affiliations or co-author names between a candidate pair are the same, or their papers are similar, the entities are connected, yielding a heterogeneous graph G(V, E);
S43, author matching is determined with the heterogeneous graph attention network.
5. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network according to claim 4, characterized in that S43 specifically comprises:
S431, the semantic embedding of each paper entity is obtained through the representation learning model of S2, and the heterogeneous graph constructed in S42 is trained with a LINE model to obtain the structure embedding of each entity;
S432, the two embeddings are concatenated as the input feature f, and the importance between different author entities e is computed through self-attention; the process is described as:
t_ij = self-attention(W f_i, W f_j)
6. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network according to claim 1, characterized in that S5 specifically comprises:
S51, non-repeating pairs <name:cid1, name:cid2> are generated according to permutation-and-combination rules, and a heterogeneous graph is constructed;
S52, a pre-trained HGAT detects whether a pair of clusters belongs to the same author;
S53, the paper clusters are aligned according to a given alignment rule;
S54, this process is repeated multiple times (the number of repetitions is defined as loops), and the finally obtained cluster_pubs is the final disambiguation result.
7. The author name disambiguation method based on contrastive learning and a heterogeneous graph attention network according to claim 6, characterized in that S53 specifically comprises:
S531, the neighboring nodes of each node are computed, and for each node the edge with the highest similarity score between its two endpoints is kept for alignment;
S532, after all nodes have been judged, a connected-subgraph algorithm implemented with DFS yields the alignment rule, and the nodes are merged.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211151607.0A CN115481247A (en) | 2022-09-21 | 2022-09-21 | Author name disambiguation method based on contrastive learning and heterogeneous graph attention network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211151607.0A CN115481247A (en) | 2022-09-21 | 2022-09-21 | Author name disambiguation method based on contrastive learning and heterogeneous graph attention network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115481247A true CN115481247A (en) | 2022-12-16 |
Family
ID=84424288
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211151607.0A Pending CN115481247A (en) | Author name disambiguation method based on contrastive learning and heterogeneous graph attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115481247A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117556058A (en) * | 2024-01-11 | 2024-02-13 | 安徽大学 | Knowledge graph enhanced network embedded author name disambiguation method and device |
CN117556058B (en) * | 2024-01-11 | 2024-05-24 | 安徽大学 | Knowledge graph enhanced network embedded author name disambiguation method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Saxena et al. | Sequence-to-sequence knowledge graph completion and question answering | |
Cai et al. | Generative adversarial network based heterogeneous bibliographic network representation for personalized citation recommendation | |
Dong et al. | From data fusion to knowledge fusion | |
He et al. | Learning entity representation for entity disambiguation | |
US20240013055A1 (en) | Adversarial pretraining of machine learning models | |
US20210303783A1 (en) | Multi-layer graph-based categorization | |
Hühn et al. | FR3: A fuzzy rule learner for inducing reliable classifiers | |
Zhang et al. | Autoblock: A hands-off blocking framework for entity matching | |
Liu et al. | DAGOBAH: an end-to-end context-free tabular data semantic annotation system | |
CN109299263B (en) | Text classification method and electronic equipment | |
US9009029B1 (en) | Semantic hashing in entity resolution | |
CN109635157A (en) | Model generating method, video searching method, device, terminal and storage medium | |
Gao et al. | Building a large-scale, accurate and fresh knowledge graph | |
Dong et al. | Data-anonymous encoding for text-to-SQL generation | |
CN115481247A (en) | Author name disambiguation method based on comparative learning and heterogeneous graph attention network | |
CN114676346A (en) | News event processing method and device, computer equipment and storage medium | |
CN113361270B (en) | Short text optimization topic model method for service data clustering | |
Zimmermann et al. | Incremental active opinion learning over a stream of opinionated documents | |
CN115391548A (en) | Retrieval knowledge graph library generation method based on combination of scene graph and concept network | |
CN113535967B (en) | Chinese universal concept map error correction device | |
CN113128224B (en) | Chinese error correction method, device, equipment and readable storage medium | |
Bhowmick et al. | Globally Aware Contextual Embeddings for Named Entity Recognition in Social Media Streams | |
Song et al. | Metric sentiment learning for label representation | |
Im et al. | Multilayer CARU model for text summarization | |
Xie et al. | Author name disambiguation via heterogeneous network embedding from structural and semantic perspectives |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||