WO2022022045A1 - Knowledge-graph-based text comparison method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2022022045A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
text
relationship
entities
entity
Prior art date
Application number
PCT/CN2021/096862
Other languages
English (en)
Chinese (zh)
Inventor
朱昱锦
徐国强
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022022045A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the field of big data technologies, and in particular, to a text comparison method, apparatus, device and storage medium based on knowledge graphs.
  • Text content comparison technology is widely used in both vertical and general fields. For example, in insurance, banking, investment and other financial text processing scenarios involving incoming document review or risk monitoring, it is necessary to compare multiple documents and check whether there are any contradictions in the information provided by different documents to achieve the purpose of review.
  • the existing text comparison technology uses automatic abstract generation to split the text, then generates an abstract for each split segment, and finally compares the abstracts of the two articles to determine whether their main content expresses a consistent meaning, thereby judging whether the two articles belong to the same text.
  • this method performs semantic extraction on the text content, which helps refine the text and improve the efficiency of text element comparison.
  • the inventor realized that in the process of text refining, some useful text information is inevitably lost, resulting in biased comparison results. There is an urgent need for a method that can improve the accuracy of text comparison.
  • the purpose of the embodiments of the present application is to propose a text comparison method based on knowledge graph, so as to improve the accuracy of text comparison.
  • the embodiments of the present application provide a text comparison method based on knowledge graph, including:
  • the relationship between any two adjacent target entities is extracted, the association between any two target entities is judged, any two target entities that have an association are treated as associated entities, and the association corresponding to the associated entities is taken as the target relationship;
  • if the coverage ratio exceeds a preset threshold, the texts to be compared are determined to be the same type of text.
  • a technical solution adopted in this application is to provide a text comparison device based on knowledge graph, including:
  • a training text acquisition module used for collecting training corpus in a preset field, and performing text preprocessing on the training corpus to obtain training text;
  • a target entity acquisition module, used for performing part-of-speech tagging on the training text and extracting entities in the training text by the method of dependency syntax analysis, as target entities;
  • a target relationship acquisition module, used for extracting the relationship between any two adjacent target entities through the trained relation extraction model combined with the training text, judging the association between any two target entities, treating any two target entities that have an association as associated entities, and taking the associations corresponding to the associated entities as target relationships;
  • an initial graph construction module, used to construct an initial graph with the target entities as nodes and the target relationships as edges;
  • a target graph construction module, used to label the target entities and target relationships of the initial graph, take the labeled target entities and target relationships as core information, and cluster the nodes of the initial graph according to the core information to obtain the target graph;
  • a core information comparison module, used to obtain the texts to be compared, input the texts to be compared into the target graph, and count the coverage ratio of the entities and relationships extracted from each text to be compared over the core information in the target graph;
  • a same-text judgment module, used to determine that the texts to be compared are the same type of text if the coverage ratio exceeds a preset threshold.
  • an embodiment of the present application further provides a computer device, including at least one processor; and,
  • the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
  • the relationship between any two adjacent target entities is extracted, the association between any two target entities is judged, any two target entities that have an association are treated as associated entities, and the association corresponding to the associated entities is taken as the target relationship;
  • if the coverage ratio exceeds a preset threshold, the texts to be compared are determined to be the same type of text.
  • an embodiment of the present application further provides a computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the processor performs the following steps:
  • the relationship between any two adjacent target entities is extracted, the association between any two target entities is judged, any two target entities that have an association are treated as associated entities, and the association corresponding to the associated entities is taken as the target relationship;
  • if the coverage ratio exceeds a preset threshold, the texts to be compared are determined to be the same type of text.
  • the knowledge-graph-based text comparison method in the above scheme constructs a graph by extracting entities and inter-entity relationships from the text, and then compares the graphs to identify the similarity of texts. This refines the comparison object, avoids interference items in the original text, is unaffected by the text format, and improves the accuracy of text comparison.
  • FIG. 1 is a schematic diagram of an application environment of a text comparison method based on a knowledge graph provided by an embodiment of the present application
  • Fig. 2 is an implementation flow chart of the knowledge-graph-based text comparison method provided by an embodiment of the present application;
  • Fig. 3 is an implementation flow chart of step S2 in the knowledge-graph-based text comparison method provided by an embodiment of the present application;
  • Fig. 4 is an implementation flow chart of the processing after step S24 in the knowledge-graph-based text comparison method provided by an embodiment of the present application;
  • Fig. 5 is an implementation flow chart of step S3 in the knowledge-graph-based text comparison method provided by an embodiment of the present application;
  • Fig. 6 is an implementation flow chart of step S5 in the knowledge-graph-based text comparison method provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a text comparison device based on a knowledge graph provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101 , 102 and 103 , such as web browser applications, search applications, instant communication tools, and the like.
  • the terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.
  • the server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the terminal devices 101 , 102 , and 103 .
  • a knowledge graph-based text comparison method provided by the embodiments of the present application is generally executed by a server, and accordingly, a knowledge graph-based text comparison apparatus is generally set in the server.
  • the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative; there can be any number of terminal devices, networks and servers according to implementation needs.
  • FIG. 2 shows a specific implementation of a text comparison method based on knowledge graph.
  • the method of the present application is not limited to the flow sequence shown in FIG. 2, and the method includes the following steps:
  • S1 Collect training corpus in a preset field, and perform text preprocessing on the training corpus to obtain training text.
  • the text preprocessing includes data cleaning of the text, etc., so as to keep the text data consistent.
  • the training corpus in the preset field is selected according to the texts that actually need to be compared, and is not limited here.
  • the training corpus refers to the Chinese sentence pairs or question banks used for training;
  • the training corpus in the preset field refers to the Chinese sentence pairs or question banks in the required field, which are used as the training corpus of that preset field. For example, if the texts of a certain engineering project need to be compared, the training corpus in the preset field consists of texts of that engineering project.
  • S2 Tag the training text by part of speech, and extract the entities in the training text according to the method of dependency syntax analysis as the target entity.
  • the nouns and pronouns in the training text are obtained by tagging the training text, and the entities in the training text are extracted according to the method of dependency syntax analysis and used as the target entity.
  • the nouns and pronouns in the training text are extracted through part-of-speech tagging and dependency syntax analysis, using the pyltp and hanlp open source libraries.
  • pyltp and hanlp are basic natural language processing libraries released by Harbin Institute of Technology and hankcs respectively, and are used here for part-of-speech tagging and entity extraction.
  • POS: part-of-speech tagging
  • DP: dependency parsing
  • n: general noun
  • ni: organization word
  • nl: place word
  • ns: location word
  • nt: time word
  • p: pronoun
  • Example: "I eat apples" is tagged as (I, p), (apples, n).
  • Dependency syntax analysis, using the subject-predicate-object (SBV) relationship, marks the corresponding words in the sentences of the training text. For example, "I eat apples" is marked as (I, Subject), (eat, Predicate), (apples, Object); the extracted nouns are matched to the subject and object components, and nouns in the sentence that do not fill either of these two components are deleted.
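As a minimal sketch of the noun filtering described above, the following assumes the parser's output has already been collected into (word, part-of-speech, role) triples; the tuple format and the role names are illustrative assumptions, not the actual output format of pyltp or hanlp:

```python
def extract_entities(parsed_tokens):
    """Keep only nouns/pronouns that fill the Subject or Object slot.

    `parsed_tokens` is a list of (word, pos, role) triples, e.g. assembled
    from a POS tagger plus a dependency parser (illustrative format).
    """
    noun_tags = {"n", "ni", "nl", "ns", "nt", "p"}   # noun/pronoun tags from the tag set above
    keep_roles = {"Subject", "Object"}               # SBV-style subject/object slots
    return [w for (w, pos, role) in parsed_tokens
            if pos in noun_tags and role in keep_roles]

tokens = [("I", "p", "Subject"), ("eat", "v", "Predicate"), ("apples", "n", "Object")]
print(extract_entities(tokens))  # ['I', 'apples']
```

Words that are nouns but fill neither the subject nor the object slot are dropped, matching the deletion rule described above.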
  • dependency syntax analysis was first proposed by the French linguist L. Tesnière. It analyzes a sentence into a dependency syntax tree that describes the dependency relationships between the words, that is, it points out the syntactic collocations between words, which are related to semantics.
  • entities in the training text are extracted by means of dependency syntax analysis.
  • S3 Extract the relationship between any two adjacent target entities through the trained relation extraction model combined with the training text, judge the association between any two target entities, treat any two target entities that have an association as associated entities, and take the associations corresponding to the associated entities as target relationships.
  • the relationship between any two target entities is either that an association exists between them or that no association exists.
  • the target relationship is an association relationship between two entities
  • the association relationship refers to the state of interaction and mutual influence between the two entities in the text.
  • the relation extraction model includes four parts: Embedding, Encoding, Selector and Classifier. (1) Embedding performs word embedding and position embedding on the input training text to generate a vector, which serves as the input of the whole model. (2) The Encoding layer is composed of a Piecewise-CNN (PCNN): on input, the context of the training text is divided into three segments by the current two target entities, and the PCNN extracts a feature vector from each of the three segments and splices them together. (3) Selector is the attention layer, which assigns different weights to the feature vectors for the later training of the relation extraction model. (4) Classifier is an ordinary multi-classification layer, which outputs the probability that the two input target entities have a relationship with each other.
  • the model is trained with binary-labeled data (with/without a relationship), and outputs the relationship between each pair of target entities.
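The piecewise segmentation that the PCNN encoding layer relies on can be sketched at the token level (a simplification: the real PCNN splits the convolution feature map at the two entity positions, not the raw tokens):

```python
def piecewise_segments(tokens, e1_idx, e2_idx):
    """Split a tokenized sentence into the three pieces a PCNN pools over:
    up to and including the first entity, between the two entities
    (including the second), and after the second entity.
    `e1_idx`/`e2_idx` are the token indices of the two target entities."""
    lo, hi = sorted((e1_idx, e2_idx))
    return tokens[:lo + 1], tokens[lo + 1:hi + 1], tokens[hi + 1:]

# Illustrative sentence: entities "Alice" (index 0) and "Acme" (index 3)
sent = ["Alice", "works", "for", "Acme", "Corp", "in", "Paris"]
left, mid, right = piecewise_segments(sent, 0, 3)
```

Each of the three segments would then be encoded and max-pooled separately before the results are spliced into one feature vector, as the description above states.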
  • S4 Use the target entity as a node and the target relationship as an edge to construct and generate an initial graph.
  • an initial graph is generated from the entities in the training text and the relationships between them, so that the texts to be compared can subsequently be compared through graph comparison, improving the accuracy and efficiency of text comparison.
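A minimal way to realize "entities as nodes, relationships as edges" is an adjacency list keyed by entity, built from (entity A, relationship, entity B) triples; the sample triples are illustrative:

```python
from collections import defaultdict

def build_graph(triples):
    """Build an adjacency-list graph from (entity_a, relation, entity_b)
    triples: entities become nodes, relations become labeled edges."""
    graph = defaultdict(list)
    for a, rel, b in triples:
        graph[a].append((rel, b))
        graph.setdefault(b, [])  # ensure tail entities become nodes too
    return dict(graph)

triples = [("PartyA", "signs", "Contract"), ("Contract", "covers", "Loan")]
g = build_graph(triples)
# g maps each entity to its outgoing (relation, target) edges
```

A production system would typically use a graph database or a library such as networkx instead, but the node/edge structure is the same.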
  • S5 Label the target entity and target relationship of the initial graph, take the labeled target entity and target relationship as the core information, and cluster the nodes of the initial graph according to the core information to obtain the target graph.
  • by labeling the target entities and target relationships in the initial graph and clustering the nodes of the initial graph according to the labeled target entities and target relationships, redundant target entities and target relationships in the initial graph are reduced, and the target graph is finally obtained.
  • the labeling method adopted is the consistent labeling method.
  • the consistent labeling method is a method of labeling the entities of the graph and the relationships between entities according to unified rules or methods.
  • Consistent labeling methods include but are not limited to: labeling methods based on historical data and experience, randomly selecting labeling methods, etc.
  • the annotation is performed according to historical data and experience; through this annotation method, the best entities and inter-entity relationships are selected for annotation based on previous data and experience, which helps improve the accuracy of the graph with respect to the entities and their relationships.
  • S6 Obtain the text to be compared, input the text to be compared into the target graph, and count the coverage ratio of the entities and relationships extracted from each text to be compared to the core information in the target graph.
  • the texts to be compared are input into the target graph in turn, the coverage ratio of the entities and relationships extracted from each text to be compared over the core information in the target graph is counted, and the subsequent steps use this to determine whether the texts to be compared and the training text are the same type of text.
  • the coverage ratio is the ratio of the overlap between the entities and relationships extracted from the text to be compared and the nodes and edges of the core information.
  • the preset threshold is set according to the actual situation, and is not limited here.
  • a preferable preset threshold is 75%; at this threshold, it can be clearly judged that there is little difference between the contents of the compared texts.
  • if the coverage ratio exceeds the preset threshold, the two or more texts are of the same type.
  • in this way, the comparison object is refined, interference items in the original text are avoided, and the method is unaffected by the text format, which improves the accuracy of text comparison.
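The coverage check in S6 reduces to set overlap between what was extracted from a candidate text and the core information of the target graph; the entity and relation names below are illustrative:

```python
def coverage_ratio(extracted, core):
    """Fraction of core-graph elements (entity and relation names) that
    are covered by the entities/relations extracted from one text."""
    core, extracted = set(core), set(extracted)
    return len(core & extracted) / len(core) if core else 0.0

core_info = {"PartyA", "Contract", "signs", "Loan", "covers"}
extracted = {"PartyA", "Contract", "signs", "Loan"}
ratio = coverage_ratio(extracted, core_info)
print(ratio >= 0.75)  # True: 4/5 = 0.8 exceeds the 75% threshold
```

Texts whose ratio clears the preset threshold (75% in the preferred embodiment) are judged to be the same type of text.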
  • FIG. 3 shows a specific implementation of step S2.
  • in step S2, part-of-speech tagging is performed on the training text, and entities in the training text are extracted by the method of dependency syntax analysis as target entities.
  • the specific implementation process is described in detail as follows:
  • the text delimiter contained in the training text is obtained, which is used to segment the text in subsequent steps.
  • the text delimiters include format delimiters and punctuation delimiters.
  • the format delimiter refers to a delimiter determined by the text encoding type or the text structure. Through format delimiters, the training text can be separated according to the encoding type or structure of the text to obtain short sentences of the same encoding type or of structured text, which benefits the subsequent acquisition of target entities.
  • the punctuation separator refers to a separator that divides the text at punctuation characters. Through punctuation separators, the training text can be divided quickly, improving the efficiency of obtaining short text sentences.
  • S22 Perform text segmentation on the training text through the text separator to obtain short text sentences.
  • the text segments are spliced into short text sentences according to a preset length; in subsequent steps, part-of-speech tagging and entity extraction are performed on these spliced sentences, which improves the efficiency of text part-of-speech tagging and entity extraction.
  • the preset length is set according to the actual length, which is not limited here.
  • a preferred preset length is 300 words, and a long text sentence is spliced from 1-5 of the segmented short sentences.
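The split-then-splice step can be sketched as follows; the delimiter set and the greedy splicing policy are assumptions, with a 300-character cap standing in for the preferred preset length:

```python
import re

def split_and_splice(text, max_len=300):
    """Split text at punctuation delimiters, then greedily splice the
    resulting segments back into chunks of at most `max_len` characters.
    The delimiter set (Chinese and Western sentence-enders) is illustrative."""
    segments = [s for s in re.split(r"[。！？；.!?;]\s*", text) if s]
    chunks, current = [], ""
    for seg in segments:
        if current and len(current) + len(seg) > max_len:
            chunks.append(current)
            current = seg
        else:
            current = current + seg if current else seg
    if current:
        chunks.append(current)
    return chunks
```

For example, `split_and_splice("A. B. C.", max_len=3)` splices the three one-character segments back into a single chunk `"ABC"`, while longer inputs are cut whenever the next segment would push a chunk past the cap.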
  • S23 Mark the nouns and pronouns in the short text sentence by means of part-of-speech tagging to obtain the marked nouns and pronouns.
  • the consistency rule uses the subject-verb-object (SBV) relationship and marks the corresponding words. For example, "I eat apples" is marked as (I, Subject), (eat, Predicate), (apples, Object); the extracted nouns are matched to the subject and object components, and nouns in the sentence that do not fill either of these two components are deleted.
  • SBV: subject-verb-object
  • a regular-expression matching method is used to obtain the text separators contained in the training text, and the training text is segmented by these separators to obtain short text sentences.
  • Part-of-speech tagging and entity extraction provide a basis for subsequent graph construction, which is beneficial to improve the accuracy of text comparison.
  • FIG. 4 shows a specific implementation after step S24, including:
  • S25 Determine whether two or more initial entities form a compound word by counting the degree of cohesion of the initial entities in the short text sentence, and obtain a judgment result.
  • the aggregation degree of initial entities in short text sentences is counted.
  • tf-idf is a statistical method to evaluate the importance of a word to a document set or one of the documents in a corpus.
  • the importance of a word increases proportionally to the number of times it appears in the document, but decreases inversely to the frequency it appears in the corpus.
  • co-word analysis uses the co-occurrence of words and noun phrases in a collection to determine the relationships between topics in the discipline the collection represents. It is generally believed that the more often a word pair occurs in the same documents, the closer the relationship between the two topics.
  • a co-word network can be formed from the associations of these word pairs, and the distance between nodes in the network reflects the closeness of the subject content.
  • tf-idf and co-occurrence analysis are used to count the degree of cohesion of the initial entities, and then it is judged whether two or more initial entities constitute a compound word.
  • the cohesion degree refers to the possibility that multiple words form the current phrase slice (ie compound word).
  • based on the cohesion degree of the initial entities, it is judged whether two or more initial entities constitute a compound word. For example, if a text phrase is ABC, the frequency of ABC is divided by the frequencies of A, B, C, AB, BC, and AC respectively, and the smallest of these ratios is taken as the cohesion value of the candidate compound word.
  • the determination of the target entity is further realized by judging whether two or more initial entities constitute a compound word.
  • the combined entity is an entity obtained by combining two or more initial entities to form a compound word.
  • the initial entities include entities that can form a compound word and entities that do not form a compound word; the initial entities that can form a compound word are combined as the target entity, and the entities that do not form a compound word are also used as the target entity alone.
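The min-over-substrings cohesion check for a candidate compound word ABC can be sketched as follows, assuming `freq` holds corpus frequencies of the phrase and its sub-spans (the exact normalization used by the method is not fully specified, so this is one plausible reading):

```python
def cohesion(freq, phrase, parts):
    """Cohesion of a candidate compound word: the frequency of the whole
    phrase divided by the frequency of each sub-span, keeping the smallest
    ratio. `freq` maps strings to corpus counts; sub-spans with no count
    are skipped."""
    return min(freq[phrase] / freq[p] for p in parts if freq.get(p))

# Illustrative counts for phrase "ABC" and its sub-spans
freq = {"ABC": 8, "A": 10, "B": 12, "C": 9, "AB": 8, "BC": 8}
score = cohesion(freq, "ABC", ["A", "B", "C", "AB", "BC"])
# smallest ratio here is 8/12; the span is treated as a compound word
# when the score clears a tuned threshold
```

A low score means some sub-span occurs far more often on its own than inside the phrase, so the span is not cohesive enough to merge into one entity.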
  • FIG. 5 shows a specific implementation before step S3, including:
  • S31 Obtain sample text, and perform word embedding and position embedding on the sample text to generate an embedding vector.
  • the embedded vector is generated and used as the input of the relation extraction model for subsequent numerical operations.
  • the sample text is used to train the relationship extraction model, and the trained relationship extraction model is obtained, which is convenient for subsequent entity relationship extraction.
  • word embedding is a general term for language model and representation learning technology in natural language processing (NLP).
  • NLP: natural language processing
  • word embedding refers to embedding a high-dimensional space, whose dimension is the number of all words, into a continuous vector space of much lower dimension, where each word or phrase is mapped to a vector over the real numbers.
  • position embedding, by contrast with word embedding, embeds the different positions of the sample text.
  • S32 Divide the context of the sample text into three texts, and obtain the embedding vectors of the three texts as feature vectors.
  • the entities in the sample text are obtained first; when the sample text is input, its context is divided into three segments by the two entities, and the embedding vectors of the three segments of text are obtained as feature vectors.
  • the feature vector is the hidden-layer state vector output by the hidden layer of the neural network, and serves as an intermediate result of the relation extraction model for the numerical operations of subsequent steps.
  • S33 Splicing feature vectors of the same type to obtain a target vector.
  • feature vectors of the same type are spliced to form a feature vector set, that is, a target vector.
  • different types of feature vector sets have different weights.
  • Selector is selected as the attention layer of relation extraction.
  • the reason for choosing Selector is that the training data used by the relation extraction model is often derived from distant supervision, which introduces considerable noise into the data.
  • a common approach is to put multiple samples that distant supervision labels as the same type into one bag of words, train the entire bag in the current training batch at the same time, and then select the correct samples within each bag by comparison.
  • Selector can assign different weights to different samples in the same bag of words, which is essentially a weighting operation, so Selector is selected.
  • the weight is obtained by calculating the difference between the probability that the current sample is predicted to be true and the probability that it is correct.
  • an embedding vector is generated, the context of the sample text is divided into three pieces of text, and the embedding vectors of the three pieces are obtained.
  • feature vectors of the same type are spliced to obtain the target vector, the weight of the target vector is obtained, and the relation extraction model is trained according to the target vector and its weight. The trained relation extraction model is used to output the relationships between entities in the training text so as to build the graph, which helps improve the accuracy of text comparison.
  • after step S4, the knowledge-graph-based text comparison method further includes:
  • the target entities and target relationships extracted from the training text are disambiguated and deduplicated, because the same entity may be expressed in different ways in different texts, or the entities connected by the same relationship may be expressed in different ways, resulting in entity/relationship redundancy.
  • disambiguation and deduplication are completed with the Python open source library dedupe: all the extracted entities and relationships are fed to the tool as triples (entity A, relationship, entity B), and dedupe merges entities or relationships with the same meaning through clustering operations.
  • the clustering operation selects the corresponding target entities and target relationships by aggregating duplicates, selects the optimal threshold through similarity-value calculation, and finally obtains the target entities and target relationships with the same meaning.
  • Dedupe is a python open source library for knowledge fusion.
  • the processing flow includes several main steps: entity/relationship description similarity calculation (record similarity), smart comparisons, aggregating duplicates (grouping duplicates), and choosing a good threshold.
  • similarity calculation and smart comparison use active learning combined with rule matching, and duplicate aggregation uses hierarchical clustering with centroid linkage; these three modules are then placed in an active-learning framework, and from a small number of annotations dedupe determines the optimal threshold.
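As a stdlib stand-in for the merging dedupe performs (dedupe itself learns a similarity function and clusters hierarchically), near-identical triples can be collapsed with a simple string-similarity threshold; the 0.85 cutoff and sample triples are illustrative assumptions:

```python
from difflib import SequenceMatcher

def merge_duplicates(triples, threshold=0.85):
    """Keep one representative per group of near-identical
    (entity A, relationship, entity B) triples, comparing their
    joined string forms with a similarity cutoff."""
    merged = []
    for t in triples:
        key = "|".join(t)
        if not any(SequenceMatcher(None, key, "|".join(m)).ratio() >= threshold
                   for m in merged):
            merged.append(t)
    return merged

triples = [("Party A", "signs", "contract"),
           ("party A", "signs", "contract"),   # same fact, different casing
           ("Bank", "issues", "loan")]
deduped = merge_duplicates(triples)  # the casing variant is merged away
```

This captures only the "aggregating duplicates" step; dedupe's actual pipeline also learns which field differences matter from the user's annotations.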
  • FIG. 6 shows a specific implementation of step S5.
  • in step S5, the target entities and target relationships of the initial graph are labeled, the labeled target entities and target relationships are taken as core information, and the nodes of the initial graph are clustered according to the core information to obtain the target graph. The specific implementation process is described in detail as follows:
  • S51 Acquire the text information of the marked target entity and the unmarked target entity in the training text, and obtain the marked text information and the unmarked text information.
  • that is, the target entities and target relationships of the initial graph are annotated, and the text information of the marked target entities and the unmarked target entities in the training text is acquired, yielding the marked text information and the unmarked text information.
  • S52: Substitute the labeled text information and the unlabeled text information into the BERT model for vector extraction to obtain the labeled vector and the unlabeled vector.
  • the labeled vector is obtained by substituting the labeled text information into the BERT model for vector acquisition
  • the unlabeled vector is obtained by substituting the unlabeled text information into the BERT model for vector acquisition.
  • the calculation of the similarity value includes but is not limited to: Minkowski Distance, Manhattan Distance, Euclidean Distance, Cosine Similarity, Hamming Distance, etc.
  • the preset threshold is set according to the actual situation, which is not limited here.
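The distance measures listed above can each be computed directly; the following stand-alone sketch uses arbitrary example vectors (Minkowski distance reduces to Manhattan distance at p=1 and Euclidean distance at p=2).

```python
import math

def minkowski(x, y, p):
    # Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def hamming(x, y):
    # Number of positions at which the two sequences differ.
    return sum(a != b for a, b in zip(x, y))

v1, v2 = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
print(minkowski(v1, v2, 1))        # Manhattan distance: 2.0
print(minkowski(v1, v2, 2))        # Euclidean distance: ~1.414
print(cosine_similarity(v1, v2))   # 0.5
print(hamming(v1, v2))             # differs in 2 positions
```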
  • in this embodiment, the marked text information and the unmarked text information are obtained and substituted into the BERT model for vector extraction, yielding the labeled vector and the unlabeled vector; the similarity value between each unlabeled vector and the labeled vectors is then calculated.
  • the unmarked target entities and target relationships whose similarity value exceeds the preset threshold are deleted to obtain the target graph, which realizes the construction of the target graph, facilitates comparison of the texts to be compared, and improves the accuracy of text comparison.
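The pruning step described here, deleting unmarked nodes whose similarity to a core (labeled) vector exceeds the preset threshold, can be sketched as follows. In the embodiment the vectors come from the BERT model; here they are toy values, and `prune_redundant` is a hypothetical name for illustration only.

```python
def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = sum(a * a for a in x) ** 0.5
    ny = sum(b * b for b in y) ** 0.5
    return dot / (nx * ny)

def prune_redundant(labeled_vecs, unlabeled, threshold=0.95):
    # Per the embodiment: an unlabeled node whose similarity to any labeled
    # (core) vector exceeds the threshold duplicates core information and is
    # deleted; the remaining nodes are kept in the target graph.
    kept = {}
    for name, vec in unlabeled.items():
        if max(cosine(vec, lv) for lv in labeled_vecs) <= threshold:
            kept[name] = vec
    return kept

labeled = [[1.0, 0.0]]  # one core vector
unlabeled = {"near_duplicate": [0.99, 0.01], "distinct": [0.0, 1.0]}
print(prune_redundant(labeled, unlabeled))  # only "distinct" survives
```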
  • the above text to be compared can also be stored in a node of a blockchain.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.
  • the present application provides an embodiment of a text comparison apparatus based on a knowledge graph; the apparatus embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device can be specifically applied to various electronic devices.
  • the knowledge graph-based text comparison device of this embodiment includes: a training text acquisition module 71, a target entity acquisition module 72, a target relationship acquisition module 73, an initial graph construction module 74, a target graph construction module 75, a core information comparison module 76, and a same text judgment module 77, wherein:
  • the training text acquisition module 71 is used to collect training corpus in a preset field and perform text preprocessing on the training corpus to obtain training text;
  • the target entity acquisition module 72 is used to perform part-of-speech tagging on the training text and extract the entities in the training text as target entities by means of dependency syntax analysis;
  • the target relationship acquisition module 73 is used to extract the relationship between any two adjacent target entities through the trained relation extraction model combined with the training text;
  • the initial graph construction module 74 is used to judge the association relationship between any two target entities, take any two target entities that have an association relationship as associated entities and the association relationship corresponding to the associated entities as the target relationship, and generate the initial graph by taking the target entities as nodes and the target relationships as edges;
  • the target graph construction module 75 is used to mark the target entities and target relationships of the initial graph, take the marked target entities and target relationships as core information, and cluster the nodes of the initial graph according to the core information to obtain the target graph;
  • the core information comparison module 76 is used to obtain the texts to be compared, input the texts to be compared into the target graph, and count the coverage rate, over the core information in the target graph, of the entities and relationships extracted from each text to be compared;
  • the same text judgment module 77 is configured to determine that the texts to be compared are texts of the same type if the coverage rate exceeds a preset threshold.
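The coverage-rate statistic used by modules 76 and 77 reduces to a set intersection. A minimal sketch, with hypothetical entity/relationship names and an assumed threshold value:

```python
def coverage_rate(extracted, core):
    # Fraction of the core entities/relationships that the extracted set covers.
    if not core:
        return 0.0
    return len(core & extracted) / len(core)

core = {"company", "subsidiary", "holds_share"}         # hypothetical core info
extracted = {"company", "holds_share", "signing_date"}  # from one text to be compared
rate = coverage_rate(extracted, core)
print(rate)        # 2 of 3 core items covered -> ~0.667
print(rate > 0.5)  # same-type under an assumed 0.5 preset threshold -> True
```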
  • the target entity acquisition module 72 includes:
  • the text separator obtaining unit is used to obtain the text separator contained in the training text by means of regular matching;
  • the text short sentence acquisition unit is used to perform text segmentation on the training text through the text separator to obtain text short sentences;
  • the part-of-speech tagging unit is used to tag nouns and pronouns in short text sentences by means of part-of-speech tagging to obtain the tagged nouns and pronouns;
  • the initial entity determination unit is used to match the marked nouns and pronouns against consistency rules according to dependency syntax analysis, and to take the marked nouns that conform to the consistency rules as initial entities.
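The regular-matching segmentation performed by the text separator and text short sentence units can be sketched as follows; the separator character class is an assumption for illustration, since the embodiment does not fix a particular set of separators.

```python
import re

def split_short_sentences(text):
    # Split on separators found via regular matching; the character class
    # below (ASCII plus full-width punctuation) is an illustrative assumption.
    parts = re.split(r"[,.;!?，。；！？]", text)
    return [p.strip() for p in parts if p.strip()]

print(split_short_sentences("Entity A holds Entity B; the contract was signed."))
# -> ['Entity A holds Entity B', 'the contract was signed']
```

Each resulting short sentence would then go through part-of-speech tagging as described above.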
  • the target entity acquisition module 72 further includes:
  • the cohesion degree statistical unit is used to judge whether two or more initial entities form a compound word by counting the cohesion degree of the initial entities in the short sentences of the text, and obtain the judgment result;
  • the compound word judgment unit is used to combine the initial entities forming the compound word into a combined entity if the judgment result is that a compound word is formed, the combined entity being used as a target entity.
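One common cohesion statistic for deciding whether adjacent initial entities form a compound word is pointwise mutual information (PMI). The embodiment does not name a specific measure, so this is an illustrative choice, computed over hypothetical toy corpus counts:

```python
import math

def cohesion(bigram_count, count_a, count_b, total):
    # Pointwise mutual information: log( p(a,b) / (p(a) * p(b)) ).
    # A high positive score suggests the two tokens form one compound word.
    return math.log((bigram_count / total) / ((count_a / total) * (count_b / total)))

# Toy counts (hypothetical): "knowledge" and "graph" co-occur far more often
# than independence would predict, so they merge into one target entity.
score = cohesion(bigram_count=50, count_a=60, count_b=70, total=1000)
print(score > 0)  # positive PMI -> merge into the compound "knowledge graph"
```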
  • the above-mentioned knowledge graph-based text comparison device further includes:
  • the sample text acquisition module is used to obtain sample text, perform word embedding and position embedding on the sample text, and generate an embedding vector;
  • the feature vector acquisition module is used to divide the context of the sample text into three texts, and obtain the embedded vectors of the three texts as feature vectors;
  • the target vector acquisition module is used to splice the feature vectors of the same type to obtain the target vector;
  • the target extraction model training module is used to obtain the weight of the target vector, and train the relation extraction model according to the weight of the target vector and the target vector, so as to obtain a trained relation extraction model.
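The three-segment context division and feature-vector splicing can be sketched as follows. Here `embed` is a toy stand-in for the word and position embedding step, and the sentence and entity names are hypothetical; real feature vectors would come from the trained relation extraction model.

```python
def embed(text, dim=4):
    # Toy stand-in for the word + position embedding step (hypothetical).
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def context_features(sentence, e1, e2):
    # Divide the sentence into three segments -- before the first entity,
    # between the two entities, after the second entity -- embed each
    # segment, and splice (concatenate) the three vectors.
    left, rest = sentence.split(e1, 1)
    mid, right = rest.split(e2, 1)
    return embed(left) + embed(mid) + embed(right)

feats = context_features("Company A wholly owns Company B today.", "Company A", "Company B")
print(len(feats))  # 3 segments x 4 dims = 12
```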
  • the above-mentioned knowledge graph-based text comparison device further includes:
  • the clustering operation module is used to perform clustering operations on target entities and target relations respectively, and respectively combine target entities with the same meaning and target relations with the same meaning.
  • the target map building module 75 includes:
  • the text information acquisition unit is used to acquire the text information of the marked target entity and the unmarked target entity in the training text, and obtain the marked text information and the unmarked text information;
  • the vector acquisition unit is used to substitute the marked text information and unmarked text information into the BERT model for vector acquisition, and obtain the marked vector and the unmarked vector;
  • the similarity value statistical unit is used to count the similarity value between each unlabeled vector and the labeled vector
  • the similarity value judgment unit is configured to delete the unlabeled target entity and target relationship in the initial map corresponding to the unlabeled vector if the similarity value exceeds a preset threshold, to obtain the target map.
  • the above target data can also be stored in a node of a blockchain.
  • FIG. 8 is a block diagram of a basic structure of a computer device according to this embodiment.
  • the computer device 8 includes a memory 81, a processor 82, and a network interface 83 that are connected to each other through a system bus. It should be pointed out that the figure only shows the computer device 8 with the three components, memory 81, processor 82, and network interface 83, but it should be understood that implementing all of the components shown is not required, and more or fewer components may be implemented instead.
  • the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.
  • the computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or other computing equipment.
  • the computer device can interact with users through a keyboard, mouse, remote control, touchpad, or voice-activated device.
  • the memory 81 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 81 may be an internal storage unit of the computer device 8 , such as a hard disk or memory of the computer device 8 .
  • the memory 81 may also be an external storage device of the computer device 8, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 81 may also include both the internal storage unit of the computer device 8 and its external storage device.
  • the memory 81 is generally used to store the operating system and various application software installed on the computer device 8 , such as computer-readable instructions for a text comparison method based on a knowledge graph, and the like.
  • the memory 81 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 82 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 82 is typically used to control the overall operation of the computer device 8 .
  • the processor 82 is configured to execute computer-readable instructions stored in the memory 81 or process data, for example, computer-readable instructions for executing a text comparison method based on a knowledge graph.
  • the network interface 83 may comprise a wireless network interface or a wired network interface, and the network interface 83 is typically used to establish a communication connection between the computer device 8 and other electronic devices.
  • the present application also provides another embodiment: a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to execute the steps of the knowledge graph-based text comparison method described above.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain, essentially a decentralized database, is a chain of data blocks associated with one another through cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application can be embodied, in essence or in the part that contributes to the prior art, in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods of the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a knowledge graph-based text comparison method, relating to big data technology. The method comprises the steps of: acquiring training text, recognizing target entities and target relationships in the training text, then generating a graph by taking the target entities as nodes and the target relationships as edges, and taking that graph as an initial graph; marking the target entities and target relationships of the initial graph, and clustering the nodes of the initial graph according to the marked target entities and target relationships so as to obtain a target graph; acquiring text to be compared, inputting said text into the target graph, and calculating the coverage rate, over the core information in the target graph, of the entities and relationships extracted from each fragment of said text; and, if the coverage rate exceeds a preset threshold, determining that said texts are texts of the same category. The present invention further relates to blockchain technology, and said text is stored in a blockchain. The method improves the accuracy and efficiency of text comparison by means of graph comparison.
PCT/CN2021/096862 2020-07-27 2021-05-28 Knowledge graph-based text comparison method and apparatus, device, and storage medium WO2022022045A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010734571.3A CN111897970B (zh) 2020-07-27 Knowledge graph-based text comparison method, apparatus, device and storage medium
CN202010734571.3 2020-07-27

Publications (1)

Publication Number Publication Date
WO2022022045A1 true WO2022022045A1 (fr) 2022-02-03

Family

ID=73190588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096862 WO2022022045A1 (fr) 2020-07-27 2021-05-28 Knowledge graph-based text comparison method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN111897970B (fr)
WO (1) WO2022022045A1 (fr)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138985A (zh) * 2022-02-08 2022-03-04 深圳希施玛数据科技有限公司 Text data processing method, apparatus, computer device and storage medium
CN114372732A (zh) * 2022-03-22 2022-04-19 杭州杰牌传动科技有限公司 Gear motor collaborative manufacturing method and system realizing intelligent matching of user needs
CN114496115A (zh) * 2022-04-18 2022-05-13 北京白星花科技有限公司 Method and system for automatically generating entity relationship annotations
CN114661872A (zh) * 2022-02-25 2022-06-24 北京大学 Beginner-oriented API adaptive recommendation method and system
CN114707005A (zh) * 2022-06-02 2022-07-05 浙江建木智能***有限公司 Knowledge graph construction method and system for ship equipment
CN114741468A (zh) * 2022-03-22 2022-07-12 平安科技(深圳)有限公司 Text deduplication method, apparatus, device and storage medium
CN114741522A (zh) * 2022-03-11 2022-07-12 北京师范大学 Text analysis method, apparatus, storage medium and electronic device
CN114742029A (zh) * 2022-04-20 2022-07-12 中国传媒大学 Chinese text comparison method, storage medium and device
CN114783559A (zh) * 2022-06-23 2022-07-22 浙江太美医疗科技股份有限公司 Medical image report information extraction method, apparatus, electronic device and storage medium
CN114996389A (zh) * 2022-08-04 2022-09-02 中科雨辰科技有限公司 Annotation category consistency checking method, storage medium and electronic device
CN115129719A (zh) * 2022-06-28 2022-09-30 深圳市规划和自然资源数据管理中心 Knowledge graph-based qualitative location spatial range construction method
CN115358341A (zh) * 2022-08-30 2022-11-18 北京睿企信息科技有限公司 Relation model-based training method and system for reference disambiguation
CN115880120A (zh) * 2023-02-24 2023-03-31 江西微博科技有限公司 Online government affairs service system and service method
CN115909386A (zh) * 2023-01-06 2023-04-04 中国石油大学(华东) Method, device and storage medium for completing and correcting piping and instrumentation diagrams
CN116703441A (zh) * 2023-05-25 2023-09-05 云内控科技有限公司 Knowledge graph-based visual analysis method for medical project cost accounting
CN116882408A (zh) * 2023-09-07 2023-10-13 南方电网数字电网研究院有限公司 Transformer graph model construction method, apparatus, computer device and storage medium
US20230359825A1 (en) * 2022-05-06 2023-11-09 Sap Se Knowledge graph entities from text
CN117195913A (zh) * 2023-11-08 2023-12-08 腾讯科技(深圳)有限公司 Text processing method, apparatus, electronic device, storage medium and program product
WO2023246849A1 (fr) * 2022-06-22 2023-12-28 青岛海尔电冰箱有限公司 Feedback data graph generation method and refrigerator
CN117332282A (zh) * 2023-11-29 2024-01-02 之江实验室 Knowledge graph-based event matching method and apparatus
CN117371534A (zh) * 2023-12-07 2024-01-09 同方赛威讯信息技术有限公司 BERT-based knowledge graph construction method and system
CN117454884A (zh) * 2023-12-20 2024-01-26 上海蜜度科技股份有限公司 Historical figure information error correction method, system, electronic device and storage medium
CN118171727A (zh) * 2024-05-16 2024-06-11 神思电子技术股份有限公司 Triple generation method, apparatus, device, medium and program product

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111897970B (zh) * 2020-07-27 2024-05-10 平安科技(深圳)有限公司 Knowledge graph-based text comparison method, apparatus, device and storage medium
CN113051407B (zh) * 2021-03-26 2022-10-21 烽火通信科技股份有限公司 Method and apparatus for collaborative construction and sharing of a network intelligent operation and maintenance knowledge graph
CN113220827B (zh) * 2021-04-23 2023-03-28 哈尔滨工业大学 Agricultural corpus construction method and apparatus
CN113128231A (zh) * 2021-04-25 2021-07-16 深圳市慧择时代科技有限公司 Data quality inspection method, apparatus, storage medium and electronic device
CN113408271B (zh) * 2021-06-16 2021-11-30 北京来也网络科技有限公司 RPA- and AI-based information extraction method, apparatus, device and medium
CN113742495B (zh) * 2021-09-07 2024-02-23 平安科技(深圳)有限公司 Prediction model-based rating feature weight determination method and apparatus, and electronic device
CN113590846B (zh) * 2021-09-24 2021-12-17 天津汇智星源信息技术有限公司 Legal knowledge graph construction method and related devices
CN114547327A (zh) * 2022-01-19 2022-05-27 北京吉威数源信息技术有限公司 Spatiotemporal big data relationship graph generation method, apparatus, device and storage medium
CN114925210B (zh) * 2022-03-21 2023-12-08 中国电信股份有限公司 Knowledge graph construction method, apparatus, medium and device
CN114880023B (zh) * 2022-07-11 2022-09-30 山东大学 Technical feature-oriented source code comparison method, system and program product
CN117475086A (zh) * 2023-12-22 2024-01-30 知呱呱(天津)大数据技术有限公司 Diffusion model-based method and system for generating figures for scientific and technical literature

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5371807A (en) * 1992-03-20 1994-12-06 Digital Equipment Corporation Method and apparatus for text classification
CN107633005A (zh) * 2017-08-09 2018-01-26 广州思涵信息科技有限公司 Knowledge graph construction and comparison system and method based on classroom teaching content
CN110825882A (zh) * 2019-10-09 2020-02-21 西安交通大学 Knowledge graph-based information system management method
CN111259897A (zh) * 2018-12-03 2020-06-09 杭州翼心信息科技有限公司 Knowledge-aware text recognition method and system
CN111428044A (zh) * 2020-03-06 2020-07-17 中国平安人寿保险股份有限公司 Method, apparatus, device and storage medium for multimodal acquisition of supervision recognition results
CN111897970A (zh) * 2020-07-27 2020-11-06 平安科技(深圳)有限公司 Knowledge graph-based text comparison method, apparatus, device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241538B (zh) * 2018-09-26 2022-12-20 上海德拓信息技术股份有限公司 Chinese entity relation extraction method based on keywords and verb dependency
CN109284396A (zh) * 2018-09-27 2019-01-29 北京大学深圳研究生院 Medical knowledge graph construction method, apparatus, server and storage medium
CN110543571A (zh) * 2019-08-07 2019-12-06 北京市天元网络技术股份有限公司 Knowledge graph construction method and apparatus for water conservancy informatization
CN111177393B (zh) * 2020-01-02 2023-03-24 广东博智林机器人有限公司 Knowledge graph construction method, apparatus, electronic device and storage medium



Also Published As

Publication number Publication date
CN111897970B (zh) 2024-05-10
CN111897970A (zh) 2020-11-06

Similar Documents

Publication Publication Date Title
WO2022022045A1 (fr) Procédé et appareil de comparaison de texte basée sur un graphe de connaissances, dispositif, et support de stockage
US20230142217A1 (en) Model Training Method, Electronic Device, And Storage Medium
US9317498B2 (en) Systems and methods for generating summaries of documents
WO2021068339A1 (fr) Procédé et dispositif de classification de texte, et support de stockage lisible par ordinateur
CN103049435B (zh) 文本细粒度情感分析方法及装置
US10025819B2 (en) Generating a query statement based on unstructured input
US10740545B2 (en) Information extraction from open-ended schema-less tables
WO2021121198A1 (fr) Procédé et appareil d'extraction de relation d'entité basée sur une similitude sémantique, dispositif et support
Al-Anzi et al. Beyond vector space model for hierarchical Arabic text classification: A Markov chain approach
Bhargava et al. Atssi: Abstractive text summarization using sentiment infusion
US20130060769A1 (en) System and method for identifying social media interactions
WO2020232943A1 (fr) Procédé de construction de graphe de connaissances pour prédiction d'événement, et procédé de prédiction d'événement
US9477756B1 (en) Classifying structured documents
Zhang et al. A comprehensive survey of abstractive text summarization based on deep learning
US20210073257A1 (en) Logical document structure identification
Han et al. Text Summarization Using FrameNet‐Based Semantic Graph Model
Beheshti et al. Big data and cross-document coreference resolution: Current state and future opportunities
WO2015084757A1 (fr) Systèmes et procédés de traitement de données stockées dans une base de données
US20230282018A1 (en) Generating weighted contextual themes to guide unsupervised keyphrase relevance models
Makrynioti et al. PaloPro: a platform for knowledge extraction from big social data and the news
Khalid et al. Reference terms identification of cited articles as topics from citation contexts
Mishra et al. A novel approach to capture the similarity in summarized text using embedded model
Mishra et al. Similarity search based on text embedding model for detection of near duplicates
SCALIA Network-based content geolocation on social media for emergency management
Sonbhadra et al. Email classification via intention-based segmentation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21849774

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21849774

Country of ref document: EP

Kind code of ref document: A1