CN113947087A - Label-based relation construction method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113947087A
Authority
CN
China
Prior art keywords
entity
target
entities
candidate
tags
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111558564.3A
Other languages
Chinese (zh)
Other versions
CN113947087B (en)
Inventor
贾晓丰
江茜
肖益
张晰
李宝东
穆显显
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Big Data Center
Taiji Computer Corp Ltd
Original Assignee
Taiji Computer Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiji Computer Corp Ltd
Priority to CN202111558564.3A
Publication of CN113947087A
Application granted
Publication of CN113947087B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a tag-based relationship construction method and device, an electronic device, and a storage medium. The method comprises: determining an entity identifier; parsing a plurality of candidate entities corresponding to the entity identifier from a plurality of data sets, wherein the plurality of data sets comprise data sets of a plurality of business domains, data sets of a plurality of channels, and data sets of a plurality of types; selecting some of the candidate entities as target entities; determining a plurality of target tags corresponding to the plurality of target entities; and determining the target entity relationships among the plurality of target entities according to the plurality of target tags. The method enables intelligent matching of candidate entities and entity relationships, reduces the influence of ambiguity on candidate entity identification, and effectively improves the accuracy with which target entities and target entity relationships are identified, thereby improving the overall identification results for target entities and target entity relationships.

Description

Label-based relation construction method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of knowledge graph relationship construction technologies, and in particular, to a method and an apparatus for constructing a relationship based on a tag, an electronic device, and a storage medium.
Background
With the development of technologies such as the Internet of Things, cloud computing, artificial intelligence, and big data, the scale of data is growing at an unprecedented, explosive rate, and data patterns are becoming highly complex. The evolution of network infrastructure, improvements in computing power, innovation in algorithms, and the deepening of application scenarios provide a good foundation for the circulation of data as a production factor. At the same time, realizing the value of data still faces many problems: the complexity, relevance, and time variability of data pose challenges for information fusion and knowledge learning.
In the related art, highly ambiguous entities and the relationships between entities are identified with low accuracy, and the extraction of relationships between entities performs poorly.
Disclosure of Invention
The present disclosure is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the present disclosure aims to provide a tag-based relationship construction method, a tag-based relationship construction device, an electronic device, and a storage medium, which can implement intelligent matching of a candidate entity and an entity relationship, reduce the influence of ambiguity on candidate entity identification, effectively improve the accuracy of target entity and target entity relationship identification, and further improve the identification effect of the target entity and target entity relationship.
In order to achieve the above object, a tag-based relationship construction method according to an embodiment of the first aspect of the present disclosure includes: determining an entity identifier; parsing a plurality of candidate entities corresponding to the entity identifier from a plurality of data sets, wherein the plurality of data sets comprise data sets of a plurality of business domains, data sets of a plurality of channels, and data sets of a plurality of types; selecting some of the plurality of candidate entities as target entities; determining a plurality of target tags corresponding to the plurality of target entities; and determining the target entity relationships among the plurality of target entities according to the plurality of target tags.
In the tag-based relationship construction method according to the embodiment of the first aspect of the present disclosure, an entity identifier is determined; a plurality of candidate entities corresponding to the entity identifier are parsed from a plurality of data sets, where the plurality of data sets include data sets of a plurality of business domains, data sets of a plurality of channels, and data sets of a plurality of types; some of the candidate entities are selected as target entities; a plurality of target tags corresponding to the plurality of target entities are determined; and the target entity relationships among the plurality of target entities are determined according to the plurality of target tags. Because the target entities are tagged, and the entity relationships are matched only after the corresponding tags have been determined, intelligent matching of candidate entities and entity relationships can be achieved, the influence of ambiguity on candidate entity identification is reduced, and the accuracy of identifying target entities and target entity relationships is effectively improved, which in turn improves the overall identification results for the target entities and the target entity relationships.
In order to achieve the above object, an embodiment of the second aspect of the present disclosure provides a tag-based relationship construction apparatus, including: a determining module for determining an entity identifier; a parsing module for parsing a plurality of candidate entities corresponding to the entity identifier from a plurality of data sets, wherein the plurality of data sets comprise data sets of a plurality of business domains, data sets of a plurality of channels, and data sets of a plurality of types; a selecting module for selecting some of the plurality of candidate entities as target entities; a first determining module for determining a plurality of target tags corresponding to the plurality of target entities; and a second determining module for determining the target entity relationships among the plurality of target entities according to the plurality of target tags.
In the tag-based relationship construction apparatus according to the embodiment of the second aspect of the present disclosure, an entity identifier is determined; a plurality of candidate entities corresponding to the entity identifier are parsed from a plurality of data sets, where the plurality of data sets include data sets of a plurality of business domains, data sets of a plurality of channels, and data sets of a plurality of types; some of the candidate entities are selected as target entities; a plurality of target tags corresponding to the plurality of target entities are determined; and the target entity relationships among the plurality of target entities are determined according to the plurality of target tags. Because the target entities are tagged, and the entity relationships are matched only after the corresponding tags have been determined, intelligent matching of candidate entities and entity relationships can be achieved, the influence of ambiguity on candidate entity identification is reduced, and the accuracy of identifying target entities and target entity relationships is effectively improved, which in turn improves the overall identification results for the target entities and the target entity relationships.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the tag-based relationship building method of the embodiments of the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, storing computer instructions for causing a computer to execute the method for tag-based relationship building of the first aspect of the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the label-based relationship building method of the embodiments of the first aspect of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Fig. 1 is a schematic flowchart of a tag-based relationship construction method according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a tag-based relationship construction method according to another embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a candidate entity disambiguation model according to another embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of a tag-based relationship construction method according to another embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a BERT model according to another embodiment of the present disclosure;
Fig. 6 is a flowchart of a tag-based relationship construction and convergence method according to another embodiment of the present disclosure;
Fig. 7 is a schematic structural diagram of a tag-based relationship construction apparatus according to an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of a tag-based relationship construction apparatus according to another embodiment of the present disclosure;
Fig. 9 is a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
Reference will now be made in detail to the embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of illustrating the present disclosure and should not be construed as limiting the same.
Fig. 1 is a schematic flowchart of a relationship building method based on tags according to an embodiment of the present disclosure.
This embodiment is described taking as an example the case in which the tag-based relationship construction method is configured in a tag-based relationship construction apparatus. The apparatus may be disposed in a server or in an electronic device, which is not limited in this disclosure.
This embodiment takes as an example the case in which the tag-based relationship construction method is configured in an electronic device. The electronic device may be, for example, a smartphone, a tablet computer, a personal digital assistant, an e-book reader, or another hardware device running any of various operating systems.
It should be noted that the execution subject of the embodiment of the present disclosure may be, for example, a Central Processing Unit (CPU) in a server or an electronic device in terms of hardware, and may be, for example, a related background service in the server or the electronic device in terms of software, which is not limited to this.
As shown in fig. 1, the method for constructing a relationship based on tags includes:
S101: determining an entity identifier.
A unique identifier that is inherent to an entity and distinguishes it from other entities may be referred to as an entity identifier. An entity may be identified by data, a character string, or text. The entity identifier may be, for example, a character string composed of one or more of letters (a-z, A-Z), digits (0-9), and the underscore "_", or may be text representing the characteristics of the entity, without limitation.
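The string form of an entity identifier described above can be checked with a short sketch. The function name `is_string_identifier` and the pattern constant are hypothetical names introduced for illustration; the disclosure also allows free text as an identifier, which this check deliberately does not cover.

```python
import re

# Hypothetical pattern: one or more letters, digits, or underscores,
# matching the character-string form of an entity identifier.
IDENTIFIER_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_string_identifier(value: str) -> bool:
    """Return True if value consists only of letters, digits, and underscores."""
    return IDENTIFIER_PATTERN.fullmatch(value) is not None


print(is_string_identifier("vehicle_001"))
print(is_string_identifier("vehicle-001"))
```

A stricter or looser pattern may be appropriate depending on how identifiers are configured in the database.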
In the embodiment of the present disclosure, the determining the entity identifier may be selecting an entity with an entity identifier configured in advance in a database, or determining the entity identifier corresponding to the entity from a sample text according to entity features such as semantics of the entity.
S102: parsing a plurality of candidate entities corresponding to the entity identifier from a plurality of data sets, wherein the plurality of data sets comprise: data sets of a plurality of business domains, data sets of a plurality of channels, and data sets of a plurality of types.
The plurality of data sets may comprise data sets of a plurality of business domains, data sets of a plurality of channels, and data sets of a plurality of types. Data sets of a plurality of business domains may be data sets containing different business data, such as data sets generated by business domains for city management, traffic administration, and the like. Data sets of a plurality of channels may be data sets formed according to their source channel, such as a data set derived from a production channel or a data set derived from a processing channel. Data sets of a plurality of types may be data sets divided according to whether the data is structured or unstructured, such as data sets representing traffic and data sets representing measurements from various types of sensors in a city.
In the embodiment of the present disclosure, the data of all business domains, all channels, and all types may be collectively referred to as global data, and the plurality of data sets may be sets of global data.
In the embodiment of the present disclosure, parsing the plurality of candidate entities corresponding to the entity identifier from the plurality of data sets may consist of locating, in the plurality of data sets, the entities that carry the entity identifier. The candidate entities share the same entity identifier, but the semantics and attributes they represent may be the same or different; that is, the candidate entities may be ambiguous.
For example, the candidate entity indicated by the entity identifier is set to be "Xiaoming", and in different types of data sets, the candidate entity "Xiaoming" may represent a person, or may represent a business or a vehicle, that is, the candidate entity "Xiaoming" may have different semantics and attributes.
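Step S102 can be pictured with a minimal sketch, assuming each data set is a list of records that carry an identifier and a type. The `Entity` class, the `parse_candidates` function, the record layout, and the data set names are all assumptions made for illustration, not structures defined by the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    identifier: str   # entity identifier shared by ambiguous candidates
    source: str       # name of the data set the record came from
    entity_type: str  # e.g. "person", "enterprise", "vehicle"


def parse_candidates(identifier, data_sets):
    """Collect every record carrying the identifier across all data sets."""
    candidates = []
    for source, records in data_sets.items():
        for record in records:
            if record["identifier"] == identifier:
                candidates.append(Entity(identifier, source, record["type"]))
    return candidates


# Hypothetical data sets in which "Xiaoming" has three different meanings.
data_sets = {
    "population": [{"identifier": "Xiaoming", "type": "person"}],
    "enterprise_registry": [{"identifier": "Xiaoming", "type": "enterprise"}],
    "traffic": [
        {"identifier": "Xiaoming", "type": "vehicle"},
        {"identifier": "Xiaohong", "type": "vehicle"},
    ],
}
candidates = parse_candidates("Xiaoming", data_sets)
for candidate in candidates:
    print(candidate)
```

All three candidates share one identifier but differ in type, which is the ambiguity the later steps have to resolve.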
S103: selecting some of the plurality of candidate entities as target entities.
In the embodiment of the present disclosure, some of the plurality of candidate entities may be selected at random, or selected according to the type of the data set, which is not limited herein.
For example, for the plurality of candidate entities named "Xiaoming" parsed from the plurality of data sets, a selection ratio (e.g., 50%) may be set and that proportion of the candidate entities randomly selected as target entities as needed; alternatively, the "Xiaoming" candidate entities that come from data sets of the person type may be selected.
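The two selection strategies mentioned for S103 can be sketched as follows. The function names, the dictionary layout of a candidate, and the `seed` parameter (added only to make the random choice repeatable) are assumptions for illustration.

```python
import random


def select_by_ratio(candidates, ratio, seed=None):
    """Randomly keep roughly `ratio` of the candidates (at least one)."""
    if not candidates:
        return []
    count = min(len(candidates), max(1, round(len(candidates) * ratio)))
    return random.Random(seed).sample(candidates, count)


def select_by_type(candidates, wanted_type):
    """Keep only the candidates of the wanted type, e.g. "person"."""
    return [c for c in candidates if c["type"] == wanted_type]


# Hypothetical candidates that all share the identifier "Xiaoming".
candidates = [
    {"name": "Xiaoming", "type": "person", "source": "population"},
    {"name": "Xiaoming", "type": "enterprise", "source": "enterprise_registry"},
    {"name": "Xiaoming", "type": "vehicle", "source": "traffic"},
    {"name": "Xiaoming", "type": "person", "source": "social_security"},
]
picked = select_by_ratio(candidates, 0.5, seed=0)
persons = select_by_type(candidates, "person")
print(len(picked), [p["source"] for p in persons])
```

Selecting by type is deterministic, while selecting by ratio depends on the random seed.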
S104: determining a plurality of target tags corresponding to the plurality of target entities.
A tag that reflects semantic information of the application scenario and is used to identify and describe a target entity may be referred to as a target tag. Both the static basic attributes and the dynamic behavior data of a target entity may serve as target tags. For example, when the target entity is a person, the static basic attributes may be basic information (age, sex, etc.), family information, work information, and the like, and the dynamic behavior data may be data on the various behaviors arising from study, work, daily life, entertainment, social activities, and the like.
It is understood that one target entity may have a plurality of target tags, and one target tag may correspond to one or more target entities, without limitation.
In the embodiment of the present disclosure, the plurality of target tags corresponding to the plurality of target entities may be determined from the attributes of the target entities in the data sets. Alternatively, one or more attributes may be designated in advance as target tags, and the target tags of a target entity are then determined from the values of those attributes in the data sets.
For example, given the sample text "Xiaoming's cup is a glass cup", the target entity "cup" can be determined to be an article, and the tags "glass product" and "belongs to Xiaoming" can be determined for it according to the contextual semantics of the target entity "cup" in the database.
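The second option for S104, where certain attributes are designated in advance as target tags, might look like the sketch below. The attribute names, the "name:value" tag format, and the function `determine_target_tags` are assumptions chosen to mirror the cup example; the disclosure does not prescribe a tag format.

```python
def determine_target_tags(attributes, tag_attributes):
    """Turn the pre-designated attributes of a target entity into tags."""
    return {
        f"{name}:{value}"
        for name, value in attributes.items()
        if name in tag_attributes
    }


# Hypothetical attribute record for the target entity "cup".
cup = {
    "category": "article",
    "material": "glass",
    "belongs_to": "Xiaoming",
    "record_id": 17,  # not designated as a tag attribute, so it is skipped
}
tags = determine_target_tags(cup, tag_attributes={"category", "material", "belongs_to"})
print(sorted(tags))
```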
S105: determining the target entity relationships among the plurality of target entities according to the plurality of target tags.
A description of the relationship between target entities may be referred to as a target entity relationship. A target entity relationship may be, for example, a word expressing the relationship between target entities. For example, for the target entity "Xiaoming" and the target entity "glass cup" appearing in a sample text, the target entity relationship between them can be determined from the sample text to be "owns".
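One simple way to derive a target entity relationship from target tags is sketched below, assuming a tag that names the owner of an entity. The `belongs_to` key, the "owns" relation label, and the function `relations_from_tags` are hypothetical; an actual implementation may use far richer tag semantics.

```python
def relations_from_tags(entity_tags):
    """Derive (subject, relation, object) triples from 'belongs_to' tags."""
    triples = []
    for entity, tags in entity_tags.items():
        owner = tags.get("belongs_to")
        # The owner must itself be one of the target entities.
        if owner in entity_tags:
            triples.append((owner, "owns", entity))
    return triples


# Hypothetical target entities with their target tags.
entity_tags = {
    "Xiaoming": {"category": "person"},
    "cup": {"category": "article", "material": "glass", "belongs_to": "Xiaoming"},
}
triples = relations_from_tags(entity_tags)
print(triples)
```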
In this embodiment, an entity identifier is determined; a plurality of candidate entities corresponding to the entity identifier are parsed from a plurality of data sets, where the plurality of data sets include data sets of a plurality of business domains, data sets of a plurality of channels, and data sets of a plurality of types; some of the candidate entities are selected as target entities; a plurality of target tags corresponding to the plurality of target entities are determined; and the target entity relationships among the plurality of target entities are determined according to the plurality of target tags. Because the target entities are tagged, and the entity relationships are matched only after the corresponding tags have been determined, intelligent matching of candidate entities and entity relationships can be achieved, the influence of ambiguity on candidate entity identification is reduced, and the accuracy of identifying target entities and target entity relationships is effectively improved, which in turn improves the overall identification results for the target entities and the target entity relationships.
Fig. 2 is a schematic flowchart of a method for constructing a relationship based on a tag according to another embodiment of the present disclosure.
As shown in fig. 2, the method for constructing a relationship based on tags includes:
S201: acquiring massive entity data.
Data that is huge in volume and contains a plurality of target entities and a plurality of target entity relationships may be referred to as massive entity data. The massive entity data may contain a plurality of data sets, without limitation.
In the embodiment of the present disclosure, the massive entity data may be acquired as large volumes of data containing target entities and target entity relationships that arise in an actual application scenario, or by collecting data related to target entities and target entity relationships from a large database.
For example, in a real-world scenario, an enterprise acquires data recorded in an enterprise management platform, wherein the data includes information such as user basic information and user consumption information.
S202: extracting a plurality of reference entities from the massive entity data, wherein the plurality of reference entities correspond to the same or different information, and the information is business domain information, channel information, or type information.
In the embodiment of the present disclosure, the plurality of reference entities may be extracted at random from the massive entity data. The reference entities may correspond to the same or different information; that is, different reference entities may represent different objects or the same object. For example, the reference entities "fire truck", "fire emergency vehicle", and "urban fire truck" may all represent the same object.
In the embodiment of the present disclosure, the information may be business domain information, channel information, or type information. For example, in the data set of a business domain, the information may be customer information, business information, and the like represented by a reference entity; in the data set of a channel, the information may be the pieces of information about the reference entities collected through that channel; and in the data set of a type, the information may be the pieces of information representing the types of the reference entities.
S203: constructing a target entity knowledge base according to the plurality of reference entities.
A knowledge base that is constructed from the plurality of reference entities and relates to the business domain information, channel information, or type information may be referred to as a target entity knowledge base. The target entity knowledge base may be a knowledge graph, or a knowledge base containing a plurality of target entities and target entity relationships.
In the embodiment of the present disclosure, a knowledge base of target entities may be constructed according to the corresponding information of the plurality of reference entities. For example, in a city management scenario, the target entities "Xiaoming" and "Xiaohong" representing persons and the target entities "Xiaoming" and "Xiaohong" representing vehicles may be constructed, giving the target entity knowledge base {Xiaoming-person, Xiaohong-person, Xiaoming-vehicle, Xiaohong-vehicle}.
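A target entity knowledge base of this kind can be sketched as a mapping from each name to the set of senses it is known to have. The pair representation of a reference entity and the function `build_knowledge_base` are assumptions for illustration; the disclosure also allows a full knowledge graph.

```python
from collections import defaultdict


def build_knowledge_base(reference_entities):
    """Group reference entities by name; each name maps to its known senses."""
    knowledge_base = defaultdict(set)
    for name, entity_type in reference_entities:
        knowledge_base[name].add(entity_type)
    return dict(knowledge_base)


# Hypothetical reference entities extracted from the massive entity data.
reference_entities = [
    ("Xiaoming", "person"),
    ("Xiaohong", "person"),
    ("Xiaoming", "vehicle"),
    ("Xiaoming", "person"),  # duplicates collapse into one sense
]
kb = build_knowledge_base(reference_entities)
print({name: sorted(senses) for name, senses in kb.items()})
```

A name with more than one sense in the knowledge base is exactly the kind of candidate that the later disambiguation step has to resolve.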
S204: tagging the massive entity data by each of the plurality of business domains to obtain a plurality of candidate tags respectively corresponding to the plurality of business domains.
In the embodiment of the present disclosure, the massive entity data of all business domains, all channels, and all types may be organized, and the information corresponding to the massive entity data tagged to generate the corresponding plurality of candidate tags.
For example, the massive entity data may be divided into production data, sales data, and so on; the production data of the production business domain and the sales data of the sales business domain are then tagged to obtain production tags, sales tags, and the like, corresponding respectively to the production business domain, the sales business domain, and other business domains.
S205: tagging the massive entity data by each of the plurality of channels to obtain a plurality of candidate tags respectively corresponding to the plurality of channels.
In the embodiment of the present disclosure, the massive entity data may be tagged by channel. For example, the massive entity data may be divided by source channel into Internet data, real-scene data, and the like, and each part tagged to obtain a plurality of candidate tags corresponding to the plurality of channels.
S206: tagging the massive entity data by each of the plurality of types to obtain a plurality of candidate tags respectively corresponding to the plurality of types.
In the embodiment of the present disclosure, the massive entity data may be tagged by type. For example, the massive entity data may be divided into categories such as persons, enterprises, objects, and others, and tagged according to those categories to obtain a plurality of candidate tags respectively corresponding to the plurality of types.
S207: forming a global data tag library from the plurality of candidate tags.
In the embodiment of the present disclosure, the global data tag library includes the obtained multiple candidate tags respectively corresponding to multiple service domains, multiple channels, and multiple types.
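Steps S204 to S207 can be summarized in a small sketch: the same records are tagged once per view (business domain, channel, type), and the resulting candidate tags are pooled into one library. The field names, the "dimension:value" tag format, and both function names are assumptions for illustration.

```python
def label_by(records, dimension):
    """Produce candidate tags of the form '<dimension>:<value>' for one view."""
    return {
        f"{dimension}:{record[dimension]}"
        for record in records
        if dimension in record
    }


def build_global_tag_library(records):
    """Pool the candidate tags of all three views into one library."""
    return {
        dimension: label_by(records, dimension)
        for dimension in ("business_domain", "channel", "type")
    }


# Hypothetical massive entity data reduced to two records.
records = [
    {"business_domain": "production", "channel": "internet", "type": "enterprise"},
    {"business_domain": "sales", "channel": "real_scene", "type": "person"},
]
library = build_global_tag_library(records)
for dimension, tags in library.items():
    print(dimension, sorted(tags))
```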
Optionally, in some embodiments, the types of candidate tags include: attribute tags, feature tags, fact tags, inference tags.
Candidate tags formed from attributes that are inherent in the massive entity data and do not change with external conditions may be called attribute tags. An attribute describes a certain entity or a certain relationship, and its data source is generally independent of any specific scenario, so attribute tags are relatively static and have a long life cycle.
Candidate tags formed by tagging the features that distinguish items in the massive entity data from one another may be called feature tags. Unlike attribute tags, feature tags may change as external conditions change.
Candidate tags may also be divided into fact tags and inference tags according to how they are generated and computed. Fact tags may be non-enumerated tags obtained by simply organizing the massive entity data, for example a resident's address, date of birth, or social security account number. Inference tags can be further subdivided, for example into statistical tags, rule tags, and mining tags. Statistical tags may be assembled, based on experience and actual business requirements, according to the dimension and measurement matrix of the scenario in which the massive entity data resides, for example the number of valid samples. Rule tags may be tags that do not correspond directly to the massive entity data and must be obtained through rule definition and calculation, for example tags such as "infant" and "teenager". Mining tags may be tags that cannot be obtained directly and require complex logical analysis and reasoning; they are conclusions drawn from regularities in multiple events involving the analyzed object across multiple scenarios, for example "high-risk enterprise" and "high-growth enterprise" tags.
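The difference between fact tags, rule tags, and statistical tags can be illustrated with a short sketch. The age cut-offs for "infant" and "teenager", the record fields, and the function names are illustrative assumptions only; the disclosure does not define the rules. Mining tags are omitted because they require analysis beyond a simple example.

```python
from datetime import date


def fact_tags(record):
    """Fact tags: copied directly from fields of the record."""
    return {"address": record["address"], "birth_date": record["birth_date"]}


def age_rule_tag(birth_date, today):
    """Rule tag: derived by a rule. The age cut-offs are illustrative assumptions."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    if age < 3:
        return "infant"
    if 13 <= age <= 19:
        return "teenager"
    return "other"


def valid_sample_count(records):
    """Statistical tag: number of records with no missing field."""
    return sum(1 for r in records if all(v is not None for v in r.values()))


# Hypothetical population records.
records = [
    {"address": "District A", "birth_date": date(2020, 6, 1)},
    {"address": None, "birth_date": date(2006, 12, 21)},
]
today = date(2021, 12, 20)
rule_tags = [age_rule_tag(r["birth_date"], today) for r in records]
print(rule_tags, valid_sample_count(records))
```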
S208: an entity identity is determined.
S209: analyzing a plurality of candidate entities corresponding to the entity identification from a plurality of data sets, wherein the plurality of data sets comprise: data sets of multiple business domains, data sets of multiple channels, and data sets of multiple types.
For the description of S208 to S209, reference may be made to the above embodiments, which are not described herein again.
S210: and carrying out disambiguation processing on the candidate entities according to the target entity knowledge base to obtain the target entity.
In the target entity knowledge base, the term of the candidate entity may correspond to one or more different meanings, that is, the candidate entity has ambiguity and needs to be disambiguated, for example, in the sample text "door has no lock", the "lock" may be regarded as the candidate entity, and the "lock" of the candidate entity may refer to a "lock" of the noun static state or a "lock action" of the dynamic state, and the "lock" of the candidate entity has ambiguity and needs to be disambiguated, and entity disambiguation may be a process of linking the candidate entity with the corresponding target entity.
FIG. 3 is a schematic diagram of a candidate entity disambiguation model according to another embodiment of the present disclosure. As shown in FIG. 3, starting from the plurality of candidate tags corresponding to the plurality of candidate entities before disambiguation, the candidate entities are disambiguated by entity linking together with an "unsupervised clustering + relationship transfer" method. Similarity is calculated using entity mention similarity based on surface features, entity mention similarity based on extended features, and entity mention similarity based on a social network. The entity mentions are then clustered by a clustering algorithm, and each category in the clustering result corresponds to one target entity. Processing the similarities with a clustering algorithm and relationship transfer in this way may be called the unsupervised clustering and relationship transfer method.
The surface features may refer to the basic features of the candidate entities; the extended (hidden) features may refer to features of the candidate entities that need some calculation or processing to obtain; and the social network features may refer to the roles the candidate entities play in society. Calculating entity mention similarity based on surface features, extended features, and the social network together can improve the similarity calculation.
For example, when the candidate entity is a vehicle, the surface features may be its color, shape, size, and the like; the extended (hidden) features may be its vehicle condition, driving condition, and the like; and the social network feature may be the use of the vehicle (such as urban fire protection or public transportation).
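The clustering with relationship transfer described above can be sketched as follows. This is only an illustrative stand-in: the Jaccard similarity, the equal weighting of the three feature groups, the threshold, and the function names are assumptions, and union-find is used here as one simple way to make similarity transitive (if A matches B and B matches C, all three fall into one cluster).

```python
from itertools import combinations


def combined_similarity(a: dict, b: dict) -> float:
    """Average Jaccard similarity over the three feature groups (assumed weighting)."""
    groups = ("surface", "extended", "social")
    scores = []
    for group in groups:
        set_a, set_b = set(a[group]), set(b[group])
        union = set_a | set_b
        scores.append(len(set_a & set_b) / len(union) if union else 0.0)
    return sum(scores) / len(groups)


def cluster_mentions(mentions: dict, threshold: float = 0.5) -> list:
    """Cluster entity mentions; relationship transfer is done with union-find."""
    parent = {name: name for name in mentions}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(mentions, 2):
        if combined_similarity(mentions[a], mentions[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in mentions:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Each returned set plays the role of one category in the clustering result, which would then be mapped to one target entity.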
Optionally, in some embodiments, disambiguating the plurality of candidate entities according to the target entity knowledge base to obtain the target entity may include: linking the candidate entity to a plurality of target candidate entities in the target entity knowledge base; determining, with a pre-trained entity link model, the similarities between the candidate entity and each of the target candidate entities; and taking the target candidate entity whose similarity satisfies a set condition as the target entity. The entity link model is obtained by training an initial artificial intelligence model in advance, based on an association modeling method and a consistency modeling method, until the model converges. Because the similarities are determined by the pre-trained entity link model and the target candidate entity whose similarity meets the requirement is selected as the target entity, the similarity between the candidate entity and the target candidate entities can be calculated more accurately, which improves the disambiguation of ambiguous candidate entities.
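The link-score-select flow of this embodiment can be sketched as below. The pre-trained entity link model is not specified in the text, so a simple word-overlap score stands in for it; `overlap_score`, `link_entity`, the knowledge-base layout, and the threshold `min_similarity` are all hypothetical.

```python
def overlap_score(context: set, description: set) -> float:
    """Stand-in for the entity link model: Jaccard overlap of two word sets."""
    union = context | description
    return len(context & description) / len(union) if union else 0.0


def link_entity(context_words, knowledge_base, min_similarity=0.1):
    """Return the best-scoring knowledge-base entry, or None if none qualifies."""
    context = set(context_words)
    scored = [
        (overlap_score(context, set(words)), entity_id)
        for entity_id, words in knowledge_base.items()
    ]
    best_score, best_id = max(scored, default=(0.0, None))
    # The "set condition" is assumed here to be a minimum similarity.
    return best_id if best_score >= min_similarity else None


# Hypothetical knowledge base with two senses of "lock".
KB = {
    "lock (device)": ["door", "key", "metal", "device"],
    "lock (action)": ["close", "fasten", "secure", "action"],
}
print(link_entity(["door", "has", "no", "lock"], KB))  # prints: lock (device)
```

In a real system the scoring function would be replaced by the trained model, while the selection step stays the same.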
In other embodiments, the plurality of candidate entities may instead be disambiguated by mining the semantic features of the candidate entities in their context and determining the target entity from those semantic features; the disclosure is not limited in this respect.
For example, FIG. 4 is a schematic structural diagram of a BERT model according to another embodiment of the present disclosure. Given an input sequence of a sample text sentence [formula image 001 in the original, not reproduced], BERT preprocessing and vectorization yield the corresponding vector representations, such as the sentence segmentation vector, the position vector, and the word vector [formula image 002 in the original, not reproduced]. An encoder then encodes the sentence segmentation vector, the position vector, and the word vector to obtain an output vector [formula image 003 in the original, not reproduced]. After processing by the BERT model, the similarity between the triple entities can be calculated according to the target tags, which completes the inference of the global weak relationship between the triple entities.
S211: Candidate tags corresponding to the target entities are identified from the global data tag library and serve as target tags.
In the embodiment of the present disclosure, the candidate tags corresponding to the target entities may be determined during the disambiguation process and then identified from the global data tag library, or the target tags may be identified after the disambiguation process is completed.
For example, if, while disambiguating the candidate entity "Xiaoming", the candidate tag of "Xiaoming" is determined to be "car", the candidate tag "car" may be identified from the global data tag library and used as the target tag.
S212: The target entity relationship among the target entities is determined according to the target tags.
For the description of S212, reference may be made to the above embodiments, which are not described herein again.
In this embodiment, massive entity data is obtained, and a plurality of reference entities are extracted from it, where the reference entities correspond to the same or different information, and the information is service domain information, channel information, or type information. A target entity knowledge base is constructed from the reference entities. The massive entity data is tagged based on a plurality of service domains, a plurality of channels, and a plurality of types, respectively, to obtain candidate tags corresponding to each service domain, each channel, and each type, and a global data tag library is formed from these candidate tags. An entity identifier is then determined, and a plurality of candidate entities corresponding to the entity identifier are parsed from a plurality of data sets, which include data sets of multiple business domains, data sets of multiple channels, and data sets of multiple types. The candidate entities are disambiguated according to the target entity knowledge base to obtain the target entities, candidate tags corresponding to the target entities are identified from the global data tag library as target tags, and the target entity relationships among the target entities are determined according to the target tags.
Because the candidate tags are divided into categories, a large and complete tag system can be constructed, which effectively improves the accuracy of tag identification, and the rich set of candidate tags also helps to disambiguate the candidate entities. Because the reference entities are extracted from the massive entity data and the target entity knowledge base is constructed from them, the source range of the reference data is expanded, the target entity knowledge base is more widely applicable, and the disambiguation of the candidate entities is improved.
Fig. 5 is a schematic flowchart of a method for constructing a relationship based on tags according to another embodiment of the present disclosure.
As shown in fig. 5, the method for constructing a relationship based on tags includes:
s501: an entity identity is determined.
S502: analyzing a plurality of candidate entities corresponding to the entity identification from a plurality of data sets, wherein the plurality of data sets comprise: data sets of multiple business domains, data sets of multiple channels, and data sets of multiple types.
S503: and selecting a part of candidate entities from the plurality of candidate entities as target entities.
S504: a plurality of target tags corresponding to a plurality of target entities is determined.
For the description of S501-S504, reference may be made to the above embodiments, which are not described herein again.
S505: Triple entities are identified from the target entities by an entity identification method.
Two target entities and the target entity relationship between them may be called a triple, and the target entities in the triple may be called triple entities. For example, the sample text "Xiao Li is Xiao Ming's leader" yields the triple "Xiao Li, leader, Xiao Ming", where "leader" is the entity relationship and "Xiao Li" and "Xiao Ming" are the triple entities.
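A triple of this kind can be represented with a small data structure. The following sketch is only one possible representation; the class name `Triple`, its field names, and the example names are assumptions for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Two target entities plus the relationship between them."""
    head: str      # first triple entity
    relation: str  # entity relationship
    tail: str      # second triple entity

    def entities(self):
        """Return the two triple entities without the relationship."""
        return (self.head, self.tail)


# Illustrative names for the "leader" example.
example = Triple("Xiao Li", "leader", "Xiao Ming")
```

Keeping the relationship separate from the two entities makes it straightforward to attach tags or attributes to either side later.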
Optionally, in some embodiments, the triple entities may be identified by performing entity tagging and part-of-speech tagging on the sample text to which the target entities belong, extracting the tagged target entities from the sample text, determining the part of speech of each tagged target entity, and determining the triple entities from the target entities and the tagged target entities according to the part of speech.
In the embodiment of the present disclosure, the entity tagging and the part-of-speech tagging may be performed on the sample text to which the target entity belongs by using a method combining a sequence tagging algorithm and a rule matching algorithm, for example, the part-of-speech tag "P" may be a name of a person, "L" may be location information, "O" may be a multi-person organization, and the like.
In an embodiment of the present disclosure, the sequence labeling algorithm may be a model that combines a language model (LM), a long short-term memory (LSTM) network, and a conditional random field (CRF). The rule matching algorithm may use predefined rules to identify and extract the triple entities. For example, a predefined rule may be: extract the subject and object together with their modifiers and compound words, and extract the punctuation marks between them.
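Once a sequence labeling model has assigned a tag to every token, the tagged entities still have to be read out of the tag sequence. The sketch below shows only that read-out step, assuming the tags "P", "L", and "O" mentioned above and a hypothetical "-" tag for untagged tokens; the labeling model itself (LM, LSTM, CRF) is not implemented here.

```python
def extract_entities(tokens, tags):
    """Merge consecutive tokens that share the same tag into (text, tag) entities."""
    entities, current, current_tag = [], [], None
    for token, tag in zip(tokens, tags):
        if tag == current_tag and tag != "-":
            current.append(token)  # the running entity continues
            continue
        if current:
            entities.append((" ".join(current), current_tag))
        # Start a new entity, or reset when the token is untagged.
        current, current_tag = ([token], tag) if tag != "-" else ([], None)
    if current:
        entities.append((" ".join(current), current_tag))
    return entities
```

For instance, tokens tagged `P P - - O O O O` would yield one person entity followed by one organization entity.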
Optionally, in other embodiments, the triple entities may be determined from grammatical order. The sample text may be decomposed by an algorithm into character strings such as a subject, a predicate, and an object, and the entities represented by the subject and the object may be determined as the triple entities according to the grammatical order. For example, in the sample text "Mingming processes department services", the subject "Mingming", the predicate "processes", and the object "department services" may be determined, and the triple entities are then "Mingming" and "department services".
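The grammatical-order approach can be sketched as follows, assuming that some upstream parser has already labeled each string with its grammatical role. The role names and the function `triple_from_roles` are hypothetical; the decomposition algorithm itself is not specified in the text and is not implemented here.

```python
def triple_from_roles(tagged_strings):
    """Pick the first subject, predicate, and object from (text, role) pairs.

    Returns (subject, predicate, object), or None if any role is missing.
    """
    def first(role):
        return next((text for text, r in tagged_strings if r == role), None)

    subject, predicate, obj = first("subject"), first("predicate"), first("object")
    if None in (subject, predicate, obj):
        return None
    return subject, predicate, obj
```

The subject and object of the returned tuple are the triple entities, and the predicate is a candidate for the relationship between them.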
S506: The target tags and the triple entities are processed with a sequence labeling algorithm and a text binary classification algorithm to determine the global weak relationship among the triple entities, wherein the global weak relationship is an entity relationship based on a plurality of service domains, a plurality of channels, and a plurality of types.
Within the global scope, a relationship between the triple entities that is relatively dynamic and has a small influence on the triple entities may be referred to as a global weak relationship.
In the embodiment of the present disclosure, the plurality of target tags and the triple entities may be processed, and the global weak relationship between the triple entities may be extracted according to the target tags, using a Bidirectional Encoder Representations from Transformers language model (hereinafter, the BERT model). The BERT model can capture sequential information between different words in a sample text and pass information bidirectionally, both forward and backward. In the BERT model, the head and tail of a complete sample text sentence may be marked with [CLS] and [SEP], respectively, to distinguish two different sample text sentences. For example, the sample text "performing urban greening processing" becomes "[CLS] performing urban greening processing [SEP]" after marking.
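The [CLS] and [SEP] marking can be illustrated with a short sketch. Whitespace splitting is used here purely for readability and is an assumption; actual BERT tokenizers split text into subword pieces. The function name `mark_sentences` and the segment-id convention (0 for the first sentence, 1 for the second) are illustrative.

```python
def mark_sentences(first, second=None):
    """Wrap one or two sentences with BERT-style [CLS] and [SEP] markers.

    Returns the token list and a parallel list of segment ids.
    """
    tokens = ["[CLS]", *first.split(), "[SEP]"]
    segment_ids = [0] * len(tokens)
    if second is not None:
        extra = [*second.split(), "[SEP]"]
        tokens += extra
        segment_ids += [1] * len(extra)
    return tokens, segment_ids
```

The segment ids are what allow the model to tell the two sentences apart once they are concatenated into one input sequence.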
In the embodiment of the present disclosure, the global weak relationship between the triple entities may be obtained by processing the multiple target tags and the triple entities with a text binary classification algorithm. The text binary classification algorithm may extract text features from the sample text data, represent the information in the sample text with those features, and convert that information into a format that a computer can recognize (e.g., a computer programming language format).
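As a rough illustration of text binary classification, the sketch below turns text into bag-of-words count vectors and trains a perceptron on them. The perceptron, the vocabulary, and the label meaning (1 for text that expresses a relationship, 0 otherwise) are assumptions chosen for brevity; the disclosure does not name a specific classifier.

```python
def featurize(text, vocabulary):
    """Convert text into a count vector over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


def score(features, weights, bias):
    """Linear score of a feature vector."""
    return sum(w * v for w, v in zip(weights, features)) + bias


def train_perceptron(samples, vocabulary, epochs=10):
    """Train a binary perceptron on (text, label) pairs with labels 0 or 1."""
    weights, bias = [0.0] * len(vocabulary), 0.0
    for _ in range(epochs):
        for text, label in samples:
            features = featurize(text, vocabulary)
            predicted = int(score(features, weights, bias) > 0)
            error = label - predicted  # 0 when the prediction is correct
            weights = [w + error * v for w, v in zip(weights, features)]
            bias += error
    return weights, bias


def predict(text, vocabulary, weights, bias):
    """Return 1 if the text is classified as expressing a relationship, else 0."""
    return int(score(featurize(text, vocabulary), weights, bias) > 0)
```

The count vector is the "format that a computer can recognize" in this sketch; any other feature representation could be substituted without changing the training loop.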
S507: The global weak relationship is taken as the entity relationship among the triple entities.
In the embodiment of the present disclosure, entity relationships based on the multiple service domains, multiple channels, and multiple types are taken as the entity relationships among the triple entities. For example, the entity relationships "business amount" and "cost" from multiple service domains may be taken as entity relationships among the triple entities, that is, as global weak relationships; when a triple entity is associated with the global weak relationship "business amount" or the global weak relationship "cost", the triple entity corresponding to that global weak relationship may be labeled.
S508: entity attributes corresponding to the triplet entities are determined.
A static attribute of a triple entity may be referred to as an entity attribute. Entity attributes may be unrelated to a specific scene and have a long life cycle, such as region and gender.
In the embodiment of the present disclosure, the entity attributes corresponding to the triple entities may be determined by searching the global data tag library in advance for attribute tags that represent entity attributes and then determining the attribute tags that correspond to the global weak relationship between the triple entities. Alternatively, the global weak relationships between the triple entities may be enumerated and the entity attributes then determined by machine identification or manual identification; this is not limited here.
S509: The triple entities, the entity relationship, and the entity attributes are together taken as the target entity relationship.
In the embodiment of the present disclosure, the triple entity, the global weak relationship corresponding to the triple entity, and the entity attribute corresponding to the triple entity are collectively used as the target entity relationship.
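One way to hold the combined result is a small record type. The sketch below is an assumption about the data shape only; the class `TargetEntityRelationship`, its fields, and the helper `build_target_relationship` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TargetEntityRelationship:
    """Triple entities, their relationship, and per-entity attributes."""
    triple_entities: tuple
    entity_relationship: str
    entity_attributes: dict = field(default_factory=dict)


def build_target_relationship(entities, weak_relationship, attributes):
    """Bundle the triple entities with their relationship and known attributes.

    Entities without known attributes get an empty attribute mapping.
    """
    return TargetEntityRelationship(
        triple_entities=tuple(entities),
        entity_relationship=weak_relationship,
        entity_attributes={e: attributes.get(e, {}) for e in entities},
    )
```

Static attributes such as region or gender would go into `entity_attributes`, while the more dynamic weak relationship stays in its own field.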
For example, FIG. 6 is a flowchart of a tag-based relationship construction and convergence method according to another embodiment of the present disclosure. First, a tag-based relationship is established. It is logically divided into an event layer, an entity layer, and a space-time layer, which correspond to an event identification code, an entity identification code, and a space-time identification code, respectively. The event identification code identifies events occurring in a city, such as health code identification, close-contact investigation, vaccination, and nucleic acid testing. In the entity layer, the entities may be of types such as people, enterprises, houses, vehicles, and city components. A plurality of attribute tags and feature tags can be extracted from data such as base tables, text, images, audio, and video by an automatic tagging learning method to construct large-scale, atomic-level weak relationships. Then, by combining the event identification codes with constraint conditions (for example, temporal constraints such as co-operation and co-travel, and spatial constraints such as co-residence, co-working, and co-checking), the weak relationships are accurately converged into strong relationships based on the space-time mapping of the attribute tags and feature tags, which establishes the entity layer. The space-time identification code corresponds to the space-time layer and identifies data such as time-series features, space-time grids, space-time trajectories, and high-resolution images.
The strong and weak relationships among the entities in the entity layer are obtained through the event identification code, the entity identification code, and the space-time identification code, and the result of the corresponding event is then obtained from these relationships.
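The convergence from weak to strong relationships under event, spatial, and temporal constraints can be sketched as a pairwise filter. The record layout, the 30-minute window, and the function name `strong_relations` are assumptions for illustration; the disclosure does not define concrete constraint values.

```python
from itertools import combinations


def strong_relations(records, max_gap_minutes=30):
    """Find entity pairs that share an event code and place within a time window.

    Each record is (entity_id, event_code, place, minute_of_day).
    """
    pairs = set()
    for a, b in combinations(records, 2):
        different_entities = a[0] != b[0]
        same_event_and_place = a[1] == b[1] and a[2] == b[2]  # event and spatial constraint
        close_in_time = abs(a[3] - b[3]) <= max_gap_minutes    # temporal constraint
        if different_entities and same_event_and_place and close_in_time:
            pairs.add(tuple(sorted((a[0], b[0]))))
    return pairs
```

Tightening `max_gap_minutes` or adding further constraints narrows the set of pairs, which mirrors how the constraint conditions filter many weak co-occurrences down to fewer strong relationships.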
In this embodiment, an entity identifier is determined, and a plurality of candidate entities corresponding to the entity identifier are parsed from a plurality of data sets, which include data sets of multiple business domains, data sets of multiple channels, and data sets of multiple types. Some of the candidate entities are selected as target entities, and a plurality of target tags corresponding to the target entities are determined. Triple entities are identified from the target entities by an entity identification method, and the target tags and the triple entities are processed with a sequence labeling algorithm and a text binary classification algorithm to determine the global weak relationship among the triple entities, where the global weak relationship is an entity relationship based on the multiple service domains, channels, and types. The global weak relationship is taken as the entity relationship among the triple entities, the entity attributes corresponding to the triple entities are determined, and the triple entities, the entity relationship, and the entity attributes together serve as the target entity relationship. Because the entity relationship is determined with a relationship recognition model and is combined with the triple entities and the entity attributes, the extraction efficiency of the target entity relationship can be improved and target entity relationships of higher quality can be obtained. Because the sequence labeling algorithm and the text binary classification algorithm are used, global weak relationships among triple entities can be established on a large scale and at the atomic level, which improves the completeness with which entity relationships among the triple entities are expressed.
Fig. 7 is a schematic structural diagram of a tag-based relationship building apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the tag-based relationship building apparatus 70 includes:
a determining module 701, configured to determine an entity identifier;
an analysis module 702, configured to parse a plurality of candidate entities corresponding to the entity identifier from a plurality of data sets, where the plurality of data sets include: data sets of multiple business domains, data sets of multiple channels, and data sets of multiple types;
a selecting module 703, configured to select a part of candidate entities from the multiple candidate entities as target entities;
a first determining module 704 for determining a plurality of target tags corresponding to a plurality of target entities;
a second determining module 705, configured to determine a target entity relationship among the plurality of target entities according to the plurality of target tags.
In some embodiments of the present disclosure, as shown in FIG. 8, which is a schematic structural diagram of a tag-based relationship building apparatus according to another embodiment of the present disclosure, the apparatus further includes:
an obtaining module 706, configured to obtain massive entity data before determining the entity identifier;
a first processing module 707, configured to perform tagging processing on the mass entity data based on the multiple service domains, respectively, to obtain multiple candidate tags corresponding to the multiple service domains, respectively;
a second processing module 708, configured to perform tagging processing on the massive entity data based on a plurality of channels, respectively, so as to obtain a plurality of candidate tags corresponding to the plurality of channels, respectively;
a third processing module 709, configured to perform tagging processing on the mass entity data based on multiple types, respectively, so as to obtain multiple candidate tags respectively corresponding to the multiple types;
a forming module 710, configured to form a global data tag library according to the plurality of candidate tags;
the first determining module 704 is specifically configured to:
identify candidate tags corresponding to the target entities from the global data tag library as target tags.
In some embodiments of the present disclosure, as shown in fig. 8, the types of candidate tags include:
attribute tags, feature tags, fact tags, inference tags.
In some embodiments of the present disclosure, as shown in fig. 8, further comprising:
the extraction module 711 is configured to extract a plurality of reference entities from the massive entity data, where the plurality of reference entities correspond to the same or different information, and the information is service domain information, channel information, or type information;
a construction module 712 for constructing a target entity knowledge base from a plurality of reference entities;
the selecting module 703 is specifically configured to:
performing disambiguation processing on the candidate entities according to the target entity knowledge base to obtain the target entity.
In some embodiments of the present disclosure, as shown in fig. 8, the selecting module 703 is specifically configured to:
connecting the candidate entity to a plurality of target candidate entities in a target entity knowledge base;
determining a plurality of similarities between the candidate entity and a plurality of target candidate entities respectively according to a pre-trained entity link model;
and taking the target candidate entity whose similarity satisfies a set condition as the target entity, where the entity link model is obtained by training an initial artificial intelligence model in advance, based on an association modeling method and a consistency modeling method, until the model converges.
In some embodiments of the present disclosure, as shown in fig. 8, the second determining module 705 is specifically configured to:
identifying a triple entity from a plurality of target entities by adopting an entity identification method;
determining entity relationships among the triple entities by combining a relationship recognition model according to the target tags;
determining entity attributes corresponding to the triple entities;
and taking the triple entity, the entity relationship and the entity attribute as the target entity relationship.
In some embodiments of the present disclosure, as shown in fig. 8, the second determining module 705 is specifically configured to:
carrying out entity marking and part-of-speech marking on a sample text to which a target entity belongs;
extracting a target entity which accords with the mark from the sample text, and determining the part of speech of the marked target entity;
and determining the triple entity from the target entity and the marked target entity according to the part of speech.
In some embodiments of the present disclosure, as shown in fig. 8, the relationship recognition model includes: a sequence labeling algorithm and a text classification algorithm, wherein the second determining module 705 is specifically configured to:
processing the target tags and the triple entities by adopting a sequence labeling algorithm and a text binary classification algorithm to determine the global weak relationship among the triple entities, wherein the global weak relationship is an entity relationship based on a plurality of service domains, a plurality of channels, and a plurality of types;
and taking the global weak relationship as the entity relationship among the triple entities.
Corresponding to the tag-based relationship construction method provided in the embodiments of FIG. 1 to FIG. 6, the present disclosure also provides a tag-based relationship construction apparatus. Because the apparatus corresponds to that method, the implementations of the method also apply to the apparatus and are not described in detail again here.
In this embodiment, an entity identifier is determined, and a plurality of candidate entities corresponding to the entity identifier are parsed from a plurality of data sets, which include data sets of multiple business domains, data sets of multiple channels, and data sets of multiple types. Some of the candidate entities are selected as target entities, a plurality of target tags corresponding to the target entities are determined, and the target entity relationships among the target entities are determined according to the target tags. Because the target entities are tagged, the tags corresponding to the candidate entities are determined, and the entity relationships among the candidate entities are then matched, the candidate entities and entity relationships can be matched intelligently, the influence of ambiguity on candidate entity identification is reduced, and the accuracy of identifying target entities and target entity relationships is effectively improved.
In order to achieve the above embodiments, the present disclosure also proposes a non-transitory computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the tag-based relationship construction method as proposed by the foregoing embodiments of the present disclosure.
In order to implement the above embodiments, the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the label-based relationship construction method proposed by the foregoing embodiments of the present disclosure.
In order to implement the foregoing embodiments, the present disclosure also proposes a computer program product; when instructions in the computer program product are executed by a processor, the label-based relationship construction method proposed by the foregoing embodiments of the present disclosure is performed.
FIG. 9 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure. The electronic device 12 shown in fig. 9 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in FIG. 9, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16. Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. These architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, to name a few.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile Memory, such as Random Access Memory (RAM) 30 and/or cache Memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 9, and commonly referred to as a "hard drive").
Although not shown in FIG. 9, a disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact disk Read Only Memory (CD-ROM), a Digital versatile disk Read Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described in this disclosure.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public Network such as the Internet) via the Network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing, for example, implementing the label-based relationship construction method mentioned in the foregoing embodiments, by executing a program stored in the system memory 28.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of the embodiments of the present application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination".
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present disclosure includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present disclosure.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present disclosure have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present disclosure, and that changes, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present disclosure.

Claims (18)

1. A method for constructing relationships based on labels, the method comprising:
determining an entity identifier;
analyzing a plurality of candidate entities corresponding to the entity identifier from a plurality of data sets, wherein the plurality of data sets comprise: data sets of a plurality of service domains, data sets of a plurality of channels and data sets of a plurality of types;
selecting a part of candidate entities from the plurality of candidate entities as target entities;
determining a plurality of target tags corresponding to a plurality of the target entities;
and determining a target entity relationship among the plurality of target entities according to the plurality of target tags.
2. The method of claim 1, further comprising, prior to said determining an entity identifier:
acquiring massive entity data;
tagging the massive entity data based on the plurality of service domains respectively to obtain a plurality of candidate tags respectively corresponding to the plurality of service domains;
tagging the massive entity data based on the plurality of channels respectively to obtain a plurality of candidate tags respectively corresponding to the plurality of channels;
tagging the massive entity data based on the plurality of types respectively to obtain a plurality of candidate tags respectively corresponding to the plurality of types;
forming a global data tag library according to the candidate tags;
wherein the determining a plurality of target tags corresponding to a plurality of the target entities comprises:
identifying candidate tags corresponding to the target entities from the global data tag library as the target tags.
3. The method of claim 2, wherein the types of the candidate tags comprise:
attribute tags, feature tags, fact tags, and inference tags.
4. The method of claim 2, further comprising:
extracting a plurality of reference entities from the massive entity data, wherein the plurality of reference entities correspond to the same or different information, and the information is service domain information, channel information or type information;
constructing a target entity knowledge base according to the plurality of reference entities;
wherein the selecting a part of candidate entities from the plurality of candidate entities as target entities comprises:
and carrying out disambiguation processing on the candidate entities according to the target entity knowledge base to obtain the target entity.
5. The method of claim 4, wherein the carrying out disambiguation processing on the candidate entities according to the target entity knowledge base to obtain the target entity comprises:
linking the candidate entity to a plurality of target candidate entities in the target entity knowledge base;
determining a plurality of similarities between the candidate entity and the plurality of target candidate entities respectively according to a pre-trained entity link model;
and taking the target candidate entity whose similarity meets a set condition as the target entity, wherein the entity link model is obtained by training an initial artificial intelligence model in advance, based on an association modeling method and a consistency modeling method, until the artificial intelligence model converges.
6. The method of claim 1, wherein said determining a target entity relationship between a plurality of said target entities based on said plurality of target tags comprises:
identifying a triple entity from the plurality of target entities by adopting an entity identification method;
determining entity relationships among the triple entities according to the target tags and by combining a relationship recognition model;
determining entity attributes corresponding to the triple entities;
and taking the triple entity, the entity relation and the entity attribute as the target entity relation.
7. The method of claim 6, wherein said identifying a triple entity from the plurality of target entities by adopting an entity identification method comprises:
carrying out entity marking and part-of-speech marking on the sample text to which the target entity belongs;
extracting a target entity which accords with the mark from the sample text, and determining the part of speech of the marked target entity;
and determining the triple entity from the target entity and the marked target entity according to the part of speech.
8. The method of claim 6, wherein the relationship recognition model comprises: a sequence labeling algorithm and a text classification algorithm, wherein the determining of the entity relationship among the triple entities according to the plurality of target tags and in combination with a relationship recognition model comprises:
processing the plurality of target tags and the triple entities by adopting the sequence labeling algorithm and the text classification algorithm to determine a global weak relationship among the triple entities, wherein the global weak relationship is an entity relationship based on the plurality of service domains, the plurality of channels and the plurality of types;
and taking the global weak relationship as the entity relationship among the triple entities.
9. A tag-based relationship building apparatus, the apparatus comprising:
a determining module for determining an entity identifier;
an analysis module, configured to analyze multiple candidate entities corresponding to the entity identifier from multiple data sets, where the multiple data sets include: data sets of a plurality of service domains, data sets of a plurality of channels and data sets of a plurality of types;
a selecting module, configured to select a part of candidate entities from the multiple candidate entities as target entities;
a first determining module for determining a plurality of target tags corresponding to a plurality of the target entities;
and a second determining module, configured to determine a target entity relationship among the plurality of target entities according to the plurality of target tags.
10. The apparatus of claim 9, further comprising:
an obtaining module, configured to obtain massive entity data before the entity identifier is determined;
a first processing module, configured to tag the massive entity data based on the plurality of service domains respectively to obtain a plurality of candidate tags respectively corresponding to the plurality of service domains;
a second processing module, configured to tag the massive entity data based on the plurality of channels respectively to obtain a plurality of candidate tags respectively corresponding to the plurality of channels;
a third processing module, configured to tag the massive entity data based on the plurality of types respectively to obtain a plurality of candidate tags respectively corresponding to the plurality of types;
a forming module, configured to form a global data tag library according to the candidate tags;
wherein the first determining module is specifically configured to:
identify candidate tags corresponding to the target entities from the global data tag library as the target tags.
11. The apparatus of claim 10, wherein the types of the candidate tags comprise:
attribute tags, feature tags, fact tags, and inference tags.
12. The apparatus of claim 10, further comprising:
an extraction module, configured to extract a plurality of reference entities from the massive entity data, wherein the plurality of reference entities correspond to the same or different information, and the information is service domain information, channel information or type information;
a construction module, configured to construct a target entity knowledge base according to the plurality of reference entities;
wherein the selecting module is specifically configured to:
carry out disambiguation processing on the candidate entities according to the target entity knowledge base to obtain the target entity.
13. The apparatus of claim 12, wherein the selecting module is specifically configured to:
link the candidate entity to a plurality of target candidate entities in the target entity knowledge base;
determine a plurality of similarities between the candidate entity and the plurality of target candidate entities respectively according to a pre-trained entity link model;
and take the target candidate entity whose similarity meets a set condition as the target entity, wherein the entity link model is obtained by training an initial artificial intelligence model in advance, based on an association modeling method and a consistency modeling method, until the artificial intelligence model converges.
14. The apparatus of claim 9, wherein the second determining module is specifically configured to:
identify a triple entity from the plurality of target entities by adopting an entity identification method;
determine entity relationships among the triple entities according to the target tags and by combining a relationship recognition model;
determine entity attributes corresponding to the triple entities;
and take the triple entity, the entity relation and the entity attribute as the target entity relationship.
15. The apparatus of claim 14, wherein the second determining module is specifically configured to:
carry out entity marking and part-of-speech marking on the sample text to which the target entity belongs;
extract a target entity which accords with the mark from the sample text, and determine the part of speech of the marked target entity;
and determine the triple entity from the target entity and the marked target entity according to the part of speech.
16. The apparatus of claim 14, wherein the relationship recognition model comprises: a sequence labeling algorithm and a text classification algorithm, wherein the second determining module is specifically configured to:
process the plurality of target tags and the triple entities by adopting the sequence labeling algorithm and the text classification algorithm to determine a global weak relationship among the triple entities, wherein the global weak relationship is an entity relationship based on the plurality of service domains, the plurality of channels and the plurality of types;
and take the global weak relationship as the entity relationship among the triple entities.
17. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the label-based relationship building method of any one of claims 1-8.
18. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, implement the label-based relationship construction method according to any one of claims 1-8.
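For illustration only, and without limiting the claims, the global data tag library of claim 2 and the similarity-based disambiguation of claim 5 might be sketched in Python as follows. This sketch makes several assumptions that are not in the disclosure: tags arrive as hypothetical `(entity, dimension, tag)` records, the knowledge base is a dictionary of token lists, and a simple Jaccard similarity with a fixed threshold is swapped in for the pre-trained entity link model and the "set condition".

```python
from collections import defaultdict


def build_global_tag_library(records):
    """Merge candidate tags from all dimensions into one library.

    records: iterable of (entity, dimension, tag), where dimension is a
    hypothetical label such as "service_domain", "channel" or "type".
    """
    library = defaultdict(set)
    for entity, dimension, tag in records:
        library[entity].add((dimension, tag))
    return dict(library)


def jaccard(a, b):
    """Jaccard similarity of two token collections (stand-in for a model)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(candidate_tokens, knowledge_base, threshold=0.5):
    """Return the most similar knowledge-base entry, or None if the best
    similarity does not reach the (assumed) threshold condition."""
    best_name, best_sim = None, 0.0
    for name, tokens in knowledge_base.items():
        sim = jaccard(candidate_tokens, tokens)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None
```

Keeping the dimension alongside each tag lets a later step tell apart tags that came from a service domain, a channel or a type; a trained entity link model would replace `jaccard` without changing the surrounding selection logic.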
CN202111558564.3A 2021-12-20 2021-12-20 Label-based relation construction method and device, electronic equipment and storage medium Active CN113947087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111558564.3A CN113947087B (en) 2021-12-20 2021-12-20 Label-based relation construction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111558564.3A CN113947087B (en) 2021-12-20 2021-12-20 Label-based relation construction method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113947087A (en) 2022-01-18
CN113947087B (en) 2022-04-15

Family

ID=79339341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111558564.3A Active CN113947087B (en) 2021-12-20 2021-12-20 Label-based relation construction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113947087B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107861939A (en) * 2017-09-30 2018-03-30 昆明理工大学 A kind of domain entities disambiguation method for merging term vector and topic model
KR101983477B1 (en) * 2017-11-28 2019-05-29 한국과학기술원 Method and System for zero subject resolution in Korean using a paragraph-based pivotal entity identification
CN108280061A (en) * 2018-01-17 2018-07-13 北京百度网讯科技有限公司 Text handling method based on ambiguity entity word and device
CN108280062A (en) * 2018-01-19 2018-07-13 北京邮电大学 Entity based on deep learning and entity-relationship recognition method and device
CN109635297A (en) * 2018-12-11 2019-04-16 湖南星汉数智科技有限公司 A kind of entity disambiguation method, device, computer installation and computer storage medium
CN110555083A (en) * 2019-08-26 2019-12-10 北京工业大学 non-supervision entity relationship extraction method based on zero-shot
CN110569366A (en) * 2019-09-09 2019-12-13 腾讯科技(深圳)有限公司 text entity relation extraction method and device and storage medium
CN110781683A (en) * 2019-11-04 2020-02-11 河海大学 Entity relation joint extraction method
CN112560485A (en) * 2020-11-24 2021-03-26 北京三快在线科技有限公司 Entity linking method and device, electronic equipment and storage medium
CN113761218A (en) * 2021-04-27 2021-12-07 腾讯科技(深圳)有限公司 Entity linking method, device, equipment and storage medium
CN113283236A (en) * 2021-05-31 2021-08-20 北京邮电大学 Entity disambiguation method in complex Chinese text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
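陈鹏之 (Chen Pengzhi): "基于标签校正的端到端实体关系联合抽取" [End-to-end joint extraction of entity relations based on label correction], 《重庆理工大学学报》 [Journal of Chongqing University of Technology] *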

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116702785A (en) * 2023-08-03 2023-09-05 腾讯科技(深圳)有限公司 Processing method and device of relational tag, storage medium and electronic equipment
CN116702785B (en) * 2023-08-03 2023-10-24 腾讯科技(深圳)有限公司 Processing method and device of relational tag, storage medium and electronic equipment
CN117057343A (en) * 2023-10-10 2023-11-14 腾讯科技(深圳)有限公司 Road event identification method, device, equipment and storage medium
CN117057343B (en) * 2023-10-10 2023-12-12 腾讯科技(深圳)有限公司 Road event identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113947087B (en) 2022-04-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230725

Address after: 9 Hong'an st, Lucheng Town, Tongzhou District, Beijing 100010

Patentee after: Beijing Big Data Center

Patentee after: TAIJI COMPUTER Co.,Ltd.

Address before: 100102 No.211, Middle North Fourth Ring Road, Haidian District, Beijing

Patentee before: TAIJI COMPUTER Co.,Ltd.

Patentee before: Jia Xiaofeng

Patentee before: Jiang Qian

Patentee before: Zhang Xi