CN117473038A - Entity recommendation method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN117473038A
Authority
CN
China
Prior art keywords
entity
label
preset granularity
entities
determining
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210840269.5A
Other languages
Chinese (zh)
Inventor
刘焕勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
3600 Technology Group Co ltd
Original Assignee
3600 Technology Group Co ltd
Application filed by 3600 Technology Group Co ltd
Priority to CN202210840269.5A
Publication of CN117473038A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose an entity recommendation method, an entity recommendation device, a storage medium and electronic equipment. The method comprises: in response to an input query text, determining a link entity corresponding to the query text; based on the link entity, obtaining a corresponding entity set from each of at least two entity embedded representations, the at least two entity embedded representations comprising at least a preset granularity label embedded representation and/or a network topology embedded representation; and determining a recommendation weight for each entity in the entity sets and recommending entities based on the recommendation weights. By fusing fine-grained entity embedding with network topology embedding, the technical solution describes the characteristics of an entity from different angles and can thereby improve the accuracy of entity recommendation.

Description

Entity recommendation method and device, storage medium and electronic equipment
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to an entity recommendation method, an entity recommendation device, a storage medium and electronic equipment.
Background
With the development of Internet technology, more and more people obtain online content through web search, and entity search and recommendation have become an important scenario in search services.
In one existing technical solution, a vectorization method such as word2vec is used to vectorize the user's query text and the abstract information of an entity, and the similarity between the two is then calculated. However, the user's query text is usually short while the entity's abstract is usually long, so entity recommendation cannot be performed accurately. How to improve the accuracy of entity recommendation has therefore become a technical problem to be solved.
Disclosure of Invention
The embodiments of the application provide an entity recommendation method, an entity recommendation device, a storage medium and electronic equipment, which can improve the accuracy of entity recommendation. The technical solution is as follows:
In a first aspect, an embodiment of the present application provides an entity recommendation method, the method including:
in response to an input query text, determining a link entity corresponding to the query text;
based on the link entity, obtaining a corresponding entity set from each entity embedded representation of at least two entity embedded representations, the at least two entity embedded representations comprising at least: a preset granularity label embedded representation and/or a network topology embedded representation;
and determining recommendation weights corresponding to the entities in the entity sets, and recommending entities based on the recommendation weights.
In a second aspect, an embodiment of the present application provides an entity recommendation apparatus, the apparatus including:
a link entity determining module, configured to determine, in response to an input query text, a link entity corresponding to the query text;
an entity set determining module, configured to obtain, based on the link entity, a corresponding entity set from each entity embedded representation of at least two entity embedded representations, where the at least two entity embedded representations include: a preset granularity label embedded representation and a network topology embedded representation;
and an entity recommending module, configured to determine recommendation weights corresponding to the entities in the entity sets and to recommend entities based on the recommendation weights.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method described above.
In a fourth aspect, embodiments of the present application provide an electronic device, including: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the steps of the method described above.
The technical solutions provided by some embodiments of the present application have at least the following beneficial effects:
First, in response to the input query text, the link entity corresponding to the query text is determined, so that the target entity corresponding to the query text can be accurately identified through entity linking. Second, based on the link entity, a corresponding entity set is obtained from each of at least two entity embedded representations, which include at least a preset granularity label embedded representation and/or a network topology embedded representation; by fusing fine-grained entity embedding with network topology embedding, the characteristics of an entity can be described from different angles, which improves the accuracy of entity recommendation. Third, recommendation weights are determined for the entities in the entity set and entities are recommended based on these weights, so that the content to be recommended can be finely ranked by recommendation weight, further improving the accuracy of entity recommendation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an application scenario of an entity recommendation method according to an embodiment of the present application;
Fig. 2 shows a flow diagram of an entity recommendation method according to some embodiments of the present application;
Fig. 3 shows a flow diagram of generating a preset granularity label embedded representation according to further embodiments of the present application;
Fig. 4 shows a flow diagram of an entity recommendation method according to further embodiments of the present application;
Fig. 5 shows a schematic structural diagram of an entity recommendation device according to an embodiment of the present application;
Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
First, the terms used in the embodiments of the present application are explained.
Entity: an entity is a language object in natural language processing; for example, an entity may be an organization, a quantity, a currency, a person's name, a company name, or a geographic location.
Link entity: the target entity in a preset entity library that corresponds to the query text, that is, the entity in the preset entity library that corresponds to an entity mention (such as a person name, place name, or organization name) recognized in the query text.
Preset granularity label: a label that describes an entity at a preset granularity in natural language processing; for example, a preset granularity label may be a fine-grained label such as a short sentence or phrase.
Embedded representation: representing an object, such as a word, a product, a news article, or a movie, by a low-dimensional dense vector.
Network topology embedded representation: a method of embedding the entity nodes of a graph network; the resulting entity node embeddings generally capture both the global structure of the graph and the local similarity between neighboring nodes.
At present, entity search and recommendation are an important scenario in search services, and their core problem is computing the semantic similarity between the user's query text and an entity. In one technical solution, a vectorization method such as word2vec is used to represent the user's query text and the entity's abstract information as vectors, the similarity between them is calculated, and the title of the entity is shown to the user as recommended content. This solution has the following problems:
(1) The user's query text is usually short, whereas the entity's abstract is usually long and contains other information (such as other entities), so the abstract cannot accurately express the semantics of the entity and the semantic similarity computed between the two is distorted. More accurate entity description information is therefore needed;
(2) Overly coarse entity labels cannot express what is specific about an entity; for example, "Internet company" can be further subdivided into security companies, retail companies, and so on. Finer-grained labels are therefore needed;
(3) Although methods such as FastText can capture literal information, they cannot take the network structure between entities into account, so another entity representation method that can describe the network information of entities needs to be introduced.
Based on the above, the embodiments of the present application provide an entity recommendation method that fuses fine-grained entity labels with the entity in-link network. By fusing fine-grained entity embedding with network topology embedding, the method describes the characteristics of an entity from different angles and can thereby improve the accuracy of entity recommendation.
The following describes in detail the technical solution of the entity recommendation method according to the embodiment of the present application with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of an application scenario of an entity recommendation method according to an embodiment of the present application.
Referring to Fig. 1, a query text such as "apple" is entered in the search interface of a search application on a client, and the "Search" control is clicked. Using the entity recommendation method that fuses fine-grained entity labels with the entity in-link network, the server obtains recommended entities for the input query text "apple", such as an Apple phone, an Apple watch, and an Apple tablet, which are displayed below the search box.
In an example embodiment, a search application is installed on a client, and a server determines a link entity corresponding to a query text in response to the query text input on the search application of the client; based on the linking entity, obtaining a corresponding entity set from each of at least two entity embedded representations, the at least two entity embedded representations comprising at least: a preset granularity label embedded representation and/or a network topology embedded representation; and determining the weight of each entity in the entity set, and recommending the entity based on the weight.
It should be noted that the client may be an application terminal device configured with a search application, and the application terminal device may include, but is not limited to, at least one of the following: a smartphone, a notebook computer, a tablet computer, a palmtop computer, an MID (Mobile Internet Device), a desktop computer, a smart TV, and the like. The client may be a video application client, an instant messaging client, a browser client, an educational application client, and the like. The server may be a single server, a server cluster composed of multiple servers, or a cloud server. The foregoing are merely examples, and the embodiments of the present application are not limited thereto.
Fig. 2 illustrates a flow diagram of an entity recommendation method provided in accordance with some embodiments of the present application. The entity recommendation method may be executed by a computing device, such as a server, having a computing processing function. The entity recommendation method includes steps S210 to S230, and the entity recommendation method in the example embodiment is described in detail with reference to the accompanying drawings.
Referring to fig. 2, in response to an input query text, a link entity corresponding to the query text is determined in step S210.
In an example embodiment, the link entity refers to a target entity corresponding to the query text in a preset entity library. The client is provided with a search application program or a browser, and query text is input on a search page of the search application program or the browser. And the server responds to the input query text, and determines a link entity corresponding to the query text from a preset entity library.
For example, the server responds to the input query text, determines an embedded vector corresponding to the query text by using a vectorization representation method such as word2vec, calculates the similarity between the embedded vector corresponding to the query text and the embedded vector of the entity in the preset entity library, and determines the entity with the similarity greater than the preset threshold as the link entity corresponding to the query text.
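The similarity-based linking described in the preceding paragraph can be sketched as follows. This is a minimal Python sketch, not the applicant's implementation: it assumes the query text and the library entities have already been embedded (for example by word2vec), uses cosine similarity as the similarity measure, and the function names, toy vectors and the 0.8 threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def link_entities(query_vec, entity_vecs, threshold=0.8):
    """Return (entity, similarity) pairs above `threshold`, most similar first.

    `entity_vecs` maps entity names in the preset entity library to their
    embeddings. The threshold value is an illustrative assumption.
    """
    scored = ((name, cosine(query_vec, vec)) for name, vec in entity_vecs.items())
    hits = [(name, sim) for name, sim in scored if sim > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings"; real word2vec vectors have far more dimensions.
library = {
    "Apple (company)": [0.9, 0.1, 0.0],
    "Apple (fruit)": [0.1, 0.9, 0.1],
    "Banana": [0.0, 0.8, 0.6],
}
query_vec = [0.85, 0.2, 0.0]  # hypothetical vector for the query text
print(link_entities(query_vec, library))
```

Returning a ranked list rather than a single entity leaves room for the case, allowed by the text, where several library entities exceed the threshold.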
In step S220, based on the linking entity, a corresponding entity set is obtained from each of at least two entity embedded representations, at least two entity embedded representations comprising at least: a pre-set granularity tag embedded representation and/or a network topology embedded representation.
In an example embodiment, an embedded representation represents an object, such as a word, a product, a news article, or a movie, with a low-dimensional dense vector. The at least two entity embedded representations include at least a preset granularity label embedded representation and/or a network topology embedded representation. For example, they may include at least a preset granularity label embedded representation and a network topology embedded representation; or at least a preset granularity label embedded representation and a knowledge triplet embedded representation; or at least a network topology embedded representation and a knowledge triplet embedded representation. The preset granularity label of an entity may be a fine-grained label, such as a short sentence or phrase, corresponding to the entity. For example, if the entity is "Qilixiang", its preset granularity label may be "song sung by Zhou Jielun". The preset granularity label embedded representation is the embedded representation of the entity's fine-grained labels.
Network topology embedding of entities is a method of embedding the entity nodes of a graph network; the resulting entity node embeddings generally capture both the global structure of the graph and the local similarity between neighboring nodes. The network topology embedded representation of an entity may be obtained with DeepWalk, EGES, Node2vec, or similar methods. Taking DeepWalk as an example, the encyclopedia page of each entity is parsed and an edge is constructed between the entity of the current page and each in-linked entity on that page, forming an undirected graph network. Random walks on this graph generate entity sequences, which are used as training samples for a context prediction neural network such as Skip-gram; training yields the network topology embedding of each entity.
Further, the similarity between the link entity and each entity in each of the at least two entity embedded representations is calculated, and the entities whose similarity is greater than a predetermined threshold form the entity set for that representation. For example, the similarity between the link entity and each entity in the preset granularity label embedded representation is calculated, yielding an entity set made up of the entities whose similarity is greater than the predetermined threshold.
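Retrieving one entity set per embedded representation can be sketched as below. This is a hedged illustration rather than the method as filed: it assumes each representation is available as a mapping from entity name to vector, uses cosine similarity and a shared threshold of 0.7, and the names `similar_entities`, `entity_sets`, `label_space` and `topology_space` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_entities(link_entity, space, threshold):
    """Entities in one embedding space whose similarity to the link entity exceeds threshold."""
    anchor = space[link_entity]
    return {
        name
        for name, vec in space.items()
        if name != link_entity and cosine(anchor, vec) > threshold
    }


def entity_sets(link_entity, spaces, threshold=0.7):
    """One candidate set per embedding space; spaces lacking the link entity are skipped."""
    return {
        kind: similar_entities(link_entity, space, threshold)
        for kind, space in spaces.items()
        if link_entity in space
    }


# Hypothetical 2-D embeddings for two of the representations.
label_space = {"iPhone": [1.0, 0.0], "Apple Watch": [0.9, 0.3],
               "iPad": [0.8, 0.5], "Banana": [0.0, 1.0]}
topology_space = {"iPhone": [0.0, 1.0], "Apple Inc.": [0.1, 1.0],
                  "iPad": [1.0, 0.2], "Foxconn": [0.5, 0.8]}
print(entity_sets("iPhone", {"label": label_space, "topology": topology_space}))
```

The two spaces deliberately disagree (iPad is close to iPhone by labels but not by topology), which is the situation the later merging and weighting steps are meant to resolve.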
In step S230, recommendation weights corresponding to the respective entities in the entity set are determined, and entity recommendation is performed based on the recommendation weights.
In an example embodiment, each entity embedded representation corresponds to one entity set; that is, the preset granularity label embedded representation corresponds to a first entity set and the network topology embedded representation corresponds to a second entity set. The first and second entity sets are merged and deduplicated to generate a merged entity set, a recommendation weight is determined for each entity in the merged entity set, and entities are recommended based on these weights; for example, the titles of one or more entities whose recommendation weight is greater than a predetermined threshold are recommended to the client.
For example, in some example embodiments, the recommendation weight includes an entity weight. The occurrence frequency of each entity of the merged entity set in the first, second and third entity sets (the third entity set being obtained from the triplet embedded representation described below) is determined, the entity weight of each entity in the merged entity set is determined from its occurrence frequency, and entities are recommended based on their entity weights; for example, the titles of the entities whose entity weight is greater than a predetermined threshold are recommended to the client.
In other example embodiments, the recommendation weight includes an entity weight and a label weight. For each entity in the merged entity set, the preset granularity labels in its <entity, preset granularity label, label weight> triples are determined, yielding a preset granularity label set for the merged entity set. The entities in the merged entity set are then grouped by each preset granularity label in this label set, giving an entity set for each label. The total entity weight of each preset granularity label is determined from the entity weights of the entities in its entity set. Finally, the recommendation weight of each preset granularity label is determined from its total entity weight and its label weight, and entities are recommended based on the recommendation weights of the preset granularity labels.
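The label-level weighting in the preceding paragraph can be sketched as follows. The text only says the recommendation weight is determined "based on" the total entity weight and the label weight, so multiplying the two is an assumption, as is keeping the largest label weight when triples disagree; the function and variable names are hypothetical.

```python
from collections import defaultdict


def label_recommendation_weights(entity_weights, triples):
    """Score preset granularity labels for the merged entity set.

    entity_weights: {entity: entity_weight} for entities in the merged set.
    triples: iterable of (entity, label, label_weight) triples.
    Returns (weights, members): {label: recommendation_weight} and
    {label: [entities grouped under that label]}.
    Assumed combination: recommendation weight = total entity weight * label weight.
    """
    total = defaultdict(float)
    label_weight = {}
    members = defaultdict(list)
    seen = set()
    for entity, label, weight in triples:
        if entity not in entity_weights or (entity, label) in seen:
            continue  # ignore entities outside the merged set and duplicate triples
        seen.add((entity, label))
        members[label].append(entity)
        total[label] += entity_weights[entity]
        # If triples disagree on a label's weight, keep the largest (an assumption).
        label_weight[label] = max(weight, label_weight.get(label, 0.0))
    weights = {label: total[label] * label_weight[label] for label in total}
    return weights, dict(members)


entity_weights = {"iPhone 13": 1.0, "iPhone 12": 2 / 3, "Fuji apple": 1 / 3}
triples = [
    ("iPhone 13", "mobile phone", 0.8),
    ("iPhone 12", "mobile phone", 0.8),
    ("Fuji apple", "fruit", 0.7),
    ("Banana", "fruit", 0.7),  # not in the merged set, so ignored
]
weights, members = label_recommendation_weights(entity_weights, triples)
for label, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(label, round(w, 3), members[label])
```

Grouping by label means a popular sense ("mobile phone" here) can outrank a sense whose individual entities score well but are few.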
According to the technical solution of the example embodiment of Fig. 2, first, in response to the input query text, the link entity corresponding to the query text is determined, so the target entity corresponding to the query text can be accurately identified through entity linking. Second, based on the link entity, a corresponding entity set is obtained from each of at least two entity embedded representations, which include at least a preset granularity label embedded representation and/or a network topology embedded representation; by fusing fine-grained entity embedding with network topology embedding, the characteristics of entities can be described from different angles, which improves the accuracy of entity recommendation. Third, recommendation weights are determined for the entities in the entity set and entities are recommended based on them, so the content to be recommended can be ranked by recommendation weight, further improving the accuracy of entity recommendation.
Further, to determine the network topology embedded representation of an entity, in an example embodiment, the encyclopedia page of an entity to be processed is parsed and an edge is constructed between the entity to be processed and each of its in-linked entities, forming an undirected graph network; entity sequences corresponding to the entity to be processed are extracted by performing random walks on the graph network; and the network topology embedded representation of the entity to be processed is determined from these entity sequences.
For example, generating the entity network topology embedded representation involves two parts: (1) network graph construction; (2) network-graph-based entity embedding learning. The two parts are described in detail below:
(1) Network graph construction
By parsing the encyclopedia pages, an edge is constructed between the entity of the current page and each in-linked entity on that page; together these edges form an undirected graph.
(2) Entity embedded learning based on network graph
Entity sequences are generated by random walks on the network using a network embedding method such as DeepWalk or Node2vec, and are input as training samples into a context prediction neural network such as Skip-gram, yielding the network topology embedding of each entity.
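Parts (1) and (2) up to the walk corpus can be sketched as follows. This is a minimal sketch under stated assumptions: the encyclopedia pages are assumed to be already parsed into a hypothetical `pages` dict of page entity to in-linked entities, and the walks are uniform, truncated, DeepWalk-style walks (Node2vec would instead bias them with its return and in-out parameters). The walk counts and lengths are illustrative.

```python
import random
from collections import defaultdict


def build_inlink_graph(pages):
    """pages: {page entity: [entities linked from that page]} -> undirected adjacency sets."""
    graph = defaultdict(set)
    for entity, linked in pages.items():
        for other in linked:
            if other != entity:  # ignore self-links
                graph[entity].add(other)
                graph[other].add(entity)
    return dict(graph)


def random_walks(graph, walks_per_node=2, walk_length=5, seed=0):
    """Uniform truncated random walks (DeepWalk-style), one list of entities per walk."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = sorted(graph[walk[-1]])  # sorted for reproducibility
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


pages = {
    "Apple Inc.": ["iPhone", "Steve Jobs"],
    "iPhone": ["iOS", "Apple Inc."],
    "Steve Jobs": ["Apple Inc."],
}
graph = build_inlink_graph(pages)
for walk in random_walks(graph)[:3]:
    print(" -> ".join(walk))
```

Each walk plays the role of a "sentence" whose "words" are entities; the Skip-gram step would then be trained on these walks, for example with gensim's `Word2Vec(sentences=walks, sg=1)`, which this sketch does not include.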
According to the technical solution of this embodiment, generating network topology embedded representations of entities makes it possible to learn the network characteristics between entities from the entity in-link information, and thus to simulate users' click-through (jump) behavior between entities.
Furthermore, in some example embodiments, the at least two entity embedded representations further include a triplet embedded representation, which may be trained with a knowledge representation model such as TransE. Obtaining, based on the link entity, a corresponding entity set from each of the at least two entity embedded representations then includes: obtaining a corresponding first entity set from the preset granularity label embedded representation based on the link entity; obtaining a corresponding second entity set from the network topology embedded representation based on the link entity; and obtaining a corresponding third entity set from the triplet embedded representation based on the link entity.
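TransE, named above as one option for the triplet embedded representation, models a fact (head, relation, tail) as h + r ≈ t. The sketch below shows only its scoring function, negative sampling by corruption, and margin loss on hand-set vectors; it is an illustration of TransE itself, not of the applicant's training setup, and the relation name `performed_by` and all vectors are hypothetical. A full trainer would minimise this loss by gradient descent over many triples.

```python
import math
import random


def transe_distance(head, relation, tail):
    """TransE energy ||h + r - t||_2; lower means the triple is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def corrupt(triple, entities, rng):
    """Negative sample: replace the head or the tail with a different entity."""
    head, relation, tail = triple
    candidates = [e for e in entities if e not in (head, tail)]
    replacement = rng.choice(candidates)
    if rng.random() < 0.5:
        return (replacement, relation, tail)
    return (head, relation, replacement)


def margin_loss(pos, neg, ent, rel, margin=1.0):
    """Hinge loss max(0, margin + d(pos) - d(neg)) for one positive/negative pair."""
    def d(triple):
        h, r, t = triple
        return transe_distance(ent[h], rel[r], ent[t])
    return max(0.0, margin + d(pos) - d(neg))


# Hand-set 2-D vectors standing in for trained TransE embeddings (hypothetical).
ent = {"Qilixiang": [0.0, 0.0], "Zhou Jielun": [1.0, 0.0], "Apple Inc.": [0.0, 1.0]}
rel = {"performed_by": [1.0, 0.0]}
pos = ("Qilixiang", "performed_by", "Zhou Jielun")
neg = corrupt(pos, list(ent), random.Random(0))
print(neg, margin_loss(pos, neg, ent, rel))
```

Once trained, the entity vectors in `ent` form the triplet embedded representation, and the third entity set can be retrieved from them by the same similarity-threshold search used for the other representations.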
According to the technical solution of this example embodiment, fusing fine-grained entity embedding, knowledge triplet embedding and network topology embedding describes the characteristics of an entity as fully as possible from different angles, which improves the accuracy of entity recommendation.
Further, performing entity merging and deduplication on the first entity set, the second entity set and the third entity set to generate a merged entity set; determining occurrence frequencies of all the entities in the combined entity set in the first entity set, the second entity set and the third entity set, determining entity weights corresponding to all the entities in the combined entity set based on the occurrence frequencies of all the entities in the combined entity set, and recommending the entities based on the entity weights.
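The merge, deduplication and frequency-based entity weighting of the preceding paragraph can be sketched as follows. The text does not say how occurrence frequency becomes a weight; dividing by the number of entity sets is an assumption, and the function names and example sets are hypothetical.

```python
from collections import Counter


def merge_with_frequency_weights(*entity_sets):
    """Merge and deduplicate entity sets, weighting each entity by how often it occurs.

    Weight = number of input sets containing the entity / number of sets
    (this normalisation is an assumption).
    """
    counts = Counter()
    for entity_set in entity_sets:
        counts.update(set(entity_set))  # count each entity at most once per set
    n = len(entity_sets)
    return {entity: count / n for entity, count in counts.items()}


def recommend(weights, threshold):
    """Entities whose weight exceeds threshold, highest weight first (ties by name)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [entity for entity, weight in ranked if weight > threshold]


first = {"iPad", "Apple Watch"}                # from label embeddings
second = {"iPad", "Foxconn"}                   # from network topology embeddings
third = {"iPad", "Apple Watch", "Steve Jobs"}  # from triplet embeddings
weights = merge_with_frequency_weights(first, second, third)
print(recommend(weights, threshold=0.5))
```

An entity returned by all three representations (iPad here) gets the highest weight, which is how fusing the representations rewards candidates that look related from several angles.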
According to the technical scheme in the embodiment of the example, entity recommendation is performed by fusing entity fine granularity embedding, knowledge triplet embedding and network topology embedding, and the accuracy of entity recommendation is further improved.
FIG. 3 illustrates a flow diagram for generating a pre-set granularity tag-embedded representation provided in accordance with further embodiments of the present application.
Referring to fig. 3, in step S310, preset granularity tags of respective entities are extracted from unstructured text by a predetermined entity extraction manner, and a < entity, preset granularity tag > tuple is generated.
In an example embodiment, a preset granularity label is a fine-grained label, such as a short sentence, phrase or sentence, corresponding to an entity, and the preset granularity labels of each entity are extracted from unstructured text using a predetermined entity extraction method such as sequence labeling.
For example, the unstructured text may be: "Qilixiang is a song sung by Zhou Jielun, a singer from Taiwan, China, and is included in his album of the same name released in July 2004. In 2004 the song won 3 awards (best composition, best production and best arrangement) at the TVB Top Ten Gold Songs Awards in Hong Kong, China; in 2005 it won several awards, including best song at the 11th Global Chinese Music Chart." Using sequence labeling, <entity, preset granularity label> tuples such as <Qilixiang, song sung by Zhou Jielun> and <Qilixiang, best composition at the 2004 TVB Top Ten Gold Songs Awards in Hong Kong, China> are extracted from this text.
It should be noted that although sequence labeling is used as the example for extracting the tuples, those skilled in the art will understand that the tuples may also be extracted in other ways, such as rule-based methods or MRC (Machine Reading Comprehension)-based methods, which also fall within the protection scope of the embodiments of the present application.
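With sequence labeling, a tagger (not shown; its architecture is not specified in the text) assigns each token a tag, and the label spans are then decoded from the tags. The sketch below assumes a BIO scheme with hypothetical tag names `B-LABEL`/`I-LABEL`/`O` and shows only that decoding step and the tuple construction. It joins tokens with spaces for this English example; Chinese text tagged per character would be joined without separators.

```python
def decode_bio(tokens, tags):
    """Turn B-LABEL / I-LABEL / O tags into label strings."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-LABEL":  # a new label starts here
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-LABEL" and current:  # continue the open label
            current.append(token)
        else:  # "O" (or a stray I-LABEL) closes any open label
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def extract_tuples(entity, tokens, tags):
    """Pair every decoded label with its entity as an <entity, preset granularity label> tuple."""
    return [(entity, label) for label in decode_bio(tokens, tags)]


tokens = ["Qilixiang", "is", "a", "song", "sung", "by", "Zhou", "Jielun", "."]
tags = ["O", "O", "O", "B-LABEL", "I-LABEL", "I-LABEL", "I-LABEL", "I-LABEL", "O"]
print(extract_tuples("Qilixiang", tokens, tags))
# [('Qilixiang', 'song sung by Zhou Jielun')]
```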
In step S320, the <entity, preset granularity label> tuples of each entity are clustered through a clustering operation to form a preset granularity label set for each sense of the same entity.
In an example embodiment, entities with the same name may have different senses; for example, the entity "apple" may denote a fruit, a mobile phone, or a company. The fine-grained labels of the entity are therefore aggregated through a clustering operation such as DBSCAN to form a preset granularity label set for each sense under the same entity name. For example, the mobile-phone sense of the entity "apple" corresponds to the label set {IPHONE7, IPHONE6S, IPHONE}.
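A plain DBSCAN over label vectors is sketched below to show how the labels of one entity name can fall into per-sense groups. It assumes each fine-grained label has already been embedded (the text does not say how), uses cosine distance, and the toy 2-D vectors, `eps` and `min_samples` are illustrative. In practice `sklearn.cluster.DBSCAN(eps=..., min_samples=..., metric="cosine")` provides the same algorithm.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def dbscan(points, eps, min_samples, dist=cosine_distance):
    """Plain DBSCAN: returns a cluster id per point, -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        # The neighbourhood includes the point itself, as in the usual definition.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_samples:  # only core points expand the cluster
                queue.extend(j_neighbours)
    return labels


def group_senses(label_texts, label_vectors, eps=0.05, min_samples=2):
    """Group one entity name's fine-grained labels into per-sense label sets."""
    ids = dbscan(label_vectors, eps, min_samples)
    senses = {}
    for text, cluster_id in zip(label_texts, ids):
        if cluster_id != -1:
            senses.setdefault(cluster_id, []).append(text)
    return list(senses.values())


texts = ["IPHONE7", "IPHONE6S", "IPHONE", "red fruit", "rich in vitamin C", "founded by Steve Jobs"]
vectors = [[1.0, 0.05], [0.98, 0.1], [0.95, 0.0], [0.05, 1.0], [0.1, 0.97], [0.7, 0.7]]
print(group_senses(texts, vectors))
```

DBSCAN suits this step because the number of senses per entity name is not known in advance, and isolated labels are left as noise instead of being forced into a sense.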
In step S330, the preset granularity label set corresponding to each sense is treated as a text, and a preset vector representation method is used to obtain the preset granularity label embedded representation.
In an example embodiment, the preset granularity label set of each sense is treated as a text, and a preset vector representation method such as FastText is used to obtain the preset granularity label embedded representation of the entity.
It should be noted that, although FastText is taken as an example, it should be understood by those skilled in the art that other suitable methods, such as a vector representation method of word2vec or Item2vec, may be used to obtain the preset granularity tag embedded representation of the entity.
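The "label set as a text" idea can be illustrated without a trained model. The sketch below is explicitly not FastText: it is a training-free stand-in that borrows FastText's character n-gram subwords and maps them to a fixed-size vector by signed feature hashing, so sense vectors built from overlapping labels end up similar. The dimension, n-gram range and all names are assumptions.

```python
import hashlib
import math


def char_ngrams(word, n_min=3, n_max=5):
    """FastText-style subwords: character n-grams of the word wrapped in '<' and '>'."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def hashed_vector(tokens, dim):
    """Signed feature hashing: each token adds +1 or -1 to one of `dim` buckets."""
    vec = [0.0] * dim
    for token in tokens:
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0 if (h >> 64) & 1 else -1.0
    return vec


def sense_embedding(label_set, dim=256):
    """Treat one sense's label set as a single text and return an L2-normalised vector."""
    words = " ".join(label_set).lower().split()
    grams = [g for w in words for g in char_ngrams(w)]
    vec = hashed_vector(grams, dim)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def similarity(u, v):
    """Cosine similarity of two L2-normalised vectors (their dot product)."""
    return sum(a * b for a, b in zip(u, v))


phone_a = sense_embedding(["IPHONE7", "IPHONE6S", "IPHONE"])
phone_b = sense_embedding(["IPHONE", "IPHONE X"])
fruit = sense_embedding(["red fruit", "rich in vitamin C"])
print(round(similarity(phone_a, phone_b), 2), round(similarity(phone_a, fruit), 2))
```

A real FastText model would additionally learn the subword vectors from context, so related labels that share no characters could still be close; hashing alone cannot capture that.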
According to the technical scheme in the example embodiment of fig. 3, on one hand, the preset granularity labels of the entities are extracted from the unstructured text through a preset entity extraction mode, so that a large number of specific entity fine granularity labels can be mined from the large-scale unstructured text, more accurate entity description information can be provided, and entity links are more accurate; on the other hand, the labels with finer granularity can improve the distinguishability among entities, and the recommendation results can be more diversified.
Further, in an example embodiment, the tag weight of each sense of the same entity is determined according to the tag frequency of the preset granularity tags corresponding to that sense, and the <entity, preset granularity label, label weight> triple corresponding to each sense of the same entity is generated from the <entity, preset granularity label> tuple and the label weight.
For example, to reflect the popularity of each sense of the same entity, the number of fine-grained labels and the label frequency corresponding to each sense may be counted and normalized, and the normalized result is used as the label weight of that sense, yielding triples such as <apple, mobile phone, 0.8> and <apple, fruit, 0.7>.
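The source does not fix the normalization, so this sketch assumes each (entity, sense label) frequency is divided by the highest frequency seen anywhere in the corpus. The input records and `label_weight_triples` are hypothetical; with the made-up counts below (80, 70, and 100 for a "banana" sense) the two apple senses come out at 0.8 and 0.7.

```python
from collections import Counter

def label_weight_triples(extractions):
    """Turn extracted (entity, sense_label) records into <entity, label, weight> triples.

    Weight = frequency / highest frequency in the corpus, so weights lie in (0, 1].
    """
    freq = Counter(extractions)
    max_freq = max(freq.values())
    return [(entity, label, count / max_freq) for (entity, label), count in freq.items()]

# Hypothetical extraction counts.
records = ([("apple", "mobile phone")] * 80
           + [("apple", "fruit")] * 70
           + [("banana", "fruit")] * 100)
for triple in label_weight_triples(records):
    print(triple)
```

Per-entity normalization (dividing by the entity's own maximum) would be an equally plausible reading; the choice only changes how weights compare across entities.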
According to the technical solution of this embodiment, the tag weight is determined from the tag frequency of the preset granularity tags corresponding to each sense of the same entity, so that entity recommendation based on the tag weight can be more accurate.
Furthermore, in this example embodiment, the preset granularity tags (i.e., fine-grained tags) extracted from large-scale unstructured text may have the same meaning but different expressions, so these tags need to be aligned and fused into a single preset granularity tag. For example, "largest cybersecurity company in China" and "largest domestic cybersecurity company" are both aligned to the unified expression "largest cybersecurity company in China".
By aligning and fusing preset granularity labels that have the same meaning but different expressions, the preset granularity labels describe entities more accurately, and entity recommendation can therefore be more accurate.
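The source does not say how variant expressions are detected, so this sketch simply assumes a ready-made alias table (`ALIASES`, hypothetical) that maps each variant to a canonical label, and shows how aligned labels then have their frequencies merged before weights are computed.

```python
from collections import Counter

# Hypothetical alias table mapping variant expressions to one canonical label.
ALIASES = {
    "largest domestic cybersecurity company": "largest cybersecurity company in China",
}

def align_labels(label_counts, aliases):
    """Rewrite each label to its canonical form and merge the counts of aligned labels."""
    merged = Counter()
    for label, count in label_counts.items():
        merged[aliases.get(label, label)] += count
    return merged

counts = {
    "largest cybersecurity company in China": 3,
    "largest domestic cybersecurity company": 2,
}
print(dict(align_labels(counts, ALIASES)))
```

In a real system the alias table might come from synonym mining or embedding similarity; those are assumptions, not steps stated in the source.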
Fig. 4 is a flow chart illustrating an entity recommendation method according to further embodiments of the present application.
Referring to fig. 4, in step S405, a query text vector corresponding to the query text is generated, and similarity matching is performed between the query text vector and the preset granularity tag embedded representation to obtain a first candidate entity set.
In an example embodiment, the query text input by the user is vectorized using FastText or a similar method to obtain a query text vector, and the query text vector is matched against the preset granularity tag embedded representations of entities to obtain a first candidate entity set.
In step S410, the query text vector is matched with the entity vector in the preset entity library, so as to obtain a second candidate entity set.
In an example embodiment, the query text input by the user is vectorized using FastText or a similar method to obtain a query text vector, and the query text vector is matched against the entity vectors in the preset entity library to obtain a second candidate entity set.
In step S415, entities in the first candidate entity set and the second candidate entity set are scored and ranked, and a link entity corresponding to the query text is determined according to the ranking result.
In an example embodiment, the entities in the first candidate entity set and the second candidate entity set are scored and ranked, and the top-ranked candidate entity is used as the final linked entity.
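Steps S405 to S415 can be sketched as two top-k cosine searches whose results are merged and ranked. The scoring rule (each entity keeps its best cosine score across both candidate sets) is an assumption, since the source does not specify it; the entity names, vectors, and `k=3` are hypothetical, and the query and entity vectors are assumed to live in one embedding space.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, entity_vecs, k):
    """The k entities most similar to the query, as (score, entity) pairs."""
    scored = [(cosine(query_vec, vec), entity) for entity, vec in entity_vecs.items()]
    scored.sort(reverse=True)
    return scored[:k]

def link_entity(query_vec, tag_embeddings, library_vectors, k=3):
    """Union both candidate sets, keep each entity's best score, return the top entity."""
    best = {}
    candidates = top_k(query_vec, tag_embeddings, k) + top_k(query_vec, library_vectors, k)
    for score, entity in candidates:
        best[entity] = max(score, best.get(entity, -1.0))
    return max(best, key=best.get)

# Hypothetical vectors for two senses of "apple" and one library entity.
tag_embeddings = {"apple (mobile phone)": [0.9, 0.1], "apple (fruit)": [0.1, 0.9]}
library_vectors = {"Apple Inc.": [0.7, 0.7]}
print(link_entity([1.0, 0.0], tag_embeddings, library_vectors))
```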
In step S420, based on the linked entity, a corresponding first set of entities is obtained from the preset granularity tag embedded representation.
In an example embodiment, using the linked entity obtained in step S415, a first set of entities similar to the linked entity in the preset granularity tag embedding representation, i.e., the fine granularity tag embedding, is found.
In step S425, a corresponding second set of entities is obtained from the network topology embedded representation based on the linked entity.
In an example embodiment, using the linked entity obtained in step S415, a second set of entities similar to the linked entity is found in the network topology embedding.
In step S430, a corresponding third set of entities is obtained from the triplet embedded representation based on the linked entity.
In an example embodiment, using the linked entity obtained in step S415, a third set of entities similar to the linked entity in the triplet embedding is found. For example, a knowledge graph embedding model such as TransE may be trained to obtain the triplet embedded representations of entities; similarity is then computed between the embedded representation of the linked entity and the triplet embedded representations of other entities, and the entities whose similarity exceeds a predetermined threshold form the third set of entities.
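The retrieval in steps S420 to S430 can share one routine: given a pre-trained embedding table, return the entities whose cosine similarity to the linked entity exceeds a threshold. The sketch also shows the TransE scoring idea (a fact (h, r, t) is plausible when h + r is close to t, here with the L2 norm; TransE also allows L1). Training is not shown; the embeddings, names, and threshold are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similar_entities(linked, embeddings, threshold):
    """Entities (other than `linked`) whose cosine similarity to it exceeds threshold."""
    anchor = embeddings[linked]
    return [e for e, vec in embeddings.items()
            if e != linked and cosine(anchor, vec) > threshold]

def transe_distance(head, relation, tail):
    """TransE plausibility: ||h + r - t||_2; smaller means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hypothetical pre-trained entity embeddings (one such table per embedding type).
embeddings = {"Apple Inc.": [1.0, 0.0], "iPhone": [0.9, 0.1], "banana": [0.0, 1.0]}
print(similar_entities("Apple Inc.", embeddings, threshold=0.8))
print(transe_distance([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
```

Running `similar_entities` once per embedding table (fine-grained tag, network topology, triple) yields the first, second and third entity sets.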
In step S435, entity merging and deduplication are performed on the first entity set, the second entity set, and the third entity set to generate a merged entity set, and the entity weight of each entity is determined according to its occurrence frequency.
In an example embodiment, the first entity set, the second entity set and the third entity set are merged and deduplicated; the occurrence frequency of each entity of the merged entity set across the first, second and third entity sets is then determined, and this occurrence frequency is used as the entity weight of that entity.
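A minimal sketch of the merge step, assuming the entity weight is the raw number of source sets an entity appears in (the source says only "occurrence frequency"). The entity names and `merge_entity_sets` are hypothetical.

```python
from collections import Counter

def merge_entity_sets(*entity_sets):
    """Merge and deduplicate; each entity's weight is the number of sets it occurs in."""
    counts = Counter()
    for entity_set in entity_sets:
        counts.update(set(entity_set))  # count each entity at most once per set
    return dict(counts)

first = ["iPhone", "iPad"]   # from the fine-grained tag embedding
second = ["iPad", "Mac"]     # from the network topology embedding
third = ["iPad"]             # from the triple embedding
print(merge_entity_sets(first, second, third))
```

An entity found by all three embeddings (here "iPad") gets the highest weight, which matches the intuition that agreement across views is stronger evidence.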
In step S440, the preset granularity tags corresponding to each entity in the merged entity set are looked up in the <entity, preset granularity tag, tag weight> triples, so as to obtain a preset granularity tag set corresponding to each entity in the merged entity set.
In an example embodiment, the merged entity set from step S435 is traversed, and for each entity, the preset granularity labels (i.e., fine-grained labels) corresponding to the entity are found in the <entity, preset granularity label, label weight> triples, so as to obtain a preset granularity label set corresponding to each entity in the merged entity set.
In step S445, the entities in the merged entity set are classified based on each preset granularity tag of the preset granularity tag set, so as to obtain an entity set corresponding to each preset granularity tag.
In an example embodiment, according to the preset granularity label set corresponding to each entity in the merged entity set, the entities are grouped using each preset granularity label (i.e., fine-grained label) as a key, yielding an entity set under each preset granularity label; the total entity weight of a preset granularity label is the sum of the entity weights of all entities in its entity set.
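Steps S440 and S445 amount to an inverted grouping: look up each merged entity's tags in the triples, then bucket entities by tag and sum their entity weights. This is a sketch; the triples, weights, and `group_by_tag` are hypothetical.

```python
from collections import defaultdict

def group_by_tag(entity_weights, triples):
    """Group merged entities by fine-grained tag and sum their entity weights.

    entity_weights: entity -> entity weight (from the merge step)
    triples: iterable of (entity, tag, tag_weight)
    Returns (tag -> set of entities, tag -> total entity weight).
    """
    tags_of = defaultdict(set)
    for entity, tag, _tag_weight in triples:
        tags_of[entity].add(tag)
    members = defaultdict(set)
    totals = defaultdict(float)
    for entity, weight in entity_weights.items():
        for tag in tags_of.get(entity, ()):
            members[tag].add(entity)
            totals[tag] += weight
    return dict(members), dict(totals)

entity_weights = {"iPhone": 1, "iPad": 3, "Mac": 1}
triples = [
    ("iPhone", "Apple product", 0.9),
    ("iPad", "Apple product", 0.9),
    ("Mac", "Apple product", 0.9),
    ("iPhone", "smartphone", 0.8),
]
members, totals = group_by_tag(entity_weights, triples)
print(members)
print(totals)
```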
In step S450, a recommendation weight corresponding to the preset granularity tag is determined.
In some example embodiments, the recommendation weight includes the entity weight: the entity weights of the entities in the entity set corresponding to each preset granularity tag are summed to obtain the total entity weight of that preset granularity tag.
In other example embodiments, the recommendation weight includes both the entity weight and the tag weight, and the recommendation weight of each preset granularity tag is determined from its total entity weight and its tag weight in the triples, for example by a weighted combination of the two.
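One possible weighted combination, sketched under stated assumptions: totals are scaled by the largest total so both terms lie in [0, 1], each tag has a single tag weight (for example the largest one found for it in the triples), and `alpha` is a hypothetical mixing parameter. The source does not give the weighting formula.

```python
def recommendation_weights(total_entity_weight, tag_weight, alpha=0.5):
    """Blend each tag's total entity weight with its tag weight.

    Totals are divided by the largest total so both terms lie in [0, 1];
    alpha (hypothetical) sets how much the entity evidence counts.
    """
    max_total = max(total_entity_weight.values())
    return {
        tag: alpha * (total / max_total) + (1 - alpha) * tag_weight.get(tag, 0.0)
        for tag, total in total_entity_weight.items()
    }

weights = recommendation_weights({"Apple product": 5.0, "smartphone": 1.0},
                                 {"Apple product": 0.9, "smartphone": 0.8})
print(weights)
```

Setting `alpha=1.0` reduces this to the entity-weight-only variant described in the previous embodiment (up to the scaling).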
In step S455, entity recommendation is made based on the recommendation weights.
In an example embodiment, the preset granularity tags are presented as recommendation results according to their recommendation weights; for example, the preset granularity tags whose recommendation weights exceed a predetermined threshold are determined and then recommended to and displayed on the client.
According to the technical solution in the example embodiment of fig. 4, on the one hand, the preset granularity labels are fine-grained labels, which provide more accurate entity description information, so the obtained linked entity is more accurate; on the other hand, the network characteristics among entities can be learned from entity link information, simulating users' click-and-jump behavior between entities; in addition, the entity recommendation method that combines fine-grained entity embedding, knowledge triple embedding and network topology embedding can describe the characteristics of an entity from multiple perspectives, thereby improving the accuracy of entity recommendation.
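The final selection can be sketched as a threshold filter followed by sorting by weight. The threshold, the optional `top_n` cap, and the alphabetical tie-break are hypothetical choices, not taken from the source.

```python
def recommend(weights, threshold, top_n=None):
    """Tags whose recommendation weight exceeds threshold, highest weight first."""
    picked = [(tag, w) for tag, w in weights.items() if w > threshold]
    picked.sort(key=lambda tw: (-tw[1], tw[0]))  # ties broken alphabetically
    return [tag for tag, _ in picked[:top_n]]

print(recommend({"Apple product": 0.95, "smartphone": 0.5, "fruit": 0.3}, threshold=0.4))
```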
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Fig. 5 is a schematic structural diagram of an entity recommendation device according to an exemplary embodiment of the present application.
Referring to fig. 5, the entity recommendation apparatus 500 may be implemented as all or part of an apparatus by software, hardware, or a combination of both, and is applied to an electronic device. The entity recommendation apparatus 500 includes a linked entity determination module 510, an entity set determination module 520, and an entity recommendation module 530, wherein:
the link entity determining module 510 is configured to determine, in response to an input query text, a link entity corresponding to the query text.
the entity set determining module 520 is configured to obtain, based on the linked entity, a corresponding entity set from each entity embedded representation of at least two entity embedded representations, where the at least two entity embedded representations include at least: a preset granularity tag embedded representation and/or a network topology embedded representation;
the entity recommending module 530 is configured to determine a recommendation weight corresponding to each entity in the entity set, and to recommend entities based on the recommendation weights.
In some example embodiments, based on the above-described aspects, the apparatus further includes:
the tuple generation module is used for extracting the preset granularity labels of entities from unstructured text through a preset entity extraction mode, to generate <entity, preset granularity label> tuples;
the label set generating module is used for clustering the preset granularity labels in each entity's tuples through a clustering operation, to form a preset granularity label set corresponding to each sense under the same entity;
and the tag embedding representation module is used for treating the preset granularity tag set corresponding to each sense as a text and obtaining the preset granularity tag embedded representation by a preset vector representation method.
In some example embodiments, based on the above-described aspects, the apparatus further includes:
the tag frequency determining module is used for determining tag frequencies of preset granularity tags corresponding to each sense item of the same entity;
the tag weight determining module is used for carrying out normalization processing on the tag frequency and determining the tag weight of each sense item of the same entity based on the normalization processing result;
and the triplet generation module is used for generating the <entity, preset granularity label, label weight> triple corresponding to each sense of the same entity according to the <entity, preset granularity label> tuple and the label weight.
In some example embodiments, based on the above-described scheme, the at least two entity-embedded representations further comprise a triplet-embedded representation, and the entity-set determining module 520 is further configured to:
Based on the link entity, acquiring a corresponding first entity set from the embedded representation of the preset granularity tag;
based on the link entity, acquiring a corresponding second entity set from the network topology embedded representation;
based on the linked entity, a corresponding third set of entities is obtained from the triplet embedded representation.
In some example embodiments, based on the above scheme, the entity recommendation module 530 is further configured to:
performing entity merging and deduplication on the first entity set, the second entity set and the third entity set to generate a merged entity set;
determining the occurrence frequency of each entity in the combined entity set in the first entity set, the second entity set and the third entity set,
and determining entity weights corresponding to the entities in the combined entity set based on the occurrence frequency of the entities in the combined entity set.
In some example embodiments, based on the above-described scheme, the recommendation weights include entity weights and tag weights, and the entity recommendation module 530 is further configured to:
determining the corresponding preset granularity labels of all the entities in the combined entity set in the < entity, preset granularity label and label weight > triples to obtain a preset granularity label set corresponding to all the entities in the combined entity set;
Classifying the entities in the combined entity set based on each preset granularity label of the preset granularity label set to obtain an entity set corresponding to each preset granularity label;
determining the total entity weight of each preset granularity label based on the entity weight of each entity in the entity set corresponding to each preset granularity label;
and determining recommended weights corresponding to the preset granularity labels based on the total entity weights corresponding to the preset granularity labels and the label weights.
In some example embodiments, based on the above-described aspects, the apparatus further includes:
the graph network generation module is used for parsing the encyclopedia page of an entity to be processed and constructing edges between the entity to be processed and the entities linked within that page, to form a graph network;
the entity sequence extraction module is used for performing random walks on the graph network to extract entity sequences corresponding to the entity to be processed;
and the network topology embedding generation module is used for determining the network topology embedding representation corresponding to the entity to be processed based on the entity sequence.
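The graph construction and walk extraction can be sketched in the spirit of DeepWalk. Assumptions: the in-page links are already parsed into a dictionary (`pages` is hypothetical), edges are treated as undirected (the source does not state direction), and walks are uniform and seeded for reproducibility. Turning the walks into embeddings would typically mean feeding them as "sentences" to a skip-gram model such as word2vec, which is not shown here.

```python
import random
from collections import defaultdict

def build_graph(page_links):
    """page_links: entity -> entities linked from its encyclopedia page (edges treated as undirected)."""
    graph = defaultdict(set)
    for entity, linked in page_links.items():
        for other in linked:
            if other != entity:
                graph[entity].add(other)
                graph[other].add(entity)
    return dict(graph)

def random_walks(graph, walks_per_node, walk_length, seed=0):
    """Uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = sorted(graph[walk[-1]])
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

pages = {"Apple Inc.": ["iPhone", "Steve Jobs"], "iPhone": ["iOS"]}
graph = build_graph(pages)
for walk in random_walks(graph, walks_per_node=1, walk_length=4):
    print(" -> ".join(walk))
```

Each walk imitates a user hopping from one entity page to a linked one, which is why such sequences can capture click-and-jump behavior.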
In some example embodiments, based on the above scheme, the link entity determination module 510 is further configured to:
Vectorizing the query text to generate a query text vector corresponding to the query text;
performing similarity matching on the query text vector and the embedded representation of the preset granularity tag to obtain a first candidate entity set;
matching the query text vector with entity vectors in a preset entity library to obtain a second candidate entity set;
scoring and sorting the entities in the first candidate entity set and the second candidate entity set, and determining the link entity corresponding to the query text according to the sorting result.
It should be noted that, when the entity recommendation apparatus provided in the foregoing embodiment executes the entity recommendation method, the division into the functional modules described above is only an example; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
In addition, the entity recommendation device provided in the foregoing embodiments and the entity recommendation method embodiments belong to the same concept; for the detailed implementation process, refer to the method embodiments, which is not repeated here.
The embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are adapted to be loaded by a processor and execute the entity recommendation method in the foregoing embodiment, and a specific execution process may refer to a specific description of the foregoing embodiment, which is not repeated herein.
The embodiment of the present application further provides a computer program product, where at least one instruction is stored in the computer program product, where the at least one instruction is loaded by the processor and executed by the processor to perform the entity recommendation method in the foregoing embodiment, and a specific execution process may refer to a specific description of the foregoing embodiment, which is not repeated herein.
The embodiment of the present application further provides a chip configured to execute the entity recommendation method in the above embodiment, and the specific execution process may refer to the specific description of the above embodiment, which is not described herein.
In addition, referring to fig. 6, a schematic structural diagram of an electronic device is provided in an embodiment of the present application. As shown in fig. 6, the electronic device 600 may include: at least one processor 601, at least one communication module 604, an input output interface 603, a memory 605, at least one communication bus 602.
The communication bus 602 is used to enable connection and communication among these components.
The input/output interface 603 may include a display screen (Display) and a camera (Camera); optionally, the input/output interface 603 may further include a standard wired interface and a standard wireless interface.
The communication module 604 may optionally include a standard wired interface and a wireless interface (e.g., a WIFI interface).
The processor 601 may include one or more processing cores. The processor 601 connects the various parts of the electronic device 600 through various interfaces and lines, and performs the various functions of the electronic device 600 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 605 and invoking the data stored in the memory 605. Optionally, the processor 601 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 601 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; and the modem handles wireless communications. It can be understood that the modem may also not be integrated into the processor 601 and may instead be implemented by a separate chip.
The memory 605 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 605 includes a non-transitory computer-readable storage medium. The memory 605 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 605 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like; the data storage area may store the data referred to in the above method embodiments. Optionally, the memory 605 may also be at least one storage device located remotely from the processor 601. As shown in fig. 6, the memory 605, as a computer storage medium, may include an operating system, a communication module, an input/output interface module, and an entity recommendation application.
In the electronic device 600 shown in fig. 6, the input/output interface 603 is mainly used for providing an input interface for a user, and acquiring data input by the user; and the processor 601 may be configured to invoke the entity recommendation program stored in the memory 605 such that the processor 601 performs steps in the entity recommendation method according to various exemplary embodiments of the present disclosure. For example, the processor 601 may perform the steps as shown in fig. 2: step S210, responding to the input query text, and determining a link entity corresponding to the query text; step S220, based on the link entity, obtaining a corresponding entity set from each entity embedded representation of at least two entity embedded representations, where the at least two entity embedded representations at least include: a preset granularity label embedded representation and/or a network topology embedded representation; step S230, determining recommendation weights corresponding to the entities in the entity set, and recommending the entities based on the recommendation weights.
The foregoing is a schematic solution of an electronic device according to an embodiment of the present disclosure. It should be noted that, the technical solution of the electronic device and the technical solution of the entity recommendation processing method belong to the same concept, and details of the technical solution of the electronic device, which are not described in detail, can be referred to the description of the technical solution of the entity recommendation processing method.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be understood that the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion unless otherwise explicitly defined. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may optionally include other steps or elements that are not listed or that are inherent to such process, method, article, or apparatus. Those of ordinary skill in the art can understand the specific meanings of the above terms in this application according to the specific context. Furthermore, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The foregoing disclosure is merely a preferred embodiment of the present application and is not intended to limit the scope of the claims of the present application; equivalent variations made according to the claims of the present application therefore still fall within the scope of the present application.

Claims (10)

1. An entity recommendation method, the method comprising:
determining a link entity corresponding to an input query text in response to the query text;
based on the linking entity, obtaining a corresponding entity set from each entity embedded representation of at least two entity embedded representations, the at least two entity embedded representations comprising at least: a preset granularity label embedded representation and/or a network topology embedded representation;
and determining recommendation weights corresponding to the entities in the entity set, and recommending the entities based on the recommendation weights.
2. The method according to claim 1, wherein the method further comprises:
extracting preset granularity labels of entities from unstructured text by a preset entity extraction mode to generate <entity, preset granularity label> tuples;
clustering the preset granularity labels in the tuples of each entity through a clustering operation to form a preset granularity label set corresponding to each sense under the same entity;
and taking the preset granularity tag set corresponding to each sense as a text, and obtaining the preset granularity tag embedded representation by a preset vector representation method.
3. The method according to claim 2, wherein the method further comprises:
determining label frequency of preset granularity labels corresponding to each sense item of the same entity;
normalizing the label frequency, and determining the label weight of each sense item of the same entity based on the normalization result;
and generating the <entity, preset granularity label, label weight> triple corresponding to each sense of the same entity according to the <entity, preset granularity label> tuple and the label weight.
4. A method according to any of claims 1 to 3, wherein the at least two entity-embedded representations further comprise a triplet-embedded representation, the obtaining, based on the linked entities, a corresponding set of entities from each of the at least two entity-embedded representations, comprising:
Based on the link entity, acquiring a corresponding first entity set from the embedded representation of the preset granularity tag;
based on the link entity, acquiring a corresponding second entity set from the network topology embedded representation;
based on the linked entity, a corresponding third set of entities is obtained from the triplet embedded representation.
5. The method of claim 4, wherein determining the recommendation weights for each entity in the set of entities comprises:
performing entity merging and deduplication on the first entity set, the second entity set and the third entity set to generate a merged entity set;
determining the occurrence frequency of each entity in the combined entity set in the first entity set, the second entity set and the third entity set,
and determining entity weights corresponding to the entities in the combined entity set based on the occurrence frequency of the entities in the combined entity set.
6. The method of claim 5, wherein the recommendation weights include entity weights and tag weights, and wherein the determining recommendation weights for each entity in the set of entities comprises:
Determining the corresponding preset granularity labels of all the entities in the combined entity set in the < entity, preset granularity label and label weight > triples to obtain a preset granularity label set corresponding to all the entities in the combined entity set;
classifying the entities in the combined entity set based on each preset granularity label of the preset granularity label set to obtain an entity set corresponding to each preset granularity label;
determining the total entity weight of each preset granularity label based on the entity weight of each entity in the entity set corresponding to each preset granularity label;
and determining recommended weights corresponding to the preset granularity labels based on the total entity weights corresponding to the preset granularity labels and the label weights.
7. The method according to claim 1, wherein the method further comprises:
parsing an encyclopedia page of an entity to be processed and constructing edges between the entity to be processed and the entities linked within that page, so as to form a graph network;
extracting entity sequences corresponding to the entity to be processed by performing random walks on the graph network;
and determining the network topology embedded representation corresponding to the entity to be processed based on the entity sequence.
8. An entity recommendation device, the device comprising:
the link entity determining module is used for responding to the input query text and determining a link entity corresponding to the query text;
the entity set determining module is configured to obtain, based on the link entity, a corresponding entity set from each entity embedded representation of at least two entity embedded representations, where the at least two entity embedded representations include at least: a preset granularity label embedded representation and/or a network topology embedded representation;
and the entity recommending module is used for determining recommending weights corresponding to the entities in the entity set and recommending the entities based on the recommending weights.
9. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method according to any one of claims 1 to 7.
10. An electronic device, comprising: a processor and a memory storing a computer program adapted to be loaded by the processor and to perform the steps of the method according to any one of claims 1 to 7.
CN202210840269.5A 2022-07-18 2022-07-18 Entity recommendation method and device, storage medium and electronic equipment Pending CN117473038A (en)

Priority Applications (1)

Application Number: CN202210840269.5A (published as CN117473038A); Priority Date: 2022-07-18; Filing Date: 2022-07-18; Title: Entity recommendation method and device, storage medium and electronic equipment

Publications (1)

Publication Number: CN117473038A; Publication Date: 2024-01-30

Family

ID=89631640

Similar Documents

Publication Publication Date Title
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
KR101754473B1 (en) Method and system for automatically summarizing documents to images and providing the image-based contents
US20130060769A1 (en) System and method for identifying social media interactions
CN111046221A (en) Song recommendation method and device, terminal equipment and storage medium
CN112989208B (en) Information recommendation method and device, electronic equipment and storage medium
CN109325146A (en) Video recommendation method, device, storage medium and server
CN107526718A (en) Method and apparatus for generating text
Qian et al. Detecting new Chinese words from massive domain texts with word embedding
CN112685648A (en) Resource recommendation method, electronic device and computer-readable storage medium
US20090327877A1 (en) System and method for disambiguating text labeling content objects
WO2015084757A1 (en) Systems and methods for processing data stored in a database
CN112417133A (en) Training method and device of ranking model
CN116955591A (en) Recommendation language generation method, related device and medium for content recommendation
Wei et al. Online education recommendation model based on user behavior data analysis
CN111400456A (en) Information recommendation method and device
CN116823410B (en) Data processing method, object processing method, recommending method and computing device
CN112926341A (en) Text data processing method and device
CN113032676A (en) Recommendation method and system based on micro-feedback
Fu et al. Attribute‐Sentiment Pair Correlation Model Based on Online User Reviews
CN114647739B (en) Entity linking method, device, electronic equipment and storage medium
CN113807920A (en) Artificial intelligence based product recommendation method, device, equipment and storage medium
CN117473038A (en) Entity recommendation method and device, storage medium and electronic equipment
CN116484085A (en) Information delivery method, device, equipment, storage medium and program product
CN113822065A (en) Keyword recall method and device, electronic equipment and storage medium
Li Deep Learning‐Based Natural Language Processing Methods for Sentiment Analysis in Social Networks

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination