CN117743674A - Resource recall method, computer device and storage medium - Google Patents


Info

Publication number
CN117743674A
Authority
CN
China
Prior art keywords
resource
topic
node
nodes
training
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310177157.0A
Other languages
Chinese (zh)
Inventor
沈慧
苏睿龙
魏逸
欧宝源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaohongshu Technology Co ltd
Original Assignee
Xiaohongshu Technology Co ltd
Application filed by Xiaohongshu Technology Co ltd
Priority to CN202310177157.0A
Publication of CN117743674A
Current legal status: Pending


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a resource recall method, a computer device and a storage medium. The method comprises: acquiring seed resources with which a target object has generated interaction behaviors within a preset time period, together with topic information of the seed resources; searching, based on the seed resources and a heterogeneous graph, for a first resource whose similarity to the seed resource is higher than a similarity threshold, wherein the heterogeneous graph comprises a plurality of nodes, the nodes comprise topic nodes and resource nodes, and the weight of the edge between any two nodes indicates the degree of association between them; searching, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information; and taking the first resource and the second resource as recall resources of the target object. With this method, resources of interest to an object can be found accurately based on the topics the object is interested in, which improves the accuracy of resource recall.

Description

Resource recall method, computer device and storage medium
Technical Field
The present disclosure relates to the field of computer applications, and in particular, to a resource recall method, a computer device, and a storage medium.
Background
With the rapid development of computing technology, recommending resources to users (for example, through resource recall) in social applications and search scenarios has become an active research topic. Currently, when resources are recalled for a target user, the recall resources are generally determined by calculating the degree of matching between the target user and the resources from the target user's interaction behaviors. However, the inventors found in practice that recall usually performs poorly for target users with few interaction behaviors: with little interaction data, the degree of matching between the target user and the resources is difficult to calculate accurately, and the error is large. How to improve the accuracy of resource recall is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The embodiments of the present application provide a resource recall method, a computer device and a storage medium, which can accurately find resources of interest to an object based on the topics the object is interested in, thereby improving the accuracy of resource recall.
In one aspect, an embodiment of the present application provides a resource recall method, where the method includes:
acquiring seed resources with which a target object has generated interaction behaviors within a preset time period, and topic information of the seed resources;
searching, based on the seed resources and a heterogeneous graph, for a first resource whose similarity to the seed resource is higher than a similarity threshold, wherein the heterogeneous graph comprises a plurality of nodes, the nodes comprise topic nodes and resource nodes, and the weight of the edge between any two of the nodes indicates the degree of association between those two nodes;
searching, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information; and
taking the first resource and the second resource as recall resources of the target object.
In one embodiment, searching, based on the seed resource and the heterogeneous graph, for the first resource whose similarity to the seed resource is higher than the similarity threshold includes:
determining, in the heterogeneous graph, a target resource node indicating the seed resource;
acquiring the distance between the target resource node and each first candidate node, based on at least one first candidate node connected to the target resource node and on the weights of the edges among the target resource node and the at least one first candidate node; and
searching for first candidate nodes whose distance to the target resource node is smaller than a distance threshold, and taking the resources indicated by the found first candidate nodes as the first resource.
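The one-hop search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the graph is a hypothetical adjacency dictionary, node identifiers carry a hypothetical `res:`/`topic:` prefix to mark their type, and distance is taken as `1 / weight`, which is only one possible negatively correlated mapping (the text requires only that a larger edge weight means a smaller distance).

```python
from typing import Dict, List, Tuple

# Hypothetical representation: node id -> {neighbor id: edge weight}.
# Node ids are prefixed "res:" or "topic:" to mark their type (an assumption).
Graph = Dict[str, Dict[str, float]]


def edge_distance(weight: float) -> float:
    """Assumed mapping: distance = 1 / weight, so larger weights give smaller distances."""
    return 1.0 / weight


def find_first_resources(
    graph: Graph, target_node: str, dist_threshold: float
) -> List[Tuple[str, float]]:
    """Return (resource node, distance) pairs for neighbors closer than dist_threshold."""
    results = []
    for candidate, weight in graph.get(target_node, {}).items():
        if weight <= 0:
            continue  # no usable association on this edge
        distance = edge_distance(weight)
        # Only resource nodes indicate resources that can be recalled.
        if distance < dist_threshold and candidate.startswith("res:"):
            results.append((candidate, distance))
    return sorted(results, key=lambda item: item[1])
```

For example, with `{"res:1": {"res:2": 0.8, "res:3": 0.2, "topic:1": 0.9}}` and a threshold of 2.0, only `res:2` (distance 1.25) is returned: `res:3` is too far (distance 5.0) and `topic:1` is not a resource node.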
In one embodiment, searching, based on the topic information and the heterogeneous graph, for the second resource associated with the topic information includes:
determining, in the heterogeneous graph, a target topic node indicating the topic information;
acquiring the distance between the target topic node and each second candidate node, based on at least one second candidate node connected to the target topic node and on the weights of the edges among the target topic node and the at least one second candidate node; and
searching for second candidate nodes whose distance to the target topic node is smaller than a distance threshold, and taking the resources indicated by the found second candidate nodes as the second resource.
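A minimal sketch of the topic-side search, together with the final step of taking the first and second resources as recall resources, is shown below. It assumes the same hypothetical adjacency dictionary, `res:`/`topic:` prefixes and `1 / weight` distance as before; keeping the smallest distance when several seed topics reach the same resource, and deduplicating the merged list in first-seen order, are illustrative choices not stated in the source.

```python
from typing import Dict, Iterable, List

# Hypothetical representation: node id -> {neighbor id: edge weight}.
Graph = Dict[str, Dict[str, float]]


def find_second_resources(
    graph: Graph, topic_nodes: Iterable[str], dist_threshold: float
) -> Dict[str, float]:
    """Map each resource node near any target topic node to its smallest distance."""
    best: Dict[str, float] = {}
    for topic in topic_nodes:
        for candidate, weight in graph.get(topic, {}).items():
            if weight <= 0 or not candidate.startswith("res:"):
                continue
            distance = 1.0 / weight  # assumed inverse mapping from weight to distance
            if distance < dist_threshold and distance < best.get(candidate, float("inf")):
                best[candidate] = distance
    return best


def merge_recall(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Union of first and second resources, deduplicated, keeping first-seen order."""
    return list(dict.fromkeys([*first, *second]))
```

Here a note tagged with two of the user's topics is recalled once, and a resource found by both searches appears only once in the merged recall list.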
In one embodiment, the first resource and the second resource are found by calling a resource recall model, and the method further includes:
obtaining a training sample, wherein the training sample includes topic information of a plurality of interacted resources on which interaction behaviors occurred within a historical time period, and at least one group of training resource pairs, each group containing two training resources that are similar to each other;
calling an initial resource recall model to acquire, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource and an initial heterogeneous graph, the distance between each training resource contained in the at least one group of training resource pairs and every other such training resource, wherein the initial heterogeneous graph is constructed from the training resources in the at least one group of training resource pairs and the topic information of the interacted resources; and
training the initial resource recall model, with the training target of reducing the distance between two neighbor nodes in the initial heterogeneous graph and increasing the distance between two non-neighbor nodes in the initial heterogeneous graph, to obtain the resource recall model.
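One common way to express "pull neighbors together, push non-neighbors apart" is a triplet margin loss over node embeddings. The sketch below uses that technique with squared Euclidean distance and a plain gradient step; the loss form, the margin, the learning rate and the embedding representation are all assumptions, since the source states only the training target, not the model or loss.

```python
from typing import List

Vec = List[float]


def sq_dist(x: Vec, y: Vec) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def triplet_loss(anchor: Vec, pos: Vec, neg: Vec, margin: float = 1.0) -> float:
    """Hinge loss: zero once the neighbor is closer than the non-neighbor by `margin`."""
    return max(0.0, sq_dist(anchor, pos) - sq_dist(anchor, neg) + margin)


def train_step(anchor: Vec, pos: Vec, neg: Vec, lr: float = 0.1, margin: float = 1.0) -> None:
    """One gradient-descent step on the three embeddings, updated in place."""
    if triplet_loss(anchor, pos, neg, margin) == 0.0:
        return  # margin already satisfied, gradient is zero
    # Gradients of ||a - p||^2 - ||a - n||^2, computed before any update.
    g_a = [2 * (n - p) for p, n in zip(pos, neg)]
    g_p = [-2 * (a - p) for a, p in zip(anchor, pos)]
    g_n = [2 * (a - n) for a, n in zip(anchor, neg)]
    for i in range(len(anchor)):
        anchor[i] -= lr * g_a[i]
        pos[i] -= lr * g_p[i]
        neg[i] -= lr * g_n[i]
```

Starting from anchor `[0, 0]`, neighbor `[1, 0]` and non-neighbor `[0.5, 0]`, one step with `lr=0.1` moves the anchor and neighbor closer and lowers the loss from 1.75 to about 1.24.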
In one embodiment, the initial heterogeneous graph is constructed as follows:
determining resource nodes indicating the training resources in the training sample, and topic nodes indicating the topic information in the training sample;
traversing every pair of nodes among the determined resource nodes and topic nodes; when the two currently traversed nodes are one topic node and one resource node, determining the weight of the edge between them based on the topic information of the training resource indicated by the resource node and the topic information of the plurality of interacted resources;
when the two currently traversed nodes are two topic nodes, determining the weight of the edge between them based on the number of training resources, among the training resources contained in the training sample, corresponding to the topic information indicated by each of the two topic nodes, and the number of training resources shared by the topic information of both topic nodes;
when the two currently traversed nodes are two resource nodes, determining the weight of the edge between them based on whether the training resources they indicate belong to the same group of training resource pairs; and
after the traversal is finished, constructing the initial heterogeneous graph from the determined resource nodes and topic nodes, the weights of the edges between resource nodes and topic nodes, and the weights of the edges between topic nodes.
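The construction can be sketched as below. The topic-topic weight follows the ratio described in the next embodiment (shared resources divided by the smaller topic size). The other two weights are placeholders: the source does not give their formulas, so this sketch simply assumes weight 1.0 for a resource carrying a topic and 1.0 for two resources in the same training pair. The function and parameter names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical representation: node id -> {neighbor id: edge weight}.
Graph = Dict[str, Dict[str, float]]


def add_edge(graph: Graph, u: str, v: str, weight: float) -> None:
    """Add an undirected weighted edge."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


def build_initial_graph(
    resource_topics: Dict[str, Set[str]], pairs: List[Tuple[str, str]]
) -> Graph:
    graph: Graph = {}
    topic_to_res: Dict[str, Set[str]] = {}
    # Resource-topic edges (assumed weight 1.0 when the resource carries the topic).
    for res, topics in resource_topics.items():
        graph.setdefault("res:" + res, {})
        for t in topics:
            add_edge(graph, "res:" + res, "topic:" + t, 1.0)
            topic_to_res.setdefault(t, set()).add(res)
    # Resource-resource edges for the two members of a training pair (assumed 1.0).
    for a, b in pairs:
        add_edge(graph, "res:" + a, "res:" + b, 1.0)
    # Topic-topic edges: shared resources / size of the smaller topic.
    for t1, t2 in combinations(sorted(topic_to_res), 2):
        r1, r2 = topic_to_res[t1], topic_to_res[t2]
        shared = len(r1 & r2)
        if shared:
            add_edge(graph, "topic:" + t1, "topic:" + t2, shared / min(len(r1), len(r2)))
    return graph
```

Iterating over `combinations` of topics corresponds to traversing every pair of topic nodes; pairs with no shared resources get no edge.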
In one embodiment, determining the weight of the edge between the two currently traversed topic nodes, based on the number of training resources corresponding to the topic information indicated by each of them and the number of training resources shared by the topic information of both, among the training resources contained in the training sample, includes:
comparing the numbers of training resources corresponding to the topic information indicated by the two currently traversed topic nodes, and determining the smaller of the two numbers; and
taking the ratio of the number of training resources shared by the topic information of the two currently traversed topic nodes to that smaller number as the weight of the edge between the two topic nodes.
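Written as a formula, with R(t) denoting (notation introduced here for illustration) the set of training resources carrying topic t, the topic-topic weight described above is the overlap coefficient of the two resource sets:

```latex
w(t_i, t_j) = \frac{\lvert R(t_i) \cap R(t_j) \rvert}{\min\bigl(\lvert R(t_i) \rvert,\ \lvert R(t_j) \rvert\bigr)}
```

As an illustrative example, if 40 training resources carry topic t_i, 10 carry t_j and 6 carry both, then w(t_i, t_j) = 6 / min(40, 10) = 0.6. Dividing by the smaller count keeps the weight in [0, 1] and lets a niche topic whose resources are mostly also tagged with a broad topic still receive a strong edge to that broad topic.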
In one embodiment, determining the resource nodes indicating the training resources in the training sample and the topic nodes indicating the topic information in the training sample includes:
filtering long-tail topics out of the topic information contained in the training sample to obtain filtered topic information, and determining a topic node for each piece of filtered topic information, where a long-tail topic is topic information whose number of corresponding training resources, among the training resources contained in the training sample, is smaller than a first number threshold; and
filtering long-tail resources out of the training resources contained in the training sample to obtain filtered training resources, and determining a resource node for each filtered training resource, where a long-tail resource is a training resource whose number of connections to other training resources contained in the training sample is smaller than a second number threshold.
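A minimal sketch of the two filters is given below. The thresholds correspond to the first and second number thresholds in the text; interpreting a resource's "connections to other training resources" as the number of training pairs it appears in is an assumption made for this illustration, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def filter_long_tail(
    resource_topics: Dict[str, Set[str]],
    pairs: List[Tuple[str, str]],
    min_topic_count: int,   # first number threshold
    min_connections: int,   # second number threshold
) -> Tuple[Set[str], Set[str]]:
    """Return (kept topics, kept resources) after dropping long-tail items."""
    # Number of training resources carrying each topic.
    topic_count = Counter(t for topics in resource_topics.values() for t in topics)
    kept_topics = {t for t, c in topic_count.items() if c >= min_topic_count}
    # Assumed notion of "connections": how many training pairs a resource appears in.
    degree: Counter = Counter()
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    kept_resources = {r for r in resource_topics if degree[r] >= min_connections}
    return kept_topics, kept_resources
```

Filtering before building the graph keeps rare topics and isolated resources from producing noisy, weakly supported edges.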
In one embodiment, calling the initial resource recall model to acquire, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource and the initial heterogeneous graph, the distance between each training resource and every other training resource contained in the at least one group of training resource pairs includes:
sampling nodes in the initial heterogeneous graph based on the weights of the edges between resource nodes and topic nodes and the weights of the edges between topic nodes, to obtain a plurality of node triplets, each containing three nodes; and
for each node triplet, acquiring the distance between every two nodes in the triplet, where whether two nodes are neighbor nodes or non-neighbor nodes is determined by their connection relation in the initial heterogeneous graph.
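The weight-based triplet sampling might look like the following sketch: pick an anchor node, draw a neighbor with probability proportional to edge weight as the positive, and draw a uniformly random non-neighbor as the negative. The exact sampling scheme is not specified in the source, so the proportional-to-weight choice, the uniform negative sampling and the names used here are assumptions.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: node id -> {neighbor id: edge weight}.
Graph = Dict[str, Dict[str, float]]


def sample_triplets(graph: Graph, num: int, rng: random.Random) -> List[Tuple[str, str, str]]:
    """Sample up to `num` (anchor, neighbor, non-neighbor) node triplets."""
    nodes = sorted(graph)
    triplets = []
    for _ in range(num):
        anchor = rng.choice(nodes)
        neighbors = graph[anchor]
        non_neighbors = [n for n in nodes if n != anchor and n not in neighbors]
        if not neighbors or not non_neighbors:
            continue  # anchor cannot form a valid triplet
        names = list(neighbors)
        # Positive: neighbor drawn with probability proportional to edge weight.
        pos = rng.choices(names, weights=[neighbors[n] for n in names], k=1)[0]
        # Negative: any node not connected to the anchor.
        neg = rng.choice(non_neighbors)
        triplets.append((anchor, pos, neg))
    return triplets
```

Each triplet then feeds the neighbor/non-neighbor distance comparison used as the training target.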
In another aspect, an embodiment of the present application provides a resource recall device, including:
an acquisition unit, configured to acquire seed resources with which a target object has generated interaction behaviors within a preset time period, and topic information of the seed resources;
a first searching unit, configured to search, based on the seed resources and a heterogeneous graph, for a first resource whose similarity to the seed resource is higher than a similarity threshold, wherein the heterogeneous graph comprises a plurality of nodes, the nodes comprise topic nodes and resource nodes, and the weight of the edge between any two of the nodes indicates the degree of association between those two nodes;
a second searching unit, configured to search, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information; and
a determining unit, configured to take the first resource and the second resource as recall resources of the target object.
In another aspect, an embodiment of the present application provides a computer device including a processor, a storage device and a communication interface that are connected to one another, where the storage device is configured to store a computer program that supports the computer device in performing the above method, the computer program includes program instructions, and the processor is configured to invoke the program instructions to perform the following steps:
acquiring seed resources with which a target object has generated interaction behaviors within a preset time period, and topic information of the seed resources;
searching, based on the seed resources and a heterogeneous graph, for a first resource whose similarity to the seed resource is higher than a similarity threshold, wherein the heterogeneous graph comprises a plurality of nodes, the nodes comprise topic nodes and resource nodes, and the weight of the edge between any two of the nodes indicates the degree of association between those two nodes;
searching, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information; and
taking the first resource and the second resource as recall resources of the target object.
In another aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the above-described resource recall method.
In another aspect, embodiments of the present application provide a computer program product comprising a computer program adapted to be loaded by a processor and to perform the above-described resource recall method.
In the embodiments of the present application, topics are a common feature of community-style products: for content creators, topics express the subject of their content more directly, and for consumers, topics make it easier to find content on subjects they are interested in. On this basis, seed resources with which a target object has generated interaction behaviors within a preset time period, and the topic information of those seed resources, are acquired. A first resource whose similarity to the seed resource is higher than a similarity threshold is then searched for based on the seed resources and a heterogeneous graph, where the heterogeneous graph comprises a plurality of nodes including topic nodes and resource nodes, and the weight of the edge between any two nodes indicates the degree of association between them. A second resource associated with the topic information is searched for based on the topic information and the heterogeneous graph, and the first resource and the second resource are taken as recall resources of the target object. In this way, topic information and resources are mapped into a unified semantic space and the relations between them are learned, so that resources of interest to the target object can be found accurately and the richness of the resources recalled for the target object is improved. Moreover, when the target object has few interaction behaviors, its recall resources cannot be identified accurately from that small amount of interaction data alone, whereas the embodiments of the present application can still find resources of interest to the object based on the topics it is interested in, thereby improving the accuracy of resource recall.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a resource recall method according to an embodiment of the present application;
FIG. 2 is a flow chart of another resource recall method provided by an embodiment of the present application;
FIG. 3a is a schematic diagram of a heterogeneous graph provided by an embodiment of the present application;
FIG. 3b is a schematic diagram of another heterogeneous graph provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a resource recall device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the overall pipeline of a recommendation system, recall is the most basic module. Its purpose is to narrow the candidate set over which later stages compute, by selecting from a large pool of candidates a subset the user may be interested in; the closer the recalled resources are to the user's interests, the higher the accuracy of the whole recommendation system. However, shortly after a user registers, or when the user rarely interacts with the content release platform, only a few of the user's interaction behaviors can be collected, and the resources the user may be interested in cannot be calculated accurately from so few interactions. That is, existing recall approaches cannot guarantee the accuracy of resource recall in scenarios where the user has few interaction behaviors.
The inventors found that topics are a common feature of community-style products: for content creators, topics express the subject of their content more directly, and for consumers, topics make it easier to find content on subjects they are interested in. Accordingly, topic information and resources are mapped into a unified semantic space so that the relations between them are learned, and resources of interest to an object are found accurately based on the topics the object is interested in. This can improve the accuracy of resource recall and increase user engagement and stickiness on the content release platform.
A target object refers to a user, and in particular may refer to at least one user, or at least one type of user.
A resource refers to a resource published on a content release platform, such as an electronic resource or a physical resource. Resources may include advertisements, videos, audio, text, images, merchandise, and the like; for example, resources may include notes or short videos.
In the specific embodiments of the present application, where data related to an object is involved, such as the seed resources with which a target object has generated interaction behaviors within a preset time period, the permission or consent of the object must be obtained when the embodiments are applied to specific products or technologies, and the collection, use and processing of such data must comply with the applicable local laws, regulations and standards.
The resource recall method provided by the embodiments of the present application can be applied to a resource recall device. The resource recall device may be installed in or integrated into a content release platform, which may run on a computer device. The computer device may be a terminal device or a server, including but not limited to a smartphone, an in-vehicle device, a wearable device or a computer.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of a resource recall method provided in an embodiment of the present application. The method may be executed by a resource recall device or a computer device, and the resource recall scheme shown in FIG. 1 includes, but is not limited to, steps S101 to S104:
S101, acquiring seed resources with which the target object has generated interaction behaviors within a preset time period, and topic information of the seed resources.
A seed resource is a resource with which the target object has generated an interaction behavior within the preset time period; there may be one or more seed resources. The preset time period is a period of preset length ending at the current system time, for example the last month, the last week, or the last hour. An interaction behavior is an operation through which the target object interacts with the seed resource, such as liking, sharing, commenting on or collecting it. The topic information of a seed resource describes the content subject of the seed resource. For example, if the seed resource is a note sharing a set of infant tuina (pediatric massage) techniques for strengthening the lungs, aimed at autumn, when day-night temperature differences are large and children are prone to coughing and diarrhea, the topic information of the seed resource may be "infant tuina".
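Step S101 can be sketched as a filter over an interaction log. The event tuple layout, the action labels and the function name are hypothetical; the window is a `timedelta` standing in for the preset time period (for example, the last week), ending at the current time.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

# Interaction types named in the text (like, share, comment, collect); labels are hypothetical.
INTERACTION_TYPES = {"like", "share", "comment", "collect"}


def get_seed_resources(
    events: List[Tuple[str, str, datetime]],
    now: datetime,
    window: timedelta,
    resource_topics: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each resource interacted with during [now - window, now] to its topics."""
    start = now - window
    seeds: Dict[str, Set[str]] = {}
    for resource_id, action, timestamp in events:
        if action in INTERACTION_TYPES and start <= timestamp <= now:
            seeds[resource_id] = set(resource_topics.get(resource_id, set()))
    return seeds
```

A seed with an empty topic set is a candidate for the description-based fallback described next.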
For example, when recalling resources for the target object, the seed resources with which the target object interacted within the preset time period may be obtained first. Because the target object interacted with a seed resource, the seed resource is likely to be of interest to the target object, so first resources whose similarity to the seed resource is higher than the similarity threshold may be searched for based on the seed resource and the heterogeneous graph. Because the topics of the seed resource are likely to be topics of interest to the target object, the topic information of the seed resource may also be acquired, and second resources associated with the topic information searched for based on the topic information and the heterogeneous graph. The first resources and the second resources are then taken as recall resources of the target object. In other words, if the target object is interested in a seed resource, it is very likely also interested in the topics of that seed resource. The embodiments of the present application therefore find both resources similar to the seed resource and resources under topics similar to the topics of the seed resource; since these resources are very likely to interest the target object, using them as recall resources improves the accuracy of resource recall.
Optionally, if the seed resource has no topic information, its topic information may be obtained from the description information of the seed resource. The description information includes one or more of: brand, keywords, categories, labels, and the like. For example, the description information may be used directly as the topic information of the seed resource, or keywords may be extracted from the description information to obtain the topic information.
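The fallback can be sketched as follows. The order in which description fields are tried, and the naive frequency-based keyword extraction for free text, are illustrative assumptions (a production system, especially for Chinese text, would use a proper keyword extractor); the function and field names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set


def derive_topics(
    topics: Set[str], description: Dict[str, List[str]], text: str = "", top_k: int = 2
) -> Set[str]:
    """Return existing topics, else description fields, else naive keywords from text."""
    if topics:
        return set(topics)
    # Use structured description fields directly as topic information (assumed order).
    for field in ("labels", "categories", "keywords", "brand"):
        values = description.get(field)
        if values:
            return set(values)
    # Naive keyword extraction: most frequent words longer than 3 letters.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 3)
    return {w for w, _ in counts.most_common(top_k)}
```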
S102, searching a first resource with similarity higher than a similarity threshold value with the seed resource based on the seed resource and the heterogeneous graph.
The heterogeneous graph comprises a plurality of nodes, including topic nodes and resource nodes, and the weight of the edge between any two nodes indicates the degree of association between them. That is, the heterogeneous graph characterizes resource-resource, resource-topic and topic-topic relations. Using these relations, both resources similar to the seed resource (the first resource) and resources under topics similar to the topics of the seed resource (the second resource) can be searched for.
In one implementation, a target resource node indicating the seed resource is determined in the heterogeneous graph; the distance between the target resource node and each first candidate node is obtained based on at least one first candidate node connected to the target resource node and on the weights of the edges among the target resource node and the at least one first candidate node; first candidate nodes whose distance to the target resource node is smaller than the distance threshold are searched for; and the resources indicated by the found first candidate nodes are taken as the first resource.
In one example, assume the seed resource includes resource 1, the target resource node indicating resource 1 in the heterogeneous graph is resource node 1, and resource node 1 is connected to resource node 2 and resource node 3. The at least one first candidate node connected to resource node 1 then includes resource node 2 and resource node 3. The distance between resource node 1 and resource node 2 may be obtained from the weight of the edge between them. The distance between two nodes is negatively correlated with the weight of their edge: the greater the weight of the edge between resource node 1 and resource node 2, the smaller the distance between them, that is, the more similar the resources they indicate. Similarly, the distance between resource node 1 and resource node 3 may be obtained from the weight of the edge between them. If the distance between a resource node and resource node 1 is smaller than the distance threshold, the similarity between the resource indicated by that node and the resource indicated by resource node 1 can be understood to be higher than the similarity threshold, so the resource indicated by that node may be taken as a first resource. The distance threshold is preset; it may be set empirically by developers or learned by a neural network, which is not limited in the embodiments of the present application.
Optionally, if the distance between a resource node and resource node 1 is smaller than the distance threshold, then in addition to taking the resource indicated by that node as a first resource, the next-hop nodes of that resource node (that is, the nodes connected to it) may also be obtained. If a next-hop node is a resource node, the distance between the two is obtained from the weight of the edge between them; if that distance is smaller than the distance threshold, the distance between resource node 1 and the next-hop node may be regarded as smaller than the distance threshold, meaning the similarity between the resource indicated by the next-hop node and the resource indicated by resource node 1 is higher than the similarity threshold, so the resource indicated by the next-hop node may be taken as a first resource.
Alternatively, if the next-hop node is a topic node (for example, topic node 3), the distance between the resource node and topic node 3 is obtained from the weight of the edge between them; if this distance is smaller than the distance threshold, the resource indicated by the resource node is determined to be strongly related to the topic indicated by topic node 3. The next-hop nodes of topic node 3 may then be obtained. If a next-hop node of topic node 3 is a resource node, the distance between topic node 3 and that node is obtained from the weight of the edge between them; if it is smaller than the distance threshold, the distance between resource node 1 and that node may be regarded as smaller than the distance threshold, meaning the similarity between the resource it indicates and the resource indicated by resource node 1 is higher than the similarity threshold, so that resource may be taken as a first resource.
In another example, assume the seed resource includes resource 1, the target resource node indicating resource 1 in the heterogeneous graph is resource node 1, and resource node 1 is connected to topic node 1 and topic node 2. The at least one first candidate node connected to resource node 1 then includes topic node 1 and topic node 2. The distance between resource node 1 and topic node 1 may be obtained from the weight of the edge between them; as before, distance is negatively correlated with edge weight, so the greater the weight of the edge between resource node 1 and topic node 1, the smaller the distance between them, that is, the more closely the resource indicated by resource node 1 is related to the topic indicated by topic node 1. Similarly, the distance between resource node 1 and topic node 2 may be obtained from the weight of the edge between them.
If the distance between a topic node and resource node 1 is less than the distance threshold, the topic indicated by that topic node is strongly related to the resource indicated by resource node 1, and the next-hop node of the topic node can be obtained. If the next-hop node is a resource node, the distance between the topic node and the next-hop node is obtained from the weight of the edge between them. If that distance is less than the distance threshold, the distance between resource node 1 and the next-hop node can be determined to be less than the distance threshold; that is, the similarity between the resource indicated by the next-hop node and the resource indicated by resource node 1 is higher than the similarity threshold, so the resource indicated by the next-hop node can be used as a first resource.
Optionally, if the topic indicated by topic node 1 is strongly related to the resource indicated by resource node 1 and the next-hop node of topic node 1 is another topic node (for example, topic node 4), the distance between topic node 1 and topic node 4 is obtained from the weight of the edge between them. If that distance is less than the distance threshold, the topics indicated by topic node 1 and topic node 4 are similar topics. The next-hop node of topic node 4 can then be obtained. If it is a resource node and the distance between topic node 4 and that node, obtained from the weight of their edge, is less than the distance threshold, the distance between resource node 1 and the next-hop node can be determined to be less than the distance threshold. The similarity between the resource indicated by the next-hop node and the resource indicated by resource node 1 is therefore higher than the similarity threshold, and the resource indicated by the next-hop node can be used as a first resource.
In this embodiment of the present application, the number of hops between the seed resource and the first resource may be limited. Specifically, if the number of hops is limited to less than n, the first resource is searched for among the first-hop, second-hop, …, and (n-1)-th-hop nodes of the seed resource. For example, if the number of hops is limited to less than 3, the first resource is searched for among the first-hop and second-hop nodes of the seed resource.
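The bounded search in S102 can be sketched as a breadth-first walk over a weighted adjacency map. This is a minimal sketch, not the applicant's implementation: the helper names, the toy graph and the mapping distance = 1 / weight are assumptions (the text only says that distance and edge weight are inversely related). Each hop is accepted only if its own edge distance is below the threshold, mirroring the per-hop checks in the examples above, and `max_hops = n - 1` implements the "fewer than n hops" limit.

```python
from collections import deque


def edge_distance(weight: float) -> float:
    # Assumption: distance = 1 / weight. The text only states that distance and
    # edge weight are inversely related; zero-weight edges are treated as absent.
    return float("inf") if weight <= 0 else 1.0 / weight


def find_first_resources(graph, node_type, seed, dist_threshold, max_hops):
    """Hypothetical helper: collect resource nodes reachable from `seed` in at
    most `max_hops` hops (i.e. fewer than n = max_hops + 1), where every edge on
    the path has a distance below `dist_threshold`."""
    found = set()
    visited = {seed}
    queue = deque([(seed, 0)])  # breadth-first: (node, hops from seed)
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop limit reached; do not expand further
        for nbr, weight in graph.get(node, {}).items():
            if nbr in visited or edge_distance(weight) >= dist_threshold:
                continue  # already seen, or only weakly related
            visited.add(nbr)
            if node_type[nbr] == "resource":
                found.add(nbr)  # strongly related resource -> first resource
            queue.append((nbr, hops + 1))  # topic or resource: keep walking
    return found


# Toy graph mirroring the examples: r* are resource nodes, t* are topic nodes.
graph = {
    "r1": {"t1": 0.5, "t2": 0.1, "r2": 1.0},
    "t1": {"r1": 0.5, "r3": 0.8, "t4": 0.6},
    "t2": {"r1": 0.1, "r5": 1.0},
    "t4": {"t1": 0.6, "r4": 0.9},
    "r2": {"r1": 1.0},
    "r3": {"t1": 0.8},
    "r4": {"t4": 0.9},
    "r5": {"t2": 1.0},
}
node_type = {n: "resource" if n.startswith("r") else "topic" for n in graph}

print(sorted(find_first_resources(graph, node_type, "r1", 2.5, max_hops=2)))  # ['r2', 'r3']
print(sorted(find_first_resources(graph, node_type, "r1", 2.5, max_hops=3)))  # ['r2', 'r3', 'r4']
```

With hops limited to fewer than 3 (`max_hops=2`), r4 is not recalled because it lies three hops away via two similar topics (the resource-topic-topic-resource path of the topic node 4 example); raising the limit admits it. r5 is never recalled because the edge r1-t2 is too weak.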
S103, searching a second resource related to the topic information based on the topic information and the heterogeneous graph.
In one implementation, a target topic node indicating the topic information can be determined in the heterogeneous graph. Based on at least one second candidate node connected to the target topic node and the weights of the edges among the target topic node and the at least one second candidate node, the distance between the target topic node and each second candidate node is obtained. Second candidate nodes whose distance to the target topic node is less than the distance threshold are then searched for, and the resources indicated by the second candidate nodes found are used as second resources.
In one example, assume that the seed resource includes resource 1, the topic information of resource 1 includes topic 1, the node indicating topic 1 in the heterogeneous graph is topic node 1, and topic node 1 is connected to resource node 2 and resource node 3. The at least one second candidate node connected to topic node 1 then includes resource node 2 and resource node 3. The distance between resource node 2 and topic node 1 is obtained from the weight of the edge between them; if it is less than the distance threshold, the resource indicated by resource node 2 is strongly related to the topic indicated by topic node 1 and is used as a second resource. Likewise, the distance between resource node 3 and topic node 1 is obtained from the weight of their edge; if it is greater than or equal to the distance threshold, the resource indicated by resource node 3 is only weakly related to the topic indicated by topic node 1 and is not used as a second resource.
Optionally, if the distance between a resource node and topic node 1 is less than the distance threshold, the resource indicated by that resource node is used as a second resource, and in addition the next-hop node of the resource node (that is, a node connected to it) can be obtained. If the next-hop node is also a resource node, the distance between the two resource nodes is obtained from the weight of the edge between them. If that distance is less than the distance threshold, the distance between topic node 1 and the next-hop node can be determined to be less than the distance threshold; the resource indicated by the next-hop node is then strongly related to the topic indicated by topic node 1 and can be used as a second resource.
In another example, assume that the seed resource includes resource 1, the topic information of resource 1 includes topic 1, the node indicating topic 1 in the heterogeneous graph is topic node 1, and topic node 1 is connected to topic node 2. The at least one second candidate node connected to topic node 1 then includes topic node 2. The distance between topic node 1 and topic node 2 is obtained from the weight of the edge between them; if it is less than the distance threshold, the topics indicated by topic node 1 and topic node 2 are similar topics. The next-hop node of topic node 2 can then be obtained. If it is a resource node and the distance between topic node 2 and that node, obtained from the weight of their edge, is less than the distance threshold, the distance between topic node 1 and the next-hop node can be determined to be less than the distance threshold. The resource indicated by the next-hop node is therefore strongly related to the topic indicated by topic node 1 and can be used as a second resource.
In this embodiment of the present application, the number of hops between the topic information of the seed resource and the second resource may also be limited. If it is limited to less than n, the second resource is searched for among the first-hop, second-hop, …, and (n-1)-th-hop nodes of the topic information of the seed resource. For example, if the number of hops is limited to less than 3, the second resource is searched for among the first-hop and second-hop nodes of the topic information of the seed resource.
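A matching sketch for S103 expands from the target topic node(s) one hop at a time. As before, the function name, the toy data and the mapping distance = 1 / weight are illustrative assumptions. The toy graph contains both cases described above: a resource reached through another strongly related resource (r2 then r7) and a resource reached through a similar topic (t2 then r6). The last lines show S104 as a plain set union, which is one straightforward reading of taking the first and second resources as recall resources.

```python
def find_second_resources(graph, node_type, topic_nodes, dist_threshold, max_hops):
    """Hypothetical helper: expand hop by hop from the target topic node(s) and
    keep every resource node reached through edges whose distance (assumed to
    be 1 / weight) is below `dist_threshold`."""
    found = set()
    visited = set(topic_nodes)
    frontier = list(topic_nodes)
    for _ in range(max_hops):  # hop 1, hop 2, ..., hop max_hops
        next_frontier = []
        for node in frontier:
            for nbr, weight in graph.get(node, {}).items():
                if nbr in visited or weight <= 0 or 1.0 / weight >= dist_threshold:
                    continue  # seen already, or only weakly related
                visited.add(nbr)
                next_frontier.append(nbr)
                if node_type[nbr] == "resource":
                    found.add(nbr)  # strongly related to the topic -> second resource
        frontier = next_frontier
    return found


graph = {
    "t1": {"r2": 1.0, "r3": 0.2, "t2": 0.5},
    "t2": {"t1": 0.5, "r6": 0.9},
    "r2": {"t1": 1.0, "r7": 1.0},
    "r3": {"t1": 0.2},
    "r6": {"t2": 0.9},
    "r7": {"r2": 1.0},
}
node_type = {n: "resource" if n.startswith("r") else "topic" for n in graph}

second = find_second_resources(graph, node_type, ["t1"], 2.5, max_hops=2)
print(sorted(second))  # ['r2', 'r6', 'r7']  (r3 is too weakly related)

# S104 sketch: the recall set is the union of first and second resources.
first = {"r2", "r9"}  # stand-in for the output of the first-resource search
print(sorted(first | second))  # ['r2', 'r6', 'r7', 'r9']
```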
S104, taking the first resource and the second resource as recall resources of the target object.
In a specific implementation, the recall resources of the target object, comprising the first resources and the second resources, can be generated and then applied in a recommendation system to recommend resources to the target object.
In this embodiment of the application, the seed resource with which the target object interacted within the preset time period and the topic information of the seed resource are obtained. A first resource whose similarity to the seed resource is higher than the similarity threshold is searched for based on the seed resource and the heterogeneous graph, where the heterogeneous graph comprises a plurality of nodes including topic nodes and resource nodes, and the weight of the edge between any two nodes indicates their degree of association. A second resource associated with the topic information is searched for based on the topic information and the heterogeneous graph, and the first and second resources are used as the recall resources of the target object. Because recall draws both on similarity to the seed resource and on the topics the object is interested in, the accuracy of resource recall can be improved.
Based on the above description, refer to Fig. 2, which is a flowchart of another resource recall method provided in an embodiment of the present application. The method may be executed by a resource recall device or a computer device. The resource recall scheme shown in Fig. 2 includes, but is not limited to, steps S201 to S207:
S201, acquiring a training sample, where the training sample includes the topic information of a plurality of interacted resources that had interaction behavior within a historical time period, and at least one group of training resource pairs.
Each group of training resource pairs includes two training resources that are similar to each other.
Optionally, the resource pairs produced each hour by the Swing algorithm may be used as training resource pairs. Swing is a recall algorithm proposed by Alibaba that exploits the local user-item-user graph structure: for two users who both clicked the same pair of items, the fewer other items those two users have clicked in common, the stronger the evidence that the pair of items is similar.
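For reference, a simplified form of the Swing score is sketched below. It is for illustration only: it omits any extra user-level weighting that production variants may apply, uses an assumed smoothing constant `alpha = 1`, and does not show how hourly scores are turned into training pairs (for example by thresholding), which the text leaves unspecified. In the toy data both item pairs are co-clicked by exactly two users, but (a, b) scores higher because its two users share fewer items overall.

```python
from itertools import combinations


def swing_scores(clicks, alpha=1.0):
    """Simplified Swing similarity (no per-user weighting):
    sim(i, j) = sum over user pairs {u, v} who both clicked i and j of
                1 / (alpha + |items clicked by both u and v|)."""
    item_users = {}
    for user, items in clicks.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    scores = {}
    for i, j in combinations(sorted(item_users), 2):
        common_users = item_users[i] & item_users[j]
        score = 0.0
        for u, v in combinations(sorted(common_users), 2):
            # The more items u and v share, the weaker this evidence is.
            score += 1.0 / (alpha + len(clicks[u] & clicks[v]))
        if score > 0:
            scores[(i, j)] = score
    return scores


clicks = {
    "u1": {"a", "b"},
    "u2": {"a", "b"},
    "u3": {"c", "d", "x", "y"},
    "u4": {"c", "d", "x", "y"},
}
scores = swing_scores(clicks)
print(round(scores[("a", "b")], 3))  # 0.333: u1 and u2 share only 2 items
print(round(scores[("c", "d")], 3))  # 0.2: u3 and u4 share 4 items
```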
S202, invoking an initial resource recall model and, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource, and an initial heterogeneous graph, obtaining the distance between each training resource contained in the at least one group of training resource pairs and every other training resource contained therein.
The initial heterogeneous graph is constructed based on the training resources in the at least one group of training resource pairs and the topic information of each interacted resource.
Optionally, the initial heterogeneous graph is constructed as follows. Resource nodes indicating each training resource in the training sample and topic nodes indicating each piece of topic information in the training sample are determined, and every pair of nodes among the determined resource nodes and topic nodes is traversed. If the two currently traversed nodes are one topic node and one resource node, the weight of the edge between them is determined based on how many pieces of topic information of the training resource indicated by the resource node exist in the topic information of the plurality of interacted resources. If the two currently traversed nodes are both topic nodes, the weight of the edge between them is determined based on the number of training resources, among those contained in the training sample, corresponding to the topic information indicated by each topic node and the number of training resources shared by the two topic nodes. If the two currently traversed nodes are both resource nodes, the weight of the edge between them is determined based on whether the training resources they indicate belong to the same group of training resource pairs. After the traversal, the initial heterogeneous graph is constructed from the determined resource nodes and topic nodes together with the determined edge weights.
Optionally, determining the weight of the edge between the currently traversed resource node and the currently traversed topic node may include: counting how many pieces of topic information of the training resource indicated by the currently traversed resource node exist in the topic information of the plurality of interacted resources, and taking the reciprocal of this count as the weight of the edge between the currently traversed resource node and the currently traversed topic node.
For example, assume that the training sample includes 100 pieces of topic information (topic information 1, topic information 2, …, topic information 100), and that the topic information of the training resource indicated by the currently traversed resource node includes topic information 3, topic information 5 and topic information 120. Of these, only topic information 3 and topic information 5 exist in the topic information of the plurality of interacted resources, so the count is 2 and the weight of the edge between the currently traversed resource node and the currently traversed topic node is 1/2. In this embodiment of the present application, this weight characterizes the degree of correlation between the resource indicated by the resource node and the topic indicated by the topic node: the fewer of the resource's topics that exist in the topic information of the plurality of interacted resources, the greater the edge weight, and the more strongly the resource and the topic are correlated.
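The resource-topic rule and the worked example above reduce to a single reciprocal. In this sketch the function name is hypothetical, `Fraction` keeps the result exact, and returning 0 when none of the resource's topics is present is an assumption, since the text does not cover that case. Topic information 120 does not occur among the interacted resources, so it has no topic node and does not count.

```python
from fractions import Fraction


def resource_topic_weight(resource_topics, interacted_topics):
    """Weight of the edge between a resource node and each of its topic nodes:
    the reciprocal of how many of the resource's topics also appear in the
    topic information of the interacted resources."""
    present = resource_topics & interacted_topics
    if not present:
        return Fraction(0)  # assumption: no valid topic -> no edge
    return Fraction(1, len(present))


interacted_topics = {f"topic information {k}" for k in range(1, 101)}
resource_topics = {"topic information 3", "topic information 5", "topic information 120"}
print(resource_topic_weight(resource_topics, interacted_topics))  # 1/2
```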
Optionally, determining the weight of the edge between the two currently traversed topic nodes, based on the number of training resources (among those contained in the training sample) corresponding to the topic information indicated by each topic node and the number of training resources shared by the two topic nodes, includes: comparing the numbers of training resources corresponding to the topic information indicated by the two topic nodes and taking the smaller one; and using the ratio of the number of shared training resources to this smaller number as the weight of the edge between the two topic nodes.
In a specific implementation, the weight of the edge between the two currently traversed topic nodes is therefore obtained by dividing the number of training resources shared by their topic information by the smaller of the two topics' training-resource counts in the training sample.
For example, assume that the currently traversed topic nodes are topic node 1, indicating topic 1, and topic node 2, indicating topic 2. Among the training resources contained in the training sample, the topic information of training resource 1, training resource 5 and training resource 10 includes topic 1, so 3 training resources correspond to topic 1. Likewise, the topic information of training resource 5, training resource 12, training resource 15 and training resource 20 includes topic 2, so 4 training resources correspond to topic 2. The only training resource shared by topic 1 and topic 2 is training resource 5, so the number of shared training resources is 1. The smaller of the two counts is 3, so the weight of the edge between topic node 1 and topic node 2 is 1/3.
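The topic-topic rule is the overlap coefficient: shared items divided by the size of the smaller set. A minimal sketch reproducing the example above, with a hypothetical function name and an assumed weight of 0 when either topic has no resources:

```python
from fractions import Fraction


def topic_topic_weight(resources_a, resources_b):
    """Shared training resources divided by the smaller of the two topics'
    training-resource counts (the overlap coefficient)."""
    smaller = min(len(resources_a), len(resources_b))
    if smaller == 0:
        return Fraction(0)  # assumption: empty topic -> no edge
    return Fraction(len(resources_a & resources_b), smaller)


topic1 = {"training resource 1", "training resource 5", "training resource 10"}
topic2 = {"training resource 5", "training resource 12",
          "training resource 15", "training resource 20"}
print(topic_topic_weight(topic1, topic2))  # 1/3
```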
In this embodiment of the application, the weight of the edge between two topic nodes characterizes how similar the indicated topics are. A topic corresponding to more resources than a number threshold is a hot topic. Compared with using the raw count of shared resources, which grows with a topic's popularity, normalizing by the smaller resource count as described above lowers the edge weights of hot topics and thus reduces their negative influence. This heat reduction when computing topic-topic edge weights helps the model learn more structural information and reduces the Matthew effect in the system.
Optionally, the weight of the edge between the two currently traversed resource nodes is determined based on whether the training resources they indicate belong to the same group of training resource pairs: if they do, the weight is 1; otherwise, the weight is 0.
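The resource-resource rule amounts to a lookup in the set of training pairs. In this sketch the helper names are hypothetical, and weight-0 pairs are represented by the absence of an edge, which is an assumption about storage rather than something the text specifies:

```python
def resource_resource_edges(training_pairs):
    """Weight-1 edges between the two resources of each training pair.
    Resources that never appear together in a training pair get weight 0,
    represented here simply by having no edge."""
    edges = {}
    for a, b in training_pairs:
        if a == b:
            continue  # ignore degenerate pairs
        edges.setdefault(a, {})[b] = 1
        edges.setdefault(b, {})[a] = 1
    return edges


def resource_resource_weight(edges, a, b):
    # 1 if a and b were in the same training resource pair, otherwise 0.
    return edges.get(a, {}).get(b, 0)


edges = resource_resource_edges([("r1", "r2"), ("r3", "r4")])
print(resource_resource_weight(edges, "r2", "r1"))  # 1
print(resource_resource_weight(edges, "r1", "r3"))  # 0
```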
In one implementation, determining the resource nodes indicating each training resource and the topic nodes indicating each piece of topic information in the training sample includes: filtering out long-tail topics from the topic information contained in the training sample and determining a topic node for each piece of remaining topic information; and then filtering out long-tail resources from the training resources contained in the training sample and determining a resource node for each remaining training resource. Filtering out long-tail topics and long-tail resources prevents them from adversely affecting model training and improves the accuracy of the trained model.
A long-tail topic is a piece of topic information to which fewer training resources than a first number threshold correspond, among the training resources contained in the training sample. The first number threshold may be preset, set by a developer based on experience, or learned through a neural network; the embodiments of the present application do not specifically limit this. For example, if the first number threshold is 6, a long-tail topic is topic information to which fewer than 6 of the training resources in the training sample correspond.
A long-tail resource is a training resource, among those contained in the training sample, whose number of connections to other training resources is less than a second number threshold. The second number threshold may likewise be preset, set by a developer based on experience, or learned through a neural network, which is not specifically limited by the embodiments of the present application. For example, if the second number threshold is 10, a long-tail resource is one with fewer than 10 connections to other training resources in the training sample. The number of connections of a resource may refer to the number of nodes in the heterogeneous graph that are connected to the resource node indicating that resource.
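Both long-tail filters are simple threshold tests. A minimal sketch, assuming the topic-to-resource mapping and each resource's connected nodes have already been collected; the function name and toy data are hypothetical, and the default thresholds are the example values 6 and 10 from the text:

```python
def filter_long_tail(topic_to_resources, resource_neighbors,
                     first_threshold=6, second_threshold=10):
    """Drop long-tail topics (fewer than `first_threshold` training resources)
    and long-tail resources (fewer than `second_threshold` connected nodes)."""
    kept_topics = {
        topic for topic, resources in topic_to_resources.items()
        if len(resources) >= first_threshold
    }
    kept_resources = {
        resource for resource, neighbors in resource_neighbors.items()
        if len(neighbors) >= second_threshold
    }
    return kept_topics, kept_resources


topic_to_resources = {
    "hot topic": {f"r{k}" for k in range(8)},  # 8 resources -> kept
    "rare topic": {"r1", "r2"},                 # 2 resources -> long tail
}
resource_neighbors = {
    "r1": {f"n{k}" for k in range(12)},         # 12 connections -> kept
    "r2": {"n1", "n2"},                         # 2 connections -> long tail
}
print(filter_long_tail(topic_to_resources, resource_neighbors))
# ({'hot topic'}, {'r1'})
```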
In one implementation, invoking the initial resource recall model and obtaining, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource and the initial heterogeneous graph, the distance between each training resource and every other training resource in the at least one group of training resource pairs includes: sampling nodes in the initial heterogeneous graph according to the weights of the resource-topic edges and the topic-topic edges to obtain a plurality of node triplets, each containing three nodes; and, for each node triplet, obtaining the distance between every two nodes in the triplet. Whether two nodes are neighbor nodes or non-neighbor nodes is determined based on their connection relationship in the initial heterogeneous graph.
In a specific implementation, node vectors can be learned with the GraphSAGE (Graph Sample and Aggregate) graph neural network algorithm, sampling 2-hop neighbors according to edge probabilities. There are six sampling strategies in total: topic-topic, topic-resource, topic-resource-topic, resource-resource, resource-topic, and resource-topic-resource. Taking the heterogeneous graph shown in Fig. 3a as an example, the sampling strategies starting from a topic node may include topic-topic, topic-resource and topic-resource-topic. As shown in Fig. 3b, the sampling strategies starting from a resource node may include resource-resource, resource-topic and resource-topic-resource.
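One way to realize weighted metapath sampling and the node triplets of S202 is sketched below. Everything here is an assumption for illustration, not the applicant's sampler: the helper names, the rule that a walk does not revisit nodes already on its path, and the reading of a triplet as (anchor, sampled neighbor, negative node of the neighbor's type that is not connected to the anchor). Transition probabilities are proportional to edge weights via `random.Random.choices`.

```python
import random

# The six sampling strategies (metapaths) listed in the text.
METAPATHS = [
    ("topic", "topic"), ("topic", "resource"), ("topic", "resource", "topic"),
    ("resource", "resource"), ("resource", "topic"), ("resource", "topic", "resource"),
]


def sample_metapath(graph, node_type, start, metapath, rng):
    """Weighted walk along `metapath`: at each step pick a neighbor of the
    required type with probability proportional to the edge weight.
    Assumption: nodes already on the path are not revisited."""
    if node_type[start] != metapath[0]:
        return None
    path = [start]
    for wanted in metapath[1:]:
        candidates = [(n, w) for n, w in graph.get(path[-1], {}).items()
                      if node_type[n] == wanted and w > 0 and n not in path]
        if not candidates:
            return None  # the walk cannot follow this metapath
        nodes, weights = zip(*candidates)
        path.append(rng.choices(nodes, weights=weights, k=1)[0])
    return path


def make_triplet(graph, node_type, start, metapath, rng):
    """Hypothetical triplet: (anchor, sampled neighbor, negative), where the
    negative has the neighbor's type and is not connected to the anchor."""
    path = sample_metapath(graph, node_type, start, metapath, rng)
    if path is None:
        return None
    positive = path[-1]
    negatives = [n for n, t in node_type.items()
                 if t == node_type[positive] and n not in (start, positive)
                 and n not in graph.get(start, {})]
    if not negatives:
        return None
    return start, positive, rng.choice(negatives)


graph = {
    "r1": {"t1": 1.0},
    "t1": {"r1": 1.0, "r2": 0.5},
    "r2": {"t1": 0.5},
    "r3": {},
    "t2": {},
}
node_type = {n: "resource" if n.startswith("r") else "topic" for n in graph}
rng = random.Random(0)
print(make_triplet(graph, node_type, "r1", ("resource", "topic", "resource"), rng))
# ('r1', 'r2', 'r3')
```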
S203, training the initial resource recall model to obtain the resource recall model, with reducing the distance between two neighbor nodes in the initial heterogeneous graph and increasing the distance between two non-neighbor nodes in the initial heterogeneous graph as the training objective.
In a specific implementation, the loss value can be calculated with a loss function based on the distances between neighbor nodes and between non-neighbor nodes in the initial heterogeneous graph. The loss function is:
Loss=-[0.5*log_sigmoid(<topic,note+>-<topic,note->)+0.5*log_sigmoid(<topic,topic+>-<topic,topic->)]
Here, Loss is the loss value; topic is a topic node; note+ is a sampled neighbor resource node (note) of the topic node in vector space; note- is a negatively sampled resource node of the topic node; topic+ is a sampled neighbor topic node of the topic node; and topic- is a negatively sampled topic node. For the loss to decrease as positives move closer, <a, b> must be read as a similarity score between the two node vectors (for example, their inner product), so a larger value means a smaller distance. The loss function thus pulls a node (n) as close as possible to its sampled neighbor node (n+) in vector space and pushes it as far as possible from its negatively sampled node (n-).
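The loss can be computed directly from node vectors. This sketch assumes <a, b> is the inner product and uses a numerically stable log-sigmoid; the function names and vectors are illustrative. When both positives score higher than the negatives the loss approaches 0, and when the ordering is reversed it grows roughly linearly with the margin.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topic_loss(topic, note_pos, note_neg, topic_pos, topic_neg):
    """Loss = -[0.5*log_sigmoid(<topic,note+> - <topic,note->)
               + 0.5*log_sigmoid(<topic,topic+> - <topic,topic->)],
    with <a, b> taken to be the inner product of the node vectors."""
    resource_term = log_sigmoid(dot(topic, note_pos) - dot(topic, note_neg))
    topic_term = log_sigmoid(dot(topic, topic_pos) - dot(topic, topic_neg))
    return -(0.5 * resource_term + 0.5 * topic_term)


t = [1.0, 0.0]
near, far = [5.0, 0.0], [-5.0, 0.0]
print(round(topic_loss(t, near, far, near, far), 3))  # 0.0: positives are close
print(round(topic_loss(t, far, near, far, near), 3))  # 10.0: positives are far
```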
In the embodiments of the application, the heterogeneous graph is constructed from the distribution relationship between topics and resources, and different walk (sampling) strategies are used so that information from different node types is fully connected, which improves the generalization capability of the model. Second, in the initial heterogeneous graph construction stage, node selection fully considers the degree of correlation between topics and resources, which ensures recall precision during model training. In addition, because topics are far coarser-grained than resources, the model learns not only topic-resource distances but also topic-topic distances, which can improve the efficiency and accuracy of the model's predictions for topics.
S204, obtaining the seed resource with which the target object interacted within the preset time period, and the topic information of the seed resource.
S205, calling a resource recall model, and searching for a first resource with similarity higher than a similarity threshold value with the seed resource based on the seed resource and the heterogeneous graph.
The heterogeneous graph comprises a plurality of nodes, the nodes comprise topic nodes and resource nodes, and the weights of edges of every two nodes in the nodes are used for indicating the association degree between every two nodes.
S206, searching a second resource associated with the topic information based on the topic information and the heterogeneous graph.
S207, the first resource and the second resource are taken as recall resources of the target object.
For steps S204 to S207, refer to the detailed description of steps S101 to S104 in the above embodiments; details are not repeated herein.
In this embodiment of the application, a training sample is obtained that includes the topic information of a plurality of interacted resources with interaction behavior within a historical time period and at least one group of training resource pairs. An initial resource recall model is invoked to obtain, based on the training resources in the training resource pairs, the topic information of each interacted resource and the initial heterogeneous graph, the distance between each training resource and every other training resource in the pairs, and the model is trained into the resource recall model with reducing the distance between neighbor nodes and increasing the distance between non-neighbor nodes in the initial heterogeneous graph as the training objective. The seed resource with which the target object interacted within the preset time period and its topic information are then obtained; the resource recall model is invoked to search, based on the seed resource and the heterogeneous graph, for a first resource whose similarity to the seed resource is higher than the similarity threshold, and a second resource associated with the topic information is searched for based on the topic information and the heterogeneous graph. Using the first and second resources as the recall resources of the target object makes the recall resources richer and more accurate.
The present embodiment also provides a computer storage medium having stored therein program instructions for implementing the corresponding method described in the above embodiments when executed.
Refer to Fig. 4, which is a schematic structural diagram of a resource recall device according to an embodiment of the present application.
In one implementation, the resource recall device includes the following units.
an obtaining unit 401, configured to obtain the seed resource with which the target object interacted within a preset time period, and the topic information of the seed resource;
a first searching unit 402, configured to search, based on the seed resource and the heterogeneous graph, for a first resource whose similarity to the seed resource is higher than a similarity threshold; the heterogeneous graph comprises a plurality of nodes, including topic nodes and resource nodes, and the weight of the edge between every two nodes indicates the degree of association between them;
a second searching unit 403, configured to search, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information;
a determining unit 404, configured to take the first resource and the second resource as recall resources of the target object.
In one embodiment, the first searching unit 402 searching, based on the seed resource and the heterogeneous graph, for a first resource whose similarity to the seed resource is higher than the similarity threshold includes:
determining, in the heterogeneous graph, a target resource node indicating the seed resource;
acquiring the distance between the target resource node and each first candidate node based on at least one first candidate node which has a connection relation with the target resource node and the weight of the edges of each two nodes in the target resource node and the at least one first candidate node;
searching a first candidate node with the distance between the first candidate node and the target resource node being smaller than a distance threshold value, and taking the resource indicated by the searched first candidate node as the first resource.
In one embodiment, when searching, based on the topic information and the heterogeneous graph, for the second resource associated with the topic information, the second searching unit 403 performs the following:
determining a target topic node in the heterogeneous graph that indicates the topic information;
obtaining the distance between the target topic node and each second candidate node, based on at least one second candidate node connected to the target topic node and the weights of the edges between every two nodes among the target topic node and the at least one second candidate node;
searching for a second candidate node whose distance to the target topic node is smaller than a distance threshold, and taking the resource indicated by the found second candidate node as the second resource.
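A similar sketch covers the topic-based search and the merge performed by the determining unit. It assumes that the topic information of a seed resource may contain several topics, keeps for each resource the smallest distance over all target topic nodes, and reuses the hypothetical `1 - weight` distance. `find_second_resources` and `merge_recall` are illustrative names, and placing first resources before second resources with de-duplication is an assumption rather than something the text requires.

```python
from typing import Dict, Iterable, List, Tuple

Node = Tuple[str, str]  # hypothetical (kind, key) node encoding
Graph = Dict[Node, Dict[Node, float]]


def find_second_resources(graph: Graph, topics: Iterable[str], distance_threshold: float) -> List[str]:
    best: Dict[str, float] = {}
    for topic in topics:
        # Second candidate nodes: nodes directly connected to the target topic node.
        for (kind, key), weight in graph.get(("topic", topic), {}).items():
            dist = 1.0 - weight  # same hypothetical weight-to-distance mapping
            if kind == "resource" and dist < distance_threshold:
                best[key] = min(dist, best.get(key, float("inf")))
    return sorted(best, key=lambda k: (best[k], k))  # closest first


def merge_recall(first: Iterable[str], second: Iterable[str]) -> List[str]:
    # Recall resources: first resources, then second resources not already recalled.
    seen, merged = set(), []
    for key in [*first, *second]:
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


# Only the topic-node rows needed for this example are listed.
graph: Graph = {
    ("topic", "hiking"): {("resource", "note_1"): 0.7, ("resource", "note_4"): 0.8, ("topic", "camping"): 0.5},
    ("topic", "camping"): {("resource", "note_5"): 0.9, ("topic", "hiking"): 0.5},
}
second = find_second_resources(graph, ["hiking", "camping"], 0.35)
print(second)                            # ['note_5', 'note_4', 'note_1']
print(merge_recall(["note_2"], second))  # ['note_2', 'note_5', 'note_4', 'note_1']
```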
In one embodiment, the first resource and the second resource are found by invoking a resource recall model;
the obtaining unit 401 is further configured to obtain a training sample, where the training sample includes the topic information of a plurality of interacted resources, that is, resources that received interaction behavior within a historical time period, and at least one group of training resource pairs; each group of training resource pairs comprises two training resources, one of which is similar to the other;
the obtaining unit 401 is further configured to invoke an initial resource recall model and obtain, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource, and an initial heterogeneous graph, the distance between each training resource contained in the at least one group of training resource pairs and every other training resource contained therein; the initial heterogeneous graph is constructed based on the training resources in the at least one group of training resource pairs and the topic information of each interacted resource.
The resource recall device may further include a training unit 405, configured to train the initial resource recall model, with the training target of reducing the distance between two neighbor nodes in the initial heterogeneous graph and increasing the distance between two non-neighbor nodes in the initial heterogeneous graph, to obtain the resource recall model.
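The training target (pull neighbor nodes together, push non-neighbor nodes apart) has the usual shape of a triplet margin loss over node embeddings. The sketch below assumes that the resource recall model assigns each node a vector and uses squared Euclidean distance, a margin of 1.0, and plain gradient descent with hand-derived gradients; the application does not state its model architecture, loss, or optimizer, so all of these are hypothetical choices.

```python
import random
from typing import Dict, List, Tuple

Vec = List[float]


def sq_dist(u: Vec, v: Vec) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Vec, pos: Vec, neg: Vec, margin: float = 1.0) -> float:
    # Zero once the non-neighbor is at least `margin` farther (squared) than the neighbor.
    return max(0.0, sq_dist(anchor, pos) - sq_dist(anchor, neg) + margin)


def sgd_step(emb: Dict[str, Vec], triplet: Tuple[str, str, str],
             lr: float = 0.05, margin: float = 1.0) -> float:
    a_key, p_key, n_key = triplet  # anchor, neighbor node, non-neighbor node
    a, p, n = emb[a_key], emb[p_key], emb[n_key]
    loss = triplet_loss(a, p, n, margin)
    if loss > 0.0:
        # Gradients of ||a - p||^2 - ||a - n||^2 with respect to a, p and n.
        grad_a = [2 * (ni - pi) for pi, ni in zip(p, n)]
        grad_p = [2 * (pi - ai) for ai, pi in zip(a, p)]
        grad_n = [2 * (ai - ni) for ai, ni in zip(a, n)]
        emb[a_key] = [x - lr * g for x, g in zip(a, grad_a)]
        emb[p_key] = [x - lr * g for x, g in zip(p, grad_p)]
        emb[n_key] = [x - lr * g for x, g in zip(n, grad_n)]
    return loss


random.seed(0)
emb = {k: [random.uniform(-0.1, 0.1) for _ in range(4)] for k in ("note_1", "note_2", "note_3")}
triplet = ("note_1", "note_2", "note_3")  # note_2 neighbors note_1; note_3 does not
before = triplet_loss(*(emb[k] for k in triplet))
for _ in range(200):
    sgd_step(emb, triplet)
after = triplet_loss(*(emb[k] for k in triplet))
```

The hinge stops updating a triplet once the margin is met, so well-separated triplets no longer move the embeddings.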
In one embodiment, the initial heterogeneous graph is constructed as follows:
determining resource nodes that indicate the training resources in the training sample and topic nodes that indicate the topic information in the training sample;
traversing every pair of nodes among the determined resource nodes and topic nodes; when the two currently traversed nodes are one topic node and one resource node, determining the weight of the edge between them based on the topic information of the training resource indicated by the resource node and the topic information of the plurality of interacted resources;
when the two currently traversed nodes are two topic nodes, determining the weight of the edge between them based on, among the training resources contained in the training sample, the number of training resources corresponding to the topic information indicated by each of the two topic nodes and the number of training resources that correspond to the topic information of both topic nodes;
when the two currently traversed nodes are two resource nodes, determining the weight of the edge between them based on whether the training resources they indicate belong to the same group of training resource pairs;
after the traversal is finished, constructing the initial heterogeneous graph from the determined resource nodes, the determined topic nodes, the weights of the edges between resource nodes and topic nodes, and the weights of the edges between topic nodes.
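The traversal above can be sketched as follows. Only the topic-topic weight follows a rule stated in the text (the shared-resource ratio described for topic nodes); the resource-topic weight `1 / number of topics of the resource` and the fixed weight 1.0 for two resources in the same training pair are hypothetical placeholders, because the application only says which inputs these weights depend on. The function name and input shapes are likewise illustrative, and the sketch also returns the resource-resource edges it determines.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Node = Tuple[str, str]  # ("resource", id) or ("topic", name)


def build_initial_graph(resource_topics: Dict[str, Set[str]],
                        pairs: List[Tuple[str, str]]) -> Dict[FrozenSet[Node], float]:
    resources = sorted(set(resource_topics) | {r for pair in pairs for r in pair})
    topic_members: Dict[str, Set[str]] = {}
    for res, topics in resource_topics.items():
        for t in topics:
            topic_members.setdefault(t, set()).add(res)
    same_pair = {frozenset(pair) for pair in pairs}
    nodes = [("resource", r) for r in resources] + [("topic", t) for t in sorted(topic_members)]

    edges: Dict[FrozenSet[Node], float] = {}
    for u, v in combinations(nodes, 2):  # traverse every pair of nodes
        if u[0] != v[0]:  # one resource node and one topic node
            res, topic = (u[1], v[1]) if u[0] == "resource" else (v[1], u[1])
            topics = resource_topics.get(res, set())
            if topic not in topics:
                continue
            weight = 1.0 / len(topics)  # hypothetical: spread weight over the resource's topics
        elif u[0] == "topic":  # two topic nodes: overlap of their resource sets
            a, b = topic_members[u[1]], topic_members[v[1]]
            if not a & b:
                continue
            weight = len(a & b) / min(len(a), len(b))
        else:  # two resource nodes: linked only if they form a training pair
            if frozenset((u[1], v[1])) not in same_pair:
                continue
            weight = 1.0  # hypothetical fixed weight
        edges[frozenset((u, v))] = weight
    return edges


edges = build_initial_graph(
    {"n1": {"hiking", "camping"}, "n2": {"hiking"}, "n3": {"baking"}},
    [("n1", "n2")],
)
```

Keying edges by `frozenset` makes each undirected edge appear once regardless of traversal order.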
In one embodiment, when determining the weight of the edge between the two currently traversed topic nodes based on, among the training resources contained in the training sample, the number of training resources corresponding to the topic information indicated by each currently traversed topic node and the number of training resources shared by the topic information of the two currently traversed topic nodes, the determining unit 404 performs the following:
comparing the numbers of training resources corresponding to the topic information indicated by the two currently traversed topic nodes, and determining the smaller of the two numbers;
taking the ratio of the number of training resources shared by the topic information of the two currently traversed topic nodes to the determined smaller number as the weight of the edge between the two currently traversed topic nodes.
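Writing R(t) for the set of training resources whose topic information includes topic t (notation introduced here for illustration only), the topic-topic edge weight described above is the overlap coefficient. The second expression is a made-up numeric example, not data from the application.

```latex
w(t_i, t_j) = \frac{\lvert R(t_i) \cap R(t_j) \rvert}{\min\bigl(\lvert R(t_i) \rvert,\ \lvert R(t_j) \rvert\bigr)},
\qquad \text{e.g. } \frac{3}{\min(12,\, 4)} = \frac{3}{4} = 0.75
```

Dividing by the smaller count rather than by the size of the union keeps the weight at 1 whenever every resource of a narrow topic also carries a broader topic, so a niche topic is not penalized for being small.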
In one embodiment, when determining the resource nodes that indicate the training resources in the training sample and the topic nodes that indicate the topic information in the training sample, the determining unit 404 performs the following:
filtering out long-tail topics from the topic information contained in the training sample to obtain filtered topic information, and determining a topic node that indicates each piece of filtered topic information, where a long-tail topic is topic information whose number of corresponding training resources, among the training resources contained in the training sample, is smaller than a first number threshold;
filtering out long-tail resources from the training resources contained in the training sample to obtain filtered training resources, and determining a resource node that indicates each filtered training resource, where a long-tail resource is a training resource whose number of connections to the other training resources contained in the training sample is smaller than a second number threshold.
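A possible sketch of the two filters follows. The two thresholds correspond to the first and second number thresholds. The application does not define what counts as a "connection" between training resources, so this sketch assumes it is the number of distinct resources a resource is paired with in the training resource pairs; the function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def filter_long_tail(resource_topics: Dict[str, Set[str]],
                     pairs: Iterable[Tuple[str, str]],
                     min_topic_resources: int,
                     min_resource_links: int) -> Tuple[Set[str], Set[str]]:
    # Long-tail topics: carried by fewer than `min_topic_resources` training resources.
    topic_counts = Counter(t for topics in resource_topics.values() for t in topics)
    kept_topics = {t for t, c in topic_counts.items() if c >= min_topic_resources}

    # Assumption: a resource's "connections" are the distinct resources it is paired with.
    links: Dict[str, Set[str]] = {}
    for a, b in pairs:
        if a != b:
            links.setdefault(a, set()).add(b)
            links.setdefault(b, set()).add(a)
    candidates = set(resource_topics) | set(links)
    kept_resources = {r for r in candidates if len(links.get(r, set())) >= min_resource_links}
    return kept_topics, kept_resources


topics, resources = filter_long_tail(
    {"n1": {"hiking", "camping"}, "n2": {"hiking"}, "n3": {"hiking", "knitting"}, "n4": {"camping"}},
    [("n1", "n2"), ("n1", "n3"), ("n2", "n3")],
    min_topic_resources=2,
    min_resource_links=1,
)
```

Here "knitting" (one resource) and "n4" (no pair partner) are dropped as long-tail items.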
In one embodiment, when invoking the initial resource recall model to obtain, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource, and the initial heterogeneous graph, the distance between each training resource and every other training resource contained in the at least one group of training resource pairs, the obtaining unit 401 performs the following:
sampling nodes in the initial heterogeneous graph based on the weights of the edges between resource nodes and topic nodes and the weights of the edges between topic nodes in the initial heterogeneous graph to obtain a plurality of node triplets, each node triplet comprising three nodes;
for any node triplet, obtaining the distance between every two nodes in the node triplet, where whether the two nodes are neighbor nodes or non-neighbor nodes is determined based on their connection relation in the initial heterogeneous graph.
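One way to realize the weighted sampling is sketched below: an anchor node is drawn uniformly, a neighbor node is drawn with probability proportional to the edge weight, and a non-neighbor node is drawn uniformly from the nodes with no edge to the anchor. This scheme and the names used are assumptions; the application states only that sampling is based on the edge weights and yields three-node triplets. Edge weights are assumed to be positive and the adjacency map symmetric.

```python
import random
from typing import Dict, List, Tuple


def sample_triplets(adj: Dict[str, Dict[str, float]], n: int, seed: int = 0) -> List[Tuple[str, str, str]]:
    """Return n (anchor, neighbor, non-neighbor) node triplets."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    # Only anchors with at least one neighbor and one non-neighbor can form a triplet.
    anchors = [u for u in nodes if adj[u] and any(v != u and v not in adj[u] for v in nodes)]
    triplets: List[Tuple[str, str, str]] = []
    while anchors and len(triplets) < n:
        a = rng.choice(anchors)
        nbrs = sorted(adj[a])
        # Neighbor drawn with probability proportional to the edge weight.
        p = rng.choices(nbrs, weights=[adj[a][v] for v in nbrs], k=1)[0]
        # Non-neighbor drawn uniformly from nodes with no edge to the anchor.
        non_nbrs = [v for v in nodes if v != a and v not in adj[a]]
        triplets.append((a, p, rng.choice(non_nbrs)))
    return triplets


adj = {
    "n1": {"n2": 1.0, "hiking": 0.5},
    "n2": {"n1": 1.0, "hiking": 1.0},
    "hiking": {"n1": 0.5, "n2": 1.0},
    "n3": {"baking": 1.0},
    "baking": {"n3": 1.0},
}
triplets = sample_triplets(adj, 20, seed=1)
```

Each triplet yields one neighbor pair whose distance the training target reduces and one non-neighbor pair whose distance it increases.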
In this embodiment of the present application, the obtaining unit 401 obtains a seed resource on which the target object performed an interaction behavior within the preset time period, together with the topic information of the seed resource; the first searching unit 402 searches, based on the seed resource and the heterogeneous graph, for a first resource whose similarity to the seed resource is higher than the similarity threshold; the second searching unit 403 searches, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information; and the determining unit 404 takes the first resource and the second resource as recall resources of the target object. Because resources are recalled both through similar resources and through the topics the object is interested in, the accuracy of resource recall can be improved.
Referring again to fig. 5, fig. 5 is a schematic structural diagram of a computer device provided in an embodiment of the present application. In addition to structures such as a power supply module, the computer device in this embodiment includes a processor 501, a storage device 502, and a communication interface 503. Data can be exchanged among the processor 501, the storage device 502, and the communication interface 503, and the processor 501 implements the corresponding resource recall method.
The storage device 502 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as flash memory or a solid-state drive (SSD); it may also include a combination of the above types of memory.
The processor 501 may be a central processing unit (CPU), or a combination of a CPU and a graphics processing unit (GPU). A server may include multiple CPUs and GPUs as required to perform the corresponding data processing. In one embodiment, the storage device 502 is used to store program instructions, and the processor 501 may invoke the program instructions to implement the methods described above in the embodiments of the present application.
In a first possible implementation, the processor 501 of the computer device invokes the program instructions stored in the storage device 502 to: obtain a seed resource on which the target object performed an interaction behavior within the preset time period, and the topic information of the seed resource; search, based on the seed resource and the heterogeneous graph, for a first resource whose similarity to the seed resource is higher than a similarity threshold, where the heterogeneous graph comprises a plurality of nodes, the nodes comprise topic nodes and resource nodes, and the weight of the edge between any two of the nodes indicates the degree of association between those two nodes; search, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information; and take the first resource and the second resource as recall resources of the target object.
In one embodiment, when searching, based on the seed resource and the heterogeneous graph, for a first resource whose similarity to the seed resource is higher than the similarity threshold, the processor 501 may perform the following operations:
determining a target resource node in the heterogeneous graph that indicates the seed resource;
obtaining the distance between the target resource node and each first candidate node, based on at least one first candidate node connected to the target resource node and the weights of the edges between every two nodes among the target resource node and the at least one first candidate node;
searching for a first candidate node whose distance to the target resource node is smaller than a distance threshold, and taking the resource indicated by the found first candidate node as the first resource.
In one embodiment, when searching, based on the topic information and the heterogeneous graph, for the second resource associated with the topic information, the processor 501 may perform the following operations:
determining a target topic node in the heterogeneous graph that indicates the topic information;
obtaining the distance between the target topic node and each second candidate node, based on at least one second candidate node connected to the target topic node and the weights of the edges between every two nodes among the target topic node and the at least one second candidate node;
searching for a second candidate node whose distance to the target topic node is smaller than a distance threshold, and taking the resource indicated by the found second candidate node as the second resource.
In one embodiment, the first resource and the second resource are found by invoking a resource recall model, and the processor 501 is further configured to perform the following operations:
obtaining a training sample, where the training sample includes the topic information of a plurality of interacted resources, that is, resources that received interaction behavior within a historical time period, and at least one group of training resource pairs; each group of training resource pairs comprises two training resources, one of which is similar to the other;
invoking an initial resource recall model, and obtaining, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource, and an initial heterogeneous graph, the distance between each training resource contained in the at least one group of training resource pairs and every other training resource contained therein; the initial heterogeneous graph is constructed based on the training resources in the at least one group of training resource pairs and the topic information of each interacted resource;
training the initial resource recall model, with the training target of reducing the distance between two neighbor nodes in the initial heterogeneous graph and increasing the distance between two non-neighbor nodes in the initial heterogeneous graph, to obtain the resource recall model.
In one embodiment, the initial heterogeneous graph is constructed as follows:
determining resource nodes that indicate the training resources in the training sample and topic nodes that indicate the topic information in the training sample;
traversing every pair of nodes among the determined resource nodes and topic nodes; when the two currently traversed nodes are one topic node and one resource node, determining the weight of the edge between them based on the topic information of the training resource indicated by the resource node and the topic information of the plurality of interacted resources;
when the two currently traversed nodes are two topic nodes, determining the weight of the edge between them based on, among the training resources contained in the training sample, the number of training resources corresponding to the topic information indicated by each of the two topic nodes and the number of training resources that correspond to the topic information of both topic nodes;
when the two currently traversed nodes are two resource nodes, determining the weight of the edge between them based on whether the training resources they indicate belong to the same group of training resource pairs;
after the traversal is finished, constructing the initial heterogeneous graph from the determined resource nodes, the determined topic nodes, the weights of the edges between resource nodes and topic nodes, and the weights of the edges between topic nodes.
In one embodiment, when determining the weight of the edge between the two currently traversed topic nodes based on, among the training resources contained in the training sample, the number of training resources corresponding to the topic information indicated by each currently traversed topic node and the number of training resources shared by the topic information of the two currently traversed topic nodes, the processor 501 may perform the following operations:
comparing the numbers of training resources corresponding to the topic information indicated by the two currently traversed topic nodes, and determining the smaller of the two numbers;
taking the ratio of the number of training resources shared by the topic information of the two currently traversed topic nodes to the determined smaller number as the weight of the edge between the two currently traversed topic nodes.
In one embodiment, when determining the resource nodes that indicate the training resources in the training sample and the topic nodes that indicate the topic information in the training sample, the processor 501 may perform the following operations:
filtering out long-tail topics from the topic information contained in the training sample to obtain filtered topic information, and determining a topic node that indicates each piece of filtered topic information, where a long-tail topic is topic information whose number of corresponding training resources, among the training resources contained in the training sample, is smaller than a first number threshold;
filtering out long-tail resources from the training resources contained in the training sample to obtain filtered training resources, and determining a resource node that indicates each filtered training resource, where a long-tail resource is a training resource whose number of connections to the other training resources contained in the training sample is smaller than a second number threshold.
In one embodiment, when invoking the initial resource recall model to obtain, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource, and the initial heterogeneous graph, the distance between each training resource and every other training resource contained in the at least one group of training resource pairs, the processor 501 may perform the following operations:
sampling nodes in the initial heterogeneous graph based on the weights of the edges between resource nodes and topic nodes and the weights of the edges between topic nodes in the initial heterogeneous graph to obtain a plurality of node triplets, each node triplet comprising three nodes;
for any node triplet, obtaining the distance between every two nodes in the node triplet, where whether the two nodes are neighbor nodes or non-neighbor nodes is determined based on their connection relation in the initial heterogeneous graph.
In this embodiment of the present application, the processor 501 obtains a seed resource on which the target object performed an interaction behavior within the preset time period, together with the topic information of the seed resource; searches, based on the seed resource and the heterogeneous graph, for a first resource whose similarity to the seed resource is higher than the similarity threshold; searches, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information; and takes the first resource and the second resource as recall resources of the target object. Because resources are recalled both through similar resources and through the topics the object is interested in, the accuracy of resource recall can be improved.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random-access memory (RAM), or the like. The computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created through the use of blockchain nodes, and the like.
The above disclosure is merely some examples of the present application and is not intended to limit the scope of its claims. Those skilled in the art will understand that all or part of the processes of the above embodiments may be implemented, and that equivalent changes may be made in accordance with the claims of the present application, which still fall within the scope of the present invention.

Claims (10)

1. A resource recall method, comprising:
acquiring a seed resource on which a target object performed an interaction behavior within a preset time period, and topic information of the seed resource;
searching, based on the seed resource and a heterogeneous graph, for a first resource whose similarity to the seed resource is higher than a similarity threshold, wherein the heterogeneous graph comprises a plurality of nodes, the nodes comprise topic nodes and resource nodes, and the weight of the edge between any two of the nodes indicates the degree of association between those two nodes;
searching, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information; and
taking the first resource and the second resource as recall resources of the target object.
2. The method of claim 1, wherein the searching, based on the seed resource and the heterogeneous graph, for a first resource whose similarity to the seed resource is higher than a similarity threshold comprises:
determining a target resource node in the heterogeneous graph that indicates the seed resource;
obtaining the distance between the target resource node and each first candidate node, based on at least one first candidate node connected to the target resource node and the weights of the edges between every two nodes among the target resource node and the at least one first candidate node; and
searching for a first candidate node whose distance to the target resource node is smaller than a distance threshold, and taking the resource indicated by the found first candidate node as the first resource.
3. The method of claim 1, wherein the searching, based on the topic information and the heterogeneous graph, for a second resource associated with the topic information comprises:
determining a target topic node in the heterogeneous graph that indicates the topic information;
obtaining the distance between the target topic node and each second candidate node, based on at least one second candidate node connected to the target topic node and the weights of the edges between every two nodes among the target topic node and the at least one second candidate node; and
searching for a second candidate node whose distance to the target topic node is smaller than a distance threshold, and taking the resource indicated by the found second candidate node as the second resource.
4. The method according to any one of claims 1-3, wherein the first resource and the second resource are found by invoking a resource recall model, and the method further comprises:
obtaining a training sample, wherein the training sample comprises the topic information of a plurality of interacted resources, that is, resources that received interaction behavior within a historical time period, and at least one group of training resource pairs, and each group of training resource pairs comprises two training resources, one of which is similar to the other;
invoking an initial resource recall model, and obtaining, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource, and an initial heterogeneous graph, the distance between each training resource contained in the at least one group of training resource pairs and every other training resource contained therein, wherein the initial heterogeneous graph is constructed based on the training resources in the at least one group of training resource pairs and the topic information of each interacted resource; and
training the initial resource recall model, with the training target of reducing the distance between two neighbor nodes in the initial heterogeneous graph and increasing the distance between two non-neighbor nodes in the initial heterogeneous graph, to obtain the resource recall model.
5. The method of claim 4, wherein the initial heterogeneous graph is constructed by:
determining resource nodes that indicate the training resources in the training sample and topic nodes that indicate the topic information in the training sample;
traversing every pair of nodes among the determined resource nodes and topic nodes, and, when the two currently traversed nodes are one topic node and one resource node, determining the weight of the edge between them based on the topic information of the training resource indicated by the resource node and the topic information of the plurality of interacted resources;
when the two currently traversed nodes are two topic nodes, determining the weight of the edge between them based on, among the training resources contained in the training sample, the number of training resources corresponding to the topic information indicated by each of the two topic nodes and the number of training resources that correspond to the topic information of both topic nodes;
when the two currently traversed nodes are two resource nodes, determining the weight of the edge between them based on whether the training resources they indicate belong to the same group of training resource pairs; and
after the traversal is finished, constructing the initial heterogeneous graph from the determined resource nodes, the determined topic nodes, the weights of the edges between resource nodes and topic nodes, and the weights of the edges between topic nodes.
6. The method according to claim 5, wherein the determining the weight of the edge between the two currently traversed topic nodes based on, among the training resources contained in the training sample, the number of training resources corresponding to the topic information indicated by each currently traversed topic node and the number of training resources shared by the topic information of the two currently traversed topic nodes comprises:
comparing the numbers of training resources corresponding to the topic information indicated by the two currently traversed topic nodes, and determining the smaller of the two numbers; and
taking the ratio of the number of training resources shared by the topic information of the two currently traversed topic nodes to the determined smaller number as the weight of the edge between the two currently traversed topic nodes.
7. The method of claim 5, wherein the determining resource nodes that indicate the training resources in the training sample and topic nodes that indicate the topic information in the training sample comprises:
filtering out long-tail topics from the topic information contained in the training sample to obtain filtered topic information, and determining a topic node that indicates each piece of filtered topic information, wherein a long-tail topic is topic information whose number of corresponding training resources, among the training resources contained in the training sample, is smaller than a first number threshold; and
filtering out long-tail resources from the training resources contained in the training sample to obtain filtered training resources, and determining a resource node that indicates each filtered training resource, wherein a long-tail resource is a training resource whose number of connections to the other training resources contained in the training sample is smaller than a second number threshold.
8. The method of claim 4, wherein the invoking an initial resource recall model and obtaining, based on the training resources in the at least one group of training resource pairs, the topic information of each interacted resource, and the initial heterogeneous graph, the distance between each training resource and every other training resource contained in the at least one group of training resource pairs comprises:
sampling nodes in the initial heterogeneous graph based on the weights of the edges between resource nodes and topic nodes and the weights of the edges between topic nodes in the initial heterogeneous graph to obtain a plurality of node triplets, each node triplet comprising three nodes; and
for any node triplet, obtaining the distance between every two nodes in the node triplet, wherein whether the two nodes are neighbor nodes or non-neighbor nodes is determined based on their connection relation in the initial heterogeneous graph.
9. A computer device, comprising a storage device and a processor, wherein:
the storage device is used to store a computer program, the computer program comprising program instructions; and
the processor is configured to invoke the program instructions to perform the steps of the method according to any one of claims 1-8.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause a computer device having the processor to perform the steps of the method of any of claims 1-8.
CN202310177157.0A 2023-02-25 2023-02-25 Resource recall method, computer device and storage medium Pending CN117743674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310177157.0A CN117743674A (en) 2023-02-25 2023-02-25 Resource recall method, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310177157.0A CN117743674A (en) 2023-02-25 2023-02-25 Resource recall method, computer device and storage medium

Publications (1)

Publication Number Publication Date
CN117743674A true CN117743674A (en) 2024-03-22

Family

ID=90253264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310177157.0A Pending CN117743674A (en) 2023-02-25 2023-02-25 Resource recall method, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN117743674A (en)

Similar Documents

Publication Publication Date Title
US10747771B2 (en) Method and apparatus for determining hot event
US9785888B2 (en) Information processing apparatus, information processing method, and program for prediction model generated based on evaluation information
US20120072408A1 (en) Method and system of prioritising operations
WO2017181866A1 (en) Making graph pattern queries bounded in big graphs
CN112989169B (en) Target object identification method, information recommendation method, device, equipment and medium
CN116089567A (en) Recommendation method, device, equipment and storage medium for search keywords
US20150302088A1 (en) Method and System for Providing Personalized Content
CN115905687A (en) Cold start-oriented recommendation system and method based on meta-learning graph neural network
Du et al. Additive co-clustering with social influence for recommendation
CN115841366A (en) Article recommendation model training method and device, electronic equipment and storage medium
CN116108150A (en) Intelligent question-answering method, device, system and electronic equipment
CN113468403A (en) User information prediction method based on big data mining and cloud computing AI (Artificial Intelligence) service system
RU2757592C1 (en) Method and system for clustering documents
CN116521990A (en) Method, apparatus, electronic device and computer readable medium for material processing
Shilin User Model‐Based Personalized Recommendation Algorithm for News Media Education Resources
CN109948056A (en) A kind of appraisal procedure and device of recommender system
CN117743674A (en) Resource recall method, computer device and storage medium
CN114547440A (en) User portrait mining method based on internet big data and artificial intelligence cloud system
CN111860655A (en) User processing method, device and equipment
Wang et al. A Tri‐Attention Neural Network Model‐Based Recommendation
Zhang et al. Learning geographical hierarchy features for social image location prediction
CN113676505B (en) Information pushing method, device, computer equipment and storage medium
Zhang et al. Matrix Factorization Enriched with Item Features
CN117743673A (en) Resource recall method
CN117743672A (en) Resource recall method, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination