CN113763012A - Model training method, similarity calculation device, similarity calculation equipment and similarity calculation medium - Google Patents

Model training method, similarity calculation device, similarity calculation equipment and similarity calculation medium

Info

Publication number
CN113763012A
CN113763012A CN202011593278.6A CN202011593278A CN113763012A CN 113763012 A CN113763012 A CN 113763012A CN 202011593278 A CN202011593278 A CN 202011593278A CN 113763012 A CN113763012 A CN 113763012A
Authority
CN
China
Prior art keywords
sample
similarity
article
articles
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011593278.6A
Other languages
Chinese (zh)
Inventor
赵杨杰
陈东东
刘君亮
吕昊
易津锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN202011593278.6A
Publication of CN113763012A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a model training method, a similarity calculation method, and a corresponding apparatus, device, and medium. The model training method comprises: acquiring historical behavior data, and determining sample semantic features of each sample article according to association relationships among the sample articles in the historical behavior data; determining sample similarity between sample articles according to the sample semantic features of the sample articles; and acquiring sample attribute features of each sample article, generating training samples based on the sample attribute features of each sample article and the sample similarity between the sample articles, and training a pre-constructed similarity model with the training samples to obtain a trained similarity model. Because the similarity model is trained on the sample attribute features with targets derived from the sample semantic features obtained from the historical behavior data, the trained similarity model can accurately analyze the similarity between articles from their attribute features alone.

Description

Model training method, similarity calculation device, similarity calculation equipment and similarity calculation medium
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to a model training method, a similarity calculation method, and a corresponding apparatus, device, and medium.
Background
On an Internet shopping platform, analyzing the similarity of articles of the same type helps in understanding how the related business formats are developing. The current mainstream approach is to manually select products that may compete with a given retail commodity, combining users' past purchasing behavior with the business form of the products and relying on expert experience.
In the course of implementing the invention, the inventors found at least the following technical problem in the prior art: analyzing article similarity by combining users' purchasing behavior with the business form of the products depends on users' historical behavior, so when a new article is released, its similarity to other articles cannot be analyzed accurately.
Disclosure of Invention
The embodiments of the invention provide a model training method, a similarity calculation method, and a corresponding apparatus, device, and medium, so as to enable similarity analysis between a new article and other articles.
In a first aspect, an embodiment of the present invention provides a similarity model training method, including:
acquiring historical behavior data, and determining sample semantic features of each sample article according to association relationships among the sample articles in the historical behavior data;
determining sample similarity between sample articles according to the sample semantic features of the sample articles;
and acquiring sample attribute features of each sample article, generating training samples based on the sample attribute features of each sample article and the sample similarity between the sample articles, and training a pre-constructed similarity model with the training samples to obtain a trained similarity model.
In a second aspect, an embodiment of the present invention further provides a similarity calculation method, including:
acquiring a reference attribute feature of a reference article and a candidate attribute feature of a candidate article;
inputting the reference attribute characteristics and the candidate attribute characteristics into a similarity model which is trained in advance to obtain a similarity prediction result output by the similarity model, wherein the similarity model is obtained by training by using a similarity model training method provided by any embodiment of the invention;
and determining the overall similarity between the reference article and the candidate article according to the similarity prediction result.
In a third aspect, an embodiment of the present invention further provides a similarity model training device, including:
a semantic feature acquisition module, configured to acquire historical behavior data and determine sample semantic features of each sample article according to association relationships among the sample articles in the historical behavior data;
a similarity calculation module, configured to determine sample similarity between sample articles according to the sample semantic features of the sample articles;
and a model training module, configured to acquire sample attribute features of each sample article, generate training samples based on the sample attribute features of each sample article and the sample similarity between the sample articles, and train a pre-constructed similarity model with the training samples to obtain a trained similarity model.
In a fourth aspect, an embodiment of the present invention further provides a similarity calculation apparatus, including:
an attribute feature acquisition module, configured to acquire a reference attribute feature of a reference article and a candidate attribute feature of a candidate article;
a model prediction module, configured to input the reference attribute feature and the candidate attribute feature into a pre-trained similarity model to obtain a similarity prediction result output by the similarity model, wherein the similarity model is trained with the similarity model training method provided by any embodiment of the invention;
and a similarity determining module, configured to determine the overall similarity between the reference article and the candidate article according to the similarity prediction result.
In a fifth aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the similarity model training method as provided by any of the embodiments of the present invention, and/or implement the similarity calculation method as provided by any of the embodiments of the present invention.
In a sixth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the similarity model training method provided in any embodiment of the present invention, and/or implements the similarity calculation method provided in any embodiment of the present invention.
In the embodiments of the invention, historical behavior data is acquired, and the sample semantic features of each sample article are determined according to the association relationships among the sample articles in the historical behavior data; the sample similarity between sample articles is determined according to their sample semantic features; and the sample attribute features of each sample article are acquired, training samples are generated based on the sample attribute features and the sample similarity between the sample articles, and a pre-constructed similarity model is trained with the training samples to obtain a trained similarity model. Because the training targets are derived from the sample semantic features obtained from the historical behavior data while the inputs are the sample attribute features, the trained similarity model takes into account both the attribute features of the articles and the association relationships between articles in the historical behavior data, and can therefore accurately analyze the similarity between articles from attribute features alone.
Drawings
Fig. 1 is a flowchart of a similarity model training method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a similarity calculation method according to a second embodiment of the present invention;
Fig. 3 is a schematic flowchart of similarity calculation according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a similarity model training apparatus according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a similarity calculation apparatus according to a fifth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example One
Fig. 1 is a flowchart of a similarity model training method according to a first embodiment of the present invention. This embodiment is applicable to training a similarity model. The method may be performed by a similarity model training apparatus, which may be implemented in software and/or hardware and may, for example, be configured in a computer device. As shown in Fig. 1, the method includes:
S110: acquiring historical behavior data, and determining the sample semantic features of each sample article according to the association relationships among the sample articles in the historical behavior data.
In this embodiment, the historical behavior data may be user behavior data within a set time period, covering user behaviors such as browsing, favoriting, purchasing, and completing transactions. When shopping, a user typically browses several similar articles and then selects one of them to buy. For example, a user who wants to buy a desk will usually browse desks of several brands, structures, and sizes, and then choose one of the browsed desks to purchase. Articles of the same category that a user repeatedly compares before a successful transaction can therefore be regarded as potential competing products, that is, articles with a high degree of similarity. Accordingly, association relationships are established among the articles that a user browsed within the set time: the historical behavior data of each user yields association relationships between articles, combining the association relationships derived from all users' historical behavior data gives the degree of association between articles, and the semantic features of the articles can be calculated from that degree of association. The articles involved in the historical behavior data are taken as the sample articles, and the sample semantic features of each sample article are calculated in this way. The sample semantic features may be represented as semantic vectors.
In one embodiment of the present invention, determining the sample semantic features of each sample article according to the association relationships among the sample articles in the historical behavior data includes: constructing a connection graph among the sample articles according to the association relationships among the sample articles in the historical behavior data; and calculating the sample semantic features of each sample article from the connection graph by means of a graph embedding algorithm. Specifically, a connection graph containing all the sample articles may be constructed, in which the more connecting edges there are between two sample articles, the higher the similarity between them. A graph embedding algorithm, that is, an algorithm that maps graph data to vectors, is then applied to the constructed connection graph to calculate the sample semantic features of each sample article. Optionally, a LINE model may be used to map all the sample articles into a high-dimensional vector space according to the relationships between the nodes in the connection graph, thereby obtaining the sample semantic features of each sample article.
Optionally, constructing the connection graph among the sample articles according to the association relationships among the sample articles in the historical behavior data includes: for each user, constructing a connecting edge between the articles that the user showed interest in within a set time period and the article that the user transacted, until all users involved in the historical behavior data have been traversed, thereby obtaining the connection graph among the sample articles. As noted above, commodities that the same user repeatedly compares in the final stage before purchasing are very likely to be potential competing products. For example, a connecting edge may be added between each product that a user browsed or purchased during the day before finally purchasing a certain article and the finally purchased product. All purchasing users in the historical behavior data are traversed in this way to obtain the connection graph among the sample articles. The weight of an edge depends on the number of users who contributed that edge: the more connecting edges between two sample articles, the greater the weight between them, and a greater weight indicates a greater similarity.
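The edge-construction rule described above can be sketched in Python as follows. This is a minimal illustration on a toy behavior log, not the implementation of the embodiment: the record layout `(user, item, action, day)`, the function name `build_connection_graph`, and the one-day look-back window expressed as `window_days` are all assumptions introduced here.

```python
from collections import defaultdict

def build_connection_graph(events, window_days=1):
    """Return {(item_a, item_b): weight}, weight = number of users linking the pair.

    events: iterable of (user, item, action, day) with action in
    {"browse", "purchase"} and day an integer day index (assumed layout).
    """
    by_user = defaultdict(list)
    for user, item, action, day in events:
        by_user[user].append((item, action, day))

    weights = defaultdict(int)
    for user_events in by_user.values():
        linked = set()  # each user contributes a given pair at most once
        purchases = [(i, d) for i, a, d in user_events if a == "purchase"]
        for bought, buy_day in purchases:
            for item, _action, day in user_events:
                in_window = buy_day - window_days <= day <= buy_day
                if in_window and item != bought:
                    linked.add(tuple(sorted((item, bought))))
        for pair in linked:
            weights[pair] += 1
    return dict(weights)

# Toy log: three users comparing desks before buying one.
events = [
    ("u1", "desk_A", "browse", 9), ("u1", "desk_B", "browse", 10),
    ("u1", "desk_B", "purchase", 10),
    ("u2", "desk_A", "browse", 4), ("u2", "desk_C", "browse", 4),
    ("u2", "desk_A", "purchase", 5),
    ("u3", "desk_B", "browse", 7), ("u3", "desk_A", "purchase", 7),
]
graph = build_connection_graph(events)
```

Here two users link `desk_A` and `desk_B`, so that edge carries weight 2, while `desk_A` and `desk_C` are linked by a single user. Counting each pair once per user is one possible reading of "the weight of an edge depends on the number of users"; counting every co-occurrence would be an equally valid variant.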
Because the connection relationships between the sample articles are constructed from the interactive behaviors contained in the historical behavior data, the resulting sample semantic features reflect how users actually behave, which makes the sample semantic features of the sample articles more accurate.
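To make the graph-embedding step more concrete, the sketch below trains item vectors from a weighted edge dictionary using a simplified first-order-proximity objective with negative sampling, in the spirit of LINE. It is not the full LINE model (there is no second-order proximity and no alias-table sampling), and the function name `train_embeddings`, the hyperparameters, and the toy edge weights are assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    # Clamp the argument so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))

def train_embeddings(edges, dim=8, steps=2000, lr=0.05, negatives=2, seed=0):
    """edges: {(a, b): weight}. Returns {node: list of `dim` floats}."""
    rng = random.Random(seed)
    nodes = sorted({n for pair in edges for n in pair})
    emb = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
    pairs = list(edges)
    pair_weights = [edges[p] for p in pairs]

    def update(a, b, label):
        # One SGD step on the logistic loss for the pair (a, b).
        score = sum(x * y for x, y in zip(emb[a], emb[b]))
        g = lr * (label - sigmoid(score))
        new_a = [x + g * y for x, y in zip(emb[a], emb[b])]
        new_b = [y + g * x for x, y in zip(emb[a], emb[b])]
        emb[a], emb[b] = new_a, new_b

    for _ in range(steps):
        # Sample an edge with probability proportional to its weight.
        a, b = rng.choices(pairs, weights=pair_weights, k=1)[0]
        update(a, b, 1.0)               # pull connected items together
        for _ in range(negatives):      # push randomly drawn items apart
            n = rng.choice(nodes)       # may be a true neighbor; kept simple
            if n != a and n != b:
                update(a, n, 0.0)
    return emb

edges = {("desk_A", "desk_B"): 2, ("desk_A", "desk_C"): 1, ("lamp_X", "lamp_Y"): 3}
vectors = train_embeddings(edges)
```

Sampling edges in proportion to their weight means that pairs linked by more users are pulled together more often, which is how the edge weight influences the resulting semantic vectors in this sketch.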
S120: determining the sample similarity between the sample articles according to the sample semantic features of the sample articles.
After the sample semantic features of each sample article are obtained, the similarity between the sample semantic features of each pair of sample articles is calculated and used as the sample similarity between the two sample articles. This embodiment does not limit the method used to calculate the similarity. For example, the cosine similarity between the sample semantic features of two sample articles may be used as the sample similarity between them.
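As a small illustration of the cosine-similarity option mentioned above, the following Python sketch computes the similarity for every pair of semantic vectors. The helper names and the toy vectors are assumptions; returning 0.0 for a zero vector is a convention chosen here, not something the embodiment specifies.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def pairwise_similarity(vectors):
    """vectors: {item: semantic vector}. Returns {(item_a, item_b): similarity}."""
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }

vectors = {"desk_A": [1.0, 0.0], "desk_B": [1.0, 0.0], "lamp_X": [0.0, 2.0]}
sims = pairwise_similarity(vectors)
```

In this toy data the two desks have identical directions and the lamp is orthogonal to both, so the desk pair scores 1 and each desk-lamp pair scores 0.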
S130: acquiring the sample attribute features of each sample article, generating training samples based on the sample attribute features of each sample article and the sample similarity between the sample articles, and training the pre-constructed similarity model with the training samples to obtain the trained similarity model.
A new product has no behavior data to refer to. For the trained similarity model to predict the similarity between a new product and other articles, the model must therefore be able to calculate the similarity between articles from attribute features alone. Accordingly, the sample attribute features of the sample articles are acquired, and the pre-constructed similarity model is trained with the sample attribute features of the sample articles as input and the sample similarity between the sample articles as the target, yielding the trained similarity model. The similarity model may be built on an existing neural network, such as a convolutional neural network, a recurrent neural network, or a long short-term memory (LSTM) network.
Optionally, the sample attribute features of a sample article may be obtained from the article's detail information, which may include a plurality of attribute parameters of the sample article. Taking a refrigerator as an example of a sample article, its sample attribute features may include attribute parameters such as capacity, place of origin, power, energy consumption, function, brand, and weight.
In one embodiment, generating training samples based on the sample attribute features of each sample article and the sample similarity between the sample articles, and training a pre-constructed similarity model with the training samples to obtain a trained similarity model, includes: for each pair of sample articles, inputting the sample attribute features of the two sample articles into the similarity model to obtain the predicted similarity output by the similarity model; and determining a loss value according to the predicted similarity and the sample similarity between the two sample articles, and training until the loss value satisfies a convergence condition to obtain the trained similarity model. That is, during training the sample attribute features of the sample articles are input into the similarity model, the predicted similarity output by the model is compared with the sample similarity to obtain a loss value, and training continues until the loss value satisfies the convergence condition. Optionally, the convergence condition may be that the difference between two adjacent loss values is smaller than a set threshold, or that the number of iterations reaches a set number.
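The training loop and its two convergence conditions can be sketched as follows. To stay self-contained, the sketch substitutes a linear model over per-attribute absolute differences for the neural network that the embodiment describes; that substitution, the mean-squared-error loss, the function names, and the toy samples (attributes assumed pre-scaled to [0, 1]) are all assumptions made for illustration.

```python
def pair_features(attrs_a, attrs_b):
    # Absolute per-attribute difference plus a constant bias term.
    return [abs(x - y) for x, y in zip(attrs_a, attrs_b)] + [1.0]

def train_similarity_model(samples, lr=0.1, max_iters=2000, tol=1e-9):
    """samples: list of (attrs_a, attrs_b, sample_similarity)."""
    rows = [(pair_features(a, b), t) for a, b, t in samples]
    w = [0.0] * len(rows[0][0])
    history = []
    for _ in range(max_iters):  # stop condition 2: iteration budget
        grad = [0.0] * len(w)
        loss = 0.0
        for z, target in rows:
            err = sum(wi * zi for wi, zi in zip(w, z)) - target
            loss += err * err / len(rows)
            for k, zk in enumerate(z):
                grad[k] += 2.0 * err * zk / len(rows)
        history.append(loss)
        # Stop condition 1: adjacent loss values differ by less than tol.
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w, history

def predict(w, attrs_a, attrs_b):
    return sum(wi * zi for wi, zi in zip(w, pair_features(attrs_a, attrs_b)))

# Toy training samples: (attributes of A, attributes of B, sample similarity).
samples = [
    ([0.20, 0.5], [0.25, 0.5], 0.95),
    ([0.20, 0.5], [0.90, 0.1], 0.10),
    ([0.90, 0.1], [0.80, 0.2], 0.85),
    ([0.25, 0.5], [0.80, 0.2], 0.20),
]
w, history = train_similarity_model(samples)
score = predict(w, [0.3, 0.4], [0.35, 0.45])
```

Using the absolute difference as the pair representation makes the prediction symmetric in the two articles, which matches the symmetric nature of a similarity target; a neural model would typically need a comparable design choice.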
In this embodiment, the sample semantic features of each sample article are determined from the association relationships among the sample articles in the acquired historical behavior data, the sample similarity between sample articles is determined from those sample semantic features, and training samples generated from the sample attribute features of each sample article and the sample similarity between the sample articles are used to train a pre-constructed similarity model. Because the training targets are derived from the historical behavior data while the inputs are attribute features, the trained similarity model takes into account both the attribute features of the articles and the association relationships between articles in the historical behavior data, and can accurately analyze the similarity between articles from attribute features alone.
Example Two
Fig. 2 is a flowchart of a similarity calculation method according to a second embodiment of the present invention. This embodiment is applicable to calculating the similarity between articles, and is particularly suitable for calculating the similarity between a new article and other articles on an Internet platform. The method may be performed by a similarity calculation apparatus, which may be implemented in software and/or hardware and may, for example, be configured in a computer device. As shown in Fig. 2, the method includes:
S210: acquiring the reference attribute features of a reference article and the candidate attribute features of a candidate article.
In this embodiment, when calculating the similarity between the articles, any article may be used as the reference article, and another article may be used as the candidate article. The reference attribute feature of the reference article and the candidate attribute feature of the candidate article may be a plurality of attribute parameters corresponding to the reference article and the candidate article, respectively.
When searching for the competing products of a new product, the new product may be taken as the reference article and each old product in the same category as the new product may be taken as a candidate article, and the similarity between the new product and each old product is calculated separately.
S220: inputting the reference attribute features and the candidate attribute features into a pre-trained similarity model to obtain a similarity prediction result output by the similarity model.
The similarity model is trained with the similarity model training method provided by any embodiment of the invention. After the reference attribute features and the candidate attribute features are acquired, they are input into the similarity model to obtain the similarity prediction result output by the similarity model.
S230: determining the overall similarity between the reference article and the candidate article according to the similarity prediction result.
Optionally, the similarity prediction result may be used directly as the overall similarity between the reference article and the candidate article. Once the overall similarity has been obtained, how similar the reference article and the candidate article are can be evaluated on that basis.
In this embodiment, the reference attribute features of the reference article and the candidate attribute features of the candidate article are acquired; the reference attribute features and the candidate attribute features are input into a pre-trained similarity model to obtain the similarity prediction result output by the similarity model; and the overall similarity between the reference article and the candidate article is determined according to the similarity prediction result. Because the similarity between articles is calculated by a similarity model trained with the model training method provided by the embodiments of the invention, the similarity can be analyzed accurately from attribute features alone, which enables similarity analysis between a new article and other articles.
On the basis of the above scheme, the method further includes: taking each candidate article whose overall similarity to the reference article is greater than a set similarity threshold as a competing product of the reference article.
When searching for the competing products of a new product, the new product is taken as the reference article, the old products belonging to the same category as the new product are taken as candidate articles, and the overall similarity between the new product and each old product is calculated separately. A similarity threshold may be preset, and each old product whose overall similarity to the new product is greater than the similarity threshold is taken as a competing product of the new product. The similarity threshold may be a fixed value or a variable, and may be set according to actual requirements.
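The threshold-based selection described above can be sketched in Python as follows. The function `find_competing_products`, the default threshold of 0.8, and in particular `toy_similarity`, which merely stands in for the trained similarity model, are hypothetical names and values introduced for illustration.

```python
def find_competing_products(reference_attrs, candidates, similarity_fn, threshold=0.8):
    """Return [(candidate_id, similarity)] above the threshold, most similar first."""
    scored = [
        (cid, similarity_fn(reference_attrs, attrs))
        for cid, attrs in candidates.items()
    ]
    hits = [(cid, s) for cid, s in scored if s > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def toy_similarity(a, b):
    # Placeholder for the trained model: 1 minus the mean absolute difference.
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

new_product = [0.5, 0.5]
old_products = {
    "old_1": [0.5, 0.25],
    "old_2": [0.0, 1.0],
    "old_3": [0.5, 0.5],
}
competitors = find_competing_products(new_product, old_products, toy_similarity)
```

With these toy values, `old_3` and `old_1` exceed the threshold and `old_2` does not. Passing the similarity function as an argument keeps the selection logic independent of how the model itself is implemented.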
On the basis of the above scheme, the method further includes: adjusting the attribute parameters in the reference attribute features and the candidate attribute features, and determining the contribution ratio of each attribute parameter to the overall similarity according to the adjusted similarity. After the competing products of the new product are determined, the similarity can be recalculated with one attribute parameter deleted from the attribute features; comparing the recalculated similarity with the overall similarity gives the contribution ratio of the deleted attribute parameter to the overall similarity, which allows further analysis of the similarity between the new product and its competing products.
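One possible way to turn the delete-and-recompute idea above into numbers is sketched below. The embodiment does not define the contribution ratio precisely, so normalizing each attribute's absolute change in similarity by the sum of all changes is an assumption made here, as are the names `contribution_ratios` and `toy_similarity`. The sketch also assumes a similarity function that accepts attribute lists of any length; a trained model with a fixed input size would need masking instead of deletion.

```python
def contribution_ratios(ref_attrs, cand_attrs, similarity_fn):
    """ref_attrs / cand_attrs: dicts mapping attribute name -> numeric value."""
    names = sorted(ref_attrs)
    full = similarity_fn(
        [ref_attrs[n] for n in names], [cand_attrs[n] for n in names]
    )
    deltas = {}
    for removed in names:
        kept = [n for n in names if n != removed]
        reduced = similarity_fn(
            [ref_attrs[n] for n in kept], [cand_attrs[n] for n in kept]
        )
        deltas[removed] = abs(full - reduced)  # effect of deleting one attribute
    total = sum(deltas.values())
    if total == 0.0:
        return {n: 0.0 for n in names}
    return {n: d / total for n, d in deltas.items()}

def toy_similarity(a, b):
    # Placeholder for the trained model: 1 minus the mean absolute difference.
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

new_product = {"capacity": 0.5, "power": 0.5, "weight": 0.5}
competitor = {"capacity": 0.5, "power": 0.25, "weight": 0.5}
ratios = contribution_ratios(new_product, competitor, toy_similarity)
```

In this toy case the two products differ only in `power`, and deleting that attribute changes the similarity the most, so it receives the largest ratio.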
EXAMPLE III
The present embodiment provides a preferred embodiment based on the above-described embodiments. The similarity calculation method provided by the embodiment can be applied to searching for the competitive products.
In this embodiment, firstly, the semantic features of the old article are obtained through historical old article data, the similarity between the old articles is obtained through calculation based on the semantic features of the old articles, the similarity model is trained based on the similarity between the old articles and the attribute features of the old articles, after the new article is released, the similarity between the new article (i.e., a reference article) and the old article (i.e., a candidate article) is calculated through the trained similarity model, and finally, the competitive product of the new article is determined according to the similarity between the new article and the old article.
Specifically, after historical behavior data in a set time period is acquired, a connection relation between all the historical artifacts is constructed based on interactive behaviors of browsing, shopping cart adding, purchasing and the like of the historical artifacts of the user. For example, it can be set that items that are repeatedly compared by the same user in the last purchase stage are highly likely to be potential competitive products, for example, a user has a connecting edge between all browsed and purchased items and the last purchased item on the day before the user finally purchases a certain item, if the connecting edge between two items is more, the weight between two items is greater, and finally after historical behavior data within a set time period is traversed, a connecting graph between all the old products is obtained. The weight of the edges depends on the number of the connecting edges, namely the number of users, the larger the weight is, the more similar the old products are, and finally, the semantic vector (namely the semantic features) of each old product is obtained according to the connecting graph through a graph embedding method. Optionally, a LINE model may be used to obtain a semantic vector of each of the junior commodities according to the connection diagram.
After the semantic vectors of the old products are obtained, the similarity between old products is calculated from the semantic vectors; any existing similarity calculation method may be used for this purpose. Illustratively, the cosine similarity between old products is calculated from their semantic vectors.
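As a minimal sketch of the cosine similarity mentioned above, the following function computes the cosine of the angle between two semantic vectors. The function name and the choice to return 0.0 for a zero vector are assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat a zero vector as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)
```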
After the cosine similarity between old products is obtained, the attribute features of the old products are obtained, training samples are generated from the attribute features of the old products and the cosine similarity between them, and the similarity model is trained. Exemplarily, the attribute features of two old products are input into the similarity model to obtain the predicted similarity output by the model; a loss value is determined from the predicted similarity and the cosine similarity; and the similarity model is trained with convergence of the loss value as the target, yielding the trained similarity model.
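The training step can be illustrated with a deliberately small sketch. The embodiment does not specify the model architecture or the loss function, so everything here is an assumption: a linear model over the absolute differences of the two attribute vectors, a mean squared error loss against the cosine similarity label, plain gradient descent, and a stopping rule based on the change in loss. The names `pair_features`, `predict`, and `train_similarity_model` are hypothetical.

```python
def pair_features(a, b):
    """Assumed pair encoding: element-wise absolute attribute difference."""
    return [abs(x - y) for x, y in zip(a, b)]

def predict(weights, bias, a, b):
    """Predicted similarity of two items from their attribute features."""
    return bias + sum(w * f for w, f in zip(weights, pair_features(a, b)))

def train_similarity_model(samples, lr=0.1, tol=1e-6, max_epochs=10000):
    """Fit the linear model by gradient descent on mean squared error.

    samples: list of (attrs_a, attrs_b, target_similarity), where the
             target is the similarity computed from semantic vectors.
    Training stops when the loss changes by less than tol between epochs.
    Returns (weights, bias, last_loss).
    """
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    n = len(samples)
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_epochs):
        grad_w, grad_b, loss = [0.0] * dim, 0.0, 0.0
        for a, b, target in samples:
            feats = pair_features(a, b)
            err = bias + sum(w * f for w, f in zip(weights, feats)) - target
            loss += err * err
            grad_b += 2.0 * err
            for i, f in enumerate(feats):
                grad_w[i] += 2.0 * err * f
        loss /= n
        if abs(prev_loss - loss) < tol:  # convergence condition
            break
        prev_loss = loss
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias, loss
```

A real system would likely use a more expressive model; the point of the sketch is only that the inputs are attribute features and the supervision signal is the behavior-derived similarity.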
Fig. 3 is a schematic flow chart of a similarity calculation method according to a third embodiment of the present invention. As shown in fig. 3, the attribute features of the new product are obtained, the attribute features of the new product and of the old products are input into the trained new product competitive product search system (i.e., the similarity model), the similarity between the new product and each old product is obtained, and the competitive product of the new product is determined based on that similarity. For example, the old product with the highest similarity to the new product is determined as the competitive product of the new product.
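The selection step in the example above (taking the old product with the highest similarity) might look like the following sketch. The callable `similarity_fn` is a hypothetical stand-in for the trained similarity model, and the dictionary layout of `old_items` is likewise an assumption.

```python
def find_competitive_product(new_attrs, old_items, similarity_fn):
    """Score every old product against the new product and pick the best.

    new_attrs:     attribute features of the new product.
    old_items:     dict mapping item id -> attribute features.
    similarity_fn: callable(attrs_a, attrs_b) -> similarity score; stands
                   in for the trained similarity model.
    Returns (best_item_id, scores); best_item_id is None if there are
    no old products to compare against.
    """
    scores = {
        item_id: similarity_fn(new_attrs, attrs)
        for item_id, attrs in old_items.items()
    }
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    return best, scores
```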
Furthermore, the similarity can be recalculated after a certain feature parameter is deleted from the attribute features, and the adjusted similarity is compared with the original similarity to obtain the contribution of the deleted feature parameter to the similarity. The contribution of each feature parameter can be analyzed in turn to obtain its contribution ratio to the similarity, and the similar features between the new product and the competitive product are determined from these contribution ratios.
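One possible reading of this leave-one-out analysis is sketched below. The embodiment does not say how a feature parameter is "deleted", so this sketch assumes that deletion means overwriting the parameter with the same neutral value in both items, and that the contribution is the absolute change in similarity. The name `contribution_ratios`, the `neutral` argument, and `similarity_fn` (a stand-in for the trained model) are hypothetical.

```python
def contribution_ratios(ref, cand, similarity_fn, neutral=0.0):
    """Estimate each feature parameter's share of the similarity.

    For each position i, the parameter is masked in both items (assumed
    meaning of "deleting" it), the similarity is recomputed, and the
    absolute change from the original similarity is recorded. The changes
    are then normalized so that they sum to 1.
    """
    ref, cand = list(ref), list(cand)
    base = similarity_fn(ref, cand)
    deltas = []
    for i in range(len(ref)):
        ref_masked = ref[:i] + [neutral] + ref[i + 1:]
        cand_masked = cand[:i] + [neutral] + cand[i + 1:]
        deltas.append(abs(similarity_fn(ref_masked, cand_masked) - base))
    total = sum(deltas)
    if total == 0.0:
        # No parameter changed the similarity; report zero shares.
        return [0.0] * len(deltas)
    return [d / total for d in deltas]
```

Other masking strategies (for example, replacing a parameter with a dataset mean) would give different ratios, so the choice should be validated against the model actually used.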
The method provided by this embodiment of the invention obtains the semantic vectors of old products from historical behavior data and trains the similarity model in combination with the attribute features of the old products, so that when a new product is released, its competitive product can be found from the attribute features of the new and old products. In addition, deleting attribute parameters from the attribute features makes it possible to analyze the share of each attribute parameter in the overall similarity, which explains why a competitive product was selected and makes the strengths and weaknesses of the new product visible.
EXAMPLE four
Fig. 4 is a schematic structural diagram of a similarity model training device according to a fourth embodiment of the present invention. The similarity model training device may be implemented in software and/or hardware, for example, the similarity model training device may be configured in a computer device. As shown in fig. 4, the apparatus includes a semantic feature obtaining module 410, a similarity calculation module 420, and a model training module 430, wherein:
a semantic feature obtaining module 410, configured to obtain historical behavior data, and determine a sample semantic feature of each sample item according to an association relationship between sample items in the historical behavior data;
a similarity calculation module 420, configured to determine sample similarities between sample items according to the sample semantic features of each sample item;
the model training module 430 is configured to obtain sample attribute features of each sample article, generate a training sample based on the sample attribute features of each sample article and the sample similarity between the sample articles, and train a pre-constructed similarity model using the training sample to obtain a trained similarity model.
In this embodiment of the invention, the semantic feature obtaining module obtains historical behavior data and determines the sample semantic features of each sample article according to the association relationship between sample articles in the historical behavior data; the similarity calculation module determines the sample similarity between sample articles according to the sample semantic features of each sample article; and the model training module obtains the sample attribute features of each sample article, generates training samples based on the sample attribute features of each sample article and the sample similarity between sample articles, and trains a pre-constructed similarity model using the training samples to obtain a trained similarity model. Because the sample semantic features are derived from historical behavior data and the similarity model is trained on both the sample semantic features and the sample attribute features, the trained similarity model takes into account not only the attribute features of the articles but also the association relationship between articles in the historical behavior data, so that it can accurately analyze the similarity between articles from attribute features alone.
Optionally, on the basis of the foregoing scheme, the semantic feature obtaining module 410 is specifically configured to:
constructing a connection graph among the sample objects according to the incidence relation among the sample objects in the historical behavior data;
and calculating the sample semantic features of each sample article based on the connection graph through a graph embedding algorithm.
Optionally, on the basis of the foregoing scheme, the semantic feature obtaining module 410 is specifically configured to:
constructing connecting edges between the articles of interest and the transacted article of the same user within a set time period in the historical behavior data, until all users involved in the historical behavior data have been traversed, to obtain the connection graph between the sample articles.
Optionally, on the basis of the foregoing scheme, the model training module 430 is specifically configured to:
for every two sample articles, inputting the sample attribute characteristics of each sample article into a similarity model to obtain the predicted similarity predicted by the similarity model;
and determining a loss value according to the prediction similarity and the sample similarity between the sample articles, and obtaining a trained similarity model by taking the loss value reaching a convergence condition as a target.
The similarity model training device provided by the embodiment of the invention can execute the similarity model training method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE five
Fig. 5 is a schematic structural diagram of a similarity calculation apparatus according to a fifth embodiment of the present invention. The similarity calculation apparatus may be implemented in software and/or hardware, for example, the similarity calculation apparatus may be configured in a computer device. As shown in fig. 5, the apparatus includes an attribute feature obtaining module 510, a model predicting module 520, and a similarity determining module 530, wherein:
an attribute feature obtaining module 510, configured to obtain a reference attribute feature of a reference article and a candidate attribute feature of a candidate article;
the model prediction module 520 is configured to input the reference attribute features and the candidate attribute features into a similarity model trained in advance, and obtain a similarity prediction result output by the similarity model, where the similarity model is obtained by using a similarity model training method provided in any embodiment of the present invention;
and a similarity determination module 530, configured to determine an overall similarity between the reference item and the candidate item according to the similarity prediction result.
In this embodiment of the invention, the attribute feature obtaining module obtains the reference attribute features of a reference article and the candidate attribute features of a candidate article; the model prediction module inputs the reference attribute features and the candidate attribute features into a similarity model trained in advance to obtain a similarity prediction result output by the similarity model; and the similarity determining module determines the overall similarity between the reference article and the candidate article according to the similarity prediction result. Because the similarity between articles is calculated by a similarity model trained with the model training method provided by the embodiments of the invention, the similarity between articles can be analyzed accurately from attribute features alone, which enables similarity analysis between a new article and other articles.
Optionally, on the basis of the above scheme, the apparatus further includes:
and the competitive product determining module is used for taking the candidate article with the overall similarity greater than the set similarity threshold value with the reference article as the competitive product of the reference article.
Optionally, on the basis of the above scheme, the apparatus further includes:
and the contribution degree analysis module is used for adjusting the attribute parameters in the reference attribute characteristics and the candidate attribute characteristics and determining the contribution ratio of the attribute parameters in the overall similarity according to the adjusted similarity.
The similarity calculation device provided by the embodiment of the invention can execute the similarity calculation method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE six
Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention. Fig. 6 illustrates a block diagram of an exemplary computer device 612 suitable for use in implementing embodiments of the present invention. The computer device 612 shown in fig. 6 is only an example and should not impose any limitation on the functionality or scope of use of embodiments of the present invention.
As shown in fig. 6, the computer device 612 is in the form of a general purpose computing device. Components of computer device 612 may include, but are not limited to: one or more processors 616, a system memory 628, and a bus 618 that couples various system components including the system memory 628 and the processors 616.
Bus 618 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 612 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 612 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 628 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)630 and/or cache memory 632. The computer device 612 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage 634 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be connected to bus 618 by one or more data media interfaces. Memory 628 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 640 having a set (at least one) of program modules 642 may be stored, for example, in memory 628. Such program modules 642 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may include an implementation of a networking environment. The program modules 642 generally perform the functions and/or methods of the described embodiments of the present invention.
The computer device 612 may also communicate with one or more external devices 614 (e.g., keyboard, pointing device, display 624, etc.), with one or more devices that enable a user to interact with the computer device 612, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 612 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 622. Also, computer device 612 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through network adapter 620. As shown, the network adapter 620 communicates with the other modules of the computer device 612 via the bus 618. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the computer device 612, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 616 executes programs stored in the system memory 628, so as to execute various functional applications and data processing, for example, implement a similarity model training method provided by the embodiment of the present invention, the method includes:
acquiring historical behavior data, and determining sample semantic features of sample articles according to incidence relations among the sample articles in the historical behavior data;
determining sample similarity among sample articles according to the sample semantic features of the sample articles;
obtaining sample attribute characteristics of each sample article, generating training samples based on the sample attribute characteristics of each sample article and the sample similarity between the sample articles, and training a pre-constructed similarity model by using the training samples to obtain a trained similarity model;
and/or, the method for calculating the similarity provided by the embodiment of the invention is realized, and the method comprises the following steps:
acquiring a reference attribute feature of a reference article and a candidate attribute feature of a candidate article;
inputting the reference attribute characteristics and the candidate attribute characteristics into a similarity model which is trained in advance to obtain a similarity prediction result output by the similarity model, wherein the similarity model is obtained by training by using a similarity model training method provided by any embodiment of the invention;
and determining the overall similarity between the reference article and the candidate article according to the similarity prediction result.
Of course, those skilled in the art will understand that the processor may also implement the technical solution of the similarity model training method and/or the similarity calculation method provided in any embodiment of the present invention.
EXAMPLE seven
The seventh embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the similarity model training method provided in the embodiments of the present invention, and the method includes:
acquiring historical behavior data, and determining sample semantic features of sample articles according to incidence relations among the sample articles in the historical behavior data;
determining sample similarity among sample articles according to the sample semantic features of the sample articles;
obtaining sample attribute characteristics of each sample article, generating training samples based on the sample attribute characteristics of each sample article and the sample similarity between the sample articles, and training a pre-constructed similarity model by using the training samples to obtain a trained similarity model;
and/or, the method for calculating the similarity provided by the embodiment of the invention is realized, and the method comprises the following steps:
acquiring a reference attribute feature of a reference article and a candidate attribute feature of a candidate article;
inputting the reference attribute characteristics and the candidate attribute characteristics into a similarity model which is trained in advance to obtain a similarity prediction result output by the similarity model, wherein the similarity model is obtained by training by using a similarity model training method provided by any embodiment of the invention;
and determining the overall similarity between the reference article and the candidate article according to the similarity prediction result.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is not limited to the above method operations, and may also perform related operations of the similarity model training method and/or the similarity calculation method provided by any embodiments of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments illustrated herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A similarity model training method is characterized by comprising the following steps:
obtaining historical behavior data, and determining sample semantic features of sample articles according to incidence relations among the sample articles in the historical behavior data;
determining sample similarity between the sample items according to the sample semantic features of the sample items;
obtaining sample attribute characteristics of each sample article, generating training samples based on the sample attribute characteristics of each sample article and the sample similarity between the sample articles, and training a pre-constructed similarity model by using the training samples to obtain a trained similarity model.
2. The method according to claim 1, wherein the determining the sample semantic features of each sample item according to the association relationship between the sample items in the historical behavior data comprises:
constructing a connection graph among the sample objects according to the incidence relation among the sample objects in the historical behavior data;
and calculating the sample semantic features of each sample item based on the connection graph through a graph embedding algorithm.
3. The method according to claim 2, wherein the constructing a connection graph between the sample items according to the association relationship between the sample items in the historical behavior data comprises:
constructing connecting edges between the articles of interest and the transacted article of the same user within a set time period in the historical behavior data, until all users involved in the historical behavior data have been traversed, to obtain the connection graph between the sample articles.
4. The method according to claim 1, wherein the generating training samples based on the sample attribute features of each sample article and the sample similarity between the sample articles, and training a pre-constructed similarity model using the training samples to obtain a trained similarity model comprises:
for every two sample articles, inputting the sample attribute characteristics of each sample article into the similarity model to obtain the predicted similarity predicted by the similarity model;
and determining a loss value according to the prediction similarity and the sample similarity between the sample articles, and obtaining a trained similarity model by taking the loss value reaching a convergence condition as a target.
5. A similarity calculation method is characterized by comprising the following steps:
acquiring a reference attribute feature of a reference article and a candidate attribute feature of a candidate article;
inputting the reference attribute features and the candidate attribute features into a similarity model trained in advance to obtain a similarity prediction result output by the similarity model, wherein the similarity model is obtained by training by using the similarity model training method of any one of claims 1 to 4;
and determining the overall similarity between the reference article and the candidate article according to the similarity prediction result.
6. The method of claim 5, further comprising:
and taking the candidate article with the overall similarity larger than a set similarity threshold value with the reference article as a competitive product of the reference article.
7. The method of claim 6, further comprising:
and adjusting the attribute parameters in the reference attribute feature and the candidate attribute feature, and determining the contribution ratio of the attribute parameters in the overall similarity according to the adjusted similarity.
8. A similarity model training device, comprising:
the semantic feature acquisition module is used for acquiring historical behavior data and determining sample semantic features of the sample articles according to the incidence relation among the sample articles in the historical behavior data;
the similarity calculation module is used for determining sample similarity among the sample articles according to the sample semantic features of the sample articles;
and the model training module is used for acquiring the sample attribute characteristics of each sample article, generating a training sample based on the sample attribute characteristics of each sample article and the sample similarity between the sample articles, and training a pre-constructed similarity model by using the training sample to obtain the trained similarity model.
9. A similarity degree calculation apparatus, comprising:
the attribute feature acquisition module is used for acquiring the reference attribute feature of the reference article and the candidate attribute feature of the candidate article;
a model prediction module, configured to input the reference attribute feature and the candidate attribute feature into a similarity model trained in advance, and obtain a similarity prediction result output by the similarity model, where the similarity model is obtained by using the similarity model training method according to any one of claims 1 to 4;
and the similarity determining module is used for determining the overall similarity between the reference article and the candidate article according to the similarity prediction result.
10. A computer device, the device comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the similarity model training method of any one of claims 1-4 and/or implement the similarity calculation method of any one of claims 5-7.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the similarity model training method according to any one of claims 1 to 4 and/or carries out the similarity calculation method according to any one of claims 5 to 7.
CN202011593278.6A 2020-12-29 2020-12-29 Model training method, similarity calculation device, similarity calculation equipment and similarity calculation medium Pending CN113763012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011593278.6A CN113763012A (en) 2020-12-29 2020-12-29 Model training method, similarity calculation device, similarity calculation equipment and similarity calculation medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011593278.6A CN113763012A (en) 2020-12-29 2020-12-29 Model training method, similarity calculation device, similarity calculation equipment and similarity calculation medium

Publications (1)

Publication Number Publication Date
CN113763012A true CN113763012A (en) 2021-12-07

Family

ID=78786227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011593278.6A Pending CN113763012A (en) 2020-12-29 2020-12-29 Model training method, similarity calculation device, similarity calculation equipment and similarity calculation medium

Country Status (1)

Country Link
CN (1) CN113763012A (en)

Similar Documents

Publication Publication Date Title
US20210027146A1 (en) Method and apparatus for determining interest of user for information item
CN110427560B (en) Model training method applied to recommendation system and related device
CN107357874B (en) User classification method and device, electronic equipment and storage medium
CN110322300B (en) Data processing method and device, electronic equipment and storage medium
CN111737418B (en) Method, apparatus and storage medium for predicting relevance of search term and commodity
CN111612581A (en) Method, device and equipment for recommending articles and storage medium
WO2020221022A1 (en) Service object recommendation method
CN110084658B (en) Method and device for matching articles
CN114117216A (en) Recommendation probability prediction method and device, computer storage medium and electronic equipment
CN111754278A (en) Article recommendation method and device, computer storage medium and electronic equipment
CN111966886A (en) Object recommendation method, object recommendation device, electronic equipment and storage medium
CN108932625A (en) Analysis method, device, medium and electronic equipment for user behavior data
CN111209351A (en) Object relation prediction method and device, object recommendation method and device, electronic equipment and medium
WO2022156589A1 (en) Method and device for determining live broadcast click rate
CN115423555A (en) Commodity recommendation method and device, electronic equipment and storage medium
CN111680213B (en) Information recommendation method, data processing method and device
CN112069404A (en) Commodity information display method, device, equipment and storage medium
CN111754300A (en) Commodity recommendation method, commodity recommendation device, commodity recommendation equipment and storage medium
CN110020195B (en) Article recommendation method and device, storage medium and electronic equipment
CN110851708A (en) Negative sample extraction method and device, computer equipment and storage medium
CN113094602B (en) Hotel recommendation method, system, equipment and medium
CN112905885B (en) Method, apparatus, device, medium and program product for recommending resources to user
CN113763012A (en) Model training method, similarity calculation device, similarity calculation equipment and similarity calculation medium
CN113947431A (en) User behavior quality evaluation method, device, equipment and storage medium
CN113362141A (en) Associated commodity recommendation method, device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination