CN113487351A - Privacy protection advertisement click rate prediction method, device, server and storage medium - Google Patents

Publication number
CN113487351A
Authority
CN
China
Prior art keywords
client
clients
cluster
model
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110755722.8A
Other languages
Chinese (zh)
Inventor
刘洋
俞陈佳
王轩
徐睿峰
廖清
蒋琳
漆舒汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202110755722.8A priority Critical patent/CN113487351A/en
Publication of CN113487351A publication Critical patent/CN113487351A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0269Targeted advertisements based on user profile or attribute
    • G06Q30/0271Personalized advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a privacy-preserving advertisement click-through rate prediction method, device, server and storage medium. The method comprises the following steps: issuing the global model to each client so that each client trains a local model and obtains a weight update vector by calculating the gradients of a factorization machine component and a deep learning component respectively; calculating the similarity between the clients; clustering all clients with a clustered federated learning algorithm so that a global model is generated for each cluster; in each cluster, issuing the global model to all clients in the cluster so that they update their local models, until the global model converges or the maximum number of rounds is reached; and receiving a request sent by the client of a user, and issuing the global model of the corresponding cluster to that client so that it calculates the advertisement click-through rate of the user's candidate advertisements. The invention protects the privacy and security of client data while maintaining the usability of the federated learning model.

Description

Privacy protection advertisement click rate prediction method, device, server and storage medium
Technical Field
The invention relates to a method and a device for predicting the click rate of an advertisement with privacy protection, a server and a storage medium, and belongs to the technical field of user privacy protection.
Background
Efficient prediction of the advertisement click-through rate plays a crucial role in improving the efficiency of advertisement delivery. To provide personalized click-through rate prediction for users and to capture interactions among different features so as to estimate the correlation between users and advertisements, industry and academia have introduced deep learning into this field. Google proposed the Wide & Deep model, which performs feature learning in parallel through a linear model with cross-product features and a deep neural network, thereby capturing feature interactions for advertisement recommendation.
To better capture high-order interactions between features, the DeepFM model builds on Wide & Deep and combines a factorization machine (FM) with a deep neural network to model feature interactions. Compared with other advertisement click-through rate prediction strategies, DeepFM not only retains the capability of FM to learn feature interactions from sparse data, but also uses deep learning to construct a neural network for feature learning.
Traditional advertisement click rate prediction directly uploads user data to a central server for centralized training during model training. User data contains a lot of privacy sensitive information, and the original data is uploaded to a server without protection, so that privacy disclosure is caused.
Disclosure of Invention
In view of the above, the present invention provides a privacy-preserving advertisement click-through rate prediction method, apparatus, server and storage medium, which can balance the accuracy and privacy of an advertisement click-through rate prediction algorithm in scenarios where the data of different clients is not independent and identically distributed (Non-IID), that is, protect the privacy and security of client data while maintaining the usability of the federated learning model.
A first object of the present invention is to provide a privacy-preserving advertisement click-through rate prediction method.
A second object of the present invention is to provide a privacy-preserving advertisement click-through rate prediction apparatus.
A third object of the present invention is to provide a server.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a privacy protection advertisement click rate prediction method is applied to a server and comprises the following steps:
issuing the global model to each client so that each client trains a local model according to local user data, obtaining a weight updating vector through calculating gradients of a factorization machine component and a deep learning component respectively, and uploading the weight updating vector to a server;
receiving weight updating vectors uploaded by each client, and calculating the similarity between the clients according to the weight updating vectors uploaded by each client;
clustering all the clients with a clustered federated learning algorithm according to the similarity among the clients, so that a global model is generated for each cluster;
in each cluster, issuing the global model to all clients in the cluster so that all clients in the cluster update the local model until the global model converges or the maximum round is reached;
and receiving a request sent by a client of a certain user, and issuing the global model to the client of the user in the corresponding cluster so that the client of the user calculates the advertisement click rate of the candidate advertisement of the user through the local model.
Further, clustering all the clients with a clustered federated learning algorithm so that a global model is generated for each cluster specifically includes:
clustering all clients with the clustered federated learning algorithm, and judging whether a split occurs;
if a split occurs, dividing all the clients into two clusters, and generating a global model for each cluster;
if no split occurs, judging whether the global model has converged;
and if the global model has not converged and the maximum round has not been reached, treating all the clients as one cluster and generating a global model for that cluster.
Further, a split of a cluster occurs when: the federated learning objective function of the current cluster is near a stationary point, and at least one client in the cluster has not reached a stationary point of its local loss function.
Further, the method further comprises:
and sending the selected partial advertisement list to the client of the user according to the advertisement click rate of the candidate advertisement of the user, so as to realize personalized advertisement recommendation to the user.
Further, the gradient of the factorization machine component is calculated as follows:
Figure BDA0003147238570000021
where the symbol
Figure BDA0003147238570000022
represents the parameters of the k-th client model, x represents the user features (each user has n features), and θ denotes the model parameters collectively.
Further, the gradient of the deep learning component is calculated as follows:
Figure BDA0003147238570000031
where t represents the iteration round, D_k represents the user data of the k-th client, and SGD() denotes stochastic gradient descent.
Further, the similarity between the clients is calculated as follows:
$$\alpha_{i,j} = \frac{\langle \Delta\theta_i, \Delta\theta_j \rangle}{\|\Delta\theta_i\| \, \|\Delta\theta_j\|}$$
where α_{i,j} represents the cosine similarity between the i-th client and the j-th client, Δθ_i represents the weight update vector of the i-th client, and Δθ_j represents the weight update vector of the j-th client.
The second purpose of the invention can be achieved by adopting the following technical scheme:
a privacy-preserving advertisement click-through rate prediction device applied to a server comprises:
the model training module is used for issuing the global model to each client so that each client trains the local model according to local user data, obtains a weight update vector by calculating the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
the similarity calculation module is used for receiving the weight update vectors uploaded by the clients and calculating the similarity between the clients according to the weight update vectors uploaded by the clients;
the clustering module is used for clustering all the clients with a clustered federated learning algorithm according to the similarity among the clients, so that a global model is generated for each cluster;
the model updating module is used for issuing the global model to all the clients in each cluster so as to update the local models of all the clients in the cluster until the global model converges or reaches the maximum turn;
and the advertisement click rate prediction module is used for receiving a request sent by a client of a certain user and issuing the global model to the client of the user in the corresponding cluster so as to enable the client of the user to calculate the advertisement click rate of the candidate advertisement of the user through the local model.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a server comprises a processor and a memory for storing a program executable by the processor, and when the processor executes the program stored by the memory, the method for predicting the click rate of the privacy protection advertisement is realized.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program which, when executed by a processor, implements the privacy preserving advertisement click-through rate prediction method described above.
Compared with the prior art, the invention has the following beneficial effects:
1. The method is based on a federated factorization machine and can balance the accuracy and privacy of the advertisement click-through rate prediction algorithm in scenarios where the data of different clients is Non-IID, that is, it protects the privacy and security of client data while maintaining the usability of the federated learning model. The existing centralized factorization machine is optimized into a federated factorization machine: with the distributed training of federated learning, a client does not upload the user's raw data to the server directly, and only the client's gradient information is used to update the model. The idea of clustered federated learning is introduced into the factorization machine, which realizes advertisement click-through rate prediction that protects user privacy and addresses the loss of model quality that linear aggregation suffers under heterogeneous user data.
2. The method maintains high model accuracy: on the Tencent2019 training set, the accuracy of the final global model is 8% higher than that of the traditional federated matrix factorization model and 2.5% higher than that of federated learning with a single global model.
3. The invention enhances the privacy of the advertisement click-through rate prediction algorithm and provides a good solution for privacy protection in federated advertisement recommendation under data-heterogeneous scenarios.
4. The invention designs a user-level distributed factorization machine that can be applied within a federated learning framework and ensures that a user's raw data never leaves the local device during model training, thereby reducing the risk of privacy disclosure.
5. The invention adopts a clustered federated learning mechanism, which improves the accuracy of the advertisement click-through rate prediction algorithm when client data is heterogeneous.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a schematic diagram of a privacy preserving advertisement click-through rate prediction framework according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of a privacy-preserving advertisement click-through rate prediction method according to embodiment 1 of the present invention.
Fig. 3 is a flowchart of clustering all clients according to embodiment 1 of the present invention.
Fig. 4 is a block diagram of an advertisement click-through rate prediction apparatus according to embodiment 2 of the present invention.
Fig. 5 is a block diagram of a server according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention.
Example 1:
In order to protect user privacy, a federated learning framework is introduced into advertisement click-through rate prediction. The secure aggregation strategy of federated learning learns the model parameters of the clients while guaranteeing that the raw data never leaves the local device. However, due to the heterogeneity of user data, a simple model aggregation strategy (such as FedSGD or FedAvg) can degrade performance and can even cause the model to diverge when user data is extremely Non-IID.
Faced with the data heterogeneity challenge in the federated scenario, approaches such as a secure aggregation strategy in which clients share a subset of data, or adding a proximal term to the objective function, can help. However, the large time overhead of these computations limits their practicality. The clustering loss term proposed by Sattler et al. uses cosine similarity to overcome model divergence when clients have different data distributions. On this basis, the present embodiment provides an advertisement click-through rate prediction framework based on a federated factorization machine, which uses multi-center federated learning to reduce the influence of data heterogeneity on model training.
As shown in fig. 1, the privacy-preserving advertisement click-through rate prediction framework of the present embodiment includes two parts. The first part is the advertisement platform, which is implemented by a server and maintains a plurality of clusters; the client data within one cluster follows the same distribution, and the click probability of a user is predicted by the global model of that cluster. The second part is the user's client, which mainly collects and analyzes data locally and uploads model gradients to the corresponding cluster of the advertisement platform.
In the advertisement platform, assume there is a set of candidate advertisements denoted D = [d1, d2, ..., dm]. The platform holds multiple advertisement clusters, each composed of similar users, and the advertisement model can be learned from advertisement features such as ID and title. When the client of user u sends a request to the advertisement platform, the platform calculates, in the corresponding cluster, the advertisement click-through rates of the candidate advertisements for user u, denoted y1, y2, ..., ym respectively, and sends the selected partial advertisement list to the user, realizing personalized advertisement recommendation for user u. In the client, the user features used for local model training contain the user's personal information and advertisement click behavior data; the local model is trained locally on this user data, and the gradient of the local model is sent to the advertisement platform to update the global model.
As shown in fig. 1 and fig. 2, the present embodiment provides a method for predicting a click-through rate of a privacy-preserving advertisement, which is implemented on the basis of an advertisement platform (i.e., a server) of the above-mentioned frame for predicting a click-through rate of a privacy-preserving advertisement, and includes the following steps:
s201, issuing the global model to each client so that each client trains the local model according to local user data, obtaining a weight updating vector through calculating gradients of the factorization machine component and the deep learning component respectively, and uploading the weight updating vector to a server.
In this embodiment, the global model is a model to be trained, and because the model training requires multiple rounds, in the first round, the model to be trained is an initial global model, and in each subsequent round, the model to be trained is a global model obtained in the previous round.
After the server issues the global model to each client, each client receives the global model, trains the local model according to local user data to obtain a weight update vector, and uploads the weight update vector to the server, so that the server can compute the global model of the next round. The user features of the local user data comprise the user's personal information and advertisement click behavior data. Because these raw user features are very sparse, low-order and high-order feature interactions can be learned only after the raw user data is converted into dense vectors by an embedding layer; a new continuous vector is then obtained according to the unified mapping sent by the server. Compared with the centralized deep factorization machine (DeepFM) model, the model estimation formula in the distributed scenario is more complex: the weight update vector of each client is obtained by using the chain rule to calculate the gradients of the factorization machine (FM) component and the deep learning component respectively, where the deep learning component is a deep neural network (DNN) component.
The gradient of the factorization machine component is calculated as follows:
Figure BDA0003147238570000061
where the symbol
Figure BDA0003147238570000062
represents the parameters of the k-th client model, x represents the user features (each user has n features), and θ denotes the model parameters collectively.
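The formula image for the factorization machine gradient is not reproduced in this text. As a rough illustration only, the sketch below uses the standard second-order factorization machine (score = w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j) and its well-known partial derivatives; it is an assumption that the patent's FM component has this form, and the names `fm_predict` and `fm_gradient` are hypothetical.

```python
from typing import List, Tuple


def fm_predict(w0: float, w: List[float], V: List[List[float]], x: List[float]) -> float:
    """Standard second-order FM score for one feature vector x (assumed form)."""
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum_i v_if * x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i (v_if * x_i)^2
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_gradient(V: List[List[float]], x: List[float]) -> Tuple[float, List[float], List[List[float]]]:
    """Partial derivatives of the FM score with respect to w0, w and V."""
    n, k = len(x), len(V[0])
    sums = [sum(V[j][f] * x[j] for j in range(n)) for f in range(k)]
    grad_w0 = 1.0
    grad_w = list(x)  # d score / d w_i = x_i
    # d score / d v_if = x_i * sum_j v_jf x_j - v_if * x_i^2
    grad_V = [[x[i] * sums[f] - V[i][f] * x[i] ** 2 for f in range(k)] for i in range(n)]
    return grad_w0, grad_w, grad_V
```

In a real client these derivatives would be chained with the derivative of the loss with respect to the score, which is the chain-rule step the description mentions.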
For the gradient of the deep learning component, each client performs multiple stochastic gradient descent steps to iteratively update the local model, and the gradient of the k-th client in the t-th round is as follows:
Figure BDA0003147238570000063
where t represents the iteration round, D_k represents the user data of the k-th client, and SGD() denotes stochastic gradient descent.
The gradients of the factorization machine component and the deep learning component obtained in this way constitute the weight update vector.
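A minimal sketch of how a client might turn local training into a weight update vector is given below. It rests on two assumptions that the text does not confirm: the update is taken as the difference between the locally trained parameters and the received global parameters, and a plain logistic regression stands in for the DeepFM model. The function name and the `lr`, `epochs` and `seed` parameters are hypothetical.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, click label in {0, 1})


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_weight_update(theta: List[float], data: List[Sample],
                        lr: float = 0.1, epochs: int = 5, seed: int = 0) -> List[float]:
    """Run several SGD passes on local data and return theta_local - theta.

    The raw samples never leave this function; only the difference vector
    would be uploaded to the server.
    """
    rng = random.Random(seed)
    local = list(theta)      # copy so the received global parameters stay unchanged
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            p = sigmoid(sum(w * xi for w, xi in zip(local, x)))
            for i, xi in enumerate(x):
                local[i] -= lr * (p - y) * xi  # gradient of the log loss
    return [a - b for a, b in zip(local, theta)]
```

Usage: `local_weight_update([0.0, 0.0], [([1.0, 0.0], 1), ([0.0, 1.0], 0)])` returns a vector whose first entry is positive and whose second entry is negative, reflecting the two labels.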
S202, receiving the weight updating vectors uploaded by the clients, and calculating the similarity between the clients according to the weight updating vectors uploaded by the clients.
In this embodiment, the similarity between the clients is a cosine similarity, which is as follows:
$$\alpha_{i,j} = \frac{\langle \Delta\theta_i, \Delta\theta_j \rangle}{\|\Delta\theta_i\| \, \|\Delta\theta_j\|}$$
where α_{i,j} represents the cosine similarity between the i-th client and the j-th client, Δθ_i represents the weight update vector of the i-th client, and Δθ_j represents the weight update vector of the j-th client.
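The pairwise cosine similarity of step S202 can be sketched as follows. The function names are hypothetical, and returning 0.0 for a zero-length update is an assumption made here to avoid division by zero.

```python
import math
from typing import List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two weight update vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat a zero update as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(updates: List[List[float]]) -> List[List[float]]:
    """Entry [i][j] holds the similarity between clients i and j."""
    return [[cosine_similarity(a, b) for b in updates] for a in updates]
```

Because only the direction of each update matters, clients with very different amounts of data can still be compared.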
S203, clustering all the clients with a clustered federated learning algorithm according to the similarity among the clients, so that a global model is generated for each cluster.
Further, as shown in fig. 3, the step S203 specifically includes:
S2031, clustering all the clients with the clustered federated learning algorithm and judging whether a split occurs; if so, proceeding to step S2032, and if not, proceeding to step S2033.
In this embodiment, the gradient of each client is observed at the stationary point θ*. When the data distribution within a cluster is inconsistent, a stationary solution of the cluster's federated learning objective function cannot be stationary for every individual client; conversely, if the data distribution is consistent, optimizing the cluster's objective function reaches the optimal solution of the local risk functions of all clients, and in that case the norm of each client's gradient approaches zero as the objective function approaches the stationary point. A split is therefore determined to occur when the following two conditions hold:
(1) θ* is an approximate stationary point of the federated learning objective function of the current cluster, as follows:
Figure BDA0003147238570000071
where D_i represents the user data of the i-th client, ε_1 is a hyper-parameter whose specific value is determined experimentally, and g_k() represents the objective function of the k-th client.
(2) At least one client in the cluster has not reached a stationary point of its local loss function, as follows:
$$\max_{k=1,\dots,M} \| g_k(\theta^*) \| > \varepsilon_2 > 0 \qquad (5)$$
where ε_2 is a hyper-parameter whose specific value is determined experimentally, and g_k() represents the objective function of the k-th client.
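The two split conditions can be checked together as in the sketch below. It makes assumptions the text does not spell out: each client is represented by a gradient vector evaluated at θ*, and the cluster-level quantity in condition (1) is the data-size-weighted average of those vectors. The function names and argument layout are hypothetical.

```python
import math
from typing import List


def l2_norm(v: List[float]) -> float:
    return math.sqrt(sum(a * a for a in v))


def should_split(client_grads: List[List[float]], data_sizes: List[int],
                 eps1: float, eps2: float) -> bool:
    """Return True when both split conditions hold.

    (1) the weighted average of the client vectors is nearly zero (< eps1), so
        the cluster objective is close to a stationary point;
    (2) at least one client vector is still large (> eps2), so that client has
        not reached a stationary point of its own loss.
    """
    total = sum(data_sizes)
    dim = len(client_grads[0])
    avg = [sum(n / total * g[d] for g, n in zip(client_grads, data_sizes))
           for d in range(dim)]
    cluster_stationary = l2_norm(avg) < eps1
    client_not_stationary = max(l2_norm(g) for g in client_grads) > eps2
    return cluster_stationary and client_not_stationary
```

For example, two equally sized clients with opposite vectors `[1, 0]` and `[-1, 0]` cancel in the average but are individually large, which is the situation that triggers a split.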
S2032, dividing all clients into two clusters, and enabling each cluster to generate a global model.
In this embodiment, clustered federated learning recursively bipartitions the clients from top to bottom; each split divides the clients into two clusters so as to minimize the maximum similarity between clients of different clusters.
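One way to illustrate the bipartition objective is an exhaustive search over all two-way splits, as sketched below. This brute-force search is an illustrative stand-in, not the procedure the patent specifies; it is exponential in the number of clients and only practical for small clusters. The function name is hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def best_bipartition(sim: List[List[float]]) -> Tuple[List[int], List[int]]:
    """Split client indices into two non-empty clusters that minimize the
    maximum similarity between clients placed in different clusters."""
    m = len(sim)
    if m < 2:
        raise ValueError("need at least two clients to split")
    best: Optional[Tuple[List[int], List[int]]] = None
    best_score = float("inf")
    others = list(range(1, m))
    # Client 0 is fixed in the first cluster so each split is visited once.
    for r in range(0, m - 1):
        for extra in combinations(others, r):
            c1 = [0, *extra]
            c2 = [i for i in range(m) if i not in c1]
            score = max(sim[i][j] for i in c1 for j in c2)
            if score < best_score:
                best_score, best = score, (c1, c2)
    assert best is not None
    return best
```

With a similarity matrix in which clients 0 and 1 resemble each other, clients 2 and 3 resemble each other, and the two pairs are dissimilar, the search returns `([0, 1], [2, 3])`.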
S2033, judging whether the global model is converged, if the global model is not converged and does not reach the maximum round, entering step S2034, if the global model is converged or reaches the maximum round, ending the model training, and entering step S205 after the model training is ended.
S2034, regarding all the clients as a cluster, generating a global model for the cluster, and then proceeding to step S204.
And S204, in each cluster, issuing the global model to all the clients in the cluster so that all the clients in the cluster update the local model, uploading the weight update vector to the server, and returning to the step S202 until the global model converges or reaches the maximum round.
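The per-cluster update of step S204 can be sketched as a weighted aggregation of the uploaded weight update vectors. The text does not state the aggregation rule, so weighting each client by its share of the cluster's data (in the style of FedAvg) is an assumption, and all names below are hypothetical.

```python
from typing import Dict, List


def aggregate_cluster(theta: List[float], updates: Dict[int, List[float]],
                      data_sizes: Dict[int, int], cluster: List[int]) -> List[float]:
    """Return the new global parameters of one cluster.

    Only clients listed in `cluster` contribute, so each cluster keeps its own
    global model, as the description requires.
    """
    total = sum(data_sizes[c] for c in cluster)
    new_theta = list(theta)
    for c in cluster:
        weight = data_sizes[c] / total
        for d, delta in enumerate(updates[c]):
            new_theta[d] += weight * delta
    return new_theta
```

The server would call this once per cluster and per round, then issue the returned parameters back to the clients of that cluster.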
S205, receiving a request sent by a client of a certain user, and issuing the global model to the client of the user in a corresponding cluster, so that the client of the user can calculate the advertisement click rate of the candidate advertisement of the user through the local model.
In this embodiment, a client of a certain user sends a request to a server, the server issues a global model to the client of the user in a corresponding cluster, and after receiving the global model, the client of the user calculates an advertisement click rate of a candidate advertisement of the user through a local model by using locally stored original user data (user personal information and click data), thereby realizing personalized click rate prediction.
S206, according to the advertisement click rate of the candidate advertisement of the user, sending the selected partial advertisement list to the client of the user, and realizing the personalized advertisement recommendation of the user.
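Step S206 does not say how the partial advertisement list is selected. A simple possibility, shown here purely as an assumption, is to keep the `top_n` candidates with the highest predicted click-through rate; the function name and the tie-breaking by advertisement ID are hypothetical.

```python
from typing import Dict, List


def select_ads(ctr_by_ad: Dict[str, float], top_n: int) -> List[str]:
    """Return the IDs of the top_n candidate ads by predicted click-through rate."""
    # Sort by descending rate; break ties by ID so the result is deterministic.
    ranked = sorted(ctr_by_ad.items(), key=lambda item: (-item[1], item[0]))
    return [ad_id for ad_id, _ in ranked[:top_n]]
```

For example, `select_ads({"d1": 0.02, "d2": 0.10, "d3": 0.05}, 2)` keeps `d2` and `d3`.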
Those skilled in the art will appreciate that all or part of the steps in the method for implementing the above embodiments may be implemented by a program to instruct associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above-described embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Example 2:
as shown in fig. 4, the present embodiment provides a privacy-preserving advertisement click-through rate prediction apparatus, which is applied to a server, and includes a model training module 401, a similarity calculation module 402, a clustering module 403, a model updating module 404, an advertisement click-through rate prediction module 405, and an advertisement recommendation module 406, where specific functions of each module are as follows:
the model training module 401 is configured to issue the global model to each client, so that each client trains the local model according to the local user data, obtains a weight update vector by calculating the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
the similarity calculation module 402 is configured to receive the weight update vector uploaded by each client, and calculate the similarity between the clients according to the weight update vector uploaded by each client;
the clustering module 403 is configured to cluster all the clients with a clustered federated learning algorithm according to the similarity between the clients, so that a global model is generated for each cluster;
a model update module 404, configured to send the global model to all clients in each cluster, so that all clients in the cluster update the local model until the global model converges or reaches a maximum turn;
the advertisement click rate prediction module 405 is configured to receive a request sent by a client of a certain user, and issue the global model to the client of the user in a corresponding cluster, so that the client of the user calculates the advertisement click rate of the candidate advertisement of the user through the local model.
And the advertisement recommending module 406 is configured to send the selected partial advertisement list to the client of the user according to the advertisement click rate of the candidate advertisement of the user, so as to implement personalized advertisement recommendation for the user.
The specific implementation of each module in this embodiment may refer to embodiment 1, which is not described herein any more; it should be noted that, the apparatus provided in this embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure is divided into different functional modules to complete all or part of the functions described above.
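As an illustration of what the similarity calculation module 402 does, the following is a minimal Python sketch, assuming each client's weight update vector has been flattened into a plain list of floats. The function names `cosine_similarity` and `similarity_matrix` are hypothetical and do not come from the embodiment; the treatment of an all-zero update is likewise an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length weight update vectors."""
    if len(u) != len(v):
        raise ValueError("weight update vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero update carries no direction, so report 0.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(updates):
    """Symmetric matrix alpha[i][j] of pairwise cosine similarities.

    The diagonal is set to 1.0 by convention.
    """
    n = len(updates)
    alpha = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            alpha[i][j] = alpha[j][i] = cosine_similarity(updates[i], updates[j])
    return alpha
```

Clients whose updates point in similar directions obtain a similarity close to 1, while clients with conflicting updates obtain a value close to -1, which is the signal the clustering module can act on.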
Example 3:
As shown in FIG. 5, this embodiment provides a server, which includes a processor 502, a memory, and a network interface 503 connected by a system bus 501. The processor provides computing and control capability. The memory includes a nonvolatile storage medium 504 and an internal memory 505; the nonvolatile storage medium 504 stores an operating system, a computer program, and a database, and the internal memory 505 provides a running environment for the operating system and the computer program in the nonvolatile storage medium. When the processor 502 executes the computer program stored in the memory, the privacy-preserving advertisement click-through rate prediction method of embodiment 1 above is implemented, as follows:
issuing the global model to each client, so that each client trains a local model on its local user data, obtains a weight update vector by computing the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
receiving the weight update vectors uploaded by the clients, and calculating the similarity between the clients according to the uploaded weight update vectors;
clustering all the clients using a clustered federated learning algorithm according to the similarity between the clients, so that each cluster generates a global model;
in each cluster, issuing the global model to all clients in the cluster, so that all clients in the cluster update their local models, until the global model converges or the maximum number of rounds is reached;
and receiving a request sent by a user's client, and issuing the global model to the user's client in the corresponding cluster, so that the client calculates the click-through rate of the user's candidate advertisements using its local model.
Further, clustering all the clients using a clustered federated learning algorithm so that each cluster generates a global model specifically includes:
clustering all the clients using the clustered federated learning algorithm, and determining whether a split occurs;
if a split occurs, dividing all the clients into two clusters, so that each cluster generates a global model;
if no split occurs, determining whether the global model has converged;
and if the global model has not converged and the maximum number of rounds has not been reached, treating all the clients as one cluster, which generates a global model.
Further, the method may also include:
sending a selected partial advertisement list to the user's client according to the click-through rates of the user's candidate advertisements, thereby implementing personalized advertisement recommendation for the user.
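The split decision in the steps above can be sketched as follows. This is a hedged Python illustration, not the procedure of the embodiment: the thresholds `eps1` and `eps2`, the use of update norms as a proxy for "near a stationary point", and the seed-based two-way partition are all assumptions introduced for the example.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def mean_update(updates):
    """Coordinate-wise average of the clients' weight update vectors."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]


def should_split(updates, eps1=0.1, eps2=0.5):
    """True when the cluster average is near-stationary but some client is not.

    eps1 and eps2 are hypothetical thresholds chosen for illustration.
    """
    near_stationary = norm(mean_update(updates)) < eps1
    some_client_unsettled = max(norm(u) for u in updates) > eps2
    return near_stationary and some_client_unsettled


def bipartition(alpha):
    """Split client indices into two groups from a similarity matrix alpha.

    Heuristic (an assumption): the least similar pair of clients seeds the
    two groups, and every other client joins the seed it is more similar to.
    """
    n = len(alpha)
    if n < 2:
        raise ValueError("need at least two clients to split a cluster")
    seed_a, seed_b = min(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: alpha[pair[0]][pair[1]],
    )
    group_a, group_b = [seed_a], [seed_b]
    for k in range(n):
        if k in (seed_a, seed_b):
            continue
        target = group_a if alpha[k][seed_a] >= alpha[k][seed_b] else group_b
        target.append(k)
    return sorted(group_a), sorted(group_b)
```

When `should_split` returns false, the clients stay in one cluster and training continues until convergence or the round limit, matching the no-split branch described above.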
Example 4:
This embodiment provides a storage medium, which is a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the privacy-preserving advertisement click-through rate prediction method of embodiment 1 above is implemented, as follows:
issuing the global model to each client, so that each client trains a local model on its local user data, obtains a weight update vector by computing the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
receiving the weight update vectors uploaded by the clients, and calculating the similarity between the clients according to the uploaded weight update vectors;
clustering all the clients using a clustered federated learning algorithm according to the similarity between the clients, so that each cluster generates a global model;
in each cluster, issuing the global model to all clients in the cluster, so that all clients in the cluster update their local models, until the global model converges or the maximum number of rounds is reached;
and receiving a request sent by a user's client, and issuing the global model to the user's client in the corresponding cluster, so that the client calculates the click-through rate of the user's candidate advertisements using its local model.
Further, clustering all the clients using a clustered federated learning algorithm so that each cluster generates a global model specifically includes:
clustering all the clients using the clustered federated learning algorithm, and determining whether a split occurs;
if a split occurs, dividing all the clients into two clusters, so that each cluster generates a global model;
if no split occurs, determining whether the global model has converged;
and if the global model has not converged and the maximum number of rounds has not been reached, treating all the clients as one cluster, which generates a global model.
Further, the method may also include:
sending a selected partial advertisement list to the user's client according to the click-through rates of the user's candidate advertisements, thereby implementing personalized advertisement recommendation for the user.
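The final recommendation step can be illustrated with a short Python sketch. It assumes the predicted click-through rates are available as a mapping from advertisement identifier to probability; the function name `select_top_ads`, the parameter `k`, and the tie-breaking rule are hypothetical and are not specified by the embodiment.

```python
def select_top_ads(ctr_by_ad, k):
    """Return the ids of the k candidate ads with the highest predicted CTR.

    Ties are broken by ad id so that the result is deterministic
    (an assumption made for this sketch).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(ctr_by_ad.items(), key=lambda item: (-item[1], item[0]))
    return [ad_id for ad_id, _ in ranked[:k]]
```

The returned list corresponds to the "selected partial advertisement list" sent to the user's client.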
It should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In summary, the method is built on a federated factorization machine and can balance the accuracy and privacy of the advertisement click-through rate prediction algorithm in scenarios where the data of different clients are not independent and identically distributed (non-IID); that is, it protects the privacy and security of client data while maintaining the usability of the federated learning model. The existing centralized factorization machine is optimized by introducing a federated factorization machine: specifically, the distributed training of federated learning means that a client does not upload raw user data directly to the server, and only the client's gradient information is used to update the model. The idea of clustered federated learning is introduced into the factorization machine, which achieves advertisement click-through rate prediction that protects user privacy and addresses the model loss that linear aggregation incurs when user data are heterogeneous.
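For readers unfamiliar with factorization machines, the following Python sketch shows the standard second-order factorization machine score and its conversion to a click probability. It is an illustration under assumptions: it uses the commonly published formulation with a global bias `w0`, linear weights `w`, and latent factor rows `v`, it omits the deep learning component entirely, and the names `fm_score` and `predict_ctr` are hypothetical rather than taken from this document.

```python
import math


def fm_score(x, w0, w, v):
    """Second-order factorization machine score for one feature vector x.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    v  : n rows of latent factors, each of the same length (n >= 1 assumed)
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        # Sum over i<j of v[i][f]*v[j][f]*x[i]*x[j], computed in linear time
        # as 0.5 * ((sum_i v[i][f]*x[i])^2 - sum_i (v[i][f]*x[i])^2).
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def predict_ctr(x, w0, w, v):
    """Map the score to a click probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, v)))
```

In a federated setting only updates to parameters such as `w0`, `w`, and `v` would leave the client, while the feature vector `x` stays local.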
The above describes only the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent substitution or change made by a person skilled in the art to the technical solution and inventive concept of the present invention, within the scope disclosed by the present invention, falls within the protection scope of the present invention.

Claims (10)

1. A privacy-preserving advertisement click-through rate prediction method, applied to a server, characterized by comprising the following steps:
issuing the global model to each client, so that each client trains a local model on its local user data, obtains a weight update vector by computing the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
receiving the weight update vectors uploaded by the clients, and calculating the similarity between the clients according to the uploaded weight update vectors;
clustering all the clients using a clustered federated learning algorithm according to the similarity between the clients, so that each cluster generates a global model;
in each cluster, issuing the global model to all clients in the cluster, so that all clients in the cluster update their local models, until the global model converges or the maximum number of rounds is reached;
and receiving a request sent by a user's client, and issuing the global model to the user's client in the corresponding cluster, so that the client calculates the click-through rate of the user's candidate advertisements using its local model.
2. The privacy-preserving advertisement click-through rate prediction method according to claim 1, wherein clustering all the clients using a clustered federated learning algorithm so that each cluster generates a global model specifically comprises:
clustering all the clients using the clustered federated learning algorithm, and determining whether a split occurs;
if a split occurs, dividing all the clients into two clusters, so that each cluster generates a global model;
if no split occurs, determining whether the global model has converged;
and if the global model has not converged and the maximum number of rounds has not been reached, treating all the clients as one cluster, which generates a global model.
3. The privacy-preserving advertisement click-through rate prediction method according to claim 2, wherein a split of a cluster occurs when: the federated learning objective function of the current cluster is near a stationary point, and there is a client in the cluster that has not reached a stationary point of its local loss function.
4. The privacy-preserving advertisement click-through rate prediction method according to claim 1, further comprising:
sending a selected partial advertisement list to the user's client according to the click-through rates of the user's candidate advertisements, thereby implementing personalized advertisement recommendation for the user.
5. The privacy-preserving advertisement click-through rate prediction method according to any one of claims 1-4, wherein the gradient of the factorization machine component is calculated as follows:
Figure FDA0003147238560000021
wherein
Figure FDA0003147238560000022
denotes the parameters of the k-th client's model, x denotes the user features, each user having n features, and θ denotes the model parameters collectively.
6. The privacy-preserving advertisement click-through rate prediction method according to any one of claims 1-4, wherein the gradient of the deep learning component is calculated as follows:
Figure FDA0003147238560000023
where t denotes the iteration round, D_k denotes the user data of the k-th client, and SGD() denotes stochastic gradient descent.
7. The privacy-preserving advertisement click-through rate prediction method according to any one of claims 1-4, wherein the similarity between the clients is calculated as follows:
Figure FDA0003147238560000024
wherein α_{i,j} denotes the cosine similarity between the i-th client and the j-th client, Δθ_i denotes the weight update vector of the i-th client, and Δθ_j denotes the weight update vector of the j-th client.
8. A privacy-preserving advertisement click-through rate prediction apparatus, applied to a server, characterized by comprising:
a model training module, configured to issue the global model to each client, so that each client trains a local model on its local user data, obtains a weight update vector by computing the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
a similarity calculation module, configured to receive the weight update vectors uploaded by the clients, and calculate the similarity between the clients according to the uploaded weight update vectors;
a clustering module, configured to cluster all the clients using a clustered federated learning algorithm according to the similarity between the clients, so that each cluster generates a global model;
a model updating module, configured to, in each cluster, issue the global model to all clients in the cluster, so that all clients in the cluster update their local models, until the global model converges or the maximum number of rounds is reached;
and an advertisement click-through rate prediction module, configured to receive a request sent by a user's client, and issue the global model to the user's client in the corresponding cluster, so that the client calculates the click-through rate of the user's candidate advertisements using its local model.
9. A server comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the privacy preserving advertisement click-through rate prediction method of any one of claims 1-7.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the privacy-preserving advertisement click-through rate prediction method of any one of claims 1-7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110755722.8A CN113487351A (en) 2021-07-05 2021-07-05 Privacy protection advertisement click rate prediction method, device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110755722.8A CN113487351A (en) 2021-07-05 2021-07-05 Privacy protection advertisement click rate prediction method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN113487351A true CN113487351A (en) 2021-10-08

Family

ID=77939950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110755722.8A Pending CN113487351A (en) 2021-07-05 2021-07-05 Privacy protection advertisement click rate prediction method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN113487351A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310047A (en) * 2020-02-20 2020-06-19 深圳前海微众银行股份有限公司 Information recommendation method, device and equipment based on FM model and storage medium
CN111507765A (en) * 2020-04-16 2020-08-07 厦门美图之家科技有限公司 Advertisement click rate prediction method and device, electronic equipment and readable storage medium
WO2020229684A1 (en) * 2019-05-16 2020-11-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concepts for federated learning, client classification and training data similarity measurement
CN112364943A (en) * 2020-12-10 2021-02-12 广西师范大学 Federal prediction method based on federal learning
CN112396099A (en) * 2020-11-16 2021-02-23 哈尔滨工程大学 Click rate estimation method based on deep learning and information fusion
CN112508203A (en) * 2021-02-08 2021-03-16 同盾控股有限公司 Federated data clustering method and device, computer equipment and storage medium
WO2021115480A1 (en) * 2020-06-30 2021-06-17 平安科技(深圳)有限公司 Federated learning method, device, equipment, and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113988207A (en) * 2021-11-09 2022-01-28 长春理工大学 Article recommendation method and system
CN113988314A (en) * 2021-11-09 2022-01-28 长春理工大学 Cluster federal learning method and system for selecting client
CN113988314B (en) * 2021-11-09 2024-05-31 长春理工大学 Clustering federation learning method and system for selecting clients
CN114595831A (en) * 2022-03-01 2022-06-07 北京交通大学 Federal learning method integrating adaptive weight distribution and personalized differential privacy
CN115081003A (en) * 2022-06-29 2022-09-20 西安电子科技大学 Gradient leakage attack method under sampling aggregation framework
CN115081003B (en) * 2022-06-29 2024-04-02 西安电子科技大学 Gradient leakage attack method under sampling aggregation framework
CN115311692A (en) * 2022-10-12 2022-11-08 深圳大学 Federal pedestrian re-identification method, system, electronic device and storage medium
CN115311692B (en) * 2022-10-12 2023-07-14 深圳大学 Federal pedestrian re-identification method, federal pedestrian re-identification system, electronic device and storage medium
CN117077817A (en) * 2023-10-13 2023-11-17 之江实验室 Personalized federal learning model training method and device based on label distribution
CN117077817B (en) * 2023-10-13 2024-01-30 之江实验室 Personalized federal learning model training method and device based on label distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination