CN113824546A - Method and apparatus for generating information - Google Patents


Publication number
CN113824546A
CN113824546A CN202010567116.9A
Authority
CN
China
Prior art keywords
feature
segmentation point
gradient information
point
division
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010567116.9A
Other languages
Chinese (zh)
Other versions
CN113824546B (en)
Inventor
何恺
杨青友
洪爵
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd
Priority to CN202010567116.9A
Publication of CN113824546A
Application granted
Publication of CN113824546B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/008 Cryptographic mechanisms or cryptographic arrangements involving homomorphic encryption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H04L 49/00 Packet switching elements
    • H04L 49/90 Buffering arrangements
    • H04L 49/9057 Arrangements for supporting packet reassembly or resequencing
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/04 Network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L 63/0428 Confidential data exchange wherein the data content is protected, e.g. by encrypting or encapsulating the payload


Abstract

The application discloses a method and apparatus for generating information, and relates to the field of artificial intelligence. A specific implementation includes: obtaining gradient information of a sample according to the sample label and the prediction information of the current model for the sample; determining a first feature and a corresponding optimal segmentation point from the features held by the local end based on the gradient information; sending the ciphertext of the gradient information, obtained with a homomorphic encryption algorithm, to a feature providing end; receiving a second feature and a corresponding optimal segmentation point sent by the feature providing end, where these are determined by the feature providing end from its held features based on the ciphertext of the gradient information and secure multi-party computation; and determining a final segmentation point from the optimal segmentation points corresponding to the first and second features based on secure multi-party computation with the feature providing end. This embodiment improves information security.

Description

Method and apparatus for generating information
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to artificial intelligence technology.
Background
The data required for machine learning often spans multiple domains. Since the data held by a single data owner may be incomplete, the cooperation of multiple data owners is often required to jointly train a model with better predictive power. Federated learning is a distributed machine learning technique that aims to realize joint modeling, eliminate data silos, and improve model quality while guaranteeing data privacy and security. In federated learning, feature information and labels are distributed across different data owners. For tree models, the optimal segmentation points must be computed jointly, so information leakage is difficult to avoid.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for generating information.
According to a first aspect of the present disclosure, an embodiment provides a method for generating information, applied to a label providing end, including: obtaining gradient information of a sample according to the sample label and the prediction information of the current model for the sample; determining a first feature and a corresponding optimal segmentation point from the features held by the local end based on the gradient information; sending the ciphertext of the gradient information, obtained with a homomorphic encryption algorithm, to a feature providing end; receiving a second feature and a corresponding optimal segmentation point sent by the feature providing end, where these are determined by the feature providing end from its held features based on the ciphertext of the gradient information and secure multi-party computation; and determining a final segmentation point from the optimal segmentation points corresponding to the first and second features based on secure multi-party computation with the feature providing end.
According to a second aspect of the present disclosure, an embodiment provides an apparatus for generating information, deployed at a label providing end, including: a first determining unit configured to obtain gradient information of a sample according to the sample label and the prediction information of the current model for the sample; a second determining unit configured to determine a first feature and a corresponding optimal segmentation point from the features held by the local end based on the gradient information; a sending unit configured to send the ciphertext of the gradient information, obtained with a homomorphic encryption algorithm, to a feature providing end; a receiving unit configured to receive a second feature and a corresponding optimal segmentation point sent by the feature providing end, where these are determined by the feature providing end from its held features based on the ciphertext of the gradient information and secure multi-party computation; and a generating unit configured to determine a final segmentation point from the optimal segmentation points corresponding to the first and second features based on secure multi-party computation with the feature providing end.
According to a third aspect of the present disclosure, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any one of the first aspect.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are configured to cause the computer to perform the method according to any one of the first aspect.
According to the technology of this application, the label providing end and the feature providing end interact via secure multi-party computation while determining the final segmentation point, so each end exposes only the minimum necessary data to the other, avoiding information leakage.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flow diagram of one embodiment of a method for generating information according to the present application;
FIG. 2 is a schematic diagram of an application scenario of a method for generating information according to the present application;
FIG. 3 is a flow diagram of one embodiment of a method for determining a second feature and a corresponding optimal segmentation point according to the present application;
FIG. 4 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present application;
FIG. 5 is a block diagram of an electronic device for implementing a method for generating information according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Referring to FIG. 1, a flow diagram 100 of one embodiment of a method for generating information in accordance with the present disclosure is shown. The method for generating information comprises the following steps:
S101: obtain gradient information of the sample according to the sample label and the prediction information of the current model for the sample.
In this embodiment, the method for generating information may be applied to a label providing end, which holds the labels and part of the features of the samples. The label providing end can obtain the gradient information of each sample according to the sample label and the prediction information of the current model for the sample. Here, the gradient information may be derived from the model's loss function: the label providing end may first compute the derivative of the loss function, and then, for each sample, calculate the gradient information from the sample's label, the current model's prediction for the sample, and that derivative. Note that the current model's prediction for a sample can be generated jointly by the multiple participants of the jointly trained model.
In practice, model training typically proceeds over multiple rounds; the current model refers to the model obtained from the previous round. The jointly trained model may be a tree model, a supervised machine learning model such as a binary tree. As an example, algorithms for training tree models include GBDT (Gradient Boosting Decision Tree). A tree model comprises a plurality of nodes, each of which may carry a location identifier (e.g., the node's number) identifying its position in the tree. The nodes include leaf nodes and non-leaf nodes. Nodes that cannot be split further are called leaf nodes; each leaf node corresponds to a leaf value representing a prediction. Nodes that can be split further are called non-leaf nodes, comprising the root node and the internal nodes other than the leaves and the root. Non-leaf nodes correspond to segmentation points, which are used to select a prediction path.
In a practical application scenario, the participants of the joint training may include a label providing end and one or more feature providing ends. The label providing end holds the labels and part of the features of the samples; a feature providing end holds another part of the features. As an example, the label providing end may be a credit agency that holds labels for users' credit risk (e.g., high, medium, low) and some features (e.g., user age, gender); the feature providing end may be a big-data company holding other features of the users (e.g., education background, annual income). During joint training, the credit agency cannot provide the big-data company with the labels and features it holds, in order to protect data privacy; likewise, the big-data company cannot provide its features to the credit agency.
In some optional implementations of this embodiment, the gradient information of a sample may include a first-order gradient and a second-order gradient. As an example, the label providing end may first compute the first and second derivatives of the loss function; then, for each sample, it can compute the first-order and second-order gradients from the sample's label, the current model's prediction for the sample, and those derivatives. In this way, gradient information comprising the first-order and second-order gradient of each sample is obtained, providing the basis for the subsequent calculation of segmentation gains.
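As an illustrative sketch of the first- and second-order gradients just described (the patent does not fix a loss function, so a logistic loss is assumed here):

```python
import math

def logistic_gradients(y_true, y_pred_raw):
    """First- and second-order gradients of the logistic loss, as used
    in gradient-boosted trees.

    y_true: label in {0, 1}; y_pred_raw: the current model's raw score.
    """
    p = 1.0 / (1.0 + math.exp(-y_pred_raw))  # predicted probability
    g = p - y_true        # first-order gradient dL/df
    h = p * (1.0 - p)     # second-order gradient d^2L/df^2
    return g, h

# Example: a positive sample the current model scores at 0.0
g, h = logistic_gradients(1, 0.0)  # g = -0.5, h = 0.25
```

Any twice-differentiable loss could be substituted; only `g` and `h` per sample are needed downstream.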
S102, based on the gradient information, determining a first feature and a corresponding optimal segmentation point from the features held by the local terminal.
In this embodiment, according to the gradient information obtained in S101, the label providing end may determine the first feature and the corresponding optimal segmentation point from the features held by the local end in various ways. As an example, since the label providing end holds all the feature data for each of its features, it can determine all candidate segmentation points of each feature. The label providing end can then calculate, in plaintext, the segmentation gain obtained when the samples are divided into subsets at each segmentation point of each feature, select one feature as the first feature according to these gains, and determine the optimal segmentation point corresponding to the first feature. For example, the feature corresponding to the maximum segmentation gain may be selected as the first feature, and the segmentation point achieving that maximum gain used as the optimal segmentation point. Here, the segmentation gain is calculated from the gradient information. As an example, the segmentation gain may be calculated by the following formula:
Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{G^2}{H+\lambda}\right]

where G_L denotes the sum of the first-order gradients of the samples in the left node after the split, H_L the sum of the second-order gradients of the samples in the left node, G_R the sum of the first-order gradients of the samples in the right node, H_R the sum of the second-order gradients of the samples in the right node, G the sum of the first-order gradients of the samples before the split, H the sum of the second-order gradients of the samples before the split, and λ the regularization coefficient.
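A minimal sketch of this gain computation from the four gradient sums (variable names follow the symbols in the formula):

```python
def split_gain(GL, HL, GR, HR, lam):
    """XGBoost-style segmentation gain.

    GL/HL: sums of first-/second-order gradients in the left node;
    GR/HR: the same sums for the right node;
    lam:   regularization coefficient lambda.
    """
    G, H = GL + GR, HL + HR          # gradient sums of the undivided node
    return 0.5 * (GL * GL / (HL + lam)
                  + GR * GR / (HR + lam)
                  - G * G / (H + lam))

# A split that separates negative-gradient samples from positive ones
gain = split_gain(1.0, 1.0, -1.0, 1.0, 1.0)  # 0.5
```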
In some optional implementations of the present embodiment, the step S102 may specifically be performed as follows:
first, for each division point corresponding to each feature held by the home terminal, a division gain corresponding to each division point is calculated from gradient information.
In this implementation, the label providing end may first determine all the candidate segmentation points of each held feature. For each segmentation point of each feature, the label providing end can calculate the corresponding segmentation gain from the gradient information. For example, for a given segmentation point of a given feature, the label providing end may calculate the sum of the first-order gradients and the sum of the second-order gradients of all samples in the left node obtained by splitting at that point, and likewise for the right node, and then calculate the segmentation gain from these four sums.
Then, based on the comparison result of the division gains corresponding to the respective division points, the first feature and the corresponding optimum division point are determined from the features held by the home terminal.
In this implementation, the label providing end may determine one feature as the first feature and its corresponding optimal segmentation point from the features it holds, based on the comparison of the segmentation gains of the respective segmentation points. For example, a greedy strategy may be adopted: select the feature corresponding to the maximum segmentation gain as the first feature, and take the segmentation point achieving that maximum gain as the optimal segmentation point. In this way, the label providing end selects, among the features it holds, the feature with the maximum segmentation gain and its optimal segmentation point.
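The greedy search described above can be sketched as follows (a plaintext single-feature scan; the threshold-at-midpoint convention and the `split_gain`-style formula inlined here are illustrative choices, not mandated by the patent):

```python
def best_split(feature_values, grads, hessians, lam=1.0):
    """Greedy scan over all candidate segmentation points of one feature.

    Sorts samples by feature value, sweeps the boundary left-to-right
    while maintaining running gradient sums, and returns the pair
    (best_gain, best_threshold).
    """
    order = sorted(range(len(feature_values)), key=lambda i: feature_values[i])
    G, H = sum(grads), sum(hessians)       # sums for the undivided node
    GL = HL = 0.0
    best = (float('-inf'), None)
    for rank, i in enumerate(order[:-1]):  # boundary after each sorted sample
        GL += grads[i]
        HL += hessians[i]
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL * GL / (HL + lam)
                      + GR * GR / (HR + lam)
                      - G * G / (H + lam))
        thr = (feature_values[i] + feature_values[order[rank + 1]]) / 2
        if gain > best[0]:
            best = (gain, thr)
    return best

gain, thr = best_split([1, 2, 3, 4], [-1, -1, 1, 1], [1, 1, 1, 1])
```

For the toy data above, the best threshold falls between 2 and 3, where the negative and positive gradients separate cleanly.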
S103: send the ciphertext of the gradient information, obtained with a homomorphic encryption algorithm, to the feature providing end.
In this embodiment, the label providing end may first encrypt the gradient information obtained in S101 with a homomorphic encryption algorithm to obtain the ciphertext of the gradient information, and then send this ciphertext to the feature providing end. Homomorphic encryption is an encryption technique with the property that processing homomorphically encrypted data produces an output which, when decrypted, equals the output obtained by processing the unencrypted original data in the same way. Homomorphic encryption algorithms include additively homomorphic and multiplicatively homomorphic schemes.
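The additive homomorphism described above can be illustrated with a textbook Paillier cryptosystem. This is only a demonstration under stated assumptions: the patent does not name a concrete scheme, and the tiny fixed primes below are wholly insecure.

```python
import math
import random

# Textbook Paillier (additively homomorphic). Tiny primes: demo only, NOT secure.
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1                                          # standard choice g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)

def _L(u):
    return (u - 1) // n

mu = pow(_L(pow(g, lam, n2)), -1, n)  # (L(g^lambda mod n^2))^-1 mod n

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:        # r must be a unit mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (_L(pow(c, lam, n2)) * mu) % n

def add_ciphertexts(c1, c2):
    """Homomorphic addition: Dec(c1 * c2 mod n^2) = m1 + m2 (mod n)."""
    return (c1 * c2) % n2

# Sums of encrypted gradients can be taken without ever decrypting the terms.
c = add_ciphertexts(encrypt(37), encrypt(5))
assert decrypt(c) == 42
```

This is exactly the operation the feature providing end relies on later: it can sum the encrypted gradients of samples in a node without learning any individual gradient.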
S104: receive the second feature and the corresponding optimal segmentation point sent by the feature providing end.
In this embodiment, the label providing end may receive the second feature and the corresponding optimal segmentation point sent by the feature providing end. Here, the second feature and its optimal segmentation point are determined by the feature providing end from its held features based on the ciphertext of the gradient information and secure multi-party computation. As an example, after receiving the ciphertext of the gradient information from the label providing end, the feature providing end may determine the second feature and its optimal segmentation point from its held features using the ciphertext together with secure multi-party computation with the label providing end; for instance, it may employ secure multi-party computation both when calculating the segmentation gains and when comparing them. In this way, the feature providing end never reveals plaintext segmentation gains to the label providing end, preventing the label providing end from inferring the feature data held by the feature providing end and thereby protecting its data security.
S105: determine a final segmentation point from the optimal segmentation point corresponding to the first feature and the optimal segmentation point corresponding to the second feature, based on secure multi-party computation with the feature providing end.
In this embodiment, the tag provider may determine a final segmentation point from the optimal segmentation point corresponding to the first feature and the optimal segmentation point corresponding to the second feature based on multi-party security calculation with the feature provider. For example, assuming that the segmentation gain obtained by dividing the optimal segmentation point corresponding to the first feature is used as the first segmentation gain and the segmentation gain obtained by dividing the optimal segmentation point corresponding to the second feature is used as the second segmentation gain, the tag providing end and the feature providing end may calculate the first segmentation gain and the second segmentation gain by using a multi-party secure calculation method, and compare the magnitudes of the first segmentation gain and the second segmentation gain. Thereafter, the tag providing terminal may determine a final segmentation point based on the comparison result of the first segmentation gain and the second segmentation gain. For example, a division point corresponding to the larger value of the first division gain and the second division gain may be selected as the final division point.
With continued reference to fig. 2, fig. 2 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of fig. 2, the label providing terminal A is a credit agency holding labels and partial features of the users' credit risk. The label providing terminal A first generates gradient information of the samples according to the sample labels and the prediction information of the current model for the samples. Next, terminal A determines the first feature and the corresponding optimal segmentation point split_a from the features it holds, based on the gradient information. Terminal A then sends the ciphertext of the gradient information, obtained with a homomorphic encryption algorithm, to the feature providing terminal B. Terminal A then receives the second feature and the corresponding optimal segmentation point split_b sent by terminal B, where these are determined by terminal B from its held features based on the ciphertext of the gradient information and secure multi-party computation. Finally, terminal A determines a final segmentation point from split_a and split_b based on secure multi-party computation with terminal B.
In the method provided by the above embodiment of the present disclosure, the label providing end and the feature providing end interact via secure multi-party computation while determining the final segmentation point, so each end exposes only the minimum necessary data to the other, avoiding information leakage.
With further reference to fig. 3, a flow 300 of one embodiment of a method for determining a second feature and a corresponding optimal segmentation point is illustrated. The process 300 for determining a second feature and a corresponding optimal segmentation point comprises the following steps:
S301: for each segmentation point corresponding to each held feature, determine the ciphertext of the gradient information of the samples in the left and right nodes obtained based on that segmentation point.
In this embodiment, the method for determining the second feature and the corresponding optimal segmentation point may be applied to the feature providing end. Here, the feature providing end may hold a part of the feature of the sample. For each division point corresponding to each held feature, the feature providing end may determine ciphertext of gradient information of samples in the left node and the right node obtained by dividing the node based on the division point.
S302, based on the multi-party security calculation with the label providing end, executing steps S3021-S3023.
In this embodiment, the feature provider may perform the following steps S3021 to S3023 based on the multiparty security calculation with the tag provider.
S3021: convert the sums of the gradient information of the samples in the left and right nodes obtained based on each segmentation point into secret shares.
In this embodiment, the feature providing end may first calculate, for each segmentation point, the sums of the ciphertexts of the gradient information of the samples in the resulting left and right nodes. For example, it can compute the sum of the ciphertexts of the first-order gradients and the sum of the ciphertexts of the second-order gradients of all samples in the left node obtained by splitting at that point, and likewise for the right node. As an example, these sums can be computed by homomorphic ciphertext addition. The feature providing end can then convert these sums into secret shares using secure multi-party computation, for example via an Arithmetic Circuit. Taking two participating parties as an example, the arithmetic circuit realizes additive secret sharing: a value x is randomly split into x0 ← x − r and x1 ← r, where r is a random number and ← denotes assignment. One of the two parties holds x0 and the other holds x1. Since neither party holds all the shares of a value, no information is leaked apart from the final result.
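The additive secret sharing just described can be sketched in a few lines (the ring size is an illustrative choice):

```python
import random

MOD = 2 ** 32  # shares live in the ring Z_{2^32}

def share(x):
    """Split x into two additive shares: x0 = x - r, x1 = r, so that
    x0 + x1 = x (mod MOD). Neither share alone reveals anything about x."""
    r = random.randrange(MOD)
    return (x - r) % MOD, r

def reconstruct(x0, x1):
    """Recombine the two shares (done only when the result may be revealed)."""
    return (x0 + x1) % MOD

x0, x1 = share(123456)
assert reconstruct(x0, x1) == 123456
```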
In some optional implementations of this embodiment, S3021 may specifically proceed as follows: use an additively homomorphic encryption algorithm to convert into shares the sums of the ciphertexts of the gradient information of the samples in the left and right nodes obtained at each segmentation point.

In this implementation, the feature providing end may use an additively homomorphic encryption algorithm to share the sums of the ciphertexts of the gradient information of the samples in the left and right nodes obtained at each segmentation point. For example, it can convert the sum of the first-order-gradient ciphertexts and the sum of the second-order-gradient ciphertexts of all samples in the left node, and the corresponding sums for the right node, into shares. In this way, the data can be shared using additively homomorphic encryption.
S3022: calculate the segmentation gains corresponding to the segmentation points based on the shares.
In this embodiment, the feature providing end and the tag providing end may jointly calculate, through multi-party security calculation, a segmentation point gain corresponding to each segmentation point based on the segmentation obtained in S3021. For example, the partition gain may be calculated in a homomorphic ciphertext manner.
S3023, a result of comparing the division gains corresponding to the respective division points is determined.
In this embodiment, the feature providing end may determine the comparison result of the segmentation gains corresponding to the segmentation points through secure multi-party computation with the label providing end; comparing the data this way preserves its security. For example, two values x and y can be compared as follows. Each value is split into shares as above, so that x = x0 + x1 and y = y0 + y1. The two parties first compute z0 = x0 − y0 and z1 = x1 − y1 locally, so that z = z0 + z1 = x − y. The shares of z are then converted into a garbled-circuit representation z′, and only the sign bit (the most significant bit) of z′ is revealed in plaintext, which yields the result of comparing the magnitudes of x and y. As an example, z0 and z1 may be converted into garbled-circuit inputs z′0 and z′1 via Oblivious Transfer, after which the two parties jointly evaluate z′ = z′0 + z′1 inside the garbled circuit.
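The share-based comparison above can be sketched as follows. This is a simplification: in the real protocol the sign bit is extracted inside a garbled circuit so z is never reconstructed in the clear; here a plain reconstruction stands in for that step, and the fixed ring size is an illustrative assumption.

```python
import random

MOD = 2 ** 32
HALF = MOD // 2  # residues in [0, HALF) read as positive, [HALF, MOD) as negative

def share(x):
    """Additive secret sharing: x = x0 + x1 (mod MOD)."""
    r = random.randrange(MOD)
    return (x - r) % MOD, r

def compare_via_shares(x, y):
    """Returns True iff x > y (valid while |x - y| < HALF).

    Each party locally subtracts its share of y from its share of x;
    the shares of z = x - y would then feed a garbled circuit revealing
    only z's sign bit. The reconstruction below stands in for that circuit.
    """
    x0, x1 = share(x)
    y0, y1 = share(y)
    z0 = (x0 - y0) % MOD   # party 0, local computation only
    z1 = (x1 - y1) % MOD   # party 1, local computation only
    z = (z0 + z1) % MOD    # in the real protocol: done inside a garbled circuit
    return z != 0 and z < HALF

assert compare_via_shares(7, 3)
assert not compare_via_shares(3, 7)
```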
S303, based on the comparison result, the second feature and the corresponding optimal segmentation point are determined from the held features.
In this implementation, the feature providing end may, according to the comparison result, determine one feature from the held features as the second feature and determine the corresponding optimal segmentation point. For example, a greedy method may be adopted: the feature corresponding to the maximum segmentation gain among the calculated segmentation gains is selected as the second feature, and the segmentation point corresponding to the maximum segmentation gain is taken as the optimal segmentation point.
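The greedy selection reduces to an argmax over the compared gains. The dictionary layout below, mapping (feature, segmentation point) pairs to gains, is an assumed representation for illustration.

```python
def pick_best_split(gains):
    """Greedy choice: the (feature, segmentation point) with the largest gain.

    gains maps (feature_name, split_point) -> segmentation gain. Ties are
    broken arbitrarily by max(); the patent does not specify tie handling.
    """
    (feature, point), _best_gain = max(gains.items(), key=lambda kv: kv[1])
    return feature, point
```

In the protocol only the pairwise comparison results, not the gain values themselves, are available, but the outcome is the same as this plaintext argmax.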
In the method provided by the above embodiment of the present disclosure, in the process of determining the second feature and the corresponding optimal segmentation point, the label providing end and the feature providing end interact based on the multi-party security computing technology, so that each end exposes the data it holds to the other end only to the minimum extent, thereby avoiding information leakage.
With further reference to fig. 4, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating information. The embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 1, and the apparatus may be partially disposed at a label providing end.
As shown in fig. 4, the apparatus 400 for generating information of the present embodiment includes: a first determining unit 401, a second determining unit 402, a sending unit 403, a receiving unit 404, and a generating unit 405. The first determining unit 401 is configured to obtain gradient information of the sample according to the sample label and prediction information of the current model for the sample; the second determining unit 402 is configured to determine the first feature and the corresponding optimal segmentation point from the features held by the local end based on the gradient information; the sending unit 403 is configured to send the ciphertext of the gradient information obtained by using a homomorphic encryption algorithm to the feature providing end; the receiving unit 404 is configured to receive a second feature and a corresponding optimal segmentation point sent by the feature providing end, where the second feature and the corresponding optimal segmentation point are determined by the feature providing end from held features based on the ciphertext of the gradient information and multi-party security calculation; the generating unit 405 is configured to determine a final segmentation point from the optimal segmentation point corresponding to the first feature and the optimal segmentation point corresponding to the second feature based on multi-party security calculation with the feature providing end.
In this embodiment, specific processes of the first determining unit 401, the second determining unit 402, the sending unit 403, the receiving unit 404, and the generating unit 405 of the apparatus 400 for generating information and technical effects brought by the specific processes may refer to descriptions of S101, S102, S103, S104, and S105 in the corresponding embodiment of fig. 1, and are not repeated herein.
In some optional implementations of the present embodiment, the second determining unit 402 is further configured to: calculate, for each segmentation point corresponding to each feature held by the local end, the segmentation gain corresponding to the segmentation point according to the gradient information; and determine, based on the comparison result of the segmentation gains corresponding to the respective segmentation points, the first feature and the corresponding optimal segmentation point from the features held by the local end.
In some optional implementations of this embodiment, the apparatus 400 further includes a third determining unit (not shown in the figure) disposed at the feature providing end, where the third determining unit is configured to determine the second feature and the corresponding optimal segmentation point, and the third determining unit includes: a determining subunit (not shown in the figure) configured to determine, for each segmentation point corresponding to each held feature, the ciphertext of the gradient information of the samples in the left node and the right node obtained based on the segmentation point; an execution unit (not shown in the figure) configured to execute preset steps based on multi-party security calculation with the label providing end, the execution unit comprising: a conversion unit (not shown in the figure) configured to convert the sum of the ciphertexts of the gradient information of the samples in the left node and the right node obtained based on the respective segmentation points into fragments; a calculation unit (not shown in the figure) configured to calculate the segmentation gain corresponding to each segmentation point based on the fragments; a result determination unit (not shown in the figure) configured to determine a comparison result of the segmentation gains corresponding to the respective segmentation points; and a segmentation point determination unit (not shown in the figure) configured to determine the second feature and the corresponding optimal segmentation point from the held features based on the comparison result.
In some optional implementations of this embodiment, the above conversion unit is further configured to: use an additive homomorphic encryption algorithm to split the sum of the ciphertexts of the gradient information of the samples in the left node and the right node obtained at each segmentation point into fragments.
In some optional implementations of this embodiment, the gradient information of the sample includes a first order gradient and a second order gradient.
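The patent states only that both gradient orders are used, without fixing a loss function. As an assumed example, for binary classification with the logistic loss the label providing end would compute per-sample gradients as below; the function name and 0/1 label convention are illustrative.

```python
import math

def logistic_gradients(label, raw_score):
    """First- and second-order gradients of the logistic loss (assumed loss).

    label is 0 or 1; raw_score is the current model's margin output for
    the sample. Returns (g, h), the first- and second-order gradients.
    """
    p = 1.0 / (1.0 + math.exp(-raw_score))  # predicted probability
    g = p - label                           # first-order gradient
    h = p * (1.0 - p)                       # second-order gradient
    return g, h
```

These per-sample values are what get homomorphically encrypted and summed per node in the steps above.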
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device for generating information according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for generating information provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for generating information provided herein.
The memory 502, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the method for generating information in the embodiments of the present application (for example, the first determining unit 401, the second determining unit 402, the transmitting unit 403, the receiving unit 404, and the generating unit 405 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing, i.e., implements the method for generating information in the above-described method embodiments, by executing non-transitory software programs, instructions, and modules stored in the memory 502.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device for generating information, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 502 optionally includes memory located remotely from processor 501, which may be connected to an electronic device for generating information over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for generating information may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic apparatus used to generate the information. Examples of such input devices include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, in the process of determining the final segmentation point, the label providing end and the feature providing end interact based on the multi-party security computing technology, so that each end can expose the held data to the opposite side to the minimum degree, and information leakage is avoided.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and this is not limited herein, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method for generating information, which is applied to a label providing end and comprises the following steps:
obtaining gradient information of the sample according to the sample label and the prediction information of the current model aiming at the sample;
determining a first feature and a corresponding optimal segmentation point from the features held by the local end based on the gradient information;
sending the ciphertext of the gradient information obtained by adopting a homomorphic encryption algorithm to a feature providing end;
receiving a second feature and a corresponding optimal segmentation point sent by the feature providing end, wherein the second feature and the corresponding optimal segmentation point are determined from held features by the feature providing end based on a ciphertext of the gradient information and multi-party security calculation;
and determining a final segmentation point from the optimal segmentation point corresponding to the first feature and the optimal segmentation point corresponding to the second feature based on multi-party security calculation with the feature providing end.
2. The method of claim 1, wherein the determining a first feature and a corresponding optimal segmentation point from features held by a local end based on the gradient information comprises:
calculating, for each segmentation point corresponding to each feature held by the local end, the segmentation gain corresponding to the segmentation point according to the gradient information;
determining, based on the comparison result of the segmentation gains corresponding to the respective segmentation points, the first feature and the corresponding optimal segmentation point from the features held by the local end.
3. The method of claim 1, wherein the second feature and the corresponding optimal segmentation point are determined by the feature provider by:
determining, for each segmentation point corresponding to each held feature, the ciphertext of the gradient information of the samples in the left node and the right node obtained based on the segmentation point;
executing, based on multi-party security calculation with the label providing end, the following steps: converting the sum of the ciphertexts of the gradient information of the samples in the left node and the right node obtained based on each segmentation point into fragments; calculating the segmentation gain corresponding to each segmentation point based on the fragments; and determining a comparison result of the segmentation gains corresponding to the segmentation points;
based on the comparison result, a second feature and a corresponding optimal segmentation point are determined from the held features.
4. The method of claim 3, wherein the converting the sum of the ciphertexts of the gradient information of the samples in the left node and the right node obtained based on each segmentation point into fragments comprises:
using an additive homomorphic encryption algorithm to split the sum of the ciphertexts of the gradient information of the samples in the left node and the right node obtained at each segmentation point into fragments.
5. The method of claim 1, wherein the gradient information of the sample comprises a first-order gradient and a second-order gradient.
6. An apparatus for generating information, partially disposed at a label providing end, comprising:
the first determining unit is configured to obtain gradient information of the sample according to the sample label and the prediction information of the current model for the sample;
the second determining unit is configured to determine the first feature and the corresponding optimal segmentation point from the features held by the local terminal based on the gradient information;
the sending unit is configured to send the ciphertext of the gradient information obtained by adopting a homomorphic encryption algorithm to a feature providing end;
the receiving unit is configured to receive a second feature and a corresponding optimal segmentation point sent by the feature providing end, wherein the second feature and the corresponding optimal segmentation point are determined from held features by the feature providing end based on the ciphertext of the gradient information and multi-party security calculation;
and the generating unit is configured to determine a final segmentation point from the optimal segmentation point corresponding to the first feature and the optimal segmentation point corresponding to the second feature based on multi-party security calculation with the feature providing end.
7. The apparatus of claim 6, wherein the second determining unit is further configured to:
calculate, for each segmentation point corresponding to each feature held by the local end, the segmentation gain corresponding to the segmentation point according to the gradient information;
determine, based on the comparison result of the segmentation gains corresponding to the respective segmentation points, the first feature and the corresponding optimal segmentation point from the features held by the local end.
8. The apparatus according to claim 6, wherein the apparatus further comprises a third determining unit disposed at a feature providing end, the third determining unit being configured to determine the second feature and the corresponding optimal segmentation point, and the third determining unit comprising:
a determining subunit configured to determine, for each segmentation point corresponding to each held feature, the ciphertext of the gradient information of the samples in the left node and the right node obtained based on the segmentation point;
an execution unit configured to execute preset steps based on multi-party security calculation with the label providing end, the execution unit comprising: a conversion unit configured to convert the sum of the ciphertexts of the gradient information of the samples in the left node and the right node obtained based on the respective segmentation points into fragments; a calculation unit configured to calculate the segmentation gain corresponding to each segmentation point based on the fragments; and a result determination unit configured to determine a comparison result of the segmentation gains corresponding to the respective segmentation points;
and a segmentation point determination unit configured to determine the second feature and the corresponding optimal segmentation point from the held features based on the comparison result.
9. The apparatus of claim 8, wherein the conversion unit is further configured to:
use an additive homomorphic encryption algorithm to split the sum of the ciphertexts of the gradient information of the samples in the left node and the right node obtained at each segmentation point into fragments.
10. The apparatus of claim 6, wherein the gradient information of the sample comprises a first order gradient and a second order gradient.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202010567116.9A 2020-06-19 2020-06-19 Method and device for generating information Active CN113824546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010567116.9A CN113824546B (en) 2020-06-19 2020-06-19 Method and device for generating information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010567116.9A CN113824546B (en) 2020-06-19 2020-06-19 Method and device for generating information

Publications (2)

Publication Number Publication Date
CN113824546A true CN113824546A (en) 2021-12-21
CN113824546B CN113824546B (en) 2024-04-02

Family

ID=78911609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010567116.9A Active CN113824546B (en) 2020-06-19 2020-06-19 Method and device for generating information

Country Status (1)

Country Link
CN (1) CN113824546B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101573413B1 (en) * 2014-11-28 2015-12-01 건국대학교 산학협력단 Apparatus and method for detecting intrusion using principal component analysis
CN108536650A (en) * 2018-04-03 2018-09-14 北京京东尚科信息技术有限公司 Generate the method and apparatus that gradient promotes tree-model
CN108712260A (en) * 2018-05-09 2018-10-26 曲阜师范大学 The multi-party deep learning of privacy is protected to calculate Proxy Method under cloud environment
CN109299728A (en) * 2018-08-10 2019-02-01 深圳前海微众银行股份有限公司 Federal learning method, system and readable storage medium storing program for executing
CN109684855A (en) * 2018-12-17 2019-04-26 电子科技大学 A kind of combined depth learning training method based on secret protection technology
CN110728687A (en) * 2019-10-15 2020-01-24 卓尔智联(武汉)研究院有限公司 File image segmentation method and device, computer equipment and storage medium
WO2020029590A1 (en) * 2018-08-10 2020-02-13 深圳前海微众银行股份有限公司 Sample prediction method and device based on federated training, and storage medium
WO2020034751A1 (en) * 2018-08-14 2020-02-20 阿里巴巴集团控股有限公司 Multi-party security computing method and apparatus, and electronic device
CN110995737A (en) * 2019-12-13 2020-04-10 支付宝(杭州)信息技术有限公司 Gradient fusion method and device for federal learning and electronic equipment
CN110990857A (en) * 2019-12-11 2020-04-10 支付宝(杭州)信息技术有限公司 Multi-party combined feature evaluation method and device for protecting privacy and safety
CN111144576A (en) * 2019-12-13 2020-05-12 支付宝(杭州)信息技术有限公司 Model training method and device and electronic equipment
CN111160573A (en) * 2020-04-01 2020-05-15 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy joint training by two parties

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101573413B1 (en) * 2014-11-28 2015-12-01 건국대학교 산학협력단 Apparatus and method for detecting intrusion using principal component analysis
CN108536650A (en) * 2018-04-03 2018-09-14 北京京东尚科信息技术有限公司 Generate the method and apparatus that gradient promotes tree-model
CN108712260A (en) * 2018-05-09 2018-10-26 曲阜师范大学 The multi-party deep learning of privacy is protected to calculate Proxy Method under cloud environment
CN109299728A (en) * 2018-08-10 2019-02-01 深圳前海微众银行股份有限公司 Federal learning method, system and readable storage medium storing program for executing
WO2020029590A1 (en) * 2018-08-10 2020-02-13 深圳前海微众银行股份有限公司 Sample prediction method and device based on federated training, and storage medium
WO2020034751A1 (en) * 2018-08-14 2020-02-20 阿里巴巴集团控股有限公司 Multi-party security computing method and apparatus, and electronic device
CN109684855A (en) * 2018-12-17 2019-04-26 电子科技大学 A kind of combined depth learning training method based on secret protection technology
CN110728687A (en) * 2019-10-15 2020-01-24 卓尔智联(武汉)研究院有限公司 File image segmentation method and device, computer equipment and storage medium
CN110990857A (en) * 2019-12-11 2020-04-10 支付宝(杭州)信息技术有限公司 Multi-party combined feature evaluation method and device for protecting privacy and safety
CN110995737A (en) * 2019-12-13 2020-04-10 支付宝(杭州)信息技术有限公司 Gradient fusion method and device for federal learning and electronic equipment
CN111144576A (en) * 2019-12-13 2020-05-12 支付宝(杭州)信息技术有限公司 Model training method and device and electronic equipment
CN111160573A (en) * 2020-04-01 2020-05-15 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy joint training by two parties

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jia Yanyan; Zhang Zhao; Feng Jian; Wang Chunkai: "Application of Federated Learning Models in Classified Data Processing", Journal of China Academy of Electronics and Information Technology, no. 01 *
Chen Guorun; Mu Meirong; Zhang Rui; Sun Dan; Qian Dongjun: "Implementation of a Telecom Fraud Identification Model Based on Federated Learning", Telecommunications Science, no. 1 *

Also Published As

Publication number Publication date
CN113824546B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN111125727B (en) Confusion circuit generation method, prediction result determination method, device and electronic equipment
CN112560091B (en) Digital signature method, signature information verification method, related device and electronic equipment
CN111783124B (en) Data processing method, device and server based on privacy protection
CN112016110B (en) Method, device, equipment and storage medium for storing data
CN111934872B (en) Key processing method, device, electronic equipment and storage medium
KR20220041707A (en) Model federated training method, apparatus, equipment and storage medium
CN113098691B (en) Digital signature method, signature information verification method, related device and electronic equipment
CN110391895B (en) Data preprocessing method, ciphertext data acquisition method, device and electronic equipment
CN109359476B (en) Hidden input two-party mode matching method and device
CN111310204A (en) Data processing method and device
CN113762328B (en) Model training method, device, equipment and storage medium based on federal learning
CN114186256B (en) Training method, device, equipment and storage medium of neural network model
CN113407976B (en) Digital signature method, signature information verification method, related device and electronic equipment
US20230336344A1 (en) Data processing methods, apparatuses, and computer devices for privacy protection
CN112182109A (en) Distributed data coding storage method based on block chain and electronic equipment
CN113722739B (en) Gradient lifting tree model generation method and device, electronic equipment and storage medium
CN111046431B (en) Data processing method, query method, device, electronic equipment and system
US20200167665A1 (en) Performing data processing based on decision tree
CN112182112A (en) Block chain based distributed data dynamic storage method and electronic equipment
CN112182108A (en) Block chain based distributed data storage updating method and electronic equipment
CN113824546A (en) Method and apparatus for generating information
CN115426111A (en) Data encryption method and device, electronic equipment and storage medium
CN113626848A (en) Sample data generation method and device, electronic equipment and computer readable medium
US20230186102A1 (en) Training method and apparatus for neural network model, device and storage medium
US9916344B2 (en) Computation of composite functions in a map-reduce framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant