CN110807528A - Feature correlation calculation method, device and computer-readable storage medium - Google Patents

Info

Publication number
CN110807528A
Authority
CN
China
Prior art keywords
data
feature
correlation
characteristic data
random number
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911046722.XA
Other languages
Chinese (zh)
Inventor
谭明超
范涛
魏文斌
马国强
郑会钿
陈天健
杨强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN201911046722.XA
Publication of CN110807528A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a feature correlation calculation method, a device and a computer-readable storage medium. The method comprises: receiving encrypted feature data sent by a first device, wherein the first device normalizes first feature data and encrypts the result to obtain the encrypted feature data; normalizing second feature data and adding random numbers to the result to obtain random-number-added feature data; and calculating an encrypted correlation value from the encrypted feature data and the random-number-added feature data, and sending the encrypted correlation value to the first device so that the first device can decrypt it to obtain the correlation value of the first feature data and the second feature data. Because one party encrypts its data and the other party adds random numbers to its data, both parties apply a protective measure, neither party can steal the other's data, and the security of vertical (longitudinal) federated learning modeling is enhanced.

Description

Feature correlation calculation method, device and computer-readable storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a feature correlation calculation method, a feature correlation calculation device, and a computer-readable storage medium.
Background
With the rapid development and wide application of machine learning, feature engineering plays an increasingly important role. Feature engineering is the process of converting raw data into training data for a model, and its purpose is to obtain better training features so that model performance improves. Feature engineering generally comprises three parts: feature construction, feature extraction and feature selection, and feature correlation calculation is an important method in feature selection. By calculating the correlation among features, redundant features are eliminated and the features that contribute most to model training are retained. For example, when examining which features influence ice cream sales, features such as temperature, season, whether the weather is hot, and whether it is summer are highly correlated with one another; if all of them are used as training data without correlation calculation and feature selection, they add noise to the model.
At present, in vertical (longitudinal) federated learning scenarios, each participant holds different data features, so the participants need to calculate the feature correlations jointly and then select features according to those correlations. To protect each party's data privacy, the parties generally cannot transmit raw data directly. There are currently two main schemes for jointly calculating feature correlation:
Scheme 1: Party A encrypts its data and sends the encrypted data to Party B; Party B calculates the correlation directly from the received data and returns it. This scheme offers little protection when Party A maliciously tries to extract Party B's data, that is, Party A can extract Party B's data by sending forged data.
Scheme 2: Party A adds random numbers to its data and then sends the data to Party B for calculation. This scheme avoids the risk in Scheme 1 that Party A steals Party B's data. However, it also has a security flaw: by modeling repeatedly, Party B can discover the distribution of the random numbers added by Party A, eliminate the random numbers to some extent, and obtain approximate values of Party A's original data.
Therefore, current schemes in which the participants of vertical federated learning jointly calculate feature correlation carry certain security risks.
Disclosure of Invention
The main object of the invention is to provide a feature correlation calculation method, a device and a computer-readable storage medium, so as to provide a feature correlation calculation scheme for vertical federated learning that prevents one participant from stealing another participant's data and enhances the security of federated learning modeling.
To achieve the above object, the invention provides a feature correlation calculation method applied to a second device participating in vertical federated learning, the second device being communicatively connected to a first device participating in vertical federated learning. The feature correlation calculation method comprises:
receiving encrypted feature data sent by the first device, wherein the first device normalizes the first feature data whose correlation is to be calculated and encrypts the result according to an encryption algorithm to obtain the encrypted feature data;
normalizing the second feature data whose correlation is to be calculated, and adding random numbers to the result to obtain random-number-added feature data;
and calculating an encrypted correlation value from the encrypted feature data and the random-number-added feature data, and sending the encrypted correlation value to the first device, so that the first device can decrypt the encrypted correlation value to obtain the correlation value of the first feature data and the second feature data.
Optionally, after the step of calculating an encrypted correlation value from the encrypted feature data and the random-number-added feature data and sending the encrypted correlation value to the first device, the method further comprises:
receiving the correlation value sent by the first device;
and performing feature selection on the first feature data and the second feature data according to the correlation value.
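The feature-selection step can be illustrated with a minimal sketch. The source does not specify a selection rule, so the absolute-value threshold of 0.9, the function name `redundant_pairs`, and the correlation values are all illustrative assumptions.

```python
def redundant_pairs(correlations, threshold=0.9):
    """Return feature pairs whose |correlation| meets the (assumed) threshold.

    correlations maps (first_device_feature, second_device_feature) to the
    decrypted correlation value of that pair.
    """
    return sorted(
        pair for pair, value in correlations.items() if abs(value) >= threshold
    )

# Hypothetical decrypted correlation values, made up for illustration.
correlations = {("X4", "X1"): 0.12, ("X4", "X2"): -0.95, ("X5", "X3"): 0.40}
flagged = redundant_pairs(correlations)
print(flagged)
```

Which feature of a flagged pair is dropped is a modeling decision that this sketch leaves open.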
Optionally, the step of normalizing the second feature data whose correlation is to be calculated comprises:
calculating the mean and the standard deviation of the data in the second feature data whose correlation is to be calculated;
and subtracting the mean from each datum in the second feature data and dividing the result by the standard deviation.
Optionally, the step of adding random numbers to the result to obtain random-number-added feature data comprises:
adding a different preset random number to each datum in the normalized second feature data to obtain the random-number-added feature data.
Optionally, the encryption algorithm is a preset homomorphic encryption algorithm.
Optionally, when the second feature data is stored on different executors of a distributed cluster, the step of normalizing the second feature data whose correlation is to be calculated comprises:
normalizing the second feature data whose correlation is to be calculated in a distributed computing manner.
To achieve the above object, the invention further provides a feature correlation calculation method applied to a first device participating in vertical federated learning, the first device being communicatively connected to a second device participating in vertical federated learning. The feature correlation calculation method comprises:
normalizing the first feature data whose correlation is to be calculated, and encrypting the result according to an encryption algorithm to obtain encrypted feature data;
sending the encrypted feature data to the second device, so that the second device can calculate an encrypted correlation value from the encrypted feature data and random-number-added feature data, wherein the second device normalizes the second feature data whose correlation is to be calculated and adds random numbers to the result to obtain the random-number-added feature data;
and receiving the encrypted correlation value returned by the second device and decrypting it to obtain the correlation value of the first feature data and the second feature data.
Further, to achieve the above object, the present invention also provides a feature correlation calculation apparatus including a memory, a processor, and a feature correlation calculation program stored on the memory and executable on the processor, the feature correlation calculation program implementing the steps of the feature correlation calculation method as described above when executed by the processor.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a feature correlation calculation program which, when executed by a processor, realizes the steps of the feature correlation calculation method as described above.
In the invention, encrypted feature data sent by a first device is received, wherein the first device normalizes the first feature data whose correlation is to be calculated and encrypts the result according to an encryption algorithm to obtain the encrypted feature data; the second feature data whose correlation is to be calculated is normalized, and random numbers are added to the result to obtain random-number-added feature data; and an encrypted correlation value is calculated from the encrypted feature data and the random-number-added feature data and sent to the first device, so that the first device can decrypt it to obtain the correlation value of the first feature data and the second feature data. Conventional schemes apply an encryption or random-number mechanism on only one of the two sides, so the other party can steal data by modeling repeatedly or by constructing special data. By contrast, in the invention the first device encrypts its data and the second device adds random numbers to its data; both parties apply a protective measure, so neither party can steal the other's data, and the security of vertical federated learning modeling is enhanced.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram illustrating a first embodiment of a method for calculating a feature correlation according to the present invention;
FIG. 3 is a schematic diagram of Party A and Party B jointly calculating a feature correlation value according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a feature correlation computing device, and referring to fig. 1, fig. 1 is a schematic structural diagram of a hardware operating environment according to an embodiment of the invention.
It should be noted that fig. 1 is a schematic structural diagram of a hardware operating environment of a feature correlation computing device. The characteristic correlation computing device of the embodiment of the invention can be a PC, and can also be a terminal device with a display function, such as a smart phone, a smart television, a tablet computer, a portable computer and the like.
As shown in fig. 1, the feature correlation calculation device may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., disk storage). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the feature correlation computing device may further include a camera, RF (Radio Frequency) circuitry, sensors, audio circuitry, a WiFi module, and so forth. Those skilled in the art will appreciate that the feature correlation computing device configuration shown in FIG. 1 does not constitute a limitation of the feature correlation computing device, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a feature correlation calculation program.
In the feature correlation calculation device shown in fig. 1, the network interface 1004 is mainly used to connect to other devices participating in vertical federated learning and to exchange data with them; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with the client; and the processor 1001 may be configured to call the feature correlation calculation program stored in the memory 1005 and perform the following operations:
receiving encrypted feature data sent by the first device, wherein the first device normalizes the first feature data whose correlation is to be calculated and encrypts the result according to an encryption algorithm to obtain the encrypted feature data;
normalizing the second feature data whose correlation is to be calculated, and adding random numbers to the result to obtain random-number-added feature data;
and calculating an encrypted correlation value from the encrypted feature data and the random-number-added feature data, and sending the encrypted correlation value to the first device, so that the first device can decrypt the encrypted correlation value to obtain the correlation value of the first feature data and the second feature data.
Further, after the step of calculating an encrypted correlation value from the encrypted feature data and the random-number-added feature data and sending the encrypted correlation value to the first device, the processor 1001 may be configured to call the feature correlation calculation program stored in the memory 1005 and further perform the following operations:
receiving the correlation value sent by the first device;
and performing feature selection on the first feature data and the second feature data according to the correlation value.
Further, the step of normalizing the second feature data whose correlation is to be calculated comprises:
calculating the mean and the standard deviation of the data in the second feature data whose correlation is to be calculated;
and subtracting the mean from each datum in the second feature data and dividing the result by the standard deviation.
Further, the step of adding random numbers to the result to obtain random-number-added feature data comprises:
adding a different preset random number to each datum in the normalized second feature data to obtain the random-number-added feature data.
Further, the encryption algorithm is a preset homomorphic encryption algorithm.
Further, when the second feature data is stored on different executors of a distributed cluster, the step of normalizing the second feature data whose correlation is to be calculated comprises:
normalizing the second feature data whose correlation is to be calculated in a distributed computing manner.
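One way to picture normalization over data spread across several machines is sketched here. The source does not describe the distributed procedure, so the approach is an assumption: each partition reports its count, sum and sum of squares, and the global mean and standard deviation are derived from the combined totals. The function names and the sample partitions are hypothetical.

```python
import math

def partial_stats(chunk):
    """Computed locally on each executor: (count, sum, sum of squares)."""
    return len(chunk), sum(chunk), sum(v * v for v in chunk)

def combine(stats):
    """Merge per-partition statistics into a global mean and population std."""
    stats = list(stats)
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # clamp rounding error
    return mean, math.sqrt(variance)

# Hypothetical feature column split across three executors.
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
mean, std = combine(partial_stats(p) for p in partitions)

# Each executor can then normalize its own chunk with the shared mean and std.
normalized = [[(v - mean) / std for v in p] for p in partitions]
print(mean, std)
```

The sum-of-squares form is simple to merge but can lose precision when the mean is large relative to the spread; a production system might prefer a numerically stabler merge.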
The present invention also provides a feature correlation calculation device comprising a memory, a processor, and a feature correlation calculation program stored on the memory and executable on the processor, the feature correlation calculation program implementing the following when executed by the processor:
normalizing the first feature data whose correlation is to be calculated, and encrypting the result according to an encryption algorithm to obtain encrypted feature data;
sending the encrypted feature data to the second device, so that the second device can calculate an encrypted correlation value from the encrypted feature data and random-number-added feature data, wherein the second device normalizes the second feature data whose correlation is to be calculated and adds random numbers to the result to obtain the random-number-added feature data;
and receiving the encrypted correlation value returned by the second device and decrypting it to obtain the correlation value of the first feature data and the second feature data.
Based on the hardware structure, various embodiments of the feature correlation calculation method of the present invention are proposed.
Referring to fig. 2, a first embodiment of the invention provides a feature correlation calculation method. Note that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order. The method is applied to a second device participating in vertical federated learning, the second device being communicatively connected to a first device participating in vertical federated learning. The first device and the second device may be servers or terminal devices such as PCs, smartphones, smart TVs, tablet computers and portable computers. The feature correlation calculation method comprises the following steps:
Step S10: receiving encrypted feature data sent by the first device, wherein the first device normalizes the first feature data whose correlation is to be calculated and encrypts the result according to an encryption algorithm to obtain the encrypted feature data;
In this embodiment, each participant in vertical federated learning holds part of the data locally; the participants' data overlap little in the feature dimension and overlap substantially in the user dimension. The parties can determine their common users through sample alignment and then use the aligned data to determine the correlation between the parties' different features, so that features can be selected according to those correlations.
This embodiment is described below using a first device and a second device participating in vertical federated learning; it should be understood that the feature correlation calculation method described in this embodiment can be generalized to more than two participating devices.
Specifically, the first device and the second device establish a communication connection and perform sample alignment to determine the users common to both. For example, the user dimension in the first device is {U1, U2, U3, U4} and its feature dimension is {X4, X5}; the user dimension in the second device is {U1, U2, U3, U5} and its feature dimension is {X1, X2, X3}, and the second device also holds the data label Y. Through sample alignment, the first device and the second device determine that the common users are {U1, U2, U3}. The first device and the second device can then jointly calculate the correlation between any two features held by different parties; for example, the correlations between each of the features X4 and X5 and each of X1, X2, X3 and Y can be calculated. The description below concerns two features whose correlation is to be calculated: a first feature in the first device and a second feature in the second device.
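The outcome of sample alignment can be sketched as a set intersection followed by reordering each party's column by the common user list. This is only an illustration of the result: a plain intersection stands in for whatever privacy-preserving alignment a deployment would use, and the user IDs, feature values and function name are hypothetical.

```python
def align_samples(users_a, users_b):
    """Return the user IDs held by both parties, in a fixed (sorted) order."""
    return sorted(set(users_a) & set(users_b))

# Hypothetical user lists for the two devices.
users_first = ["U1", "U2", "U3", "U4"]
users_second = ["U1", "U2", "U3", "U5"]
common = align_samples(users_first, users_second)

# Hypothetical values of one feature on the first device, keyed by user ID.
feature_on_first = {"U1": 31.0, "U2": 45.0, "U3": 27.0, "U4": 52.0}

# Both parties build their feature vectors in the same user order,
# so element i on each side refers to the same user.
column = [feature_on_first[u] for u in common]
print(common, column)
```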
The correlation between features is calculated from the feature data. For example, calculating the correlation between X1 and X5 requires the feature data of users U1, U2 and U3 under feature X5 in the first device and the feature data of users U1, U2 and U3 under feature X1 in the second device. The feature data can be treated as a vector whose elements are each user's value under the feature and whose dimension is the number of users common to the first device and the second device. For example, if the feature is age, the feature data consists of the age values of users U1, U2 and U3.
Because the data of the first device and the second device are private, neither device can send its data directly to the other. In this embodiment, for the first feature whose correlation is to be calculated, the first device normalizes the first feature data corresponding to the first feature, encrypts the normalized result according to an encryption algorithm to obtain encrypted feature data, and sends the encrypted feature data to the second device. The normalization may consist of calculating the mean and the standard deviation of the data in the first feature data, subtracting the mean from each datum, and dividing by the standard deviation. The encryption algorithm used by the first device may be a preset homomorphic encryption algorithm, such as the Paillier algorithm. Homomorphic encryption has the property that processing encrypted data and then decrypting the output gives the same result as applying the same processing to the unencrypted original data.
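The homomorphic property can be made concrete with a toy Paillier implementation. This is a minimal sketch for illustration, not a usable cryptosystem: the two small primes, the function names and the sample plaintexts are assumptions, and real keys are thousands of bits long. Paillier is additively homomorphic: multiplying two ciphertexts adds the plaintexts, and raising a ciphertext to a constant multiplies the plaintext by that constant.

```python
import math
import secrets

def keygen(p=293, q=433):
    """Toy key pair from two small fixed primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add(n, c1, c2):
    """Ciphertext of m1 + m2, computed without decrypting."""
    return c1 * c2 % (n * n)

def mul_const(n, c, k):
    """Ciphertext of k * m for a plaintext constant k."""
    return pow(c, k, n * n)

n, priv = keygen()
c_sum = add(n, encrypt(n, 12), encrypt(n, 30))
c_scaled = mul_const(n, encrypt(n, 7), 6)
print(decrypt(n, priv, c_sum), decrypt(n, priv, c_scaled))
```

These two operations are all that is needed to compute an inner product between an encrypted vector and a plaintext vector.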
The second device receives the encrypted feature data sent by the first device.
Step S20: normalizing the second feature data whose correlation is to be calculated, and adding random numbers to the result to obtain random-number-added feature data;
The second device normalizes the second feature data corresponding to the second feature whose correlation is to be calculated, and adds random numbers to the normalized result to obtain random-number-added feature data. The second device may add the same random number to each datum in the normalized second feature data, where the random number may be generated by the second device using a preset random number generation algorithm. For example, for the normalized second feature data {x1, x2, x3}, a random number r is added to each datum to obtain the random-number-added feature data {x1 + r, x2 + r, x3 + r}.
Further, the step in step S20 of normalizing the second feature data whose correlation is to be calculated comprises:
Step S201: calculating the mean and the standard deviation of the data in the second feature data whose correlation is to be calculated;
Step S202: subtracting the mean from each datum in the second feature data, and dividing the result by the standard deviation.
The second device may normalize the second feature data as follows. First, calculate the mean and the standard deviation of the data in the second feature data. If the second feature data is {x1, x2, x3}, the mean is $\mu_x = (x_1 + x_2 + x_3)/3$ and the standard deviation is:
$$\sigma_x = \sqrt{\frac{(x_1-\mu_x)^2 + (x_2-\mu_x)^2 + (x_3-\mu_x)^2}{3}}$$
Then subtract the mean from each datum in the second feature data and divide by the standard deviation, that is, calculate $(x_1-\mu_x)/\sigma_x$, $(x_2-\mu_x)/\sigma_x$ and $(x_3-\mu_x)/\sigma_x$ to obtain the normalized value of each datum.
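The normalization in steps S201 and S202 can be sketched in a few lines. The population form of the standard deviation (dividing by the number of data) is an assumption chosen to match the expectation-based Pearson formula used later; the function name and the sample values are hypothetical.

```python
import math

def normalize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("a constant feature cannot be normalized")
    return [(v - mean) / std for v in values]

# Hypothetical feature data {x1, x2, x3}, e.g. ages of three users.
ages = [23.0, 35.0, 47.0]
z = normalize(ages)
print(z)
```

After normalization the values have mean 0 and standard deviation 1, which is the property the later derivation relies on.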
Further, to enhance the security of the data in the second device, the step in step S20 of adding random numbers to the result to obtain random-number-added feature data comprises:
Step S203: adding a different preset random number to each datum in the normalized second feature data to obtain the random-number-added feature data.
The second device adds a different preset random number to each datum in the normalized second feature data to obtain the random-number-added feature data. The preset random numbers may be distinct random numbers generated by the second device using a preset random number generation algorithm, and there are as many random numbers as there are data in the second feature data. For example, if the preset random numbers are r1, r2 and r3, they are added to the normalized second feature data {x1, x2, x3} to obtain the random-number-added feature data {x1 + r1, x2 + r2, x3 + r3}. Because a different random number is added to each datum, the second feature data is better protected and the security of the data in the second device is further enhanced.
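Step S203 can be sketched as drawing one random number per datum and adding it element-wise. The source only says a preset random number generation algorithm is used, so the uniform distribution, the range [-1, 1], the operating-system random source and the function name are all assumptions.

```python
import random

_rng = random.SystemRandom()  # OS-backed random source (an assumed choice)

def add_random_numbers(normalized, low=-1.0, high=1.0):
    """Add an independently drawn random number to each normalized datum.

    Returns the random-number-added data and the random numbers themselves,
    which stay on the second device.
    """
    masks = [_rng.uniform(low, high) for _ in normalized]
    masked = [x + r for x, r in zip(normalized, masks)]
    return masked, masks

# Hypothetical normalized second feature data {x1, x2, x3}.
normalized = [-1.2247, 0.0, 1.2247]
masked, masks = add_random_numbers(normalized)
print(masked)
```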
Step S30: calculating an encrypted correlation value from the encrypted feature data and the random-number-added feature data, and sending the encrypted correlation value to the first device, so that the first device can decrypt the encrypted correlation value to obtain the correlation value of the first feature data and the second feature data.
After the second device obtains the random-number-added feature data, it calculates the encrypted correlation value from the random-number-added feature data and the encrypted feature data, and sends the encrypted correlation value to the first device. The first device decrypts the encrypted correlation value. The decryption corresponds to the encryption that the first device applied to the normalized first feature data: the first device encrypted the normalized first feature data with an encryption key, and it decrypts the encrypted correlation value with the matching decryption key. The decrypted result is the correlation value of the first feature data and the second feature data.
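Steps S10 to S30 can be put together in an end-to-end sketch. Everything beyond the steps themselves is an assumption made for illustration: the toy Paillier key built from two small known primes (insecure), the fixed-point `SCALE` used to turn real numbers into integers, the single shared random number `r`, the function names, and the choice that the first device divides by the sample count after decrypting.

```python
import math
import secrets

SCALE = 10**6                    # assumed fixed-point scale for real values
P, Q = 2**31 - 1, 2**61 - 1      # two known Mersenne primes (toy key, insecure)
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def encrypt(m):
    """Paillier encryption with g = N + 1; negatives are encoded modulo N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m   # map back to a signed integer

def normalize(values):
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def first_device_encrypt(x):
    """First device: normalize, scale to integers, encrypt element-wise."""
    return [encrypt(round(v * SCALE)) for v in normalize(x)]

def second_device_compute(enc_x, y, r):
    """Second device: normalize, add r, take the encrypted inner product."""
    acc = encrypt(0)
    for c, v in zip(enc_x, normalize(y)):
        k = round((v + r) * SCALE) % N
        acc = acc * pow(c, k, N2) % N2   # adds k * (plaintext of c)
    return acc

def first_device_decrypt(enc_value, count):
    return decrypt(enc_value) / (SCALE * SCALE) / count

# Hypothetical aligned feature vectors of the two devices.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
enc_p = second_device_compute(first_device_encrypt(x), y, r=5.0)
result = first_device_decrypt(enc_p, len(x))
print(result)
```

The second device only ever handles ciphertexts of the first device's data, and the first device only sees the final aggregated value, not the second device's individual data.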
The mathematical principle of the joint calculation of the feature correlation by the first device and the second device in the embodiment is derived as follows:
The correlation coefficient is a quantity that measures the degree of linear correlation between two variables, and is commonly referred to as the Pearson coefficient. The formula for the Pearson coefficient is:
$$P = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{E[(X-\mu_x)(Y-\mu_y)]}{\sigma_x \sigma_y} \tag{1}$$
In equation (1), x and y are the two variables, cov(x, y) is their covariance, and $\sigma_x$ and $\sigma_y$ are their standard deviations. X and Y are the vectors of values of x and y, $\mu_x$ and $\mu_y$ are the means of the elements of the two vectors, and E denotes expectation.
As equation (1) shows, the correlation between two columns of data is obtained by dividing their covariance by the product of their standard deviations. Rearranging, the Pearson correlation equals the result of subtracting the mean from each of the two feature vectors, dividing each by its standard deviation, and then taking the inner product (averaged over the number of samples, as the expectation E indicates). Subtracting the mean and dividing by the standard deviation is the normalization operation, so the Pearson correlation can be regarded as the inner product of the two normalized feature vectors.
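The equivalence between the covariance form and the normalized inner product can be checked numerically. This is a small sketch with made-up data; the function names are hypothetical, and the inner product is divided by the number of samples to match the expectation in the covariance form.

```python
import math

def pearson_from_definition(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def normalize(values):
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def pearson_from_inner_product(x, y):
    """Mean of the element-wise products of the two normalized vectors."""
    zx, zy = normalize(x), normalize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
by_definition = pearson_from_definition(x, y)
by_inner_product = pearson_from_inner_product(x, y)
print(by_definition, by_inner_product)
```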
Assume that the X vector belongs to party A (the first device) and the Y vector belongs to party B (the second device). Party A normalizes the X vector, homomorphically encrypts it, and sends it to party B. Party B normalizes the Y vector, adds an arbitrary random number R, and then calculates the correlation value. The above formula can then be transformed as follows:
Figure BDA0002254317210000101
where [[ ]] denotes homomorphic encryption. Since R is a random number added by party B itself, it is independent of the normalized X vector, so the expectation of their product equals the product of their expectations. The expectation of the normalized X vector is 0, so this product term is zero. The final result is therefore [[P]], the encrypted Pearson correlation value of the X vector and the Y vector.
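The cancellation argument can also be written out over the samples. The following is a sketch under stated assumptions rather than the patent's own formula: n is the number of samples, the tilde marks normalized values, i.e. \tilde{x}_i = (x_i - \mu_x)/\sigma_x and \tilde{y}_i = (y_i - \mu_y)/\sigma_y, and the same random number R is added to every element of the normalized Y vector.

```latex
\frac{1}{n}\sum_{i=1}^{n} \tilde{x}_i\,(\tilde{y}_i + R)
  = \frac{1}{n}\sum_{i=1}^{n} \tilde{x}_i \tilde{y}_i
    + R \cdot \frac{1}{n}\sum_{i=1}^{n} \tilde{x}_i
  = P + R \cdot 0
  = P
```

The second term vanishes because the normalized X vector has zero mean. With an additively homomorphic scheme, the same sum can be evaluated on the ciphertexts [[\tilde{x}_i]], so party B obtains [[P]] without seeing X.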
Fig. 3 is a schematic diagram of parties A and B jointly calculating the correlation value between feature X of party A and feature Y of party B.
In this embodiment, the second device receives encrypted feature data sent by the first device, where the first device normalizes the first feature data whose correlation is to be calculated and encrypts the result according to an encryption algorithm to obtain the encrypted feature data. The second device normalizes the second feature data whose correlation is to be calculated and performs a random number adding operation on the result to obtain random-number-added feature data. It then calculates an encrypted correlation value from the encrypted feature data and the random-number-added feature data and sends it to the first device, so that the first device can decrypt it to obtain the correlation value of the first feature data and the second feature data. In a traditional scheme, only one of the two parties applies an encryption or random number adding mechanism, so the other party can steal its data by modeling repeatedly or by constructing special data. In this embodiment, the first device encrypts its data and the second device adds a random number to its data. Because both parties apply a protection measure, neither can steal the other's data, which enhances the security of vertical federated learning modeling.
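The flow summarized above can be sketched end to end with a toy additively homomorphic scheme. This is an illustration under stated assumptions, not the patent's implementation: it uses a textbook Paillier construction with two small Mersenne primes (far too small for real use), a fixed-point scale to encode real numbers as integers, and hypothetical function names and sample data.

```python
import math
import secrets

# Toy Paillier key built from two known Mersenne primes (illustration only).
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)          # valid because the generator is g = N + 1
SCALE = 10**6                 # assumed fixed-point scale for real numbers

def encrypt(m):
    """Paillier encryption of an integer m (taken mod N) with g = N + 1."""
    r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    """Paillier decryption; results above N/2 are read as negative numbers."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m

def normalize(values):
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def first_device_encrypt(xs):
    """Party A: normalize X, encode as fixed point, encrypt element-wise."""
    return [encrypt(round(v * SCALE)) for v in normalize(xs)]

def second_device_correlate(enc_xs, ys):
    """Party B: normalize Y, add one random number r, combine under encryption."""
    r = secrets.randbelow(1000) + 1            # the same r for every element
    masked = [round((v + r) * SCALE) for v in normalize(ys)]
    acc = encrypt(0)
    for c, k in zip(enc_xs, masked):
        # c ** k encrypts k * x_i; multiplying ciphertexts adds the plaintexts.
        acc = acc * pow(c, k % N, N2) % N2
    return acc

def first_device_decrypt(enc_corr, n_samples):
    """Party A: decrypt, then undo the fixed-point scaling and average."""
    return decrypt(enc_corr) / (SCALE * SCALE) / n_samples

# Hypothetical sample data held by the two parties.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
enc_corr = second_device_correlate(first_device_encrypt(xs), ys)
corr = first_device_decrypt(enc_corr, len(xs))
```

For this data the decrypted value is close to the plain Pearson coefficient of 0.8. Because the normalized values are rounded to fixed point, their integer encodings may not sum to exactly zero for other data, in which case a small residual proportional to r remains; a larger scale or a bounded r keeps it negligible.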
Further, based on the first embodiment, a second embodiment of the feature correlation calculation method of the present invention is provided. In this embodiment, after step S30, the method further includes:
Step S40, receiving the correlation value sent by the first device;
Step S50, performing feature selection on the first feature data and the second feature data according to the correlation value.
In this embodiment, after the first device obtains the correlation value between the first feature data and the second feature data through decryption, the first device may send the correlation value to the second device. The second device receives the correlation value sent by the first device.
The second device performs feature selection on the first feature data and the second feature data according to the correlation value. Specifically, a correlation threshold may be set in the second device. If the correlation value is greater than the correlation threshold, the correlation between the first feature data and the second feature data is high, and the second device may remove one of the two. For example, if the first feature is season, the second feature is temperature, and the calculated correlation value is greater than the correlation threshold, the second device may remove either season or temperature; that is, the data under the removed feature is not used in the subsequent federated learning model training. If the correlation value is not greater than the correlation threshold, both the first feature data and the second feature data may be retained.
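The threshold rule described above can be sketched as follows. The function name, the threshold of 0.8, the sample correlation values, and the choice to drop the second feature of a highly correlated pair are all assumptions made for illustration; the text only requires that one of the two features be removed.

```python
def select_features(correlations, threshold=0.8):
    """Return the features to remove, given {(feature_a, feature_b): value}."""
    removed = set()
    for (feat_a, feat_b), value in correlations.items():
        if feat_a in removed or feat_b in removed:
            continue  # one feature of this pair has already been dropped
        # The source compares the correlation value itself with the threshold;
        # abs(value) could be used to also catch strong negative correlation.
        if value > threshold:
            removed.add(feat_b)
    return removed

# Hypothetical correlation values for two feature pairs.
correlations = {("season", "temperature"): 0.91, ("season", "income"): 0.12}
removed = select_features(correlations)
```

Here `removed` contains only `temperature`, so season and income would be kept for subsequent model training.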
It should be noted that the first device may also perform feature selection directly according to the correlation value after obtaining the correlation value through decryption.
In this embodiment, the first device sends the decrypted correlation value to the second device, and the second device performs feature selection according to the correlation value, so that a higher-quality model can be obtained through the subsequent modeling and training of the federated learning model.
Further, when the second feature data is stored in different execution machines of a distributed cluster, the step in step S20 of normalizing the second feature data whose correlation is to be calculated includes:
Step S204, normalizing the second feature data whose correlation is to be calculated in a distributed computing manner.
In this embodiment, the second feature data of the second device may be stored in different execution machines of a distributed cluster, for example with different execution machines storing the data of different users. In that case, the second device may normalize the second feature data in a distributed computing manner. Specifically, each execution machine normalizes its local part of the second feature data and sends the result to the second device, and the second device aggregates the results from the execution machines and performs the random number adding operation.
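One way to realize this distributed normalization is sketched below. It assumes a two-pass approach that the text does not spell out: each execution machine first reports summary statistics for its shard, the coordinating device merges them into a global mean and standard deviation, and each machine then normalizes its shard with those global values. The function names and shard contents are hypothetical.

```python
import math

def local_stats(part):
    """Run on each execution machine: count, sum, and sum of squares of its shard."""
    return len(part), sum(part), sum(v * v for v in part)

def global_mean_std(stats):
    """Run on the coordinating device: merge the per-shard statistics."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    std = math.sqrt(total_sq / n - mean * mean)   # population standard deviation
    return mean, std

def local_normalize(part, mean, std):
    """Run on each execution machine once the global statistics are known."""
    return [(v - mean) / std for v in part]

# Hypothetical shards of the second feature data on two execution machines.
shards = [[2.0, 1.0], [4.0, 3.0, 5.0]]
mean, std = global_mean_std([local_stats(p) for p in shards])
normalized = [v for p in shards for v in local_normalize(p, mean, std)]
```

Using global rather than per-shard statistics keeps the aggregated result identical to normalizing the whole feature column on a single machine.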
Similarly, the first device may normalize the first feature data in a distributed computing manner.
In this embodiment, the first device and the second device may perform normalization in a distributed computing manner, which reduces the computing resource consumption of the first device and the second device and speeds up normalization, thereby improving the efficiency of the whole vertical federated learning modeling process.
Further, based on the first and second embodiments, a third embodiment of the feature correlation calculation method of the present invention is provided. The feature correlation calculation method is applied to a first device participating in vertical federated learning, and the first device is in communication connection with a second device participating in vertical federated learning. The first device and the second device may each be a server, or a terminal device such as a PC, a smartphone, a smart television, a tablet computer, or a portable computer. The feature correlation calculation method includes:
Step A10, normalizing the first feature data whose correlation is to be calculated, and encrypting the processing result according to an encryption algorithm to obtain encrypted feature data;
In this embodiment, for a first feature whose correlation is to be calculated, the first device normalizes the first feature data corresponding to the first feature and encrypts the normalization result according to an encryption algorithm to obtain encrypted feature data. The normalization may consist of calculating the mean and standard deviation of the data in the first feature data, subtracting the mean from each datum, and dividing the result by the standard deviation. The encryption algorithm used by the first device may be a preset homomorphic encryption algorithm, such as the Paillier algorithm. Homomorphic encryption has the property that processing encrypted data and then decrypting the output gives the same result as applying the same processing to the unencrypted original data.
Step A20, sending the encrypted feature data to the second device, so that the second device can calculate an encrypted correlation value from the encrypted feature data and random-number-added feature data, wherein the second device normalizes the second feature data whose correlation is to be calculated and performs a random number adding operation on the processing result to obtain the random-number-added feature data;
The first device sends the encrypted feature data to the second device, and the second device receives it. The second device normalizes the second feature data corresponding to the second feature whose correlation is to be calculated, and performs a random number adding operation on the normalization result to obtain random-number-added feature data. The second device may add the same random number to each datum in the normalized second feature data, where the random number may be generated by the second device using a preset random number generation algorithm. For example, for the normalized second feature data {x1, x2, x3}, adding a random number r to each datum gives the random-number-added feature data {x1 + r, x2 + r, x3 + r}.
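The effect of adding the same random number to every element can be checked in plaintext. The sketch below leaves out encryption entirely and uses hypothetical names and sample data; it only shows that the mask does not change the averaged inner product with a zero-mean vector.

```python
import math
import random

def normalize(values):
    """Subtract the mean and divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

x_norm = normalize([1.0, 2.0, 3.0, 4.0, 5.0])   # first device's normalized feature
y_norm = normalize([2.0, 1.0, 4.0, 3.0, 5.0])   # second device's normalized feature
r = random.uniform(1.0, 100.0)                  # one random number for all elements
y_masked = [y + r for y in y_norm]              # random-number-added feature data

plain = sum(a * b for a, b in zip(x_norm, y_norm)) / len(x_norm)
masked = sum(a * b for a, b in zip(x_norm, y_masked)) / len(x_norm)
```

The two averages agree up to floating-point error because the elements of `x_norm` sum to zero, so the term contributed by r cancels.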
After the second device obtains the random-number-added feature data, it calculates an encrypted correlation value from the random-number-added feature data and the encrypted feature data. The second device sends the encrypted correlation value to the first device.
Step A30, receiving the encrypted correlation value returned by the second device and decrypting it to obtain the correlation value of the first feature data and the second feature data.
The first device receives the encrypted correlation value returned by the second device and decrypts it. The decryption operation corresponds to the encryption operation that the first device applied to the normalization result of the first feature data: the first device encrypted that result with an encryption key, so it decrypts the encrypted correlation value with the decryption key corresponding to that encryption key. The decrypted result is the correlation value of the first feature data and the second feature data, that is, the correlation value of the first feature and the second feature.
In this embodiment, the first device normalizes the first feature data whose correlation is to be calculated and encrypts the processing result according to an encryption algorithm to obtain encrypted feature data. It sends the encrypted feature data to the second device, so that the second device can calculate an encrypted correlation value from the encrypted feature data and random-number-added feature data, where the second device normalizes the second feature data whose correlation is to be calculated and performs a random number adding operation on the processing result to obtain the random-number-added feature data. The first device then receives the encrypted correlation value returned by the second device and decrypts it to obtain the correlation value of the first feature data and the second feature data. In a traditional scheme, only one of the two parties applies an encryption or random number adding mechanism, so the other party can steal its data by modeling repeatedly or by constructing special data. In this embodiment, the first device encrypts its data and the second device adds a random number to its data. Because both parties apply a protection measure, neither can steal the other's data, which enhances the security of vertical federated learning modeling.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, on which a feature correlation calculation program is stored, and when the feature correlation calculation program is executed by a processor, the steps of the feature correlation calculation method are implemented.
The specific implementations of the feature correlation calculation device and the computer-readable storage medium of the present invention are substantially the same as the embodiments of the feature correlation calculation method described above, and are not described again here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A feature correlation calculation method, applied to a second device participating in vertical federated learning, the second device being in communication connection with a first device participating in vertical federated learning, the feature correlation calculation method comprising the following steps:
receiving encrypted feature data sent by the first device, wherein the first device normalizes the first feature data whose correlation is to be calculated, and encrypts the processing result according to an encryption algorithm to obtain the encrypted feature data;
normalizing the second feature data whose correlation is to be calculated, and performing a random number adding operation on the processing result to obtain random number added feature data;
and calculating an encrypted correlation value according to the encrypted feature data and the random number added feature data, and sending the encrypted correlation value to the first device, so that the first device can decrypt the encrypted correlation value to obtain the correlation value of the first feature data and the second feature data.
2. The feature correlation calculation method according to claim 1, wherein after the step of calculating an encrypted correlation value according to the encrypted feature data and the random number added feature data and sending the encrypted correlation value to the first device, the method further comprises:
receiving the correlation value sent by the first device;
and performing feature selection on the first feature data and the second feature data according to the correlation values.
3. The feature correlation calculation method according to claim 1, wherein the step of performing the normalization process on the second feature data whose correlation is to be calculated includes:
calculating the mean value and the standard deviation of each data in the second characteristic data of the correlation to be calculated;
and subtracting the mean value from each data in the second characteristic data respectively and dividing the result by the standard deviation.
4. The method of claim 1, wherein the step of performing the random number adding operation on the processing result to obtain the random number added feature data comprises:
and respectively adding different preset random numbers to each data in the second characteristic data after normalization processing to obtain random number added characteristic data.
5. The feature correlation calculation method according to claim 1, wherein the encryption algorithm is a preset homomorphic encryption algorithm.
6. The feature correlation calculation method according to any one of claims 1 to 5, wherein, when the second feature data is stored in different execution machines of a distributed cluster, the step of normalizing the second feature data whose correlation is to be calculated comprises:
and performing the normalization processing on the second characteristic data of the correlation to be calculated by adopting a distributed calculation mode.
7. A feature correlation calculation method, applied to a first device participating in vertical federated learning, the first device being in communication connection with a second device participating in vertical federated learning, the feature correlation calculation method comprising the following steps:
normalizing the first feature data whose correlation is to be calculated, and encrypting the processing result according to an encryption algorithm to obtain encrypted feature data;
sending the encrypted feature data to the second device, so that the second device can calculate an encrypted correlation value according to the encrypted feature data and random number added feature data, wherein the second device normalizes the second feature data whose correlation is to be calculated, and performs a random number adding operation on the processing result to obtain the random number added feature data;
and receiving the encrypted correlation value returned by the second device and decrypting it to obtain the correlation value of the first feature data and the second feature data.
8. A feature correlation calculation apparatus comprising a memory, a processor and a feature correlation calculation program stored on the memory and executable on the processor, the feature correlation calculation program when executed by the processor implementing the steps of the feature correlation calculation method according to any one of claims 1 to 6.
9. A feature correlation calculation apparatus comprising a memory, a processor and a feature correlation calculation program stored on the memory and executable on the processor, the feature correlation calculation program when executed by the processor implementing the steps of the feature correlation calculation method as claimed in claim 7.
10. A computer-readable storage medium, characterized in that a feature correlation calculation program is stored thereon, which when executed by a processor implements the steps of the feature correlation calculation method according to any one of claims 1 to 7.
CN201911046722.XA 2019-10-30 2019-10-30 Feature correlation calculation method, device and computer-readable storage medium Pending CN110807528A (en)

Publications (1)

Publication Number Publication Date
CN110807528A true CN110807528A (en) 2020-02-18

Family

ID=69489614

Country Status (1)

Country Link
CN (1) CN110807528A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189825A (en) * 2018-08-10 2019-01-11 深圳前海微众银行股份有限公司 Lateral data cutting federation learning model building method, server and medium
CN109299728A (en) * 2018-08-10 2019-02-01 深圳前海微众银行股份有限公司 Federal learning method, system and readable storage medium storing program for executing
CN109325584A (en) * 2018-08-10 2019-02-12 深圳前海微众银行股份有限公司 Federation's modeling method, equipment and readable storage medium storing program for executing neural network based
CN109492420A (en) * 2018-12-28 2019-03-19 深圳前海微众银行股份有限公司 Model parameter training method, terminal, system and medium based on federation's study
WO2019072316A2 (en) * 2019-01-11 2019-04-18 Alibaba Group Holding Limited A distributed multi-party security model training framework for privacy protection
CN109886417A (en) * 2019-03-01 2019-06-14 深圳前海微众银行股份有限公司 Model parameter training method, device, equipment and medium based on federation's study
US20190227980A1 (en) * 2018-01-22 2019-07-25 Google Llc Training User-Level Differentially Private Machine-Learned Models

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
STEPHEN HARDY et al.: "Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption", ARXIV.ORG, 29 November 2017 (2017-11-29) *
姚禹丞; 宋玲; 鄂驰: "Research on a distributed K-means clustering algorithm based on homomorphic encryption", Computer Technology and Development, no. 02, 10 January 2017 (2017-01-10) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444527A (en) * 2020-03-30 2020-07-24 腾讯云计算(北京)有限责任公司 Method, device and medium for determining correlation coefficient of data between different application programs
CN111444527B (en) * 2020-03-30 2023-08-11 腾讯云计算(北京)有限责任公司 Method, device and medium for determining correlation coefficient of data between different application programs
CN111461874A (en) * 2020-04-13 2020-07-28 浙江大学 Credit risk control system and method based on federal mode
CN113569301A (en) * 2020-04-29 2021-10-29 杭州锘崴信息科技有限公司 Federal learning-based security computing system and method
CN112102939A (en) * 2020-07-24 2020-12-18 西安电子科技大学 Cardiovascular and cerebrovascular disease reference information prediction system, method and device and electronic equipment
CN112102939B (en) * 2020-07-24 2023-08-04 西安电子科技大学 Cardiovascular and cerebrovascular disease reference information prediction system, method and device and electronic equipment
CN112001452A (en) * 2020-08-27 2020-11-27 深圳前海微众银行股份有限公司 Feature selection method, device, equipment and readable storage medium
CN112001452B (en) * 2020-08-27 2021-08-27 深圳前海微众银行股份有限公司 Feature selection method, device, equipment and readable storage medium
CN112989420A (en) * 2021-03-31 2021-06-18 支付宝(杭州)信息技术有限公司 Method and system for determining correlation coefficient for protecting data privacy
CN115640509A (en) * 2022-12-26 2023-01-24 北京融数联智科技有限公司 Data correlation calculation method and system in federated privacy calculation

Similar Documents

Publication Publication Date Title
CN110807528A (en) Feature correlation calculation method, device and computer-readable storage medium
Gai et al. Blend arithmetic operations on tensor-based fully homomorphic encryption over real numbers
US20200396217A1 (en) Key Attestation Statement Generation Providing Device Anonymity
US10728018B2 (en) Secure probabilistic analytics using homomorphic encryption
CN110704860A (en) Longitudinal federal learning method, device and system for improving safety and storage medium
CN110851869B (en) Sensitive information processing method, device and readable storage medium
EP3146744B1 (en) Method, apparatus, and system for providing a security check
CN111162896A (en) Method and device for data processing by combining two parties
JP5762232B2 (en) Method and system for selecting the order of encrypted elements while protecting privacy
CN108270944B (en) Digital image encryption method and device based on fractional order transformation
CN111340247A (en) Longitudinal federated learning system optimization method, device and readable storage medium
CN109067517B (en) Encryption and decryption device, encryption and decryption method and communication method of hidden key
CN113169860A (en) Apparatus and method for non-polynomial computation of ciphertext
Abdul Hussien et al. [Retracted] A Secure Environment Using a New Lightweight AES Encryption Algorithm for E‐Commerce Websites
CN112529586B (en) Transaction information management method, device, equipment and storage medium
Lee et al. A novel group ownership delegate protocol for RFID systems
CN111368196A (en) Model parameter updating method, device, equipment and readable storage medium
CN110750520A (en) Feature data processing method, device and equipment and readable storage medium
CN111079164B (en) Feature correlation calculation method, device, equipment and computer-readable storage medium
Zhang et al. An efficient image encryption scheme for industrial Internet-of-Things devices
US20200145200A1 (en) Attribute-based key management system
CN111368314A (en) Modeling and predicting method, device, equipment and storage medium based on cross features
KR101751971B1 (en) Image processing method and apparatus for encoded image
CN114357504A (en) Federal learning method based on privacy protection and related equipment
CN113946862A (en) Data processing method, device and equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination