CN116933326A

CN116933326A - Data processing method and device based on safety house, electronic equipment and storage medium

Info

Publication number: CN116933326A
Application number: CN202310717547.2A
Authority: CN
Inventors: 刘鹏; 伊旻忞; 陈远旭
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2023-06-15
Filing date: 2023-06-15
Publication date: 2023-10-24

Abstract

The invention relates to the field of financial technology and discloses a data processing method based on a safety house. The method comprises the following steps: constructing a safety house in response to a request from a first client; dividing a data set into a public sample set and a non-public sample set, transmitting the public sample set to the front end of the safety house and the non-public sample set to the back end of the safety house; receiving, at the front end, first code with which the public sample set is debugged, and packaging the resulting second code together with the training environment into a virtual image file; and mounting the virtual image file and the non-public sample set at the back end to generate training parameters of an initial model, on the basis of which a second client debugs the initial model to obtain a target model that is returned to the first client. Applied to financial technology, the invention keeps the sensitive information in a consignor's financial data in a back end that no third party can access during commissioned model development. This prevents leakage of the financial data and improves the security of financial transaction data.

Description

Data processing method and device based on safety house, electronic equipment and storage medium

Technical Field

The present invention relates to data processing in the field of financial technology, and in particular to a data processing method and apparatus based on a safety house, an electronic device, and a storage medium.

Background

In financial technology, data such as user data and business data is an extremely important enterprise asset. Much of this data involves sensitive attribute information about users, such as electronic payments, online shopping, and securities trading.

Enterprises commonly use machine learning models to make predictions from such data and plan their business operations accordingly. Keeping this sensitive attribute information from being stolen by third parties has therefore long been a key financial security concern for enterprises.

The machine learning model is usually built by the recipient of the model service. Either the consignor (an enterprise) hands its financial data directly to the recipient, or the two parties exchange data through a trusted third party; the recipient then trains the model.

However, whenever a third party or the recipient can access data containing sensitive attribute information (such as users' bank account numbers, credit card numbers, insurance policies, and online shopping addresses) during training or exchange, the data may leak and cause a financial security incident. For example, the recipient may keep a copy of the data, or a third party may leak or tamper with it.

Disclosure of Invention

In view of the above, it is necessary to provide a data processing method based on a safety house that solves the prior-art problem of leakage of sensitive attribute information in financial technology.

The invention provides a data processing method based on a safety house, which comprises the following steps:

responding to a data processing request of a first client, and constructing a safety house in a preset trusted environment, wherein the data processing request comprises a data set to be processed;

dividing the data set into a public sample set and a non-public sample set, transmitting the public sample set to the front end of the safety house, and transmitting the non-public sample set to the rear end of the safety house;

receiving, at the front end, first code input by a second client to debug the public sample set, and packaging the second code obtained after debugging, together with the training environment, into a virtual image file;

and mounting the virtual image file and the non-public sample set at the back end to generate training parameters of a preset initial model, having the second client execute the second code to debug the initial model based on the training parameters to obtain a target model, and returning the target model to the first client.

Optionally, the safety house includes a front end, a back end, and a virtual machine cloud desktop. The front end is configured to receive the data set uploaded by the first client and to allow the second client to debug code, and the back end is configured to display the training parameters of the initial model.

Optionally, receiving at the front end the first code input by the second client to debug the public sample set, and packaging the second code obtained after debugging together with the first training environment into a virtual image file, includes:

receiving a first code input by the second client on a first display interface of the virtual machine cloud desktop to deploy a first training environment;

and based on the first training environment and the first code, debugging the public sample set, and packaging the second code and the first training environment obtained after debugging into the virtual image file.

Optionally, mounting the virtual image file and the non-public sample set at the back end to generate the training parameters of the preset initial model includes:

mounting the virtual image file with the image mounting tool of the back end to generate the initial model;

and training the initial model with the non-public sample set to obtain the training parameters of the initial model.

Optionally, the training the initial model by using the non-public sample set to obtain training parameters of the initial model includes:

determining parameters of the initial model by using a training set of the non-public sample set;

determining the network structure and hyperparameters of the initial model by using a validation set of the non-public sample set;

checking the generalization capability parameters of the initial model by using a test set of the non-public sample set;

and taking the parameters, the network structure, the hyperparameters, and the generalization capability parameters as the training parameters.

Optionally, having the second client execute the second code to debug the initial model based on the training parameters to obtain a target model includes:

based on the training parameters of the initial model, receiving the second client's debugging of the second code on the first display interface of the front end until the loss function of the initial model is smaller than a threshold, thereby obtaining the target model.
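The stopping rule above (debug until the loss function falls below a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions: the patent does not specify the model or the loss function, so the sketch assumes a one-variable linear model y = w*x + b fitted by batch gradient descent on mean squared error, and the function name train_until_threshold is hypothetical.

```python
from typing import List, Tuple


def train_until_threshold(
    xs: List[float],
    ys: List[float],
    threshold: float,
    lr: float = 0.05,
    max_epochs: int = 10_000,
) -> Tuple[float, float, float, int]:
    """Hypothetical helper: fit y = w*x + b by batch gradient descent on mean
    squared error, stopping as soon as the loss falls below `threshold`.

    Returns (w, b, final_loss, epochs_used)."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        if loss < threshold:  # stopping rule: loss below the threshold
            return w, b, loss, epoch
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    raise RuntimeError(f"loss {loss:.4g} is still >= {threshold} after {max_epochs} epochs")
```

For example, train_until_threshold([0, 1, 2, 3], [1, 3, 5, 7], threshold=1e-4) converges close to w = 2, b = 1. Capping the number of epochs keeps an unreachable threshold from looping forever; the sketch raises an error rather than silently returning an unconverged model.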

Optionally, before the returning the target model to the first client, the method further includes:

destroying all data in the safety house other than the target model, and exporting the target model.
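A minimal sketch of the destroy-then-export step, assuming (hypothetically) that the safety house's working data lives in a single directory and that the target model is one file inside it; destroy_and_export and the directory layout are illustrative names, not taken from the patent. Note that this uses ordinary file deletion, which is not secure erasure; in a TEE deployment, protection of residual data would rely on the platform's memory encryption rather than on code like this.

```python
import shutil
from pathlib import Path


def destroy_and_export(workspace: Path, model_name: str, export_dir: Path) -> Path:
    """Hypothetical helper: delete every entry in `workspace` except the target
    model, then move the model to `export_dir` and remove the empty workspace.
    Plain deletion only; this is not a secure wipe."""
    model_path = workspace / model_name
    if not model_path.is_file():
        raise FileNotFoundError(f"target model {model_name!r} not found in {workspace}")
    for entry in list(workspace.iterdir()):  # snapshot before deleting
        if entry == model_path:
            continue  # keep only the target model
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # e.g. sample sets, image files, code directories
        else:
            entry.unlink()  # individual files and symlinks
    export_dir.mkdir(parents=True, exist_ok=True)
    exported = Path(shutil.move(str(model_path), str(export_dir / model_name)))
    workspace.rmdir()  # succeeds only because the workspace is now empty
    return exported
```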

To solve the above problems, the present invention also provides a data processing apparatus based on a safety house, the apparatus comprising:

the request module is used for responding to a data processing request of the first client, and constructing a safety house in a preset trusted environment, wherein the data processing request comprises a data set to be processed;

the dividing module is used for dividing the data set into a public sample set and a non-public sample set, transmitting the public sample set to the front end of the safety house, and transmitting the non-public sample set to the rear end of the safety house;

the packaging module is configured to receive, at the front end, first code input by the second client to debug the public sample set, and to package the second code obtained after debugging, together with the training environment, into a virtual image file;

the debugging module is configured to mount the virtual image file and the non-public sample set at the back end to generate training parameters of a preset initial model, to have the second client execute the second code to debug the initial model based on the training parameters to obtain a target model, and to return the target model to the first client.

To solve the above problems, the present invention also provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a safety house-based data processing program executable by the at least one processor, and the program, when executed by the at least one processor, causes the at least one processor to perform the safety house-based data processing method described above.

To solve the above problems, the present invention also provides a computer-readable storage medium storing a safety house-based data processing program executable by one or more processors to implement the safety house-based data processing method described above.

Compared with the prior art, the present invention responds to the data processing request of the first client by constructing a safety house in a preset trusted environment, the request including a data set to be processed. The data set is divided into a public sample set and a non-public sample set; the public sample set is transmitted to the front end of the safety house and the non-public sample set to the unattended back end. This ensures that the second client never has an opportunity to access the non-public sample set and effectively protects the sensitive data (the non-public sample set).

The front end receives first code input by the second client to debug the public sample set, and the second code and training environment obtained after debugging are packaged into a virtual image file. The back end mounts the virtual image file and the non-public sample set to generate training parameters of a preset initial model, the second client executes the second code to debug the initial model based on these parameters to obtain a target model, and the target model is returned to the first client. The consignor's (enterprise's) sensitive financial attribute information can thus enter the safety house but never leave it. The second client can only view the model's training parameters at the back end and cannot enter the back end to modify the second code, so it never touches the non-public sample set and works only with the non-sensitive information at the front end. This also prevents the second client from precisely inferring the non-public sample set from the public sample set, improving the security of financial data transactions.

Drawings

FIG. 1 is a flowchart of a data processing method based on a safety house according to an embodiment of the present invention;

FIG. 2 is a schematic block diagram of a data processing apparatus based on a safety house according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device implementing the data processing method based on a safety house according to an embodiment of the present invention.

The objects, functional features, and advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that the terms "first", "second", etc. in this disclosure are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of the technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be implemented by those skilled in the art; where combined technical solutions contradict each other or cannot be implemented, the combination shall be deemed not to exist and falls outside the scope of protection claimed by the present invention.

With the rapid development of financial technology, sensitive attribute information in financial data (such as users' bank account numbers, credit card numbers, insurance policies, and online shopping addresses) easily leaks when a consignor and a recipient cooperate on training a financial data model, causing financial security incidents. The main reasons are that the recipient can keep a copy of the financial data or that a third party can leak or tamper with it.

The present invention provides a data processing method based on a safety house that can be applied to financial technology. A safety house is constructed in a trusted environment; the first client's data set is divided into a public sample set and a non-public sample set, the public sample set is transmitted to the front end of the safety house, and the non-public sample set is transmitted to the back end. This solves the technical problem of keeping the first client's data from leaking during financial data processing when model development is commissioned to another party.

The consignor's sensitive financial attribute information can enter the safety house but never leave it. The second client can only view the model's training parameters at the back end and cannot enter the back end to modify the second code; the back end thus ensures that the second client never touches the non-public sample set and can work only with the non-sensitive information at the front end. As a result, the non-public sample set containing sensitive attribute information cannot leak, the second client is prevented from precisely inferring the non-public sample set from the public sample set, and the security of financial data transactions is improved.

Referring to FIG. 1, a flowchart of a data processing method based on a safety house according to an embodiment of the present invention is shown. The method is performed by an electronic device.

In this embodiment, the data processing method based on the safety house includes:

S1, responding to a data processing request of a first client, and constructing a safety house in a preset trusted environment, wherein the data processing request comprises a data set to be processed.

In this embodiment, the first client is the client device of the consignor, the party that provides the data set and obtains the final target model.

The second client is the client device of the recipient, the party that models and trains on the data set, i.e., the provider of the model (training) service.

The data set is financial data owned by the consignor and involves financial attribute information such as users' online transactions, bank accounts, credit card accounts, insurance policies, and online shopping addresses. For example, the data set includes user data and business data, and may come from an insurance agency, a shopping platform, or a bank.

The safety house is a secure platform built by a third party on a TEE (trusted execution environment). It is built automatically without human participation. Neither the first client nor the third party has permission to obtain the code trained in the safety house, and the second client has no permission to obtain the data set provided by the first client from the safety house.

The TEE may be a trusted execution environment approved by the State Cryptography Administration (e.g., the Hygon CSV trusted execution environment). A CSV virtual machine in this environment encrypts memory data with the national-standard SM4 cryptographic hardware engine, so that neither the host administrator nor other users can access data inside the TEE. It supports launch measurement and runtime remote attestation to ensure that only legitimate user programs are running. The memory encryption key is generated by a hardware random number generator and is invisible to software, which protects user data.
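The launch-measurement idea can be illustrated with a toy check: before trusting the environment, the relying party compares a measurement (a cryptographic digest) of what was launched against the value it expects. This is only a conceptual sketch. Real CSV measurement and remote attestation are performed by hardware and firmware and produce signed reports, none of which is modeled here, and SHA-256 is used as a stand-in digest only because it is always available in Python's hashlib. The function names are hypothetical.

```python
import hashlib
import hmac


def measure(launch_image: bytes) -> str:
    """Hypothetical stand-in for launch measurement: SHA-256 hex digest of the
    launch image. (Real CSV measurement is done by hardware/firmware.)"""
    return hashlib.sha256(launch_image).hexdigest()


def verify_launch(launch_image: bytes, expected_measurement: str) -> bool:
    """Trust the environment only if its measurement equals the expected value.
    compare_digest avoids leaking how many leading characters matched."""
    return hmac.compare_digest(measure(launch_image), expected_measurement)
```

The expected measurement would be known in advance for the approved image; any change to the launched image yields a different digest, and the check fails.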

In other words, within the safety house the first client is only authorized to provide the data set and obtain the final target model. The second client can only model and train on the data set through the safety house's display interface and cannot access or copy the data set. The third party only provides the safety house (as a service): it never accesses or copies the data set or the trained code and has no right to the final target model. Because the third party, the first client, and the second client have no direct relationship with one another, data security is ensured.
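The division of rights described above can be written down as a deny-by-default permission table. The role and action names below are hypothetical labels for the permissions the text describes; note that no role is granted access to the non-public samples.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet


class Role(Enum):
    CONSIGNOR = auto()    # owner of the first client
    RECIPIENT = auto()    # owner of the second client
    THIRD_PARTY = auto()  # operator of the safety house


class Action(Enum):
    UPLOAD_DATA_SET = auto()
    RECEIVE_TARGET_MODEL = auto()
    READ_PUBLIC_SAMPLES = auto()
    READ_NON_PUBLIC_SAMPLES = auto()
    WRITE_AND_DEBUG_CODE = auto()
    VIEW_TRAINING_PARAMETERS = auto()
    PROVIDE_SAFETY_HOUSE = auto()


# Hypothetical encoding of the rights described in the text.
# Anything not listed is denied; READ_NON_PUBLIC_SAMPLES is granted to nobody.
PERMISSIONS: Dict[Role, FrozenSet[Action]] = {
    Role.CONSIGNOR: frozenset({Action.UPLOAD_DATA_SET, Action.RECEIVE_TARGET_MODEL}),
    Role.RECIPIENT: frozenset({
        Action.READ_PUBLIC_SAMPLES,
        Action.WRITE_AND_DEBUG_CODE,
        Action.VIEW_TRAINING_PARAMETERS,
    }),
    Role.THIRD_PARTY: frozenset({Action.PROVIDE_SAFETY_HOUSE}),
}


def is_allowed(role: Role, action: Action) -> bool:
    """Deny by default: an action is allowed only if explicitly granted."""
    return action in PERMISSIONS.get(role, frozenset())
```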

When the consignor needs to analyze the data set to be processed, it logs in through the first client to the service website provided by the third party and clicks the website's safety house component button. A dedicated safety house is then generated automatically for this data processing request, and the first client transmits the data set to be processed to the safety house through the TEE.

In one embodiment, the safety house includes a front end, a back end, and a virtual machine cloud desktop. The front end is configured to receive the data set uploaded by the first client and to allow the second client to debug code, and the back end is configured to display the training parameters of the initial model.

The role of the front end is to prevent leakage of the data provided by the first client.

The back end automatically processes the output of the front end, for example by deploying and executing the submitted virtual image file and visualizing the training results of the target model on a second display interface of the back end.

The virtual machine cloud desktop provides the second client with a cloud desktop workspace for writing and debugging code, packages the debugged code and the first training environment into a virtual image file, and sends the virtual image file to the back end.

Through the safety house, step S1 realizes a one-to-one commissioned data development flow without third-party involvement, which ensures the security of both the data and the code model as well as the consignor's ownership of the data.

S2, dividing the data set into a public sample set and a non-public sample set, transmitting the public sample set to the front end of the safety house, and transmitting the non-public sample set to the back end of the safety house.

In this embodiment, the data set is data owned by the consignor, e.g., user data and business data. Before the first client transmits the data set to the safety house, all samples in the data set must be labeled. Labeling may be done by the consignor itself, by a crowdsourcing system, or by a labeling model, which is not limited herein.

The data set is divided into a public sample set and a non-public sample set according to the attributes of each sample, which include the user's name, mobile phone number, age, occupation, income, bank account number, credit card number, securities account number, and the like. For example, a financial institution extracts financial attribute data from a financial database as the data set and places information related to user privacy, information generated during the user's financial transactions, and trade secrets into the non-public sample set. The specific division depends on the requirements of the actual application scenario and is not limited herein.
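One possible reading of this attribute-level division is sketched below, assuming each sample is a record of named attributes. The field lists are hypothetical (the patent leaves them to the application scenario), and the label is assumed to be copied to both sides because both debugging and training need it. The sketch uses an allowlist: any attribute not explicitly declared public goes to the non-public set.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Record = Dict[str, object]

# Hypothetical attribute lists; the patent leaves the division to the scenario.
PUBLIC_FIELDS: FrozenSet[str] = frozenset({"clothing_color", "shoe_color", "browsing_preference"})
SHARED_FIELDS: FrozenSet[str] = frozenset({"label"})  # copied to both sets


def split_data_set(
    records: Iterable[Record],
    public_fields: FrozenSet[str] = PUBLIC_FIELDS,
    shared_fields: FrozenSet[str] = SHARED_FIELDS,
) -> Tuple[List[Record], List[Record]]:
    """Split each record into a public part (for the front end) and a
    non-public part (for the back end). Unlisted attributes are non-public."""
    public_set: List[Record] = []
    non_public_set: List[Record] = []
    for record in records:
        public_part: Record = {}
        non_public_part: Record = {}
        for key, value in record.items():
            if key in shared_fields:
                public_part[key] = value
                non_public_part[key] = value
            elif key in public_fields:
                public_part[key] = value
            else:  # fail closed: anything not declared public stays at the back end
                non_public_part[key] = value
        public_set.append(public_part)
        non_public_set.append(non_public_part)
    return public_set, non_public_set
```

Failing closed in this way means that a newly added attribute, such as a new account number field, cannot reach the front end by accident.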

Without this division, a leak of the financial attribute data in the non-public sample set could put hundreds of millions of financial accounts at risk, or allow malicious parties to analyze the data and harass the financial institution's users by mail or telephone.

The public sample set is transmitted to the front end of the safety house and the non-public sample set to the unattended back end. The division is worthwhile because the first client's data always contains some sensitive and some non-sensitive data: disclosing only the non-sensitive data (the public sample set) to the second client at the front end makes code writing and debugging possible while keeping the sensitive data secure.

In one embodiment, before said dividing said data set into a public sample set and a non-public sample set, the method further comprises:

labeling the data set.

The first client must label all samples of the data set. For example, the data set is split into several sample subsets that are outsourced to a crowdsourcing system (such as that of the Ping An Group) for manual labeling.

Labels provide additional category information about each sample, which can improve the prediction accuracy of the subsequent target model.

In one embodiment, before said transmitting said non-public sample set to the back end of said safety house, the method further comprises:

the non-public sample set is divided into a training set, a testing set, and a validation set.

The non-public sample set is divided into a training set, a test set, and a validation set at a preset ratio (e.g., a preset ratio of 6:2:2).
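A 6:2:2 split can be sketched as a seeded shuffle followed by slicing. The function name and the seed parameter are illustrative assumptions, not from the patent; any rounding remainder is assigned to the test set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_val_test(
    samples: Sequence[T],
    ratios: Tuple[int, int, int] = (6, 2, 2),
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle with a fixed seed, then cut into
    training/validation/test subsets in the given integer ratio.
    The rounding remainder goes to the test set."""
    if len(ratios) != 3 or min(ratios) < 0 or sum(ratios) == 0:
        raise ValueError("ratios must be three non-negative integers with a positive sum")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

With 100 samples the default ratio yields subsets of 60, 20, and 20 samples; fixing the seed makes the split reproducible across runs at the back end.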

Training set: used to train the initial model (the precursor of the target model requested by the first client) and determine its parameters.

Validation set: used to determine the network structure of the initial model and tune its hyperparameters.

Test set: used to check the generalization ability of the initial model.

In step S2, the public sample set is transmitted to the front end of the safety house and the non-public sample set to the back end, which no person can access. The second client therefore never has an opportunity to access the non-public sample set, the sensitive data (the non-public sample set) is effectively protected, and the second client is prevented from precisely inferring the non-public sample set from the public sample set.

S3, receiving, at the front end, first code input by the second client to debug the public sample set, and packaging the second code obtained after debugging, together with the training environment, into a virtual image file.

In this embodiment, the front end of the safety house provides a first display interface (for example, a web interface) on which the second client inputs the first code. The first code is used to deploy the first training environment and to write and debug code against the public sample set.

For example, suppose user H has registered a shopping account, a securities account, a bank account, and an insurance account with financial institution A, generating data set K over a certain period. Before commissioning model training, financial institution A must place information related to user privacy, information generated in the user's financial transactions, and trade secrets from data set K into the non-public sample set, and place data that is easily obtained by other means into the public sample set. The public sample set typically contains, for example, the colors of clothing or shoes user H bought (without prices or merchant information) or user H's activity preferences when browsing the bank account (without transfers or amounts).

When the recipient debugs the public sample set with first code input at the second client, even if the public sample set is viewed or leaked, it cannot be used to defraud user H through spam or harassing calls involving threats or lures, nor to break into user H's accounts and illegally transfer user H's financial assets.

After one or more rounds of debugging, the recipient logs in to the first display interface through the second client, confirms the debugging result, and clicks the corresponding submit button; the second client then packages the debugged second code and the first training environment into a virtual image file. The virtual image file may be a Docker image or another type of image file.

A virtual image file is a special file system. Besides the programs, libraries, resources, and configuration files the container needs at runtime, it contains configuration parameters prepared for runtime (such as anonymous volumes, environment variables, and users). The image contains no dynamic data, and its content does not change after it is built.
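The property that an image contains no dynamic data and does not change after it is built can be illustrated with a deterministic archive identified by its content digest. This is a simplified stand-in, not the Docker image format: it packs the second code plus a JSON manifest of the first training environment into a tar archive with fixed timestamps and ownership, so identical inputs always give an identical digest. build_image_archive, image_digest, and the environment.json file name are hypothetical.

```python
import hashlib
import io
import json
import tarfile
from typing import Dict

MANIFEST_NAME = "environment.json"  # hypothetical file name


def build_image_archive(code_files: Dict[str, bytes], environment: Dict[str, str]) -> bytes:
    """Pack the second code and a manifest of the first training environment
    into a tar archive. Timestamps, ownership and ordering are fixed, so the
    same inputs always produce byte-identical output."""
    if MANIFEST_NAME in code_files:
        raise ValueError(f"{MANIFEST_NAME} is reserved for the environment manifest")
    files = dict(code_files)
    files[MANIFEST_NAME] = json.dumps(environment, sort_keys=True).encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # stable member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # no build-time timestamp
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def image_digest(archive: bytes) -> str:
    """Content address of the archive: any change to code or environment changes it."""
    return "sha256:" + hashlib.sha256(archive).hexdigest()
```

In a Docker-based deployment the same role would be played by building an image from a Dockerfile and referring to it by its content digest.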

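In one embodiment, receiving at the front end the first code input by the second client to debug the public sample set, and packaging the second code obtained after debugging together with the first training environment into a virtual image file, includes receiving the first code on a first display interface of the virtual machine cloud desktop to deploy the first training environment, and then debugging the public sample set and packaging the second code and the first training environment into the virtual image file.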

The first display interface is the display interface of the virtual machine cloud desktop. It works much like an ordinary computer and provides the second client with computing power for debugging code.

The content of the first code is decided by the second client according to the first client's data processing request and the data set provided. For example, if the request is to solve a data classification problem, Python and various libraries (such as PyTorch) are installed on the virtual machine cloud desktop to deploy the first training environment. In other words, the first training environment is the environment the code needs in order to run.
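Before packaging, the second client might check that the first training environment actually provides what the code needs. A minimal sketch using only the standard library: importlib.util.find_spec reports whether a module can be imported, and importlib.metadata.version reads an installed distribution's version. The function names and the "MISSING" marker are assumptions for illustration.

```python
import importlib.util
from importlib import metadata
from typing import Dict, Iterable, List


def missing_modules(required: Iterable[str]) -> List[str]:
    """Hypothetical helper: top-level modules the code needs that cannot be
    imported in the current interpreter."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def environment_manifest(distributions: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: record installed versions of the given
    distributions, using "MISSING" for any that are not installed."""
    manifest: Dict[str, str] = {}
    for dist in distributions:
        try:
            manifest[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            manifest[dist] = "MISSING"
    return manifest
```

For instance, missing_modules(["torch"]) returns ["torch"] on a desktop where PyTorch is not installed, and the manifest from environment_manifest could be packaged alongside the second code.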

After the first training environment is deployed, the second client writes code for the data set and debugs it on the public sample set; the debugging process is up to the second client. The second client's goal is to deliver a usable target model generated by the code, and how the code is debugged (e.g., parameter tuning, feasibility of the code) is the second client's responsibility. This determines how the target model's neural network is built, how its parameters are tuned, and which debugged first training environment and second code are used.

In step S3, data can be imported into the safety house from outside, but data in the safety house cannot be exported. That is, the recipient can import data it needs into the safety house for model pre-training, whereas neither the public sample set nor the non-public sample set can be exported. Data in the safety house only flows in and never out, which effectively ensures that neither a third party nor the recipient (second client) can access the data during the data transaction and avoids the risks of data leakage and tampering.
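The import-only rule can be sketched as a store that accepts any import but allows exactly one artifact, the registered target model, to be exported. The class and method names are hypothetical; in the patent's flow the target model would be designated by the automated back end, not by either client.

```python
from typing import Dict, Optional


class ImportOnlyStore:
    """Hypothetical model of the safety house's data flow: data may be
    imported freely; only the registered target model may leave."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}
        self._target_model_key: Optional[str] = None

    def import_data(self, key: str, payload: bytes) -> None:
        self._items[key] = payload  # imports are always accepted

    def register_target_model(self, key: str) -> None:
        # Assumed to be called by the automated back end, never by a client.
        if key not in self._items:
            raise KeyError(key)
        self._target_model_key = key

    def export(self, key: str) -> bytes:
        if self._target_model_key is None or key != self._target_model_key:
            raise PermissionError(f"{key!r} cannot be exported from the safety house")
        return self._items[key]
```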

S4, mounting the virtual image file and the non-public sample set at the back end to generate training parameters of a preset initial model, having the second client execute the second code to debug the initial model based on the training parameters to obtain a target model, and returning the target model to the first client.

In this embodiment, the virtual image file is transmitted to the back end of the safety house, and the back end's image mounting tool presents the content of the virtual image file as a virtual optical drive. The first training environment contained in the image is deployed at the back end and the second code is run to generate the corresponding initial model. The non-public sample set is then mounted to train the initial model, and the second client debugs the second code, producing the target model requested by the first client.

After receiving the trained target model, the consignor uses it to analyze the various financial attribute information in its financial database.

In financial technology, during commissioned model development the consignor's sensitive attribute information can enter the safety house but never leave it; in particular, the non-public sample set containing financial attribute information is never accessible to any third party. This prevents the non-public sample set from leaking, prevents the second client from precisely inferring the non-public sample set from the public sample set, and improves the security of financial data transactions.

Financial data is becoming increasingly important, and the impact of security threats such as data leakage, abuse, and tampering is growing accordingly: such incidents can spread from financial institutions to other institutions and industries, and can even affect national security, social security, and the public interest.

The secure-enclave-based data processing method strengthens data protection and ensures that financial data can circulate safely while still meeting the basic requirements of financial business.

In one embodiment, mounting the virtual image file and the non-public sample set by using the back end to generate the training parameters of the preset initial model includes:

The image mounting tool at the back end is a virtual drive tool (e.g., Tools software).

The image mounting tool automatically decompresses the input virtual image file into a virtual optical disc (an ISO file) and mounts the disc at the back end, thereby deploying the first training environment packaged in step S3. Once the first training environment is deployed, the second code is run to generate the initial model corresponding to the second code. The non-public sample set is then mounted and used to train the initial model; the resulting training parameters are displayed on the second display interface of the back end, so that the second client can execute the second code to debug the initial model and generate the target model.

In one embodiment, training the initial model by using the non-public sample set to obtain the training parameters of the initial model includes:

The training set is used to train the parameters of the initial model's neural network (such as the weights w and biases b). After training, the validation set is used to compare and select the network structure (such as the number of network layers and nodes) and the hyperparameters (such as the number of training epochs, the learning rate, and the optimizer) of the initial model. Finally, the test set is used to check the generalization ability of the initial model.
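The three roles of the training, validation, and test sets can be sketched in a few lines. This is a minimal illustration, assuming a one-layer logistic model as a stand-in for the initial model's neural network, a 60/20/20 split, and a small grid over learning rate and epoch count; none of these choices come from the patent.

```python
import math
import random


def predict_logit(w, b, x):
    """Linear score w.x + b of a single-layer model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logistic(train, lr, epochs):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    dim = len(train[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in train:
            p = 1.0 / (1.0 + math.exp(-predict_logit(w, b, x)))
            for i in range(dim):
                grad_w[i] += (p - y) * x[i]
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, data):
    """Fraction of samples whose predicted class (score > 0) matches the label."""
    hits = sum((predict_logit(w, b, x) > 0) == (y == 1) for x, y in data)
    return hits / len(data)


def fit_with_validation(samples, grid, seed=0):
    """Train on 60%, pick (lr, epochs) on 20% validation, report accuracy on 20% test."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    n = len(data)
    train = data[: n * 6 // 10]
    val = data[n * 6 // 10 : n * 8 // 10]
    test = data[n * 8 // 10 :]
    # Training set: learn w and b for every hyperparameter setting.
    candidates = [(lr, ep, *train_logistic(train, lr, ep)) for lr, ep in grid]
    # Validation set: choose the hyperparameters.
    lr, epochs, w, b = max(candidates, key=lambda c: accuracy(c[2], c[3], val))
    # Test set: measure generalization only once, after selection.
    return {"lr": lr, "epochs": epochs, "w": w, "b": b, "test_acc": accuracy(w, b, test)}
```

The test set is touched only after the hyperparameters are fixed, which keeps the reported generalization figure from being tuned against.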

In one embodiment, debugging the initial model based on the training parameters, so that the second client executes the second code to obtain a target model, includes:

Obtaining the training parameters of the initial model;

Displaying the training parameters on the second display interface of the back end, where the second display interface provides no operation entry. If the second client considers that the training parameters do not meet the standard (for example, the accuracy of the target model required by the first client) and wants to modify the second code, it must re-enter the front end of the secure enclave and modify the original second code on the first display interface of the front end. In other words, the second client has no permission to operate the back end and can only view the training parameters shown on the second display interface, which ensures that the second client cannot access the non-public sample set and keeps the non-public sample set at the back end secure.

After the second client has debugged the second code, step S3 is executed again: the debugged second code and the first training environment are repackaged into a virtual image file and input to the back end. This process is repeated until the training parameters displayed on the second display interface meet the standard, i.e., the loss of the initial model is below a threshold (equivalently, the accuracy of the target model exceeds the requirement specified by the first client), at which point the target model is obtained.
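The debug loop can be sketched as follows, assuming each front-end revision of the second code is modeled as a function that trains on the non-public set and returns a metrics dictionary containing a `loss` key. The names `backend_round` and `delegated_debugging` are hypothetical; `types.MappingProxyType` is used here to model a display interface that offers no operation entry.

```python
from types import MappingProxyType


def backend_round(second_code, non_public_set):
    """Hypothetical back end: run the submitted code and expose only metrics."""
    params = second_code(non_public_set)  # training happens inside the back end
    # Read-only view: the second display interface provides no operation entry.
    return MappingProxyType(dict(params))


def delegated_debugging(code_versions, non_public_set, loss_threshold):
    """Try successive front-end revisions until the displayed loss meets the standard."""
    for round_no, code in enumerate(code_versions, start=1):
        shown = backend_round(code, non_public_set)
        if shown["loss"] < loss_threshold:
            return round_no, shown
        # Otherwise the second client returns to the front end and revises the
        # second code there; it never receives non_public_set itself.
    return None, None
```

A `MappingProxyType` rejects item assignment, so a viewer of the displayed parameters cannot alter them; real isolation would of course come from the enclave's access controls, not from this Python object.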

In one embodiment, before returning the target model to the first client, the method further comprises:

If the accuracy of the target model exceeds the requirement specified by the first client, or the loss of the initial model is below the threshold, all data at the front end of the secure enclave is destroyed, all data at the back end other than the target model is destroyed, and the target model is taken out of the back end and delivered to the first client.
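The delivery condition and teardown order can be sketched as below, assuming the front end and back end are modeled as plain dictionaries and the target model is stored under a known key. The function `finalize_delivery` and its parameters are hypothetical illustrations.

```python
def finalize_delivery(front_end, back_end, target_key, *, accuracy,
                      required_accuracy, loss, loss_threshold):
    """Hypothetical teardown: wipe the enclave and release only the target model."""
    if not (accuracy > required_accuracy or loss < loss_threshold):
        return None  # Standard not met: nothing is released or destroyed yet.
    model = back_end.pop(target_key)  # take the target model out first
    front_end.clear()                 # destroy all front-end data
    back_end.clear()                  # destroy every remaining back-end object
    return model
```

Note that `dict.clear()` only drops references; an actual enclave would need secure erasure of the underlying storage.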

In step S4, the second client can only view the model's training parameters on the second display interface of the back end and cannot enter the back end to modify the second code. The back end therefore ensures that the second client never has access to the non-public sample set throughout the process, so that the non-public sample set cannot leak and cannot be precisely queried.

Steps S1 to S4 provide a one-to-one delegated data processing method based on a secure enclave, i.e., with exactly one first client and one second client. A secure enclave is constructed in a trusted environment; the first client divides the data set into a public sample set and a non-public sample set, transmits the public sample set to the front end of the secure enclave, and transmits the non-public sample set to its back end. Because the front end is where the second client inputs and debugs code, and the back end is where the model is built and its training parameters are displayed, the receiving party (the second client) can never access the non-public sample set at any point in the process, whether in digital healthcare or in financial technology, and has no permission to enter the back end. In other embodiments, the invention may also implement a one-to-many delegated data processing method based on a secure enclave, i.e., with one first client and a plurality of second clients.

S5, responding to a data processing request of the first client, and constructing a safety house in a preset trusted environment, wherein the data processing request comprises a data set to be processed;

In this embodiment, the first client is the party that provides the data set and obtains the final target model.

The second client is the party that models and trains on the data set, i.e., the party providing the model service (model training).

The data set is data owned by the first client, for example user data and business data.

Environments are generated according to the number of second clients, the start and end times of the competition are specified, and all second clients share the front end and the back end of the secure enclave.

That is, within the secure enclave, the first client only has the right to provide the data set and to obtain the final target model; the second client can only model and train on the data set through the display interfaces of the secure enclave and cannot access or copy the data set; and the third party only provides the secure enclave as a service, cannot access or copy the data set or the trained code, and has no right to obtain the final target model. Since the third party, the first client, and the second client have no direct relationship with one another, the security of the data is ensured.
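The division of rights described above amounts to a small permission table. The sketch below encodes it with enums; the role and action names are hypothetical labels chosen for illustration, and the table only reflects the rights listed in the paragraph above.

```python
from enum import Enum, auto


class Role(Enum):
    FIRST_CLIENT = auto()   # data owner / delegating party
    SECOND_CLIENT = auto()  # modeling party
    THIRD_PARTY = auto()    # secure-enclave service provider


class Action(Enum):
    PROVIDE_DATA_SET = auto()
    OBTAIN_TARGET_MODEL = auto()
    MODEL_ON_DISPLAY_INTERFACE = auto()
    PROVIDE_ENCLAVE_SERVICE = auto()
    READ_OR_COPY_DATA_SET = auto()
    READ_OR_COPY_TRAINING_CODE = auto()


# Hypothetical permission table: each role gets only the rights named in the text.
PERMISSIONS = {
    Role.FIRST_CLIENT: {Action.PROVIDE_DATA_SET, Action.OBTAIN_TARGET_MODEL},
    Role.SECOND_CLIENT: {Action.MODEL_ON_DISPLAY_INTERFACE},
    Role.THIRD_PARTY: {Action.PROVIDE_ENCLAVE_SERVICE},
}


def is_allowed(role, action):
    """Default-deny check: anything not listed in the table is refused."""
    return action in PERMISSIONS[role]
```

The check is default-deny, so no role can read or copy the data set or the training code simply because no entry grants it.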

When the first client needs to analyze the data set to be processed, it logs in over the network to a website where a third party provides the service and clicks the secure enclave button on that website. A dedicated secure enclave is then generated automatically for the data processing request of the first client, and the first client transmits the data set to be processed to the secure enclave through a TEE (trusted execution environment).

Step S5 uses the secure enclave to realize a one-to-many delegated data development process without third-party participation, which ensures the security of the data and the code model as well as data ownership.

S6, dividing the data set into a public sample set and a non-public sample set, transmitting the public sample set to the front end of the safety house, and transmitting the non-public sample set to the rear end of the safety house;

In this embodiment, the data set is data owned by the first client, for example user data and business data. Before the first client transmits the data set to the secure enclave, all samples of the data set must be labeled. Labeling may be done by the first client, by a crowdsourcing system, or by a labeling model, which is not limited herein.

The data set is divided into a public sample set and a non-public sample set according to the attributes of each sample, such as the user's name, age, occupation, and income. For example, information related to user privacy and business secrets is placed in the non-public sample set; the specific division depends on the requirements of the actual application scenario and is not limited herein.
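One way to apply such a division rule is sketched below, assuming each sample is a dictionary of attributes and that a sample goes to the non-public set whenever any designated sensitive attribute is populated. The attribute set `{"name", "income"}` is only a hypothetical example; the patent leaves the actual rule to the application scenario.

```python
# Hypothetical rule: which attributes count as privacy / business-secret data
# depends on the application scenario; "name" and "income" are only examples.
SENSITIVE_ATTRIBUTES = frozenset({"name", "income"})


def split_data_set(samples, sensitive=SENSITIVE_ATTRIBUTES):
    """Route each labeled sample to the public or the non-public sample set."""
    public, non_public = [], []
    for sample in samples:
        has_sensitive = any(sample.get(attr) is not None for attr in sensitive)
        (non_public if has_sensitive else public).append(sample)
    return public, non_public
```

Routing whole samples keeps the two sets disjoint; a column-level split (removing sensitive fields from otherwise public samples) would be an equally plausible reading of the text.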

The data set is labeled as described above.

In step S6, the public sample set is transmitted to the front end of the secure enclave and the non-public sample set is transmitted to its back end without any human contact. The second client therefore never has an opportunity to access the non-public sample set from beginning to end, which effectively protects the sensitive data (the non-public sample set) and prevents the second client from using the public sample set to precisely query the non-public sample set.

S7, using the front end to receive first code input by more than one second client to debug the public sample set, and packaging the second code obtained after debugging together with the training environment into a virtual image file;

After a second client confirms the debugging result, it clicks the corresponding submit button on the first display interface, and the second code obtained from that client's debugging is packaged together with the first training environment into a virtual image file. The virtual image file may be a Docker image file or another type of image file.
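The packaging step can be illustrated with an in-memory archive. This sketch assumes a plain tar archive stands in for the virtual image file, and that the training environment is described by a JSON manifest; the member names `train.py` and `environment.json` are hypothetical. Building a real Docker image would instead go through a Dockerfile and the Docker tooling.

```python
import io
import json
import tarfile


def package_image(second_code: str, environment: dict) -> bytes:
    """Bundle the debugged code and a training-environment manifest into a tar archive."""
    members = (
        ("train.py", second_code),
        ("environment.json", json.dumps(environment, sort_keys=True)),
    )
    buf = io.BytesIO()
    # Uncompressed tar with default mtime 0, so identical inputs give identical bytes.
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, text in members:
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

The back end can then unpack the archive, deploy the environment it describes, and run the code, which corresponds to the mounting step described for S4 and S8.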

In step S7, data can be imported into the secure enclave from outside, but data inside the secure enclave cannot be exported. That is, a second client may import data it needs into the secure enclave for model pre-training, but neither the public sample set nor the non-public sample set can be exported from it. Because data can only be imported into the secure enclave and never exported, neither a third party nor the second clients can access the data during the data transaction, which avoids the risks of data leakage and tampering.

Steps S5 to S7 are the same as steps S1 to S3.

S8, mounting the virtual image files and the non-public sample set by using the back end, generating training parameters of a preset initial model, and enabling each second client to execute its second code to debug the initial model based on the training parameters, so as to obtain a plurality of target models;

In this embodiment, environments are generated according to the number of second clients, the start and end times of the competition are specified, and all second clients share the front end and the back end of the secure enclave.

The virtual image files are transmitted to the back end of the secure enclave, where a message queue component receives the virtual image file of each second client, orders the files, and runs the tests in that order.
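A minimal sketch of the ordering step follows, assuming images are ordered by submission time with a counter as tie-breaker; the patent says only that the files are ordered, so the ordering key and the `ImageQueue` class are hypothetical. It uses the standard-library `heapq` module as a priority queue.

```python
import heapq
import itertools


class ImageQueue:
    """Hypothetical back-end queue: order image submissions, then test them in turn."""

    def __init__(self):
        self._heap = []
        self._tiebreak = itertools.count()  # keeps equal timestamps in arrival order

    def submit(self, submitted_at, client_id, image):
        heapq.heappush(self._heap, (submitted_at, next(self._tiebreak), client_id, image))

    def run_all(self, test_fn):
        """Pop images in order and apply the back-end test to each one."""
        results = []
        while self._heap:
            _, _, client_id, image = heapq.heappop(self._heap)
            results.append((client_id, test_fn(image)))
        return results
```

The counter prevents the heap from ever comparing two image payloads directly when timestamps tie.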

The image mounting tool at the back end presents the content of each virtual image file as a virtual physical optical drive; that is, the first training environment in the virtual image file is deployed at the back end and the second code is run to generate the corresponding initial model. The non-public sample set is mounted to train the initial model, so that each second client can debug its second code and generate the target model requested by the first client.

Obtaining the training parameters of the initial model;

S8, scoring each target model by using the back end, selecting the target model whose score is greater than a threshold, and returning the target model to the first client.

The back end scores each target model. On the second display interface, each second client can see, in real time, the test-set accuracy ranking of itself and the other second clients, along with a countdown to the end of the competition. Each second client must submit its final code before the countdown ends, and the model generated by that code is taken as its target model.
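The "final code before the countdown ends" rule can be sketched as follows, assuming each submission is a `(client_id, submitted_at, code)` tuple and that the latest on-time submission of each client counts as final. The function name and tuple layout are hypothetical.

```python
def final_submissions(submissions, deadline):
    """Keep, per second client, the last submission made before the countdown ends."""
    final = {}
    for client_id, submitted_at, code in sorted(submissions, key=lambda s: s[1]):
        if submitted_at < deadline:
            final[client_id] = code  # later on-time submissions overwrite earlier ones
    return final
```

Submissions arriving after the deadline are simply ignored, so a client's final model is always one produced from code submitted in time.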

After the countdown ends, the back end evaluates the final model of each second client on the validation set, one by one. A model passes the evaluation when: 1. its validation-set accuracy is greater than a threshold (e.g., 99%); and 2. its accuracy meets the requirement specified by the first client.
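The two pass conditions and the choice of the best model can be sketched as below, assuming the back end has already computed each client's validation accuracy. The function `select_target_models`, and the reading of "meets the requirement" as "strictly exceeds it", are assumptions for illustration.

```python
def select_target_models(val_accuracy, threshold=0.99, required=None):
    """Return the models passing both checks and the best one among them.

    val_accuracy maps a second-client id to its final model's validation accuracy.
    required is the first client's accuracy requirement (None if unspecified).
    """
    passed = {
        cid: acc
        for cid, acc in val_accuracy.items()
        if acc > threshold and (required is None or acc > required)
    }
    best = max(passed, key=passed.get) if passed else None
    return passed, best
```

Returning the full `passed` mapping as well as `best` keeps both outcomes available: every qualifying model, and the single model the back end would deliver.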

The second clients cannot see one another's models. Each second client can see the real-time test-set accuracy ranking and the accuracy of its own model on the test set.

To prevent differential attacks, a random 95% of the test set is drawn for each test. The final performance test of the target model is performed on the validation set.
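A differential attack here means submitting nearly identical models and comparing their leaderboard scores to infer individual test labels. Scoring each submission on a freshly drawn 95% subset adds noise that blunts such comparisons. The sketch below assumes test samples are `(x, y)` pairs and that `predict` is a callable model; `leaderboard_score` is a hypothetical name.

```python
import random


def leaderboard_score(predict, test_set, fraction=0.95, rng=None):
    """Accuracy on a freshly drawn random subset (95% by default) of the test set."""
    rng = rng or random.Random()
    k = max(1, round(fraction * len(test_set)))
    subset = rng.sample(test_set, k)  # a different subset on every call
    return sum(predict(x) == y for x, y in subset) / k
```

Because each call draws a new subset, two submissions that differ on a single test sample may still receive scores that differ for unrelated reasons, which is the intended effect. The final acceptance test uses the separate validation set.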

If the target model passes the performance evaluation, all back-end data other than the target model is destroyed, and the target model is taken out of the back end and delivered to the first client.

In the one-to-many process, multiple second clients compete with one another, and the back end automatically determines the best model among those generated by the n second clients for delivery and use.

In step S8, the second clients can only view the model's training parameters on the second display interface of the back end and cannot enter the back end to modify the second code. The back end therefore ensures that the second clients never have access to the non-public sample set throughout the process, so that the non-public sample set cannot leak and cannot be precisely queried.

In steps S5 to S8, the first client divides the data set in a trusted environment into a public sample set and a non-public sample set, transmits the public sample set to the front end of the secure enclave, and transmits the non-public sample set to its back end. The front end is where the second clients input and debug code, and the back end is where the model is built and its training parameters are displayed. This ensures that the second clients cannot access the non-public sample set at any point in the process and have no permission to enter the back end.

Fig. 2 is a schematic block diagram of a data processing device based on a security house according to an embodiment of the invention.

The secure-enclave-based data processing apparatus 100 of the present invention may be installed in an electronic device. Depending on the functions implemented, the apparatus 100 may include a request module 110, a dividing module 120, a packaging module 130, and a debugging module 140. A module in the present invention, which may also be called a unit, is a series of computer program segments that are stored in the memory of the electronic device and can be executed by its processor to perform a fixed function.

In the present embodiment, the functions concerning the respective modules/units are as follows:

a request module 110, configured to respond to a data processing request of a first client, the data processing request including a data set to be processed, by constructing a secure enclave in a preset trusted environment;

a dividing module 120, configured to divide the data set into a public sample set and a non-public sample set, transmit the public sample set to a front end of the security house, and transmit the non-public sample set to a rear end of the security house;

the packaging module 130, configured to receive, at the front end, first code input by the second client to debug the public sample set, and to package the second code obtained after debugging, together with the training environment, into a virtual image file;

and the debugging module 140, configured to mount the virtual image file and the non-public sample set by using the back end, generate training parameters of a preset initial model, enable the second client to execute the second code to debug the initial model based on the training parameters so as to obtain a target model, and return the target model to the first client.

Obtaining the training parameters of the initial model;

Fig. 3 is a schematic structural diagram of an electronic device for implementing a data processing method based on a secure enclave according to an embodiment of the present invention.

In this embodiment, the electronic device 1 includes, but is not limited to, a memory 11, a processor 12, and a network interface 13, which are communicatively connected to one another via a system bus. The memory 11 stores a secure-enclave-based data processing program 10 that can be executed by the processor 12. Fig. 3 shows only the electronic device 1 with components 11 to 13 and the data processing program 10; those skilled in the art will understand that the structure shown in Fig. 3 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or use a different arrangement of components.

The memory 11 includes internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the electronic device 1, for example the code of the secure-enclave-based data processing program 10 in one embodiment of the present invention. The memory 11 may also be used to temporarily store various types of data that have been output or are to be output.

Processor 12 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 12 is typically used to control the overall operation of the electronic device 1, such as performing control and processing related to data interaction or communication with other devices, etc. In this embodiment, the processor 12 is configured to execute the program code or process data stored in the memory 11, for example, execute the data processing program 10 based on the security house.

The network interface 13 may comprise a wireless network interface or a wired network interface, the network interface 13 being used for establishing a communication connection between the electronic device 1 and a terminal (not shown).

Optionally, the electronic device 1 may further include a user interface, which may include a display and an input unit such as a keyboard, and optionally a standard wired interface or wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch display, or the like. The display, which may also be called a display screen or display unit, is used to display information processed in the electronic device 1 and to present a visual user interface.

It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited by this configuration.

The secure house-based data processing program 10 stored in the memory 11 of the electronic device 1 is a combination of instructions which, when run in the processor 12, can implement:

For the specific way in which the processor 12 implements the secure-enclave-based data processing program 10, refer to the description of the relevant steps in the embodiment corresponding to Fig. 1; details are not repeated here.

Further, if the modules/units integrated in the electronic device 1 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable medium may be volatile or non-volatile. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, or read-only memory (ROM).

The computer-readable storage medium stores the secure-enclave-based data processing program 10, which can be executed by one or more processors. The specific implementation of the computer-readable storage medium is substantially the same as the embodiments of the secure-enclave-based data processing method described above and is not repeated here.

In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Multiple units or means recited in the system claims may also be implemented by a single unit or means in software or hardware. Terms such as "first" and "second" are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. A secure enclave-based data processing method, the method comprising:

2. The secure enclave-based data processing method of claim 1, wherein the secure enclave comprises a front end for receiving the data set uploaded by the first client and for code debugging by the second client, a back end for displaying the training parameters of the initial model, and a virtual machine cloud desktop.

3. The secure enclave-based data processing method of claim 1, wherein receiving, by using the front end, the first code input by the second client to debug the public sample set, and packaging the second code obtained after debugging and the first training environment into a virtual image file includes:

4. The secure enclave-based data processing method of claim 1, wherein mounting the virtual image file and the non-public sample set by using the back end to generate the training parameters of a preset initial model includes:

5. The secure enclave-based data processing method of claim 1, wherein training the initial model by using the non-public sample set to obtain the training parameters of the initial model comprises:

6. The secure enclave-based data processing method of claim 1, wherein debugging the initial model based on the training parameters, so that the second client executes the second code to obtain a target model, comprises:

obtaining the training parameters of the initial model;

7. The secure enclave-based data processing method of claim 1, wherein before returning the target model to the first client, the method further comprises:

8. A secure enclave-based data processing apparatus, the apparatus comprising:

9. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a secure enclave-based data processing program executable by the at least one processor to enable the at least one processor to perform the secure enclave-based data processing method of any one of claims 1 to 7.

10. A computer-readable storage medium storing a secure enclave-based data processing program executable by one or more processors to implement the secure enclave-based data processing method of any one of claims 1 to 7.