CN114584306B - Data processing method and related device - Google Patents

Publication number
CN114584306B
Authority
CN
China
Prior art keywords
key
data
result
task
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210481239.XA
Other languages
Chinese (zh)
Other versions
CN114584306A (en)
Inventor
何林书
陈瑞钦
刘舒
黎相敏
张韬
张博
程勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210481239.XA
Publication of CN114584306A
Application granted
Publication of CN114584306B
Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08: Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861: Generation of secret information including derivation or calculation of cryptographic keys or passwords

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Storage Device Security (AREA)

Abstract

A data processing method and related apparatus, performed by a computer device having a distributed computing framework that includes a key management service module, a scheduler, and an executor, all running in a trusted execution environment. After a computing task request is received, the key management service module receives a first key request sent by the scheduler and a second key request sent by the executor, each of which includes an initial password. In response to the first key request and the second key request respectively, the key management service module performs a message authentication code generation operation on the sealing key it has generated and the initial password to obtain a security key, and returns the security key to the scheduler and the executor so that they establish an RPC connection according to the security key. Based on the RPC connection, the executor executes the target computing task to obtain task result data. The method ensures session security, thereby protecting the data to be processed and reducing potential security risks.

Description

Data processing method and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and a related apparatus.
Background
Distributed computing is a branch of computer science that studies how to divide a problem requiring enormous computing power into many small parts, distribute those parts to multiple computing nodes for processing, and finally integrate the partial results into a final result. A distributed computing framework is a framework that implements distributed computing.
In a distributed computing framework, a scheduler (driver) is responsible for distributing computing tasks to the executors, and the executors are responsible for executing the specific computing tasks. Communication between the scheduler and the executors is implemented through Remote Procedure Call (RPC). Current RPC security measures share a user-supplied password (secret) between the scheduler and the executor, use the shared secret to establish an RPC session key between them, and then use the negotiated RPC session key to protect the session.
However, these measures carry a security risk: they rest on the premise that the host is trusted, and therefore assume that the secret is safe and cannot be stolen during disk storage, network transmission, and memory use, that the RPC session key negotiated from the secret is safe, and hence that the session is safe. As the security situation grows more severe, the assumption of a trusted host no longer holds. Once the host is compromised, the secret is exposed at every stage, including disk storage, network transmission, and memory use; the security of the RPC session is then threatened, and the user data transmitted during execution of the computing task is no longer protected.
Disclosure of Invention
To solve the above technical problem, the present application provides a data processing method and a related apparatus. Even if the host is compromised, an RPC connection established based on a security key ensures session security, thereby protecting the data to be processed used by the target computing task and greatly reducing potential security risks.
The embodiments of the present application disclose the following technical solutions:
In one aspect, embodiments of the present application provide a data processing method, where the method is performed by a computer device having a distributed computing framework, where the distributed computing framework includes a key management service module, a scheduler, and an executor, where the key management service module, the scheduler, and the executor run in a trusted execution environment, and the method includes:
after receiving a computing task request sent by a data consumer device, receiving, through the key management service module, a first key request sent by the scheduler, and receiving, through the key management service module, a second key request sent by the executor, wherein the first key request and the second key request each include an initial password;
responding to the first key request through the key management service module by performing a message authentication code generation operation on a sealing key generated by the key management service module and the initial password to obtain a security key, and responding to the second key request through the key management service module by performing the message authentication code generation operation on the sealing key and the initial password to obtain the security key;
returning the security key to the scheduler and the executor through the key management service module, so that the scheduler and the executor establish a Remote Procedure Call (RPC) connection according to the security key;
and, based on the RPC connection, executing a target computing task for the computing task request through the executor to obtain task result data.
In one aspect, an embodiment of the present application provides a data processing apparatus deployed on a computer device having a distributed computing framework, where the distributed computing framework includes a key management service module, a scheduler, and an executor, where the key management service module, the scheduler, and the executor run in a trusted execution environment, and the apparatus includes a receiving unit, a generating unit, a returning unit, and an executing unit:
the receiving unit is configured to, after a computing task request sent by a data consumer device is received, receive through the key management service module a first key request sent by the scheduler, and receive through the key management service module a second key request sent by the executor, where the first key request and the second key request each include an initial password;
the generating unit is configured to respond to the first key request through the key management service module by performing a message authentication code generation operation on a sealing key generated by the key management service module and the initial password to obtain a security key, and to respond to the second key request through the key management service module by performing the message authentication code generation operation on the sealing key and the initial password to obtain the security key;
the returning unit is configured to return the security key to the scheduler and the executor through the key management service module, so that the scheduler and the executor establish a Remote Procedure Call (RPC) connection according to the security key;
and the executing unit is configured to execute, based on the RPC connection, a target computing task for the computing task request through the executor to obtain task result data.
In one aspect, an embodiment of the present application provides a data processing method, which is performed by a computer device having a distributed computing framework, where the distributed computing framework is deployed in a trusted execution environment, and the method includes:
receiving a computing task request sent by a data consumer device, wherein the computing task request includes a storage path of encrypted data to be processed;
in response to the computing task request, acquiring a data encryption key in the trusted execution environment, and acquiring the encrypted data to be processed from a storage system according to the storage path;
decrypting the encrypted data to be processed according to the data encryption key to obtain the data to be processed;
and executing a target computing task for the computing task request according to the data to be processed to obtain task result data.
In one aspect, an embodiment of the present application provides a data processing apparatus, where the apparatus is deployed in a computer device having a distributed computing framework, and the distributed computing framework is deployed in a trusted execution environment, and the apparatus includes a receiving unit, an obtaining unit, a decryption unit, and an execution unit:
the receiving unit is configured to receive a computing task request sent by a data consumer device, wherein the computing task request includes a storage path of encrypted data to be processed;
the obtaining unit is configured to, in response to the computing task request, obtain a data encryption key in the trusted execution environment, and obtain the encrypted data to be processed from a storage system according to the storage path;
the decryption unit is configured to decrypt the encrypted data to be processed according to the data encryption key to obtain the data to be processed;
and the execution unit is configured to execute a target computing task for the computing task request according to the data to be processed to obtain task result data.
In one aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of the preceding aspects in accordance with instructions in the program code.
In one aspect, the present application provides a computer-readable storage medium for storing program code for executing the method of any one of the preceding aspects.
In one aspect, the present application provides a computer program product comprising a computer program that, when executed by a processor, implements the method of any one of the preceding aspects.
As can be seen from the above technical solutions, the method provided by the present application is performed by a computer device having a distributed computing framework that includes a key management service module, a scheduler, and an executor, and the distributed computing framework is combined with a trusted execution environment so that the key management service module, the scheduler, and the executor all run in the trusted execution environment. After a computing task request is received, the key management service module receives a first key request sent by the scheduler and a second key request sent by the executor, each including an initial password. To improve the security of the RPC connection to be established, the key management service module, in response to the first key request, performs a message authentication code generation operation on the sealing key it has generated and the initial password to obtain a security key, and, in response to the second key request, performs the same operation on the sealing key and the initial password to obtain the security key. Because the sealing key is generated within the trusted execution environment and the security key is derived within the trusted execution environment, the security key cannot be obtained from outside, which achieves the purpose of converting the insecure initial password into a security key. The key management service module then returns the security key to the scheduler and the executor so that they establish an RPC connection according to the security key, and, based on the RPC connection, the executor executes the target computing task for the computing task request to obtain task result data.
Because the security key is produced in the trusted execution environment and cannot be obtained from outside, and the target computing task is also executed in the trusted execution environment, the RPC connection established based on the security key ensures session security even if the host is compromised, thereby protecting the data to be processed used by the target computing task and greatly reducing potential security risks.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a data processing method provided in the related art;
Fig. 2 is an application scenario architecture diagram of a data processing method according to an embodiment of the present application;
Fig. 3 is a flowchart of a data processing method according to an embodiment of the present application;
Fig. 4 is a block diagram of a data processing method according to an embodiment of the present application;
Fig. 5 is a flowchart of a key request process provided in an embodiment of the present application;
Fig. 6 is a flowchart of another data processing method provided by the embodiments of the present application;
Fig. 7a is a signaling interaction diagram of a data processing method according to an embodiment of the present application;
Fig. 7b is a block diagram of another data processing method according to an embodiment of the present application;
Fig. 8 is a block diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 9 is a block diagram of another data processing apparatus according to an embodiment of the present application;
Fig. 10 is a block diagram of a terminal according to an embodiment of the present application;
Fig. 11 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
As shown in Fig. 1, the RPC security measure in current distributed computing frameworks shares a user-supplied password (secret) between a scheduler (Driver) and an executor (Executor); the Driver and the Executor use the shared secret to establish an RPC session key between themselves, and then use the negotiated session key to protect the session. Fig. 1 includes two executors, namely executor 1 and executor 2.
The security threat model of the distributed computing framework is a trusted host and an untrusted network. The secret is assumed to be safe and impossible to steal during disk storage, network transmission, and memory use; the session key negotiated from the secret is therefore assumed to be safe, and hence so is the session. As the security situation grows more severe, the assumption of a trusted host no longer holds. Once the host is compromised, the secret is exposed at every stage, including disk storage, network transmission, and memory use; the security of the RPC session in the distributed computing framework is then threatened, and the data transmitted during RPC-based computation is no longer protected.
To solve the above technical problems, embodiments of the present application provide a data processing method in which a distributed computing framework is combined with a Trusted Execution Environment (TEE), so that the key management service module, the scheduler, and the executor, which are the components of the distributed computing framework that can come into contact with data, all run in the TEE. The insecure initial password is thereby converted into a security key, so that an RPC connection established based on the security key ensures session security even if the host is compromised, protecting the data to be processed used by the target computing task and greatly reducing potential security risks.
In the embodiments of the present application, trusted computing is a hardware-based technology that protects the security of the computing process and guarantees the privacy and integrity of data. It has two characteristics: memory isolation and memory encryption against the outside, meaning that the outside, including the operating system, has no access to the memory space inside the trusted computing domain; and a remote attestation mechanism that remotely proves the authenticity of the trusted computing environment and the integrity of the logic running in it. The effect of trusted computing is that data and program logic in the trusted computing domain cannot be tampered with from outside and cannot be obtained by the external environment unless actively output. The environment in which trusted computing is performed may be referred to as a trusted execution environment (TEE), and an application implemented on the TEE may be referred to as a TEE application. Trusted computing may adopt, for example, the Intel Software Guard Extensions (Intel SGX) scheme or the ARM TrustZone scheme.
The distributed computing framework may be of any kind, for example a big data distributed computing framework such as Hadoop, Storm, Samza, Spark, or Flink. In the embodiments of the present application, any such distributed computing framework can be freely combined with any trusted computing scheme to achieve distributed computing based on trusted computing, meeting the data privacy and compliance requirements of big data computing platforms built on different architectures.
Fig. 2 shows an application scenario architecture diagram of a data processing method. The application scenario may include a data consumer device 201 and a computer device 202 having a distributed computing framework that includes a key management service module 2021, a scheduler 2022, and an executor 2023. The distributed computing framework is combined with a trusted execution environment so that the key management service module, the scheduler, and the executor run in the trusted execution environment. A distributed computing framework running in a TEE is denoted "x on TEE", where x is the name of the distributed computing framework; for example, if the distributed computing framework is Spark, it is denoted "Spark on TEE". The embodiments of the present application are mainly described using Spark as the example distributed computing framework.
The data consumer device 201 may be a terminal and the computer device 202 with the distributed computing framework may be a server or a terminal. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, a smart television, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
The data consumer device 201 may send a computing task request to the computer device 202 having the distributed computing framework. After receiving the computing task request, the computer device 202 receives, through the key management service module 2021, a first key request including an initial password sent by the scheduler 2022 and a second key request including the initial password sent by the executor 2023. The first key request and the second key request are used to request a security key from the key management service module 2021 in order to establish a secure RPC connection, so that the target computing task for the computing task request can subsequently be executed over the RPC connection and the data involved in executing the target computing task (for example, the data to be processed) is kept secure.
The initial password (secret) is a pre-configured shared password used to establish the RPC connection. It is not secure, because it is shared and can be accessed from outside. Therefore, to improve the security of the established RPC connection, the key management service module 2021 may, in response to the first key request, perform a message authentication code generation operation on the sealing key (seal_key) generated by the key management service module 2021 and the initial password to obtain a security key (sec_secret), and, in response to the second key request, perform the message authentication code generation operation on the sealing key and the initial password to obtain the security key (sec_secret). The message authentication code generation operation converts the insecure secret into the secure sec_secret, and may be a Hash-based Message Authentication Code (HMAC). HMAC is a method of constructing a message authentication code from a one-way hash function; the H in HMAC stands for hash.
Because seal_key is generated by the trusted execution environment and sec_secret is derived inside the trusted execution environment, the security key cannot be obtained from outside, which achieves the purpose of converting the insecure initial password into a security key.
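The derivation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text names HMAC but not the underlying hash function, so SHA-256 is assumed here; `derive_security_key` is a hypothetical helper name; and `os.urandom` merely stands in for a sealing key that, in the described scheme, would be generated inside the TEE.

```python
import hashlib
import hmac
import os


def derive_security_key(seal_key: bytes, initial_password: bytes) -> bytes:
    """Compute sec_secret = HMAC(seal_key, secret).

    SHA-256 is an assumption; the source only says "HMAC".
    """
    return hmac.new(seal_key, initial_password, hashlib.sha256).digest()


# Stand-in for the sealing key (assumption: in practice it never leaves the TEE).
seal_key = os.urandom(32)
secret = b"shared-initial-password"  # the insecure, pre-configured initial password

# The key server runs the same derivation for the first and second key requests,
# so the scheduler and the executor receive identical security keys.
driver_key = derive_security_key(seal_key, secret)
executor_key = derive_security_key(seal_key, secret)
print(hmac.compare_digest(driver_key, executor_key))
```

Because the sealing key is the HMAC key, knowing the initial password alone is not enough to recompute sec_secret.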
The key management service module 2021 then returns the security key to the scheduler 2022 and the executor 2023, so that the scheduler 2022 and the executor 2023 establish an RPC connection according to the security key, and further execute the target computing task for the computing task request by the executor 2023 based on the RPC connection to obtain task result data.
Because the security key is produced in the TEE and cannot be obtained from outside, and the target computing task is also executed in the trusted execution environment, the RPC connection established based on the security key ensures session security even if the host is compromised, thereby protecting the data to be processed used by the target computing task and greatly reducing potential security risks.
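The text does not spell out how the scheduler and the executor use the security key to establish the RPC connection. One common pattern, shown here purely as an assumption, is an HMAC challenge-response followed by derivation of a per-session key. All function names, the 16-byte challenge, and the `b"rpc-session"` label are hypothetical.

```python
import hashlib
import hmac
import os


def make_challenge() -> bytes:
    """Scheduler side: produce a fresh random challenge."""
    return os.urandom(16)


def answer_challenge(sec_secret: bytes, challenge: bytes) -> bytes:
    """Executor side: prove knowledge of sec_secret without sending it."""
    return hmac.new(sec_secret, challenge, hashlib.sha256).digest()


def verify_answer(sec_secret: bytes, challenge: bytes, answer: bytes) -> bool:
    """Scheduler side: recompute the expected answer and compare in constant time."""
    expected = answer_challenge(sec_secret, challenge)
    return hmac.compare_digest(expected, answer)


def derive_session_key(sec_secret: bytes, challenge: bytes) -> bytes:
    """Both sides: derive a per-session key bound to this handshake (hypothetical label)."""
    return hmac.new(sec_secret, b"rpc-session" + challenge, hashlib.sha256).digest()


# Both peers hold the same sec_secret returned by the key management service module.
sec_secret = os.urandom(32)
challenge = make_challenge()
answer = answer_challenge(sec_secret, challenge)
print(verify_answer(sec_secret, challenge, answer))
```

A peer that obtained a different sec_secret, for example because it was not served by the same key management service module, would fail this check and could not join the session.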
It should be noted that the embodiments of the present application may be applied to various distributed computing scenarios, and mainly relate to the field of cloud technologies, for example, cloud computing, cloud security, and the like in the field of cloud technologies.
Cloud computing, in the narrow sense, refers to a delivery and usage model of IT infrastructure in which required resources are obtained over a network in an on-demand, easily scalable manner; in the broad sense, it refers to a delivery and usage model of services in which required services are obtained over a network in an on-demand, easily scalable manner. Such services may be IT and software services, internet-related services, or other services. Cloud computing is a product of the development and fusion of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing.
Cloud computing has developed rapidly with the diversification of the internet, real-time data streams, and connected devices, and with growing demand for search services, social networks, mobile commerce, open collaboration, and the like. Unlike earlier parallel distributed computing, the emergence of cloud computing is expected to drive conceptually revolutionary change in the whole internet model and in enterprise management models.
Cloud security is the general term for the security software, hardware, users, organizations, and secure cloud platforms that are based on cloud computing business model applications. Cloud security integrates emerging technologies and concepts such as parallel processing, grid computing, and behavior-based detection of unknown viruses. A large number of networked clients monitor software behavior in the network for anomalies, obtain the latest information on trojans and malicious programs on the internet, and send it to the server for automatic analysis and processing; the resulting virus and trojan solutions are then distributed to every client.
The main research directions of cloud security include: 1. cloud computing security, which studies how to secure the cloud itself and the applications on it, including the security of cloud computer systems, secure storage and isolation of user data, user access authentication, information transmission security, network attack protection, and compliance auditing; 2. cloudification of security infrastructure, which studies how to use cloud computing to build and integrate security infrastructure resources and optimize security protection mechanisms, including building very-large-scale platforms for collecting and processing security events and information through cloud computing technology, so as to collect and correlate massive amounts of information and improve network-wide security event handling and risk control; 3. cloud security services, which studies the various security services provided to users on a cloud computing platform, such as anti-virus services.
Next, a data processing method provided by an embodiment of the present application is described in detail with reference to the accompanying drawings, taking Spark as the example distributed computing framework. Fig. 3 shows a flowchart of a data processing method, which includes:
S301: After receiving a computing task request sent by a data consumer device, receive through the key management service module a first key request sent by the scheduler, and receive through the key management service module a second key request sent by the executor.
It should be noted that the method provided in the embodiments of the present application is performed by a computer device having a distributed computing framework, where the distributed computing framework includes a key management service module (Key Server), a scheduler (Driver), and an executor (Executor), and the Key Server, the Driver, and the Executor run in a Trusted Execution Environment (TEE), as shown in Fig. 4.
When a data consumer needs to use the data to be processed of a certain data provider to perform a computing task through the distributed computing framework, the data consumer may initiate a computing task request through the data consumer device (for example, as shown in step 5 in Fig. 4). For example, suppose the data provider is an organization that manages marriage registration and the data consumer is an insurance organization. When a user applies to the insurance organization for insurance, the insurance organization may need to know the user's marital status, which can be provided by the organization that manages marriage registration. In this case, the insurance organization acts as the data consumer, and the organization managing marriage registration acts as the data provider.
It should be noted that, in the embodiments of the present application, the acquisition of a user's relevant data (for example, the data to be processed) is authorized by that user.
The computing task request may be submitted to the computer device having the distributed computing framework through a submission module. When the distributed computing framework is Spark, the submission module may be denoted the Spark-TEE-submit module; it is a secondary development of the Spark component spark-submit, and the function it adds is parameter encapsulation. The computing task request may be received by the management module (Master) in Spark.
It should be noted that, before the data consumer device sends the computing task request, the data consumer device may first request authorization from the data provider device (for example, as shown in step 3 in Fig. 4), in order to ensure that the data consumer device and the scheduler are legitimate devices authorized by the data provider. Specifically, the data consumer device obtains the measurement information M of the Driver (M can be made public) and constructs its data description information (P), where P may consist of information such as data consumer information, a storage path (path), a third public key (res_pubkey), and a first hash value of the task model's data package. The data consumer device concatenates M and P to construct authorization data (M | P) and sends an authorization request to the data provider device. After receiving the authorization request, the data provider device may sign M | P with a second private key (y) to obtain a fourth signature (sig4), and return {sig4, encode_dkey} to the data consumer device, thereby completing the data provider device's authorization of the data consumer device and the Driver.
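The construction of the authorization data can be illustrated as follows. This is a minimal sketch with several assumptions that are not in the source: P is serialized as JSON, the first hash value is computed with SHA-256, M is length-prefixed so that the boundary in M | P is unambiguous, and all field names, function names, and sample values are hypothetical. The signing step with the second private key (y) is omitted because it requires an asymmetric signature library outside the Python standard library.

```python
import hashlib
import json


def build_description(consumer_info: str, storage_path: str,
                      res_pubkey: str, model_package: bytes) -> bytes:
    """Build the data description information P (field names are assumptions)."""
    p = {
        "data_consumer": consumer_info,
        "path": storage_path,
        "res_pubkey": res_pubkey,
        # First hash value of the task model's data package (SHA-256 assumed).
        "model_hash": hashlib.sha256(model_package).hexdigest(),
    }
    # sort_keys makes the encoding deterministic, so a signature over it is stable.
    return json.dumps(p, sort_keys=True).encode("utf-8")


def build_authorization_data(m: bytes, p: bytes) -> bytes:
    """Concatenate M and P; the 4-byte length prefix for M is an assumption."""
    return len(m).to_bytes(4, "big") + m + p


# Hypothetical sample values for illustration only.
measurement = b"driver-measurement-M"
description = build_description("insurer-A", "/data/enc/records.bin",
                                "RES_PUBKEY_PLACEHOLDER", b"task-model-package")
authorization_data = build_authorization_data(measurement, description)
print(len(authorization_data))
```

The data provider device would then sign `authorization_data` to produce sig4, binding its consent to one specific Driver measurement and one specific task description.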
Upon receiving the compute task request, the scheduler is started by the management module (e.g., as shown in step 6 of FIG. 4) and the executor is pulled up to establish the RPC connection. The key management service module receives a first key request sent by the scheduler, and the key management service module receives a second key request sent by the executor, wherein the first key request and the second key request comprise an initial password, that is, the scheduler initiates the first key request by taking the initial password as a parameter, the executor initiates the second key request by taking the initial password as a parameter, and the first key request and the second key request are used for requesting a security key. It should be noted that, in one possible case, the scheduler and the executor respectively have corresponding Guard modules (Guard), and the scheduler and the executor respectively may request the security key from the key management service module by the corresponding Guard agents (for example, as shown in step 7 in fig. 4).
It should be noted that, in order to ensure that the scheduler, the executor and the key management service module are all trusted and run in the TEE, the scheduler and the executor may each perform remote attestation verification with the key management service module and, after the verification passes, request the security key from the key management service module.
Based on this, in one possible implementation, the scheduler may initiate remote attestation verification to the key management service module to obtain a first verification result, and the executor may initiate remote attestation verification to the key management service module to obtain a second verification result. If both the first verification result and the second verification result indicate that the verification passed, the steps of receiving the first key request sent by the scheduler and receiving the second key request sent by the executor are executed through the key management service module, so that a consistent security key can subsequently be obtained. If at least one of the first verification result and the second verification result indicates that the remote attestation verification failed, a consistent sec_secret cannot be obtained, the subsequent RPC authentication cannot pass, and therefore the RPC connection cannot be established.
Remote attestation is a method by which a trusted execution environment proves to a third party (e.g., the scheduler or the executor) the legitimacy of the hardware environment in which it runs. Generally, after the third party initiates a challenge, the trusted execution environment assembles an information set that includes a hash measurement of its own code logic, signs the information set, and returns the signed information set to the third party for identity verification. If the verification succeeds, the remote attestation is complete.
Having the scheduler and the executor each perform remote attestation verification with the key management service module ensures that the scheduler, the executor and the key management service module are all trusted and run in the TEE, which in turn ensures the security of the subsequent RPC session and of the data transmitted during the RPC computation.
S302, responding to the first key request through the key management service module, performing message authentication code generation operation according to a sealing key and the initial password generated by the key management service module to obtain a security key, and responding to the second key request through the key management service module, and performing message authentication code generation operation according to the sealing key and the initial password to obtain the security key.
And after receiving the first key request, the key management service module performs message authentication code generation operation according to the sealing key and the initial password generated by the key management service module to obtain the security key. And after receiving the second key request, the key management service module performs message authentication code generation operation according to the sealing key and the initial password generated by the key management service module to obtain the security key.
The message authentication code generation operation may be an HMAC operation, which takes the sealing key (seal_key) and the initial password (secret) as parameters and uses seal_key to convert secret into sec_secret. Since seal_key is a key generated in the TEE, that is, a key stored locally in the central processing unit (CPU) that cannot be obtained from outside, the generated sec_secret is secure.
In this way, the insecure secret is converted into a secure sec_secret that can only be obtained inside the Executor, the Driver and the Key Server. The conversion is performed in the TEE and the value of sec_secret cannot be obtained from outside, which guarantees the security of the subsequent RPC session.
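The conversion of secret into sec_secret can be sketched as follows. This is a minimal illustration, assuming HMAC-SHA256 (the text only says "an HMAC operation") and using a random byte string as a stand-in for seal_key, which in a real deployment never leaves the TEE. The function name `derive_sec_secret` and the sample password are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_sec_secret(seal_key: bytes, secret: bytes) -> bytes:
    """Return HMAC(seal_key, secret). SHA-256 is an assumption, not stated in the text."""
    return hmac.new(seal_key, secret, hashlib.sha256).digest()


# Hypothetical stand-in: a real seal_key is generated inside the TEE and is not exportable.
seal_key = secrets.token_bytes(32)
secret = b"initial-rpc-password"

# The key management service module answers both key requests with the same derivation,
# so the scheduler and the executor end up holding an identical security key.
scheduler_key = derive_sec_secret(seal_key, secret)
executor_key = derive_sec_secret(seal_key, secret)
```

Because the derivation is deterministic, both requesters receive the same value without the key management service module having to store it, while anyone who knows only secret cannot reproduce it.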
S303, returning the security key to the scheduler and the executor through the key management service module so that the scheduler and the executor establish a Remote Procedure Call (RPC) connection according to the security key.
After the key management service module generates the security key, it can return the security key to the scheduler and the executor. After the scheduler and the executor receive the security key, they can establish the Remote Procedure Call (RPC) connection according to the security key.
Because the Driver and the Executor call the generateKey function when establishing the RPC connection, and the generateKey function uses the originally configured secret, once the Driver and the Executor receive sec_secret, the original secret in the generateKey function can be replaced by sec_secret and the original RPC connection establishment process then proceeds unchanged. For a distributed computing framework such as Spark, RPC authentication and encryption must be enabled in the configuration for the RPC encryption function to take effect.
S304, executing, by the executor and based on the RPC connection, a target computation task for the computation task request to obtain task result data.
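Why both ends must hold the same sec_secret can be illustrated with a generic challenge-response check. The sketch below is not Spark's actual RPC handshake and does not reproduce generateKey; it is an assumed, simplified stand-in showing that authentication succeeds only when the two parties share an identical key. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """One side (e.g. the Driver) issues a fresh random challenge."""
    return secrets.token_bytes(16)


def answer_challenge(shared_key: bytes, challenge: bytes) -> bytes:
    """The other side proves knowledge of the shared key without sending it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify_answer(shared_key: bytes, challenge: bytes, answer: bytes) -> bool:
    """Recompute the expected answer and compare in constant time."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, answer)
```

If one party failed remote attestation and therefore never received sec_secret, its answer is computed with a different key and `verify_answer` returns False, which corresponds to the RPC connection not being established.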
After the RPC connection is established, RPC conversation can be carried out between the scheduler and the executor based on the RPC connection, so that the executor executes a target calculation task aiming at the calculation task request to obtain task result data.
It should be noted that executing the target computation task requires the to-be-processed data of the data side, which is usually stored in a storage system and therefore has to be fetched from the storage system when the task is executed. In this case, the computation task request may include the storage path of the to-be-processed data, so that the executor obtains the to-be-processed data from the storage system according to the storage path and executes the target computation task on it.
It is understood that, to ensure the security of the to-be-processed data, the data may be stored in encrypted form: the to-be-processed data is encrypted with the data encryption key to obtain the to-be-processed encrypted data (for example, as shown in step 2 in fig. 4), and the to-be-processed encrypted data is stored in the storage system (for example, as shown in step 4 in fig. 4). In this case, the computation task request may include the storage path of the to-be-processed encrypted data, and S304 may be implemented as follows. Based on the RPC connection, the executor obtains the data encryption key (Dkey) and the storage path (path) from the scheduler, the data encryption key having been obtained based on the key management service module. The executor then obtains the to-be-processed encrypted data according to the storage path (for example, as shown in step 8 in fig. 4), decrypts it with the data encryption key to obtain the to-be-processed data, and executes the target computation task on the to-be-processed data to obtain the task result data (for example, as shown in step 9 in fig. 4).
In one possible implementation, the to-be-processed encrypted data may be transferred by reusing the data upload tool of the Hadoop distributed storage system or of a pulse streaming system.
It should be noted that there may be multiple data sides. To ensure that the obtained to-be-processed data belongs to a specific data side and that the obtained Dkey can decrypt that data side's encrypted data, the computation task request may further include identification information of the data side, such as a number or a symbol. Assuming the identification information is represented by the numbers 1, 2, 3, and so on, the storage path may be written path[i] and the data encryption key Dkey[i], where i is the identification information of the data side, for example the subscript of one of the multiple data sides. The index i has the same meaning wherever it appears below.
Because the to-be-processed data is stored in encrypted form, it remains encrypted both in storage and while being transmitted to the executor, which ensures its security.
In the embodiment of the application, the data encryption key may be generated in different ways, and the way the executor obtains the data encryption key from the scheduler based on the RPC connection differs accordingly. In one possible implementation, the data encryption key (Dkey) may be a first private key (x) generated by the key management service module; the to-be-processed encrypted data is obtained by encrypting the to-be-processed data with a first public key (X), and the first private key (x) and the first public key (X) form a first key pair (x, X). In this case, the executor may obtain the data encryption key from the scheduler as follows: the scheduler obtains the first private key from the key management service module, and the scheduler then sends the first private key to the executor based on the RPC connection. The scheduler may request the first private key from the key management service module through its guard module.
In this method, the to-be-processed data is encrypted with the first public key of the first key pair to obtain the to-be-processed encrypted data, and the to-be-processed encrypted data is decrypted with the first private key. That is, the to-be-processed encrypted data is produced with an asymmetric encryption algorithm (such as SM2, a Chinese national-standard asymmetric algorithm). Because the first private key is kept in the TEE and cannot be obtained from outside, the to-be-processed data is computed on in an "available but invisible" form, which improves its security.
In another possible implementation, in order to improve encryption efficiency while still protecting the to-be-processed data, the data encryption key (Dkey) may be a key for a symmetric encryption algorithm that is generated by the data side device corresponding to the data side. For example, the data side device generates a random number with a random number generation algorithm and uses it as the data encryption key, then encrypts the to-be-processed data with the data encryption key and the symmetric encryption algorithm to generate the to-be-processed encrypted data. The data side device then encrypts the data encryption key with the first public key (X) to obtain the ciphertext of the data encryption key (encode_dkey). The to-be-processed data is thus protected by Dkey, and Dkey is protected by the first private key (x) corresponding to the first public key (X). Because the first private key (x) resides in the key management service module, only the key management service module (Key Server) or an application trusted by it can decrypt the to-be-processed encrypted data.
In this case, the computation task request may include the ciphertext of the data encryption key, which is obtained by encrypting the data encryption key with the first public key. Based on the RPC connection, the executor may obtain the data encryption key from the scheduler as follows: the scheduler (Driver) obtains the first private key (x) from the key management service module (Key Server), where the first private key (x) and the first public key (X) form the first key pair (x, X); the scheduler (Driver) decrypts the ciphertext of the data encryption key (encode_dkey) with the first private key to obtain the data encryption key (Dkey); and the scheduler (Driver) then sends the data encryption key (Dkey) to the executor (Executor) over the RPC connection. The scheduler may request the first private key from the key management service module through its guard module.
In this way, the to-be-processed data can be encrypted with a symmetric encryption algorithm, which improves the encryption speed. In addition, the to-be-processed data is protected by Dkey, and Dkey is protected by the first private key (x) corresponding to the first public key (X). Because the first private key (x) resides in the key management service module, only the key management service module (Key Server) or an application trusted by it can decrypt the to-be-processed encrypted data; the outside cannot obtain x and therefore cannot decrypt the to-be-processed encrypted data, which improves its security.
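The two-layer protection described above (data protected by Dkey, Dkey protected by the first key pair) can be sketched as follows. This is a toy under loudly stated assumptions: textbook RSA built from two known Mersenne primes stands in for the asymmetric key pair (x, X), and a SHA-256 counter keystream stands in for the symmetric algorithm. Neither stand-in is secure or part of the described method, and every function name is hypothetical.

```python
import hashlib
import secrets

# Toy textbook-RSA key pair (no padding, NOT secure); stands in for the first key pair (x, X).
_P, _Q = 2**127 - 1, 2**107 - 1          # two Mersenne primes
N = _P * _Q
X_PUBLIC = 65537                          # plays the role of X
X_PRIVATE = pow(X_PUBLIC, -1, (_P - 1) * (_Q - 1))  # plays the role of x


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream (same call decrypts)."""
    stream = bytearray()
    for counter in range((len(data) + 31) // 32):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
    return bytes(a ^ b for a, b in zip(data, stream))


def data_side_encrypt(plaintext: bytes):
    """Data side: generate a random Dkey, encrypt the data, wrap Dkey with X."""
    dkey = secrets.token_bytes(16)
    encrypted_data = keystream_xor(dkey, plaintext)
    encode_dkey = pow(int.from_bytes(dkey, "big"), X_PUBLIC, N)
    return encrypted_data, encode_dkey


def driver_unwrap_dkey(encode_dkey: int) -> bytes:
    """Driver: recover Dkey with the first private key obtained from the Key Server."""
    return pow(encode_dkey, X_PRIVATE, N).to_bytes(16, "big")


def executor_decrypt(encrypted_data: bytes, dkey: bytes) -> bytes:
    """Executor: decrypt the to-be-processed encrypted data with Dkey."""
    return keystream_xor(dkey, encrypted_data)
```

The split mirrors the roles in the text: only a party holding x can unwrap encode_dkey, so the bulk data can be encrypted quickly with a symmetric key while access still depends on the key held in the key management service module.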
In one possible implementation, before a first private Key (x) is acquired from a Key management service module (Key Server) through a scheduler (Driver), the Driver initiates remote attestation of the Key Server, so as to ensure that the Driver and the Key Server are both credible and run in the TEE. After the remote certification is passed, the Driver sends a request to the Key Server to obtain the first private Key (x).
In some cases, while the Driver initiates a request to the Key Server to obtain the first private Key (x), the Driver may also obtain the signature private Key (w) and the signature certificate certA of the Driver, so that the Driver transmits { storage path [ i ], Dkey [ i ], the signature private Key w, the signature certificate certA of the Driver } as parameters to the data packet of the task model.
According to the technical scheme, the method provided by the application is executed by the computer equipment with the distributed computing framework, the distributed computing framework comprises the key management service module, the scheduler and the executor, and the distributed computing framework is combined with the trusted execution environment, so that the key management service module, the scheduler and the executor are operated in the trusted execution environment. Thus, after receiving the computing task request, the key management service module receives a first key request including an initial password sent by the dispatcher and receives a second key request including the initial password sent by the executor. In order to improve the security of the established RPC connection, the key management service module responds to the first key request, and performs message authentication code generation operation according to the sealing key and the initial password generated by the key management service module to obtain a security key, and the key management service module responds to the second key request, and performs message authentication code generation operation according to the sealing key and the initial password to obtain the security key. Because the sealing key is generated by the trusted execution environment and the process of generating the security key is performed in the trusted execution environment, the security key cannot be acquired by the outside, and thus, the purpose of converting the unsafe initial password into the security key is achieved. And then the key management service module returns a security key to the scheduler and the executor so that the scheduler and the executor establish RPC connection according to the security key, and the executor executes a target calculation task aiming at the calculation task request to obtain task result data based on the RPC connection. 
Because the security key is generated in the trusted execution environment and cannot be obtained from outside, and the target computation task is also executed in the trusted execution environment, the RPC connection established with the security key keeps the session secure even if the host is compromised. The to-be-processed data used by the target computation task is thereby protected, and the security risk is greatly reduced.
In the embodiment of the application, the distributed computing framework uses the TEE to realize privacy computing in which data is "available but invisible", protecting the integrity and confidentiality of the computing process and achieving both usability and privacy protection. That is, on the premise that the data side trusts only the CPU of the physical machine, the to-be-processed data is fed into a verified, deterministic algorithm, and the computing process remains secure without relying on any hardware or software other than the CPU. In addition, introducing the TEE on top of the distributed computing framework has almost no effect on operating efficiency, so the performance loss is low.
In one possible implementation, S304 may be implemented as follows: based on the RPC connection, the executor executes the target computation task according to the instructions of the task model to obtain the task result data. The task model may be used to tell the executor how the target computation task is split, and thereby how it is to be executed. The task model may be specified by the data user and passed to the executor in the form of a data packet, which may be a JAR package (an archive format specific to Java).
In some cases, the data packet of the task model may be tampered with by an illegitimate user, which affects the execution of the target computation task and may even threaten the security of the to-be-processed data. To guard against this, the computation task request may further include a first hash value of the data packet of the task model. Before the executor executes the target computation task according to the instructions of the task model, the scheduler calculates a second hash value of the data packet of the task model and compares the first hash value with the second hash value to obtain a comparison result. If the comparison result indicates that the two hash values are consistent, the data packet of the task model is determined not to have been tampered with, and the executor, based on the RPC connection, executes the target computation task according to the instructions of the task model to obtain the task result data. If the comparison result indicates that the two hash values are inconsistent, the data packet of the task model has been tampered with, and the subsequent steps are not executed.
By the method, whether the data packet of the task model is consistent with the expectation or not is verified, and the data packet of the task model can be guaranteed not to be tampered, so that the influence on the execution of the target calculation task is avoided, and the safety of the data to be processed is guaranteed.
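The hash comparison performed by the scheduler can be sketched as follows. SHA-256 is an assumption (the text does not name the hash algorithm), and the function names are hypothetical.

```python
import hashlib
import hmac


def package_digest(package_bytes: bytes) -> str:
    """Hash the task model data packet (e.g. the bytes of a JAR file)."""
    return hashlib.sha256(package_bytes).hexdigest()


def package_is_untampered(package_bytes: bytes, first_hash: str) -> bool:
    """Compare the hash carried in the request with a freshly computed one."""
    second_hash = package_digest(package_bytes)
    return hmac.compare_digest(first_hash.lower(), second_hash)
```

A scheduler following this sketch would continue to task execution only when `package_is_untampered` returns True and would stop otherwise.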
In a possible implementation, the computation task request may further include a second public key (Y) and a fourth signature (sig4), where the fourth signature is obtained by signing the authorization data (M | P) with a second private key (y), and the second private key (y) and the second public key (Y) form a second key pair (y, Y) generated by the data side device. The authorization data includes the metric information (M) of the scheduler and the data description information (P) of the data side device, where the metric information is a proof of the correspondence between the TEE and its source code. In this case, before the executor executes the target computation task for the computation task request based on the RPC connection, the scheduler may verify the fourth signature with the second public key; if the verification passes, the step of executing the target computation task to obtain the task result data is performed.
In this way, the data side device's authorization of the data user device can be verified, which ensures that the data user device and the scheduler are legitimate devices authorized by the data side device and thus that the to-be-processed data is used securely.
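The sign-then-verify flow for the authorization data M | P can be sketched as follows. This is a toy under stated assumptions: a textbook RSA signature over a SHA-256 digest stands in for the data side's asymmetric signature (the text suggests SM2 elsewhere), the length-prefixed encoding of M | P is an assumption made so that the concatenation is unambiguous, and all names are hypothetical. It is not secure and is not the method's actual signature scheme.

```python
import hashlib

# Toy textbook-RSA key pair standing in for the second key pair (y, Y). NOT secure.
_P, _Q = 2**521 - 1, 2**127 - 1           # two Mersenne primes
N = _P * _Q
Y_PUBLIC = 65537                          # plays the role of Y
Y_PRIVATE = pow(Y_PUBLIC, -1, (_P - 1) * (_Q - 1))  # plays the role of y


def authorization_data(m: bytes, p: bytes) -> bytes:
    """Build M | P; the 4-byte length prefix on M is an assumed encoding."""
    return len(m).to_bytes(4, "big") + m + p


def _digest_int(data: bytes) -> int:
    # A SHA-256 digest is below 2**256, which is smaller than N.
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign_authorization(auth: bytes) -> int:
    """Data side device: produce sig4 with the second private key."""
    return pow(_digest_int(auth), Y_PRIVATE, N)


def verify_authorization(auth: bytes, sig4: int) -> bool:
    """Scheduler: check sig4 against M | P with the second public key."""
    return pow(sig4, Y_PUBLIC, N) == _digest_int(auth)
```

Because M is part of the signed data, a signature issued for one Driver measurement does not verify for a Driver running different code, which is what ties the authorization to the attested scheduler.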
In a possible implementation, the computation task request further includes a third public key (res_pubkey). In this embodiment, the executor may obtain the third public key (res_pubkey) from the scheduler based on the RPC connection, encrypt the result encryption key (sKey) with the third public key (res_pubkey) to obtain the ciphertext of the result encryption key (sKey_file), and encrypt the task result data (res_data) with the result encryption key to obtain the encrypted result data. The sKey may be randomly generated by the Executor so that the task result data (res_data) is encrypted with a symmetric encryption algorithm. The path of sKey_file and the path res_path of the encrypted result data are both specified by the task model. The encrypted result data may, for example, be stored in the storage system (e.g., as shown in step 10 of fig. 4).
In this case, the task result data (res_data) is protected by sKey, and sKey is protected by the third private key (res_private) corresponding to the third public key (res_pubkey). Because the third private key (res_private) resides in the result user device, only the result user device can decrypt the encrypted result data, which improves security.
In a possible implementation, the computation task request may further include other information from the data description information (P), for example data side information. The aforementioned storage path (path), third public key (res_pubkey), and first hash value of the data packet of the task model may also belong to the data description information (P).
In a possible implementation, the computation task request may further include a fourth signature (sig4). It should be noted that, if there are multiple data sides, the ciphertexts of the data encryption keys corresponding to the different data sides may be written encode_dkey[i], the second public keys Y[i], the fourth signatures sig4[i], and the storage paths path[i], where path[i] must follow the same order as encode_dkey[i] and Y[i].
The data description information (P), the second public key (Y), the ciphertext of the data encryption key (encode_dkey), and the fourth signature (sig4) may be parameter-encapsulated by the Spark-TEE-submit module and included in the computation task request.
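The parameter encapsulation and the per-data-side indexing can be sketched as a small container. The class name, field names and byte-string types are assumptions for illustration; the text does not specify how Spark-TEE-submit lays out these parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComputeTaskRequest:
    """Hypothetical container for the parameters carried in a computation task request."""
    description: bytes            # P
    paths: List[str]              # path[i]
    encode_dkeys: List[bytes]     # encode_dkey[i]
    public_keys: List[bytes]      # Y[i]
    signatures: List[bytes]       # sig4[i]

    def __post_init__(self) -> None:
        # Entries for the same data side must sit at the same index in every list.
        lengths = {len(self.paths), len(self.encode_dkeys),
                   len(self.public_keys), len(self.signatures)}
        if len(lengths) != 1:
            raise ValueError("per-data-side lists must have the same length")

    def for_data_side(self, i: int) -> dict:
        """Collect everything that belongs to data side i."""
        return {
            "path": self.paths[i],
            "encode_dkey": self.encode_dkeys[i],
            "Y": self.public_keys[i],
            "sig4": self.signatures[i],
        }
```

Validating the list lengths at construction time catches the most obvious ordering mistake early, before a key ciphertext is paired with another data side's storage path.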
It should be noted that, after the task result data is obtained by executing the target computation task as described above, a result user may need to use the task result data. The result user may be the same as or different from the data user, which is not limited in this embodiment of the application. When the result user needs the task result data, the result user can send a result request through the corresponding result user device to the computer device with the distributed computing framework. After the result request is received, the executor responds to it by returning result response data, which includes the ciphertext of the result encryption key (sKey_file) and the encrypted result data, to the result user device. The result user device decrypts the ciphertext of the result encryption key (sKey_file) with the third private key (res_private) to obtain the result encryption key (sKey), and then decrypts the encrypted result data with the result encryption key (sKey) to obtain the task result data (res_data). The third public key (res_pubkey) and the third private key (res_private) form a third key pair.
In some possible implementations, the executor obtains a signature private key (w) from the scheduler based on the RPC connection, the signature private key (w) having been requested by the scheduler from the key management service module. After the executor obtains the encrypted result data, it can sign the ciphertext of the result encryption key (sKey_file) with the signature private key (w) to obtain a second signature, and sign the encrypted result data with the signature private key (w) to obtain a third signature. In this case, the result response data further includes the second signature and the third signature, so that the result user device verifies both signatures and, only after the verification passes, decrypts the ciphertext of the result encryption key (sKey_file) with the third private key (res_private) to obtain the result encryption key (sKey) and decrypts the encrypted result data with the result encryption key (sKey) to obtain the task result data (res_data).
Through the signature verification, the encrypted result data can be ensured to be returned by the authentic Executor, and further, the security is ensured.
In a possible implementation manner, the result response data further includes a Driver certificate certA, so that the result user equipment can verify the Driver certificate certA by using the Key Server certificate certK, and if the verification is passed, the signature verification is performed. Therefore, when the Driver is credible and is positioned in the TEE, the next processing step is carried out, the safety is ensured, and meanwhile, unnecessary processing steps are avoided.
In the above method, the first key pair plays a very important role in protecting the security of the data to be processed. The first key pair may be obtained through a key request procedure of the data side device, and the key request procedure of the data side device will be described next. Referring to fig. 5, the method includes:
S501, receiving an encrypted public key request sent by the data side device, where the encrypted public key request includes a second public key generated by the data side device.
The data side device randomly generates its second key pair (y, Y) using an asymmetric encryption algorithm (e.g., the SM2 algorithm), where y is the second private key and Y is the second public key. The data side device then sends an encrypted public key request including the second public key to the key management service module (e.g., as shown in step 1 of fig. 4) to request the first public key (X).
S502, querying a key database through the key management service module according to the second public key.
S503, if the first key pair corresponding to the second public key is inquired, returning an encrypted public key response message to the data side equipment through the key management service module, wherein the encrypted public key response message comprises the first public key in the first key pair.
After receiving the encrypted public key request, the key management service module may use the second public key (Y) as the lookup key to query whether a corresponding key pair, that is, a first key pair, exists in the key database. If it exists, the key management service module returns an encrypted public key response message that includes the first public key of the first key pair, so that the data side device obtains the first public key (X) for encryption. If it does not exist, a first key pair is created.
In a possible implementation, the encrypted public key request further includes a first signature (sig1), where the first signature (sig1) is obtained by signing the second public key (Y) with the second private key (y), and the second private key (y) and the second public key (Y) form the second key pair (y, Y). In this way, before querying the key database according to the second public key, the key management service module may verify the first signature with the second public key and, if the verification passes, perform the step of querying the key database according to the second public key.
In a possible implementation, in order to ensure that the key management service module is trusted and runs in the TEE, the data side device may be preconfigured with the certificate certK, which the Key Server can disclose publicly. The data side device later uses certK to verify the Key Server's signature and to initiate remote attestation.
The data side device uses certK to initiate remote attestation of the Key Server in order to verify the legitimacy of the Key Server. The data side device sends a remote attestation request to the Key Server, and the Key Server returns a request response that contains the measurement information of the code the Key Server is executing. The data side device then compares the measurement information parsed from certK with the measurement information returned in the request response. If the two are consistent, the remote attestation passes, and the data side device can trust that the Key Server is legitimate and continue with the subsequent flow.
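The lookup in S502 and S503 can be sketched with an in-memory dictionary keyed by the second public key. Several points are assumptions: key pairs are modelled as opaque random byte strings rather than real asymmetric keys, a newly created pair is stored and its public half returned right away, and the class and function names are hypothetical.

```python
import secrets
from typing import Dict, Tuple

KeyPair = Tuple[bytes, bytes]  # (first private key x, first public key X) as placeholders


def generate_first_key_pair() -> KeyPair:
    """Hypothetical placeholder: opaque random bytes instead of a real asymmetric key pair."""
    return secrets.token_bytes(32), secrets.token_bytes(32)


class KeyDatabase:
    """Toy key database held by the key management service module."""

    def __init__(self) -> None:
        self._pairs: Dict[bytes, KeyPair] = {}

    def first_public_key_for(self, second_public_key: bytes) -> bytes:
        """Return X for the given Y, creating the first key pair on first use."""
        pair = self._pairs.get(second_public_key)
        if pair is None:
            pair = generate_first_key_pair()
            self._pairs[second_public_key] = pair
        return pair[1]  # only the public half ever leaves the module
```

Keying the database by Y gives each data side its own first key pair, so repeated requests from the same data side always receive the same X.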
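The measurement comparison at the heart of this check can be sketched as follows. It assumes the measurement is a SHA-256 hash of the code, and it leaves out how the measurement is embedded in certK and how the attestation response is signed and validated; the function names are hypothetical.

```python
import hashlib
import hmac


def measure(code: bytes) -> bytes:
    """Assumed measurement: SHA-256 over the code that runs in the TEE."""
    return hashlib.sha256(code).digest()


def attestation_passes(expected_measurement: bytes, reported_measurement: bytes) -> bool:
    """Compare the measurement parsed from certK with the one in the request response."""
    return hmac.compare_digest(expected_measurement, reported_measurement)
```

A data side device following this sketch would proceed to request the first public key only when `attestation_passes` returns True.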
In a possible implementation manner, a fourth signature (sig 4) of the first public Key (X) by the Key Server may also be included in the public Key response message, so that the fourth signature (sig 4) is also returned to the data side device.
After receiving the public Key response message, the data side device can parse out the certificate public Key (Z) from certK, and verify that sig4 is a legal signature of X by using Z, so as to trust that the first public Key (X) returned from Key Server in the previous step is indeed returned from the trusted Key Server.
By combining the distributed computing framework with the TEE, the embodiment of the application establishes a more secure RPC connection and protects the data to be processed more securely. This meets the requirements of the scenario: the privacy of the data to be processed is effectively guaranteed in big data computing, and business compliance requirements are satisfied.
An embodiment of the present application further provides a data processing method, where the method is performed by a computer device having a distributed computing framework, and the distributed computing framework is deployed in a trusted execution environment, and with reference to fig. 6, the method includes:
S601, receiving a computing task request sent by a data consumer device, wherein the computing task request includes a storage path of encrypted data to be processed.
When a data consumer needs to use the data to be processed of a certain data side and to execute a computing task through the distributed computing framework, the data consumer can initiate a computing task request through the data consumer device.
S602, responding to the computing task request, obtaining a data encryption key in the trusted execution environment, and obtaining the encrypted data to be processed from a storage system according to the storage path.
In the embodiment of the present application, the data encryption key may have different generation manners, and the manner of obtaining the data encryption key in the trusted execution environment is also different according to the different generation manners of the data encryption key. In one possible implementation, the data encryption key may be a first private key, which may be stored in the trusted execution environment, e.g., in the key management service module. The to-be-processed encrypted data can be obtained by encrypting the to-be-processed data by using a first public key, and the first public key and the first private key form a first key pair. At this point, the first private key may be obtained directly from the trusted execution environment.
In this manner, the data to be processed is encrypted with the first public key of the first key pair to obtain the encrypted data to be processed, and the encrypted data to be processed is decrypted with the first private key; that is, the data to be processed is protected by an asymmetric encryption algorithm. Because the first private key is stored in the TEE and cannot be obtained from outside, the data to be processed is computed on in a 'usable but invisible' form, which improves its security.
In another possible implementation manner, to improve encryption efficiency while still protecting the data to be processed, the data encryption key (Dkey) may be a key for a symmetric encryption algorithm generated by the data side device. For example, the data side device generates a random number with a random number generation algorithm, uses the random number as the data encryption key, and encrypts the data to be processed with the data encryption key and the symmetric encryption algorithm to generate the encrypted data to be processed. The data side device then encrypts the data encryption key with the first public key (X) to obtain a ciphertext of the data encryption key (encode_Dkey). The data to be processed is thus protected by Dkey, and Dkey is protected by the first private key (x) corresponding to the first public key (X). Because the first private key (x) is held in the TEE, for example in the key management service module in the TEE, only the key management service module, or an application trusted by the key management service module, can decrypt the encrypted data to be processed.
In this case, the computing task request includes a ciphertext of the data encryption key, where the ciphertext of the data encryption key may be obtained by encrypting the data encryption key using a first public key, and at this time, the manner of obtaining the data encryption key in the trusted execution environment may be to obtain a first private key stored in the trusted execution environment, where the first public key and the first private key form a first key pair. And then, decrypting the ciphertext of the data encryption key according to the first private key to obtain the data encryption key.
In this manner, the data to be processed can be encrypted with a symmetric encryption algorithm, which increases the encryption speed. In addition, the data to be processed is protected by Dkey, Dkey is protected by the first private key (x) corresponding to the first public key (X), and the first private key (x) resides in the TEE. The outside therefore cannot obtain x and cannot decrypt the encrypted data to be processed, which improves the security of the encrypted data to be processed.
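The two-layer protection described above can be summarized in formulas. The symbols M (data to be processed), C (encrypted data to be processed), the Enc/Dec operators and the lowercase x for the first private key are notation assumed here for illustration; Dkey, encode_Dkey and X follow the text.

```latex
\begin{aligned}
C &= \mathrm{Enc}^{\mathrm{sym}}_{\mathit{Dkey}}(M), &
\mathit{encode\_Dkey} &= \mathrm{Enc}^{\mathrm{asym}}_{X}(\mathit{Dkey}) && \text{(data side device)}\\
\mathit{Dkey} &= \mathrm{Dec}^{\mathrm{asym}}_{x}(\mathit{encode\_Dkey}), &
M &= \mathrm{Dec}^{\mathrm{sym}}_{\mathit{Dkey}}(C) && \text{(inside the TEE)}
\end{aligned}
```

Since x is available only inside the TEE, neither Dkey nor M can be recovered outside it from C and encode_Dkey alone.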
In the above method, the first key pair plays a very important role in protecting the security of the data to be processed. The first key pair may be obtained through a key request procedure of the data side device, and the key request procedure of the data side device will be described next. Specifically, the computer device with the distributed computing framework may receive an encrypted public key request sent by the data side device, where the encrypted public key request includes a second public key generated by the data side device, then query the key database according to the second public key, and return an encrypted public key response message to the data side device if a first key pair corresponding to the second public key is queried, where the encrypted public key response message includes a first public key in the first key pair.
In a possible implementation manner, the encrypted public key request further includes a first signature, the first signature is obtained by signing the second public key through a second private key, and the second public key and the second private key form a second key pair, so that before querying the key database according to the second public key, the second public key can be used to verify the first signature, and if the verification is passed, the step of querying the key database according to the second public key is performed.
The specific implementation manner of the above steps is described with reference to the embodiment corresponding to fig. 5, and is not described herein again.
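The key request procedure described above can be sketched as a lookup keyed by the second public key. This is a minimal sketch, not the patented implementation: the class and method names are hypothetical, the key database is modeled as an in-memory dictionary, signature verification is reduced to a boolean flag, and the behavior when no key pair is found is not specified in this passage, so the sketch simply returns `None`.

```python
from typing import Dict, NamedTuple, Optional


class KeyPair(NamedTuple):
    public_key: bytes   # first public key, returned to the data side device
    private_key: bytes  # first private key, never leaves the service


class KeyManagementService:
    """Hypothetical model of the key management service module."""

    def __init__(self) -> None:
        # Key database: second public key -> first key pair.
        self._key_database: Dict[bytes, KeyPair] = {}

    def register(self, second_public_key: bytes, first_key_pair: KeyPair) -> None:
        self._key_database[second_public_key] = first_key_pair

    def handle_public_key_request(
        self, second_public_key: bytes, signature_valid: bool = True
    ) -> Optional[dict]:
        # Optional first-signature check happens before the query.
        if not signature_valid:
            return None
        pair = self._key_database.get(second_public_key)
        if pair is None:
            return None
        # Only the public half of the first key pair is returned.
        return {"first_public_key": pair.public_key}
```

The response deliberately carries only the first public key, which matches the requirement that the first private key remain inside the trusted execution environment.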
S603, decrypting the encrypted data to be processed according to the data encryption key to obtain the data to be processed.
S604, according to the data to be processed, executing a target calculation task aiming at the calculation task request to obtain task result data.
In a possible implementation manner, S604 may be implemented by executing the target computing task on the data to be processed according to the instruction of a task model, to obtain the task result data. The task model can indicate how the target computing task is split, and thus how the executor executes the target computing task. The task model may be specified by the data consumer and passed to the executor in the form of a data packet, which may be a JAR package (an archive file format specific to Java).
In some cases, the data packet of the task model may be tampered with by an illegitimate user, which affects the execution of the target computing task and may even threaten the security of the data to be processed. To address this, the computing task request may further include a first hash value of the data packet of the task model. Before the target computing task is executed according to the instruction of the task model, a second hash value of the data packet of the task model is calculated and compared with the first hash value to obtain a comparison result. If the comparison result indicates that the first hash value is consistent with the second hash value, the data packet of the task model is determined not to have been tampered with, and the step of executing the target computing task on the data to be processed according to the instruction of the task model to obtain the task result data is performed.
Verifying in this way that the data packet of the task model matches what is expected ensures that the data packet has not been tampered with, which avoids interference with the execution of the target computing task and protects the security of the data to be processed.
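The hash comparison described above can be sketched as follows. The text does not name a hash algorithm, so SHA-256 is an assumption here, as are the function names and the hex encoding of the first hash value.

```python
import hashlib
import hmac


def package_hash(package_bytes: bytes) -> str:
    """Hash of the task model data packet (SHA-256 is an assumed choice)."""
    return hashlib.sha256(package_bytes).hexdigest()


def package_untampered(package_bytes: bytes, first_hash_hex: str) -> bool:
    """Recompute the second hash value and compare it with the first one."""
    second_hash_hex = package_hash(package_bytes)
    # Normalize case so that upper-case hex from the request still matches.
    return hmac.compare_digest(second_hash_hex, first_hash_hex.lower())
```

The target computing task would be executed only when `package_untampered` returns `True` for the packet that was actually received.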
In a possible implementation manner, the computing task request may further include a second public key and a fourth signature, where the fourth signature is obtained by signing authorization data with a second private key, and the second public key and the second private key form a second key pair. The authorization data is generated when the data consumer device requests authorization from the data side device, and the authorization data includes metric information obtained from the trusted execution environment and data description information of the data consumer device. In this case, before the target computing task for the computing task request is executed on the data to be processed, the fourth signature may be verified by using the second public key, and the step of executing the target computing task for the computing task request to obtain the task result data is performed only if the verification passes.
The above steps may specifically refer to the related description in the corresponding embodiment of fig. 3, and are not described herein again.
In the embodiment of the application, the data encryption key is acquired in the TEE and cannot be acquired from the outside, so that the distributed computing framework is organically combined with the TEE, and the security of user data (such as to-be-processed data) can be ensured.
In a possible implementation manner, the calculation task request further includes a third public key, and in the embodiment of the present application, the third public key may be further obtained from the calculation task request, the result encryption key is encrypted according to the third public key to obtain a ciphertext of the result encryption key, and the task result data is encrypted according to the result encryption key to obtain encrypted result data, so that the encrypted result data is stored in the storage system. For details, reference may be made to relevant content of processing the task result data in the embodiment corresponding to fig. 3, which is not described herein again.
It should be noted that, after the task result data is obtained by executing the target computing task as described above, a result user may need to use the task result data. The result user may be the same as or different from the data consumer, which is not limited in this embodiment of the present application. When the result user needs to use the task result data, the result user can send a result request to the computer device with the distributed computing framework through the corresponding result user device, and the computer device with the distributed computing framework returns result response data to the result user device in response to the result request. The result response data includes the ciphertext of the result encryption key and the encrypted result data, so that the result user device decrypts the ciphertext of the result encryption key by using the third private key to obtain the result encryption key, and decrypts the encrypted result data by using the result encryption key to obtain the task result data. The third public key and the third private key form a third key pair.
In some possible implementation manners, after the encrypted result data is obtained, a signature private key in the trusted execution environment may be used to sign the ciphertext of the result encryption key to obtain a second signature, and to sign the encrypted result data to obtain a third signature. In this case, the result response data further includes the second signature and the third signature, so that the result user device performs signature verification based on the second signature and the third signature respectively and, after the signature verification passes, performs the steps of decrypting the ciphertext of the result encryption key by using the third private key to obtain the result encryption key, and decrypting the encrypted result data by using the result encryption key to obtain the task result data.
The data processing method provided by the embodiment of the present application is described in detail above. Based on the above introduction, the embodiment of the application further provides a data processing method. In this method, Spark is taken as an example of the distributed computing framework, and the Key Server, Driver and Executor included in Spark run in a TEE to form a Spark on TEE platform. The method is introduced from the perspective of interaction between a data side device, a data consumer device, the Spark on TEE platform (equivalent to the computer device of the foregoing embodiments), and a storage system. Referring to fig. 7a, fig. 7a shows a signaling interaction diagram of the data processing method, the method comprising:
S701, the data side device sends an encrypted public key request to the Spark on TEE platform.
S702, in response to the encrypted public key request, the Spark on TEE platform queries the key database to obtain the first key pair.
S703, the Spark on TEE platform returns the first public key in the first key pair to the data side device.
The specific implementation process of S701-S703 may refer to the descriptions of S501-S503, which are not described herein again.
The process of S701-S703 may be referred to as a process of requesting a first public key, see step 1 in fig. 7 b.
S704, the data side equipment encrypts the data to be processed by using the data encryption key to obtain encrypted data to be processed, and encrypts the data encryption key by using the first public key to obtain a ciphertext of the data encryption key.
The process of S704 may be referred to as an encryption process, see step 2 in fig. 7 b.
S705, the data consumer device initiates an authorization request to the data side device.
S706, the data side device returns the fourth signature and the ciphertext of the data encryption key to the data consumer device.
The process of S705-S706 may be referred to as a process in which the data consumer requests authorization, see step 3 in fig. 7 b. The specific implementation manner of S705-S706 may refer to the description about request authorization in the embodiment corresponding to fig. 3, and is not described herein again.
S707, the data side device uploads the encrypted data to be processed to the storage system.
S707 corresponds to step 4 in fig. 7b. In the embodiment of the present application, the execution order of S707 and S705-S706 is not limited: S707 may be executed before S705-S706, or S705-S706 may be executed before S707.
S708, the data user equipment initiates a calculation task request to the Spark on TEE platform.
Wherein S708 can be seen in step 5 in fig. 7 b.
S709, the Spark on TEE platform initiates a data acquisition request to the storage system.
S710, the storage system returns the encrypted data to be processed to the Spark on TEE platform.
S709-S710 are procedures for requesting the encrypted data to be processed, and refer to step 6 in fig. 7 b.
S711, the Spark on TEE platform decrypts the encrypted data to be processed and executes the target computing task to obtain task result data.
S712, the Spark on TEE platform encrypts the task result data to obtain encrypted result data.
S711-S712 correspond to step 7 in fig. 7b.
It should be noted that, for a specific implementation manner of the foregoing S708-S712, reference may be made to related descriptions in the corresponding embodiment of fig. 3, and details are not described here again.
S713, the Spark on TEE platform stores the encrypted result data in the storage system.
Wherein S713 can be seen as step 8 in fig. 7 b.
It should be noted that, for a specific implementation manner of the foregoing S713, reference may be made to related descriptions in the foregoing embodiments, and details are not described here again.
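The ordering of S701-S713 and their correspondence to the steps of fig. 7b can be summarized in a small sketch. The names below are illustrative only; the sketch encodes just the two orders stated in the text, in which S707 comes either after or before S705-S706.

```python
from typing import Dict, List, Sequence

# Signaling steps in the order they are listed in the text.
STEPS: List[str] = [f"S{n}" for n in range(701, 714)]

# Correspondence between signaling steps and the numbered steps of fig. 7b.
FIG_7B_STEP: Dict[str, int] = {
    "S701": 1, "S702": 1, "S703": 1,
    "S704": 2,
    "S705": 3, "S706": 3,
    "S707": 4,
    "S708": 5,
    "S709": 6, "S710": 6,
    "S711": 7, "S712": 7,
    "S713": 8,
}


def allowed_orders() -> List[List[str]]:
    """Return the listed order and the variant with S707 before S705-S706."""
    listed = list(STEPS)
    swapped = [s for s in STEPS if s != "S707"]
    swapped.insert(swapped.index("S705"), "S707")
    return [listed, swapped]


def is_allowed(sequence: Sequence[str]) -> bool:
    """Check whether a sequence of steps matches one of the allowed orders."""
    return list(sequence) in allowed_orders()
```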
It should be noted that, on the basis of the implementation manners provided by the above aspects, the present application may be further combined to provide further implementation manners.
Based on the data processing method provided in the embodiment corresponding to fig. 3, an embodiment of the present application further provides a data processing apparatus 800. Referring to fig. 8, the data processing apparatus 800 is deployed on a computer device having a distributed computing framework including a key management service module, a scheduler, and an executor which run in a trusted execution environment, the data processing apparatus 800 includes a receiving unit 801, a generating unit 802, a returning unit 803, and an executing unit 804:
the receiving unit 801 is configured to receive, after receiving a computation task request sent by a data consumer device, a first key request sent by the scheduler through the key management service module, and receive, through the key management service module, a second key request sent by the executor, where the first key request and the second key request include an initial password;
the generating unit 802 is configured to, by the key management service module in response to the first key request, perform a message authentication code generating operation according to a sealing key and the initial password generated by the key management service module to obtain a security key, and, by the key management service module in response to the second key request, perform a message authentication code generating operation according to the sealing key and the initial password to obtain the security key;
the returning unit 803 is configured to return the security key to the scheduler and the executor through the key management service module, so that the scheduler and the executor establish a Remote Procedure Call (RPC) connection according to the security key;
the execution unit 804 is configured to execute a target calculation task for the calculation task request through the executor based on the RPC connection, so as to obtain task result data.
In one possible implementation, the apparatus further includes a verification unit:
the verification unit is configured to initiate remote attestation of the key management service module through the scheduler to obtain a first verification result, and to initiate remote attestation of the key management service module through the executor to obtain a second verification result;
if the first verification result and the second verification result both indicate that the verification passes, trigger the receiving unit 801 to execute the step of receiving, by the key management service module, the first key request sent by the scheduler and receiving, by the key management service module, the second key request sent by the executor.
In a possible implementation manner, the computation task request includes a storage path of encrypted data to be processed, and the execution unit 804 is specifically configured to:
based on the RPC connection, obtaining a data encryption key and the storage path from the scheduler through the executor, wherein the data encryption key is obtained based on the key management service module;
acquiring the encrypted data to be processed through the executor according to the storage path;
decrypting the encrypted data to be processed through the executor according to the data encryption key to obtain the data to be processed;
and according to the data to be processed, executing the target computing task through the executor to obtain task result data.
In a possible implementation manner, the computation task request includes a ciphertext of a data encryption key, where the ciphertext of the data encryption key is obtained by encrypting the data encryption key with a first public key, and the execution unit 804 is specifically configured to:
acquiring a first private key from the key management service module through the scheduler, wherein the first public key and the first private key form a first key pair;
decrypting the ciphertext of the data encryption key through the scheduler according to the first private key to obtain the data encryption key;
sending, by the scheduler, the data encryption key to the executor based on the RPC connection.
In a possible implementation manner, the data encryption key is a first private key, the first private key is generated by the key management service module, the to-be-processed encrypted data is obtained by encrypting the to-be-processed data by using the first public key, the first public key and the first private key form a first key pair, and the execution unit 804 is specifically configured to:
acquiring a first private key from the key management service module through the scheduler;
based on the RPC connection, sending, by the scheduler, the first private key to the executor.
In a possible implementation manner, the apparatus further includes a query unit:
the receiving unit 801 is further configured to receive an encrypted public key request sent by a data side device, where the encrypted public key request includes a second public key generated by the data side device;
the query unit is used for querying a key database through the key management service module according to the second public key;
if the querying unit queries the first key pair corresponding to the second public key, the returning unit 803 is further configured to return an encrypted public key response message to the data side device through the key management service module, where the encrypted public key response message includes the first public key in the first key pair.
In a possible implementation manner, the encrypted public key request further includes a first signature, the first signature is obtained by signing the second public key through a second private key, the second public key and the second private key form a second key pair, and the apparatus further includes a verification unit:
the verification unit is configured to verify the first signature by using the second public key through the key management service module;
and if the verification is passed, triggering the query unit to execute the step of querying a key database through the key management service module according to the second public key.
In a possible implementation manner, the computing task request further includes a third public key, and the apparatus further includes an obtaining unit and an encrypting unit:
the obtaining unit is configured to obtain the third public key from the scheduler through the executor based on the RPC connection;
the encryption unit is configured to encrypt the result encryption key through the executor according to the third public key to obtain a ciphertext of the result encryption key, and to encrypt the task result data through the executor according to the result encryption key to obtain encrypted result data.
In a possible implementation manner, the receiving unit 801 is further configured to receive a result request sent by a result consumer device;
the returning unit 803 is further configured to return, through the executor and in response to the result request, result response data to the result user device, where the result response data includes a ciphertext of the result encryption key and the encrypted result data, so that the result user device decrypts the ciphertext of the result encryption key by using a third private key to obtain the result encryption key, and decrypts the encrypted result data by using the result encryption key to obtain the task result data, where the third public key and the third private key form a third key pair.
In one possible implementation manner, the apparatus further includes an obtaining unit and a signature unit:
the obtaining unit is used for obtaining a signature private key from the scheduler through the executor based on the RPC connection, wherein the signature private key is obtained by the scheduler requesting the key management service module;
the signature unit is configured to sign the ciphertext of the result encryption key by using the signature private key through the executor to obtain a second signature, and to sign the encrypted result data by using the signature private key through the executor to obtain a third signature;
the result response data further includes the second signature and the third signature, so that the result user device performs signature verification based on the second signature and the third signature respectively and, after the signature verification passes, performs the steps of decrypting the ciphertext of the result encryption key by using the third private key to obtain the result encryption key, and decrypting the encrypted result data by using the result encryption key to obtain the task result data.
In a possible implementation manner, the execution unit 804 is specifically configured to:
and based on the RPC connection, executing the target calculation task by the executor according to the instruction of a task model to obtain the task result data.
In a possible implementation manner, the computing task request further includes a first hash value of a data packet of the task model, and the apparatus further includes a computing unit and a comparing unit:
the computing unit is configured to compute, by the scheduler, a second hash value of the data packet of the task model before the execution unit 804 executes, through the executor and based on the RPC connection, the target computing task according to the instruction of the task model to obtain the task result data;
the comparison unit is used for comparing the first hash value with the second hash value to obtain a comparison result;
and if the comparison result indicates that the first hash value is consistent with the second hash value, triggering the execution unit 804 to execute the step of executing the target calculation task according to the indication of a task model based on the RPC connection by the executor to obtain task result data.
In a possible implementation manner, the computing task request further includes a second public key and a fourth signature, where the fourth signature is obtained by signing authorization data with a second private key, the second public key and the second private key form a second key pair, the authorization data is generated when the data consumer device requests authorization from the data side device, and the authorization data includes metric information of the scheduler and data description information of the data consumer device, and the apparatus further includes a verification unit:
the verifying unit is configured to verify, by the scheduler, the fourth signature using the second public key;
and if the verification is passed, triggering the execution unit 804 to execute, through the executor and based on the RPC connection, the target computing task for the computing task request to obtain task result data.
According to the above technical solution, the method provided by the application is executed by a computer device with a distributed computing framework. The distributed computing framework includes a key management service module, a scheduler and an executor, and is combined with a trusted execution environment so that the key management service module, the scheduler and the executor run in the trusted execution environment. After the computing task request is received, the key management service module receives a first key request sent by the scheduler and a second key request sent by the executor, each of which includes an initial password. To improve the security of the RPC connection to be established, the key management service module, in response to the first key request, performs a message authentication code generation operation according to the initial password and a sealing key generated by the key management service module to obtain a security key, and, in response to the second key request, performs the same message authentication code generation operation according to the sealing key and the initial password to obtain the same security key. Because the sealing key is generated by the trusted execution environment and the security key is generated inside the trusted execution environment, the security key cannot be obtained from outside; the insecure initial password is thereby converted into a security key. The key management service module then returns the security key to the scheduler and the executor, so that the scheduler and the executor establish an RPC connection according to the security key, and the executor executes, based on the RPC connection, the target computing task for the computing task request to obtain task result data.
Because the security key is generated in the trusted execution environment and cannot be obtained from outside, and the target computing task is also performed in the trusted execution environment, the RPC connection established based on the security key keeps the session secure even if the host is compromised. This protects the data to be processed that is used by the target computing task and greatly reduces potential security risks.
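The derivation of the security key from the sealing key and the initial password can be sketched with a keyed message authentication code. This is a minimal sketch under stated assumptions: HMAC-SHA256 is an assumed choice of message authentication code, the class name is hypothetical, and the sealing key is simulated with random bytes, whereas in the described scheme it is generated by the trusted execution environment.

```python
import hashlib
import hmac
import secrets


class SealedKeyService:
    """Hypothetical model of the key management service module in the TEE."""

    def __init__(self) -> None:
        # Stand-in for the TEE sealing key; it never leaves this object.
        self._sealing_key = secrets.token_bytes(32)

    def derive_security_key(self, initial_password: bytes) -> bytes:
        # MAC(sealing key, initial password) -> security key.
        return hmac.new(self._sealing_key, initial_password, hashlib.sha256).digest()


# The scheduler and the executor send the same initial password in their
# key requests and therefore obtain the same security key.
service = SealedKeyService()
scheduler_key = service.derive_security_key(b"initial-password")
executor_key = service.derive_security_key(b"initial-password")
```

Because the derivation is deterministic for a fixed sealing key, both parties obtain identical keys without the key ever being derivable from the initial password alone.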
Based on the data processing method provided by the corresponding embodiment of fig. 6, an embodiment of the present application further provides a data processing apparatus 900, where the data processing apparatus 900 is deployed in a computer device with a distributed computing framework, and the distributed computing framework is deployed in a trusted execution environment, and the data processing apparatus 900 includes a receiving unit 901, an obtaining unit 902, a decryption unit 903, and an executing unit 904:
the receiving unit 901 is configured to receive a computation task request sent by a data consumer device, where the computation task request includes a storage path of encrypted data to be processed;
the obtaining unit 902 is configured to, in response to the computing task request, obtain a data encryption key in the trusted execution environment, and obtain the to-be-processed encrypted data from a storage system according to the storage path;
the decryption unit 903 is configured to decrypt the encrypted data to be processed according to the data encryption key to obtain the data to be processed;
the execution unit 904 is configured to execute the target computing task for the computing task request according to the to-be-processed data to obtain task result data.
In a possible implementation manner, the computation task request includes a ciphertext of a data encryption key, where the ciphertext of the data encryption key is obtained by encrypting the data encryption key with a first public key, and the obtaining unit 902 is specifically configured to:
obtaining a first private key stored in the trusted execution environment, wherein the first public key and the first private key form a first key pair;
and decrypting the ciphertext of the data encryption key according to the first private key to obtain the data encryption key.
In a possible implementation manner, the data encryption key is a first private key, the first private key is stored in the trusted execution environment, the to-be-processed encrypted data is obtained by encrypting the to-be-processed data by using the first public key, and the first public key and the first private key form a first key pair.
In a possible implementation manner, the apparatus further includes a query unit and a return unit:
the receiving unit 901 is further configured to receive an encrypted public key request sent by a data side device, where the encrypted public key request includes a second public key generated by the data side device;
the query unit is used for querying a key database according to the second public key;
the returning unit is configured to return an encrypted public key response message to the data side device if the first key pair corresponding to the second public key is queried, where the encrypted public key response message includes a first public key in the first key pair.
In a possible implementation manner, the encrypted public key request further includes a first signature, the first signature is obtained by signing the second public key through a second private key, the second public key and the second private key form a second key pair, and the apparatus further includes a verification unit:
the verification unit is configured to verify the first signature by using the second public key;
and if the verification is passed, triggering the query unit to execute the step of querying the key database according to the second public key.
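The verify-then-query order described above can be sketched as follows. This is a sketch under assumptions: the key database is modeled as a dictionary keyed by the second public key, signature verification is passed in as a callable because the text does not name a signature scheme, and returning `None` on failure is an assumption (the text only describes the success case). All names are hypothetical.

```python
from typing import Callable, Dict, NamedTuple, Optional


class KeyPair(NamedTuple):
    public_key: bytes
    private_key: bytes


class PublicKeyRequest(NamedTuple):
    second_public_key: bytes
    first_signature: bytes


def handle_public_key_request(
    request: PublicKeyRequest,
    key_db: Dict[bytes, KeyPair],
    verify: Callable[[bytes, bytes, bytes], bool],
) -> Optional[bytes]:
    """Return the first public key, or None if verification or lookup fails."""
    # verify(public_key, message, signature): the signed message is the
    # second public key itself, checked against that same public key.
    if not verify(
        request.second_public_key,
        request.second_public_key,
        request.first_signature,
    ):
        return None
    # Query the key database only after the signature check has passed.
    pair = key_db.get(request.second_public_key)
    return pair.public_key if pair else None


# Stub verifier for demonstration only; a real one would check an
# asymmetric signature.
def stub_verify(public_key: bytes, message: bytes, signature: bytes) -> bool:
    return signature == b"valid"


key_db = {b"second-pk": KeyPair(b"first-pk", b"first-sk")}
granted = handle_public_key_request(
    PublicKeyRequest(b"second-pk", b"valid"), key_db, stub_verify
)
rejected = handle_public_key_request(
    PublicKeyRequest(b"second-pk", b"forged"), key_db, stub_verify
)
```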
In a possible implementation manner, the computing task request further includes a third public key, and the apparatus further includes an encryption unit and a storage unit:
the obtaining unit 902 is further configured to obtain the third public key from the computing task request;
the encryption unit is used for encrypting the result encryption key according to the third public key to obtain a ciphertext of the result encryption key; encrypting the task result data according to the result encryption key to obtain encrypted result data;
the storage unit is used for storing the encrypted result data into the storage system.
In one possible implementation, the apparatus further includes a return unit:
the receiving unit 901 is specifically configured to receive a result request sent by a result user device;
the returning unit is configured to return result response data to the result user device in response to the result request, where the result response data includes a ciphertext of the result encryption key and the encrypted result data, so that the result user device decrypts the ciphertext of the encryption key by using a third private key to obtain the result encryption key, decrypts the encrypted result data by using the result encryption key to obtain the task result data, and the third public key and the third private key form a third key pair.
In one possible implementation, the apparatus further includes a signature unit:
the obtaining unit 902 is further configured to obtain a private signature key in the trusted execution environment;
the signature unit is used for signing the ciphertext of the result encryption key by using the signature private key to obtain a second signature, and signing the encrypted result data by using the signature private key to obtain a third signature;
the result response data further includes the second signature and the third signature, so that the result user device performs signature verification based on the second signature and the third signature respectively, and after the signature verification is passed, performs the step of decrypting the ciphertext of the result encryption key by using the third private key to obtain the result encryption key, and decrypts the encrypted result data by using the result encryption key to obtain the task result data.
In a possible implementation manner, the execution unit 904 is specifically configured to:
and executing the target calculation task according to the to-be-processed data and the indication of a task model to obtain task result data.
In a possible implementation manner, the computing task request further includes a first hash value of a data packet of the task model, and the apparatus further includes a computing unit and a comparing unit:
the computing unit is used for computing a second hash value of the data packet of the task model;
the comparison unit is used for comparing the first hash value with the second hash value to obtain a comparison result;
and if the comparison result indicates that the first hash value is consistent with the second hash value, triggering the execution unit 904 to execute the step of executing the target computing task according to the to-be-processed data and the indication of a task model to obtain task result data.
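The hash check above can be sketched as follows. SHA-256 is an assumption, since the text does not name a hash algorithm, and `package_matches` is a hypothetical helper name.

```python
import hashlib
import hmac


def package_matches(package: bytes, first_hash_hex: str) -> bool:
    """Recompute the package hash and compare it with the requested one."""
    # Computing unit: second hash value of the task model data packet.
    second_hash_hex = hashlib.sha256(package).hexdigest()
    # Comparison unit: constant-time comparison of the two hex digests.
    return hmac.compare_digest(second_hash_hex, first_hash_hex.lower())


# Hypothetical demo values.
package = b"model-package-bytes"
first_hash = hashlib.sha256(package).hexdigest()

intact = package_matches(package, first_hash)
tampered = package_matches(package + b"x", first_hash)
```

Only when the comparison succeeds would the execution step be triggered; a mismatch indicates that the task model package differs from the one the requester hashed.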
In a possible implementation manner, the computing task request further includes a second public key and a fourth signature, the fourth signature is obtained by signing authorization data with a second private key, the second public key and the second private key form a second key pair, the authorization data is generated when the data consumer device requests authorization from a data side device, and the authorization data includes metric information acquired from the trusted execution environment and data description information of the data consumer device, and the apparatus further includes a verification unit:
the verification unit is configured to verify the fourth signature by using the second public key;
and if the verification is passed, triggering the execution unit 904 to execute the target computing task for the computing task request according to the to-be-processed data to obtain task result data.
An embodiment of the present application further provides a computer device that can execute the data processing method. The computer device may be, for example, a terminal; the following takes a smartphone as an example of the terminal:
Fig. 10 is a block diagram illustrating a partial structure of a smartphone according to an embodiment of the present application. Referring to fig. 10, the smartphone includes: a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a processor 1080, and a power source 1090. The input unit 1030 may include a touch panel 1031 and other input devices 1032, the display unit 1040 may include a display panel 1041, and the audio circuit 1060 may include a speaker 1061 and a microphone 1062. It will be appreciated that the structure shown in fig. 10 does not limit the smartphone, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
The memory 1020 may be used to store software programs and modules, and the processor 1080 executes various functional applications and data processing of the smart phone by operating the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the smartphone, and the like. Further, the memory 1020 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
Processor 1080 is the control center for the smartphone, connects various portions of the entire smartphone using various interfaces and lines, and performs various functions and processes data of the smartphone by running or executing software programs and/or modules stored in memory 1020, as well as invoking data stored in memory 1020. Optionally, processor 1080 may include one or more processing units; preferably, the processor 1080 may integrate an application processor, which primarily handles operating systems, user interfaces, application programs, etc., and a modem processor, which primarily handles wireless communications. It is to be appreciated that the modem processor described above may not be integrated into processor 1080.
In this embodiment, the steps performed by processor 1080 in the smartphone may be implemented based on the architecture shown in fig. 10.
Referring to fig. 11, fig. 11 is a block diagram of a server 1100 provided in this embodiment. The server 1100 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 1122 (e.g., one or more processors), a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) storing an application program 1142 or data 1144. The memory 1132 and the storage media 1130 may be transient storage or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 1122 may be configured to communicate with the storage medium 1130 and to execute, on the server 1100, the series of instruction operations in the storage medium 1130.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input-output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
In this embodiment, the central processor 1122 in the server 1100 may perform the following steps:
after receiving a computing task request sent by a data consumer device, receiving, through the key management service module, a first key request sent by the scheduler, and receiving, through the key management service module, a second key request sent by the executor, wherein the first key request and the second key request comprise an initial password;
responding to the first key request through the key management service module by performing a message authentication code generation operation according to the initial password and a sealing key generated by the key management service module to obtain a security key, and responding to the second key request through the key management service module by performing the message authentication code generation operation according to the sealing key and the initial password to obtain the security key;
returning the security key to the scheduler and the executor through the key management service module, so that the scheduler and the executor can establish a remote procedure call (RPC) connection according to the security key;
and based on the RPC connection, executing a target computing task for the computing task request through the executor to obtain task result data.
Alternatively,
receiving a computing task request sent by a data consumer device, wherein the computing task request comprises a storage path of encrypted data to be processed;
responding to the computing task request by acquiring a data encryption key in the trusted execution environment and acquiring the encrypted data to be processed from a storage system according to the storage path;
decrypting the encrypted data to be processed according to the data encryption key to obtain the data to be processed;
and executing the target computing task for the computing task request according to the data to be processed to obtain task result data.
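In the first set of steps above, the key management service module derives the same security key for both requesters by running a message authentication code operation over the sealing key and the initial password. A minimal sketch of that derivation follows, assuming HMAC-SHA256 as the message authentication code (the text does not name a specific algorithm); `derive_security_key` and the sample values are hypothetical.

```python
import hashlib
import hmac
import os


def derive_security_key(sealing_key: bytes, initial_password: bytes) -> bytes:
    """Return a MAC over the initial password, keyed by the sealing key."""
    return hmac.new(sealing_key, initial_password, hashlib.sha256).digest()


# Hypothetical values: in the described scheme the sealing key is generated
# by the key management service module and stays inside the TEE.
sealing_key = os.urandom(32)
initial_password = b"task-initial-password"

# The scheduler and the executor each send a key request carrying the same
# initial password, so each receives the same security key.
scheduler_key = derive_security_key(sealing_key, initial_password)
executor_key = derive_security_key(sealing_key, initial_password)

keys_match = hmac.compare_digest(scheduler_key, executor_key)
```

Because the derivation is deterministic, the two components end up with a shared secret for the RPC connection without that secret ever being chosen or stored outside the trusted execution environment.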
According to an aspect of the present application, there is provided a computer-readable storage medium for storing program code for executing the data processing method described in the foregoing embodiments.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations of the embodiment.
The descriptions of the flows or structures corresponding to the above drawings each have their own emphasis; for a part not described in detail in one flow or structure, refer to the related descriptions of the other flows or structures.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only a logical division, and other divisions are possible in practice. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (28)

1. A method of data processing performed by a computer device having a distributed computing framework including a key management service module, a scheduler, and an executor, the key management service module, the scheduler, and the executor operating in a trusted execution environment, the method comprising:
after receiving a calculation task request sent by data user equipment, receiving a first key request sent by the scheduler through the key management service module, and receiving a second key request sent by the executor through the key management service module, wherein the first key request and the second key request comprise an initial password;
responding to the first key request through the key management service module, performing message authentication code generation operation according to a sealing key and the initial password generated by the key management service module to obtain a security key, and responding to the second key request through the key management service module, and performing message authentication code generation operation according to the sealing key and the initial password to obtain the security key;
returning the security key to the scheduler and the executor through the key management service module so that the scheduler and the executor can establish a Remote Procedure Call (RPC) connection according to the security key;
and based on the RPC connection, executing a target calculation task for the calculation task request through the executor to obtain task result data.
2. The method of claim 1, further comprising:
initiating remote attestation verification to the key management service module through the scheduler to obtain a first verification result, and initiating remote attestation verification to the key management service module through the executor to obtain a second verification result;
and if the first verification result and the second verification result both indicate that the verification is passed, executing the steps of receiving, by the key management service module, the first key request sent by the scheduler, and receiving, by the key management service module, the second key request sent by the executor.
3. The method of claim 1, wherein the computation task request includes a storage path of encrypted data to be processed, and the performing, by the executor based on the RPC connection, a target computation task for the computation task request to obtain task result data includes:
based on the RPC connection, obtaining a data encryption key and the storage path from the scheduler through the executor, wherein the data encryption key is obtained based on the key management service module;
acquiring the encrypted data to be processed through the executor according to the storage path;
decrypting the encrypted data to be processed through the executor according to the data encryption key to obtain the data to be processed;
and according to the data to be processed, executing the target computing task through the executor to obtain task result data.
4. The method of claim 3, wherein the computing task request includes a ciphertext of a data encryption key, the ciphertext of the data encryption key being encrypted by a first public key, the obtaining, by the executor from the scheduler, the data encryption key based on the RPC connection comprises:
acquiring a first private key from the key management service module through the scheduler, wherein the first public key and the first private key form a first key pair;
decrypting the ciphertext of the data encryption key through the scheduler according to the first private key to obtain the data encryption key;
sending, by the scheduler, the data encryption key to the executor based on the RPC connection.
5. The method of claim 3, wherein the data encryption key is a first private key generated by the key management service module, the to-be-processed encrypted data is obtained by encrypting the to-be-processed data with a first public key, the first public key and the first private key form a first key pair, and obtaining the data encryption key from the scheduler through the executor based on the RPC connection comprises:
acquiring a first private key from the key management service module through the scheduler;
based on the RPC connection, sending, by the scheduler, the first private key to the executor.
6. The method according to claim 4 or 5, characterized in that the method further comprises:
receiving an encrypted public key request sent by a data side device, wherein the encrypted public key request comprises a second public key generated by the data side device;
inquiring a key database through the key management service module according to the second public key;
and if the first key pair corresponding to the second public key is inquired, returning an encrypted public key response message to the data side equipment through the key management service module, wherein the encrypted public key response message comprises the first public key in the first key pair.
7. The method according to claim 6, wherein the encrypted public key request further includes a first signature, the first signature is obtained by signing the second public key with a second private key, the second public key and the second private key form a second key pair, and before querying a key database through the key management service module according to the second public key, the method further includes:
verifying, by the key management service module, the first signature using the second public key;
and if the verification is passed, executing the step of inquiring a key database through the key management service module according to the second public key.
8. The method of claim 1, wherein the computing task request further includes a third public key, the method further comprising:
based on the RPC connection, acquiring the third public key from the scheduler through the executor;
encrypting the result encryption key through the executor according to the third public key to obtain a ciphertext of the result encryption key;
and encrypting the task result data through the executor according to the result encryption key to obtain encrypted result data.
9. The method of claim 8, further comprising:
receiving a result request sent by a result user device;
and responding to the result request through the executor, and returning result response data to the result user equipment, wherein the result response data comprises the ciphertext of the result encryption key and the encrypted result data, so that the result user equipment decrypts the ciphertext of the result encryption key by using a third private key to obtain the result encryption key, decrypts the encrypted result data by using the result encryption key to obtain the task result data, and the third public key and the third private key form a third key pair.
10. The method of claim 9, further comprising:
based on the RPC connection, obtaining a signature private key from the scheduler through the executor, wherein the signature private key is obtained by the scheduler through requesting the key management service module;
signing the ciphertext of the result encryption key by using the signature private key through the executor to obtain a second signature, and signing the encrypted result data by using the signature private key through the executor to obtain a third signature;
the result response data further includes the second signature and the third signature, so that the result user device performs signature verification based on the second signature and the third signature respectively, and after the signature verification is passed, performs the step of decrypting the ciphertext of the result encryption key by using the third private key to obtain the result encryption key, and decrypts the encrypted result data by using the result encryption key to obtain the task result data.
11. The method of any of claims 1-5, wherein executing, by the executor, the target compute task for the compute task request based on the RPC connection, resulting in task result data, comprises:
and based on the RPC connection, executing the target calculation task by the executor according to the instruction of a task model to obtain the task result data.
12. The method of claim 11, wherein the computing task request further includes a first hash value of a packet of the task model, and before the task result data is obtained by the executor executing the target computing task as instructed by the task model based on the RPC connection, the method further comprises:
calculating, by the scheduler, a second hash value of a data packet of the task model;
comparing the first hash value with the second hash value to obtain a comparison result;
and if the comparison result indicates that the first hash value is consistent with the second hash value, executing the step of executing the target calculation task according to the indication of a task model by the executor based on the RPC connection to obtain task result data.
13. The method according to any one of claims 1 to 5, wherein the computing task request further includes a second public key and a fourth signature, the fourth signature is obtained by signing authorization data with a second private key, the second public key and the second private key form a second key pair, the authorization data is generated by a data user device requesting authorization from a data user device, the authorization data includes metric information of the scheduler and data description information of the data user device, and before the target computing task for the computing task request is executed by the executor based on the RPC connection, the method further includes:
verifying, by the scheduler, the fourth signature using the second public key;
and if the verification is passed, executing the step of executing, based on the RPC connection, the target calculation task for the calculation task request through the executor to obtain task result data.
14. A data processing apparatus deployed on a computer device having a distributed computing framework including a key management service module, a scheduler, and an executor, the key management service module, the scheduler, and the executor operating in a trusted execution environment, the apparatus comprising a receiving unit, a generating unit, a returning unit, and an executing unit:
the receiving unit is configured to receive, through the key management service module, a first key request sent by the scheduler after receiving a computation task request sent by a data consumer device, and receive, through the key management service module, a second key request sent by the executor, where the first key request and the second key request include an initial password;
the generating unit is used for responding to the first key request through the key management service module, performing message authentication code generating operation according to a sealing key and the initial password generated by the key management service module to obtain a security key, and responding to the second key request through the key management service module, and performing message authentication code generating operation according to the sealing key and the initial password to obtain the security key;
the return unit is used for returning the security key to the scheduler and the executor through the key management service module so that the scheduler and the executor can establish a Remote Procedure Call (RPC) connection according to the security key;
and the execution unit is used for executing a target calculation task aiming at the calculation task request through the executor based on the RPC connection to obtain task result data.
15. A method of data processing performed by a computer device having a distributed computing framework deployed in a trusted execution environment, the method comprising:
receiving a calculation task request sent by data user equipment, wherein the calculation task request comprises a storage path of encrypted data to be processed;
in response to the computing task request, obtaining a data encryption key in the trusted execution environment based on a Remote Procedure Call (RPC) connection, and obtaining the encrypted data to be processed from a storage system according to the storage path, wherein the RPC connection is the RPC connection established by the data processing method of claim 1;
decrypting the encrypted data to be processed according to the data encryption key to obtain the data to be processed;
and executing the target calculation task aiming at the calculation task request according to the data to be processed to obtain task result data.
16. The method of claim 15, wherein the computing task request includes a ciphertext of a data encryption key, the ciphertext of the data encryption key being obtained by encrypting the data encryption key using a first public key, and wherein obtaining the data encryption key in the trusted execution environment comprises:
obtaining a first private key stored in the trusted execution environment, wherein the first public key and the first private key form a first key pair;
and decrypting the ciphertext of the data encryption key according to the first private key to obtain the data encryption key.
17. The method of claim 15, wherein the data encryption key is a first private key, wherein the first private key is stored in the trusted execution environment, wherein the to-be-processed encrypted data is obtained by encrypting the to-be-processed data with a first public key, and wherein the first public key and the first private key form a first key pair.
18. The method according to claim 16 or 17, further comprising:
receiving an encrypted public key request sent by a data side device, wherein the encrypted public key request comprises a second public key generated by the data side device;
inquiring a key database according to the second public key;
and if the first key pair corresponding to the second public key is inquired, returning an encrypted public key response message to the data side equipment, wherein the encrypted public key response message comprises the first public key in the first key pair.
19. The method of claim 18, wherein the encrypted public key request further includes a first signature, the first signature is obtained by signing the second public key with a second private key, the second public key and the second private key form a second key pair, and before querying a key database according to the second public key, the method further includes:
verifying the first signature using the second public key;
and if the verification is passed, executing the step of inquiring the key database according to the second public key.
20. The method of claim 15, wherein the computing task request further includes a third public key, the method further comprising:
acquiring the third public key from the computing task request;
encrypting the result encryption key according to the third public key to obtain a ciphertext of the result encryption key;
encrypting the task result data according to the result encryption key to obtain encrypted result data;
and storing the encrypted result data into the storage system.
21. The method of claim 20, further comprising:
receiving a result request sent by a result user device;
and returning result response data to the result user device in response to the result request, wherein the result response data comprises the ciphertext of the result encryption key and the encrypted result data, so that the result user device decrypts the ciphertext of the result encryption key by using a third private key to obtain the result encryption key, and decrypts the encrypted result data by using the result encryption key to obtain the task result data, wherein the third public key and the third private key form a third key pair.
22. The method of claim 21, further comprising:
obtaining a signature private key in the trusted execution environment;
signing the ciphertext of the result encryption key by using the signature private key to obtain a second signature, and signing the encrypted result data by using the signature private key to obtain a third signature;
wherein the result response data further includes the second signature and the third signature, so that the result user device performs signature verification based on the second signature and the third signature respectively and, after the signature verification passes, decrypts the ciphertext of the result encryption key by using the third private key to obtain the result encryption key, and decrypts the encrypted result data by using the result encryption key to obtain the task result data.
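The signing and verification order of claim 22 can be sketched as follows. This is an illustrative assumption, not the patented implementation: the claim does not name a signature algorithm, so a toy hash-then-sign RSA built from the Python standard library stands in for one, and the helper names (`sign`, `verify`, `build_result_response`, `signatures_valid`) are hypothetical. The toy parameters are far too small to be secure.

```python
import hashlib

# Toy RSA signing key (assumption, illustration only): two Mersenne primes.
P, Q = 2**127 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # signature private key, held in the TEE


def digest_int(message: bytes) -> int:
    """Hash the message and truncate to 192 bits so the value is below N."""
    return int.from_bytes(hashlib.sha256(message).digest()[:24], "big")


def sign(message: bytes) -> int:
    """TEE side: sign the digest with the signature private key."""
    return pow(digest_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Result user side: check a signature with the public exponent."""
    return pow(signature, E, N) == digest_int(message)


def build_result_response(key_ciphertext: bytes, encrypted_result: bytes) -> dict:
    """Attach the second and third signatures to the result response data."""
    return {
        "key_ciphertext": key_ciphertext,
        "encrypted_result": encrypted_result,
        "second_signature": sign(key_ciphertext),
        "third_signature": sign(encrypted_result),
    }


def signatures_valid(response: dict) -> bool:
    """Both signatures must verify before any decryption is attempted."""
    return verify(response["key_ciphertext"], response["second_signature"]) and verify(
        response["encrypted_result"], response["third_signature"]
    )


response = build_result_response(b"wrapped-result-key", b"encrypted-result-bytes")
tampered = dict(response, encrypted_result=b"modified-result-bytes")
```

A result user device following this sketch would proceed to decryption only when `signatures_valid` returns `True`, so a response altered in storage or in transit is rejected before any key material is used.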
23. The method according to any one of claims 15 to 17, wherein the executing a target computing task for the computing task request according to the data to be processed to obtain task result data comprises:
executing the target computing task according to the to-be-processed data and an indication of a task model to obtain the task result data.
24. The method according to claim 23, wherein the computing task request further includes a first hash value of a data packet of the task model, and before the target computing task is executed according to the to-be-processed data and the indication of the task model to obtain the task result data, the method further includes:
calculating a second hash value of a data packet of the task model;
comparing the first hash value with the second hash value to obtain a comparison result;
and if the comparison result indicates that the first hash value is consistent with the second hash value, executing the target computing task according to the to-be-processed data and the indication of the task model to obtain the task result data.
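The integrity check of claim 24 can be sketched as below. SHA-256 and the hexadecimal encoding are assumptions, since the claim does not name a hash function, and `model_package_matches` is a hypothetical helper name.

```python
import hashlib
import hmac


def model_package_matches(package: bytes, first_hash_hex: str) -> bool:
    """Recompute the hash of the task model package and compare it with the
    first hash value carried in the computing task request."""
    second_hash_hex = hashlib.sha256(package).hexdigest()
    # compare_digest performs a constant-time comparison of the two values.
    return hmac.compare_digest(second_hash_hex, first_hash_hex.lower())


package = b"task model package bytes"
first_hash = hashlib.sha256(package).hexdigest()  # as sent in the request

run_task = model_package_matches(package, first_hash)
reject_task = not model_package_matches(package + b"!", first_hash)
```

The target computing task would be executed only when the function returns `True`, which ensures that the model run inside the trusted execution environment is the one named in the request.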
25. The method according to any one of claims 15 to 17, wherein the computing task request further includes a second public key and a fourth signature, the fourth signature is obtained by signing authorization data with a second private key, the second public key and the second private key form a second key pair, the authorization data is generated by a data consumer device requesting authorization from a data side device, the authorization data includes metric information obtained from the trusted execution environment and data description information of the data side device, and before the target computing task for the computing task request is executed according to the data to be processed, the method further includes:
verifying the fourth signature using the second public key;
and if the verification passes, executing the target computing task for the computing task request according to the data to be processed to obtain the task result data.
26. A data processing apparatus, the apparatus being deployed for execution on a computer device having a distributed computing framework deployed in a trusted execution environment, the apparatus comprising a receiving unit, an obtaining unit, a decryption unit, and an execution unit:
the receiving unit is used for receiving a computing task request sent by data user equipment, wherein the computing task request comprises a storage path of encrypted data to be processed;
the obtaining unit is used for responding to the computing task request, obtaining a data encryption key in the trusted execution environment based on a Remote Procedure Call (RPC) connection, and obtaining the encrypted data to be processed from a storage system according to the storage path, wherein the RPC connection is the RPC connection established by the data processing method of claim 1;
the decryption unit is used for decrypting the encrypted data to be processed according to the data encryption key to obtain the data to be processed;
and the execution unit is used for executing the target computing task for the computing task request according to the data to be processed to obtain task result data.
27. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-13 or 15-25 according to instructions in the program code.
28. A computer-readable storage medium for storing program code, which when executed by a processor causes the processor to perform the method of any of claims 1-13 or 15-25.
CN202210481239.XA 2022-05-05 2022-05-05 Data processing method and related device Active CN114584306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210481239.XA CN114584306B (en) 2022-05-05 2022-05-05 Data processing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210481239.XA CN114584306B (en) 2022-05-05 2022-05-05 Data processing method and related device

Publications (2)

Publication Number Publication Date
CN114584306A (en) 2022-06-03
CN114584306B (en) 2022-08-02

Family

ID=81778254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210481239.XA Active CN114584306B (en) 2022-05-05 2022-05-05 Data processing method and related device

Country Status (1)

Country Link
CN (1) CN114584306B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115065487B (en) * 2022-08-17 2022-12-09 北京锘崴信息科技有限公司 Privacy protection cloud computing method and cloud computing method for protecting financial privacy data
CN115378703A (en) * 2022-08-22 2022-11-22 北京冲量在线科技有限公司 Safe and trusted data processing system based on trusted execution environment and Spark
CN116010529B (en) * 2023-03-08 2023-08-29 阿里云计算有限公司 Data processing method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868684A (en) * 2021-09-30 2021-12-31 成都卫士通信息产业股份有限公司 Signature method, device, server, medium and signature system
CN114173328A (en) * 2021-12-06 2022-03-11 中国电信股份有限公司 Key exchange method and device and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090328081A1 (en) * 2008-06-27 2009-12-31 Linus Bille Method and system for secure content hosting and distribution
CN110034924B (en) * 2018-12-12 2022-05-13 创新先进技术有限公司 Data processing method and device
CN110011956B (en) * 2018-12-12 2020-07-31 阿里巴巴集团控股有限公司 Data processing method and device
JP2020528224A (en) * 2019-04-26 2020-09-17 アリババ・グループ・ホールディング・リミテッドAlibaba Group Holding Limited Secure execution of smart contract operations in a reliable execution environment
CN111181720B (en) * 2019-12-31 2021-04-06 支付宝(杭州)信息技术有限公司 Service processing method and device based on trusted execution environment
CN111429254B (en) * 2020-03-19 2021-09-10 腾讯科技(深圳)有限公司 Business data processing method and device and readable storage medium

Also Published As

Publication number Publication date
CN114584306A (en) 2022-06-03

Similar Documents

Publication Publication Date Title
WO2022206349A1 (en) Information verification method, related apparatus, device, and storage medium
CN107743133B (en) Mobile terminal and access control method and system based on trusted security environment
US10142107B2 (en) Token binding using trust module protected keys
US11432150B2 (en) Method and apparatus for authenticating network access of terminal
CN114584306B (en) Data processing method and related device
US10601590B1 (en) Secure secrets in hardware security module for use by protected function in trusted execution environment
US9219722B2 (en) Unclonable ID based chip-to-chip communication
CN112926051B (en) Multi-party security computing method and device
US8745394B1 (en) Methods and systems for secure electronic communication
US10790979B1 (en) Providing high availability computing service by issuing a certificate
US20140270179A1 (en) Method and system for key generation, backup, and migration based on trusted computing
CN105873031B (en) Distributed unmanned plane cryptographic key negotiation method based on credible platform
CN109688098B (en) Method, device and equipment for secure communication of data and computer readable storage medium
CN114157415A (en) Data processing method, computing node, system, computer device and storage medium
Obert et al. Recommendations for trust and encryption in DER interoperability standards
WO2023174038A1 (en) Data transmission method and related device
Dey et al. Message digest as authentication entity for mobile cloud computing
CN115001841A (en) Identity authentication method, identity authentication device and storage medium
CN113411187A (en) Identity authentication method and system, storage medium and processor
Yadav et al. Mobile cloud computing issues and solution framework
CN115333839A (en) Data security transmission method, system, device and storage medium
CN111654503A (en) Remote control method, device, equipment and storage medium
CN111212026A (en) Data processing method and device based on block chain and computer equipment
CN114070568A (en) Data processing method and device, electronic equipment and storage medium
CN114079921B (en) Session key generation method, anchor point function network element and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code Ref country code: HK; Ref legal event code: DE; Ref document number: 40070396; Country of ref document: HK