CN115422432A - Dynamic searchable encryption method for massive high-dimensional medical data - Google Patents

Dynamic searchable encryption method for massive high-dimensional medical data

Info

Publication number
CN115422432A
Authority
CN
China
Prior art keywords
attribute
index
vector
node
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211062637.4A
Other languages
Chinese (zh)
Inventor
唐飞
周旭君
凌国玮
单进勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202211062637.4A
Publication of CN115422432A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of encrypted data query, and in particular to a dynamic searchable encryption method for massive high-dimensional medical data. The medical center encrypts an electronic medical record data set to obtain ciphertext data and generates an index vector for each electronic medical record based on a constructed attribute value set and attribute hierarchical trees. It then constructs an index tree from the index vectors, encrypts it to obtain an encrypted index tree, and uploads the ciphertext data and the encrypted index tree to a medical cloud server. A data user obtains a search trapdoor from the medical center and sends it to the medical cloud server, which matches the trapdoor against the encrypted index tree and returns the matching electronic medical record ciphertexts. The medical center can also dynamically update the encrypted data and the encrypted index stored on the medical cloud server. By reducing the dimensionality of the index vector through the attribute hierarchical structure, the invention reduces computation and communication overhead and also enables range retrieval over attribute values.

Description

Dynamic searchable encryption method for massive high-dimensional medical data
Technical Field
The invention relates to the technical field of data encryption query, in particular to a dynamic searchable encryption method for massive high-dimensional medical data.
Background
In recent years, with the development of medical informatization, many paper medical records have been converted into Electronic Medical Records (EMRs), generating a large amount of electronic medical record data. For each medical center, the safe and effective management, storage and retrieval of massive EMRs has become an urgent problem. To keep the data secure, an important measure is to store the electronic medical record data in encrypted form and then search and query the encrypted data. Many existing keyword-based search techniques are widely applied to plaintext data but cannot be applied directly to ciphertext data. Searchable encryption provides a way to perform keyword retrieval over ciphertext.
Searchable encryption is a solution that enables users to search encrypted data. At present, most searchable encryption schemes that support multiple keywords support only simple keyword equality queries, while multi-keyword schemes that support range queries and subset queries are generally asymmetric searchable encryption schemes built on complex algebraic structures, so their computation cost is high. Another class of multi-keyword searchable encryption schemes targets ranked file search and returns ranked results that carry uncertainty. None of these searchable encryption schemes is suitable for EMR data sets.
Electronic medical records are characterized by large data volume, many attributes and frequent updates; they are also sensitive personal information and are usually stored in encrypted form. Searching encrypted electronic medical records therefore requires exact multi-keyword and range searches that return deterministic results, which is the specific problem to be solved.
Disclosure of Invention
To provide efficient and dynamic searchable encryption over massive high-dimensional encrypted electronic medical record data, the invention provides a dynamic searchable encryption method for massive high-dimensional medical data. The method realizes exact multi-keyword search based on attribute ranges, achieves non-linear search efficiency on massive medical data, and supports dynamic updating of the encrypted data.
A dynamic searchable encryption method for massive high-dimensional medical data is characterized by comprising the following steps:
S1. The medical center encrypts the electronic medical record data set $D = \{D_1, D_2, \ldots, D_N\}$ to obtain ciphertext data, where $N$ denotes the total number of electronic medical records and $D_n$ denotes the $n$-th electronic medical record; the medical center extracts all attribute values in the data set $D$ to construct an attribute value set $W = \{W_1, W_2, \ldots, W_R\}$, where $R$ denotes the total number of attribute types and $W_r$ denotes the set of attribute values of the $r$-th attribute type;
S2. The medical center constructs a corresponding attribute hierarchical tree for each of a plurality of attributes of the electronic medical records, and generates an index vector $I_n$ for electronic medical record $D_n$ from the attribute hierarchical trees and the attribute value set;
S3. The medical center processes the index vector $I_n$ of electronic medical record $D_n$ with the segmentation vector parameter to obtain a plurality of index sub-vectors of $D_n$;
S4. The medical center constructs an index tree from the index sub-vectors of the data set $D$ and assigns a sub-key to each layer of the index tree;
S5. The medical center encrypts each layer of the index tree with the corresponding sub-key to obtain an encrypted index tree, and then uploads the ciphertext data and the encrypted index tree to the medical cloud server;
S6. The data user sends a search query request to the medical center; the medical center processes the request to generate a search trapdoor and returns the search trapdoor to the data user;
S7. The data user sends a matching request together with the search trapdoor to the medical cloud server; the medical cloud server performs matching on the encrypted index tree according to the search trapdoor and returns the matched electronic medical record ciphertexts to the data user;
S8. The medical center dynamically updates the data on the medical cloud server. To add an electronic medical record, the medical center uploads the ciphertext data of the record, its encrypted index, the corresponding search trapdoor and its ID information to the medical cloud server; the medical cloud server searches the encrypted index tree according to the search trapdoor to find the node with the greatest attribute match, and adds the encrypted index and the ID information of the new record under that node according to the index tree generation rule. To delete an electronic medical record, the medical center uploads the search trapdoor corresponding to the attribute values of the record to the medical cloud server; the medical cloud server performs a matching search on the encrypted index tree according to the search trapdoor and deletes the matched leaf node of the index tree together with its ID and the corresponding electronic medical record.
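The add and delete operations of S8 can be pictured on a plaintext stand-in for the index tree. The following is only a minimal sketch under stated assumptions: it ignores encryption and trapdoors entirely, the node layout (a dictionary with `children` and `ids`) and all function names are hypothetical, and pruning empty branches after a deletion is an assumption that the text does not spell out.

```python
def new_node():
    # Hypothetical node layout: child nodes keyed by sub-vector, plus record IDs.
    return {"children": {}, "ids": set()}

def deepest_match(root, subvectors):
    """Walk down as far as the existing tree matches the sub-vectors."""
    node, depth = root, 0
    for sub in subvectors:
        if sub not in node["children"]:
            break
        node = node["children"][sub]
        depth += 1
    return node, depth

def add_record(root, subvectors, record_id):
    """Attach the unmatched remainder of the path under the deepest matching node."""
    node, depth = deepest_match(root, subvectors)
    for sub in subvectors[depth:]:
        node["children"][sub] = new_node()
        node = node["children"][sub]
    node["ids"].add(record_id)

def delete_record(root, subvectors, record_id):
    """Remove the ID from its leaf; prune branches left empty (an assumption)."""
    path = [root]
    for sub in subvectors:
        child = path[-1]["children"].get(sub)
        if child is None:
            return False
        path.append(child)
    if record_id not in path[-1]["ids"]:
        return False
    path[-1]["ids"].discard(record_id)
    for depth in range(len(subvectors), 0, -1):
        node = path[depth]
        if node["ids"] or node["children"]:
            break
        del path[depth - 1]["children"][subvectors[depth - 1]]
    return True

root = new_node()
add_record(root, ["100", "010", "001"], 1)
add_record(root, ["100", "110", "001"], 2)
delete_record(root, ["100", "010", "001"], 1)
```

In the method itself the medical cloud server would perform the same walk using inner products between encrypted node indexes and the search trapdoor rather than plaintext comparisons.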
Further, the process of constructing the attribute hierarchical tree includes:
S11. One electronic medical record represents the attribute record of one patient; an attribute record comprises multiple attributes, and each attribute corresponds to one attribute value. The attributes are divided into two categories, numerical attributes and non-numerical attributes, according to their attribute values;
S12. For a numerical attribute: determine the maximum value range of the attribute and take it as the root node; divide the maximum value range according to the attribute value range and the attribute logic to obtain a plurality of sub-ranges, each of which is a child node; divide each child node in the same way to obtain new child nodes, and continue dividing until single attribute values are reached; the single attribute values are taken as leaf nodes, which yields the attribute hierarchical tree;
S13. For a non-numerical attribute: determine the semantics of the attribute values, define the value range by semantic inclusion relationships, and select the broadest semantics as the root node; divide layer by layer according to the semantic inclusion and semantic parallel relationships to obtain the attribute hierarchical tree.
Further, constructing a vector representation for each path of the attribute hierarchical tree, including:
A vector is allocated to each node of the attribute hierarchical tree, where the nodes comprise the root node, child nodes and leaf nodes. A node located at the $j_b$-th position of the $b$-th level of the attribute hierarchical tree is denoted $a_{b,j_b}$, where $j_b \in J_b$, $J_b$ denotes the total number of nodes at level $b$, $b \in \{1, 2, \ldots, B\}$, and $B$ denotes the total number of levels of the attribute hierarchical tree;
if node $a_{b-1,j_{b-1}}$ has $K$ child nodes in the $b$-th level and node $a_{b,j_b}$ is the $k$-th child node of node $a_{b-1,j_{b-1}}$, then node $a_{b,j_b}$ is allocated a $K$-dimensional vector whose $k$-th dimension value is 1 and whose remaining dimension values are 0;
a corresponding path $P = (a_{1,j_1}, a_{2,j_2}, \ldots, a_{B,j_B})$ is established for each attribute value, the nodes in each path are represented by their corresponding vectors, and all the vectors in a path are combined to obtain the vector representation of that path.
Further, the process of constructing the index tree is as follows:
S21. Construct an index tree with $V+1$ layers; take the index vector of an electronic medical record and divide it equally using the segmentation vector parameter to obtain $V$ sequentially ordered index sub-vectors;
S22. Take the first index sub-vector of the index vector and, starting from the root node Root of the index tree, determine whether a child node in the first layer of the index tree has the same value as the first index sub-vector; if so, proceed to step S23; otherwise, add to the first layer a child node whose value equals the first index sub-vector, and then proceed to step S23;
S23. Enter the $v$-th layer of the index tree, $v \in \{2, 3, \ldots, V+1\}$, and determine whether a child node of the $v$-th layer has the same value as the $v$-th index sub-vector; if so, proceed to step S24; otherwise, add to the $v$-th layer a child node whose value equals the $v$-th index sub-vector, and then proceed to step S24;
S24. Determine whether the current index sub-vector is the last index sub-vector of the index vector; if so, the addition of the index vector is complete, and the ID information of the electronic medical record corresponding to the index vector is attached to the leaf node corresponding to the current index sub-vector; otherwise, return to S23;
S25. Repeat steps S21-S24 until the index vectors of all electronic medical records have been added, obtaining the index tree.
Furthermore, a hierarchical traversal is performed starting from the root node Root of the index tree, and the sub-key of each layer is expressed as $SK_v = \{S_v, M_{v,1}, M_{v,2}\}$, where $S_v$ is the node partition identifier matrix of the $v$-th layer, an $L \times 1$ 0-1 matrix, $L$ is the vector dimension of the child nodes of each layer of the index tree, $M_{v,1}$ is the first $L \times L$ invertible matrix of the $v$-th layer, and $M_{v,2}$ is the second $L \times L$ invertible matrix of the $v$-th layer. The medical center uses the sub-key $SK_v$ to encrypt the child nodes of the $v$-th layer of the index tree in turn and generate the corresponding encrypted indexes, as follows:
S31. Multiply the $i$-th child node $b_{v,i}$ of the $v$-th layer of the index tree by a random number $\varepsilon_{v,i}$ to obtain a new vector $b'_{v,i}$;
S32. Split the new vector $b'_{v,i}$ to obtain
Figure BDA0003826917410000042
and
Figure BDA0003826917410000043
if $S_v[i] = 0$, then
Figure BDA0003826917410000044
if $S_v[i] = 1$, then
Figure BDA0003826917410000045
where $S_v[i]$ denotes the node partition identifier of the $i$-th child node of the $v$-th layer of the index tree;
S33. From the result of S32, generate the encrypted index of the $i$-th child node $b_{v,i}$ of the $v$-th layer of the index tree, expressed as
Figure BDA0003826917410000051
Further, the medical center extracts all attribute values in the electronic medical record data set D to construct an attribute dictionary; the process that the medical center generates the search trapdoor according to the search query request through the attribute dictionary comprises the following steps:
S41. Obtain the attribute keywords in the search query request, map the attribute keywords to the positions of the corresponding attribute values through the attribute dictionary, and obtain a position vector $q$ in combination with the attribute hierarchical tree;
S42. Negate the position vector $q$ to obtain a query vector $Q$, and process the query vector $Q$ with the segmentation vector parameter to obtain $V$ sequentially ordered query sub-vectors;
S43. Multiply the $v$-th query sub-vector $d_v$ by a random number $\beta_v$ to obtain the vector $d'_v$;
S44. Split the vector $d'_v$ to obtain
Figure BDA0003826917410000052
and
Figure BDA0003826917410000053
if $S_v[i] = 0$, then
Figure BDA0003826917410000054
if $S_v[i] = 1$, then
Figure BDA0003826917410000055
where $S_v[i]$ denotes the node partition identifier of the $i$-th child node of the $v$-th layer of the index tree;
S45. From the result of S44, generate the search trapdoor corresponding to the $v$-th query sub-vector $d_v$ of the query vector $Q$, expressed as
Figure BDA0003826917410000056
The invention has the beneficial effects that:
The invention provides a dynamic searchable encryption method for massive high-dimensional medical data involving a medical center, a medical cloud server and a data user. It searches encrypted EMRs quickly and accurately, supports multi-keyword conjunctions with complex structure, realizes attribute range queries by means of the attribute hierarchical structure, provides a simple and convenient way to search massive electronic medical record data, and improves search efficiency.
The invention reduces the dimensionality of the index vector through the attribute hierarchical structure (that is, the attribute hierarchical tree), which reduces computation and communication overhead; it also constructs an encrypted index tree, which achieves non-linear search efficiency.
Queries in the method of the invention are unlinkable: no party other than the medical center can generate a new search trapdoor from a previous one. Specifically, even if two identical search query requests are submitted, the added random number makes the generated search trapdoors differ, so the medical cloud server cannot infer the relationship between search trapdoors.
The invention also provides a dynamic data updating method that flexibly handles the addition and deletion of EMRs without storing the index tree locally, which reduces the risk of leaking a local index tree.
Drawings
FIG. 1 is an architectural diagram of an embodiment of the present invention;
FIG. 2 is a diagram of an age attribute hierarchical tree structure according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an index vector according to an embodiment of the present invention;
FIG. 4 is a diagram of an index tree structure according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the dynamic searchable encryption method for massive high-dimensional medical data provided by the invention, as shown in fig. 1, three entities, namely a medical center, a medical cloud server and a data user, are included.
The medical center is a hospital or medical institution that holds EMR resources. Its functions are: encrypting the electronic medical record data set $D = \{D_1, D_2, \ldots, D_N\}$ to obtain ciphertext data, where $N$ denotes the total number of electronic medical records and $D_n$ denotes the $n$-th electronic medical record; extracting all attribute values of the data set $D$ to construct an attribute value set $W = \{W_1, W_2, \ldots, W_R\}$, where $R$ denotes the total number of attribute types and $W_r$ denotes the set of attribute values of the $r$-th attribute type; uploading the encrypted electronic medical record data and the encrypted index tree to the medical cloud server; vetting data users and issuing search trapdoors and EMR decryption keys to data users who pass the vetting; and updating the data on the medical cloud server in real time, where for each update the medical center generates the update information locally and then sends it to the medical cloud server.
The data user is an authorized party that is permitted to search the encrypted data on the medical cloud server. To query the encrypted data, the data user first sends a search query request to the medical center and waits for it to return the corresponding search trapdoor and decryption key; the data user then sends the search trapdoor to the medical cloud server, waits for it to return the encrypted data that satisfies the query, and decrypts that data with the decryption key returned by the medical center.
The medical cloud server is used for storing the encrypted electronic medical record data and the encrypted index tree and performing corresponding processing operation when receiving the search trapdoor and the updating request.
In one embodiment, as shown in Table 1, an electronic medical record has four attributes, namely age, region, gender and disease. Each attribute has multiple possible attribute values, but in any one electronic medical record each attribute takes a single specific attribute value, and each electronic medical record has its own ID information.
TABLE 1 electronic medical records
Figure BDA0003826917410000071
As can be seen from Table 1, the gender attribute has only two attribute values, male and female, and is easy to represent, whereas the attribute values of age, region and disease are numerous and have complex relationships. To enable range search within an attribute domain and to reduce the length of the index vector, the medical center constructs an attribute hierarchical tree for each of the different attribute types.
In one embodiment, taking the age attribute as an example, the age attribute hierarchical tree is constructed as shown in FIG. 2. First, the maximum value range that contains all age attribute values recorded in the EMRs is determined; in this embodiment it is [0,100], which is taken as the root node of the age attribute hierarchical tree. The root node [0,100] is divided according to the attribute range and attribute characteristics into 3 child nodes, [0,30], [31,60] and [61,100]. Each of the three child nodes is then divided further until only a single specific attribute value remains, and that attribute value is taken as a leaf node, such as "1" and "2" at the bottom of FIG. 2.
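The recursive division of the age range can be sketched as follows. This is a minimal illustration rather than the construction in FIG. 2: only the root [0,100] and its three child nodes come from the text, while the rule that cuts wider ranges into blocks ending on a multiple of 10 and the function names `children` and `path_to` are assumptions chosen to agree with the later example, in which "61-100" has 4 child nodes and "61-70" has 10.

```python
def children(lo, hi):
    """Return the child ranges of [lo, hi]; a single value is a leaf."""
    if lo == hi:
        return []
    if (lo, hi) == (0, 100):
        return [(0, 30), (31, 60), (61, 100)]  # split stated in the embodiment
    if hi - lo > 10:
        # Assumed rule: cut into blocks that end on a multiple of 10.
        blocks, start = [], lo
        while start <= hi:
            block_end = (start // 10 + 1) * 10 if start % 10 else start + 10
            end = min(hi, block_end)
            blocks.append((start, end))
            start = end + 1
        return blocks
    return [(value, value) for value in range(lo, hi + 1)]

def path_to(value, root=(0, 100)):
    """List (node, position among siblings, sibling count) from root to leaf."""
    path, node = [(root, 1, 1)], root
    while node[0] != node[1]:
        kids = children(*node)
        k = next(i for i, (lo, hi) in enumerate(kids, 1) if lo <= value <= hi)
        node = kids[k - 1]
        path.append((node, k, len(kids)))
    return path

for node, k, size in path_to(61):
    print(node, f"child {k} of {size}")
```

The sibling position and sibling count recorded for each node on the path are exactly what is needed to give that node its one-hot vector.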
Specifically, for the attributes of non-numerical classes such as regions and diseases, the semantic containment relationship is used to define the range, so as to construct the corresponding attribute hierarchical tree. For example, semantically, the region "Sichuan" includes the place "Chengdu", and "Yunnan" and "Hunan" are parallel province names.
In one embodiment, the medical center generates an index vector for the electronic medical record through the attribute hierarchical tree and the attribute value set, and one index vector of one EMR is composed of location information of multiple attributes, which are described in this embodiment with four attributes of age, region, gender and disease.
Specifically, constructing a vector representation for each path of the hierarchical tree of attributes includes:
A vector is allocated to each node of the attribute hierarchical tree, where the nodes comprise the root node, child nodes and leaf nodes. A node located at the $j_b$-th position of the $b$-th level of the attribute hierarchical tree is denoted $a_{b,j_b}$, where $j_b \in J_b$, $J_b$ denotes the total number of nodes at level $b$, $b \in \{1, 2, \ldots, B\}$, and $B$ denotes the total number of levels of the attribute hierarchical tree;
if node $a_{b-1,j_{b-1}}$ has $K$ child nodes in the $b$-th level and node $a_{b,j_b}$ is the $k$-th child node of node $a_{b-1,j_{b-1}}$, then node $a_{b,j_b}$ is allocated a $K$-dimensional vector whose $k$-th dimension value is 1 and whose remaining dimension values are 0;
a corresponding path $P = (a_{1,j_1}, a_{2,j_2}, \ldots, a_{B,j_B})$ is established for each attribute value, the nodes in each path are represented by their corresponding vectors, and all the vectors in a path are combined to obtain the vector representation of that path.
Specifically, taking the age attribute hierarchical tree as an example, as shown in FIG. 2 and FIG. 3, the path corresponding to the attribute value "61" is $P_{61} = (a_{1,1}, a_{2,3}, a_{3,7}, a_{4,61})$ = ("0-100", "61-100", "61-70", "61"). Node $a_{2,3}$ = "61-100" is the 3rd child node of node $a_{1,1}$ = "0-100", and node $a_{1,1}$ has 3 child nodes in total, so node $a_{2,3}$ is allocated the 3-dimensional vector $a_{2,3} = 001$. The next node in the path is then processed: node $a_{3,7}$ = "61-70" is the 1st child node of node $a_{2,3}$ = "61-100", and node $a_{2,3}$ has 4 child nodes in total, so node $a_{3,7}$ is allocated the 4-dimensional vector $a_{3,7} = 1000$. Node $a_{4,61}$ = "61" is the 1st child node of node $a_{3,7}$ = "61-70", and node $a_{3,7}$ has 10 child nodes in total, so node $a_{4,61}$ is allocated the 10-dimensional vector $a_{4,61} = 1000000000$. The resulting vector representation of the attribute value "61" is 100110001000000000. The vector representations of the four attributes of an EMR (age, region, gender and disease) are combined to form the index vector of that EMR. Because gender has only two attribute values, male and female, no attribute hierarchical tree is constructed for it, and its vector representation is based on the position of the value in the attribute value set.
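The encoding of a path can be reproduced with a few lines of code. The sketch below assumes, based on the 18-bit result quoted above (1 + 3 + 4 + 10 bits), that the root node contributes a single bit 1; the helper names `one_hot` and `encode_path` are hypothetical.

```python
def one_hot(k, size):
    """K-dimensional 0-1 string with a 1 in the k-th position (1-based)."""
    return "".join("1" if i == k else "0" for i in range(1, size + 1))

def encode_path(positions):
    """Concatenate the one-hot vectors of the nodes along a path.

    positions: (k, K) pairs, one per level, meaning "k-th of K siblings".
    """
    return "".join(one_hot(k, size) for k, size in positions)

# Path of age "61": root, 3rd of 3, 1st of 4, 1st of 10.
age_61 = [(1, 1), (3, 3), (1, 4), (1, 10)]
print(encode_path(age_61))  # 100110001000000000

# Gender has no hierarchy; its vector is the position in the value set.
print(one_hot(1, 2))  # 10
```

A flat one-hot encoding of the ages 0 to 100 would need 101 bits, whereas the path encoding for this tree needs 18, which illustrates how the hierarchy shortens the index vector.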
In one embodiment, the medical center constructs an index tree from the index subvector of the electronic medical record data set D, including:
S21. Construct an index tree with $V+1$ layers; take the index vector of an electronic medical record and divide it equally using the segmentation vector parameter to obtain $V$ sequentially ordered index sub-vectors;
S22. Take the first index sub-vector of the index vector and, starting from the root node Root of the index tree, determine whether a child node in the first layer of the index tree has the same value as the first index sub-vector; if so, proceed to step S23; otherwise, add to the first layer a child node whose value equals the first index sub-vector, and then proceed to step S23;
S23. Enter the $v$-th layer of the index tree, $v \in \{2, 3, \ldots, V+1\}$, and determine whether a child node of the $v$-th layer has the same value as the $v$-th index sub-vector; if so, proceed to step S24; otherwise, add to the $v$-th layer a child node whose value equals the $v$-th index sub-vector, and then proceed to step S24;
S24. Determine whether the current index sub-vector is the last index sub-vector of the index vector; if so, the addition of the index vector is complete, and the ID information of the electronic medical record corresponding to the index vector is attached to the leaf node corresponding to the current index sub-vector; otherwise, return to S23;
S25. Repeat steps S21-S24 until the index vectors of all electronic medical records have been added, obtaining the index tree.
Specifically, as shown in FIG. 4, there are 5 index vectors, [100010001], [100110001], [110001001], [110001111] and [110010111]. Each of the 5 index vectors is processed with the segmentation vector parameter $V$; in this embodiment $V = 3$, which yields 3 index sub-vectors for each of the 5 index vectors: [100,010,001], [100,110,001], [110,001,001], [110,001,111] and [110,010,111].
The root node Root of the index tree is constructed first. The index vector [100010001] is added first: Root has no child node [100], so a child node [100] is added; the branch under child node [100] is then searched for a child node [010], which does not exist, so it is added; the branch under child node [010] is then searched for a child node [001], which does not exist, so it is added. At this point all index sub-vectors of the first EMR have been added, and the ID information of the EMR, ID = 1, is attached to the last child node [001], that is, the leaf node. After the first four index vectors have been added, the index vector [110010111] is added last: it is determined whether Root has a child node [110]; since it was added earlier, the branch under child node [110] is entered and searched for a child node [010]; since [010] has not been added there, it is added, the child node [111] is then created under [010], and finally the ID information, ID = 5, is attached.
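The construction just described behaves like inserting strings into a prefix tree. The following sketch replays the five index vectors of this embodiment with V = 3; the nested-dictionary layout, the `ids` key and the function names are hypothetical, and the assignment of IDs 1 to 5 in input order is an assumption consistent with the ID = 1 and ID = 5 mentioned above.

```python
def split(vector, parts):
    """Divide an index vector equally into `parts` ordered sub-vectors."""
    size = len(vector) // parts
    return [vector[i * size:(i + 1) * size] for i in range(parts)]

def build_index_tree(vectors, parts):
    """Insert each index vector layer by layer; leaves carry the record IDs."""
    root = {}
    for record_id, vector in enumerate(vectors, start=1):
        node = root
        for sub in split(vector, parts):
            # Reuse the child with this value if present, otherwise add it.
            node = node.setdefault(sub, {})
        node.setdefault("ids", []).append(record_id)
    return root

vectors = ["100010001", "100110001", "110001001", "110001111", "110010111"]
tree = build_index_tree(vectors, parts=3)
print(sorted(tree))                 # ['100', '110']
print(sorted(tree["110"]["001"]))   # ['001', '111']
print(tree["110"]["010"]["111"])    # {'ids': [5]}
```

Records that share leading sub-vectors share a branch, so a search that rejects a first-layer node skips every record beneath it.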
Specifically, a hierarchical traversal is performed starting from the root node Root of the index tree, and the sub-key of each layer is expressed as $SK_v = \{S_v, M_{v,1}, M_{v,2}\}$, where $S_v$ is the node partition identifier matrix of the $v$-th layer, an $L \times 1$ 0-1 matrix, $L$ is the vector dimension of the child nodes of each layer of the index tree, $M_{v,1}$ is the first $L \times L$ invertible matrix of the $v$-th layer, and $M_{v,2}$ is the second $L \times L$ invertible matrix of the $v$-th layer. The medical center uses the sub-key $SK_v$ to encrypt the child nodes of the $v$-th layer of the index tree in turn and generate the corresponding encrypted indexes, as follows:
S31. Multiply the $i$-th child node $b_{v,i}$ of the $v$-th layer of the index tree by a random number $\varepsilon_{v,i}$ to obtain a new vector $b'_{v,i}$;
S32. Split the new vector $b'_{v,i}$ to obtain
Figure BDA0003826917410000101
and
Figure BDA0003826917410000102
if $S_v[i] = 0$, then
Figure BDA0003826917410000103
if $S_v[i] = 1$, then
Figure BDA0003826917410000104
where $S_v[i]$ denotes the node partition identifier of the $i$-th child node of the $v$-th layer of the index tree;
S33. From the result of S32, generate the encrypted index of the $i$-th child node $b_{v,i}$ of the $v$-th layer of the index tree, expressed as
Figure BDA0003826917410000105
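Because the splitting and encryption formulas survive only as figure references, the following sketch does not reproduce them. It illustrates the general idea with the split-and-multiply construction known from secure kNN (ASPE-style) schemes, which the sub-key structure $\{S_v, M_{v,1}, M_{v,2}\}$ resembles. Everything in it is an assumption: which value of $S_v[i]$ triggers a split, the use of $M^T$ for the index and $M^{-1}$ for the trapdoor, the small integer ranges, and all function names. A trapdoor routine is included only so that the preserved inner product can be checked.

```python
import random
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(m, v):
    return [dot(row, v) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def invert(m):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_invertible(n, rng):
    """Unit lower-triangular times unit upper-triangular, so det = 1."""
    low = [[1 if i == j else rng.randint(-3, 3) if j < i else 0
            for j in range(n)] for i in range(n)]
    up = [[1 if i == j else rng.randint(-3, 3) if j > i else 0
           for j in range(n)] for i in range(n)]
    return [[dot(low[i], [up[k][j] for k in range(n)]) for j in range(n)]
            for i in range(n)]

def keygen(dim, rng):
    return {"S": [rng.randint(0, 1) for _ in range(dim)],
            "M1": random_invertible(dim, rng), "M2": random_invertible(dim, rng)}

def split(vec, flags, split_when, rng):
    """Share an entry additively where flag == split_when, else duplicate it."""
    first, second = [], []
    for x, flag in zip(vec, flags):
        r = rng.randint(-9, 9) if flag == split_when else None
        first.append(x if r is None else r)
        second.append(x if r is None else x - r)
    return first, second

def encrypt_index(bits, key, rng):
    eps = rng.randint(1, 9)  # random non-zero scaling, as in S31
    p1, p2 = split([eps * b for b in bits], key["S"], 1, rng)
    return mat_vec(transpose(key["M1"]), p1), mat_vec(transpose(key["M2"]), p2)

def make_trapdoor(query, key, rng):
    beta = rng.randint(1, 9)  # random non-zero scaling of the query
    q1, q2 = split([beta * b for b in query], key["S"], 0, rng)
    return mat_vec(invert(key["M1"]), q1), mat_vec(invert(key["M2"]), q2)

def score(enc, trap):
    """Equals eps * beta * (index . query), so it is 0 exactly on a match."""
    return dot(enc[0], trap[0]) + dot(enc[1], trap[1])
```

Under these assumptions the server sees only the masked vectors, yet the score is zero exactly when the plaintext inner product is zero, because the random factors are non-zero.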
In one embodiment, the medical center extracts all attribute values in the electronic medical record data set D to construct an attribute dictionary; the process by which the medical center uses the attribute dictionary to generate a search trapdoor from a search query request comprises the following steps:
S41. Obtain the attribute keywords in the search query request, map the attribute keywords to the positions of the corresponding attribute values through the attribute dictionary, and obtain a position vector q in combination with the attribute hierarchical tree; for attribute categories that the data user does not search, all attribute value position vector bits of the category are set to 1; for attribute categories to be searched, the position vector bits of the sought attribute values are set to 1 and the remaining attribute value position vector bits are set to 0;
S42. Negate the position vector q to obtain a query vector Q, and process the query vector Q with the segmentation vector parameter to obtain V sequentially ordered query sub-vectors; the positions of the attribute values in the query vector Q correspond one-to-one with the attribute value set, and attribute hierarchical trees are used to represent attribute types with complex relationships;
S43. Multiply the v-th query sub-vector d_v by a random number β_v, β_v ≠ 0, to obtain the vector d_v';
S44. Perform q partitioning on the vector d_v' to obtain
Figure BDA0003826917410000106
and
Figure BDA0003826917410000107
if S_v[i] = 0, then
Figure BDA0003826917410000108
if S_v[i] = 1, then
Figure BDA0003826917410000109
where S_v[i] denotes the node division identifier of the i-th child node of the v-th layer of the index tree;
S45. According to the result of S44, generate the search trapdoor corresponding to the v-th query sub-vector d_v of the query vector Q, expressed as
Figure BDA0003826917410000111
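By way of example only, steps S41 and S42 may be sketched in Python as follows. The sketch assumes flat attribute categories (the attribute hierarchical tree is not modelled) and uses a hypothetical attribute dictionary; the function names position_vector and query_subvectors and the attribute names are illustrative assumptions. Steps S43 to S45 are not covered, since their formulas are available only as figure references.

```python
from typing import Dict, List, Sequence, Set, Tuple


def position_vector(attr_values: Dict[str, List[str]],
                    wanted: Dict[str, Set[str]]) -> List[int]:
    """S41: one bit per attribute value, in attribute dictionary order.

    Categories absent from `wanted` are not searched, so all their bits are 1;
    searched categories get 1 only at the sought attribute values.
    """
    q: List[int] = []
    for attr, values in attr_values.items():
        chosen = wanted.get(attr)
        for value in values:
            q.append(1 if chosen is None or value in chosen else 0)
    return q


def query_subvectors(q: Sequence[int], seg_len: int) -> List[Tuple[int, ...]]:
    """S42: negate q bit by bit, then cut the result into ordered sub-vectors."""
    if len(q) % seg_len != 0:
        raise ValueError("query vector length must be a multiple of seg_len")
    big_q = [1 - bit for bit in q]
    return [tuple(big_q[i:i + seg_len]) for i in range(0, len(big_q), seg_len)]


# Hypothetical attribute dictionary with two categories of three values each.
attr_values = {
    "sex": ["male", "female", "other"],
    "disease": ["diabetes", "flu", "asthma"],
}
q = position_vector(attr_values, {"disease": {"diabetes"}})
subs = query_subvectors(q, seg_len=3)
```

After negation, a 1 in the query vector marks an attribute value that the data user does not accept, which is why a zero inner product with an index sub-vector can indicate a match.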
In one embodiment, after receiving the search trapdoor d″ sent by the data user, the medical cloud server executes a matching query, which comprises:
A hierarchical traversal is performed on the encrypted index tree starting from the root node. The standard inner product of the encrypted index of each child node in the v-th layer of the encrypted index tree with the v-th search trapdoor of the query vector Q is computed in turn; a result of 0 indicates that the child node matches the queried attributes. Matching then continues within the branches of that child node: the standard inner product of the encrypted index of each of its child nodes with the (v+1)-th search trapdoor of the query vector Q is computed in turn, and the search proceeds by this rule to the last layer. The nodes matched in the last layer of the encrypted index tree point to the ID information of the encrypted EMR (electronic medical record) data that match the data user's query. The standard inner product is expressed as follows:
Figure BDA0003826917410000112
where ε is the random value introduced when generating the encrypted index, and β is the random value introduced when generating the search trapdoor.
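The layer-by-layer matching rule may be illustrated by the following Python sketch, which works on plaintext sub-vectors instead of encrypted indexes and trapdoors. It relies on the assumption that the encrypted inner product equals the plaintext inner product scaled by the non-zero random values, so that the test against 0 gives the same answer. The dictionary-based node layout and the function names are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Node = Dict[str, Any]  # {"children": {sub_vector: Node}, "ids": [EMR IDs]}


def make_node() -> Node:
    return {"children": {}, "ids": []}


def add(root: Node, subs: Sequence[Tuple[int, ...]], emr_id: int) -> None:
    """Add one record, creating missing child nodes along its path."""
    node = root
    for sub in subs:
        node = node["children"].setdefault(sub, make_node())
    node["ids"].append(emr_id)


def inner(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x * y for x, y in zip(a, b))


def search(root: Node, query_subs: Sequence[Sequence[int]]) -> List[int]:
    """Keep, at each layer, only the children whose inner product with that
    layer's query sub-vector is 0, and descend into those children."""
    frontier = [root]
    for q in query_subs:
        frontier = [child
                    for node in frontier
                    for sub, child in node["children"].items()
                    if inner(sub, q) == 0]
    return sorted(i for node in frontier for i in node["ids"])


root = make_node()
add(root, [(1, 0, 0), (0, 1, 0), (0, 0, 1)], emr_id=1)
add(root, [(1, 1, 0), (0, 1, 0), (1, 1, 1)], emr_id=5)

# Negated query sub-vectors: a 1 marks an attribute value that is not accepted.
both = search(root, [(0, 0, 1), (0, 0, 0), (0, 0, 0)])
only_first = search(root, [(0, 1, 1), (0, 0, 0), (0, 0, 0)])
```

Branches whose node fails the zero test are never entered, so the number of inner products depends on the matching branches and not on the total number of records.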
In one embodiment, the medical center dynamically updates the data stored in the medical cloud server. When an electronic medical record is deleted or added, this is achieved by synchronously updating the nodes of the encrypted index tree on the medical cloud server. It should be noted that updating the index relies only on the ID of the encrypted EMR data and does not require access to the EMR data content.
Specifically, when the medical center adds EMRs, the EMRs to be added are first encrypted, their encrypted indexes and search trapdoors are generated by the method described above, and the encrypted EMRs, the IDs of the EMRs, the encrypted indexes and the search trapdoors are then sent to the medical cloud server.
After receiving a data addition request from the medical center, the medical cloud server uses the search trapdoors of the EMR to be added to search the encrypted index tree in the matching-query manner described in the above embodiment. Starting from the root node of the encrypted index tree, it computes in turn the standard inner product of the encrypted index of each child node in the v-th layer with the v-th search trapdoor of the query vector Q; a result of 0 means that the attribute values of that child node are the same as those of the EMR to be added, so matching continues within the branches of that child node, where the standard inner product of the encrypted index of each of its child nodes with the (v+1)-th search trapdoor of the query vector Q is computed in turn, and so on. During the search, if exactly one node in the (v+i)-th layer of the encrypted index tree has a standard inner product of 0, and none of the child nodes of that node has an inner product of 0 with the (v+i+1)-th search trapdoor, that node is the maximal attribute-matching node in the encrypted index tree for the EMR to be added. Starting from the (v+i+1)-th sub-vector of the encrypted index of the EMR to be added, a child node is added under the maximal attribute-matching node according to the construction rule of the index tree, and further child nodes are added beneath it until all remaining sub-vectors of the encrypted index have been added, which yields the final leaf node. The ID of the EMR to be added is attached to that leaf node, completing the update of the encrypted index tree.
Specifically, the medical center needs to delete discarded medical records from the medical cloud server to avoid wasting storage resources and reducing search efficiency. When the volume of EMR data is large, records are usually not deleted by specifying an ID; more often, the records whose attributes match given attribute values are deleted. When the medical center deletes EMRs, it first determines which attribute values identify the records to be deleted, for example age = 61 and disease = diabetes. According to the deletion condition, the medical center generates the corresponding search trapdoor and sends it to the medical cloud server. On receiving the data deletion request, the medical cloud server searches the encrypted index tree with the search trapdoor of the records to be deleted, in the same manner as a search query. When the last layer is reached, any node that satisfies the condition points to the ID of an encrypted EMR that meets the deletion request. The medical cloud server deletes that node from the last layer of the encrypted index tree and deletes the encrypted EMR with the corresponding ID from the encrypted store.
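For illustration only, the deletion procedure may be sketched in Python as follows, again on plaintext sub-vectors and under the assumption that the zero test on the encrypted inner product behaves like the plaintext test. The helper mk, the function delete_matching, the dictionary-based node layout and the ciphertext store are hypothetical names introduced for this sketch.

```python
from typing import Any, Dict, List, Optional, Sequence

Node = Dict[str, Any]  # {"children": {sub_vector: Node}, "ids": [EMR IDs]}


def mk(children: Optional[dict] = None, ids: Optional[list] = None) -> Node:
    return {"children": children or {}, "ids": ids or []}


def inner(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x * y for x, y in zip(a, b))


def delete_matching(node: Node, query_subs: Sequence[Sequence[int]],
                    depth: int = 0) -> List[int]:
    """Remove every last-layer node matching the deletion trapdoor and
    return the EMR IDs those nodes pointed to."""
    removed: List[int] = []
    q = query_subs[depth]
    for sub in list(node["children"]):  # copy the keys, since we delete below
        if inner(sub, q) != 0:
            continue
        child = node["children"][sub]
        if depth == len(query_subs) - 1:
            removed.extend(child["ids"])
            del node["children"][sub]
        else:
            removed.extend(delete_matching(child, query_subs, depth + 1))
    return removed


root = mk({
    (1, 0, 0): mk({(0, 1, 0): mk({(0, 0, 1): mk(ids=[1])})}),
    (1, 1, 0): mk({(0, 1, 0): mk({(1, 1, 1): mk(ids=[5])})}),
})
store = {1: b"ciphertext-1", 5: b"ciphertext-5"}  # hypothetical encrypted store

removed = delete_matching(root, [(0, 1, 1), (0, 0, 0), (0, 0, 0)])
for emr_id in removed:
    store.pop(emr_id, None)  # drop the ciphertext with the matching ID
```

As in the description, only the matching node in the last layer is removed; pruning interior nodes that become empty is left out of this sketch.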
In the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," "connected," "fixed," "rotated," and the like are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. Unless otherwise specifically defined, the specific meaning of these terms in the present invention can be understood by those skilled in the art according to the specific situation.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A dynamic searchable encryption method for massive high-dimensional medical data, characterized by comprising the following steps:
S1. The medical center encrypts an electronic medical record data set D = {D_1, D_2, ..., D_N} to obtain ciphertext data, where N denotes the total number of electronic medical records and D_n denotes the n-th electronic medical record; the medical center extracts all attribute values in the electronic medical record data set D to construct an attribute value set W = {W_1, W_2, ..., W_R}, where R denotes the total number of attribute types and W_r denotes the set of attribute values of the r-th attribute type;
S2. The medical center constructs corresponding attribute hierarchical trees for the multiple attributes of the electronic medical records, and generates an index vector I_n for the electronic medical record D_n by means of the attribute hierarchical trees and the attribute value set;
S3. The medical center processes the index vector I_n of the electronic medical record D_n with the segmentation vector parameter to obtain a plurality of index sub-vectors of the electronic medical record D_n;
S4. The medical center constructs an index tree from the index sub-vectors of the electronic medical record data set D and assigns a sub-key to each layer of the index tree;
S5. The medical center encrypts each layer of the index tree with the corresponding sub-key to obtain an encrypted index tree, and then uploads the ciphertext data and the encrypted index tree to a medical cloud server;
S6. A data user sends a search query request to the medical center, and the medical center processes the search query request to generate a search trapdoor and returns the search trapdoor to the data user;
S7. The data user initiates a matching request to the medical cloud server and sends the search trapdoor; the medical cloud server performs matching in the encrypted index tree according to the search trapdoor and returns the matched electronic medical record ciphertext to the data user;
S8. The medical center updates the data in the medical cloud server; when the medical center adds an electronic medical record, the ciphertext data of the electronic medical record to be added, together with the corresponding encrypted index, search trapdoor and ID, are uploaded to the medical cloud server; when the medical center deletes an electronic medical record, the search trapdoor corresponding to the attribute values of the electronic medical record to be deleted is uploaded to the medical cloud server.
2. The dynamic searchable encryption method for massive high-dimensional medical data according to claim 1, wherein the process of constructing the attribute hierarchical tree comprises:
S11. One electronic medical record represents the attribute record of one patient; one attribute record comprises multiple attributes, and each attribute corresponds to one attribute value; the multiple attributes are divided into two categories, numerical attributes and non-numerical attributes, according to their attribute values;
S12. For the numerical attribute category: the maximum value range of the attribute values of the attribute is determined and taken as the root node; the maximum value range is logically divided according to the value range of the attribute to obtain a plurality of sub-ranges, each of which is taken as a child node; each child node is divided in the same way to obtain a plurality of new child nodes, and the division continues until single attribute values are reached; each single attribute value is taken as a leaf node, giving the attribute hierarchical tree;
S13. For the non-numerical attribute category: the semantics of the attribute values of the attribute are determined, value ranges are defined by semantic inclusion relationships, and the broadest semantics is selected as the root node; the division is carried out layer by layer according to the semantic inclusion relationships and semantic parallel relationships to obtain the attribute hierarchical tree.
3. The dynamic searchable encryption method for massive high-dimensional medical data according to claim 2, wherein constructing a vector representation for each path of an attribute hierarchical tree comprises:
allocating a vector to each node of the attribute hierarchical tree, the nodes comprising root nodes, child nodes and leaf nodes; when a node is located at the j_b-th position of the b-th level of the attribute hierarchical tree, it is denoted as
Figure FDA0003826917400000021
where j_b ∈ J_b, J_b denotes the total number of nodes at level b, b = {1, 2, ..., B}, and B denotes the total number of levels of the attribute hierarchical tree;
if the node
Figure FDA0003826917400000022
has K child nodes in the b-th layer, and the node
Figure FDA0003826917400000023
is the k-th child node of the node
Figure FDA0003826917400000024
then the node
Figure FDA0003826917400000025
is allocated a K-dimensional vector
Figure FDA0003826917400000026
and, in the K-dimensional vector
Figure FDA0003826917400000027
the value of the k-th dimension is 1 and the values of the remaining dimensions are 0;
establishing a corresponding path for each attribute value
Figure FDA0003826917400000028
and representing the nodes in each path by their corresponding vectors, and combining all the vectors in each path to obtain the vector representation of that path.
4. The dynamic searchable encryption method for massive high-dimensional medical data according to claim 1, wherein the process of constructing the index tree is as follows:
S21. A (V+1)-layer index tree is constructed; the index vector of an electronic medical record is taken and divided equally with the segmentation vector parameter to obtain V sequentially ordered index sub-vectors;
S22. The first index sub-vector of the index vector is obtained and the search starts from the Root node Root of the index tree; it is determined whether the first layer of the index tree has a child node with the same value as the first index sub-vector; if so, step S23 is performed; otherwise, a child node whose value is the same as that of the first index sub-vector is added in the first layer, and step S23 is then performed;
S23. The v-th layer of the index tree is entered, v = {2, 3, ..., V+1}, and it is determined whether a child node of the v-th layer of the index tree has the same value as the v-th index sub-vector; if so, step S24 is performed; otherwise, a child node whose value is the same as that of the v-th index sub-vector is added in the v-th layer, and step S24 is then performed;
S24. It is determined whether the current index sub-vector is the last index sub-vector of the index vector; if so, the addition of the index vector is complete, and the ID information of the electronic medical record corresponding to the index vector is attached to the leaf node corresponding to the current index sub-vector; otherwise, the process returns to S23;
S25. Steps S21 to S24 are repeated until the index vectors of all the electronic medical records have been added, giving the index tree.
5. The dynamic searchable encryption method for massive high-dimensional medical data according to claim 4, wherein a hierarchical traversal is performed starting from Root of the index tree, and the sub-key of each layer is expressed as SK_v = {S_v, M_{v,1}, M_{v,2}}, where S_v is the node division identification matrix of the v-th layer, M_{v,1} is the first invertible matrix of the v-th layer, and M_{v,2} is the second invertible matrix of the v-th layer; based on the sub-key SK_v, the medical center encrypts the child nodes of the v-th layer of the index tree in turn to generate the corresponding encrypted index, the process comprising the following steps:
S31. Multiply the i-th child node b_{v,i} of the v-th layer of the index tree by a random number ε_{v,i} to obtain a new vector b'_{v,i};
S32. Perform P division on the new vector b'_{v,i} to obtain
Figure FDA0003826917400000031
and
Figure FDA0003826917400000032
if S_v[i] = 0, then
Figure FDA0003826917400000033
if S_v[i] = 1, then
Figure FDA0003826917400000034
where S_v[i] denotes the node division identifier of the i-th child node of the v-th layer of the index tree;
S33. According to the result of S32, generate the encrypted index of the i-th child node b_{v,i} of the v-th layer of the index tree, expressed as
Figure FDA0003826917400000035
6. The dynamic searchable encryption method for massive high-dimensional medical data according to claim 1, wherein the medical center extracts all attribute values in the electronic medical record data set D to construct an attribute dictionary; the process by which the medical center uses the attribute dictionary to generate the search trapdoor from the search query request comprises the following steps:
S41. Obtain the attribute keywords in the search query request, map the attribute keywords to the positions of the corresponding attribute values through the attribute dictionary, and obtain a position vector q in combination with the attribute hierarchical tree;
S42. Negate the position vector q to obtain a query vector Q, and process the query vector Q with the segmentation vector parameter to obtain V sequentially ordered query sub-vectors;
S43. Multiply the v-th query sub-vector d_v by a random number β_v to obtain the vector d_v';
S44. Perform q partitioning on the vector d_v' to obtain
Figure FDA0003826917400000041
and
Figure FDA0003826917400000042
if S_v[i] = 0, then
Figure FDA0003826917400000043
if S_v[i] = 1, then
Figure FDA0003826917400000044
where S_v[i] denotes the node division identifier of the i-th child node of the v-th layer of the index tree;
S45. According to the result of S44, generate the search trapdoor corresponding to the v-th query sub-vector d_v of the query vector Q, expressed as
Figure FDA0003826917400000045
CN202211062637.4A 2022-09-01 2022-09-01 Dynamic searchable encryption method for massive high-dimensional medical data Pending CN115422432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211062637.4A CN115422432A (en) 2022-09-01 2022-09-01 Dynamic searchable encryption method for massive high-dimensional medical data

Publications (1)

Publication Number Publication Date
CN115422432A true CN115422432A (en) 2022-12-02

Family

ID=84200921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211062637.4A Pending CN115422432A (en) 2022-09-01 2022-09-01 Dynamic searchable encryption method for massive high-dimensional medical data

Country Status (1)

Country Link
CN (1) CN115422432A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117271711A (en) * 2023-11-21 2023-12-22 湖南格尔智慧科技有限公司 Medical case retrieval method and system based on similarity calculation
CN117521118A (en) * 2024-01-05 2024-02-06 深圳万海思数字医疗有限公司 Medical data searchable encryption privacy protection and system
CN117521118B (en) * 2024-01-05 2024-04-26 深圳万海思数字医疗有限公司 Medical data searchable encryption privacy protection and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination