Data searchable encryption and keyword search method, system, terminal and equipment
Technical Field
The invention belongs to the technical field of searchable encryption, and particularly relates to a method, a system, a terminal and equipment for data searchable encryption and keyword search.
Background
With the development of internet applications, more and more users search by entering keywords on a search page and triggering a search operation. Specifically, after the search page receives the entered keyword and the triggered search operation, it lists association words for that keyword; the user clicks one of the association words to obtain the related search results, and then clicks a result of interest to browse its details. This search process is inefficient, because the user must perform several operations before the desired information is obtained.
In past studies of searchable symmetric encryption (SSE), the keywords were those extracted from the data files. The content and number of keywords are therefore limited by the keyword extraction algorithm and are fixed. When the keyword set is modified, a new keyword set has to be submitted or constructed again, which increases the construction time of the data set and makes operation cumbersome. The existing keyword search method is roughly as follows:
Step 1, encryption process: the user encrypts the plaintext file locally using the key and uploads it to the server.
Step 2, trapdoor generation process: a user with retrieval capability uses the key to generate a trapdoor for the keyword to be queried; the trapdoor must not reveal any information about the keyword.
Step 3, search process: the server executes a retrieval algorithm with the keyword trapdoor as input and returns all ciphertext files containing the keyword corresponding to the trapdoor; the server must learn nothing beyond whether a ciphertext file contains that specific keyword.
Step 4, decryption process: the user decrypts the ciphertext files returned by the server using the key to obtain the query result.
In addition, David Cash et al. proposed a secure and efficient scheme for databases of different sizes, especially larger ones, which can efficiently and privately search server-held encrypted databases containing hundreds of billions of record-keyword pairs. Their basic construction supports single-keyword searches and offers asymptotically optimal server index size, fully parallel searching and minimal leakage.
In the above methods, the keywords are obtained by a keyword extraction algorithm, so their content and number are limited by that algorithm and are fixed. The keywords that can be queried are therefore limited, and whenever the keyword set is modified a new keyword set has to be submitted or constructed again, which increases the time needed to construct the data set and makes operation cumbersome.
Disclosure of Invention
The first purpose of the present invention is to overcome the disadvantages and shortcomings of the prior art, and to provide a data searchable encryption method, which encrypts data files and summaries thereof uploaded by a data owner respectively, and generates a dictionary corresponding to a keyword at the same time.
It is a second object of the present invention to provide a data searchable encryption system.
A third object of the present invention is to provide a terminal.
The fourth purpose of the present invention is to provide a keyword searching method based on the above data searchable encryption method, which can automatically update the keywords during the searching process, thereby greatly improving the keyword searching efficiency.
It is a fifth object of the invention to provide a computing device.
The first purpose of the invention is realized by the following technical scheme: a data searchable encryption method comprises the following steps:
acquiring a data file uploaded by a data owner;
extracting key words of each data file;
extracting the abstract of each data file to obtain an abstract file;
generating a dictionary gamma after data processing is carried out through an encryption algorithm according to the corresponding relation between each keyword and each data file, wherein labels corresponding to each data file by each keyword and index information corresponding to each data file by each keyword are stored in the dictionary gamma, and the labels corresponding to each data file by the keyword and the index information corresponding to each data file by the keyword are in one-to-one pairing relation in the dictionary gamma for each keyword;
encrypting each data file to obtain an encrypted data file;
and encrypting each abstract file to obtain the encrypted abstract file.
Preferably, the specific process of generating the dictionary γ after performing data processing by an encryption algorithm according to the correspondence between each keyword and each data file is as follows:
S11, establishing an empty table L, and selecting a master key K for the table L;
S12, for each keyword, obtaining each data file that includes the keyword, and generating a pair of sub-keys K1, K2 for the keyword from the master key K:
K1←F(K,1||ω);
K2←F(K,2||ω);
Wherein omega is a keyword;
for each keyword, numbering each data file comprising the keyword to obtain a file number corresponding to each data file, and sequencing each file number to obtain a sequence number of each file number;
for each keyword, using the key K1 to generate labels in sequence for the file numbers corresponding to the keyword, and using the key K2 to encrypt each file number corresponding to the keyword in sequence, the encrypted result serving as the index information of the data file corresponding to the keyword, so as to obtain label index pairs (L_i, d_i):
L_i ← F(K1, i);
d_i ← Enc(K2, id_i);
i = 0, 1, …, N-1;
wherein L_i is the label generated with the key K1 for the i-th file number id_i corresponding to the keyword ω; d_i is the result of encrypting, with the key K2, the i-th file number id_i corresponding to the keyword ω, and serves as the index information of the data file whose file number is id_i for that keyword; N is the total number of data files that include the keyword ω;
S13, inserting the label index pairs (L_i, d_i) obtained for each keyword into the table L in lexicographic order; adding a timestamp time_i to each label index pair (L_i, d_i) to obtain a table L containing the triples (L_i, d_i, time_i); and creating the dictionary γ from the table L; wherein the initial value of time_i is the time at which encryption of the index information d_i in the label index pair (L_i, d_i) is completed.
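The dictionary construction in steps S11 to S13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: HMAC-SHA256 stands in for the function F, a toy XOR pad stands in for Enc/Dec (the text does not name a cipher), and the names `prf`, `enc`, `dec` and `build_dictionary` are hypothetical.

```python
import hashlib
import hmac
import time


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in for F: HMAC-SHA256 (an assumption; the text does not fix F)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def enc(key: bytes, file_id: int) -> bytes:
    """Toy stand-in for Enc: XOR an 8-byte file number with a PRF-derived pad.

    Deterministic and for illustration only; a real system would use a
    proper symmetric cipher here.
    """
    pad = prf(key, b"pad")[:8]
    return bytes(a ^ b for a, b in zip(file_id.to_bytes(8, "big"), pad))


def dec(key: bytes, blob: bytes) -> int:
    """Inverse of the toy enc above."""
    pad = prf(key, b"pad")[:8]
    return int.from_bytes(bytes(a ^ b for a, b in zip(blob, pad)), "big")


def build_dictionary(master_key: bytes, index: dict) -> dict:
    """index maps each keyword to the file numbers of the files containing it."""
    table = []  # the table L
    for keyword, file_ids in index.items():
        w = keyword.encode()
        k1 = prf(master_key, b"1" + w)  # K1 <- F(K, 1 || w)
        k2 = prf(master_key, b"2" + w)  # K2 <- F(K, 2 || w)
        for i, file_id in enumerate(sorted(file_ids)):
            label = prf(k1, i.to_bytes(4, "big"))  # L_i <- F(K1, i)
            d = enc(k2, file_id)                   # d_i <- Enc(K2, id_i)
            table.append((label, d, time.time()))  # (L_i, d_i, time_i)
    table.sort(key=lambda entry: entry[0])         # lexicographic order of labels
    return {label: (d, ts) for label, d, ts in table}
```

For example, `build_dictionary(key, {"cloud": [7, 3]})` yields two entries whose labels can only be recomputed by a party holding the master key and the keyword.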
Preferably, the process of acquiring each summary file is as follows:
firstly, extracting a summary file from a data file through a document summary extraction algorithm; then, taking the file number corresponding to the abstract as an index, storing the abstract at a corresponding position, and performing character filling on the rest positions to form an abstract file;
substring search encryption is carried out on each summary file by adopting a Burrows-Wheeler conversion algorithm and an FM indexing technology, and the specific process is as follows:
respectively creating a linked list aiming at each different character in the abstract file; for each character linked list, each node storage tuple is < nptr, addr >, nptr is a pointer pointing to the next node of the character linked list, addr is the position of the character at a certain position in the summary file in the FM index, and addr in different node storage tuples in the character linked list are the positions of the characters at different positions in the summary file in the FM index respectively;
aiming at each different character in the abstract file, the first node of each character linked list, namely the storage tuple of the linked list head, is encrypted to obtain:
wherein <nptr_1, addr_1> is the storage tuple of the first node of each character linked list, i.e., the linked list head; c_m is the m-th of the distinct characters of the abstract file, m = 1, 2, …, Y, and Y is the total number of distinct characters in the abstract file; K′ is a secondary key, and F_K′(c_m) denotes encrypting the character c_m with the secondary key K′;
aiming at different characters in the abstract file, firstly, encryption processing is carried out, data after encryption processing of the different characters are used as linked list indexes to obtain a linked list index set, and the linked list indexes corresponding to the different characters are respectively mapped to the linked list heads of the linked lists of the different characters to obtain the mapping relation between the linked list indexes and the linked list heads of the different characters; after encryption processing of different characters in the summary file, the method comprises the following steps:
wherein K is the master key, and F_K(c_m) denotes encrypting the character c_m with the master key K; F_K′(c_m) denotes encrypting the character c_m with the secondary key K′.
The second purpose of the invention is realized by the following technical scheme: a data searchable encryption system comprising:
the data file acquisition unit is used for acquiring a data file uploaded by a data owner;
a keyword extraction unit for extracting keywords of each data file;
the abstract extraction unit is used for extracting an abstract of each data file to obtain an abstract file;
the dictionary generating unit is used for generating a dictionary gamma after data processing is carried out through an encryption algorithm according to the corresponding relation between each keyword and each data file, wherein the dictionary gamma stores the label corresponding to each data file by each keyword and the index information corresponding to each data file by each keyword, and the label corresponding to each data file by each keyword and the index information corresponding to each data file by each keyword are in one-to-one pairing relation aiming at each keyword;
the data file encryption unit is used for encrypting each data file to obtain an encrypted data file;
and the digest file encryption unit is used for searching and encrypting substrings of the digest files to obtain encrypted digest files.
The third purpose of the invention is realized by the following technical scheme: a terminal comprising a processor and a memory for storing a program executable by the processor, wherein the processor executes the program stored in the memory to implement the data searchable encryption method according to the first object of the present invention.
The fourth purpose of the invention is realized by the following technical scheme: a keyword search method comprises the following steps:
step X1, firstly, acquiring a dictionary gamma, an encrypted data file and an encrypted summary file which are acquired by the data searchable encryption method of the first object of the invention;
when receiving each keyword which is sent by a user and needs to be searched, firstly, determining whether the encrypted data file comprises the keyword through a search dictionary gamma; if so, returning the corresponding encrypted data file as a query result to the user for decryption;
if not, go to step X2;
step X2, performing substring search on each keyword to be searched in the encrypted abstract file set;
if the keyword is found in an abstract file by the substring search, returning the encrypted data file corresponding to that abstract file to the user as a query result; once the user confirms that the returned data file is correct, the data file is confirmed to include the keyword, and the label of the keyword for that data file and the index information of the data file are calculated and added to the dictionary γ, thereby updating the dictionary γ;
if the keyword is not found by the substring search, returning a search failure result to the user.
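The control flow of steps X1 and X2 can be sketched as follows. The sketch works on plaintext stand-ins purely to show the fallback and update logic; in the method itself both the dictionary and the abstract files are encrypted. The function name `two_tier_search` and the `confirm` callback (modelling the user's confirmation) are hypothetical.

```python
def two_tier_search(keyword, dictionary, summaries, confirm=lambda kw, ids: True):
    """Dictionary lookup first, substring search over abstracts as fallback.

    dictionary: keyword -> set of file numbers (plaintext stand-in for γ)
    summaries:  file number -> abstract text (plaintext stand-in)
    confirm:    callback standing in for the user's confirmation
    """
    hits = dictionary.get(keyword)
    if hits:
        # Step X1: the keyword is already in the dictionary.
        return sorted(hits)
    # Step X2: substring search in the abstract file set.
    found = [fid for fid, text in summaries.items() if keyword in text]
    if not found:
        return None  # search failure
    if confirm(keyword, found):
        # The confirmed result is fed back into the dictionary,
        # so the next query for this keyword is answered in step X1.
        dictionary[keyword] = set(found)
    return sorted(found)
```

After one successful fallback search, the same keyword is answered directly from the dictionary on the next query.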
Preferably, in the step X1, for each keyword that needs to be searched, a specific process of determining whether the encrypted data file includes the keyword through the search dictionary γ is as follows:
step X11, for each keyword that the user needs to search, generating a pair of sub-keys K′1, K′2 for the keyword from the master key K sent by the user:
K′1←F(K,1||ω′);
K′2←F(K,2||ω′);
Wherein omega' is a keyword which needs to be searched by a user;
step X12, for each keyword to be searched, traversing the file number sequence numbers corresponding to the data files, and generating, with the sub-key K′1, the label of the keyword for the data file at each file number sequence number:
L_i′ ← F(K′1, i′); i′ = 0, 1, 2, …, I;
wherein i′ is the file number sequence number of the data file being traversed, and I is the maximum file number sequence number traversed; L_i′ is the label of the keyword ω′ for the data file whose file number sequence number is i′;
step X13, for each keyword to be searched, searching the dictionary γ for the labels L_i′ generated for the keyword in step X12;
If not, go to step X2;
if yes, obtaining from the dictionary γ the index information paired with the label, decrypting the index information with the sub-key K′2 of the keyword, acquiring the corresponding encrypted data file through the decrypted index information, and returning the encrypted data file to the user as a query result for decryption; at the same time, updating the timestamp of that label index pair stored in the dictionary γ to the time at which decryption of the index information with the sub-key K′2 was completed;
the index information paired with the label is obtained from the dictionary γ as:
d_i′ ← Get(γ, L_i′);
wherein d_i′ is the index information paired with the label L_i′ in the dictionary γ;
the decrypted index information obtained with the sub-key K′2 of the keyword is:
d_i ← Dec(K′2, d_i′);
wherein d_i is the index information obtained by decrypting d_i′ with the sub-key K′2 of the keyword ω′; the decrypted index information d_i is the file number of a data file that includes the keyword ω′.
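Steps X11 to X13 can be sketched as follows, again as an illustration only. HMAC-SHA256 stands in for F and a toy XOR pad stands in for Dec; both are assumptions, as are the names `prf`, `xor_pad` and `search_dictionary`. The dictionary γ is modelled as a Python dict mapping each label to a pair (index information, timestamp).

```python
import hashlib
import hmac
import time


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in for F: HMAC-SHA256 (an assumption)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor_pad(key: bytes, blob: bytes) -> bytes:
    """Toy stand-in for Enc/Dec: XOR with a PRF-derived pad (illustration only)."""
    pad = prf(key, b"pad")[: len(blob)]
    return bytes(a ^ b for a, b in zip(blob, pad))


def search_dictionary(master_key: bytes, keyword: str, gamma: dict, max_index: int):
    """Return the file numbers stored in gamma for the keyword (may be empty)."""
    w = keyword.encode()
    k1 = prf(master_key, b"1" + w)  # K'1 <- F(K, 1 || w')
    k2 = prf(master_key, b"2" + w)  # K'2 <- F(K, 2 || w')
    file_ids = []
    for i in range(max_index + 1):             # i' = 0, 1, ..., I
        label = prf(k1, i.to_bytes(4, "big"))  # L_i' <- F(K'1, i')
        entry = gamma.get(label)               # d_i' <- Get(gamma, L_i')
        if entry is None:
            continue
        d, _ = entry
        file_ids.append(int.from_bytes(xor_pad(k2, d), "big"))  # Dec(K'2, d_i')
        gamma[label] = (d, time.time())        # refresh the timestamp
    return file_ids
```

An empty result corresponds to the "if not" branch, in which the search continues with step X2.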
Preferably, in the step X2, a Burrows-Wheeler conversion algorithm and an FM indexing technique are collectively used for substring search in the encrypted digest file set, and the specific process is as follows:
step X21, for the keyword ω′ to be searched, generating a keyword query token tk_{T,S}:
tk_{T,S} = F(K, ω′[1…M]) = F(K, ω′[1]), F(K, ω′[2]), …, F(K, ω′[M]), F(K′, ω′[M]);
wherein ω′[1], ω′[2], …, ω′[M] are the characters of the keyword ω′ to be searched, and M is the total number of characters of the keyword ω′; K′ is the secondary key, K′ = F(K, 2), and K is the master key;
step X22, for each character ω′[m] of the keyword ω′ to be searched, m = 1, 2, 3, …, M, first encrypting the character to obtain its ciphertext; then searching for that ciphertext in the linked list index set, and obtaining the linked list of the character ω′[m] through the mapping relation between the linked list index and the linked list head;
step X23, for the last character ω′[M] of the keyword ω′ to be searched, mapping each node in the linked list of the character ω′[M] to an encrypted FM tuple having three components:
wherein the first component is the data in row j of the F column of FM;
the second component is the data in row j of the L column of FM;
E(pos_j) is the data in row j of the SA column of FM, pos_j denoting the position ciphertext, in the summary file, of the character corresponding to the data in row j of the SA column, and n is the total number of rows of FM;
the F-column data is formed from the character corresponding to the data in row j of the F column of FM and the position number of that character;
the L-column data is formed from the character corresponding to the data in row j of the L column of FM and the position number of that character;
for each encrypted FM tuple to which a node in the linked list of ω′[M] is mapped:
first, an XOR operation is performed between F_K(ω′[M]) and the data in the F column of FM to decrypt that data; then, with the decrypted value as a key, the element forming the first part of the data in the L column of FM is decrypted; finally, the result of that decryption is XORed with the element forming the second part of the data in the L column of FM to obtain an XOR result, and the process proceeds to step X24;
step X24, for each XOR result obtained in the previous step, searching the F column of FM for the row whose data equals the XOR result, obtaining the FM tuple of that row, and finding the linked list that has a node mapped to that FM tuple, thereby obtaining the character c_x corresponding to that linked list as the currently found character; wherein x is the number of times the F column of FM has been searched so far; then proceeding to step X25;
step X25, determining whether any of the characters c_x acquired in step X24 is the same as the character ω′[M-x];
if yes, determining whether the number x of searches in the F column of FM equals M-1; if so, ending the substring search with success, that is, the corresponding abstract file contains the keyword ω′ to be searched; if not, proceeding to step X26;
if not, ending the substring search and returning a substring search failure result, that is, the corresponding abstract file does not contain the keyword ω′;
step X26, for each character c_x acquired in step X24 that is the same as the character ω′[M-x], obtaining the FM tuples that were obtained when the character c_x was acquired in step X24, and for each such FM tuple:
first, an XOR operation is performed between F_K(c_x) and the data in the F column of FM to decrypt that data; then, with the decrypted value as a key, the element forming the first part of the data in the L column of FM is decrypted; finally, the result of that decryption is XORed with the element forming the second part of the data in the L column of FM to obtain an XOR result; the process then returns to step X24.
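The right-to-left matching order of steps X23 to X26 (last character ω′[M] first, then ω′[M-1], and so on) mirrors the standard backward search on an unencrypted FM index. The sketch below shows only that plaintext analogue, as an aid to reading the steps; it omits the encryption, the linked lists and the XOR decryption of the method, and the names `build_fm` and `backward_search` are hypothetical.

```python
def build_fm(text):
    """Build the F column, L column and suffix array SA of text + '$'."""
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    last = [s[i - 1] for i in sa]  # L column; s[-1] is '$' when i == 0
    first = sorted(s)              # F column
    return first, last, sa


def backward_search(pattern, first, last, sa):
    """Return the sorted start positions of pattern in the indexed text."""
    # start[c]: first row of the F column that holds character c
    start = {}
    for row, ch in enumerate(first):
        start.setdefault(ch, row)

    def occ(ch, k):
        # number of occurrences of ch in the first k entries of the L column
        return last[:k].count(ch)

    lo, hi = 0, len(last)
    for ch in reversed(pattern):   # last character first
        if ch not in start:
            return []
        lo = start[ch] + occ(ch, lo)
        hi = start[ch] + occ(ch, hi)
        if lo >= hi:
            return []              # substring search fails
    return sorted(sa[row] for row in range(lo, hi))
```

For instance, indexing the string "banana" and searching for "ana" narrows the row range in three iterations, one per character, before the positions are read from SA.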
Preferably, the dictionary γ is set to be a fixed-length dictionary, and in the step X2, the updating process of the dictionary γ is implemented as follows:
when the label of a new keyword for a data file and the index information of that data file need to be added to the dictionary γ, that is, when a new label index pair needs to be added to the dictionary γ, and the dictionary is already full of label index pairs, the new label index pair replaces the label index pair with the smallest timestamp in the dictionary γ; when there are several new label index pairs, they replace the same number of label index pairs with the smallest timestamps in the dictionary γ.
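The replacement rule for the fixed-length dictionary can be sketched as follows. The class name `FixedDictionary` is hypothetical, and a logical counter replaces wall-clock time by default so that the example is deterministic; the text itself uses the completion times of encryption and decryption as timestamps.

```python
import itertools


class FixedDictionary:
    """Fixed-capacity label -> index-information store with smallest-timestamp eviction."""

    def __init__(self, capacity, clock=None):
        self.capacity = capacity
        self.entries = {}  # label -> (index_info, timestamp)
        counter = itertools.count()
        # Assumption: a logical clock; pass time.time to use wall-clock time.
        self.clock = clock or (lambda: next(counter))

    def get(self, label):
        entry = self.entries.get(label)
        if entry is None:
            return None
        # A successful lookup refreshes the timestamp of the pair.
        self.entries[label] = (entry[0], self.clock())
        return entry[0]

    def add(self, label, index_info):
        if label not in self.entries and len(self.entries) >= self.capacity:
            # Dictionary full: drop the pair with the smallest timestamp.
            oldest = min(self.entries, key=lambda k: self.entries[k][1])
            del self.entries[oldest]
        self.entries[label] = (index_info, self.clock())
```

Adding several new pairs in a row evicts, one by one, the pairs with the smallest timestamps, which matches the multi-pair case described above.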
The fifth purpose of the invention is realized by the following technical scheme: a computing device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the keyword search method according to the fourth object of the present invention.
Compared with the prior art, the invention has the following advantages and effects:
(1) In the data searchable encryption method of the invention, the data files uploaded by the data owner are first acquired; keywords are extracted from each data file, and an abstract is extracted from each data file to obtain abstract files; a dictionary γ is generated from the correspondence between the keywords and the data files, and each data file is encrypted to obtain encrypted data files; meanwhile, substring-search encryption is performed on each abstract file to obtain encrypted abstract files. Once the dictionary γ, the encrypted data files and the encrypted abstract files obtained by the method have been uploaded for searching, a keyword can be searched not only through the dictionary but also through the abstract files, which makes keyword search more efficient.
(2) In the keyword search method of the invention, the dictionary γ, the encrypted data files and the encrypted abstract files obtained by the data searchable encryption method are first acquired. On receiving a keyword that a user needs to search, the method first determines by searching the dictionary γ whether the encrypted data files include the keyword; if so, the corresponding encrypted data file is returned to the user as a query result for decryption; if not, a substring search for the keyword is performed in the encrypted abstract file set. If the keyword is found in an abstract file by the substring search, the encrypted data file corresponding to that abstract file is returned to the user as a query result, and the label of the keyword for that data file and the index information of the data file are calculated and added to the dictionary γ. A keyword that cannot be found through the dictionary γ has no label in the dictionary, that is, it is not in the keyword set from which the dictionary was initially generated. In that case the invention searches for the keyword in the abstract files by substring search, and when the keyword is found there, the corresponding label and the index information of the data file are added to the dictionary γ, so that the keyword can be found through the dictionary γ in the next search. Updating the keyword dictionary during searching in combination with substring search makes the content of the dictionary more accurate and flexible, frees it from the limits of the keyword extraction algorithm, and greatly improves keyword search efficiency.
(3) In the keyword search method of the invention, the dictionary γ is set to be a fixed-length dictionary. When the label of a new keyword for a data file and the index information of that data file need to be added to the dictionary γ, that is, when a new label index pair needs to be added, and the dictionary is already full, the new label index pair replaces the label index pair with the smallest timestamp in the dictionary γ; when there are several new label index pairs, they replace the same number of label index pairs with the smallest timestamps. The updatable keyword dictionary thus uses a feedback mechanism similar to that of a fast table, which prevents the dictionary from growing continuously because of incorrect query records and burdening memory, and lets a dictionary of fixed size overwrite rarely used keywords with new ones.
Drawings
FIG. 1 is a flow chart of a data searchable encryption method of the present invention.
FIG. 2 is a flow chart of a keyword search method of the present invention.
FIG. 3 is a linked list index, linked list and FM map of the present invention.
FIG. 4 is a general block diagram of the data searchable encryption and keyword search methodology of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example 1
The embodiment discloses a data searchable encryption method, as shown in fig. 1, the steps are as follows:
step S1, acquiring the data file uploaded by the data owner;
step S2, extracting keywords of each data file; simultaneously, abstracting each data file to obtain abstract files; in this embodiment, the process of acquiring each summary file specifically includes:
firstly, extracting a summary file from a data file through a document summary extraction algorithm; and then, taking the file number corresponding to the abstract as an index, storing the abstract at a corresponding position, and performing character filling on the rest positions to form an abstract file.
Step S3, according to the corresponding relation between each keyword and each data file, generating a dictionary gamma after data processing is carried out through an encryption algorithm, wherein the dictionary gamma stores the label corresponding to each data file by each keyword and the index information corresponding to each data file by each keyword, and for each keyword, the label corresponding to each data file by the keyword and the index information corresponding to each data file by the keyword in the dictionary gamma are in one-to-one pairing relation; the specific process of generating the dictionary γ after performing data processing by an encryption algorithm according to the correspondence between each keyword and each data file is as follows:
step S31, establishing an empty table L, and selecting a master key K for the table L;
step S32, for each keyword, obtaining each data file that includes the keyword, and generating a pair of sub-keys K1, K2 for the keyword from the master key K:
K1←F(K,1||ω);
K2←F(K,2||ω);
Wherein omega is a keyword;
step S33, first, for each keyword, numbering each data file that includes the keyword to obtain a file number for each data file, and sorting the file numbers to obtain the sequence number of each file number; then, for each keyword, using the key K1 to generate labels in sequence for the file numbers corresponding to the keyword, and using the key K2 to encrypt each file number corresponding to the keyword in sequence, the encrypted result serving as the index information of the data file corresponding to the keyword, so as to obtain label index pairs (L_i, d_i):
L_i ← F(K1, i);
d_i ← Enc(K2, id_i);
i = 0, 1, …, N-1;
wherein L_i is the label generated with the key K1 for the i-th file number id_i corresponding to the keyword ω; d_i is the result of encrypting, with the key K2, the i-th file number id_i corresponding to the keyword ω, and serves as the index information of the data file whose file number is id_i for that keyword; N is the total number of data files that include the keyword ω;
step S34, inserting the label index pairs (L_i, d_i) obtained for each keyword into the table L in lexicographic order; adding a timestamp time_i to each label index pair (L_i, d_i) to obtain a table L containing the triples (L_i, d_i, time_i); and creating the dictionary γ from the table L; wherein the initial value of time_i is the time at which encryption of the index information d_i in the label index pair (L_i, d_i) is completed;
step S4, encrypting each data file to obtain an encrypted data file; and encrypting each abstract file to obtain the encrypted abstract file.
In this embodiment, substring-search encryption is performed on each abstract file using the Burrows-Wheeler transform (BWT) algorithm and the FM index technique. The BWT rearranges a data stream so that identical characters tend to be grouped together; in short, the data stream S is converted into an encoding W on which a compression algorithm achieves a high compression rate. The conversion proceeds roughly as follows: first, a termination token $ is appended to the input string S, and the algorithm builds a matrix W by cyclically shifting the resulting sequence; the shifted sequence of each iteration is appended to W as a new row. Finally, the rows of W are sorted in ascending lexicographic order.
The data obtained by the BWT is mapped into FM by the LF mapping technique. LF mapping takes the first column F and the last column L of the BWT matrix and reconstructs the original string S iteratively. Starting from the first element of the F and L columns, L is used as an index into the F column, and in each iteration the current element of the L column is pushed onto a last-in-first-out stack D; the value at the current position of the L column serves as the index into the F column in the next iteration. In the first iteration the pointer points to the first position of both F and L; from this, the corresponding position F[7] = s is found, and the current character of L (namely s) is pushed onto the stack D. In the next iteration, the current character of L gives the next F-column index, and the character i is pushed. The process ends when $ is reached in L. The algorithm then pops all elements from the stack D to obtain the original string S.
In the FM index technique, FM consists of three columns. The first is the F column and the second is the L column of the LF mapping, which corresponds to BWT(S). The last is the suffix array SA, which contains, for each row i, the position in the original string S of the substring in the i-th row of the matrix W obtained by the BWT.
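The BWT, the LF mapping and the suffix array described above can be illustrated on a short string. The sketch below is a plain, unoptimized rendering for reading purposes; the function names are hypothetical, and the input is assumed not to contain the terminator $.

```python
def bwt_matrix(s):
    """Return the F and L columns of the sorted rotation matrix W of s + '$'."""
    s = s + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    first = "".join(row[0] for row in rotations)   # F column
    last = "".join(row[-1] for row in rotations)   # L column = BWT(S)
    return first, last


def suffix_array(s):
    """SA[i] is the start position in s + '$' of the suffix in row i of W."""
    s = s + "$"
    return sorted(range(len(s)), key=lambda i: s[i:])


def invert_bwt(first, last):
    """Rebuild the original string from F and L with LF mapping and a stack."""
    # rank[i]: how many times last[i] has already occurred in last[:i]
    seen, rank = {}, []
    for ch in last:
        rank.append(seen.get(ch, 0))
        seen[ch] = rank[-1] + 1
    # start[c]: first row of F that holds character c
    start = {}
    for row, ch in enumerate(first):
        start.setdefault(ch, row)
    stack = []                  # the last-in-first-out stack D
    row = first.index("$")      # begin at the row whose F entry is '$'
    while last[row] != "$":     # stop when '$' is reached in L
        stack.append(last[row])
        row = start[last[row]] + rank[row]  # LF mapping: jump from L to F
    return "".join(reversed(stack))         # popping D yields S
```

For the string "banana" the F column is "$aaabnn", the L column is "annb$aa", and `invert_bwt` recovers "banana".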
The specific process of substring search encryption in this embodiment is as follows:
step S41, respectively creating a linked list aiming at each different character in the abstract file; for each character linked list, each node storage tuple is < nptr, addr >, nptr is a pointer pointing to the next node of the character linked list, addr is the position of the character at a certain position in the summary file in the FM index, and addr in different node storage tuples in the character linked list are the positions of the characters at different positions in the summary file in the FM index respectively; for example, a character T exists in 10 positions of a certain summary file, then addr in the storage tuple from the 1 st node to the 10 th node in the established linked list of the character T is the position of the character T in the 10 positions of the summary file in the FM index.
Aiming at each different character in the abstract file, the first node of each character linked list, namely the storage tuple of the linked list head, is encrypted to obtain:
wherein <nptr_1, addr_1> is the storage tuple of the first node of each character linked list, i.e., the linked list head; c_m is the m-th of the distinct characters of the abstract file, m = 1, 2, …, Y, and Y is the total number of distinct characters in the abstract file; K′ is a secondary key, and F_K′(c_m) denotes encrypting the character c_m with the secondary key K′.
Step S42, encrypting each distinct character in the digest file and taking the encrypted data of each distinct character as a linked list index, thereby obtaining a linked list index set; each linked list index is then mapped to the linked list head of the linked list of the corresponding character, giving the mapping relation between the linked list index and the linked list head for each distinct character. The encryption of each distinct character in the digest file yields:
wherein K is the master key and F_K(c_m) denotes encryption of the character c_m with the master key K; K′ is the secondary key and F_K′(c_m) denotes encryption of the character c_m with the secondary key K′.
Fig. 3 shows the linked list index set LLSET, the linked lists LL, and the FM table obtained through the above steps in this embodiment. Each linked list index is mapped to the head of one linked list, so that the corresponding linked list can be obtained through its linked list index; each node of each linked list is mapped to one group of FM tuples in the FM table, that is, to one row of the FM table, and through this mapping relation between the linked lists and the FM table the linked list corresponding to a given FM tuple can also be obtained. In fig. 3, the entry shown in each linked list node denotes the tuple stored in the i-th node of the linked list corresponding to the character c_m, i = 1, 2, 3, ….
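As a non-limiting illustration, the structures of steps S41 and S42 may be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the embodiment: F is modelled as HMAC-SHA256, addr is taken to be the row number of the character in column F of the FM table, the linked list index is taken to be F_K(c_m) alone, and the encryption of the linked list head is omitted. The names Node, f_column, build_character_lists, build_llset and walk are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """One linked list node storing the tuple <nptr, addr>."""
    addr: int                      # assumed: row number in column F of the FM table
    nptr: Optional["Node"] = None  # pointer to the next node, None at the tail


def F(key: bytes, data: bytes) -> bytes:
    """Keyed hash standing in for F_K(.); HMAC-SHA256 is an assumption."""
    return hmac.new(key, data, hashlib.sha256).digest()


def f_column(text: str) -> List[str]:
    """Column F of the FM table: the characters of text plus '$', sorted."""
    return sorted(text + "$")


def build_character_lists(text: str) -> Dict[str, Node]:
    """Step S41: one linked list per distinct character, returned by its head."""
    heads: Dict[str, Node] = {}
    tails: Dict[str, Node] = {}
    for addr, ch in enumerate(f_column(text)):
        if ch == "$":              # the sentinel is not a character of the file
            continue
        node = Node(addr)
        if ch in tails:
            tails[ch].nptr = node  # append to the existing list
        else:
            heads[ch] = node       # first occurrence becomes the list head
        tails[ch] = node
    return heads


def build_llset(master_key: bytes, heads: Dict[str, Node]) -> Dict[bytes, Node]:
    """Step S42: map the encrypted character (linked list index) to its list head."""
    return {F(master_key, ch.encode()): head for ch, head in heads.items()}


def walk(head: Optional[Node]) -> List[int]:
    """Collect the addr values of a linked list by following nptr."""
    out = []
    while head is not None:
        out.append(head.addr)
        head = head.nptr
    return out
```

For the digest text "banana", column F reads $, a, a, a, b, n, n, so the linked list of the character a holds the rows 1, 2 and 3, and looking up F(K, "a") in the linked list index set returns the head of that list.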
The embodiment also discloses a data searchable encryption system, which includes:
a data file acquisition unit, used for acquiring the data files uploaded by a data owner;
a keyword extraction unit, used for extracting the keywords of each data file;
a digest extraction unit, used for extracting a digest of each data file to obtain a digest file;
a dictionary generating unit, used for generating a dictionary γ through processing by an encryption algorithm according to the correspondence between each keyword and each data file, wherein the dictionary γ stores, for each keyword, the label of the keyword corresponding to each data file and the index information of the keyword corresponding to each data file, and for each keyword the labels and the index information are paired one to one;
a data file encryption unit, used for encrypting each data file to obtain an encrypted data file;
and a digest file encryption unit, used for performing substring-searchable encryption on the digest files to obtain encrypted digest files.
The embodiment also discloses a terminal, which comprises a processor and a memory for storing a program executable by the processor, wherein the data searchable encryption method of the embodiment is implemented when the processor executes the program stored in the memory. In this embodiment, as shown in fig. 4, the terminal may be a computer serving as a client: the data owner uploads data files to the computer, the computer executes the above data searchable encryption method to obtain the dictionary γ, the encrypted data files, and the encrypted digest files, and the computer may then upload these to a server, so that an authorized user (a user holding a master key issued by the system) can search for the corresponding data files by keyword.
Example 2
The embodiment discloses a keyword search method, as shown in fig. 2, including the following steps:
Step X1, first obtaining the dictionary γ, the encrypted data files, and the encrypted digest files produced by the data searchable encryption method of the embodiment;
when a keyword to be searched is received from a user, first determining, by searching the dictionary γ, whether any encrypted data file includes the keyword; if so, returning the corresponding encrypted data file to the user as the query result for decryption; if not, going to step X2;
in this embodiment, a specific process of determining whether there is any encrypted data file including the keyword by searching the dictionary γ for each keyword that needs to be searched is as follows:
Step X11, for each keyword to be searched by the user, generating a pair of sub-keys K′_1, K′_2 for the keyword from the master key K sent by the user:
K′_1 ← F(K, 1||ω′);
K′_2 ← F(K, 2||ω′);
wherein ω′ is the keyword to be searched by the user, and the function F() in this embodiment denotes a hash function. The master key K and each keyword to be searched are sent by the user together;
Step X12, for each keyword to be searched, traversing the file serial numbers of the data files and using the sub-key K′_1 to generate the label of the keyword corresponding to the data file of each file serial number:
L_i′ ← F(K′_1, i′); i′ = 0, 1, 2, …, I;
wherein i′ is the file serial number of the traversed data file, I is the maximum value of the file serial number of the traversed data files, and I+1 is the preset total number of data files including the keyword to be searched; L_i′ is the label of the keyword ω′ corresponding to the data file with file serial number i′;
Step X13, for each keyword to be searched, searching the dictionary γ for the labels L_i′ generated in step X12 for the data file of each file serial number;
If not, the data file corresponding to the keyword required to be searched cannot be searched through the dictionary γ, and the process proceeds to step X2.
If yes, the index information paired with the label is obtained from the dictionary γ and decrypted with the sub-key K′_2 of the keyword; the corresponding encrypted data file is acquired through the decrypted index information and returned to the user as the query result for decryption. At the same time, the timestamp of that label index pair stored in the dictionary γ is updated to the time at which the decryption of the index information with the sub-key K′_2 was completed;
the index information paired with the label is obtained from the dictionary γ as:
d_i′ ← Get(γ, L_i′);
wherein d_i′ is the index information paired with the label L_i′ in the dictionary γ;
the decrypted index information obtained with the sub-key K′_2 of the keyword is:
d_i ← Dec(K′_2, d_i′);
wherein d_i is the index information obtained by decrypting d_i′ with the sub-key K′_2 of the keyword ω′, and the decrypted index information d_i is the file number of a data file containing the keyword ω′;
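As a non-limiting illustration, steps X11 to X13 may be sketched in Python as follows. The sketch assumes that F is HMAC-SHA256, that the index information is a file number encrypted by XOR with a pad derived from K′_2 and the label, and that the dictionary γ is an ordinary Python dict; none of these choices is prescribed by the embodiment, and the helper names (derive_subkeys, make_label, enc_index, dec_index, add_keyword, search_dictionary) are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def F(key: bytes, data: bytes) -> bytes:
    """Keyed hash standing in for F(.,.); HMAC-SHA256 is an assumption."""
    return hmac.new(key, data, hashlib.sha256).digest()


def derive_subkeys(master_key: bytes, keyword: str) -> Tuple[bytes, bytes]:
    """Step X11: K'_1 <- F(K, 1||w'), K'_2 <- F(K, 2||w')."""
    w = keyword.encode()
    return F(master_key, b"1" + w), F(master_key, b"2" + w)


def make_label(k1: bytes, i: int) -> bytes:
    """Step X12: L_i' <- F(K'_1, i')."""
    return F(k1, str(i).encode())


def _pad(k2: bytes, label: bytes) -> bytes:
    # Toy pad for the XOR cipher below (assumption, not from the embodiment).
    return F(k2, b"pad" + label)[:8]


def enc_index(k2: bytes, label: bytes, file_no: int) -> bytes:
    plain = file_no.to_bytes(8, "big")
    return bytes(a ^ b for a, b in zip(plain, _pad(k2, label)))


def dec_index(k2: bytes, label: bytes, cipher: bytes) -> int:
    """Plays the role of Dec(K'_2, d_i'); the label is an extra nonce."""
    plain = bytes(a ^ b for a, b in zip(cipher, _pad(k2, label)))
    return int.from_bytes(plain, "big")


def add_keyword(gamma: Dict[bytes, bytes], master_key: bytes,
                keyword: str, file_numbers: List[int]) -> None:
    """Store one label index pair per data file containing the keyword."""
    k1, k2 = derive_subkeys(master_key, keyword)
    for i, file_no in enumerate(file_numbers):
        label = make_label(k1, i)
        gamma[label] = enc_index(k2, label, file_no)


def search_dictionary(gamma: Dict[bytes, bytes], master_key: bytes,
                      keyword: str) -> List[int]:
    """Steps X12 and X13: walk i' = 0, 1, 2, ... until no label is found."""
    k1, k2 = derive_subkeys(master_key, keyword)
    found = []
    i = 0
    while True:
        label = make_label(k1, i)
        d = gamma.get(label)       # None plays the role of the symbol ⊥
        if d is None:
            break
        found.append(dec_index(k2, label, d))
        i += 1
    return found
```

An empty result from search_dictionary corresponds to the case in which the keyword cannot be found through the dictionary γ and the process continues with step X2. A practical implementation would replace the toy XOR cipher with an established authenticated cipher.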
Step X2, performing a substring search in the encrypted digest file set for each keyword to be searched;
if the substring search finds the keyword in a digest file, the encrypted data file corresponding to that digest file is returned to the user as the query result; if the user confirms that the returned data file is correct, the data file is determined to include the keyword, the label of the keyword corresponding to the data file and the index information of the data file are calculated and added to the dictionary γ, and the dictionary γ is thereby updated. In this embodiment, the dictionary γ has a fixed length. When the label and index information for a new keyword, that is, a new label index pair, needs to be added and the dictionary is already full, the new label index pair replaces the label index pair with the smallest timestamp in the dictionary γ; when several new label index pairs need to be added, they replace the same number of label index pairs having the smallest timestamps in the dictionary γ.
If the substring search fails, a search-failure result is returned to the user.
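As a non-limiting illustration, the fixed-length dictionary γ with replacement of the label index pair having the smallest timestamp may be sketched in Python as follows. The class name FixedDictionary, the capacity parameter, and the use of a logical counter in place of a wall-clock timestamp are assumptions made so that the sketch is deterministic.

```python
import itertools
from typing import Dict, Optional, Tuple


class FixedDictionary:
    """Fixed-length store of label index pairs with timestamp-based replacement."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # label -> (index information, timestamp)
        self.entries: Dict[bytes, Tuple[bytes, int]] = {}
        # Logical clock standing in for real timestamps (assumption).
        self._clock = itertools.count()

    def get(self, label: bytes) -> Optional[bytes]:
        """Return the paired index information and refresh its timestamp."""
        item = self.entries.get(label)
        if item is None:
            return None
        index_info, _ = item
        self.entries[label] = (index_info, next(self._clock))
        return index_info

    def add(self, label: bytes, index_info: bytes) -> None:
        """Add a pair; when full, replace the pair with the smallest timestamp."""
        if label not in self.entries and len(self.entries) >= self.capacity:
            oldest = min(self.entries, key=lambda lab: self.entries[lab][1])
            del self.entries[oldest]
        self.entries[label] = (index_info, next(self._clock))
```

Because each successful lookup refreshes the timestamp, label index pairs of frequently searched keywords tend to stay in the dictionary while rarely searched ones are replaced first.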
In the above step X2, the substring search over the encrypted digest file set uses the Burrows-Wheeler transform and the FM index technique; the specific process is as follows:
Step X21, for the keyword ω′ to be searched, generating a keyword query token tk_T,S:
tk_T,S = F(K, ω′[1…M]) = F(K, ω′[1]), F(K, ω′[2]), …, F(K, ω′[M]), F(K′, ω′[M]);
wherein ω′[1], ω′[2], …, ω′[M] are the characters of the keyword ω′ to be searched, and M is the total number of characters of the keyword ω′; K′ is the secondary key, K′ = F(K, 2), and K is the master key;
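As a non-limiting illustration, the token generation of step X21 may be sketched in Python as follows, assuming that F is HMAC-SHA256 and that each character is encoded as UTF-8 before hashing; the function name query_token is hypothetical.

```python
import hashlib
import hmac
from typing import List


def F(key: bytes, data: bytes) -> bytes:
    """Keyed hash standing in for F(.,.); HMAC-SHA256 is an assumption."""
    return hmac.new(key, data, hashlib.sha256).digest()


def query_token(master_key: bytes, keyword: str) -> List[bytes]:
    """Build F(K, w'[1]), ..., F(K, w'[M]), F(K', w'[M]) with K' = F(K, 2)."""
    if not keyword:
        raise ValueError("keyword must contain at least one character")
    secondary_key = F(master_key, b"2")                 # K' = F(K, 2)
    token = [F(master_key, ch.encode()) for ch in keyword]
    # The last component encrypts the final character under the secondary key.
    token.append(F(secondary_key, keyword[-1].encode()))
    return token
```

The token therefore has M+1 components for a keyword of M characters: one per character under the master key K, plus the last character under the secondary key K′.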
Step X22, for each character ω′[m] of the keyword ω′ to be searched, m = 1, 2, 3, …, M, the character is first encrypted to obtain:
the resulting ciphertext is then looked up in the linked list index set,
and through the mapping relation between the linked list index and the linked list head, the linked list of each character ω′[m] is obtained;
Step X23, for the last character ω′[M] of the keyword ω′ to be searched, mapping each node in the linked list of the character ω′[M] to an encrypted FM tuple. Each encrypted FM tuple consists of three parts: the data of row j in column F of the FM, the data of row j in column L of the FM, and E(pos_j);
wherein E(pos_j) corresponds to the data of row j in column SA of the FM, pos_j denotes, in ciphertext form, the position in the digest file of the character corresponding to the data of row j of column SA, and n is the total number of rows of the FM;
wherein the data in row j of column F of the FM is defined in terms of the character corresponding to that data and the position number of that character;
and the data in row j of column L of the FM is defined in terms of the character corresponding to that data and the position number of that character;
for each encrypted FM tuple to which a node in the linked list of ω′[M] is mapped:
first, F_K(ω′[M]) is XORed with the data in column F of the FM to decrypt that data;
then the decrypted result is used as a key to decrypt the element of the first part of the data in column L of the FM;
finally, the value so obtained is XORed with the element of the second part of the data in column L of the FM to obtain an XOR operation result, and the process goes to step X24;
Step X24, for each XOR operation result obtained in the previous step, searching column F of the FM for the row whose data equals the XOR operation result and obtaining the FM tuple of that row; then, according to the mapping relation between each node of the linked lists and the FM tuples in the FM table as shown in fig. 3, finding the linked list that has a node mapped to that FM tuple, and thereby obtaining the character c_x corresponding to that linked list as the currently searched character; wherein x is the number of times data has so far been searched in column F of the FM; going to step X25;
Step X25, determining whether any of the characters c_x acquired in step X24 is the same as the character ω′[M−x];
if yes, judging whether the number x of data searching in the F column of the FM is equal to M-1 or not; if yes, ending substring search, successfully searching substrings, and enabling the corresponding abstract files to comprise keywords omega' needing to be searched; if not, go to step X26;
if not, ending substring search, and returning a result of substring search failure, namely, the corresponding abstract file does not contain the keyword omega';
Step X26, for each character c_x acquired in step X24 that is the same as the character ω′[M−x], taking each FM tuple obtained in step X24 when that character c_x was acquired, and performing the following operations on each such FM tuple:
first, F_K(c_x) is XORed with the data in column F of the FM to decrypt that data;
then the decrypted result is used as a key to decrypt the element of the first part of the data in column L of the FM;
finally, the value so obtained is XORed with the element of the second part of the data in column L of the FM to obtain an XOR operation result; the process then returns to step X24.
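For orientation only, the plaintext analogue of the search performed in steps X23 to X26 is the standard backward search over an FM index, which also matches the keyword from its last character to its first. The following Python sketch operates on unencrypted text and does not implement the encrypted tuples, the XOR decryption, or the linked list lookup of the embodiment; the function names bwt_index, occ and backward_search are hypothetical, and the text is assumed not to contain the sentinel character "$".

```python
from typing import Dict, List, Tuple


def bwt_index(text: str) -> Tuple[List[int], str, Dict[str, int]]:
    """Build the suffix array SA, column L (the BWT) and the table C."""
    s = text + "$"                                   # '$' sorts before letters
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
    last = "".join(s[i - 1] for i in sa)             # column L
    first = sorted(s)                                # column F
    c_table: Dict[str, int] = {}
    for row, ch in enumerate(first):
        c_table.setdefault(ch, row)                  # first row of each character
    return sa, last, c_table


def occ(last: str, ch: str, row: int) -> int:
    """Number of occurrences of ch in column L above the given row."""
    return last[:row].count(ch)


def backward_search(text: str, pattern: str) -> List[int]:
    """Return the start positions of pattern in text, matching right to left."""
    sa, last, c_table = bwt_index(text)
    lo, hi = 0, len(last)                            # current row range [lo, hi)
    for ch in reversed(pattern):                     # last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ(last, ch, lo)
        hi = c_table[ch] + occ(last, ch, hi)
        if lo >= hi:                                 # no row matches: search fails
            return []
    return sorted(sa[row] for row in range(lo, hi))
```

For the text "banana", backward_search("banana", "ana") yields the positions 1 and 3, while a pattern that does not occur yields an empty list, which corresponds to the substring search failure of step X25.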
The embodiment also discloses a keyword search system, which includes:
a data file acquisition module, used for acquiring the dictionary γ, the encrypted data files, and the encrypted digest files obtained by the data searchable encryption method of the embodiment;
a keyword receiving module, used for receiving each keyword to be searched that is sent by a user;
a first keyword searching module, used for determining, for each keyword to be searched, whether any encrypted data file includes the keyword by searching the dictionary γ;
a second keyword searching module, used for performing, for each keyword to be searched, a substring search in the encrypted digest file set in the case where no data file can be found through the dictionary γ;
a query result returning unit, used for returning the query results of the first keyword searching module and the second keyword searching module to the user;
and a dictionary γ updating unit, used for updating the dictionary γ according to the query result of the second keyword searching module, specifically: in the case where the user confirms that the query result of the second keyword searching module is correct, the data file serving as the query result is determined to include the keyword, and the label of the keyword corresponding to the data file and the index information of the data file are calculated and added to the dictionary γ.
The embodiment also discloses a computing device, which comprises a processor and a memory for storing the executable program of the processor, wherein when the processor executes the program stored in the memory, the keyword search method of the embodiment is realized.
In this embodiment, as shown in fig. 4, the computing device includes a client and a server. The client is a computer or other intelligent terminal and is user-facing: the user inputs the keyword to be searched and the master key through the client. After receiving the keyword and the master key, the client executes step X11 of the keyword search method of the embodiment to generate the pair of sub-keys of the keyword and sends the sub-keys to the server. After receiving the sub-keys, the server executes steps X12 and X13 of the keyword search method to determine, by searching the dictionary γ, whether the keyword ω′ to be searched can be found in the encrypted data files. The code with which the server performs steps X12 and X13 of the keyword search method of this embodiment is as follows:
For(i′ = 0; i′ != ⊥; i′++) {
    L_i′ ← F(K′_1, i′);      // compute the label L_i′ of the keyword ω′ for file serial number i′
    d_i′ ← Get(γ, L_i′);     // if the label L_i′ is present in the dictionary γ, obtain the paired index information d_i′
    d_i ← Dec(K′_2, d_i′);   // decrypt d_i′ to obtain the file number of a data file including the keyword ω′
    refresh(time);           // update the timestamp of the label index pair of the keyword ω′ in the dictionary γ
}
When no data file including the keyword ω′ can be found by searching the dictionary γ, the server performs steps X21 to X26 of the keyword search method, that is, it searches the digest file set for a digest file including the keyword ω′; if the search succeeds, the server returns the corresponding encrypted data file to the client and adds the label index pair corresponding to the keyword ω′ to the dictionary γ, thereby updating the dictionary γ.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.