CN108366072B

CN108366072B - Cloud storage method supporting voice encryption search

Info

Publication number: CN108366072B
Application number: CN201810182984.8A
Authority: CN
Inventors: 李会格; 张方国
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2018-03-06
Filing date: 2018-03-06
Publication date: 2020-07-24
Anticipated expiration: 2038-03-06
Also published as: CN108366072A

Abstract

The invention belongs to the field of cloud storage and relates in particular to a cloud storage method supporting voice encryption search. A user, a speech recognizer and a cloud server together solve the problem of searching encrypted data in the cloud by voice. They do so by running five algorithms: a key generation algorithm, an encryption algorithm, a voice search instruction generation algorithm, an encrypted document search algorithm and a local decryption algorithm. Compared with existing text-based encrypted search, voice encryption search can run on any device that supports voice input, which reduces the dependence on a keyboard while keeping user data private. In addition, voice encryption search verifies the searcher's voice and is designed to rule out both imitation by other people and playback of recordings from electronic devices, so the user's search rights are better protected.

Description

Cloud storage method supporting voice encryption search

Technical Field

The invention belongs to the field of cloud storage and relates in particular to a cloud storage method that supports searching encrypted documents by voice.

Background

Information storage has gone through three stages: paper archives, local electronic archives and cloud storage. Cloud storage uses application software to make many different types of storage devices in a network work together and to provide data storage and service access to the outside. Such a system lets users query their personal data from different devices at any time and in any place, and it reduces the cost of maintaining their own devices to some extent. As a result, more and more users save personal data to cloud products such as Amazon S3 and Baidu Cloud.

However, once data is stored in the cloud, the user loses direct control over it, and the sensitive information it contains is a frequent target of theft. To protect privacy, a user usually encrypts the data and stores only the ciphertext in the cloud. The choice of encryption algorithm is crucial, because it determines how quickly and accurately routine operations such as search and comparison can be carried out on the encrypted data. For search, Song et al. first proposed the concept of symmetric searchable encryption in 2000 and gave a concrete construction.

Symmetric searchable encryption schemes typically involve two parties, the user and the server. The user first encrypts the plaintext document D with a secure symmetric encryption algorithm and uploads the resulting ciphertext C to the server. If cloud storage overhead is not a concern, the user can also generate an encrypted search index I for D and store I on the server together with C. To query the documents containing a keyword w, the user locally generates a search instruction t(w) for w with the user's own key and sends t(w) to the server. If there is no index I, the cloud server uses t(w) to run a matching test against the ciphertext document C and returns C to the user if the test passes. If the index I exists, the server combines t(w) and I to compute the pointers of the target documents and returns the encrypted documents C that those pointers refer to. Finally, the user decrypts the received ciphertext C locally with the user's own key.

Since symmetric searchable encryption was proposed, follow-up research has focused mainly on extending search functionality and on security analysis. In terms of functionality, the initial algorithms supported only single-keyword search; researchers later proposed algorithms for Boolean queries, fuzzy search, subset search, range search, ranked search and so on. In terms of security analysis, researchers have mainly studied how secure the existing symmetric searchable encryption algorithms are.

In practice, current symmetric searchable encryption algorithms essentially address search in text form only, and they depend heavily on a keyboard and a display. However, some small smart devices have no text input function, for example smart bands, glasses and in-vehicle instruments, and these devices currently cannot perform encrypted search. Such devices usually provide a voice assistant that lets users interact by voice in real time. In 2013, Scott Huffman, a senior engineering director at Google, said that future devices would need only voice and no display.

Besides removing the need for hands and eyes to work together, voice search is much faster than traditional text input. Although voice search engines exist today, they do not consider the privacy of the queried data or of the voice input. If someone else obtains the user's voice and the legitimacy of the voice source is not checked, the privacy of the user's cloud data is easily threatened. Moreover, text search cannot effectively detect whether a keyword was entered by the user or by someone else, whereas voice search can. In 2017, Chen et al. proposed a system that uses a magnetometer to accurately recognize voice, but that system does not address encrypted voice search in the cloud.

In view of these problems, there is a need for a cloud storage method that uses voice to search encrypted electronic documents in the cloud conveniently and securely.

Disclosure of Invention

The invention aims to provide a cloud storage method supporting voice encryption search. With voice search, a user can quickly and conveniently query the user's own cloud files from any device that supports voice input. The method can also determine whether the searcher's voice is genuine. Compared with existing text search, it therefore protects the user's privacy better.

The voice-enabled symmetric searchable encryption method provided by the invention involves three participants: a user, a speech recognizer and a cloud server.

The user encrypts local data and uploads it to the cloud server, and also inputs the voice information to be queried.

The speech recognizer checks the authenticity of the voice information input by the user and extracts the main keyword information from it.

The cloud server stores and manages the user's data and helps the user perform search tasks.

The invention comprises the following steps, in chronological order:

(S1) The key generation algorithm is run. The user inputs a system security parameter k, and the algorithm outputs n + 2 keys of k bits each: K₁, K₂ and one document key for each of the n documents, where n is the total number of documents to be uploaded by the user.
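As an illustration of step (S1), the following is a minimal Python sketch of key generation. The function name `keygen`, the dictionary layout and the restriction to whole-byte key lengths are assumptions made for this example; the method itself only requires n + 2 independent keys of k bits each.

```python
import secrets


def keygen(k_bits: int, n_docs: int) -> dict:
    """Return n + 2 independent random keys, each k_bits long (hypothetical helper)."""
    if k_bits <= 0 or k_bits % 8 != 0:
        raise ValueError("this sketch only handles positive whole-byte key lengths")
    nbytes = k_bits // 8
    return {
        "K1": secrets.token_bytes(nbytes),  # key for the pseudo-random function f1
        "K2": secrets.token_bytes(nbytes),  # key for the pseudo-random function f2
        # one document key per document D_1..D_n
        "doc_keys": [secrets.token_bytes(nbytes) for _ in range(n_docs)],
    }


keys = keygen(k_bits=256, n_docs=3)
total_keys = 2 + len(keys["doc_keys"])  # n + 2 keys in total
```

With k = 256 and n = 3, the sketch yields five 32-byte keys.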

(S2) The encryption algorithm is run. Using a symmetric encryption algorithm, the user encrypts the local documents D₁, D₂, …, D_n as C₁, C₂, …, C_n and constructs an encrypted index table I for them. Finally, the user stores the encrypted documents C₁, C₂, …, C_n and the index table I on the cloud server.

(S3) The voice search instruction generation algorithm is run. This algorithm is carried out mainly by the speech recognizer and comprises the following processes:

(S3a) Voice recognition detection. While the user inputs the voice information w, the speech recognizer examines the voice to judge whether it is the genuine voice of the user. If the detection result shows that it is not the user's voice, the recognizer refuses to perform any subsequent operation and reports that the query input is invalid. If the detection result shows that it is the user's own voice, the speech text keyword extraction process (S3b) is performed.

(S3b) Speech text keyword extraction. If the voice recognition detection result shows that the voice is the user's own, the speech recognizer extracts the main text keyword w′ from the voice information w. Here w′ may be the entire content of w, in which case the two are identical, or only part of it, in which case w′ consists of the main keywords in w.

(S3c) Speech text keyword encryption. The speech recognizer uses the keys K₁ and K₂ to encrypt the extracted main text keyword w′, generates the voice search instruction T(w′) and sends T(w′) to the cloud server.

(S4) The encrypted document search algorithm is run by the cloud server. Using T(w′), the cloud server computes the pointers in the index I of the documents the user requires and then sends the corresponding encrypted documents C_{i_j} to the user. Here C_{i_j} denotes those documents among the encrypted documents C₁, C₂, …, C_n that are associated with the voice information w, and the indices i_j range over a subset of {1, …, n}.

(S5) The local decryption algorithm is run by the user. After receiving the document set C_{i_j}, the user decrypts each document C_{i_j} with the corresponding document key to obtain the plaintext D_{i_j}, where the document key is the key that was used to encrypt the document D_{i_j} and i_j ∈ {1, …, n}.

In step (S2), the user uses a symmetric encryption algorithm both to encrypt the documents and to create the index table.

Specifically, the process of encrypting the document by the user is as follows:

(S2a) The user takes the document key for D_i as input and encrypts the document D_i with the symmetric encryption algorithm; the encryption result is denoted C_i, where i = 1, …, n.

Specifically, the process of constructing the index I by the user is as follows:

(S2b) The user extracts the keyword set W = {w₁, …, w_m} from the documents D₁, D₂, …, D_n, where m is the total number of keywords. Using the pseudo-random function f₁: {0,1}^k × {0,1}^* → {0,1}^k, the user encrypts the keywords one by one: tr(w_j) = f₁(K₁, w_j), j = 1, …, m. The user then selects m n-dimensional arrays D(w₁), …, D(w_m), initialized to empty, and assigns them as follows: if the i-th document D_i contains the keyword w_j, the i-th bit of D(w_j) is set to 1, and otherwise to 0, for i = 1, …, n and j = 1, …, m. Next, using the pseudo-random function f₂: {0,1}^k × {0,1}^* → {0,1}^k, the user computes k_{w_j} = f₂(K₂, w_j) for each keyword w_j, j = 1, …, m. Taking k_{w_j} as the key of the symmetric encryption algorithm, the user encrypts D(w_j) and denotes the result e(w_j), j = 1, …, m. For convenience, the letter l denotes the length of each encryption result e(w_j). Finally, the user sorts the pairs (tr(w_j), e(w_j)), j = 1, …, m, in lexicographic order and stores the sorted result in an array I of dimension m × (k + l).
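The index construction in (S2b) can be sketched in Python as follows. Several choices here are assumptions made for illustration only: HMAC-SHA256 stands in for the pseudo-random functions f₁ and f₂ (so k = 256), a toy XOR keystream cipher `stream_xor` stands in for the symmetric encryption algorithm (it is not AES and is not meant for production use), and D(w) is stored with one byte per document rather than one bit for readability.

```python
import hashlib
import hmac


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in for f1 / f2: HMAC-SHA256 (assumption, gives k = 256 bits)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for Enc / Dec: XOR with an HMAC-derived keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = prf(key, offset.to_bytes(8, "big"))
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def build_index(k1: bytes, k2: bytes, docs: list) -> list:
    """docs[i] is the keyword set of document D_(i+1); returns the sorted index I."""
    keywords = sorted(set().union(*docs))
    index = []
    for w in keywords:
        wb = w.encode("utf-8")
        # D(w): entry i is 1 if document i contains w, else 0 (one byte per document)
        bitmap = bytes(1 if w in d else 0 for d in docs)
        tr = prf(k1, wb)            # tr(w) = f1(K1, w)
        k_w = prf(k2, wb)           # k_w  = f2(K2, w)
        index.append((tr, stream_xor(k_w, bitmap)))  # (tr(w), e(w))
    index.sort()                    # lexicographic order on tr(w)
    return index


k1, k2 = b"\x01" * 32, b"\x02" * 32
docs = [{"cloud", "voice"}, {"voice"}, {"cloud", "search"}]
index = build_index(k1, k2, docs)
```

Sorting by tr(w) is what later lets the server locate an entry by dictionary lookup without learning the keyword itself.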

In step (S3), generating the voice search instruction comprises three processes: voice recognition detection, speech text keyword extraction and speech text keyword encryption.

Preferably, the specific process of voice recognition detection is as follows:

While the user inputs the voice information w to be queried, the speech recognizer first uses a magnetometer to check whether the voice w is accompanied by magnetic field information; if it is, the search service is terminated. If no magnetic field information is detected, the recognizer further rules out imitation by another person by calling an existing automatic speaker verification (ASV) algorithm on the voice. If the ASV algorithm finds that the voice is indeed the user's own, the next step is performed; otherwise a termination symbol is output.
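The two-stage check described above amounts to a simple gate. The Python sketch below shows only the control flow; `magnetic_field_detected` and `asv_accepts` are hypothetical callables standing in for the magnetometer check and the ASV algorithm, neither of which is specified in detail here, and `None` is used as the termination symbol.

```python
from typing import Callable, Optional


def recognition_gate(
    sample: bytes,
    magnetic_field_detected: Callable[[bytes], bool],
    asv_accepts: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the sample if both checks pass, else None (the termination symbol)."""
    if magnetic_field_detected(sample):
        # magnetic field present: playback from an electronic device is suspected
        return None
    if not asv_accepts(sample):
        # speaker verification failed: the voice is not the user's own
        return None
    return sample


# Stub detectors for demonstration only.
accepted = recognition_gate(b"voice", lambda s: False, lambda s: True)
replayed = recognition_gate(b"voice", lambda s: True, lambda s: True)
imitated = recognition_gate(b"voice", lambda s: False, lambda s: False)
```

Running the magnetometer check first means the ASV algorithm is only invoked for input that was not flagged as device playback.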

Preferably, the specific process of extracting the keywords of the voice text is as follows:

The speech recognizer converts the user's voice into the corresponding digital signal using an audio analog-to-digital converter and extracts the main text keyword information w′ from it using a hidden Markov model.

Preferably, the specific process of encrypting the speech text keyword is as follows:

The speech recognizer uses the pseudo-random function f₁: {0,1}^k × {0,1}^* → {0,1}^k and the key K₁ to compute tr(w′) = f₁(K₁, w′) on the text keyword information w′. It then uses the pseudo-random function f₂: {0,1}^k × {0,1}^* → {0,1}^k and the key K₂ to compute k_{w′} = f₂(K₂, w′). Finally, the speech recognizer sets T(w′) = (tr(w′), k_{w′}) and sends it to the cloud server.
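A minimal Python sketch of this search instruction follows. As an assumption for illustration, HMAC-SHA256 plays the role of both f₁ and f₂ (keyed with K₁ and K₂ respectively), and the function name `search_instruction` is hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in for f1 / f2: HMAC-SHA256 (assumption, k = 256 bits)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def search_instruction(k1: bytes, k2: bytes, keyword: str) -> tuple:
    """Return T(w') = (tr(w'), k_w') for the extracted text keyword w'."""
    wb = keyword.encode("utf-8")
    tr = prf(k1, wb)    # tr(w') = f1(K1, w'): locates the index entry
    k_w = prf(k2, wb)   # k_w'   = f2(K2, w'): decrypts e(w') on the server
    return tr, k_w


k1, k2 = b"\x01" * 32, b"\x02" * 32
t_first = search_instruction(k1, k2, "voice")
t_again = search_instruction(k1, k2, "voice")
t_other = search_instruction(k1, k2, "cloud")
```

Because both functions are deterministic, the same keyword always yields the same T(w′), which is what allows the server to match it against the stored tr(w_j).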

In step (S4), the cloud server searches the encrypted documents as follows:

(S4a) The cloud server uses the first component tr(w′) of T(w′) = (tr(w′), k_{w′}) to look up the entry (tr(w′), e(w′)) in the index I by dictionary search. It then takes the second component k_{w′} of T(w′) as the decryption key of the algorithm and decrypts e(w′) to obtain D(w′). If the i_j-th bit of D(w′) is 1, the i_j-th encrypted document C_{i_j} is returned to the user; otherwise the document C_{i_j} is not returned, where i_j ∈ {1, …, n}.
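The server-side lookup in (S4a) can be sketched in Python as below. The sketch assumes the index is a list of (tr, e) pairs sorted by tr, uses HMAC-SHA256 as a stand-in pseudo-random function, uses a toy XOR keystream cipher in place of the Dec algorithm, and stores D(w) with one byte per document; `server_search` is a hypothetical name.

```python
import bisect
import hashlib
import hmac


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in pseudo-random function: HMAC-SHA256 (assumption)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for Enc / Dec: XOR with an HMAC-derived keystream (not AES)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = prf(key, offset.to_bytes(8, "big"))
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def server_search(index: list, instruction: tuple) -> list:
    """Return the 1-based numbers i_j of the documents whose bit in D(w') is 1."""
    tr, k_w = instruction
    tags = [entry[0] for entry in index]
    pos = bisect.bisect_left(tags, tr)            # dictionary lookup on tr(w')
    if pos == len(tags) or tags[pos] != tr:
        return []                                 # keyword not in the index
    bitmap = stream_xor(k_w, index[pos][1])       # D(w') = Dec(k_w', e(w'))
    return [i for i, bit in enumerate(bitmap, start=1) if bit == 1]


# Tiny demonstration index over three documents.
k1, k2 = b"\x01" * 32, b"\x02" * 32
plain = {"cloud": bytes([1, 0, 1]), "voice": bytes([1, 1, 0])}
index = sorted(
    (prf(k1, w.encode()), stream_xor(prf(k2, w.encode()), d))
    for w, d in plain.items()
)
hits = server_search(index, (prf(k1, b"voice"), prf(k2, b"voice")))
misses = server_search(index, (prf(k1, b"absent"), prf(k2, b"absent")))
```

Here `hits` is `[1, 2]`, so the server would return C₁ and C₂. Note that the server learns D(w′) for the queried keyword only, since k_{w′} decrypts just that one index entry.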

In step (S5), the user decrypts the documents locally as follows:

(S5a) The user takes the document key of each returned document C_{i_j} as the decryption key of the algorithm and decrypts C_{i_j} to obtain the plaintext documents D_{i_j} related to the voice information w.
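The per-document encryption of (S2a) and the local decryption of (S5a) can be illustrated together. In the Python sketch below, a toy XOR keystream cipher built from HMAC-SHA256 stands in for Enc and Dec (an assumption for illustration; it is neither AES nor authenticated), and `decrypt_results` is a hypothetical helper that maps each returned document number i_j to its plaintext.

```python
import hashlib
import hmac


def stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for Enc / Dec: XOR with an HMAC-derived keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hmac.new(key, offset.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def decrypt_results(doc_keys: list, returned: dict) -> dict:
    """returned maps the 1-based document number i_j to the ciphertext C_{i_j}."""
    return {i: stream_xor(doc_keys[i - 1], c) for i, c in returned.items()}


# One document key per document, as produced by key generation.
doc_keys = [bytes([i]) * 32 for i in (1, 2, 3)]
documents = [b"voice notes", b"meeting audio", b"cloud invoice"]

# (S2a): C_i is D_i encrypted under its own document key.
ciphertexts = [stream_xor(key, doc) for key, doc in zip(doc_keys, documents)]

# (S5a): the server returned documents 1 and 3; decrypt them locally.
returned = {1: ciphertexts[0], 3: ciphertexts[2]}
plaintexts = decrypt_results(doc_keys, returned)
```

Because each document has its own key, the user only needs the keys of the documents that were actually returned.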

The present invention has the following advantageous effects.

1. The privacy of cloud documents is protected. Plaintext documents are encrypted before they are stored on the cloud server, and because an attacker cannot obtain the user's decryption key, the privacy of the user's data is protected against intrusion by others.

2. Queries are convenient and fast for the user. Because the data is stored in the cloud, users can look up their own documents from any device at any time and in any place. In addition, voice search is faster than conventional text search, which makes the method particularly useful for elderly and teenage users.

3. The user's query rights are protected from infringement by others. Searching by voice not only removes the current need for hands and eyes to cooperate but also verifies the identity of the searcher, so the user's query rights are better protected.

Drawings

FIG. 1 is a diagram of the system framework of the present invention.

FIG. 2 is a flow chart of the internal operation of a speech recognizer.

Detailed Description

The technical solution of the present invention is described in detail below with reference to the accompanying drawings, taking Example 1 as an example. The mathematical notation used is briefly introduced first.

- Enc(·,·), Dec(·,·): a secure deterministic symmetric encryption algorithm, such as the AES algorithm, where Enc is the encryption algorithm and Dec is the corresponding decryption algorithm. The length of the ciphertext output by Enc is denoted by the letter l.

- K₁, K₂ and the n document keys: the keys used to build the index and to encrypt the documents; each key is k bits long.

- f_i: {0,1}^k × {0,1}^* → {0,1}^k: a pseudo-random function, i = 1, 2.

- D₁, D₂, …, D_n: the plaintext documents to be uploaded.

- D(w): an n-bit 0-1 string indicating which documents contain the keyword w.

- W: the set of keywords w in D₁, D₂, …, D_n.

- I: an array of dimension m × (k + l) whose elements are arranged in lexicographic order.

Example 1

The voice-enabled symmetric searchable encryption algorithm comprises five steps.

First step: the user runs the key generation algorithm. The user inputs a system security parameter k and generates n + 2 keys of k bits each: K₁, K₂ and one document key for each of the n documents, where n is the total number of documents to be uploaded by the user.

Second step: the user runs the encryption algorithm. First, the user takes the document key for D_i as the key of the symmetric encryption algorithm Enc(·,·) and encrypts D_i; the result is C_i, for i = 1, …, n.

The user then generates an index for these documents. Specifically, the user first extracts the keywords w₁, …, w_m from the documents D₁, D₂, …, D_n and stores them in the set W = {w₁, …, w_m}. Taking K₁ and K₂ as the keys of the pseudo-random functions f₁ and f₂ respectively, the user computes, for each keyword w_j in W, tr(w_j) = f₁(K₁, w_j) and k_{w_j} = f₂(K₂, w_j), where j = 1, …, m. Then the user selects m n-dimensional arrays D(w₁), …, D(w_m), initialized to empty, and assigns them as follows: if the i-th document D_i contains the keyword w_j, the i-th bit of D(w_j) is set to 1, and otherwise to 0 (i = 1, …, n; j = 1, …, m). Subsequently, the user takes k_{w_j} as the key of the deterministic symmetric encryption algorithm Enc(·,·), encrypts D(w_j) and denotes the result e(w_j), j = 1, …, m. Because D(w₁), …, D(w_m) all have the same length, e(w₁), …, e(w_m) also have the same length; for convenience, the letter l denotes this length. After these steps, the user sorts the pairs (tr(w_j), e(w_j)), j = 1, …, m, in lexicographic order and stores the sorted results in sequence in an array I of dimension m × (k + l).

Finally, the user uploads the encrypted documents C₁, …, C_n and the index I to the cloud.

Third step: the voice search instruction generation algorithm is run. In this step, the speech recognizer verifies the voice information input by the user, extracts the corresponding text keyword information and finally generates the search instruction for the text keyword.

Specifically, while the user inputs the voice information w to be queried, the speech recognizer automatically calls its voice recognition module to verify the searcher's voice. The module first uses the magnetometer to examine the voice w and judge whether it is accompanied by magnetic field information; if so, the search service is terminated. Otherwise, an existing automatic speaker verification (ASV) algorithm is applied to rule out attacks that imitate the user's voice artificially. The ASV algorithm mainly examines spectral and prosodic information to detect artificial voice, and its accuracy can reach 99%. If the ASV algorithm judges that the voice really is the user's, the search system converts the user's voice into a digital signal with an audio analog-to-digital converter and extracts the text keyword information w′ from it with a hidden Markov model; otherwise the speech recognizer outputs a termination symbol.

Subsequently, the speech recognizer applies the pseudo-random functions f₁ and f₂, with the keys K₁ and K₂ respectively, to the text keyword information w′: tr(w′) = f₁(K₁, w′), k_{w′} = f₂(K₂, w′). It sets T(w′) = (tr(w′), k_{w′}) and finally sends this value to the cloud server.

Fourth step: the cloud server runs the encrypted document search algorithm. The cloud server uses the first component tr(w′) of T(w′) = (tr(w′), k_{w′}) to look up the entry (tr(w′), e(w′)) in the index I by dictionary search. It then takes the second component k_{w′} of T(w′) as the decryption key of the Dec(·,·) algorithm and decrypts e(w′) to obtain D(w′). If the i_j-th bit of D(w′) is 1, the i_j-th encrypted document C_{i_j} is returned to the user; otherwise the document C_{i_j} is not returned.

Fifth step: the user runs the decryption algorithm locally. After receiving the ciphertexts C_{i_j}, the user takes the corresponding document key as the decryption key of the symmetric decryption algorithm Dec(·,·) and decrypts each document C_{i_j}. The user finally obtains the plaintext documents D_{i_j} related to the voice information w.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principles of the present invention, and such modifications and improvements should also be considered within the scope of protection of the present invention.

Claims

1. A cloud storage method supporting voice encryption search is characterized by comprising three participants: a user, a speech recognizer and a cloud server;

the method comprises the following steps:

S1, running a key generation algorithm: the user inputs a system security parameter k, and the algorithm outputs n + 2 keys of k bits each, namely K₁, K₂ and one document key for each of the n documents, wherein n represents the total number of documents to be uploaded by the user;

S2, running an encryption algorithm: the user encrypts the local documents D₁, D₂, …, D_n as ciphertext documents C₁, C₂, …, C_n and constructs an encrypted index table I for them; finally, the user stores the encrypted documents C₁, C₂, …, C_n and the index table I on the cloud server;

s3, operating the voice search instruction generation algorithm, including the following processes:

S3a, a voice recognition detection process: while the user inputs the voice information w, the speech recognizer examines the voice to judge whether it is the genuine voice of the user; if the detection result shows that it is not the user's voice, any subsequent operation is refused and the user is notified that the query input is invalid; if the detection result shows that it is the user's own voice, the speech text keyword extraction process S3b is performed;

S3b, a speech text keyword extraction process: if the voice recognition detection result shows that the voice is the user's own, the speech recognizer extracts the main text keyword w′ from the voice information w; the main text keyword w′ is either the entire content of the voice information w or only part of it;

S3c, a speech text keyword encryption process: the speech recognizer uses the keys K₁ and K₂ to encrypt the extracted main text keyword w′, generates the voice search instruction T(w′) and finally sends T(w′) to the cloud server;

S4, running an encrypted document search algorithm: the cloud server uses T(w′) to compute the pointers in the index I of the documents required by the user and then sends the corresponding encrypted documents C_{i_j} to the user, wherein C_{i_j} denotes the documents among the encrypted documents C₁, C₂, …, C_n that are associated with the voice information w, and i_j ∈ {1, …, n};

S5, running a local decryption algorithm: after receiving the document set C_{i_j}, the user decrypts each document C_{i_j} with the corresponding document key to obtain the plaintext D_{i_j}, wherein the document key is the key used in encrypting the document D_{i_j}, and i_j ∈ {1, …, n}.

2. The cloud storage method supporting the voice encryption search according to claim 1, wherein: in step S2, a symmetric encryption algorithm is used in the process of encrypting the document and creating the index table by the user.

3. The cloud storage method supporting the voice encryption search according to claim 2, wherein: the process of encrypting the document by the user is as follows:

S2a, the user takes the document key for D_i as input and encrypts the document D_i using the symmetric encryption algorithm Enc(·,·); the encryption result is denoted C_i, wherein i = 1, …, n;

the process of the user for constructing the index I is as follows:

S2b, the keyword set W = {w₁, …, w_m} is extracted from the documents D₁, D₂, …, D_n, wherein m represents the total number of keywords; the user uses the pseudo-random function f₁: {0,1}^k × {0,1}^* → {0,1}^k to encrypt the keywords one by one: tr(w_j) = f₁(K₁, w_j), j = 1, …, m; m n-dimensional arrays D(w₁), …, D(w_m), initialized to empty, are selected and assigned as follows: if the i-th document D_i contains the keyword w_j, the i-th bit of D(w_j) is set to 1, and otherwise to 0, i = 1, …, n, j = 1, …, m; the user then uses the pseudo-random function f₂: {0,1}^k × {0,1}^* → {0,1}^k to compute k_{w_j} = f₂(K₂, w_j) for each keyword w_j; taking k_{w_j} as the key of the symmetric encryption algorithm Enc(·,·), the user encrypts D(w_j), and the encryption result is denoted e(w_j), wherein j = 1, …, m; the letter l denotes the length of the encryption result e(w_j) (j = 1, …, m); finally, the user sorts the pairs (tr(w_j), e(w_j)) (j = 1, …, m) in lexicographic order and stores the sorted result in the array I of dimension m × (k + l).

4. The cloud storage method supporting voice encryption search according to claim 1, wherein: the voice recognition detection process is specifically as follows:

while the user inputs the voice information w to be queried, the speech recognizer first uses a magnetometer to check whether the voice information w is accompanied by magnetic field information, and if so, the speech recognizer terminates the search service; if the detection result shows that no magnetic field information exists, the speech recognizer calls an automatic speaker verification algorithm on the voice information w to further rule out imitation by others; if the voice information w is found to be the user's own voice, the next step is performed; otherwise a termination symbol is output.

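5. The cloud storage method supporting voice encryption search according to claim 1, wherein: the process of extracting the keywords of the voice text is as follows:

the speech recognizer converts the user's voice into the corresponding digital signal using an audio analog-to-digital converter and extracts the main text keyword information w′ from it using a hidden Markov model.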

6. The cloud storage method supporting voice encryption search according to claim 1, wherein: the process of encrypting the speech text keywords is as follows:

the speech recognizer first uses the pseudo-random function f₁: {0,1}^k × {0,1}^* → {0,1}^k and the key K₁ to compute tr(w′) = f₁(K₁, w′) on the text keyword information w′; it then uses the pseudo-random function f₂: {0,1}^k × {0,1}^* → {0,1}^k and the key K₂ to compute k_{w′} = f₂(K₂, w′); finally, the speech recognizer sets T(w′) = (tr(w′), k_{w′}) and sends it to the cloud server.

7. The cloud storage method supporting voice encryption search according to claim 1, wherein: the specific operation of the cloud server in step S4 to search for an encrypted document is as follows:

S4a, the cloud server uses the first component tr(w′) of T(w′) = (tr(w′), k_{w′}) to find the entry (tr(w′), e(w′)) in the index I by dictionary lookup; the second component k_{w′} of T(w′) is then taken as the decryption key of the Dec(·,·) algorithm, and e(w′) is decrypted to obtain D(w′); if the i_j-th bit of D(w′) is 1, the i_j-th encrypted document C_{i_j} is returned to the user; otherwise the document C_{i_j} is not returned.

8. The cloud storage method supporting voice encryption search according to claim 1, wherein: the process of the user locally decrypting the document in the step S5 is as follows:

S5a, the user takes the document key of each returned document C_{i_j} as the decryption key of the Dec(·,·) algorithm and decrypts the returned document C_{i_j} to obtain the plaintext document D_{i_j} related to the voice information w.