CN112836005B - Cipher text sequencing search method and system based on PCA

Info

Publication number: CN112836005B
Authority: CN (China)
Legal status: Active (as listed; an assumption, not a legal conclusion)
Application number: CN201911167134.1A
Other languages: Chinese (zh)
Other versions: CN112836005A (en)
Inventors: 刘良桂, 刘政金
Current assignee: Zhejiang Shuren University (as listed; may be inaccurate)
Original assignee: Zhejiang Shuren University
Application filed by: Zhejiang Shuren University
Priority: CN201911167134.1A
Application publication: CN112836005A
Granted publication: CN112836005B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Abstract

The invention discloses a PCA-based ciphertext ranked search method and system. The PCA algorithm is used to reduce the dimensionality of the keyword index matrix, which in turn reduces the dimensionality of the key and greatly improves data encryption speed and search efficiency. To counter privacy intrusion by unauthorized users and untrusted servers, an invertible-matrix encryption method protects the data; on this basis, the threshold is set randomly so that the dimensionality reduction itself is randomized, which further improves data security. In addition, based on a close study of the principles of the PCA algorithm, the invention appends an identity matrix to the index matrix before dimensionality reduction, so that the components of the query vector also take part in the dimensionality reduction, which improves security while preserving query precision.

Description

Cipher text sequencing search method and system based on PCA
Technical Field
The invention relates to the technical field of cloud computing and network security, and in particular to a PCA-based ciphertext ranked search method and system.
Background
The rise of cloud storage has taken computing a step forward: enterprises and users can store huge volumes of data on cloud servers, free from the limits of local storage space, and enjoy faster and more convenient services, such as saving local storage space, sharing information, and processing information with fast cloud services. However, while cloud storage brings convenience to users, it also introduces data security risks. For example, users' personal information, confidential enterprise documents, and other sensitive private information can easily be extracted and leaked by the server. To protect privacy and security, the data owner must therefore encrypt the data before storing it in the cloud. Encryption prevents attacks by illegal users, unauthorized users, and untrusted cloud servers and so prevents privacy disclosure, but it also makes searching a large volume of encrypted cloud data much harder and sharply reduces search efficiency, because encrypted data cannot be retrieved as conveniently as plaintext. Research on ciphertext ranked search schemes is therefore urgently needed.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a PCA-based ciphertext ranked search method that reduces computation overhead and achieves efficient ciphertext search on a remote server while protecting user privacy.
The technical scheme adopted by the embodiments of the invention is as follows:
One aspect of the embodiments of the present invention provides a PCA-based ciphertext ranked search method, which includes:
(1) extracting keywords from the documents, calculating the normalized word frequencies, establishing a keyword index, reducing the dimension of the index matrix with the PCA algorithm, encrypting the dimension-reduced index, and uploading it to a server;
(2) a data user enters keywords, a query vector is established, the dimension of the query vector is reduced with the PCA algorithm, and a trapdoor is generated;
(3) the data user sends a request to the server, and after receiving the request the cloud server performs the computation and search and returns the ranked results to the user.
Further, in step (1), keywords are extracted from the documents as follows:
when processing the data, the data owner first extracts the keywords of each document to generate a document keyword set, and then aggregates the keywords of all documents into a keyword dictionary without duplicates; the number of keywords in the dictionary is n.
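The dictionary-building step can be sketched as follows. This is a minimal illustration, not the patent's extraction procedure: the tokenizer, the stop-word list, and the function names `extract_keywords` and `build_dictionary` are assumptions made for the example.

```python
import re

# Hypothetical stop-word list; the source does not specify how keywords are chosen.
STOPWORDS = frozenset({"the", "a", "of", "and", "to", "in", "is"})


def extract_keywords(text):
    """Return the keyword occurrences of one document (lower-cased words minus stop words)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def build_dictionary(documents):
    """Merge the keywords of all documents into a sorted dictionary without duplicates."""
    unique = set()
    for text in documents:
        unique.update(extract_keywords(text))
    return sorted(unique)


documents = ["The cloud stores data", "Encrypted data in the cloud"]
dictionary = build_dictionary(documents)
n = len(dictionary)  # number of dictionary keywords
```

Sorting the dictionary is a design choice made here so that every party derives the same keyword order; any fixed order would do.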
Further, in step (1), calculating the normalized word frequencies, establishing the keyword index, reducing the dimension of the index matrix with the PCA algorithm, encrypting the dimension-reduced index, and uploading it to the server specifically comprises the following steps:
(1.1) creating an initial index
The normalized word frequency of each dictionary keyword in document $i$ is calculated to generate an index $D_i$, denoted $D_i = (d_{i1}, d_{i2}, \ldots, d_{ik}, \ldots, d_{in})^T$, where $d_{ik}$ is the normalized word frequency of the $k$-th dictionary keyword in the $i$-th document, so that the $m$ documents form an $m \times n$ initial index matrix $D = (D_1, D_2, \ldots, D_m)^T$
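A small sketch of step (1.1) follows. The source does not define how the word frequency is normalized; the example assumes the simplest choice, the keyword count divided by the total number of dictionary keywords occurring in the document, and the function names are illustrative.

```python
from collections import Counter


def index_vector(doc_keywords, dictionary):
    """Build D_i: one normalized frequency per dictionary keyword.

    Assumption: normalization = count / total keyword occurrences in the document.
    """
    counts = Counter(doc_keywords)
    total = sum(counts[word] for word in dictionary)
    if total == 0:
        return [0.0] * len(dictionary)
    return [counts[word] / total for word in dictionary]


def build_index_matrix(docs_keywords, dictionary):
    """Stack the m index vectors into the m x n initial index matrix D."""
    return [index_vector(keywords, dictionary) for keywords in docs_keywords]


dictionary = ["cloud", "data", "key"]
D = build_index_matrix([["cloud", "data", "cloud", "cloud"], ["key"]], dictionary)
```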
(1.2) introducing an identity matrix
An $n \times n$ identity matrix $E$ is appended after the initial index matrix, and the two are combined into an $(m + n) \times n$ matrix $F$, where $F = (D_1, D_2, \ldots, D_m, E)^T$
(1.3) random dimensionality reduction of initial index
A threshold is set randomly within a certain range. The principal components of the matrix $F$ are analyzed according to the PCA algorithm and similar keywords are removed, giving an $n \times k$ principal component matrix $R$. The matrix $F$ is then reduced in dimension as $F' = FR$, giving an $(m + n) \times k$ matrix $F'$. The last $n$ rows of $F'$ are then deleted, giving an $m \times k$ matrix $D'$, denoted $D' = (D'_1, D'_2, \ldots, D'_i, \ldots, D'_m)^T$, where $D'_1, D'_2, \ldots, D'_i, \ldots, D'_m$ are $k$-dimensional column vectors;
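Written in block form, the reduction step makes the role of the identity matrix visible. The derivation below is a sketch that takes $F' = FR$ exactly as stated in this step; no mean-centering is assumed, since the text does not mention one.

```latex
\[
F = \begin{pmatrix} D \\ E \end{pmatrix}, \qquad
F' = FR = \begin{pmatrix} DR \\ ER \end{pmatrix}
        = \begin{pmatrix} DR \\ R \end{pmatrix},
\]
\[
D' = DR \in \mathbb{R}^{m \times k}, \qquad
D'_i = R^{T} D_i \quad (i = 1, \ldots, m).
\]
```

The last $n$ rows of $F'$ are therefore the rows of $R$ itself, and deleting them leaves $D' = DR$. Because the rows of $E$ are the unit vectors of the keyword space, any 0/1 query vector is a sum of those rows, so it is projected by the same $R$ as the index.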
(1.4) generating a Key
The data owner randomly generates two invertible $(k + u + 1) \times (k + u + 1)$ matrices $M_1, M_2$ as the key, and a $(k + u + 1)$-dimensional vector $S$ as the split indicator, denoted $S = (s_1, s_2, \ldots, s_j, \ldots, s_{k+u+1})$, where $S \in \{0,1\}^{k+u+1}$ ($j = 1, 2, \ldots, k+u+1$), $k$ is the dimension of each row vector of the dimension-reduced matrix, and $u + 1$ is the number of extended dimensions;
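Key generation can be sketched as below. This is an illustration under stated assumptions: small integer entries and the `random` module are used for readability and reproducibility (a real deployment would need a cryptographically secure source of randomness), and invertibility is checked with an exact determinant over `Fraction`. The function names are hypothetical.

```python
import random
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def random_invertible_matrix(size, rng, low=-9, high=9):
    """Draw random integer matrices until one has a non-zero determinant."""
    while True:
        candidate = [[rng.randint(low, high) for _ in range(size)] for _ in range(size)]
        if determinant(candidate) != 0:
            return candidate


def generate_key(k, u, rng):
    """Return (M1, M2, S) with (k+u+1) x (k+u+1) matrices and a 0/1 indicator S."""
    size = k + u + 1
    m1 = random_invertible_matrix(size, rng)
    m2 = random_invertible_matrix(size, rng)
    s = [rng.randint(0, 1) for _ in range(size)]
    return m1, m2, s
```

For example, `generate_key(3, 2, random.Random(0))` yields two 6 × 6 matrices and a 6-element indicator vector.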
(1.5) index dimension extension
Each vector $D'_i$ in $D'$ is expanded from $k$ dimensions to $k + u + 1$ dimensions, giving an $m \times (k + u + 1)$ matrix $\hat{D}$. In each column vector, the first $k$ dimensions of $D'_i$ are kept unchanged, the last dimension is set to the constant 1, and dimensions $k + 1$ to $k + u$ are set to random numbers $\varepsilon_i$. The expanded column vector $\hat{D}_i$ is expressed as $\hat{D}_i = (d'_{i1}, d'_{i2}, \ldots, d'_{ik}, \varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iu}, 1)^T$, and the expanded matrix $\hat{D}$ is expressed as $\hat{D} = (\hat{D}_1, \hat{D}_2, \ldots, \hat{D}_m)^T$, where $\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iu}$ are random numbers that obey the same uniform distribution $U(\mu' - c, \mu' + c)$;
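The index dimension extension can be sketched in a few lines. The interval bounds `mu - c` and `mu + c` are an assumption about the uniform distribution named in this step, and the function name is illustrative.

```python
import random


def extend_index_vector(reduced, u, mu, c, rng):
    """Extend a k-dimensional index vector to k + u + 1 dimensions.

    Layout: [k original values] + [u uniform random values] + [constant 1].
    Assumption: the random values are drawn from U(mu - c, mu + c).
    """
    noise = [rng.uniform(mu - c, mu + c) for _ in range(u)]
    return list(reduced) + noise + [1.0]


extended = extend_index_vector([0.2, 0.5, 0.3], u=4, mu=0.0, c=1.0, rng=random.Random(1))
```

The random entries blur the relevance score slightly, which is the price paid for hiding the exact inner product from the server.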
(1.6) index random partitioning
According to the value of the indicator vector $S$, each expanded column vector $\hat{D}_i$ is split into two vectors $\hat{D}_i'$ and $\hat{D}_i''$. The splitting rule is as follows: if $S[n]$ equals 0, then $\hat{D}_i'[n] = \hat{D}_i''[n] = \hat{D}_i[n]$; if $S[n]$ equals 1, $\hat{D}_i'[n]$ and $\hat{D}_i''[n]$ are set to two unequal, non-zero random numbers whose sum equals $\hat{D}_i[n]$.
(1.7) index encryption
The split index vectors $\hat{D}_i'$ and $\hat{D}_i''$ are encrypted with the key $M_1, M_2$ to obtain the final encrypted index of the $i$-th document, $I_i = \{M_1^T \hat{D}_i', M_2^T \hat{D}_i''\}$.
(1.8) The data owner uploads the encrypted document set $C$ and the encrypted index set $I$ to the cloud server, where $I = (I_1, I_2, \ldots, I_i, \ldots, I_m)$.
Further, the step (2) is specifically as follows:
(2.1) creating a query vector
To query, the data user first enters keywords, and a program then expands them with synonyms and near-synonyms to generate a query vector $q$. The elements of the query vector correspond to the $n$ dictionary keywords, and the vector is denoted $q = (q_1, q_2, \ldots, q_i, \ldots, q_n)$; $q_i$ is 1 if an input keyword matches the $i$-th keyword of the keyword dictionary, and 0 otherwise;
(2.2) vector dimensionality reduction
The query vector $q$ is reduced in dimension as $q' = qR$, from $n$ dimensions to $k$ dimensions, denoted $q' = (q'_1, q'_2, \ldots, q'_i, \ldots, q'_k)$;
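Steps (2.1) and (2.2) can be sketched as follows. Synonym expansion is left out, the matrix `R` in the example is an arbitrary stand-in rather than a real PCA result, and the function names are assumptions.

```python
def build_query_vector(query_keywords, dictionary):
    """q_i is 1 if the i-th dictionary keyword was entered, otherwise 0."""
    wanted = set(query_keywords)
    return [1 if word in wanted else 0 for word in dictionary]


def reduce_query(q, R):
    """Compute q' = qR, mapping the n-dimensional query to k dimensions."""
    k = len(R[0])
    return [sum(q[i] * R[i][j] for i in range(len(q))) for j in range(k)]


dictionary = ["cloud", "data", "key", "search"]
q = build_query_vector(["data", "search", "unknown"], dictionary)
R = [[1, 0], [0, 1], [1, 1], [2, -1]]  # stand-in n x k matrix (n = 4, k = 2)
q_reduced = reduce_query(q, R)
```

Since `q` contains only zeros and ones, `q_reduced` is simply the sum of the rows of `R` that correspond to the entered keywords.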
(2.3) query vector dimension expansion
$q'$ is expanded from $k$ dimensions to $k + u + 1$ dimensions according to the following rule: $v$ dimensions are randomly selected from dimensions $k + 1$ to $k + u$ and set to 1, and the remaining ones of these dimensions are set to 0; the resulting $(k + u)$-dimensional vector is multiplied by a non-zero random number $r$; and dimension $k + u + 1$ is then set to a random number $t$. The expanded query vector is expressed as $\hat{q} = (r\,q'^{(k+u)}, t)$, where $q'^{(k+u)}$ is the $(k + u)$-dimensional vector before scaling.
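The expansion rule can be sketched as below. It follows one reading of the rule, in which all of the first $k + u$ dimensions are scaled by $r$; that reading, the parameter checks, and the function name are assumptions made for the example.

```python
import random


def extend_query_vector(reduced_query, u, v, r, t, rng):
    """Extend a k-dimensional query to k + u + 1 dimensions.

    Assumed reading: v of the u dummy positions are set to 1 (the rest to 0),
    the first k + u entries are scaled by the non-zero number r, and the
    last entry is set to t.
    """
    if r == 0:
        raise ValueError("r must be non-zero")
    if not 0 <= v <= u:
        raise ValueError("v must lie between 0 and u")
    chosen = set(rng.sample(range(u), v))
    dummy = [1 if j in chosen else 0 for j in range(u)]
    scaled = [r * x for x in list(reduced_query) + dummy]
    return scaled + [t]


extended_query = extend_query_vector([2, 0, 1], u=4, v=2, r=3, t=7, rng=random.Random(3))
```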
(2.4) query vector random partitioning
According to the value of the indicator vector $S$, the expanded query vector $\hat{q}$ is randomly split into two vectors $\hat{q}'$ and $\hat{q}''$, expressed as $\hat{q}' = (\hat{q}'_1, \hat{q}'_2, \ldots, \hat{q}'_{k+u+1})$ and $\hat{q}'' = (\hat{q}''_1, \hat{q}''_2, \ldots, \hat{q}''_{k+u+1})$. The splitting rule is as follows: if $S[n]$ equals 0, $\hat{q}'[n]$ and $\hat{q}''[n]$ are set to two unequal, non-zero random numbers whose sum equals $\hat{q}[n]$; if $S[n]$ equals 1, then $\hat{q}'[n] = \hat{q}''[n] = \hat{q}[n]$.
(2.5) generating trapdoors
The split query vectors $\hat{q}'$ and $\hat{q}''$ are encrypted with the inverses of the key matrices, $M_1^{-1}$ and $M_2^{-1}$, to generate the trapdoor $T = \{M_1^{-1}\hat{q}', M_2^{-1}\hat{q}''\}$.
further, the step (3) is specifically as follows:
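(3.1) the data user submits the generated trapdoor to the cloud server for the query;
(3.2) after receiving the trapdoor, the cloud server calculates the inner product of each index and the trapdoor, sorts the inner products in descending order, and returns the $k$ highest-scoring encrypted documents to the data user; the inner product is calculated as follows, where $\hat{D}_i', \hat{D}_i''$ are the split index vectors of the $i$-th document and $\hat{q}', \hat{q}''$ are the split query vectors:
$I_i \cdot T = \{M_1^T \hat{D}_i', M_2^T \hat{D}_i''\} \cdot \{M_1^{-1}\hat{q}', M_2^{-1}\hat{q}''\} = \hat{D}_i' \cdot \hat{q}' + \hat{D}_i'' \cdot \hat{q}'' = \hat{D}_i \cdot \hat{q}$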
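The following end-to-end sketch checks that the server-side inner product equals the plain inner product of the extended index and query vectors. It assumes the index is encrypted with the transposed key matrices and the trapdoor with their inverses, which is the combination under which the key matrices cancel; exact `Fraction` arithmetic is used so the equality holds without rounding. All function names are illustrative.

```python
import random
from fractions import Fraction


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def inverse(matrix):
    """Gauss-Jordan inverse over Fractions; raises ValueError if singular."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def split(vector, indicator, split_when, rng):
    """Positions where indicator == split_when get two random shares that
    sum to the value; all other positions are copied into both halves."""
    first, second = [], []
    for value, bit in zip(vector, indicator):
        value = Fraction(value)
        if bit == split_when:
            while True:
                share = Fraction(rng.randint(-50, 50))
                if share != 0 and value - share != 0 and share != value - share:
                    break
            first.append(share)
            second.append(value - share)
        else:
            first.append(value)
            second.append(value)
    return first, second


def encrypt_index(vector, m1, m2, indicator, rng):
    a, b = split(vector, indicator, 1, rng)  # index is split where S = 1
    return mat_vec(transpose(m1), a), mat_vec(transpose(m2), b)


def make_trapdoor(query, m1, m2, indicator, rng):
    a, b = split(query, indicator, 0, rng)  # query is split where S = 0
    return mat_vec(inverse(m1), a), mat_vec(inverse(m2), b)


def score(encrypted_index, trapdoor):
    """Inner product of an encrypted index and a trapdoor, summed over both halves."""
    return sum(x * y for half_i, half_t in zip(encrypted_index, trapdoor)
               for x, y in zip(half_i, half_t))


def rank(encrypted_indexes, trapdoor, top):
    """Document positions sorted by descending score, truncated to `top`."""
    order = sorted(range(len(encrypted_indexes)),
                   key=lambda i: score(encrypted_indexes[i], trapdoor), reverse=True)
    return order[:top]
```

Each half contributes $(M^T a) \cdot (M^{-1} b) = a^T M M^{-1} b = a \cdot b$, and the complementary splitting rules make the two halves add up to the full inner product, so the server can rank documents without seeing either plaintext vector.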
A second aspect of the embodiments of the present invention provides a PCA-based ciphertext ranked search system, including:
an index establishing module, configured to extract keywords from the documents, calculate the normalized word frequencies, establish a keyword index, reduce the dimension of the index matrix with the PCA algorithm, encrypt the dimension-reduced index, and upload it to a server;
a trapdoor creating module, configured to create a query vector from the keywords entered by a data user, reduce the dimension of the query vector with the PCA algorithm, and generate a trapdoor;
a query module, through which the data user sends a request to the server and, after receiving the request, the cloud server performs the computation and search and returns the ranked results to the user.
Further, keywords are extracted from the documents as follows:
when processing the data, the data owner first extracts the keywords of each document to generate a document keyword set, and then aggregates the keywords of all documents into a keyword dictionary without duplicates; the number of keywords in the dictionary is n.
Further, the index establishing module includes:
an initial index creating unit, configured to calculate the normalized word frequency of each dictionary keyword in document $i$ to generate an index $D_i$, denoted $D_i = (d_{i1}, d_{i2}, \ldots, d_{ik}, \ldots, d_{in})^T$, where $d_{ik}$ is the normalized word frequency of the $k$-th dictionary keyword in the $i$-th document, so that the $m$ documents form an $m \times n$ initial index matrix $D = (D_1, D_2, \ldots, D_m)^T$;
an identity matrix introducing unit, configured to append an $n \times n$ identity matrix $E$ after the initial index matrix and combine the two into an $(m + n) \times n$ matrix $F$, where $F = (D_1, D_2, \ldots, D_m, E)^T$;
an initial index random dimension reduction unit, configured to set a threshold randomly within a certain range, analyze the principal components of the matrix $F$ according to the PCA algorithm, and remove similar keywords to obtain an $n \times k$ principal component matrix $R$; to reduce the dimension of $F$ as $F' = FR$, giving an $(m + n) \times k$ matrix $F'$; and then to delete the last $n$ rows of $F'$, giving an $m \times k$ matrix $D'$, denoted $D' = (D'_1, D'_2, \ldots, D'_i, \ldots, D'_m)^T$, where $D'_1, D'_2, \ldots, D'_i, \ldots, D'_m$ are $k$-dimensional column vectors;
a key generation unit, through which the data owner randomly generates two invertible $(k + u + 1) \times (k + u + 1)$ matrices $M_1, M_2$ as the key, and a $(k + u + 1)$-dimensional vector $S$ as the split indicator, denoted $S = (s_1, s_2, \ldots, s_j, \ldots, s_{k+u+1})$, where $S \in \{0,1\}^{k+u+1}$ ($j = 1, 2, \ldots, k+u+1$), $k$ is the dimension of each row vector of the dimension-reduced matrix, and $u + 1$ is the number of extended dimensions;
an index dimension extension unit, configured to expand each vector $D'_i$ in $D'$ from $k$ dimensions to $k + u + 1$ dimensions, giving an $m \times (k + u + 1)$ matrix $\hat{D}$, in which the first $k$ dimensions of $D'_i$ are kept unchanged, the last dimension is set to the constant 1, and dimensions $k + 1$ to $k + u$ are set to random numbers $\varepsilon_i$; the expanded column vector $\hat{D}_i$ is expressed as $\hat{D}_i = (d'_{i1}, d'_{i2}, \ldots, d'_{ik}, \varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iu}, 1)^T$, and the expanded matrix $\hat{D}$ is expressed as $\hat{D} = (\hat{D}_1, \hat{D}_2, \ldots, \hat{D}_m)^T$, where $\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iu}$ are random numbers that obey the same uniform distribution $U(\mu' - c, \mu' + c)$;
an index random splitting unit, configured to split each expanded column vector $\hat{D}_i$ into two vectors $\hat{D}_i'$ and $\hat{D}_i''$ according to the value of the indicator vector $S$, with the following rule: if $S[n]$ equals 0, then $\hat{D}_i'[n] = \hat{D}_i''[n] = \hat{D}_i[n]$; if $S[n]$ equals 1, $\hat{D}_i'[n]$ and $\hat{D}_i''[n]$ are set to two unequal, non-zero random numbers whose sum equals $\hat{D}_i[n]$;
an index encryption unit, configured to encrypt the split index vectors $\hat{D}_i'$ and $\hat{D}_i''$ with the key $M_1, M_2$ to obtain the final encrypted index of the $i$-th document, $I_i = \{M_1^T \hat{D}_i', M_2^T \hat{D}_i''\}$;
an uploading unit, through which the data owner uploads the encrypted document set $C$ and the encrypted index set $I$ to the cloud server, where $I = (I_1, I_2, \ldots, I_i, \ldots, I_m)$.
Further, the creating a trapdoor module comprises:
a query vector creating unit, through which the data user first enters keywords and a program then expands them with synonyms and near-synonyms to generate a query vector $q$; the elements of the query vector correspond to the $n$ dictionary keywords, and the vector is denoted $q = (q_1, q_2, \ldots, q_i, \ldots, q_n)$; $q_i$ is 1 if an input keyword matches the $i$-th keyword of the keyword dictionary, and 0 otherwise;
a vector dimension reduction unit, configured to reduce the dimension of the query vector $q$ as $q' = qR$, from $n$ dimensions to $k$ dimensions, denoted $q' = (q'_1, q'_2, \ldots, q'_i, \ldots, q'_k)$;
a query vector dimension expansion unit, configured to expand $q'$ from $k$ dimensions to $k + u + 1$ dimensions according to the following rule: $v$ dimensions are randomly selected from dimensions $k + 1$ to $k + u$ and set to 1, and the remaining ones of these dimensions are set to 0; the resulting $(k + u)$-dimensional vector is multiplied by a non-zero random number $r$; and dimension $k + u + 1$ is then set to a random number $t$; the expanded query vector is expressed as $\hat{q} = (r\,q'^{(k+u)}, t)$, where $q'^{(k+u)}$ is the $(k + u)$-dimensional vector before scaling;
a query vector random splitting unit, configured to randomly split the expanded query vector $\hat{q}$ into two vectors $\hat{q}'$ and $\hat{q}''$ according to the value of the indicator vector $S$, expressed as $\hat{q}' = (\hat{q}'_1, \hat{q}'_2, \ldots, \hat{q}'_{k+u+1})$ and $\hat{q}'' = (\hat{q}''_1, \hat{q}''_2, \ldots, \hat{q}''_{k+u+1})$, with the following rule: if $S[n]$ equals 0, $\hat{q}'[n]$ and $\hat{q}''[n]$ are set to two unequal, non-zero random numbers whose sum equals $\hat{q}[n]$; if $S[n]$ equals 1, then $\hat{q}'[n] = \hat{q}''[n] = \hat{q}[n]$;
a trapdoor generation unit, configured to encrypt the split query vectors $\hat{q}'$ and $\hat{q}''$ with the inverses of the key matrices, $M_1^{-1}$ and $M_2^{-1}$, to generate the trapdoor $T = \{M_1^{-1}\hat{q}', M_2^{-1}\hat{q}''\}$.
further, the query module comprises:
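a submitting unit, through which the data user submits the generated trapdoor to the cloud server for the query;
a query returning unit, through which the cloud server, after receiving the trapdoor, calculates the inner product of each index and the trapdoor, sorts the inner products in descending order, and returns the $k$ highest-scoring encrypted documents to the data user; the inner product is calculated as follows, where $\hat{D}_i', \hat{D}_i''$ are the split index vectors of the $i$-th document and $\hat{q}', \hat{q}''$ are the split query vectors:
$I_i \cdot T = \{M_1^T \hat{D}_i', M_2^T \hat{D}_i''\} \cdot \{M_1^{-1}\hat{q}', M_2^{-1}\hat{q}''\} = \hat{D}_i' \cdot \hat{q}' + \hat{D}_i'' \cdot \hat{q}'' = \hat{D}_i \cdot \hat{q}$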
The embodiments provided by the invention have the following beneficial effects:
The PCA algorithm is used to reduce the dimensionality of the keyword index matrix, which in turn reduces the dimensionality of the key and greatly improves data encryption speed and search efficiency. To counter privacy intrusion by unauthorized users and untrusted servers, an invertible-matrix encryption method protects the data; on this basis, the threshold is set randomly so that the dimensionality reduction itself is randomized, which further improves data security. In addition, based on a close study of the principles of the PCA algorithm, the invention appends an identity matrix to the index matrix before dimensionality reduction, so that the components of the query vector also take part in the dimensionality reduction, which improves security while preserving query precision.
To improve security, the trapdoor is regenerated for every query. The experiment measured how the trapdoor generation time of the FDRQM scheme varies with the number of documents under different thresholds, as shown in fig. 2, and compared it with the MRSE scheme, as shown in fig. 3. From the trapdoor formula $T = \{M_1^{-1}\hat{q}', M_2^{-1}\hat{q}''\}$, where $\hat{q}'$ and $\hat{q}''$ are the split query vectors, it can be seen that the trapdoor generation time depends on the dimension of the key and the dimension of the query vector: the smaller the threshold, the greater the dimensionality reduction, the smaller the dimensions of the key and the query vector, and the shorter the time needed to create the trapdoor. The experiment also compares the FDRQM scheme with the MRSE scheme; as can be seen from fig. 3, even with a threshold of 0.95 the FDRQM scheme generates trapdoors much faster than the MRSE scheme.
The experiment further compares the query time of the FDRQM scheme with a threshold of 0.95 against that of the MRSE scheme; the change of the query time with the number of documents is shown in fig. 4, and its change with the number of keywords is shown in fig. 5. Since the dimension of the key grows with the number of documents and the number of keywords, the curves of both schemes rise. However, the query time of the MRSE scheme grows rapidly, whereas increases in the number of documents and keywords have little influence on the query time of the FDRQM scheme.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a model of the ciphertext ranked search system;
FIG. 2 shows the variation of the trapdoor generation time of the FDRQM scheme with the number of documents under different thresholds;
FIG. 3 shows the variation of the trapdoor generation time with the number of documents for the MRSE scheme and the FDRQM scheme with a threshold of 0.95;
FIG. 4 shows the query time as a function of the number of documents;
FIG. 5 shows the query time as a function of the number of keywords.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the invention provides a PCA-based ciphertext ranked search method, which comprises the following steps:
(1) extracting keywords from the documents, calculating the normalized word frequencies, establishing a keyword index, reducing the dimension of the index matrix with the PCA (principal component analysis) algorithm, encrypting the dimension-reduced index, and uploading it to a server;
First, keywords are extracted from the documents as follows: when processing the data, the data owner first extracts the keywords of each document to generate a document keyword set, and then aggregates the keywords of all documents into a keyword dictionary without duplicates; the number of keywords in the dictionary is n. The embodiment of the invention uses RFC documents as the data set for encrypted queries. Request for Comments (RFC) is a numbered series of documents that collects information about the Internet, as well as software documents from the UNIX and Internet communities. RFC documents are currently published under the auspices of the Internet Society (ISOC). The basic Internet communication protocols are specified in RFC documents, which also cover many additional topics beyond the standards, such as records of development work and newly developed Internet protocols. Almost all Internet standards are therefore included in RFC documents.
(1.1) creating an initial index
The normalized word frequency of each dictionary keyword in document $i$ is calculated to generate an index $D_i$, denoted $D_i = (d_{i1}, d_{i2}, \ldots, d_{ik}, \ldots, d_{in})^T$, where $d_{ik}$ is the normalized word frequency of the $k$-th dictionary keyword in the $i$-th document, so that the $m$ documents form an $m \times n$ initial index matrix $D = (D_1, D_2, \ldots, D_m)^T$.
(1.2) introducing an identity matrix
An $n \times n$ identity matrix $E$ is appended after the initial index matrix, and the two are combined into an $(m + n) \times n$ matrix $F$, where $F = (D_1, D_2, \ldots, D_m, E)^T$.
(1.3) random dimensionality reduction of initial index
A threshold is set randomly within a certain range. The principal components of the matrix $F$ are analyzed according to the PCA algorithm and similar keywords are removed, giving an $n \times k$ principal component matrix $R$. The matrix $F$ is then reduced in dimension as $F' = FR$, giving an $(m + n) \times k$ matrix $F'$. The last $n$ rows of $F'$ are then deleted, giving an $m \times k$ matrix $D'$, denoted $D' = (D'_1, D'_2, \ldots, D'_i, \ldots, D'_m)^T$, where $D'_1, D'_2, \ldots, D'_i, \ldots, D'_m$ are $k$-dimensional column vectors.
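How a randomly chosen threshold determines $k$ can be sketched as follows. The text does not define the threshold; the sketch assumes it is the cumulative share of variance that the retained principal components must reach (consistent with values such as 0.95 quoted in the experiments), and the function names are hypothetical.

```python
import random


def random_threshold(low, high, rng):
    """Draw the threshold uniformly from an agreed range [low, high]."""
    return rng.uniform(low, high)


def choose_k(eigenvalues, threshold):
    """Smallest number of components whose cumulative variance share
    reaches the threshold (assumed meaning of the threshold)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)


threshold = random_threshold(0.80, 0.95, random.Random(5))
k = choose_k([4, 3, 2, 1], threshold)
```

Under this reading a lower threshold keeps fewer components, so the keys and vectors get smaller, while the randomness of the draw keeps $k$ from being predictable.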
(1.4) generating a Key
The data owner randomly generates two invertible $(k + u + 1) \times (k + u + 1)$ matrices $M_1, M_2$ as the key, and a $(k + u + 1)$-dimensional vector $S$ as the split indicator, denoted $S = (s_1, s_2, \ldots, s_j, \ldots, s_{k+u+1})$, where $S \in \{0,1\}^{k+u+1}$ ($j = 1, 2, \ldots, k+u+1$), $k$ is the dimension of each row vector of the dimension-reduced matrix, and $u + 1$ is the number of extended dimensions.
(1.5) index dimension extension
Each vector $D'_i$ in $D'$ is expanded from $k$ dimensions to $k + u + 1$ dimensions, giving an $m \times (k + u + 1)$ matrix $\hat{D}$. In each column vector, the first $k$ dimensions of $D'_i$ are kept unchanged, the last dimension is set to the constant 1, and dimensions $k + 1$ to $k + u$ are set to random numbers $\varepsilon_i$. The expanded column vector $\hat{D}_i$ is expressed as $\hat{D}_i = (d'_{i1}, d'_{i2}, \ldots, d'_{ik}, \varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iu}, 1)^T$, and the expanded matrix $\hat{D}$ is expressed as $\hat{D} = (\hat{D}_1, \hat{D}_2, \ldots, \hat{D}_m)^T$, where $\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iu}$ are random numbers that obey the same uniform distribution $U(\mu' - c, \mu' + c)$.
(1.6) index random partitioning
According to the value of the indicator vector $S$, each expanded column vector $\hat{D}_i$ is split into two vectors $\hat{D}_i'$ and $\hat{D}_i''$. The splitting rule is as follows: if $S[n]$ equals 0, then $\hat{D}_i'[n] = \hat{D}_i''[n] = \hat{D}_i[n]$; if $S[n]$ equals 1, $\hat{D}_i'[n]$ and $\hat{D}_i''[n]$ are set to two unequal, non-zero random numbers whose sum equals $\hat{D}_i[n]$.
(1.7) index encryption
The split index vectors $\hat{D}_i'$ and $\hat{D}_i''$ are encrypted with the key $M_1, M_2$ to obtain the final encrypted index of the $i$-th document, $I_i = \{M_1^T \hat{D}_i', M_2^T \hat{D}_i''\}$.
(1.8) The data owner uploads the encrypted document set $C$ and the encrypted index set $I$ to the cloud server, where $I = (I_1, I_2, \ldots, I_i, \ldots, I_m)$.
(2) A data user inputs keywords, a query vector is established, dimensionality reduction is carried out on the query vector through a PCA algorithm, and a trapdoor is generated; the method comprises the following specific steps:
(2.1) creating a query vector
When a data user inquires, firstly, a keyword is input, and then a program carries out synonym and near-synonym expansion on the keywordAnd (5) unfolding to generate a query vector q. Each element of the query vector corresponds to n keywords, denoted q ═ q (q)1,q2,…,qi,…,qn) If the input keyword matches with the keyword in the keyword dictionary, q isiIs 1; otherwise it is 0.
(2.2) vector dimensionality reduction
The query vector q is dimensionality reduced such that q ' is qR, and q is reduced from n dimension to k dimension, denoted as q ' ═ q '1,q′2,…,q′i,…,q′k)。
(2.3) query vector dimension expansion
And (3) performing dimension expansion on q', from the dimension k to the dimension k + u +1, wherein the expansion rule is as follows: the v dimension is randomly selected from the (k +1) th dimension to the (k + u) th dimension of q' to be set to 1, the remaining dimensions are set to 0, the (k + u) th dimension is multiplied by a non-zero random number r, and then the (k + u +1) th dimension is set to be a random number t. The expanded query vector is represented as
Figure BDA0002287755710000098
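The expansion rule of step (2.3) can be sketched as follows. This assumes the scaling by r applies to all of the first k+u dimensions, which is one reading of the text; the function name `expand_query` and the integer ranges for r and t are hypothetical.

```python
import random

def expand_query(q_reduced, u, v, rng):
    """Expand a k-dimensional query to k + u + 1 dimensions."""
    dummy = [0] * u
    for pos in rng.sample(range(u), v):   # pick v of the u dummy dimensions
        dummy[pos] = 1
    r = rng.randint(1, 100)               # non-zero random scale factor
    t = rng.randint(1, 100)               # random last component
    scaled = [r * x for x in list(q_reduced) + dummy]
    return scaled + [t], r, t
```

The function also returns r and t so that a caller can inspect them; a real implementation would keep them private to the data user.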
(2.4) query vector random partitioning
According to the value of the indicator vector S, the expanded query vector is randomly divided into two vectors. The segmentation rule is as follows: if S[n] equals 0, the n-th components of the two split vectors are set to two unequal, non-zero random numbers whose sum equals the n-th component of the expanded query vector; if S[n] equals 1, the n-th components of the two split vectors are both set equal to the n-th component of the expanded query vector.
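The query split mirrors the index split, with the roles of 0 and 1 swapped. The sketch below assumes that the unshared case copies the component into both shares, and checks the property that makes the scheme work: the two share-wise inner products add up to the inner product of the unsplit vectors. All function names are hypothetical.

```python
import random

def _two_shares(value, rng):
    """Two unequal, non-zero random numbers whose sum is value."""
    a = rng.randint(1, 1000)
    while a == value or 2 * a == value:
        a = rng.randint(1, 1000)
    return a, value - a

def split(vec, S, random_bit, rng):
    """Share positions where S[n] == random_bit; copy the others."""
    first, second = [], []
    for value, bit in zip(vec, S):
        x, y = _two_shares(value, rng) if bit == random_bit else (value, value)
        first.append(x)
        second.append(y)
    return first, second

def split_index(vec, S, rng):
    return split(vec, S, 1, rng)   # index: random shares where S[n] == 1

def split_query(vec, S, rng):
    return split(vec, S, 0, rng)   # query: random shares where S[n] == 0

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))
```

At every position exactly one of the two vectors is randomly shared, so each position contributes the product of the original components to the sum of the two inner products.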
(2.5) generating trapdoors
The two split query vectors are encrypted with the inverses of the keys, M_1^(-1) and M_2^(-1), to generate the trapdoor T.
Figure BDA00022877557100000920
(3) The data user sends a request to the server; after receiving the request, the cloud server returns the ranked results to the user through computing and search operations, specifically as follows:
(3.1) The data user submits the generated trapdoor to the cloud server for query;
(3.2) After receiving the trapdoor, the cloud server calculates the inner product of each index and the trapdoor, sorts the inner products in descending order, and then returns the k highest-scoring encrypted documents to the data user; the inner product is calculated as follows:
Figure BDA0002287755710000101
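The server-side scoring and ranking can be sketched end to end. The sketch assumes the common convention that index shares are multiplied by the transposed keys and query shares by the inverse keys, so that the keys cancel in the inner product; the source states only that the keys and their inverses are used. All function names are hypothetical, and exact fractions are used so that the cancellation can be checked without rounding error.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def invert(M):
    """Gauss-Jordan inversion with exact fractions (M must be invertible)."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def encrypt_index(d1, d2, M1, M2):
    # assumed convention: each index share is multiplied by a transposed key
    return matvec(transpose(M1), d1), matvec(transpose(M2), d2)

def make_trapdoor(q1, q2, M1, M2):
    # each query share is multiplied by the inverse of the matching key
    return matvec(invert(M1), q1), matvec(invert(M2), q2)

def score(index, trapdoor):
    return dot(index[0], trapdoor[0]) + dot(index[1], trapdoor[1])

def rank(indexes, trapdoor, top):
    """Return the positions of the top-scoring documents, best first."""
    order = sorted(range(len(indexes)),
                   key=lambda i: score(indexes[i], trapdoor), reverse=True)
    return order[:top]
```

Under this convention the score computed by the server equals the sum of the share-wise plaintext inner products, although the server sees only encrypted vectors.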
An embodiment of the invention also provides a system corresponding to the method, namely a PCA-based ciphertext ranked search system, which comprises:
an index establishing module, used for extracting keywords from the documents, calculating the normalized word frequency, establishing a keyword index, reducing the dimension of the index matrix through the PCA algorithm, encrypting the reduced index, and uploading it to the server;
a trapdoor creating module, used for creating a query vector from the keywords input by a data user, reducing the dimension of the query vector through the PCA algorithm, and generating a trapdoor;
and a query module, used for the data user to send a request to the server and for the cloud server, after receiving the request, to return the ranked results to the user through computing and search operations.
Further, the step of extracting the keywords from the document is specifically as follows:
when a data owner processes data, firstly extracting keywords of each document to generate a document keyword set, then summarizing the keywords of all documents to generate a non-repeated keyword dictionary, wherein the number of the keywords of the dictionary is n.
Further, the index establishing module includes:
an initial index creation unit, for calculating the normalized word frequency of each dictionary keyword in document i to generate an index D_i, denoted D_i = (d_i1, d_i2, …, d_ik, …, d_in)^T, where d_ik is the normalized word frequency in the i-th document of the k-th keyword of the dictionary, so that the m documents form an initial index matrix of m × n dimensions, D = (D_1, D_2, …, D_m)^T;
an identity matrix introduction unit, for appending an n × n identity matrix E after the initial index matrix and merging the two into a matrix F of (m+n) × n dimensions, where F = (D_1, D_2, …, D_m, E)^T;
an initial index random dimension reduction unit, for randomly setting a threshold within a certain range, analyzing the principal components of the matrix F with the PCA algorithm and removing similar keywords to obtain a principal component matrix R of n × k dimensions, then reducing the dimension of F by computing F' = FR to obtain a matrix F' of (m+n) × k dimensions, and then deleting the last n rows to obtain a matrix D' of m × k dimensions, denoted D' = (D'_1, D'_2, …, D'_i, …, D'_m)^T, where D'_1, D'_2, …, D'_i, …, D'_m are column vectors of dimension k;
a key generation unit, for the data owner to randomly generate two invertible matrices M_1, M_2 of (k+u+1) × (k+u+1) dimensions as the key, and a vector S of dimension k+u+1 as the split indicator, denoted S = (s_1, s_2, …, s_i, …, s_k), where S ∈ {0,1}^(k+u+1) (j = 1, 2, …, k), k is the dimension of each row vector in the reduced matrix, and u+1 is the number of extended dimensions;
an index dimension expansion unit, for expanding each vector D'_i in D' from k dimensions to k+u+1 dimensions, yielding a matrix of m × (k+u+1) dimensions, wherein the first k dimensions of each column vector D'_i remain unchanged, the last dimension is set to the constant 1, and the (k+1)-th to (k+u)-th dimensions are set to random numbers ε_i1, ε_i2, …, ε_iu that obey the same uniform distribution U(μ' - c, μ' + c);
an index random split unit, for dividing each expanded column vector into two vectors according to the value of the indicator vector S, with the following segmentation rule: if S[n] equals 0, the n-th components of the two split vectors are both set equal to the n-th component of the expanded index vector; if S[n] equals 1, the n-th components of the two split vectors are set to two unequal, non-zero random numbers whose sum equals the n-th component of the expanded index vector;
an index encryption unit, for encrypting the two split index vectors with the keys M_1, M_2 to obtain the final encrypted index of the i-th document
Figure BDA00022877557100001114
an uploading unit, for the data owner to upload the encrypted document set C and the encrypted index set I to the cloud server, where I = (I_1, I_2, …, I_i, …, I_m).
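The index dimension expansion performed inside the index establishing module can be sketched as below. The sketch assumes the dummy values are drawn uniformly from an interval centred at a mean `mu` with half-width `c`; the function name `expand_index` and these parameter names are hypothetical.

```python
import random

def expand_index(d_reduced, u, mu, c, rng):
    """Expand a k-dimensional index vector to k + u + 1 dimensions."""
    eps = [rng.uniform(mu - c, mu + c) for _ in range(u)]  # u dummy values
    return list(d_reduced) + eps + [1]                     # constant 1 last
```

The trailing constant 1 pairs with the random last component t of the expanded query, and the u dummy values pair with the v dummy positions that the query sets to 1.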
Further, the trapdoor creating module comprises:
a query vector creation unit, for the data user, when querying, to first input keywords, after which the program expands the keywords with their synonyms and near-synonyms to generate a query vector q; each element of the query vector corresponds to one of the n keywords of the dictionary, and q is denoted q = (q_1, q_2, …, q_i, …, q_n): if an input keyword matches the i-th keyword in the keyword dictionary, q_i is 1; otherwise it is 0;
a vector dimension reduction unit, for reducing the dimension of the query vector q by computing q' = qR, which reduces q from n dimensions to k dimensions, the result being denoted q' = (q'_1, q'_2, …, q'_i, …, q'_k);
a query vector dimension expansion unit, for expanding q' from k dimensions to k+u+1 dimensions according to the following rule: v of the dimensions from the (k+1)-th to the (k+u)-th are randomly selected and set to 1, and the rest of those dimensions are set to 0; the k+u dimensions are then multiplied by a non-zero random number r, and the (k+u+1)-th dimension is set to a random number t; the expanded query vector is represented as
Figure BDA0002287755710000121
a query vector random split unit, for randomly dividing the expanded query vector into two vectors according to the value of the indicator vector S, with the following segmentation rule: if S[n] equals 0, the n-th components of the two split vectors are set to two unequal, non-zero random numbers whose sum equals the n-th component of the expanded query vector; if S[n] equals 1, the n-th components of the two split vectors are both set equal to the n-th component of the expanded query vector;
a trapdoor generation unit, for encrypting the two split query vectors with the inverses of the keys, M_1^(-1) and M_2^(-1), to generate the trapdoor T,
Figure BDA00022877557100001214
Further, the query module comprises:
a submitting unit, for the data user to submit the generated trapdoor to the cloud server for query;
a query returning unit, for the cloud server, after receiving the trapdoor, to calculate the inner product of each index and the trapdoor, sort the inner products in descending order, and then return the k highest-scoring encrypted documents to the data user; the inner product is calculated as follows:
Figure BDA00022877557100001215
The ciphertext ranked search system has three main roles: the data owner, the cloud server, and the data user. The relationship among the three is shown in figure 1.
The data owner first extracts the document keywords, creates a keyword index for each document, encrypts the keyword indexes and the documents with keys, and uploads the encrypted indexes and encrypted documents to the cloud server; the cloud server neither knows the plaintext of the indexes nor has access to the content of the encrypted documents. An authorized data user inputs keywords to generate a query vector, obtains a trapdoor through security control, and submits the generated trapdoor to the cloud server. After receiving the search request, the cloud server calculates the secure inner product of the trapdoor and the index vector of every document to obtain each document's keyword score, sorts the documents by score in descending order, and returns the top k encrypted documents to the data user. After receiving them, the data user obtains the key of the encrypted documents through access control and decrypts the documents.
To better evaluate the reliability of the encryption algorithm, attacks by the cloud server are divided into levels according to the information it has acquired. The embodiment of the invention adopts the following attack models:
Level 1: the cloud server can observe the encrypted document set C and the encrypted index set I uploaded by the data owner, and the query trapdoor T submitted by the data user.
Level 2: in addition to the level 1 information, the cloud server can acquire more information; for example, it may infer the relevance between query trapdoors by combining existing trapdoors with query results, or analyze the encryption process and use background information about the encryption to deduce the encryption key.
The relevance of a given keyword differs from document to document. To reflect the importance of each keyword to different documents, a document score is introduced, which is the basis for ranking and returning search results. The invention uses term frequency and inverse document frequency (tf·idf) to calculate the document score. The term frequency is the number of times a keyword occurs in a document: the more often it occurs, the more important the keyword is to that document. The inverse document frequency reflects the number of documents containing the keyword: the more documents contain the keyword, the less the keyword distinguishes among documents. The score of document H_i for a submitted query Q is computed with the normalized tf·idf of equation (1).
Figure BDA0002287755710000131
where f_i,b is the number of occurrences of keyword w_b in document H_i,
Figure BDA0002287755710000132
denotes the set of keywords contained in document H_i, f_i is the number of documents containing the keyword, and m is the total number of documents. When the index is created, each dimension of the vector is set to the product of the normalized term frequency and the normalized inverse document frequency of the corresponding keyword.
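Equation (1) itself is not reproduced in the text, so the sketch below uses one widely used normalized tf·idf form as a stand-in; it is an assumption and may differ from the patent's formula. The function name `tfidf_score`, the logarithmic weighting, and the Euclidean-length normalizer are all illustrative choices.

```python
import math

def tfidf_score(term_counts, query, doc_freq, m):
    """Score one document for a query with an assumed normalized tf-idf.

    term_counts: keyword -> occurrences in this document
    doc_freq:    keyword -> number of documents containing the keyword
    m:           total number of documents
    """
    tf = {w: 1 + math.log(c) for w, c in term_counts.items() if c > 0}
    norm = math.sqrt(sum(x * x for x in tf.values()))  # assumed normalizer
    if norm == 0:
        return 0.0
    total = 0.0
    for w in query:
        if w in tf:
            total += tf[w] * math.log(1 + m / doc_freq[w])
    return total / norm
```

A document in which the queried keyword makes up a larger share of the term weights scores higher, and a document that lacks every queried keyword scores zero.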
To improve security, the trapdoor is regenerated for every query. This experiment measured how the trapdoor generation time of the FDRQM scheme varies with the number of documents under different thresholds, as shown in fig. 2, and compared it with the MRSE scheme, as shown in fig. 3. From
Figure BDA0002287755710000133
it can be seen that the trapdoor generation time depends on the dimension of the key and the dimension of the query vector: the larger the dimension reduction, i.e. the smaller the threshold, the smaller the dimensions of the key and the query vector and the shorter the trapdoor generation time. As fig. 3 shows, even with a threshold of 0.95 the FDRQM scheme performs much better than the MRSE scheme in trapdoor generation.
The experiment also compared the query time of the FDRQM scheme with a threshold of 0.95 against that of the MRSE scheme; the change in query time with the number of documents is shown in fig. 4, and the change with the number of keywords in fig. 5. Because the dimension of the key grows with the number of documents and the number of keywords, the curves of both schemes rise. However, the query time of the MRSE scheme grows rapidly, whereas the number of documents and the number of keywords have little influence on the query time of the FDRQM scheme.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. A ciphertext sorting search method based on PCA is characterized by comprising the following steps:
(1) extracting keywords from the document, calculating a standardized word frequency, establishing a keyword index, reducing the dimension of an index matrix through a PCA algorithm, encrypting the index after dimension reduction, and uploading the index to a server;
(2) a data user inputs keywords, a query vector is established, dimensionality reduction is carried out on the query vector through a PCA algorithm, and a trapdoor is generated;
(3) the data user sends a request to the server, and the cloud server receives the request and then returns the sequencing result to the user through calculation and search operation;
in the step (1), the step of extracting the keywords from the document is specifically as follows:
when a data owner processes data, firstly extracting keywords of each document to generate a document keyword set, then summarizing the keywords of all documents to generate a non-repeated keyword dictionary, wherein the number of the keywords of the dictionary is n;
in the step (1), the normalized word frequency is calculated, the keyword index is established, then the dimensionality of the index matrix is reduced through a PCA algorithm, the index after dimensionality reduction is encrypted and uploaded to a server, and the method specifically comprises the following steps:
(1.1) creating an initial index
Calculating the normalized word frequency of each dictionary keyword in document i to generate an index D_i, denoted D_i = (d_i1, d_i2, …, d_ik, …, d_in)^T, where d_ik is the normalized word frequency in the i-th document of the k-th keyword of the dictionary, so that the m documents form an initial index matrix of m × n dimensions, D = (D_1, D_2, …, D_m)^T;
(1.2) introducing an identity matrix
An n × n identity matrix E is appended after the initial index matrix, and the two are merged into a matrix F of (m+n) × n dimensions, where F = (D_1, D_2, …, D_m, E)^T;
(1.3) random dimensionality reduction of initial index
Randomly setting a threshold within a certain range, analyzing the principal components of the matrix F with the PCA algorithm and removing similar keywords to obtain a principal component matrix R of n × k dimensions, and then reducing the dimension of F by computing F' = FR to obtain a matrix F' of (m+n) × k dimensions; the last n rows are then deleted to obtain a matrix D' of m × k dimensions, denoted D' = (D'_1, D'_2, …, D'_i, …, D'_m)^T, where D'_1, D'_2, …, D'_i, …, D'_m are column vectors of dimension k;
(1.4) generating a key
The data owner randomly generates two invertible matrices M_1, M_2 of (k+u+1) × (k+u+1) dimensions as the key, and a vector S of dimension k+u+1 as the split indicator, denoted S = (s_1, s_2, …, s_i, …, s_k), where S ∈ {0,1}^(k+u+1) (j = 1, 2, …, k), k is the dimension of each row vector in the reduced matrix, and u+1 is the number of extended dimensions;
(1.5) index dimension extension
Each vector D'_i in D' is expanded from k dimensions to k+u+1 dimensions, yielding a matrix of m × (k+u+1) dimensions, wherein the first k dimensions of each column vector D'_i remain unchanged, the last dimension is set to the constant 1, and the (k+1)-th to (k+u)-th dimensions are set to random numbers ε_i1, ε_i2, …, ε_iu that obey the same uniform distribution U(μ' - c, μ' + c);
(1.6) index random partitioning
According to the value of the indicator vector S, each expanded column vector is divided into two vectors; the segmentation rule is as follows: if S[n] equals 0, the n-th components of the two split vectors are both set equal to the n-th component of the expanded index vector; if S[n] equals 1, the n-th components of the two split vectors are set to two unequal, non-zero random numbers whose sum equals the n-th component of the expanded index vector;
(1.7) index encryption
The two split index vectors are encrypted with the keys M_1, M_2 to obtain the final encrypted index of the i-th document
Figure FDA00035741337500000215
(1.8) the data owner uploads the encrypted document set C and the encrypted index set I to the cloud server, where I = (I_1, I_2, …, I_i, …, I_m).
2. The ciphertext ordering search method based on PCA according to claim 1, wherein the step (2) is specifically as follows:
(2.1) creating a query vector
When querying, a data user first inputs keywords, and the program then expands the keywords with their synonyms and near-synonyms to generate a query vector q; each element of the query vector corresponds to one of the n keywords of the dictionary, and q is denoted q = (q_1, q_2, …, q_i, …, q_n): if an input keyword matches the i-th keyword in the keyword dictionary, q_i is 1; otherwise it is 0;
(2.2) vector dimensionality reduction
The query vector q is reduced in dimension by computing q' = qR, which reduces q from n dimensions to k dimensions, the result being denoted q' = (q'_1, q'_2, …, q'_i, …, q'_k);
(2.3) query vector dimension expansion
q' is expanded from k dimensions to k+u+1 dimensions according to the following rule: v of the dimensions from the (k+1)-th to the (k+u)-th are randomly selected and set to 1, and the rest of those dimensions are set to 0; the k+u dimensions are then multiplied by a non-zero random number r, and the (k+u+1)-th dimension is set to a random number t; the expanded query vector is represented as
Figure FDA0003574133750000031
(2.4) query vector random partitioning
According to the value of the indicator vector S, the expanded query vector is randomly divided into two vectors; the segmentation rule is as follows: if S[n] equals 0, the n-th components of the two split vectors are set to two unequal, non-zero random numbers whose sum equals the n-th component of the expanded query vector; if S[n] equals 1, the n-th components of the two split vectors are both set equal to the n-th component of the expanded query vector;
(2.5) generating trapdoors
The two split query vectors are encrypted with the inverses of the keys, M_1^(-1) and M_2^(-1), to generate the trapdoor T,
Figure FDA00035741337500000313
3. The ciphertext ordering search method based on PCA according to claim 2, wherein the step (3) is specifically as follows:
(3.1) the data user submits the generated trapdoor to the cloud server for query;
(3.2) after receiving the trapdoor, the cloud server calculates the inner product of each index and the trapdoor, sorts the inner products in descending order, and then returns the k highest-scoring encrypted documents to the data user; the inner product is calculated as follows:
Figure FDA00035741337500000314
4. a ciphertext sorted search system based on PCA, comprising:
the index establishing module is used for extracting keywords from the document, calculating the standardized word frequency, establishing a keyword index, reducing the dimension of an index matrix through a PCA algorithm, encrypting the index after dimension reduction, and uploading the index to a server;
the trap door creating module is used for creating a query vector according to the key words input by a data user, reducing the dimension of the query vector through a PCA algorithm and generating a trap door;
the query module is used for sending a request to the server by a data user, and the cloud server returns a sequencing result to the user through computing and searching operation after receiving the request;
the steps of extracting the keywords from the document are as follows:
when a data owner processes data, firstly extracting keywords of each document to generate a document keyword set, then summarizing the keywords of all documents to generate a non-repeated keyword dictionary, wherein the number of the dictionary keywords is n;
the index establishing module comprises:
an initial index creation unit, for calculating the normalized word frequency of each dictionary keyword in document i to generate an index D_i, denoted D_i = (d_i1, d_i2, …, d_ik, …, d_in)^T, where d_ik is the normalized word frequency in the i-th document of the k-th keyword of the dictionary, so that the m documents form an initial index matrix of m × n dimensions, D = (D_1, D_2, …, D_m)^T;
an identity matrix introduction unit, for appending an n × n identity matrix E after the initial index matrix and merging the two into a matrix F of (m+n) × n dimensions, where F = (D_1, D_2, …, D_m, E)^T;
an initial index random dimension reduction unit, for randomly setting a threshold within a certain range, analyzing the principal components of the matrix F with the PCA algorithm and removing similar keywords to obtain a principal component matrix R of n × k dimensions, then reducing the dimension of F by computing F' = FR to obtain a matrix F' of (m+n) × k dimensions, and then deleting the last n rows to obtain a matrix D' of m × k dimensions, denoted D' = (D'_1, D'_2, …, D'_i, …, D'_m)^T, where D'_1, D'_2, …, D'_i, …, D'_m are column vectors of dimension k;
a key generation unit, for the data owner to randomly generate two invertible matrices M_1, M_2 of (k+u+1) × (k+u+1) dimensions as the key, and a vector S of dimension k+u+1 as the split indicator, denoted S = (s_1, s_2, …, s_i, …, s_k), where S ∈ {0,1}^(k+u+1) (j = 1, 2, …, k), k is the dimension of each row vector in the reduced matrix, and u+1 is the number of extended dimensions;
an index dimension expansion unit, for expanding each vector D'_i in D' from k dimensions to k+u+1 dimensions, yielding a matrix of m × (k+u+1) dimensions, wherein the first k dimensions of each column vector D'_i remain unchanged, the last dimension is set to the constant 1, and the (k+1)-th to (k+u)-th dimensions are set to random numbers ε_i1, ε_i2, …, ε_iu that obey the same uniform distribution U(μ' - c, μ' + c);
an index random split unit, for dividing each expanded column vector into two vectors according to the value of the indicator vector S, with the following segmentation rule: if S[n] equals 0, the n-th components of the two split vectors are both set equal to the n-th component of the expanded index vector; if S[n] equals 1, the n-th components of the two split vectors are set to two unequal, non-zero random numbers whose sum equals the n-th component of the expanded index vector;
an index encryption unit, for encrypting the two split index vectors with the keys M_1, M_2 to obtain the final encrypted index of the i-th document
Figure FDA00035741337500000415
an uploading unit, for the data owner to upload the encrypted document set C and the encrypted index set I to the cloud server, where I = (I_1, I_2, …, I_i, …, I_m).
5. The ciphertext sorted search system of claim 4, wherein the create trapdoor module comprises:
a query vector creation unit, for the data user, when querying, to first input keywords, after which the program expands the keywords with their synonyms and near-synonyms to generate a query vector q; each element of the query vector corresponds to one of the n keywords of the dictionary, and q is denoted q = (q_1, q_2, …, q_i, …, q_n): if an input keyword matches the i-th keyword in the keyword dictionary, q_i is 1; otherwise it is 0;
a vector dimension reduction unit, for reducing the dimension of the query vector q by computing q' = qR, which reduces q from n dimensions to k dimensions, the result being denoted q' = (q'_1, q'_2, …, q'_i, …, q'_k);
a query vector dimension expansion unit, for expanding q' from k dimensions to k+u+1 dimensions according to the following rule: v of the dimensions from the (k+1)-th to the (k+u)-th are randomly selected and set to 1, and the rest of those dimensions are set to 0; the k+u dimensions are then multiplied by a non-zero random number r, and the (k+u+1)-th dimension is set to a random number t; the expanded query vector is represented as
Figure FDA0003574133750000051
a query vector random split unit, for randomly dividing the expanded query vector into two vectors according to the value of the indicator vector S, with the following segmentation rule: if S[n] equals 0, the n-th components of the two split vectors are set to two unequal, non-zero random numbers whose sum equals the n-th component of the expanded query vector; if S[n] equals 1, the n-th components of the two split vectors are both set equal to the n-th component of the expanded query vector;
a trapdoor generation unit, for encrypting the two split query vectors with the inverses of the keys, M_1^(-1) and M_2^(-1), to generate the trapdoor T,
Figure FDA00035741337500000514
6. The ciphertext ordering search system based on PCA of claim 5, wherein the query module comprises:
a submitting unit, for the data user to submit the generated trapdoor to the cloud server for query;
a query returning unit, for the cloud server, after receiving the trapdoor, to calculate the inner product of each index and the trapdoor, sort the inner products in descending order, and then return the k highest-scoring encrypted documents to the data user; the inner product is calculated as follows:
Figure FDA00035741337500000515
CN201911167134.1A 2019-11-25 2019-11-25 Cipher text sequencing search method and system based on PCA Active CN112836005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911167134.1A CN112836005B (en) 2019-11-25 2019-11-25 Cipher text sequencing search method and system based on PCA

Publications (2)

Publication Number Publication Date
CN112836005A CN112836005A (en) 2021-05-25
CN112836005B true CN112836005B (en) 2022-05-17

Family

ID=75922288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911167134.1A Active CN112836005B (en) 2019-11-25 2019-11-25 Cipher text sequencing search method and system based on PCA

Country Status (1)

Country Link
CN (1) CN112836005B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant