CN115757435A - Screening factor determination method supporting semantic perception ciphertext retrieval acceleration - Google Patents
- Publication number
- CN115757435A
- Authority
- CN
- China
- Prior art keywords
- semantic
- keyword
- sequence
- document
- relevance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of information retrieval and discloses a screening factor determination method supporting semantic perception ciphertext retrieval acceleration. In the first stage, a semantic relevance partition sequence between each keyword and the documents is established: a semantic perception model is used to calculate the semantic vector of each document; keywords are extracted and the semantic vector of each keyword is calculated; the semantic relevance between each keyword and each document is calculated to form a semantic relevance sequence, which is sorted in descending order; and the sorted sequence is partitioned to generate a semantic relevance partition sequence for each keyword. In the second stage, the screening factor is calculated and determined from the semantic relevance partition sequences according to the search keywords. The method is suitable for tree-index-based application scenarios in ciphertext retrieval supporting semantic perception; it can significantly improve retrieval speed without affecting the accuracy of the search results.
Description
Technical Field
The invention belongs to the technical field of information retrieval, and particularly relates to a screening factor determination method supporting semantic perception ciphertext retrieval acceleration.
Background
With the continuous development of internet technology and the growing number of software users, data volumes have become increasingly large, and local data storage can no longer meet growing business requirements. To address this, people have turned to outsourcing data to cloud servers, where users can consume computing resources on demand. In short, cloud computing uses the transmission capability of the internet to move data from a local server to the cloud and process it there. Although cloud computing has many advantages, it also raises problems such as data privacy. The most common and direct way to protect the privacy of outsourced data is to encrypt it before sending it to the cloud server. However, encryption reduces the usability of the data and makes basic operations such as retrieval difficult. It also obscures the semantics of the data, so the semantic relationship between the data and a query is hard to find. Many searchable encryption methods have therefore been proposed that retrieve data on the cloud server efficiently and accurately while preserving the privacy of the outsourced data.
In recent years, the searchable encryption methods proposed by researchers have mainly used tree-structured indexes to perform ranked retrieval over encrypted documents. These methods build a tree index that is simple in structure and itself secure, and find the top-k most relevant encrypted documents by depth-first search. For example, "Xia Z, Wang X, Sun X, et al. A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems, 2015" uses a balanced binary tree index; "Dai H, Dai X, Yi X, et al. Semantic-aware multi-keyword ranked search scheme over encrypted cloud data. Journal of Network and Computer Applications, 2019" uses a full binary tree index containing semantic feature information; and "Hu Z, Dai H, Yang G, Yi X, Sheng W. Semantic-Based Multi-Keyword Ranked Search Schemes over Encrypted Cloud Data. Security and Communication Networks, 2022" uses a tree index that incorporates semantic information.
The general approach of searchable encryption is to convert the documents and keywords into vector representations, store the documents in a tree index, encrypt the documents and the index, and send both to the cloud server. After the user submits a query, the cloud server searches the encrypted tree index and returns the required ciphertexts, which the user then decrypts. Existing tree-index-based retrieval methods usually use depth-first search. During the search, the retrieval screening factor starts at 0 and is updated according to the leaf nodes traversed so far, and subtrees that cannot meet the requirement are pruned using the screening factor, which speeds up retrieval. However, in existing methods, including the three papers mentioned above, the initial screening factor is always set to 0. If an appropriate screening factor could be determined before the search starts, more non-qualifying subtrees could be filtered out early in the search, accelerating the depth-first search.
Disclosure of Invention
In order to solve the technical problem, the invention provides a screening factor determination method supporting semantic perception ciphertext retrieval acceleration, which can improve the retrieval efficiency under the condition of not influencing the retrieval result precision.
The invention relates to a screening factor determination method supporting semantic perception ciphertext retrieval acceleration, which comprises the following steps:
step 1, for each keyword, establishing a semantic relevance partition sequence between that keyword and the documents;
step 2, according to the search keywords, calculating the screening factor using the semantic relevance partition sequences.
Further, step 1 specifically comprises:
step 1a, using a semantic perception model, calculating the semantic vector of each document $d_j$ in the document set $D = \{d_1, d_2, \ldots, d_j, \ldots, d_n\}$, where $j$ ranges from 1 to $n$; extracting keywords from each document $d_j$ to generate a keyword set $W = \{w_1, w_2, \ldots, w_i, \ldots, w_m\}$, where $i$ ranges from 1 to $m$; and calculating the semantic vector of each keyword $w_i$;
step 1b, for each keyword $w_i \in W$, calculating its semantic relevance $\mathrm{relevance}(w_i, d_j)$ to each document $d_j \in D$, establishing the semantic relevance sequence $L_i$ between $w_i$ and the documents in $D$, and then sorting the sequence in descending order;
step 1c, according to the semantic relevance sequence $L_i$ of $w_i$ and a given segmentation parameter $\tau$, dividing the sequence into equal-sized partitions to generate the semantic relevance partition sequence $SPT_i$ between $w_i$ and the documents, in which each partition is represented as a pair whose two elements are the upper and lower boundaries of the partition.
Further, step 1c specifically comprises:
step 1c1, for each $w_i$, arranging its relevance scores with the documents in $D$ in descending order to generate the semantic relevance sequence $L_i$; for each keyword $w_i$ in $W$, dividing $L_i$ into equal-sized partitions according to the segmentation parameter $\tau$ to construct the semantic relevance partition sequence corresponding to $w_i$, which contains $\lceil n/\tau \rceil$ partitions;
step 1c2, in the partition sequence, each of the first $\lceil n/\tau \rceil - 1$ partitions contains $\tau$ relevance scores and the last partition contains at most $\tau$; for any two adjacent partitions, every relevance score in the earlier partition is greater than any relevance score in the later partition;
Further, step 1c3 specifically comprises:
for each partition of the $SPT_i$ corresponding to $w_i$, the upper and lower boundaries forming its pair are calculated as follows, where $\mathrm{rand}(x, y)$ denotes a random value between $x$ and $y$, $\min(X)$ denotes the minimum element of the set $X$, and $\max(X)$ denotes the maximum element of the set $X$:
Further, step 2 specifically comprises:
step 2a, let $Q$ be the set of search keywords submitted by the user and $k$ the number of documents the user requires; for each search keyword $w_n$ in $Q$, where $n$ ranges from 1 to $|Q|$, calculating the union $U_x$ of the document identifier sets in the first $x$ semantic relevance partitions; if $U_x$ satisfies the following condition, the value so determined is the local retrieval screening factor corresponding to $w_n$;
step 2b, over all search keywords $w_n$ in the set $Q$, calculating the final screening factor $t$ according to the following formula;
Further, step 2a specifically comprises:
for each keyword $w_n$, $U_x$ is calculated as follows:
The invention has the following beneficial effects: 1. with the retrieval screening factor determination method, more non-qualifying subtrees can be screened out, which significantly accelerates the encrypted search process;
2. the invention determines the retrieval screening factor from the semantic relevance partition sequences; the screening factor does not reveal the relevance score between any document and keyword, and because it is smaller than the relevance score of the last document in the candidate result set, no relevant document is missed; the invention can therefore accelerate retrieval while guaranteeing that the retrieval result is unchanged;
3. the invention supports tree-index-based ciphertext retrieval scenarios with semantic perception and does not depend on any specific method for quantifying keyword-document relevance; it can be used with any semantic-perception-based relevance measure (for example, the LDA model or the BERT model) and therefore has strong generality.
Drawings
FIG. 1 is a flow chart of a search screening factor determination method according to the present invention;
FIG. 2 is a schematic diagram of a semantic relatedness partitioning sequence generated by the present invention;
FIG. 3 is a diagram illustrating an exemplary search process with a search screening factor of 0 according to the present invention;
FIG. 4 is a diagram illustrating an exemplary search process with a search screening factor of 0.51 according to the present invention.
Detailed Description
So that the invention can be understood in detail, it is described more particularly below with reference to the embodiments illustrated in the accompanying drawings.
For convenience of description, the following definitions are now made for the relevant symbols:
The document set is $D = \{d_1, d_2, \ldots, d_n\}$, and the words contained in the documents form the keyword set $W = \{w_1, w_2, \ldots, w_m\}$; $Q$ is the set of search keywords submitted by the user, and $k$ is the number of documents to be returned; $\mathrm{relevance}(w_i, d_j)$ denotes the single-keyword, single-document semantic relevance score between keyword $w_i$ and document $d_j$; $SPT_i$ is the semantic relevance partition sequence between $w_i$ and the documents; each partition of $w_i$, such as the $x$-th, has an upper boundary and a lower boundary.
FIG. 1 is a flow chart of the invention, depicting the screening factor calculation method supporting semantic perception ciphertext retrieval acceleration: a semantic perception model is used to calculate the semantic vector of each document; keywords are extracted from the documents and the semantic vector of each keyword is calculated; the semantic relevance between each keyword and each document is calculated to form a semantic relevance sequence, which is sorted in descending order; the sequence is partitioned to generate a semantic relevance partition sequence for each keyword; and, according to the search keywords, the screening factor is calculated and determined from the semantic relevance partition sequences.
The screening factor determination method supporting semantic perception ciphertext retrieval acceleration comprises two stages: (1) constructing the semantic relevance partition sequences; (2) calculating and determining the screening factor.
First stage: for each keyword, constructing the semantic relevance partition sequence between that keyword and the documents.
The specific steps are as follows:
step 1a, using a semantic perception model, calculating the semantic vector of each document $d_j$; extracting keywords from the documents to generate the keyword set $W$, and calculating the semantic vector of each keyword $w_i$;
step 1b, for each keyword $w_i \in W$, calculating its semantic relevance $\mathrm{relevance}(w_i, d_j)$ to each document $d_j \in D$, establishing the semantic relevance sequence $L_i$ between $w_i$ and the documents in $D$, and then sorting the sequence in descending order;
step 1c, according to the semantic relevance sequence $L_i$ of $w_i$ and a given segmentation parameter $\tau$, dividing the sequence into equal-sized partitions to generate the semantic relevance partition sequence $SPT_i$ between $w_i$ and the documents, in which each partition is represented as a pair whose two elements are the upper and lower boundaries of the partition; the generated semantic relevance partition sequence $SPT_i$ is shown in FIG. 2, and the specific generation steps are as follows:
step 1c1, for each $w_i$, arranging its relevance scores with the documents in $D$ in descending order to generate the semantic relevance sequence $L_i$; for each keyword $w_i$ in $W$, dividing $L_i$ into equal-sized partitions according to the segmentation parameter $\tau$ to construct the semantic relevance partition sequence corresponding to $w_i$, which contains $\lceil n/\tau \rceil$ partitions;
step 1c2, in the partition sequence, each of the first $\lceil n/\tau \rceil - 1$ partitions contains $\tau$ relevance scores and the last partition contains at most $\tau$; for any two adjacent partitions, every relevance score in the earlier partition is greater than any relevance score in the later partition;
step 1c3, for each partition in $SPT_i$, constructing a pair consisting of the partition's upper and lower boundaries, which are calculated as follows, where $\mathrm{rand}(x, y)$ denotes a random value between $x$ and $y$, $\min(X)$ denotes the minimum element of the set $X$, and $\max(X)$ denotes the maximum element of the set $X$;
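The first stage can be sketched in Python as follows. This is a minimal illustration, not the patented formula: the boundary formulas of step 1c3 are not reproduced in this text, so the sketch assumes that each lower boundary is drawn at random between the largest score of the next partition and the smallest score of the current one, and each upper boundary between the largest score of the current partition and the smallest score of the previous one. The function name `build_partition_sequence`, the score range [0, 1], and the example scores are all hypothetical.

```python
import math
import random

def build_partition_sequence(scores, tau, rng=random):
    """Build a partition sequence for one keyword.

    scores: dict mapping document id -> relevance score (assumed to lie in [0, 1]).
    tau:    segmentation parameter (scores per partition).
    """
    # Steps 1b/1c1: sort by relevance, descending, then cut into chunks of tau.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    chunks = [ranked[i:i + tau] for i in range(0, len(ranked), tau)]

    partitions = []
    for x, chunk in enumerate(chunks):
        values = [s for _, s in chunk]
        # ASSUMPTION: boundaries are random values in the gap between
        # neighbouring partitions, so they do not reveal any exact score.
        prev_min = min(s for _, s in chunks[x - 1]) if x > 0 else 1.0
        next_max = max(s for _, s in chunks[x + 1]) if x + 1 < len(chunks) else 0.0
        partitions.append({
            "docs": [doc for doc, _ in chunk],
            "upper": rng.uniform(max(values), prev_min),
            "lower": rng.uniform(next_max, min(values)),
        })
    return partitions

# Hypothetical relevance scores of seven documents for one keyword.
scores = {"d1": 0.9, "d2": 0.2, "d3": 0.7, "d4": 0.5,
          "d5": 0.4, "d6": 0.8, "d7": 0.1}
parts = build_partition_sequence(scores, tau=3, rng=random.Random(0))
```

With seven scores and $\tau = 3$ the sketch yields $\lceil 7/3 \rceil = 3$ partitions; under the stated assumption, each lower boundary never exceeds the smallest score in its partition.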
Second stage: according to the search keywords, calculating the screening factor using the semantic relevance partition sequences:
step 2a, let $Q$ be the set of search keywords submitted by the user and $k$ the number of documents the user requires; for each search keyword $w_n$ in $Q$, where $n$ ranges from 1 to $|Q|$, calculating the union $U_x$ of the document identifier sets in the first $x$ semantic relevance partitions; for each keyword $w_n$, $U_x$ is calculated as follows:
if $U_x$ satisfies the following condition, the value so determined is the local retrieval screening factor corresponding to $w_n$;
step 2b, over all search keywords $w_n$ in the set $Q$, calculating the final screening factor $t$ according to the following formula.
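Step 2a can be sketched in Python under explicit assumptions, because the condition on $U_x$ and the formula for the final factor are not reproduced in this text. The sketch assumes that the condition is "the union of the first $x$ partitions holds at least $k$ documents" for the smallest such $x$, and that the local factor is the lower boundary of that partition. The function name `local_screening_factor` and the input layout are hypothetical, and the aggregation over keywords in step 2b is deliberately left out.

```python
def local_screening_factor(partitions, k):
    """Return an assumed local screening factor for one search keyword.

    partitions: list of (doc_ids, lower_boundary) pairs, ordered from the
                most relevant partition to the least relevant one.
    k:          number of documents the user wants returned.
    """
    union = set()  # plays the role of U_x
    for doc_ids, lower_boundary in partitions:
        union |= set(doc_ids)
        # ASSUMPTION: stop at the first x for which |U_x| >= k.
        if len(union) >= k:
            return lower_boundary
    # Fewer than k documents in total: fall back to the classic start value 0.
    return 0.0

# Hypothetical partition sequence for one keyword (tau = 3).
spt = [(["d1", "d6", "d3"], 0.62),
       (["d4", "d5", "d2"], 0.15),
       (["d7"], 0.05)]
factor_k2 = local_screening_factor(spt, k=2)
factor_k4 = local_screening_factor(spt, k=4)
factor_k10 = local_screening_factor(spt, k=10)
```

Under these assumptions at least $k$ documents score no lower than the returned value for that keyword, which is the property the description relies on when it says no result is missed.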
The acceleration effect of the invention on the search process is described below, taking the method of the paper "Hu Z, Dai H, Yang G, Yi X, Sheng W. Semantic-Based Multi-Keyword Ranked Search Schemes over Encrypted Cloud Data. Security and Communication Networks, 2022" as an example.
Assume the document set is $D = \langle d_1, d_3, d_4, d_2, d_6, d_5 \rangle$ and a tree index is constructed from it, and suppose the topic vector of the query $Q$ is $V_Q$ = (0, 0.8,0, 0.5); the retrieval requires the two most relevant documents to be returned, that is, $k = 2$.
FIG. 3 shows the search process when the screening factor is 0. Starting from the root node, the search passes through $r$, $r_2$, $r_3$ to the first leaf node $d_1$; the semantic relevance score between $d_1$ and $V_Q$ is $\mathrm{relevance}(V_Q, d_1) = 0.56$, and $d_1$ is added to the result set $R$. The search then goes through $r_3$ to leaf node $d_3$, whose semantic relevance score is $\mathrm{relevance}(V_Q, d_3) = 0.48$, and $d_3$ is added to $R$; $R$ now holds $k = 2$ documents, so the screening factor is updated to $t = 0.48$. Next, the nodes $d_4$ and $d_2$ under $r_4$ are pruned, because $\mathrm{relevance}(V_Q, d_4) = 0.4 < t$ and $\mathrm{relevance}(V_Q, d_2) = 0.1 < t$. The search then goes through $r$, $r_5$ to $d_6$; because $\mathrm{relevance}(V_Q, d_6) = 0.53 > t$, $d_6$ is added to $R$, displacing $d_3$, and $R$ is sorted in descending order. The screening factor is now updated to $t = 0.53$. Because $\mathrm{relevance}(V_Q, d_5) = 0.4 < t$, node $d_5$ is pruned. Finally, the search result is $R = \langle d_1, d_6 \rangle$.
FIG. 4 shows the search process when the screening factor is 0.51. Unlike the process above, when the search reaches $d_3$, $\mathrm{relevance}(V_Q, d_3) = 0.48 < t$, so node $d_3$ is pruned. When the search reaches $r_4$, $\mathrm{relevance}(V_Q, r_4) = 0.5 < t$, so that node and its subtree containing $d_4$ and $d_2$ are pruned. Comparing the two examples shows that a predetermined screening factor prunes more subtrees early, which accelerates the retrieval process while keeping the retrieval result unchanged.
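The two search runs can be replayed with a small Python sketch of depth-first top-k search with threshold pruning. The leaf scores and the score 0.5 of $r_4$ come from the example above; the scores of the other internal nodes are not given in the text and are assumed here to be the largest leaf score in their subtree. The names `Node`, `top_k_search` and `example_tree` are hypothetical, and the plaintext scores stand in for the encrypted relevance computation of the actual scheme.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    score: float                      # leaf: relevance to V_Q; internal: bound for its subtree
    children: list = field(default_factory=list)

def top_k_search(root, k, t0=0.0):
    """Depth-first top-k search; t0 is the initial screening factor."""
    results = []    # (score, name) pairs, sorted descending, at most k entries
    evaluated = []  # names of nodes whose score was compared against t
    t = t0

    def visit(node):
        nonlocal t
        evaluated.append(node.name)
        if node.score < t:            # prune this node and its whole subtree
            return
        if node.children:
            for child in node.children:
                visit(child)
            return
        results.append((node.score, node.name))
        results.sort(reverse=True)
        if len(results) > k:
            results.pop()             # drop the weakest candidate
        if len(results) == k:
            t = max(t, results[-1][0])  # raise t to the k-th best score

    visit(root)
    return [name for _, name in results], evaluated

def example_tree():
    # Internal scores other than r4 = 0.5 are ASSUMED (max leaf score below).
    r3 = Node("r3", 0.56, [Node("d1", 0.56), Node("d3", 0.48)])
    r4 = Node("r4", 0.50, [Node("d4", 0.40), Node("d2", 0.10)])
    r2 = Node("r2", 0.56, [r3, r4])
    r5 = Node("r5", 0.53, [Node("d6", 0.53), Node("d5", 0.40)])
    return Node("r", 0.56, [r2, r5])

baseline, visited_baseline = top_k_search(example_tree(), k=2, t0=0.0)
seeded, visited_seeded = top_k_search(example_tree(), k=2, t0=0.51)
```

In this sketch both runs return $d_1$ and $d_6$, while the run seeded with 0.51 evaluates 9 nodes instead of 11 because $r_4$ is cut off before its leaves are reached.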
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention further, and all equivalent variations made by using the contents of the present specification and the drawings are within the scope of the present invention.
Claims (6)
1. A screening factor determination method supporting semantic perception ciphertext retrieval acceleration is characterized by comprising the following steps:
step 1, for each keyword, establishing a semantic relevance partition sequence between that keyword and the documents;
step 2, according to the search keywords, calculating the screening factor using the semantic relevance partition sequences.
2. The screening factor determination method supporting semantic perception ciphertext retrieval acceleration according to claim 1, wherein step 1 specifically comprises:
step 1a, using a semantic perception model, calculating the semantic vector of each document $d_j$ in the document set $D = \{d_1, d_2, \ldots, d_j, \ldots, d_n\}$, where $j$ ranges from 1 to $n$; extracting keywords from each document $d_j$ to generate a keyword set $W = \{w_1, w_2, \ldots, w_i, \ldots, w_m\}$, where $i$ ranges from 1 to $m$; and calculating the semantic vector of each keyword $w_i$;
step 1b, for each keyword $w_i \in W$, calculating its semantic relevance $\mathrm{relevance}(w_i, d_j)$ to each document $d_j \in D$, establishing the semantic relevance sequence $L_i$ between $w_i$ and the documents in $D$, and then sorting the sequence in descending order;
step 1c, according to the semantic relevance sequence $L_i$ of $w_i$ and a given segmentation parameter $\tau$, dividing the sequence into equal-sized partitions to generate the semantic relevance partition sequence $SPT_i$ between $w_i$ and the documents, in which each partition is represented as a pair whose two elements are the upper and lower boundaries of the partition.
3. The screening factor determination method supporting semantic perception ciphertext retrieval acceleration according to claim 2, wherein step 1c specifically comprises:
step 1c1, for each $w_i$, arranging its relevance scores with the documents in $D$ in descending order to generate the semantic relevance sequence $L_i$; for each keyword $w_i$ in $W$, dividing $L_i$ into equal-sized partitions according to the segmentation parameter $\tau$ to construct the semantic relevance partition sequence corresponding to $w_i$, which contains $\lceil n/\tau \rceil$ partitions;
step 1c2, in the partition sequence, each of the first $\lceil n/\tau \rceil - 1$ partitions contains $\tau$ relevance scores and the last partition contains at most $\tau$; for any two adjacent partitions, every relevance score in the earlier partition is greater than any relevance score in the later partition;
4. The screening factor determination method supporting semantic perception ciphertext retrieval acceleration according to claim 3, wherein step 1c3 specifically comprises:
for each partition of the $SPT_i$ corresponding to $w_i$, the upper and lower boundaries forming its pair are calculated as follows, where $\mathrm{rand}(x, y)$ denotes a random value between $x$ and $y$, $\min(X)$ denotes the minimum element of the set $X$, and $\max(X)$ denotes the maximum element of the set $X$:
5. The screening factor determination method supporting semantic perception ciphertext retrieval acceleration according to claim 1, wherein step 2 specifically comprises:
step 2a, let $Q$ be the set of search keywords submitted by the user and $k$ the number of documents the user requires; for each search keyword $w_n$ in $Q$, where $n$ ranges from 1 to $|Q|$, calculating the union $U_x$ of the document identifier sets in the first $x$ semantic relevance partitions; if $U_x$ satisfies the following condition, the value so determined is the local retrieval screening factor corresponding to $w_n$;
step 2b, over all search keywords $w_n$ in the set $Q$, calculating the final screening factor $t$ according to the following formula;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211579597.0A CN115757435A (en) | 2022-12-09 | 2022-12-09 | Screening factor determination method supporting semantic perception ciphertext retrieval acceleration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211579597.0A CN115757435A (en) | 2022-12-09 | 2022-12-09 | Screening factor determination method supporting semantic perception ciphertext retrieval acceleration |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115757435A true CN115757435A (en) | 2023-03-07 |
Family
ID=85346670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211579597.0A Pending CN115757435A (en) | 2022-12-09 | 2022-12-09 | Screening factor determination method supporting semantic perception ciphertext retrieval acceleration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115757435A (en) |
- 2022-12-09: CN202211579597.0A (patent CN115757435A), active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||