CN107633017A - Fuzzy set construction method for Chinese keywords - Google Patents

A fuzzy set construction method for Chinese keywords

Info

Publication number
CN107633017A
Authority
CN
China
Prior art keywords
phonetic
chinese
fuzzy set
tone
initial consonant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710729995.9A
Other languages
Chinese (zh)
Inventor
张亚玲
周时
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN201710729995.9A
Publication of CN107633017A
Legal status: Pending

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a fuzzy set construction method for Chinese keywords. The input Chinese keyword is first converted to pinyin; the resulting pinyin is then segmented to obtain pinyin in a specific format; next, initials, finals, and tones in the segmented result are replaced according to the definition of the Chinese edit distance to output a fuzzy set; finally, a legal fuzzy set is output after a pinyin validity check. The Chinese edit distance is defined as follows: if the initial or final of a pinyin syllable changes, the edit distance is taken as 2; the difference caused by a tone change should be less than 1. Based on the given edit distance and this improved pinyin edit distance definition, the combinations of initials, finals, or tones that may change are determined, and the initials, finals, or tones are replaced according to the different combinations to produce the fuzzy set. The invention addresses the problems in the prior art that existing Chinese fuzzy set construction methods consume considerable time and space and that the fuzzy set itself occupies too much space.

Description

A fuzzy set construction method for Chinese keywords
Technical field
The invention belongs to the field of information security technology, and specifically relates to a fuzzy set construction method for Chinese keywords.
Background
With the rise of cloud computing, the volume of data stored in the cloud keeps growing. More and more users choose to store their data in the cloud, which maximizes access efficiency while minimizing management overhead. In practice, however, users and cloud services belong to different trust domains, and outsourcing data carries risk, so the security of cloud storage has attracted increasing attention.
Ensuring the confidentiality, availability, and integrity of stored data is the security problem that cloud storage must solve. Data confidentiality means that data cannot be decrypted without authorization; data availability means that legitimate users can use the data whenever they need it; data integrity means that data is not tampered with during transmission and storage. Enterprises or individuals who store private data in the cloud may worry that an attacker will gain unauthorized access to it. Cloud servers generally prevent unauthorized users from accessing them by means such as access control or identity authentication. For a public cloud server, however, the server's own untrustworthiness is the greatest threat. More and more enterprises and users therefore encrypt their data before outsourcing it, to prevent unauthorized access and to keep the cloud service itself from learning the information. Once the data is encrypted, however, searching it becomes very difficult.
When searching encrypted documents, downloading all of the encrypted data from the cloud, decrypting it locally, and then searching is very inefficient and consumes enormous bandwidth. If instead a complex index structure is built and maintained locally, the user can look up the relevant ciphertext blocks, but this consumes a large amount of storage and makes data sharing very complicated, so a more reasonable search scheme is needed. Searchable encryption schemes address this problem effectively: the data owner encrypts the data and stores it in the untrusted cloud, the user submits a keyword trapdoor to the cloud server as a search request, and the cloud server finds the results containing the keyword and returns them to the user without learning anything about the corresponding plaintext. The security definition of a searchable encryption scheme comprises three properties: a keyword trapdoor can only be generated with its owner's key; the ciphertext must not reveal plaintext information; and given the ciphertext and a keyword trapdoor, only the corresponding search results can be obtained. Searchable encryption improves computational efficiency and reduces overhead, so it has good development prospects.
Searchable encryption is thus a good solution to the problem of searching encrypted documents. There are two classical approaches. The first is based on ciphertext scanning: the keywords in the encrypted document are compared one by one to confirm whether a keyword is present and how many times it occurs. The second is index-based: a secure index containing the encrypted keywords is built, and the index is searched to determine whether it contains the specified keyword.
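The index-based approach can be illustrated with a toy sketch. It assumes HMAC-SHA256 as the keyed function that produces keyword trapdoors; the names `trapdoor`, `build_index`, and `search` are hypothetical, and the sketch only shows the data flow, not a scheme with the security properties listed above.

```python
import hashlib
import hmac

def trapdoor(key: bytes, keyword: str) -> str:
    """Keyed, deterministic token for a keyword (HMAC-SHA256 is an assumed choice)."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(key: bytes, docs: dict) -> dict:
    """Map each keyword token to the IDs of the documents that contain the keyword."""
    index = {}
    for doc_id, keywords in docs.items():
        for kw in keywords:
            index.setdefault(trapdoor(key, kw), set()).add(doc_id)
    return index

def search(index: dict, token: str) -> set:
    """Server-side lookup: the server sees only tokens, never plaintext keywords."""
    return index.get(token, set())

key = b"owner-secret-key"  # hypothetical key held by the data owner
index = build_index(key, {"doc1": ["中国", "云存储"], "doc2": ["云存储"]})
print(sorted(search(index, trapdoor(key, "云存储"))))  # ['doc1', 'doc2']
```

A token computed under a different key matches nothing in the index, which mirrors the requirement that a trapdoor can only be generated with the owner's key.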
Research on searchable encryption has achieved considerable success. At present, however, much of it targets English keyword search and does not fully apply to Chinese. For example, a Chinese keyword may have many synonyms or words with similar pinyin, which raises new problems for searchable encryption over Chinese keywords. Little research, in China or abroad, addresses searchable encryption for Chinese keywords, and in particular methods for constructing Chinese fuzzy sets.
Summary of the invention
The object of the invention is to provide a fuzzy set construction method for Chinese keywords that solves the problems in the prior art that fuzzy set construction methods consume considerable time and space and that the fuzzy set itself occupies too much space.
The technical solution adopted by the invention is a fuzzy set construction method for Chinese keywords, implemented according to the following steps:
Step 1: convert the input Chinese keyword to pinyin;
Step 2: segment the pinyin obtained in step 1 to obtain pinyin in a specific format;
Step 3: according to the definition of the Chinese edit distance, replace initials, finals, and tones in the result of step 2, and output a fuzzy set;
Step 4: check the validity of the pinyin;
Step 5: output the legal fuzzy set.
The invention is further characterized as follows.
Step 1 is specifically:
The input keyword is converted to its corresponding pinyin structure, comprising initial, final, and tone.
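As a minimal sketch of step 1, the conversion can be pictured as a per-character lookup that emits tone-numbered pinyin. The table `PINYIN_TABLE` and the function `to_pinyin` are hypothetical names introduced here for illustration; the source does not specify which dictionary is used or how polyphonic characters are resolved.

```python
# Hypothetical miniature character-to-pinyin table. A real system would load a
# full dictionary and would also need to disambiguate polyphonic characters.
PINYIN_TABLE = {"中": "zhong1", "国": "guo2", "文": "wen2"}

def to_pinyin(keyword: str) -> str:
    """Concatenate the tone-numbered pinyin of each character in the keyword."""
    syllables = []
    for ch in keyword:
        if ch not in PINYIN_TABLE:
            raise ValueError(f"no pinyin entry for character {ch!r}")
        syllables.append(PINYIN_TABLE[ch])
    return "".join(syllables)

print(to_pinyin("中国"))  # zhong1guo2
```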
Step 2 is specifically:
Step (2.1): identify the initial, final, and tone of each pinyin syllable;
Step (2.2): separate the initials, finals, and tones identified in step (2.1) with "-" in order;
Step (2.3): output the pinyin in the specific format produced by step (2.2).
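Step 2 can be sketched as follows, assuming tone-numbered input such as "zhong1guo2". The list `INITIALS` and the function `segment` are illustrative assumptions; in particular, treating y and w as initials is a simplification, not something the source states.

```python
import re

# Two-letter initials come first so that "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g", "k",
            "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def segment(pinyin: str) -> str:
    """Split tone-numbered pinyin into initial-final-tone parts joined by '-'."""
    parts = []
    # Each syllable is a run of letters followed by one tone digit.
    for syllable, tone in re.findall(r"([a-z]+)([1-4])", pinyin):
        initial = next((i for i in INITIALS if syllable.startswith(i)), "")
        final = syllable[len(initial):]
        # Syllables without an initial (e.g. "an") contribute only final and tone.
        parts.extend(p for p in (initial, final, tone) if p)
    return "-".join(parts)

print(segment("zhong1guo2"))  # zh-ong-1-g-uo-2
```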
Step 3 is specifically:
Step (3.1): define the Chinese edit distance: if the initial or final of a pinyin syllable changes, the edit distance is taken as 2; the difference caused by a tone change should be less than 1;
Step (3.2): according to the given edit distance and the improved pinyin edit distance definition, determine the combinations of initials, finals, or tones that may change;
Step (3.3): replace initials, finals, or tones according to the different change combinations;
Step (3.4): output the fuzzy set.
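A minimal sketch of steps (3.2) to (3.4) for a given edit distance of 1, assuming that a tone change and a similar-sound replacement each cost 1. The pairs in `SIMILAR_PAIRS` are a hypothetical subset of commonly used fuzzy-pinyin pairs; the source does not list its own pairs. The candidates produced here have not yet passed the validity check of step 4.

```python
# Hypothetical similar-sounding initial and final pairs (assumption, not from the source).
SIMILAR_PAIRS = [("z", "zh"), ("c", "ch"), ("s", "sh"), ("l", "n"), ("f", "h"),
                 ("an", "ang"), ("en", "eng"), ("in", "ing")]
SIMILAR = {}
for a, b in SIMILAR_PAIRS:
    SIMILAR.setdefault(a, []).append(b)
    SIMILAR.setdefault(b, []).append(a)

TONES = (1, 2, 3, 4)

def join(syllables):
    """Render (initial, final, tone) triples back into a pinyin string."""
    return "".join(f"{ini}{fin}{tone}" for ini, fin, tone in syllables)

def candidates_d1(syllables):
    """All strings reachable by exactly one cost-1 change in one syllable."""
    syllables = list(syllables)
    out = []
    for k, (ini, fin, tone) in enumerate(syllables):
        variants = [(ini, fin, t) for t in TONES if t != tone]         # tone change
        variants += [(alt, fin, tone) for alt in SIMILAR.get(ini, [])]  # similar initial
        variants += [(ini, alt, tone) for alt in SIMILAR.get(fin, [])]  # similar final
        for v in variants:
            out.append(join(syllables[:k] + [v] + syllables[k + 1:]))
    return out

print(candidates_d1([("zh", "ong", 1), ("g", "uo", 2)]))
```

For the segmented keyword zh-ong-1-g-uo-2 this yields six tone variants plus zong1guo2.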
Step 4 is specifically implemented as follows:
Step (4.1): extract each element of the fuzzy set output in step 3;
Step (4.2): extract the initials from each element, then use the pinyin table for each initial to judge whether the element exists in the dictionary. If the element contains several initials and therefore several pinyin syllables, each syllable is checked against the dictionary: when all syllables exist in the dictionary, the element is legal and is retained; when any syllable does not exist in the dictionary, the element is discarded.
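The validity check of step 4 can be sketched as below. It assumes that the pinyin table groups toneless syllables by initial; the source does not say whether tones are stored in the table. `TABLE` is a hypothetical excerpt, and `initial_of`, `is_legal`, and `filter_fuzzy_set` are illustrative names.

```python
import re

INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g", "k",
            "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

# Hypothetical excerpt of a pinyin table keyed by initial (toneless syllables).
TABLE = {"zh": {"zhong", "zhang"}, "z": {"zong"}, "g": {"guo"}}

def initial_of(syllable: str) -> str:
    """Return the initial of a syllable, or '' if it has none."""
    return next((i for i in INITIALS if syllable.startswith(i)), "")

def is_legal(element: str) -> bool:
    """An element is legal only if every one of its syllables is in the table."""
    syllables = re.findall(r"([a-z]+)[1-4]", element)
    return bool(syllables) and all(
        s in TABLE.get(initial_of(s), set()) for s in syllables
    )

def filter_fuzzy_set(elements):
    """Keep the legal elements, preserving their order."""
    return [e for e in elements if is_legal(e)]

print(filter_fuzzy_set(["zhong2guo2", "zong1guo2", "zhing1guo2"]))
# ['zhong2guo2', 'zong1guo2']
```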
The beneficial effect of the invention is that the fuzzy set construction method for Chinese keywords, using a pinyin table built at initialization, converts the input Chinese keyword to pinyin and segments it, then obtains a fuzzy set according to the edit distance calculation rules and checks the fuzzy set for validity to obtain the final fuzzy set.
Brief description of the drawings
Fig. 1 is a schematic diagram of the searchable encryption scenario for the fuzzy set construction method for Chinese keywords of the invention;
Fig. 2 shows the relationship between the time consumed converting Chinese characters to pinyin and the number of keywords in the simulation experiment of the method;
Fig. 3 shows the time consumption for edit distances d=1 and d=2 in the simulation experiment of the method;
Fig. 4 shows the space consumption for edit distances d=1 and d=2 in the simulation experiment of the method.
Detailed description
The invention is described in detail below with reference to the accompanying drawings and specific embodiments.
In the fuzzy set construction method for Chinese keywords of the invention, as shown in Fig. 1, the data owner first extracts a keyword set from the files, then encrypts the files in association with the keyword set and uploads them to the public cloud server, and then uploads the keyword set to the private cloud server. The private cloud server receives the keyword set uploaded by the data owner, first generates the fuzzy set using the method of the invention, then generates an encrypted search index from the fuzzy set and uploads the search index to the public cloud server. After receiving the keyword and edit distance uploaded by a user, the private cloud server likewise generates a fuzzy set using the method of the invention and then sends a search request to the public cloud server. After receiving the search request, the public cloud server searches the index and returns the matching results to the private cloud server. The authorized user receives the search results returned by the private cloud server and requests the files from the public cloud server; the public cloud server returns the encrypted files, and the user decrypts them to obtain the plaintext.
The method is implemented according to the following steps:
Step 1: convert the input Chinese keyword to pinyin, specifically:
The input keyword is converted to its corresponding pinyin structure, comprising initial, final, and tone;
Step 2: segment the pinyin obtained in step 1 to obtain pinyin in a specific format, specifically:
Step (2.1): identify the initial, final, and tone of each pinyin syllable;
Step (2.2): separate the initials, finals, and tones identified in step (2.1) with "-" in order;
Step (2.3): output the pinyin in the specific format produced by step (2.2);
Step 3: according to the definition of the Chinese edit distance, replace initials, finals, and tones in the result of step 2, and output a fuzzy set, specifically:
Step (3.1): define the Chinese edit distance. When Chinese is converted to pinyin, polyphonic characters are one cause of errors, and the difference between their readings should be less than 1 (a replacement cost of 1 is used in the simulation experiments here). For finals or initials that sound similar, the difference should likewise be less than 1, and the simulation experiments again use a replacement cost of 1; there are 12 such pairs of initials and finals (taken from the dictionaries of different mainstream input methods). If the initial or final of a pinyin syllable changes, the edit distance is taken as 2. Errors caused by tones are very common, and most pinyin input methods do not require the user to enter a tone along with the pinyin, so tone differences are handled here by the same rule as similar pronunciations, with a replacement cost of 1 in the simulation experiments; the difference caused by a tone change should be less than 1;
Step (3.2): according to the given edit distance and the improved pinyin edit distance definition, determine the combinations of initials, finals, or tones that may change;
Step (3.3): replace initials, finals, or tones according to the different change combinations;
Step (3.4): output the fuzzy set;
Step 4: check the validity of the pinyin, specifically implemented as follows:
Step (4.1): extract each element of the fuzzy set output in step 3;
Step (4.2): extract the initials from each element, then use the pinyin table for each initial to judge whether the element exists in the dictionary. If the element contains several initials and therefore several pinyin syllables, each syllable is checked against the dictionary: when all syllables exist in the dictionary, the element is legal and is retained; when any syllable does not exist in the dictionary, the element is discarded;
Step 5: output the legal fuzzy set.
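The cost rules of step (3.1) can be sketched as a small function, under two assumptions that the source does not state explicitly: the cost of 2 applies to initial or final changes that are not similar-sounding pairs, and the costs for initial, final, and tone are summed per syllable. The pair list is a hypothetical subset, since the source mentions 12 pairs without listing them.

```python
# Hypothetical similar-sounding pairs (assumption; the source does not list its 12 pairs).
SIMILAR = {frozenset(p) for p in [("z", "zh"), ("c", "ch"), ("s", "sh"), ("l", "n"),
                                  ("f", "h"), ("an", "ang"), ("en", "eng"), ("in", "ing")]}

def part_cost(a: str, b: str) -> int:
    """Cost of replacing one initial (or final) with another."""
    if a == b:
        return 0
    return 1 if frozenset((a, b)) in SIMILAR else 2

def syllable_cost(x, y) -> int:
    """Cost between two (initial, final, tone) triples; a tone change costs 1."""
    (i1, f1, t1), (i2, f2, t2) = x, y
    return part_cost(i1, i2) + part_cost(f1, f2) + (0 if t1 == t2 else 1)

print(syllable_cost(("zh", "ong", 1), ("z", "ong", 1)))   # 1: similar initial
print(syllable_cost(("zh", "ong", 1), ("zh", "ong", 3)))  # 1: tone change
print(syllable_cost(("g", "uo", 2), ("k", "uo", 2)))      # 2: dissimilar initial
```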
The fuzzy set construction method for Chinese keywords of the invention is based on an analysis of the characteristics of Chinese: the keyword is converted to pinyin, the pinyin is segmented, and the fuzzy set is constructed according to the pinyin-based edit distance rules. The validity of each constructed pinyin string must be checked during construction, and a pinyin string is added to the fuzzy set only if it is legal. In theory, therefore, the invention is feasible.
To verify the feasibility of the Chinese keyword fuzzy set construction method of the invention, the method was analyzed through simulation experiments. The data used in the experiments came from the news data of Sogou Labs. The title of each news item was extracted as a keyword, giving 4336 keywords in total, each averaging 10 Chinese characters. The experiments were run on a computer with the Windows 7 operating system, 4 GB of memory, and an Intel Core i5 processor. The simulation experiments mainly analyze the following aspects: 1) the time overhead of converting Chinese characters to pinyin; 2) the time overhead of building the fuzzy set with the method of the invention for different edit distances; 3) the space overhead of building the fuzzy set with the method of the invention for different edit distances.
Fig. 2 analyzes the time needed to preprocess Chinese characters into pinyin. The Chinese keyword fuzzy set construction method requires preprocessing each Chinese keyword to obtain its corresponding pinyin letter format. As Fig. 2 shows, the time consumed converting Chinese characters to pinyin grows linearly with the number of keywords.
Fig. 3 shows the time consumption of the method of the invention for edit distances d=1 and d=2.
Fig. 4 shows the space consumption of the method of the invention for edit distances d=1 and d=2.
The test results show that the fuzzy set construction method for Chinese keywords of the invention retains good time and space efficiency when the number of keywords reaches a practical scale.
Example
As a concrete application of the fuzzy set construction method for Chinese keywords of the invention, take the input keyword "China" (中国) with an edit distance of 1:
Step 1: convert the Chinese to pinyin to obtain "zhong1guo2", where {1, 2, 3, 4} denote the tones {first tone (yinping), second tone (yangping), third tone (shangsheng), fourth tone (qusheng)};
Step 2: segment the pinyin to obtain the form "zh-ong-1-g-uo-2", which makes it convenient to process the keyword by edit distance afterwards;
Step 3: by the edit distance rules above, the possible changes are: 1) replace a similar sound; 2) change a tone. Applying the possible replacements yields the fuzzy set {zhong2guo2, zhong3guo2, zhong4guo2, zhong1guo1, zhong1guo3, zhong1guo4, zong1guo2};
Step 4: pinyin validity check:
For the first element {zhong2guo2} from step 3, extract the initials {zh, g} and use the pinyin table to judge whether {zhong2} and {guo2} exist; both clearly exist, so the element is retained. The remaining elements of the fuzzy set obtained in step 3 are checked for validity in turn using the same method;
Step 5: output the final legal fuzzy set: {zhong2guo2, zhong3guo2, zhong4guo2, zhong1guo1, zhong1guo3, zhong1guo4, zong1guo2}.
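A rough count, not stated in the source, explains the size of this set. With four tones, each syllable has 3 alternative tones, so a keyword of n syllables for which s similar-sound replacements are available yields at most 3n + s candidates at edit distance 1, before the validity check:

```latex
|S_{d=1}| \le 3n + s, \qquad \text{for ``zhong1guo2'': } n = 2,\; s = 1,\; 3 \cdot 2 + 1 = 7 .
```

Equality holds in this example because all seven candidates pass the validity check; s = 1 reflects the single replacement zh to z that appears in the set above.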
The fuzzy set construction method for Chinese keywords of the invention provides a way to build keyword fuzzy sets for Chinese-oriented searchable encryption schemes in a cloud storage environment; combined with a searchable encryption scheme, it forms a complete searchable encryption scheme for Chinese.

Claims (5)

1. A fuzzy set construction method for Chinese keywords, characterized in that it is implemented according to the following steps:
Step 1: convert the input Chinese keyword to pinyin;
Step 2: segment the pinyin obtained in step 1 to obtain pinyin in a specific format;
Step 3: according to the definition of the Chinese edit distance, replace initials, finals, and tones in the result of step 2, and output a fuzzy set;
Step 4: check the validity of the pinyin;
Step 5: output the legal fuzzy set.
2. The fuzzy set construction method for Chinese keywords according to claim 1, characterized in that step 1 is specifically:
The input keyword is converted to its corresponding pinyin structure, comprising initial, final, and tone.
3. The fuzzy set construction method for Chinese keywords according to claim 1, characterized in that step 2 is specifically:
Step (2.1): identify the initial, final, and tone of each pinyin syllable;
Step (2.2): separate the initials, finals, and tones identified in step (2.1) with "-" in order;
Step (2.3): output the pinyin in the specific format produced by step (2.2).
4. The fuzzy set construction method for Chinese keywords according to claim 1, characterized in that step 3 is specifically:
Step (3.1): define the Chinese edit distance: if the initial or final of a pinyin syllable changes, the edit distance is taken as 2; the difference caused by a tone change should be less than 1;
Step (3.2): according to the given edit distance and the improved pinyin edit distance definition, determine the combinations of initials, finals, or tones that may change;
Step (3.3): replace initials, finals, or tones according to the different change combinations;
Step (3.4): output the fuzzy set.
5. The fuzzy set construction method for Chinese keywords according to claim 1, characterized in that step 4 is specifically implemented as follows:
Step (4.1): extract each element of the fuzzy set output in step 3;
Step (4.2): extract the initials from each element, then use the pinyin table for each initial to judge whether the element exists in the dictionary. If the element contains several initials and therefore several pinyin syllables, each syllable is checked against the dictionary: when all syllables exist in the dictionary, the element is legal and is retained; when any syllable does not exist in the dictionary, the element is discarded.
CN201710729995.9A 2017-08-23 2017-08-23 A kind of fuzzy set construction method of Chinese key Pending CN107633017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710729995.9A CN107633017A (en) 2017-08-23 2017-08-23 A kind of fuzzy set construction method of Chinese key

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710729995.9A CN107633017A (en) 2017-08-23 2017-08-23 A kind of fuzzy set construction method of Chinese key

Publications (1)

Publication Number Publication Date
CN107633017A true CN107633017A (en) 2018-01-26

Family

ID=61101226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710729995.9A Pending CN107633017A (en) 2017-08-23 2017-08-23 A kind of fuzzy set construction method of Chinese key

Country Status (1)

Country Link
CN (1) CN107633017A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101604A (en) * 2018-08-01 2018-12-28 深圳市元征科技股份有限公司 Vehicle brand knows method for distinguishing and vehicle brand identification device
CN109947955A (en) * 2019-03-21 2019-06-28 深圳创维数字技术有限公司 Voice search method, user equipment, storage medium and device
CN110097880A (en) * 2019-04-20 2019-08-06 广东小天才科技有限公司 Answer judgment method and device based on voice recognition

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246478A (en) * 2007-02-14 2008-08-20 高德软件有限公司 Information storage and retrieval method
CN106297799A (en) * 2016-08-09 2017-01-04 乐视控股(北京)有限公司 Voice recognition processing method and device
CN106407447A (en) * 2016-09-30 2017-02-15 福州大学 Simhash-based fuzzy sequencing searching method for encrypted cloud data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈何峰等: "基于密文的中文关键词模糊搜索方案", 《信息网络安全》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180126

RJ01 Rejection of invention patent application after publication