CN108062355A - Query word expansion method based on pseudo-relevance feedback and TF-IDF - Google Patents
- Publication number: CN108062355A (application CN201711179719.6A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G—PHYSICS / G06—COMPUTING; CALCULATING OR COUNTING / G06F—ELECTRIC DIGITAL DATA PROCESSING / G06F16/00—Information retrieval; Database structures therefor; File system structures therefor / G06F16/30—Information retrieval of unstructured textual data / G06F16/33—Querying
- G06F16/334—Query execution
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
- G06F16/3338—Query expansion
- G06F16/338—Presentation of query results
Abstract
The invention discloses a query word expansion method based on pseudo-relevance feedback and TF-IDF. The method selects query constraint words in a principled way, applies the two rounds of screening proposed by the invention to obtain the words usable for query expansion, and then scores and ranks the documents with a scoring formula proposed by the invention. The distinguishing features of the invention are a new way of selecting query constraint words and candidate words, and two screening operations that remove unrelated words. It also builds on the traditional BM25 scoring formula to create a new scoring formula designed specifically for query word expansion, so that the result documents of the expanded query can be scored more soundly and a better-founded ranking of the search results can be obtained.
Description
Technical field
The present invention relates to the fields of vertical search and search term expansion, and in particular to a query word expansion method based on pseudo-relevance feedback and TF-IDF.
Background technology
Search engines have become an important tool for people to obtain the information they need, but because users cannot always formulate precise search terms, search results are often unsatisfactory. Query word expansion technology came into being in order to provide more useful information to the user.
When the content a user wants to find can be expressed in many different ways, retrieving only with the user's literal search terms easily produces poor term matching and unsatisfactory results. We should therefore feed back as many plausible results as possible, then rank and select them for the user according to a weight calculation formula, so as to achieve the goal of optimizing the search. Traditional search term expansion has the following problems:
1) Too many expansion words produce too many feedback results, and users tend not to read all of them;
2) The weights of many candidate expansion words must be computed, which takes too much time;
3) Many unrelated words may still be counted as high-weight words during ranking, so irrelevant data appears among the top results;
4) Expanding entirely according to a synonym table makes the expansion ineffective.
Summary of the invention
It is an object of the invention to overcome the shortcomings of existing query word expansion. A query word expansion method based on pseudo-relevance feedback and TF-IDF is proposed; the method defines a preliminary selection of query candidate words and a secondary screening of those candidates, giving a comparatively well-founded query word expansion method.
To achieve the above object, the technical solution provided by the present invention is a query word expansion method based on pseudo-relevance feedback and TF-IDF, comprising the following steps:
1) select the query constraint words;
2) select the initial expansion candidate words;
3) screen to obtain the secondary expansion candidate words;
4) obtain the final expansion words by computing scores;
5) create a query from the expansion words of step 4);
6) rank the documents by weight.
In step 1), remove stop words from the query sentence and segment it. First run each separated word once as an individual query; denote the words n1, n2, n3, ..., and record the feedback document count of each, Nn1, Nn2, Nn3, .... Then run one AND query over all query words and record its feedback document total FA, and one OR query over all query words and record its feedback document total FO. To find the constraint words of the query sentence, take the first word n1 as an example: remove n1 and run an AND query over all remaining query words, denoting the feedback document total of that query Fn1. Then compute the proportion FA takes of Fn1, recorded as D1, whose calculation formula is:
D1 = FA / Fn1
Afterwards, compute the proportion of documents containing n1 among the OR query's result documents, which represents the degree of freedom of n1 (the more often it appears in the OR results, the freer it is and the smaller its shrinking effect), denoted V1, whose calculation formula is:
V1 = Nn1 / FO
If V1 > D1, define the word n1 as a query constraint word, and repeat the above operations for the other words n2, n3, ... until every word has been judged. The reason V1 > D1 defines a constraint word is that V1 represents the degree of freedom of n1 in the OR query results: if the presence of n1 had no shrinking effect on the query, then D1, the proportion that the AND query including n1 takes of the AND query excluding n1, should be greater than or equal to its degree of freedom V1. If V1 > D1, the presence of n1 reduces the AND query's feedback document count by more than the threshold (if V1 = D1, the reduction of the AND query's feedback document count, i.e. the shrinking effect on the query sentence, is exactly at the threshold; if V1 < D1, the shrinking effect of the word n1 on the query sentence is below the threshold). In other words, the degree of freedom of n1 can be viewed as the size of the beneficial effect it could have on the query sentence; if its presence does not even reach that minimum, it is considered to have a constraining effect on the query and is defined as a query constraint word. We then expand the query constraint words; if there is only one word, that word is directly set as the query constraint word.
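The constraint-word selection above can be sketched in Python. The helper `count_hits(terms, op)` is an assumption of this sketch, not part of the patent: it stands for running a boolean query against the search engine and returning the feedback document count.

```python
# Minimal sketch of step 1 (constraint-word selection).
# `count_hits(terms, op)` is an assumed helper that runs a boolean
# query ("AND"/"OR") and returns the feedback document count.

def select_constraint_words(terms, count_hits):
    """Return the subset of query terms judged to be constraint words."""
    if len(terms) == 1:
        return list(terms)            # a single word is set as the constraint word
    FA = count_hits(terms, "AND")     # AND query over all query words
    FO = count_hits(terms, "OR")      # OR query over all query words
    constraints = []
    for t in terms:
        rest = [w for w in terms if w != t]
        Fn = count_hits(rest, "AND")  # AND query with t removed
        Nn = count_hits([t], "OR")    # documents containing t alone
        D = FA / Fn                   # proportion FA takes of Fn
        V = Nn / FO                   # degree of freedom of t
        if V > D:                     # shrinking effect above the threshold
            constraints.append(t)
    return constraints
```

A word is kept exactly when its degree of freedom V exceeds its shrinkage proportion D, matching the V1 > D1 test described above.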
In step 2), we run each query constraint word as an individual query to obtain its feedback documents, remove stop words from the content of those documents and segment it, then compute a score according to the TFnew*IDF of each word and take the top 50 ranked words as the initial candidate expansion words of that word. Define widj as the word frequency of candidate word wi in the constraint word's feedback document dj, and k as the constraint word's feedback document count. The calculation formula of TFnew (max() takes the maximum of all its arguments, min() the minimum) is:
TFnew(wi) = [(wid1 + wid2 + ... + widk) − max(wid1, wid2, ..., widk) − min(wid1, wid2, ..., widk)] / (k − 2) * k
Define N as the total number of documents in the corpus and wiN as the number of documents containing wi; the calculation formula of IDF(wi) is:
IDF(wi) = log(N / (wiN + 1))
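The TFnew*IDF score above can be sketched directly from the two formulas (function names are illustrative):

```python
import math

def tf_new(freqs):
    """TFnew(wi): the per-document frequencies of wi over the k feedback
    documents, with the maximum and minimum rejected, averaged over the
    remaining k - 2 documents and scaled by k (requires k > 2)."""
    k = len(freqs)
    return (sum(freqs) - max(freqs) - min(freqs)) / (k - 2) * k

def idf(N, doc_freq):
    """IDF(wi) = log(N / (wiN + 1)) over a corpus of N documents,
    doc_freq of which contain wi."""
    return math.log(N / (doc_freq + 1))

def candidate_weight(freqs, N, doc_freq):
    """Score used to rank candidate expansion words: TFnew * IDF."""
    return tf_new(freqs) * idf(N, doc_freq)
```

Dropping the single largest and smallest frequency before averaging is what makes one outlier document unable to dominate the candidate ranking.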
In step 3), we compare each initial candidate word's average word frequency in the feedback documents of its corresponding query constraint word with its average word frequency in its own feedback documents. If the latter is larger, the word is rejected from the initial candidate set, giving the secondary candidate set (this ensures that words which, although related to the constraint word, are not related only to the constraint word are removed). Denote by Xi the number of occurrences of a word in the i-th of n documents; the calculation formula of the average word frequency TFavg is:
TFavg = (X1 + X2 + ... + Xn) / n = Σ(i=1..n) Xi / n
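The averaging and screening rule above can be sketched as follows (helper names are illustrative):

```python
def tf_avg(counts):
    """Average word frequency TFavg: counts[i] is X_i, the number of
    occurrences of the word in the i-th of n documents."""
    return sum(counts) / len(counts)

def keep_candidate(avg_in_constraint_docs, avg_in_own_docs):
    """Secondary screening rule: reject the candidate when its average
    frequency in its own feedback documents exceeds its average
    frequency in the constraint word's feedback documents."""
    return avg_in_own_docs <= avg_in_constraint_docs
```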
In step 4), use the candidate word weight calculation formula:
S(w, q) = (d + 1)^(-1)
where w is the candidate word, q the constraint word, and d the absolute value of the distance between the word vectors of the two words, representing the distance between candidate and constraint word. The reason (d + 1)^(-1) is used to compute each candidate's score is, first, that it guarantees the score S shrinks gradually as the distance grows, and second, that it makes the score grow sharply when the distance is small and shrink only slightly when the distance is large, which better matches reality and more easily opens up gaps between candidates. Rank the results and take the top three words as the expansion words of that constraint word; repeat these operations until every query constraint word has selected its own expansion words.
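The scoring and top-3 selection above can be sketched in Python. The patent only says d is the distance between the two word vectors; using Euclidean distance here is an assumption of this sketch.

```python
import math

def candidate_score(vec_w, vec_q):
    """S(w, q) = (d + 1)^-1, where d is the distance between the word
    vectors of candidate w and constraint word q (Euclidean distance
    is an assumption; the patent does not name the metric)."""
    d = math.dist(vec_w, vec_q)
    return 1.0 / (d + 1.0)

def top_expansions(candidates, vectors, q, n=3):
    """Rank the secondary candidates by S(w, q) and keep the top n
    (n = 3 in the patent) as expansion words of constraint word q."""
    ranked = sorted(candidates,
                    key=lambda w: candidate_score(vectors[w], vectors[q]),
                    reverse=True)
    return ranked[:n]
```

Because S is the reciprocal of d + 1, small distance differences near the constraint word change the score far more than the same differences far away, which is exactly the gap-opening behavior described above.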
In step 5), connect each constraint word and its 3 expansion words with the logical relation OR to form a group; then connect all such groups of constraint words and their 3 expansion words with the logical relation AND; finally connect them with the other query sentences, again with the logical relation AND. Query with this relation to obtain the feedback documents.
In step 6), introduce the original BM25 scoring formula:
Score(Q, d) = Σ(i=1..n) Wi * R(qi, d)
where Wi is the weight of the i-th query term and R(qi, d) is the relevance of the current term to the document. The score formula of step 4), i.e. S(w, q), is added into the original BM25 scoring, and the documents are ranked. Denote the document to be scored d. Taking one query constraint word as an example, denote its expansion words q1, q2, q3 and the query constraint word itself q4; then S(qi, q4) represents the score of expansion word qi of constraint word q4, i.e. their distance computed with the scoring formula of step 4). Their weights in BM25 are W1, W2, W3, W4 respectively; denote them together as a query group QA, and their relevance to each document in BM25 as R(qi, d). First compute the score SA of each constraint word and its expansion-word part, then add up the SA computed for all query constraint words and their expansion words, and finally add the score SB computed for the other query sentences by the original BM25 scoring formula, i.e. the score SB computed for them with the BM25 formula without the step 4) score formula S(w, q);
the calculation formula of SA is:
SA(QA, d) = Σ(i=1..4) S(qi, q4) * Wi * R(qi, d)
The sum of all SA plus SB is the final score of each document; afterwards the results are returned in order from largest to smallest.
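The final scoring step above can be sketched in Python. Note that the source text does not spell out in full how S(w, q) enters the BM25 sum; scaling each group term by its similarity to the constraint word, as below, is one consistent reading rather than the patent verbatim.

```python
def bm25_score(weights, relevances):
    """Original BM25 form used in the patent:
    Score(Q, d) = sum_i W_i * R(q_i, d)."""
    return sum(w * r for w, r in zip(weights, relevances))

def group_score(sims, weights, relevances):
    """S_A for one constraint-word group of 4 terms (3 expansion words
    plus the constraint word itself): each BM25 term W_i * R(q_i, d) is
    scaled by its similarity S(q_i, q_4) to the constraint word.
    The exact combination is an assumed reading of the source."""
    return sum(s * w * r for s, w, r in zip(sims, weights, relevances))

def final_score(groups, other_weights, other_relevances):
    """Final score of a document: the sum of all S_A group scores plus
    the plain BM25 score S_B of the remaining query terms."""
    sa = sum(group_score(*g) for g in groups)
    sb = bm25_score(other_weights, other_relevances)
    return sa + sb
```

Because the constraint word's similarity to itself is S(q4, q4) = 1, it contributes its full BM25 term, while each expansion word contributes a contribution discounted by its distance to the constraint word.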
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The invention does not expand every word; it selects in a principled way and expands only the words that have a constraining effect on the query result, which is more scientific.
2. When selecting candidate words, the invention computes the word frequency after rejecting its maximum and minimum values, which makes the calculation result fairer.
3. The invention screens the candidate words twice and can reject unrelated candidates, so the final result contains no unnecessary expansion.
4. The invention uses the reciprocal form (d + 1)^(-1) to compute the similarity between words, which better matches reality: the smaller the distance, the larger the similarity; when two words are very close the similarity grows sharply, and when they are far apart it varies little.
5. When finally scoring each document, the invention needs only the 3 final candidate words to participate rather than all candidates, saving computation time. The similarity between expansion word and constraint word is also added into the scoring as an auxiliary bonus term, which is more scientific.
Description of the drawings
Fig. 1 is the processing flow chart of the method of the present invention.
Specific embodiment
The invention will be further described below with reference to a specific embodiment.
As shown in Figure 1, the query word expansion method based on pseudo-relevance feedback and TF-IDF provided by this embodiment comprises the following steps:
1) Select the query constraint words
First, remove stop words (including modal particles, prepositions, etc.) from the query sentence and segment it; an existing segmenter such as the IK segmenter can be used for these operations. We first judge whether more than 1 word was separated out. If so, run each separated word once as an individual query in the search engine we have set up (such as Solr). Pair each word (n1, n2, n3, ...) one-to-one with the feedback document count its query obtains (Nn1, Nn2, Nn3, ...) and with the feedback document count of the AND query over all remaining query words after that word is removed (Fn1, Fn2, Fn3, ...), and save the pairs in a linked list ListArrayA. Then run one AND query over all query words, recording the feedback document total in an integer variable FA, and one OR query over all of them, recording the feedback document total in an integer variable FO. Record the threshold proportion of each word in a floating-point array D, where each element Dx (x says which element it is, not the array index; x runs from 1 until all words are taken) has the calculation formula:
Dx = FA / Fnx
Then, according to the element count of the linked list defined before, define a floating-point array V of the same size to record the degree of freedom of each word (the more often it appears in the OR query's result documents, the freer it is and the smaller its shrinking effect), where each element Vx (x says which element it is, not the array index; x runs from 1 until all words are taken) has the calculation formula:
Vx = Nnx / FO
Compare the elements of array V one by one with the corresponding elements of D. Whenever Vx > Dx, take out that element's index, find the element with the corresponding index in ListArrayA, and store its word into a new linked list ListArrayB; all elements of the final ListArrayB are the query constraint words. Our purpose is to expand the query constraint words. The reason a word with Vx > Dx is defined as a query constraint word is that Vx represents the degree of freedom of nx in the OR results: if the presence of nx had no shrinking effect on the query, then Dx, the proportion that the AND query including nx takes of the AND query excluding nx, should be greater than or equal to its degree of freedom Vx. If Vx > Dx, the presence of nx reduces the AND query's feedback document count by more than the threshold (if Vx = Dx, the reduction of the AND query's feedback document count, i.e. the shrinking effect on the query sentence, is exactly at the threshold; if Vx < Dx, the shrinking effect of the word nx on the query sentence is below the threshold). In other words, the degree of freedom of nx can be viewed as the size of the beneficial effect it could have on the query sentence; if its presence does not even reach that minimum, it is considered to have a constraining effect on the query sentence and is defined as a query constraint word. We then expand the query constraint words; if there is only one word, that word is directly put into ListArrayB and set as the query constraint word.
2) Select the initial expansion candidate words
We first define a linked list ListArrayC of character arrays (size 50). Repeat the following operation one by one for the elements of ListArrayB, storing the results in the corresponding slots of ListArrayC. Take the first constraint word, run it as an individual query to obtain its feedback documents, remove stop words from those documents and segment them, then compute the score according to the TFnew*IDF of each word and take the top 50 ranked words, excluding the constraint word itself, as the initial candidate expansion words of that word, i.e. store them into the first character array in ListArrayC; repeat the above operation in turn. Define widj as the word frequency of candidate word wi in the constraint word's feedback document dj, and k as the constraint word's feedback document count. The calculation formula of TFnew (max() takes the maximum of all its arguments, min() the minimum) is:
TFnew(wi) = [(wid1 + wid2 + ... + widk) − max(wid1, wid2, ..., widk) − min(wid1, wid2, ..., widk)] / (k − 2) * k
Define N as the total number of documents in the corpus and wiN as the number of documents containing wi; the calculation formula of IDF(wi) is:
IDF(wi) = log(N / (wiN + 1))
3) Screen to obtain the secondary expansion candidate words
We compare each initial candidate word's average word frequency in the feedback documents of its corresponding query constraint word with its average word frequency in its own feedback documents. If the latter is larger, the word is rejected from the initial candidate set, giving the secondary candidate set, stored in ListArrayD (this ensures that words which, although related to the constraint word, are not related only to the constraint word are removed). Denote by Xi the number of occurrences of a word in the i-th of n documents; the calculation formula of the average word frequency TFavg is:
TFavg = (X1 + X2 + ... + Xn) / n = Σ(i=1..n) Xi / n
4) Obtain the final expansion words by computing scores
Use the candidate word weight calculation formula:
S(w, q) = (d + 1)^(-1)
where w is the candidate word, q the constraint word, and d the absolute value of the distance between the word vectors of the two words, representing the distance between candidate and constraint word. The reason (d + 1)^(-1) is used to compute each candidate's score is, first, that it guarantees the score S shrinks gradually as the distance grows, and second, that it makes the score grow sharply when the distance is small and shrink only slightly when the distance is large, which better matches reality and more easily opens up gaps. Rank the results and take the top three words as the expansion words of that constraint word, stored in ListArrayE; repeat these operations until every query constraint word has selected its own expansion words.
5) Create a query from the expansion words of step 4)
Run all words in ListArrayE together with their corresponding constraint words as a query of the following form: (expansion word 1 of constraint word 1 OR expansion word 2 of constraint word 1 OR expansion word 3 of constraint word 1 OR constraint word 1) AND (expansion word 1 of constraint word 2 OR expansion word 2 of constraint word 2 OR expansion word 3 of constraint word 2 OR constraint word 2) AND ... AND the other query sentences.
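The query form above can be assembled mechanically. A minimal sketch, where `groups` maps each constraint word to its expansion-word list (the patent stores these in ListArrayE; the function and parameter names here are illustrative):

```python
def build_expanded_query(groups, other_clauses=()):
    """Assemble the step-5 boolean query: each constraint word is ORed
    with its expansion words inside parentheses, the parenthesized
    groups are ANDed together, and the remaining query sentences are
    ANDed on at the end."""
    parts = []
    for constraint, expansions in groups.items():
        parts.append("(" + " OR ".join(list(expansions) + [constraint]) + ")")
    parts.extend(other_clauses)
    return " AND ".join(parts)
```

For example, one constraint word "c1" with expansions "e1", "e2", "e3" and one remaining clause "other" yields "(e1 OR e2 OR e3 OR c1) AND other".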
6) Rank the documents by weight
Score the feedback documents of step 5) and then rank them. The scoring principle is: introduce the original BM25 scoring formula:
Score(Q, d) = Σ(i=1..n) Wi * R(qi, d)
where Wi is the weight of the i-th query term and R(qi, d) is the relevance of the current term to the document. The score formula S(w, q) of step 4) is added into the original BM25 scoring, letting each expansion word also participate in the scoring. Denote the document to be scored d. Taking one query constraint word as an example, denote its expansion words q1, q2, q3 and the query constraint word itself q4; then S(qi, q4) represents the score of expansion word qi of constraint word q4, i.e. their distance computed with the scoring formula of step 4). Their weights in BM25 are W1, W2, W3, W4 respectively; denote them together as a query group QA, and their relevance to each document in BM25 as R(qi, d). First compute the score SA of each constraint word and its expansion-word part, then add up the SA computed for all query constraint words and their expansion words, and finally add the score SB computed for the other query sentences by the original BM25 scoring formula, i.e. the score SB computed for them with the BM25 formula without the step 4) score formula S(w, q);
the calculation formula of SA is:
SA(QA, d) = Σ(i=1..4) S(qi, q4) * Wi * R(qi, d)
The sum of all SA plus SB is the final score of each document; afterwards the results are returned in order from largest to smallest.
The embodiment described above is only a preferred embodiment of the invention and is not intended to limit the scope of the present invention; all variations made according to the shape and principles of the present invention should be covered within the protection scope of the present invention.
Claims (6)
1. A query word expansion method based on pseudo-relevance feedback and TF-IDF, characterized by comprising the following steps:
1) select the query constraint words;
2) select the initial expansion candidate words;
3) screen to obtain the secondary expansion candidate words;
4) obtain the final expansion words by computing scores;
5) create a query from the expansion words of step 4);
6) rank the documents by weight.
2. The query word expansion method based on pseudo-relevance feedback and TF-IDF according to claim 1, characterized in that: in step 1), remove stop words from the query sentence and segment it; first run each separated word once as an individual query, denoting the words n1, n2, n3, ..., and record the feedback document count of each, Nn1, Nn2, Nn3, ...; run one AND query over all query words, recording the feedback document total FA, then one OR query over all of them, recording the feedback document total FO; to find the constraint words of the query sentence, take the first word n1 as an example: remove n1 and run an AND query over all remaining query words, denoting its feedback document total Fn1, then compute the proportion FA takes of Fn1, recorded as D1, whose calculation formula is:
D1 = FA / Fn1
afterwards, compute the proportion of documents containing the word n1 among the OR query's result documents, representing the degree of freedom of n1, denoted V1, whose calculation formula is:
V1 = Nn1 / FO
If V1 > D1, define the word n1 as a query constraint word, and repeat the above operations for the other words n2, n3, ... until every word has been judged. The reason V1 > D1 defines a query constraint word is that V1 represents the degree of freedom of n1 in the OR query results: if the presence of n1 had no shrinking effect on the query, then D1, the proportion that the AND query including n1 takes of the AND query excluding n1, should be greater than or equal to its degree of freedom V1; if V1 > D1, the presence of n1 reduces the AND query's feedback document count by more than the threshold, that is, its shrinking effect on the query sentence is larger than the threshold; if V1 = D1, the reduction of the AND query's feedback document count, i.e. the shrinking effect on the query sentence, is exactly at the threshold; if V1 < D1, the shrinking effect of the word n1 on the query sentence is below the threshold. In other words, the degree of freedom of n1 can be viewed as the size of the beneficial effect it could have on the query sentence; if its presence does not even reach that minimum, it is considered to have a constraining effect on the query sentence and is defined as a query constraint word;
in addition, the query constraint words are used for query word expansion; if there is only one word, that word is directly set as the query constraint word.
3. The query word expansion method based on pseudo-relevance feedback and TF-IDF according to claim 1, characterized in that: in step 2), run each query constraint word as an individual query to obtain its feedback documents, remove stop words from the content of those documents and segment it, then compute a score according to the TFnew*IDF of each word and take the top 50 ranked words as the initial candidate expansion words of that word; define widj as the word frequency of candidate word wi in the constraint word's feedback document dj, and k as the constraint word's feedback document count; the calculation formula of TFnew is:
TFnew(wi) = [(wid1 + wid2 + ... + widk) − max(wid1, wid2, ..., widk) − min(wid1, wid2, ..., widk)] / (k − 2) * k
where the max() function takes the maximum of all its arguments and the min() function takes the minimum of all its arguments;
define N as the total number of documents in the corpus and wiN as the number of documents containing wi; the calculation formula of IDF(wi) is:
IDF(wi) = log(N / (wiN + 1)).
4. The query word expansion method based on pseudo-relevance feedback and TF-IDF according to claim 1, characterized in that: in step 3), compare each initial candidate word's average word frequency in the feedback documents of its corresponding query constraint word with its average word frequency in its own feedback documents; if the latter is larger, reject the word from the initial candidate set, giving the secondary candidate set (ensuring that words which, although related to the constraint word, are not related only to the constraint word are removed); denote by Xi the number of occurrences of a word in the i-th of n documents; the calculation formula of the average word frequency TFavg is:
TFavg(wi) = Σ(i=1..n) Xi / n.
5. a kind of query word extended method based on pseudo- feedback with TF-IDF according to claim 1, it is characterised in that:
In step 4), candidate word weight calculation formula is utilized:
S (w, q)=(d+1)-1
In formula, w represents candidate word, and q represents constraint word, and d is the absolute value of the distance of the term vector of two words, represent candidate word with
The distance of word is constrained, why uses (d+1)-1The score of each candidate word is calculated, because first can ensure with distance
Increase, score S is can be gradually smaller, and second when can allow the distance to be close to preset range, score increase within a preset range,
Apart from it is remote when score reduce within a preset range, more meet reality, also more Easy open gap, by sort result, by ranking first three
Expansion word of the word as the constraint word, aforesaid operations are repeated, until going out the expansion word of oneself for each inquiry constraint selected ci poem;
In step 6), the original BM25 scoring formula is introduced:
Score(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)
where W_i is the weight of the i-th query term and R(q_i, d) is the relevance between that query term and the document. The score formula S(w, q) from step 4) is incorporated into the original BM25 scoring formula and the results are sorted. Denote the document to be scored by d. Taking one query constraint word as an example, its expansion words are denoted q_1, q_2, q_3 and the constraint word itself is denoted q_4; then S(q_i, q_4) denotes the distance-based score between the constraint word q_4 and its expansion word q_i, computed with the scoring formula from step 4). Their weights in BM25 are W_1, W_2, W_3, W_4, they are collected into a query set Q_A, and their relevance to each document under BM25 is denoted R(q_i, d). First, the score S_A of each constraint word together with its expansion words is computed; then the S_A values of all query constraint words and their expansion words are summed; finally the score S_B of the other query sentences, obtained with the original BM25 scoring formula, is added, i.e. S_B is computed for them with the BM25 formula that does not incorporate the score formula S(w, q) of step 4);
The calculation formula of S_A is:
S_A(Q_A, d) = \sum_{i=1}^{4} S(q_i, q_4) \cdot W_i \cdot R(q_i, d)
The sum of all S_A values plus S_B is the final score of each document; the results are then returned in descending order of score.
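For illustration only, the step-6 scoring can be sketched end to end in Python. The claims do not spell out the W_i and R(q_i, d) definitions, so the common Okapi BM25 choices (IDF weight, k1 = 1.2, b = 0.75) are assumed here, and the corpus, query terms, and step-4 scores are all made up:

```python
import math

def idf(term, corpus):
    """W_i: the common BM25 IDF weight of a term (one standard choice)."""
    n = len(corpus)
    df = sum(1 for d in corpus if term in d)           # document frequency
    return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

def bm25_r(term, doc, corpus, k1=1.2, b=0.75):
    """R(q_i, d): the common Okapi BM25 term/document relevance."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    f = doc.count(term)                                # term frequency in d
    return f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))

def s_a(group, doc, corpus, s):
    """S_A = sum over q_i in the group of S(q_i, q_4) * W_i * R(q_i, d).
    group holds q_1..q_3 and q_4 last; s(q_4, q_4) = 1 because the
    distance of a word to itself is 0 and S = (d + 1)^(-1)."""
    q4 = group[-1]
    return sum(s(qi, q4) * idf(qi, corpus) * bm25_r(qi, doc, corpus)
               for qi in group)

def final_score(doc, corpus, groups, other_terms, s):
    """Sum of S_A over all constraint-word groups, plus the plain
    BM25 score S_B of the remaining query terms."""
    sa = sum(s_a(g, doc, corpus, s) for g in groups)
    sb = sum(idf(t, corpus) * bm25_r(t, doc, corpus) for t in other_terms)
    return sa + sb

# Hypothetical step-4 scores S(q_i, q_4), stored in a lookup table.
pair_scores = {("search", "retrieval"): 0.8, ("lookup", "retrieval"): 0.5,
               ("ranking", "retrieval"): 0.4, ("retrieval", "retrieval"): 1.0}
s = lambda qi, q4: pair_scores[(qi, q4)]

corpus = [["retrieval", "search", "feedback"],
          ["pseudo", "feedback", "model"],
          ["term", "weighting", "scheme"]]
group = ["search", "lookup", "ranking", "retrieval"]   # q_1..q_3, then q_4
scores = {i: final_score(d, corpus, [group], ["pseudo"], s)
          for i, d in enumerate(corpus)}
ranking = sorted(scores, key=scores.get, reverse=True)  # descending order
print(ranking)  # [0, 1, 2]: doc 0 matches the constraint-word group best
```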
6. The query term expansion method based on pseudo feedback and TF-IDF according to claim 1, characterized in that: in step 5), each constraint word and its 3 expansion words are connected with the logical relation OR to form a set; then all the sets formed by the constraint words and their 3 expansion words are connected with the logical relation AND; finally these are in turn connected with the other query sentences by the logical relation AND. The query is issued with this relation to obtain the feedback documents.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711179719.6A CN108062355B (en) | 2017-11-23 | 2017-11-23 | Query term expansion method based on pseudo feedback and TF-IDF |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108062355A true CN108062355A (en) | 2018-05-22 |
CN108062355B CN108062355B (en) | 2020-07-31 |
Family
ID=62135023
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711179719.6A Active CN108062355B (en) | 2017-11-23 | 2017-11-23 | Query term expansion method based on pseudo feedback and TF-IDF |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108062355B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2104044A1 (en) * | 2008-03-18 | 2009-09-23 | Korea Advanced Institute Of Science And Technology | Query expansion method using augmented terms for improving precision without degrading recall |
CN101876979A (en) * | 2009-04-28 | 2010-11-03 | 株式会社理光 | Query expansion method and equipment |
US8280900B2 (en) * | 2010-08-19 | 2012-10-02 | Fuji Xerox Co., Ltd. | Speculative query expansion for relevance feedback |
CN103678412A (en) * | 2012-09-21 | 2014-03-26 | 北京大学 | Document retrieval method and device |
CN107247745A (en) * | 2017-05-23 | 2017-10-13 | 华中师范大学 | A kind of information retrieval method and system based on pseudo-linear filter model |
Non-Patent Citations (2)
Title |
---|
VAIDYANATHAN, REKHA, SUJOY DAS, AND NAMITA SRIVASTAVA: "Query expansion strategy based on pseudo relevance feedback and term weight scheme for monolingual retrieval", 《INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS》 * |
GONG YUXI, WANG DALING: "An improved query expansion based on pseudo relevance feedback", 《MICROCOMPUTER INFORMATION》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110047A (en) * | 2019-04-30 | 2019-08-09 | 中国农业科学院农业信息研究所 | Subject content polymerization analysis method based on TF-IDF and domain lexicon |
CN110110047B (en) * | 2019-04-30 | 2021-03-19 | 中国农业科学院农业信息研究所 | Topic content aggregation analysis method based on TF-IDF and domain dictionary |
CN110442777A (en) * | 2019-06-24 | 2019-11-12 | 华中师范大学 | Pseudo-linear filter model information search method and system based on BERT |
CN110442777B (en) * | 2019-06-24 | 2022-11-18 | 华中师范大学 | BERT-based pseudo-correlation feedback model information retrieval method and system |
CN111897928A (en) * | 2020-08-04 | 2020-11-06 | 广西财经学院 | Chinese query expansion method for embedding expansion words into query words and counting expansion word union |
CN112307182A (en) * | 2020-10-29 | 2021-02-02 | 上海交通大学 | Question-answering system-based pseudo-correlation feedback extended query method |
CN112307182B (en) * | 2020-10-29 | 2022-11-04 | 上海交通大学 | Question-answering system-based pseudo-correlation feedback extended query method |
Also Published As
Publication number | Publication date |
---|---|
CN108062355B (en) | 2020-07-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103678576B (en) | The text retrieval system analyzed based on dynamic semantics | |
CN109800284B (en) | Task-oriented unstructured information intelligent question-answering system construction method | |
CN104063523B (en) | E-commerce search scoring and ranking method and system | |
CN108062355A (en) | Query word extended method based on pseudo- feedback with TF-IDF | |
CN103020164B (en) | Semantic search method based on multi-semantic analysis and personalized sequencing | |
CN103218719B (en) | A kind of e-commerce website navigation method and system | |
JP3607462B2 (en) | Related keyword automatic extraction device and document search system using the same | |
CN101223525B (en) | Relationship networks | |
CN106649272B (en) | A kind of name entity recognition method based on mixed model | |
CN105320772B (en) | A kind of association paper querying method of patent duplicate checking | |
CN111143479A (en) | Knowledge graph relation extraction and REST service visualization fusion method based on DBSCAN clustering algorithm | |
CN105117487B (en) | A kind of books semantic retrieving method based on content structure | |
CN105045875B (en) | Personalized search and device | |
Singh et al. | Vector space model: an information retrieval system | |
CN105975596A (en) | Query expansion method and system of search engine | |
CN103455487B (en) | The extracting method and device of a kind of search term | |
CN104008171A (en) | Legal database establishing method and legal retrieving service method | |
CN104111933A (en) | Method and device for acquiring business object label and building training model | |
CN101344890A (en) | Grading method for information retrieval document based on viewpoint searching | |
JPH09223161A (en) | Method and device for generating query response in computer-based document retrieval system | |
CN102708100A (en) | Method and device for digging relation keyword of relevant entity word and application thereof | |
CN105653562A (en) | Calculation method and apparatus for correlation between text content and query request | |
CN103593474A (en) | Image retrieval ranking method based on deep learning | |
CN107247743A (en) | A kind of judicial class case search method and system | |
CN112818661B (en) | Patent technology keyword unsupervised extraction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||