CN111897926A - Chinese query expansion method integrating deep learning and expansion word mining intersection - Google Patents


Info

Publication number
CN111897926A
Authority
CN
China
Prior art keywords
word
expansion
chinese
pseudo
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010774430.4A
Other languages
Chinese (zh)
Inventor
黄名选 (Huang Mingxuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University of Finance and Economics
Original Assignee
Guangxi University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University of Finance and Economics filed Critical Guangxi University of Finance and Economics
Priority to CN202010774430.4A
Publication of CN111897926A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F16/3338 Query expansion
    • G06F16/334 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a Chinese query expansion method that fuses deep learning and expansion word mining by intersection. First, a deep learning tool performs word-embedding semantic learning training on the initial retrieval document set to obtain a word-embedding expansion word set rich in context semantic information. Next, a pseudo-relevance feedback expansion word mining method based on Copulas theory mines association rule patterns from the top-ranked pseudo-relevance feedback documents of the initial retrieval to obtain a rule expansion word set carrying statistics-based association information between feature words. Finally, the word-embedding expansion word set and the rule expansion word set are intersected to obtain the final expansion word set, improving the quality of the expansion words. By integrating deep learning with expansion word mining, the method extracts high-quality expansion words related to the original query, alleviating query topic drift and word mismatch, improving text information retrieval performance, and offering good application value and promotion prospects.

Description

Chinese query expansion method integrating deep learning and expansion word mining intersection
Technical Field
The invention relates to a Chinese query expansion method integrating deep learning and expansion word mining, belonging to the technical field of information retrieval.
Background
Query expansion is one of the key technologies for addressing query topic drift and word mismatch in information retrieval. It modifies the weights of the original query or adds words related to it, yielding a new, longer query that describes the semantics or topic implied by the original query more completely and accurately, compensates for the deficiencies of the user's query, and improves the retrieval performance of an information retrieval system. The core problems of query expansion are the source of the expansion words and the design of the expansion model. With the development of network technology and the arrival of the big-data era, users demand ever more from information retrieval, for example, accurately retrieving the required information from massive data, which has made query expansion a hotspot in the information retrieval field.
In recent decades, researchers have studied query expansion models from different perspectives and with different methods, producing abundant results; among these, relevance feedback expansion based on association pattern mining and the recently emerging deep-learning-based query expansion have drawn the most attention and discussion at home and abroad. For example, Bouziri et al. proposed expansion word mining based on supervised learning (see: Bouziri A, Latiri C, Gaussier E, et al. Learning query expansion from association rules between terms [C]. Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), Lisbon, Portugal, 2015: 525-530.) and expansion word mining based on a learning-to-rank model. On the deep learning side, Kuzi et al. (see: Kuzi S, Shtok A, Kurland O. Query expansion using word embeddings [C]. Proceedings of the 25th ACM International Conference on Information and Knowledge Management. New York: ACM Press, 2016: 1929-1932.) applied the CBOW model of the deep learning tool Word2vec to train word vectors on the retrieval corpus and selected feature words semantically related to the query to realize query expansion.
Experimental results show that these query expansion methods are effective and perform well in improving information retrieval. However, existing query expansion methods have not fully solved the technical problems of query topic drift and word mismatch in information retrieval. Moreover, although expansion words from association patterns carry statistics-based association information between feature words, they lack the semantic information of the document context; conversely, word-embedding expansion words derived from word-vector semantic learning training carry rich document context semantics but no statistics-based feature word association information.
To fully exploit the complementary advantages of rule expansion words and word-embedding expansion words and to remedy their respective shortcomings, the invention integrates deep learning with expansion word mining based on Copulas theory and proposes a Chinese query expansion method that fuses the two by intersection. The method can alleviate query topic drift and word mismatch in text information retrieval systems, improve text information retrieval performance, and has good application value and broad promotion prospects.
Disclosure of Invention
The invention aims to provide a Chinese query expansion method that fuses deep learning and expansion word mining by intersection, for use in the information retrieval field, such as practical Chinese search engines and web information retrieval systems; it can improve the query performance of an information retrieval system and reduce query topic drift and word mismatch in information retrieval.
The invention adopts the following specific technical scheme:
a Chinese query expansion method integrating deep learning and expanded word mining intersection comprises the following steps:
Step 1: Retrieve the Chinese document set with the original query to obtain the initial retrieval document set, and perform Chinese word segmentation and stop-word removal preprocessing on it.
Step 2: Perform word-embedding semantic learning training on the initial retrieval document set with a deep learning tool to obtain the feature word-embedding vector set, specifically as follows:
(2.1) Perform word-embedding semantic learning training on the initial retrieval pseudo-relevance feedback document set using the Skip-gram model of word2vec, Google's open-source word vector tool for deep learning (see https://code.google.com/p/word2vec/ for details), to obtain the word-embedding vector set of the initial retrieval document feature words.
(2.2) In the word-embedding vector set of the initial retrieval document feature words, compute the word-vector cosine similarity VCos(qi, cetj) between each query term qi (qi ∈ Q, where Q is the original query term set, Q = (q1, q2, …, qn), 1 ≤ i ≤ n) and every word-embedding candidate expansion word (cet1, cet2, …, cetm), where 1 ≤ j ≤ m, as shown in formula (1). The word-embedding candidate expansion words are the non-query terms in the word-embedding vector set.
VCos(qi, cetj) = (vqi · vcetj) / (|vqi| × |vcetj|)    (1)
In formula (1), vcetj denotes the word vector of the j-th word-embedding candidate expansion word cetj, and vqi denotes the word vector of the i-th query term qi.
(2.3) Given a minimum vector cosine similarity threshold minqvcos, for each query term qi extract the word-embedding candidate expansion words with VCos(qi, cetj) ≥ minqvcos as the word-embedding expansion words of qi (qiet1, qiet2, …, qietp1). Merge the word-embedding expansion words of all query terms q1, q2, …, qn, remove duplicates to obtain the final word-embedding expansion word set ET_WE (Expansion Terms from Word Embedding) of the original query term set Q, compute the word-embedding expansion word weight wWEET, and then proceed to step 3. ET_WE is shown in formula (2):
ET_WE = {q1et1, …, q1etp1} ∪ {q2et1, …, q2etp2} ∪ … ∪ {qnet1, …, qnetpn}    (2)
The word-embedding expansion word weight wWEET is the vector cosine similarity between the query term and the word-embedding expansion word; when a word recurs across query terms, its weight equals the cumulative sum of its vector similarities, as shown in formula (3):
wWEET(cetj) = Σi VCos(qi, cetj)    (3), where the sum runs over the query terms qi for which cetj was selected as a word-embedding expansion word.
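Steps (2.2)-(2.3) and formulas (1)-(3) can be sketched as follows; all function and variable names (vcos, word_embedding_expansion) are illustrative, not taken from the patent:

```python
import math

def vcos(u, v):
    """Word-vector cosine similarity, formula (1)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def word_embedding_expansion(query_vecs, cand_vecs, minqvcos):
    """Keep candidates with VCos >= minqvcos for some query term; a word kept for
    several query terms accumulates its similarities (formulas (2)-(3))."""
    weights = {}
    for vq in query_vecs.values():
        for cet, vc in cand_vecs.items():
            s = vcos(vq, vc)
            if s >= minqvcos:
                weights[cet] = weights.get(cet, 0.0) + s
    return weights
```

In practice the vectors would come from a trained word2vec model; plain Python lists stand in here.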
Step 3: Mine rule expansion words in the initial retrieval pseudo-relevance feedback document set with the Copulas-theory-based pseudo-relevance feedback expansion word mining method and build the rule expansion word set, specifically as follows:
(3.1) Extract the top m documents of the initial retrieval document set to construct the initial retrieval pseudo-relevance feedback document set; perform Chinese word segmentation, Chinese stop-word removal, and feature word extraction preprocessing on it; compute the feature word weights; and finally construct the pseudo-relevance feedback Chinese document library and the Chinese feature word library.
The invention adopts the TF-IDF (term frequency-inverse document frequency) weighting technique (see: Ricardo Baeza-Yates, Berthier Ribeiro-Neto, et al. Modern Information Retrieval (Chinese translation by Wang Zhijin et al.). China Machine Press, 2005: 21-22.) to compute the feature word weights.
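The TF-IDF weighting referenced above can be sketched as follows; since the cited text admits several variants, the common raw tf × log(N/df) form is assumed here:

```python
import math

def tfidf(docs):
    """docs: list of token lists. Returns one {term: tf * log(N / df)} dict per
    document (raw term frequency; a term occurring in every document weighs 0)."""
    n = len(docs)
    df = {}
    for d in docs:
        for t in set(d):
            df[t] = df.get(t, 0) + 1
    weighted = []
    for d in docs:
        tf = {}
        for t in d:
            tf[t] = tf.get(t, 0) + 1
        weighted.append({t: f * math.log(n / df[t]) for t, f in tf.items()})
    return weighted
```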
(3.2) Take the feature words in the Chinese feature word library as 1_candidate item sets C1.
(3.3) Compute the Copulas-theory-based support CSup(C1) of C1; if CSup(C1) ≥ the minimum support threshold ms, take C1 as a 1_frequent item set L1 and add it to the frequent item set collection FIS (Frequent ItemSet).
CSup (Copulas based Support) denotes the support based on Copulas theory. CSup(C1) is calculated as shown in formula (4):
CSup(C1) = C( C1_Count / AllDoc(count), C1_Weight / AllItems(weight) )    (4), where C(·,·) denotes the Copulas function coupling the document-frequency marginal with the weight marginal.
In formula (4), C1_Count denotes the frequency of the 1_candidate item set C1 in the pseudo-relevance feedback Chinese document library, AllDoc(count) denotes the total number of documents in the pseudo-relevance feedback Chinese document library, C1_Weight denotes the item set weight of C1 in the library, and AllItems(weight) denotes the weighted sum of all Chinese feature words in the library.
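A minimal sketch of the Copulas-based support test of step (3.3); because the patent's specific copula form is given only as an equation image, the product copula C(u, v) = u·v stands in as an illustrative choice:

```python
def csup(count, weight, all_doc, all_items, copula=lambda u, v: u * v):
    """Copulas-based support, formula (4): couple the document-frequency marginal
    count/all_doc with the weight marginal weight/all_items via a copula."""
    return copula(count / all_doc, weight / all_items)

def is_frequent(count, weight, all_doc, all_items, ms):
    """Keep an itemset as frequent when its Copulas-based support reaches ms."""
    return csup(count, weight, all_doc, all_items) >= ms
```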
(3.4) Use the self-join method to derive the k_candidate item set Ck from the (k-1)_frequent item sets Lk-1, where k ≥ 2.
The self-join method adopts the candidate generation method of the Apriori algorithm (see: Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases [C]// Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington D.C., USA, 1993: 207-216.).
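The self-join of step (3.4) follows the Apriori candidate-generation convention (itemsets kept as sorted tuples); the subset-pruning step of full Apriori is omitted in this sketch:

```python
def self_join(frequent):
    """Join sorted (k-1)-itemsets that agree on their first k-2 items to form
    k-candidates (the Apriori candidate-generation step)."""
    cands = set()
    items = sorted(frequent)
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            a, b = items[i], items[j]
            if a[:-1] == b[:-1]:  # first k-2 items identical
                cands.add(tuple(sorted(set(a) | set(b))))
    return cands
```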
(3.5) When a 2_candidate item set C2 is mined: if C2 does not contain an original query term, delete it; if C2 contains an original query term, keep it and proceed to step (3.6). When a k_candidate item set Ck with k ≥ 3 is mined, proceed directly to step (3.6).
(3.6) Compute the Copulas-theory-based support CSup(Ck); if CSup(Ck) ≥ ms, then Ck is a k_frequent item set Lk: add it to FIS and proceed to step (3.7); otherwise proceed directly to step (3.7).
CSup(Ck) is calculated as shown in formula (5):
CSup(Ck) = C( Ck_Count / AllDoc(count), Ck_Weight / AllItems(weight) )    (5), where C(·,·) denotes the Copulas function coupling the document-frequency marginal with the weight marginal.
In formula (5), Ck_Count denotes the frequency of the k_candidate item set Ck in the pseudo-relevance feedback Chinese document library, and Ck_Weight denotes the item set weight of Ck in the library. AllDoc(count) and AllItems(weight) are defined as in formula (4).
(3.7) Increase k by 1 and return to step (3.4), continuing in sequence until Lk is the empty set; frequent item set mining is then finished, and the procedure moves to step (3.8).
(3.8) Take a k_frequent item set Lk out of FIS, where k ≥ 2.
(3.9) Extract proper-subset item sets Etj and Qi of Lk such that
Qi ∩ Etj = ∅, Qi ∪ Etj = Lk, Etj ∩ Q = ∅,
where Etj is a proper-subset item set containing no query terms, Qi is a proper-subset item set containing query terms, and Q is the original query term set.
(3.10) Compute the Copulas-theory-based confidence CConf(Qi→Etj) of the association rule Qi→Etj; if CConf(Qi→Etj) ≥ the minimum confidence threshold mc, add Qi→Etj to the association rule set AR (Association Rule). Then return to step (3.9), extract other proper-subset item sets Etj and Qi from Lk, and continue in sequence, looping until every proper-subset item set of Lk has been taken out exactly once; then move to step (3.8) for a new round of association rule pattern mining, take another Lk from FIS, and continue in sequence, looping until every k_frequent item set Lk in FIS has been taken out exactly once. Association rule pattern mining is then finished, and the procedure moves to step (3.11).
CConf (Copulas based Confidence) denotes the confidence based on Copulas theory; CConf(Qi→Etj) is given by formula (6):
CConf(Qi→Etj) = CSup(Qi ∪ Etj) / CSup(Qi) = C( (Qi∪Etj)_Count / AllDoc(count), (Qi∪Etj)_Weight / AllItems(weight) ) / C( Qi_Count / AllDoc(count), Qi_Weight / AllItems(weight) )    (6)
In formula (6), Qi_Count denotes the frequency of the proper-subset item set Qi in the pseudo-relevance feedback Chinese document library, Qi_Weight denotes the item set weight of Qi in the library, (Qi∪Etj)_Count denotes the frequency of the item set (Qi∪Etj) in the library, and (Qi∪Etj)_Weight denotes the item set weight of (Qi∪Etj) in the library. AllDoc(count) and AllItems(weight) are defined as in formula (4).
(3.11) Extract the association rule consequents Etj from the association rule set AR as rule expansion words to obtain the rule expansion word set ET_AR (Expansion Terms from Association Rules), compute the rule expansion word weight wEt, and then proceed to step 4.
ET_AR is shown in formula (7):
ET_AR = {Ret1, Ret2, …}    (7)
In formula (7), Reti denotes the i-th rule expansion word.
The rule expansion word weight wEt is calculated as shown in formula (8):
wEt = max( CConf(Qi→Etj) )    (8)
In formula (8), max() takes the maximum association rule confidence: when the same rule expansion word appears in several association rule patterns at once, the maximum confidence is taken as that rule expansion word's weight.
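Steps (3.10)-(3.11) with the max-confidence weighting of formula (8) can be sketched as follows, representing each association rule as an (antecedent, consequent, confidence) triple — an illustrative representation, not the patent's data structure:

```python
def rule_expansion_words(rules, mc):
    """Keep rules with confidence >= mc; every consequent word becomes a rule
    expansion word whose weight is its maximum confidence over all such rules
    (formula (8))."""
    w = {}
    for _antecedent, consequent, conf in rules:
        if conf < mc:
            continue
        for t in consequent:
            w[t] = max(w.get(t, 0.0), conf)
    return w
```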
Step 4, performing intersection fusion on the rule expansion word set and the word embedding expansion word set to obtain a final expansion word, and realizing query expansion, wherein the specific steps are as follows:
(4.1) Intersect the rule expansion word set ET_AR and the word-embedding expansion word set ET_WE to obtain the final expansion word set ETS_Q (Expansion Term Set for Query Q) of the original query term set Q, and compute the final expansion word weight w(ETi).
The final extended word set ETS _ Q is calculated as shown in equation (9):
ETS_Q = ET_AR ∩ ET_WE    (9)
The final expansion word weight w(ETi) is calculated as shown in formula (10):
w(ETi) = wEt + wWEET    (10)
(4.2) Combine the final expansion words with the original query into a new query and retrieve the Chinese documents again, realizing query expansion.
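Assuming both expansion word sets are kept as word-to-weight mappings, the intersection fusion of step 4 (formulas (9)-(10)) reduces to:

```python
def intersect_fuse(rule_weights, embed_weights):
    """Final expansion set = ET_AR ∩ ET_WE (formula (9)); each surviving word's
    weight is w_Et + w_WEET (formula (10))."""
    common = rule_weights.keys() & embed_weights.keys()
    return {t: rule_weights[t] + embed_weights[t] for t in common}
```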
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides a Chinese query expansion method that fuses deep learning and expansion word mining by intersection: a deep learning tool performs word-embedding semantic learning training on the initial retrieval document set to obtain a word-embedding expansion word set rich in context semantic information; a Copulas-theory-based pseudo-relevance feedback expansion word mining method mines association rule patterns from the top-ranked pseudo-relevance feedback documents of the initial retrieval to obtain a rule expansion word set carrying statistics-based feature word association information; and the two sets are intersected to obtain the final expansion word set, improving expansion word quality. Experimental results show that the method suppresses query topic drift and word mismatch, improves information retrieval performance, outperforms comparable recent methods, and has good application value and promotion prospects.
(2) Four similar query expansion methods from recent years were selected as comparison methods, with the Chinese corpus of the international standard dataset NTCIR-5 CLIR as experimental data. The results show that, compared with the baseline retrieval, the MAP of the method of the invention increases by up to 27.87% on average, and exceeds the comparison methods by an average of 18.21%. The experimental effect is significant, showing that the retrieval performance of the method is better than that of the baseline retrieval and the comparison methods; it can improve information retrieval performance, reduce query drift and word mismatch in information retrieval, and has high application value and broad promotion prospects.
Drawings
FIG. 1 is a general flow diagram of the method for expanding Chinese queries by merging deep learning and expanded word mining intersections according to the present invention.
Detailed Description
First, to better explain the technical scheme of the invention, the relevant concepts are introduced as follows:
1. item set
In text mining, a text document is treated as a transaction, each feature word in a document is called an item, a set of feature word items is called an item set, and the number of items in an item set is called the item set length. A k_item set is an item set containing k items, where k is the item set length.
2. Antecedent and consequent of an association rule
Let x and y be arbitrary feature word item sets; an implication of the form x→y is called an association rule, where x is called the rule antecedent and y the rule consequent.
3. Rule expansion word
A rule expansion word is an expansion word drawn from the consequent item set of an association rule whose antecedent item set is the original query term set.
4. Rule expansion word weight calculation
Take the confidence of the association rules whose antecedent item set is the original query terms as the rule expansion word weight wEt.
The expansion word weight wEt is calculated as shown in formula (11):
wEt = max( CConf(Qi→ETj) )    (11)
In formula (11), within the association rule Qi→ETj, Qi is an item set containing query terms (the rule antecedent), and ETj is an item set containing expansion words and no query terms (the rule consequent); AllDoc(count) denotes the total number of documents in the pseudo-relevance feedback Chinese document library; AllItems(weight) denotes the cumulative weight of all Chinese feature words in the library; Qi_Count denotes the frequency of the item set Qi in the library; Qi_Weight denotes the item set weight of Qi in the library; (Qi∪Etj)_Count denotes the frequency of the item set (Qi∪Etj) in the library; and (Qi∪Etj)_Weight denotes the item set weight of (Qi∪Etj) in the library. max() takes the maximum association rule confidence: when the same rule expansion word appears in several association rule patterns at once, the maximum confidence is taken as that rule expansion word's weight.
5. Word embedding expansion word and weight thereof
The word-embedding expansion words are derived from the word-embedding vector set, as follows: in the word-embedding vector set, compute the word-vector cosine similarity VCos(qi, cetj) between each query term qi of the original query term set Q (qi ∈ Q, Q = (q1, q2, …, qn), 1 ≤ i ≤ n) and every word-embedding candidate expansion word (cet1, cet2, …, cetm); given a minimum similarity threshold minqvcos, extract for each query term qi the candidate words whose word-vector cosine similarity is not less than minqvcos as the word-embedding expansion words of qi (qiet1, qiet2, …, qietp1); merge the word-embedding expansion words of all query terms q1, q2, …, qn and remove duplicates to obtain the final word-embedding expansion word set ET_WE (Expansion Terms from Word Embedding) of the original query term set Q. The word-embedding candidate expansion words are the non-query terms in the word-embedding vector set.
The word-embedding expansion word set ET_WE is shown in formula (12):
ET_WE = {q1et1, …, q1etp1} ∪ {q2et1, …, q2etp2} ∪ … ∪ {qnet1, …, qnetpn}    (12)
the vector cosine similarity VCos (q)i,cetj) Is calculated as shown in equation (13):
VCos(qi, cetj) = (vqi · vcetj) / (|vqi| × |vcetj|)    (13)
In formula (13), vcetj denotes the word vector of the j-th word-embedding candidate expansion word cetj, and vqi denotes the word vector of the i-th query term qi.
The vector cosine similarity between the query term and the word-embedding expansion word is taken as the weight of the word-embedding expansion word.
The word-embedding expansion word weight wWEET is shown in formula (14); when a word recurs, its weight equals the cumulative sum of its vector similarities.
wWEET(cetj) = Σi VCos(qi, cetj)    (14), where the sum runs over the query terms qi for which cetj was selected as a word-embedding expansion word.
6. Copulas-theory-based support and confidence
Copulas function theory (see: Sklar A. Fonctions de répartition à n dimensions et leurs marges [J]. Publications de l'Institut de Statistique de l'Université de Paris, 1959, 8(1): 229-231.) is used to describe the correlation between variables; it can couple and connect distributions of arbitrary form into an effective multivariate distribution function. Drawing on Copulas function theory, the invention proposes the Copulas-theory-based support CSup (Copulas based Support) and confidence CConf (Copulas based Confidence), detailed as follows.
The Copulas-theory-based support CSup(T1∪T2) of a feature word item set (T1∪T2) is calculated as shown in formula (15):
CSup(T1∪T2) = C( (T1∪T2)_Count / AllDoc(count), (T1∪T2)_Weight / AllItems(weight) )    (15), where C(·,·) denotes the Copulas function coupling the document-frequency marginal with the weight marginal.
In formula (15), (T1∪T2)_Count denotes the frequency of the item set (T1∪T2) in the pseudo-relevance feedback Chinese document library, (T1∪T2)_Weight denotes the item set weight of (T1∪T2) in the library, AllDoc(count) denotes the total number of documents in the library, and AllItems(weight) denotes the weighted sum of all Chinese feature words in the library.
The Copulas-theory-based confidence CConf(T1→T2) of the association rule (T1→T2) is calculated as shown in formula (16):
CConf(T1→T2) = CSup(T1∪T2) / CSup(T1) = C( (T1∪T2)_Count / AllDoc(count), (T1∪T2)_Weight / AllItems(weight) ) / C( T1_Count / AllDoc(count), T1_Weight / AllItems(weight) )    (16)
In formula (16), T1_Count denotes the frequency of the item set T1 in the pseudo-relevance feedback Chinese document library, T1_Weight denotes the item set weight of T1 in the library, (T1∪T2)_Count denotes the frequency of the item set (T1∪T2) in the library, and (T1∪T2)_Weight denotes the item set weight of (T1∪T2) in the library; AllDoc(count) and AllItems(weight) are defined as in formula (15).
The invention is further explained below by referring to the drawings and specific comparative experiments.
As shown in FIG. 1, the method for expanding Chinese queries by fusion of deep learning and expanded word mining intersection of the present invention comprises the following steps:
Step 1: Retrieve the Chinese document set with the original query to obtain the initial retrieval document set, and perform Chinese word segmentation and stop-word removal preprocessing on it.
Step 2: Perform word-embedding semantic learning training on the initial retrieval document set with a deep learning tool to obtain the feature word-embedding vector set, specifically as follows:
(2.1) Perform word-embedding semantic learning training on the initial retrieval pseudo-relevance feedback document set using the Skip-gram model of Google's open-source deep learning word vector tool word2vec to obtain the word-embedding vector set of the initial retrieval document feature words.
(2.2) In the word-embedding vector set of the initial retrieval document feature words, compute the word-vector cosine similarity VCos(qi, cetj) between each query term qi (qi ∈ Q, where Q is the original query term set, Q = (q1, q2, …, qn), 1 ≤ i ≤ n) and every word-embedding candidate expansion word (cet1, cet2, …, cetm), where 1 ≤ j ≤ m, as shown in formula (1). The word-embedding candidate expansion words are the non-query terms in the word-embedding vector set.
VCos(qi, cetj) = (vqi · vcetj) / (|vqi| × |vcetj|)    (1)
In formula (1), vcetj denotes the word vector of the j-th word-embedding candidate expansion word cetj, and vqi denotes the word vector of the i-th query term qi.
(2.3) Given a minimum vector cosine similarity threshold minqvcos, extract the word embedding candidate expansion words whose VCos(q_i, cet_j) ≥ minqvcos as the word embedding expansion words (q_iet_1, q_iet_2, …, q_iet_pi) of query term q_i. Combine the word embedding expansion words of all query terms q_1, q_2, …, q_n and remove duplicate words to obtain the final word embedding expansion word set ET_WE (Expansion Terms from Word Embedding) of the original query term set Q, calculate the word embedding expansion word weight w_WEET, and then go to step 3. ET_WE is shown in formula (2):
ET_WE = {q_1et_1, …, q_1et_p1} ∪ {q_2et_1, …, q_2et_p2} ∪ … ∪ {q_net_1, …, q_net_pn}    (2)
The word embedding expansion word weight w_WEET is the vector cosine similarity between the query term and the word embedding expansion word, as shown in formula (3); when a word appears repeatedly, the weight of the word embedding expansion word equals the cumulative sum of the vector similarities of the repeated word.
w_WEET = Σ VCos(q_i, cet_j), summed over the query terms q_i for which cet_j was extracted    (3)
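Steps (2.2)–(2.3) can be sketched as follows. The toy two-dimensional vectors in the usage note are illustrative assumptions, not values from the patent; real vectors would come from the trained Skip-gram model:

```python
import math

def cosine(u, v):
    # Formula (1): VCos(q_i, cet_j) = (v_q_i · v_cet_j) / (|v_q_i| * |v_cet_j|)
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def word_embedding_expansion(vectors, query_terms, minqvcos):
    """Select word embedding expansion words (steps 2.2-2.3).

    vectors: dict mapping each feature word to its word vector; the candidate
    expansion words are the non-query terms in this vector set.
    Returns {expansion word: w_WEET}; a word selected by several query terms
    gets the cumulative sum of its cosine similarities (formula (3)).
    """
    weights = {}
    candidates = [w for w in vectors if w not in query_terms]
    for q in query_terms:
        if q not in vectors:
            continue
        for cet in candidates:
            sim = cosine(vectors[q], vectors[cet])
            if sim >= minqvcos:                      # threshold minqvcos
                weights[cet] = weights.get(cet, 0.0) + sim
    return weights
```

For example, with vectors {"q1": [1.0, 0.0], "q2": [0.8, 0.6], "c": [1.0, 0.0]} and threshold 0.5, the candidate "c" accumulates 1.0 + 0.8 = 1.8 from the two query terms.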
Step 3, adopting a Copulas theory-based pseudo-related feedback expansion word mining method to mine rule expansion words in the initial detection pseudo-related feedback document set, and establishing a rule expansion word set, wherein the method specifically comprises the following steps:
(3.1) Extract m primary detection documents from the primary detection document set to construct the primary detection pseudo-relevance-feedback document set; perform Chinese word segmentation, Chinese stop-word removal and feature word extraction preprocessing on it; calculate the feature word weights; and finally construct the pseudo-relevance-feedback Chinese document library and the Chinese feature word library.
The invention adopts TF-IDF weighting technology to calculate the weight of the feature words.
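A minimal TF-IDF sketch for step (3.1); the patent names TF-IDF but not the exact weighting variant, so the plain tf × log(N/df) form used here is an assumption:

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Feature word weights via TF-IDF: w = tf * log(N / df) (an assumed
    plain variant). docs: list of documents, each a list of feature words.
    Returns one {word: weight} dict per document.
    """
    n = len(docs)
    df = Counter()                       # document frequency of each word
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)                # term frequency within the document
        weights.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return weights
```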
(3.2) Take the feature words in the Chinese feature word library as the 1_candidate item sets C_1.
(3.3) Calculate the Copulas-theory-based support CSup(C_1) of C_1. If CSup(C_1) ≥ the minimum support threshold ms, take C_1 as a 1_frequent item set L_1 and add it to the frequent item set set FIS (Frequent ItemSet).
CSup (Copulas-based Support) represents the support based on Copulas theory. CSup(C_1) is calculated as shown in formula (4):
(Formula (4): equation image not reproduced; it combines the frequency ratio C_1_Count / AllDoc(count) and the weight ratio C_1_Weight / AllItems(weight) through a Copula function.)
In formula (4), C_1_Count represents the frequency of occurrence of the 1_candidate item set C_1 in the pseudo-relevance-feedback Chinese document library, AllDoc(count) represents the total number of documents in the pseudo-relevance-feedback Chinese document library, C_1_Weight represents the item set weight of C_1 in the pseudo-relevance-feedback Chinese document library, and AllItems(weight) represents the cumulative sum of the weights of all Chinese feature words in the pseudo-relevance-feedback Chinese document library.
(3.4) Use the self-join method to join the (k-1)_frequent item sets L_(k-1) to derive the k_candidate item sets C_k, where k ≥ 2.
The self-join method uses the candidate item set join method given in the Apriori algorithm.
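The self-join of step (3.4) follows the standard Apriori candidate generation that the patent references; a sketch (the pruning of candidates with an infrequent (k-1)-subset is part of standard Apriori and is included for completeness):

```python
from itertools import combinations

def self_join(frequent_prev):
    """Apriori-style self-join: join (k-1)_frequent item sets sharing their
    first k-2 items to produce k_candidate item sets, then prune any
    candidate that has an infrequent (k-1)-subset.
    """
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = set(prev)
    k = len(prev[0]) + 1
    candidates = set()
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:                          # join condition
            cand = tuple(sorted(set(a) | set(b)))
            # prune: every (k-1)-subset of the candidate must be frequent
            if all(tuple(sorted(sub)) in prev_set
                   for sub in combinations(cand, k - 1)):
                candidates.add(cand)
    return candidates
```

For example, joining the 2_frequent item sets {a,b}, {a,c}, {b,c} yields the single 3_candidate {a,b,c}; with {b,c} absent, the candidate is pruned.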
(3.5) When mining the 2_candidate item sets C_2: if C_2 does not contain any original query term, delete C_2; if C_2 contains an original query term, keep C_2 and go to step (3.6). When mining the k_candidate item sets C_k with k ≥ 3, go directly to step (3.6).
(3.6) Calculate the Copulas-theory-based support CSup(C_k) of C_k. If CSup(C_k) ≥ ms, C_k is a k_frequent item set L_k: add it to FIS and go to step (3.7); otherwise go directly to step (3.7).
CSup(C_k) is calculated as shown in formula (5):
(Formula (5): equation image not reproduced; it combines the frequency ratio C_k_Count / AllDoc(count) and the weight ratio C_k_Weight / AllItems(weight) through a Copula function.)
In formula (5), C_k_Count denotes the frequency of occurrence of the k_candidate item set C_k in the pseudo-relevance-feedback Chinese document library, and C_k_Weight denotes the item set weight of C_k in the pseudo-relevance-feedback Chinese document library. AllDoc(count) and AllItems(weight) are defined as in formula (4).
(3.7) Increase k by 1 and return to step (3.4) to continue the subsequent steps, until L_k is an empty set; frequent item set mining is then finished, and the procedure goes to step (3.8).
(3.8) Take a k_frequent item set L_k out of FIS, where k ≥ 2.
(3.9) Extract the proper subset item sets Et_j and Q_i of L_k such that Q_i ∩ Et_j = ∅, Q_i ∪ Et_j = L_k, and Et_j ∩ Q = ∅, where Et_j is a proper subset item set containing no query terms, Q_i is a proper subset item set containing query terms, and Q is the original query term set.
(3.10) Calculate the Copulas-theory-based confidence CConf(Q_i → Et_j) of the association rule Q_i → Et_j. If CConf(Q_i → Et_j) ≥ the minimum confidence threshold mc, add Q_i → Et_j to the association rule set AR (Association Rules). Then return to step (3.9), extract another pair of proper subset item sets Et_j and Q_i from L_k, and carry out the subsequent steps in order, looping until each proper subset item set of L_k has been taken out exactly once; then go to step (3.8) for a new round of association rule pattern mining, take another L_k out of FIS, and carry out the subsequent steps in order, looping until every k_frequent item set L_k in FIS has been taken out exactly once. Association rule pattern mining is then finished, and the procedure goes to the following step (3.11).
CConf (Copulas-based Confidence) represents the confidence based on Copulas theory. CConf(Q_i → Et_j) is shown in formula (6):
(Formula (6): equation image not reproduced; it computes the Copulas-based confidence from the frequency and weight ratios of the item sets Q_i and (Q_i ∪ Et_j) defined below.)
In formula (6), Q_i_Count represents the frequency of occurrence of the proper subset item set Q_i in the pseudo-relevance-feedback Chinese document library, Q_i_Weight represents the item set weight of Q_i in the pseudo-relevance-feedback Chinese document library, (Q_i ∪ Et_j)_Count represents the frequency of occurrence of the item set (Q_i ∪ Et_j) in the pseudo-relevance-feedback Chinese document library, and (Q_i ∪ Et_j)_Weight represents the item set weight of (Q_i ∪ Et_j) in the pseudo-relevance-feedback Chinese document library. AllDoc(count) and AllItems(weight) are defined as in formula (4).
(3.11) Extract the association rule consequents Et_j from the association rule set AR as rule expansion words, obtain the rule expansion word set ET_AR (Expansion Terms from Association Rules), calculate the rule expansion word weight w_Et, and then go to step 4.
The ET _ AR is shown as formula (7):
ET_AR = {Ret_1, Ret_2, …, Ret_r}    (7)
In formula (7), Ret_i represents the i-th rule expansion word.
The rule expansion word weight w_Et is calculated as shown in formula (8):
w_Et = max(CConf(Q_i → Et_j))    (8)
in the formula (8), max () represents the maximum value of the confidence of the association rule, and when the same rule expansion word appears in a plurality of association rule patterns at the same time, the maximum value of the confidence is taken as the weight of the rule expansion word.
Step 4, performing intersection fusion on the rule expansion word set and the word embedding expansion word set to obtain a final expansion word, and realizing query expansion, wherein the specific steps are as follows:
(4.1) Perform an intersection operation on the rule expansion word set ET_AR and the word embedding expansion word set ET_WE to obtain the final expansion word set ETS_Q (Expansion Term Set for Query Q) of the original query term set Q, and calculate the final expansion word weight w(ET_i).
The final extended word set ETS _ Q is calculated as shown in equation (9):
ETS_Q = ET_AR ∩ ET_WE    (9)
The final expansion word weight w(ET_i) is calculated as shown in formula (10):
w(ET_i) = w_Et + w_WEET    (10)
And (4.2) combining the final expansion word with the original query into a new query, and searching the Chinese document again to realize query expansion.
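The intersection fusion of step 4 reduces to intersecting the two word-to-weight mappings and summing the weights, per formulas (9) and (10); a sketch:

```python
def fuse_expansion(et_ar, et_we):
    """Intersection fusion (step 4.1): ETS_Q = ET_AR ∩ ET_WE (formula (9)),
    with final weight w(ET_i) = w_Et + w_WEET (formula (10)).

    et_ar: {rule expansion word: w_Et};
    et_we: {word embedding expansion word: w_WEET}.
    """
    return {t: et_ar[t] + et_we[t] for t in et_ar.keys() & et_we.keys()}
```

Only words produced by both the association rule mining and the word embedding learning survive the intersection, which is the point of the fusion.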
Experimental design and results:
We compare the method of the present invention with similar prior-art methods in retrieval experiments to demonstrate its effectiveness.
1. Experimental environment and experimental data:
The experimental corpus is the NTCIR-5 CLIR Chinese text standard corpus (see http://research.nii.ac.jp/ntcir/data/data-en.html), containing 901,446 traditional Chinese documents (converted to simplified Chinese for the experiments) distributed over 8 data sets, as shown in Table 1. The NTCIR-5 CLIR corpus has 50 Chinese queries, 4 types of query topics, and result sets under 2 evaluation criteria, namely Rigid (highly relevant and relevant to the query) and Relax (highly relevant, relevant, and partially relevant to the query). The retrieval experiments use the Description (Desc, a long query) and Title (a short query) topics. The retrieval evaluation metric for the experimental results is MAP (Mean Average Precision).
TABLE 1 original corpus and its quantity
(Table 1: image not reproduced; it lists the 8 NTCIR-5 CLIR data sets and their document counts.)
2. The reference retrieval and comparison method comprises the following steps:
the experimental basic retrieval environment is built by Lucene.
The reference retrieval is a retrieval result obtained by submitting an original query to Lucene.
The comparative method is described as follows:
Comparative method 1: the word-vector-based query expansion method of the literature (see: Kan, Linyuan, kojiu, et al. Study of a word-vector method for patent query expansion [J]. Computer Science and Exploration, 2018, 12(6): 972-980), with parameters α = 0.1 and k = 60.
Comparative method 2: rule expansion words are mined with the weighted association pattern mining technique of the literature (see: Huang Mingxuan. Chinese-English cross-language query expansion [J]. Journal of the China Society for Scientific and Technical Information, 2017, 36(3): 307-), with parameters c = 0.1 and mi = 0.0001; the experimental results are the averages when ms = 0.004, 0.005, 0.006 and 0.007, respectively.
The Skip-gram model word embedding semantic learning training parameters used by the invention are: batch_size = 128, embedding_size = 300, skip_window = 2, num_skips = 4, num_sampled = 64.
3. The experimental methods and results are as follows:
The MAP values obtained by the method of the invention are shown in Tables 2 and 3, where the average amplification (%) refers to the total average amplification of the retrieval results of the method of the invention over the 8 data sets relative to the reference retrieval and the comparison expansion methods.
The average amplification is computed as follows: first, the amplification of the retrieval result of the method of the invention on each data set relative to the reference retrieval or comparison expansion method is calculated; then the amplifications over the data sets are summed and divided by 8, giving the total average amplification of the retrieval results of the method of the invention relative to the other methods.
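The average-amplification computation just described is simple arithmetic; a sketch (the MAP lists passed in stand for the per-dataset values and are placeholders, not the experimental figures):

```python
def average_amplification(ours, baseline):
    """Total average amplification (%): per-dataset percentage improvement of
    our MAP over the baseline MAP, summed and divided by the number of
    data sets (8 in the experiments).
    """
    gains = [(a - b) / b * 100.0 for a, b in zip(ours, baseline)]
    return sum(gains) / len(gains)
```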
TABLE 2 Retrieval performance (MAP) of the method of the invention and the reference retrieval and comparison methods (Title queries)
(Table 2: image not reproduced.)
TABLE 3 Retrieval performance (MAP) of the method of the invention and the reference retrieval and comparison methods (Desc queries)
(Table 3: image not reproduced.)
Tables 2 and 3 show that, compared with the reference retrieval, the MAP values of the retrieval results of the method of the invention improve markedly, with a total average amplification of 27.87%, and that the MAP values of the method are mostly higher than those of the comparison methods; the expansion retrieval performance of the method therefore exceeds both the reference retrieval and the similar comparison methods. The experimental results show that the method is effective and can genuinely improve information retrieval performance, giving it high application value and broad prospects for adoption.

Claims (7)

1. A Chinese query expansion method integrating deep learning and expanded word mining intersection is characterized by comprising the following steps:
step 1, searching a Chinese document set for original query to obtain a primary check document set, and performing Chinese word segmentation and stop word removal preprocessing on the primary check document set;
step 2, performing word embedding semantic learning training on the initial inspection document set by using a deep learning tool to obtain a feature word and word embedding vector set, and specifically comprising the following steps:
(2.1) performing word embedding semantic learning training on the primary detection pseudo-related feedback document set by adopting a deep learning tool to obtain a word embedding vector set of the primary detection document feature words;
(2.2) in the word embedding vector set of the primary detection document feature words, calculating, for each query term q_i (q_i ∈ Q, where Q is the original query term set, Q = {q_1, q_2, …, q_n}, 1 ≤ i ≤ n), the word vector cosine similarity VCos(q_i, cet_j) with every word embedding candidate expansion word (cet_1, cet_2, …, cet_m), where 1 ≤ j ≤ m; the word embedding candidate expansion words are the non-query terms in the word embedding vector set;
(2.3) given a minimum vector cosine similarity threshold minqvcos, extracting the word embedding candidate expansion words whose VCos(q_i, cet_j) ≥ minqvcos as the word embedding expansion words (q_iet_1, q_iet_2, …, q_iet_pi) of query term q_i; combining the word embedding expansion words of all query terms q_1, q_2, …, q_n and removing duplicate words to obtain the final word embedding expansion word set ET_WE of the original query term set Q, calculating the word embedding expansion word weight w_WEET, and then going to step 3;
step 3, adopting a Copulas theory-based pseudo-related feedback expansion word mining method to mine rule expansion words in the initial detection pseudo-related feedback document set, and establishing a rule expansion word set, wherein the method specifically comprises the following steps:
(3.1) extracting m pieces of primary detection documents in the primary detection document set, constructing a primary detection pseudo-related feedback document set, performing Chinese word segmentation, Chinese stop words removal and feature word extraction preprocessing on the primary detection pseudo-related feedback document set, calculating a feature word weight, and finally constructing a pseudo-related feedback Chinese document library and a Chinese feature word library;
(3.2) taking the feature words in the Chinese feature word library as the 1_candidate item sets C_1;
(3.3) calculating the Copulas-theory-based support CSup(C_1) of C_1; if CSup(C_1) ≥ the minimum support threshold ms, taking C_1 as a 1_frequent item set L_1 and adding it to the frequent item set set FIS;
(3.4) using the self-join method to join the (k-1)_frequent item sets L_(k-1) to derive the k_candidate item sets C_k, where k ≥ 2;
(3.5) when mining the 2_candidate item sets C_2: if C_2 does not contain any original query term, deleting C_2; if C_2 contains an original query term, keeping C_2 and going to step (3.6); when mining the k_candidate item sets C_k with k ≥ 3, going directly to step (3.6);
(3.6) calculating the Copulas-theory-based support CSup(C_k) of C_k; if CSup(C_k) ≥ ms, C_k is a k_frequent item set L_k: adding it to FIS and going to step (3.7), or otherwise going directly to step (3.7);
(3.7) increasing k by 1 and returning to step (3.4) to continue the subsequent steps, until L_k is an empty set; frequent item set mining is then finished, and the procedure goes to step (3.8);
(3.8) taking a k_frequent item set L_k out of FIS, where k ≥ 2;
(3.9) extracting the proper subset item sets Et_j and Q_i of L_k such that Q_i ∩ Et_j = ∅, Q_i ∪ Et_j = L_k, and Et_j ∩ Q = ∅, where Et_j is a proper subset item set containing no query terms, Q_i is a proper subset item set containing query terms, and Q is the original query term set;
(3.10) calculating the Copulas-theory-based confidence CConf(Q_i → Et_j) of the association rule Q_i → Et_j; if CConf(Q_i → Et_j) ≥ the minimum confidence threshold mc, adding Q_i → Et_j to the association rule set AR; then returning to step (3.9), extracting another pair of proper subset item sets Et_j and Q_i from L_k, and carrying out the subsequent steps in order, looping until each proper subset item set of L_k has been taken out exactly once; then going to step (3.8) for a new round of association rule pattern mining, taking another L_k out of FIS, and carrying out the subsequent steps in order, looping until every k_frequent item set L_k in FIS has been taken out exactly once; association rule pattern mining is then finished, and the procedure goes to the following step (3.11);
(3.11) extracting the association rule consequents Et_j from the association rule set AR as rule expansion words, obtaining the rule expansion word set ET_AR, and calculating the rule expansion word weight w_Et; then going to step 4;
step 4, performing intersection fusion on the rule expansion word set and the word embedding expansion word set to obtain a final expansion word, and realizing query expansion, wherein the specific steps are as follows:
(4.1) performing an intersection operation on the rule expansion word set ET_AR and the word embedding expansion word set ET_WE to obtain the final expansion word set ETS_Q of the original query term set Q, and calculating the final expansion word weight w(ET_i);
And (4.2) combining the final expansion word with the original query into a new query, and searching the Chinese document again to realize query expansion.
2. The method for expanding Chinese queries by intersection fusion of deep learning and expanded word mining according to claim 1, wherein:
in said step (2.2), the word vector cosine similarity VCos(q_i, cet_j) is calculated according to formula (1):
VCos(q_i, cet_j) = (v_q_i · v_cet_j) / (|v_q_i| × |v_cet_j|)    (1)
in formula (1), v_cet_j denotes the word vector of the j-th word embedding candidate expansion word cet_j, and v_q_i denotes the word vector of the i-th query term q_i;
in the step (2.3), the word embedding expansion word set ET _ WE of the final query Q is shown as a formula (2);
ET_WE = {q_1et_1, …, q_1et_p1} ∪ {q_2et_1, …, q_2et_p2} ∪ … ∪ {q_net_1, …, q_net_pn}    (2)
the word embedding expansion word weight w_WEET is the vector cosine similarity between the query term and the word embedding expansion word, as shown in formula (3); when a word appears repeatedly, the weight of the word embedding expansion word equals the cumulative sum of the vector similarities of the repeated word;
w_WEET = Σ VCos(q_i, cet_j), summed over the query terms q_i for which cet_j was extracted    (3)
3. the method for expanding Chinese queries by intersection fusion of deep learning and expanded word mining according to claim 1, wherein:
in the step (3.3), CSup(C_1) is calculated as shown in formula (4):
(Formula (4): equation image not reproduced; it combines the frequency ratio C_1_Count / AllDoc(count) and the weight ratio C_1_Weight / AllItems(weight) through a Copula function.)
in formula (4), C_1_Count represents the frequency of occurrence of the 1_candidate item set C_1 in the pseudo-relevance-feedback Chinese document library, AllDoc(count) represents the total number of documents in the pseudo-relevance-feedback Chinese document library, C_1_Weight represents the item set weight of C_1 in the pseudo-relevance-feedback Chinese document library, and AllItems(weight) represents the cumulative sum of the weights of all Chinese feature words in the pseudo-relevance-feedback Chinese document library;
in the step (3.6), CSup(C_k) is calculated as shown in formula (5):
(Formula (5): equation image not reproduced; it combines the frequency ratio C_k_Count / AllDoc(count) and the weight ratio C_k_Weight / AllItems(weight) through a Copula function.)
in formula (5), C_k_Count denotes the frequency of occurrence of the k_candidate item set C_k in the pseudo-relevance-feedback Chinese document library, and C_k_Weight denotes the item set weight of C_k in the pseudo-relevance-feedback Chinese document library; the definitions of AllDoc(count) and AllItems(weight) are the same as in formula (4);
in the step (3.10), CConf(Q_i → Et_j) is represented by formula (6):
(Formula (6): equation image not reproduced; it computes the Copulas-based confidence from the frequency and weight ratios of the item sets Q_i and (Q_i ∪ Et_j) defined below.)
in formula (6), Q_i_Count represents the frequency of occurrence of the proper subset item set Q_i in the pseudo-relevance-feedback Chinese document library, Q_i_Weight represents the item set weight of Q_i in the pseudo-relevance-feedback Chinese document library, (Q_i ∪ Et_j)_Count represents the frequency of occurrence of the item set (Q_i ∪ Et_j) in the pseudo-relevance-feedback Chinese document library, and (Q_i ∪ Et_j)_Weight represents the item set weight of (Q_i ∪ Et_j) in the pseudo-relevance-feedback Chinese document library; AllDoc(count) and AllItems(weight) are defined as in formula (4);
in the step (3.11), the ET _ AR is as shown in formula (7):
ET_AR = {Ret_1, Ret_2, …, Ret_r}    (7)
in formula (7), Ret_i represents the i-th rule expansion word;
the rule expansion word weight w_Et is calculated as shown in formula (8):
w_Et = max(CConf(Q_i → Et_j))    (8)
in the formula (8), max () represents the maximum value of the confidence of the association rule, and when the same rule expansion word appears in a plurality of association rule patterns at the same time, the maximum value of the confidence is taken as the weight of the rule expansion word.
4. The method for expanding Chinese queries by intersection fusion of deep learning and expanded word mining according to claim 1, wherein:
in the step (4.1), the final extended word set ETS _ Q is calculated as shown in equation (9):
ETS_Q = ET_AR ∩ ET_WE    (9)
the final expansion word weight w(ET_i) is calculated as shown in formula (10):
w(ETi)=wEt+wWEET(10)。
5. the method for expanding Chinese queries by intersection fusion of deep learning and expanded word mining according to claim 1, wherein: in the step (2.1), the deep learning tool adopts a Skip-gram model of a Google open source word vector tool word2 vec.
6. The method for expanding Chinese queries by intersection fusion of deep learning and expanded word mining according to claim 1, wherein: in the step (3.1), a TF-IDF weighting technology is adopted to calculate the weight of the feature words.
7. The method for expanding Chinese queries by intersection fusion of deep learning and expanded word mining according to claim 1, wherein: in the step (3.4), the self-connection method adopts a candidate connection method given in Apriori algorithm.
CN202010774430.4A 2020-08-04 2020-08-04 Chinese query expansion method integrating deep learning and expansion word mining intersection Withdrawn CN111897926A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010774430.4A CN111897926A (en) 2020-08-04 2020-08-04 Chinese query expansion method integrating deep learning and expansion word mining intersection


Publications (1)

Publication Number Publication Date
CN111897926A true CN111897926A (en) 2020-11-06

Family

ID=73245586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010774430.4A Withdrawn CN111897926A (en) 2020-08-04 2020-08-04 Chinese query expansion method integrating deep learning and expansion word mining intersection

Country Status (1)

Country Link
CN (1) CN111897926A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765966A (en) * 2021-04-06 2021-05-07 腾讯科技(深圳)有限公司 Method and device for removing duplicate of associated word, computer readable storage medium and electronic equipment
CN112765966B (en) * 2021-04-06 2021-07-23 腾讯科技(深圳)有限公司 Method and device for removing duplicate of associated word, computer readable storage medium and electronic equipment
CN114036516A (en) * 2021-10-27 2022-02-11 西安电子科技大学 Unknown sensitive function discovery method based on two-stage analogy reasoning

Similar Documents

Publication Publication Date Title
Wen et al. Research on keyword extraction based on word2vec weighted textrank
Mahata et al. Theme-weighted ranking of keywords from text documents using phrase embeddings
CN104182527A (en) Partial-sequence itemset based Chinese-English test word association rule mining method and system
CN111897926A (en) Chinese query expansion method integrating deep learning and expansion word mining intersection
CN111753066A (en) Method, device and equipment for expanding technical background text
CN111897922A (en) Chinese query expansion method based on pattern mining and word vector similarity calculation
CN109739953B (en) Text retrieval method based on chi-square analysis-confidence framework and back-part expansion
CN109726263B (en) Cross-language post-translation hybrid expansion method based on feature word weighted association pattern mining
CN109684463B (en) Cross-language post-translation and front-part extension method based on weight comparison and mining
CN111723179A (en) Feedback model information retrieval method, system and medium based on concept map
CN111897928A (en) Chinese query expansion method for embedding expansion words into query words and counting expansion word union
CN111897924A (en) Text retrieval method based on association rule and word vector fusion expansion
CN111897921A (en) Text retrieval method based on word vector learning and mode mining fusion expansion
Bouziri et al. Learning query expansion from association rules between terms
Heidary et al. Automatic text summarization using genetic algorithm and repetitive patterns
CN111897927B (en) Chinese query expansion method integrating Copulas theory and association rule mining
Li et al. Deep learning and semantic concept space are used in query expansion
CN108416442B (en) Chinese word matrix weighting association rule mining method based on item frequency and weight
CN111897919A (en) Text retrieval method based on Copulas function and pseudo-correlation feedback rule expansion
CN109684464B (en) Cross-language query expansion method for realizing rule back-part mining through weight comparison
CN111897925B (en) Pseudo-correlation feedback expansion method integrating correlation mode mining and word vector learning
CN109684465B (en) Text retrieval method based on pattern mining and mixed expansion of item set weight value comparison
Wu et al. Beyond greedy search: pruned exhaustive search for diversified result ranking
CN111897923A (en) Text retrieval method based on intersection expansion of word vector and association mode
CN113064978A (en) Project construction period rationality judgment method and device based on feature word matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201106