CN111324705B - System and method for adaptively adjusting associated search terms - Google Patents

Publication number
CN111324705B
Authority
CN
China
Prior art keywords
word
search
text
words
record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910088844.9A
Other languages
Chinese (zh)
Other versions
CN111324705A (en
Inventor
沈民新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial Technology Research Institute ITRI
Original Assignee
Industrial Technology Research Institute ITRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial Technology Research Institute ITRI filed Critical Industrial Technology Research Institute ITRI
Publication of CN111324705A publication Critical patent/CN111324705A/en
Application granted granted Critical
Publication of CN111324705B publication Critical patent/CN111324705B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a system for adaptively adjusting associated search words, which comprises an input device, a record collecting module, a threshold setting module and an evolution module. The input device is used for receiving a search word. The record collecting module is used for judging whether the accumulated searching times of the searching words are larger than a first threshold value or smaller than a second threshold value. The threshold setting module is used for setting the number of search records meeting the first or second threshold. When the accumulated searching times of the search words are between the first threshold value and the second threshold value, the evolution module optimizes the medium-term searching flow so as to further find out at least one associated word and/or at least one historical search word which are related to the content or attribute of the search words and are maximized in the index text and the historical search records.

Description

System and method for adaptively adjusting associated search terms
Technical Field
The invention relates to a system and a method for adaptively adjusting associated search words.
Background
Modern search systems typically return, together with the search results, other search terms related to the user's search term, to help the user reach the query target quickly. This is needed because the short keywords a user enters often cannot describe the search intent precisely; because the search term or search target may have multiple descriptions or be ambiguous, which causes a vocabulary mismatch between the user and the text; because the user may choose a wrong search term owing to insufficient knowledge of the search target; or because the user makes typing errors such as homophones or near-homophones. In general, extraction techniques for associated search terms can be categorized by data source into methods based on indexed text content and methods based on historical query records. A text-based method provides a suggestion list of associated search words from an analysis of the relations among words in the indexed text content, and can do so as soon as the search system goes online; its drawback is that suggestions come only from fixed text content, so it cannot predict the user's search intent from the historical query records accumulated later. A method based on historical query records can provide up-to-date predictions of search intent from continuously accumulated user data and thus yield a better suggestion list of associated search words, but it cannot provide suggestions in the early stage of the system, because users must use the system for a long time before a sufficient amount of data accumulates for analysis.
Existing approaches also combine the two methods by weight integration, so that associated search words can be recommended both in the early stage of the search system, when no user history data exist, and in the later stage, when enough history data have accumulated.
However, weight integration raises the problem of how the weight combination is obtained. Manually set weights often fail to achieve the best effect, and enough search record data must usually be accumulated before a first optimal weight combination can be trained with a statistical model or machine learning; transfer learning across different vertical domains also remains difficult. Consequently, when the above extraction techniques are applied to search systems that have been online for different lengths of time, the differing numbers of search records mean that associated search words suitable for suggesting to users cannot be provided at all times, and an improved method is needed.
Disclosure of Invention
The invention relates to a system and a method for adaptively adjusting associated search words, which can adjust the associated search words by themselves according to the number of search records accumulated by the system, so as to provide associated search words suitable for suggesting to users.
According to an aspect of the present invention, a system for adaptively adjusting associated search terms is provided, which includes an input device, a record collection module, a threshold setting module, and an evolution module. The input device is used for receiving user input and generating a search word. The record collecting module is used for judging whether the accumulated searching times of the searching words are larger than a first threshold value or smaller than a second threshold value. The threshold setting module is used for setting the number of search records meeting the first or second threshold. The evolution module is used for adjusting a search flow according to the number of the search records, wherein when the accumulated search times of the search words is larger than a first threshold value, the evolution module finds out at least one historical search word related to the content or attribute of the search word according to a historical search record. When the accumulated searching times of the search words are smaller than a second threshold value, the evolution module executes an initial searching process to find out at least one related word related to the content or the attribute of the search words in a text. When the accumulated searching times of the search words are between the first threshold value and the second threshold value, the evolution module optimizes the medium-term searching flow so as to further find out at least one associated word and/or at least one historical search word which are maximized in the text and the historical search records and are related to the content or the attribute of the search words.
According to one aspect of the present invention, a method for adaptively adjusting associated search terms is provided, comprising the following steps. The input process is used for receiving user input and generating a search word. The record collecting process is used for judging whether the accumulated searching times of the search words are larger than a first threshold value or smaller than a second threshold value. The threshold setting process is used for setting the number of search records meeting the first or second threshold. The evolution process is used for adjusting a search process according to the number of the search records, wherein when the accumulated search times of the search words is larger than a first threshold value, the evolution process finds out at least one historical search word related to the content or attribute of the search word according to a historical search record. When the accumulated searching times of the search words are smaller than the second threshold value, the evolution process executes an initial searching process to find out at least one related word related to the content or the attribute of the search words in a text. When the accumulated searching times of the search words are between the first threshold value and the second threshold value, the evolution flow optimizes the medium-term searching flow so as to further find out at least one associated word and/or at least one historical search word which are maximized in the text and the historical search records and are related to the content or attribute of the search words.
For a better understanding of the above and other aspects of the invention, reference will now be made in detail to the following examples, examples of which are illustrated in the accompanying drawings:
Drawings
FIG. 1 is a diagram of a system for adaptively adjusting associated search terms according to an embodiment of the invention.
FIG. 2 is a schematic diagram illustrating an initial search process performed by the system for adaptively adjusting associated search terms according to an embodiment of the invention.
FIG. 3 is a schematic diagram of a system for adaptively adjusting associated search terms to optimize a mid-term search process according to one embodiment of the invention.
FIG. 4 is a diagram illustrating a post-search process performed by a system for adaptively adjusting associated search terms according to an embodiment of the invention.
[ reference numerals ]
100: system for adaptively adjusting associated search terms
102: search engine
110: input device
112: search term
114: text
120: record collection module
122: history search record
124: record database
126: text database
130: threshold setting module
132: threshold value
140: evolution module
142: new word discovery module
144: index vocabulary
146: word segmentation module
148: related words
150: text associated word generation module
152: text-related vocabulary
160: record associated word generating module
162: record associated vocabulary
170: associated word discrimination calculating module
172: discrimination value
174: associated word recommendation module
176: associated search word list
Detailed Description
The following examples are presented for illustrative purposes only and are not intended to limit the scope of the invention. The same/similar elements are denoted by the same/similar symbols. The directional terms mentioned in the following embodiments are, for example: upper, lower, left, right, front or rear, etc., are merely references to the directions of the accompanying drawings. Thus, the directional terminology is used for purposes of illustration and is not intended to be limiting of the invention.
In accordance with one embodiment of the present invention, a system for adaptively adjusting associated search terms is presented, such as a search engine having a self-adjusting search process. For a search engine into which the system has just been introduced, in the initial stage, before a sufficient number of search records has accumulated, the system can use the text and the index word list that have already been established to find in the text at least one associated word related to the content or attribute of the search word, so as to establish an initial-stage associated search word list. In the middle stage, after a certain number of search records has accumulated, the system can use those historical search records together with the text indexed in the early stage to find, in the text and the historical search records, the at least one associated word and/or at least one historical search word most strongly related to the content or attribute of the search word, so as to establish a middle-stage associated search word list. In the later stage, after a sufficient number of search records has accumulated, the system can directly find at least one historical search word related to the content or attribute of the search word entered by the user, so as to establish a later-stage associated search word list.
From the above, the system can self-optimize according to the number of search records accumulated in different periods, so that the evolution module evolves smoothly from the early stage, in which no user behavior records (search records) are available, to the later stage, in which user behavior records (search records) are the main data source, and thereby provides associated search words suitable for suggesting to the user.
Referring to fig. 1, a system 100 for adaptively adjusting associated search terms according to an embodiment of the invention includes an input device 110, a record collecting module 120, a threshold setting module 130 and an evolution module 140. The input device 110 is used for receiving user input and generating a search term 112. The record collection module 120 is configured to determine whether the accumulated search times of the search term 112 is greater than a first threshold or less than a second threshold (represented by a threshold 132). The threshold setting module 130 is configured to set the number of search records satisfying the first or second threshold. In addition, the evolution module 140 is configured to adjust a search process according to the number of search records.
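The threshold comparison that drives the evolution module 140 can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `Stage` and `select_stage` and the example threshold values 100 and 30 are hypothetical and do not come from the embodiment, and a count exactly equal to a threshold is assumed to fall in the middle stage.

```python
from enum import Enum


class Stage(Enum):
    INITIAL = "initial"  # associated words come from the indexed text only
    MID = "mid"          # text and historical search records are combined
    LATE = "late"        # historical search records only


def select_stage(search_count: int, first_threshold: int, second_threshold: int) -> Stage:
    """Choose the search flow from the accumulated search count of a search word.

    Assumption: first_threshold >= second_threshold, and a count equal to
    either threshold is treated as lying between them.
    """
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if search_count > first_threshold:
        return Stage.LATE
    if search_count < second_threshold:
        return Stage.INITIAL
    return Stage.MID


if __name__ == "__main__":
    # Hypothetical thresholds: first = 100, second = 30.
    for count in (5, 50, 500):
        print(count, select_stage(count, 100, 30).value)
```

Keeping the stage decision in one pure function makes it straightforward to adjust the two thresholds later without touching the rest of the search flow.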
In one embodiment, the input device 110 may be a user interface for reading data input by a user, including text, symbols, and/or voice. Taking a computer or a remote server as an example, the input device 110 may be a handheld electronic device connected to the computer or the remote server, although the invention is not limited thereto. The input device 110 passes the search word 112 that the user wants to search to the computer or the remote server, where it is fed to the search engine 102 of the system 100 to search the data of an online or local database. The databases may include a record database 124 and a text database 126. The text database 126 stores the sources of the text 114 to be searched, including text files and/or database fields: text files such as product specification files, advertising copy files, product test report files, and web page files; and database fields such as those of a commodity database, for example commodity name, keywords, commodity description, and brand. The record database 124 stores the historical search records 122 of users.
The record collection module 120 is configured to collect the user's operations on the system 100, including information such as the search terms entered, click positions, click counts, browsing time, and the content or attribute of each search term 112. After completing data collection, the record collection module 120 forms the historical search records 122 and stores them in the record database 124. The content or attribute of the search term 112 may be, for example, a product's Chinese name, English name, abbreviation, brand, model, function, or other brands, and the invention is not limited thereto. The content or attribute of the search term 112 may be determined from word meanings in a dictionary, from user-defined semantics, from manually edited open data (e.g., Wikipedia, DBpedia, Open Directory Project), or by statistical named entity recognition (Named Entity Recognition), and the like. After the content or attribute of the search term 112 is determined, the system 100 finds the related words 148 based on that content or attribute.
In addition, through parsing and grammatical reconstruction of the search term 112 by the search engine 102, the system 100 may filter out words in the text 114 and/or the historical search records 122 that are unrelated to the content or attribute of the search term 112, to ensure the accuracy and comprehensiveness of data extraction.
Further, the threshold setting module 130 is configured to set the number of search records satisfying the first or second threshold. The number of search records is not limited to the accumulated count of one identical search word 112; it may also include the accumulated count of search words 112 that are of the same type or similar in meaning. When different users search for search words 112 of the same type or for similar search words 112, the system 100 may accumulate or weight the search records of those search words 112. When the number of search records accumulated by the system 100 reaches a threshold 132, the evolution module 140 of the system 100 adjusts the search process according to the number of search records, as shown in fig. 2, 3 and 4.
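One way to picture the accumulation or weighting of search records for same-type or similar search words is the sketch below. The embodiment does not specify a weighting scheme, so the function name `accumulated_count`, the `similar_terms` mapping, and the fixed weight of 0.5 are all assumptions made for illustration.

```python
def accumulated_count(term, counts, similar_terms, weight=0.5):
    """Exact searches of `term` plus down-weighted searches of similar terms.

    counts:        mapping from search word to its raw number of searches
    similar_terms: mapping from search word to words judged similar (assumed given)
    weight:        hypothetical contribution of each similar-word search
    """
    total = float(counts.get(term, 0))
    for other in similar_terms.get(term, ()):
        total += weight * counts.get(other, 0)
    return total


if __name__ == "__main__":
    counts = {"card reader": 10, "chip reader": 4}
    similar = {"card reader": ["chip reader"]}
    print(accumulated_count("card reader", counts, similar))
```

The resulting value could then be compared against the threshold 132 in place of the raw count of a single search word.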
Referring to fig. 1, the system 100 may further include a word segmentation module 146, a record related word generation module 160, and a text related word generation module 150. The index vocabulary 144 comprises a set of word string tables, each word string may be composed of one or more alphanumerics or symbols, the index vocabulary may be preset manually, or may be a general dictionary or a professional field dictionary, or may be a combination of all word string phrases collected after analyzing the content of the text 114 by the word segmentation module 146, or may be a combination of the foregoing, for example, all words analyzed by the word segmentation module 146 in combination with the professional field dictionary and the text. The content of the text 114 may be a file, a web page, or a specified data table or data field of a database, for example, if the target of the search system is a commodity, the content of the text 114 may be a database field such as a commodity name, a commodity description, a commodity keyword, etc. of a commodity data table in a commodity database, and a commodity description web page content.
The word segmentation module 146 may segment the search term 112 entered by the user (e.g., a Chinese term) into meaningful phrases. For example, if the search term 112 entered by the user is "wafer reader", the word segmentation module 146 may split it into "wafer" and "reader", or into "reader" alone. Thus, when the search term 112 is not present in the text 114, the word segmentation module 146 breaks the search term 112 down into at least one index word according to the index word list 144, so that the search engine 102 can further search for index words present in the text 114. Chinese terms can be segmented with a dictionary-based segmentation algorithm, such as a forward maximum matching, reverse maximum matching, or bidirectional maximum matching algorithm, or with a corpus-based statistical segmentation algorithm, such as conditional random fields (Conditional Random Fields, CRF) or deep neural networks (Deep Neural Networks, DNN), and the invention is not limited thereto.
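Of the segmentation options listed, forward maximum matching is the simplest to illustrate. The sketch below is a generic version of that algorithm, not the module's actual implementation; the function name, the `max_len` parameter, and the space-free English string standing in for an unsegmented Chinese term are assumptions.

```python
def forward_max_match(text: str, vocabulary: set[str], max_len: int = 6) -> list[str]:
    """Greedy left-to-right segmentation against an index vocabulary.

    At each position the longest vocabulary entry (up to max_len characters)
    is taken; a single character is emitted when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    index_vocabulary = {"wafer", "reader"}
    print(forward_max_match("waferreader", index_vocabulary))
```

Reverse maximum matching follows the same idea but scans from the end of the string, and the bidirectional variant runs both and keeps the preferred segmentation.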
In addition, the text associated word generation module 150 can use the index word list 144 to find the first M index words in the text 114 that are most relevant to the search word 112 and generate a text associated word list 152. M is, for example, 5 or a positive integer greater than 5. In one embodiment, the text associated word generation module 150 may calculate an association strength from the probabilities that the search word 112 and an index word appear alone or together in the text 114: the greater the association strength, the stronger the association, and conversely, the smaller the association strength, the weaker the association. The association strength can be calculated by association rule learning, the pointwise mutual information algorithm (Pointwise Mutual Information, PMI), improved PMI algorithms, the KL divergence algorithm (Kullback-Leibler divergence), the normalized Google distance algorithm, or algorithms based on WordNet distance, but the invention is not limited thereto.
In addition, the record associated word generation module 160 is configured to analyze the degree of association between any two historical search words in the historical search records 122, and to find the first N historical search words most relevant to the search word 112 to generate a record associated word list 162. N is, for example, 5 or a positive integer greater than 5. In one embodiment, the record associated word generation module 160 may calculate an association strength from the probabilities that the content or attribute of the current search word 112 and a historical search word appear alone or together in the historical search records 122: the greater the association strength, the stronger the association, and conversely, the smaller the association strength, the weaker the association. Besides comparing where the vocabulary content appears, the degree of association may also be calculated from other attributes of the search words in the historical search records 122, such as click position, click count, and browsing time. The association strength may be calculated with the pointwise mutual information algorithm (Pointwise Mutual Information, PMI), but the invention is not limited thereto, and other algorithms may be used, such as association rule learning, improved PMI algorithms, the KL divergence algorithm (Kullback-Leibler divergence), the normalized Google distance algorithm, and algorithms based on WordNet distance.
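As an illustration of the co-occurrence-based association strength, the following sketch uses plain pointwise mutual information over a small set of texts and keeps the top M index words. It assumes each text has already been reduced to a set of index words; the function names and the sample data are hypothetical, and treating never co-occurring words as unrelated is a simplification.

```python
import math


def pmi(term_a, term_b, documents):
    """Pointwise mutual information of two terms over a list of token sets."""
    n = len(documents)
    count_a = sum(1 for doc in documents if term_a in doc)
    count_b = sum(1 for doc in documents if term_b in doc)
    count_ab = sum(1 for doc in documents if term_a in doc and term_b in doc)
    if count_ab == 0:
        return float("-inf")  # the terms never co-occur: treated as unrelated
    return math.log2((count_ab / n) / ((count_a / n) * (count_b / n)))


def top_text_associated_words(search_word, index_words, documents, m=5):
    """Return up to m index words ranked by PMI with the search word."""
    scored = [(w, pmi(search_word, w, documents)) for w in index_words if w != search_word]
    scored = [(w, s) for w, s in scored if s != float("-inf")]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:m]]


if __name__ == "__main__":
    texts = [{"reader", "usb"}, {"reader", "usb"}, {"reader", "chip"}, {"chip", "wafer"}]
    print(top_text_associated_words("reader", ["usb", "chip", "wafer"], texts, m=2))
```

The same scoring could be applied to historical search records by replacing the texts with sets of search words that occurred together, which is one possible reading of how a record associated word list might be built.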
Referring to fig. 1, in order to optimize the middle-stage search process, the system 100 further includes an associated word discrimination calculation module 170 and an associated word recommendation module 174. The associated word discrimination calculation module 170 may calculate a discrimination value 172 for each associated word 148 from the text 114, the index word list 144, the record associated word list 162, and the text associated word list 152. The discrimination value 172 is an indicator of how distinctive the associated word 148 is, i.e., how differently the associated word 148 is distributed across the texts 114. It can be used to increase the diversity of the associated word list and to avoid recommending associated words that are too similar to one another. When the associated word 148 appears only in a particular text 114, its discrimination value is higher; when the associated word 148 appears in many texts 114 at the same time, its discrimination value is lower. For example, over a plurality of texts 114, the distinctiveness of an associated word 148 is inversely related to its document frequency (document frequency, DF), the number of texts 114 in which the associated word 148 appears; this is the inverse document frequency (inverse document frequency, IDF). Accordingly, the associated word discrimination calculation module 170 may calculate the discrimination value 172 of each associated word 148 using, for example, an inverse document frequency algorithm, a residual inverse document frequency (RIDF) algorithm, or a discrimination power algorithm (the invention is not limited thereto), and build a matching table of associated words 148 and discrimination values 172.
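A discrimination value based on inverse document frequency can be sketched as below. The add-one smoothing is an assumption chosen here so that unseen words do not cause a division by zero; the embodiment does not state which IDF variant is used, and the function names and sample texts are hypothetical.

```python
import math


def inverse_document_frequency(word, documents):
    """Smoothed IDF: log((1 + N) / (1 + DF)), an assumed variant."""
    df = sum(1 for doc in documents if word in doc)
    return math.log((1 + len(documents)) / (1 + df))


def build_discrimination_table(words, documents):
    """Matching table from each associated word to its discrimination value."""
    return {w: inverse_document_frequency(w, documents) for w in words}


if __name__ == "__main__":
    texts = [{"common", "rare"}, {"common"}, {"common"}]
    table = build_discrimination_table(["common", "rare"], texts)
    for word, value in table.items():
        print(word, round(value, 3))
```

With this variant a word that appears in every text gets a value of zero, and a word confined to a single text gets the highest value among words that occur at all, which matches the behavior described for the discrimination value 172.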
In one embodiment, when a certain related term 148 exists in the index term table 144, the related term discrimination calculation module 170 directly calculates the discrimination value of the index term. When a certain related word 148 does not exist in the index word list 144, the word segmentation module 146 segments the related word 148, and then the related word discrimination calculation module 170 calculates discrimination values for each segmented index word, and estimates the discrimination values of the related word 148 in a mode of taking the minimum value, the maximum value, the arithmetic average value, the weighted average value or the like.
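The fallback for associated words that are missing from the index word list can be sketched as follows. The function name, the `segment` callback, and the three supported modes are illustrative assumptions; the weighted average is omitted because the embodiment does not say how its weights are chosen.

```python
from statistics import mean


def estimate_discrimination(word, table, segment, mode="min"):
    """Discrimination value of `word`, estimated from its segments if needed.

    table:   mapping from index word to discrimination value
    segment: callable that splits a word into index words (assumed given)
    mode:    "min", "max", or "mean" aggregation over the segments
    """
    if word in table:
        return table[word]  # the word is itself an index word
    values = [table[piece] for piece in segment(word) if piece in table]
    if not values:
        raise KeyError(f"no indexed segment found for {word!r}")
    aggregate = {"min": min, "max": max, "mean": mean}[mode]
    return aggregate(values)


if __name__ == "__main__":
    table = {"wafer": 1.0, "reader": 0.5}
    # str.split stands in for the word segmentation module in this example.
    for mode in ("min", "max", "mean"):
        print(mode, estimate_discrimination("wafer reader", table, str.split, mode))
```

Taking the minimum is the most conservative choice, since one common segment is enough to pull the estimate down, while the maximum favors words that contain at least one distinctive segment.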
In one embodiment, the system 100 also includes a new word discovery module 142, which extracts from a given term new words that are not included in the index word list 144. The new word discovery module 142 may identify new words by language rules, such as phonological rules, grammatical rules, or word formation rules; by statistical models, such as hidden Markov models (Hidden Markov Model, HMM), conditional random fields (Conditional Random Fields, CRF), support vector machines (Support Vector Machine, SVM), or deep neural networks (Deep Neural Network, DNN); or by specific statistics, such as the pointwise mutual information (Pointwise Mutual Information, PMI) algorithm. When a certain related word 148 does not exist in the index word list 144, the new word discovery module 142 extracts the part of the word string identified as a new word from the related word 148 and assigns it an estimated discrimination value; the new word discrimination value can be a preset fixed value or a weighted value computed dynamically from the maximum of all discrimination values in the index word list 144. The part of the word string that is not a new word is then evaluated against the index word list 144: if that part exists in the index word list 144, the related word discrimination calculation module 170 directly calculates the discrimination value of the index word. Finally, once the discrimination values of the new word and non-new word parts are obtained, the discrimination value of the related word 148 is estimated by taking the minimum value, the maximum value, the arithmetic average value, the weighted average value, or the like.
If the word string part of the non-new word does not exist in the index word list 144, the word segmentation module 146 segments the word string to obtain at least one index word, the related word discrimination calculation module 170 calculates discrimination values for the segmented index words, finally obtains discrimination values of the word groups of the new word and the non-new word, and estimates the discrimination values of the related word 148 in a mode of taking the minimum value, the maximum value, the arithmetic average value, the weighted average value or the like.
In addition, the associated word recommendation module 174 is configured to compare the discrimination values of the associated words 148 in the record associated word list 162 with those of the associated words 148 in the text associated word list 152, rank the associated words 148 by discrimination value, and select from the two lists the top P associated words 148 with the highest discrimination values. P is, for example, 5 or a positive integer greater than 5. In this manner, the associated search word list 176 of suitable suggestions is completed.
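The selection of the top P associated words can be sketched as a merge followed by a sort. The function name, the default discrimination value of 0.0 for words missing from the table, and the choice to list record-derived words first when values tie are assumptions for this example.

```python
def recommend(text_words, record_words, discrimination, p=5):
    """Merge both associated word lists and keep the p most discriminating words."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    candidates = list(dict.fromkeys(list(record_words) + list(text_words)))
    # list.sort is stable, so ties keep the merged order (record words first).
    candidates.sort(key=lambda w: discrimination.get(w, 0.0), reverse=True)
    return candidates[:p]


if __name__ == "__main__":
    values = {"usb": 0.2, "chip": 0.9, "nfc": 0.5}
    print(recommend(["usb", "chip"], ["chip", "nfc"], values, p=2))
```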
Referring to fig. 1 and 2, fig. 2 is a schematic diagram illustrating an initial search process performed by the system 100 for adaptively adjusting associated search terms according to an embodiment of the invention, which includes steps S11 to S14. In steps S11 and S12, it is determined whether the search term 112 is in the search records and, if so, whether the accumulated number of searches of the search term 112 is less than the second threshold. When the search term 112 is not present in the historical search records 122, or its accumulated number of searches is very small, the search engine 102 cannot find historical search words suitable for suggestion from the current search term 112, and the evolution module 140 therefore performs the initial search process. In steps S13 and S14, it is determined whether the search term 112 is in a text 114; if not, the word segmentation module 146 breaks the search term 112 down into at least one index word according to the index word list 144, and the process returns to step S11 to determine whether the index word is in the search records. When the search term 112 exists in a text 114, the text associated word generation module 150 can find, according to the built-in text 114 and the index word list 144, at least one associated word 148 in the text 114 related to the content or attribute of the search term 112.
Next, referring to fig. 1 and 3, fig. 3 is a schematic diagram illustrating optimization of a middle search process by a system 100 for adaptively adjusting associated search terms according to an embodiment of the invention. The flow steps of the present embodiment are the same as those of the above embodiment, except that: in step S12, when the accumulated number of searches of the search term 112 is greater than the second threshold and less than the first threshold, the system 100 accumulates a certain number of search records for the evolution module 140 to execute a mid-term search process. At this point, the record association word generation module 160 may find at least one historical search word related to the content or attribute of the search word 112 according to a historical search record 122. Therefore, the search engine 102 can find out the related words 148 suitable for suggestion according to the current search word 112, and find out the related words 148 suitable for suggestion according to the built-in text 114 and the index word list 144, then generate the identification values of the related words through the related word identification degree calculation module 170 and the new word discovery module 142, and further find out at least one related word 148 and/or at least one historical search word which are maximally related to the content or attribute of the search word 112 through the selection of the related word recommendation module 174, so as to obtain the optimized search related word list 176.
Next, referring to fig. 1 and 4, fig. 4 is a schematic diagram showing a later-stage search process performed by the system 100 for adaptively adjusting associated search terms according to an embodiment of the invention. This process omits the text search of steps S13 and S14 of the initial stage and performs only the determinations of steps S11 and S12. In the present embodiment, when the search term 112 appears in the historical search records 122 and its accumulated number of searches is greater than the first threshold (and therefore also greater than the second threshold), the system 100 has accumulated a sufficient number of historical search records 122 for the evolution module 140 to perform the later-stage search process. At this point, the record associated word generation module 160 may find at least one historical search word related to the content or attribute of the search term 112 according to the historical search records 122. Thus, rather than finding associated words 148 suitable for suggestion from the built-in text 114 and the index word list 144, the search engine 102 finds them directly from the historical search records 122 based on the current search term 112. The first threshold and the second threshold are accumulated numbers of searches of the search term 112. They can be determined according to the general statistical notion of a large sample (more than 30 samples), or by reference to search systems in the same domain and of similar scale. For example, in the shopping search domain, the thresholds may be set from cases with a similar number of products, using the accumulated number of searches that was needed there before the record-based associated words satisfied users.
Or the first and second thresholds may be dynamically adjusted by a domain expert according to the search result during the use of the search system 100 to adjust the evolution speed from the initial stage to the later stage, or the degradation from the middle stage or the later stage back to the previous stage.
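The three-stage decision governed by the two thresholds can be sketched as below. This is only an illustration: the names `Stage` and `select_stage` are hypothetical, the example threshold values are arbitrary, and treating a count exactly equal to a threshold as belonging to the middle stage is an assumption, since the text only speaks of "greater than" and "less than".

```python
from enum import Enum


class Stage(Enum):
    INITIAL = "initial"  # related words come from the built-in text only
    MIDDLE = "middle"    # text and historical search records are combined
    LATER = "later"      # related words come from historical records only


def select_stage(search_count: int, first_threshold: int, second_threshold: int) -> Stage:
    """Choose the search process from a term's accumulated search count.

    Hypothetical helper; the first threshold is assumed to be the larger one.
    """
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if search_count > first_threshold:
        return Stage.LATER
    if search_count < second_threshold:
        return Stage.INITIAL
    # Assumption: counts equal to either threshold fall into the middle stage.
    return Stage.MIDDLE


if __name__ == "__main__":
    # Example thresholds only; 30 echoes the "large sample" rule of thumb.
    for count in (5, 50, 150):
        print(count, select_stage(count, first_threshold=100, second_threshold=30))
```

Because the stage is recomputed from the current count and thresholds on every call, raising a threshold at run time naturally moves a term back to an earlier stage, which matches the fallback behaviour described above.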
In one embodiment, the method for adaptively adjusting the associated search terms 176 may be implemented as a software program stored on a non-transitory computer-readable medium, such as a hard disk, an optical disc, a portable drive, or a memory. When a processor loads the software program from the non-transitory computer-readable medium, it can execute the method shown in FIG. 2, FIG. 3, and FIG. 4, changing an initial search process into a middle-stage search process and then changing the middle-stage search process into a later-stage search process.
In one embodiment, the system 100 for adaptively adjusting associated search terms may include a processor capable of executing one or more computer-executable instructions and a program storage device storing computer program modules that are executable by the processor, wherein the computer program modules, when executed by the processor, cause the processor to perform the operations of the steps shown in fig. 2, 3, and 4.
In another embodiment, the record collecting module 120, the threshold setting module 130, the evolution module 140, the new word discovery module 142, the text related word generating module 150, the record related word generating module 160, the related word discrimination calculating module 170, and the related word recommending module 174 may be implemented as software units or hardware units, or as a combination in which some of the modules are implemented in software and some in hardware. A module implemented in software can be regarded as an operation flow (a record collection flow, a threshold setting flow, an evolution flow, a new word discovery flow, a text related word generation flow, a record related word generation flow, a related word discrimination calculation flow, a related word recommendation flow, and so on) that the processor loads to execute the corresponding function. A module implemented in hardware may be, for example, a microcontroller, a microprocessor, a digital signal processor, an application specific integrated circuit (ASIC), a digital logic circuit, or a field programmable gate array (FPGA).
According to the system and the method for adaptively adjusting associated search terms disclosed in the embodiments of the invention, the associated search terms adjust themselves according to the number of search records accumulated by the system, so that associated search terms suitable for suggesting to users are provided. This reduces the manpower and time required for developing the system program, and avoids both the problem of pre-learning a first set of weight combinations and the problem of learning when converting between vertical domains. In addition, the invention takes into account that the search term recommendation process can keep evolving as the search records change, and establishes a more accurate search term recommendation mechanism. This avoids the problem that a single uniform recommendation process may generate related words irrelevant to the content or attributes of the search term, and improves both ease of management and flexibility of use.
In summary, although the present invention has been described in terms of the above embodiments, it is not limited thereto. Those of ordinary skill in the art will recognize that various changes and modifications can be made without departing from the spirit and scope of the invention. The scope of the invention is therefore intended to be defined only by the scope of the appended claims.

Claims (18)

1. A system for adaptively adjusting associated search terms, comprising:
an input device for receiving a search term;
a record collecting module for judging whether the accumulated search times of the search word is larger than a first threshold or smaller than a second threshold;
a threshold setting module for setting the number of search records meeting the first or the second threshold; and
an evolution module for adjusting a search process according to the number of search records, wherein when the accumulated search times of the search word is greater than the first threshold, the evolution module finds at least one historical search word related to the content or attribute of the search word according to a historical search record,
wherein when the accumulated search times of the search word is smaller than the second threshold, the evolution module executes an initial search process to find out at least one related word related to the content or attribute of the search word in a text,
when the accumulated searching times of the search word is between the first threshold and the second threshold, the evolution module optimizes the middle searching process to further find out the at least one related word and/or the at least one historical search word which are maximized in the text and the historical searching record and are related to the content or the attribute of the search word.
2. The system of claim 1, further comprising:
an index word list;
a text related word generation module for analyzing the first M index words most relevant to the search word in the text according to the index word list, so as to generate a text related word list; and
a record related word generation module for analyzing the association degree between any two historical search words in the historical search record and finding the first N historical search words most relevant to the search word, so as to generate a record related word list.
3. The system of claim 2, wherein the text-related word generation module calculates a strength of association based on a probability that the search word and the index words appear in the text individually or together.
4. The system of claim 2, wherein the record-related term generation module calculates a strength of association based on a probability that the search term and the content or attributes of the historical search terms appear alone or together in the historical search record.
5. The system of claim 2, further comprising:
a related word discrimination calculation module for calculating discrimination values of the related words according to the index word list, the record related word list and the text related word list; and
a related word recommending module for comparing the discrimination value of each related word in the record related word list with the discrimination value of each related word in the text related word list, and selecting the top P related words with the highest discrimination values from the text related word list and the record related word list according to the ranking of the discrimination values of the related words.
6. The system of claim 5, wherein the related-word discrimination calculation module calculates the discrimination value based on a degree of difference in occurrence of each of the related words in the text, the degree of difference being related to a frequency of occurrence of each of the related words in the text or the texts.
7. The system of claim 2, further comprising:
a word segmentation module for receiving the search word, wherein when the search word does not exist in the text, the word segmentation module breaks the search word down into at least one index word according to the index word list.
8. The system of claim 5, further comprising:
a new word finding module for identifying whether the related word contains a new word which does not exist in the index word list, wherein when the related word contains the new word, the related word discrimination calculation module calculates the discrimination value of the related word according to the related word and the contained new word.
9. The system of one of claims 1 to 8, wherein the system is executed by a processor or a software program loaded by the processor.
10. A method of adaptively adjusting associated search terms, comprising:
an input process for receiving a search term;
a record collecting flow for judging whether the accumulated searching times of the searching words is larger than a first threshold or smaller than a second threshold;
a threshold setting process for setting the number of search records satisfying the first or the second threshold; and
an evolution process for adjusting a search process according to the number of search records, wherein when the accumulated search times of the search word is greater than the first threshold, the evolution process finds at least one historical search word related to the content or attribute of the search word according to a historical search record,
wherein when the accumulated search times of the search word is smaller than the second threshold, the evolution process executes an initial search process to find out at least one related word related to the content or attribute of the search word in a text,
when the accumulated searching times of the search word is between the first threshold and the second threshold, the evolution process optimizes the middle searching process to further find out the at least one related word and/or the at least one historical search word which are maximized in the text and the historical search records and related to the content or attribute of the search word.
11. The method of claim 10, further comprising:
establishing an index word list;
a text related word generating process for analyzing the first M index words most relevant to the search word in the text according to the index word list, so as to generate a text related word list; and
a record related word generating process for analyzing the association degree between any two historical search words in the historical search records and finding the first N historical search words most relevant to the search word, so as to generate a record related word list.
12. The method of claim 11, wherein the text-related word generation process calculates a strength of association based on the probability that the search word and the index words appear in the text individually or together.
13. The method of claim 11, wherein the record association word generation process calculates an association strength based on the probability that the search word and the content or attributes of the historical search words appear alone or together in the historical search record.
14. The method of claim 11, further comprising:
a related word discrimination calculating process for calculating discrimination values of the related words according to the index word list, the record related word list and the text related word list; and
a related word recommending process for comparing the discrimination value of each related word in the record related word list with the discrimination value of each related word in the text related word list, and selecting the top P related words with the highest discrimination values from the text related word list and the record related word list according to the ranking of the discrimination values of the related words.
15. The method of claim 14, wherein the associated word discrimination calculation process calculates the discrimination value based on a degree of difference in occurrence of each of the associated words in the text, the degree of difference being related to a frequency of occurrence of each of the associated words in the text or the texts.
16. The method of claim 11, further comprising:
a word segmentation process for receiving the search word, wherein when the search word does not exist in the text, the word segmentation process breaks the search word down into at least one index word according to the index word list.
17. The method of claim 14, further comprising:
a new word finding process for identifying whether the related word contains a new word which does not exist in the index word list, wherein when the related word contains the new word, the related word discrimination calculating process calculates the discrimination value of the related word according to the related word and the new word.
18. The method of one of claims 11 to 17, wherein the method is performed by a processor or a software program loaded by the processor.
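
Claims 3, 4, 12 and 13 above describe an association strength based on the probability that two words appear individually or together, without fixing a formula in the claim text. As a hedged illustration only, the sketch below uses pointwise mutual information (PMI), which is one common measure of this kind and is an assumption here, not the formula of the invention. The function name `association_strength` and the representation of each text or search record as a set of words are likewise hypothetical.

```python
import math
from typing import Iterable, Set


def association_strength(term_a: str, term_b: str, documents: Iterable[Set[str]]) -> float:
    """Pointwise mutual information of two terms over a document collection.

    PMI = log( P(a, b) / (P(a) * P(b)) ), with probabilities estimated as
    the fraction of documents containing the term(s). Assumed measure only.
    """
    docs = list(documents)
    total = len(docs)
    if total == 0:
        return float("-inf")
    count_a = sum(1 for doc in docs if term_a in doc)
    count_b = sum(1 for doc in docs if term_b in doc)
    count_ab = sum(1 for doc in docs if term_a in doc and term_b in doc)
    if count_ab == 0:
        # The terms never co-occur, so there is no evidence of association.
        return float("-inf")
    p_a = count_a / total
    p_b = count_b / total
    p_ab = count_ab / total
    return math.log(p_ab / (p_a * p_b))


if __name__ == "__main__":
    records = [{"phone", "case"}, {"phone", "case"}, {"desk"}, {"desk"}]
    print(association_strength("phone", "case", records))
    print(association_strength("phone", "desk", records))
```

The same function can be applied to index words in the text or to historical search words in the search records, since both cases only need counts of individual and joint occurrence.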
CN201910088844.9A 2018-12-14 2019-01-29 System and method for adaptively adjusting associated search terms Active CN111324705B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TW107145181 2018-12-14
TW107145181A TWI681304B (en) 2018-12-14 2018-12-14 System and method for adaptively adjusting related search words

Publications (2)

Publication Number Publication Date
CN111324705A CN111324705A (en) 2020-06-23
CN111324705B true CN111324705B (en) 2023-05-02

Family

ID=69942676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910088844.9A Active CN111324705B (en) 2018-12-14 2019-01-29 System and method for adaptively adjusting associated search terms

Country Status (2)

Country Link
CN (1) CN111324705B (en)
TW (1) TWI681304B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI742446B (en) * 2019-10-08 2021-10-11 東方線上股份有限公司 Vocabulary library extension system and method thereof
TWI787651B (en) * 2020-09-16 2022-12-21 洽吧智能股份有限公司 Method and system for labeling text segment
TWI755995B (en) * 2020-12-24 2022-02-21 科智企業股份有限公司 A method and a system for screening engineering data to obtain features, a method for screening engineering data repeatedly to obtain features, a method for generating predictive models, and a system for characterizing engineering data online

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365839A (en) * 2012-03-26 2013-10-23 腾讯科技(深圳)有限公司 Recommendation search method and device for search engines
CN105653533A (en) * 2014-11-13 2016-06-08 腾讯数码(深圳)有限公司 Method and device for updating classified associated word set
CN106649334A (en) * 2015-10-29 2017-05-10 北京国双科技有限公司 Conjunction word set processing method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7716207B2 (en) * 2002-02-26 2010-05-11 Odom Paul S Search engine methods and systems for displaying relevant topics
US20090037399A1 (en) * 2007-07-31 2009-02-05 Yahoo! Inc. System and Method for Determining Semantically Related Terms
US9043313B2 (en) * 2008-02-28 2015-05-26 Yahoo! Inc. System and/or method for personalization of searches
CN102184173A (en) * 2009-10-31 2011-09-14 佛山市顺德区汉达精密电子科技有限公司 Method for searching Internet data
CN103077179B (en) * 2011-09-12 2016-08-31 克利特股份有限公司 For showing the computer implemented method of the personal time line of the user of social networks, computer system and computer-readable medium thereof
CN102629257B (en) * 2012-02-29 2014-02-19 南京大学 Commodity recommending method of e-commerce website based on keywords
GB201418402D0 (en) * 2014-10-16 2014-12-03 Touchtype Ltd Text prediction integration
US10229210B2 (en) * 2015-12-09 2019-03-12 Oracle International Corporation Search query task management for search system tuning
CN105930376B (en) * 2016-04-12 2019-08-02 Oppo广东移动通信有限公司 A kind of searching method and device

Also Published As

Publication number Publication date
CN111324705A (en) 2020-06-23
TW202022635A (en) 2020-06-16
TWI681304B (en) 2020-01-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant