WO2013120373A1 - Search method, device and storage medium - Google Patents

Search method, device and storage medium

Info

Publication number
WO2013120373A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
searched
vector
matching algorithm
document
Prior art date
Application number
PCT/CN2012/086025
Other languages
English (en)
French (fr)
Inventor
路彦雄
杨月奎
王亮
焦峰
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to US14/347,776 (patent US9317590B2)
Publication of WO2013120373A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/338 Presentation of query results
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems

Definitions

  • the present invention relates to the field of computer network search technologies, and in particular, to a search method, device, and storage medium.
  • the traditional search scheme mainly works as follows: all associated documents are looked up in the network according to the information the user enters, the degree of association between each associated document and the information to be searched is calculated according to a certain algorithm rule, all associated documents are sorted by degree of association, and the sorted result is returned to the user as the search result.
  • the degree of relevance directly affects the ranking results of related documents, directly affecting the user's search results, and the degree of relevance is generally reflected by the relevance score.
  • the word matching algorithm is usually used for the relevance calculation, for example the BM25 (Best Match) algorithm or the proximity (term proximity scoring) algorithm; the higher the relevance score, the stronger the association.
  • taking a search scheme based on the BM25 algorithm as an example: suppose the information to be searched is "capital of China". Under the BM25 scoring principle, "China" and "capital" must appear in an associated document for it to receive a relevance score; otherwise the relevance score of the associated document is 0. For example, one associated document reads: "Beijing is a famous historical and cultural city with more than 3,000 years of history as a city and more than 850 years of history as a capital; it is the national political and cultural center and the country's largest land and air transportation hub."
  • under the traditional scheme, the relevance score of this associated document is 0, indicating that it is unrelated to the information to be searched; judged by semantic relationship, however, the associated document is in fact highly relevant to the information to be searched.
  • after sorting, the associated document may be placed on a later page of the search results, which is inconvenient for the user.
  • the technical problem to be solved by the embodiments of the present invention is to provide a search method, a device and a storage medium, which can obtain more accurate search results.
  • an embodiment of the present invention provides a search method, including:
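  • acquiring the associated documents of the information to be searched;
  • calculating, based on a word matching algorithm and a semantic matching algorithm, the relevance of each acquired associated document to the information to be searched;
  • sorting the acquired associated documents according to the calculated relevance, and displaying the sorting result.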
  • an embodiment of the present invention further provides a search apparatus, including:
  • a search module configured to acquire an associated document of the information to be searched
  • a calculation module configured to calculate a correlation between each associated document obtained by the search module and the information to be searched based on a word matching algorithm and a semantic matching algorithm
  • a sorting module configured to perform sorting processing on all associated documents obtained by the search module according to the correlation calculated by the calculating module
  • a display module configured to display a sort result obtained by the sorting module.
  • an embodiment of the present invention further provides a storage medium including computer-executable instructions for performing a search method, the method comprising the steps of: acquiring the associated documents of the information to be searched;
  • calculating, based on a word matching algorithm and a semantic matching algorithm, the relevance of each acquired associated document to the information to be searched;
  • sorting the acquired associated documents according to the calculated relevance, and displaying the sorting result.
  • the embodiment of the invention combines the word matching algorithm and the semantic matching algorithm, taking into account both the matching of words and the matching of semantic relationships between words, to obtain a relatively accurate relevance between each associated document and the information to be searched. Sorting by this relevance and displaying the sorting result provides the user with ideal search results, so that the user can quickly obtain highly relevant associated documents from the displayed results, which meets the user's actual search needs, improves search efficiency, and thus improves user satisfaction.
  • brief description of the drawings:
  • FIG. 1 is a flow chart of an embodiment of a search method provided by the present invention.
  • FIG. 2 is a specific flowchart of step S102 shown in FIG. 1;
  • FIG. 3 is a schematic diagram of an IDF table provided by the present invention.
  • FIG. 4 is a schematic diagram of an MI table provided by the present invention.
  • FIG. 5 is a specific flowchart of step S103 shown in FIG. 1;
  • FIG. 6 is a schematic structural diagram of an embodiment of a search apparatus provided by the present invention.
  • FIG. 7 is a schematic structural diagram of an embodiment of the calculation module shown in FIG. 6.
  • preferred embodiments of the invention:
  • the search device may calculate the relevance of all associated documents of the information to be searched based on word matching and on semantic matching between words, and sort and display the documents according to that relevance, so that the user can quickly obtain highly relevant associated documents from the displayed search results, which meets the user's search needs and improves search efficiency.
  • the information to be searched may be a search keyword or sentence input by the user, and may be denoted query.
  • the associated document may be a document included in the search results obtained with existing web search technology from the search keyword or sentence input by the user, and may be denoted document.
  • the word matching algorithm means that the search process matches on words; it may be the BM25 algorithm, the proximity algorithm, etc. Unless otherwise specified, the embodiments of the present invention use the BM25 algorithm as the example.
  • the semantic matching algorithm means that the search process matches on the semantic relationship between words, that is, on the mutual information between words.
  • MI (Mutual Information) describes the degree of association between two random variables. In text processing, MI measures the relatedness of two words: the larger the MI of two words, the stronger their association.
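  • FIG. 1 is a flowchart of an embodiment of a search method provided by the present invention. The method includes step S101 (acquiring the associated documents of the information to be searched), step S102 (calculating, based on the word matching algorithm and the semantic matching algorithm, the relevance of each acquired associated document to the information to be searched), and step S103 (sorting the acquired associated documents according to the calculated relevance and displaying the sorting result).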
  • the score of the relevance of each associated document to the information to be searched may be composed of two parts, one is an association score obtained based on the word matching algorithm, and the other is an association score obtained based on the semantic matching algorithm.
  • the weights of the two-part correlation scores may be preset according to specific conditions, so that the correlation scores composed of the weighted two-part correlation scores can more accurately reflect the degree of association between the associated documents and the information to be searched.
  • in step S103, all associated documents obtained by the search may be sorted and displayed in descending order of relevance score, so that the documents displayed first are always those more relevant to the information to be searched; the user can then quickly obtain highly relevant associated documents from the displayed search results, which meets the user's search needs and improves search efficiency. This step may also sort in other orders, for example in ascending order of relevance score, or with one part in ascending order and another part in descending order, and so on.
  • FIG. 2 is a specific flowchart of step S102 shown in FIG. 1; step S102 includes:
  • S211: perform vectorization processing on the information to be searched to obtain m vectors $t_i$.
  • vectorizing the information to be searched means applying word segmentation to split it into m words, denoted $t_1$ to $t_m$, where m and i are both positive integers and $1 \le i \le m$.
  • S212: perform vectorization processing on each acquired associated document to obtain the n vectors $d_j$ corresponding to that document.
  • each acquired associated document is vectorized in the same way: word segmentation splits the document into n words, denoted $d_1$ to $d_n$, where n and j are both positive integers and $1 \le j \le n$.
  • steps S211 and S212 may be performed in either order; for example, step S212 may be performed first and step S211 afterwards.
  • the vectorization process in steps S211 and S212 can refer to the prior art, and details are not described herein.
  • S213: based on the word matching algorithm, calculate the association score $S_1$ between each associated document and the information to be searched. The formula of the word matching algorithm uses the following quantities:
  • the parameters of the formula (k among them) are adjustment factors that smooth the data; in a specific implementation they are constants whose values the user can set according to the actual situation or empirical values;
  • $qtf_i$ is the word frequency of the i-th vector $t_i$ in the information to be searched, that is, the number of times $t_i$ appears in the information to be searched;
  • $tf_i$ is the word frequency of vector $t_i$ in the associated document, that is, the number of times $t_i$ appears in the corresponding associated document;
  • $l$ is the length of the associated document; given the vectorization result of step S212, its value is n;
  • $avdl$ is the average length of all associated documents;
  • $w_i$ is the weight of vector $t_i$, generally its IDF (inverse document frequency) value, which is calculated from the number of all associated documents and the frequency $htf_i$ of $t_i$ in all acquired associated documents.
  • the weights (IDF values) of the vectors (words) in the network may be pre-calculated and stored.
  • the weights of the vectors may be stored in the form of a table.
  • FIG. 3 is a schematic diagram of an IDF table provided by the present invention.
  • the IDF table in the example shown in FIG. 3 stores the weights of the vectors. The IDF table of FIG. 3 and every entry in it are examples only.
  • in step S213, the weights of the vectors in the information to be searched can be read directly from the preset IDF table; the parameters required by the word matching algorithm are calculated from the data obtained in steps S211 and S212 and substituted into the formula of the word matching algorithm, which yields the association score $S_1$ between the associated document and the information to be searched.
  • S214: based on the semantic matching algorithm, calculate the association score $S_2$ between each associated document and the information to be searched. The formula of the semantic matching algorithm uses the following quantities:
  • the parameters of the formula (k among them) are adjustment factors that smooth the data; in a specific implementation they are constants whose values the user can set according to the actual situation or empirical values;
  • $l$ is the length of the corresponding associated document; given the vectorization result of step S212, its value is n; $avdl$ is the average length of all acquired associated documents;
  • $mi(t_i, d_j)$ is the mutual information between vector $t_i$ and vector $d_j$.
  • the mutual information between each vector (word) and each vector in the network may be pre-calculated and stored before the execution of the search process.
  • the mutual information between the vectors may be stored in the form of a table.
  • FIG. 4 is a schematic diagram of the MI table provided by the present invention; the MI table in the example shown in FIG. 4 stores the mutual information between the vectors. The MI table of FIG. 4 and every entry in it are examples only.
  • in step S214, the mutual information between each vector in the information to be searched and each vector of the associated document can be read directly from the preset MI table; the parameters required by the semantic matching algorithm are calculated from the data obtained in steps S211 and S212 and substituted into the formula of the semantic matching algorithm, which yields the association score $S_2$ between the associated document and the information to be searched.
  • steps S213 and S214 may be performed in either order; for example, step S214 may be performed first and step S213 afterwards.
  • S215: calculate the relevance S of each associated document to the information to be searched according to $S = \beta \times S_1 + (1 - \beta) \times S_2$, where $\beta$ is a preset weight and $0 < \beta < 1$.
  • FIG. 5 is a specific flowchart of step S103 shown in FIG. 1; step S103 includes: S311, sort all associated documents in descending order of their relevance to the information to be searched; S312, display all sorted associated documents.
  • after the sorting in step S311, the associated documents are arranged in descending order of relevance, and step S312 displays them in that order, so that the user can quickly obtain highly relevant associated documents from the displayed search results, which meets the user's search needs and improves search efficiency.
  • suppose the user wants some information about XX-brand mobile phones and enters "XX-brand mobile phone cost-performance" in the search engine as the information to be searched; after the search in step S101, three associated documents are obtained in total:
  • associated document 1: XX-brand phones all offer very good value for money, and XX-brand phones are very durable;
  • associated document 2: I am a loyal fan of XX-brand phones and enjoy tinkering with them: flashing firmware, downloading programs, playing games and so on. I find the software for XX-brand phones plentiful and comprehensive, so I have kept using them until now;
  • associated document 3: many models meet your requirements; here are a few for reference: 1. the new candy-bar business phone A, 2.4-inch, full keyboard, metal body, 5 megapixels, with WIFI, full support for navigation ***; 2. the full-touch popular entertainment phone B, 3.2-inch 16-million-color screen, WIFI support, 3.2 megapixels, support for navigation *** and a car mount included; 3. the traditional candy-bar phone C, same functions as B but thinner and lighter, 2.2-inch screen, 5 megapixels.
  • step S211 vectorizes the information to be searched and obtains m vectors $t_i$: XX brand \ mobile phone \ cost-performance. Here m = 3, $t_1$ is "XX brand", $t_2$ is "mobile phone", and $t_3$ is "cost-performance".
  • step S212 vectorizes each associated document. Taking associated document 1 as an example, n vectors $d_j$ are obtained: XX brand \ of \ mobile phone \ cost-performance \ all \ very \ good \ of \ , \ and \ XX brand \ mobile phone \ very \ durable \ of.
  • here n = 15: $d_1$ is "XX brand", $d_2$ is "of", $d_3$ is "mobile phone", $d_4$ is "cost-performance", $d_5$ is "all", $d_6$ is "very", $d_7$ is "good", $d_8$ is "of", $d_9$ is ",", $d_{10}$ is "and", $d_{11}$ is "XX brand", $d_{12}$ is "mobile phone", $d_{13}$ is "very", $d_{14}$ is "durable", and $d_{15}$ is "of".
  • in step S213, the word frequencies can be counted separately. The word frequencies $qtf_i$ of the vectors $t_i$ in the information to be searched are: $t_1$ is 1, $t_2$ is 1, and $t_3$ is 1.
  • the word frequencies $tf_i$ of the vectors $t_i$ in the associated document are: $t_1$ is 2, $t_2$ is 2, and $t_3$ is 1.
  • $l$ is the length of associated document 1, namely 15; $avdl$ is the average length of the three associated documents.
  • the weights of the vectors in the information to be searched can be read from the preset IDF table shown in FIG. 3: $w_1$ is 8.435292, $w_2$ is 5.256969, and $w_3$ is 8.952069. The association score $S_1$ between the associated document and the information to be searched is then calculated with the formula of the word matching algorithm.
  • in step S214, the mutual information between each vector in the information to be searched and each vector of the associated document can be read from the preset MI table shown in FIG. 4. The association score $S_2$ between the associated document and the information to be searched is then calculated with the formula of the semantic matching algorithm.
  • in step S215, $\beta$ can be set according to actual needs, for example to 0.4; the weighted sum of $S_1$ and $S_2$ under $\beta$ gives a relevance S of 1.759 between associated document 1 and the information to be searched.
  • repeating steps S211 to S215 gives a relevance S of 4.509 for associated document 2 and 10.403 for associated document 3.
  • step S311 sorts associated documents 1 to 3 in descending order of relevance, giving the arrangement "associated document 3, associated document 2, associated document 1".
  • step S312 displays the arrangement obtained in step S311 to the user.
  • the user obtains the most relevant document, associated document 3, as the first search result and can meet the actual search need without further searching, which improves search efficiency.
  • the embodiment of the invention combines the word matching algorithm and the semantic matching algorithm, taking into account both the matching of words and the matching of semantic relationships between words, to obtain a relatively accurate relevance between each associated document and the information to be searched. Sorting by this relevance and displaying the sorting result provides the user with ideal search results, so that the user can quickly obtain highly relevant associated documents from the displayed results, which meets the user's actual search needs, improves search efficiency, and thus improves user satisfaction.
  • the search device provided by the embodiment of the present invention is described in detail below with reference to FIG. 6 and FIG. 7. The device of the following embodiments can be applied to the method embodiments above.
  • FIG. 6 is a schematic structural diagram of an embodiment of a search apparatus provided by the present invention.
  • the apparatus includes:
  • the search module 101 is configured to acquire an associated document of the information to be searched.
  • the specific search process of the search module 101 can refer to the prior art, and details are not described herein.
  • the calculating module 102 is configured to calculate, according to the word matching algorithm and the semantic matching algorithm, the relevance of each associated document obtained by the search module 101 and the information to be searched.
  • the score of the relevance of each associated document to the information to be searched may be composed of two parts, one is an association score obtained based on a word matching algorithm, and the other is an association score obtained based on a semantic matching algorithm.
  • the weights of the two parts of the associated scores may be preset according to specific conditions, so that the relevance scores of the weighted two-part correlation scores more accurately reflect the degree of association between the associated documents and the information to be searched.
  • the sorting module 103 is configured to sort the associated documents obtained by the search module according to the correlation calculated by the calculating module 102.
  • the sorting module 103 may sort all associated documents obtained by the search in descending order of the relevance calculated by the calculating module 102, or may sort in other orders, for example in ascending order of relevance score, or with one part in ascending order and another part in descending order, and so on.
  • the display module 104 is configured to display the sorting result obtained by the sorting module 103.
  • the display module 104 displays the sorting result obtained by the sorting module 103, so that the documents displayed first are always those more relevant to the information to be searched; the user can then quickly obtain highly relevant associated documents from the displayed search results, which meets the user's search needs and improves search efficiency.
  • FIG. 7 is a schematic structural diagram of an embodiment of the calculation module shown in FIG. 6; the calculation module 102 includes:
  • the first vectorization processing unit 211 is configured to perform vectorization processing on the information to be searched to obtain m vectors $t_i$.
  • the first vectorization processing unit 211 vectorizes the information to be searched, that is, applies word segmentation to split it into m words, denoted $t_1$ to $t_m$, where m and i are both positive integers and $1 \le i \le m$.
  • the specific processing procedure of the first vectorization processing unit 211 can refer to the prior art, and details are not described herein.
  • the second vectorization processing unit 212 is configured to perform vectorization processing on each associated document obtained by the search module to obtain the n vectors $d_j$ corresponding to each associated document.
  • the second vectorization processing unit 212 vectorizes the associated document, that is, applies word segmentation to split it into n words, denoted $d_1$ to $d_n$, where n and j are both positive integers and $1 \le j \le n$.
  • the word matching calculation unit 213 is configured to calculate, according to the word matching algorithm, the association score $S_1$ between the associated document processed by the second vectorization processing unit 212 and the information to be searched.
  • the word matching calculation unit 213 can read the weights of the vectors in the information to be searched directly from the preset IDF table in the example shown in FIG. 3, calculate the parameters required by the word matching algorithm from the data obtained by the first vectorization processing unit 211 and the second vectorization processing unit 212, and calculate the association score $S_1$ between the associated document and the information to be searched with the formula of the word matching algorithm.
  • the semantic matching calculation unit 214 is configured to calculate, according to the semantic matching algorithm, the association score $S_2$ between the associated document processed by the second vectorization processing unit 212 and the information to be searched.
  • the semantic matching calculation unit 214 can read the mutual information between each vector in the information to be searched and each vector of the associated document directly from the preset MI table in the example shown in FIG. 4, calculate the parameters required by the semantic matching algorithm from the data obtained by the first vectorization processing unit 211 and the second vectorization processing unit 212, and calculate the association score $S_2$ between the associated document and the information to be searched with the formula of the semantic matching algorithm.
  • the relevance calculation unit 215 combines the two scores with a preset weight $\beta$, whose value can be set according to the specific situation so that the relevance score S formed from the weighted $S_1$ and $S_2$ more accurately reflects the degree of association between the associated document and the information to be searched. The larger the value of S, the stronger the association between the associated document and the information to be searched.
  • the second vectorization processing unit 212, the word matching calculation unit 213, the semantic matching calculation unit 214, and the relevance calculation unit 215 may need to repeat their work until the relevance of every associated document to the information to be searched has been obtained. The sorting module 103 may then sort all associated documents obtained by the search module in descending order of relevance, and the display module 104 then displays all associated documents sorted by the sorting module 103.
  • the search apparatus may be: a search engine, a browser, and a terminal having a search function.
  • the embodiment of the present invention combines a word matching algorithm and a semantic matching algorithm, taking into account both the matching of words and the matching of semantic relationships between words, to obtain a relatively accurate relevance between each associated document and the information to be searched. Sorting by this relevance and displaying the sorting result provides the user with ideal search results, so that the user can quickly obtain highly relevant associated documents from the displayed results, which meets the user's actual search needs, improves search efficiency, and increases user satisfaction.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

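Embodiments of the present invention disclose a search method, device and storage medium. The method includes: acquiring all associated documents of the information to be searched; calculating, based on a word matching algorithm and a semantic matching algorithm, the relevance of each associated document to the information to be searched; and sorting all associated documents according to the calculated relevance and displaying the sorting result. Embodiments of the present invention also disclose a search device. The invention takes into account both the matching of words and the matching of semantic relationships between words, obtains an accurate relevance calculation, provides the user with ideal search results, and improves user satisfaction.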

Description

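Search method, device and storage medium
Technical Field
The present invention relates to the field of computer network search technologies, and in particular to a search method, device and storage medium.
Background
At present, the traditional search scheme mainly works as follows: all associated documents are looked up in the network according to the information to be searched that the user enters, the degree of association between each associated document and the information to be searched is calculated according to a certain algorithm rule, all associated documents are sorted by degree of association, and the sorted result is returned to the user as the search result. The degree of association therefore directly affects the ordering of the associated documents and thus the user's search results; it is generally expressed as a relevance score.
In traditional search schemes, a word matching algorithm is usually used for the relevance calculation, for example the BM25 (Best Match) algorithm or the proximity (term proximity scoring) algorithm; the higher the relevance score, the stronger the association. Take a BM25-based search scheme as an example. Suppose the information to be searched entered by the user is "中国的首都" ("capital of China"). Under the relevance scoring principle of BM25, "中国" ("China") and "首都" ("capital") must appear in an associated document for it to receive a corresponding relevance score; otherwise the relevance score of the associated document is 0. For example, one associated document reads: "Beijing is a famous historical and cultural city with more than 3,000 years of history as a city and more than 850 years of history as a capital; it is the national political and cultural center and the country's largest land and air transportation hub." Under the traditional scheme above, the relevance score of this document is 0, indicating that it is unrelated to the information to be searched; judged by semantic relationship, however, the document is in fact highly relevant to the information to be searched. After sorting, the document may be placed on a later page of the search results, which is inconvenient for the user. As the example shows, the traditional scheme matches relevance on words alone and does not consider the semantic relationships between words. This can make the relevance calculation inaccurate, affect the order of the search results, and reduce the user's satisfaction with the results and the search experience.
Summary of the Invention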
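The technical problem to be solved by the embodiments of the present invention is to provide a search method, device and storage medium that can obtain more accurate search results.
In one aspect, an embodiment of the present invention provides a search method, including:
acquiring the associated documents of the information to be searched;
calculating, based on a word matching algorithm and a semantic matching algorithm, the relevance of each acquired associated document to the information to be searched;
sorting the acquired associated documents according to the calculated relevance, and displaying the sorting result.
In another aspect, an embodiment of the present invention further provides a search device, including:
a search module, configured to acquire the associated documents of the information to be searched;
a calculation module, configured to calculate, based on a word matching algorithm and a semantic matching algorithm, the relevance of each associated document obtained by the search module to the information to be searched;
a sorting module, configured to sort all associated documents obtained by the search module according to the relevance calculated by the calculation module;
a display module, configured to display the sorting result obtained by the sorting module.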
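The four modules listed above can be pictured as a small pipeline. The following is a minimal sketch in Python, not the patent's implementation: the class name `SearchDevice`, its method names, and the callable signatures are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SearchDevice:
    """Hypothetical wiring of the four modules; every name here is illustrative."""
    search: Callable[[str], List[str]]      # search module: query -> associated documents
    calculate: Callable[[str, str], float]  # calculation module: (query, document) -> relevance

    def sort(self, query: str, docs: List[str]) -> List[Tuple[str, float]]:
        # Sorting module: score every document, then order by relevance, highest first.
        scored = [(doc, self.calculate(query, doc)) for doc in docs]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

    def display(self, ranked: List[Tuple[str, float]]) -> None:
        # Display module: print one ranked result per line.
        for position, (doc, score) in enumerate(ranked, start=1):
            print(f"{position}. {doc} (relevance {score:.3f})")

    def run(self, query: str) -> List[Tuple[str, float]]:
        ranked = self.sort(query, self.search(query))
        self.display(ranked)
        return ranked
```

Passing the search and calculation steps in as callables keeps the sketch independent of any particular retrieval back end or scoring formula.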
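In another aspect, an embodiment of the present invention further provides a storage medium containing computer-executable instructions for performing a search method, the method including the following steps: acquiring the associated documents of the information to be searched;
calculating, based on a word matching algorithm and a semantic matching algorithm, the relevance of each acquired associated document to the information to be searched;
sorting the acquired associated documents according to the calculated relevance, and displaying the sorting result.
Implementing the embodiments of the present invention has the following beneficial effects:
The embodiments of the present invention combine the word matching algorithm and the semantic matching algorithm, taking into account both the matching of words and the matching of semantic relationships between words, to obtain a relatively accurate relevance between each associated document and the information to be searched. Sorting by this relevance and displaying the sorting result provides the user with ideal search results, so that the user can quickly obtain highly relevant associated documents from the displayed results, which meets the user's actual search needs, improves search efficiency, and thus improves user satisfaction.
Brief Description of the Drawings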
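To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of the search method provided by the present invention;
FIG. 2 is a specific flowchart of step S102 shown in FIG. 1;
FIG. 3 is a schematic diagram of the IDF table provided by the present invention;
FIG. 4 is a schematic diagram of the MI table provided by the present invention;
FIG. 5 is a specific flowchart of step S103 shown in FIG. 1;
FIG. 6 is a schematic structural diagram of an embodiment of the search device provided by the present invention;
FIG. 7 is a schematic structural diagram of an embodiment of the calculation module shown in FIG. 6.
Preferred Embodiments of the Invention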
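The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
In the solution provided by the embodiments of the present invention, the search device can calculate the relevance of all associated documents of the information to be searched based on word matching and on semantic matching between words, and sort and display the documents by that relevance, so that the user can quickly obtain highly relevant associated documents from the displayed search results, which meets the user's search needs and improves search efficiency.
The information to be searched may be a search keyword or sentence input by the user, and may be denoted query. The associated document may be a document included in the search results obtained with existing web search technology from the search keyword or sentence input by the user, and may be denoted document. The word matching algorithm means that the search process matches on words; it may be the BM25 algorithm, the proximity algorithm, etc. Unless otherwise specified, the embodiments of the present invention use the BM25 algorithm as the example. The semantic matching algorithm means that the search process matches on the semantic relationship between words, that is, on the mutual information between words. MI (Mutual Information) describes the degree of association between two random variables. In text processing, MI measures the relatedness of two words: the larger the MI of two words, the stronger their association.
The search method provided by the embodiments of the present invention is described in detail below with reference to FIG. 1 to FIG. 5.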
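FIG. 1 is a flowchart of an embodiment of the search method provided by the present invention. The method includes:
S101: acquire the associated documents of the information to be searched. This step can refer to the prior art and is not described here.
S102: based on the word matching algorithm and the semantic matching algorithm, calculate the relevance of each acquired associated document to the information to be searched.
In this step, the relevance score of each associated document with respect to the information to be searched may consist of two parts: an association score obtained with the word matching algorithm and an association score obtained with the semantic matching algorithm. In practice, the weights of the two association scores can be preset according to the specific situation, so that the relevance score formed from the two weighted parts more accurately reflects the degree of association between the associated document and the information to be searched.
S103: sort the acquired associated documents according to the calculated relevance, and display the sorting result.
In this step, all associated documents obtained by the search may be sorted and displayed in descending order of relevance score, so that the documents displayed first are always those more relevant to the information to be searched; the user can then quickly obtain highly relevant associated documents from the displayed search results, which meets the user's search needs and improves search efficiency. This step may also sort in other orders, for example in ascending order of relevance score, or with one part in ascending order and another part in descending order, and so on.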
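FIG. 2 is a specific flowchart of step S102 shown in FIG. 1. Step S102 includes:
S211: perform vectorization processing on the information to be searched to obtain m vectors $t_i$.
In this step, vectorizing the information to be searched means applying word segmentation to split it into m words, denoted $t_1$ to $t_m$, where m and i are both positive integers and $1 \le i \le m$.
S212: perform vectorization processing on each acquired associated document to obtain the n vectors $d_j$ corresponding to each associated document.
In this step, each of the acquired associated documents is vectorized: word segmentation splits the document into n words, denoted $d_1$ to $d_n$, where n and j are both positive integers and $1 \le j \le n$.
Note that steps S211 and S212 may be performed in either order; for example, step S212 may be performed first and step S211 afterwards. The vectorization process in steps S211 and S212 can refer to the prior art and is not described here.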
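S213: based on the word matching algorithm, calculate the association score $S_1$ between each associated document and the information to be searched.
In this step, the formula of the word matching algorithm may be:
[Figure imgf000007_0001: formula of the word matching algorithm]
where the four parameters of the formula (k among them) are adjustment factors that smooth the data; in a specific implementation they are constants whose values the user can set according to the actual situation or empirical values;
$qtf_i$ is the word frequency of the i-th vector $t_i$ in the information to be searched, that is, the number of times $t_i$ appears in the information to be searched;
$tf_i$ is the word frequency of vector $t_i$ in the associated document, that is, the number of times $t_i$ appears in the corresponding associated document;
$l$ is the length of the associated document; given the vectorization result of step S212, the value of $l$ is n;
$avdl$ is the average length of all associated documents;
$w_i$ is the weight of vector $t_i$, generally its IDF (inverse document frequency) value, which can be calculated with the following formula: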
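[Figure imgf000008_0001: formula for $w_i$; the legible fragment is "$htf_i + 0.5$"]
where $N$ is the number of all associated documents and $htf_i$ is the word frequency of vector $t_i$ in all acquired associated documents.
In the embodiments of the present invention, the weight (IDF value) of each vector (word) in the network can be calculated and stored before the search process is executed, for example in the form of a table. FIG. 3 is a schematic diagram of the IDF table provided by the present invention; the IDF table in the example shown in FIG. 3 stores the weights of the vectors. The IDF table of FIG. 3 and every entry in it are examples only.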
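The weight formula itself is an image that is not reproduced in this text, so the following Python sketch of precomputing an IDF table rests on assumptions: it uses the widely used form $w_i = \log\frac{N - htf_i + 0.5}{htf_i + 0.5}$, which is consistent with the surviving "$htf_i + 0.5$" fragment; it treats $htf_i$ as the number of documents containing the word; and it uses the natural logarithm. The function names are hypothetical.

```python
import math
from typing import Dict, List


def idf_weight(n_docs: int, htf: int) -> float:
    # Assumed form: log((N - htf + 0.5) / (htf + 0.5)), natural logarithm.
    return math.log((n_docs - htf + 0.5) / (htf + 0.5))


def build_idf_table(docs: List[List[str]]) -> Dict[str, float]:
    """Precompute a word -> weight table from already segmented documents."""
    n_docs = len(docs)
    vocabulary = {word for doc in docs for word in doc}
    # Assumption: htf counts the documents that contain the word at least once.
    return {
        word: idf_weight(n_docs, sum(1 for doc in docs if word in doc))
        for word in vocabulary
    }
```

With this form, a word that occurs in half of the documents gets weight 0 and rarer words get larger weights, so the table can be built once and then only read at query time.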
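In step S213, the weights of the vectors in the information to be searched can be read directly from the preset IDF table; the parameters required by the word matching algorithm are calculated from the data obtained in steps S211 and S212 and substituted into the formula of the word matching algorithm above, which yields the association score $S_1$ between the associated document and the information to be searched.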
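Because the formula image for the word matching algorithm is not reproduced in this text, the sketch below does not claim to be the patent's formula. It assumes the standard Okapi BM25 form, which uses exactly the quantities defined above ($qtf_i$, $tf_i$, $l$, $avdl$, $w_i$). The parameter names `k1`, `k3`, `b`, their default values, and the function name are assumptions.

```python
from collections import Counter
from typing import Dict, List


def word_match_score(query: List[str], doc: List[str], idf: Dict[str, float],
                     avdl: float, k1: float = 1.2, k3: float = 8.0,
                     b: float = 0.75) -> float:
    """Assumed Okapi BM25 form of the association score S1."""
    qtf = Counter(query)  # word frequency in the information to be searched
    tf = Counter(doc)     # word frequency in the associated document
    length = len(doc)     # l: document length after segmentation
    big_k = k1 * ((1.0 - b) + b * length / avdl)  # length normalisation
    score = 0.0
    for word, q_count in qtf.items():
        if tf[word] == 0:
            continue  # a query word absent from the document contributes nothing
        weight = idf.get(word, 0.0)  # w_i read from the precomputed IDF table
        doc_part = (k1 + 1.0) * tf[word] / (big_k + tf[word])
        query_part = (k3 + 1.0) * q_count / (k3 + q_count)
        score += weight * doc_part * query_part
    return score
```

A document that shares no word with the query scores 0 under this form, which is the weakness of pure word matching described in the Background.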
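S214: based on the semantic matching algorithm, calculate the association score $S_2$ between each associated document and the information to be searched.
In this step, the formula of the semantic matching algorithm may be:
[Figure imgf000008_0002: formula of the semantic matching algorithm]
where the four parameters of the formula (k among them) are adjustment factors that smooth the data; in a specific implementation they are constants whose values the user can set according to the actual situation or empirical values;
$l$ is the length of the corresponding associated document; given the vectorization result of step S212, the value of $l$ is n; $avdl$ is the average length of all acquired associated documents;
$mi(t_i, d_j)$ is the mutual information between vector $t_i$ and vector $d_j$. In practice, it may be calculated as:
$$mi(t_i, d_j) = \log \frac{p(t_i, d_j)}{p(t_i)\, p(d_j)}$$
where $p(t_i, d_j) = \frac{c(t_i, d_j)}{\sum c(t, d)}$, and $c(t_i, d_j)$ is the number of times vector $t_i$ and vector $d_j$ appear together in the same document in the network;
$p(t_i) = \frac{c(t_i)}{\sum c(t)}$, and $c(t_i)$ is the number of times vector $t_i$ appears in the network;
$p(d_j) = \frac{c(d_j)}{\sum c(d)}$, and $c(d_j)$ is the number of times vector $d_j$ appears in the network.
In the embodiments of the present invention, the mutual information between each pair of vectors (words) in the network can be calculated and stored before the search process is executed, for example in the form of a table. FIG. 4 is a schematic diagram of the MI table provided by the present invention; the MI table in the example shown in FIG. 4 stores the mutual information between the vectors. The MI table of FIG. 4 and every entry in it are examples only.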
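Precomputing the MI table can be sketched in Python as pointwise mutual information over co-occurrence counts. This is an illustration under stated assumptions rather than the patent's exact procedure: co-occurrence is counted once per document for each unordered pair of distinct words, the normalising sums run over all counted words and all counted pairs, and unseen pairs default to 0. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

MiTable = Dict[Tuple[str, str], float]


def build_mi_table(corpus: List[List[str]]) -> MiTable:
    """Precompute mi(a, b) = log(p(a, b) / (p(a) * p(b))) for co-occurring words."""
    word_count: Counter = Counter()
    pair_count: Counter = Counter()
    for doc in corpus:
        word_count.update(doc)  # c(t): occurrences of each word
        for a, b in combinations(sorted(set(doc)), 2):
            pair_count[(a, b)] += 1  # c(t, d): documents in which both words occur
    total_words = sum(word_count.values())
    total_pairs = sum(pair_count.values())
    table: MiTable = {}
    for (a, b), c_ab in pair_count.items():
        p_ab = c_ab / total_pairs
        p_a = word_count[a] / total_words
        p_b = word_count[b] / total_words
        table[(a, b)] = math.log(p_ab / (p_a * p_b))
    return table


def mi(table: MiTable, t: str, d: str) -> float:
    # Pairs are stored in sorted order; unseen pairs default to 0.0 (an assumption).
    key = (t, d) if t <= d else (d, t)
    return table.get(key, 0.0)
```

Storing each pair once in sorted order halves the table size and makes the lookup symmetric in its two arguments.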
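In step S214, the mutual information between each vector of the information to be searched and each vector of the associated document can be read directly from the preset MI table, the parameters required by the semantic matching algorithm are calculated from the data obtained in steps S211 and S212, and these values are substituted into the above formula of the semantic matching algorithm to obtain the association score $S_2$ between the associated document and the information to be searched.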
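A precomputed MI table of this kind could be derived from co-occurrence counts as sketched below. The sketch assumes the pointwise form $\log\frac{p(t,d)}{p(t)\,p(d)}$, with the three probabilities estimated from the counts $c(t,d)$, $c(t)$ and $c(d)$ described above. The function names, the use of one shared word-count total for both $p(t)$ and $p(d)$, the restriction to pairs of distinct words, and the default value returned for pairs that never co-occur are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple


def build_mi_table(corpus: Iterable[Sequence[str]]) -> Dict[Tuple[str, str], float]:
    """Precompute mutual information for every pair of distinct co-occurring words."""
    term_counts: Counter = Counter()   # c(t): occurrences of each word
    pair_counts: Counter = Counter()   # c(t, d): documents containing both words
    for doc in corpus:
        term_counts.update(doc)
        for a, b in combinations(sorted(set(doc)), 2):
            pair_counts[(a, b)] += 1
    total_terms = sum(term_counts.values())
    total_pairs = sum(pair_counts.values())
    table: Dict[Tuple[str, str], float] = {}
    for (a, b), count in pair_counts.items():
        p_joint = count / total_pairs
        p_a = term_counts[a] / total_terms
        p_b = term_counts[b] / total_terms
        table[(a, b)] = math.log(p_joint / (p_a * p_b))
    return table


def lookup_mi(table: Dict[Tuple[str, str], float], a: str, b: str,
              default: float = 0.0) -> float:
    """Read one entry; keys are stored with the two words in sorted order."""
    key = (a, b) if a <= b else (b, a)
    return table.get(key, default)
```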
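It should be noted that steps S213 and S214 may be performed in either order; for example, step S214 may be performed first and then step S213.
S215: calculate the relevance $S$ between each associated document and the information to be searched according to the formula $S = \beta \times S_1 + (1 - \beta) \times S_2$.
Here $\beta$ is a preset weight with $0 < \beta < 1$. In practical applications, the value of $\beta$ may be set according to the specific situation, so that the relevance score $S$ composed of the weighted $S_1$ and $S_2$ more accurately reflects the degree of association between the associated document and the information to be searched. It should be noted that a larger value of $S$ indicates a stronger association between the associated document and the information to be searched.
Refer to FIG. 5, which is a detailed flowchart of step S103 shown in FIG. 1. Step S103 includes:
S311: according to the relevance between each associated document and the information to be searched, sort all associated documents in descending order of relevance.
S312: display all the sorted associated documents.
After the sorting in step S311, the associated documents are arranged in descending order of relevance, and step S312 displays them in that order, so that the user can quickly obtain highly relevant associated documents from the displayed search results, which satisfies the user's search needs and improves search efficiency.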
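The weighted combination of step S215 and the descending sort of step S311 can be sketched as follows. The function names and the dictionary layout mapping a document identifier to its pair ($S_1$, $S_2$) are hypothetical.

```python
from typing import Dict, List, Tuple


def relevance(s1: float, s2: float, beta: float) -> float:
    """Relevance S = beta * S1 + (1 - beta) * S2, with 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must satisfy 0 < beta < 1")
    return beta * s1 + (1.0 - beta) * s2


def rank_documents(scores: Dict[str, Tuple[float, float]],
                   beta: float) -> List[Tuple[str, float]]:
    """Combine both scores per document and sort by relevance, highest first."""
    combined = [(doc_id, relevance(s1, s2, beta))
                for doc_id, (s1, s2) in scores.items()]
    return sorted(combined, key=lambda item: item[1], reverse=True)
```

Choosing `beta` closer to 1 makes the ranking depend mostly on literal word matches, while a value closer to 0 favours the semantic score.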
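The search method in the examples of FIG. 1 to FIG. 5 is described in detail below with a specific example.
Suppose the user wants some information about XX-brand mobile phones and enters the following information to be searched into a search engine: "XX牌手机性价比" (XX-brand mobile phone cost-effectiveness). After the search in step S101, three associated documents are obtained in total:
Associated document 1: XX-brand mobile phones all offer very good value for money, and XX-brand phones are also very durable (original text: "XX牌的手机性价比都很不错的,而且XX牌手机很耐用的");
Associated document 2: I am a loyal fan of XX-brand phones and enjoy playing with them: flashing the firmware, downloading programs, games and everything else. I find that XX-brand phones have a large and fairly complete range of software, so I have kept using them until now;
Associated document 3: Many models meet your requirements; here are a few for reference: 1. New candybar business phone A: 2.4-inch screen, full keyboard, metal body, 5-megapixel camera, WIFI, full support for navigation ***; 2. Full-touch entertainment phone B: 3.2 screen with 16 million colors, WIFI support, 3.2-megapixel camera, support for navigation *** and a car mount included; 3. Traditional candybar phone C: same functions as B but thinner and lighter, 2.2-inch screen, 5-megapixel camera.
Step S211 vectorizes the information to be searched and obtains m vectors $t_i$, as follows: XX牌\手机\性价比. Here m = 3, $t_1$ is "XX牌" (XX brand), $t_2$ is "手机" (mobile phone) and $t_3$ is "性价比" (cost-effectiveness).
Step S212 vectorizes any one of the associated documents. Taking associated document 1 as an example, the vectorization of step S212 yields n vectors $d_j$, as follows: XX牌\的\手机\性价比\都\很\不错\的\,\而且\XX牌\手机\很\耐用\的. Here n = 15, $d_1$ is "XX牌", $d_2$ is "的", $d_3$ is "手机", $d_4$ is "性价比", $d_5$ is "都", $d_6$ is "很", $d_7$ is "不错", $d_8$ is "的", $d_9$ is ",", $d_{10}$ is "而且", $d_{11}$ is "XX牌", $d_{12}$ is "手机", $d_{13}$ is "很", $d_{14}$ is "耐用" and $d_{15}$ is "的".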
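In step S213, the term frequencies $qtf_i$ of the vectors $t_i$ in the information to be searched are counted as: 1 for $t_1$, 1 for $t_2$ and 1 for $t_3$. The term frequencies $tf_i$ of the vectors $t_i$ in the associated document are: 2 for $t_1$, 2 for $t_2$ and 1 for $t_3$. $l$ is the length of associated document 1, namely 15. $avdl$ is the average length of the three associated documents. The weights of the vectors in the information to be searched can be read from the preset IDF table shown in FIG. 3: $w_1$ is 8.435292, $w_2$ is 5.256969 and $w_3$ is 8.952069. Based on the formula of the word matching algorithm, the association score $S_1$ between the associated document and the information to be searched is calculated.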
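The counts in this step can be reproduced with a few lines of Python, assuming the segmentation of the query and of associated document 1 given in steps S211 and S212; the variable names are illustrative only.

```python
from collections import Counter

# Segmented query (step S211) and segmented associated document 1 (step S212).
query_terms = ["XX牌", "手机", "性价比"]
doc1_terms = ["XX牌", "的", "手机", "性价比", "都", "很", "不错", "的", ",",
              "而且", "XX牌", "手机", "很", "耐用", "的"]

qtf = Counter(query_terms)   # term frequencies in the information to be searched
tf = Counter(doc1_terms)     # term frequencies in associated document 1

doc_length = len(doc1_terms)                  # l, the document length
query_tf = [qtf[t] for t in query_terms]      # qtf_1 .. qtf_3
doc_tf = [tf[t] for t in query_terms]         # tf_1 .. tf_3

print(doc_length, query_tf, doc_tf)  # prints: 15 [1, 1, 1] [2, 2, 1]
```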
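In step S214, the mutual information between each vector of the information to be searched and each vector of the associated document can be read from the preset MI table shown in FIG. 4. Based on the formula of the semantic matching algorithm, the association score $S_2$ between the associated document and the information to be searched is calculated.
In step S215, $\beta$ may be set according to actual needs, for example $\beta$ = 0.4, and $S_1$ and $S_2$ are combined in a weighted sum using $\beta$, giving a relevance $S$ of 1.759 between associated document 1 and the information to be searched.
Steps S211 to S215 are repeated, giving a relevance $S$ of 4.509 between associated document 2 and the information to be searched, and a relevance $S$ of 10.403 between associated document 3 and the information to be searched.
Step S311 sorts associated documents 1 to 3 in descending order of relevance, forming the arrangement "associated document 3, associated document 2, associated document 1". Step S312 displays to the user the arrangement obtained in step S311.
After the above steps, the user obtains the most relevant document, associated document 3, at the very top of the displayed search results, and can satisfy the actual search need without further searching, which improves search efficiency.
The embodiments of the present invention combine the word matching algorithm and the semantic matching algorithm, taking into account both the matching of words and the matching of semantic relationships between words, to obtain a fairly accurate relevance between each associated document and the information to be searched. Sorting based on this relevance and displaying the sorting result provides the user with good search results, so that the user can quickly obtain highly relevant associated documents from the displayed search results. This satisfies the user's actual search needs, improves search efficiency and thereby improves user satisfaction.
Corresponding to the search method described in any of the embodiments of FIG. 1 to FIG. 5, the search device provided by the embodiments of the present invention is described in detail below with reference to FIG. 6 and FIG. 7. The device of the following embodiments can be applied to the above method embodiments.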
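Refer to FIG. 6, which is a schematic structural diagram of an embodiment of the search device provided by the present invention. The device includes:
a search module 101, configured to obtain the associated documents of the information to be searched. The specific search process of the search module 101 may follow the prior art and is not described in detail here;
a computing module 102, configured to calculate, based on the word matching algorithm and the semantic matching algorithm, the relevance between each associated document obtained by the search module 101 and the information to be searched.
In this embodiment, the relevance score between each associated document and the information to be searched may consist of two parts: an association score obtained with the word matching algorithm and an association score obtained with the semantic matching algorithm. In practical applications, the weights of the two association scores may be preset according to the specific situation, so that the relevance score composed of the two weighted association scores more accurately reflects the degree of association between the associated document and the information to be searched.
A sorting module 103, configured to sort the associated documents obtained by the search module according to the relevance calculated by the computing module 102.
The sorting module 103 may sort all associated documents obtained by the search in descending order of the relevance score calculated by the computing module 102 for each associated document, or may use other orders, for example ascending order of relevance score, or ascending order for one part of the documents and descending order for another part, and so on.
A display module 104, configured to display the sorting result obtained by the sorting module 103.
The display module 104 displays the documents according to the sorting result obtained by the sorting module 103, so that the documents displayed first are always those more relevant to the information to be searched. The user can thus quickly obtain highly relevant associated documents from the displayed search results, which satisfies the user's search needs and improves search efficiency.
Refer to FIG. 7, which is a schematic structural diagram of an embodiment of the computing module shown in FIG. 6. The computing module 102 includes: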
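a first vectorization processing unit 211, configured to vectorize the information to be searched to obtain m vectors $t_i$.
The first vectorization processing unit 211 vectorizes the information to be searched; that is, it applies word segmentation to split the information to be searched into m words, which may be denoted $t_1$ to $t_m$, where m and i are positive integers and $1 \le i \le m$. The specific processing of the first vectorization processing unit 211 may follow the prior art and is not described in detail here.
A second vectorization processing unit 212, configured to vectorize each associated document obtained by the search module to obtain the n vectors $d_j$ corresponding to each associated document.
The second vectorization processing unit 212 vectorizes an associated document; that is, it applies word segmentation to split the associated document into n words, which may be denoted $d_1$ to $d_n$, where n and j are positive integers and $1 \le j \le n$. The specific processing of the second vectorization processing unit 212 may follow the prior art and is not described in detail here.
A word matching computing unit 213, configured to calculate, based on the word matching algorithm, the association score $S_1$ between the associated document processed by the second vectorization processing unit 212 and the information to be searched.
The word matching computing unit 213 can read the weight of each vector in the information to be searched directly from the preset IDF table in the example of FIG. 3, calculate the parameters required by the word matching algorithm from the data obtained by the first vectorization processing unit 211 and the second vectorization processing unit 212, and calculate, based on the formula of the word matching algorithm, the association score $S_1$ between the associated document and the information to be searched.
A semantic matching computing unit 214, configured to calculate, based on the semantic matching algorithm, the association score $S_2$ between the associated document processed by the second vectorization processing unit 212 and the information to be searched.
The semantic matching computing unit 214 can read the mutual information between each vector of the information to be searched and each vector of the associated document directly from the preset MI table in the example of FIG. 4, calculate the parameters required by the semantic matching algorithm from the data obtained by the first vectorization processing unit 211 and the second vectorization processing unit 212, and calculate, based on the formula of the semantic matching algorithm, the association score $S_2$ between the associated document and the information to be searched.
A relevance computing unit 215, configured to calculate the relevance $S$ between the associated document and the information to be searched according to the formula $S = \beta \times S_1 + (1 - \beta) \times S_2$, where $\beta$ is a preset weight and $0 < \beta < 1$.
In practical applications, the value of $\beta$ may be set according to the specific situation, so that the relevance score $S$ composed of the weighted $S_1$ and $S_2$ more accurately reflects the degree of association between the associated document and the information to be searched. It should be noted that a larger value of $S$ indicates a stronger association between the associated document and the information to be searched.
It can be understood that the second vectorization processing unit 212, the word matching computing unit 213, the semantic matching computing unit 214 and the relevance computing unit 215 may need to work repeatedly until the relevance between every associated document and the information to be searched has been obtained. The sorting module 103 can then sort all associated documents obtained by the search module in descending order of relevance, according to the relevance between each associated document and the information to be searched, and the display module 104 displays all associated documents as sorted by the sorting module 103.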
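As a rough illustration of how the modules of FIG. 6 cooperate, the sketch below wires a search function, a scoring function and a display function into one pipeline. The class `SearchDevice` and its field names are hypothetical, and the sorting module is reduced to a single `sorted` call; the patent does not prescribe any particular implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class SearchDevice:
    # Stand-in for search module 101: returns the associated documents.
    search: Callable[[str], List[str]]
    # Stand-in for computing module 102: relevance of one document to the query.
    score: Callable[[str, str], float]
    # Stand-in for display module 104: receives the sorted result.
    display: Callable[[Sequence[Tuple[str, float]]], None]

    def run(self, query: str) -> List[Tuple[str, float]]:
        documents = self.search(query)
        # The scoring work is repeated once per associated document.
        scored = [(doc, self.score(query, doc)) for doc in documents]
        # Stand-in for sorting module 103: highest relevance first.
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        self.display(ranked)
        return ranked
```

Passing the three stages in as callables keeps the sketch independent of how retrieval, scoring and rendering are actually carried out.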
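It should be noted that the search device described in the embodiments of the present invention may be a search engine, a browser or a terminal with a search function.
As described in the above embodiments, the embodiments of the present invention combine the word matching algorithm and the semantic matching algorithm, taking into account both the matching of words and the matching of semantic relationships between words, to obtain a fairly accurate relevance between each associated document and the information to be searched. Sorting based on this relevance and displaying the sorting result provides the user with good search results, so that the user can quickly obtain highly relevant associated documents from the displayed search results. This satisfies the user's actual search needs, improves search efficiency and thereby improves user satisfaction.
A person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like.
The above discloses only a preferred embodiment of the present invention, which of course cannot be used to limit the scope of rights of the present invention. A person of ordinary skill in the art can understand that implementations of all or part of the processes of the above embodiment, and equivalent changes made according to the claims of the present invention, still fall within the scope covered by the invention.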

Claims

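1. A search method, comprising:
obtaining associated documents of information to be searched;
calculating, based on a word matching algorithm and a semantic matching algorithm, the relevance between each obtained associated document and the information to be searched;
sorting the obtained associated documents according to the calculated relevance, and displaying the sorting result.
2. The method according to claim 1, wherein calculating, based on the word matching algorithm and the semantic matching algorithm, the relevance between each obtained associated document and the information to be searched comprises:
vectorizing the information to be searched to obtain m vectors $t_i$, where m and i are positive integers and $1 \le i \le m$;
vectorizing each obtained associated document to obtain the n vectors $d_j$ corresponding to each associated document, where n and j are positive integers and $1 \le j \le n$;
calculating, based on the word matching algorithm, the association score $S_1$ between each associated document and the information to be searched, and calculating, based on the semantic matching algorithm, the association score $S_2$ between each associated document and the information to be searched;
calculating the relevance $S$ between each associated document and the information to be searched according to the formula $S = \beta \times S_1 + (1 - \beta) \times S_2$, where $\beta$ is a preset weight and $0 < \beta < 1$.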
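3. The method according to claim 2, wherein the formula of the word matching algorithm is:
[Figure imgf000015_0001: formula of the word matching algorithm]
where the parameters of the formula, including $k_3$ and $k$, are constants; $qtf_i$ is the term frequency of the i-th vector $t_i$ in the information to be searched; $tf_i$ is the term frequency of vector $t_i$ in the corresponding associated document; $l$ is the length of the corresponding associated document; $avdl$ is the average length of all obtained associated documents; and $w_i$ is the weight of vector $t_i$.
4. The method according to claim 3, wherein the weight of vector $t_i$ is calculated by the following formula:
$$w_i = \log\frac{H - htf_i - 0.5}{htf_i + 0.5}$$
where $H$ is the number of all obtained associated documents and $htf_i$ is the term frequency of vector $t_i$ in all associated documents.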
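5. The method according to claim 2, wherein the formula of the semantic matching algorithm is:
[Figure imgf000016_0001: formula of the semantic matching algorithm]
where the parameters of the formula, including $k_3$ and $k$, are constants; $l$ is the length of the corresponding associated document; $avdl$ is the average length of all obtained associated documents; and $mi(t_i, d_j)$ is the mutual information between vector $t_i$ and vector $d_j$.
6. The method according to claim 5, wherein the mutual information between vector $t_i$ and vector $d_j$ is calculated by the following formula:
$$mi(t_i, d_j) = \log\frac{p(t_i, d_j)}{p(t_i)\,p(d_j)}$$
where $p(t_i, d_j) = \frac{c(t_i, d_j)}{\sum c(t, d)}$, and $c(t_i, d_j)$ denotes the number of times vector $t_i$ and vector $d_j$ appear together in the same document in the network;
$p(t_i) = \frac{c(t_i)}{\sum c(t)}$, and $c(t_i)$ denotes the number of times vector $t_i$ occurs in the network;
$p(d_j) = \frac{c(d_j)}{\sum c(d)}$, and $c(d_j)$ denotes the number of times vector $d_j$ occurs in the network.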
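7. The method according to any one of claims 1 to 6, wherein sorting the obtained associated documents according to the calculated relevance and displaying the sorting result comprises:
sorting all associated documents in descending order of relevance according to the relevance between each associated document and the information to be searched;
displaying all the sorted associated documents.
8. A search device, comprising:
a search module, configured to obtain associated documents of information to be searched;
a computing module, configured to calculate, based on a word matching algorithm and a semantic matching algorithm, the relevance between each associated document obtained by the search module and the information to be searched;
a sorting module, configured to sort the associated documents obtained by the search module according to the relevance calculated by the computing module;
a display module, configured to display the sorting result obtained by the sorting module.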
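9. The device according to claim 8, wherein the computing module comprises:
a first vectorization processing unit, configured to vectorize the information to be searched to obtain m vectors $t_i$, where m and i are positive integers and $1 \le i \le m$;
a second vectorization processing unit, configured to vectorize each associated document obtained by the search module to obtain the n vectors $d_j$ corresponding to each associated document, where n and j are positive integers and $1 \le j \le n$;
a word matching computing unit, configured to calculate, based on the word matching algorithm, the association score $S_1$ between the associated document processed by the second vectorization processing unit and the information to be searched;
a semantic matching computing unit, configured to calculate, based on the semantic matching algorithm, the association score $S_2$ between the associated document processed by the second vectorization processing unit and the information to be searched;
a relevance computing unit, configured to calculate the relevance $S$ between the associated document and the information to be searched according to the formula $S = \beta \times S_1 + (1 - \beta) \times S_2$, where $\beta$ is a preset weight and $0 < \beta < 1$.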
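10. The device according to claim 9, wherein the formula of the word matching algorithm is:
[Figure imgf000018_0001: formula of the word matching algorithm]
and the formula of the semantic matching algorithm is:
[Figure imgf000018_0002: formula of the semantic matching algorithm]
where the parameters of the formulas, including $k$, are constants; $qtf_i$ is the term frequency of the i-th vector $t_i$ in the information to be searched; $tf_i$ is the term frequency of vector $t_i$ in the corresponding associated document; $l$ is the length of the corresponding associated document; $avdl$ is the average length of all associated documents obtained by the search module; $w_i$ is the weight of vector $t_i$; and $mi(t_i, d_j)$ is the mutual information between vector $t_i$ and vector $d_j$.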
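11. The device according to any one of claims 8 to 10, wherein
the sorting module sorts all associated documents obtained by the search module in descending order of relevance according to the relevance between each associated document and the information to be searched;
the display module displays all associated documents as sorted by the sorting module.
12. One or more storage media containing computer-executable instructions, the computer-executable instructions being used to perform a search method, wherein the method comprises the following steps:
obtaining associated documents of information to be searched;
calculating, based on a word matching algorithm and a semantic matching algorithm, the relevance between each obtained associated document and the information to be searched;
sorting the obtained associated documents according to the calculated relevance, and displaying the sorting result.

PCT/CN2012/086025 2012-02-13 2012-12-06 Search method, device and storage medium WO2013120373A1 (zh)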

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/347,776 US9317590B2 (en) 2012-02-13 2012-12-06 Search method, search device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210031523.3A CN103246681B (zh) 2012-02-13 2012-02-13 A search method and device
CN201210031523.3 2012-02-13

Publications (1)

Publication Number Publication Date
WO2013120373A1 true WO2013120373A1 (zh) 2013-08-22

Family

ID=48926205

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/086025 WO2013120373A1 (zh) 2012-02-13 2012-12-06 Search method, device and storage medium

Country Status (3)

Country Link
US (1) US9317590B2 (zh)
CN (1) CN103246681B (zh)
WO (1) WO2013120373A1 (zh)

Also Published As

Publication number Publication date
US9317590B2 (en) 2016-04-19
US20140358914A1 (en) 2014-12-04
CN103246681A (zh) 2013-08-14
CN103246681B (zh) 2018-10-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12868395

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14347776

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.01.15)

122 Ep: pct application non-entry in european phase

Ref document number: 12868395

Country of ref document: EP

Kind code of ref document: A1