TWI493366B - Retrieval methods and systems - Google Patents

Retrieval methods and systems

Info

Publication number
TWI493366B
Authority
TW
Taiwan
Prior art keywords
search result
search
correlation
sub
score
Application number
TW099104467A
Other languages
Chinese (zh)
Other versions
TW201128417A (en)
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to TW099104467A
Publication of TW201128417A
Application granted
Publication of TWI493366B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

Retrieval method and system

The present application relates to the field of network data processing, and in particular to a retrieval method and system.

When a search engine re-sorts its search results by some attribute (for example region, source, or topic) so that the first n (n >= 1) results are spread across different values of that attribute, the effect is called search result diversification. In e-commerce search, results are usually sorted by relevance or by time. This encourages a supplier to post information about the same product over and over so that its own listings fill the first few pages of results, which maliciously crowds out other suppliers' chances of being shown and makes it harder for ordinary users to find other products.

To avoid this, the prior art includes a retrieval method that bins results by relevance and then extracts from each bin. The search results are first binned by relevance, so that results with similar relevance scores fall into the same bin, and the results in each bin are then extracted as follows. A field is chosen as the basis for diversification, for example uid (the unique identifier of a supplier), so that the results cover products from a variety of suppliers. In practice, the results are split into many subsets by uid value, with all results sharing a uid placed in one subset and sorted within it by relevance score in descending order. The m (m >= 1) most relevant results are then extracted from each subset and shown on the first few pages of the search results, so that those pages contain products from several different uids.

As this shows, the prior art must partition the results into subsets by uid and sort them. This does diversify the search results to some extent, but it has three drawbacks. First, extraction and binning reorganize the entire result set, which requires copying the result set in system memory; this consumes a large amount of resources on the search engine server, such as time and hardware, and so degrades server performance. Second, sorting inside every subset is not strictly necessary, so part of the computation is useless and wastes the system resources spent on it. Third, although relevance binning can balance diversity against relevance to some degree, no single fixed bin interval can correctly partition the score distribution of every result set. As shown in Fig. 1, the bin boundaries suit Query A, where results of similar relevance fall into the same bin, but not Query B, where results of similar relevance are not consistently placed in the same bin. The prior art is therefore also insufficiently flexible.
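The fixed-bin problem can be illustrated with a minimal sketch. The function name `fixed_bin` and the bin width are assumptions made for illustration only; the source does not state how the prior art computes its bins.

```c
/* Hypothetical fixed-width binning (not taken from the source):
 * bin index = floor(score / bin_width), for non-negative scores. */
int fixed_bin(float score, float bin_width) {
    return (int)(score / bin_width);
}
```

With a bin width of 100, scores of 99 and 101 land in different bins although they are nearly equal, while 101 and 199 share a bin although they are far apart. A boundary that suits one query's score distribution can therefore split close results for another.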

In short, a technical problem that those skilled in the art urgently need to solve is how to provide a retrieval method that avoids the excessive server-side resource consumption of the prior art.

The technical problem to be solved by the present application is to provide a retrieval method that addresses the degraded search engine server performance caused by excessive server-side resource consumption in the prior art and, further, improves the flexibility of retrieval.

The present application also provides a retrieval system to ensure that the above method can be implemented and applied in practice.

To solve the above problem, the present application discloses a retrieval method comprising: obtaining, according to query data submitted by a client, a first search result set related to the query data; calculating a second relevance score for each first search result in the set according to that result's first relevance score and a preset diversity field, the diversity field indicating the attribute category of the first search result; generating a relevance parameter value for each first search result from the first relevance score and the second relevance score; and extracting from the first search result set, according to a preset number of second search results and the relevance parameter values, the second search results to be presented to the client.

The present application also provides a retrieval system comprising: an obtaining unit configured to obtain, according to query data submitted by a client, a first search result set related to the query data; a calculating unit configured to calculate a second relevance score for each first search result in the set according to that result's first relevance score and a preset diversity field, the diversity field indicating the attribute category of the first search result; a setting unit configured to generate a relevance parameter value for each first search result from the first relevance score and the second relevance score; and an extracting unit configured to extract from the first search result set, according to a preset number of second search results and the relevance parameter values, the second search results to be presented to the client.

Compared with the prior art, the present application has the following advantages. The sum of the first relevance score known from the prior art and the calculated second relevance score is used as a relevance parameter, and the search results are extracted a second time according to that parameter, so that the results shown are as diverse as possible. The diversification process is also optimized, so that the method of this embodiment consumes fewer system resources, runs faster, and is easier to extend, which improves the performance of the search engine server. Of course, a product implementing the present application need not achieve all of these advantages at once.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.

The present application can be used in numerous general-purpose or special-purpose computing environments or configurations, for example personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor devices, and distributed computing environments that include any of the above.

The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network and program modules can be located in both local and remote computer storage media, including storage devices.

One of the main ideas of the present application is as follows. First, a first search result set related to the query data submitted by a client is obtained by a prior-art method. The key step is then to calculate a second relevance score for each first search result in the set according to that result's first relevance score and a preset diversity field, where the diversity field indicates the attribute category of the first search result. The sum of the first and second relevance scores is taken as the relevance parameter value of each first search result. Finally, the second search results to be presented to the client are extracted from the first search result set according to a preset number of second search results and the relevance parameter values. The second search results extracted in this way show the diversification more clearly, and the heavy server-side resource consumption (time, hardware, and so on) is avoided, which improves search engine server performance. The method also adapts to the score distributions of more result sets, which adds flexibility.

Referring to Fig. 2, a flowchart of Embodiment 1 of a retrieval method of the present application is shown. The method may include the following steps.

Step 201: Obtain, according to query data submitted by a client, a first search result set related to the query data.

In the field of search engines, a user's query is usually denoted Query and a single result matching the Query is denoted Doc, so the set of all results matching the Query is the Doc set, written {Doc}.

In this step, after the client submits a Query, the first thing the search engine server does internally is map the Query to {Doc}, written Query->{Doc}, where "->" denotes a mapping. The server also calculates a first relevance score (Score1) for each Doc in {Doc}; Score1 expresses how well the Doc matches the Query, written {Doc}->{Doc, Score1}. The mapping is the process of matching results to the Query. Score1 can be calculated with any relevance algorithm, such as the classic TF-IDF algorithm, or with other approaches such as information gain (IG), mutual information (MI), or entropy-based methods.
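TF-IDF is named here only as one possible way to compute Score1, and the source gives no formula. As a reminder, one common textbook form is sketched below. The symbols are assumptions for illustration: tf(t, d) is the frequency of term t in document d, df(t) is the number of documents containing t, N is the total number of documents, and summing over the query terms is just one simple way to turn term weights into a per-document score.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)},
\qquad
\mathrm{Score1}(Q, d) = \sum_{t \in Q} \mathrm{tfidf}(t, d)
```

Any other relevance measure can be substituted, since the later steps only require that each Doc carries some Score1.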

Note that the algorithm used to obtain the first search results can be defined freely by the search engine server; the present application does not restrict it. A different relevance algorithm in this step will produce a different first search result set, but this does not affect the subsequent flow, because the present application diversifies a given first search result set and places no restriction on how that set was obtained.

Step 202: Calculate a second relevance score for each first search result in the set according to that result's first relevance score and a preset diversity field, where the diversity field indicates the attribute category of the first search result.

After Score1 has been calculated for each first search result in the set, a second relevance score (Score2) is calculated for each first search result from the preset diversity field and Score1. The diversity field identifies the attribute category of a first search result; in vertical e-commerce search, for example, it may be the uid (supplier identifier) or the geographic location of each result. Score2 is a value determined by Score1 and by the result's rank position under the diversity field. In practice Score2 can be produced by a preset function whose parameters are Score1 and the rank position of the first search result; the function's return value is Score2. The function ties Score2 to the rank position in some way, for example by returning a larger Score2 for a higher-ranked first search result. Depending on the situation, those skilled in the art may relate rank position and Score2 in other ways.

Step 203: Generate a relevance parameter value for each first search result from the first relevance score and the second relevance score.

This step is where the method departs from the prior art: building on the Score2 calculated in step 202, a relevance parameter value is generated for each first search result from Score1 and Score2. The relevance parameter value can be the sum of Score1 and Score2. Alternatively, a weight can be set so that the relevance parameter value equals Score1 plus the product of Score2 and the weight; with a weight of 2, for example, relevance parameter value = Score1 + 2*Score2. The present application does not limit how the relevance parameter value is generated from Score1 and Score2, and any simple variation on this idea falls within its scope of protection. In this embodiment, the first search result set is not partitioned by Score1 alone; instead, a new parameter generated jointly from Score1 and Score2 drives the subsequent processing of the first search results.

Step 204: Extract from the first search result set, according to a preset number of second search results and the relevance parameter values, the second search results to be presented to the client.

In this step, assume the diversity field is preset to uid. The parameters needed in this embodiment also include the preset number of second search results. This number can be obtained from a preset diversity value count and a loop extraction count: the number of second search results to extract is the product of the two. The diversity value count is the number of results to extract from the first search results of each distinct uid; a count of 3, for example, means that 3 results are extracted for each distinct uid. The loop count of the diversification operation determines how many second search results are presented to the client in total: a loop count of 1 returns 3 second search results, a loop count of 2 returns 6, and so on. The second search results extracted this way include results associated with different uids.
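The arithmetic for the preset number can be sketched as a one-line helper. The function name `second_result_count` is hypothetical, and treating non-positive settings as "extract nothing" is an assumption; the parameter names follow the `distinct_count` and `distinct_times` settings used later in the description.

```c
/* Number of second search results = diversity value count x loop extraction count.
 * Hypothetical helper; the guard for non-positive inputs is an assumption. */
int second_result_count(int distinct_count, int distinct_times) {
    if (distinct_count < 1 || distinct_times < 1)
        return 0;
    return distinct_count * distinct_times;
}
```

This matches the example in the text: a diversity value count of 3 gives 3 results for one loop and 6 for two.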

This embodiment avoids heavy resource consumption on the search engine server (time, hardware, and so on) and so improves server performance. It also lets the user flexibly configure how many diversified results are returned, and different f(Position, Score1) functions can be defined to balance the relevance and diversity of the results and give users a better search experience.

Referring to Fig. 3, a flowchart of Embodiment 2 of a retrieval method of the present application is shown. The method may include the following steps.

Step 301: Obtain, according to query data submitted by a client, a first search result set related to the query data.

In practice, this embodiment applies when the search engine server's results are not yet diversified, that is, when the first search results, once obtained and sorted by first relevance score, still have results with the same attribute clustered together; for example, the top several results on the server all relate to the same supplier. After step 301, the first search result set can therefore be checked first, for example by determining whether the top several results belong to the same category. If they do, the subsequent steps are performed.

Step 302: Classify the first search result set by the preset diversity field to obtain the subset corresponding to each category.

In this embodiment, assume the preset diversity field received is uid. As shown in Table 1, uid takes three values, {A, B, C}, and the subsets of the first search result set {Doc} by uid are {A1, A2, A3}, {B1, B2, B3}, and {C1, C2, C3}. A1~A3 have uid=A and are the results for supplier A; B1~B3 have uid=B and are the results for supplier B; C1~C3 have uid=C and are the results for supplier C.

Step 303: Obtain the position of each first search result within its subset according to the first relevance scores.

In this embodiment, the first search results in each subset are sorted by Score1. Table 1 lists the first search result set {Doc} of this embodiment together with the uid and first relevance score (Score1) of each Doc.
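Steps 302 and 303 can be sketched as follows, under stated assumptions. The `Doc` struct, the function names, and the approach of sorting an array of pointers (so the results themselves are not copied) are illustrative choices, not the implementation described in the source, and any scores used with it are made up rather than taken from Table 1.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one first search result. */
typedef struct {
    const char *uid;  /* diversity field value, e.g. a supplier id */
    float score1;     /* first relevance score */
    int position;     /* 1-based rank inside its uid group; set by assign_positions */
} Doc;

/* Order Doc pointers by uid, then by Score1 in descending order. */
static int by_uid_then_score(const void *a, const void *b) {
    const Doc *x = *(Doc *const *)a;
    const Doc *y = *(Doc *const *)b;
    int c = strcmp(x->uid, y->uid);
    if (c != 0) return c;
    return (x->score1 < y->score1) - (x->score1 > y->score1);
}

/* Fill docs[i].position for every result.
 * Returns 0 on success, -1 if the temporary pointer array cannot be allocated. */
int assign_positions(Doc *docs, size_t n) {
    if (n == 0) return 0;
    Doc **idx = malloc(n * sizeof *idx);
    if (idx == NULL) return -1;
    for (size_t i = 0; i < n; i++) idx[i] = &docs[i];
    qsort(idx, n, sizeof *idx, by_uid_then_score);

    int rank = 0;
    for (size_t i = 0; i < n; i++) {
        /* Restart the rank whenever a new uid group begins. */
        if (i == 0 || strcmp(idx[i]->uid, idx[i - 1]->uid) != 0) rank = 0;
        idx[i]->position = ++rank;
    }
    free(idx);
    return 0;
}
```

After the call, each result knows its rank within its own uid group, which is the Position argument that the Score2 function in step 304 needs.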

Step 304: Obtain the second relevance score of each first search result by matching its position within its subset against a preset relationship between position and second relevance score.

In practice, the relationship between a first search result's position within its subset and Score2 can be expressed by a preset function: the second relevance score of a first search result is obtained by evaluating a preset acquisition function whose parameters are the result's position within its subset and its first relevance score. The relationship in question is the one between the position of a result within its subset, after sorting by first relevance score and classifying by the diversity field, and the size of the second relevance score. It can be written as a function f(Position, Score1), which may take any form and content that the user's needs or the actual situation require; the present application does not limit its concrete form. For example, one such function could be:

    float f(int position, float score) {
        if (position == 1)
            return 300.0f;
        else
            return 0.0f;
    }

This function returns 300 when a first search result is at sorted position 1 in its subset, so its Score2 is 300; first search results at every other position get a Score2 of zero.
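Since the text leaves the form of f(Position, Score1) open, a different choice can be sketched for comparison. The name `f_decay` and the extra parameters `top_k` and `bonus` are hypothetical and do not appear in the source; the variant rewards the first `top_k` positions of each subset with a bonus that shrinks as the position grows.

```c
/* Hypothetical variant of f(Position, Score1), not taken from the source:
 * positions 1..top_k receive bonus / position, all other positions receive 0. */
float f_decay(int position, float score1, int top_k, float bonus) {
    (void)score1;  /* unused here; kept to mirror the f(Position, Score1) signature */
    if (position < 1 || position > top_k)
        return 0.0f;
    return bonus / (float)position;
}
```

With `top_k = 2` and `bonus = 300`, the best result of each supplier gets 300 and the second best gets 150, so two results per supplier are lifted instead of one. Whether that trade-off between diversity and relevance is desirable depends on the application.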

Step 305: Generate the relevance parameter value of each first search result in the subsets from the first relevance score and the second relevance score.

In this embodiment, the relevance parameter value of each first search result can be generated by adding the second relevance score obtained in step 304 to the result's first relevance score and taking the sum. Table 2 shows the first relevance score, second relevance score, and relevance parameter value of each first search result in the subsets. This embodiment does not limit how the relevance parameter value is obtained; any simple variation on the idea of this embodiment falls within its scope of protection.
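The combination step is small enough to sketch directly. The function name `relevance_param` is hypothetical; the weighted form follows the Score1 + weight*Score2 variant described for Embodiment 1, and a weight of 1 reduces it to the plain sum used in this embodiment.

```c
/* Relevance parameter value = Score1 + weight * Score2.
 * weight = 1 gives the plain sum; the function name is a hypothetical label. */
float relevance_param(float score1, float score2, float weight) {
    return score1 + weight * score2;
}
```

For instance, a result with Score1 = 80 and Score2 = 300 gets 380 with a weight of 1 and 680 with a weight of 2. These input values are made up and are not taken from Table 2.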

Step 306: Sort the subsets of the classified first search results by relevance parameter value.

Sorting the subsets of the classified first search results again by the new relevance parameter values from step 305 gives the new order of the first search results. In this embodiment, the first three results after re-sorting are A1, B1, and C1.

Step 307: Extract the preset number of second search results from the sorted subsets in sorted order, and return the second search results to the client.

The preset number of second search results can be obtained from a preset diversity value count and a loop extraction count: the number of second search results to extract is the product of the two.

The diversity field indicates the attribute category of a first search result, and a diversity field value is a value of that attribute category. In this embodiment the diversity field is uid and the diversity field values are A, B, and C, so classifying the first search results by the diversity field yields three subsets: A, B, and C. The number of second search results to extract from each subset can be preset directly, or derived from the preset number of diversity values and the loop extraction count. The number of diversity values indicates how many first search results are to be extracted for each distinct uid. For example, a value of 3 means that 3 first search results are extracted for each of suppliers A, B, and C. In this embodiment, extraction also depends on the loop extraction count (distinct_times), which indicates how many rounds of extraction are performed in each category. When the loop extraction count is 1, only 3 search results per supplier are extracted as second search results; when it is 2, 3*2 search results per supplier are extracted, each round working the same way as when the count is 1; and so on.
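The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `preset_count` is hypothetical, and it assumes that the parameter called distinct_count elsewhere in the description is the number of diversity values.

```python
def preset_count(distinct_count: int, distinct_times: int) -> int:
    """Number of second search results to extract from each subset.

    distinct_count: number of diversity values (assumed mapping, see lead-in).
    distinct_times: loop extraction count.
    """
    if distinct_count < 1 or distinct_times < 1:
        raise ValueError("both parameters must be positive integers")
    # The description defines the preset number as the product of the two.
    return distinct_count * distinct_times
```

With 3 diversity values, one round yields 3 results per supplier and two rounds yield 3*2, as in the example above.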

With distinct_count=1 and distinct_times=1, the resulting order of second search results is A1~B1~C1; with distinct_count=1 and distinct_times=3, it is A1~B1~C1~A2~A3~B2~B3~C2~C3. Those skilled in the art can set different distinct_count and distinct_times values and different f(Position, Score1) functions to obtain different diversification effects, balancing the diversity and the relevance of the search results.
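A minimal Python sketch of how these two settings could produce the orders quoted above. Several parts are assumptions rather than the patented implementation: the Score1 values are invented for illustration, f(Position, Score1) is replaced by a hypothetical fixed bonus for the first distinct_count positions of each subset, and the extracted results are assumed to be merged by descending relevance parameter value.

```python
from collections import defaultdict


def diversify(docs, distinct_count, distinct_times, bonus=1000):
    """docs: list of (doc_id, uid, score1). Returns doc ids in display order."""
    # Classify by the diversity field (uid), highest Score1 first in each subset.
    subsets = defaultdict(list)
    for doc in sorted(docs, key=lambda d: d[2], reverse=True):
        subsets[doc[1]].append(doc)

    picked = []
    for group in subsets.values():
        ranked = []
        for position, (doc_id, _uid, score1) in enumerate(group, start=1):
            # Hypothetical f(Position, Score1): boost the top distinct_count positions.
            score2 = bonus if position <= distinct_count else 0
            ranked.append((score1 + score2, doc_id))  # relevance parameter value
        ranked.sort(reverse=True)
        # Extract the preset number (distinct_count * distinct_times) per subset.
        picked.extend(ranked[: distinct_count * distinct_times])

    # Assumption: the extracted results are merged by relevance parameter value.
    picked.sort(reverse=True)
    return [doc_id for _, doc_id in picked]


# Illustrative Score1 values only; they are not taken from the description.
SAMPLE = [
    ("A1", "A", 90), ("A2", "A", 80), ("A3", "A", 70),
    ("B1", "B", 60), ("B2", "B", 50), ("B3", "B", 40),
    ("C1", "C", 30), ("C2", "C", 20), ("C3", "C", 10),
]
```

Under these assumptions, `diversify(SAMPLE, 1, 1)` gives A1, B1, C1, and `diversify(SAMPLE, 1, 3)` gives A1, B1, C1, A2, A3, B2, B3, C2, C3. A different bonus function would shift the balance between diversity and relevance.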

As can be seen, with the method of this embodiment the first 3 records of the second search results are search results with uid A, B, and C respectively. The second search results finally returned to the user are therefore diversified, which meets the need for diverse search results. Because the diversification process is also optimized, the method consumes fewer system resources, computes faster, and extends more flexibly.

Referring to FIG. 4, a flowchart of Embodiment 3 of a retrieval method of the present application is shown. This embodiment can be understood as a concrete example of applying the retrieval method of the present application in practice, and may include the following steps:

Step 401: Retrieve against the query data submitted by the client according to the first relevance score.

In this embodiment, when the search engine server obtains the first search results, it queries the current query data according to the first relevance score.

Step 402: Extract the first search results from the retrieval results according to the pre-selected diversification field.

The diversification field must be selected in advance; in Embodiment 2, for example, it is uid.

Step 403: Classify the first search result set according to the preset diversity field values, and obtain the subset corresponding to each category in the first search result set.

Based on the selected uid, all search results in the first search result set that relate to suppliers A, B, and C are selected as the uid subsets of the first search results.
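The classification in step 403 can be sketched as a simple grouping by field value. The sketch below assumes each search result is a dictionary that carries its diversity field; the function name `classify_by_field` and the record layout are hypothetical.

```python
from collections import defaultdict


def classify_by_field(results, field="uid"):
    """Group search results into subsets keyed by their diversity field value."""
    subsets = defaultdict(list)
    for result in results:
        subsets[result[field]].append(result)
    return dict(subsets)
```

Passing a different `field`, such as a geographic location key, would classify the same results along another attribute category.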

Step 404: Obtain the position of each first search result within its subset according to the magnitude of its first relevance score.

Step 405: Look up the second relevance score of each first search result according to the preset relationship between the second relevance score and the position of each first search result within its classified subset.

Step 406: Take the sum of the first relevance score and the second relevance score as the relevance parameter value of each first search result.

Step 407: Sort each classified subset of the first search results by the relevance parameter value.

Step 408: From each sorted subset, extract the preset number of second search results in sorted order.

For the execution of steps 404 to 408, see the description in Embodiment 2.
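Steps 404 to 406 can be illustrated for a single subset as follows. The position-to-Score2 table is a hypothetical stand-in for the preset relationship, and the scores are invented for the example; only the structure (rank by Score1, look up Score2 by position, add the two) follows the steps above.

```python
# Hypothetical preset relationship: position within the subset -> Score2.
SCORE2_BY_POSITION = {1: 100, 2: 50, 3: 25}


def relevance_parameters(subset_scores):
    """subset_scores maps doc id -> Score1 for one subset.

    Returns doc id -> relevance parameter value (Score1 + Score2).
    """
    # Step 404: 1-based position by descending Score1.
    ordered = sorted(subset_scores, key=subset_scores.get, reverse=True)
    params = {}
    for position, doc_id in enumerate(ordered, start=1):
        # Step 405: look up Score2 from the preset relationship (0 if not listed).
        score2 = SCORE2_BY_POSITION.get(position, 0)
        # Step 406: the relevance parameter value is the sum of both scores.
        params[doc_id] = subset_scores[doc_id] + score2
    return params
```

Sorting each subset by the returned values then corresponds to step 407.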

Step 409: Save the query data, the second search results, and the correspondence between them to a database.

In this embodiment, once the user's current query data, the second search results, and the correspondence between them have been obtained, they are saved to a database, for example as a data table or another persistent data structure.
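One possible way to persist this correspondence, sketched with Python's standard `sqlite3` module. The table name, column names, and JSON encoding are assumptions chosen for the example; the description only requires a data table or another persistent structure.

```python
import json
import sqlite3


def _ensure_table(conn):
    # Hypothetical schema: one row per query, results stored as a JSON list.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS query_results ("
        "query TEXT PRIMARY KEY, results TEXT NOT NULL)"
    )


def save_results(conn, query, second_results):
    """Store the query, its second search results, and their correspondence."""
    _ensure_table(conn)
    conn.execute(
        "INSERT OR REPLACE INTO query_results (query, results) VALUES (?, ?)",
        (query, json.dumps(second_results)),
    )
    conn.commit()


def load_results(conn, query):
    """Return the saved second search results for a query, or None."""
    _ensure_table(conn)
    row = conn.execute(
        "SELECT results FROM query_results WHERE query = ?", (query,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

A server could call `load_results` before running the full diversification again for a repeated query, although the description does not state how the saved data is used.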

Step 410: Display the second search results to the client.

Meanwhile, the second search results are displayed to the client. For example, the first three second search results of Embodiment 2 (A1, B1, and C1) can be displayed, or all search results in the subsets can be displayed, for example A1~B1~C1~A2~A3~B2~B3~C2~C3.

For brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the order of actions described, because some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.

Corresponding to the method of Embodiment 1 of the retrieval method above, and referring to FIG. 5, the present application also provides Embodiment 1 of a retrieval system. In this embodiment, the system may include:

The acquisition unit 501 is configured to obtain, according to the query data submitted by the client, a first search result set related to the query data.

In search engine technology, a user's query is usually denoted Query, and a single result matching the Query is denoted Doc; the set of all results matching the Query is then the Doc set, denoted {Doc}.
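The Query, Doc, and {Doc} notation can be made concrete with a toy model. Everything here is an assumption for illustration: the `Doc` fields, the `doc_set` name, and the matching rule (a Doc matches when every query term occurs in its text) are not taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    uid: str   # diversity field value, e.g. the supplier
    text: str


def doc_set(query: str, corpus: list[Doc]) -> list[Doc]:
    """Return {Doc}: every Doc in the corpus that matches the Query."""
    terms = query.lower().split()
    # Toy matching rule: all query terms must appear in the Doc text.
    return [doc for doc in corpus if all(t in doc.text.lower() for t in terms)]
```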

The calculating unit 502 is configured to calculate the second relevance score of each first search result according to the first relevance score of each first search result in the set and the preset diversity field, where the diversity field indicates the attribute category of the first search result.

After Score1 has been calculated for each first search result in the first search result set, the second relevance score (Score2) of each first search result is calculated from the preset diversity field and Score1. The diversity field identifies the attribute category of the first search result, for example the uid or the geographic location of each search result. Score2 is a value that depends on Score1 and on the ranking position of each first search result under the diversity field.

The setting unit 503 is configured to generate the relevance parameter value of each first search result according to the first relevance score and the second relevance score. Specifically, generating the relevance parameter value may include taking the sum of the first relevance score and the second relevance score as the relevance parameter value of each first search result.

The extraction unit 504 is configured to extract, from the first search result set, according to the preset number of second search results and the relevance parameter value, the second search results to be displayed to the client.

Here, assuming the preset diversification field is uid, the parameters required by this embodiment also include the preset number of second search results. Specifically, this number can be derived from a preset number of diversity values and a loop extraction count, as the product of the two. The number of diversity values preset for the diversification field indicates how many results are to be extracted for each distinct uid; for example, a value of 3 means that 3 results are extracted from the search results of each uid.

The system of this embodiment may be integrated into the search engine server, or connected to the search engine server as a separate entity. When the method of the present application is implemented in software, it may be added as a new function of the search engine server or written as a separate program; the present application does not limit how the method or system is implemented.

Corresponding to the method of Embodiment 2 of the retrieval method above, and referring to FIG. 6, the present application also provides a preferred Embodiment 2 of a retrieval apparatus. In this embodiment, the apparatus may specifically include:

The acquisition unit 501 is configured to obtain, according to the query data submitted by the client, a first search result set related to the query data.

The first acquisition subunit 601 is configured to classify the first search result set according to the preset diversity field, and to obtain the subset corresponding to each category in the first search result set.

The second acquisition subunit 602 is configured to obtain the position of each first search result within its subset according to the magnitude of its first relevance score.

The matching subunit 603 is configured to look up the second relevance score of each first search result according to the preset relationship between the second relevance score and the position of each first search result within its classified subset.

The setting unit 503 is configured to generate the relevance parameter value of each first search result according to the first relevance score and the second relevance score. Specifically, generating the relevance parameter value may include taking the sum of the first relevance score and the second relevance score as the relevance parameter value of each first search result.

The sorting subunit 604 is configured to sort each classified subset of the first search results by the relevance parameter value.

The first extraction subunit 605 is configured to extract, from each sorted subset, the preset number of second search results in sorted order, and to return the second search results to the client.

Corresponding to the method of Embodiment 3 of the retrieval method above, and referring to FIG. 7, the present application also provides Embodiment 3 of a retrieval system. In this embodiment, the system may specifically include:

The retrieval subunit 701 is configured to retrieve against the query data submitted by the client according to the first relevance score.

The second extraction subunit 702 is configured to extract the first search results from the retrieval results according to the pre-selected diversification field.

The first acquisition subunit 601 is configured to classify the first search result set according to the preset diversity field values, and to obtain the subset corresponding to each category in the first search result set.

The second acquisition subunit 602 is configured to obtain the position of each first search result within its subset according to the magnitude of its first relevance score.

The matching subunit 603 is configured to look up the second relevance score of each first search result according to the preset relationship between the second relevance score and the position of each first search result within its classified subset.

The setting unit 503 is configured to generate the relevance parameter value of each first search result according to the first relevance score and the second relevance score. Specifically, generating the relevance parameter value may include taking the sum of the first relevance score and the second relevance score as the relevance parameter value of each first search result.

The sorting subunit 604 is configured to sort each classified subset of the first search results by the relevance parameter value.

The first extraction subunit 605 is configured to extract, from the first search results that share the same diversity field value in each sorted subset, the preset number of second search results in sorted order.

The storage unit 703 is configured to save the query data, the second search results, and the correspondence between them to a database.

The display unit 704 is configured to display the second search results to the client.

It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts that are the same or similar can be cross-referenced between embodiments. Because the apparatus embodiments are essentially similar to the method embodiments, they are described briefly; for related details, see the corresponding parts of the method embodiments.

Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The retrieval method and system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Those of ordinary skill in the art may, following the idea of the present application, vary the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

501: Acquisition unit

502: Calculating unit

503: Setting unit

504: Extraction unit

601: First acquisition subunit

602: Second acquisition subunit

603: Matching subunit

604: Sorting subunit

605: First extraction subunit

701: Retrieval subunit

702: Second extraction subunit

703: Storage unit

704: Display unit

To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an interface for tier division in the prior art;

FIG. 2 is a flowchart of Embodiment 1 of a retrieval method of the present application;

FIG. 3 is a flowchart of Embodiment 2 of a retrieval method of the present application;

FIG. 4 is a flowchart of Embodiment 3 of a retrieval method of the present application;

FIG. 5 is a structural block diagram of Embodiment 1 of a retrieval system of the present application;

FIG. 6 is a structural block diagram of Embodiment 2 of a retrieval system of the present application;

FIG. 7 is a structural block diagram of Embodiment 3 of a retrieval system of the present application.

Claims (10)

1. A retrieval method, characterized in that the method comprises: obtaining, according to query data submitted by a client, a first search result set related to the query data; calculating a second relevance score for each first search result according to the first relevance score of each first search result in the set and a preset diversity field, the diversity field indicating the attribute category of the first search result; generating a relevance parameter value for each first search result according to the first relevance score and the second relevance score, wherein the generating comprises taking the sum of the first relevance score and the second relevance score as the relevance parameter value of each first search result; and extracting, from the first search result set, according to a preset number of second search results and the relevance parameter value, the second search results to be displayed to the client, wherein the extracting comprises: classifying the first search result set to obtain a plurality of subsets; sorting each of the plurality of subsets by the relevance parameter value; and extracting, from each sorted subset, the preset number of second search results in sorted order, the preset number being the product of the number of diversity values and the loop extraction count.

2. The method of claim 1, wherein classifying the first search result set to obtain a plurality of subsets comprises: classifying the first search result set according to the preset diversity field, and obtaining the subset corresponding to each attribute category in the first search result set; and wherein calculating the second relevance score of each first search result comprises: obtaining the position of each first search result according to the magnitude of the first relevance score within each subset; and looking up the second relevance score of each first search result according to a preset relationship between the second relevance score and the position of each first search result within its classified subset.

3. The method of claim 1, wherein, before extracting from the first search result set the second search results to be displayed to the client, the method further comprises: saving the query data, the second search results, and the correspondence between them to a database.

4. The method of claim 1, wherein obtaining, according to the query data submitted by the client, the first search result set related to the query data specifically comprises: retrieving against the query data submitted by the client according to the first relevance score; and extracting the first search results from the retrieval results according to a pre-selected diversification field.

5. The method of claim 1, wherein, after extracting the second search results to be displayed to the client, the method further comprises: displaying the second search results to the client.

6. A retrieval system, characterized in that the system comprises: an acquisition unit configured to obtain, according to query data submitted by a client, a first search result set related to the query data; a calculating unit configured to calculate a second relevance score for each first search result according to the first relevance score of each first search result in the set and a preset diversity field, the diversity field indicating the attribute category of the first search result; a setting unit configured to generate a relevance parameter value for each first search result according to the first relevance score and the second relevance score, wherein the setting unit takes the sum of the first relevance score and the second relevance score as the relevance parameter value of each first search result; and an extraction unit configured to extract, from the first search result set, according to a preset number of second search results and the relevance parameter value, the second search results to be displayed to the client, wherein the extraction unit comprises: a sorting subunit configured to sort, by the relevance parameter value, each of the plurality of subsets obtained by classifying the first search results; and a first extraction subunit configured to extract, from each sorted subset, the preset number of second search results in sorted order, the preset number being the product of the number of diversity values and the loop extraction count.

7. The system of claim 6, wherein the calculating unit specifically comprises: a first acquisition subunit configured to classify the first search result set according to the preset diversity field, and to obtain the subset corresponding to each attribute category in the first search result set; a second acquisition subunit configured to obtain the order of the first search results according to the magnitude of the first relevance score within each subset; and a matching subunit configured to look up the second relevance score of each first search result according to a preset relationship between the order of the first search results and the second relevance score.

8. The system of claim 6, wherein the system further comprises: a storage unit configured to save the query data, the second search results, and the correspondence between them to a database.

9. The system of claim 6, wherein the acquisition unit specifically comprises: a retrieval subunit configured to retrieve against the query data submitted by the client according to the first relevance score; and a second extraction subunit configured to extract the first search results from the retrieval results according to a pre-selected diversification field.

10. The system of claim 6, further comprising: a display unit configured to display the second search results to the client.
TW099104467A 2010-02-11 2010-02-11 Retrieval methods and systems TWI493366B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW099104467A TWI493366B (en) 2010-02-11 2010-02-11 Retrieval methods and systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW099104467A TWI493366B (en) 2010-02-11 2010-02-11 Retrieval methods and systems

Publications (2)

Publication Number Publication Date
TW201128417A TW201128417A (en) 2011-08-16
TWI493366B true TWI493366B (en) 2015-07-21

Family

ID=45025194

Family Applications (1)

Application Number Title Priority Date Filing Date
TW099104467A TWI493366B (en) 2010-02-11 2010-02-11 Retrieval methods and systems

Country Status (1)

Country Link
TW (1) TWI493366B (en)

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees