TW201011575A

TW201011575A - Recommendation apparatus and method of integrating rough sets and multiple-characteristic exploration

Info

Publication number: TW201011575A
Application number: TW97134973A
Authority: TW
Inventors: Xin-Mu Ceng; Jia-Hui Su; Qin-Yuan Xiao
Original assignee: Univ Nat Cheng Kung
Priority date: 2008-09-12
Filing date: 2008-09-12
Publication date: 2010-03-16
Also published as: TWI372983B

Abstract

A recommendation apparatus and an associated method integrating rough sets and multiple-characteristic exploration are disclosed. The recommendation apparatus includes a user record module, a data integration module, an association rule mining module, a user clustering module, a statistical-analysis rating prediction module (ModelACR), a user cluster decision module, a data array module, a rough-set rating prediction module (ModelRS), and a behavior variation determination module. The recommendation method includes a training stage and a prediction stage. The training stage establishes the association rules, the user clusters, and the rating table. The prediction stage applies a rough-set algorithm and a statistical-analysis prediction method separately to the rating records of a target user, so that each yields a predicted rating. A threshold on the standard deviation then decides dynamically which of the two predictions is used, with the aim of improving the accuracy of the predicted satisfaction.

Description

IX. Description of the Invention

[Technical Field]
The present invention relates to a recommendation apparatus and method integrating rough sets and multi-feature mining, and more particularly to a recommendation apparatus and method that find association rules by data mining and integrate a prediction approach based on a rough set algorithm with a prediction approach based on statistical analysis.

[Prior Art]
When a recommendation method is applied to find items to recommend, the process generally consists of two steps:
(A) for items the user has not yet encountered, estimate how satisfied the user is likely to be with each item (the satisfaction value of such an item is normally an unknown value); and
(B) rank the items by the estimated values, find the items the user is most likely to be interested in, and recommend them to the user.

Current research mainly aims to improve the accuracy of step (A). In recent years, research on recommendation methods has increasingly centered on collaborative filtering. However, collaborative filtering often runs into the following problems in this first stage of analysis:
(1) The new user problem (Cold Start): when a recommender faces a brand-new user, there are no past records of that user to refer to. Neither traditional collaborative filtering nor content-based filtering can compute any similarity for the user, so no recommendation can be made.
(2) The new item problem (First Rater): unlike the new user problem, here the product has only just been released, so past records cannot contain any consumption behavior related to it. Collaborative filtering can still find the user's neighbors, but the subsequent rating prediction cannot be computed for lack of data. The new product is therefore never recommended until some user rates it on their own initiative.
(3) The data sparsity problem (Sparsity): collaborative filtering must use similarity computation to find other users whose rating behavior resembles the user's, and that computation must be based on items both users have rated. In practice, most users consume only the tip of the iceberg among thousands of items, so the co-rating condition is hard to satisfy and similar users are hard to find.
(4) The data volume problem (Scalability): as the number of users grows, the range that must be searched to find the most relevant other users grows as well, which lowers the execution performance of collaborative filtering.

Although collaborative filtering is the main approach in current research, it has the unavoidable drawbacks described above: the new user problem, the new item problem, data sparsity, and the execution performance of processing large volumes of data. It therefore cannot fully help users obtain the information they actually want, and conventional approaches fail to meet users' needs in practice.

[Summary of the Invention]
The main object of the present invention is to overcome the above problems encountered in the prior art and to provide a recommendation apparatus and method that find association rules by data mining and integrate a prediction approach based on a rough set algorithm with a prediction approach based on statistical analysis.

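A secondary object of the present invention is to combine diverse data about items and users and to find association rules by data mining, thereby addressing the new user problem (Cold Start), the new item problem (First Rater) and the data sparsity problem (Sparsity).

Another object of the present invention is to filter out unrelated users effectively through a clustering algorithm, so as to achieve better data-processing capability (Scalability).

A further object of the present invention is to apply the results of the two prediction approaches with dynamic adjustment under a threshold setting, so as to improve the accuracy of the predicted satisfaction.

To achieve the above objects, the present invention is a recommendation apparatus and method integrating rough sets and multi-feature mining. The recommendation apparatus comprises a user record module, a data integration module, an association rule mining module, a user clustering module, a statistical-analysis rating prediction module (ModelACR), a user cluster decision module, a data array module, a rough-set rating prediction module (ModelRS) and a behavior variation determination module.

The recommendation method comprises a training stage and a prediction stage. The training stage builds the association rules, the user clusters and the rating table. It first provides data of various characteristics, including users' personal characteristic data, their personal rating records, and the content data of the items. These data are combined by data preprocessing into multiple transaction records. The transaction records are related to one another, the associations among them are extracted, and multiple association rules are built by data mining and stored in an association rule database. The users' personal rating records are extracted, and a clustering algorithm partitions the users into several clusters, which are stored in a user cluster database. Finally, each user's transaction records are analyzed with a statistical analysis (Statistical Analysis) prediction method: the category attributes of each item are re-encoded by combination, each user's rating data are organized by category, and each user's average rating for each category of items is computed, which yields each user's predicted rating for each category of items.

The prediction stage takes the rating records of a target user and predicts a rating twice, once with the rough set method and once with the statistical analysis prediction method. It first uses the clusters built in the training stage to find the other users most relevant to the target user, and builds a rating table of the target user and those users over the items. Using the association rules built in the training stage, it then makes a preliminary prediction for every unknown rating (Unknown Value) in the table other than the target user's rating of the target item, which produces a complete sub-matrix (Sub-Matrix). Next, it computes the similarity between each item in the sub-matrix and the target item, takes the item most similar to the target item as the class label, and takes the item with the second-highest similarity as the feature item. A rough set algorithm then partitions the users by their class-label data to build first elementary sets (Elementary Set) of equivalence classes (Equivalence Class), and partitions the users by the combined data of the target item and the feature item, according to the different item sets, to build second elementary sets. Comparing the first and second elementary sets gives the lower approximation (Lower Approximation), from which the target user's rating of the target item is predicted. The rating table built by the statistical analysis prediction method in the training stage is then used to predict each user's rating for each category of items. Finally, a switch-based (Switch-based) hybrid method chooses between the two predictions. The standard deviation is used to characterize the user's rating behavior in each category and serves as a threshold (Threshold): if the standard deviation of the target item in the user's past ratings exceeds the threshold, the rating predicted by the rough set algorithm is used; otherwise the rating predicted by the statistical analysis prediction method is used.

[Embodiments]
Refer to Fig. 1, a schematic diagram of the architecture of the recommendation apparatus of the present invention. As shown, the recommendation apparatus comprises a user record module 10, a data integration module 11, an association rule mining module 12, a user clustering module 13, a statistical-analysis rating prediction module (ModelACR) 14, a user cluster decision module 15, a data array module 16, a rough-set rating prediction module (ModelRS) 17 and a behavior variation determination module 18.

The user record module 10 contains a user trace file 101, an item trace file 102 and user rating records 103. It provides the users' personal characteristic data, their personal rating records, and the content data of the items.

The data integration module 11 receives the personal characteristic data, personal rating records and content data output by the user record module 10 and performs data preprocessing. It combines each personal rating record that a user leaves after a purchase with the user's personal characteristic data and the content data of the item, producing user transaction data (Transaction Table).

The association rule mining module 12 receives the transaction data output by the data integration module 11, extracts the associations that recur in the transaction data, and expresses them as rules. The resulting association rules are stored in an association rule database 121 for later access.

The user clustering module 13 receives the personal rating records output by the user rating records 103, partitions the users into several clusters according to those records, and stores the clusters in a user cluster database 131 for later access.

The statistical-analysis rating prediction module 14 receives the transaction data output by the data integration module 11, performs statistical analysis on each user's transaction records, and organizes and computes the results by category. It builds a rating table 141 of each user's ratings for each encoded category of items, and uses the rating table 141 to predict each user's rating for each category of items.

The user cluster decision module 15 takes the rating records of a target user, finds the other users most relevant to the target user among the clusters in the user cluster database 131, and builds a rating table of the target user and those users over the items.

The data array module 16 receives the rating table output by the user cluster decision module 15. Using the association rules in the association rule database 121, it makes a preliminary prediction for every unknown rating in the table other than the target user's rating of the target item, and builds a complete sub-matrix from the result.

The rough-set rating prediction module 17 receives the sub-matrix output by the data array module 16. It compares the first elementary sets, obtained by partitioning by class, with the second elementary sets, obtained by partitioning by item set, and finds the lower approximation in order to predict the target user's rating of the target item.

The behavior variation determination module 18 receives the predicted ratings output by the statistical-analysis rating prediction module 14 and by the rough-set rating prediction module 17, and uses a threshold setting to decide dynamically which of the two is the final predicted rating. Together, the above modules constitute a novel recommendation apparatus integrating rough sets and multi-feature mining.

Refer to Fig. 2, a flow chart of the recommendation method of the present invention. As shown, the recommendation method comprises a training stage 2 and a prediction stage 2a. The training stage builds the association rules, the user clusters and the rating table, and comprises the following steps:
(A) Provide data 20: provide data of various characteristics, including users' personal characteristic data, their personal rating records, and the content data of the items;
(B) Data preprocessing 21: combine the above data by data preprocessing to form multiple transaction records;
(C) Build association rules 22: relate the transaction records, extract the associations among them, build multiple association rules from the transaction data by data mining (Data Mining), and store them in an association rule database;
(D) User clustering 23: extract the users' personal rating records from the above data, partition the users into several clusters with a clustering algorithm, and store the clusters in a user cluster database; and
(E) Build predicted ratings 24: analyze each user's transaction records with a statistical analysis (Statistical Analysis) prediction method, re-encode the category attributes of each item by combination, organize each user's rating data by category, compute each user's average rating for each category of items, and thereby build each user's predicted rating for each category of items.

The prediction stage takes the rating records of a target user and predicts a rating with the rough set method and with the statistical analysis prediction method. It comprises the following steps:
(F) Build rating table 25: using the clusters built in the training stage, find the other users most relevant to the target user and build a rating table of the target user and those users over the items;
(G) Build complete sub-matrix 26: using the association rules built in the training stage, make a preliminary prediction for every unknown rating (Unknown Value) other than the target user's rating of the target item, so as to build a complete sub-matrix (Sub-Matrix);
(H) Rough set rating prediction 27: use a rough set algorithm that has a user-count limit parameter and an item-count limit parameter. First compute the similarity between each item in the sub-matrix and the target item and take the most similar item as the class label. Then, excluding the class-label item, compute the similarity of each item to the target item, use the item-count limit parameter to find the feature item, and sort to find the item set most relevant to the target item. Next, partition the users by their class-label data to build first elementary sets (Elementary Set) of equivalence classes (Equivalence Class), and partition the users by the combined data of the target item and the feature item, according to the different item sets, to build second elementary sets. Finally, compare the first and second elementary sets to find the lower approximation (Lower Approximation) and predict the target user's rating of the target item;
(I) Statistical analysis rating prediction 28: use the rating table built by the statistical analysis prediction method in the training stage to predict each user's rating for each category of items; and
(J) Determine predicted rating 29: given the rating predicted by the statistical analysis prediction method and the rating predicted by the rough set algorithm, apply a switch-based (Switch-based) hybrid method. The standard deviation is used to characterize the user's rating behavior in each category and serves as a threshold (Threshold). If the standard deviation of the target item in the user's past ratings exceeds the threshold, the rating predicted by the rough set algorithm is used; otherwise the rating predicted by the statistical analysis prediction method is used.

Refer further to Fig. 3, a flow chart of the rough set algorithm of the present invention. As shown, the rough set algorithm of step (H) further comprises the following steps:
(h1) Receive sub-matrix 271: receive the sub-matrix built in step (G);
(h2) Find class label 272: from the sub-matrix, compute the similarity between each item and the target item, and take the most similar item as the class label;
(h3) Find feature item 273: from the sub-matrix, set the item-count limit parameter of the rough set algorithm, compute the similarity of each item other than the class-label item to the target item, find the feature items up to the number allowed by the item-count limit parameter, and sort to find the item set most relevant to the target item;
(h4) Build first and second elementary sets 274: partition the users, on the basis of the class label, into groups whose rating data fall in the same class, to build several first elementary sets; and partition the users other than the target user, on the basis of the item set, into groups whose rating data fall in the same class, to build several second elementary sets;
(h5) Find lower approximation 275: compare the first elementary sets with the second elementary sets, and take the sets that are completely contained as the lower approximation;
(h6) Determine whether a same class is found 276: check, against the user-count limit parameter of the rough set algorithm, whether a lower approximation that meets the requirement has been found; if none has, return to step (h3) and reset the most relevant items; and
(h7) Predict rating 277: on the basis of the lower approximation found, predict that the rating of the target item is similar to that of the set in the same class, which gives the target user's predicted rating of the target item.

With this recommendation method, the users' personal characteristic data, their personal rating records and the content data of the items are mined together, so that association rules suited to a given user can be found from a large volume of transaction data. The method thus considers item content information as well as user behavior, and can address the new user problem (Cold Start), the new item problem (First Rater) and the data sparsity problem (Sparsity) of collaborative filtering. For a new user or a new item alike, analysis of the association rules finds the rules that best fit the user with respect to the product and gives an estimate of the likely satisfaction rating. Because the same analysis provides a preliminary estimate of the users' ratings of items, it also mitigates data sparsity. The present invention therefore offers high accuracy and provides meaningful association rules for experts to analyze and study.

In a preferred embodiment, movies serve as the recommended items, and the recommendation flow is carried out in a training stage and a prediction stage.

Training stage of the recommendation flow:

[Implementation Method 1] Association rule mining
For each record in a user's rating history, the present invention combines the user's personal characteristic data with the movie content data to form one transaction record. The personal characteristic data include age, gender, occupation and residential area code; the movie content data include the movie ID and the movie category. Once these data have been integrated into transaction data by preprocessing, association rule mining can find the association rules that share the same definition.

[Implementation Method 2] User clustering
Refer to Fig. 4, a schematic diagram of the operation of the clustering algorithm of the present invention. As shown, the present invention applies a clustering algorithm from data mining together with a similarity computation. Using the Pearson correlation coefficient (Pearson Correlation Coefficient) formula, users whose rating behavior in the user rating records 103 is similar are gathered into the same group. The users can thus be partitioned in advance into several clusters 30, which greatly narrows the search for similar users. The clustering algorithm adopted in the present invention is the K-means algorithm, which divides the users into K groups in the manner of KNN.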
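The clustering of Implementation Method 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function names (`pearson`, `centroid`, `cluster_users`), the seeding with the first K users, and the sample ratings are all hypothetical, and the loop is a plain k-means-style iteration that uses the Pearson correlation over co-rated items as the similarity.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items rated in both dicts; 0.0 if undefined."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def centroid(members):
    """Item-wise mean rating of a group of users."""
    sums, counts = {}, {}
    for ratings in members:
        for item, value in ratings.items():
            sums[item] = sums.get(item, 0.0) + value
            counts[item] = counts.get(item, 0) + 1
    return {item: sums[item] / counts[item] for item in sums}

def cluster_users(ratings_by_user, k, iterations=10):
    """k-means-style loop: each user joins the centroid it correlates with most."""
    users = sorted(ratings_by_user)
    # Naive seeding with the first k users (an assumption made for this sketch).
    centroids = [dict(ratings_by_user[u]) for u in users[:k]]
    assignment = {}
    for _ in range(iterations):
        new_assignment = {
            u: max(range(k), key=lambda c: pearson(ratings_by_user[u], centroids[c]))
            for u in users
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [ratings_by_user[u] for u in users if assignment[u] == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids

# Hypothetical ratings: u1/u3 like m1 and m2, u2/u4 like m3.
ratings = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 1, "m2": 2, "m3": 5},
    "u3": {"m1": 4, "m2": 5, "m3": 2},
    "u4": {"m1": 2, "m2": 1, "m3": 4},
}
assignment, centroids = cluster_users(ratings, k=2)
```

With these sample ratings, users whose tastes correlate positively end up in the same cluster, which is the property the prediction stage relies on when it restricts the search to one cluster.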
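[Implementation Method 3] Rating table of ModelACR
Refer to Figs. 5 to 8, which show, respectively, the category attribute data of the items, the users' ratings of the items, the category encoding of the items, and the users' ratings of the categories. As shown, the statistical-analysis rating prediction module is built as follows. Because one movie can carry several category attributes, the present invention first applies an encoding technique that considers the multiple categories together, and uses it to build a table of the users' ratings for each category of movie, which is used in later prediction.

In a preferred embodiment, assume that the category attributes of the movies are as shown in Fig. 5 and that the users' rating data are as shown in Fig. 6. The movies Item1 and Item4 have identical category attribute data: both belong to category 1, category 3 and category 4. After re-encoding by combination, the two movies are therefore assigned to the same virtual category A (Virtual Category A), as shown in Fig. 7. In the rating table, the user User1 rates Item1 and Item4 as V1,1 = 3 and V1,4 = 5. According to Fig. 7, these can be treated as ratings of the same type, so User1's rating of virtual category A is the average of the ratings of Item1 and Item4: (3 + 5)/2 = 4, as shown in Fig. 8.

Prediction stage of the recommendation flow:

[Implementation Method 4] Applying user clustering to build the rating table
Refer to Fig. 9A, a schematic diagram of the users' ratings of the items. As shown, for the rating records of a target user, the similarity to the center of each cluster 30 built in the training stage (see Fig. 4) is computed with a distance formula based on the Pearson correlation coefficient, and the cluster nearest to the target user is found. The other users in that cluster are then used to build a suitably sized rating table of the target user and those users over the items. This filters out unrelated users, gives better data-processing capability (Scalability), and reduces the amount of data the rough set algorithm has to process afterwards.

[Implementation Method 5] Applying association rules to build the complete rating table
Assume that the association rules found in the training stage include the following:
{Young, M, Administrator, Action} -> {3}
{Young, M, Action, Fantasy} -> {4}
{Young, Administrator, Action, Fantasy} -> {4}
The first rule states that a young male administrator rates action movies at 3; the other rules read likewise. When a user's personal characteristic data match the rules above and the characteristics of the item to be predicted (the target item) also match them, the user's rating of the item can be predicted to be 3 or 4. The method considers all matching rules together and summarizes them into a single prediction, so the final prediction is (3 + 4 + 4)/3 ≈ 3.67. Applying this method addresses the new user problem, the new item problem and the data sparsity problem.

[Implementation Method 6] Predicted rating of ModelRS
Refer to Figs. 9B to 11, which show, respectively, the sub-matrix built from Fig. 9A with the association rules, the first elementary sets classified by class, and the second elementary sets classified by item. As shown, the present invention predicts the users' ratings with a rough set algorithm that has a user-count limit parameter and an item-count limit parameter. The user-count limit parameter limits the size of the lower approximation found between first and second elementary sets of the same class, and the item-count limit parameter limits the number of feature items referenced when the second elementary sets are built. In a preferred embodiment, both parameters are assumed to be 2.

When the association rules found in [Implementation Method 5] are used to amend the rating table of the target user and the other users built in [Implementation Method 4], a preliminary prediction is made for every unknown rating other than that of the target item, which turns the rating table into the complete sub-matrix shown in Fig. 9B. The similarity between each item and the target item is then computed with a distance formula based on the Pearson correlation coefficient.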
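The rule averaging of Implementation Method 5 can be sketched as follows. This is a minimal illustration under assumptions: `predict_from_rules` is a hypothetical helper, a rule is taken to match when all of its conditions appear among the combined user and item features, and the three rules are the ones quoted in the embodiment.

```python
def predict_from_rules(rules, features):
    """Average the ratings of every rule whose conditions all appear in `features`."""
    matched = [rating for conditions, rating in rules if conditions <= features]
    if not matched:
        return None  # no rule applies, so the rating stays unknown
    return sum(matched) / len(matched)

# The three rules quoted in the embodiment: (conditions, rating).
rules = [
    (frozenset({"Young", "M", "Administrator", "Action"}), 3),
    (frozenset({"Young", "M", "Action", "Fantasy"}), 4),
    (frozenset({"Young", "Administrator", "Action", "Fantasy"}), 4),
]

# A young male administrator and an action/fantasy movie match all three rules.
features = {"Young", "M", "Administrator", "Action", "Fantasy"}
estimate = predict_from_rules(rules, features)  # (3 + 4 + 4) / 3
```

Returning `None` when nothing matches is a design choice of the sketch; it keeps an unmatched cell distinguishable from a genuinely low predicted rating.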
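The switch-based hybrid decision can be sketched in a few lines. This is an illustration only: `switch_prediction` and its parameter names are hypothetical, and the population standard deviation is an assumption, since the text does not say whether a population or sample standard deviation is intended.

```python
from statistics import pstdev

def switch_prediction(past_ratings, acr_prediction, rs_prediction, threshold):
    """Pick the rough-set prediction when past ratings vary too much, else the statistical one."""
    # With no history the spread is treated as zero (an assumption of this sketch).
    spread = pstdev(past_ratings) if past_ratings else 0.0
    return rs_prediction if spread > threshold else acr_prediction

# Stable rating behavior: the category average (ModelACR) is trusted.
stable = switch_prediction([4, 4, 4], acr_prediction=4.0, rs_prediction=2.0, threshold=1.0)

# Erratic rating behavior: the rough-set prediction (ModelRS) is used instead.
erratic = switch_prediction([1, 5, 1, 5], acr_prediction=3.0, rs_prediction=5.0, threshold=1.0)
```

The intuition is that a category average is a poor summary when a user's ratings in that category swing widely, so the decision falls back on the neighbor-based rough set result.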
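The lower approximation that the rough-set rating prediction module relies on can be sketched with plain sets. This is a generic rough-set illustration under assumptions, not the patent's exact procedure: `partition`, `lower_approximation`, the column names `label_item` and `feature_item`, and the sample ratings are hypothetical, and the second elementary sets are built from a single feature column for brevity.

```python
from collections import defaultdict

def partition(table, attributes):
    """Group user ids whose rows agree on every listed attribute (equivalence classes)."""
    blocks = defaultdict(set)
    for user, row in table.items():
        blocks[tuple(row[a] for a in attributes)].add(user)
    return list(blocks.values())

def lower_approximation(condition_blocks, target_set):
    """Union of the condition blocks that lie entirely inside target_set."""
    result = set()
    for block in condition_blocks:
        if block <= target_set:
            result |= block
    return result

# Hypothetical completed sub-matrix: ratings of the class-label item and one feature item.
table = {
    "u1": {"label_item": 5, "feature_item": 4},
    "u2": {"label_item": 5, "feature_item": 4},
    "u3": {"label_item": 3, "feature_item": 2},
    "u4": {"label_item": 5, "feature_item": 2},
}

first_sets = partition(table, ["label_item"])     # [{u1, u2, u4}, {u3}]
second_sets = partition(table, ["feature_item"])  # [{u1, u2}, {u3, u4}]

# Users who certainly belong to the class "label_item == 5" given the feature rating.
certain = lower_approximation(second_sets, first_sets[0])
```

In this sample, users who rate the feature item 4 all fall in the class that rates the label item 5, so a target user with the same feature rating would be predicted to share that class. The block `{u3, u4}` straddles two classes and therefore contributes nothing to either lower approximation.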
Please refer to the "Figure 2" for a schematic diagram of the flow of the recommended method of the present invention. As shown in the figure: the present invention is a recommended apparatus for integrating approximate set and multiple 〇〇 feature exploration, and the method thereof includes a training phase 2 and a prediction phase 2 a, respectively, which are used to establish associations respectively. Rules, user grouping and score sheet, the training phase includes the following steps: (A) Providing information 2 〇: providing information on various characteristics, including the user's personal characteristics data and personal score records and product connotation data; ) Data processing 21: Combine the above-mentioned various characteristics of the data with a data pre-processing to form multiple transaction data, (C) Establish association rules 2 2: Associate the transaction data and retrieve the transaction Multiple associations between data, and a plurality of association rules in the U data are established by a data exploration (deliberation) method, and stored in an association rule database; (D) User group 23: capture The various characteristics of the above:: The user's personal score record 'divided into several clusters by a group algorithm, and deposited - so that 12 201011575 (E) to establish a score prediction value 2 4 : Analysis The transaction records of each user in the transaction data are re-encoded in a combined manner by a statistical analysis (Statistical Analysis) prediction method, and each user's rating data is sorted by category. 
Using the analysis to calculate the average of the scores of each category of products for each user, and then establish a score prediction value for each category of products for each category; the prediction stage is for a target user's score record, respectively, applying the approximate set The legal prediction score prediction value, and the application of the statistical analysis pre-measurement method to predict the score prediction value, the prediction stage includes the following steps: (F) establishing a score table 2 5: according to the plurality of clusters established in the training phase, Based on other users more relevant to the target user, establish a score sheet for the target user and other users; (G) Establish a complete sub-array 2 6 : According to the association rules established during the training phase , in addition to the target user and its target product, make preliminary predictions for other Unknown Values, ❹To establish a complete sub-matrix (Sub-Matrix); (Η) approximate set prediction score prediction value 27: using a approximate set algorithm including a user quantity limit parameter and a commodity quantity limit parameter, the child is first calculated The similarity between each item in the array and the target quotient, find the item with the highest similarity to the target item as the category label, and then calculate the similarity of each item to the target item except the item labeled as the category label. 
Degree, by the setting of the quantity limit parameter of the product, find a characteristic product, and find out the product collection most relevant to the target product through sorting, and then respectively, for each user, the information signed by the category ^ 13 201011575 is mainly The user divides according to different categories to establish an Elementary Set of the same category (Equivalence Class), and at the same time, respectively, for each user to combine the target product with the characteristic product, Each user divides according to different product collections to establish a second element set of the same category, and finally through the first element set And comparing the second set of elements to find the lower approximation, and then predicting the target user's score prediction value for the target product; © (I) statistical analysis predictive score prediction value 28: according to the training The scores established by the statistical analysis and prediction method are used to predict the predicted value of the scores of each category of products for each user; and (J) the predicted scores of the judged scores 2 9 : based on the scores predicted by the statistical analysis and prediction method and Calculating the scores predicted by the approximate set algorithm, using a dynamic method based on Switch based, and calculating the scores of the scores established by the statistical analysis and prediction method by the standard deviation in the statistical method. In each category of rating behavior, the standard deviation of the statistics is used as a Threshold setting to determine whether the standard deviation of the target product in the past score exceeds the threshold, if the standard deviation is too large Then, the prediction value of the score predicted by the approximate set algorithm is accurate. Otherwise, the score prediction predicted by the statistical analysis prediction method is predicted. 
Subject "Further referring to" FIG. 3 ", a schematic view showing the process of the present invention approximate set of algorithms. As shown in the figure: according to the above step (H〇), the flow of the approximate set algorithm further comprises the following steps: 201011575 (h 1 ) receiving sub-array 2 7丄: receiving the sub-array established by step (G); 2) Find the category label 2 7 2: Calculate the similarity between each item and the target item from the above sub-array, and find the item with the highest similarity as the category standard; ❹ 〇(h 3 ) find the feature Product 2 7 3 ·· From the sub-array described above, by setting the product quantity restriction parameter in the approximate set algorithm, the similarity of each item to the target item is calculated, in addition to the above-mentioned item as the category label. The product quantity restriction parameter is used to find the characteristic item for the limited quantity set by the characteristic item, and the item set most relevant to the target item is found by sorting; (h4) establishing the first-and second-element set 274: respectively Each user is based on the category label, and each user is divided according to the same class 2 scoring data to establish a plurality of first element collections, and: other users outside the target user are the merchants Based on the collection of products, his users are divided into several sets of second elements according to the scores of the same category; the set of parents 5 finds 2 7 5: by comparing the first-element approximation; The complete includer finds the top two:), whether the fault finds the same category 2 7 6 : for the above: = If the use in the approximate set algorithm does not meet the requirements, then the fault is below the required boundary approximation, if Set 'return step (h3) to reset the most relevant products, Μ and 15 201011575 (h 7 ) forecast score prediction value 2 7 7 : based on the above finding the lower bound 
approximation, predict the score of the target commodity and the same The collection of categories is similar, so the target user's rating for their target product is obtained. Therefore, by applying the recommended method of the device, the user's personal characteristic data and the personal score record and the connotation data of the product can be effectively found out from a large amount of transaction data through the data exploration method. The association rules, therefore, can not only consider the user's behavior, but also consider the connotation information of the product, and thus solve the newly added user problem (c〇ld_Start) and the newly added product problem in the collaborative filtering recommendation technology ( First-Rater) and the sparse problem (Sparshy). Under the recommended method of the device, 'new users or new products can analyze the association rules to find out the most suitable association rules for the user relative to the product, and further speculate the possible satisfaction score through the association. The analysis of the rules can also overcome the problem of sparse data by making the user's preliminary estimation of the product's score. Therefore, the present invention has high accuracy and can provide meaningful correlation analysis expert analysis researchers. When the present invention is utilized, in a preferred embodiment, a movie is used as a recommended product for implementing the recommended method of the present invention, and a recommended procedure for the training phase and the prediction phase is performed. The training phase of the recommendation process: [Implementation Method 1] Association Rule Exploration In the score record of the injury user history, the present invention simultaneously forms the personal characteristics of the user and the film connotation (4) to form a transaction data. 
The users' personal feature data comprise age, gender, occupation, and residential area code, and the movie content data comprise the movie ID and the movie categories. Once these heterogeneous data have been integrated into transaction data through preprocessing, association rule mining can be applied to find the association rules that satisfy the defined criteria.

[Implementation Method 2] User grouping

Refer to FIG. 4, a schematic diagram of the operation of the clustering algorithm of the present invention. As shown in the figure, the present invention uses a clustering algorithm from data mining to compute similarity and, using the Pearson correlation coefficient (Pearson Correlation Coefficient) formula, gathers users with similar rating behavior into the same group. The users are thus divided in advance into several clusters 30, which greatly narrows the search range for similar users. The clustering algorithm used in the present invention is the K-means algorithm, which divides the users into K groups.

[Implementation Method 3] ModelACR rating table

Refer to FIGS. 5 to 8, which are, respectively, schematic diagrams of the category attribute data of items, of users' ratings of items, of the category encoding of items, and of users' ratings of categories. As shown in the figures, regarding the statistical-analysis rating prediction module: because a movie can have several category attributes, the present invention first applies an encoding technique so that multiple categories are considered together, and thereby builds a table of each user's rating for each encoded movie category for use in later prediction.
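The category encoding described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the function name `build_category_rating_table`, the use of a `frozenset` of category IDs as the encoded (virtual) category, and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_category_rating_table(item_categories, ratings):
    """Average each user's ratings per encoded category.

    item_categories: item -> set of category IDs (hypothetical layout)
    ratings: user -> {item: rating} (hypothetical layout)
    """
    # Assumption: items with an identical combination of categories
    # are encoded into the same virtual category.
    virtual_of = {item: frozenset(cats) for item, cats in item_categories.items()}

    table = {}
    for user, user_ratings in ratings.items():
        buckets = defaultdict(list)
        for item, value in user_ratings.items():
            buckets[virtual_of[item]].append(value)
        # The category rating is the mean of the user's ratings in that category.
        table[user] = {vc: sum(vals) / len(vals) for vc, vals in buckets.items()}
    return table


# Illustrative data: Item1 and Item4 share categories 1, 3 and 4.
item_categories = {"Item1": {1, 3, 4}, "Item2": {2}, "Item4": {1, 3, 4}}
ratings = {"User1": {"Item1": 3, "Item2": 2, "Item4": 5}}

table = build_category_rating_table(item_categories, ratings)
print(table["User1"][frozenset({1, 3, 4})])  # 4.0
```

Keying the table by the category combination itself avoids maintaining a separate code book; a real system would more likely map each combination to a short label.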
In a preferred embodiment, assume that the category attributes of the current movies are as shown in FIG. 5 and that the users' rating data, to be organized by category, are as shown in FIG. 6. Because the category attribute data of the movies Item1 and Item4 are identical (both belong to category 1, category 3, and category 4), re-encoding them by combination assigns the two movies to the same virtual category, Virtual Category A, as shown in FIG. 7. In the rating table, user User1 rated the movies Item1 and Item4 as V1,1 = 3 and V1,4 = 5, respectively. According to FIG. 7, these two ratings can be treated as ratings of the same category, so User1's rating for Virtual Category A is the average of the ratings of Item1 and Item4: (3 + 5) / 2 = 4, as shown in FIG. 8.

Prediction stage of the recommendation procedure:

[Implementation Method 4] Applying user grouping to build a rating table

Refer to FIG. 9A, a schematic diagram of users' ratings of items according to the present invention. As shown in the figure, for the rating record of a target user, the similarity to the center point of each cluster 30 built in the training stage (as shown in FIG. 4) is computed with the distance formula of the Pearson correlation coefficient, the cluster closest to the target user is found, and, based on the other users in that cluster, a suitably sized rating table of the target user and the other users is built. Irrelevant users are thereby filtered out effectively, which improves scalability (Scalability) and reduces the amount of data that the rough-set algorithm must process.

[Implementation Method 5] Applying association rules to build a complete rating table
Assume that the association rules found during the training stage are as follows:

{Young, M, Administrator, Action} -> {3}
{Young, M, Action, Fantasy} -> {4}
{Young, Administrator, Action, Fantasy} -> {4}

The first rule indicates that young male administrators rate action movies 3; the other rules read likewise. If a user's personal feature data match the rules above, and the features of the item to be predicted (that is, the target item) also match them, the user's rating of that item may be predicted as 3 or 4. The recommendation method therefore considers multiple rules together and aggregates them for the final prediction, which here is (3 + 4 + 4) / 3 ≈ 3.67. Applying this method effectively addresses the new-user problem, the new-item problem, and data sparsity.

[Implementation Method 6] ModelRS rating prediction

Refer to FIGS. 9B to 11, which are, respectively, schematic diagrams of the sub-array built from FIG. 9A by the association rules, of the first elementary sets partitioned by class, and of the second elementary sets partitioned by item. As shown in the figures, the present invention predicts a user's rating with a rough-set algorithm that has two parameters, a user-count limit parameter and an item-count limit parameter. The user-count limit parameter limits the size of the lower approximation found between first and second elementary sets of the same class, and the item-count limit parameter limits the number of feature items referenced when the second elementary sets are built.
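The comparison of first and second elementary sets can be sketched as follows. This is a simplified illustration under stated assumptions rather than the patented procedure: the function names (`elementary_sets`, `lower_approximation`, `predict`), the data layout, and the sample ratings are hypothetical; the user-count and item-count limit parameters are deliberately left out; and choosing the block whose feature rating matches the target user's is an assumed reading of the method.

```python
from collections import defaultdict


def elementary_sets(ratings, users, items):
    """Group users whose ratings of every item in `items` are identical."""
    groups = defaultdict(set)
    for user in users:
        key = tuple(ratings[user][item] for item in items)
        groups[key].add(user)
    return dict(groups)


def lower_approximation(first_sets, second_sets):
    """Keep each second set that is completely contained in some first set."""
    return {
        key: block
        for key, block in second_sets.items()
        if any(block <= first for first in first_sets.values())
    }


def predict(ratings, target_user, label_item, feature_item, target_item):
    """Predict the target user's rating of `target_item`, or None."""
    others = [u for u in ratings if u != target_user]
    # First sets: all users, partitioned by the class-label item.
    first = elementary_sets(ratings, list(ratings), [label_item])
    # Second sets: other users, partitioned by (target item, feature item).
    second = elementary_sets(ratings, others, [target_item, feature_item])
    own_class = first[(ratings[target_user][label_item],)]
    lower = lower_approximation(first, second)
    for (item_rating, feature_rating), block in lower.items():
        same_feature = feature_rating == ratings[target_user][feature_item]
        if block <= own_class and same_feature:
            return item_rating
    return None


# Hypothetical complete sub-array; only the target's rating of I2 is unknown.
ratings = {
    "u1": {"I1": 4, "I2": 4, "I3": 3},
    "u2": {"I1": 4, "I2": 4, "I3": 3},
    "u3": {"I1": 2, "I2": 2, "I3": 3},
    "u4": {"I1": 5, "I2": 2, "I3": 3},
    "target": {"I1": 4, "I3": 3},
}
print(predict(ratings, "target", "I1", "I3", "I2"))  # 4
```

In this data, the block {u1, u2} lies wholly inside the target user's own class, so its rating of I2 is returned; the block {u3, u4} straddles two classes and is therefore excluded from the lower approximation.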
In a preferred embodiment, both limit parameters are assumed to be 2. The association rules found in [Implementation Method 5] are used to make preliminary predictions for the unknown ratings in the rating table of the target user and the other users built in [Implementation Method 4], except for the target user's rating of the target item; the completed rating table forms the sub-array shown in FIG. 9B. The similarity between each item and the target item is then computed with the distance formula of the Pearson correlation coefficient, defined as

$$r = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \dfrac{\left(\sum X\right)^2}{N}\right)\left(\sum Y^2 - \dfrac{\left(\sum Y\right)^2}{N}\right)}}$$

where X and Y are the arrays of the two objects and N is the number of data items in each of the two objects.

Applying this distance formula to the corrected rating table gives similarities of I1, I3, I4, and I5 to I2 of 0.938, 0.1, -0.18, and -0.18, respectively. I1 has the highest Pearson correlation coefficient with I2, so I1 is taken as the class label, and the users are partitioned by their ratings of the class label into several first elementary sets, as shown in FIG. 10. Next, with the item-count limit parameter of the rough-set algorithm set to 2, the item with the second-highest similarity to the target item, I3, is taken as the feature item, and the users are partitioned by their ratings of I3 combined with the target item I2 into several second elementary sets, as shown in FIG. 11. The first elementary sets (partitioned by class label) are then compared with the second elementary sets (partitioned by item set) to find those that are completely contained, which are the lower approximations, and among them the elementary sets in which the rating of feature item I3 equals the target user's rating of I3, namely {element 5, element 7}. With the user-count limit parameter of the rough-set algorithm set to 2, the lower approximation that finally meets the requirement is determined. Based on this final lower approximation, {element 5}, the rating of the target item is predicted to be similar to that of the set of the same class, so the target user's predicted rating for the target item is 4.

[Implementation Method 7] ModelACR rating prediction

Refer to FIG. 12, a schematic diagram of the item category attribute data of a preferred embodiment of the present invention. As shown in the figure, in a preferred embodiment the figure is assumed to give the category attribute data of the current movies, and the users' rating data are as shown in FIG. 5 above. Following [Implementation Method 3], because the movie ItemX has the same category attribute data as Item1 and Item4, it is assigned to the same category after re-encoding. When the recommendation apparatus predicts user User1's rating of the movie ItemX, it can use the table, built in the training stage, of each user's ratings of the encoded movie categories (shown in FIG. 7); the predicted rating is the average of the ratings of Item1 and Item4: (Value1,1 + Value1,4) / 2 = (3 + 5) / 2 = 4.

[Implementation Method 8] Final rating prediction by behavior variation determination

Given the rating predicted by ModelACR in [Implementation Method 7] and the rating predicted by ModelRS in [Implementation Method 6], the present invention combines the two with a switch-based hybrid method, which effectively improves prediction accuracy.

First, the standard deviation from statistics is applied to the ModelACR predicted ratings to measure the user's rating behavior in each category. The standard deviation is computed as:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$
where N is the number of rated records for the target item and x_i is the user's rating of target item i. The switch-based hybrid method compares the standard deviation computed by the formula above against a threshold: it determines whether the standard deviation of the user's past ratings for the target item exceeds the threshold. If the standard deviation is too large, the ModelRS prediction is used; otherwise, the ModelACR prediction is used. The decision rule of the switch-based hybrid method is defined as:

$$F_{RSA} = \begin{cases} \text{perform ModelRS}, & \sigma_i > \sigma_{threshold} \\ \text{perform ModelACR}, & \text{otherwise} \end{cases}$$

where ModelRS is the rough-set rating prediction module, σ_i is the degree of rating variation of the target item computed with the standard deviation formula, and σ_threshold is a parameter value of the rough-set rating prediction module.

In summary, the present invention is a recommendation apparatus and method integrating rough sets and multiple-feature mining that effectively remedies the shortcomings of the prior art. It combines diverse data about items and users and mines association rules from them, which addresses the new-user problem, the new-item problem, and data sparsity. Its clustering algorithm filters out irrelevant users and so improves scalability. Finally, under a configurable threshold, it dynamically switches between the results of the two prediction methods to improve prediction accuracy. The invention is thus more advanced, more practical, and better suited to users' needs; it meets the requirements for an invention patent application, and this application is filed in accordance with the law.
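As an illustration of the switching rule in [Implementation Method 8], the following minimal sketch chooses between the two predicted ratings. It rests on assumptions: the function name `switch_prediction` and the sample numbers are hypothetical, and the population standard deviation (`statistics.pstdev`) is assumed because the text does not say whether the population or the sample form is intended.

```python
import statistics


def switch_prediction(past_ratings, model_rs_value, model_acr_value, threshold):
    """Return the ModelRS value when rating variation exceeds the threshold."""
    # Assumption: population standard deviation of the user's past ratings.
    sigma = statistics.pstdev(past_ratings)
    if sigma > threshold:
        return model_rs_value  # high variation: use the rough-set prediction
    return model_acr_value     # stable behavior: use the statistical prediction


# Hypothetical values: widely spread ratings select ModelRS.
print(switch_prediction([1, 5], model_rs_value=4, model_acr_value=3, threshold=1.0))  # 4
# Identical ratings have zero deviation and select ModelACR.
print(switch_prediction([4, 4, 4], model_rs_value=4, model_acr_value=3, threshold=1.0))  # 3
```

The threshold is left as a caller-supplied argument because the text treats it as a tunable parameter rather than a fixed constant.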
The foregoing describes only preferred embodiments of the present invention and does not limit the scope of its implementation; all simple equivalent changes and modifications made in accordance with the claims and the description of the present invention remain within the scope covered by this patent.

[Brief Description of the Drawings]

FIG. 1 is a schematic diagram of the architecture of the recommendation apparatus of the present invention.
FIG. 2 is a flow chart of the recommendation method of the present invention.
FIG. 3 is a flow chart of the rough-set algorithm of the present invention.
FIG. 4 is a schematic diagram of the operation of the clustering algorithm of the present invention.
FIG. 5 is a schematic diagram of the category attribute data of items of the present invention.
FIG. 6 is a schematic diagram of users' ratings of items of the present invention.
FIG. 7 is a schematic diagram of the category encoding of items of the present invention.
FIG. 8 is a schematic diagram of users' ratings of categories of the present invention.
FIG. 9A is a schematic diagram of users' ratings of items of the present invention.
FIG. 9B is a schematic diagram of the sub-array built from FIG. 9A by the association rules.
FIG. 10 is a schematic diagram of the first elementary sets of the present invention, partitioned by class.
FIG. 11 is a schematic diagram of the second elementary sets of the present invention, partitioned by item.
FIG. 12 is a schematic diagram of the item category attribute data of a preferred embodiment of the present invention.

[Description of Main Reference Numerals]

User recording module 10
User track file 101
Item track file 102
User rating record 103
Data integration module 11
Association rule mining module 12
Association rule database 121
User grouping module 13
User grouping database 131
Statistical-analysis rating prediction module 14
Rating table 141
User grouping decision module 15
Data array module 16
Rough-set rating prediction module 17
Behavior variation determination module 18
Steps (A) to (J) 20 to 29
Steps (h1) to (h7) 271 to 277
Cluster 30

Claims

1. A recommendation apparatus and method integrating rough sets and multiple-feature mining, wherein the recommendation method comprises a training stage and a prediction stage, the training stage being used to establish association rules, user groups, and a rating table and comprising the following steps:

(A) providing data of various characteristics, including users' personal feature data, personal rating records, and the content data of items;

(B) combining the data of various characteristics through data preprocessing to form a plurality of transaction records;

(C) correlating the transaction records, extracting a plurality of associations among them, establishing a plurality of association rules in the transaction records by a data mining (Data Mining) method, and storing the rules in an association rule database;

(D) retrieving the users' personal rating records from the data of various characteristics, dividing the users in those records into several clusters by a clustering algorithm, and storing the clusters in a user grouping database; and

(E) analyzing each user's transaction records by a statistical analysis (Statistical Analysis) prediction method, in which the category attributes of each item are re-encoded by combination, each user's rating data are organized by category, and each user's average rating for each category of items is computed, thereby building a table of each user's ratings for each encoded category of items;

the prediction stage, for the rating record of a target user, predicting a rating by the rough-set method and a rating by the statistical analysis prediction method, and comprising the following steps:

(F) based on the clusters established in the training stage, finding other users who are relevant to the target user, and building a rating table of the target user's and the other users' ratings of items;

(G) according to the association rules established in the training stage, making preliminary predictions for the unknown ratings (Unknown Value) other than the target user's rating of the target item, so as to build a complete sub-array (Sub-Matrix);

(H) using a rough-set algorithm having a user-count limit parameter and an item-count limit parameter: first computing the similarity between each item in the sub-array and the target item and taking the item with the highest similarity as a class label; then, for the items other than the class-label item, computing the similarity of each item to the target item, finding a feature item according to the item-count limit parameter, and sorting to obtain the item set most relevant to the target item; then partitioning the users by class according to their class-label data to build first elementary sets (Elementary Set) of the same class (Equivalence Class), and partitioning the users by item set according to their data for the target item combined with the feature item to build second elementary sets of the same class; and finally comparing the first elementary sets with the second elementary sets to find a lower approximation (Lower Approximation), from which the target user's rating of the target item is predicted;

(I) according to the rating table built by the statistical analysis prediction method in the training stage, predicting each user's rating for each category of items; and

(J) given the rating predicted by the statistical analysis prediction method and the rating predicted by the rough-set algorithm, applying a switch-based (Switch-based) hybrid method, in which the standard deviation from statistics is applied to the ratings predicted by the statistical analysis prediction method to measure the user's rating behavior in each category, the magnitude of the standard deviation is used to set a threshold (Threshold), and it is determined whether the standard deviation of the user's past ratings for the target item exceeds the threshold; if the standard deviation is too large, the rating predicted by the rough-set algorithm prevails; otherwise, the rating predicted by the statistical analysis prediction method prevails.
2. The recommendation apparatus and method integrating rough sets and multiple-feature mining according to claim 1, wherein in step (D) of the training stage the clustering algorithm computes similarity to divide the users into several clusters in advance and, using the Pearson correlation coefficient (Pearson Correlation Coefficient) formula, gathers users with similar rating behavior into the same group.

3. The recommendation apparatus and method integrating rough sets and multiple-feature mining according to claim 1, wherein the clustering algorithm is the K-means algorithm.

4. The recommendation apparatus and method integrating rough sets and multiple-feature mining according to claim 1, wherein step (F) of the prediction stage, for the rating record of the target user, computes the similarity to the center point of each cluster established in the training stage with the distance formula of the Pearson correlation coefficient and finds the cluster closest to the user.

5. The recommendation apparatus and method integrating rough sets and multiple-feature mining according to claim 1, wherein step (G) of the prediction stage computes the similarity between each item in the sub-array and the target item with the distance formula of the Pearson correlation coefficient, which is

$$r = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \dfrac{\left(\sum X\right)^2}{N}\right)\left(\sum Y^2 - \dfrac{\left(\sum Y\right)^2}{N}\right)}}$$

6. The recommendation apparatus and method integrating rough sets and multiple-feature mining according to claim 1, wherein the user-count limit parameter limits the size of the lower approximation found between first and second elementary sets of the same class.

7. The recommendation apparatus and method integrating rough sets and multiple-feature mining according to claim 1, wherein the item-count limit parameter limits the number of feature items referenced when the second elementary sets are built.
8. The recommendation apparatus and method integrating rough sets and multiple-feature mining according to claim 1, wherein the rough-set algorithm further comprises the following steps:

(h1) receiving the sub-array established in step (G);

(h2) from the sub-array, computing the similarity between each item and the target item, and taking the item with the highest similarity as the class label;

(h3) from the sub-array, using the item-count limit parameter of the rough-set algorithm, computing the similarity of each item other than the class-label item to the target item, finding feature items up to the number allowed by the item-count limit parameter, and sorting to obtain the item set most relevant to the target item;

(h4) partitioning the users, based on the class label, according to rating data of the same class to build several first elementary sets, and partitioning the users other than the target user, based on the item set, according to rating data of the same class to build several second elementary sets;

(h5) comparing the first elementary sets with the second elementary sets, and taking those that are completely contained as the lower approximation;

(h6) for the lower approximation found above, using the user-count limit parameter of the rough-set algorithm to determine whether any lower approximation meets the requirement, and if none does, returning to step (h3) to reset the most relevant item set; and

(h7) based on the lower approximation found above, predicting that the rating of the target item is similar to that of the set of the same class, thereby obtaining the target user's predicted rating for the target item.

9. The recommendation apparatus and method integrating rough sets and multiple-feature mining according to claim 1, wherein the standard deviation used by the switch-based hybrid method is given by the formula:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$
10. A recommendation apparatus and method integrating rough sets and multiple-feature mining, wherein the recommendation apparatus comprises:

a user recording module, comprising a user track file, an item track file, and a user rating record, for providing users' personal feature data, personal rating records, and the content data of items;

a data integration module, which receives the personal feature data, personal rating records, and content data output by the user recording module and performs data preprocessing, combining each personal rating record made by a user after a purchase with the user's personal feature data and the item's data to produce user transaction data (Transaction Table);

an association rule mining module, which receives the transaction data from the data integration module, extracts the associations in the transaction data and expresses them as rules, so as to establish a plurality of association rules in the transaction data, and stores the rules in an association rule database for access;

a user grouping module, which receives the personal rating records output by the user rating record, divides the users into several clusters according to their personal rating records, and stores the clusters in a user grouping database for access;

a statistical-analysis rating prediction module (ModelACR), which receives the transaction data output by the data integration module, performs statistical analysis on each user's transaction records, organizes and computes them by category to build a table of each user's ratings for each encoded category of items, and predicts from that table each user's rating for each category of items;

a user grouping decision module, which, for the rating record of a target user, finds among the clusters in the user grouping database the other users who are relevant to the target user and builds a rating table of the target user's and the other users' ratings of items;

a data array module, which receives the rating table output by the user grouping decision module, makes preliminary predictions, according to the association rules in the association rule database, for the unknown ratings in the table other than the target user's rating of the target item, and thereby builds a complete sub-array;

a rough-set rating prediction module (ModelRS), which receives the sub-array output by the data array module, compares the first elementary sets of the same class built by partitioning by class with the second elementary sets of the same class built by partitioning by item set, and finds a lower approximation to predict the target user's rating of the target item; and

a behavior variation determination module, which receives the ratings output by the statistical-analysis rating prediction module and by the rough-set rating prediction module and, through a configurable threshold, dynamically selects one of them as the final predicted rating.