TWI398780B - Efficient signature-based strategy for inexact information filtering - Google Patents

Efficient signature-based strategy for inexact information filtering

Info

Publication number
TWI398780B
Authority
TW
Taiwan
Prior art keywords
identification
unit
signature
bit number
order
Prior art date
Application number
TW98115183A
Other languages
Chinese (zh)
Other versions
TW201040742A (en)
Inventor
Ye In Chang
Yi Siang Wang
Lee Wen Huang
Jun Hong Shen
Original Assignee
Univ Nat Sun Yat Sen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Sun Yat Sen
Priority to TW98115183A
Publication of TW201040742A
Application granted
Publication of TWI398780B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

Efficient signature-based method for inexact information filtering

The present invention relates to an efficient signature-based method for information filtering, and more particularly to an efficient signature-based method for inexact information filtering.

In the prior art, the most commonly used approach among existing methods that support inexact filtering is to partition the data set locally according to the differing items between user signatures, build a corresponding index, and then filter out the user data that do not satisfy the query conditions. However, this local partitioning is unstable, and it uses only one differing item between user signatures as the basis for each partition, so its filtering performance is very poor for queries that contain a large number of items.

FIG. 1 is a schematic diagram of the conventional comparison between user signatures and a data index. The conventional filtering method first computes an overall profile signature from the item signatures of the items in a profile. If the result of ANDing a query item entered by the user with the profile signature equals the original signature of the query item, the query item is judged to be qualified.

For example, the query items Animation, Comic, and Game have the signatures (110000), (100100), and (100010), respectively. ORing these signatures yields the overall profile signature (110110). Because ANDing the signature of each of Animation, Comic, and Game with the profile signature returns the original item signature, these query items are judged to be qualified.

However, consider a query item Movie whose signature is (110100). ANDing it with the profile signature (110110) also returns the original signature of the query item, so the query item is likewise judged to be qualified. Because the overall profile signature is poorly defined (the signature of Movie was never defined in it), the filtering result is wrong. This kind of error is called a false drop.
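The conventional check and the resulting false drop can be reproduced in a few lines. The following is a minimal sketch, assuming signatures are held as Python integers; the helper names `superimpose` and `matches` are illustrative and do not come from the patent.

```python
def superimpose(item_signatures):
    """OR all item signatures together into one profile signature."""
    profile = 0
    for sig in item_signatures:
        profile |= sig
    return profile


def matches(item_sig, profile_sig):
    """Conventional test: the item matches if (item AND profile) == item."""
    return (item_sig & profile_sig) == item_sig


# Item signatures taken from the example in the text.
items = {"Animation": 0b110000, "Comic": 0b100100, "Game": 0b100010}
profile_sig = superimpose(items.values())
print(format(profile_sig, "06b"))   # 110110

# Movie was never part of the profile, yet it passes the test: a false drop.
movie = 0b110100
print(matches(movie, profile_sig))  # True
```

The false drop arises because every 1 bit of the Movie signature happens to be covered by bits that other items contributed to the superimposed signature.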

It is therefore necessary to provide an innovative and inventive efficient signature-based method for inexact information filtering that solves the above problems.

The present invention provides an efficient signature-based method for inexact information filtering, used to select, from a plurality of profiles, those that are qualified for a user's query item. Each profile has a profile signature with a set number of bits. The bits of the profile signature have corresponding bit numbers, and each bit is one of two binary characters, a first character or a second character. The query item has a query signature.
The method includes the following steps: (a) computing a first identification bit number from the number of occurrences of the first character at each bit number across the profile signatures and from a median value of the number of profile signatures, and recording the first identification bit number in an identification unit; (b) dividing the profiles, according to the first identification bit number, into a first profile group and a second profile group, where the bit at the first identification bit number is the first character in the profile signature of every profile in the first profile group and the second character in the profile signature of every profile in the second profile group; (c) repeating steps (a) and (b) on the bits other than the one at the first identification bit number, computing new first identification bit numbers for a plurality of lower levels and recording them in corresponding identification units, and dividing each group into a lower-level first profile group and second profile group according to the new first identification bit number of the level above, until each resulting first profile group and second profile group contains only one profile, and then building an ID-tree from the identification units; and (d) computing the qualified profiles from the ID-tree and the query signature.

The method of the present invention does not partition the data set (the profiles) locally. Instead, it uses a signature-based strategy that partitions globally, according to the number of times each item (each profile) occurs across the entire data set. The data set is split into two sub-clusters (the first profile group and the second profile group), and the ID-tree is built accordingly. User signatures are stored using the ID-tree as the index, and the index is then used to process large volumes of user data quickly. Because this indexing and query method is efficient, it reduces the time required for information filtering.

The present invention provides an efficient signature-based method for inexact information filtering, used to select, from a plurality of profiles, those that are qualified for a user's query item. Each profile has a profile signature with a set number of bits. The bits of the profile signature have corresponding bit numbers, and each bit is one of two binary characters, a first character or a second character.

The query item has a query signature with the same set number of bits. The bits of the query signature are numbered to correspond to the bits of the profile signature, and each bit of the query signature is either the first character or the second character. In this embodiment, the first character is 1 and the second character is 0.

FIG. 2 shows the relationship between the profile signature of a profile and the query signature of a query item. A profile qualifies when its profile signature is contained in the query signature. For example, let the query signature Query be {A, B, C, D, E}, profile signature P1 be {A, C, E}, and profile signature P2 be {B, D, G}. P1 is contained in Query, so P1 satisfies the filtering condition of Query. P2 is not contained in Query, so P2 does not satisfy it.
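The containment condition of FIG. 2 can be expressed directly with sets, or with bit masks. The following is a minimal sketch. The one-bit-per-item encoding in `to_mask` is an assumption made for illustration only, and the function names are hypothetical.

```python
def qualifies(profile_items, query_items):
    """A profile qualifies when every one of its items appears in the query."""
    return set(profile_items) <= set(query_items)


def to_mask(items, universe="ABCDEFG"):
    """Assumed encoding: one dedicated bit per item of a small universe."""
    return sum(1 << universe.index(item) for item in items)


def qualifies_mask(profile_mask, query_mask):
    """Same condition on bit masks: the profile sets no bit outside the query."""
    return (profile_mask & ~query_mask) == 0


query = {"A", "B", "C", "D", "E"}
p1 = {"A", "C", "E"}
p2 = {"B", "D", "G"}

print(qualifies(p1, query))  # True
print(qualifies(p2, query))  # False, because G is not in the query
print(qualifies_mask(to_mask(p1), to_mask(query)))  # True
print(qualifies_mask(to_mask(p2), to_mask(query)))  # False
```

Note that this is the reverse of the conventional test in FIG. 1: here the profile must be contained in the query, not the query item in the profile.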

FIG. 3 is a flowchart of the efficient signature-based method for inexact information filtering of the present invention. First, in step S1, a first identification bit number is computed from the number of occurrences of the first character at each bit number across the profile signatures and from a median value of the number of profile signatures, and the first identification bit number is recorded in an identification unit. In this embodiment, step S1 includes step S11, computing, for each bit number, the difference between the number of occurrences of the first character at that bit number and the median value; and step S12, defining the bit number with the smallest difference as the first identification bit number. In step S12, if several bit numbers share the smallest difference, any one of them is defined as the first identification bit number.
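Steps S11 and S12 can be sketched as follows. This is an illustration under stated assumptions: signatures are strings of '0' and '1' with bit numbers counted from 1 at the left, ties are broken by taking the lowest bit number (the text allows any of the tied bit numbers), and the four sample signatures are hypothetical.

```python
def choose_identification_bit(signatures, candidates):
    """Return (bit_number, difference) for the bit whose count of 1s is
    closest to half the number of signatures (steps S11 and S12)."""
    median = len(signatures) / 2
    best_bit, best_diff = None, None
    for bit in candidates:
        # S11: count the 1s at this bit number and measure the distance to N/2.
        ones = sum(sig[bit - 1] == "1" for sig in signatures)
        diff = abs(ones - median)
        # S12: keep the smallest difference; the first one found wins a tie.
        if best_diff is None or diff < best_diff:
            best_bit, best_diff = bit, diff
    return best_bit, best_diff


# Hypothetical 4-bit signatures, not taken from the patent figures.
signatures = ["1100", "1010", "1001", "0110"]
print(choose_identification_bit(signatures, [1, 2, 3, 4]))  # (2, 0.0)
```

Choosing the bit whose count is closest to N/2 splits the profiles as evenly as possible, which keeps the resulting tree shallow.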

In step S2, the profiles are divided, according to the first identification bit number, into a first profile group and a second profile group. The bit at the first identification bit number is the first character in the profile signature of every profile in the first profile group, and the second character in the profile signature of every profile in the second profile group.

In step S3, steps S1 and S2 are repeated on the bits other than the one at the first identification bit number (that bit is marked as no data, N/A). New first identification bit numbers are computed for a plurality of lower levels and recorded in the corresponding identification units, and each group is divided into a lower-level first profile group and second profile group according to the new first identification bit number of the level above, until each resulting first profile group and second profile group contains only one profile. An ID-tree (1-bit) is then built from the corresponding identification units.

In step S4, the qualified profiles are computed from the ID-tree and the query signature. In this embodiment, step S4 compares the bits of the query signature against the first identification bit number at each level of the ID-tree to compute the qualified profiles.

An embodiment of building the ID-tree in the method of the present invention is described in detail with reference to FIG. 4 to FIG. 8. Referring first to FIG. 4 and FIG. 5, this embodiment has 9 profile signatures (P1 to P9, N = 9, where N is the number of profile signatures). The set number of bits D of the profile signature is 10 (D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}), the query item is {1, 2, 3, 5, 8}, and the query signature is defined as (1110100100). The number of profile signatures that have a 1 (the first character) at a given bit number (Item ID) is recorded in the I_C column. For example, the bit at bit number 1 is 1 in P1, P2, P6, P7, P8, and P9, a total of 6 profile signatures. The difference between that count and the median value 4.5 (N × 1/2) of the number of profile signatures N is recorded in the MD column (MD = |I_C − (1/2)N|). For example, for Item ID 1 the difference is MD = |6 − (1/2) × 9| = 1.5.

Next, the Item ID with the smallest difference in the MD column is selected as the first identification bit number. The smallest difference occurs at Item IDs 3, 4, 6, 7, and 8. In this embodiment, Item ID 3 is selected as the first identification bit number and recorded in the corresponding identification unit.

Referring to FIG. 6, according to the first identification bit number (3), the profiles whose third bit is 1 are placed in a first profile group (P4, P5, P7, P8), and the profiles whose third bit is 0 (the second character) are placed in a second profile group (P1, P2, P3, P6, P9).

Referring to FIG. 7, the bits other than the one at the first identification bit number (3), that is, bits 1, 2, and 4 through 10, are used to repeat the step of selecting an Item ID as the first identification bit number and the step of dividing the profiles into a lower-level first profile group and second profile group according to that bit number, until each resulting first profile group and second profile group contains only one profile. An ID-tree is then built from the corresponding identification units (the first identification bit numbers), as shown in FIG. 8.
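The recursive construction of steps S1 to S3 can be sketched as below. This is not the patent's implementation. It assumes string signatures with bit numbers counted from 1 at the left, a lowest-bit-number tie-break, the 0 group on the left and the 1 group on the right (matching the traversal rule of this embodiment), and a stop rule for groups that no remaining bit can separate. The class names and the four sample profiles are hypothetical, because the full P1 to P9 table appears only in the figures.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    names: List[str]       # profiles that ended up in this group


@dataclass
class Node:
    bit: int               # identification bit number (1-based)
    zero: "Tree"           # left subtree: profiles with 0 at this bit
    one: "Tree"            # right subtree: profiles with 1 at this bit


Tree = Union[Leaf, Node]


def build(profiles: Dict[str, str], candidates: List[int]) -> Tree:
    names = sorted(profiles)
    if len(names) <= 1 or not candidates:
        return Leaf(names)
    half = len(names) / 2

    def diff(bit: int) -> float:
        # Distance between the count of 1s at this bit and half the group size.
        return abs(sum(profiles[n][bit - 1] == "1" for n in names) - half)

    bit = min(candidates, key=diff)   # min() keeps the first (lowest) on ties
    if diff(bit) == half:             # assumed stop rule: nothing separates them
        return Leaf(names)
    rest = [c for c in candidates if c != bit]
    zeros = {n: s for n, s in profiles.items() if s[bit - 1] == "0"}
    ones = {n: s for n, s in profiles.items() if s[bit - 1] == "1"}
    return Node(bit, build(zeros, rest), build(ones, rest))


# Hypothetical 3-bit profiles, not taken from the patent figures.
profiles = {"A": "110", "B": "100", "C": "011", "D": "001"}
tree = build(profiles, [1, 2, 3])
print(tree.bit, tree.zero.bit, tree.one.bit)  # 1 2 2
```

Each internal `Node` plays the role of an identification unit, and each `Leaf` holds the single profile that remains after the splits.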

The ID-tree of the present invention corresponds to an ordinary binary tree: the highest-level identification unit can be regarded as the root node of the binary tree, and all identification units below it can be regarded as child nodes.

After the ID-tree has been built, it is traversed from the top down. At each level, the bit of the query signature at that level's first identification bit number is examined. If the bit is 1 (the first character), the comparison proceeds to the right for the next level; if it is 0 (the second character), it proceeds to the left. Once the query signature has been compared against every level of the ID-tree, the qualified profile is obtained.

The process of comparing the query item {1, 2, 3, 5, 8} against the profiles using the ID-tree is described in detail with reference to FIG. 8, where the query signature is defined as (1110100100). First, the bit of the query signature at the highest-level identification bit number (3) is (1). The left branch corresponds to identification character (0) and the right branch to (1), so the traversal goes right to the next identification bit number (4) for the next-level comparison. The bit of the query signature at identification bit number (4) is (0), so the traversal goes left to the next identification bit number (2). The bit at identification bit number (2) is (1), so the traversal goes right to the next identification bit number (9). Finally, the bit at the last-level identification bit number (9) is (0), so the traversal goes left and arrives at the single corresponding profile P9. (With the conventional filtering method, the final result would instead be a plurality of qualified profiles.)
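The walk described above can be traced with a small sketch. Only the branch that the text describes is filled in; `None` marks subtrees whose contents appear only in FIG. 8 and are therefore left out. The tuple layout `(bit_number, left_for_0, right_for_1)` is an assumption made for illustration.

```python
# node = (identification bit number, subtree for 0 (left), subtree for 1 (right))
# None marks subtrees that the text does not describe.
tree = (3,
        None,
        (4,
         (2,
          None,
          (9, "P9", None)),
         None))


def walk(node, query_signature):
    """Follow the query signature down the tree; return (bits visited, leaf)."""
    visited = []
    while isinstance(node, tuple):
        bit, left, right = node
        visited.append(bit)
        # Bit numbers are counted from 1 at the left end of the signature.
        node = right if query_signature[bit - 1] == "1" else left
    return visited, node


visited, leaf = walk(tree, "1110100100")
print(visited)  # [3, 4, 2, 9]
print(leaf)     # P9
```

The walk inspects one query bit per level, so its cost grows with the depth of the tree and not with the number of profiles.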

It should be emphasized that, in the method of the present invention, the (1-bit) ID-tree can be further extended into a multi-bit ID-tree with multi-bit filtering, in which the query signature and the profile signatures are compared several bits at a time, so that the profiles can be filtered more precisely. This embodiment is again described using the profile signatures P1 to P9 with a set number of bits D of 10 and the query item {1, 2, 3, 5, 8}, where the query signature is defined as (1110100100).

Referring to FIG. 8, FIG. 9, and FIG. 10 to FIG. 20, the ID-tree is divided at its highest-level identification unit (first identification bit number 3) into a left part and a right part (delimited by the dashed line in FIG. 8). A plurality of initial index modules 51 and a plurality of identification unit index modules 52 are also defined (see FIG. 10). Each initial index module 51 includes an initial identification signature record unit (I_key) 511 and an initial union index unit (U_key) 512. The profile signature of each profile at the lowest level of the ID-tree is recorded in the corresponding initial identification signature record unit 511, and the initial union index units 512 are initially empty (null). Each identification unit index module 52 includes an identification bit number index unit (key_set) 521, a union index unit 522, an identification signature record unit 523, and two identification bit comparison record units 524.

In the method of the present invention for building the multi-bit ID-tree, the following steps are performed after the 1-bit ID-tree has been built in step S3 of FIG. 3. In step S31 (see FIG. 11), the two initial identification signatures in the two lowest-level initial identification signature record units 511 of the left part (that is, profile signatures P4 and P5) are XORed to compute an identification bit number set (7, 9), which is recorded in the corresponding identification unit 53 and in the identification bit number index unit 521 of the corresponding identification unit index module 52. The identification bit number set of this level thus includes the first identification bit number (7) and a second identification bit number (9).

In step S32 (see FIG. 12), the union of the identification bit number set (7, 9) and the value of the union index unit 522 (initially null) is taken to form a new identification bit number set (7, 9), which is recorded in the union index unit 522 of the corresponding identification unit index module.

In step S33 (see FIG. 13), the initial identification signatures in the initial identification signature record units 511 corresponding to profile signatures P4 and P5, (0101010110) and (0101011100), are ANDed to compute an identification signature (0101010100), which is recorded in the identification signature record unit (I_key) 523 of the corresponding identification unit index module 52.

In step S34 (see FIG. 14), the characters of profile signatures P4 and P5 at the positions of the first and second identification bit numbers (7, 9) are defined as identification characters and recorded in the two identification bit comparison record units 524 of the corresponding identification unit index module 52. In this embodiment, the identification characters for profile signature P4 are (01) and those for profile signature P5 are (10).
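Steps S31 to S34 for the pair P4 and P5 can be checked with a short sketch. It assumes string signatures with bit numbers counted from 1 at the left. The helper names are hypothetical, while `key_set`, `u_key`, and `i_key` mirror the unit names used in the text.

```python
def xor_bits(a, b):
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def and_bits(a, b):
    return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))


def ones_positions(bits):
    """1-based bit numbers at which the string holds a 1."""
    return [i + 1 for i, b in enumerate(bits) if b == "1"]


def pick(signature, positions):
    """Characters of the signature at the given bit numbers."""
    return "".join(signature[p - 1] for p in positions)


p4, p5 = "0101010110", "0101011100"

key_set = ones_positions(xor_bits(p4, p5))  # S31: bits where P4 and P5 differ
u_key = sorted(set(key_set) | set())        # S32: union with the empty U_key
i_key = and_bits(p4, p5)                    # S33: bits that P4 and P5 share
id_chars = (pick(p4, key_set), pick(p5, key_set))  # S34

print(key_set)   # [7, 9]
print(u_key)     # [7, 9]
print(i_key)     # 0101010100
print(id_chars)  # ('01', '10')
```

The XOR isolates exactly the bits that distinguish the two profiles, so comparing only those bits is enough to tell P4 and P5 apart.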

In step S35 (see FIG. 15), steps S31 to S34 are repeated to compute, for the identification unit at the same level in the left part (corresponding to profile signatures P7 and P8), the new identification bit number set (2, 6, 9), the identification signature (1000001100), and the identification characters. These are recorded in the corresponding identification unit 54 and in the identification bit number index unit 551, union index unit 552, identification signature record unit 553, and two identification bit comparison record units 554 of the corresponding identification unit index module 55, respectively. In this embodiment, the identification characters for profile signature P7 are (001) and those for profile signature P8 are (110).

In step S36 (see FIG. 16), the union of the new identification bit number sets (7, 9) and (2, 6, 9) of the same lower level in the left part is taken to form a union identification bit number set (2, 6, 7, 9), which is recorded in the union index unit 562 of the corresponding upper-level identification unit index module 56.

In step S37 (see FIG. 17), the identification signatures in the two lower-level identification signature record units 523 under the same upper-level identification unit 57 in the left part, (0101010100) and (1000001100), are XORed, giving (1101011000). The set of bit numbers at which 1 appears in the result is (1, 2, 4, 6, 7). Removing from it the members of the lower-level union identification bit number set (2, 6, 7, 9) forms an upper-level identification bit number set (1, 4), which is recorded in the corresponding identification unit 57 and in the identification bit number index unit 561 of the corresponding identification unit index module 56.
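The arithmetic of steps S36 and S37 can be verified with the values given in the text. This is a minimal sketch, assuming string signatures with bit numbers counted from 1 at the left; the variable and helper names are hypothetical.

```python
def xor_bits(a, b):
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def ones_positions(bits):
    """1-based bit numbers at which the string holds a 1."""
    return [i + 1 for i, b in enumerate(bits) if b == "1"]


# Values recorded for the two lower-level identification units (from the text).
key_set_p4_p5, i_key_p4_p5 = [7, 9], "0101010100"
key_set_p7_p8, i_key_p7_p8 = [2, 6, 9], "1000001100"

# S36: union of the lower-level identification bit number sets.
lower_union = sorted(set(key_set_p4_p5) | set(key_set_p7_p8))

# S37: XOR the two identification signatures, list the positions of the 1s,
# then drop every bit number that a lower level already uses.
xor_result = xor_bits(i_key_p4_p5, i_key_p7_p8)
differing = ones_positions(xor_result)
upper_key_set = [p for p in differing if p not in lower_union]

print(lower_union)    # [2, 6, 7, 9]
print(xor_result)     # 1101011000
print(differing)      # [1, 2, 4, 6, 7]
print(upper_key_set)  # [1, 4]
```

Dropping the lower-level bit numbers matters because an identification signature is an AND of its children: a 0 at a position where the children differ says nothing reliable about the whole subtree.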

In step S38 (see FIG. 17 and FIG. 18), the union of the identification bit number index unit 561 and the union index unit 562 is taken and the result is written back to the union index unit 562. That is, the union of the union identification bit number set (2, 6, 7, 9) and the upper-level identification bit number set (1, 4) forms an upper-level union identification bit number set (1, 2, 4, 6, 7, 9), which is recorded in the union index unit 562 of the corresponding identification unit index module 56.

In step S39 (see FIG. 18), the identification signatures in the two lower-level units under the same upper-level identification unit 57 in the left part, namely identification signature record unit 523 and identification signature record unit 553, which hold (0101010100) and (1000001100), are ANDed to compute an upper-level identification signature, (0000000100), which is recorded in the identification signature record unit 563 of the corresponding identification unit index module 56.

In step S40 (see FIG. 19), the characters of the two lower-level identification signatures (0101010100) and (1000001100) at the positions given by the upper-level identification bit number set (1, 4) are defined as upper-level identification characters and recorded in the upper-level identification bit comparison record units 564 of the corresponding identification unit index module 56. In this embodiment, the identification characters for the lower-level identification signature (0101010100) are (01) and those for the lower-level identification signature (1000001100) are (10).
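Steps S38 to S40 complete the upper-level module, and their results can be checked the same way. This is a minimal sketch using the values stated in the text, with string signatures and bit numbers counted from 1 at the left; the variable and helper names are hypothetical.

```python
def and_bits(a, b):
    return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))


def pick(signature, positions):
    """Characters of the signature at the given 1-based bit numbers."""
    return "".join(signature[p - 1] for p in positions)


# Inputs taken from the text.
lower_union = [2, 6, 7, 9]      # result of step S36
upper_key_set = [1, 4]          # result of step S37
i_key_p4_p5 = "0101010100"      # identification signature for P4 and P5
i_key_p7_p8 = "1000001100"      # identification signature for P7 and P8

# S38: the upper-level union covers every bit number used at or below this node.
upper_union = sorted(set(lower_union) | set(upper_key_set))

# S39: the upper-level identification signature is the AND of the two children.
upper_i_key = and_bits(i_key_p4_p5, i_key_p7_p8)

# S40: the children's characters at the upper-level identification bit numbers.
upper_chars = (pick(i_key_p4_p5, upper_key_set), pick(i_key_p7_p8, upper_key_set))

print(upper_union)  # [1, 2, 4, 6, 7, 9]
print(upper_i_key)  # 0000000100
print(upper_chars)  # ('01', '10')
```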

Note that if the ID-tree is a binary tree with more levels, a step S41 is required: steps S36 to S40 are repeated to compute, for each upper-level identification unit shared by two other lower-level identification signatures in the left part, the union identification bit number set, the upper-level union identification bit number set, the upper-level identification signature, and the upper-level identification characters. These are recorded in the corresponding identification unit and in the union index unit, identification bit number index unit, identification signature record unit, and two identification bit comparison record units of the corresponding identification unit index module, respectively.

In step S42 (see FIG. 20), the identification bit number sets, union identification bit number sets, identification signatures, and identification characters of each level of the right part are computed in the same way as in steps S31 to S41 (S41 only if needed). These are recorded in the identification unit of each level and in the identification bit number index unit, union index unit, identification signature record unit, and two identification bit comparison record units of the identification unit index module of each level, respectively. The calculations of steps S31 to S41 have been described in detail above and are not repeated here.

Finally, in step S43 (see FIG. 20), the highest-level identification bit number set (3, 8) and identification characters (01) and (10) of the multi-bit ID-tree are computed and recorded from the highest-level identification bit number sets, union identification bit number sets, and identification signatures of the left part and the right part, which completes the multi-bit ID-tree.

Referring to FIG. 21, step S43 in this embodiment further includes the following steps. In step S431, the union of the highest-level identification bit number sets of the left part and the right part is taken to form a union identification bit number set, which is recorded in the union index unit of the corresponding identification unit index module (compare the description of step S32 and FIG. 12).

In step S432, the highest-level identification signatures of the left part and the right part are XORed. From the set of bit numbers at which the first character appears in the XOR result, the members of the highest-level union identification bit number sets of the left part and the right part are removed to form the identification bit number set, which is recorded in the identification bit number index unit of the corresponding identification unit index module (compare the description of step S37 and FIG. 17).

In step S433, the values of the highest-level union identification bit number groups of the left part and the right part are united with the values of the identification bit number group to form the highest-level union identification bit number group of the multi-bit forensic tree, which is recorded in the joint indicator unit of the corresponding highest-level identification unit indicator module (compare the description of step S38, FIG. 17, and FIG. 18).

In step S434, the characters of the highest-level identification signatures of the left part and the right part, taken at the bit-number positions in the highest-level identification bit number group of the left part and the right part, are defined as the top-level identification characters of the multi-bit forensic tree and are recorded in the identification bit comparison record units of the corresponding identification unit indicator module (compare the description of step S40 and FIG. 19). The identification bit comparison record units holding the identification characters of each level of the multi-bit forensic tree (which can be regarded as the root node and the child nodes of each level) are thereby complete.
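Steps S431 to S434 can be illustrated with a minimal sketch. It is a simplified reading, not the patented implementation: the function names, the dictionary layout, and the two input signatures are hypothetical, and the inputs were chosen only so that the result reproduces the group (3,8) and the identification characters (01) and (10) quoted in the text. Each child is assumed to carry a single set of already-used bit numbers, standing in for its union identification bit number group.

```python
def ones_positions(bits: str) -> set[int]:
    """Return the 1-based positions at which a bit string holds '1'."""
    return {i + 1 for i, b in enumerate(bits) if b == "1"}


def combine(left_sig: str, right_sig: str,
            left_used: set[int], right_used: set[int]) -> dict:
    """Merge two child summaries into a parent, loosely following S431-S434."""
    # Bit numbers already used lower in either subtree (assumed union groups).
    excluded = left_used | right_used
    # XOR marks the positions where the two identification signatures differ.
    xor = "".join("1" if a != b else "0" for a, b in zip(left_sig, right_sig))
    # Identification bit number group: differing positions not yet used.
    group = sorted(ones_positions(xor) - excluded)
    # AND keeps only the bits that are 1 in both children.
    and_sig = "".join("1" if a == b == "1" else "0"
                      for a, b in zip(left_sig, right_sig))
    return {
        "group": group,
        "union": excluded | set(group),
        "signature": and_sig,
        "left_chars": "".join(left_sig[p - 1] for p in group),
        "right_chars": "".join(right_sig[p - 1] for p in group),
    }


# Hypothetical inputs, not taken from the figures.
top = combine("1000000100", "0010000000", {1, 4}, {4})
print(top["group"], top["left_chars"], top["right_chars"])
```

Bit 1 differs between the two hypothetical signatures but is dropped from the group because it already belongs to a lower-level group, which is the effect of the exclusion in step S432.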

The effective signature-based method for inexact information filtering of the present invention is now described in detail with reference to FIG. 20. This embodiment again uses data signatures P1 to P9 with a set bit number D of 10 and the query items {1,2,3,5,8} as an example. The query signature is denoted Sig(D, Qdoc) and its value is defined as (1110100100).
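As a small illustration of how the query signature above relates to the query items, the sketch below sets bit i for each item i, which is the mapping the example values suggest; treating it as a general rule is an assumption. The helper names `make_signature` and `is_qualified` are hypothetical, and the two data signatures in the usage lines are made up for illustration.

```python
def make_signature(items: set[int], num_bits: int) -> str:
    """Build a bit string with '1' at each 1-based item number (assumed mapping)."""
    bits = ["0"] * num_bits
    for item in items:
        bits[item - 1] = "1"
    return "".join(bits)


def is_qualified(data_sig: str, query_sig: str) -> bool:
    """A profile qualifies when every 1 bit of its signature is also 1 in the query."""
    return all(q == "1" for d, q in zip(data_sig, query_sig) if d == "1")


query_sig = make_signature({1, 2, 3, 5, 8}, 10)
print(query_sig)                              # 1110100100
print(is_qualified("1000100000", query_sig))  # True: bits 1 and 5 are in the query
print(is_qualified("0001000000", query_sig))  # False: bit 4 is not in the query
```

The containment test is the filtering condition stated in the claims: a data signature qualifies if it is contained in the query signature.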

After the multi-bit forensic tree is built, the query signature is compared against the information data from the top down, using the bit numbers of the identification bit number group of each level (comprising the first identification bit number and at least one second identification bit number) together with the identification characters of that level. If one or more of the corresponding identification characters is 1 but does not equal the character of the query signature at the corresponding bit number of the identification bit number group, the comparison does not proceed to the next level. Once the query signature has been compared against every level of the multi-bit forensic tree, the qualified information data are obtained.
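The pruning rule just stated can be written as a short check. This is a sketch under the reading that a branch is abandoned exactly when some identification character is 1 while the query signature holds 0 at the same bit number; the function names are hypothetical. The groups and identification characters in the usage lines are those quoted in the walkthrough of this embodiment.

```python
QUERY = "1110100100"  # query signature of this embodiment


def query_chars(query_sig: str, group: tuple[int, ...]) -> str:
    """Characters of the query signature at the 1-based bit numbers in group."""
    return "".join(query_sig[p - 1] for p in group)


def may_descend(query_sig: str, group: tuple[int, ...], ident_chars: str) -> bool:
    """True unless some identification character is 1 where the query holds 0."""
    q = query_chars(query_sig, group)
    return not any(c == "1" and b == "0" for c, b in zip(ident_chars, q))


print(query_chars(QUERY, (3, 8)))        # 11
print(may_descend(QUERY, (3, 8), "01"))  # True: descend
print(may_descend(QUERY, (1, 4), "01"))  # False: bit 4 is 0 in the query
print(may_descend(QUERY, (1, 4), "10"))  # True: descend
```

Because one call tests every bit number in the group, a single level of the multi-bit tree can rule out a branch that a single-bit tree would need several levels to reject.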

The process of comparing the query signature with the information data using the multi-bit forensic tree is detailed below. First, the characters of the query signature at the bit numbers of the highest-level identification bit number group (3,8) are (11). The corresponding left identification characters are (01); the second character, which is 1, is the same as the character at the corresponding position of the query signature, so the comparison proceeds to the next level. The corresponding right identification characters are (10); the first character, which is 1, is the same as the character at the corresponding position of the query signature, so the comparison proceeds to the next level.

Next, the characters of the query signature at the bit numbers of the second-level identification bit number group (1,4) (the highest level of the left part) are (10). The corresponding second-level left identification characters are (01); the second character, which is 1, differs from the character at the corresponding position of the query signature, so the comparison does not proceed to the next level. The second-level right identification characters are (10); the first character, which is 1, is the same as the character at the corresponding position of the query signature, so the comparison proceeds to the next level.

Next, the character of the query signature at the bit number of the second-level identification bit number group (4) (the highest level of the right part) is (0). The corresponding second-level left identification character is (0); it is not 1, so the comparison proceeds to the next level. The second-level right identification character is 1, which differs from the character at the corresponding position of the query signature, so the comparison does not proceed to the next level.

Next, the characters of the query signature at the bit numbers of the third-level identification bit number group (2,6,9) (the second level of the left part) are (100). The corresponding third-level left identification characters are (001); the third character, which is 1, differs from the character at the corresponding position of the query signature, so the comparison does not proceed to the next level. The third-level right identification characters are (110); the first and second characters are 1, and the second differs from the character at the corresponding position of the query signature, so the comparison does not proceed to the next level.

Next, the characters of the query signature at the bit numbers of the third-level identification bit number group (2,5,6) (the second level of the right part) are (110). The corresponding third-level left identification characters are (011); the third character, which is 1, differs from the character at the corresponding position of the query signature, so the comparison does not proceed to the next level. The third-level right identification characters are (100); the first character, which is 1, is the same as the character at the corresponding position of the query signature, so the comparison proceeds to the next level.

Finally, the characters of the query signature at the bit numbers of the fourth-level identification bit number group (9,10) (the third level of the right part and the last level in this embodiment) are (00). The corresponding fourth-level left identification characters are (00); neither is 1, so the comparison moves left to the single corresponding information data P9 (with the filtering method of the prior art, the final result would instead be a plurality of qualified information data). The fourth-level right identification characters are (11); the first and second characters, which are 1, differ from the characters at the corresponding positions of the query signature, so the comparison does not proceed to the next level.

With the multi-bit forensic tree of the present invention, several bits can be compared in a single comparison at one level. This not only speeds up the comparison of the query signature with the information data but also reduces the number of qualified information data remaining after filtering, so the method offers better filtering precision and efficiency.

The effective signature-based method for inexact information filtering of the present invention does not partition the data set (the information data) locally. Instead, it uses a signature-based strategy that partitions globally, according to the number of occurrences of each item (each information data) across the entire data set. The data set is divided into two sub-clusters (the first information data group and the second information data group), from which the forensic tree is built. User signatures are stored under the index of the resulting forensic tree (ID-tree), and the prebuilt index is then used to process large volumes of user data quickly. Because it is an efficient indexing and query method, it reduces the time needed for information filtering. For example, when filtering data transmitted over the Internet, an index built in advance over the users' signatures allows all non-matching users to be filtered out quickly and directly.
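The global partitioning described above, and recited as steps (a) to (c) of the claims, can be sketched as a recursive split. This is an illustrative reading rather than the patented implementation: the `Node` class, the function names, the tie-break toward the lowest bit number, and the four 4-bit signatures are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    bit: Optional[int] = None      # 1-based identification bit number; None at a leaf
    ones: Optional["Node"] = None  # profiles whose signature holds 1 at this bit
    zeros: Optional["Node"] = None # profiles whose signature holds 0 at this bit
    profiles: tuple = ()           # profile names kept at a leaf


def pick_bit(sigs: dict[str, str], candidates: list[int]) -> int:
    """Choose the bit whose count of 1s is closest to half the signatures."""
    half = len(sigs) / 2
    def distance(bit: int) -> float:
        return abs(sum(s[bit - 1] == "1" for s in sigs.values()) - half)
    # Ties go to the lowest bit number (an assumption; any tied bit is allowed).
    return min(candidates, key=lambda bit: (distance(bit), bit))


def build(sigs: dict[str, str], candidates: Optional[list[int]] = None) -> Node:
    """Split the profiles recursively until each group holds a single profile."""
    if candidates is None:
        width = len(next(iter(sigs.values())))
        candidates = list(range(1, width + 1))
    if len(sigs) <= 1 or not candidates:
        return Node(profiles=tuple(sorted(sigs)))
    bit = pick_bit(sigs, candidates)
    rest = [b for b in candidates if b != bit]
    ones = {k: v for k, v in sigs.items() if v[bit - 1] == "1"}
    zeros = {k: v for k, v in sigs.items() if v[bit - 1] == "0"}
    return Node(bit=bit, ones=build(ones, rest), zeros=build(zeros, rest))


# Hypothetical 4-bit signatures, not the P1 to P9 of the embodiment.
root = build({"A": "1100", "B": "1010", "C": "0110", "D": "0001"})
print(root.bit, root.ones.bit, root.zeros.bit)
```

Choosing the bit closest to the half-way count keeps the two groups near equal size at every level, which is what makes the resulting tree shallow.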

In practice, the method of the present invention is applicable to Internet databases. A database website on the World Wide Web (WWW) can use this information filtering technique to push information to interested users, and operators studying media multicast can use it to reduce the time needed to identify the recipients of their media transmissions.

The above embodiments merely illustrate the principles and effects of the present invention and do not limit it. Modifications and variations made to the embodiments by those skilled in the art therefore remain within the spirit of the invention. The scope of rights of the present invention shall be as set out in the claims below.

51: initial indicator module

52: identification unit indicator module

53, 54: lower-level identification units

55: identification unit indicator module

56: identification unit indicator module

57: upper-level identification unit

511: initial identification signature record unit

512: initial joint indicator unit

521: identification bit number indicator unit

522: joint indicator unit

523: identification signature record unit

524: identification bit comparison record unit

551: identification bit number indicator unit

552: joint indicator unit

553: identification signature record unit

554: identification bit comparison record unit

561: identification bit number indicator unit

562: joint indicator unit

563: identification signature record unit

564: identification bit comparison record unit

FIG. 1 is a schematic diagram of a conventional comparison of user signatures against a data index;

FIG. 2 shows the relationship between the data signature of an information data and the query signature of the query items according to the present invention;

FIG. 3 is a flowchart of the effective signature-based method for inexact information filtering of the present invention;

FIGS. 4 to 8 are schematic diagrams of the process of building a forensic tree according to the present invention;

FIG. 9 is a flowchart of building a multi-bit forensic tree from the forensic tree according to the present invention;

FIGS. 10 to 20 are schematic diagrams of the process of building the multi-bit forensic tree according to the present invention; and

FIG. 21 is a flowchart of calculating the identification bit number group and identification characters of the highest-level identification unit of the multi-bit forensic tree according to the present invention.

(No reference numeral description)

Claims (8)

1. An effective signature-based method for inexact information filtering, for filtering out, from a plurality of information data (profiles), those qualified for a user's query items, wherein each information data has a data signature, the data signature has a set number of bits, the bits of the data signature have corresponding bit numbers, each bit holds one of two binary characters, a first character or a second character, the query items have a query signature, and an information data satisfies the filtering condition of the query items and is qualified if its data signature is contained in the query signature, the method comprising the following steps:
(a) calculating a first identification bit number according to the number of data signatures in which the first character appears at each bit number and a median value of the number of data signatures, and recording the first identification bit number in an identification unit, wherein the median value is half the number of data signatures and the identification unit is a root node of a forensic tree (ID-tree);
(b) dividing the information data, according to the first identification bit number, into a first information data group and a second information data group, wherein the bit at the first identification bit number is the first character in the data signature of each information data in the first information data group and is the second character in the data signature of each information data in the second information data group;
(c) repeating steps (a) and (b) on the bits other than the first identification bit number to calculate a plurality of lower-level new first identification bit numbers and record them in corresponding identification units, the corresponding identification units being child nodes of the forensic tree, and dividing the corresponding lower-level first information data group and second information data group according to the new first identification bit number of the upper level, until each divided first information data group and second information data group includes only one information data, the forensic tree being built from the corresponding identification units; and
(d) calculating the qualified information data according to the forensic tree and the query signature.

2. The method of claim 1, wherein the first character is 1 and the second character is 0.

3. The method of claim 1, wherein step (a) comprises the following steps:
(a1) calculating a plurality of differences between the number of data signatures in which the first character appears at each bit number and the median value; and
(a2) defining the bit number corresponding to the smallest of the differences as the first identification bit number.

4. The method of claim 3, wherein in step (a2), if a plurality of the differences share the minimum value, the bit number corresponding to one of them is defined as the first identification bit number.

5. The method of claim 1, wherein the query signature has the set number of bits, the bits of the query signature have bit numbers corresponding to those of the bits of the data signature, each bit of the query signature holds the first character or the second character, and in step (d) the bits of the query signature are compared according to the first identification bit number of each level of the forensic tree to calculate the qualified information data.

6. The method of claim 1, wherein a left part and a right part are defined according to the highest-level identification unit of the forensic tree, and a plurality of initial indicator modules and a plurality of identification unit indicator modules are defined, each initial indicator module including an initial identification signature record unit, each identification unit indicator module including an identification bit number indicator unit, a joint indicator unit, an identification signature record unit, and two identification bit comparison record units, wherein the data signature of each lowest-level information data in the forensic tree is recorded in the corresponding initial identification signature record unit, the method further comprising, after step (c), the following steps:
(c1) performing an exclusive-OR (XOR) operation on the two lowest-level identification signatures of the left part to calculate an identification bit number group, and recording it in the corresponding identification unit and in the identification bit number indicator unit of the corresponding identification unit indicator module, wherein the identification bit number group of that level includes the first identification bit number and at least one second identification bit number;
(c2) uniting the identification bit number group with the value of the joint indicator unit to form a new identification bit number group, and recording it in the joint indicator unit of the corresponding identification unit indicator module;
(c3) performing an AND operation on the two lowest-level initial identification signatures of the left part to calculate an identification signature, and recording it in the identification signature record unit of the corresponding identification unit indicator module;
(c4) defining the characters of the two initial data signatures at the positions of the first identification bit number and the second identification bit number as identification characters, and recording them in the two identification bit comparison record units of the corresponding identification unit indicator module;
(c5) repeating steps (c1) to (c4) to calculate the new identification bit number groups, identification signatures, and identification characters of the other units of the same level in the left part, and recording them, respectively, in the corresponding identification unit and in the identification bit number indicator unit, joint indicator unit, identification signature record unit, and two identification bit comparison record units of the corresponding identification unit indicator module;
(c6) uniting the new identification bit number groups of the two identification units of the same lower level in the left part to form a union identification bit number group, and recording it in the joint indicator unit of the corresponding identification unit indicator module;
(c7) performing an XOR operation on the two lower-level identification signatures that belong to the same upper-level identification unit in the left part, and excluding, from the set of bit numbers at which the first character appears in the XOR result, the values of the lower-level union identification bit number groups, to form an upper-level identification bit number group, and recording it in the corresponding identification unit and in the identification bit number indicator unit of the corresponding identification unit indicator module;
(c8) uniting the values of the union identification bit number group and the upper-level identification bit number group to form an upper-level union identification bit number group, and recording it in the joint indicator unit of the corresponding identification unit indicator module;
(c9) performing an AND operation on the two lower-level identification signatures that belong to the same upper-level identification unit in the left part to calculate an upper-level identification signature, and recording it in the identification signature record unit of the corresponding identification unit indicator module;
(c10) defining the characters of the two lower-level identification signatures at the bit-number positions of the upper-level identification bit number group as upper-level identification characters, and recording them in the upper-level identification bit comparison record units of the corresponding identification unit indicator module;
(c11) repeating steps (c6) to (c10) to calculate, for the upper-level identification units of the other pairs of lower-level identification signatures in the left part, the union identification bit number groups, upper-level union identification bit number groups, upper-level identification signatures, and upper-level identification characters, and recording them, respectively, in the corresponding identification unit and in the joint indicator unit, identification bit number indicator unit, identification signature record unit, and two identification bit comparison record units of the corresponding identification unit indicator module;
(c12) calculating, in the manner of steps (c1) to (c11), the identification bit number groups, union identification bit number groups, identification signatures, and identification characters of each level of the right part, and recording them, respectively, in the identification unit of each corresponding level and in the identification bit number indicator unit, joint indicator unit, identification signature record unit, and two identification bit comparison record units of the corresponding identification unit indicator module; and
(c13) calculating and recording the highest-level identification bit number group and identification characters of the forensic tree according to the highest-level identification bit number groups, union identification bit number groups, and identification signatures of the left part and the right part.

7. The method of claim 6, wherein step (c13) comprises the following steps:
(c131) uniting the highest-level identification bit number groups of the left part and the right part to form a union identification bit number group, and recording it in the joint indicator unit of the corresponding identification unit indicator module;
(c132) performing an XOR operation on the highest-level identification signatures of the left part and the right part, and excluding, from the set of bit numbers at which the first character appears in the XOR result, the values of the highest-level union identification bit number groups of the left part and the right part, to form the identification bit number group, and recording it in the identification bit number indicator unit of the corresponding identification unit indicator module;
(c133) uniting the values of the highest-level union identification bit number groups of the left part and the right part with the values of the identification bit number group to form the highest-level union identification bit number group, and recording it in the joint indicator unit of the corresponding highest-level identification unit indicator module; and
(c134) defining the characters of the highest-level identification signatures of the left part and the right part, taken at the bit-number positions in the highest-level identification bit number group of the left part and the right part, as the highest-level identification characters, and recording them in the identification bit comparison record units of the corresponding identification unit indicator module.

8. The method of claim 6, wherein in step (d) the query signature is compared against the forensic tree according to the values of the identification bit number group and the identification characters of each level of the forensic tree, to filter out the qualified information data.
TW98115183A 2009-05-07 2009-05-07 Efficient signature-based strategy for inexact information filtering TWI398780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW98115183A TWI398780B (en) 2009-05-07 2009-05-07 Efficient signature-based strategy for inexact information filtering

Publications (2)

Publication Number Publication Date
TW201040742A TW201040742A (en) 2010-11-16
TWI398780B true TWI398780B (en) 2013-06-11

Family

ID=44996059

Country Status (1)

Country Link
TW (1) TWI398780B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020157116A1 (en) * 2000-07-28 2002-10-24 Koninklijke Philips Electronics N.V. Context and content based information processing for multimedia segmentation and indexing
TW200525941A (en) * 2003-12-12 2005-08-01 Ibm Methods, apparatus and computer programs for enhanced access to resources within a network
US7328216B2 (en) * 2000-07-26 2008-02-05 Recommind Inc. System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees