WO2014177181A1

WO2014177181A1 - A method of processing a ratings dataset

Info

Publication number: WO2014177181A1
Application number: PCT/EP2013/058931
Authority: WO
Inventors: Ihab Francis Ilyas Kaldas; Sihem Amer-Yahia; Anup K. CHALAMALLA
Original assignee: Qatar Foundation; Hoarton, Lloyd
Priority date: 2013-04-29
Filing date: 2013-04-29
Publication date: 2014-11-06
Also published as: WO2014177181A9

Abstract

Collaborative rating systems have evolved as important tools for users in dealing with information overload while making decisions pertaining to content hosted on the Web. Such systems allow users to evaluate content in the form of ratings. For example, websites such as yelp.com, imdb.com and amazon.com allow users to express their preferences by rating content-items. An interesting type of pattern in such systems is 'who' rated 'what' and 'how'. A data mining system known as PromPt is disclosed for exploring patterns in ratings given by users to items. A new type of association paradigm called promotional pattern is introduced. Promotional patterns are summarized descriptions of ratings given by a subset of users to a subset of items in the system, and the goal is to mine interesting patterns. Such functionality is demonstrated as being useful in a wide variety of real application scenarios such as business intelligence in promotion and advertising.

Description

A Method of Processing a Ratings Dataset

Description of Invention

The present invention relates to a method of processing a ratings dataset, and more particularly relates to a method of mining a ratings dataset for promotional patterns.

Collaborative rating systems for products, movies, businesses, and news have proliferated rapidly on the Web. Websites such as yelp, imdb, amazon and news broadcasting sites such as digg provide a platform for users to evaluate the content-items hosted on these sites by rating them, thereby helping other users in the system make informed decisions pertaining to content of their interest. Such a collaborative rating system has a large number of users, items, and a very large number of ratings between them. An interesting type of pattern in such systems is 'who' rated 'what' and 'how'.

Users rate products on sites such as amazon.com and movies on imdb.com; each user rates many items (movies/products) and each item is rated by many users. Examples of interesting patterns in such rating systems are given below.

PATTERN EXAMPLE 1. 80% of ratings for cold weather accessories as rated by female users between age 18-25 are 5/5.

PATTERN EXAMPLE 2. The average of ratings for James Cameron's action movies given by male students between age 25-35 is greater than 8/10.

Patterns of this kind may be known as promotional patterns. A promotional pattern is a summarized description of ratings between a subset of users and a subset of items in the system satisfying certain prespecified constraints on the ratings between them and the sets themselves. A subset of users is denoted by a userset and a subset of items is denoted by an itemset. There are limitations with conventional techniques for mining a collaborative rating system for such patterns in ratings between usersets and itemsets. Nevertheless, such patterns are a direct indication of how cross-sections of users have historically rated various categories of items, and hence offer a rich source of business intelligence in promotion and advertising.

From a micro-economic perspective, promoting specific types of products to specific communities of customers is a low cost, profit-driven marketing strategy. Retailers often design such promotions, e.g., 50% discount on cold weather accessories to women. More recently, it is known to promote individual objects through ranking in appropriate communities identified using a multi-dimensional customer database.

One of the most well studied mining tasks on such datasets is association rule mining. Often, it is formulated as a market basket problem, in which the set of items rated by a user is organized into a transaction and the dataset comprises the transactions of all users. The goal of this task is to compute significant associations between two disjoint sets of items, A and B, satisfying a given support and confidence. Interestingness of a rule is specified as a query involving constraints on the composition of sets A, B and the association between them. Constraints involving either set A or B are called single-variable constraints, e.g., multi-dimensional constraints such as A.att = val, aggregation constraints such as sum(B.item.price) > $1000. Efficient algorithms based on frequent itemset mining have been developed by exploiting the monotonicity properties of the constraints on the itemset lattice. Here, the focus is primarily on finding sets of items co-occurring frequently. Information about preferences of cross-sections of users towards sets of items is not available. A constrained association query focuses on a subset of the transaction database from which candidate itemsets A and B are mined efficiently. However, in promotional patterns user and item are two primary entities. Mining them requires evaluating the ratings between all candidate pairs of usersets and itemsets to discover significant rating patterns among them. Consider a ratings dataset assuming a binary rating model, for instance as shown in Fig. 1 (a). Representing this data by the transaction model (Fig. 1 (b) and Fig. 1 (c)), a promotional pattern query triggers an exponential number of constrained association mining queries, each corresponding to a subset of the transaction database (equivalent to a userset). Hence, a naive approach is prohibitively expensive.

Recommender systems seek to predict the 'rating' that a user would give to an item they had not yet considered, using models built from the item content (content-based systems) or from items rated by users similar to the given user (collaborative filtering), or a combination of both (hybrid systems). In the process, recommendation techniques compute some significant associations as pairs of usersets and itemsets with coherent ratings between them. However, they do not offer the flexibility needed for exploratory mining of promotional patterns. More recently, a system known as Flexrecs is proposed to provide users the flexibility to filter recommendations provided by the system based on certain criteria. However, such systems are designed for answering constrained personalized recommendation queries of each user and are not suitable for mining promotional patterns.

The present invention seeks to provide an improved method of processing a ratings dataset. One aspect of the present invention provides a method of processing a ratings dataset, the ratings dataset incorporating data identifying a plurality of users U, a plurality of items I and a set of ratings R allocated by the users U to the items I, the method comprising:

defining a subset of users U in the dataset as a userset Q^U,

defining a subset of items I in the dataset as an itemset Q^I,

receiving at least one rating constraint specifying at least one constraint on the set of ratings between the userset Q^U and the itemset Q^I,

inputting each rating constraint into a ratings summary function g(R[Q^U, Q^I]) θ δ to define a function of ratings between the userset Q^U and the itemset Q^I, where θ is selected from one of =, >, <, ≥, ≤ and δ ∈ ℝ,

projecting the ratings summary function separately as H_U(Q^U, R) on the space of all usersets and H_I(Q^I, R) on the space of all itemsets to identify pairs of usersets and itemsets (Q^U, Q^I) that have a score for the ratings summary function that is greater than δ, and

analysing the identified pairs of usersets and itemsets to find patterns in the ratings dataset according to each rating constraint.

Preferably, the method further comprises a rank-join method to identify pairs of usersets and itemsets, the rank-join method comprising:

calculating a score H_U(Q^U, R) for all usersets in the space,

sorting a list of usersets by the calculated scores,

calculating a score H_I(Q^I, R) for all itemsets belonging to the space of itemsets,

sorting a list of itemsets by the calculated scores,

merging the list of usersets with the list of itemsets in their sorted orders and ranking them.

Conveniently, the ratings summary function is the ratings count function $g(R[Q^U, Q^I]) = \sum_{u \in Q^U} \sum_{i \in Q^I} I(u,i)$, where I is an indicator function with value 1 if u has rated i, and 0 otherwise.

Advantageously, the ratings summary function is the ratings sum function $g(R[Q^U, Q^I]) = \sum_{u \in Q^U} \sum_{i \in Q^I} R(u,i)$.

Preferably, the ratings dataset is a binary ratings model where L(u_i) denotes the set of items rated 1 by user u_i and $L(Q^U) = \bigcup_{u_i \in Q^U} L(u_i)$, and wherein the ratings summary function is the ratings cover function g(R[Q^U, Q^I]) = [formula not legible in the source].

Conveniently, the ratings summary function is the ratings density function $g(R[Q^U, Q^I]) = \frac{\sum_{u \in Q^U} \sum_{i \in Q^I} R(u,i)}{|Q^U|\,|Q^I|}$.

Advantageously, t = (Q^U, Q^I, R[Q^U, Q^I]) is a pattern and d_t is the ratings density of the pattern t, and wherein the ratings summary function is the ratings variance function $g(R[Q^U, Q^I]) = \frac{\sum_{u \in Q^U} \sum_{i \in Q^I} (R(u,i) - d_t)^2}{|Q^U|\,|Q^I|}$.

Preferably, the ratings dataset is a binary ratings model where L(u_i) denotes the set of items rated 1 by user u_i and $L(Q^U) = \bigcup_{u_i \in Q^U} L(u_i)$, and wherein the ratings summary function is the average ratings cover function [formula not legible in the source].

Conveniently, the ratings summary function g is the entropy of the ratings distribution {R(u,i) | u ∈ Q^U, i ∈ Q^I}.

Advantageously, the ratings dataset incorporating data identifying a plurality of users U, a plurality of items I and a set of ratings R allocated by the users U to the items I, the method comprising:

storing the ratings dataset as a matrix with the users and the items in respective rows and columns; and

processing the matrix using a biclustering algorithm to detect biclusters of subsets of the rows and columns that exhibit a high similarity score.

Preferably, the biclustering algorithm comprises a mean square residue (MSR) function.

Conveniently, the biclustering algorithm comprises a Delta biclustering algorithm.
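The similarity score used in this kind of biclustering can be illustrated with a short sketch. The code below assumes the usual definition of mean squared residue from delta-biclustering (each cell's residue is its value minus the row mean and column mean of the bicluster, plus the bicluster mean); the function name `mean_squared_residue` and the toy matrix are hypothetical and are not taken from the specification.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols."""
    n_r, n_c = len(rows), len(cols)
    row_mean = {r: sum(matrix[r][c] for c in cols) / n_c for r in rows}
    col_mean = {c: sum(matrix[r][c] for r in rows) / n_r for c in cols}
    all_mean = sum(row_mean.values()) / n_r
    total = 0.0
    for r in rows:
        for c in cols:
            residue = matrix[r][c] - row_mean[r] - col_mean[c] + all_mean
            total += residue ** 2
    return total / (n_r * n_c)


# Toy user-by-item ratings matrix (hypothetical data).
ratings = [
    [1, 2, 3, 9],
    [2, 3, 4, 0],
    [3, 4, 5, 7],
]

# The first three columns follow a purely additive pattern, so the residue is 0.
coherent = mean_squared_residue(ratings, rows=[0, 1, 2], cols=[0, 1, 2])
# Adding the fourth column breaks the coherence and raises the residue.
noisy = mean_squared_residue(ratings, rows=[0, 1, 2], cols=[0, 1, 2, 3])
```

A bicluster search would keep submatrices whose residue stays below a chosen threshold; the search strategy itself is not shown here.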

In order that the invention may be more readily understood, and so that further features thereof may be appreciated, embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings in which:

Figures 1 (a-c) show an example of a transaction data model for encoding binary ratings,

Figure 2 is a table of projections of ratings summary functions,

Figure 3 is a schematic diagram representing a rank-join method,

Figure 4 is a schematic representation of the bottom up computation of the iceberg cube algorithm to compute user and item data cubes for four user attributes,

Figure 5 is a schematic representation showing the rank join of cuboids,

Figure 6 is a schematic representation illustrating computing tree bounds,

Figure 7 is a matrix of binary ratings for users and items to illustrate biclustering,

Figure 8 is a table showing dataset size,

Figures 9 (a-d) show a comparison of the overall running time for ratings summary functions for the DCRJN and rank join algorithms,

Figures 10 (a-d) show a comparison of the enumeration time for ratings summary functions for the DCRJN and rank join algorithms,

Figures 11 (a-d) show a comparison of the aggregation time for ratings summary functions for the DCRJN and rank join algorithms,

Figures 12 (a-d) show a comparison of the sorting and rank join time for ratings summary functions for the DCRJN and rank join algorithms,

Figure 13 is a table showing sample query results,

Figures 14 (a-b) show the overall running time for ratings summary functions,

Figures 15 (a-d) show a comparison of the threshold experiment for ratings summary functions for the DCRJN and rank join algorithms,

Figures 16 (a-d) show the results of the biclustering efficiency experiments,

Figure 17 is a table showing mean residue values, and

Figures 18 (a-d) show the biclustering efficiency for different sized clusters.

An embodiment of the invention, known as PromPt, seeks to provide a method and system for querying and efficiently mining interesting promotional patterns. The notion of constrained promotional pattern queries is introduced as a means to specify interestingness of patterns using constraints on usersets, itemsets, ratings between them, and patterns themselves. Denoting a subset of users in a ratings system (e.g., imdb) by Q^U and a subset of items by Q^I, examples of queries on this ratings system include:

I. Find patterns (Q^U, Q^I) such that the count of ratings given by users in Q^U to movies in Q^I is greater than 10. In general, other measures such as percentage, sum, avg, and variance of ratings can be defined for a threshold δ.

II. Find patterns (Q^U, Q^I) such that Q^U comprises users in the age group 25-35 and Q^I comprises only movies directed by James Cameron.

III. Find a set of k patterns {(Q^U, Q^I)} such that usersets from no two patterns overlap by more than β1 and itemsets from no two patterns overlap by more than β2. In this example, we are interested in a holistic constraint such as diversity in the composition of usersets and itemsets of the patterns mined.

Conventional approaches have severe limitations with regard to constrained promotional pattern mining. In constrained association mining, certain constraints are characterized by their monotonicity property on the itemsets, thus enabling efficient algorithms to be developed. Constraints on promotional patterns are not necessarily monotonic on the usersets or itemsets. As an example, the constraint that the count of ratings given by users in Q^U to movies in Q^I be greater than a threshold δ is not monotonic in the transaction data model. Consider the transaction model representation (Fig. 1 (b)). For an itemset X, the set of transactions (userset) it is contained in is denoted by T(X). One is interested in |T(X)|·|X| > δ. Let X1 and X2 be two itemsets such that X1 ⊂ X2; then |X1| < |X2| and |T(X1)| ≥ |T(X2)|, so the product |T(X)|·|X| may either grow or shrink as items are added. Clearly, such constraints induce technical challenges that cannot be easily tackled using conventional techniques for constrained association mining.
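The lack of monotonicity can be checked on a small example. The sketch below uses a hypothetical binary transaction database (the user and item names are invented) and a hypothetical helper `area` that evaluates |T(X)|·|X| for an itemset X.

```python
def area(transactions, itemset):
    """|T(X)| * |X|: number of supporting transactions times itemset size."""
    support = sum(1 for items in transactions.values() if itemset <= items)
    return support * len(itemset)


# Hypothetical transactions: each user maps to the set of items rated 1.
transactions = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "c"},
    "u3": {"a"},
    "u4": {"a"},
    "u5": {"a"},
    "u6": {"a"},
    "u7": {"a"},
}

# Scores along the chain {a} ⊂ {a, b} ⊂ {a, b, c}.
chain = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
scores = [area(transactions, x) for x in chain]
```

Along this chain the score first falls and then rises, so neither a monotone nor an anti-monotone pruning rule on the itemset lattice applies to a threshold on it.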

Furthermore, the expressiveness of the languages used to specify constrained association queries is insufficient to express constraints on promotional patterns. Constraints on usersets and itemsets can be specified and handled similarly to approaches developed for single-variable constraints. However, rating constraints and holistic constraints such as diversity are inexpressible. Constraints are discussed in detail below in Section 2. Finally, the transaction data model in Fig. 1 is a convenient encoding for only binary ratings (0/1), where a user has either rated or not rated an item. However, for numerical ratings, such as an item rated on a scale of 1 to 5, extending the transaction model representation leads to space explosion.

While promotional pattern mining is challenging, two approaches in embodiments of the invention have been developed to answer queries involving constraints on ratings - schema-driven and ratings-driven mining, and integrate greedy algorithms for holistic pattern constraints such as diversity. Two scenarios are possible in applications related to promotion. It is sometimes useful to obtain a description of the userset and itemset of a pattern. In Pattern Example 2 above, the itemset is defined by items satisfying the query {director='James Cameron' ∧ genre='Action'} on a multi-dimensional item database, and the userset by {gender='male' ∧ occupation='student' ∧ 25 < age < 35}, which is a query on the user database. In schema-driven pattern mining, the solution space of promotional patterns is confined to usersets and itemsets constructed from group-bys of a multi-dimensional data cube. In ratings-driven pattern mining, pairs of ad-hoc usersets and itemsets with ratings are discovered directly from the data.

For schema-driven pattern mining, space pruning algorithms in embodiments of the invention were developed by exploiting the monotonicity properties of the rating constraints vis-a-vis the user and item data cubes. Important aspects of the space pruning algorithms are as follows. A given rating constraint (e.g., count of ratings > δ) is projected onto the respective spaces of usersets and itemsets, and pairs of usersets and itemsets are enumerated in the order of their likelihood to satisfy the given constraint. Upper bounding techniques are used to evaluate the likelihood that patterns satisfying the given constraint can be constructed from a userset or an itemset. A technique similar to rank join is then used between the space of usersets and the space of itemsets to prune "unpromising" pairs. The tree structure of the user and item data cubes is exploited for sharing computation between sets (e.g., sharing aggregation for measures such as count, sum).

For ratings-driven pattern mining, biclustering techniques are employed to directly discover patterns in the data. The ratings dataset is represented as a matrix and is input to a biclustering algorithm which outputs biclusters of users and items between which the ratings satisfy a given constraint. A bicluster corresponds to a pair of userset and itemset. Not all rating constraints can be handled using biclustering techniques. However, the schema-driven and ratings-driven pattern mining approaches together cover a substantial number of constraints in embodiments of the invention. Finally, greedy algorithms are integrated into the system for holistic pattern constraints.

1. Summary

In summary, the development of embodiments of the present invention resulted in the following:

1. A new association paradigm called promotional pattern between a subset of users and a subset of items based on the ratings given by the users to items in a collaborative rating system.

2. The notion of constrained promotional pattern queries and study of different types of constraints involving usersets, itemsets, ratings and patterns. The monotonicity properties of rating constraints are critical to pruning algorithms for promotional pattern mining (Section 2.1).

3. A suite of algorithms:

a) Space pruning algorithms which take advantage of the monotonicity properties of rating constraints vis-a-vis the lattices of usersets and itemsets. The algorithms use a schema-driven definition of usersets and itemsets.

b) Ratings-driven mining algorithms which use biclustering models to mine coherent rating patterns between usersets and itemsets.

c) Greedy algorithms for holistic pattern constraints (Section 3).

2. DATA MODEL

The data model consists of users, items, and the ratings between them. Users are considered in a d_U-dimensional space U = {A_1, A_2, ..., A_{d_U}}, with the domain of each attribute A_j defined as dom(A_j). Items are considered in a d_I-dimensional space I = {B_1, B_2, ..., B_{d_I}}, with the domain of each attribute B_k defined as dom(B_k). Examples of databases U and I in a movie ratings dataset are shown in Tables 1 and 2 below.

Table 1: An example of user database

Table 2: An example of item database

The function R : U × I → V, where V ⊆ ℝ, assigns a unique rating value for a pair of user and item, e.g., R(u, i) = 4. Examples of V include {0, 1}, {1, 2, 3, 4, 5}, [-1, 1]. A userset is denoted by Q^U and an itemset is denoted by Q^I.

2.1 Constrained Promotional Pattern Queries

Constrained promotional pattern mining is a means to discover interesting rating patterns between usersets and itemsets. The end-user of the system specifies the data to be mined which includes the user data, item data and ratings. Additionally, the user needs to specify the promotional patterns he is interested in through a set of constraints.

DEFINITION 1. Constrained Promotional Pattern Query.

Given U , I , R, and a set of constraints C, the result of a constrained promotional pattern query is a set of triples satisfying C and of the form:

{(Q^U, Q^I, R[Q^U, Q^I])},

where R[Q^U, Q^I] denotes the set of ratings R : Q^U × Q^I → V between a userset Q^U and an itemset Q^I. Three fundamental classes of constraints for promotional pattern queries are discussed below.

• Set Constraints confine the composition of the userset Q^U and itemset Q^I between which ratings patterns are evaluated

• Rating Constraints specify constraints that the ratings between a userset Q^U and an itemset Q^I need to satisfy

• Holistic Constraints are defined on a set of patterns that are together considered to be interesting to the end-user of our system

A single set constraint is denoted by c^s, a rating constraint is denoted by c^r and a holistic constraint is denoted by c^h. Similarly, a group of set constraints is denoted by C^s, rating constraints are denoted by C^r and holistic constraints are denoted by C^h.
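The data model and the triple returned by a query can be sketched as follows. This is only an illustrative encoding under stated assumptions: the dictionaries `users`, `items` and `R`, the `Pattern` class and the helper `make_pattern` are hypothetical names, and the attribute values are invented sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A triple (Q^U, Q^I, R[Q^U, Q^I])."""
    userset: frozenset
    itemset: frozenset
    ratings: tuple  # sorted (user, item, value) entries between the two sets


# Hypothetical user database U, item database I and rating function R.
users = {
    "u1": {"gender": "male", "age": 27},
    "u2": {"gender": "female", "age": 22},
}
items = {
    "i1": {"director": "James Cameron", "year": 1997},
    "i2": {"director": "other", "year": 1996},
}
R = {("u1", "i1"): 5, ("u2", "i1"): 4, ("u2", "i2"): 3}


def make_pattern(userset, itemset):
    """Collect the ratings given by users in userset to items in itemset."""
    between = tuple(sorted(
        (u, i, value) for (u, i), value in R.items()
        if u in userset and i in itemset
    ))
    return Pattern(frozenset(userset), frozenset(itemset), between)


pattern = make_pattern({"u1", "u2"}, {"i1"})
```

Set constraints would then be predicates on `pattern.userset` and `pattern.itemset`, rating constraints predicates on `pattern.ratings`, and holistic constraints predicates on a collection of such patterns.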

2.1.1 Set Constraints

Let p be a generic property of the sets. A set constraint is of the form S.p θ δ where S is a userset or itemset, θ is one of the operators =, ≠, >, <, ≤, ≥, ⊂, ⊆ and δ is a real value or Boolean depending on the operator. The property p ranges over several different types of definitions, e.g., support of sets, aggregate value on an attribute of set objects, multi-dimensional variables, etc. Let Q^U and Q^I be a userset and an itemset respectively. Some examples of set constraints are:

• Q^U such that |Q^U| > 10, Q^I such that |Q^I| ≥ 5

• Q^U satisfies a multi-dimensional constraint, u.zipcode = 70011 ∀u ∈ Q^U

• Q^I such that agg(i.price) > $100 for i ∈ Q^I and agg is one of sum, count, min, max, avg

2.1.2 Rating Constraints

Rating constraints specify constraints on the set of ratings between a userset Q^U and an itemset Q^I. The Ratings Summary Function (RSF) is defined first, and then examples of RSFs and of constraints based on them are listed.

DEFINITION 2. Ratings Summary Function, denoted by g, is a function of ratings between a set of users Q^U and a set of items Q^I, g : {(Q^U, Q^I, R[Q^U, Q^I])} → ℝ, simply denoted as g(R[Q^U, Q^I]).

Rating constraints are expressed on an RSF, e.g., g(R[Q^U, Q^I]) θ δ, where θ can be one of =, >, <, ≥, ≤ and δ ∈ ℝ. Some examples of g based on aggregation of ratings are listed below.

• [Ratings Count] $g(R[Q^U, Q^I]) = \sum_{u \in Q^U} \sum_{i \in Q^I} I(u,i)$, where I is an indicator function with value 1 if u has rated i, and 0 otherwise.

• [Ratings Sum] $g(R[Q^U, Q^I]) = \sum_{u \in Q^U} \sum_{i \in Q^I} R(u,i)$

• [Ratings Cover] Assuming a binary ratings model of V, let L(u_i) denote the set of items rated 1 by user u_i; then $L(Q^U) = \bigcup_{u_i \in Q^U} L(u_i)$. Define g as [formula not legible in the source].

• [Ratings Density] $g(R[Q^U, Q^I]) = \frac{\sum_{u \in Q^U} \sum_{i \in Q^I} R(u,i)}{|Q^U|\,|Q^I|}$

• [Ratings Variance] Let t = (Q^U, Q^I, R[Q^U, Q^I]) be a pattern. Let d_t denote the ratings density of the pattern t. Then ratings variance is defined as $g(R[Q^U, Q^I]) = \frac{\sum_{u \in Q^U} \sum_{i \in Q^I} (R(u,i) - d_t)^2}{|Q^U|\,|Q^I|}$

• [Average Ratings Cover] Let L(Q^U) be as defined above. Define g as [formula not legible in the source].

• [Entropy] g is the entropy of the ratings distribution {R(u,i) | u ∈ Q^U, i ∈ Q^I}.
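The aggregate RSFs listed above can be sketched in a few lines. The sketch makes several assumptions that the text does not settle: ratings are stored in a dictionary keyed by (user, item); density is taken as the ratings sum divided by |Q^U|·|Q^I|; unrated pairs contribute 0 to the sum, density and variance; and entropy is computed in base 2 over the values of the rated pairs only. All function names are hypothetical.

```python
import math
from collections import Counter


def rated_values(R, users, items):
    """Values of the ratings that exist between the userset and the itemset."""
    return [R[(u, i)] for u in users for i in items if (u, i) in R]


def ratings_count(R, users, items):
    return len(rated_values(R, users, items))


def ratings_sum(R, users, items):
    return sum(rated_values(R, users, items))


def ratings_density(R, users, items):
    return ratings_sum(R, users, items) / (len(users) * len(items))


def ratings_variance(R, users, items):
    d_t = ratings_density(R, users, items)
    # Assumption: an unrated pair is treated as a rating of 0.
    squared = sum((R.get((u, i), 0) - d_t) ** 2 for u in users for i in items)
    return squared / (len(users) * len(items))


def ratings_entropy(R, users, items):
    values = rated_values(R, users, items)
    counts = Counter(values)
    return -sum((c / len(values)) * math.log2(c / len(values))
                for c in counts.values())


# Hypothetical ratings on a 1 to 5 scale.
R = {("u1", "i1"): 4, ("u1", "i2"): 2, ("u2", "i1"): 4}
Q_U, Q_I = ["u1", "u2"], ["i1", "i2"]

count = ratings_count(R, Q_U, Q_I)
total = ratings_sum(R, Q_U, Q_I)
density = ratings_density(R, Q_U, Q_I)
variance = ratings_variance(R, Q_U, Q_I)
entropy = ratings_entropy(R, Q_U, Q_I)
```

A rating constraint is then a comparison of one of these values against a threshold δ, for example `density >= 3.0`.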

Typical rating constraints include:

• Ratings Density(Q^U, Q^I, R[Q^U, Q^I]) ≥ 3.0

• 0.5 < Ratings Variance(Q^U, Q^I, R[Q^U, Q^I]) < 0.8

• Entropy(R[Q^U, Q^I]) < 1.0

• Compute top-k patterns ranked by score computed by an RSF, e.g., Ratings Count.

2.1.3 Holistic Constraints

While set and rating constraints operate on the usersets, itemsets and set of ratings between them, a holistic constraint operates on a set of discovered patterns to select a subset of patterns that together satisfy the holistic constraint. This is often challenging as the number of candidate subsets of patterns is exponential. Below, we list two holistic constraints.

• [Threshold Coverage Constraint] Let P be a set of patterns and t_j = (Q_j^U, Q_j^I, R[Q_j^U, Q_j^I]) ∈ P. Let $U' = \bigcup_{t_j \in P} Q_j^U$; then compute the smallest such P subject to the constraint that U' has at least β_U % of the database U. Similarly, one can define I' and discover the smallest set of patterns which cover at least β_I % of the database I.

• [Threshold Diversity Constraint] Let P be a set of patterns. Compute a subset P' of top-k patterns (based on the score of an RSF) from P such that the usersets of no two patterns in P' overlap by more than β1 and the itemsets of no two patterns in P' overlap by more than β2.
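A greedy selection under the Threshold Diversity Constraint might look like the sketch below. The text does not fix how overlap is measured, so Jaccard similarity is assumed here purely for illustration; the function names, thresholds and sample patterns are hypothetical, and the greedy choice is a heuristic rather than an exact optimum.

```python
def jaccard(a, b):
    """Assumed overlap measure: |a ∩ b| / |a ∪ b| (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_diverse_top_k(patterns, k, beta_users, beta_items):
    """patterns: list of (score, userset, itemset); returns up to k of them.

    Patterns are visited in decreasing score order and kept only if their
    overlap with every pattern already kept stays within both thresholds.
    """
    chosen = []
    for score, users, items in sorted(patterns, key=lambda p: p[0], reverse=True):
        compatible = all(
            jaccard(users, kept_users) <= beta_users
            and jaccard(items, kept_items) <= beta_items
            for _, kept_users, kept_items in chosen
        )
        if compatible:
            chosen.append((score, users, items))
            if len(chosen) == k:
                break
    return chosen


# Hypothetical scored patterns.
patterns = [
    (9, {"u1", "u2"}, {"i1"}),
    (8, {"u1", "u2"}, {"i1", "i2"}),  # same userset as the best pattern
    (7, {"u3"}, {"i3"}),
    (5, {"u4"}, {"i1"}),
]
selected = greedy_diverse_top_k(patterns, k=2, beta_users=0.5, beta_items=0.5)
selected_scores = [score for score, _, _ in selected]
```

The second-ranked pattern is skipped because its userset coincides with that of the top pattern, so the next sufficiently different pattern is taken instead.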

An example to illustrate the above constraints on the movie ratings dataset (Tables 1 and 2) is as follows.

EXAMPLE 3. Let Q_1^U and Q_2^U be defined by the usersets [userset definition missing in source] and {gender='female'} respectively. Let Q_1^I and Q_2^I be defined by the itemsets {year='1997'} and {year='1996'} respectively. Let t_1 = (Q_1^U, Q_1^I, R[Q_1^U, Q_1^I]) and t_2 = (Q_2^U, Q_2^I, R[Q_2^U, Q_2^I]). Let β_U = 1.0 and β_I = 0.75. The set of patterns {t_1, t_2} is said to satisfy the Threshold Coverage Constraint.

2.2 Properties of RSFs

The monotonicity properties of RSFs are critical to space pruning algorithms. Let G_U represent the powerset lattice of U, and G_I the powerset lattice of I. Let Q_1^U, Q_2^U be any two elements in G_U, and Q_1^I, Q_2^I be any two elements in G_I, such that Q_2^U ⊆ Q_1^U and Q_2^I ⊆ Q_1^I. The following properties of RSFs are defined based on their monotonicity on G_U and G_I.

1. [U-monotonic] g is said to be U-monotonic if ∀ Q_1^U, Q_2^U ∈ G_U s.t. Q_1^U contains Q_2^U, and ∀ Q^I ∈ G_I, g(Q_2^U, Q^I, R(Q_2^U, Q^I)) ≤ g(Q_1^U, Q^I, R(Q_1^U, Q^I)).

2. [I-monotonic] g is said to be I-monotonic if ∀ Q_1^I, Q_2^I ∈ G_I s.t. Q_1^I contains Q_2^I, and ∀ Q^U ∈ G_U, g(Q^U, Q_2^I, R(Q^U, Q_2^I)) ≤ g(Q^U, Q_1^I, R(Q^U, Q_1^I)).

Among the examples of RSFs discussed above, Ratings Sum and Ratings Cover are both U-monotonic and I-monotonic, Average Ratings Cover is U-monotonic but not I-monotonic, and Ratings Density is neither a U-monotonic nor an I-monotonic function. Space pruning algorithms in embodiments of the invention are developed to handle both monotonic and non-monotonic functions.
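The difference between a monotonic and a non-monotonic RSF can be seen on a toy dataset. The sketch below assumes non-negative ratings (which is what makes the sum grow with the set), treats unrated pairs as 0, and takes density as the sum divided by |Q^U|·|Q^I|; the data and function names are hypothetical.

```python
# Hypothetical non-negative ratings.
R = {("u1", "i1"): 5, ("u1", "i2"): 4, ("u2", "i1"): 1}


def ratings_sum(users, items):
    return sum(R.get((u, i), 0) for u in users for i in items)


def ratings_density(users, items):
    return ratings_sum(users, items) / (len(users) * len(items))


items = {"i1", "i2"}
both = {"u1", "u2"}

# Ratings Sum never decreases when the userset grows.
sum_u1 = ratings_sum({"u1"}, items)
sum_u2 = ratings_sum({"u2"}, items)
sum_both = ratings_sum(both, items)

# Ratings Density can move in either direction when the userset grows.
density_u1 = ratings_density({"u1"}, items)
density_u2 = ratings_density({"u2"}, items)
density_both = ratings_density(both, items)
```

Growing {u1} to {u1, u2} lowers the density, while growing {u2} to {u1, u2} raises it, so a density threshold cannot be pruned by set containment alone.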

3. ALGORITHMS

Computing constrained promotional pattern queries involves, given a set of constraints, efficiently enumerating candidate usersets, candidate itemsets and patterns satisfying the constraints for all pairs of usersets and itemsets. Though all types of constraints are equally important from the perspective of constrained promotional pattern queries, due to space limitations, embodiments of the present invention seek to provide efficient algorithms for rating constraints and holistic constraints. A suite of algorithms for rating constraints, schema-driven space pruning algorithms and ratings-driven biclustering techniques are discussed in the following sections.

3.1 Schema-Driven Pattern Mining

In schema-driven pattern mining, usersets and itemsets are defined based on a schema. We first define the following terms concerning schema-driven pattern mining below.

DEFINITION 3. User Query. An atomic query q^U on a user database is an assignment of an attribute A_j = v_j. A query Q^U of length l is a conjunction of l ≤ d_U atomic queries on l different attributes of the user database, (A_{i1} = v_{i1} ∧ ... ∧ A_{il} = v_{il}). A user u ∈ U satisfies an atomic query A_j = v_j if the attribute value of u for A_j is v_j. A user u satisfies query Q^U, denoted by u |= Q^U, if u satisfies each atomic query in Q^U.

DEFINITION 4. Item Query. An atomic query q^I on an item database is an assignment of an attribute B_j = v_j. Similarly, an item query Q^I is a conjunction of m ≤ d_I atomic queries. An item i ∈ I satisfies the query Q^I, denoted by i |= Q^I, if i satisfies each atomic query in Q^I.
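Definitions 3 and 4 can be read as a simple conjunctive filter. In the sketch below a query is assumed to be a dictionary of attribute = value assignments and a database a dictionary of records; `satisfies`, `evaluate` and the sample records are hypothetical names and data.

```python
def satisfies(record, query):
    """True if the record matches every atomic query (attribute = value)."""
    return all(record.get(attr) == value for attr, value in query.items())


def evaluate(database, query):
    """Return the set of keys whose records satisfy the conjunctive query."""
    return {key for key, record in database.items() if satisfies(record, query)}


# Hypothetical user database.
users = {
    "u1": {"gender": "male", "occupation": "student"},
    "u2": {"gender": "male", "occupation": "engineer"},
    "u3": {"gender": "female", "occupation": "student"},
}

male_students = evaluate(users, {"gender": "male", "occupation": "student"})
students = evaluate(users, {"occupation": "student"})
```

The same two functions apply unchanged to an item database, since an item query has the same conjunctive form.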

Examples of user query and item query are {gender='male' ∧ occupation='student'} and {director='James Cameron' ∧ genre='Action'} respectively. The space of user queries is the output of all group-bys on the user database, known as the data cube. For three attributes A_1, A_2, and A_3, this space consists of the resulting groups of each of the 7 group-by operations on dimensions A_1, A_2, A_3, A_1A_2, A_2A_3, A_1A_3, A_1A_2A_3. Similarly, the space of item queries is computed by the item data cube. The output of a user query is a userset and that of an item query is an itemset.
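The space of queries generated by the group-bys can be enumerated directly on a small database. The sketch below is a naive enumeration, not the bottom-up iceberg cube computation; `data_cube` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(database, attributes):
    """Map each group-by (a tuple of attributes) to its groups.

    Each group is keyed by the tuple of attribute values and holds the set of
    record keys sharing those values, i.e. one userset (or itemset) per query.
    """
    cube = {}
    for size in range(1, len(attributes) + 1):
        for dims in combinations(attributes, size):
            groups = defaultdict(set)
            for key, record in database.items():
                groups[tuple(record[a] for a in dims)].add(key)
            cube[dims] = dict(groups)
    return cube


# Hypothetical user database with three attributes.
users = {
    "u1": {"gender": "male", "occupation": "student", "age_group": "25-35"},
    "u2": {"gender": "female", "occupation": "student", "age_group": "18-25"},
    "u3": {"gender": "male", "occupation": "engineer", "age_group": "25-35"},
}
cube = data_cube(users, ["gender", "occupation", "age_group"])
group_by_count = len(cube)
males = cube[("gender",)][("male",)]
```

With three attributes the enumeration yields the 7 group-bys named in the text, and every group in every group-by is a candidate userset.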

For a given rating constraint (e.g., Ratings Density > δ), the main challenge is to efficiently enumerate candidate usersets and itemsets, and compute patterns that satisfy the constraint. A general framework that lays the foundation to address the above challenge progressively will now be discussed.

For a constrained promotional pattern query, let c^r be a rating constraint (e.g., Ratings Density > δ) and C^s comprise all the set constraints. An approach based on the rank-join method is then discussed to compute c^r. It takes two sorted lists of usersets and itemsets satisfying the set constraints C^s, and outputs a set of patterns satisfying c^r. The general idea is as follows.

The value of an RSF on a pattern is denoted by score. A naive approach computes the score for every pair of candidate userset and itemset in no particular order and outputs pairs that have scores > δ. However, one can limit this search to pairs which are more likely to have a score > δ. This is achieved by projecting the RSF on the spaces of usersets and itemsets separately, and selecting sets in the order of their likelihood to contribute to a higher score. In other words, an RSF is upper bounded by a composite monotonic function of two score projection functions which operate on the spaces of usersets and itemsets respectively. This is illustrated in Equation 4.1 as follows using Ratings Density.

$g(R[Q^U, Q^I]) \le \mathcal{F}\big(H_U(Q^U, R),\, H_I(Q^I, R)\big), \qquad H_U(Q^U, R) = \frac{\sum_{u \in Q^U} \sum_{i \in I} R(u,i)}{|Q^U|} \qquad (4.1)$

In Equation 4.1, $\mathcal{F}$ is the composite function of two score projection functions H_U(Q^U, R) and H_I(Q^I, R). H_U upper bounds the score contribution of Q^U in patterns comprising Q^U by aggregating its ratings on the entire item database. Fig. 2 lists the projections of other aggregate RSFs.

Rank-Join. The score H_U(Q^U, R) is computed first for all usersets Q^U and they are sorted by their scores. Similarly, itemsets Q^I are listed ranked by H_I(Q^I, R). A join between the two lists is then performed in the sorted order (along the lines of a sort-merge join). The pseudocode for the method is given in Algorithm 1. The method is illustrated in Fig. 3 for the Ratings Density function with δ = 0.3. The search on Q^U terminates at Q_j^I and the algorithm terminates after processing Q_k^U, as any pair involving Q_k^U and usersets ranked below it has an upper bound less than δ.

There are three computational bottlenecks in this approach:

• Materializing all usersets and itemsets, which are very large in number

• Computing the aggregate component scores H_U(Q^U, R) and H_I(Q^I, R) for each userset and itemset

• Sorting all usersets and itemsets by their H_U(Q^U, R) and H_I(Q^I, R) scores respectively and performing a rank join between them

In the next section, these bottlenecks are addressed by providing pruning optimizations. Hereinafter, the algorithms are illustrated using the Ratings Density function, and it is demonstrated below that they can be extended to other RSFs easily.

3.2 Algorithm DCRJN

The bottom up computation of an iceberg cube (BUC algorithm) is used to efficiently compute user and item data cubes. This is illustrated in Fig. 4 for the case of four user attributes. Each node in this tree, called a cuboid, is a group-by on a subset of attributes, and partitions the database into a number of disjoint sets. To compute rating constraints, the single rank join between the sorted lists of all usersets and itemsets (from the previous section) is decomposed into multiple rank join computations between pairs of user and item cuboids (Fig. 5). The user and item data cubes are materialized on-the-fly. The pseudocode for this algorithm is given in Algorithm 2. This approach does not by itself offer a significant improvement in efficiency compared to the approach considered in the previous section. Several optimization strategies are discussed below using this approach. Prior to that, some preliminaries on data cubes are presented that enable those skilled in the art to understand the optimization strategies.

Algorithm 1: GroupRJN(S_U, S_I)

Input: Two sorted lists of sets S_U, S_I and δ

Output: A result set of patterns

N ← newPriorityQueue()
while notEmpty(S_U) do
    Q^U ← nextSubspace(S_U)
    while notEmpty(S_I) do
        Q^I ← nextSubspace(S_I)
        if H_U(Q^U, R) · H_I(Q^I, R) < δ then
            move to next Q^U
        else
            compute g(Q^U, Q^I, R[Q^U, Q^I]), insert in N
    if H_U(Q^U, R) · H_I(*, R) < δ then
        return N
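A runnable reading of Algorithm 1 is sketched below, under stated assumptions. It is illustrated with Ratings Count rather than Ratings Density; the projections chosen here (ratings by the userset on all items, and ratings on the itemset by all users) are picked for illustration because their product provably upper-bounds the count, and are not taken from Fig. 2. The sketch also checks the outer bound before scanning the itemsets, keeps only pairs whose actual score exceeds δ, and uses a sorted list in place of the priority queue. All names are hypothetical.

```python
def group_rank_join(usersets, itemsets, h_user, h_item, g, delta):
    """Join two lists of sets in decreasing bound order, pruning by the bound."""
    s_u = sorted(usersets, key=h_user, reverse=True)
    s_i = sorted(itemsets, key=h_item, reverse=True)
    results = []
    if not s_u or not s_i:
        return results
    best_item_bound = h_item(s_i[0])
    for q_u in s_u:
        if h_user(q_u) * best_item_bound < delta:
            break  # no remaining userset can reach delta with any itemset
        for q_i in s_i:
            if h_user(q_u) * h_item(q_i) < delta:
                break  # lower-ranked itemsets cannot reach delta either
            score = g(q_u, q_i)
            if score > delta:
                results.append((score, q_u, q_i))
    return sorted(results, key=lambda r: r[0], reverse=True)


# Hypothetical rated (user, item) pairs and candidate sets.
rated = {("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")}
A, B = frozenset({"u1", "u2"}), frozenset({"u3"})
X, Y = frozenset({"i1", "i2"}), frozenset({"i3"})


def count(q_u, q_i):
    return sum(1 for u, i in rated if u in q_u and i in q_i)


def h_user(q_u):
    return sum(1 for u, _ in rated if u in q_u)


def h_item(q_i):
    return sum(1 for _, i in rated if i in q_i)


result = group_rank_join([B, A], [Y, X], h_user, h_item, count, delta=2)
```

On this toy input only the pair (A, X) has more than two ratings between its sets, and the pair (B, Y) is never scored because its bound is already below δ.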

Let P^l and P^m be two cuboids of the user data cube. A set Q_l^u ∈ P^l is a parent of Q_m^u ∈ P^m (or equivalently, Q_m^u is a child of Q_l^u) if all the attributes of P^l are in P^m as well. Further, Q_m^u has the same values as Q_l^u for the common attributes. For example, the set (gender='male') is a parent of (gender='male', occupation='student'), which is a child of both (gender='male') and (occupation='student'). A set Q^u is said to be a most specific set (MSS) if all the attributes of the database are assigned some values. All the usersets which belong to the cuboid A₁A₂A₃A₄ in Fig. 4 are most specific sets. Further, the most specific descendant set of a set Q^u is the set of all most specific sets which have the same values for the attributes on which Q^u is grouped. For example, the most specific descendant set of (A₁ = a₁, A₂ = a₂) is the set of all sets (A₁ = a₁, A₂ = a₂, A₃ = *, A₄ = *), where * denotes that the corresponding attribute takes all values from its domain.
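The parent, child and most specific descendant relations can be made concrete with a short Python sketch. It assumes a hypothetical two-attribute user schema and represents each set as a dictionary of attribute values; the helper names are illustrative, and a parent is taken here to have strictly fewer attributes than its child.

```python
ATTRIBUTES = ("gender", "occupation")  # hypothetical user schema

def is_parent(parent, child):
    """True if `child` fixes every attribute of `parent` to the same value
    and is grouped on at least one additional attribute."""
    return (len(parent) < len(child)
            and all(child.get(a) == v for a, v in parent.items()))

def is_most_specific(group):
    """A most specific set assigns a value to every attribute of the schema."""
    return set(group) == set(ATTRIBUTES)

def most_specific_descendants(group, users):
    """Distinct fully specified sets in `users` that agree with `group`."""
    found = []
    for user in users:
        full = {a: user[a] for a in ATTRIBUTES}
        if all(full[a] == v for a, v in group.items()) and full not in found:
            found.append(full)
    return found

male = {"gender": "male"}
student = {"occupation": "student"}
male_student = {"gender": "male", "occupation": "student"}
users = [
    {"gender": "male", "occupation": "student"},
    {"gender": "male", "occupation": "artist"},
    {"gender": "female", "occupation": "student"},
    {"gender": "male", "occupation": "student"},
]
descendants = most_specific_descendants(male, users)
```

Here `male_student` is a child of both `male` and `student`, matching the example in the text.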

The score projection functions H_u and H_i are functions on sets computed by the user and item data cubes respectively. Such functions can be classified into three types depending on their monotonicity properties on the data cube:

1. H_u is said to be monotonic if ∀ Q₁^u, Q₂^u s.t. Q₁^u is a parent of Q₂^u, H_u(Q₂^u, R) ≤ H_u(Q₁^u, R). For example, H_u(Q^u, R) = Σ_{u∈Q^u} Σ_{i∈I} R(u,i) is monotonic.

2. H_u is said to be antimonotonic if ∀ Q₁^u, Q₂^u s.t. Q₁^u is a parent of Q₂^u, H_u(Q₂^u, R) ≥ H_u(Q₁^u, R). For example, [...] is antimonotonic.

3. H_u is said to be nonmonotonic if it is neither monotonic nor antimonotonic. For example, Σ R(u,i) / [...] is nonmonotonic.

Similarly, H_i can be categorized as monotonic, antimonotonic or nonmonotonic. Optimization techniques for Algorithm 2 are presented below.
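The three categories can be checked empirically on a small example. The following Python sketch is an illustration only: it tests the inequalities on a handful of parent and child pairs rather than proving them, it assumes non-negative ratings, and the functions `inverse_size` and `mean_total` are hypothetical examples that are not taken from the text.

```python
def rating_sum(group, totals):
    """Sum over users in the group of each user's total rating."""
    return sum(totals[u] for u in group)

def inverse_size(group, totals):
    """Hypothetical antimonotonic example: 1 / |Q^u|."""
    return 1.0 / len(group)

def mean_total(group, totals):
    """Hypothetical nonmonotonic example: average per-user total."""
    return rating_sum(group, totals) / len(group)

def classify(h, pairs, totals):
    """Classify h on the given (parent, child) pairs only."""
    if all(h(c, totals) <= h(p, totals) for p, c in pairs):
        return "monotonic"
    if all(h(c, totals) >= h(p, totals) for p, c in pairs):
        return "antimonotonic"
    return "nonmonotonic"

# Per-user rating totals (toy values); a child set holds a subset of the
# users of its parent because it fixes more attribute values.
totals = {"u1": 9, "u2": 3, "u3": 1}
parent = frozenset({"u1", "u2", "u3"})
pairs = [(parent, frozenset({"u1"})), (parent, frozenset({"u3"}))]

labels = {h.__name__: classify(h, pairs, totals)
          for h in (rating_sum, inverse_size, mean_total)}
```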

Algorithm 2: Algorithm DCRJN

Input: Databases U, I and δ

Output: A result set of patterns

Procedure DCRankJoin:

1: for all j = 1 to d_u do

2: DepthFirstUC(A_j, j)

Procedure DepthFirstUC(P_u, j):

Input: A user cuboid P_u, last attribute j of P_u

1: DepthFirstIC(P_u, *_I)

2: if j = d_u then

3: Update tree upper bounds in parents of P_u

4: else

5: for all k = j + 1 to d_u do

6: Project U on attribute A_k;

7: P'_u ← set of unique values (A_k = v)

9: DepthFirstUC(P'_u, k)

Procedure DepthFirstIC(P_u, P_i):

Input: A user cuboid P_u, item cuboid P_i

1: GroupRJN(P_u, P_i)

2: if P_i is a leaf node then

3: Update tree upper bounds in parents of P_i

4: else

5: for all children P'_i of P_i do

6: DepthFirstIC(P_u, P'_i)
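The order in which DCRankJoin and DepthFirstUC visit user cuboids can be sketched in Python as follows. This is a simplified illustration that enumerates attribute subsets only; the rank join calls, the projection of the database and the tree upper bound updates are omitted, and the function name is hypothetical.

```python
def depth_first_cuboids(attributes):
    """Yield every non-empty attribute subset in depth-first order,
    extending a cuboid only with attributes that follow its last one."""
    d = len(attributes)

    def expand(cuboid, j):
        yield cuboid
        for k in range(j + 1, d):
            yield from expand(cuboid + (attributes[k],), k)

    for j in range(d):
        yield from expand((attributes[j],), j)

order = list(depth_first_cuboids(("A1", "A2", "A3")))
```

Each cuboid is reached exactly once, and a leaf such as A1A2A3 is reached before its remaining parents A1A3 and A2A3, which is what allows bounds computed at the leaves to be propagated upwards.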

Vertical Pruning. The rank join between the sets of an item cuboid and a user cuboid proceeds similarly to the rank join method of Algorithm 1. Additionally, vertical pruning is employed to avoid computing the children of a set Q^u (or Q^i) if they are unlikely to produce patterns that satisfy the given rating constraints. Two measures are associated with each set: the component function score and the tree upper bound score. The component function score is the value of H_u(Q^u,R) (or H_i(Q^i,R)), and the tree upper bound, denoted by H̃_u(Q^u,R), is the upper bound of H_u on all sets that are children of Q^u in the data cube. Two conditions are checked during the rank join between the cuboids. If H_u(Q^u,R).H_i(Q^i,R) < δ, then the search on Q^u is discarded. Additionally, if H_u(Q^u,R).H̃_i(Q^i,R) < δ, the computation of the children of Q^i can be avoided when the corresponding cuboid is expanded in a depth-first manner. The tree upper bound is propagated up to the root node, where it is denoted by H̃_i(*,R) on the item data cube. For any Q^u, if H̃_u(Q^u,R).H̃_i(*,R) < δ, the computation of the children of Q^u can be avoided, and the algorithm terminates if H_u(*,R).H_i(*,R) < δ. The computation of tree upper bounds for various score projection functions is discussed below.

Computing Tree Upper Bounds. For monotonic functions, by definition the tree upper bound of a set Q^u is the score computed by the component function on Q^u itself, H_u(Q^u,R). For anti-monotonic functions such as H_u(Q^u) = [...], the tree upper bound is the minimum of the scores computed by H_u on all most specific descendants of the set Q^u. In bottom-up data cube computation, the most specific descendants of several sets are computed before the sets themselves. For example, in Fig. 6 the process begins with B₁ and proceeds in a depth-first manner until the leaf node B₁B₂B₃B₄ is reached. For all sets which are parents of sets in B₁B₂B₃B₄ and belong to cuboids not yet processed, the tree upper bound corresponds to the lower bound on the scores of their most specific descendants in

[...] nontrivial upper bound is the maximum of the scores computed by H_u on all most specific descendants of the set Q^u.

Sharing Aggregation. Computing the score projection functions H_u and H_i involves aggregating over the users (items) of the entire group. For example, a function such as Σ_{u∈Q^u} Σ_{i∈I} R(u,i) involves aggregating, for each user, the ratings on the entire item database. To minimize redundant computation, two strategies are employed. First, the ratings aggregates for individual users u (Σ_{i∈I} R(u,i)) are pre-computed and stored in a hash table. Second, the aggregate score of a group Q^u can be computed from its children if the aggregate scores of all its children are known. Since BUC proceeds in a depth-first manner, aggregates for the children of several groups are available before the groups are computed. For example, in Fig. 4 the score for a group Q^u of the cuboid A₁A₂A₄ can be computed by simply aggregating the scores of all its children in the cuboid A₁A₂A₃A₄.
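Both sharing strategies can be sketched in a few lines of Python. The sketch assumes a sum aggregate (which can be rolled up from children without revisiting the ratings), a hypothetical two-attribute user schema and toy data; the function names are illustrative only.

```python
from collections import defaultdict

def user_totals(ratings):
    """Strategy 1: pre-compute each user's rating sum once (hash table)."""
    totals = defaultdict(float)
    for (user, _item), value in ratings.items():
        totals[user] += value
    return dict(totals)

def leaf_scores(users, totals, attributes):
    """Scores of the most specific groups, built from the per-user totals."""
    scores = defaultdict(float)
    for user_id, attrs in users.items():
        key = tuple(attrs[a] for a in attributes)
        scores[key] += totals.get(user_id, 0.0)
    return dict(scores)

def roll_up(child_scores, drop_index):
    """Strategy 2: parent cuboid scores obtained by summing child scores
    after dropping the attribute at position `drop_index`."""
    parent = defaultdict(float)
    for key, score in child_scores.items():
        parent[key[:drop_index] + key[drop_index + 1:]] += score
    return dict(parent)

ratings = {("u1", "i1"): 5, ("u1", "i2"): 4, ("u2", "i1"): 3, ("u3", "i2"): 1}
users = {
    "u1": {"gender": "male", "occupation": "student"},
    "u2": {"gender": "male", "occupation": "artist"},
    "u3": {"gender": "female", "occupation": "student"},
}
totals = user_totals(ratings)
leaves = leaf_scores(users, totals, ("gender", "occupation"))
by_gender = roll_up(leaves, drop_index=1)   # drop "occupation"
```

The roll-up step never touches the ratings again, so its cost depends on the number of child groups rather than on the number of ratings.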

In the preceding sections, pruning algorithms are disclosed which compute rating constraints guided by the monotonicity properties of ratings summary functions. The techniques extend to any aggregate RSF which can be projected onto the spaces of usersets and itemsets and whose pattern scores can be upper bounded by composite monotonic functions of the projections. The pruning power of the algorithms depends on three factors: (1) the component functions chosen to upper bound the scoring function, (2) the histograms of database objects over different attributes, and (3) the distribution of ratings. For example, for the component functions of Ratings Density, a long tail of ratings along with a large number of itemsets with high cardinality can lead to effective pruning. This can be explained as follows. There are a large number of users who rate very few items, and hence the function H_u has a smaller numerator score for many usersets. Also, because of the high cardinality of item groups, the product H_u.H_i is small for most patterns, which effectively means greater pruning power. Later, in Section 4, the effect of these factors is discussed using the results of the experiments. Further, Algorithm 2 can be parallelized easily, as the rank join computations between cuboids can be performed in parallel.

3.3 Ratings-Driven Pattern Mining

In this section, a ratings-driven approach is proposed to discover promotional patterns using biclustering of the ratings matrix. Biclustering, or co-clustering, is an effective technique to discover subsets of rows in a data matrix that exhibit similar behavior across a subset of columns. Given a ratings data matrix with user data along rows and item data along columns, biclustering techniques discover a set of biclusters p_k = (Q_k^u, Q_k^i) such that each bicluster p_k satisfies specific characteristics of homogeneity in ratings between the userset Q_k^u and the itemset Q_k^i, where homogeneity is defined by an objective function. Biclustering algorithms enable certain types of rating constraints to be computed efficiently. Biclusters can be overlapping as well as non-overlapping. Some important classes of biclustering algorithms relevant to promotional pattern mining are described as follows:

1. Biclusters with constant values corresponds to a scenario where all the users in a bicluster give the same rating to all the items.

2. Biclusters with constant values on columns or rows corresponds to biclusters in which all users have the same distribution of ratings for the itemset, or all items have the same distribution of ratings for a set of users.

3. Biclusters with coherent additive values corresponds to the scenario where the ratings on each row of a bicluster add up to the same value.

4. Minimum Entropy Biclustering discovers biclusters which have an entropy less than a given threshold. Such an algorithm is useful in minimizing the variance in ratings between a set of users and a set of items.

A subset of rows and a subset of columns is considered to be a bicluster if together they exhibit a high similarity score, which is measured by a function defined as the mean squared residue (MSR). The algorithm starts with the original data matrix by computing its mean squared residue and, at each step, removes the row or column whose removal results in the maximum drop in the MSR, until its value falls below a threshold. It then adds a row or column with maximum rise in MSR until the value reaches above the threshold.
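The mean squared residue and the greedy deletion phase can be sketched in Python as follows. The sketch uses the standard residue a_ij − a_iJ − a_Ij + a_IJ (entry minus row mean minus column mean plus overall mean) and covers the deletion phase only; the subsequent addition phase is omitted, and the function names and the toy matrix are illustrative.

```python
def mean_squared_residue(matrix, rows, cols):
    """MSR of the submatrix given by the row and column index lists."""
    n, m = len(rows), len(cols)
    row_mean = {r: sum(matrix[r][c] for c in cols) / m for r in rows}
    col_mean = {c: sum(matrix[r][c] for r in rows) / n for c in cols}
    all_mean = sum(row_mean.values()) / n
    total = sum((matrix[r][c] - row_mean[r] - col_mean[c] + all_mean) ** 2
                for r in rows for c in cols)
    return total / (n * m)

def greedy_delete(matrix, rows, cols, threshold):
    """Repeatedly drop the row or column giving the largest MSR decrease
    until the MSR is at or below the threshold."""
    rows, cols = list(rows), list(cols)
    while (mean_squared_residue(matrix, rows, cols) > threshold
           and (len(rows) > 1 or len(cols) > 1)):
        candidates = []
        if len(rows) > 1:
            candidates += [([x for x in rows if x != r], cols) for r in rows]
        if len(cols) > 1:
            candidates += [(rows, [x for x in cols if x != c]) for c in cols]
        rows, cols = min(
            candidates,
            key=lambda rc: mean_squared_residue(matrix, rc[0], rc[1]))
    return rows, cols

# Rows 0 and 1 differ by a constant offset; row 2 breaks the pattern.
matrix = [[1, 2, 3],
          [2, 3, 4],
          [9, 1, 5]]
kept_rows, kept_cols = greedy_delete(matrix, [0, 1, 2], [0, 1, 2], 0.01)
```

A submatrix whose rows differ only by a constant offset has a residue of zero, so the deletion phase here stops as soon as the third row is removed.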

Delta biclustering employs a different heuristic under the same quality metric, mean squared residue. The algorithm starts with several randomly generated biclusters. At each step, it determines the best action for each row and each column, an action being the deletion or addition of a row or column to a bicluster. It performs the best actions for every row and every column sequentially until no further improvement can be gained in the mean squared residue of the biclusters. The main advantage of Delta biclustering is that it generates overlapping biclusters with the number of biclusters specified a priori. The time complexity of both algorithms is O((N + M) * N * M * k * p), where k is the number of biclusters and p is the number of iterations needed to converge to stable biclusters. The mean squared residue can be modified accordingly in the implementation based on the type of biclusters queried for, e.g., biclusters with constant values, biclusters with constant rows, and biclusters with constant columns.

3.4 Top-k and Holistic Constraints

Techniques for handling the holistic constraints listed in Section 2 will now be discussed. The top-k constraint can be handled in the algorithms discussed in the previous section by setting the parameter δ to the score of the k-th best pattern found so far and updating it whenever the k-th pattern changes as the algorithm progresses. For diversity and coverage threshold constraints, the promotional patterns satisfying the rating constraints are post-processed using greedy heuristics which take as input a set of patterns, possibly in descending order of their RSF scores, and output a set of patterns that satisfy the given threshold constraints. One such heuristic is listed in Algorithm 3 below for the Threshold Coverage constraint.
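The dynamic threshold used for the top-k constraint can be sketched with a bounded min-heap in Python. This is a minimal illustration, assuming each candidate pattern arrives with an upper bound on its score (for example, the product of its component scores) and a function that computes the exact score on demand; the name `top_k` and the candidate tuples are hypothetical.

```python
import heapq

def top_k(candidates, k):
    """candidates: iterable of (upper_bound, score_fn, label) tuples.

    Keeps the k best scores in a min-heap. Once the heap is full, delta is
    the k-th best score, and any candidate whose upper bound is below delta
    is skipped without computing its exact score.
    """
    heap = []                    # min-heap of (score, label)
    delta = float("-inf")
    evaluated = 0
    for upper_bound, score_fn, label in candidates:
        if upper_bound < delta:
            continue             # cannot enter the current top-k
        score = score_fn()
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, label))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, label))
        if len(heap) == k:
            delta = heap[0][0]   # score of the current k-th pattern
    return sorted(heap, reverse=True), evaluated

candidates = [
    (10, lambda: 8, "p1"),
    (9, lambda: 7, "p2"),
    (6, lambda: 5, "p3"),   # bound 6 is below delta = 7, so it is skipped
    (20, lambda: 9, "p4"),
]
best, evaluated = top_k(candidates, k=2)
```

The threshold only ever increases, so pruning becomes more aggressive as better patterns are found.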

The heuristic takes as input two threshold parameters β₁ and β₂ for the user and item databases. The algorithm proceeds as follows. It maintains two sets: the set of users covered by the usersets in the patterns seen so far, denoted by T_u, and the set of items covered by the itemsets in the patterns seen so far, denoted by T_i. For every new pattern added to the set, if |T_u| ≥ β₁ and |T_i| ≥ β₂ the algorithm terminates. The pseudocode for the algorithm is given in Algorithm 3 below.

Algorithm 3: Computing Holistic Constraints

Input: A set of patterns, coverage thresholds β₁, β₂

Output: A set of patterns

1: T_u ← ∅

2: T_i ← ∅

3: OutputSet ← {}

4: while |T_u| < β₁ and |T_i| < β₂ do

5: NextPattern ← t ≡ (Q^u,Q^i,R[Q^u,Q^i]) which maximizes the number of elements

6: T_u ← T_u ∪ Q^u

7: T_i ← T_i ∪ Q^i

8: Add t to OutputSet

4. EXPERIMENTS

In this section, the performance and effectiveness of the schema-driven pruning algorithms and biclustering techniques of embodiments of the invention are discussed. The experiments are conducted using the MovieLens dataset from the GroupLens project site (http://www.grouplens.org/node/12). The dataset consists of 1 million ratings from 6040 users on 3638 movies. The user tuples have four attributes (age, gender, location, occupation). For the movie database, the attributes include 3 genres, director, writers, actors (with an associated rank indicating the significance of the actor's role), year, country, etc.

Five meaningful attributes were extracted for experimental purposes: (year, main genre, country, director, main actor). Each user has rated on average 100 movies on a scale of 1 to 5. The experiments were conducted on a Linux machine running a Dual Core AMD Opteron processor at 2.2GHz with 8GB memory. All the algorithms were implemented in Java and executed in main memory, with all the data loaded into memory at once and no further disk access. The analysis includes (i) performance evaluation of the space pruning algorithms in terms of running time, and performance of queries involving multiple types of constraints including set, rating and holistic constraints (Section 4.1), and (ii) performance evaluation of biclustering techniques in terms of running time, and quality evaluation of the biclusters generated (Section 4.2).

4.1 Performance of Pruning Techniques

The performance of the pruning techniques was evaluated based on the following three properties of a promotional pattern query:

Dataset Size. The running time of the algorithms was considered for different sizes of the dataset relevant to the mining task specified by a promotional pattern query. Six subsets of the MovieLens dataset of increasing size were considered for each experiment. The dataset size is characterized by the number of users, the number of items, the number of ratings, and the number of usersets and itemsets for the given schema (shown in Figure 8).

Ratings Summary Functions. The algorithms were evaluated for different types of ratings summary functions characterized by their monotonicity properties on the userset and itemset lattices. The four selected RSFs were Ratings Count and Ratings Sum, both of which are U-monotonic and I-monotonic; Ratings Density, which is neither U-monotonic nor I-monotonic; and Average Ratings Cover, which is U-monotonic but not I-monotonic.

Constraint Type. Lastly, the performance of the algorithms was evaluated for two types of rating constraint on RSFs, namely (1) the threshold δ-constraint and (2) the top-k constraint. The δ-constraint computes all the patterns whose RSF score is greater than δ. The top-k constraint computes the patterns with the top-k scores. For each RSF considered above, the value of δ was varied to cover the entire range of scores induced by the RSF on all pairs of usersets and itemsets. For example, the δ value for Ratings Count is varied from 10¹ to 10⁵.

It was observed that a naïve algorithm which enumerates all pairs of usersets and itemsets in no particular order and evaluates the rating constraints on each of them is several orders of magnitude slower than the basic rank join algorithm even for smaller datasets (e.g., a 1000x600 ratings matrix). Hence, the experiments focussed on analyzing and comparing the performance of the basic GroupRJN algorithm and the advanced DCRJN algorithm (Section 3). The two space pruning algorithms were compared using four measures corresponding to the optimization principles discussed in Section 3: (1) Running Time, which measures the overall running time of the algorithms; (2) Enumeration Time, which measures the time to materialize usersets and itemsets for result computation; (3) Aggregation Time, which measures the score aggregation time for the given RSF (g) on pairs of usersets and itemsets, their score projection functions (H_u and H_i) on individual sets and their tree upper bounds; and (4) Sorting and Rank Join Time, which measures the time to perform sorting and rank join in both algorithms. The performance of the algorithms is demonstrated in the following experiments.

Dataset Experiment. DCRJN is compared with the basic rank join algorithm in Figures 9 to 12 (a-d) on the four measures discussed above, varying the dataset size (Fig. 8). In this experiment, the top-10 patterns are retrieved, and hence the value of the threshold δ for a ratings summary function is the smallest score computed by the RSF among the top-10 patterns. The running time is measured in seconds. It was observed that for Ratings Count and Ratings Sum, which are U-monotonic and I-monotonic, and to an extent for Average Ratings Cover, which is U-monotonic but not I-monotonic, the principles of tree upper bound and aggregation sharing together are effective in lowering the running time by 4 to 18 orders of magnitude. For certain RSFs, over larger datasets DCRJN is 10-20 times faster. Tree upper bounds control the number of usersets and itemsets materialized. Hence, the number of usersets and itemsets between which ratings are aggregated is also significantly lowered. As a result, the enumeration time, aggregate score computation time and rank join time are several orders better than in the basic rank join implementation.

For Ratings Density, which is non-monotonic in both the usersets and itemsets, it is not possible to achieve significant improvement in pruning, except for minimizing the aggregate score computation time through sharing of aggregation. This can be explained as follows. The pruning power of the score projection functions (H_u and H_i) chosen for Ratings Density is minimal in both the basic rank join and DCRJN algorithms. A large statistical difference was observed between the range of values computed by the composite monotonic function of H_u and H_i and the actual scores of patterns for Ratings Density. The actual scores for Ratings Density vary from 0 to 5, while 82% of the values computed by H_u(Q^u).H_i(Q^i) for patterns (Q^u,Q^i,R[Q^u,Q^i]) are greater than 5. Hence, a large number of patterns evaluated by the GroupRJN algorithm (and hence DCRJN) in this scenario have scores greater than δ at any given time. Therefore, it performs as badly as a naïve baseline which computes scores for all pairs. Sample query results (top-3 patterns) are provided for the four RSFs in the table shown in Fig. 13 on the largest dataset (6000x3600 ratings matrix).

Threshold Experiment. In the previous experiment, a top-10 constraint was assumed, so the value of the threshold δ was computed dynamically as the algorithm progressed. In this evaluation, δ is instead fixed for each run and varied across runs over the range of scores induced by an RSF. The performance comparison (overall running time) of DCRJN with the GroupRJN algorithm is presented in Figures 15 (a-d). It was observed that for higher values of δ both algorithms perform better, and DCRJN performs 3 to 10 orders of magnitude faster for all RSFs except Ratings Density. For Ratings Density a better running time was obtained than the basic rank join implementation by an order of 0.25 (on average).

Set and Holistic Constraints. An experiment was performed with queries involving multiple constraints. A set constraint of support > 5 on the individual usersets and itemsets was added, together with a holistic constraint of diversity on the usersets and itemsets of the patterns. The overall running time is plotted in Figures 14 (a-b). The grey blocks on the top represent the additional time taken for evaluating set and holistic constraints. It was observed that the running time for queries that also involve set and holistic constraints scales proportionately to the time for queries involving only rating constraints.

4.2 Performance and Effectiveness of Biclustering

The efficiency of the biclustering techniques discussed in Section 3 is first evaluated. Biclustering algorithms were implemented for three different types of biclusters: (1) biclusters with constant values (Type 1), (2) biclusters with constant values on rows (Type 2), and (3) biclusters with constant values on columns (Type 3). The running times of the implementation are plotted by varying the number of clusters generated and the dataset size in Figures 16 and 18 (a-d). The running time for delta biclustering on a dataset in which the range of matrix values is large is usually of the order of 10⁴ seconds for a 3000x500 data matrix. For the MovieLens ratings dataset, the range of matrix values is much smaller (1 to 5) compared to other datasets and the running time is expected to be much higher.

An implementation of an embodiment of the invention was therefore tested on smaller datasets extracted from the MovieLens dataset. Figure 16(a-d) plots the running time against the number of clusters for four different sizes of the data matrix: (1) 200 x 100, (2) 400 x 200, (3) 600 x 300, (4) 800 x 400. Figure 18(a-d) plots the running time against the dataset size by varying the number of clusters generated. The running time increases rapidly for Type 1 biclusters with increase in the number of biclusters and the dataset size. A much smaller running time is achieved for Type 2 and Type 3 biclusters.

Cluster Quality. In biclustering techniques the ground truth about correct biclusters is not available in advance. Hence, the biclustering results are evaluated using a quality measure called the residue function. For a perfect bicluster, the residue is zero, and a smaller residue represents a more coherent bicluster. The mean values of the residue function for four different numbers of clusters generated are presented in the table shown in Fig. 17. The mean residue values in general are unbounded above (the upper bound being ∞). The mean residue values in the implementation are all less than 1 and tend to be closer to 0, indicating good cluster quality.

Embodiments of the invention present PromPt, an exploratory system for mining promotional patterns in large collaborative rating systems. Collaborative rating systems generate large amounts of data in the form of ratings and text reviews given by users to items, which can be leveraged to extract business intelligence for promoting sets of items to sets of users. It is important to have an expressive language for constrained promotional pattern queries specifying different types of constraints on usersets, itemsets, ratings and patterns. It is equally important, given the complexity of the mining and exploration tasks involved, for the techniques employed to be computationally efficient and scalable at this level.

When used in this specification and claims, the terms "comprises" and "comprising" and variations thereof mean that the specified features, steps or integers are included. The terms are not to be interpreted to exclude the presence of other features, steps or components.

Claims

CLAIMS:

1. A method of processing a ratings dataset, the ratings dataset incorporating data identifying a plurality of users U, a plurality of items I and a set of ratings R allocated by the users U to the items I, the method comprising:

defining a subset of users U in the dataset as a userset Q^u,

defining a subset of items I in the dataset as an itemset Q^i,

receiving at least one rating constraint specifying at least one constraint on the set of ratings between the userset Q^u and the itemset Q^i,

inputting each rating constraint into a ratings summary function g(R[Q^u,Q^i]) θ δ to define a function of ratings between the userset Q^u and the itemset Q^i, where θ is selected from one of =, >, <, ≥, ≤ and δ ∈ ℝ,

analysing the identified pairs of usersets and itemsets to find patterns in the ratings dataset according to each rating constraint.

2. A method according to claim 1, wherein the method further comprises a rank-join method to identify pairs of usersets and itemsets, the rank-join method comprising:

calculating a score Hu(Q^u,R) for all usersets in the space,

sorting a list of usersets by the calculated scores,

sorting a list of itemsets by the calculated scores,

merging list of usersets with the list of itemsets in their sorted orders and ranking them.

3. A method according to claim 1 or claim 2, wherein the ratings summary function is ratings count function g(R[Q^u,Q^i]) = Σ_{u∈Q^u} Σ_{i∈Q^i} I(u,i), where I is an indicator function with value 1 if u has rated i, and 0 otherwise.

4. A method according to claim 1 or claim 2, wherein the ratings summary function is ratings sum function g(R[Q^u,Q^i]) = Σ_{u∈Q^u} Σ_{i∈Q^i} R(u,i).

5. A method according to claim 1 or claim 2, wherein the ratings dataset is a binary ratings model where L(u_t) denotes the set of items rated 1 by user u_t and L(Q^u) = L(u_i), u_i ∈ Q^u, and wherein the ratings summary function is ratings cover function g(R[Q^u,Q^i]) = [...]

6. A method according to claim 1 or claim 2, wherein the ratings summary function is ratings density function g(R[Q^u,Q^i]) = Σ_{u∈Q^u} Σ_{i∈Q^i} R(u,i) / (|Q^u| |Q^i|).

7. A method according to claim 1 or claim 2, wherein t = (Q^u,Q^i,R[Q^u,Q^i]) is a pattern and d_t is the ratings density of the pattern t and wherein the ratings summary function is ratings variance function

8. A method according to claim 1 or claim 2, wherein the ratings dataset is a binary ratings model where L(u_t) denotes the set of items rated 1 by user u_t and L(Q^u) = L(u_i), u_i ∈ Q^u, and wherein the ratings summary function is average ratings cover function

9. A method according to claim 1 or claim 2, wherein the ratings summary function g is the entropy of ratings distribution {R(u,i) | u ∈ Q^u, i ∈ Q^i}.

10. A method of processing a ratings dataset, the ratings dataset incorporating data identifying a plurality of users U, a plurality of items I and a set of ratings R allocated by the users U to the items I, the method comprising:

11. A method of processing a ratings dataset according to claim 10, wherein the biclustering algorithm comprises a mean square residue (MSR) function.

12. A method of processing a ratings dataset according to claim 10, wherein the biclustering algorithm comprises a Delta biclustering algorithm.