JP2009510598A

JP2009510598A - Communication and collaboration system

Info

Publication number: JP2009510598A
Application number: JP2008533302A
Authority: JP
Inventors: デヴァジョティサーカー，
Original assignee: サーカーピーティーイーリミテッド
Priority date: 2005-09-27
Filing date: 2006-09-26
Publication date: 2009-03-12

Abstract

A system and method for communication and collaboration that uses a generic annotation-based mechanism so that items can be shared among users and both items and users can be searched and ranked using existing information retrieval ranking techniques. A new method is introduced for simultaneously clustering users and items based on category context. These mechanisms are used to provide a mechanism that allows items to be published and subscribed to based on context.
[Selected drawing] FIG. 3

Description

The present invention relates to a method for communication and collaboration that extends the concept of search so that both items and users can be searched and ranked with existing information retrieval ranking techniques. The invention is used to provide a context-based communication mechanism.

Background of the Invention

[Background]
The advent of the World Wide Web and the near-ubiquitous presence of computers have dramatically changed the way people discover and use information. But as we enter the 21st century, we face an unprecedented dilemma. Information and knowledge are more important than ever and are being generated in ever greater quantities, yet finding useful and relevant information is becoming increasingly difficult.

Web search technology represents an important breakthrough in the art. Early search engines built web crawlers, or spiders: software programs that traverse the graph of pages on the web and download them to a central server. The downloaded pages were then converted into an inverted index and searched using information retrieval (IR) methods. A person could retrieve every document containing a word or set of words. This was useful for small collections of pages, but it did not scale. The usefulness of search was undermined by the large number of returned documents that were not relevant to the query. Because it is impractical to examine every result to find the relevant ones, important documents could not be found.
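The inverted index mentioned above can be illustrated with a minimal sketch. The three-page corpus and the function names `build_inverted_index` and `search_all` are assumptions made for illustration; real engines add tokenization, stemming, and ranking on top of this structure.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_all(index, query):
    """Return the IDs of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical three-page corpus, for illustration only.
docs = {
    "page1": "web search engines crawl the web",
    "page2": "an inverted index maps words to pages",
    "page3": "search engines build an inverted index",
}
index = build_inverted_index(docs)
```

Because every matching document is returned with no ordering, a sketch like this also shows why unranked retrieval stops being useful once the collection grows.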

This "abundance" problem was addressed in important early papers by Kleinberg, Page, and Brin. They used hyperlinks between documents as a proxy for relevance judgments. It was already known that pages linked to by many other pages tend to be more relevant. Page and Brin refined this idea with the insight that what matters is not only how many pages point to a page but also the quality of those pages. Page and Brin proposed the PageRank method used in the Google search engine. It is a query-independent ranking of a page based on the pages that link to it.
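The idea that a page's rank depends on the rank of the pages linking to it can be sketched as a power iteration. This is an illustrative simplification, not the method as deployed in any search engine; the damping value of 0.85, the iteration count, and the four-page link graph are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical link graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

In this toy graph, page `a` ends up ranked above `b` and `d` even though each has at most one in-link, because its single in-link comes from the highly ranked page `c`.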

Kleinberg proposed a more refined notion of web page quality. Kleinberg argued that a high-quality page need not point to other high-quality pages (which Kleinberg called authorities). Instead, there are special nodes called hubs that contain collections of links to high-quality authorities. The HITS algorithm evaluated quality based on both hubs and authorities, using the insight that a good hub is one that links to many good authorities and a good authority is one that is linked to by many good hubs. This was computed in a query-dependent manner.

These methods made it possible to draw on the collective intelligence in the network to help bring relevant and useful pages to the top of the search results. The success of such link analysis and ranking (LAR) algorithms is clearly visible in the success of Google. That success has led to a large body of research in the area, and many variants of the above algorithms exist, such as Hilltop, SALSA, randomized HITS, and subspace HITS. More recently, three-level algorithms such as TOPHITS, which uses a three-way tensor decomposition, have been proposed to improve the quality of results. HITS suffered from topic drift when the initial pages used to compute hubs and authorities might not be related to the query topic at all. TOPHITS improves on HITS by using link text (the text within a hyperlink) to assign hubs and authorities a relevance to the topic of the query.

However, although web search technology has advanced, many serious problems remain. A typical query returns millions of results. If the desired page is not found among the top 10 or 20 results, the search is useless. The problem is compounded by the fact that all users get the same results for the same query, even though users may have widely varying information needs. Meanwhile, a typical web query is only two or three words long. This makes it very difficult for a search engine to satisfy every user's information needs. One approach the major search engines are taking to mitigate this problem is personalized search. CubeSVD is a recent tensor decomposition approach to personalized search that uses a user's click stream (the query results the user clicks on) to determine relevance for that user. However, depending on how personalization is implemented, serious privacy concerns remain, and it is not yet known whether these approaches are effective.

The situation is even worse in enterprise search, such as searching the web pages of a company intranet or file shares, and in desktop search. Some studies estimate that as much as 80% of all corporate data, such as files and email, is unstructured (not held in a database or application). Because there is no hyperlink connectivity between documents, the advances of web LAR algorithms cannot be applied. Ranking in these systems is still limited to TF-IDF-style full-text search algorithms, which yield lower quality. Even recent enriched indexing approaches, such as the use of keyword-based categories in the upcoming Microsoft Windows Vista operating system, remain subject to the same problems as earlier IR systems (it has been estimated that a person searching for a document by keyword is likely to use the same keywords as those on the document only 20% of the time). This produces the paradoxical situation in which it is easier to find a document on the Internet than on one's own hard disk.

Recently, interesting work has been done in the area of categorization through a sharing approach called folksonomy. Unlike the early attempts of Yahoo! and the ODP to classify the web manually in a centralized manner, folksonomies attempt to use a collaborative tagging approach to share bookmarks, images, and web pages. The leading innovators in this space include Flickr, del.icio.us, and Technorati. Each takes a different approach to tagging, but all attempt to use keywords as the basis for creating a shared space for users. In general, however, only a small number of users actually use folksonomies. This is partly because they are harder to use than search, and partly because the pages covered by a folksonomy are only a tiny fraction of the pages available. Unlike search, which can automatically crawl the web or a disk and index every document, there is no equivalent mechanism that can efficiently bring pages into a folksonomy.

A similar set of problems exists in blogs and in messaging systems generally. Blogs are an attempt to create a "read-write" web in which users are not merely consumers of information but can also post content online. The true promise of blogs can be realized only when a blog post published by someone can efficiently reach the readers who may be interested in it. What is needed is the reverse of search: instead of people searching for relevant blog postings, blog postings need to search for relevant people. There is currently no way to achieve this. The situation is even worse in other messaging systems such as email. Email and IM are efficient mechanisms for one-to-one interaction, but they do not efficiently handle the notion of sending email to a topic. Distribution lists are the closest substitute for sending mail about a topic, but distribution lists cannot be created dynamically and have people assigned to them. In many cases, the only way to tell others that a document exists is email. This leads to overuse of such distribution lists, and inboxes fill with mail of low relevance to the user, resulting in information overload and lost information.

Whether they are corporations, government agencies, non-governmental organizations, militaries, or religious bodies, organizations have generally become larger and more complex. As such organizations grow, it becomes increasingly difficult for people to know one another, and organizations often fragment into blind spots in which one person does not know what others are doing. This is a serious problem when an organization faces a rapidly changing environment and when different parts of the organization must be made to work together dynamically to seize an opportunity or confront a threat. Traditional organizational structures such as hierarchies and departments, with point-to-point communication such as telephone or email, do not allow the flexibility needed for an effective response. Context-based communication and collaboration mechanisms can play an important role. A paradigm that lets people come together around a specific context, such as a new opportunity or threat, and disband when that opportunity or threat has passed makes possible a more organic approach to organization that is more responsive to change. The basic metaphor of web search, in which the author of a web page and the end user need not know each other in order to "collaborate," can be extended to communication and collaboration so that organizations can manage complexity.

[Basic concept behind the present invention]
The present invention attempts to provide a solution by defining the problem in terms of communication and collaboration. It focuses on creating systems and methods that enable i) the creation of a generic annotation-based collaboration system to which advances in information retrieval can be applied, ii) a method for clustering users and items, and iii) a communication method that allows people to publish and subscribe to messages based on context. These mechanisms, which are novel to the art, may have multiple features. The methods can be used standalone, in conjunction with one another, or in conjunction with other systems.

[Generic annotation-based collaboration system using IR]
Many forms of annotation-based sharing systems are known in the art. All folksonomies are examples of such systems. However, annotation systems differ in how effective they are for sharing and collaboration. The NTFS file system of Microsoft Windows has always had the ability to add keywords to any file, but the feature has seen little real use, mainly because such keywords could not be used to find files efficiently. Technorati Tags (a technique that lets bloggers tag their posts so that people can find them) are less effective because only the author can tag an item. Flickr is a photo sharing site that lets users share their photos with others. However, there is little that users can do with other users' photos once they have viewed them, so Flickr is weaker at collaboration and stronger at sharing.

Del.icio.us, however, is an example of a successful collaboration system for bookmarks. Because one person's bookmarks may have value of their own to other people, the act of sharing bookmarks plays a more important role than sharing does on, for example, Flickr. Because different people may face similar problems, sharing relevant information that helps solve those problems takes on an aspect of collaboration. Apart from the intrinsic value of bookmarks, del.icio.us also differs from other folksonomies in its mechanism. The mechanism allows multiple users to tag the same item, and multiple users do in fact tag the same item because each derives utility from tagging independently.

It is known in the art that many properties of the web exhibit power laws. A power law is a distribution that forms a straight line when plotted on a log-log graph, as in FIG. 2. This is regarded as a basic indicator of fractal behavior and accounts for scale invariance (the distribution looks self-similar at any scale). Power laws have been observed in the content of web pages, the hyperlinks between web pages, search queries, file sizes on web servers, traffic patterns, and the physical wiring that makes up the Internet. The power law in hyperlinks between pages allows LAR algorithms such as PageRank and HITS to converge effectively on a solution and thus to be practical methods. The words used in document content also follow a power law, known as Zipf's law. This is implicitly exploited by IR ranking methods such as TF-IDF (term frequency-inverse document frequency) and other methods for sorting search results by relevance. Del.icio.us also exhibits power laws, as in FIG. 1. The frequency of keywords within an item, the number of items per keyword, the number of items per user, and the number of users per item all follow power-law distributions. The present invention notes that the distribution of keywords over items in del.icio.us resembles the distribution of keywords over queries in a web search engine (which in practice behaves much like link text on the web). The distribution of users per item resembles the distribution of in-links (hyperlinks coming into a web page) on the World Wide Web. Indeed, although hyperlinks on the web are usually regarded as a form of page navigation, the present invention notes that they can also be regarded as a form of annotation.
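The statement that a power law forms a straight line on a log-log graph can be checked numerically with a least-squares fit of log(count) against log(rank). The following is a rough sketch rather than a rigorous statistical test; the function name `loglog_slope` and the synthetic Zipf-like data are assumptions made for illustration.

```python
import math

def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    counts: positive numbers (for example, how many items each keyword tags).
    Points lying close to a line with a constant negative slope are
    consistent with a power law; this is only a rough check.
    """
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(c) for c in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic Zipf-like data: the count at rank r is 1000 / r.
zipf_counts = [1000.0 / r for r in range(1, 51)]
slope = loglog_slope(zipf_counts)
```

For the synthetic data the fitted slope is very close to -1, the exponent associated with Zipf's law. The same function could be applied to measured counts such as items per keyword or users per item.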

The present invention posits that these power-law properties are properties of language itself and of the way we understand the usefulness of the items around us. They are a natural consequence of the two mutually reinforcing processes of communication and collaboration. Accordingly, any annotation system that can suitably harness the emergent self-organization that takes place in these processes will exhibit similar power-law characteristics, which the existing body of IR and LAR research can exploit to advantage.

Using this basic idea, the present invention generalizes the concept of an annotation system in at least two important ways. First, annotation is regarded not only as the act of a user tagging an item with keywords but as any act that requires the user to describe an item concisely. The click stream in search is at least one other equivalent method of annotation: it generates annotations in which the keywords of a query are associated with the URL clicked by a given user. Link text within hyperlinks on the web is another mechanism for such annotation, though a less expressive one. Placing a file in a file system hierarchy is also a form of annotation, albeit a less flexible and more limited one, and is equivalent to associating with the file every directory name above it in the hierarchy tree. Link text and file names are not as efficient as tagging or click streams, but in sufficient numbers they approximate the effect of the form of annotation of the present invention.
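Two of the implicit annotation sources described above, search click streams and file system placement, can be sketched as functions that emit annotation records. The tuple layout (item, user, keywords), the function names, and the example URL and path are assumptions made for illustration.

```python
from pathlib import PurePosixPath

def annotation_from_click(user_id, query, clicked_url):
    """Treat a clicked search result as the user describing that URL
    with the words of the query."""
    return (clicked_url, user_id, query.lower().split())

def annotation_from_path(user_id, file_path):
    """Treat each directory name above a file as a keyword for that file."""
    path = PurePosixPath(file_path)
    # parent.parts includes the root "/" for absolute paths; drop it.
    keywords = [part.lower() for part in path.parent.parts if part != "/"]
    return (str(path), user_id, keywords)

# Hypothetical inputs, for illustration only.
click = annotation_from_click("u1", "Tensor decomposition", "http://example.com/hosvd")
filed = annotation_from_path("u2", "/projects/search/specs/ranking.doc")
```

Both functions produce the same record shape, so downstream indexing and ranking would not need to know whether a keyword came from an explicit tag, a query, or a folder name.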

Second, annotations can be applied to any item that can be identified within a group's communication and collaboration activities. This is not limited to web URLs and can encompass anything that can be described conceptually. It can include, without limitation, files on a corporate LAN, tasks and issues in a project management system, ideas in a brainstorming session, paper documents, tables in a spreadsheet, data in an RDB, web services, RSS feeds, and so on. Provided that some mechanism exists (offline or online, digital or otherwise) that allows a user to retrieve or use the item, an item can be anything that can be denoted by a unique ID (such as a URI, a social security number, or a barcode).

The generic annotation-based collaboration system of the present invention is defined as any annotation system containing a large number of items in which i) the system allows items to be identified and shared (so that they can be retrieved, evaluated, viewed, or used) by a large number of users unrelated to one another, ii) each such user can independently annotate items with keywords the user finds useful for describing them, and each such item may be annotated by many such users, and iii) each user can independently find items based on such keywords, so that all items having the corresponding keywords (aggregated across multiple users for each item) are retrieved; and which, for a sufficiently diverse population of users and items, exhibits self-organizing properties approximating power-law distributions. The key concept here is independence, which implies that users can operate without knowing of one another's activity or existence. This means that the only mechanism for organizing items among users is the shared meaning of keywords among the various users.
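The three defining properties (shared items, independent annotation, and retrieval by keywords aggregated across users) can be sketched as a small in-memory store. The class name `AnnotationStore`, its methods, and the ordering of results by number of annotating users are assumptions made for illustration, not part of the definition above.

```python
from collections import defaultdict

class AnnotationStore:
    """In-memory store in which users tag items independently of one another."""

    def __init__(self):
        # keyword -> item_id -> set of user_ids who used that keyword on it
        self._by_keyword = defaultdict(lambda: defaultdict(set))

    def annotate(self, item_id, user_id, keywords):
        """Record that user_id described item_id with the given keywords."""
        for kw in keywords:
            self._by_keyword[kw.lower()][item_id].add(user_id)

    def find(self, keyword):
        """Return item IDs tagged with keyword by any user, most-tagged first."""
        items = self._by_keyword.get(keyword.lower(), {})
        return sorted(items, key=lambda i: (-len(items[i]), i))

# Hypothetical users and items, for illustration only.
store = AnnotationStore()
store.annotate("doc-1", "alice", ["budget", "2006"])
store.annotate("doc-1", "bob", ["budget", "finance"])
store.annotate("doc-2", "carol", ["budget"])
```

Note that carol finds `doc-1` under "budget" without knowing that alice or bob exist; the shared keyword is the only thing connecting them.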

More specifically, annotations can be generated by any mechanism that produces an event of the following form:
[item ID] [user ID] [keyword 1, keyword 2, ..., keyword N]
Such an event is produced each time a user with the unique identifier [user ID] describes an item with the unique identifier [item ID] using the keywords [keyword 1, keyword 2, ..., keyword N] that describe the item. The present invention notes that when such an annotation-based system exhibits approximately power-law distributions in
• the number of items per keyword,
• the frequency of keywords within an item,
• the number of users per item, and
• the number of items per user,
such a mechanism exhibits the following properties.

• TF-IDF-style ranking can be used to sort search results by relevance. Each item can be represented by a keyword vector, just like an ordinary document in IR. Moreover, as with link text on the web, each annotation represents a different individual's judgment of the item's content, and in aggregate the annotations describe the content even better than the item's author or creator does. Ranking based on the aggregated keywords therefore often yields better quality than standard full-text search.

• Each event can be treated as a "synthetic" hyperlink from the user to the item. By treating users as hubs and items as authorities, IR algorithms such as the LAR methods (described in the literature of Borodin et al.), for example HITS, Hilltop, SALSA, PHITS, randomized HITS, subspace HITS, TOPHITS, and CubeSVD, can be brought to bear on determining the result set and the ranking of search results. The quality of results achieved by such an approach may be equal to, and sometimes better than, the current performance of these algorithms on hyperlinks on the web. Because both users per item and items per user exhibit power laws, these algorithms converge quickly, as they do on the web. This can allow annotated items, such as files within an enterprise, to benefit from LAR-style approaches where that is currently impossible.

• When a user annotates an item with a set of keywords, out of the many ways of describing the item the typical user naturally picks the keywords he or she finds useful for defining it. In doing so, users describe not only the item but also, at the same time, what they consider important. This makes it possible to query for users by keyword, just as for documents. Ranking users for a query can be done with any of the IR approaches above, including the LAR algorithms, which treat users as hubs.

• The community annotates with a wide variety of keywords, which may allow a person to find items and users more effectively than any single person's keywords would. In addition, LSI or PLSA can be applied, using the keyword-item or keyword-user matrix in the same way as a term-document matrix, so that items or users can be queried by a keyword even when no annotation explicitly associates them with that keyword. These methods exploit higher-order co-occurrence data to discover such words. This can be improved further by applying three-level approaches such as PHITS+PLSA, HOSVD, CubeSVD, and TOPHITS to the event data.
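The first and third properties above, TF-IDF-style ranking of items and keyword queries over users, can be sketched from the same event data. The sample events, the function names, and the particular weighting `tf * log(1 + N/df)` are assumptions made for illustration; other TF-IDF variants would serve equally well.

```python
import math
from collections import Counter, defaultdict

# Hypothetical events in the form (item ID, user ID, keywords).
EVENTS = [
    ("doc-1", "alice", ["search", "ranking"]),
    ("doc-1", "bob", ["search", "hits"]),
    ("doc-2", "alice", ["search", "clustering"]),
    ("doc-3", "carol", ["clustering", "kmeans"]),
    ("doc-3", "bob", ["clustering"]),
]

def keyword_vectors(events, key_index):
    """Aggregate keyword counts per item (key_index=0) or per user (key_index=1)."""
    vectors = defaultdict(Counter)
    for event in events:
        vectors[event[key_index]].update(k.lower() for k in event[2])
    return vectors

def tfidf_rank(vectors, query):
    """Rank entities by the sum of tf * idf over the query keywords."""
    n = len(vectors)
    scores = {}
    for name, counts in vectors.items():
        score = 0.0
        for kw in query:
            # Document frequency: how many entities carry this keyword at all.
            df = sum(1 for c in vectors.values() if kw in c)
            if df and kw in counts:
                score += counts[kw] * math.log(1.0 + n / df)
        if score > 0:
            scores[name] = score
    return sorted(scores, key=lambda name: (-scores[name], name))

item_vectors = keyword_vectors(EVENTS, 0)
user_vectors = keyword_vectors(EVENTS, 1)
```

The same `tfidf_rank` function serves both `item_vectors` and `user_vectors`, which reflects the point that users can be queried by keyword in the same way as documents.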

Based on the above, it will be apparent to those skilled in the art that there can be many IR-based search methods for items and users in such an annotation system. Whether they are used selectively or in combination with one another in a particular embodiment does not depart from the spirit of the invention. Furthermore, it is always possible to construct annotation keyword vectors for items and users, a keyword-item matrix, and a keyword-user matrix, and to treat annotations as synthetic hyperlinks, whether or not the annotation system exhibits power laws. All of the above methods can be applied in any such case, though possibly with lower effectiveness.
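Treating each annotation as a hyperlink from user to item, with users as hubs and items as authorities, can be sketched as a HITS-style iteration on the resulting bipartite graph. This sketch ignores the keywords and is therefore query-independent, which is a simplification; the sample events and the function name `hits_users_items` are assumptions made for illustration.

```python
import math

def hits_users_items(events, iterations=50):
    """HITS on the bipartite graph in which each event links a user to an item."""
    edges = {(user, item) for item, user, _keywords in events}
    users = sorted({u for u, _ in edges})
    items = sorted({i for _, i in edges})
    hub = {u: 1.0 for u in users}
    auth = {i: 1.0 for i in items}
    for _ in range(iterations):
        # An item's authority is the sum of the hub scores of its annotators.
        auth = {i: 0.0 for i in items}
        for u, i in edges:
            auth[i] += hub[u]
        norm = math.sqrt(sum(v * v for v in auth.values()))
        auth = {i: v / norm for i, v in auth.items()}
        # A user's hub score is the sum of the authorities of the items annotated.
        hub = {u: 0.0 for u in users}
        for u, i in edges:
            hub[u] += auth[i]
        norm = math.sqrt(sum(v * v for v in hub.values()))
        hub = {u: v / norm for u, v in hub.items()}
    return hub, auth

# Hypothetical events in the form (item ID, user ID, keywords).
events = [
    ("doc-1", "alice", ["search"]),
    ("doc-1", "bob", ["ranking"]),
    ("doc-1", "carol", ["hits"]),
    ("doc-2", "alice", ["clustering"]),
    ("doc-2", "bob", ["kmeans"]),
    ("doc-3", "carol", ["tags"]),
]
hub, auth = hits_users_items(events)
```

A query-dependent variant would first restrict `events` to those whose keywords match the query and then run the same iteration on the smaller graph.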

[Clustering items and users]
There have been attempts in the past to devise methods for clustering documents. Taxonomies and controlled vocabularies have been tried and have failed because it is impractical to create a single superstructure into which every item can be effectively placed. Attempts have been made to cluster result data based on textual similarity, but the subcategories produced by such automated methods are often not easy for users to understand.

The present invention extends the concept of clustering to items in general and to search in particular. The ability to drill down into search results as the context is progressively narrowed is a possible solution to a problem in web search, where a typical query is only two or three keywords long. Today's users are accustomed to drilling down into folders in a file system to find the files they are looking for. Therefore, if search results are clustered into subcategories by keyword (for example, FIG. 10), users may exhibit browsing behavior similar to that seen with a file system. Drilling down is equivalent to augmenting the original query with the keyword corresponding to the chosen subcategory. This is more user friendly than query refinement methods such as Google Suggest and may be superior to them. Such a method is not limited to web search and can be applied to any form of search, including but not limited to desktop search and enterprise search.

Apart from clustering items, there are many potential uses for clustering users. These may include the dynamic formation of groups of people with the same interests, or the creation of social networks based on interest rather than on similarity or the like. The generic annotation mechanism of the present invention allows users to be treated in the same way as items with respect to keywords. Many clustering algorithms known in the art can be used to cluster items and users. These include projection methods such as principal component analysis and multidimensional scaling, as well as other methods such as self-organizing maps and K-means clustering. Items can be clustered based on the keywords used in their annotations, on the users who annotate them, or on both. In a similar manner, users can be clustered based on their keywords, their items, or both.

One of the critical problems that every clustering method needs to solve is complexity reduction. For example, there is high complexity associated with the keywords of items and users. In practice, there may be as many unique annotation contexts as there are items. The clustering problem therefore becomes one of selecting a subset of relevant keywords that serves the purpose of grouping similar items and users. This is a difficult problem owing to, among other things, the enormous number of possible combinations and the difficulty of determining the relevance of keywords to items and users. Also, unlike items, users have many facets that change over time, and they may share many keywords.

Pattern recognition methods such as LSI use dimensionality reduction to deal with this complexity, but they are costly to run and to keep updated, and it is difficult to understand what they are actually doing.

The present invention observes that the most appropriate clustering is obtained when users and items are clustered simultaneously. It discloses an approach that achieves a large reduction in complexity and yields intuitive and effective clustering results for both users and items. The approach is based on the concept of a context, where a context is defined as a set of keywords. In the case of search, a context corresponds to a search based on the conjunction (logical AND) of its keywords. An item or user is considered to belong to a context if it matches all of the keywords in the context. A sub-context of a context is a context that has all of the keywords of that context and at least one other keyword. Therefore, every user and item that is in a sub-context is also in the context. The set of keywords in an annotation event is a context. (The annotation event itself can be considered a context if the definition of a keyword is extended to include the user ID and the item ID, in which case the annotation event is a sub-context of the context formed by its set of keywords. Users can also be regarded as items.)
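The definitions of context, membership, and sub-context given above map directly onto set operations. The following is a minimal sketch in Python, assuming that keywords are plain strings; the function names `belongs_to` and `is_subcontext` and the sample data are hypothetical and are not taken from the specification.

```python
from typing import FrozenSet, Iterable

# A context is modeled as an immutable set of keywords (assumption: keywords are strings).
Context = FrozenSet[str]


def belongs_to(entity_keywords: Iterable[str], context: Context) -> bool:
    """An item or user belongs to a context if it matches every keyword in it."""
    return context <= set(entity_keywords)


def is_subcontext(candidate: Context, context: Context) -> bool:
    """A sub-context has all keywords of the context plus at least one other keyword."""
    return context < candidate  # proper-subset test


# Hypothetical sample data for illustration only.
item_keywords = {"python", "search", "ranking"}
search_context = frozenset({"python", "search"})
narrower_context = frozenset({"python", "search", "ranking"})
```

Because a sub-context is a proper superset of its parent's keywords, anything that belongs to `narrower_context` necessarily belongs to `search_context` as well, which is the containment property stated in the text.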

Clustering of items and users can then be defined as determining the set of contexts that is most relevant for grouping users and items. Complexity reduction is achieved when the size of such a set of contexts is much smaller than the set of all contexts in the system. The present invention uses the insight that the relevant or useful contexts are the contexts that are actually used. Such a set of contexts can therefore be determined from the actual contexts in the annotation events that users have used to describe items, by finding the contexts that contain at least a certain minimum number of items and users. In practice, for an annotation system with a sufficiently diverse population of users and items, even a small minimum can yield a large dimensionality reduction and an efficient clustering of users and items into interest-based categories. This overcomes one of the main problems of forum implementations, namely that it is difficult to decide which topics are meaningful to users so that they can communicate effectively. The emergent contexts or topics of the present invention provide a solution to this problem by giving a dynamic and appropriate clustering of users and items.
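One way to read the selection rule above is sketched below in Python. It is an illustration under stated assumptions, not the claimed method: an annotation event is modeled as a hypothetical `(user_id, item_id, keywords)` tuple, a user or item is counted as being in a context when one of its events contains every keyword of that context, and the thresholds `min_users` and `min_items` are illustrative parameters.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical event shape: (user_id, item_id, keywords used in the annotation).
Event = Tuple[str, str, FrozenSet[str]]


def category_contexts(events: Iterable[Event], min_users: int,
                      min_items: int) -> List[FrozenSet[str]]:
    """Keep the contexts actually used in events that hold enough users and items."""
    events = list(events)
    used_contexts = {keywords for _, _, keywords in events}
    selected = []
    for ctx in used_contexts:
        # Assumption: membership is judged per event (event keywords contain the context).
        users = {u for u, _, kw in events if ctx <= kw}
        items = {i for _, i, kw in events if ctx <= kw}
        if len(users) >= min_users and len(items) >= min_items:
            selected.append(ctx)
    # Sort for a deterministic result.
    return sorted(selected, key=sorted)


# Hypothetical sample events.
EVENTS = [
    ("u1", "i1", frozenset({"python"})),
    ("u2", "i2", frozenset({"python", "search"})),
    ("u3", "i3", frozenset({"python", "search"})),
    ("u1", "i4", frozenset({"cooking"})),
]
```

With `min_users=2` and `min_items=2`, the sample data keeps the contexts {python} and {python, search} and drops {cooking}, which only one user and one item have used.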

For search, including web search, annotations can be obtained from query logs based on the clickstream on search results. (They can also be advantageously combined with link text, where such links come from different web hosts, from blogs, or from other proxies for independent annotators and other annotation sources.) The set of contexts defined above is computed and can be called the category contexts. For any given context in a search, the set of category contexts that are sub-contexts of the search query can be computed, and each of the keywords in the resulting set (after removing the keywords of the search context) can be offered as a drill-down subcategory as described above. Drilling down into a given subcategory is equivalent to changing the context of the query so that it includes the keyword corresponding to that subcategory. Each drill-down keyword therefore corresponds to a sub-context of the query context. This is not limited to single-word keywords; it also covers word collocations and n-gram-based word sequences, which can be regarded as drill-down keywords. The drill-down keywords can be presented sorted according to a particular ranking order. Such a ranking order can be calculated from the number of events (or users, or items) for the corresponding sub-context, and those numbers can be calculated both for a target period such as "today" and cumulatively. Furthermore, the category contexts themselves can be calculated in a time-bounded manner, in which all events within a given period are used to calculate the category contexts.
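The drill-down computation described above can be sketched as follows. This is a hedged Python illustration: the function name `drill_down`, the representation of each event as a bare keyword set, and the choice to rank by event count (rather than by user or item count) are assumptions made for the example.

```python
from typing import FrozenSet, Iterable, List, Tuple


def drill_down(query: FrozenSet[str],
               category_ctxs: Iterable[FrozenSet[str]],
               events: Iterable[FrozenSet[str]]) -> List[Tuple[str, int]]:
    """Return drill-down keywords for a query context, ranked by event count."""
    events = list(events)
    # Collect keywords from category contexts that are sub-contexts of the query,
    # after removing the keywords already present in the query.
    candidates = set()
    for ctx in category_ctxs:
        if query < ctx:
            candidates |= ctx - query
    ranked = []
    for keyword in candidates:
        sub_context = query | {keyword}
        count = sum(1 for event_keywords in events if sub_context <= event_keywords)
        ranked.append((keyword, count))
    # Highest count first; ties broken alphabetically for determinism.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data.
CATEGORY_CTXS = [
    frozenset({"python"}),
    frozenset({"python", "search"}),
    frozenset({"python", "web"}),
]
EVENT_KEYWORDS = [
    frozenset({"python", "search"}),
    frozenset({"python", "search"}),
    frozenset({"python", "web"}),
    frozenset({"python", "web", "django"}),
    frozenset({"python", "web"}),
    frozenset({"cooking"}),
]
```

Selecting one of the returned keywords corresponds to re-issuing the query with that keyword added, so the same function can be called again on the narrowed context to produce the next level of subcategories.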

The actual search process during drill-down can be carried out independently of the annotations used to compute the drill-down, and can use whatever method the search engine uses to perform searches, including full-text search. The role of a subcategory is simply to show the user a relevant keyword, which is then used to augment the search query.

[Context-based communication method]
Communication is at the core of all collaborative activity. However, most of today's communication technologies are limited to a one-to-one paradigm (such as telephone, email, and SMS/IM) or a one-to-many paradigm (such as TV, radio, and the web). There is an important kind of communication that is not adequately addressed by the state of the art, namely many-to-many communication. Many-to-many here does not mean multiple participants, as in a conference call, or multiple recipients of an email. Rather, as in the one-to-many case, the recipients are not known to the sender of the message before the communication takes place. For example, a person who places a file in a corporate file system needs to be able to make its existence known to people, possibly unknown to that person, who may need the file. Blogs allow anyone to post content on the web, but there is no effective mechanism that enables the intended web users to discover a blog posting.

The problem can be stated succinctly as: "for each item, its users; for each user, their items." Search allows people to find relevant items, but perhaps more important from the standpoint of communication and collaboration is the ability of an item to find the relevant people. The methods described above can be used to advantage in solving this problem.

A many-to-many communication system can be implemented as two distinct parts: publishers of items, who need to search for the users who would find an item relevant, and users, who search across all items for those relevant to them. It will be apparent to those skilled in the art that any such mechanism must balance the needs for comprehensiveness, searchability, and privacy in order to yield a practical implementation.

The present invention divides the communication process into three stages: publishing, context setting, and subscribing. It uses a context-based approach to publishing, in which the publisher must select the context that is most relevant for the item in question to reach the desired subscribers. This context is preferably restricted to the category contexts of the annotation system described above. In addition to the item, the publisher discloses a public form of identity. This can be done by using a publisher ID that is unique to the publisher (and that may or may not be the same as the publisher's user ID) and annotating the item with this ID. The act of publishing can be turned into an explicit act that either generates a special kind of annotation event, called a publish annotation event, or is an ordinary annotation event that adds the publisher ID to the item.

A subscriber periodically retrieves (pulls) items based on contexts from the set of category contexts in which the user has previously shown or expressed interest. Such subscribed items can then be "personalized", or re-ranked, based on the match between the keywords of the item and the keywords that the user has used in past annotations for the context. This can be advantageously augmented by computing the match between the publisher ID of the document and the publisher IDs of items that the user has previously found useful (or annotated). The publisher ID serves as a distributed form of reputation for the publisher. A subscriber who has previously annotated items from a publisher (for example, by selecting or bookmarking them) can have future items from the same publisher ranked higher after the re-ranking step of personalization. Similarly, if a publisher has so far received relatively few annotations from the user, future items from that publisher are ranked lower. To complete the feedback loop, annotation events can be implemented differently from ordinary search, so that users who annotate an item automatically have the publisher ID they are endorsing included in their annotation. These annotation events can be collected and can make it possible to rank publishers within a context, just as annotation events rank items.
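The re-ranking step described above combines two signals: overlap between an item's keywords and the user's past keywords, and the user's history with the item's publisher. The specification does not fix a scoring formula, so the Python sketch below uses a simple additive score purely as an assumption; the dictionary keys, the `publisher_weight` parameter, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set


def rerank(items: List[dict], user_keywords: Set[str],
           publisher_counts: Dict[str, int],
           publisher_weight: float = 1.0) -> List[dict]:
    """Re-rank subscribed items for one user within one context.

    Assumed score = (number of shared keywords)
                  + publisher_weight * (past annotations by this user on the publisher's items).
    """
    def score(item: dict) -> float:
        overlap = len(set(item["keywords"]) & user_keywords)
        reputation = publisher_counts.get(item["publisher_id"], 0)
        return overlap + publisher_weight * reputation

    return sorted(items, key=score, reverse=True)


# Hypothetical sample data.
ITEMS = [
    {"id": "a", "keywords": ["python", "search"], "publisher_id": "p1"},
    {"id": "b", "keywords": ["python"], "publisher_id": "p2"},
    {"id": "c", "keywords": ["cooking"], "publisher_id": "p3"},
]
USER_KEYWORDS = {"python", "search"}
PUBLISHER_COUNTS = {"p2": 3}  # the user annotated three earlier items from p2
```

In the sample data, item "b" moves ahead of item "a" despite sharing fewer keywords, because the user has annotated earlier items from its publisher. Setting `publisher_weight` to zero reduces the score to keyword overlap alone.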

Context setting is the process by which an item newly published in a particular context is pushed down to a subset of the users in that context. This subset of users acts as a pilot group for the item, and if they find the item appropriate, they annotate it with keywords. These annotation events are collected, and the annotated item is then made available to all subscribers of the context. The context setting phase can be bounded by a predetermined period, by a lower limit on the rate at which the item acquires annotations, or by another suitable metric. This process speeds up the adoption of an item in certain situations. It allows the item to receive a certain minimum level of annotation, enough to classify it adequately for future search or subscription. It can also be used to advantage to introduce a new item to publishers who are influential in the context and who, if they find the item relevant and appropriate, can promote it and help it gain rapid acceptance. Context setting is an optional step that an implementation may include if it derives value from doing so.

There are several reasons for modeling the system along these lines, as discussed below. One major reason is end-user privacy. In some cases, such as corporate workgroups, searching for users by keyword may be acceptable or even desirable (for example, "finding an expert"). In such cases, a person can contact the people found directly by phone or email. In many cases, however, users may not want their contact information or personal details to be disclosed. In these cases a similar purpose can be achieved by having the sender disclose the sender's own identity but send the message to the users in a context without knowing who the recipients are; those users can then reply at their own discretion. Restricting the sender to one context (or possibly a small number of contexts) makes the sender focus on selecting a reasonable group of people to whom to send the message. This is an important element of human judgment that is absent from information filtering and retrieval, in which automated processes collect items.

Because items carry a public identity, they can also be ranked for search and subscription based on their publisher ID. This allows authors to establish a reputation among subscribers based on the value that end users find in their posts. It is also a mechanism for publisher accountability. Highly regarded authors have a strong interest in protecting their reputation by publishing only high-quality items. If, unknowingly or deliberately, they fail to do so, they cease to be highly regarded. Because a reputation takes time to build, a highly regarded author has little to gain and much to lose by promoting low-quality items. This is a distributed form of reputation that is difficult to spam, and such authors can add an important new feature to information retrieval algorithms beyond what can be discerned from either link analysis or the integrated hyperlinks of annotation-based analysis. In essence, it brings an element of expert judgment into the ranking process. An implementation can therefore choose to incorporate both per-item annotation information and publisher ID information in computing the overall ranking of items within a context.

The context setting phase is important so that an item is quickly annotated with a reasonable set of keywords. As noted above, the ability of a community to set the context of an item exceeds that of any individual. In communities such as del.icio.us, the distribution of keywords on an item has been observed to follow a power law. The number of keywords applied by more than a given percentage of users is often relatively constant, showing scale invariance once the number of events exceeds a certain threshold. These top keywords, called the defining features or defining keywords of the item, can be obtained relatively quickly when the item is exposed to users in a context-sensitive manner, as in context setting. The defining features give a reasonable indication of the community's judgment of the item's context and allow subscriptions to be more relevant and accurate.

Context setting is also important so that new items are recognized quickly. An item from a relatively unknown publisher can be sent in a targeted manner to publishers who are highly regarded in the context, and if they find value in the item, they can publish it under their own identity (in essence, adding their identity to the item alongside that of the original author). This allows a new item to be adopted quickly across the whole population. It also allows promising new talent to be picked up early.

The ability of a community to process the items in a context can grow with the number of users in the context. Not everyone needs to process every item. Items can be divided among subsets of the community and have their context set in parallel. Category contexts provide meaningful places for such collaboration to occur. By way of example, Google has more than 5 billion pages indexed, and there are more than 100 billion emails per day (unfortunately including spam). A suitable implementation of this communication mechanism may be able to set the context of a domain the size of the entire web within a reasonable period of time. Together with the generic annotation mechanism, this communication method represents a practical alternative to the role of spiders in web search. In addition, general context-level statistics can be made available to senders to help them find the appropriate context. Such statistics can include, but are not limited to, the number of users and the number of items in a context. If the ratio of users to items is higher than average, this may be a good indication of a hot topic. If the ratio is much lower than average, the sender may conclude that the context is too competitive for the sender's message. This provides an important feedback loop that can influence content creation on the web or, more generally, how any collaborating organization chooses to allocate resources to tasks.
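The context-level statistic discussed above can be illustrated with a short Python sketch. The text only says "higher than average" and "much lower than average", so the `low_factor` cut-off, the label names, and the use of an unweighted mean across contexts are assumptions made for illustration.

```python
from typing import Dict, Tuple


def classify_contexts(stats: Dict[str, Tuple[int, int]],
                      low_factor: float = 0.5) -> Dict[str, str]:
    """Label each context from its (user count, item count) pair.

    Assumed labels: "hot" if the user-to-item ratio is above the mean ratio,
    "competitive" if it is below low_factor times the mean, otherwise "typical".
    Contexts with no items are skipped.
    """
    ratios = {ctx: users / items for ctx, (users, items) in stats.items() if items > 0}
    if not ratios:
        return {}
    mean_ratio = sum(ratios.values()) / len(ratios)
    labels = {}
    for ctx, ratio in ratios.items():
        if ratio > mean_ratio:
            labels[ctx] = "hot"
        elif ratio < low_factor * mean_ratio:
            labels[ctx] = "competitive"
        else:
            labels[ctx] = "typical"
    return labels


# Hypothetical statistics: context -> (number of users, number of items).
STATS = {"python": (100, 50), "cooking": (10, 100), "travel": (30, 30)}
```

A sender could consult such labels before choosing where to publish: a context with many users per item has unmet demand, while one with few users per item is crowded with competing messages.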

Annotation continues even after the initial context setting process, and an item continues to be described by different people with different keywords over time. The context setting phase can contribute to an initial screening of items so that relevant or promising items are brought to the fore. Further annotation allows the item to be characterized further through a more relevant set of defining keywords. The keywords among the defining features of an item may change over time (exhibiting volatile behavior) as the community characterizes the usefulness of the item in different ways over time. The annotation process as a whole can be related to sending an item through a small-world network in which the destination user is unknown and each user passes the item on to other users through a new context, based on that user's judgment of the item's relevance to the context. In effect, the semantic network of associations between contexts is not only created by the social network of such context-based interactions but also reflects it. Category contexts represent the hubs in such a network, with many connections, that enable efficient transmission of items between users.

The subscription process allows items collected at the context level to be retrieved periodically based on the user's interests. A user's interest in a context can be specified explicitly by the user, or it can be derived implicitly from the user's annotations, clickstream, or patterns of item use within the subscription process. Explicitly specifying an interest is equivalent to repeatedly issuing a query for a particular context and continuously retrieving items for that context in the background. Unlike in search, however, explicit specification of interests is unlikely to be a practical approach for subscriptions. At any given moment there may be many items, in many contexts, that are potentially relevant to the user and of which the user is unaware. An implicit goal of such a subscription system is to facilitate the discovery of such items. The present invention uses a form of personalization that profiles the user based on annotation events in order to infer interests.

Many approaches to personalization are known in the art. The present invention introduces three elements that are important in achieving appropriate personalization: category contexts, publisher IDs, and a time-based variant of TF-IDF for re-ranking. Most approaches to personalization that are based on user profiling attempt to rank items according to the keywords that the user has found valuable in the past. Such approaches, however, miss important new areas of interest, keep reinforcing a limited set of keywords, and degrade the user experience. Using category contexts makes it possible to introduce an element of serendipity based on what the community finds interesting. In general, items discovered in this serendipitous way lead the user into unexpected new areas and facilitate the discovery of new areas of interest that are relevant to the user profile. This can be captured implicitly in the annotation events (such as clicks) on such new items and explicitly when, as a result of reading or using an item, the user performs searches in the new context.

The subscription process treats each category context as an independent source of items to which it subscribes. A subscription retrieves items from all of the category contexts in the user's profile. This can be done by apportioning the retrieval according to the distribution of the user's interests. For example, if we assume that the user has a limited attention span (such as a certain maximum number of items per day), the number of items retrieved from a context can be chosen so that its ratio to the total number of items retrieved equals the ratio of the attention the user spends in that context (measured, for example, by the number of items read or retrieved) to the user's total attention. The calculation of this distribution can also be restricted to a given period of time over which the user is profiled.
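The proportional allocation described above amounts to splitting a daily item budget according to the user's observed attention per context. The Python sketch below is illustrative only: the function name `allocate_items`, the use of items read as the attention measure, and the rounding down of fractional shares are assumptions.

```python
from typing import Dict


def allocate_items(attention: Dict[str, int], daily_budget: int) -> Dict[str, int]:
    """Split a daily item budget across contexts in proportion to past attention.

    `attention` maps each category context to a count such as items read there.
    Shares are rounded down, so a few slots of the budget may remain unused.
    """
    total = sum(attention.values())
    if total == 0:
        return {ctx: 0 for ctx in attention}
    return {ctx: (daily_budget * count) // total for ctx, count in attention.items()}


# Hypothetical profile: items the user read in each context during the profiling period.
ATTENTION = {"python": 30, "cooking": 10, "travel": 10}
```

For a budget of 20 items per day, the sample profile yields 12 items from "python" and 4 each from "cooking" and "travel", matching the 60/20/20 split of the user's attention.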

The user profile, in the form of a distribution of interest across the various category contexts, can be shown to the user on request for inspection. Users can delete or update such ratios through various metaphors, allowing them to tune their preferences. However, because the implications of changing a particular ratio may not be readily apparent to a user, an embodiment can simply allow the user to designate, or remove the designation of, category contexts for which downloading takes place continuously regardless of the user's actual usage. It can also allow the user to remove any category context from the profile.

A balance is needed between such collaborative ranking and what an individual user finds relevant. The present invention introduces a time-based variant of the TF-IDF approach for re-ranking based on the user profile, in order to determine relevance to a particular user. For a particular user in a particular context, the keywords and their actual usage frequencies are derived from the annotation events for that context. To compute the period-specific usage frequency, the number of times a keyword has been used and the time interval since the user first used that keyword in that context are measured. From this rate, the frequency over a predetermined period is extrapolated, giving what the invention calls the period-specific usage frequency. As an example, suppose that for a given keyword the user has used the keyword twice in the two days since first using it. This gives a period-specific usage frequency of 365 uses per year. The period-specific usage frequency is damped by taking its logarithm, as is common in TF-IDF style approaches, and is used as the weight in the keyword vector of the user's interest in the context. The item's keyword vector can be weighted in the conventional TF-IDF manner, and the rank (match) of an item for the context for this user can be computed in the usual way as the inner product of the two vectors. Items are re-ranked within the context based on the rank so computed.
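The calculation above can be sketched as follows, under stated assumptions. The function names are hypothetical, the damping uses log(1 + f) rather than a bare logarithm so that weights stay non-negative, the elapsed time is clamped to at least one day, and the item vectors are taken as already weighted by some conventional TF-IDF scheme.

```python
import math


def period_usage_frequency(use_count, days_since_first_use, period_days=365.0):
    """Extrapolate the observed usage rate to a fixed period (default: one year)."""
    days = max(days_since_first_use, 1.0)  # assumption: avoid dividing by ~0
    return use_count / days * period_days


def user_keyword_weights(usage):
    """usage maps keyword -> (use_count, days_since_first_use) for one context."""
    return {
        kw: math.log(1.0 + period_usage_frequency(n, d))  # log damping (assumed form)
        for kw, (n, d) in usage.items()
    }


def match_score(user_weights, item_weights):
    """Inner product of the user's keyword vector and an item's keyword vector."""
    return sum(w * item_weights.get(kw, 0.0) for kw, w in user_weights.items())


def rerank(items, user_weights):
    """items maps item_id -> keyword weight vector; best match first."""
    return sorted(items, key=lambda it: match_score(user_weights, items[it]), reverse=True)


# Two uses in the two days since first use extrapolate to 365 uses per year.
freq = period_usage_frequency(2, 2)
```

Here `freq` evaluates to 365.0, matching the worked example in the text.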

The user interest profile described above contains sensitive personal data. Users are therefore likely to be more comfortable with a client-side implementation on their own PC, where they have full access to see and change what is stored, than with having such information managed on a central server. However, this means that, for a given context, it may not be possible to re-rank the entire set of items corresponding to the context without downloading a large amount of data to the client. Alternatively, even on a central server, such personalized re-ranking may be too costly to perform, or may be undesirable because the collaborative ranking of the context's community could be lost. A balance between these goals is obtained by limiting re-ranking to a subset of the top collaboratively ranked results in the context. This can be achieved by retrieving only a certain number of items from the context and re-ranking those results based on the user profile. This number controls the mix between collaborative ranking and personalization.
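A minimal sketch of this trade-off, assuming the client has already received the collaboratively ranked item identifiers and a keyword weight vector per item. The function and parameter names are hypothetical.

```python
def personalized_top_n(collab_ranked_ids, item_vectors, user_weights, n):
    """Re-rank only the first n collaboratively ranked items using the user profile.

    A small n keeps the result close to the community ranking; a large n
    gives the user's profile more influence.
    """
    head = collab_ranked_ids[:n]

    def score(item_id):
        vec = item_vectors.get(item_id, {})
        return sum(w * vec.get(kw, 0.0) for kw, w in user_weights.items())

    # sorted() is stable, so items with equal scores keep their collaborative order.
    return sorted(head, key=score, reverse=True)


ranked = personalized_top_n(
    ["i1", "i2", "i3", "i4"],
    {"i1": {"a": 1.0}, "i2": {"b": 1.0}, "i3": {"b": 2.0}, "i4": {"b": 5.0}},
    {"b": 1.0},
    n=3,
)
```

In this usage, item `i4` would match the profile best but is never considered, because it falls outside the top three collaborative results.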

Rate-based calculations such as the time-based variant of TF-IDF can be applied to advantage, together with publisher IDs, for the efficient detection of interesting authors. Unlike ordinary keywords, which may have a steady flow of a large number of items, most authors produce relatively few items. If the publisher ID is treated like a keyword in an item's keyword vector, even a method resembling ordinary TF-IDF gives the publisher ID a higher weight than other keywords. This can be usefully reinforced by using the user's period-specific usage frequency for the publisher ID. As an example, suppose publisher A has published two blog posts in total, and the user has read both of them in the two days since first encountering that publisher ID. Suppose another publisher B has 20 blog posts, and the user has read all 20 of them over the past year. Publisher A is ranked higher than publisher B for new items. However, if the user does not read subsequent items by publisher A, or if publisher A produces no further items, the ranking falls over time. This method ensures that a publisher ID recently found useful is initially ranked relatively high, while allowing other publishers who steadily produce useful items to overtake publishers who do not.
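The publisher A and publisher B example can be worked through numerically. This is a sketch only: the function name is hypothetical, and the log(1 + rate) damping and one-year period are assumptions carried over for illustration.

```python
import math


def publisher_weight(reads, days_since_first_seen, period_days=365.0):
    """Log-damped yearly read rate for one publisher ID (assumed form)."""
    rate = reads / max(days_since_first_seen, 1.0) * period_days
    return math.log(1.0 + rate)


# Publisher A: 2 posts read in the 2 days since first encounter (about 365 per year).
a_now = publisher_weight(2, 2)
# Publisher B: 20 posts read over the past year (about 20 per year).
b_now = publisher_weight(20, 365)
# Publisher A again, 60 days after first encounter with no further reads
# (about 12 per year), so B has overtaken A.
a_later = publisher_weight(2, 60)
```

Under these assumptions `a_now` exceeds `b_now`, while `a_later` falls below `b_now`, which reproduces the decay behavior described in the text.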

A similar time-based TF-IDF approach can be used to rank category contexts among the system's drill-down categories. In essence, the user's actual usage of a context, and its recency, can usefully be incorporated into the ranking process, rather than only the cumulative number of events in the context or the cumulative number over a given period.

Summary of the Invention

According to a broad concept of the present invention, the invention provides a method for collaboration, the method comprising:
identifying a plurality of items, each having a unique identifier, that can be shared among a plurality of users, each having a unique identifier;
having each user annotate a plurality of such items with at least one keyword in at least one natural language, independently of other users, wherein each such item is annotated by at least one user, each such annotation is represented by an annotation event that includes the identifier of the annotating user, the identifier of the annotated item, and at least one keyword that the annotating user chooses to describe the annotated item, and each such annotation event is generated from a plurality of event sources of at least one type;
collecting annotation events from the event sources such that the keywords associated with a particular item are collected from the annotation events for that item and the keywords associated with a particular user are collected from the annotation events for that user; and
having at least one of the users search for items or users by keyword, such that items or users having the keyword used in the search among their collected keywords are returned as results.
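The steps of this method can be sketched as a small in-memory index. This is an illustration under stated assumptions, not the disclosed implementation: the class and field names are hypothetical, keywords are matched case-insensitively, and results are returned unranked in sorted order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationEvent:
    user_id: str           # identifier of the annotating user
    item_id: str           # identifier of the annotated item
    keywords: tuple        # keywords the user chose to describe the item
    source_type: str = "browser"  # hypothetical event source type label


class AnnotationIndex:
    def __init__(self):
        self.item_keywords = defaultdict(set)  # item_id -> collected keywords
        self.user_keywords = defaultdict(set)  # user_id -> collected keywords

    def add(self, event):
        kws = {k.lower() for k in event.keywords}
        self.item_keywords[event.item_id] |= kws
        self.user_keywords[event.user_id] |= kws

    def search_items(self, keyword):
        k = keyword.lower()
        return sorted(i for i, kws in self.item_keywords.items() if k in kws)

    def search_users(self, keyword):
        k = keyword.lower()
        return sorted(u for u, kws in self.user_keywords.items() if k in kws)


index = AnnotationIndex()
index.add(AnnotationEvent("alice", "doc-1", ("budget", "2006")))
index.add(AnnotationEvent("bob", "doc-1", ("finance",), source_type="filesystem"))
index.add(AnnotationEvent("bob", "doc-2", ("Budget",)))
```

Because keywords are collected per item across all annotating users, `doc-1` becomes findable under "finance" even though only one of its two annotators used that word.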

Embodiments of the present invention provide systems and methods that allow users to annotate a plurality of items independently of one another, such that each item may be annotated by multiple users and each user can search for items based on the keywords collected across an item's multiple annotations. In general, the term "annotation" as used herein refers to any brief description of an item from which keywords are collected from a user and then stored in association with that user's identifier. An item may be anything that can be identified by a unique identifier, including files in a file system, paper documents, tasks and issues in a process management system, ideas stored in a repository, and so on. In embodiments of the invention, annotations can be collected in a variety of ways, including publishing, tagging, clicks on results in a search result set, directory and file names from file system paths, hyperlink text, and so on.

The invention may further include ranking search results by relevance to the query separately for each event source type, combining such ranks to compute a final rank for each result, and collecting the results across all event source types so that the final results are presented in order of relevance.
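The text does not state how the per-source ranks are combined. Purely as an illustrative choice, the sketch below sums reciprocal ranks across event source types; this combining rule, the function name, and the alphabetical tie-break are all assumptions.

```python
def combine_ranked_lists(ranked_by_source):
    """Merge per-source rankings into one list, best result first.

    ranked_by_source: mapping of event source type -> list of result IDs,
    each list already ordered by relevance within that source type.
    """
    scores = {}
    for ranking in ranked_by_source.values():
        for position, result in enumerate(ranking, start=1):
            # Assumed combining rule: add 1/position for every list the result is in.
            scores[result] = scores.get(result, 0.0) + 1.0 / position
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda r: (-scores[r], r))


final = combine_ranked_lists({
    "browser": ["a", "b", "c"],
    "filesystem": ["b", "d"],
})
```

A result that ranks well in several event source types, such as `b` here, rises above one that ranks first in only a single type.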

In one embodiment, the search result set is ranked using an information retrieval algorithm such as TF-IDF. In another aspect of the invention, each annotation is treated as equivalent to a hyperlink, and the result set is determined and ranked based on a link analysis ranking algorithm. In another aspect, for every annotation, each user is treated as a hub and each item as an authority, and results and rankings are determined by a link analysis algorithm such as HITS. By way of example, through such methods, items other than web pages, such as files on a corporate file share, can benefit from the higher precision normally associated with web search.
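The hub and authority treatment can be sketched with a plain power-iteration form of HITS over the bipartite graph of annotations. The function name, the fixed iteration count, and the use of one edge per distinct (user, item) pair are assumptions of this sketch.

```python
import math


def hits(annotations, iterations=50):
    """Return (hub scores by user, authority scores by item).

    annotations: iterable of (user_id, item_id) pairs; each pair is treated
    as a link from the user (hub) to the item (authority).
    """
    edges = set(annotations)
    hub = {u: 1.0 for u, _ in edges}
    auth = {i: 1.0 for _, i in edges}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the users who annotated the item.
        auth = {i: 0.0 for i in auth}
        for u, i in edges:
            auth[i] += hub[u]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {i: v / norm for i, v in auth.items()}
        # Hub update: sum of authority scores of the items the user annotated.
        new_hub = {u: 0.0 for u in hub}
        for u, i in edges:
            new_hub[u] += auth[i]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {u: v / norm for u, v in new_hub.items()}
    return hub, auth


hub, auth = hits([("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b")])
```

In this small graph, item `a` is annotated by three users and so receives a higher authority score than item `b`, and user `u1`, who annotated both items, receives the highest hub score.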

In another embodiment of the invention, annotations are processed to enable context-based clustering of users and items. Annotations are grouped by keyword-based context, such that a context whose numbers of users and items exceed a predetermined number of users and a predetermined number of items represents a cluster of both users and items simultaneously. This is used to enhance the search process by providing drill-down categories for search results. As an example, a web search engine can collect annotations from its clickstream logs, classify results using the clustering method described below, and offer possible query refinements that let the user narrow the results further in a meaningful way.

In another embodiment of the invention, users can be searched for based on keywords. As described above, this can be done either by using the collected keywords of a user's annotations or through link analysis ranking such as HITS, BFS, or INDEGREE. Users can be ranked against a query in the same manner as items, as described above.

In another embodiment of the invention, users can communicate with other users in a context-based manner through publishing and subscribing. In publishing, a user introduces a new item into the system by annotating it with a context and a publisher identifier. Other users can discover such items through search or by subscribing. Subscribing refers to automatically searching for and retrieving the top results for contexts that the user has previously found useful, and presenting those results in a personalized manner. Such contexts can be specified explicitly by the user or observed from the user's annotations. In one aspect of the invention, personalization is accomplished by re-ranking a subset of the top-ranked subscribed items using a time-based variant of TF-IDF. In another aspect of the invention, subscriptions can be restricted to items within a specified period. In another aspect of the invention, both publishing and subscribing can be restricted to contexts that represent clusters of users and items.

In another embodiment, publishing is made an explicit act with a publisher identifier, which is used as a basis for ranking items in a personalized manner at each subscriber. This allows publishers to earn a distributed reputation based on usage across the user population, and highly ranked publishers have a strong incentive to publish high-quality items. This yields a form of expert judgment that can be exploited in ranking items.

According to another aspect of the invention, the method further includes simultaneously clustering both items and users by context: collecting annotation events by context, determining the contexts whose collected annotation events contain at least a predetermined minimum number of unique user identifiers and a predetermined minimum number of unique item identifiers, and clustering items and users based on such contexts. The method further includes clustering a result set of items or users using a clustering algorithm and presenting the clusters as subcategories of the search results. Clustering can be performed using a method from the group consisting of LSA, K-means, self-organizing maps, principal component analysis, multidimensional scaling, and projection methods. Clustering can be performed on at least one type of data from the group consisting of keywords, item identifiers, and user identifiers.
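The threshold-based clustering can be sketched as follows. For simplicity this sketch assumes that each individual keyword defines one context; the function name, the event tuple layout, and the default thresholds are likewise hypothetical.

```python
from collections import defaultdict


def context_clusters(events, min_users=2, min_items=2):
    """Return contexts that qualify as clusters of both users and items.

    events: iterable of (user_id, item_id, keywords) tuples.
    Result: mapping of context -> (set of user IDs, set of item IDs).
    """
    users = defaultdict(set)
    items = defaultdict(set)
    for user_id, item_id, keywords in events:
        for kw in keywords:
            users[kw].add(user_id)
            items[kw].add(item_id)
    # Keep only contexts with enough distinct users AND enough distinct items.
    return {
        kw: (users[kw], items[kw])
        for kw in users
        if len(users[kw]) >= min_users and len(items[kw]) >= min_items
    }


clusters = context_clusters([
    ("alice", "doc-1", ["python", "tutorial"]),
    ("bob", "doc-2", ["python"]),
    ("carol", "doc-2", ["recipes"]),
])
```

Only "python" qualifies here: it is shared by two distinct users across two distinct items, so the same context yields a user cluster and an item cluster at once.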

According to yet another aspect of the invention, defining keywords are computed for an item from the item's collected keywords, such that they correspond to the set of keywords used by more than a specified proportion of the annotating population. These sets of defining keywords are used as the basis for determining machine representations of meanings, concepts, and their semantic relationships. Semantic relationships can be computed using at least one method from the group consisting of pattern recognition methods and correlation analysis methods such as LSA, and the ontology can be expressed in a knowledge representation format. The ontology can be expressed in a format that is one of the group consisting of RDF, OWL, entity-relationship diagrams, relational database schemas, object-oriented classes, XML, and tables. The method further includes a filtering method in which items can be removed from a result set when particular keywords are present among the defining keywords of those items.
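The defining-keyword computation and the filtering step can be sketched as follows. The function names, the default threshold of one half, and the choice to count each user at most once per keyword are assumptions made for illustration.

```python
def defining_keywords(item_annotations, min_fraction=0.5):
    """Keywords used by more than min_fraction of the users who annotated an item.

    item_annotations: mapping of user_id -> set of keywords that user applied.
    """
    population = len(item_annotations)
    if population == 0:
        return set()
    counts = {}
    for kws in item_annotations.values():
        for kw in set(kws):  # each user counts once per keyword
            counts[kw] = counts.get(kw, 0) + 1
    return {kw for kw, c in counts.items() if c / population > min_fraction}


def filter_results(result_ids, defining_by_item, blocked_keywords):
    """Drop results whose defining keywords include any blocked keyword."""
    blocked = set(blocked_keywords)
    return [i for i in result_ids if not (defining_by_item.get(i, set()) & blocked)]


doc1 = defining_keywords({
    "u1": {"python", "tutorial"},
    "u2": {"python"},
    "u3": {"python", "beginner"},
})
kept = filter_results(["doc-1", "doc-2"], {"doc-1": doc1, "doc-2": {"spam"}}, ["spam"])
```

All three annotators used "python", so it is the only defining keyword for the first item; "tutorial" and "beginner" were each used by one user in three and fall below the threshold.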

The present invention will now be described with reference to the accompanying drawings.

Detailed Description of the Preferred Embodiment

[Computing environment]
FIG. 3 shows an overall schematic of the general-purpose annotation system. Such a system includes client systems 110 that can connect to a network such as an intranet LAN/WAN 140 and may further connect to the Internet 150, either directly or through that LAN. Such a client system 110 can be used to access, receive, retrieve, and display content such as web pages from the Internet 150 and intranet content such as files and e-mail from the LAN 140. A content server 180 can be a server available on the web, such as a web server or an application server, that can serve content to client systems 110. Many types of server available on the intranet LAN may also serve content to client systems 110, such as a mail server 184, a file server 183, a database server 182, and the many custom-built and packaged software applications hosted on servers 181. In some cases, a client system 110 may also connect to content servers through other types of network, such as extranets, virtual private networks (VPNs), and non-TCP/IP-based networks.

The client system 110 can be implemented on a general-purpose computing device such as that of FIG. 4, in the form of a conventional personal computer 201 that includes a processing unit 202, a system memory 203, and a system bus 204 that couples the system memory and other system components to the processing unit 202. The system bus 204 may be any of several types, including a memory bus or memory controller, a peripheral bus, and a local bus, and may use any of a variety of bus architectures. The system memory 203 includes read-only memory (ROM) 205 and random-access memory (RAM) 206. A basic input/output system (BIOS) 207, stored in ROM 205, contains the basic routines that transfer information between components of the personal computer 201. The BIOS 207 also contains the start-up routines for the system. The computer 201 further includes a hard disk drive 208 for reading from and writing to a hard disk (not shown), a magnetic disk drive 209 for reading from and writing to a removable magnetic disk 210, and an optical disk drive 211 for reading from and writing to a removable optical disk 212 such as a CD-ROM or other optical medium. The hard disk drive 208, magnetic disk drive 209, and optical disk drive 211 are connected to the system bus 204 by a hard disk drive interface 213, a magnetic disk drive interface 214, and an optical drive interface 215, respectively. The drives and their associated computer-readable media provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the personal computer 201. Other types of computer-readable media that store data accessible by a computer can also be used in the operating environment.

Program modules can be stored on the hard disk, the magnetic disk 210, the optical disk 212, ROM 205, and RAM 206. The program modules can include an operating system 216, one or more application programs 217, other program modules 218, and program data 219. A user can enter commands and information into the personal computer 201 through input devices such as a keyboard 222 and a pointing device 221. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, and the like. These and other input devices are often connected to the processing unit 202 through a serial port interface 220 coupled to the system bus 204, but they can also be connected through other interfaces such as a parallel port, a game port, or a universal serial bus (USB). A monitor 228 or other display device is also connected to the system bus 204 through an interface such as a video adapter 223. A video camera or other video source can be coupled to the video adapter 223 to supply video images for video conferencing and other applications; those images can be processed and passed on to the personal computer 201. In a further embodiment, a separate video card can be provided to receive signals from multiple devices, including encoded satellite broadcast images. In addition to the monitor, personal computers typically include other peripheral output devices (not shown) such as speakers and printers.

The personal computer 201 can operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 229. The remote computer 229 may be another personal computer, a server, a router, a network PC, a peer device, or another common network node, and typically includes many or all of the components described above in relation to the personal computer 201. The logical connections depicted in FIG. 4 include a local area network (LAN) 227 and a wide area network (WAN) 226.

When used in a LAN networking environment, the PC 201 connects to the local network 227 through a network interface or adapter 224. When used in a WAN networking environment such as the Internet, the PC 201 typically includes a modem 225 or other means for establishing communications over the network 226. The modem 225 can be internal or external to the PC 201 and connects to the system bus 204 through the serial port interface 220. In a networked environment, program modules depicted as residing within the PC 201, such as those comprising Microsoft Word, or portions of those modules, can be stored in a remote storage device 230.

A client system 110 may also include a desktop personal computer, workstation, laptop, personal digital assistant (PDA), cell phone, any WAP-enabled device, or any other computing device capable of interfacing directly or indirectly with the Internet. The client system 110 can run within a browsing program such as Microsoft's Internet Explorer™ browser, the Netscape Navigator™ browser, the Mozilla™ browser, the Opera™ browser, or, in the case of a cell phone, PDA, or other wireless device, a WAP-enabled browser.

Server system 120 corresponds to an annotation server within the intranet environment, and server system 130 corresponds to an annotation server on the Internet 150 that can serve clients from anywhere on the web. Server system 120 serves client systems by receiving annotation events, collecting the events, and processing search and subscription requests from clients. It can be implemented on a PC as described above, or on a server configuration such as a UNIX server from Sun Microsystems or a Linux-based or Windows-based Intel server.

The software can be designed using many different approaches, including C, C++, Java, C#, Visual Basic, and scripting languages such as Perl or Tcl. Aspects of the client system can be developed for browser-based delivery using code such as HTML, XML, Java, JavaScript, or ActiveX, or any other suitable scripting language (for example, VBScript). In some embodiments, no code is downloaded to the client system 110; the necessary code is executed by the server, or code already present on the client system 110 is executed.

The invention can be practiced with other computer system configurations, including handheld devices, multiprocessor systems, microprocessor-based programmable consumer electronics, network PCs, minicomputers, engineering workstations, mainframe computers, and the like. The invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or combinations of these. Suitable processors include, by way of example, both general-purpose and special-purpose microprocessors. Any of the foregoing can be supplemented by, or incorporated in, specially designed ASICs (application-specific integrated circuits).

[Annotation]
As explained above, an annotation is any brief description of an item by a user from which keywords describing the item according to the user's judgment can be derived. An event is emitted each time such an annotation occurs. A sample of the data elements contained in an event is shown in XML form in FIG. 14. Such events can be generated in multiple ways from multiple event sources and event source types. The preferred embodiment accomplishes this using the concept of an event generator 112. Event generators can take the form of toolbars, add-ins, shared libraries, OS-level support, and so on. Each event source type has its own event generator, and there may be multiple event generators for each event source type. Although the preferred embodiment obtains keywords directly from user interaction, keywords can also be obtained by applying automated procedures to full-text search of the user's e-mail and documents. Keywords for a mail message or document can also be designated by the user within the text, in the same way as designating text to be highlighted. Those skilled in the art will recognize that there are many ways in which such keywords can be obtained from a user.

As an example, browser-based annotation events can be generated from a toolbar add-in for a browser, as in FIG. 5. File-system-based events can be generated from an Explorer add-in, as in FIG. 6. Each event generator 112 can describe itself through the event source ID and event source label elements that it sets in the annotation event. An annotation event for a web page in a browser can thus be marked with a different event source than an annotation event based on the file system of a corporate LAN. Depending on the event generator 112, such events can be conveyed to one or more annotation servers on a private network such as a LAN, or to one or more annotation servers on the Internet.

An annotation event includes an item ID for the item being annotated and a user ID for the annotating user. For maximum precision, a unique item should correspond to the same unique item ID across events from all event sources, and a unique user should correspond to the same unique user ID. However, the preferred embodiment leaves it to the implementation to determine the best way to achieve this goal according to its requirements. This is done because there are benefits to integrating with heterogeneous event generators even if user IDs and item IDs do not strictly follow the above requirement. In many cases where there is a suitably diverse population of items and users, the value gained from collection can outweigh precision. As an example, the same page on the web may have multiple URLs (in effect, multiple item IDs). In many cases, however, it is enough to find just one of them. Search engines such as Google attempt to consolidate such URLs for the convenience of users, but the basic structure of the web does not mandate such consolidation or uniqueness.

In the preferred embodiment, an item can be anything that can be identified by a unique item ID given as a URI. Naturally, this includes web content addressed by URL, as is common in folksonomies. It can also include files and folders in a file system and email messages on a mail server, as well as physical objects such as paper documents bearing barcodes, tasks and issues in a project management system that have unique IDs, ideas from a brainstorming session stored as text within an application that has URIs, and so on. Numerous methods for generating such unique item IDs are known in the art and can be used as the implementation requires. The preferred embodiment allows a title and description to be optionally specified for the item in an annotation event, much as email allows a title and body. These are stored with the item ID, as in FIG. 14. The preferred embodiment allows the user to specify a title and description during a "Remember" dialog annotation event; these are stored locally for the user. They can be used to display the item under the "My Items" option of the annotation browser window, as in FIG. 10, which lets the user customize the description of the item. A publisher must specify a title and description if the item is new to the system; otherwise, the "Publish" dialog, as in FIG. 9, pre-populates this information from the item information stored on the server. Clickstream annotation events do not specify title or description information. Other embodiments can handle this in a number of different ways, such as updating the item information on the server with the most recent title and description. Some embodiments can digitally sign annotation events with a user-based key to enable authentication and non-repudiation where needed.

As an example, FIGS. 5, 6, and 7 show an approach in which the event generator 112 is integrated in the form of a toolbar. A toolbar is a program that works as an add-in to existing applications such as web browsers, file system explorers, and email applications. Toolbars are known in the art, and many examples exist today, such as the toolbars from MSN, Yahoo!, and Google. The toolbar has a "Remember" button and a "Publish" button. The "Remember" button lets the user annotate an item by launching a Remember dialog window as in FIG. 8, and the "Publish" button lets the user publish an item by launching a Publish dialog window as in FIG. 9. FIG. 5 shows that a web browser toolbar can let the user annotate the currently displayed URL through the "Remember" button, or publish the item to the system with the "Publish" button. Such a toolbar can also let the user right-click any hyperlink in the displayed page and select a menu item that launches the Remember or Publish dialog window. Furthermore, if the user searches for pages using the search in the toolbar and clicks one of the URLs returned in the results, the toolbar can monitor that action and automatically generate an event, as in FIG. 15, that uses the keywords from the search as the keywords for the item. This can be augmented by further monitoring of the user, such as evaluating whether the user reads the clicked page, or even how far the user reads, before deciding which event to generate. The present invention works best when events generated by unique users are collected, so a user ID must be assigned to the annotating user. For an Internet-based annotation server, this can be accomplished by dynamically assigning the user a unique ID stored as a cookie, or by having the user log in to the server before annotating. A browser-based event generator can send all of its annotation events to an intranet server, while optionally sending events generated for pages on the public web to an Internet-based annotation server. The user ID is expressed in the form of a URI.

In FIG. 6, a similar set of metaphors is used for the file system. Just as in the browser, any file or folder can be annotated using either the "Remember" or the "Publish" button. Right-clicking an item can present context menu items that give access to the same functions as the buttons. Searches performed on file system items can be annotated in the background as described above. In certain cases, such as for a user logged in to an intranet LAN, the user ID information can advantageously be obtained from the operating system. For reasons such as security and privacy, such events may be sent only to intranet-based annotation servers.

FIG. 7 shows the same concept applied to email software such as Microsoft Outlook. Any mail can be annotated with keywords as described above. Similarly, File Save and File Open dialog boxes can be augmented with the ability to annotate or publish files with keywords and to search for files by keyword. Those skilled in the art will recognize that there are numerous ways in which such functionality can be implemented within any given application. As an example, the toolbar or the OS can provide an API that lets a specially written application launch the Remember dialog window for any item. Such an application may have its own event source identification and can pre-populate the item ID according to its own requirements.

Therefore, for purposes of explanation, all annotation events are assumed to be generated by event generators 112 (FIG. 3) such as the toolbars described above. These event generators 112 can have a unique event source identifier that the generator adds to each event. For example, a toolbar that annotates email can add an event source ID such as http://www.abc.tld/Email and a label such as "ABC Company Email". To avoid namespace conflicts among event source IDs, the preferred embodiment uses a URI-based syntax. However, the responsibility for guaranteeing uniqueness is left to the implementation.

The user ID of a user can be determined by the event generator 112 in several ways, as the implementation requires. It can be obtained by having the user log in to the annotation server with a user ID (such as an email ID) and password, which allows the annotation server to generate a unique user ID. Alternatively, the event generator can use operating system login information, a federated identity solution, or single sign-on data, or it can automatically generate a unique ID and keep that ID in the browser as a cookie.
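
For illustration only, the data elements of an annotation event described above might be modeled as follows. This is a minimal Python sketch and not the XML layout of FIG. 14: the class name, the field names, and the example item and user URIs are assumptions made here; only the event source ID and label values are taken from the example in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnnotationEvent:
    """Hypothetical annotation event; all field names are assumptions."""
    item_id: str                  # URI identifying the annotated item
    user_id: str                  # URI identifying the annotating user
    keywords: Tuple[str, ...]     # keywords supplied by the user
    event_source_id: str          # URI identifying the event generator
    event_source_label: str       # human-readable label of the generator
    title: Optional[str] = None   # optional, e.g. from the Remember dialog
    description: Optional[str] = None


event = AnnotationEvent(
    item_id="http://www.example.com/page.html",   # hypothetical item URI
    user_id="http://www.abc.tld/users/alice",     # hypothetical user URI
    keywords=("patent", "search"),
    event_source_id="http://www.abc.tld/Email",
    event_source_label="ABC Company Email",
)
```

Title and description default to `None` because, as noted above, clickstream events do not carry them.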

In some embodiments, there can be other forms of annotation event generator, in the form of crawlers. Web crawlers 170 are well known in the art and are used by search engines to fetch pages on the web for indexing. Such a crawler can be used to discover hyperlinks between web pages and generate annotation events from them. The link text can be used in place of keywords, and the web host or blog information can be used in place of the user ID. This generates events that may be of lower quality than those from the Remember dialog or clickstream described above, because link text is generally useful for determining only a small number of relevant keywords and because it is not easy to identify the independent source of judgment (the actual user who created the hyperlink). Technorati's rel="tag" style tags can also be used to generate annotation events for a page. However, this too is limited, because only the author of the page can assign such tags.
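
The crawler-based approach can be sketched roughly as below, assuming the page has already been fetched. The sketch uses Python's standard `html.parser` module; the names `LinkCollector` and `link_events`, the tuple layout, and the choice to split link text on whitespace are assumptions for illustration only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from anchor elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def link_events(page_host, html):
    """Return (item_id, user_id, keywords) tuples for each hyperlink.

    The linked URL stands in for the item ID, the hosting site for the
    user ID, and the words of the link text for the keywords.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [(href, page_host, text.lower().split())
            for href, text in collector.links if text]
```

Because the host stands in for the user, every link on a site counts as one source of judgment, which reflects the quality limitation noted above.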

In other embodiments, an intranet-based crawler 160 can also be created, so that all files in a file system, mail on a mail server, or other types of data are annotated automatically. As an example, a file system crawler can scan an entire file system, including personal and shared drives. Such crawlers are known in the art and resemble the programs that index files in desktop search software. Such a program can be modified to use the labels of the directories and files in the directory path as keywords for a file. It can compute a unique hash based on the content of the file (such as a cryptographic hash), discover the same file in the personal folders of different users, and generate different events for the same file using different user IDs and different sets of keywords. However, such methods have limitations compared with annotation mechanisms in which users annotate files directly. These include files with potentially misleading names such as "Stuff", and email and other forms of data that carry little or no relevant information. Such crawlers can therefore also use automatic annotation methods that generate keywords from the content of the item. The preferred way to bring items into the annotation server is to use the publish and subscribe paradigm described in detail later.
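
A file system crawler of this kind might look roughly like the following sketch. It is only one possible reading of the paragraph above: SHA-256 is an assumed choice of cryptographic hash, and the function names, the `sha256:` item ID prefix, and the tuple layout are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterator, List, Tuple


def path_keywords(path: Path, root: Path) -> List[str]:
    """Directory labels and the file name (without suffix) as keywords."""
    relative = path.relative_to(root)
    parts = list(relative.parts[:-1]) + [relative.stem]
    return [part.lower() for part in parts]


def content_hash(path: Path) -> str:
    """Hash of the file content, so identical files share one item ID."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def crawl(root: Path, user_id: str) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (item_id, user_id, keywords) for every file under root."""
    for path in sorted(root.rglob("*")):
        if path.is_file():
            item_id = "sha256:" + content_hash(path)
            yield (item_id, user_id, path_keywords(path, root))
```

Running `crawl` once per personal folder, each with its owner's user ID, would yield different events for the same content hash, matching the behavior described above.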

[Collection]
Annotation events are collected to achieve the functionality of the present invention. This is somewhat similar to the indexing of documents in information retrieval. In document indexing, an inverted index of words is generated in which each word is mapped to the documents in which it appears. In the present invention, events are collected into three separate mappings: one that maps each user ID to its events, one that maps each item ID to its events, and one that maps each context to its events. Here, a context refers to a set of keywords consisting of at least one keyword. Many indexing and hashing methods known in the art can be used to realize these mappings, and the process need not be described in detail. For reference, one such method can be found in the paper "MapReduce: Simplified Data Processing on Large Clusters" by Dean et al. of Google, Inc.
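
The three mappings can be sketched with in-memory dictionaries, as below. This is a simplified illustration and not the indexing method of any particular embodiment: events are assumed to be `(user_id, item_id, keywords)` tuples, and the `max_context_size` cap on context size is an assumption added here to keep the number of keyword combinations manageable.

```python
from collections import defaultdict
from itertools import combinations


def index_events(events, max_context_size=2):
    """Build user -> events, item -> events, and context -> events maps.

    A context is represented as a frozenset of keywords; an event belongs
    to every context whose keywords it all contains.
    """
    by_user = defaultdict(list)
    by_item = defaultdict(list)
    by_context = defaultdict(list)
    for event in events:
        user_id, item_id, keywords = event
        by_user[user_id].append(event)
        by_item[item_id].append(event)
        unique = sorted(set(keywords))
        for size in range(1, min(max_context_size, len(unique)) + 1):
            for combo in combinations(unique, size):
                by_context[frozenset(combo)].append(event)
    return by_user, by_item, by_context
```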

First, all events are mapped by user. This implies that all events with the same user ID are available in the same data structure. This is done for event normalization and user profiling. User-based mapping can be performed at the client system 110 or at the server system 120, depending on the requirements of the implementation. The preferred embodiment performs user-based mapping at the client system 110, because doing so may be more desirable in terms of privacy, security, and making use of the computing power at the edge of the network. This can be accomplished with software residing on the client system 110 that receives events from all event generators 112 (step 300) and stores them locally in persistent storage such as 111.

Most folksonomies and other annotation methods assume that a user annotates an item only once. With a general-purpose annotation mechanism such as that of the present invention, however, a user can annotate the same item multiple times. This may be because the user annotates or uses the item in different contexts, or uses it in different event generators. All annotation events for a given item are collected in the user-based mapping, and the set of events representing the unique contexts for each event generator is computed; these are called raw events. This set of raw events is then normalized (step 310 or 405 in FIG. 11). Normalization refers to accumulating all keywords a user has applied to an item, together with the number of times each was used, and dividing each count by the user's total keyword usage for that item. As an example, suppose a user uses two keywords (say, keyword 1 and keyword 2) for an item, using keyword 1 a total of 3 times and keyword 2 a total of 7 times. After normalization, the count for keyword 1 becomes 3/(3+7) = 0.3 and, likewise, the count for keyword 2 becomes 0.7. At the end of normalization, one normalized event is generated for the given item for that particular user. Normalization can be performed per event source per item or across all events, as the implementation requires. Although there may be many ways to normalize events, the preferred embodiment normalizes across all event sources, ensuring that each user effectively gets one keyword "vote" per item when keywords are collected per item.
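
The normalization step can be written out as a short sketch that reproduces the 3-and-7 example above. The function name and the input shape (one keyword list per raw event, all for a single user and item) are assumptions for illustration.

```python
from collections import Counter


def normalize(raw_events):
    """Turn one user's raw events for one item into keyword weights.

    raw_events is an iterable of keyword lists. Each keyword's usage
    count is divided by the user's total keyword usage for the item,
    so the weights sum to 1 (one "vote" per user per item).
    """
    counts = Counter()
    for keywords in raw_events:
        counts.update(keywords)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {keyword: count / total for keyword, count in counts.items()}


# keyword1 used 3 times and keyword2 used 7 times, as in the example.
weights = normalize([["keyword1"]] * 3 + [["keyword2"]] * 7)
```

Here `weights` holds 0.3 for `keyword1` and 0.7 for `keyword2`, up to floating-point rounding.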

Such normalized events, together with their corresponding raw events, are sent to the relevant annotation servers, such as an intranet-based server like 120 or an Internet-based server like 130 (step 320). This is done incrementally, only when there is a change of state. The choice of which servers receive a normalized event can be based on the item. If the item is a public asset such as a web page on the Internet, the event can be conveyed to both Internet-based and intranet-based annotation servers. If the item is an intranet document, the event can be sent only to intranet-based servers. There may be multiple Internet or intranet servers to which the client system 110 can convey events. This information can be saved as a configuration profile in the client system 110.
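
One way to express the item-based routing rule is sketched below. How an item is recognized as public is not specified in the text; the test used here (an http or https URI whose host is not in a configured set of intranet hosts) is purely an assumption, as are the function and parameter names.

```python
from urllib.parse import urlparse


def target_servers(item_id, internet_servers, intranet_servers, intranet_hosts):
    """Choose the annotation servers that should receive an event.

    Public web items go to both Internet and intranet servers; anything
    else goes to intranet servers only.
    """
    parsed = urlparse(item_id)
    is_public = (parsed.scheme in ("http", "https")
                 and parsed.hostname not in intranet_hosts)
    if is_public:
        return list(internet_servers) + list(intranet_servers)
    return list(intranet_servers)
```

The server lists and the intranet host set would come from the configuration profile stored on the client system.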

Events are then received at an annotation server such as 120 or 130 by the collection module 122 or 132. First, the normalized events are mapped by item (step 406). This means that all events corresponding to a particular item ID are collected in the same data structure. Because these are normalized events, each item has at most one event from any particular user ID. Keyword counts are accumulated across all users who annotated the item. The sum of the keyword counts equals the number of users, or in effect the number of independent sources of judgment that annotated the item. As mentioned above, the distribution of keywords approximately follows a power law. This means that the number of keywords used by more than some small fraction of the users annotating an item is roughly constant and is scale-invariant with respect to the number of annotation events. For example, the number of keywords used by more than 5% of the annotating population for an item is roughly constant whether the item has received 50 annotations or 500. Keywords used by a suitable proportion of users, such as 5% or 10% as the implementation requires, can be regarded as the defining features, or defining keywords, of the item. Defining keywords are a group description that emerges from the collection of annotations and are a reliable guide for assigning items to the topics latent in these keywords. The defining keywords for an item are updated as described above on the basis of incoming events (step 407). Note that the keywords in the defining set do exhibit highly fluctuating behavior; that is, different words can enter and leave the defining set over time.
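
The selection of defining keywords can be sketched as a threshold on the share of annotating users who used each keyword. The function name, the input shape (one normalized event per user, as a keyword-to-weight dictionary), and the strict "more than" comparison are assumptions; the 5% default follows the example in the text.

```python
from collections import Counter


def defining_keywords(normalized_events, min_share=0.05):
    """Return the keywords used by more than min_share of an item's users.

    normalized_events maps user_id -> {keyword: weight} for one item.
    """
    user_count = len(normalized_events)
    if user_count == 0:
        return set()
    users_per_keyword = Counter()
    for weights in normalized_events.values():
        users_per_keyword.update(set(weights))  # one count per user
    return {keyword for keyword, users in users_per_keyword.items()
            if users / user_count > min_share}
```

Recomputing this set as events arrive is what lets words enter and leave the defining set over time.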

Next, the collection module maps raw events by context. A context can be a single keyword or a set of two or more keywords. An event that contains all the keywords of a context is considered part of that context. The event-to-context mapping 408 therefore makes it possible to collect all events that are part of a context in a single data structure. In practice, there may well be as many unique contexts as items, or more. As mentioned above, the present invention uses the concept of category contexts to reduce the complexity of the context space and to enable simultaneous clustering of both users and items. In the preferred embodiment, category contexts are computed from raw events (annotation events that have not been normalized) and correspond to contexts that have a certain minimum number of unique items and users. Some embodiments can further restrict the definition to contexts that have a certain minimum number of items, each of which is annotated by a certain minimum number of users. Some embodiments may prefer to use normalized events for computing category contexts. Other embodiments can use raw events but limit the raw events sent to the server so that only one event is sent for each unique context for an item and user.

Category contexts can be computed generatively, that is, built up as events are received (step 408). One can begin by collecting events by unique keyword. As events accumulate under a particular keyword, they can be hashed again by removing the original keyword and obtaining a second set of unique keywords, each of which represents the context formed by itself and the original keyword. This can continue iteratively, generating category contexts as each context meets the predetermined criteria for a category context. In the preferred embodiment, the method is further augmented with two additional restrictions to prevent topic drift. Only keywords that correspond to the defining keywords of an item are used to determine whether an event is part of a context. In addition, category contexts whose rate of events falls below a certain level cease to be category contexts (alternatively, a caching mechanism that retains only the most recently used category contexts can be used). These restrictions are fairly strict requirements that may be more meaningful in larger implementations. For implementations that do not benefit from them, they can be relaxed in at least the following ways.
・All received events are used for the computation.
・Events that contain the keywords of the context are used, even if those keywords do not correspond to the defining keywords of the item.
・Events that contain the keywords of the context are used if at least one keyword of the context corresponds to a defining keyword of the item.
・Instead of using events, the users and items corresponding to a context are determined from the collected keywords of the users and items, respectively.
・Items are determined from the keywords of the context, and then the users who annotated those items are determined.
・Users are determined from the keywords of the context, and then the items those users annotated are determined.
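
The iterative re-hashing described above can be sketched as a level-by-level expansion, in which only contexts that already qualify are extended by one more keyword. This corresponds to the most relaxed variant (all received events are used, with no defining-keyword or event-rate restriction). The `(user_id, item_id, keywords)` tuple layout and the thresholds `min_items` and `min_users` are assumed parameters for illustration.

```python
from collections import defaultdict


def category_contexts(events, min_items=2, min_users=2):
    """Find contexts with at least min_items items and min_users users."""
    found = set()
    frontier = {frozenset(): list(events)}  # start from the empty context
    while frontier:
        next_frontier = {}
        for context, members in frontier.items():
            # Re-hash this context's events by each remaining keyword.
            buckets = defaultdict(list)
            for event in members:
                for keyword in set(event[2]) - context:
                    buckets[context | {keyword}].append(event)
            for extended, bucket in buckets.items():
                if extended in found:
                    continue
                items = {event[1] for event in bucket}
                users = {event[0] for event in bucket}
                if len(items) >= min_items and len(users) >= min_users:
                    found.add(extended)
                    next_frontier[extended] = bucket
        frontier = next_frontier
    return found
```

Extending only qualifying contexts keeps the search from enumerating every keyword combination, which is one way to contain the size of the context space.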

In addition, the preferred embodiment generates "integrated" category contexts. Contexts can be thought of as forming a directed acyclic graph (DAG). As an example, the context "keyword 1 AND keyword 2" (keyword 1 + keyword 2) represents items that are a subset both of the items corresponding to keyword 1 and of the items corresponding to keyword 2. Keyword 1 and keyword 2 can each be regarded as a parent of the context "keyword 1 + keyword 2". If keyword 1 + keyword 2 is a category context, the preferred embodiment also "integrates" its parent contexts as category contexts, even if those parent contexts might not qualify as category contexts when their own events are collected and tested against the predetermined criteria.
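
The parent relationship in this DAG can be sketched with keyword sets, where a parent of a context is the same context with one keyword removed. Whether integration stops at direct parents or continues up through all ancestors is not stated in the text; the closure below assumes the latter, and the function names are hypothetical.

```python
def parent_contexts(context):
    """Direct parents: the context with exactly one keyword removed."""
    if len(context) <= 1:
        return set()
    return {context - {keyword} for keyword in context}


def with_integrated_parents(category_contexts):
    """Add every ancestor of each category context (assumed closure)."""
    closed = set(category_contexts)
    pending = list(category_contexts)
    while pending:
        for parent in parent_contexts(pending.pop()):
            if parent not in closed:
                closed.add(parent)
                pending.append(parent)
    return closed
```

For a context of two keywords, the parents are simply the two single-keyword contexts, as in the keyword 1 + keyword 2 example.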

The preferred embodiment collects events by category context. This is done generatively, so that once a context becomes a category context its event data is maintained separately from that of its parent contexts. This allows the rankings of users and items to be computed separately for each category context, and each category context to be treated as a separate target for publishing and subscribing. This can be done lazily: events can be collected for a category context only when it is requested, as in a search query, or when a publish/subscribe request is received for it.

Those skilled in the art will appreciate that the implementation of insertion, update, and deletion of annotation events across the different elements of the annotation collection depends on the requirements of each implementation, but that it is relatively straightforward to implement these so that they follow the basic collection requirements described above. The collected annotation data can be stored in several different ways, such as in a search engine index like Lucene or in a relational database. Events can be collected in real time, or in a batch mode that runs at predetermined intervals or in response to user actions such as search queries. The exact method can be determined from the requirements of a particular implementation, and the choice of method does not alter the basic intent of the present invention. Certain types of event source may yield better ranking results than others in some situations. As an example, depending on the implementation, events from the "Remember" dialog may give a better indication of user interest than other events. Embodiments can therefore have collection data structures that allow separate rankings and other computations by source type where needed. A final rank can be computed by combining the ranks from the different event source types.

[Search]
A user initiates a search by sending keywords to the search module 114 of the client system 110 (step 500). This can be done in several ways, for example through the search field in the toolbar of FIGS. 5, 6, and 7. The user can also launch a dedicated annotation browser window, as in FIG. 10, and type into its search field. Many implementations are possible, as long as the query string is passed to the search module 114. In general, a search takes the form of keywords and follows the same format as searches on common web search engines. As described above, the query effectively specifies a context.

The search response module 123 or 133 of the server is responsible for determining the matching items or users (hits) and the relevance (ranking) of those hits. The innovation underlying the invention is the recognition that information retrieval (IR) techniques can be used to rank hits on the basis of annotations. These include conventional TF-IDF style approaches (as described in "Modern Information Retrieval: A Brief Overview" by Amit Singhal) and link analysis ranking (LAR) style approaches (as described in "Link Analysis Ranking: Algorithms, Theory, and Experiments" by Borodin et al.). When the query context corresponds to a category context, a LAR style approach is the preferred form of ranking. As described above, a LAR style approach can be adopted by using each annotation as an integrated link between a user and an item. In general, both users and items can be regarded as nodes of a graph with directed links from users to items. More specifically, this allows users to be treated as hubs and items as authorities in a link analysis algorithm.

Because of privacy concerns, the preferred embodiment does not allow a querying user to search for users by keyword, and provides a publish/subscribe method instead. In essence, the preferred embodiment never returns user information as a search result. It does, however, let an individual send a message to relevant people (determined by a user search performed on the server) without those people having to reveal any private information, and it gives the recipients the choice of whether to contact the sender. This is limited to category contexts.

In the preferred embodiment, category contexts are ranked using the HITS algorithm (step 504). Any LAR algorithm, such as those referenced above, can be used. The field has been studied extensively, and many variants exist that address various shortcomings. The preferred embodiment uses the events that are part of the category context as the initial set. As described above, these events are selected such that all keywords of the context are present and those keywords correspond to defining keywords of the item. This is done to address the topic drift problem usually associated with the HITS algorithm. The events are used to create integrated links between users and items, and the HITS algorithm is applied to the resulting graph. This yields a rank for users, in the form of hub ranks, and a rank for items, in the form of authority ranks. These ranks are computed for the category context when it is created (step 408) and are kept up to date as events are collected (step 409). Users and items can then be ranked for a query against the category context simply by using these ranks. Users may be ranked with a different algorithm from the one used for items. For example, items can be ranked with the HITS algorithm while users are ranked with the BFS algorithm described in the paper by Borodin et al.
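The hub and authority computation described above can be illustrated with a minimal sketch. It assumes annotation events have already been filtered to one category context and reduced to `(user_id, item_id)` pairs; the function name `hits`, the deduplication of repeated pairs, and the fixed iteration count are illustrative choices, not details taken from the source.

```python
from math import sqrt


def _normalize(scores):
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm else scores


def hits(events, iterations=50):
    """Return (hub, authority) dicts for a user -> item annotation graph.

    events: iterable of (user_id, item_id) pairs for one category context.
    Assumption: each distinct pair contributes a single directed link.
    """
    links = set(events)
    users = {u for u, _ in links}
    items = {i for _, i in links}
    hub = dict.fromkeys(users, 1.0)
    auth = dict.fromkeys(items, 1.0)
    for _ in range(iterations):
        # An item's authority is the sum of the hub scores of its annotators.
        new_auth = dict.fromkeys(items, 0.0)
        for u, i in links:
            new_auth[i] += hub[u]
        auth = _normalize(new_auth)
        # A user's hub score is the sum of the authorities of annotated items.
        new_hub = dict.fromkeys(users, 0.0)
        for u, i in links:
            new_hub[u] += auth[i]
        hub = _normalize(new_hub)
    return hub, auth


if __name__ == "__main__":
    hub, auth = hits([("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b")])
    print(sorted(auth, key=auth.get, reverse=True))
    print(sorted(hub, key=hub.get, reverse=True))
```

Restricting the input to events of a single category context plays the role of the "initial set" in the text; the sketch itself does nothing further about topic drift.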

For contexts that are not category contexts, the preferred embodiment uses a simple TF-IDF ranking over the normalized events in the item-based mapping of events (step 502). Other embodiments that allow searching for users can generate hits from the keywords collected for each user: if the keywords of the context are present among a user's collected keywords, the user can be returned as a hit (step 505). Such a mechanism can rank hits with a TF-IDF style mechanism (step 506) or with any other applicable IR method.

To enable fast query responses, this hit and ranking information for both users and items can be stored in the inverted index of a conventional search engine such as Lucene, or in a relational database such as Oracle.

In other embodiments, the publisher IDs within a category context can be ranked in the same manner as items. Each item may have several publisher IDs. Each of these IDs can be associated with users through integrated links, in the same way as an item. The users associated with a given publisher ID are collected across all items. Users are modeled as hubs and publisher IDs as authorities, and the same algorithm used to rank items can be used to rank publisher IDs. These ranks indicate the level of authority a publisher has among the users in the context. Ranking is also possible in contexts that are not category contexts, but the sparsity of the data there makes TF-IDF style approaches, or tensor and matrix decomposition approaches such as CubeSVD, LSI, PLSA, and PHITS, better alternatives. The ranking of an item can also be based in part on the ranking of its publisher IDs, and such ranks can be combined into the final rank of the item for a query.

There is a fundamental difference between users and items with respect to ranking. In general, items exhibit power-law behavior that makes it possible to determine an item's defining characteristics. This makes a TF-IDF style approach effective at discriminating relevance, because it can exploit the keyword counts in the collected and normalized events. Users, on the other hand, tend to have more facets than items, and those facets change over time. For users, a LAR style approach or a tensor decomposition method such as CubeSVD may therefore be a better alternative to TF-IDF in a given implementation. As those skilled in the art will recognize, many variations are possible in how IR techniques are applied. Different embodiments may implement different IR techniques for ranking according to their requirements, without departing from the basic intent of this mechanism.

Search results also return further drill-down categories (steps 502, 504, and 506). In effect, these are suggestions of relevant keywords with which to refine the query. The preferred embodiment computes drill-down categories from the category contexts. Specifically, for all category contexts that are children of the query context, it removes the keywords of the query context, generates the unique keywords at the next level, sorts those keywords by their cumulative event counts, and returns the top keywords (for example, the top 20). Certain embodiments can compute drill-down categories from the keywords of the items that correspond to the context. Other embodiments can use the search clickstream as the basis for the computation. Some embodiments may prefer a "recent" event count, that is, the cumulative number of events within a given period. As described above, some embodiments can also use a time-based TF-IDF approach, based on how frequently the user has used each category context in a specific period, to obtain a more personalized set of drill-down categories.
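The drill-down computation described above can be sketched as follows. The sketch assumes each category context is represented as a frozenset of keywords mapped to its cumulative event count, and that a "child" of the query context is any category context whose keyword set strictly contains the query keywords; both the representation and the name `drill_down` are assumptions made for illustration.

```python
from collections import Counter


def drill_down(query_keywords, context_event_counts, top_n=20):
    """Suggest refinement keywords for a query context.

    context_event_counts: dict mapping frozenset(keywords) -> event count.
    Assumption: every strict superset of the query counts as a child context.
    """
    query = frozenset(query_keywords)
    totals = Counter()
    for context, count in context_event_counts.items():
        if query < context:  # strict superset of the query keywords
            # Drop the query keywords and credit what remains.
            for keyword in context - query:
                totals[keyword] += count
    # Sort by cumulative event count, breaking ties alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [keyword for keyword, _ in ranked[:top_n]]


if __name__ == "__main__":
    contexts = {
        frozenset({"ajax"}): 100,
        frozenset({"ajax", "javascript"}): 40,
        frozenset({"ajax", "toolkit"}): 25,
        frozenset({"ajax", "javascript", "dojo"}): 10,
        frozenset({"python"}): 70,
    }
    print(drill_down({"ajax"}, contexts))
```

A "recent" variant would pass counts restricted to a time window instead of lifetime counts; the function itself would not change.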

Many other approaches to ranking can be used together with the approaches above. For example, for text content, full-text indexing can be used to augment annotation-based ranking. For web pages, hyperlink connectivity can be exploited by a conventional LAR approach. All of these different sources of ranking for items or users can be combined using one of several rank aggregation algorithms known in the art, chosen according to the requirements of the implementation.
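The source does not name a specific rank aggregation algorithm. As one illustration only, the sketch below uses a weighted Borda count, a well-known positional method; the function name, the source labels, and the weights are hypothetical.

```python
def borda_aggregate(rankings, weights=None):
    """Combine several ranked lists into one using a weighted Borda count.

    rankings: dict mapping a ranking source name -> list of ids, best first.
    weights:  optional dict mapping source name -> multiplier (default 1.0).
    An id missing from a list receives no points from that list.
    """
    weights = weights or {}
    candidates = set().union(*rankings.values())
    n = len(candidates)
    scores = dict.fromkeys(candidates, 0.0)
    for source, ranked in rankings.items():
        w = weights.get(source, 1.0)
        for position, candidate in enumerate(ranked):
            # Top position earns n points, the next n - 1, and so on.
            scores[candidate] += w * (n - position)
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    rankings = {
        "annotations": ["a", "b", "c"],
        "fulltext": ["b", "a", "c"],
        "links": ["b", "c", "a"],
    }
    print(borda_aggregate(rankings))
    print(borda_aggregate(rankings, weights={"annotations": 5.0}))
```

The same routine could combine per-event-source ranks, with heavier weights for sources judged to be stronger indicators of interest.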

The results and categories are then returned from the search response module 133 to the client system 110 and displayed to the user by the display module 113. The results can be shown in a dedicated window such as the annotation browser of FIG. 10. The search results can be augmented with other data, such as the total numbers of users and items in the queried context. Some embodiments can also provide the event source types, or a list of event sources, corresponding to the hits; these can in turn be used to specify queries, essentially treating them like category contexts.

[Personalization, Subscription, and Publishing]
An undesirable consequence of searching over all events and returning the results is that the power law produces a "rich get richer" phenomenon: the top-ranked hits for a context come to be occupied by a small number of items, and it becomes increasingly difficult for newcomers to be seen. This harms the usefulness of the system as a whole, because it concentrates the ecosystem of interaction on a small number of participants. The effect can be mitigated somewhat by a time-based approach. For example, as described above, events can be collected for a time period (such as the last hour, today, or this week) separately from the overall collection. This makes it easier for recent events to surface.

FIG. 10 shows an annotation browser that can display such time-based results in a "Latest" tab. From a display point of view, the client system 110 must be able to handle such time-based changes in the results. In general, most current time-based systems, such as email, sort messages by the time they were received. That may not be feasible when there are very many such messages, as is common in this system. The "Latest" tab therefore needs to display items by relevance. This requires a change in the basic user interface metaphor, because users can no longer keep track of what they have seen (for example, by moving sequentially down a list as in email) and efficiently find what they have not seen. One way to do this is a metaphor in which items the user has already seen are kept in a separate list called "My Items". This list can be accessed by selecting the "My Items" menu item in the combo box while on the "Latest" tab. It shows all items that the user has viewed, annotated, published, and so on within the period covered by "Latest". "My Items" has the same meaning when the "All" tab is selected, except that it includes all items regardless of period.

Such a feature needs to distinguish genuinely new items from new events on existing items. Popular items are annotated continually as new users discover them. The collection module 132 can decide whether an event should be placed in the time-based storage by evaluating whether the item is new to the system or new to the context (step 402): if it is, the item is added, and otherwise it is not. Alternatively, some embodiments can signal an event on an item as a new event if no event has been received for that item for a given period, or if the rate of events for the item has fallen below a prescribed rate.
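The test applied at step 402, together with the alternative "quiet period" rule, can be sketched as a small stateful filter. The class name `RecencyFilter`, the use of numeric timestamps in seconds, and the in-memory dictionaries are assumptions for illustration; the rate-based variant mentioned in the text is not shown.

```python
class RecencyFilter:
    """Decide whether an event should enter the time-based ("Latest") store."""

    def __init__(self, quiet_period=None):
        # quiet_period: seconds of silence after which an item counts as new
        # again; None disables that alternative rule.
        self.quiet_period = quiet_period
        self._known_items = set()  # items seen anywhere in the system
        self._last_seen = {}       # (context, item) -> last event timestamp

    def is_new(self, context, item, timestamp):
        key = (context, item)
        new_to_system = item not in self._known_items
        new_to_context = key not in self._last_seen
        resurfaced = (
            self.quiet_period is not None
            and not new_to_context
            and timestamp - self._last_seen[key] > self.quiet_period
        )
        # Record the event before returning the decision.
        self._known_items.add(item)
        self._last_seen[key] = timestamp
        return new_to_system or new_to_context or resurfaced


if __name__ == "__main__":
    f = RecencyFilter(quiet_period=3600)
    print(f.is_new("ajax", "url1", 0))        # first sighting
    print(f.is_new("ajax", "url1", 10))       # repeat event on a known item
    print(f.is_new("javascript", "url1", 20)) # known item, new context
```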

This allows the default "Latest" tab window to concentrate on always showing the newest and most relevant items. The list can be kept continuously sorted by relevance at any given time. The display metaphor can be augmented by other paradigms, such as a ticker tape of new items on the desktop or an alert message in the system tray when a highly relevant item is received.

This time-based approach is advantageously augmented with a publish and subscribe mechanism that enables targeted delivery of content and so makes new and relevant content easier to discover. This is achieved through the following mechanisms:
• allowing publishers and subscribers to come together through the use of category contexts;
• allowing the publisher to select the most relevant context for an item;
• making publishing an explicit act that carries accountability;
• having a context setting phase that allows rapid processing by the community in a distributed manner;
• having a personalized subscription process that allows a distributed form of reputation for publishers.

Items can be introduced into the system through publishing. To publish, the publisher assigns the item to a category context, as in FIG. 9, and then publishes the item to the system. Publishing is inherently a form of annotation. In the preferred embodiment, publishing is made an explicit act distinct from annotation, carried out by pressing the "Publish" button as in FIGS. 5, 6, and 7. The publisher must authenticate to the system with a user ID and password (step 600). Once authenticated, the system assigns the publisher a unique publisher ID that is used with all items the publisher publishes. Depending on the requirements of the implementation, this ID may or may not be the same as the publisher's user ID. The publisher then assigns the item to the single category context that the publisher feels is most relevant to it (step 601). This is a matter of the publisher's judgment, but it is assisted by the fact that the publisher can see the items in the intended category context as well as its total numbers of users and items. If the intended category context has considerably more users than items, that may indicate a context with substantial interest in items, and a relevant item has a greater chance of being accepted. If the context has many items relative to its users, the item may have to compete with other items for the attention of the users in the context, and the publisher may choose that context or another one depending on the publisher's judgment of the item's usefulness relative to the others.

Once the publisher publishes the item (for example, by pressing the Publish button of FIG. 9), the item is sent to the server system (120 or 130) in the form of a publish event. An example of such a publish event is shown in XML format in FIG. 16. A publish event is the same as an annotation event except that it always contains the unique publisher ID of the event's publisher. An existing item can be published by any publisher, not only by the publisher who first introduced it into the system; doing so is equivalent to adding a new publisher ID to the item. Each publish event is processed in the same way as other annotation events with respect to normalization and the item and context mappings (step 602). The publisher ID is not used in the normalization calculation for keywords, because it would distort the keyword description. The publisher ID is, however, metadata about the item, and it can be collected at the item level across all such annotations. These IDs play no part in determining the defining keywords of an item, but they can be included in search results (steps 502, 504, and 506) so that users can re-rank items according to their familiarity with the publishers. Publisher IDs thus enable a distributed form of reputation (or accountability) for publishers among their subscribers.

If the item is new to the context or new to the server system (120 or 130), the preferred embodiment attempts context setting for the item (step 603). This is a process in which the item is pushed to a set of users, which may be a subset of the users in the category context. The server system can determine these users by applying a ranking method to find the top users for the context; the set can include publishers who are influential in the context; it can be a random subset of the context's users; or it can be chosen in some other way according to the requirements of the implementation, including sending the item to every user in the context. This push to specific users can be carried out by the server through the subscription process. In essence, subscribers pull items from the server from time to time, and the server uses the subscriber's user ID to decide whether to add items awaiting context setting to the results for that user. Until an item has been contextualized, it is not available for users to download as part of the normal subscription process. The main purpose of context setting is to let a small but representative group of people quickly annotate (with the "Remember" button) or publish (with "Publish") the items they find relevant. This allows both the defining keywords of the item and the other keywords that people might use to search for it to be determined, so that the community of the context can make use of the item sooner, through higher ranking in search and subscription. The process can be accelerated when influential publishers are included in the subset of users for the context. Those skilled in the art will recognize that context setting serves only to speed up acceptance; it is useful in high-traffic contexts but is not a requirement. The step can be omitted when the context has a low level of traffic, or in other situations where the overhead of the process does not deliver commensurate value.
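The choice of users who receive an item during context setting can be sketched as a selection function over the strategies listed above. The function name `select_seed_users`, the strategy labels, and the use of hub ranks as the user ranking are assumptions made for illustration.

```python
import random


def select_seed_users(hub_rank, strategy="top", k=10, publishers=(), seed=None):
    """Pick the users to whom a new item is pushed for context setting.

    hub_rank:   dict mapping user_id -> rank within the category context.
    strategy:   "top" (highest-ranked users), "random", or "all".
    publishers: influential publishers to include if they are in the context.
    """
    # Order users from highest to lowest rank, ties broken by id.
    users = sorted(hub_rank, key=lambda u: (-hub_rank[u], u))
    if strategy == "all":
        return users
    if strategy == "top":
        chosen = users[:k]
    elif strategy == "random":
        chosen = random.Random(seed).sample(users, min(k, len(users)))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Influential publishers are appended when not already selected.
    for publisher in publishers:
        if publisher in hub_rank and publisher not in chosen:
            chosen.append(publisher)
    return chosen


if __name__ == "__main__":
    ranks = {"u1": 0.9, "u2": 0.5, "u3": 0.1, "u4": 0.3}
    print(select_seed_users(ranks, "top", k=2, publishers=("u3",)))
```

On the server, the returned ids would be compared against the user ID presented by each subscriber when it pulls items.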

One problem with search is that users must specify the contexts they are interested in and then retrieve the results. This is not an efficient way for users to discover relevant items as they become available. The preferred embodiment (as described above) uses the subscription process 114 to retrieve relevant items automatically, displays them (113) in the "Latest" tab of FIG. 10 in descending order of relevance, and shows the relevant category contexts as drill-down categories. These contexts can include broadly useful ones, such as a "most read" context or a "most recent" context, as well as contexts based on the interested user's profile. The subscription process generates a user profile based on category contexts. This can be done both explicitly and implicitly. Users can explicitly specify the category contexts they are interested in, in the form of standing queries, so that items from those category contexts are downloaded continuously in the background. It is also done implicitly, by observing the relative frequency of the user's annotation events (for example, clicks) in each category context and retrieving items in those proportions. The preferred embodiment computes these proportions from annotation events of all event source types. In other embodiments, annotation events from different event source types can be weighted differently according to the requirements of the implementation. For example, an annotation event from the "Remember" dialog can be regarded as a clearer indication of the user's interest than an annotation event from the clickstream, and can therefore be weighted more heavily.
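The implicit profile described above amounts to a weighted relative frequency per category context. A minimal sketch follows, assuming events are available as `(context, source_type)` pairs; the function name `profile_shares` and the example weights are hypothetical.

```python
from collections import defaultdict


def profile_shares(events, source_weights=None):
    """Return each category context's share of a user's annotation activity.

    events:         iterable of (context, source_type) pairs for one user.
    source_weights: optional dict of source_type -> weight (default 1.0),
                    e.g. to count "remember" events more than clicks.
    """
    weights = source_weights or {}
    totals = defaultdict(float)
    for context, source_type in events:
        totals[context] += weights.get(source_type, 1.0)
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {context: total / grand_total for context, total in totals.items()}


if __name__ == "__main__":
    clicks = [("ajax", "click")] * 3 + [("python", "click")]
    print(profile_shares(clicks))
    mixed = [("ajax", "click"), ("python", "remember")]
    print(profile_shares(mixed, {"remember": 3.0}))
```

With no weights supplied, every event source counts equally, which corresponds to the preferred embodiment as described.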

To protect the user's privacy, the preferred embodiment stores this profile on the client system 110, so that users retain full control of their profile and can view or edit it as they wish. The client system 110 can retrieve items anonymously by category context. For the context setting process to work, however, the subscription process 114 requires the user to authenticate to the system through a login based on a user ID and password (step 610), so that the profile for that user can be retrieved. Some embodiments can use this form of push process for purposes other than context setting, such as targeted delivery of advertisements.

Drill-down categories in the "Latest" tab are processed differently from those for an ordinary search over the entire set of items in the "All" tab. The processing is adjusted both for the time period used for events and for a time-based TF-IDF based on how frequently the user has used each category context in a specific period. In essence, the ranking of these drill-down categories reflects the cumulative number of events in the period as well as the user's usage and how recent that usage is. This lets users easily discover topics that are currently "hot" and relevant to them.

To increase the relevance of the items presented to the user, the subscription process 114 personalizes the items before they are displayed to the user (113). Many approaches to personalization are known in the art, but their effectiveness is not yet well understood. The preferred embodiment takes the approach of retrieving the items ranked highest for a context on the server system (120 or 130) and then re-ranking them on the client system 110 based on the user profile. This has several advantages: it improves privacy and security, it makes use of both collaborative and content-based ranking (server-side and client-side ranking, respectively), and it uses the computing power at the edge of the network. Items are downloaded to the client based on the user profile (step 611). This can be done by sampling the number of items the user typically reads or uses in a given period such as a day, taking a suitable multiple of that number, and distributing it across the contexts in the user's profile. As an example, suppose a user reads 150 items a day and 10% of those reads are in the context "Ajax". The subscription system can download 15,000 items during the day such that 10% of them correspond to the keyword "Ajax", or as many as are available, whichever is fewer. These proportions can be recomputed in real time as the user interacts with the system, or in batch after a predetermined interval such as once a day. The multiple allows a blend between the order based on ranking at the server (a proxy for the community's opinion of an item's relevance) and the ranking at the client (determined by the user's interests). A suitable multiple can be computed per user or per context, or can even be set by the user interactively through a visual metaphor such as a slider control.
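The download budget in the example above can be reproduced with a few lines of arithmetic. The multiple of 100 is implied by the example (150 items read, 15,000 downloaded); the function name `download_quota` and the `available` counts are assumptions for illustration.

```python
def download_quota(items_per_day, multiple, shares, available):
    """Split a daily download budget across the contexts in a user profile.

    items_per_day: items the user typically reads in the period.
    multiple:      how many times that number to download.
    shares:        dict mapping context -> fraction of the user's reads.
    available:     dict mapping context -> items the server can supply.
    Each context receives its share of the budget, capped by availability.
    """
    budget = items_per_day * multiple
    return {
        context: min(round(budget * share), available.get(context, 0))
        for context, share in shares.items()
    }


if __name__ == "__main__":
    # 150 items/day with a multiple of 100 gives a budget of 15,000 items.
    print(download_quota(150, 100, {"Ajax": 0.10}, {"Ajax": 5000}))
    # When fewer items are available, the quota is capped.
    print(download_quota(150, 100, {"Ajax": 0.10}, {"Ajax": 900}))
```

A larger multiple leaves more room for client-side re-ranking to reorder the server's choices; a smaller one keeps the result closer to the community ranking.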

The items downloaded from the server for each such context are re-ranked based on the user profile (step 612). This is accomplished by comparing the user profile's keyword vector for the context with the keyword vector of each item. The preferred embodiment determines both vectors in a strict manner. Only keywords that are among the defining keywords of the item in an annotation event belonging to the user's context are used to compute the user's keyword vector (such events include all events in the profile that are not based on the item to be re-ranked). Annotation events corresponding to the clickstream of the “My Items” tab are excluded from the calculation. The weights for this vector are computed in the time-based TF-IDF manner already described, with the user's period-based usage frequency of each keyword within that context serving as the keyword's weight in the vector. The keyword frequency for an item is determined from the collected and normalized events for that item. This frequency is then multiplied by the inverse document frequency, as in the conventional TF-IDF approach with logarithmic damping log(N/d), where N is the total number of items and d is the number of those items that have the keyword. The rank of each item is computed as the dot product of the item's keyword vector and the keyword vector for the user in that context.
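The re-ranking step can be illustrated with sparse keyword vectors. The sketch below assumes the user's keyword weights for the context have already been computed (the time-based weighting is not reproduced here); all names and figures are hypothetical.

```python
import math


def item_vector(keyword_counts, total_items, items_with_keyword):
    """Weight each keyword count of an item by log(N / d)."""
    return {
        kw: count * math.log(total_items / items_with_keyword[kw])
        for kw, count in keyword_counts.items()
    }


def rank(user_vector, item_vec):
    """Dot product over the keywords shared by the two sparse vectors."""
    return sum(w * item_vec.get(kw, 0.0) for kw, w in user_vector.items())


def rerank(user_vector, items, total_items, items_with_keyword):
    """Return item IDs sorted by decreasing rank."""
    scored = [
        (rank(user_vector, item_vector(counts, total_items, items_with_keyword)), item_id)
        for item_id, counts in items.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored]


# Assumed inputs: per-context user weights and per-item keyword counts.
user_vector = {"ajax": 0.8, "css": 0.2}
items = {"a": {"ajax": 3, "css": 1}, "b": {"css": 5}, "c": {"news": 4}}
doc_freq = {"ajax": 10, "css": 100, "news": 500}
print(rerank(user_vector, items, 1000, doc_freq))
```

Item "c" shares no keyword with the user's vector, so its dot product is zero and it sorts last.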

The calculation above is augmented with issuer IDs in the same manner as keywords. Each such issuer ID can be included in the keyword vector and thus affects the final rank produced by the dot product. Because issuer IDs occur relatively rarely compared with keywords, they have a large effect on the final weighting. It is important to note that the effect of issuer ID weighting is limited to the contexts in which the user found the issuer useful; an issuer ranked highly in one context need not affect the ranking of items in another. Re-ranking amounts to sorting the items in decreasing order of these computed ranks. As a reader skilled in the art will appreciate, all of the conditions above are strict constraints that can be relaxed in many ways as an implementation requires. An implementation can relax the condition that an event's keywords must be among the defining keywords of the item. An implementation can use all of an item's defining keywords regardless of whether the user has used them in the user's own annotations. An implementation can suitably reduce the effect of issuer IDs on the ranking, or not use issuer IDs for ranking at all. Numerous variations of TF-IDF-like ranking can be used. Production and consumption rates for items can also be used as the basis for the calculation.
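One way to picture why rare issuer IDs weigh heavily is to place them in the same sparse vector as keywords. In the sketch below, the "issuer:" prefix and the frequency table are assumptions made for illustration.

```python
import math


def idf(total_items, items_with_term):
    """Logarithmic inverse document frequency, log(N / d)."""
    return math.log(total_items / items_with_term)


def augmented_item_vector(keyword_counts, issuer_ids, total_items, doc_freq):
    """Build an item vector that holds both keywords and issuer IDs."""
    vec = {kw: n * idf(total_items, doc_freq[kw]) for kw, n in keyword_counts.items()}
    for issuer in issuer_ids:
        # Hypothetical prefix so an issuer ID never collides with a keyword.
        term = "issuer:" + issuer
        vec[term] = idf(total_items, doc_freq[term])
    return vec


vec = augmented_item_vector(
    keyword_counts={"ajax": 1},
    issuer_ids=["alice"],
    total_items=1000,
    doc_freq={"ajax": 100, "issuer:alice": 5},
)
print(vec)
```

With these assumed counts the issuer term receives a larger weight than the keyword, because it is attached to far fewer items.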

These items are then displayed 613 on the “Latest” tab of FIG. 10, as described above. In the annotation browser, a search on the “Latest” tab searches only the items within the time period, as opposed to all items in the system. This differs from a search on the “All” tab, which covers every item known to the system.

The display of subscriptions is also used to recognize issuer IDs and send them back to the server. The preferred embodiment limits the number of issuer IDs allowed per item to a predetermined number, such as 10, and stores the IDs with the item in order of publication up to that number. When retrieving items, the subscription downloads all known issuer IDs for each item. The subscription process stores all such issuer IDs (essentially updating the usage for each such ID in the user profile) and uses them in the issuer-ID-based re-ranking calculation. Beyond their use in re-ranking as described above, the item's original issuer ID and its best-matching issuer ID are also added to the annotation events the user generates for the item. These events are then sent back to the server through the annotation collection method described above (step 614), and they also update the user profile for the context (step 615). This closes the feedback loop for issuer IDs and allows them to be ranked on the server side for future context setting and other purposes. The best-matching issuer ID identifies an issuer well known to the user, so that this issuer is credited for the annotation. The original issuer ID is included so that the issuer who introduced the item to the system is credited. Those skilled in the art will recognize that such a feedback loop can be implemented in many ways, each giving different system characteristics that can be used to advantage for the requirements of a given implementation; none of these departs from the basic intent of providing a feedback loop for issuer IDs back to the central server. Certain embodiments can allow users to search for highly ranked issuers. Other embodiments can enhance item ranking by including the ranking of an item's issuers when scoring search hits. Certain embodiments can use the user IDs associated with a given item in the same manner as issuer IDs. Certain implementations can apply re-ranking 507 to search results in the same manner as for subscriptions. Drill-down categories for search results can likewise be ranked by a period criterion or by the user's period-based usage frequency. The preferred embodiment keeps search results based purely on the cumulative total of events at the server, so they are the same for all users. Only subscription items are re-ranked. This lets users see a pure group view of the data apart from their individual views, and it provides at least one view of the items that is shared across all users.
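The issuer ID bookkeeping described above can be sketched as follows. The text does not define how the best-matching issuer is chosen; this sketch assumes it is the issuer with the highest usage in the user's profile, and the field names are hypothetical.

```python
MAX_ISSUER_IDS = 10  # "a predetermined number, such as 10"


def add_issuer(item_issuers, issuer_id):
    """Append an issuer ID in publication order, up to the per-item limit."""
    if issuer_id not in item_issuers and len(item_issuers) < MAX_ISSUER_IDS:
        item_issuers.append(issuer_id)
    return item_issuers


def credit_issuers(item_issuers, profile_usage):
    """Pick the issuer IDs to attach to the user's annotation event.

    Assumption: "best matching" means highest usage in the user's profile.
    """
    original = item_issuers[0]  # first publisher of the item
    best = max(item_issuers, key=lambda i: profile_usage.get(i, 0.0))
    return {"original_issuer": original, "best_match_issuer": best}


issuers = []
for n in range(12):
    add_issuer(issuers, f"p{n}")  # the 11th and 12th IDs are not stored

print(credit_issuers(issuers, {"p3": 2.0, "p7": 5.0}))
```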

The annotation system can receive a large number of items for any given category context. The subscription process that receives items at the client by category context may not be able to keep up with such a flow. The preferred embodiment periodically retrieves a predetermined number of the most relevant items for a context. This implies that many items may not be available at the client for re-ranking; at any given moment, however, the client is likely to hold the most relevant items. Other embodiments may vary this approach, for example by retrieving all items or even all events so that the client's image of the items for a given period stays synchronized with the server's image. Key statistics for the keywords and issuer IDs used in re-ranking are obtained from the server by the client. This is done at subscription time and refreshed periodically. A similar method is used to obtain an item's defining keywords when they change. All such data can be retrieved by piggybacking on actual user requests for information, such as searches, or can be maintained at regular intervals. All user profile data can also be backed up to the server at regular intervals and/or made available from network-based storage, which can be operated by an entity independent of the one that manages the annotation server. As those skilled in the art will appreciate, there are many system configurations in which the invention can be implemented without changing its basic functionality. The subscription mechanism implicitly assumes that the annotation server communicates with the client in a client-server architecture. However, the annotation server's processing can be distributed in several conventional ways, such as load balancing, a three-tier architecture, RPC or web-service-based approaches, and peer-to-peer approaches. Because processing is based on items and contexts, computation can be distributed by hashing. For items, each server can process only the subset of items determined by a hash function. For context-based servers, processing can be assigned to servers by context. A REST-like approach can be used so that local cache servers can be deployed to improve performance. Processing can also be distributed in at least one novel way for context-based servers: a DNS-like approach can be used, in which processing is coordinated hierarchically across several servers based on context. As an example, all processing of events for the context “Programming” can be relayed to a server dedicated to that context. Events for the context “Javascript Programming” can be sent to the server dedicated to “Programming” and from there to a further server for the context “Javascript Programming”.
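The two distribution schemes can be sketched briefly. The server names, the tree layout, and the convention of resolving a context from its most general keyword to its most specific are assumptions for illustration only.

```python
import hashlib


def server_for_item(item_id, servers):
    """Hash-based assignment: each item ID maps to exactly one server."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def route_context(context, tree):
    """Return the chain of servers an event passes through, DNS-style.

    The context is resolved from its last word (most general) to its
    first (most specific), much as a domain name is resolved from the
    root downward.
    """
    path = []
    node = tree
    for label in reversed(context.split()):
        child = node["children"].get(label)
        if child is None:
            break  # no more specific server; the last one handles the event
        path.append(child["server"])
        node = child
    return path


# Hypothetical hierarchy of context servers.
tree = {"children": {"Programming": {
    "server": "programming.example",
    "children": {"Javascript": {"server": "js.programming.example", "children": {}}},
}}}

print(route_context("Javascript Programming", tree))
print(server_for_item("item-42", ["s0", "s1", "s2"]))
```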

The preferred embodiment is merely one example of a system that uses the basic concepts of the invention, and numerous variations are possible that do not depart from its basic intent. The preferred embodiment can be extended to use Boolean expressions such as AND, OR, and NOT for search contexts, as is common in search engines. Content filtering can be incorporated by scoping collaboration or by implementing special-purpose keywords such as “Spam” or “Adult”. Such keywords can be presented to the user as buttons or in any other suitable form. Filtering can be controlled at the client system by letting the user set a value for the keyword count of such keywords; the subscription retrieval and re-ranking processes can then remove any item whose keyword count for these keywords exceeds the amount the user specified. A possible variation on this theme is to remove an item if such a keyword is one of its defining keywords. An embodiment can treat a publication as a special item type rather than as an annotation event on an existing item, so that each such annotation has its own item ID. This allows an annotation to point to another annotation in its metadata and so allows chains of annotations to be created. Such chains enable conversation threads and forum-like functionality in which the messages are stored in the annotation server itself, in contrast to the described preferred embodiment, where items are stored separately from the annotation server and the publish event serves only to announce their existence.
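The client-side content filtering described in this paragraph can be sketched as a threshold check on special-purpose keyword counts. The item layout and the limit values are assumptions.

```python
def filter_items(items, limits):
    """Drop items whose count for any special-purpose keyword exceeds the
    limit the user has set for that keyword."""
    kept = []
    for item in items:
        counts = item["keyword_counts"]
        if all(counts.get(kw, 0) <= limit for kw, limit in limits.items()):
            kept.append(item)
    return kept


items = [
    {"id": 1, "keyword_counts": {"Spam": 0, "ajax": 4}},
    {"id": 2, "keyword_counts": {"Spam": 7}},
    {"id": 3, "keyword_counts": {"Adult": 2}},
]
# Hypothetical user settings: tolerate up to 3 "Spam" annotations, no "Adult".
limits = {"Spam": 3, "Adult": 0}
print([item["id"] for item in filter_items(items, limits)])
```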

Some embodiments of the invention can be used within existing software applications, with minor adjustments, to achieve significant new functionality. In one such embodiment, the concept of category contexts can be incorporated to advantage into current web search engines by deriving the category contexts from their clickstream logs. This can be added easily to any search engine and can play an important role in generating queries with a larger average number of keywords. It may be a more effective way of achieving relevant results than current personalization approaches.

In another embodiment, emails can be annotated with keyword-like contexts by using a specially designed mail server that accepts keywords as mail addresses. As an example, the keyword Key1 can be entered as Key1@specialServer.tld. An add-in module for an existing email client such as Outlook can be modified to provide an interaction paradigm that allows seamless entry of such keywords by auto-completing the mail address. The email can be sent to such addresses using To:, CC:, and BCC:. This enables categorization of emails without any change to the underlying protocol. Each time a mail is forwarded or replied to, such an annotation is made, with the sender's email ID used as the user ID of this system, and all such annotations can be collected at the server. The front-end add-in module can then provide categorization of emails into category contexts, along with other features of the preferred embodiment such as search. Another possible method is to derive keywords from the text of the subject line with stop words removed, after which the email can be processed in the same way as a message of the present invention.
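Both ways of deriving keywords from an email can be sketched in a few lines. The domain follows the Key1@specialServer.tld example in the text; the stop-word list and the function names are assumptions.

```python
import re

KEYWORD_DOMAIN = "specialserver.tld"  # domain of the keyword mail server
# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "for", "fwd", "in", "of", "on", "re", "the", "to"}


def keywords_from_recipients(recipients):
    """Treat every recipient at the keyword domain as a context keyword."""
    keywords = []
    for address in recipients:
        local, _, domain = address.rpartition("@")
        if local and domain.lower() == KEYWORD_DOMAIN:
            keywords.append(local)
    return keywords


def keywords_from_subject(subject):
    """Derive keywords from the subject line with stop words removed."""
    words = re.findall(r"[A-Za-z0-9]+", subject.lower())
    return [w for w in words if w not in STOP_WORDS]


print(keywords_from_recipients(["Key1@specialServer.tld", "bob@example.com"]))
print(keywords_from_subject("Re: Notes on the Ajax prototype"))
```

Ordinary recipients such as bob@example.com are ignored, so the same address fields carry both people and keywords.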

The concept of re-ranking items using the time-based variant of TF-IDF over issuer IDs can be implemented to advantage even in today's email client software. The issuer ID becomes the sender's email ID of an incoming email in the inbox. The mail software can monitor which emails from which senders the user reads in order to compute the period-based usage frequency for each sender. The number of emails received from a particular sender can serve as a proxy for the number of items associated with that sender ID. A re-ranking function can then be implemented to sort the user's inbox by relevance.
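A simplified sketch of such an inbox sort follows. It uses plain read counts in place of the time-based weighting, and the message layout and addresses are assumptions.

```python
import math


def sender_scores(read_counts, received_counts):
    """Score each sender: emails read, damped by how much that sender sends.

    received_counts stands in for the number of items per sender ID, and
    the total number of received emails plays the role of N in log(N / d).
    """
    total = sum(received_counts.values())
    return {
        sender: read_counts.get(sender, 0) * math.log(total / n)
        for sender, n in received_counts.items()
    }


def sort_inbox(inbox, scores):
    """Sort messages by decreasing sender score; unknown senders score 0."""
    return sorted(inbox, key=lambda m: scores.get(m["sender"], 0.0), reverse=True)


received = {"boss@example.com": 20, "news@example.com": 500, "friend@example.com": 10}
reads = {"boss@example.com": 18, "news@example.com": 5, "friend@example.com": 9}
inbox = [
    {"sender": "news@example.com", "subject": "Weekly digest"},
    {"sender": "friend@example.com", "subject": "Lunch?"},
    {"sender": "boss@example.com", "subject": "Budget"},
]
for message in sort_inbox(inbox, sender_scores(reads, received)):
    print(message["sender"], message["subject"])
```

A high-volume sender whose mail is rarely read ends up at the bottom even though it contributes the most messages.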

A similar approach can be used to advantage to make blog posts, podcasts, and items based on any RSS feed discoverable in a context-sensitive manner by the entire user population, using suitably developed software. All such posts can be sent as publish events to a suitable annotation server, and RSS reader software can be suitably modified to enable subscription and annotation as described by the present invention. The issuer ID can be synthesized from the RSS URL or assigned through a special login procedure at the annotation server.

A reader skilled in the art will recognize that the invention and the described embodiments can be applied in various forms to enterprise or desktop files, blogs on the web, and other suitable uses. Instant messaging and chat software can use the invention to implement context-based real-time messaging. The invention also has particular relevance to targeted advertising. A user's searches allow advertisements to be pushed to that user in a targeted manner. Web search engine providers can use this to offer a new, push-based form of advertising in place of the current pull-based keyword advertising model. Because subscriptions increase the number of contexts a user is exposed to compared with current web search, they yield correspondingly more opportunities for highly relevant advertising. Because the user profile can be kept on the client, it can be used for novel forms of ad delivery, such as TV programs or movies and video with commercials streamed to the user in a personalized manner, which may enable new content delivery models in which content is provided free of charge, as with television. This form of communication may be ideally suited to classified advertising, where the many-to-many communication paradigm allows targeted delivery of such ads at lower cost and with greater efficiency. As an example, a seller of a used camera can communicate with potential buyers of used cameras through the corresponding category context.

Embodiments of the invention can take advantage of the semantic mechanisms provided in two patent applications by the same inventor: “System for semantically disambiguating text information” (US Patent Application No. 10/954964 and International Application PCT/SG2005/000321) and “A method and system for organizing items” (International Application PCT/SG2005/000320). The contents of these prior applications are incorporated herein by reference. Application number US 10/054064 discloses a user interface method that allows text to be converted into a unique machine representation of meaning. Keywords such as blog, blogs, and weblog can thus be mapped to a single ID that represents the meaning “weblog”. This allows the annotation, search, subscription, and other mechanisms of the invention to be more accurate. Instead of these keywords splitting items across three separate contexts, which makes the items harder to find, the items are grouped into one context for easy recall and comparison. Because the mechanisms of the invention accept keywords from any language, such a machine representation of meaning can also be implemented across languages to achieve the same disambiguation. Application number PCT/SG2005/000320 discloses a knowledge representation method that allows such semantic metadata to be organized as a restricted hierarchy through “related-To” relationships. This can be used to disambiguate contexts, just as the method of application number US 10/054064 disambiguates text. As an example, the contexts {“Javascript”, “Programming”} and {“Javascript”} refer to effectively the same set of items. Because Javascript is a programming language, most items related to Javascript are also related to programming, so the keyword “Programming” adds no new information or discriminating power to the set of items. Having two separate contexts fragments the context space. This can be remedied with a “related-To” relationship, as described in that patent application, from “Javascript” to “Programming”. The relationship implies that, for any item, the keyword “Programming” can be assumed to be present whenever “Javascript” is a keyword. With such semantic relationships established in advance, the two contexts can be disambiguated into the same context. That patent application also describes a mechanism called “Browse Path Behaviour”, which can be used to advantage with category contexts for drill-down keywords to give a more intuitive user experience. Both patent applications disclose several other capabilities that can be combined to advantage with the mechanisms of the present invention.
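The context disambiguation described above can be sketched as a normalization step. This does not reproduce the mechanisms of the referenced applications; the two lookup tables and the function name are assumptions that only illustrate the Javascript and weblog examples.

```python
# Hypothetical tables: keyword -> meaning ID, and meaning -> related-To targets.
MEANING_ID = {"blog": "weblog", "blogs": "weblog", "weblog": "weblog"}
RELATED_TO = {"Javascript": {"Programming"}}


def canonical_context(keywords):
    """Map keywords to meanings and drop meanings implied via related-To."""
    meanings = {MEANING_ID.get(k.lower(), k) for k in keywords}
    implied = set()
    for meaning in meanings:
        stack = list(RELATED_TO.get(meaning, ()))
        while stack:  # follow related-To links transitively
            parent = stack.pop()
            if parent not in implied:
                implied.add(parent)
                stack.extend(RELATED_TO.get(parent, ()))
    return frozenset(meanings - implied)


print(canonical_context(["Javascript", "Programming"]) == canonical_context(["Javascript"]))
print(canonical_context(["blogs"]) == canonical_context(["weblog"]))
```

Because “Programming” is implied by “Javascript”, both contexts reduce to the same canonical set and are no longer fragmented.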

The invention can also serve as an important method for generating semantic metadata, or concepts, and their relationships. The defining keywords of items can be used to generate candidate concepts. As an example, keywords that can be assigned the same meaning can be generated (automatically or manually) by looking for similar words among the defining keywords of each item. Different keywords with the same stem can be mapped automatically to a common meaning (“blog” and “blogs” used on the same item most likely mean the same thing). With a little manual intervention, “weblog” can be associated with the same meaning if it is used repeatedly together with “blog” or “blogs” in the defining keywords of any given item. Keywords from different natural languages can be associated with the same meaning in the same way. As a more comprehensive method, one can take the matrix of defining keywords and their corresponding items and perform a correlation analysis such as LSI, which examines correlations between word forms, including co-occurrence and second-, third-, or higher-order co-occurrence, to generate “related-To” relationships, defining concepts, and their keyword assignments. This differs greatly from other such attempts in IR. The defining keywords generated by the mechanisms of the invention are the community's interpretation of what words mean for real-world items, and they reflect meaning in actual use. This provides a high-quality data set on which pattern recognition methods such as LSI and other correlation methods can be used to derive semantic metadata and their relationships. Category contexts represent a similarly high-quality data set that can be used for such analysis. The resulting metadata can be converted into the form the two patent applications require for their functions and then used to improve the accuracy of the present invention.
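As a much simpler stand-in for LSI, first-order co-occurrence of defining keywords can already surface candidate pairs for a common meaning. The sketch below counts only direct co-occurrence and does not attempt higher-order correlation; the data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(defining_keywords):
    """Count how often two defining keywords appear on the same item."""
    pairs = Counter()
    for keywords in defining_keywords.values():
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical map: item ID -> defining keywords of that item.
items = {
    "i1": ["blog", "weblog"],
    "i2": ["weblog", "blog", "ajax"],
    "i3": ["ajax"],
}
for pair, count in cooccurrence(items).most_common(1):
    print(pair, count)
```

Pairs with high counts, such as “blog” and “weblog” here, are candidates for manual or automatic assignment to one meaning.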

In the same way, this can be extended to generate richer ontologies of the kind defined by the Semantic Web: annotators can be allowed to specify relationships in the form of keywords such as “band=Beatles”, the attribute names of a given item or concept can be discovered using a method similar to that used to determine defining keywords, and the ontology can then be generated from them automatically or semi-automatically.
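Relationship keywords of the “band=Beatles” form can be separated from plain keywords with a small parser. The function names are hypothetical, and the way attribute names are collected is only one possible reading of the text.

```python
def parse_keyword(keyword):
    """Split "name=value" keywords; plain keywords get a name of None."""
    name, sep, value = keyword.partition("=")
    if sep and name.strip() and value.strip():
        return (name.strip(), value.strip())
    return (None, keyword.strip())


def attribute_names(keywords):
    """Collect the attribute names used in an item's annotations."""
    return sorted({name for name, _ in map(parse_keyword, keywords) if name})


print(parse_keyword("band=Beatles"))
print(attribute_names(["band=Beatles", "year=1969", "rock", "band=Wings"]))
```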

A preferred embodiment of the present invention can be used as a new way of enhancing process capabilities within an organization. For example, a person who wants to move desks within the organization can simply publish a message to a context such as "move request"; all parties involved in processing move requests subscribe to this topic and are notified at the same time. These parties can include the authorizer of the request, facilities, technology, and so on. Each of these parties can publish an item together with the original item ID so that messages are chained to the context of the original request. These published items may correspond to the completion of steps in the organization's workflow for processing the request. Such steps may include, for example, authorizing the request, notifying other contexts of the request, closing the request, and assigning the request to another context. This can be strengthened by strictly typing published items with semantic metadata, for example by defining "move request" semantic metadata and assigning it as the item's type. To support broader process capabilities, this mechanism can be interfaced with existing BPMS systems. One of the key problems in process automation is that exceptional situations are difficult to handle. A many-to-many communication paradigm, such as that of the present invention, can help provide a more organic and adaptable structure for such interactions.
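The move-request example can be sketched as a small in-memory publish/subscribe model. This is only an illustration under stated assumptions: the names `PublishedItem`, `ContextBus`, `step`, and `parent_id` are hypothetical and do not come from the specification, a context is modelled as a frozen set of keywords, and delivery is a simple append to a per-user inbox.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PublishedItem:
    item_id: str
    publisher: str
    context: frozenset               # keywords the item is published under
    step: str                        # hypothetical workflow step label
    parent_id: Optional[str] = None  # ID of the original request, if chained


class ContextBus:
    """Hypothetical many-to-many bus: users subscribe to a context."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # context -> subscribed users
        self.inbox = defaultdict(list)       # user -> delivered items
        self.items = {}                      # item_id -> item, in publish order

    def subscribe(self, user, context):
        self.subscribers[frozenset(context)].add(user)

    def publish(self, item):
        # Every subscriber of the context is notified at the same time.
        self.items[item.item_id] = item
        for user in sorted(self.subscribers[item.context]):
            self.inbox[user].append(item)

    def thread(self, root_id):
        # The original request plus every item chained to it by its ID.
        return [i for i in self.items.values()
                if i.item_id == root_id or i.parent_id == root_id]
```

A request published under `{"move", "request"}` reaches every subscriber of that context, and an authorization published with `parent_id` set to the request's ID shows up in `thread()` as the next workflow step.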

Those skilled in the art will appreciate that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the scope or spirit of the invention as broadly described. The present embodiments are therefore to be considered in all respects illustrative and not restrictive.

A diagram showing a typical distribution of annotations.
A graph showing a power-law distribution.
A diagram showing the basic system design according to the present invention.
A diagram of an exemplary computing environment.
A diagram of a user interface for a browser-based annotation event generator.
A diagram of a user interface for a file-system-based annotation event generator.
A diagram of a user interface for an email-software-based annotation event generator.
A diagram of an exemplary storage dialog.
A diagram of an exemplary publish dialog.
A diagram of a user interface for an annotation browser.
A flowchart showing the annotation collection process.
A flowchart showing the search and personalization process.
A flowchart showing the publish and subscribe process.
A diagram of an exemplary storage annotation event.
A diagram of a subscription clickstream annotation event.
A diagram of an exemplary publish event.

Claims

A method for collaboration, comprising:
identifying a plurality of items having unique identifiers that can be shared among a plurality of users having unique identifiers;
having each user annotate a plurality of the items, independently of other users, with at least one keyword in at least one natural language, wherein each item is annotated by at least one user, each annotation is represented by an annotation event comprising the identifier of the annotating user, the identifier of the annotated item, and at least one keyword that the annotating user selects to describe the annotated item, and each annotation event is generated from a plurality of event sources of at least one type;
collecting the annotation events from the event sources such that the keywords associated with a particular item are collected from the annotation events for that item and the keywords associated with a particular user are collected from the annotation events for that user; and
having at least one of the users search for items or users by keyword, such that items or users that have the keyword used for the search among their collected keywords are returned as results.
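The steps of this claim can be illustrated with a minimal sketch. The class and field names (`AnnotationEvent`, `AnnotationIndex`, `source`) are hypothetical, keywords are lower-cased as a simplifying assumption, and a result is returned only when it carries every query keyword, which is one possible reading of the search step rather than the method the claim prescribes.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationEvent:
    user_id: str          # unique identifier of the annotating user
    item_id: str          # unique identifier of the annotated item
    keywords: tuple       # keywords chosen by the user
    source: str = "tag"   # hypothetical label for the event source type


class AnnotationIndex:
    """Collects keywords per item and per user from annotation events."""

    def __init__(self):
        self.item_keywords = defaultdict(Counter)
        self.user_keywords = defaultdict(Counter)

    def collect(self, event):
        kws = [k.lower() for k in event.keywords]
        self.item_keywords[event.item_id].update(kws)
        self.user_keywords[event.user_id].update(kws)

    def search(self, query):
        # Assumption: a hit must carry all query keywords.
        wanted = {k.lower() for k in query}
        items = sorted(i for i, c in self.item_keywords.items()
                       if all(k in c for k in wanted))
        users = sorted(u for u, c in self.user_keywords.items()
                       if all(k in c for k in wanted))
        return items, users
```

Because the same events feed both indexes, one keyword query returns matching items and matching users side by side.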

The method of claim 1, wherein the type of the event source is at least one of the group consisting of: a storage dialog, publishing an item, tagging an item, a search clickstream, highlighting a word in an item so that the word serves as a keyword for the item, saving a file to a file system, link text analysis, an operating system, and a software module.

The method of claim 1, wherein the item is any one of the group consisting of: a digital asset, a physical asset, a person, a living organism, a text advertisement, a video advertisement, an event, a place, a state, a process, an action, a group, a concept, a file, an email, an instant message, a blog post, a podcast, a web page, a website, a web service, a data structure, a software module, a software object, an application, an operating system, a row of a table in a relational database, XML data, and a resource described in RDF.

The method of claim 1, wherein the unique identifier is any one of the group consisting of: a hash value, a URL, a URI, a URN, a UNC, a barcode, an RFID, a fiducial marker, an email address, a social security number, a vehicle registration number, and a telephone number.

The method of claim 1, wherein each of the items has at most one unique identifier.

The method of claim 1, wherein each of the users has at most one unique identifier.

The method of claim 1, wherein the identifiers are globally unique.

The method of claim 1, further comprising assigning a unique user identifier by having the user authenticate with a user ID and password.

The method of claim 1, wherein the item is further described by at least one title field and/or description field.

The method of claim 1, wherein the item identifier is a user identifier.

The method of claim 1, wherein the at least one keyword is an item identifier.

The method of claim 1, wherein the at least one keyword is a user identifier.

The method of claim 1, wherein the annotation event is digitally signed by the user.

The method of claim 1, wherein the annotation events are sent over a network to at least one server for collection.

The method of claim 14, wherein the network is the Internet.

The method of claim 1, wherein the collection of annotation events per user is performed at a client.

The method of claim 1, wherein the annotation events are normalized per user before the annotation events are collected per item.

The method of claim 1, further comprising ranking the search results by relevance to the query using an information retrieval ranking algorithm.

The method of claim 18, wherein the search results are ranked by a plurality of the algorithms and the resulting ranks are then aggregated to determine relevance.

The method of claim 18, further comprising:
constructing a keyword vector for the query, and constructing, for each result, a vector of the collected keywords and their frequencies of occurrence; and
computing a rank from the vectors using a ranking algorithm from the group consisting of TF-IDF, variants of TF-IDF, Okapi, and pivoted normalization.
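As a rough illustration of the vector-based ranking in this claim, the sketch below scores each result by the cosine similarity between TF-IDF weighted vectors. It is one TF-IDF variant chosen for the example, not the variant the patent specifies: the function name `tfidf_rank`, the smoothed inverse document frequency, and the tie-break by identifier are all assumptions.

```python
import math
from collections import Counter


def tfidf_rank(query, collected):
    """Rank results by TF-IDF cosine similarity to the query.

    collected: dict mapping result id -> Counter of collected keyword
    frequencies. Returns (result id, score) pairs, best first.
    """
    n = len(collected)
    df = Counter()                    # number of results carrying each keyword
    for freqs in collected.values():
        df.update(freqs.keys())

    def idf(keyword):
        # Smoothed IDF so unseen keywords do not divide by zero (assumption).
        return math.log((1 + n) / (1 + df[keyword])) + 1.0

    def weights(freqs):
        return {k: tf * idf(k) for k, tf in freqs.items()}

    def cosine(a, b):
        dot = sum(w * b.get(k, 0.0) for k, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = weights(Counter(k.lower() for k in query))
    scores = {rid: cosine(q, weights(f)) for rid, f in collected.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The `collected` argument corresponds to the per-result keyword frequency vectors; restricting those counters to definition keywords before calling the function would give the narrower behaviour of the dependent claim.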

The method of claim 20, wherein the collected keywords of the item are restricted to the definition keywords of the item.

The method of claim 18, further comprising:
computing a result set of items;
including in the result set all users who have annotated those items;
treating users as hubs and items as authorities for a link analysis ranking algorithm;
generating synthetic hyperlinks from each user to all items annotated by that user; and
computing ranks for the items using at least one link analysis ranking algorithm, and computing ranks for the users in the result set using at least one link analysis ranking algorithm.
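The hub-and-authority treatment in this claim can be sketched with plain HITS, one of several link analysis algorithms the claims allow. The function name `hits_users_items`, the fixed iteration count, and the L2 normalization are assumptions made for the example; each (user, item) pair stands for one hyperlink from the user to an item that user annotated.

```python
import math


def hits_users_items(events, iterations=50):
    """HITS over a bipartite graph: users are hubs, items are authorities.

    events: iterable of (user_id, item_id) pairs, one per annotation.
    Returns (hub scores per user, authority scores per item).
    """
    links = sorted(set(events))       # one link per distinct user-item pair
    users = sorted({u for u, _ in links})
    items = sorted({i for _, i in links})
    hub = {u: 1.0 for u in users}
    auth = {i: 1.0 for i in items}

    for _ in range(iterations):
        # Authority of an item: sum of hub scores of users linking to it.
        auth = {i: 0.0 for i in items}
        for u, i in links:
            auth[i] += hub[u]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {i: v / norm for i, v in auth.items()}

        # Hub score of a user: sum of authority scores of linked items.
        new_hub = {u: 0.0 for u in users}
        for u, i in links:
            new_hub[u] += auth[i]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {u: v / norm for u, v in new_hub.items()}

    return hub, auth
```

Items annotated by many well-connected users receive high authority scores, and users who annotate high-authority items receive high hub scores, so both items and users in the result set obtain a rank.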

The method of claim 22, wherein the link analysis ranking algorithm is one of the group consisting of: Indegree, HITS, Randomized HITS, Subspace HITS, SALSA, HUBAVG, the Authority Threshold family of algorithms, MAX, BFS, BAYESIAN, Simplified BAYESIAN, PageRank, Personalized PageRank, TrafficRank, TOPHITS, CubeSVD, PHITS, and PLSA+PHITS.

The method of claim 22, wherein the keywords collected for both the users and the items in the result set include the keywords of the search.

The method of claim 22, wherein each annotation event used to compute the result set contains all keywords of the query.

The method of claim 22, wherein each annotation event used to compute the result set contains all keywords of the query, and at least one of the keywords is a definition keyword for the item of the annotation event.

The method of claim 1, wherein the at least one keyword is a machine representation of meaning.

The method of claim 1, wherein the at least one keyword is semantic metadata.

The method of claim 28, further comprising describing the semantic metadata by at least one dictionary, and disambiguating the keywords of the query and the collected keywords of items and users based on the dictionary.

The method of claim 1, further comprising treating the definition keywords of an item, or the keywords of a category context, as a set, and generating an ontology based on the co-occurrence of keywords across all such sets.

The method of claim 30, wherein semantic relationships between concepts in a dictionary are generated from the co-occurrence data.

The method of claim 1, further comprising clustering both items and users simultaneously by context.

The method of claim 32, further comprising:
collecting annotation events by context;
determining the contexts that have a predetermined minimum number of unique user identifiers and a predetermined minimum number of unique item identifiers in their collected annotation events; and
clustering items and users based on those contexts.
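A minimal sketch of this clustering step follows. It assumes that a context is the exact, lower-cased keyword set of an annotation event (a category context), and the function name `cluster_by_context` and its threshold parameters are hypothetical.

```python
from collections import defaultdict


def cluster_by_context(events, min_users=2, min_items=2):
    """Group users and items by the keyword set they were annotated under.

    events: iterable of (user_id, item_id, keywords) triples.
    Returns {context: (set of users, set of items)} for every context
    that meets both minimum counts.
    """
    users = defaultdict(set)
    items = defaultdict(set)
    for user_id, item_id, keywords in events:
        ctx = frozenset(k.lower() for k in keywords)
        users[ctx].add(user_id)
        items[ctx].add(item_id)
    return {
        ctx: (users[ctx], items[ctx])
        for ctx in users
        if len(users[ctx]) >= min_users and len(items[ctx]) >= min_items
    }
```

Each surviving context yields one cluster that contains users and items together, which is what allows both to be clustered simultaneously. Filtering the events by timestamp before calling the function would restrict clustering to a given period.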

The method of claim 33, further comprising restricting the annotation events to a predetermined time period.

The method of claim 33, further comprising:
determining all of the contexts that are sub-contexts of the query;
computing, from the determined contexts, a set of unique keywords comprising all keywords present in the determined contexts other than the keywords that are part of the search query; and
presenting each of the unique keywords as a subcategory into which the user can drill down by adding the keyword to the original query and reissuing the query.
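The drill-down computation can be sketched in a few lines. The example assumes that a sub-context of a query is any context containing every query keyword plus at least one more; that reading, along with the function name `drilldown_subcategories`, is an assumption and not a definition taken from the claims.

```python
def drilldown_subcategories(query, contexts):
    """Return the keywords a user could add to the query to drill down.

    query: iterable of keywords.
    contexts: iterable of frozensets of lower-cased keywords.
    """
    q = frozenset(k.lower() for k in query)
    # Sub-contexts: contexts that strictly contain the query keywords.
    sub_contexts = [c for c in contexts if q < c]
    # Every keyword in those contexts except the query keywords themselves.
    return sorted(set().union(*sub_contexts) - q)
```

Each returned keyword can be appended to the original query and the query reissued, which narrows the result to the corresponding subcategory.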

The method of claim 1, further comprising publishing an item when a user annotates the item with a set of keywords.

The method of claim 36, further comprising annotating the published item with a unique publisher identifier corresponding to the user.

The method of claim 37, further comprising assigning a unique publisher identifier by having the publishing user authenticate with a user ID and password.

The method of claim 36, wherein the annotation event is a publish annotation event.

The method of claim 39, wherein the publish annotation event is an item having a unique item identifier.

The method of claim 36, wherein the set of keywords of the annotation event is a category context.

The method of claim 36, wherein the published item is ranked highly in the published context, for a subset of the users of that context, until a predetermined condition is met.

The method of claim 37, wherein search results are ranked based on the publisher identifier.

The method of claim 37, wherein the act of publishing an item corresponds to the completion of a step in a workflow process.

The method of claim 44, wherein the step in the workflow process is one of the group comprising authorization, request, assignment, closure, and notification.

The method of claim 37, further comprising:
searching automatically and periodically based on a user profile;
ranking the search results by relevance to the query using an information retrieval ranking algorithm;
retrieving and storing a subset of the top-ranked results for each user; and
presenting the results to the user at the user's request.

The method of claim 46, further comprising re-ranking the stored results for each context by publisher identifier.

The method of claim 46, wherein the published item is ranked highly in the published context, for a subset of the users of that context, until a predetermined condition is met.

The method of claim 48, wherein the subset of users is selected from at least one of the group consisting of highly ranked publishers, highly ranked users, and a random selection of users.

The method of claim 46, further comprising generating an annotation event when a user annotates a presented item, such that the publisher identifiers present at the time of the annotation event are included in the annotation event.

The method of claim 50, wherein the publishers are restricted to the publisher most highly regarded by the user and/or the original publisher.

The method of claim 1, further comprising:
searching a set of contexts automatically and periodically based on a user profile;
ranking the search results by relevance to the query using an information retrieval ranking algorithm;
retrieving and storing a subset of the top-ranked results for each context; and
presenting the results to the user at the user's request.

The method of claim 52, further comprising restricting the search to events added to the context within a predetermined time period.

The method of claim 52, wherein the contexts are restricted to category contexts.

The method of claim 52, wherein the contexts are explicitly specified by the user.

The method of claim 52, further comprising re-ranking the stored results for each context by a time-based variant of TF-IDF, based on at least one parameter from the group consisting of keywords and user identifiers.

The method of claim 1, wherein at least one advertisement is pushed to at least one user by searching for relevant users and displaying the advertisement to those users.

The method of claim 57, wherein the advertisements are re-ranked based on a user profile.

The method of claim 57, wherein the advertisement consists of at least one of the group of text, audio and video, and classified advertisements.

The method of claim 1, wherein the search is federated across several annotation collection servers based on context.

The method of claim 1, further comprising removing a set of items from the results where the items are annotated with at least one of a predetermined set of keywords.

The method of claim 61, wherein an item is removed only if the frequency of annotation with the keyword exceeds a predetermined level.
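The removal of flagged items, with and without a frequency threshold, can be illustrated as below. The function name `remove_flagged`, the shape of `item_keywords`, and the interpretation of "exceeds a predetermined level" as a strict greater-than comparison on the annotation count are assumptions made for this sketch.

```python
def remove_flagged(results, item_keywords, blocked, min_count=0):
    """Drop results annotated with any blocked keyword.

    results: list of item ids in ranked order.
    item_keywords: dict item id -> {keyword: annotation count}.
    blocked: predetermined set of keywords that trigger removal.
    min_count: an item is removed only if a blocked keyword's count
        is strictly greater than this level (0 removes on any use).
    """
    kept = []
    for item_id in results:
        counts = item_keywords.get(item_id, {})
        flagged = any(counts.get(k, 0) > min_count for k in blocked)
        if not flagged:
            kept.append(item_id)
    return kept
```

Raising `min_count` makes the filter tolerant of a few stray annotations, so a single user cannot remove an item from the results by tagging it with a blocked keyword once.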

The method of claim 1, wherein the search is performed in a search engine and annotation events are generated from the clickstream in the search logs of the search engine.

A system for collaboration, comprising:
means for identifying a plurality of items having unique identifiers that can be shared among a plurality of users having unique identifiers;
means for having each user annotate a plurality of the items, independently of other users, with at least one keyword in at least one natural language, wherein each item is annotated by at least one user, each annotation is represented by an annotation event comprising the identifier of the annotating user, the identifier of the annotated item, and at least one keyword that the annotating user selects to describe the annotated item, and each annotation event is generated from a plurality of event sources of at least one type;
means for collecting the annotation events from the event sources such that the keywords associated with a particular item are collected from the annotation events for that item and the keywords associated with a particular user are collected from the annotation events for that user; and
means for having at least one of the users search for items or users by keyword, such that items or users that have the keyword used for the search among their collected keywords are returned as results.

The method of claim 33, further comprising restricting the contexts to those in which there is a predetermined number of items, each of which has a predetermined number of users annotating it.