JP2003526133A6

JP2003526133A6 - Method and apparatus for providing expression data mining database and laboratory information management

Info

Publication number: JP2003526133A6
Application number: JP2000570688A
Authority: JP
Inventors: バラバン，デイビッド・ジェイ; カージン，エリナ; バーンハート，デレク・エイチ; ソワッキイ，ジョン; アガーウォル，オーン; ジェボン，ルイ
Original assignee: Affymetrix Inc
Current assignee: Affymetrix Inc
Priority date: 1998-09-17
Filing date: 1999-09-15
Publication date: 2004-07-08

Abstract

Systems and methods for organizing expression information in a manner that facilitates data mining. A database model is provided that organizes information relating to sample preparation, expression analysis of experiment results, and intermediate and final results of mining expression and concentration results (124). The model is readily translatable into a database language such as SQL. The database model scales to permit mining of expression or concentration information collected from large numbers of samples (126). Embodiments of the invention provide computer-based methods for managing information about, for example, a plurality of experiments performed on a plurality of samples (112-118).

Description

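[0001]
(Cross-Reference to Related Applications)
This application claims priority from the following U.S. provisional applications, the entire disclosures of which, including all attachments and appendices, are incorporated by reference for all purposes:
U.S. Provisional Patent Application No. 60/100724, entitled "METHOD AND APPARATUS FOR PROVIDING A LABORATORY INFORMATION MANAGEMENT SYSTEM", filed September 17, 1998 (Attorney Docket No. 018547-037500US); and
U.S. Provisional Patent Application No. 60/100740, entitled "METHOD AND APPARATUS FOR PROVIDING AN EXPRESSION DATA MINING DATABASE", filed September 17, 1998 (Attorney Docket No. 018547-033840US).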
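[0002]
In addition, commonly owned co-pending U.S. Patent Application No. 09/122167, entitled "METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE", filed July 24, 1998, and
U.S. Patent Application No. 09/122434, entitled "GENE EXPRESSION AND EVALUATION SYSTEM", filed July 24, 1998, are incorporated herein by reference.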
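[0003]
(Statement as to Rights to Inventions Made Under Federally Sponsored Research and Development) The research leading to portions of this invention was funded by the U.S. Department of Commerce through the National Institute of Standards and Technology.
[0004]
(Background of the Invention)
The present invention relates to computer systems and, more specifically, to computer systems for mining and managing laboratory operations relating to gene expression levels.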
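[0005]
Devices and computer systems have been developed for collecting information about gene expression or expressed sequence tags (ESTs) in large numbers of samples. For example, PCT Application WO 92/10588, incorporated herein by reference for all purposes, discloses techniques for examining the sequences of nucleic acids and other materials. Probes for performing these operations may be formed in arrays according to the pioneering techniques disclosed in, for example, U.S. Patent No. 5143854 and U.S. Patent No. 5571639, both of which are incorporated herein by reference for all purposes.
[0006]
According to one aspect of the techniques described in these patents, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A fluorescently labeled nucleic acid is then brought into contact with the chip, and a scanner generates an image file indicating the locations where the labeled nucleic acid has bound to the chip. Based on the identities of the probes at these locations, information such as the monomer sequence of DNA or RNA can be extracted.
[0007]
Computer-aided techniques for monitoring gene expression using such probe arrays have been developed and are disclosed in European Publication No. 0848067 and PCT Publication WO 97/10365, the contents of which are incorporated herein by reference. Many diseases are characterized by differences in the degree to which various genes are expressed, arising either from changes in the copy number of the genetic DNA or from changes in the level of transcription of particular genes (for example, through control of initiation, provision of RNA precursors, or RNA processing). For example, losses and gains of genetic material play an important role in malignant transformation and progression. Furthermore, changes in the expression (transcription) levels of particular genes (for example, oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.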
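[0008]
Information on the expression of genes or expressed sequence tags can be collected on a large scale in many ways, including the probe array techniques described above. One objective in collecting this information is to identify genes or ESTs whose expression is of particular importance. Researchers use such techniques to answer questions such as: 1) Which genes are expressed in cells of a malignant tumor but not in healthy tissue or in tissue treated according to a particular treatment program? 2) Which genes or ESTs are expressed in particular organs but not in others? 3) Which genes or ESTs are expressed in particular species but not in others?
[0009]
Collecting vast amounts of expression data from large numbers of samples, including many tissue types, is useful in answering these questions. To derive full value from the investment in collecting and storing expression data, techniques are needed for mining that data efficiently and finding the items of particular relevance.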
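[0010]
(Summary of the Invention)
The present invention provides techniques for organizing expression or concentration information in a manner that facilitates mining. A database model is provided that organizes information relating to sample preparation, expression analysis of experiment results, intermediate and final results of mining gene expression measurements, gene sets, and the like. The model is readily translatable into a database language such as SQL. The database model scales to permit mining of gene expression measurements collected from large numbers of samples.
[0011]
According to one embodiment of the invention, a computer-based method for mining a plurality of experiment information is provided. The method includes a variety of steps, such as collecting information from experiments and chip designs. The method can include selecting experiments to mine. Experiment results and other information can be organized by experiment analysis and the like. Defining one or more groupings for the experiments to be mined is also part of the method. The method further includes selecting information about the experiments to be mined based on the groupings to form a plurality of resulting information. This resulting information can include one or more resulting gene sets and the like. Finally, the method formats the resulting information for viewing by a user. The combination of these steps enables the user to access the experiment information.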
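[0012]
In some embodiments, visualization techniques are used together with the steps of the method so that the user can more easily understand the results of data mining. In some embodiments, recording conclusions about the data mining results also forms part of the method.
[0013]
In another aspect of the invention, a method of working with expression information is provided. The method includes a variety of steps, such as collecting information about the results of experiments. Gathering information about the experiments, including information about samples and experiment analyses, also forms part of the method. A step of adding one or more attributes to the information about the experiments can also be performed. The method then transforms the plurality of experiment results into a plurality of transformed information. Transformations can include normalization, de-normalization, aggregation, scaling, and the like. Mining the plurality of transformed information and visualizing the plurality of transformed information can also be part of the method.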
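[0014]
The invention provides improved techniques for monitoring gene expression or sequence analysis. Specifically, the invention provides methods for managing laboratory operations that monitor expression or perform sequence analysis.
[0015]
According to one embodiment of the invention, a computer-based method is provided for managing information about a plurality of experiments performed on a plurality of samples. Each experiment indicates the extent to which a particular gene is expressed in a sample. The method includes a variety of steps, such as registering at least one of the plurality of samples in a central database. The method can include tracking a plurality of information about the samples and tracking a plurality of information about the experiments. Generating a sample history for the plurality of samples from the plurality of information can also be part of the method. The method can include filtering the information about the experiments and the information about the samples based on user-selected parameters. The information can be published to a variety of targets, such as a public database. The combination of these steps can provide a web-based user interface that enables a user to access the information.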
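[0016]
In many embodiments, experiment result information can be entered in a format that allows the information to be used and shared across platforms. One such format is the Genetic Analysis Technology Consortium ("GATC") format, a standard format for genomic databases provided by Molecular Dynamics of Hayward, California, and Affymetrix of Santa Clara, California. For further information about GATC, see http://www.gatconsortium.org. In many embodiments, however, other standard formats, such as those generally known in the art, can be used.
[0017]
In another aspect of the invention, a method is provided for viewing the results of a plurality of experiments stored in at least one database. The method includes a variety of steps, such as specifying a database to query. One or more queries can be submitted to form a result, which the user can then view. The result may be filtered based on one or more factors of interest specified by the user to produce a filtered result, which can be converted to graphical form, for example for display.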
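[0018]
Numerous benefits are achieved by the invention over conventional techniques. Some embodiments of the invention provide better access to genetic experiment information than methods known in the prior art. In some embodiments, the invention is more economical than conventional techniques. Embodiments can answer queries such as "show all genes having a gene expression value of 100 or greater, where at least 3/4 of the genes respond to this query", as well as a wide variety of other useful queries. Another benefit of the method is that the results of large numbers of experiments can be mined effectively using visualization techniques and set-theoretic queries. Some embodiments of the invention are simpler than known techniques. The invention can further provide a clear, easily viewed graphical indication of the laboratory analysis process.
[0019]
A further understanding of the nature and advantages of the invention may be had by reference to the remaining portions of the specification and the accompanying drawings.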
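[0020]
(Description of Specific Embodiments)
One embodiment of the invention functions as a system for analyzing biological or other materials using arrays that include probes made of, for example, biological materials such as RNA or DNA. VLSIPS^TM and GeneChip^TM technologies provide methods for making and using very large arrays of polymers, such as nucleic acids, on very small chips. See U.S. Patent No. 5143854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is incorporated herein by reference in its entirety for all purposes. Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid).
[0021]
It should be understood that the probes need not be nucleic acid probes and may be other polymers, such as peptides. Peptide probes can be used to detect the concentration of peptides, polypeptides, or polymers in a sample. The probes must be carefully selected to have binding affinity for the compound whose concentration is to be measured.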
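[0022]
FIG. 1 is a simplified diagram of a representative system 100 for forming and analyzing arrays of biological materials such as RNA or DNA. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. A chip design system 104 is used to design arrays of polymers, such as biological polymers like RNA or DNA. Chip design system 104 may be, for example, an appropriately programmed Sun Workstation, or a personal computer or workstation such as an IBM PC equivalent. Chip design system 104 receives input from a user regarding chip design objectives, including characteristics of genes of interest, and other input regarding the desired features of the array. Optionally, chip design system 104 can receive information about specific gene sequences of interest from a bioinformatics database 102 or from external databases such as GenBank. The output of chip design system 104 is a set of chip design computer files, for example in the form of a switch matrix as described in PCT Application WO 92/10092, together with other associated computer files. Systems for designing chips for sequencing, sequence checking, and expression analysis are disclosed in U.S. Patent No. 5571639 and PCT Application WO 97/10365, the entire contents of which are incorporated herein by reference for all purposes.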
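[0023]
The chip design files are input to a mask design system (not shown), which designs the lithographic masks used in fabricating probe arrays of molecules such as DNA. The mask design system generates mask design files, which a mask construction system (not shown) then uses to construct masks, such as chrome-on-glass masks, or other synthesis patterns for use in fabricating polymer arrays.
[0024]
The masks are used in a synthesis system (not shown). The synthesis system includes the hardware and software needed to fabricate arrays of polymers on a substrate or chip. It includes a light source and a chemical flow cell in which the substrate or chip is placed. A mask is placed between the light source and the substrate/chip, and the two are translated relative to each other an appropriate number of times to deprotect selected regions of the chip. Selected chemical reagents are passed through the flow cell for coupling to the deprotected regions, as well as for washing and other operations. The substrates fabricated by the synthesis system are optionally diced into smaller chips. The output of the synthesis system is a chip ready for application of a target sample. Information about the mask design system, mask construction system, and probe array synthesis system is given in the Background of the Invention section.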
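[0025]
A biological source 112 is, for example, tissue from a plant or animal. Various processing steps are applied to material from biological source 112 by a sample preparation system 114. These steps include, for example, isolation of mRNA and precipitation of the mRNA to increase its concentration. The result of these processing steps is a target sample ready for application to the chips produced by synthesis system 110. Details of sample preparation methods for expression analysis are discussed in WO 97/10365.
[0026]
The prepared sample contains monomer nucleotide sequences such as RNA or DNA. When the sample is applied to the chip by a sample exposure system 116, some nucleotides bind to probes and others do not. To determine which probes have bound to nucleotide sequences in the sample, the nucleotides are tagged with fluorescein labels. The prepared sample is then placed in a scanning system 118. Scanning system 118 includes a detection device, such as a confocal microscope or CCD (charge-coupled device), used to detect the locations where labeled receptors have bound to the substrate. In the case of fluorescein-labeled receptors, the output of scanning system 118 is one or more image files indicating fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. Because higher photon counts are observed where the labeled receptor has bound more strongly to the array of polymers, and because the monomer sequence of the polymers on the substrate is known as a function of position, it is possible to determine the sequences of the polymers on the substrate that are complementary to the receptor.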
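[0027]
The image files and the chip design are input to an analysis system 120 that, for example, calls base sequences or determines expression levels of genes or expressed sequence tags. The expression level of a gene or EST is herein understood to be the concentration within a sample of mRNA or protein that results from transcription of the gene or EST. Such analysis techniques are disclosed in WO 97/10365 and U.S. Patent Application No. 08/531137, the entire contents of which are incorporated herein by reference for all purposes.
[0028]
An expression analysis database 122 maintains information used in expression analysis and the results of expression analysis. The contents of expression analysis database 122 include, for example, tables listing the analyses performed, analysis results, experiments performed, sample preparation protocols and the parameters of those protocols, chip designs, and so on. Details of one embodiment of expression analysis database 122 are described in U.S. Patent Application No. 09/122167, entitled "METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE", filed July 24, 1998, the entire contents of which are incorporated herein by reference for all purposes.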
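[0029]
One or more instances of expression analysis database 122 contain information about the expression of many genes or ESTs, collected from many different tissue samples. This information is useful for investigating questions such as: 1) Which genes or ESTs are up-regulated (expressed more) in diseased tissue, and which are down-regulated (expressed less)? 2) How does gene expression vary among the organs and tissue types of a single species? 3) How does gene expression vary among species that share a common gene? 4) How does gene expression respond to various disease treatment programs? 5) How does gene expression change as a disease progresses?
[0030]
To facilitate investigations of this kind, an expression mining database 124 is provided. Expression mining database 124 contains, for example, a replica of the data in the expression analysis database. It can further include various tables that facilitate mining operations conducted by a user operating a query/mining system 126. Query/mining system 126 includes a user interface that allows an operator to make queries that investigate the expression of genes and ESTs and seek answers to the kinds of questions identified above. An example of a query/mining system is described in U.S. Patent Application No. 09/122434, entitled "GENE EXPRESSION AND EVALUATION SYSTEM", filed July 24, 1998, the entire contents of which are incorporated herein by reference for all purposes.
[0031]
Chip design system 104, analysis system 120, the control portion of exposure system 116, sample preparation system 114, and scanning system 118 may be appropriately programmed computers, such as Sun workstations or IBM-compatible PCs. An independent computer for each system may perform that system's computer-implemented functions, or one computer may combine the computerized functions of two or more systems. One or more computers, independent of the computers that operate the system of FIG. 1, may maintain expression analysis database 122, expression mining database 124, and query/mining system 126.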
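[0032]
FIG. 2A is a simplified block diagram of a representative host computer system 210 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Host computer system 210 includes a bus 212 that interconnects major subsystems such as a central processor 214, a system memory 216 (typically RAM), an input/output (I/O) adapter 218, an external device such as a display screen 224 via a display adapter 226, a keyboard 232 and a mouse 234 via I/O adapter 218, a SCSI host adapter 236, and a removable disk drive 238 operative to receive a removable disk 240. SCSI host adapter 236 can act as a storage interface to a fixed disk drive 242 or to a CD-ROM player 244 operative to receive a CD-ROM 246. Fixed disk 242 may be part of host computer system 210 or may be separate and accessed through other interface systems. A network interface 248 can provide a direct connection to a remote server via a telephone link, or a direct connection to the Internet. Network interface 248 can also connect to a local area network (LAN) or other network interconnecting many computer systems. Many other devices or subsystems (not shown) can be connected in a similar manner.
[0033]
Not all of the devices shown in FIG. 2A need be present to practice the invention, as discussed below. The devices and subsystems may be interconnected in ways different from that shown in FIG. 2A. The operation of a computer system such as that shown in FIG. 2A is well known in the art and is not discussed in detail in this application. Code to implement the invention may be operably disposed in or stored on computer-readable storage media such as system memory 216, fixed disk 242, CD-ROM 246, or removable disk 240.
[0034]
FIG. 2B is a simplified diagram of a network 260 interconnecting multiple computer systems 210a-210e. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Network 260 may be, for example, a local area network (LAN) or a wide area network (WAN). The computer-related operations of bioinformatics database 102 and the other elements of FIG. 2B may be divided among computer systems 210 in any way, with network 260 used to communicate information among the various computers. Instead of network 260, portable storage media such as removable disks can be used to carry information between computers.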
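[0035]
The contents and structure of expression mining database 124 in a representative specific example embodiment of the invention will now be described. Expression mining database 124 is preferably a multidimensional relational database with a complex internal structure. Depending on the embodiment chosen, however, other kinds of databases can be used without departing from the scope of the invention. The structure and contents of expression mining database 124 are described with reference to a model that describes the contents of the database's tables and the interrelationships among them. A visual depiction of this model is an Entity Relationship Diagram (ERD), which includes entities, relationships, and attributes. A detailed discussion of ERDs is found in "ERwin version 3.5.2 Methods Guide" from Platinum Technologies, the contents of which are incorporated herein by reference for all purposes. Those of skill in the art will appreciate that automated tools, such as ERwin or Developer 2000 available from Oracle, can convert the ERD of FIG. 4A directly into executable code, such as SQL code, for creating and operating the database.
[0036]
FIG. 3 is a key to the ERD used to describe the contents of chip design database 102. FIG. 3 is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. A representative table 302 includes one or more key attributes 304 and one or more non-key attributes 306. Representative table 302 contains one or more records, each of which includes fields corresponding to the listed attributes. The contents of the key fields taken together identify an individual record. In the ERD, each table is represented by a rectangle divided by a horizontal line. Fields or attributes above the line are key attributes; those below the line are non-key attributes. An identifying relationship 308 signifies that a key attribute of a parent table 310 is also part of the composite key of a child table 312. A non-identifying relationship 314 signifies that a key attribute of a parent table 316 is also a non-key attribute of a child table 318. A foreign key, marked (FK), is an attribute that is a key, or part of a composite key, of another table. In both identifying and non-identifying relationships, one record in the parent table corresponds to one or more records in the child table.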
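The two relationship kinds in the ERD key can be illustrated with a minimal sketch. The table and column names below (`parent`, `identified_child`, `referencing_child`) are hypothetical and are not taken from the figures; SQLite is used here only because it ships with Python, not because the specification names it.

```python
import sqlite3

# Hypothetical tables illustrating the two relationship kinds of the ERD key.
DDL = """
CREATE TABLE parent (
    parent_id INTEGER PRIMARY KEY,   -- key attribute (above the line)
    name      TEXT                   -- non-key attribute (below the line)
);
-- Identifying relationship: the parent key is part of the child's composite key.
CREATE TABLE identified_child (
    parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
    seq_no    INTEGER NOT NULL,
    note      TEXT,
    PRIMARY KEY (parent_id, seq_no)
);
-- Non-identifying relationship: the parent key is only a non-key attribute (FK).
CREATE TABLE referencing_child (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(parent_id),
    note      TEXT
);
"""


def build_demo():
    """Create an in-memory database with one parent and several children."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO parent VALUES (1, 'p1')")
    conn.executemany(
        "INSERT INTO identified_child VALUES (?, ?, ?)",
        [(1, 1, "a"), (1, 2, "b")],
    )
    conn.execute("INSERT INTO referencing_child VALUES (10, 1, 'c')")
    return conn


def primary_key_columns(conn, table):
    """Return the primary-key column names of a table, in key order."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # table_info columns: cid, name, type, notnull, dflt_value, pk
    return [r[1] for r in sorted(rows, key=lambda r: r[5]) if r[5] > 0]
```

In the identifying case the foreign key appears among the child's key columns; in the non-identifying case it does not, which is the distinction the key of FIG. 3 draws.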
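[0037]
FIG. 4A is a simplified entity relationship diagram (ERD) of the elements of expression mining database 124 in a particular embodiment of the invention. FIG. 4A is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. The rectangles in FIG. 4A correspond to tables in expression mining database 124. The name of each table is given above its rectangle, and the columns of each table are listed inside it. Above the horizontal line in each rectangle are the key columns, whose contents are used to identify individual records of the table. Below the line are the names of the non-key attributes. The lines between rectangles identify relationships between records of one table and records of another. The relationships among the various tables are described first, followed by a detailed discussion of the contents of each table.
[0038]
In operation, expression mining database 124 is updated during mining operations. Some tables are updated by import and transformation from expression analysis database 122. Other tables can be updated as the operator of query/mining system 126 defines query operations.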
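[0039]
It is useful to be able to identify, in some way, genes or ESTs whose expression varies according to one or more tissue attributes. Query/mining system 126 therefore needs to be aware of the tissue attributes associated with expression analysis results. In general, one or more analysis results are associated with what is referred to herein as a "leaf target sample".
[0040]
To explain the operation of the invention more clearly, the relationship between a "leaf target sample" and tissue attributes is discussed first. A "raw sample" represents one extracted piece of tissue. Before further processing, a single raw sample may be cut into multiple raw samples. A raw sample is the input to sample preparation system 114. For each raw sample, sample preparation system 114 prepares a fluid containing mRNA or other expression indicators, called a "target". A "target" can be split into multiple "replicates", and replicates can be pooled to form another target. An individual "target" that is applied to a chip is a leaf target sample. Each application of a "leaf target sample" to a chip constitutes one experiment. In a presently preferred embodiment, expression analysis is performed on the experiment data according to one or more selectable criteria to produce experiment analysis result data.
[0041]
The tables of expression mining database 124 that relate to samples and attributes are marked with the letter "A" in FIG. 4A. Leaf target samples, raw samples, replicates, targets, and so on are listed in a sample item table 402. A sample item derivation table 404 lists the transformations from one sample item to another: split, pool, and cut operations, conversions from raw sample to target, and the analyses applied. A sample derivation type table 406 lists the various types of transformation. A sample item type table 408 lists the various sample item types, for example target, replicate, raw sample, leaf target sample, and analysis. Listing sample derivation types and sample item types in tables makes it easier to reprogram the system to accommodate changes in sample processing procedures.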
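As a rough illustration of how derivation records of this kind can be traversed, the following sketch treats each row of a derivation table as a (parent, child, derivation type) triple and walks from a leaf target sample back to the raw samples it came from. The item names and the `prepare` derivation type are invented for the example; the specification does not give concrete rows.

```python
from collections import defaultdict

# Hypothetical derivation rows: (parent item, child item, derivation type).
DERIVATIONS = [
    ("raw1", "raw1a", "cut"),
    ("raw1", "raw1b", "cut"),
    ("raw1a", "targetA", "prepare"),
    ("raw1b", "targetB", "prepare"),
    ("targetA", "repA1", "split"),
    ("targetB", "repB1", "split"),
    ("repA1", "pooled", "pool"),
    ("repB1", "pooled", "pool"),
]


def ancestors(item, derivations):
    """Return every sample item from which `item` was derived."""
    parents = defaultdict(set)
    for parent, child, _ in derivations:
        parents[child].add(parent)
    seen, stack = set(), [item]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def leaf_items(derivations):
    """Return items that are derived from something but have no children."""
    parent_items = {p for p, _, _ in derivations}
    child_items = {c for _, c, _ in derivations}
    return child_items - parent_items
```

Because pooling gives an item more than one parent, the derivations form a directed graph rather than a tree, which is why the traversal keeps a `seen` set.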
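[0042]
Attributes are associated with samples. Some attributes are strings or values identifying concentration, sample preparation date, expiration date, and so on. Others identify characteristics useful in searching for genes or ESTs of interest, such as the disease state of the tissue, organ, or species from which the sample was extracted. Attributes are listed in a sample item attribute table 410. A sample item attribute map table 412 implements a many-to-many relationship between sample item attribute table 410 and sample item table 402: one sample can have more than one attribute, and one attribute can describe more than one sample item.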
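A many-to-many mapping of this kind is conventionally implemented with a junction table. The sketch below is one possible rendering under assumed column names (`sample_item_id`, `attribute_id`, `type`, `value`) and invented sample data; it is not the schema of FIG. 4A.

```python
import sqlite3

# Assumed, simplified schema: two entity tables joined by a map (junction) table.
SCHEMA = """
CREATE TABLE sample_item (
    sample_item_id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE sample_item_attribute (
    attribute_id INTEGER PRIMARY KEY,
    type  TEXT,
    value TEXT
);
CREATE TABLE sample_item_attribute_map (
    sample_item_id INTEGER REFERENCES sample_item(sample_item_id),
    attribute_id   INTEGER REFERENCES sample_item_attribute(attribute_id),
    PRIMARY KEY (sample_item_id, attribute_id)
);
"""


def make_db():
    """Build an in-memory database with a few invented samples and attributes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sample_item VALUES (?, ?)",
        [(1, "liver1"), (2, "liver2"), (3, "kidney1")],
    )
    conn.executemany(
        "INSERT INTO sample_item_attribute VALUES (?, ?, ?)",
        [(1, "organ", "liver"), (2, "organ", "kidney"), (3, "disease state", "tumor")],
    )
    conn.executemany(
        "INSERT INTO sample_item_attribute_map VALUES (?, ?)",
        [(1, 1), (1, 3), (2, 1), (3, 2), (3, 3)],
    )
    return conn


def samples_with(conn, attr_type, value):
    """Return names of sample items carrying the given attribute, sorted."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM sample_item AS s
        JOIN sample_item_attribute_map AS m ON m.sample_item_id = s.sample_item_id
        JOIN sample_item_attribute AS a ON a.attribute_id = m.attribute_id
        WHERE a.type = ? AND a.value = ?
        ORDER BY s.name
        """,
        (attr_type, value),
    )
    return [r[0] for r in rows]
```

Here one sample (`liver1`) carries two attributes and one attribute (`tumor`) describes two samples, which is exactly the situation a map table exists to represent.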
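[0043]
Each attribute has an associated attribute type, listed in a sample item attribute type table 414, and an associated value for that attribute. Examples of attribute types include "concentration", "preparation date", and "expiration date". Another example of an attribute type is "specimen type", whose possible values include "tissue", "organ culture", "purified cells", "primary cell culture", and "established cell line". Yet another example is "ethnic group", with values such as "East Asian" and "Native American".
[0044]
Many attribute types can be understood as being derived from other attribute types. For example, attribute type "ethnic group" derives from attribute type "human", which in turn derives from attribute type "species". Some attribute types have no associated attributes and instead define a level of categorization. The derivations relating "parent" attribute types to "child" attribute types are listed in an attribute type derivation table 418. An attribute type can have one or more parents or children. The different kinds of derivation are listed in an attribute type derivation type table 420. One representative attribute type derivation type is category-subcategory, in which the parent type represents a category and the child type a subcategory. The availability of derivation relationships among attribute types facilitates the formulation of useful queries to expression mining database 124 and allows the user to easily identify attribute types of interest.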
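One way such derivation relationships could support query formulation is by expanding a chosen attribute type into all of its subtypes. The following is a minimal sketch under that assumption; the rows are illustrative, and the `mouse` entry in particular is added for the example rather than taken from the text above.

```python
# Hypothetical derivation rows: (parent type, child type, derivation type).
TYPE_DERIVATIONS = [
    ("species", "human", "category-subcategory"),
    ("human", "ethnic group", "category-subcategory"),
    ("species", "mouse", "category-subcategory"),
]


def descendant_types(root, derivations):
    """Return all attribute types derived, directly or indirectly, from `root`."""
    children = {}
    for parent, child, _ in derivations:
        children.setdefault(parent, set()).add(child)
    found, frontier = set(), [root]
    while frontier:
        current = frontier.pop()
        for child in children.get(current, ()):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found
```

A query tool could use the returned set to let a user pick "species" and have the search cover every subcategory beneath it.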
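[0045]
Tables relating to information about experiments are marked with the letter "B" in FIG. 4A. An experiment table 424 lists the experiments whose results can be queried. A data map table 426 lists entries corresponding to sets of genes or ESTs to be investigated. Each set corresponds to the collection of experiments performed to investigate the genes of that set. An experiment set table 428 lists associations between experiments and entries in data map table 426, and thereby defines the collection of experiments corresponding to each gene set. An analysis set table 430 defines the set of analyses performed for each gene set; each entry defines an association among an analysis, an experiment, and an entry in data map table 426.
[0046]
Tables relating to information about genes are marked with the letter "C" in FIG. 4A. A gene set table 432 defines the membership of all gene sets defined by users, or otherwise, in preparation for query/mining operations. A gene set name table 434 lists the names of the gene sets. The genes belonging to gene sets are listed in a bio item accession table 436. Each entry in bio item accession table 436 identifies an accession number in a bio item database. Definitions of the accession numbers are stored in an accession definition table 438. A housekeeping gene table 440 lists genes with known expression levels that are used to calibrate the expression monitoring process.
[0047]
Tables relating to analysis information are marked with the letter "D". Absolute expression analysis results are stored in an absolute result table 444. Each entry in absolute result table 444 references an absolute result type. The different absolute result types indicate an estimate of the expression level of a given gene or EST, for example present, marginal, absent, or unknown. The various absolute result types are listed in an absolute result type table 446. Relative analysis results are stored in a relative analysis result table 448. Each entry in relative analysis result table 448 references a relative result type listed in a relative result type table 450. A relative analysis compares the expression of a gene in two experiments. The different relative result types describe the change in expression, for example increased, no change, decreased, or unknown. Tables 448 and 450 are imported from expression analysis database 122 and are read-only to query/mining system 126.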
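[0048]
Query/mining system 126 further performs various expression analysis operations. The results of these calculations are maintained in a calculated field table 452.
[0049]
Tables relating to mining and query operations are marked with the letter "E". At any given time, a user considers data from a collection of experiments. The list of sample items used in those experiments is stored in a selected sample item table 454. Selected sample item table 454 is generally much smaller than sample item table 402, which allows query operations to run faster.
[0050]
Each entry in a criteria set table 456 identifies a criteria set used to query a group selected by sample item or attribute. Each entry in a criteria set experiment table 458 identifies a criteria set applied to the gene or EST expression levels of a particular sample item belonging to the group identified by reference to criteria set table 456. A criteria set experiment detail table 460 contains entries identifying the values applied as criteria.
[0051]
Users of query/mining system 126 cannot access information about leaf target samples, only information about their "parents". Expression data can be recorded for leaf target samples. A criteria set experiment leaf table 462 allows entries in criteria set experiment table 458 to be associated with sample items in sample item table 458 and with the leaf target samples corresponding to those sample items.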
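[0052]
Various other tables can be included in embodiments of the invention. These tables are marked with the letter "F". A user preference table 464 stores references to user preference files that record the preferences of individual users of query/mining system 126. A user may wish to store functions used to normalize expression data for later use. A normalization adjustment function table 466 lists information about normalization and other transformation functions. A user may also wish to store functions used to average expression data collected from related replicates. Descriptions of these averaging functions are stored in a replicate averaging function table 468.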
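The text does not say which normalization or averaging functions are stored. As a hedged illustration only, the sketch below assumes the simplest choices: an arithmetic mean across replicates, and a global scaling that brings the mean expression value to a chosen target. Both function names and the target value of 100 are assumptions made for the example.

```python
from statistics import mean


def average_replicates(replicate_values):
    """Average expression values per gene across replicates.

    `replicate_values` maps a gene name to the list of values measured
    for that gene in each related replicate.
    """
    return {gene: mean(values) for gene, values in replicate_values.items()}


def scale_to_target(expression, target_mean=100.0):
    """Scale all values so that their overall mean equals `target_mean`.

    This is an assumed normalization scheme chosen for illustration.
    """
    factor = target_mean / mean(expression.values())
    return {gene: value * factor for gene, value in expression.items()}
```

For example, averaging `{"g1": [10.0, 30.0], "g2": [60.0, 60.0]}` gives 20.0 and 60.0, and scaling that result to a target mean of 100 multiplies both values by 2.5.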
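[0053]
FIG. 5A is a flowchart 501 of simplified process steps in a representative specific embodiment of the invention for mining a plurality of experiment information for a pattern. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. In a step 502, information is collected from experiments and chip designs. Then, in a step 504, experiment analyses to mine are selected. In a step 506, one or more sample attributes are defined. In a step 508, the experiment analyses are mined for resulting information, forming a plurality of resulting information. This resulting information can include one or more resulting gene sets. Finally, in a step 510, the resulting information is formatted for display to the user. The combination of these steps allows the user to access the experiment information.
[0054]
FIG. 5B is a flowchart 503 of simplified process steps in an alternative embodiment of the invention directed to expression information. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. In a step 512, information about a plurality of results of a plurality of experiment analyses is collected. Then, in a step 514, information about samples and information about a plurality of experiments is gathered. Next, in a step 516, one or more attributes are added to the information about the experiments. Then, in a step 518, the plurality of results of the experiment information is transformed to form a plurality of transformed information. Transformations include normalization, denormalization, scaling, aggregation, and the like. In a step 520, the plurality of transformed information is mined. Then, in a step 522, the mining results are visualized for display to the user. Finally, in a step 524, conclusions are recorded.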
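[0055]
FIG. 6A is a representative block flow diagram of simplified process steps in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Block flow diagram 601 includes an input data warehouse 602, a transformation step 604 that produces an output data mart 606, and a mining process step 608. Input data warehouse 602 can include a laboratory information management system and other databases. Data warehouse 602 of a particular embodiment can include genomic information and chip design information, as well as other information useful to the laboratory expression analysis process.
[0056]
FIG. 6B is a simplified block diagram of a representative data warehouse in a particular embodiment of the invention, such as data warehouse 602 of FIG. 6A. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Data warehouse 602 includes a laboratory information management system 610 and a plurality of public databases, including public database 612. In one particular embodiment, data warehouse 602 further includes a chip design component 614. A genomic information component 616 can also be part of data warehouse 602. In some embodiments, other reference databases 618 are also part of data warehouse 602. In many embodiments, other information can be included, or these particular components omitted, without departing from the scope of the invention.
[0057]
In a particular embodiment of the invention, data transformation step 604 of FIG. 6A includes normalization and adjustment steps. Normalization and adjustment can include functions that are tracked by analysis type and/or function type. In some embodiments, VBA functions or independent applets are added or removed. In many embodiments, users can selectively omit some transformations based on preference. Data transformation step 604 can include a replicate step in which the user can manipulate replicates in a manner similar to normalization and adjustment. In many embodiments, the user can identify replicates of a derivation type using sample identification. In some embodiments, custom selection of replicates can be built into an applet.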
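[0058]
FIG. 6C depicts a representative data mart in a particular embodiment of the invention, such as data mart 606 of FIG. 6A. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Representative data mart 606 can include an experiment collection 620. Information and results of the experiment collection can be transferred to expression results 622. In many embodiments, a plurality of samples 624, each of which can have one or more sample attributes, can also have a relationship with expression results 622. Data mart 606 can further include a plurality of genes 626. Finally, in a presently preferred embodiment, time can be treated as a dimension 628 of expression results 622. Other ways of organizing the data in the data mart can be used without departing from the scope of the invention.
[0059]
In a particular embodiment, experiments can be added to or deleted from experiment collection 620. In many embodiments, the same experiment collection can be mined for multiple purposes. Experiment collection 620 can also be subdivided into one or more experiment subsets for mining.
[0060]
FIG. 6D depicts a representative organization of samples and targets in a particular embodiment of the invention, such as samples 624 of FIG. 6C. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Samples and targets allow the user to describe the steps of an experiment. The top level is the raw sample. FIG. 6D shows samples 624 including a raw sample 630. Below a raw sample are one or more replicates. Raw sample 630 includes two replicates, replicate 632 and replicate 634. Replicates can include targets. Replicate 632 is a target treated with drug A. Replicate 634 is a target treated with drug B. A target can include one or more leaf targets. For example, target 632 includes leaf targets 636, 638, 640, and 642. Target 634 includes leaf targets 644, 646, 648, and 650. Experiment analyses can be associated with leaf targets. FIG. 6D shows experiment analysis 652 and experiment analysis 654 associated with leaf target 636. In a presently preferred embodiment, experiment analyses can be defined recursively, that is, one experiment analysis can include one or more experiment analyses. In a particular embodiment, intermediate levels can be defined by the user. Other levels can be included, and other organizations used, without departing from the scope of the claims.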
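[0061]
FIG. 6E depicts another representative organization of samples and targets in a particular embodiment of the invention, such as samples 624 of FIG. 6C. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. FIG. 6E shows a raw sample 670 representing, for example, one extracted piece of tissue. Raw sample 670 is cut into a plurality of raw samples, such as raw samples 672, 673, and 674. These raw samples are the input to sample preparation system 114 of FIG. 1. Sample preparation system 114 prepares targets, such as target 676 corresponding to raw sample 672. A target can be a fluid containing mRNA or other expression indicators. Target 676 is split into a plurality of replicates, such as replicates 677, 678, and 679. Replicates 678 and 680 are pooled to form another target 682. An individual "target" that is applied to a chip is a leaf target sample. Each application of a "leaf target sample" to a chip constitutes one experiment. Leaf target sample 684 is one example. In a presently preferred embodiment, one or more experiment analyses can be associated with a particular leaf target sample. In this figure, analyses 686 and 688 are associated with leaf target sample 684. Furthermore, one experiment analysis can be defined in terms of one or more other experiment analyses.
[0062]
FIG. 6F depicts a representative organization of a plurality of attributes in a particular embodiment of the invention, such as attributes 628 of FIG. 6C. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. FIG. 6F shows a plurality of attributes having a non-hierarchical structure. In a presently preferred embodiment, any number of attributes can be assigned to a particular sample, and different samples can have the same attribute. FIG. 6F shows species 660 having relationships with a plurality of attributes, such as a human attribute 662, a mouse attribute 664, a corn attribute 666, and a yeast attribute 668. The "strain" and "race" windows are examples of attributes. In various embodiments, other configurations and attributes can be used without departing from the scope of the claims.
[0063]
In some embodiments, genes 626 of FIG. 6C can be combined into one or more gene sets. Gene sets can be described by various users. In at least one particular embodiment, gene sets cannot be shared among users, while in other embodiments they can. A user can copy another user's gene set, and can edit or delete gene sets. In a presently preferred embodiment, gene sets can be created or saved while mining the data mart. In some embodiments, one or more functional operations can be applied to gene sets, including logical operations such as OR and AND and arithmetic operations such as addition, subtraction, and scaling.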
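The logical operations on gene sets map naturally onto set operations. The sketch below assumes a gene set is simply a collection of gene identifiers; the identifiers and function names are invented for illustration, and the arithmetic operations mentioned above are not covered.

```python
def gene_set_or(a, b):
    """Logical OR: genes that appear in either gene set."""
    return set(a) | set(b)


def gene_set_and(a, b):
    """Logical AND: genes that appear in both gene sets."""
    return set(a) & set(b)


def gene_set_minus(a, b):
    """Genes in the first gene set that are absent from the second."""
    return set(a) - set(b)


# Hypothetical gene sets saved from two mining runs.
tumor_up = {"geneA", "geneB", "geneC"}
liver_expressed = {"geneB", "geneC", "geneD"}
```

With these inputs, `gene_set_and(tumor_up, liver_expressed)` yields `{"geneB", "geneC"}` and `gene_set_minus(tumor_up, liver_expressed)` yields `{"geneA"}`.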
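[0064]
FIG. 7A depicts a representative experiment collection screen 701 of a user interface in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Screen 701 allows the user to interact with the experiment collections contained in expression mining database 124 of FIG. 1. Screen 701 includes an experiment collection selection tab 702, shown with four experiment collections, such as experiment collections 704 and 706. Other experiment collections can be added as needed. Various embodiments of the invention can use other formats to present this information to the user.
[0065]
FIG. 7B depicts an experiment selection screen 703 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Experiment selection screen 703 includes an experiments tab 730. A plurality of experiments is shown in two scroll windows, a selected experiments window 734 and an available experiments window 736. Select buttons 738a and 738b can be used to move experiments between experiment selection scroll windows 734 and 736. Experiment selection window 736 contains a plurality of experiments. To limit the number of experiments displayed in experiment selection scroll windows 734 and 736, one or more filters can be applied to the experiment data using a filter mechanism at the bottom of the screen. Filter mechanism 744 includes a column selection field 746 and a selection value entry field 748. The user selects the particular field on which to filter the experiments using column selection field 746 and then enters the desired value in value entry field 748. By then clicking a filter button 750, the user applies the filter to the experiments in the collection, so that experiment selection scroll windows 734 and 736 show only those experiments whose column is set to the selected value.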
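[0066]
FIG. 7C depicts a selected experiment collection screen 705 having an analyses tab 751 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Screen 705 includes two selection scroll windows, a selected analyses window 752 and an available analyses window 754. Select keys 756 and 758 can be used to move analyses between selection scroll windows 752 and 754. Similarly, the user can filter the analyses shown in selection scroll windows 752 and 754 with the filter mechanism provided at the bottom of screen 705, by selecting a particular column using column selection field 760, entering the desired value in value entry field 762, and clicking filter button 764 to apply the filter to the analyses in the experiment collection.
[0067]
FIG. 7D depicts a representative sample selection screen 707 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. On screen 707 the user can view the results of selections made for one or more samples. Screen 707 includes a plurality of selections, including a sample selection 770, a sample type selection 771, and an attribute type selection 772. A previous/next button pair 774 and a select button 775 can be used to search and to select, respectively.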
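[0068]
FIG. 7E depicts a representative sample/attribute management screen 709 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Using screen 709, the user can add, delete, or rename samples, attributes, sample types, attribute types, and the relationships among them. Screen 709 includes a sample/attribute section 722 and a relationship section 724. An item selection window 776 in sample/attribute section 722 lets the user select the type of a new item, such as a sample or attribute. Using function buttons 777, the user can choose operations such as add new, rename, and delete. If the user chooses to create a new item, screen 711 of FIG. 7F is displayed, with which the user can create the new item. The user can enter the name of the item in a new item field 780 of screen 711 and the type of the item in an item type field 784. Alternatively, the user can work with relationships using relationship section 724 of screen 709 of FIG. 7E.
[0069]
Using a relationship selection window 778 in relationship section 724, the user can select a type of relationship, for example a sample item to sample item relationship, an attribute to sample item relationship, or an attribute type to attribute type relationship. Using function buttons 779, the user can choose operations such as add new and delete. If the user chooses to create a new relationship, screen 713 of FIG. 7F is displayed, with which the user can create the new relationship. The user can enter the source of the relationship in a source window 782, the parent in a parent window 786, and the type of the relationship in a derivation type window 788.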
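[0070]
FIG. 7G depicts a representative data mining options management screen 715 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Screen 715 shows a plurality of tabs, including a query/chart tab 790, a pattern tab 792, and a gene set comparison tab 794. To start data mining, the user can specify grouping parameters using the group-by function of query/chart tab 790.
[0071]
FIG. 7H depicts an experiment mining screen 717 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Using the functions of screen 717, the user can enter criteria for data selection, such as which gene set to use. Screen 717 includes a plurality of sample items, such as sample item 796. Using a group selection field 798, the user can select from among a plurality of groups in the experiment collection. One or more gene sets can be selected using a gene set selection field 800. A gene set can be all of the genes represented by a particular gene chip, or a subset of them. In one particular embodiment, the set of all genes on a particular gene chip is provided as the default, although other defaults can be used. An expression percentage field 802 can be used to specify the extent to which gene expression must be present within a group. Once the user has specified search parameters using these fields, pressing a run button 801 starts data mining.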
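One plausible reading of the expression percentage criterion is "keep a gene if it meets an expression threshold in at least a given fraction of the experiments in the group". The sketch below implements that reading; the data layout (a dictionary of experiments, each mapping gene names to values), the function name, and the treatment of a missing measurement as not meeting the threshold are all assumptions made for illustration.

```python
def genes_expressed_in_group(results, threshold, min_fraction):
    """Select genes whose value meets `threshold` in enough experiments.

    `results` maps an experiment name to a dict of gene -> expression value.
    A gene is selected when the fraction of experiments in which its value
    is >= `threshold` is at least `min_fraction`. A gene missing from an
    experiment counts as not meeting the threshold (an assumption).
    """
    experiments = list(results)
    if not experiments:
        return set()
    genes = set().union(*(r.keys() for r in results.values()))
    selected = set()
    for gene in genes:
        hits = sum(
            1
            for e in experiments
            if gene in results[e] and results[e][gene] >= threshold
        )
        if hits / len(experiments) >= min_fraction:
            selected.add(gene)
    return selected


# Hypothetical group of four experiments.
GROUP = {
    "exp1": {"geneA": 150.0, "geneB": 120.0},
    "exp2": {"geneA": 110.0, "geneB": 40.0},
    "exp3": {"geneA": 105.0, "geneB": 130.0},
    "exp4": {"geneA": 20.0, "geneB": 10.0},
}
```

With a threshold of 100 and a minimum fraction of 3/4, `geneA` qualifies (three of four experiments) and `geneB` does not (two of four).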
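[0072]
FIG. 7I depicts a data selection screen 719 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Data selection screen 719 shows the data that meets the criteria specified by the user on experiment mining screen 717 of FIG. 7H. It shows a plurality of leaf parents, including leaf parent 804. Screen 719 also shows experiment replicates 805, bio items 806, and results 807 measured in the experiments for each leaf parent. The user can export the mining results using an export button 808 and/or save them using a save gene set button 809.
[0073]
FIG. 7J depicts a bar graph visualization screen 721 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Bar graph visualization screen 721 includes a display area showing the data in the experiment collection. The quantity to visualize can be chosen from a selection value field 814. Experiment results 810 and 812 indicate the difference in expression of a particular gene for the quantity selected by the user in field 814.
[0074]
FIG. 7K depicts a scatter plot visualization screen 723 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Scatter plot visualization screen 723 includes a display area 819 showing the data in the experiment collection. Display area 819 is an X-Y plot, but various embodiments of the invention contemplate other forms of data visualization, such as bar graphs, graphs, and pie charts.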
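[0075]
FIG. 7L depicts pattern search screens 725 and 727 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. A gene pattern search lets the user find relationships such as which genes behave alike when exposed to a certain drug. Selecting the "Pattern" tab of screen 725 displays an information entry device that includes a gene pattern field 820 for entering search criteria. When the user specifies a search for a gene pattern, gene pattern search screen 727 is presented. The user can select a plurality of gene sets to compare using gene set name fields 822 and 824. Using a measurement selection field 826, the user can select the measurement of interest as the basis for comparison.
[0076]
FIG. 7M depicts gene set comparison screens 729 and 731 in a particular embodiment of the invention. This diagram is merely an example and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Gene set comparison lets the user find relationships such as which gene sets contain a particular gene and which gene sets do not contain a particular gene or functional combination of genes. Selecting the "Gene Set Comparison" tab of screen 729 displays information about the gene sets that the user can select for comparison. Screen 729 shows a plurality of gene sets, including gene sets 830, 832, and 834. When the user specifies the gene sets of interest, gene comparison screen 731 is presented. By checking one or more of a plurality of selection windows, such as selection windows 836 and 838, the user can select a plurality of genes as the starting point for comparing the gene sets selected on screen 729.
[0077]
FIGS. 7N-7O depict sample data management screens 733 and 735 in a particular embodiment of the invention. These diagrams are merely examples and do not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. FIG. 7N shows a gene set management screen 733, with which the user can perform various tasks involving genes and gene sets, such as adding, deleting, creating, and copying gene sets and adding or deleting genes within a gene set. FIG. 7O shows a gene set update screen 735, with which the user can specify one or more genes to delete from the database.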
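[0078]
(Laboratory Information Management)
One embodiment of the invention operates in a system for analyzing biological or other materials using arrays that include probes which may be made of biological materials such as RNA or DNA. VLSIPS^TM and GeneChip^TM technologies provide methods of making and using very large arrays of polymers, such as nucleic acids, on very small chips. See U.S. Patent No. 5143854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is incorporated herein by reference for all purposes. Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid).
[0079]
It should be understood that the probes need not be nucleic acid probes and may be other polymers, such as peptides. Peptide probes can be used to detect the concentration of peptides, polypeptides, or polymers in a sample. The probes should be carefully selected to have binding affinity for the compound whose concentration is to be measured.
[0080]
FIG. 1' illustrates an overall system 100' for forming and analyzing arrays of biological materials such as RNA or DNA. This diagram is merely an illustration and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. A chip design system 104' is used to design arrays of polymers, such as biological polymers like RNA or DNA. Chip design system 104' may be, for example, an appropriately programmed Sun Workstation, or a personal computer or workstation such as an IBM PC equivalent, including appropriate memory and a CPU. Chip design system 104' obtains input from a user regarding chip design objectives, including characteristics of genes of interest, and other input regarding the desired features of the array. Optionally, chip design system 104' can obtain information about specific gene sequences of interest from a bioinformatics database 102' or from external databases such as GenBank. The output of chip design system 104' is a set of chip design computer files in the form of, for example, a switch matrix as described in PCT Application WO 92/10092, and other associated computer files. Systems for designing chips for sequencing and expression analysis are disclosed in U.S. Patent No. 5571639 and PCT Application WO 97/10365, the contents of which are incorporated herein by reference.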
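[0081]
The chip design files are input to a mask design system (not shown), which designs the lithographic masks used in fabricating probe arrays of molecules such as DNA. The mask design system generates mask design files, which are then used by a mask construction system (not shown) to construct masks, such as chrome-on-glass masks, or other synthesis patterns for use in fabricating polymer arrays.
[0082]
The masks are used in a synthesis system (not shown). The synthesis system includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip. It includes a light source and a chemical flow cell in which the substrate or chip is placed. A mask is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times to deprotect selected regions of the chip. Selected chemical reagents are directed through the flow cell for coupling to deprotected regions, as well as for washing and other operations. The substrates fabricated by the synthesis system are optionally diced into smaller chips. The output of the synthesis system is a chip ready for application of a target sample. Information regarding the mask design, mask construction, and probe array synthesis systems is presented by way of background.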
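[0083]
A biological source 112' is, for example, tissue from a plant or animal. Various processing steps are applied to material from biological source 112' by a sample preparation system 114'. These steps may include isolation of mRNA and precipitation of the mRNA to increase its concentration. The result of the various processing steps is a target sample ready for application to the chips produced by synthesis system 110'. Sample preparation methods for expression analysis are discussed in detail in WO 97/10365.
[0084]
The prepared sample contains nucleic acid sequences such as RNA or DNA. When the sample is applied to the chip by a sample exposure system 116', the nucleic acids in the sample may or may not bind to the probes. To determine whether a probe has bound to a nucleic acid sequence from the sample, the nucleic acids are tagged with fluorescein labels. The prepared sample is placed in a scanning system 118'. Scanning system 118' includes a detection device, such as a confocal microscope or CCD (charge-coupled device), used to detect the locations where labeled receptors have bound to the substrate. In the case of fluorescein-labeled receptors, the output of scanning system 118' is an image file indicating fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. Because higher photon counts are observed where the labeled target has bound more strongly to the array of polymers, and because the monomer sequence of the polymers on the substrate is known as a function of position, it becomes possible to determine the sequences of the targets that are complementary to the probes on the substrate.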
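[0085]
The image file and the chip design are input to an analysis system 120' that, for example, calls base sequences or determines expression levels of genes or expressed sequence tags. Herein, the expression level of a gene or EST is understood to be the concentration within a sample of mRNA or protein that results from transcription of the gene or EST. Such analysis techniques are disclosed in WO 97/10365 and U.S. Patent Application No. 08/531137, the contents of which are incorporated herein by reference.
[0086]
An expression analysis database 122' maintains information used to analyze expression and the results of expression analysis. The contents of expression analysis database 122' can include tables listing analyses performed, analysis results, experiments performed, sample preparation protocols and the parameters of those protocols, chip designs, and so on. Details of one embodiment of expression analysis database 122' are described in U.S. Patent Application No. 09/122167, entitled "METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE", filed July 24, 1998, the contents of which are incorporated herein by reference for all purposes.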
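[0087]
One or more instances of expression analysis database 122' can contain information about the expression of many genes or ESTs collected from many different tissue samples. This information is useful for investigating questions such as: 1) Which genes or ESTs are up-regulated (expressed more) in diseased tissue, and which are down-regulated (expressed less)? 2) How does gene expression vary among organs and tissue types within a species? 3) How does gene expression vary among species that share a common gene? 4) How does gene expression respond to various disease treatment regimens? 5) How does gene expression change as a disease progresses?
[0088]
To facilitate such investigations, an expression mining database 124' is provided. Expression mining database 124' can contain a replicated representation of the data in the expression analysis database. It can also include various tables to facilitate mining operations performed by a user operating a query and mining system 126'. Query and mining system 126' includes a user interface that allows an operator to make queries that investigate the expression of genes and ESTs and answer the types of questions identified above. An example of a query and mining system is described in commonly owned U.S. Patent Application No. 09/122434, entitled "GENE EXPRESSION AND EVALUATION SYSTEM", filed July 24, 1998.
[0089]
Chip design system 104', analysis system 120', the control portion of exposure system 116', sample preparation system 114', and scanning system 118' may be appropriately programmed computers, such as Sun workstations or IBM-compatible PCs. An independent computer for each system may perform that system's computer functions, or one computer may combine the computer functions of two or more systems. One or more computers, independent of the computers that operate the system of FIG. 1', may maintain expression analysis database 122', expression mining database 124', and query and mining system 126'.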
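[0090]
FIG. 2A' is a block diagram of a host computer system 210' suitable for implementing a particular embodiment of the invention. This diagram is merely an illustration and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Host computer system 210' includes a bus 212' that interconnects major subsystems such as a central processor 214', a system memory 216' (typically RAM), an input/output (I/O) adapter 218', an external device such as a display screen 224' via a display adapter 226', a keyboard 232' and a mouse 234' via I/O adapter 218', a SCSI host adapter 236', and a removable disk drive 238' operative to receive a removable disk 240'. SCSI host adapter 236' can act as a storage interface to a fixed disk drive 242' or to a CD-ROM player 244' operative to receive a CD-ROM 246'. Fixed disk 242' may be part of host computer system 210' or may be separate and accessed through other interface systems. A network interface 248' can provide a direct connection to a remote server via a telephone line, or to the Internet. Network interface 248' can also connect to a local area network (LAN) or other network interconnecting many computer systems. Many other devices or subsystems (not shown) can be connected in a similar manner.
[0091]
Also, not all of the devices shown in FIG. 2A' need be present to practice the invention, as discussed below. The devices and subsystems may be interconnected in ways different from that shown in FIG. 2A'. The operation of a computer system such as that shown in FIG. 2A' is readily known in the art and is not discussed in detail in this application. Code to implement the invention may be operably disposed in or stored on computer-readable storage media such as system memory 216', fixed disk 242', CD-ROM 246', or floppy disk 240'.
[0092]
FIG. 2B' is a simplified diagram of a network 260' interconnecting multiple computer systems 210a'-210e'. This diagram is merely an illustration and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Network 260' may be a local area network (LAN), a wide area network (WAN), or the like. The computer-related operations of bioinformatics database 102' and the other elements of FIG. 2B' may be divided among computer systems 210' in any way, with network 260' used to communicate information among the various computers. Instead of network 260', portable storage media such as removable disks can be used to carry information between computers.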
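[0093]
FIG. 3A' is a flowchart 301' of simplified process steps for managing information about a plurality of experiments performed on a plurality of samples, in a particular representative embodiment of the invention. This diagram is merely an illustration and does not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Each experiment can provide an indication of the degree of expression of a particular gene sequence in a sample. In a step 310', at least one of the plurality of samples is registered in a centralized database. Next, in a step 312', a plurality of information about the plurality of samples is tracked. As a result of step 312', information about the samples can be incorporated into the database. Then, in a step 314', a plurality of information about the plurality of experiments is tracked. Changes in the experimental environment within the laboratory are reflected in the database by the function of step 314'. Next, in a step 316', a sample history is generated from the information in the database. The sample history describes the states of the plurality of samples. In a step 318', the information about the plurality of experiments and the information about the plurality of samples is filtered by one or more filters selected by the user to produce expressed sequence information. Finally, in an optional step 320', the expressed sequence information resulting from the operation of experiments in the laboratory can be published on a public database accessible through a web-based user interface or other means.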
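The register, track, history, and filter steps can be pictured with a small in-memory sketch. The class and method names below (`SampleRegistry`, `register`, `track`, `history`, `filter`) are hypothetical; a real system of this kind would persist the records in the centralized database rather than in a Python dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class SampleRecord:
    """One registered sample with its attributes and event history."""
    sample_id: str
    attributes: dict
    history: list = field(default_factory=list)


class SampleRegistry:
    """Minimal stand-in for a central sample database."""

    def __init__(self):
        self._samples = {}

    def register(self, sample_id, **attributes):
        """Register a sample; registering the same id twice is an error."""
        if sample_id in self._samples:
            raise ValueError(f"sample {sample_id!r} is already registered")
        record = SampleRecord(sample_id, dict(attributes))
        record.history.append("registered")
        self._samples[sample_id] = record
        return record

    def track(self, sample_id, event):
        """Append a sample or experiment event to the sample's history."""
        self._samples[sample_id].history.append(event)

    def history(self, sample_id):
        """Return the recorded events for a sample, oldest first."""
        return list(self._samples[sample_id].history)

    def filter(self, **criteria):
        """Return ids of samples whose attributes match every criterion."""
        return sorted(
            sid
            for sid, rec in self._samples.items()
            if all(rec.attributes.get(k) == v for k, v in criteria.items())
        )
```

A caller might register two samples with a `tissue` attribute, track events such as "RNA extracted" or "hybridized" against each, then ask for the history of one sample or filter the registry by tissue.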
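[0094]
FIG. 3B' shows a flowchart 303' of simplified process steps for viewing the results of a plurality of samples in another embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The results can be stored in one or more databases. In step 322', the user specifies a database to query. Next, in step 324', one or more queries are submitted to the database to form a result. Then, in step 326', the user can view the result using a display device. In step 328', the result can be filtered by one or more user-specified filters. Finally, in step 330', the filtered result can be put into graphical form.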
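[0095]
FIG. 3C' provides a representative flowchart 305' of simplified process steps for managing information about a plurality of experiments performed on a sample in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. In step 330', the sample is registered in the database. Then, in step 332', the experiment is set up. In step 334', aliquoting is performed. Then, in step 336', RNA is extracted. A polymerase chain reaction (PCR) is performed on the RNA in step 338'. In step 340', the cRNA is labeled. In step 342', fragmentation is performed. Hybridization is performed in step 344'. In step 346', the hybridized chip is scanned. Then, in step 348', grid alignment is performed. Cell averaging analysis is performed in step 350'. In step 352', probe array analysis is performed, and in step 354', composite analysis is performed.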
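The sequence of steps in flowchart 305' can be represented as an ordered list against which a sample's progress is checked. The sketch below is illustrative only: the English step names are shorthand for steps 330' through 354', and the strictly linear ordering is an assumption made for the example rather than a requirement stated in the specification.

```python
# Step names paraphrase steps 330'-354' of FIG. 3C'; the list is illustrative.
WORKFLOW = [
    "sample registration", "experiment setup", "aliquoting", "RNA extraction",
    "PCR", "cRNA labeling", "fragmentation", "hybridization", "scanning",
    "grid alignment", "cell averaging analysis", "probe array analysis",
    "composite analysis",
]

def next_step(completed):
    """Return the first workflow step not yet completed, or None if all are done."""
    done = set(completed)
    for step in WORKFLOW:
        if step not in done:
            return step
    return None

def can_start(step, completed):
    """Assume a step may start only when every earlier step has been completed."""
    idx = WORKFLOW.index(step)
    return all(s in completed for s in WORKFLOW[:idx])
```

A tracking system could use `next_step` to populate a "pending" list for each sample and `can_start` to reject out-of-order entries.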
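[0096]
FIG. 4A' shows a representative database configuration in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 4A' shows a client workstation 401', which may be one of the workstations 210' of FIG. 2B' and which can be interconnected, for example, to one or more of a plurality of databases. For example, GATC database 403' contains the results of a plurality of gene chips in GATC format. The GATC format provides a standard interface to gene chip data across multiple systems. For further information about GATC, reference may be had to http://www.gatconsortium.org for the documents entitled "Software Specifications" and "Database Schema," which are incorporated herein by reference in their entirety for all purposes. Database 405' provides data mining information and can include FAQs and preferences. Database 407' comprises annotations, descriptions, and URLs for gene information. Embodiments can include all of the above databases, can comprise a subset of them, or can include other databases, without departing from the scope of the claimed invention.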
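[0097]
The database configuration of FIG. 4A' provides data management functions, data publishing functions, and integration with gene chip clients such as client 401'. The data management functions can comprise a laboratory information management system (LIMS). Embodiments implementing a LIMS according to the present invention can provide data tracking functions covering process inputs, process outputs, and the process environment. Data security functions such as authentication, access permissions, and privileges include separating owners, who have write access, from user groups, who have read-only access. Data sharing functions can provide group access to data. Publishing and sharing of data can be facilitated by conforming to a standard data format. In a presently preferred embodiment, the GATC format can be used. This standard format provides cross-system access to gene chip data. In a preferred embodiment, the database server may be an Internet server providing web browser access. Embodiments can include scripting capabilities and can provide analysis functions at the server. Some embodiments can provide communication with database applications via web applications such as browsers and via a gene chip interface. The database can be implemented in a server such as an SQL server or an ORACLE server. The database server can reside on any of several platforms, such as ORACLE NT or UNIX.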
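The separation between owners with write access and user groups with read-only access can be illustrated with a very small permission check. This is a sketch under assumptions: the `Dataset` class, its fields, and the two helper functions are hypothetical names chosen for the example, and a real deployment would more likely rely on the access controls of the database server itself.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """A published data set with one owner and a read-only user group."""
    owner: str
    readers: set = field(default_factory=set)

def can_read(ds, user):
    """Owners and members of the read-only group may read."""
    return user == ds.owner or user in ds.readers

def can_write(ds, user):
    """Only the owner may write."""
    return user == ds.owner
```

For example, with `ds = Dataset(owner="alice", readers={"bob"})`, user `bob` can read but not write, and a user outside the group can do neither.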
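[0098]
FIG. 4B' shows a data source selection window 409' having a plurality of data sources from which gene and experiment information can be obtained, searched, and manipulated in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 4B' shows a plurality of different database formats including, but not limited to, MICROSOFT EXCEL files, text files, MICROSOFT ACCESS 97 databases, AlfaPublish, DataMiningInfo, GeneInfo, JetForm ASCII files, JetForm dBase, JfDbFetch DBF, JfSample, JetForm Filler Example, FormsTrack, JetForm Excel, JetForm Excel5, AFFYMETRIX, Publish_Static, GeneChip LIMS, EliPublish, and GEData.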
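[0099]
Many embodiments according to the present invention can automate the collection and analysis of experimental data as well as the publishing of results. Many embodiments according to the present invention can provide expression analysis, sample registration, and publishing of results for a particular sample and for a plurality of experiments on a plurality of samples. Further, the methods and techniques of the present invention can automate the definition of user parameters for analysis and the like.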
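[0100]
FIG. 5A' shows a representative automation page in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 5A' shows an automation page 501' having a sample information area 502', an experiment information area 504', and a sample experiment probe array area 506'. Sample information area 502' provides fields for entering data such as a sample name, a sample type, a project name, a description of the sample, and any comments. Various embodiments of the present invention can also include fields for entering other data. Experiment information area 504' includes fields for entering an experiment name, a probe array image identifier, a probe array type, and information about the probe array such as a lot number, an analysis set, a cell averaging set, and a target database to which results are to be published. Area 506' provides a display for matching up the identifiers of sample probe arrays, sample experiments, and probe arrays. A presently preferred embodiment provides the ability to have multiple samples and multiple experiments per sample.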
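[0101]
FIG. 5B' shows an automation results page 503' in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Automation results page 503' provides a display of the plurality of steps in the setup and execution of an embodiment, together with the result for a particular sample at each step. For example, as shown in FIG. 5B', the first step for the sample named "sample demo past registration" has received a result of success. Other steps can be included in various embodiments without departing from the scope of the claims of the present invention.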
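[0102]
FIG. 5C' shows a representative expression scan screen 505' in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 5C' shows information about pending scans. Screen 505' includes a hybridized expression probe array image identifier field 510', which the user can use to select a particular probe array for scanning. A sample and experiment information field 512' provides information about the sample, such as its name, project, sample type, user identifier, and date, as well as information about the experiment. A probe array information field 514' provides information about the probe array image, such as an identifier, an array type, and a lot number. A hybridization information field 516' provides information about reagents and lot numbers. A plurality of filter fields 518' provide the ability to filter by sample project, sample type, and probe array type.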
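[0103]
FIG. 6A' shows a representative sample registration screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 6A' shows a sample registration screen 601' having fields for entering data that describe a sample. For example, screen 601' includes fields for entering a sample name 602', a sample project, and a sample type, as well as comment and description fields. An initial process entry point field 604' allows the user to select a particular point in the laboratory process as the starting point. A registered samples field 606' provides a list of the samples that have been registered. A sample information field 608' provides information about the various samples.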
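[0104]
FIG. 6B' shows a plurality of screens that precede automation in laboratory information management in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 6B' shows a screen 610' for setting up an experiment. Screen 612' allows the aliquoting step to be performed. Screen 614' allows RNA extraction to be performed. Screen 616' allows RT-PCR to be performed. Screen 618' allows cRNA labeling to be performed, and screen 620' allows fragmentation to be performed. Other screens, and different types of screens or screen designs, can be used in various embodiments according to the present invention without departing from the scope of the claims herein.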
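[0105]
FIG. 6C' shows a representative hybridization screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 6C' shows a screen 621' for controlling the hybridization process. Screen 621' comprises a pending hybridization fragmented expression vessel identifier field 622'. Such a hybridization fragmented expression vessel contains a sample that has been fragmented. A sample and experiment information field 624' provides tracking information about the samples and experiments in the hybridization process. A pending scan field 626' provides identifying information for hybridized expression and probe array images. FIG. 6C' also shows hybridization control screens 623' and 625'. Screen 623' provides information about experiments waiting to undergo the hybridization step. Screen 625' provides information about experiments that have completed the hybridization step.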
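[0106]
FIG. 6D' shows a grid alignment control screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 6D' shows a grid alignment control screen 631'. Grid alignment control screen 631' comprises a pending grid alignment display area 632' and a completed grid alignment display area 634'. A sample experiment information field 636' provides information about the samples and experiments in the grid alignment process. A file type information field 638' provides identifying information about file types, and a probe array information field 639' provides identifying information about probe arrays.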
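[0107]
FIG. 6E' shows a representative cell averaging analysis screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 6E' shows a screen 641' having a plurality of fields for entering a sample project, an experiment name, a sample type, a probe array type, a user name, image data/probe array type, a cell averaging name, image data, cell data, an algorithm, and other parameters. In addition, a results area 642' provides information about the particular image name, cell name, probe array type, and various parameters. The results area provides a success/failure indication for a particular experiment.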
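[0108]
FIG. 6F' shows a representative probe array analysis screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 6F' shows a screen 651' having a plurality of fields for entering information about a sample project, an experiment name, a sample type, a probe array type, a user name, cell data/probe array type, a probe array name, probe array data, an algorithm, and other parameters. FIG. 6F' also shows a results area 652' having a cell name, a probe array name, a probe array type, a parameters area, and an area for providing a success/failure indication.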
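[0109]
FIG. 6G' shows a composite analysis screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 6G' shows a screen 661' having a plurality of fields for entering information about a sample project, an experiment name, a sample type, a user name, sense/antisense probe arrays, a composite name, composite data, an algorithm, and other parameters. In addition, screen 661' provides a results area 662' for displaying a sense chip file name, an antisense chip file name, a composite file name, a parameters area, and an area for providing a success/failure indication of the result.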
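[0110]
FIG. 6H' provides a representative sample history screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Sample history screen 681' provides a history list of the processes that have been completed for a particular sample.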
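[0111]
FIG. 7A' shows a representative expression analysis screen that works with sets in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 7A' shows a screen 701' having a plurality of fields, including a probe array type field 710', a user name field 712', an algorithm field 714', a cell averaging name field 716', a parameters field 718', an existing set name field 711', a create/update set name field 713', and a results area 719'. The results area provides fields for an image name, a cell name, a probe array type, an algorithm, and a set name, together with an area for displaying the success/failure result of the expression analysis step. Some embodiments can provide support for batch analysis of experimental results and user parameter sets.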
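[0112]
FIG. 7B' shows a create set name screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 7B' shows a screen 703' having a probe array type field 720', a used probe array type field 722', an existing set name field 724', and an area for specifying scaling and normalization for various chips.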
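[0113]
FIG. 7C' shows an expression cell data analysis screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 7C' shows a screen 705' having a plurality of fields for describing filter parameters. Filtering can be performed on any of several fields, such as assay type, data type, probe array type, date (including year, month, and day), sample project, experiment name, sample type, and user name.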
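[0114]
FIGS. 8A' to 8C' show representative Expression Data Mining Tool (EDMT) screens in a particular embodiment according to the present invention. These diagrams are merely examples, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 8A' shows an EDMT screen 801'. Screen 801' comprises a plurality of areas, such as an area 802' that provides information about filters. Filters can be applied to the experimental data to narrow the field of data subjected to mining. A results area 804' provides the results of filtering the data. A graph area 806' provides a plurality of graph formats for viewing the data.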
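[0115]
FIG. 8B' shows a filter area, such as filter area 802' of FIG. 8A', in a particular embodiment according to the present invention. FIG. 8B' shows filter area 802' having fields for a project filter 812', a probe array filter 814', a sample type filter 816', an operator filter 818', a sample name filter 820', an experiment filter 822', and an analysis filter 824'. FIG. 8B' also shows a filter results field for indicating the types of filters applied to the data. The filters of filter area 802' can be used to describe a query. In a presently preferred embodiment, the user can select analyses to query and then select ranges for the results.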
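One way the filters of filter area 802' could be turned into a query is to map each filled-in filter onto one condition of a parameterized SQL statement. The sketch below assumes a single hypothetical `result` table whose columns mirror the seven filters; the table, its column names, and both function names are assumptions made for illustration and are not part of the described system.

```python
import sqlite3

# Hypothetical column names, one per filter field of filter area 802'.
ALLOWED = ("project", "probe_array", "sample_type", "operator_name",
           "sample_name", "experiment", "analysis")

def build_query(filters):
    """Turn {column: value} filters into a parameterized SELECT statement."""
    clauses, params = [], []
    for column in ALLOWED:          # fixed order; unknown keys are ignored
        if column in filters:
            clauses.append(f"{column} = ?")
            params.append(filters[column])
    sql = "SELECT experiment FROM result"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY experiment", params

def run_query(db, filters):
    """Execute the generated statement and return the matching experiment names."""
    sql, params = build_query(filters)
    return [row[0] for row in db.execute(sql, params)]
```

Restricting the generated conditions to a fixed list of column names, and binding the values as parameters, keeps user-entered filter text out of the SQL string itself.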
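[0116]
FIG. 8C' shows a results area, such as results area 804' of FIG. 8A', in a particular embodiment according to the present invention. FIG. 8C' shows results area 804' having an experiment results table 830', a query results table 832', and a pivot results table 834'.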
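[0117]
FIGS. 8D' to 8G' show representative graphs in a particular embodiment according to the present invention, such as those that can be displayed in graph area 806' of FIG. 8A'. These diagrams are merely examples, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 8D' shows a scatter graph of experimental results. The scatter graph can plot any numerical result on a logarithmic or linear scale. In addition, a presently preferred embodiment can provide the ability to have multiple analyses per axis. Descriptions of the probe sets are included on the right-hand side of the graph. Hot links to external databases can also be provided, at least in a preferred embodiment according to the present invention. Other options, such as filters, point size, and color, can be specified by the user.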
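[0118]
FIG. 8E' shows a fold change graph that can be displayed in graph area 806' of FIG. 8A' in a particular embodiment according to the present invention. The fold change graph of FIG. 8E' can be provided using a logarithmic or linear scale, can provide probe set descriptions and hot links to external databases, and, in particular embodiments according to the present invention, can provide the ability to recompute the fold change. In addition, the user can specify options such as point size and color.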
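The specification does not state how the fold change is computed. As a hedged illustration only, the sketch below uses a common signed-ratio convention together with a log2 ratio suitable for a logarithmic axis; the `floor` parameter, which guards against division by very small or zero intensities, and both function names are assumptions made for the example.

```python
import math

def fold_change(baseline, experiment, floor=1.0):
    """Signed fold change: positive for increases, negative for decreases.

    Both intensities are clamped below at `floor` (an assumed convention)
    so that the ratio is always defined.
    """
    b = max(baseline, floor)
    e = max(experiment, floor)
    return e / b if e >= b else -(b / e)

def log2_ratio(baseline, experiment, floor=1.0):
    """Log2 of the clamped ratio, convenient for plotting on a log scale."""
    return math.log2(max(experiment, floor) / max(baseline, floor))
```

Recomputing the fold change, as the graph allows, would then amount to calling `fold_change` again after the user changes the baseline or the floor.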
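[0119]
FIG. 8F' shows a representative bar graph in a particular embodiment according to the present invention, such as one that can be displayed in graph area 806' of FIG. 8A'. The bar graph of FIG. 8F' can plot any numerical result, and embodiments can provide the user with the ability to change options such as bar size and color.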
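[0120]
FIG. 8G' shows a representative histogram graph, such as one that can be displayed in graph area 806' of FIG. 8A'. The histogram graph of FIG. 8G' can provide the ability to display various landmarks on a histogram of average difference, and can provide the user with the ability to specify options such as bin size, range, and color.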
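The user-specified histogram options (bin size and range) correspond directly to the parameters of a fixed-width binning routine. The following minimal sketch assumes half-open bins over the interval from `lo` to `hi`; the function name and parameter names are hypothetical.

```python
def histogram(values, lo, hi, bin_size):
    """Count the values falling in [lo, hi) using fixed-width bins."""
    span = hi - lo
    # One extra bin when the range is not an exact multiple of the bin size.
    n_bins = int(span // bin_size) + (1 if span % bin_size else 0)
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:                      # values outside the range are skipped
            counts[int((v - lo) // bin_size)] += 1
    return counts
```

Landmarks such as a mean or a threshold could then be drawn on top of the resulting counts by converting each landmark value to its bin index in the same way.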
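[0121]
FIG. 9A' shows a query display screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 9A' shows a saved query name screen 901' having display areas for a plurality of filters. The user can define filters for the system and save them together with a reference name displayed by screen 901'. The filters can be saved in data mining information database 304' for later use.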
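[0122]
FIG. 9B' shows an annotation screen 903' in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Annotation screen 903' provides a mechanism for displaying information about a probe set. An annotation can include annotation text, an annotation type, and other useful information. Annotation types are user-defined in a preferred embodiment. A user name can also be specified, as can data. In some embodiments other information can be specified, and in some embodiments not all of this information is specified.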
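[0123]
FIG. 9C' shows an example of the display of a probe annotation in a particular embodiment according to the present invention, such as one composed with annotation screen 903' of FIG. 9B'. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 9C' shows a highlighted line of information 904' for which the corresponding probe annotation 906' is displayed. A probe annotation can provide the name of the probe, a description, and other useful information.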
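[0124]
FIG. 9D' shows a query annotation screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 9D' shows a query annotation screen 910' having fields for specifying a probe set type, an annotation, a user identifier, a date, and a description. Query annotation can provide the ability to specify a plurality of filters and can also provide the ability to update annotations.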
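[0125]
FIG. 9E' shows a probe set description screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 9E' shows a probe set description screen 912' having the names of probe sets and their associated descriptions. These descriptions can also be displayed in Expression Data Mining Tool screen 801', below results area 804'.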
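[0126]
FIG. 9F' shows a search screen for searching array descriptions in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 9F' shows a search array descriptions screen 914' having a search field 916' for accepting input and an output field 918' for displaying the probe sets whose descriptions match the text entered in the input field. Search array descriptions screen 914' provides the user with the ability to search the descriptions in the database. The user can define search criteria using the input field and can add the results to various filters.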
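[0127]
FIG. 10A' shows a screen for searching external databases in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 10A' shows a probe set description dialog screen 1002' having a probe set name, a description, and various annotations. The user can use probe set description dialog screen 1002' to look up information in an external database corresponding to the description. Selecting the input database in dialog screen 1002' causes a browser window 1004' to be displayed. Browser window 1004' allows browsing of information, such as information about gene expression sequences, in an external database such as the input database. In a presently preferred embodiment, a URL can be associated with a particular probe set. Further, multiple URLs can be associated with a particular probe set, and the browser window can be activated automatically by the system to display relevant information about the probe set from external databases.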
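[0128]
FIG. 10B' shows an FAQ display selection screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 10B' shows an FAQ selection screen 1008' having a plurality of frequently used searches. The user can perform one of the searches simply by selecting the desired search. When a particular FAQ is selected, a dialog screen 1010' can be displayed to the user. Dialog screen 1010' provides a plurality of questions that the user can answer in order to define the selected search. In a presently preferred embodiment, the FAQs can be stored in data mining information database 306'. The questions, English translations, and SQL statements associated with a particular query can also be stored in the database along with the FAQ.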
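Storing an FAQ together with its question and its SQL statement suggests a simple mechanism: look up the stored statement, then bind the user's answers as query parameters. The sketch below is an assumption-laden illustration using an in-memory SQLite database; the `faq` and `expression` tables, their columns, and the function names are hypothetical, and only a single answer per FAQ is handled.

```python
import sqlite3

def open_faq_db():
    """Create hypothetical tables for stored FAQs and for expression values."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE faq (id INTEGER PRIMARY KEY, question TEXT,
                          prompt TEXT, sql_text TEXT);
        CREATE TABLE expression (gene TEXT, experiment TEXT, value REAL);
    """)
    return db

def add_faq(db, question, prompt, sql_text):
    """Store an FAQ with the prompt shown to the user and its SQL statement."""
    cur = db.execute(
        "INSERT INTO faq (question, prompt, sql_text) VALUES (?, ?, ?)",
        (question, prompt, sql_text))
    return cur.lastrowid

def run_faq(db, faq_id, answer):
    """Fetch the stored SQL for an FAQ and run it with the user's answer bound."""
    (sql_text,) = db.execute(
        "SELECT sql_text FROM faq WHERE id = ?", (faq_id,)).fetchone()
    return [row[0] for row in db.execute(sql_text, (answer,))]
```

In this arrangement the dialog screen would display `prompt`, collect the answer, and pass it to `run_faq`; new frequently used searches can be added as rows without changing the program.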
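[0129]
FIG. 10C' shows a gene chip migration screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 10C' shows a gene chip migration screen 1022' having a display area 1024' for local files in a plurality of formats, a display area 1026' showing the data to be migrated, a status area 1028', and a LIMS sample area 1030'. The migration screen can be used to add gene chip data to the LIMS. In a preferred embodiment, this can make it easier to associate information about samples, experiments, scan data, and results. In addition, some embodiments can simulate the workflow.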
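[0130]
FIG. 10D' shows fluidics station control screens 1031' and 1032' in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Fluidics control screens 1031' and 1032' provide the user with the ability to control a fluidics station based on the selection of a particular experiment name and protocol. The user can use the fluidics control screens to specify an assay type, a sample project, reagents, and a protocol.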
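[0131]
FIG. 10E' shows scan control screens 1041' and 1042' for controlling scanning to a local drive or a network in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Scan control screens 1041' and 1042' provide the user with the ability to specify an experiment name, a probe array type, the number of scans to be performed, an assay type, a sample project, an experiment, and the display of the experiments to be scanned.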
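[0132]
FIG. 10F' shows experiment information screens 1051' and 1052' in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Experiment information screens 1051' and 1052' give the user the ability to specify an experiment name, a probe array, a probe array lot, an operator, a sample type, a sample description, a project, comments, reagents, and reagent lots.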
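[0133]
(Conclusion)
In conclusion, the present invention provides a method for mining experimental information for patterns that can be selected by the user. One advantage is that the method provides better access to gene expression information than methods known in the prior art. Another advantage provided by this approach is that the results of a large number of experiments can be mined effectively using, for example, visualization techniques and set theory queries.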
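[0134]
It is to be understood that the examples and embodiments described herein are for illustrative purposes only, that various modifications or changes in light thereof will be suggested to persons skilled in the art, and that these are to be included within the spirit and scope of this application and the scope of the appended claims. For example, tables can be deleted, the contents of multiple tables can be consolidated, and the contents of one or more tables can be distributed among more tables than are described herein, in order to improve query speed and/or to aid system maintenance. Also, the database architecture and data model described herein are not limited to biological applications and can be used in any application. All publications, patents, and patent applications cited herein are hereby incorporated by reference.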
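[Brief description of the drawings]
[FIG. 1]
A diagram showing a representative system and process of a particular embodiment according to the present invention for forming and analyzing arrays of biological materials such as DNA and RNA.
[FIG. 2A]
A diagram showing a computer system suitable for use with the representative system of FIG. 1.
[FIG. 2B]
A diagram showing a computer network suitable for use with the representative system of FIG. 1.
[FIG. 3]
An entity relationship diagram for interpreting the database model.
[FIGS. 4A to 4F]
A database model of a particular embodiment according to the present invention that maintains information used in the system and method of FIG. 1.
[FIGS. 5A to 5B]
Simplified flowcharts of representative process steps of selected embodiments according to the present invention.
[FIGS. 6A to 6F]
Representative block flow diagrams of a particular embodiment according to the present invention.
[FIGS. 7A to 7O]
Diagrams showing representative user interface screens of a particular embodiment according to the present invention.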
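[FIG. 1']
A diagram showing an overview of a representative system and process of a particular embodiment according to the present invention for forming and analyzing arrays of biological materials such as DNA and RNA.
[FIGS. 2A' to 2B']
Diagrams showing a computer system of a particular embodiment according to the present invention suitable for use with the overall system of FIG. 1'.
[FIGS. 3A' to 3C']
Simplified flowcharts of representative process steps of a particular embodiment according to the present invention.
[FIGS. 4A' to 4B']
Diagrams showing a representative database structure and data formats of a particular embodiment according to the present invention.
[FIGS. 5A' to 5C']
Diagrams showing representative automation screens of a particular embodiment according to the present invention.
[FIGS. 6A' to 6H']
Diagrams showing representative expression analysis screens of a particular embodiment according to the present invention.
[FIGS. 7A' to 7C']
Diagrams showing representative expression analysis screens, directed to sets, of a particular embodiment according to the present invention.
[FIGS. 8A' to 8G']
Diagrams showing representative expression data mining screens of a particular embodiment according to the present invention.
[FIGS. 9A' to 9F']
Diagrams showing representative annotation screens of a particular embodiment according to the present invention.
[FIGS. 10A' to 10F']
Diagrams showing representative function screens of a particular embodiment according to the present invention.
[0001]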
(Cross-reference to related applications)
This application claims priority from the following U.S. provisional applications. The entire disclosures of these applications, including all appendices and all attachments, are incorporated by reference for all purposes.
U.S. Provisional Patent Application No. 60/100,724, filed September 17, 1998, entitled "METHOD AND APPARATUS FOR PROVIDING A LABORATORY INFORMATION MANAGEMENT SYSTEM" (Attorney Docket No. 018547-037500US); and
U.S. Provisional Patent Application No. 60/100,740, filed September 17, 1998, entitled "METHOD AND APPARATUS FOR PROVIDING AN EXPRESSION DATA MINING DATABASE" (Attorney Docket No. 018547-033840US).
[0002]
In addition, commonly owned, co-pending U.S. Patent Application No. 09/122,167, filed July 24, 1998, entitled "METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE," and
U.S. Patent Application No. 09/122,434, filed July 24, 1998, entitled "GENE EXPRESSION AND EVALUATION SYSTEM," are hereby incorporated by reference.
[0003]
(Statement of Invention Rights Achieved by Research and Development Sponsored by the U.S. Government)
The work leading up to this invention was funded by the U.S. Department of Commerce through the National Institute of Standards and Technology.
[0004]
(Background of the Invention)
The present invention relates to computer systems and, more particularly, to computer systems for mining gene expression levels and for managing the laboratory operations that measure them.
[0005]
Devices and computer systems have been developed that collect information about gene expression or expressed sequence tags (ESTs) in large numbers of samples. For example, PCT application WO 92/10588, which is incorporated herein by reference for all purposes, discloses techniques for examining the sequence of nucleic acids and other substances. Probes for performing these operations can be formed in arrays, for example, based on the pioneering techniques disclosed in US Pat. Nos. 5,143,854 and 5,571,639. Both of these US patents are incorporated herein by reference for all purposes.
[0006]
According to one aspect of the techniques described in these patents, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. Nucleic acids carrying a fluorescent label are then brought into contact with the chip, and a scanner generates an image file indicating the locations where the labeled nucleic acids have bound to the chip. Based on the identities of the probes at these locations, information such as the monomer sequence of DNA or RNA can be extracted.
[0007]
Computer-aided techniques for gene expression monitoring using such probe arrays have been developed and are disclosed in European Patent Publication No. 0848067 and PCT Publication WO 97/10365, the contents of which are incorporated herein by reference. Many diseases are characterized by differences in the expression levels of various genes, arising either through changes in the copy number of the genetic DNA or through changes in the transcription levels of particular genes (e.g., through regulation of initiation, supply of RNA precursors, RNA processing, etc.). For example, losses and gains of genetic material play an important role in malignant transformation and progression. In addition, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) provide clues to the presence and progression of various cancers.
[0008]
Information about the expression of a gene or expressed sequence tag can be collected on a large scale in a variety of ways, including the probe array techniques described above. One purpose of collecting this information is to identify genes or ESTs whose expression is of particular importance. Researchers use such techniques to answer the following questions: 1) Which genes are expressed in malignant cells but not in healthy tissues or tissues treated under a particular treatment program? 2) Which genes or ESTs are expressed in specific organs but not in other organs? 3) Which genes or ESTs are expressed in a particular species but not in other species?
[0009]
Collecting very large amounts of expression data from a large number of samples, including many tissue types, is useful in answering these questions. To derive full benefit from the investment made in collecting and storing expression data, techniques are needed for mining that data efficiently and, in particular, for finding the items of interest within it.
[0010]
(Summary of the Invention)
The present invention provides techniques for organizing expression or concentration information in a manner that facilitates mining. A database model is provided that can organize information relating to sample preparation, expression analysis of experimental results, and intermediate and final results such as gene expression measurements and gene sets. This model can be easily converted to a database language such as SQL. The database model scales to permit mining of gene expression measurements collected from many samples.
[0011]
According to one embodiment of the present invention, there is provided a computer-based method for mining a plurality of experimental information. The method includes various steps, such as collecting information from experiments and chip design. The method can include selecting an experiment to mine. Experimental results and other information can be organized by experimental analysis and the like. Defining one or more groupings for the experiment to be mined is also part of the method. The method further includes selecting information about the experiment to be mined based on the grouping to form a plurality of resulting information. This obtained information may include the resulting set of one or more genes and the like. Finally, the method formats the obtained information for viewing by the user. The combination of these steps allows the user to access the experimental information.
[0012]
In some embodiments, a visualization technique is used in conjunction with the steps of the method so that the results of the data mining can be more easily understood by the user. In some further embodiments, recording a conclusion about the results of the data mining forms part of the method.
[0013]
In another aspect according to the present invention, a method is provided that operates on expression information. The method includes various steps, such as collecting information about the results of experiments. Gathering information about the experiments, including information about samples and experimental analyses, also forms part of the method. A step of adding one or more attributes to the information about the experiments may also be performed. The method then transforms the plurality of experimental results into a plurality of transformed information. The transformation can include normalization, denormalization, aggregation, scaling, and the like. Mining the transformed information and visualizing the transformed information may also be part of the method.
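The text lists scaling among the possible transformations without defining it. As one hedged illustration, the sketch below scales a set of intensities by a single factor so that their mean equals a chosen target, which makes experiments with different overall brightness comparable; the function name, the `target` parameter, and the choice of mean scaling are assumptions made for the example.

```python
def scale_to_target(intensities, target=100.0):
    """Multiply all intensities by one factor so that their mean equals `target`."""
    if not intensities:
        raise ValueError("no intensities to scale")
    mean = sum(intensities) / len(intensities)
    if mean == 0:
        raise ValueError("cannot scale intensities whose mean is zero")
    factor = target / mean
    return [v * factor for v in intensities]
```

Because every value is multiplied by the same factor, ratios between probe sets within one experiment are preserved.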
[0014]
The present invention provides improved techniques for monitoring gene expression or sequence analysis. Specifically, the invention provides methods for monitoring expression or managing laboratory operations to perform sequence analysis.
[0015]
According to one embodiment of the present invention, there is provided a computer-based method for managing information about a plurality of experiments performed on a plurality of samples. Each experiment indicates the degree to which a particular gene is expressed in a sample. The method includes various steps, such as registering at least one sample of the plurality of samples in a central database. The method can include tracking a plurality of information about the samples and tracking a plurality of information about the experiments. Generating a sample history for the plurality of samples from the plurality of information may be part of the method. The method can include filtering the information about the experiments and the information about the samples based on parameters selected by the user. Information can be published to various targets, such as public databases. The combination of these steps can provide a web-based user interface that allows users to access the information.
[0016]
In many embodiments, experimental result information can be entered in a format that permits cross-platform use and sharing of the information. One such format is the standard format for genomic databases provided by the Genetic Analysis Technology Consortium ("GATC"), formed by Molecular Dynamics, Inc. of Hayward, California, and Affymetrix, Inc. of Santa Clara, California, USA. For more information about GATC, see http://www.gatconsortium.org. However, in many embodiments, other standard formats, such as those commonly known in the art, can be used.
[0017]
In another aspect according to the present invention, a method is provided for viewing the results of a plurality of experiments stored in at least one database. The method includes various steps, such as specifying a database to query. One or more queries can be submitted to form a result. The result can then be viewed by the user. The result may be filtered based on one or more factors of interest specified by the user to generate a filtered result, which can be converted to graphical form for display and the like.
[0018]
A number of advantages over conventional techniques are achieved by the present invention. Some embodiments according to the present invention provide better access to genetic experimental information than methods known in the prior art. In some embodiments, the present invention is more economical than conventional techniques. Embodiments can provide answers to queries such as "show all genes having a gene expression value of 100 or more in at least 3/4 of the experiments in this query," as well as to a variety of other useful queries. Another advantage provided by this method is that the results of multiple experiments can be effectively mined using visualization techniques and set theory queries. Some embodiments according to the invention are simpler than known techniques. The present invention can further provide a clear and legible graphical display of the laboratory analysis process.
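One reading of the example query in this paragraph is "return every gene whose expression value is at least 100 in at least three quarters of the experiments under consideration." The sketch below implements that reading over an in-memory mapping; the interpretation itself, the data layout, and the function and parameter names are assumptions made for illustration.

```python
def genes_passing(values, threshold=100, min_fraction=0.75):
    """Return genes at or above `threshold` in at least `min_fraction` of experiments.

    `values` maps each gene name to a {experiment name: expression value} dict.
    Genes with no measurements are never returned.
    """
    passing = []
    for gene, by_experiment in values.items():
        hits = sum(1 for v in by_experiment.values() if v >= threshold)
        if by_experiment and hits >= min_fraction * len(by_experiment):
            passing.append(gene)
    return sorted(passing)
```

The same condition could be expressed in SQL with a `GROUP BY` on the gene and a `HAVING` clause comparing the count of qualifying rows with the total count.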
[0019]
The nature and advantages of the present invention may be better understood with reference to the remaining portions of the specification and the accompanying drawings.
[0020]
(Description of a specific embodiment)
One embodiment of the present invention functions as a system for analyzing biological or other materials using arrays that themselves include probes composed of biological materials such as RNA or DNA. The VLSIPS™ and GeneChip™ technologies provide methods of forming and using very large arrays of polymers, such as nucleic acids, on very small chips. See U.S. Patent No. 5,143,854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is hereby incorporated by reference in its entirety for all purposes. The nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in the sample nucleic acid of interest (the "target" nucleic acid).
[0021]
It should be understood that the probes need not be nucleic acid probes but may be other polymers, such as peptides. Peptide probes can be used to detect the concentration of a peptide, polypeptide, or polymer in a sample. The probes must be carefully selected so that they have binding affinity for the compound whose concentration is to be determined.
[0022]
FIG. 1 shows a simplified diagram of an exemplary system 100 for forming and analyzing arrays of biological materials such as RNA and DNA. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Chip design system 104 is used to design arrays of polymers, such as biological polymers like RNA and DNA. Chip design system 104 is, for example, a personal computer or workstation, such as an appropriately programmed Sun Workstation or IBM PC equivalent. Chip design system 104 receives input from the user regarding chip design goals, including the properties of the genes of interest, and other inputs regarding the desired features of the array. Chip design system 104 can optionally obtain information about a particular gene sequence of interest from bioinformatics database 102 or from an external database such as GenBank. The output of chip design system 104 is a set of chip design computer files and other related computer files in the form of a switch matrix as described in PCT application WO 92/10092. Chip design systems for sequencing, sequence checking, and expression analysis are disclosed in U.S. Pat. No. 5,571,639 and PCT application WO 97/10365, which are hereby incorporated by reference in their entirety for all purposes.
[0023]
The chip design files are input to a mask design system (not shown) that designs the lithographic masks used in the manufacture of arrays of molecules such as DNA. The mask design system designs the lithographic masks used in manufacturing the probe arrays. It generates mask design files, which are then used by a mask construction system (not shown) to build the masks, such as chrome-on-glass masks, or other synthesis patterns used in the manufacture of the polymer arrays.
[0024]
The mask is used in a synthesis system (not shown). The synthesis system includes the necessary hardware and software to fabricate an array of polymers on a substrate or chip. The synthesis system includes a chemical flow cell on which the substrate or chip is located and a light source. A mask is placed between the light source and the substrate / chip, and the two are translated relative to each other a suitable number of times for deprotection of selected areas of the chip. The selected chemical reagent is flowed into the flow cell for binding to the deprotected area, washing and other operations. Substrates made with the synthesis system are optionally diced into smaller chips. The output of the synthesis system is a chip ready to apply the target sample. Information regarding the mask design system, mask construction system and probe array synthesis system is provided in the Background section.
[0025]
The biological source 112 is, for example, a plant or animal tissue. Various processing steps are applied by the sample preparation system 114 to the material from the biological source 112. These steps include, for example, mRNA isolation, mRNA precipitation to increase concentration, and the like. The result of these processing steps is a target sample ready to be applied to the chips produced by the synthesis system 110. Details of sample preparation methods for expression analysis are discussed in WO 97/10365.
[0026]
The prepared sample contains monomer nucleotide sequences such as RNA or DNA. When the sample is applied to the chip by sample exposure system 116, some nucleotide sequences bind to the probes and some do not. The nucleotide sequences are fluorescein labeled so that it can be determined which probes have bound to nucleotide sequences of the sample. Next, the prepared sample is placed in scanning system 118. Scanning system 118 includes a detection device, such as a confocal microscope or a CCD (charge-coupled device), that is used to detect the locations where labeled receptors have bound to the substrate. The output of scanning system 118 is one or more image files showing, for fluorescein-labeled receptors, the fluorescence intensity (photon counts or another related measurement such as voltage) as a function of position on the substrate. Higher photon counts are observed where the labeled receptor has bound more strongly to the polymer array, and because the monomer sequence of the polymers on the substrate is known as a function of position, the sequence of the polymer on the substrate that is complementary to the receptor can be determined.
[0027]
The image files and the chip design are input to an analysis system 120 that performs, for example, base calling or determination of the expression levels of genes or expressed sequence tags. As used herein, the expression level of a gene or EST is to be understood as the concentration in a sample of the mRNA or protein resulting from the transcription of the gene or EST. Such analysis techniques are disclosed in WO 97/10365 and U.S. Patent Application Ser. No. 08/531,137, the contents of which are hereby incorporated by reference in their entirety for all purposes.
[0028]
The expression analysis database 122 maintains information used for expression analysis and the results of expression analysis. The contents of expression analysis database 122 include, for example, tables listing the analyses performed, the analysis results, the experiments performed, the sample preparation protocols, the parameters of these protocols, the chip designs, and the like. Details of one embodiment of expression analysis database 122 are described in U.S. Patent Application Ser. No. 09/122,167, filed July 24, 1998, entitled "METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE," the entire contents of which are incorporated herein by reference for all purposes.
[0029]
One or more embodiments of expression analysis database 122 include information about the expression of many genes or ESTs collected from many different tissue samples. This information is useful for investigating questions such as the following: 1) which genes or ESTs are up-regulated (expressed more) in diseased tissue and which are down-regulated (expressed less); 2) how gene expression varies among the organs and tissue types of one species; 3) how gene expression varies among species that share a common gene; 4) how gene expression responds; and 5) how gene expression changes as a disease progresses.
[0030]
An expression mining database 124 is provided to facilitate this type of investigation. The expression mining database 124 includes, for example, a copy of the data in the expression analysis database 122. The expression mining database 124 may further include various tables that facilitate mining operations performed by users operating the query / mining system 126. The query / mining system 126 includes a user interface that allows the operator to formulate queries about the expression of genes and ESTs in order to answer questions of the types listed above. An example of a query / mining system is described in US patent application Ser. No. 09/122434, "GENE EXPRESSION AND EVALUATION SYSTEM", filed July 24, 1998, which is incorporated herein by reference in its entirety for all purposes.
[0031]
The chip design system 104, the analysis system 120, the control circuitry of the exposure system 116, the sample preparation system 114, and the scanning system 118 can be suitably programmed computers such as Sun workstations, IBM compatible PCs, and the like. Independent computers for each system may perform the computer-implemented functions of these systems, or one computer may combine the computer functions of two or more systems. One or more computers independent of the computer operating the system of FIG. 1 can maintain the expression analysis database 122, the expression mining database 124, and the query / mining system 126.
[0032]
FIG. 2A shows a simplified block diagram of a representative host computer system 210 of one particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The host computer system 210 includes a bus 212 that interconnects major subsystems such as a central processing unit 214, a system memory 216 (generally RAM), an input/output (I/O) adapter 218, external devices such as a display screen 224 via a display adapter 226, a keyboard 232 and a mouse 234 via the I/O adapter 218, a SCSI host adapter 236, and a removable disk drive 238 operable to receive a removable disk 240. The SCSI host adapter 236 can function as a storage interface to a fixed disk drive 242 or to a CD-ROM player 244 that operates to receive a CD-ROM 246. The fixed disk 242 may be part of the host computer system 210, or it may be separate and accessed through other interface systems. A network interface 248 allows a connection to a remote server via a telephone link or to the Internet. The network interface 248 may further connect to a local area network (LAN) or to another network interconnecting many computer systems. Many other devices or subsystems (not shown) can be connected in a similar manner.
[0033]
Not all of the devices shown in FIG. 2A are required for the practice of the invention discussed below. The devices and subsystems may be interconnected in other ways than shown in FIG. 2A. The operation of a computer system such as that shown in FIG. 2A is well known in the art and will not be discussed in detail in this application. Code implementing the present invention can be conveniently located or stored in a computer readable storage medium, such as system memory 216, fixed disk 242, CD-ROM 246, removable disk 240, and the like.
[0034]
FIG. 2B shows a simplified diagram of a network 260 interconnecting a plurality of computer systems 210a-210e. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The network 260 is, for example, a local area network (LAN), a wide area network (WAN), or the like. The computer-related operations of the bioinformatics database 102 and other elements of FIG. 2B may be divided in any manner between the computer systems 210, and the network 260 can be used to communicate information between these various computers. Instead of the network 260, a portable storage medium such as a removable disk can be used to transfer information between computers.
[0035]
Next, the content and structure of the expression mining database 124 of a particular exemplary embodiment according to the present invention will be described. The expression mining database 124 is preferably a multidimensional relational database having a complex internal structure. However, other types of databases may be used depending on the selected embodiment without departing from the scope of the invention. The structure and contents of the expression mining database 124 will be described with respect to a model that describes the contents of the tables in the database and the interrelationships between the tables. The visual representation of this model is an Entity Relationship Diagram (ERD) containing entities, relationships and attributes. A detailed discussion of ERDs can be found in Platinum Technologies' "ERwin version 3.5.2 Methods Guide", the contents of which are incorporated herein by reference for all purposes. Those skilled in the art will appreciate that automation tools, such as ERwin or Developer2000 (available from Oracle), can convert the ERD of FIG. 4A directly into executable code, such as SQL code, to create and operate a database.
[0036]
FIG. 3 shows a key to the ERD notation used to explain the contents of the chip design database 102. FIG. 3 is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The representative table 302 includes one or more key attributes 304 and one or more non-key attributes 306. The representative table 302 includes one or more records, each of which includes fields corresponding to the listed attributes. The contents of the key fields collectively identify individual records. In an ERD, each table is represented by a rectangle divided by a horizontal line. Fields or attributes above the horizontal line are key attributes, and fields or attributes below the horizontal line are non-key attributes. The identifying relationship 308 indicates that the key attribute of the parent table 310 is also part of the composite key of the child table 312. The non-identifying relationship 314 indicates that the key attribute of the parent table 316 is also a non-key attribute of the child table 318. A foreign key, denoted by (FK), is an attribute that is the key, or part of the composite key, of another table. One record in the parent table corresponds to one or more records in the child table, whether the relationship is identifying or non-identifying.
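The difference between the two relationship kinds can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table and column names (parent, identifying_child, non_identifying_child, seq, payload) are hypothetical and do not appear in the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE parent (
    parent_id INTEGER PRIMARY KEY,
    name      TEXT
);
-- Identifying relationship: the parent key is part of the child's composite key.
CREATE TABLE identifying_child (
    parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
    seq       INTEGER NOT NULL,
    payload   TEXT,
    PRIMARY KEY (parent_id, seq)
);
-- Non-identifying relationship: the parent key is a non-key attribute of the child.
CREATE TABLE non_identifying_child (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(parent_id),
    payload   TEXT
);
""")

conn.execute("INSERT INTO parent VALUES (1, 'p1')")
conn.executemany("INSERT INTO identifying_child VALUES (?, ?, ?)",
                 [(1, 1, "a"), (1, 2, "b")])
conn.execute("INSERT INTO non_identifying_child VALUES (10, 1, 'c')")


def child_counts(conn, parent_id):
    """Return (identifying children, non-identifying children) of one parent record."""
    n_ident = conn.execute(
        "SELECT COUNT(*) FROM identifying_child WHERE parent_id = ?",
        (parent_id,)).fetchone()[0]
    n_non = conn.execute(
        "SELECT COUNT(*) FROM non_identifying_child WHERE parent_id = ?",
        (parent_id,)).fetchone()[0]
    return n_ident, n_non
```

In both cases one parent record corresponds to one or more child records; only the role of the parent key inside the child table differs.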
[0037]
FIG. 4A shows a simplified entity relationship diagram (ERD) of elements of the expression mining database 124 for a particular embodiment according to the present invention. FIG. 4A is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The rectangle in FIG. 4A corresponds to a table in the expression mining database 124. The name of the table is written above each rectangle. The columns of each table are described in the rectangle. Above the horizontal line in each rectangle is a key field whose contents are used to identify individual records in the table. Below the horizontal line, the name of the non-key attribute is described. The lines connecting the rectangles identify the relationship between records in one table and records in another table. First, the relationship between these various tables will be described. Next, the contents of each table will be discussed in detail.
[0038]
In operation, the expression mining database 124 is updated during the mining operation. Certain tables are updated by importing and converting from the expression analysis database 122. Once the query / mining system 126 operator has defined the query operation, other tables can be updated.
[0039]
It would be useful to be able to identify the gene or EST whose expression changes in some way depending on one or more tissue attributes. Therefore, the query / mining system 126 needs to be aware of the tissue attributes associated with the expression analysis results. Generally, one or more analysis results are associated with what is referred to herein as a "leaf target sample."
[0040]
To better illustrate the operation of the present invention, the relationship between "leaf target samples" and tissue attributes will first be discussed. An "unprocessed sample" represents one piece of tissue that has been extracted. A single unprocessed sample may be cut into a plurality of unprocessed samples before further processing. The unprocessed sample is an input to the sample preparation system 114. The sample preparation system 114 prepares, for each unprocessed sample, a fluid containing mRNA or other expression indicators, the so-called "target". A target can be split into multiple "replicas", and replicas can be pooled to form another target. Each target applied to a chip is a leaf target sample. Each operation of applying a leaf target sample to a chip is one experiment. In a presently preferred embodiment, an expression analysis is performed on the experimental data based on one or more selectable criteria to generate experimental analysis result data.
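The chain from unprocessed sample to leaf target sample can be modeled as a small derivation graph. The following is a minimal sketch, assuming hypothetical item names (raw-1, target-1, rep-1 and so on) and hypothetical operation labels; it traces a leaf target sample back to the unprocessed samples it came from, which is where the tissue attributes are attached.

```python
from collections import defaultdict

# (parent item, child item, operation); all names are illustrative.
DERIVATIONS = [
    ("raw-1", "raw-1a", "cut"),
    ("raw-1", "raw-1b", "cut"),
    ("raw-1a", "target-1", "prepare"),
    ("target-1", "rep-1", "split"),
    ("target-1", "rep-2", "split"),
    ("rep-1", "target-2", "pool"),
    ("rep-2", "target-2", "pool"),
]

ITEM_TYPES = {
    "raw-1": "unprocessed sample",
    "raw-1a": "unprocessed sample",
    "raw-1b": "unprocessed sample",
    "target-1": "target",
    "rep-1": "replica",
    "rep-2": "replica",
    "target-2": "target",
}

# Each experiment applies one target to a chip, making it a leaf target sample.
EXPERIMENTS = {"exp-1": "target-2"}


def ancestors(item):
    """Return every item from which `item` was derived, directly or indirectly."""
    parents = defaultdict(set)
    for parent, child, _operation in DERIVATIONS:
        parents[child].add(parent)
    seen, stack = set(), [item]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def source_samples(leaf_target):
    """Return the unprocessed samples behind a leaf target sample, sorted by name."""
    return sorted(a for a in ancestors(leaf_target)
                  if ITEM_TYPES[a] == "unprocessed sample")
```

For example, source_samples(EXPERIMENTS["exp-1"]) walks back through the pool, split, prepare and cut operations to the original tissue.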
[0041]
The tables in the expression mining database 124 relating to samples and attributes are indicated by the letter "A" in FIG. 4A. Leaf target samples, unprocessed samples, replicas, targets, and the like are listed in the sample item table 402. The sample item derivation table 404 lists conversions from one sample item to another. The sample item derivation table 404 lists split, pool, and cut operations, conversions from unprocessed sample to target, and applied analyses. The sample derivation type table 406 lists the various types of conversions. The sample item type table 408 lists the various sample item types, such as targets, replicas, unprocessed samples, leaf target samples, analyses, and the like. Listing sample derivation types and sample item types in tables facilitates reprogramming to accommodate changes in sample processing procedures.
[0042]
Attributes are associated with the sample. Some of the attributes are strings or values that identify concentration, sample preparation date, expiration date, and the like. Other attributes identify characteristics that are useful in locating the gene or EST of interest, such as the disease state of the tissue, organ or species from which the sample was taken. The attributes are listed in the sample item attribute table 410. Sample item attribute map table 412 implements a many-to-many relationship between sample item attribute table 410 and sample item table 402. One sample can have more than one attribute, and one attribute can describe more than one sample item.
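The many-to-many mapping can be sketched with sqlite3 as follows. The column names and the sample data are assumptions made for illustration; only the three-table arrangement (sample items, attributes, and a map table between them) comes from the description above.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical sample/attribute mapping."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE sample_item (
        sample_item_id INTEGER PRIMARY KEY,
        name           TEXT NOT NULL
    );
    CREATE TABLE sample_item_attribute (
        attribute_id   INTEGER PRIMARY KEY,
        attribute_type TEXT NOT NULL,
        value          TEXT NOT NULL
    );
    -- The map table holds one row per (sample item, attribute) pairing.
    CREATE TABLE sample_item_attribute_map (
        sample_item_id INTEGER NOT NULL REFERENCES sample_item(sample_item_id),
        attribute_id   INTEGER NOT NULL REFERENCES sample_item_attribute(attribute_id),
        PRIMARY KEY (sample_item_id, attribute_id)
    );
    """)
    conn.executemany("INSERT INTO sample_item VALUES (?, ?)",
                     [(1, "liver-A"), (2, "liver-B"), (3, "kidney-A")])
    conn.executemany("INSERT INTO sample_item_attribute VALUES (?, ?, ?)",
                     [(1, "organ", "liver"),
                      (2, "organ", "kidney"),
                      (3, "disease state", "diseased")])
    conn.executemany("INSERT INTO sample_item_attribute_map VALUES (?, ?)",
                     [(1, 1), (1, 3), (2, 1), (3, 2), (3, 3)])
    return conn


def samples_with(conn, attribute_type, value):
    """Return the names of sample items carrying the given attribute, sorted."""
    rows = conn.execute("""
        SELECT s.name
        FROM sample_item s
        JOIN sample_item_attribute_map m ON m.sample_item_id = s.sample_item_id
        JOIN sample_item_attribute a     ON a.attribute_id = m.attribute_id
        WHERE a.attribute_type = ? AND a.value = ?
        ORDER BY s.name
    """, (attribute_type, value)).fetchall()
    return [row[0] for row in rows]
```

Here the attribute "diseased" describes two sample items, and the sample item "liver-A" carries two attributes, which is exactly the situation a map table is meant to handle.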
[0043]
Each attribute has an associated attribute type, listed in the sample item attribute type table 414, and an associated value. Examples of attribute types include "concentration", "preparation date", and "expiration date". Another example of an attribute type is "specimen type", whose possible values include "tissue", "organ culture", "purified cells", "primary cell culture", "cell line", and the like. Another example is "racial group", whose values would be, for example, "East Asian" or "Native American".
[0044]
Many attribute types can be understood to be derived from other attribute types. For example, the attribute type "racial group" is derived from the attribute type "human", and the attribute type "human" is derived from the attribute type "species". Some attribute types have no associated attributes and only define a level of categorization. Derivations relating a "parent" attribute type to a "child" attribute type are listed in the attribute type derivation table 418. An attribute type can have one or more parents or children. The various types of derivations are listed in the attribute type derivation type table 420. One representative attribute type derivation type is category-subcategory, in which the parent type represents a category and the child type represents a subcategory. The availability of derivation relationships between attribute types facilitates the formulation of useful queries to the expression mining database 124 and allows the user to easily identify the attribute type of interest.
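One way such a derivation table might be queried is with a recursive common table expression, which SQLite supports. The sketch below is an assumption about how the parent/child rows could be stored; the table name attribute_type_derivation and its columns are hypothetical, and the "mouse" row is added purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attribute_type_derivation (
        parent          TEXT NOT NULL,
        child           TEXT NOT NULL,
        derivation_type TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO attribute_type_derivation VALUES (?, ?, ?)",
    [("species", "human", "category-subcategory"),
     ("species", "mouse", "category-subcategory"),
     ("human", "racial group", "category-subcategory")])


def derived_types(conn, attribute_type):
    """Return every attribute type derived, at any depth, from `attribute_type`."""
    rows = conn.execute("""
        WITH RECURSIVE sub(name) AS (
            SELECT child FROM attribute_type_derivation WHERE parent = ?
            UNION
            SELECT d.child
            FROM attribute_type_derivation d
            JOIN sub ON d.parent = sub.name
        )
        SELECT name FROM sub ORDER BY name
    """, (attribute_type,)).fetchall()
    return [row[0] for row in rows]
```

A user interface could use such a query to expand "species" into all of its subcategories when the user builds a query.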
[0045]
The table relating information about the experiment is indicated by the letter "B" in FIG. 4A. Experiment table 424 lists the experiments for which the results can be queried. Data map table 426 lists entries corresponding to the set of genes or ESTs to be examined. Each set corresponds to a set of experiments performed to examine the genes in that set. The experiment set table 428 lists the associations between the experiments and the entries in the data map table 426, and thus defines the set of experiments corresponding to each gene set. The analysis set table 430 defines the set of analysis performed corresponding to each gene set. Each entry defines an association between an analysis, an experiment, and an entry in the data map table 426.
[0046]
A table relating to information about genes is indicated by the letter “C” in FIG. 4A. The gene set table 432 defines the membership of all gene sets defined by a user or otherwise in preparation for a query / mining operation. Gene set name table 434 lists the names of the gene sets. Genes belonging to the gene set are listed in the bio item registration table 436. Each entry in the bio item registration table 436 identifies a registration number in the bio item database. The definition of the registration number is stored in the registration definition table 438. Housekeeping Genes Table 440 lists genes of known expression levels used to correct the expression monitoring process.
[0047]
The tables related to analysis information are indicated by the letter "D". Absolute expression analysis results are stored in the absolute result table 444. Each entry in the absolute result table 444 references an absolute result type. The different absolute result types indicate an estimate of the expression level of a given gene or EST, e.g., present, borderline, absent, unknown, etc. The various absolute result types are listed in the absolute result type table 446. Relative analysis results are stored in the relative analysis result table 448. Each entry in the relative analysis result table 448 references a relative result type listed in the relative result type table 450. A relative analysis compares gene expression in two experiments. The different relative result types describe changes in expression, e.g., increased, unchanged, decreased, unknown, etc. Tables 448 and 450 are populated from the expression analysis database 122 and are read-only from the query / mining system 126.
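To make the relative result types concrete, the sketch below assigns one of them by comparing the expression level of a gene in two experiments. The fold-change rule and the default threshold of 2.0 are assumptions chosen for illustration; the text does not state how the analysis system 120 actually derives these calls.

```python
def relative_result(baseline, experiment, min_fold=2.0):
    """Classify the change in expression between two experiments.

    `baseline` and `experiment` are expression levels for the same gene;
    `min_fold` is a hypothetical fold-change threshold.
    """
    if baseline is None or experiment is None:
        return "unknown"
    if baseline <= 0 or experiment <= 0:
        return "unknown"  # a ratio is not meaningful for non-positive levels
    fold = experiment / baseline
    if fold >= min_fold:
        return "increased"
    if fold <= 1.0 / min_fold:
        return "decreased"
    return "unchanged"
```

For instance, a level that rises from 100 to 250 would be reported as "increased" under this rule, while a rise from 100 to 120 would be "unchanged".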
[0048]
The query / mining system 126 also performs various expression analysis operations. The results of these calculations are maintained in a calculated fields table 452.
[0049]
Tables related to mining and query operations are denoted by the letter "E". At any given time, the user is considering data from one set of experiments. A list of the sample items used in these experiments is stored in the selected sample item table 454. The selected sample item table 454 is generally much smaller than the sample item table 402, which allows for faster query operations.
[0050]
Each entry in the criteria set table 456 identifies a criteria set used to query a group selected by sample item or attribute. Each entry in the criteria set experiment table 458 identifies a criteria set applied to the gene or EST expression levels of a particular sample item belonging to the group identified by reference to the criteria set table 456. The criteria set experiment details table 460 includes entries that identify the values applied as criteria.
[0051]
Users of the query / mining system 126 have no access to information about leaf target samples, but only to information about their "parents". Expression data, however, is recorded for leaf target samples. The criteria set experiment leaf table 462 allows the entries in the criteria set experiment table 458 to be associated with the sample items in the sample item table 402 and with the leaf target samples corresponding to those sample items.
[0052]
Various other tables may be included in embodiments according to the present invention. These tables are designated by the letter "F". The user preference table 464 stores references to user preference files that record individual user preferences of the query / mining system 126. The user may want to store the function used to normalize the expression data for later use. The normalization adjustment function table 466 lists information about normalization and other conversion functions. The user may want to store the function used to average the expression data collected from the relevant replica. A description of these averaging functions is stored in the replicated averaging function table 468.
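The kinds of functions a user might store can be sketched as follows. Both functions are illustrative assumptions: scaling to the mean of housekeeping genes is one common normalization, and the arithmetic mean is one possible replica averaging function; the text does not prescribe either.

```python
from statistics import mean


def scale_to_housekeeping(intensities, housekeeping_genes, target_level):
    """Scale all intensities so the housekeeping genes average `target_level`.

    `intensities` maps gene name to measured intensity for one experiment.
    """
    reference = mean(intensities[gene] for gene in housekeeping_genes)
    factor = target_level / reference
    return {gene: value * factor for gene, value in intensities.items()}


def average_replicas(replicas):
    """Average per-gene intensities over replicas, using genes present in all."""
    shared = set.intersection(*(set(replica) for replica in replicas))
    return {gene: mean(replica[gene] for replica in replicas)
            for gene in sorted(shared)}
```

As a usage example, scale_to_housekeeping({"hk1": 50.0, "hk2": 150.0, "g1": 10.0}, ["hk1", "hk2"], 200.0) doubles every intensity, because the housekeeping genes average 100.0 before scaling.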
[0053]
FIG. 5A shows a flowchart 501 of simplified process steps of a representative specific embodiment according to the present invention for mining multiple pieces of experimental information for one pattern. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. In step 502, information on experiments and chip design is collected. Then, in step 504, the experimental analysis to be mined is selected. At step 506, one or more sample attributes are defined. At step 508, information resulting from the mining is obtained from the experimental analysis to form a plurality of resulting information. This obtained information can include the resulting set of one or more genes. Finally, at step 510, the obtained information is formatted for display to the user. The combination of these steps allows the user to access the experimental information.
[0054]
FIG. 5B shows a flowchart 503 of simplified process steps of an alternative embodiment according to the present invention for mining expression information. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. At step 512, information regarding a plurality of results of a plurality of experimental analyses is collected. Then, at step 514, information about the samples and information about a plurality of experiments is gathered. Next, at step 516, one or more attributes are added to the information about the experiments. Next, at step 518, the plurality of results of the experiment information are converted to form a plurality of converted information. The conversion includes normalization, denormalization, aggregation, and the like. Subsequently, at step 520, the plurality of converted information is mined. Then, at step 522, the results of the mining are visualized for display to the user. Finally, at step 524, the conclusion is recorded.
[0055]
FIG. 6A shows a representative block flow diagram of simplified process steps of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Block flow diagram 601 includes an input data warehouse 602, a transformation step 604 that produces an output data mart 606, and a mining process step 608. The input data warehouse 602 can include a laboratory information management system and other databases. In certain embodiments, the data warehouse 602 can include genomic and chip design information, as well as other information useful for the laboratory expression analysis process.
[0056]
FIG. 6B shows a simplified block diagram of a representative data warehouse of a particular embodiment according to the present invention, such as data warehouse 602 of FIG. 6A. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The data warehouse 602 includes a plurality of databases, including a laboratory information management system 610 and a public database 612. In one particular embodiment, data warehouse 602 further includes a chip design component 614. Further, the genome information component 616 can be part of the data warehouse 602. Further, in some embodiments, another reference database 618 is part of the data warehouse 602. In many embodiments, other information can be further included, or these particular components can be omitted, without departing from the scope of the invention.
[0057]
In a particular embodiment according to the present invention, the data conversion step 604 of FIG. 6A includes a normalization and adjustment step. Normalization and adjustment can include functions tracked by analysis type and/or function type. In some embodiments, VBA functions or independent applets can be added or deleted. In still other embodiments, the user can selectively skip some conversions based on preferences. The data conversion step 604 can also include a replica handling step that allows a user to manipulate replicas in a manner similar to normalization and adjustment. In many embodiments, the user can use the sample identification to identify replicas by derivation type. Further, in some embodiments, a custom selection of replicas can be incorporated into an applet.
[0058]
FIG. 6C shows a representative data mart of a particular embodiment according to the present invention, such as data mart 606 of FIG. 6A. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. An exemplary data mart 606 may include an experimental set 620. Experimental set information and results can be transferred to expression results 622. In many embodiments, multiple samples 624, which can have one or more sample attributes, can further have a relationship with an expression result 622. Data mart 606 can further include a plurality of genes 626. Finally, in the presently preferred embodiment, time can be treated as a dimension 628 of the expression result 622. Other ways of organizing the data in the data mart can be used without departing from the scope of the present invention.
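One way to read this organization is as a star schema, with expression results as the fact table and experiment set, sample, gene and time as its dimensions. The sketch below is an interpretation, not the layout of FIG. 6C; every table name, column name and data value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene   (gene_id   INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, organ TEXT);
-- Fact table: one expression level per (experiment set, sample, gene, time point).
CREATE TABLE expression_result (
    experiment_set TEXT,
    sample_id      INTEGER REFERENCES sample(sample_id),
    gene_id        INTEGER REFERENCES gene(gene_id),
    time_point     INTEGER,
    level          REAL
);
""")
conn.execute("INSERT INTO gene VALUES (1, 'geneX')")
conn.executemany("INSERT INTO sample VALUES (?, ?)",
                 [(1, "liver"), (2, "liver"), (3, "kidney")])
conn.executemany("INSERT INTO expression_result VALUES (?, ?, ?, ?, ?)",
                 [("set-1", 1, 1, 0, 10.0),
                  ("set-1", 2, 1, 0, 30.0),
                  ("set-1", 3, 1, 0, 5.0)])


def mean_level_by_organ(conn, gene_name):
    """Average the expression level of one gene over samples, grouped by organ."""
    return conn.execute("""
        SELECT s.organ, AVG(r.level)
        FROM expression_result r
        JOIN sample s ON s.sample_id = r.sample_id
        JOIN gene g   ON g.gene_id = r.gene_id
        WHERE g.name = ?
        GROUP BY s.organ
        ORDER BY s.organ
    """, (gene_name,)).fetchall()
```

Grouping the fact table by a sample attribute in this way corresponds to asking how expression of a gene varies between organs.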
[0059]
In certain embodiments, experiments may be added to or deleted from experiment set 620. In many more embodiments, the same set of experiments can be mined for multiple purposes. Further, the experimental set 620 can be subdivided into one or more experimental subsets for mining.
[0060]
FIG. 6D illustrates a representative sample and target organization of a particular embodiment according to the present invention, such as sample 624 of FIG. 6C. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The samples and targets allow the user to describe the steps of the experiment. The highest level is the unprocessed sample. FIG. 6D shows sample 624 including unprocessed sample 630. Below the unprocessed sample are one or more replicas. Unprocessed sample 630 includes two replicas, replica 632 and replica 634. A replica may include a target. Replica 632 is the target to be processed using Drug A. Replica 634 is the target to be processed using Drug B. A target can include one or more leaf targets. For example, target 632 includes leaf targets 636, 638, 640 and 642. Target 634 includes leaf targets 644, 646, 648, and 650. Experimental analyses can be linked to leaf targets. FIG. 6D shows the experimental analysis 652 and the experimental analysis 654 associated with the leaf target 636. In a presently preferred embodiment, an experimental analysis can be defined recursively, i.e., one experimental analysis can include one or more experimental analyses. In certain embodiments, the intermediate levels can be user defined. It is possible to include other levels and use other arrangements without departing from the scope of the invention.
[0061]
FIG. 6E illustrates another representative sample and target organization of certain embodiments according to the present invention, such as sample 624 of FIG. 6C. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 6E shows an unprocessed sample 670 representing one extracted piece of tissue or the like. The unprocessed sample 670 has been cut into a plurality of unprocessed samples, such as unprocessed samples 672, 673, 674. These unprocessed samples are inputs to the sample preparation system 114 of FIG. The sample preparation system 114 prepares a target, such as the target 676 corresponding to the unprocessed sample 672. A fluid containing mRNA or other expression indicators can be a target. Target 676 is split into multiple replicas, such as replicas 677, 678, 679. Replicas 678 and 680 are pooled to form another target 682. Each target applied to a chip is a leaf target sample. Each operation of applying a leaf target sample to a chip is one experiment. Leaf target sample 684 is an example. In a presently preferred embodiment, one or more experimental analyses can be associated with a particular leaf target sample. In this figure, analyses 686 and 688 are associated with leaf target sample 684. Further, one experimental analysis can be defined with respect to one or more other experimental analyses.
[0062]
FIG. 6F illustrates a representative organization of a plurality of attributes of certain embodiments according to the present invention, such as attribute 628 of FIG. 6C. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 6F shows a plurality of attributes having a non-hierarchical structure. In the currently preferred embodiment, any number of attributes can be assigned to a particular sample. Further, different samples can have the same attributes. FIG. 6F shows a species 660 having a relationship with a plurality of attributes such as a human attribute 662, a mouse attribute 664, a corn attribute 666, and a yeast attribute 668. The "family" and "race" windows are examples of attributes. Other configurations and attributes may be used in various embodiments without departing from the scope of the present invention.
[0063]
In some embodiments, gene 626 of FIG. 6C can be linked to one or more gene sets. Gene sets can be described by various users. In at least one particular embodiment, the gene set cannot be shared between users, while in other embodiments it can be shared between users. A user can copy another user's gene set and edit or delete the gene set. In a presently preferred embodiment, gene sets can be created or stored during data mart mining. In some embodiments, one or more functional operations can be applied to the gene set, such as logical operations such as logical sums and logical products, and arithmetic operations such as addition, subtraction, and scaling.
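The logical operations on gene sets map directly onto set operations. A minimal sketch, assuming gene sets are represented as collections of gene identifiers (the identifiers shown in the usage note are made up):

```python
def union(*gene_sets):
    """Logical sum: genes that belong to at least one of the gene sets."""
    return frozenset().union(*gene_sets)


def intersection(first, *others):
    """Logical product: genes that belong to every one of the gene sets."""
    return frozenset(first).intersection(*others)
```

For example, with a = {"g1", "g2", "g3"} and b = {"g2", "g3", "g4"}, union(a, b) contains all four genes and intersection(a, b) contains only "g2" and "g3". Returning a frozenset means the result cannot be modified in place, so a user who copies another user's gene set does not alter the original.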
[0064]
FIG. 7A shows a representative experiment set screen 701 of the user interface of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Screen 701 allows the user to interact with the experiment set contained in the expression mining database 124 of FIG. Screen 701 includes an experiment set selection tab 702 shown with four experiment sets, such as experiment sets 704 and 706. Other experiment sets can be added as needed. In various embodiments according to the present invention, other formats may be used to present this information to a user.
[0065]
FIG. 7B shows an experiment selection screen 703 of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The experiment selection screen 703 includes an experiment tab 730. Multiple experiments are shown in two scroll windows: a selected experiment window 734 and an available experiment window 736. Experiments can be moved between the scroll windows 734 and 736 using the select buttons 738a and 738b. The available experiment window 736 includes a plurality of experiments. To limit the number of experiments displayed in the scroll windows 734 and 736, one or more filters can be applied to the experiment data using a filter mechanism 744 at the bottom of the screen. The filter mechanism 744 includes a column selection field 746 and a value entry field 748. The user can select a particular column on which to filter the experiments using the column selection field 746, and then enter the desired value in the value entry field 748. By clicking on the filter button 750, the user then applies the filter to the experiments in the set, so that only those experiments whose selected column is set to the entered value are shown in the scroll windows 734 and 736.
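The column/value filter amounts to a simple equality test per experiment. The following sketch assumes each experiment is a dictionary of column values; the column names "name" and "chip" are hypothetical.

```python
def apply_filter(experiments, column, value):
    """Keep only experiments whose `column` equals `value`.

    Values are compared as strings because a value typed into an entry
    field arrives as text.
    """
    return [exp for exp in experiments
            if column in exp and str(exp[column]) == str(value)]


EXPERIMENTS = [
    {"name": "exp-1", "chip": "chip-A"},
    {"name": "exp-2", "chip": "chip-B"},
    {"name": "exp-3", "chip": "chip-A"},
]
```

Calling apply_filter(EXPERIMENTS, "chip", "chip-A") keeps exp-1 and exp-3 and hides exp-2.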
[0066]
FIG. 7C shows a selected experiment set screen 705 having an analysis tab 751 of a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Screen 705 includes two selection scroll windows, a selected analysis window 752 and an available analysis window 754. The select keys 756 and 758 can be used to move analyses between the selection scroll windows 752 and 754. Similarly, the user can filter the analyses shown in the selection scroll windows 752 and 754 with the filter mechanism at the bottom of screen 705: the user selects a particular column using the column selection field 760, enters the desired value in the value entry field 762, and clicks the filter button 764 to apply the filter to the analyses in the experiment set.
[0067]
FIG. 7D shows a representative sample selection screen 707 of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. At screen 707, the user can see the results of the selections made on one or more samples. Screen 707 includes a plurality of selections, including sample selection 770, sample type selection 771 and attribute type selection 772. A search and selection can be performed using the previous / next button pair 774 and the select button 775, respectively.
[0068]
FIG. 7E illustrates a representative sample / attribute management screen 709 of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Using screen 709, the user can add, delete, or rename samples, attributes, sample types, attribute types, and relationships between them. Screen 709 includes sample / attribute section 722 and relationship section 724. The item selection window 776 of the sample / attribute section 722 provides the ability for the user to select a type of new item, sample, attribute, etc. Using the function button 777, the user can select an operation such as new addition, name change, or deletion. If the user selects to create a new item, the screen 711 of FIG. 7F is displayed. Using screen 711, the user can create a new item. The user can enter the name of the item in new item field 780 of screen 711 and the type of item in item type field 784. Alternatively, the user can work on relationships using the relationships section 724 of the screen 709 of FIG. 7E.
[0069]
Using the relationship selection window 778 of the relationships section 724, the user can select a type of relationship, for example, a sample item to sample item relationship, an attribute to sample item relationship, an attribute type to attribute type relationship, and the like. Using the function button 779, the user can select an operation such as new addition or deletion. If the user chooses to create a new relationship, screen 713 of FIG. 7F is displayed. Using screen 713, the user can create a new relationship. The user can enter the source of the relationship in the source window 782, the parent in the parent window 786, and the type of the relationship in the derived type window 788.
[0070]
FIG. 7G illustrates a representative data mining option management screen 715 for a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Screen 715 shows multiple tabs, including a query / chart tab 790, a pattern tab 792, and a gene set comparison tab 794. The user can specify a number of grouping parameters using the group by function of the query / chart tab 790 to initiate data mining.
[0071]
FIG. 7H shows an experiment mining screen 717 of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. Using the functions of the screen 717, the user can input criteria for data selection, such as which gene set to use. Screen 717 includes a plurality of sample items such as sample item 796. Using the group selection field 798, the user can make a selection from multiple groups in the experimental set. One or more gene sets can be selected using the gene set selection field 800. The gene set can be all genes or a subset represented by a particular gene chip. In one particular embodiment, all gene sets on a particular gene chip are provided as defaults, but other defaults can be used. The expression percentage field 802 can be used to specify the degree of gene expression within a group. When the user specifies search parameters using these fields, pressing the execute button 801 starts data mining.
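One way to read the expression percentage criterion is as a threshold on the fraction of samples in a group in which a gene is expressed. The following is a minimal sketch under that assumption; the function name, the `calls` mapping, and the sample and gene names are hypothetical.

```python
def genes_expressed_in_group(calls, group, gene_set, min_percent):
    """Return genes expressed in at least `min_percent` of the samples in `group`.

    calls: dict mapping (sample, gene) -> True if the gene was called expressed.
    """
    hits = []
    if not group:
        return hits
    for gene in gene_set:
        present = sum(1 for sample in group if calls.get((sample, gene), False))
        if 100.0 * present / len(group) >= min_percent:
            hits.append(gene)
    return hits


# Hypothetical data: g1 is expressed in 3 of 4 samples, g2 in 1 of 4.
group = ["s1", "s2", "s3", "s4"]
calls = {
    ("s1", "g1"): True, ("s2", "g1"): True, ("s3", "g1"): True,
    ("s1", "g2"): True,
}
result = genes_expressed_in_group(calls, group, ["g1", "g2"], min_percent=50)
```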
[0072]
FIG. 7I shows a selection data screen 719 of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The data selection screen 719 shows data that satisfies the criteria specified by the user on the experiment mining screen 717 in FIG. 7H. The data selection screen 719 shows a plurality of leaf parents including the leaf parent 804. Screen 719 further shows experimental replica 805, bio item 806, and results 807 measured during the experiment for each leaf parent. The user can export the mining results using the export button 808 and / or save the mining results using the save gene set button 809.
[0073]
FIG. 7J shows a bar graph visualization screen 721 of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The bar graph visualization screen 721 includes a display area showing data in the experiment set. The quantity to visualize can be selected from selection value field 814. Experimental results 810 and 812 indicate the difference in expression of a particular gene with respect to the quantity selected by the user in field 814.
[0074]
FIG. 7K shows a distribution diagram visualization screen 723 of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The distribution map selection visualization screen 723 includes a display area 819 having a display of data in the experiment set. Although the display area 819 is an XY plot, various embodiments according to the present invention allow for other forms of data visualization, such as bar charts, graphs, pie charts, and the like.
[0075]
FIG. 7L shows pattern search screens 725 and 727 of a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. The gene pattern search allows the user to find relationships such as which genes behave similarly when exposed to a certain drug. Selecting the “Patterns” tab on screen 725 displays an information input device that includes a gene pattern field 820 for entering search criteria. By designating a search for a gene pattern, a gene pattern search screen 727 is presented to the user. The user can use the gene set name fields 822 and 824 to select multiple gene sets to compare. Using the measurement selection field 826, a user can select a measurement of interest as a basis for comparison.
[0076]
FIG. 7M shows gene set comparison screens 729 and 731 of a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. By comparing gene sets, a user can determine relationships such as which gene sets contain a particular gene, and which gene sets do not contain a particular gene or a functional combination of genes. Selecting the “Gene Set Comparison” tab on screen 729 displays information about the gene sets that the user can select for comparison. The screen 729 shows a plurality of gene sets including the gene sets 830, 832 and 834. When a target gene set is designated, a gene comparison screen 731 is presented to the user. By checking one or more of a plurality of selection windows, such as selection windows 836, 838, the user can select a plurality of genes as a starting point for comparing the gene sets selected on screen 729.
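The comparison described above amounts to set membership tests. A minimal sketch, assuming gene sets are held as named collections of gene identifiers (the function name and the data are hypothetical):

```python
def compare_gene_sets(gene_sets, genes_of_interest):
    """Split gene sets into those containing all chosen genes and those containing none.

    gene_sets: dict mapping a gene set name to an iterable of gene identifiers.
    Returns (containing, lacking) as sorted lists of gene set names.
    """
    wanted = set(genes_of_interest)
    containing = sorted(name for name, genes in gene_sets.items()
                        if wanted <= set(genes))
    lacking = sorted(name for name, genes in gene_sets.items()
                     if not (wanted & set(genes)))
    return containing, lacking


gene_sets = {
    "setA": {"g1", "g2"},
    "setB": {"g2", "g3"},
    "setC": {"g4"},
}
containing, lacking = compare_gene_sets(gene_sets, ["g2"])
```

A gene set that holds only some of the chosen genes appears in neither list, which keeps the two answers ("contains" and "does not contain") unambiguous.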
[0077]
FIGS. 7N-7O illustrate sample data management screens 733 and 735 of a particular embodiment according to the present invention. These diagrams are merely examples, which should not limit the scope of the claims herein. Those skilled in the art will recognize other variations, modifications, and alternatives. FIG. 7N shows a gene set management screen 733. Using this screen, the user can perform various tasks on genes and gene sets, such as adding, deleting, creating, and copying gene sets, and adding and deleting genes within gene sets. FIG. 7O shows a gene set update screen 735. Using this screen, the user can specify one or more genes to delete from the database.
[0078]
(Laboratory information management)
One embodiment of the present invention operates with a system for analyzing biological or other material using an array that includes probes that may be composed of biological material such as RNA or DNA. VLSIPS™ technology and GeneChip™ technology provide a way to create and use very large arrays of polymers, such as nucleic acids, on very small chips. See U.S. Pat. No. 5,143,854, incorporated herein by reference for all purposes, and PCT Patent Publication Nos. WO 90/15070 and 92/10092. The nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in the sample nucleic acid of interest ("target" nucleic acids).
[0079]
It should be understood that the probe need not be a nucleic acid probe, but may be another polymer such as a peptide. Peptide probes can be used to detect the concentration of peptides, polypeptides, and polymers in a sample. Probes should be carefully selected to have binding affinity for the concentration of compound used in the measurement.
[0080]
FIG. 1 'shows an overall system 100' for forming and analyzing an array of biological materials such as RNA and DNA. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. The chip design system 104 'is used to design an array of polymers, such as biological polymers such as RNA and DNA. The chip design system 104 'may be, for example, a personal computer or workstation, such as an appropriately programmed Sun Workstation or an IBM PC equivalent including appropriate memory and CPU. Chip design system 104 'obtains input from a user to evaluate chip design goals, including the characteristics of the gene of interest, and other inputs to evaluate desired features of the array. Optionally, the chip design system 104 'can obtain information from a bioinformatics database 102' or an external database such as GenBank to evaluate the specific gene sequence of interest. The output of the chip design system 104 'is a set of chip design computer files in the form of, for example, a switch matrix as described in PCT application WO 92/10092 or other related computer files. Systems for designing chips for sequencing and expression analysis are disclosed in US Pat. No. 5,571,639 and PCT application WO 97/10365, the contents of which are incorporated herein by reference.
[0081]
The chip design file is input to a mask design system (not shown) that designs the lithographic masks used to manufacture probe arrays of molecules such as DNA. The mask design system generates a mask design file, which is then used by a mask construction system (not shown) to construct a mask, such as a chrome-on-glass mask, or another synthesis pattern used to manufacture the polymer array.
[0082]
The mask is used in a synthesis system (not shown). The synthesis system includes the necessary hardware and software used to fabricate the polymer array on the substrate or chip. The synthesis system includes a light source and a chemical flow cell on which the substrate or chip is located. The mask is located between the light source and the substrate / chip, and the two are translated with respect to each other at the appropriate times to effect deprotection of selected areas of the chip. Selected chemical reagents are sent through the flow cell for binding to the deprotected area, as well as for washing and other operations. Substrates manufactured by the synthesis system are optionally diced into smaller chips. The output of the synthesis system is a chip prepared for attaching the target sample. Information about mask design, mask configuration, and probe array synthesis system is presented as background.
[0083]
The biological source 112' is, for example, tissue from plants or animals. The sample preparation system 114' subjects the material from the biological source 112' to various processing steps. These steps may include mRNA isolation and mRNA precipitation to increase concentration. The result of the various processing steps is a target sample prepared to be applied to the chip generated by the synthesis system 110'. Sample preparation methods for expression analysis are discussed in detail in WO 97/10365.
[0084]
The prepared sample contains nucleic acid sequences such as RNA or DNA. When the sample is applied to the chip by the sample exposure system 116', the nucleic acids in the sample may or may not bind to the probes. The nucleic acids are tagged with a fluorescein label so that it can be determined which probes have bound to nucleic acid sequences from the sample. The prepared sample is placed in a scanning system 118'. The scanning system 118' includes a detection device, such as a confocal microscope or a CCD (charge-coupled device), used to detect the positions where the labeled receptor has bound to the substrate. For a fluorescein-labeled receptor, the output of the scanning system 118' is an image file showing the fluorescence intensity (photon count or another relevant measurement such as voltage) as a function of position on the substrate. Because higher photon counts are observed where the labeled target has bound more strongly to the array of polymers, and because the monomer sequence of each polymer on the substrate is known as a function of position, it becomes possible to determine the sequences of the polymers on the substrate that are complementary to the target.
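The reasoning in this paragraph, that a known probe layout plus position-dependent intensities identifies which sequences are present in the target, can be sketched as follows. This is only an illustration under simplifying assumptions: a fixed intensity threshold stands in for the actual analysis, and the layout, intensities, and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def bound_probes(intensity, layout, threshold):
    """List (position, probe, complementary target sequence) for bright cells.

    intensity: dict mapping (row, col) -> measured photon count.
    layout:    dict mapping (row, col) -> probe sequence synthesized at that cell.
    """
    hits = []
    for position, probe in sorted(layout.items()):
        if intensity.get(position, 0) >= threshold:
            hits.append((position, probe, reverse_complement(probe)))
    return hits


layout = {(0, 0): "ACGT", (0, 1): "AAAC"}   # hypothetical probe layout
intensity = {(0, 0): 50, (0, 1): 900}       # hypothetical photon counts
hits = bound_probes(intensity, layout, threshold=500)
```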
[0085]
The image file and the chip design are input to an analysis system 120' that, for example, calls base sequences or determines the expression levels of genes or expressed sequence tags (ESTs). As used herein, the expression level of a gene or EST is understood to be the concentration in a sample of the mRNA or protein resulting from transcription of the gene or EST. Such analysis techniques are disclosed in WO 97/10365 and U.S. patent application Ser. No. 08 / 53,137, the contents of which are incorporated herein by reference.
[0086]
The expression analysis database 122 'maintains information used to analyze expression and the results of the expression analysis. The contents of the expression analysis database 122 'can include tables listing the analyses performed, the results of the analyses, the experiments performed, sample preparation protocols and parameters of these protocols, chip design, and the like. Details of one embodiment of the expression analysis database 122 'are described in U.S. patent application Ser. The contents of which are incorporated herein by reference for all purposes.
[0087]
One or more instances of the expression analysis database 122' can include information regarding the expression of many genes or ESTs collected from many different tissue samples. This information is useful for investigating questions such as: 1) which genes or ESTs are up-regulated (more highly expressed) in diseased tissue and which are down-regulated (less highly expressed) in diseased tissue; 2) how gene expression differs between organs and tissue types within a species; 3) how gene expression differs between species sharing a common gene; 4) how gene expression responds during the course of various disease treatments; and 5) how gene expression changes with disease progression.
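Question 1) can be illustrated with a simple fold-change comparison between expression levels measured in normal and diseased tissue. This is a minimal sketch only: the two-fold cutoff, the function name, and the data are assumptions made for illustration and are not specified by the described embodiment.

```python
def classify_regulation(normal, diseased, fold=2.0):
    """Label each gene measured in both tissues as up, down, or unchanged.

    normal, diseased: dicts mapping gene -> expression level (arbitrary units).
    fold: assumed ratio at which a change is reported (illustrative default).
    """
    labels = {}
    for gene in normal.keys() & diseased.keys():
        n, d = normal[gene], diseased[gene]
        if n <= 0 or d <= 0:
            labels[gene] = "undetermined"   # a ratio is not meaningful here
        elif d / n >= fold:
            labels[gene] = "up"
        elif n / d >= fold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


normal = {"g1": 100.0, "g2": 400.0, "g3": 100.0}
diseased = {"g1": 300.0, "g2": 100.0, "g3": 120.0}
labels = classify_regulation(normal, diseased)
```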
[0088]
To facilitate such a search, an expression mining database 124 'is provided. The expression mining database 124 'can include a duplicate representation of the data in the expression analysis database. The expression mining database 124 'may also include various tables to facilitate mining operations performed by users operating the query and mining system 126'. Query and mining system 126 'includes a user interface that allows an operator to query to investigate the expression of genes and ESTs and answer the types of questions identified above. An example of a query and mining system is described in commonly owned U.S. patent application Ser. No. 09 / 122,434, filed Jul. 24, 1998, entitled "GENE EXPRESSION AND EVALUATION SYSTEM".
[0089]
The chip design system 104', the analysis system 120', and the controls for the exposure system 116', the sample preparation system 114', and the scanning system 118' may each be a suitably programmed computer such as a Sun workstation or an IBM-compatible PC. An independent computer for each system can perform the computer functions of that system, or one computer can combine the computer functions of two or more systems. One or more computers can maintain the expression analysis database 122', the expression mining database 124', and the query and mining system 126' independently of the computers operating the systems of FIG. 1'.
[0090]
FIG. 2A' shows a block diagram of a host computer system 210' suitable for implementing certain embodiments according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 2A' shows a host computer system 210' that includes a bus 212' interconnecting major subsystems such as a central processing unit 214', a system memory 216' (typically RAM), an input/output (I/O) adapter 218', external devices such as a display screen 224' via a display adapter 226', a keyboard 232' and a mouse 234' via the I/O adapter 218', a SCSI host adapter 236', and a removable disk drive 238' operable to receive a removable disk 240'. The SCSI host adapter 236' can serve as a storage interface to a fixed disk drive 242' or to a CD-ROM player 244' operable to receive a CD-ROM 246'. The fixed disk 242' may be part of the host computer system 210', or may be independent and accessed via other interface systems. Network interface 248' can provide a direct connection to a remote server over a telephone line, or to the Internet. Network interface 248' may also connect to a local area network (LAN), or another network that interconnects many computer systems. Many other devices or subsystems (not shown) can be connected in a similar manner.
[0091]
Also, as described below, not all of the devices shown in FIG. 2A' need be present to practice the present invention. The devices and subsystems may be interconnected in ways different from that shown in FIG. 2A'. The operation of a computer system such as that shown in FIG. 2A' is well known in the art and is not discussed in detail in this application. Code for practicing the present invention can be operably located or stored on a computer-readable storage medium, such as system memory 216', fixed disk 242', CD-ROM 246', or floppy disk 240'.
[0092]
FIG. 2B' shows a simplified diagram of a network 260' interconnecting a plurality of computer systems 210a'-210e'. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. Network 260' may be a local area network (LAN), a wide area network (WAN), or the like. The computer-related operations of the bioinformatics database 102', and of the other elements of FIG. 2B', can be divided among the computer systems in any manner, with the network 260' used to communicate information between the various computers. Portable storage media, such as removable disks, can be used instead of network 260' to carry information between computers.
[0093]
FIG. 3A ′ shows a flowchart 301 ′ of simplified processing steps for managing information about multiple experiments performed on multiple samples in certain exemplary embodiments according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. Each experiment can provide an indication of the degree of expression of a particular gene sequence in a sample. At step 310 ', at least one of the plurality of samples is registered in a centralized database. Next, in step 312 ', information about the samples is tracked. As a result of step 312 ', information about the sample can be incorporated into the database. Then, in step 314 ', information about the experiments is tracked. Changes in the experimental environment within the laboratory are reflected in the database by the function of step 314 '. Next, at step 316 ', a sample history is generated from the information in the database. The sample history describes the state of a plurality of samples. At step 318 ', information about the plurality of experiments and information about the plurality of samples is filtered by one or more filters selected by a user to generate expression sequence information. Finally, in an optional step 320 ', the expressed sequence information resulting from the manipulation of the experiment in the laboratory can be posted on a public database accessible by a web-based user interface or other means.
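The registration, tracking, history, and filtering steps of FIG. 3A' can be sketched against a small relational store. The example below uses an in-memory SQLite database purely for illustration; the two-table schema and all function, table, and column names are hypothetical and are not the schema of the described database.

```python
import sqlite3


def open_lims():
    """Create an in-memory database with a hypothetical two-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT UNIQUE, type TEXT);
        CREATE TABLE event (
            id INTEGER PRIMARY KEY,
            sample_id INTEGER REFERENCES sample(id),
            step TEXT,
            detail TEXT
        );
    """)
    return db


def track(db, sample_id, step, detail=""):
    """Record one tracked event (sample or experiment information)."""
    db.execute("INSERT INTO event (sample_id, step, detail) VALUES (?, ?, ?)",
               (sample_id, step, detail))


def register_sample(db, name, sample_type):
    """Register a sample and log the registration as its first event."""
    cur = db.execute("INSERT INTO sample (name, type) VALUES (?, ?)",
                     (name, sample_type))
    track(db, cur.lastrowid, "registered", sample_type)
    return cur.lastrowid


def sample_history(db, sample_id):
    """Return the steps recorded for a sample, in the order they were logged."""
    rows = db.execute("SELECT step FROM event WHERE sample_id = ? ORDER BY id",
                      (sample_id,))
    return [row[0] for row in rows]


def filter_samples(db, sample_type):
    """Apply a user-selected filter: names of samples of the given type."""
    rows = db.execute("SELECT name FROM sample WHERE type = ? ORDER BY name",
                      (sample_type,))
    return [row[0] for row in rows]
```

Deriving the sample history from logged events, rather than storing it separately, means that any change recorded during the experiments is reflected in the history without extra bookkeeping.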
[0094]
FIG. 3B' shows a simplified flowchart 303' of the processing steps for viewing the results of a plurality of samples in another embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. The results can be stored in one or more databases. At step 322', the user specifies a query against the database. Next, at step 324', one or more queries are submitted to the database to form a result. Then, at step 326', the user can view the results using the display device. At step 328', the results can be filtered by one or more user-specified filters. Finally, at step 330', the filtered results can be presented in the form of a graph.
[0095]
FIG. 3C' provides an exemplary flowchart 305' of simplified processing steps for managing information about multiple experiments performed on a sample, in certain embodiments according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. At step 330', the sample is registered in the database. Next, in step 332', the experiment is set up. In step 334', aliquoting is performed. Then, in step 336', the RNA is extracted. A polymerase chain reaction (PCR) is performed on the RNA at step 338'. At step 340', the cRNA is labeled. In step 342', fragmentation is performed. Hybridization is performed at step 344'. In step 346', a scan of the hybridized chip is performed. Then, in step 348', grid alignment is performed. A cell average analysis is performed at step 350'. At step 352', a probe array analysis is performed, and at step 354', a composite analysis is performed.
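The sequence of steps in FIG. 3C' can be modeled as an ordered list with a tracker that records which steps an experiment has completed. This is a sketch under the assumption that the steps are performed strictly in order; the class name and step labels are hypothetical.

```python
# Step labels paraphrase steps 330' through 354' of FIG. 3C'.
STEPS = [
    "register sample", "set up experiment", "aliquot", "extract RNA", "PCR",
    "label cRNA", "fragment", "hybridize", "scan", "align grid",
    "cell average analysis", "probe array analysis", "composite analysis",
]


class ExperimentTracker:
    """Track progress of one experiment through the ordered steps."""

    def __init__(self):
        self.completed = []

    def next_step(self):
        """Return the next pending step, or None when all steps are done."""
        if len(self.completed) < len(STEPS):
            return STEPS[len(self.completed)]
        return None

    def complete(self, step):
        """Mark `step` as done; reject steps attempted out of order."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)
```

A tracker of this kind is one way to populate the pending and completed lists that the control screens display for each stage.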
[0096]
FIG. 4A' illustrates a typical database configuration in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 4A' illustrates a client workstation 401' that may be, for example, one of the workstations 210' of FIG. 2B', capable of interconnecting with one or more of a plurality of databases. For example, the GATC database 403' contains the results of a plurality of gene chips in GATC format. The GATC format provides a standard interface for gene chip data across multiple systems. For more information on GATC, see the documents entitled "Software Specification" and "Database Schema" at http://www.gatconsortium.org, which are hereby incorporated by reference in their entirety for all purposes. Database 405' provides data mining information and may include FAQs and preferences. The database 407' comprises annotations, descriptions, and URLs for genetic information. Embodiments can include all of the above databases, or can comprise a subset of the databases, or can include other databases without departing from the scope of the claimed invention.
[0097]
The database configuration of FIG. 4A 'provides data management functions, data posting functions, and integration with gene chip clients such as client 401'. The data management function can include a laboratory information management system (LIMS). Embodiments implementing LIMS according to the present invention can provide data tracking functions such as processing input, processing output, and processing environment. Data security features such as authentication, access permissions, and privileges include separating owners with write access and groups of users with read-only access. The data sharing function can provide group access to data. Posting and sharing of data can be facilitated by adapting to standard data formats. Here, in the preferred embodiment, the GATC format can be used. This standard format provides inter-system access to gene chip data. In a preferred embodiment, the database server may be an Internet server that provides web browser access. Embodiments can include scripting capabilities and can provide analytics functionality at the server. In some embodiments, communication with a web application, such as a browser, and a database application via a gene chip interface can be provided. The database can be implemented in a server, such as a SQL server or an ORACLE server. The database server can reside on several platforms, such as ORACLENT and UNIX.
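The separation between owners with write access and groups of users with read-only access can be illustrated with a minimal permission check. The class and user names below are hypothetical, and authentication is outside the scope of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A shared data set with one owner and a group of read-only users."""
    owner: str
    readers: set = field(default_factory=set)

    def can_read(self, user):
        """Owners and members of the read-only group may read."""
        return user == self.owner or user in self.readers

    def can_write(self, user):
        """Only the owner may modify the data."""
        return user == self.owner

    def share_with(self, requester, user):
        """Grant read-only access; only the owner may do so."""
        if not self.can_write(requester):
            raise PermissionError(f"{requester} cannot share this data set")
        self.readers.add(user)


data = Dataset(owner="alice")
data.share_with("alice", "bob")
```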
[0098]
FIG. 4B' shows a data source selection window 409' listing multiple data sources, in a particular embodiment according to the present invention, from which gene and experimental information can be obtained, searched, and manipulated. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 4B' shows a number of different database formats including, but not limited to, MICROSOFT EXCEL files, text files, MICROSOFT ACCESS 97 databases, AlfaPublish, DataMiningInfo, GeneInfo, JetForm ASCII files, JetForm dBase, JfDbFetchDBF, JfSample, JetForm Filler Example, Forms Track, JetForm Excel, JetForm Excel 5, AFFYMETRIX, Publish_Static, GeneChipLIMS, EliPublish, GEData, and the like.
[0099]
Many embodiments according to the present invention may provide for the automation of the collection and analysis of experimental data, as well as the posting of results. Many embodiments according to the invention can provide for expression analysis, sample registration, and posting of results for multiple experiments on a particular sample and multiple samples. Further, the methods and techniques of the present invention can automate the definition of user parameters, such as for analysis.
[0100]
FIG. 5A' illustrates an exemplary automation page in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 5A' shows an automation page 501' having a sample information area 502', an experimental information area 504', and a sample experimental probe array area 506'. Sample information area 502' provides fields for entering data such as sample name, sample type, project name, sample description, and any comments. Various embodiments of the present invention may include fields for entering other data. The experimental information area 504' provides fields for entering information about the experiment and the probe array, such as the experiment name, probe array image identifier, probe array type, lot number, analysis set, cell average set, and a target database for posting results. Area 506' provides a display screen for matching sample probe arrays, sample experiments, and probe array identifiers. The presently preferred embodiments provide the ability to have multiple samples and the ability to have multiple experiments per sample.
[0101]
FIG. 5B ′ illustrates an automation results page 503 ′ in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. The automation results page 503 'provides a display of multiple steps during setup and execution of an embodiment, and results for a particular sample for each step. For example, as shown in FIG. 5B ', the first step of the sample, labeled "sample demo paste registration", has received a successful result. Various steps can include other steps in various embodiments without departing from the scope of the claims of the present invention.
[0102]
FIG. 5C' shows an exemplary expression scan screen 505' in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 5C' shows information about pending scans. Screen 505' includes a hybridized expression probe array image identifier field 510', which can be used by a user to select a particular probe array for scanning. The sample and experiment information field 512' provides information about the sample, such as name, project, sample type, user identifier, and date, as well as information about the experiment. Probe array information field 514' provides information about the probe array image, such as identifier, array type, lot number, and the like. Hybridization information field 516' provides information regarding reagents and lot numbers. Multiple filter fields 518' provide the ability to filter by sample project, sample type, and probe array type.
[0103]
FIG. 6A 'shows a typical sample registration screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 6A 'shows a sample registration screen 601' having fields for entering data describing the sample. For example, screen 601 'includes a sample name 602', a sample project, a field for entering a sample type, and a comment and description field. The initial processing input point field 604 'allows the user to select a particular point in the laboratory processing as a starting point. Registered sample field 606 'provides a list of registered samples. Sample information field 608 'provides information about various samples.
[0104]
FIG. 6B 'shows a plurality of screens before automating laboratory information management in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 6B 'shows a screen 610' for setting up the experiment. Screen 612 'allows performing the steps of aliquoting. Screen 614 'allows RNA extraction to be performed. Screen 616 'allows the RT PCR to be performed. Screen 618 'allows for labeling of cRNA and screen 620' allows for fragmentation. Other screens and different types of screens or screen designs can be used in various embodiments according to the present invention without departing from the scope of the claims herein.
[0105]
FIG. 6C ′ illustrates an exemplary hybridization screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 6C ′ shows a screen 621 ′ for controlling the hybrid formation process. Screen 621 'includes a pending hybridization fragmentation expression container identifier field 622'. Such a hybridized fragmentation expression vessel contains the sample that has been fragmented. The sample and experiment information field 624 'provides tracking information about the sample and experiment in the hybridization process. The reserved scan field 626 'provides the hybridized expression and probe array image identification information. FIG. 6C 'also shows a hybrid formation control screen 623' and a hybrid formation control screen 625 '. Screen 623 'provides information about the waiting experiment for performing the hybridization step. Screen 625 'provides information about the experiment that has completed the hybridization step.
[0106]
FIG. 6D ′ shows a grid array control screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 6D ′ shows the lattice arrangement control screen 631 ′. The lattice arrangement control screen 631 'includes a reserved lattice arrangement display area 632' and a completed lattice arrangement display area 634 '. The sample experiment information field 636 'provides information about samples and experiments in the grid array processing. File type information field 638 'provides identification information about the file type, and probe array information field 639' provides identification information about the probe array.
[0107]
FIG. 6E' shows a typical cell average analysis screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 6E' shows a screen 641' having a plurality of fields for entering the sample project, experiment name, sample type, probe array type, user name, image data/probe array type, cell average name, image data, cell data, algorithm, and other parameters. Additionally, the results area 642' provides information regarding the particular image name, cell name, probe array type, and various parameters. The results area also provides a success/failure indication for a particular experiment.
[0108]
FIG. 6F' shows a typical probe array analysis screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 6F' shows a screen 651' having a plurality of fields for entering information on the sample project, experiment name, sample type, probe array type, user name, cell data/probe array type, probe array name, probe array data, algorithm, and other parameters. FIG. 6F' also shows a results area 652' having a cell name, a probe array name, a probe array type, a parameter area, and a result area for providing a success/failure indication.
[0109]
FIG. 6G' illustrates a composite analysis screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 6G' shows a screen 661' having multiple fields for entering information about the sample project, experiment name, sample type, user name, detection/anti-detection probe array, composite name, composite data, algorithm, and other parameters. Further, the screen 661' includes a results area 662' that displays a detection chip file name, an anti-chip file name, a composite file name, a parameter area, and a result area for providing a success/failure indication.
[0110]
FIG. 6H' provides an exemplary sample history screen in certain embodiments according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. The sample history screen 681' provides a history list of processes that have been completed for a particular sample.
[0111]
FIG. 7A' shows a typical expression analysis screen for handling sets in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 7A' shows a screen 701' having a plurality of fields including a probe array type field 710', a user name field 712', an algorithm field 714', a cell average name field 716', a parameter field 718', an existing set name field 711', a create/update set name field 713', and a result area 719'. The result area provides fields for the image name, cell name, probe array type, algorithm, and set name, and an area for displaying success/failure results for the expression analysis step. In some embodiments, support can be provided for batch analysis of experimental results and user parameter sets.
[0112]
FIG. 7B' shows a create set name screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 7B' shows a screen 703' having a probe array type field 720', a used probe array type field 722', an existing set name field 724', and an area for scaling and normalization for various chips.
[0113]
FIG. 7C' shows an expression cell data analysis screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 7C' shows a screen 705' having a plurality of fields for describing filter parameters. Filtering can be performed on several fields, such as test type, data type, probe array type, date, sample project, experiment name, sample type, and user name.
[0114]
FIGS. 8A'-8C' show exemplary Expression Data Mining Tool (EDMT) screens in certain embodiments according to the present invention. These figures are merely examples and do not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 8A' shows the EDMT screen 801'. The screen 801' includes a plurality of areas, such as an area 802' that provides information on filters. Filters can be applied to the experimental data to narrow the range of data to be mined. The results area 804' provides the results of filtering the data. The graph area 806' provides multiple graph formats for viewing data.
[0115]
FIG. 8B' illustrates a filter region, such as filter region 802' of FIG. 8A', in a particular embodiment according to the present invention. FIG. 8B' shows a filter region 802' with fields for a project filter 812', a probe array filter 814', a sample type filter 816', an operator filter 818', a sample name filter 820', an experiment filter 822', and an analysis filter 824'. FIG. 8B' also shows a filter result field that indicates the type of filter applied to the data. A query can be composed using the filters in the filter region 802'. In the presently preferred embodiment, the user can select an analysis to query and then select a range for the results.
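As an illustration only, the way several filters might jointly narrow the set of experiments can be sketched in Python. The record layout, the field names, and the `apply_filters` helper are assumptions made for this sketch and are not taken from the described embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    # Hypothetical record; the field names mirror some of the filters of FIG. 8B'.
    name: str
    project: str
    probe_array: str
    sample_type: str
    operator: str


def apply_filters(experiments, **criteria):
    """Keep experiments whose fields match every supplied criterion.

    Each criterion maps a field name to a set of accepted values.
    A field with no criterion is left unrestricted.
    """
    return [
        exp
        for exp in experiments
        if all(getattr(exp, field) in accepted for field, accepted in criteria.items())
    ]


experiments = [
    Experiment("exp1", "toxicity", "arrayA", "liver", "alice"),
    Experiment("exp2", "toxicity", "arrayB", "kidney", "bob"),
    Experiment("exp3", "pathway", "arrayA", "liver", "alice"),
]

# Combine a project filter with a sample type filter.
filtered = apply_filters(experiments, project={"toxicity"}, sample_type={"liver"})
print([exp.name for exp in filtered])
```

Treating each filter as a set of accepted values lets an empty filter list mean "no restriction", which matches the idea of progressively narrowing the data to be mined.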
[0116]
FIG. 8C' illustrates a result area, such as result area 804' of FIG. 8A', in certain embodiments according to the invention. FIG. 8C' shows a results area 804' having an experimental results table 830, a query results table 832', and a pivot results table 834'.
[0117]
FIGS. 8D'-8G' illustrate exemplary graphs in certain embodiments according to the present invention, such as those that can be displayed within graph area 806' of FIG. 8A'. These diagrams are merely examples, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 8D' shows a scatter graph of the experimental results. The scatter graph can plot any numerical result on a logarithmic or linear scale. Further, the presently preferred embodiments can provide the ability to have multiple analyses per axis. A description of the probe set is included on the right side of the graph. A hot link to an external database can also be provided, at least in the preferred embodiment according to the invention. Other options such as filters, point sizes, and colors can be specified by the user.
[0118]
FIG. 8E' illustrates a fold change graph that can be displayed in the graph area 806' of FIG. 8A' in certain embodiments according to the present invention. The fold change graph of FIG. 8E' can be presented on a logarithmic or linear scale, and certain embodiments according to the present invention can also provide probe set descriptions, hot links to an external database, and the ability to recalculate fold changes. In addition, the user can specify options such as point size and color.
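The specification does not state how a fold change is calculated. Purely as an assumption, the sketch below uses one common convention: the ratio of experiment to baseline intensity, reported as a negative reciprocal when the value falls, together with a base-2 logarithm for logarithmic display. The function names are hypothetical.

```python
import math


def fold_change(baseline, experiment):
    """Signed fold change: +r for an r-fold increase, -r for an r-fold decrease.

    This convention is an assumption for illustration, not the patent's formula.
    """
    if baseline <= 0 or experiment <= 0:
        raise ValueError("fold change needs positive intensities")
    ratio = experiment / baseline
    return ratio if ratio >= 1 else -1.0 / ratio


def log2_ratio(baseline, experiment):
    """Base-2 log of the intensity ratio, suitable for a logarithmic axis."""
    if baseline <= 0 or experiment <= 0:
        raise ValueError("log ratio needs positive intensities")
    return math.log2(experiment / baseline)


print(fold_change(100.0, 400.0), fold_change(400.0, 100.0), log2_ratio(100.0, 400.0))
```

Recalculating fold changes, as mentioned above, would then amount to re-running such a function with a different baseline analysis.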
[0119]
FIG. 8F' illustrates an exemplary bar graph in certain embodiments according to the present invention, such as may be displayed in the graph area 806' of FIG. 8A'. The bar graph of FIG. 8F' can graph any numerical result, and embodiments can provide the user with the ability to change options such as bar size and color.
[0120]
FIG. 8G' shows a typical histogram graph, such as can be displayed in the graph area 806' of FIG. 8A'. The histogram graph of FIG. 8G' may provide the ability to display various landmarks on the histogram of average differences and may provide the user with the ability to specify options such as bin size, range, color, and the like.
[0121]
FIG. 9A' shows a query display screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 9A' shows a saved named query screen 901' having a display area for a plurality of filters. The user can define filters for the system and save them under a reference name displayed by screen 901'. The filters can be stored in the data mining information database 304' for later use.
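Saving filters under a reference name for later use can be sketched as follows. The table name `saved_query`, its columns, and the JSON encoding of the filters are assumptions for illustration; the actual layout of the data mining information database is not given in this passage.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a store with a hypothetical table of named queries."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saved_query ("
        "name TEXT PRIMARY KEY, filters TEXT NOT NULL)"
    )
    return conn


def save_query(conn, name, filters):
    """Store the filters under a reference name, replacing any earlier version."""
    conn.execute(
        "INSERT OR REPLACE INTO saved_query (name, filters) VALUES (?, ?)",
        (name, json.dumps(filters, sort_keys=True)),
    )
    conn.commit()


def load_query(conn, name):
    """Return the saved filters, or None if no query has that name."""
    row = conn.execute(
        "SELECT filters FROM saved_query WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else json.loads(row[0])


conn = open_store()
save_query(conn, "liver_tox", {"project": ["toxicity"], "sample_type": ["liver"]})
print(load_query(conn, "liver_tox"))
```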
[0122]
FIG. 9B' shows an annotation screen 903' in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. Annotation screen 903' provides a mechanism for displaying information about the probe set. Annotations can include annotation text, annotation type, and other useful information. The annotation type is user-defined in the preferred embodiment. A user name and a date can also be specified. In some embodiments, other information can be specified, and in some embodiments not all of this information is specified.
[0123]
FIG. 9C' illustrates an example of a display of a probe annotation in a particular embodiment according to the present invention, such as the configuration of the annotation screen 903' of FIG. 9B'. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 9C' shows a highlighted line of information 904' for which the corresponding probe annotation 906' is displayed. Probe annotations can provide the name, description, and other useful information of the probe.
[0124]
FIG. 9D' illustrates a query annotation screen in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 9D' shows a query annotation screen 910' having fields for specifying the probe set type, annotation, user identifier, date, and description. Query annotations can provide the ability to specify multiple filters, and can also provide the ability to update annotations.
[0125]
FIG. 9E' shows a probe set description screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 9E' shows a probe set description screen 912' with the name of the probe set and the associated description. These descriptions can also be displayed in the expression data mining tool screen 801' under the results area 804'.
[0126]
FIG. 9F' illustrates a search screen for searching array descriptions in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 9F' shows a search array description screen 914' having a search field 916' for accepting input and an output field 918' for displaying the probe sets whose descriptions match the text entered in the search field. The search array description screen 914' provides the user with the ability to search for descriptions in the database. Users can use the input fields to define search criteria and add the results to various filters.
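A minimal sketch of such a description search is given below, assuming a simple case-insensitive substring match. The matching rule, the probe set names, and the `search_descriptions` helper are hypothetical; the text above does not specify how matching is performed.

```python
def search_descriptions(descriptions, text):
    """Return the names of probe sets whose description contains the text.

    `descriptions` maps a probe set name to its description. Matching is a
    case-insensitive substring test, which is an assumption of this sketch.
    """
    needle = text.casefold()
    return sorted(
        name for name, description in descriptions.items()
        if needle in description.casefold()
    )


# Hypothetical probe set names and descriptions.
descriptions = {
    "probe_set_1": "Cytochrome P450 family member",
    "probe_set_2": "Heat shock protein",
    "probe_set_3": "cytochrome c oxidase subunit",
}

matches = search_descriptions(descriptions, "Cytochrome")
print(matches)
```

The returned names could then be added to a probe set filter, in line with the last sentence of the paragraph above.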
[0127]
FIG. 10A' shows a screen for searching an external database in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 10A' shows a probe set description dialog screen 1002' with a probe set name, description, and various annotations. The user can use the probe set description dialog screen 1002' to search for information corresponding to the description in the external database. By selecting the input database in the dialog screen 1002', a browser window 1004' is displayed. Browser window 1004' allows for browsing information about gene expression sequences and the like in an external database such as an input database. In a presently preferred embodiment, a URL can be associated with a particular probe set. In addition, multiple URLs can be associated with a particular probe set, and a browser window can be automatically activated by the system to display relevant information about the probe set from an external database.
[0128]
FIG. 10B' shows a FAQ display selection screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 10B' shows a FAQ selection screen 1008' listing multiple frequently used searches. The user can perform one of the searches simply by selecting it. When a particular FAQ is selected, a dialog screen 1010' can be displayed to the user. Dialog screen 1010' presents a number of questions that the user can answer to define the selected search. In the presently preferred embodiment, the FAQ can be stored in the data mining information database 306'. Questions, English translations, and SQL statements associated with a particular query may be stored in the database along with the FAQ.
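One way to picture a FAQ stored together with its question and SQL statement is the sketch below, in which the user's answers are bound to named parameters of the stored statement. The `faq` and `result` tables, their columns, and the sample rows are hypothetical and chosen only to make the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE result (probe_set TEXT, experiment TEXT, avg_diff REAL);
    CREATE TABLE faq (title TEXT PRIMARY KEY, question TEXT, sql_text TEXT);
    """
)
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [("geneA", "exp1", 250.0), ("geneB", "exp1", 40.0),
     ("geneC", "exp1", 120.0), ("geneA", "exp2", 30.0)],
)
# The stored SQL uses named parameters that the dialog's answers fill in.
conn.execute(
    "INSERT INTO faq VALUES (?, ?, ?)",
    (
        "Highly expressed probe sets",
        "Which experiment, and what minimum average difference?",
        "SELECT probe_set FROM result "
        "WHERE experiment = :experiment AND avg_diff >= :threshold "
        "ORDER BY probe_set",
    ),
)


def run_faq(conn, title, answers):
    """Look up the stored SQL for a FAQ and run it with the user's answers."""
    row = conn.execute("SELECT sql_text FROM faq WHERE title = ?", (title,)).fetchone()
    if row is None:
        raise KeyError(title)
    return [record[0] for record in conn.execute(row[0], answers)]


hits = run_faq(conn, "Highly expressed probe sets",
               {"experiment": "exp1", "threshold": 100.0})
print(hits)
```

Binding the answers as parameters, rather than pasting them into the SQL text, keeps the stored statement reusable and avoids malformed queries from free-form input.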
[0129]
FIG. 10C' shows a gene chip transfer screen in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. FIG. 10C' shows a gene chip transfer screen 1022' having a display area 1024' for local files in multiple formats, a display area 1026' showing the data to be transferred, a status area 1028', and a LIMS sample area 1030'. The transfer screen can be used to add gene chip data to the LIMS. In a preferred embodiment, this can facilitate the association of information about samples, experiments, scan data, and results. Further, in some embodiments, a workflow simulation can be performed.
[0130]
FIG. 10D' illustrates fluidics station control screens 1031' and 1032' in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. The fluidics control screens 1031' and 1032' provide the user with the ability to control the fluidics station based on the selection of a particular experiment and protocol. The user can specify the assay type, sample project, reagent, and protocol using the fluidics control screen.
[0131]
FIG. 10E' illustrates scan control screens 1041' and 1042' for controlling scanning to a local drive or network in a particular embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. The scan control screens 1041' and 1042' provide the user with the ability to specify the experiment name, probe array type, number of scans performed, assay type, sample project, experiment, and display of the experiment being scanned.
[0132]
FIG. 10F' shows experiment information screens 1051' and 1052' in a specific embodiment according to the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. Other variations, modifications, and alternatives will be apparent to those skilled in the art. The experiment information screens 1051' and 1052' give the user the ability to specify the experiment name, probe array, probe array lot, operator, sample type, sample description, project, comments, reagents, and reagent lots.
[0133]
(Conclusion)
In conclusion, the present invention provides a method for mining experimental information on patterns selectable by a user. One advantage is that this method provides better access to gene expression information than methods known in the art. Another advantage provided by this approach is that multiple experimental results can be effectively mined using, for example, visualization techniques and set theory queries.
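The set theory queries mentioned above can be illustrated with a short sketch that combines two result gene sets. The gene names, the operation names, and the `combine` helper are assumptions for illustration only.

```python
def combine(operation, first, second):
    """Apply a named set-theoretic operation to two result gene sets."""
    operations = {
        "union": set.union,
        "intersection": set.intersection,
        "difference": set.difference,
    }
    if operation not in operations:
        raise ValueError(f"unknown operation: {operation}")
    return operations[operation](set(first), set(second))


# Hypothetical result gene sets from two separate mining queries.
liver_genes = {"geneA", "geneB", "geneC"}
kidney_genes = {"geneB", "geneC", "geneD"}

print(sorted(combine("intersection", liver_genes, kidney_genes)))
print(sorted(combine("difference", liver_genes, kidney_genes)))
```

Here the intersection would list genes found in both result sets, while the difference would list genes found only in the first.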
[0134]
The examples and embodiments described herein are for illustrative purposes only, and various modifications or changes in light thereof will be suggested to persons of ordinary skill in the art; these are to be included within the spirit and scope of this application and the scope of the appended claims. For example, tables can be dropped, the contents of multiple tables can be consolidated, or the contents of one or more tables can be distributed across more tables than described herein, in order to improve query speed and/or assist system maintenance. Also, the database architecture and data models described herein are not limited to biological applications and can be used in any application. All publications, patents, and patent applications cited herein are hereby incorporated by reference.
[Brief description of the drawings]
[FIG. 1]
A diagram illustrating an exemplary system and process of a particular embodiment according to the present invention for forming and analyzing an array of biological materials such as DNA, RNA, and the like.
[FIG. 2A]
A diagram illustrating a computer system suitable for use with the exemplary system of FIG. 1.
[FIG. 2B]
A diagram illustrating a computer network suitable for use with the exemplary system of FIG. 1.
[FIG. 3]
An entity relationship diagram for interpreting a database model.
[FIG. 4A] to [4F]
A database model of a particular embodiment according to the present invention that maintains information used in the system and method of FIG. 1.
[FIG. 5A] to [5B]
A simplified flow diagram of representative process steps of a selected embodiment according to the present invention.
[FIG. 6A] to [6F]
An exemplary block flow diagram of a particular embodiment according to the present invention.
[FIG. 7A] to [70]
Diagrams illustrating exemplary user interface screens of a particular embodiment according to the present invention.
[FIG. 1']
A diagram illustrating an overall representative system and process of a particular embodiment according to the present invention for forming and analyzing an array of biological materials such as DNA, RNA, and the like.
[FIG. 2A'] to [2B']
Diagrams illustrating a computer system of a particular embodiment according to the present invention suitable for use with the overall system of FIG. 1'.
[FIG. 3A'] to [3C']
Simplified flowcharts of representative process steps of a particular embodiment according to the present invention.
[FIG. 4A'] to [4B']
Diagrams illustrating an exemplary database structure and data format of a particular embodiment according to the present invention.
[FIG. 5A'] to [5C']
Diagrams illustrating representative automation screens of a particular embodiment according to the present invention.
[FIG. 6A'] to [6H']
Diagrams showing representative expression analysis screens of a specific embodiment according to the present invention.
[FIG. 7A'] to [7C']
Diagrams showing representative expression analysis screens for sets in a specific embodiment according to the present invention.
[FIG. 8A'] to [8G']
Diagrams illustrating representative expression data mining screens of a particular embodiment according to the present invention.
[FIG. 9A'] to [9F']
Diagrams illustrating representative annotation screens in a particular embodiment according to the present invention.
[FIG. 10A'] to [10F']
Diagrams showing representative functional screens of a specific embodiment according to the present invention.

Claims

A computer-based method for mining a plurality of experimental information for a pattern, the method comprising:
collecting information from experiments and chip designs;
selecting, from the experiments and the chip designs, those to be mined;
defining at least one of a plurality of groups for the experiments to be mined;
selecting information about the plurality of experiments to be mined based on the at least one of a plurality of groups to form a plurality of result information including at least a result gene set; and
formatting the plurality of result information for viewing by a user.

The method of claim 1, wherein the experiments to be mined are selected based on at least one of a plurality of experimental analyses.

The method of claim 1, wherein the at least one of a plurality of groups is a sample type.

The method of claim 1, wherein the at least one of a plurality of groups is a sample attribute.

The method of claim 1, wherein the plurality of groups are sample attributes having a non-hierarchical configuration.

The method of claim 1, further comprising adding an experiment to the experiments to be mined.

The method of claim 1, further comprising deleting an experiment from the experiments to be mined.

The method of claim 1, wherein the pattern is a gene pathway.

The method of claim 1, wherein the pattern is drug toxicity.

The method of claim 1, further comprising allowing a user to perform a set-theoretic operation on the result gene set.

A computer-based method for handling expression information, the method comprising:
collecting information about a plurality of results of a plurality of experiments;
combining information about samples with information about the plurality of experiments;
adding at least one of a plurality of attributes to the information about the plurality of experiments;
converting the plurality of experimental results so as to form a plurality of conversion information;
mining the plurality of conversion information; and
visualizing the plurality of conversion information.

The method of claim 11, wherein the information about the plurality of experiments comprises at least one of a plurality of experimental analyses.

The method of claim 12, wherein the at least one of a plurality of experimental analyses comprises one or more experimental analyses.

The method of claim 11, wherein the converting comprises normalizing, and wherein the conversion information comprises normalized information.

The method of claim 11, further comprising recording one or more results of the mining of the plurality of conversion information.

The method of claim 11, further comprising citing a theory regarding the conversion information.

A computer program product for mining a plurality of experimental information for a pattern, the computer program product comprising:
code for collecting information from experiments and chip designs;
code for selecting a subset of the experiments and the chip designs, wherein the subset is a plurality of experiments to be mined;
code for defining at least one of a plurality of groups for the experiments to be mined;
code for selecting information about the plurality of experiments to be mined based on the at least one of a plurality of groups so as to form a plurality of result information including at least a result gene set;
code for formatting the plurality of result information for viewing by a user; and
a computer-readable storage medium for containing the code.

The program product of claim 17, wherein the at least one of a plurality of groups is a sample type.

The computer program product of claim 17, wherein the at least one of a plurality of groups is a sample attribute.

The computer program product of claim 17, wherein the plurality of groups are sample attributes having a non-hierarchical configuration.

The computer program product of claim 17, further comprising code for adding an experiment to the experiments to be mined.

The computer program product of claim 17, further comprising code for deleting an experiment from the experiments to be mined.

The computer program product of claim 17, wherein the pattern is a gene pathway.

The computer program product of claim 17, wherein the pattern is drug toxicity.

The computer program product of claim 17, further comprising code for allowing a user to perform a set-theoretic operation on the result gene set.

A computer program product for handling expression information, the computer program product comprising:
code for collecting information about a plurality of results of a plurality of experiments;
code for combining information about samples with information about the plurality of experiments;
code for adding at least one of a plurality of attributes to the information about the plurality of experiments;
code for converting the plurality of experimental results so as to form a plurality of conversion information;
code for mining the plurality of conversion information;
code for visualizing the plurality of conversion information; and
a computer-readable storage medium for storing the code.

The computer program product of claim 26, further comprising code for citing a theory regarding the conversion information.

The computer program product of claim 26, wherein the code for converting further comprises code for normalizing, and wherein the conversion information further comprises normalized information.

A system for managing expression information, the system comprising:
a database;
a computer memory; and
a processor,
wherein the processor is operatively arranged to:
collect information about a plurality of results of a plurality of experiments;
combine information about samples with information about the plurality of experiments;
add at least one of a plurality of attributes to the information about the plurality of experiments;
convert the plurality of experimental results so as to form a plurality of conversion information;
mine the plurality of conversion information; and
visualize the plurality of conversion information.

複数のサンプルに対して行われ、それぞれがサンプル中の特定の遺伝子配列の発現の程度を表示する複数の実験に関する情報を管理するためのコンピュータ・ベースの方法であって、
前記複数のサンプルの少なくとも１つを集中データベースに登録すること、
前記複数のサンプルに関する複数の情報を追跡すること、
前記複数の実験に関する複数の情報を追跡すること、
前記複数の情報から前記複数のサンプルに関するサンプル履歴を生成すること、
複数の発現配列情報を形成するために、ユーザによって入力されたフィルタによって、前記複数の実験に関する前記複数の情報と、前記複数のサンプルに関する前記複数の情報とをフィルタ処理すること、
前記複数の発現配列情報を掲載すること、および
ユーザが前記情報にアクセスすることが可能になるように前記ユーザにウェブ・ベースのユーザ・インターフェースを提供すること
を含む方法。A computer-based method for managing information about a plurality of experiments performed on a plurality of samples, each indicating a degree of expression of a particular gene sequence in the sample, the method comprising:
Registering at least one of said plurality of samples in a centralized database;
Tracking a plurality of information about the plurality of samples;
Tracking a plurality of information about the plurality of experiments;
Generating a sample history for the plurality of samples from the plurality of information;
To form a plurality of expression sequence information, filtering the plurality of information about the plurality of experiments and the plurality of information about the plurality of samples by a filter input by a user,
A method comprising posting said plurality of expressed sequence information and providing said user with a web-based user interface to enable said user to access said information.

前記複数の実験に関する前記情報が、前記複数の実験それぞれのステータスを含む請求項３０に記載の方法。31. The method of claim 30, wherein the information about the plurality of experiments includes a status of each of the plurality of experiments.

32. The method of claim 30, wherein the information about the plurality of experiments includes a result for each of the plurality of experiments.

33. The method of claim 30, wherein the information about the plurality of experiments includes a probe array type for each of the plurality of experiments.

34. The method of claim 30, wherein the information about the plurality of experiments includes a probe array lot number for each of the plurality of experiments.

35. The method of claim 30, wherein the information about the plurality of samples includes a sample type for each of the plurality of experiments.

36. The method of claim 30, wherein the information about the plurality of samples includes a sample project for each of the plurality of experiments.

37. The method of claim 30, wherein the plurality of experiments includes at least two experiments for each sample in the plurality of samples.

38. The method of claim 30, wherein the plurality of experiments includes one experiment on at least two samples of the plurality of samples.
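
The last two dependent claims above describe the two directions of a many-to-many relation between samples and experiments: several experiments on one sample, and one experiment covering several samples. The sketch below shows one way such a relation could be indexed and checked. It assumes a hypothetical list of (sample id, experiment id) link pairs; nothing in the source prescribes this representation.

```python
from collections import defaultdict

def index_links(links):
    """Index (sample_id, experiment_id) pairs in both directions."""
    by_sample = defaultdict(set)
    by_experiment = defaultdict(set)
    for sample_id, experiment_id in links:
        by_sample[sample_id].add(experiment_id)
        by_experiment[experiment_id].add(sample_id)
    return by_sample, by_experiment

def every_sample_has_two_experiments(samples, by_sample):
    """True when each listed sample is linked to at least two experiments."""
    return all(len(by_sample.get(s, ())) >= 2 for s in samples)

def some_experiment_spans_two_samples(by_experiment):
    """True when at least one experiment is linked to two or more samples."""
    return any(len(sample_ids) >= 2 for sample_ids in by_experiment.values())
```

In a relational database the same idea would typically be a link table with one row per pair, rather than a foreign key on either side.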

A system for tracking information obtained from a plurality of gene expression sequences, the system comprising:
a server having data storage, the server being arranged to:
register at least one of the plurality of samples in a centralized database;
track a plurality of information about the plurality of samples;
track a plurality of information about the plurality of experiments;
generate a sample history for the plurality of samples from the plurality of information;
filter the plurality of information about the plurality of experiments and the plurality of information about the plurality of samples by a filter input by a user to form a plurality of expression sequence information;
post the plurality of expression sequence information; and
provide the user with a web-based user interface so that the user can access the information.

40. The system of claim 39, wherein the data storage is a GATC compliant database.

41. The system of claim 39, wherein the data storage is a plurality of relational databases.

42. The system of claim 39, further comprising a client connected to the server, the client being operatively arranged to submit a query to the data storage of the server and to receive a response from the server that includes information contained in the data storage.

43. The system of claim 42, wherein the client and the server are interconnected by an internetwork.

A method for viewing results of a plurality of experiments performed on a plurality of samples, the results being stored in at least one of a plurality of databases, the method comprising:
specifying which database to query;
submitting at least one of a plurality of queries to form a result;
viewing the result;
filtering the result by at least one of a plurality of user-specified factors of interest to form a filtered result; and
presenting the filtered result in the form of a graph.
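
The viewing method above (choose a database, query it, filter by user-specified factors, graph the result) can be sketched as follows. The sketch makes several assumptions that are not in the source: each database is a plain list of row dictionaries, a query is a Python predicate, and the graph is a text bar chart. All function and key names are hypothetical.

```python
def run_query(databases, db_name, predicate):
    """Query only the database the user specified."""
    return [row for row in databases[db_name] if predicate(row)]

def filter_by_factors(rows, **factors):
    """Keep rows whose fields equal every user-specified factor."""
    return [r for r in rows if all(r.get(k) == v for k, v in factors.items())]

def text_bar_chart(rows, label_key, value_key, width=20):
    """Render the filtered result as a text bar chart, one line per row."""
    top = max((r[value_key] for r in rows), default=0)
    lines = []
    for r in rows:
        # Scale each bar against the largest value; avoid dividing by zero.
        bar = round(width * r[value_key] / top) if top else 0
        lines.append(f"{r[label_key]:<10} {'#' * bar} {r[value_key]}")
    return "\n".join(lines)
```

Keeping the query, the filter, and the rendering as separate functions mirrors the separate steps of the claim, so the same result can be filtered again by a different factor without resubmitting the query.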

A computer program product for managing information about a plurality of experiments performed on a plurality of samples, each experiment providing an indication of a degree of expression of a particular gene sequence in the sample, the product comprising:
code for registering at least one of the plurality of samples in a centralized database;
code for tracking a plurality of information about the plurality of samples;
code for tracking a plurality of information about the plurality of experiments;
code for generating a sample history for the plurality of samples from the plurality of information;
code for filtering the plurality of information about the plurality of experiments and the plurality of information about the plurality of samples by a filter input by a user to form a plurality of expression sequence information;
code for posting the plurality of expression sequence information;
code for providing the user with a web-based user interface to enable the user to access the plurality of expression sequence information; and
a computer readable storage medium for holding the code.

46. The computer program product of claim 45, wherein the information about the plurality of experiments includes a status of each of the plurality of experiments.

47. The computer program product of claim 45, wherein the information about the plurality of experiments includes a result for each of the plurality of experiments.

48. The computer program product of claim 45, wherein the information about the plurality of experiments includes a probe array type for each of the plurality of experiments.

49. The computer program product of claim 45, wherein the information about the plurality of experiments includes a probe array lot number for each of the plurality of experiments.

50. The computer program product of claim 45, wherein the information about the plurality of samples includes a sample type for each of the plurality of experiments.

51. The computer program product of claim 45, wherein the information about the plurality of samples includes a sample project for each of the plurality of experiments.

52. The computer program product of claim 45, wherein the plurality of experiments includes at least two experiments for each sample in the plurality of samples.

53. The computer program product of claim 45, wherein the plurality of experiments includes one experiment on at least two samples of the plurality of samples.

A computer-based method for managing information about a plurality of experiments performed on a plurality of samples, each experiment providing an indication of a degree of expression of a particular gene sequence in the sample, the method comprising:
tracking information about the plurality of experiments performed on the plurality of samples to form a database of information;
analyzing results of the tracking step; and
querying the database.