WO2014112567A1 - Apparatus for classifying cell groups and method for classifying cell groups

Info

Publication number: WO2014112567A1
Application number: PCT/JP2014/050720
Authority: WO
Inventors: 真也黒田; 大樹浅野; 新介宇田; 峰夫黒川; 裕矢冨
Original assignee: 国立大学法人東京大学
Priority date: 2013-01-17
Filing date: 2014-01-16
Publication date: 2014-07-24

Abstract

This method for classifying cell groups includes: retaining population data obtained on the basis of measured data of cell group samples which have been diagnosed as normal; creating processing data, the similarity of which to the population data can be assayed, on the basis of measured data obtained about cell groups which are to be processed; assaying the similarity of the processing data to the population data retained by a retaining means; and classifying the cell groups which are to be processed into multiple predefined groups according to prescribed criteria based on the results of assays.

Description

Cell group classification device and cell group classification method

The present invention relates to a cell group classification device and a cell group classification method for classifying a cell group to be processed.

In recent years, myelodysplastic syndrome (MDS) has been analyzed using flow cytometry (FCM), and prognosis prediction models based on this analysis have been proposed (Non-Patent Document 1).

However, the conventional flow cytometry analysis described above examines only the cells that have become abnormal. In actual disease states such as MDS, it is necessary to consider not only the abnormal cells but also their relationship with the immune cells and normal cells that surround them. In other words, the conventional technique does not properly analyze the sample in relation to the actual disease state.

The present invention has been made in view of the above circumstances, and one of its objects is to provide a cell group classification device and a cell group classification method capable of performing an analysis that takes the actual disease state into account.

To solve the problems of the conventional example described above, the present invention provides a cell group classification device including: holding means for holding population data obtained based on measurement data of cell group samples diagnosed as normal; means for generating, based on measurement data obtained for a cell group to be processed, processing data whose similarity to the population data can be tested; test means for testing the similarity between the generated processing data and the population data held in the holding means; and classification means for classifying the cell group to be processed into one of a plurality of predetermined groups according to a predetermined criterion based on the result of the test.

The holding means may hold, for each predetermined age group, age-group-specific population data obtained based on measurement data of cell group samples that were provided by donors whose ages belong to that age group and that were diagnosed as normal. In that case, the test means may test the similarity between the processing data and the age-group-specific population data corresponding to the age group to which the age of the donor of the cell group to be processed belongs.

The measurement data may be a plurality of parameters obtained by flow cytometry, and the population data and the processing data may be m-dimensional distribution data (m is a natural number) obtained based on those parameters.

A cell group classification method according to one aspect of the present invention includes the steps of: acquiring population data based on measurement data of cell group samples diagnosed as normal; generating, based on measurement data obtained for a cell group to be processed, processing data whose similarity to the population data can be tested; testing the similarity between the generated processing data and the population data held in the holding means; and classifying the cell group to be processed into one of a plurality of predetermined groups according to a predetermined criterion based on the result of the test.

According to the present invention, an analysis that takes the actual disease state into account can be performed.

FIG. 1 is a block diagram showing a configuration example of a cell group classification device according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing an example of the cell group classification device according to the embodiment of the present invention.
FIG. 3 is an explanatory diagram showing an example of the known result information that the cell group classification device according to the embodiment of the present invention uses in the classification processing.
FIG. 4 is an explanatory diagram showing an example of the age-group-specific population data used by the cell group classification device according to an example of the embodiment of the present invention.
FIG. 5 is an explanatory diagram showing an example of the two-dimensional distribution data generated by the cell group classification device according to the embodiment of the present invention.
FIG. 6 is an explanatory diagram showing an outline of the adaptive partitioning processing performed by the cell group classification device according to the embodiment of the present invention.
FIG. 7 is an explanatory diagram showing an example of the density distribution function generated by the cell group classification device according to the embodiment of the present invention.
FIG. 8 is an explanatory diagram showing an example of the calculation of the likelihood ratio statistic by the cell group classification device according to the embodiment of the present invention.
FIG. 9 is an explanatory diagram showing an example of the test of the likelihood ratio statistic by the cell group classification device according to the embodiment of the present invention.
FIG. 10 is an explanatory diagram showing an example of survival curves based on the results obtained by the cell group classification device according to the embodiment of the present invention.

An embodiment of the present invention will be described with reference to the drawings. As illustrated in FIG. 1, a cell group classification device 1 according to the embodiment of the present invention includes a control unit 11, a storage unit 12, an operation unit 13, an output unit 14, and an analysis information input unit 15, and is connected to a flow cytometry instrument 2. In one example of the present embodiment, the cell group classification device 1 executes the cell group classification method according to the embodiment of the present invention.

The control unit 11 is a program-controlled device such as a CPU (Central Processing Unit) and operates according to a program stored in the storage unit 12. In the present embodiment, the control unit 11 performs processing using population data that is stored in the storage unit 12, described later, and that is obtained based on measurement data of cell group samples diagnosed as normal. Based on measurement data obtained for the cell group to be processed, the control unit 11 generates processing data whose similarity to the population data can be tested, and tests the similarity between the generated processing data and the population data. The control unit 11 then classifies the cell group to be processed into one of a plurality of predetermined groups according to a predetermined criterion based on the result of this test, and outputs information representing the classification result. The processing of the control unit 11 is described in detail later.

The storage unit 12 includes a memory, a disk device, and the like. The storage unit 12 stores population data obtained based on measurement data of cell group samples diagnosed as normal (samples of cell groups from the same tissue as the cell group to be processed). In one example of the present embodiment, the measurement data is a plurality of parameters obtained by running cell groups collected from specimens diagnosed as normal through a flow cytometry instrument, and the population data is m-dimensional distribution data (m is a natural number) obtained based on those parameters.

Specifically, in one example of the present embodiment, the population data is obtained as follows. For each cell group obtained from a plurality of specimens diagnosed as normal, flow cytometry yields measurement data for at least FSC (forward scattered light), SSC (side scattered light), and a plurality of predetermined cell surface markers (information representing fluorescence intensity). These measurements are accumulated, and the density distribution function of the population is estimated from the accumulated result. The density distribution function can be estimated by various methods, such as adaptive partitioning or kernel density estimation (Parzen E. (1962). On estimation of a probability density function and mode, Ann. Math. Stat. 33, pp. 1065-1076). In other words, the population data is, for example, the result of such a density estimation. Widely known estimation methods, including those available in various statistical software packages, can be used, so a detailed description is omitted here.
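As a rough illustration of the kernel density estimation mentioned above, the following sketch estimates a two-dimensional density from accumulated measurement points. The isotropic Gaussian kernel, the fixed bandwidth, and the function name `gaussian_kde_2d` are assumptions made for this example; the source does not specify a kernel or a bandwidth rule.

```python
import math
from typing import Callable, Sequence, Tuple

Point2D = Tuple[float, float]


def gaussian_kde_2d(samples: Sequence[Point2D],
                    bandwidth: float) -> Callable[[float, float], float]:
    """Return a density function estimated from 2D samples.

    Assumption: an isotropic Gaussian kernel with one fixed bandwidth.
    """
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    # Normalisation of a 2D Gaussian kernel, averaged over all samples.
    norm = 1.0 / (len(samples) * 2.0 * math.pi * bandwidth ** 2)

    def density(x: float, y: float) -> float:
        total = 0.0
        for sx, sy in samples:
            dx = (x - sx) / bandwidth
            dy = (y - sy) / bandwidth
            total += math.exp(-0.5 * (dx * dx + dy * dy))
        return norm * total

    return density


# Hypothetical accumulated (FSC, SSC) points from normal specimens.
normal_points = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (3.0, 4.0)]
population_density = gaussian_kde_2d(normal_points, bandwidth=0.5)
print(population_density(1.0, 2.0) > population_density(5.0, 5.0))
```

In practice a statistical package would normally be used instead of a hand-written estimator; the sketch only makes the idea concrete.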

In one example of the present embodiment, flow cytometry of each cell group obtained from a plurality of specimens diagnosed as normal yields measurement data for FSC (forward scattered light), SSC (side scattered light), and predetermined cell surface markers (information representing fluorescence intensity). For each type of cell surface marker, three sets of two-dimensional distribution data are generated: first two-dimensional distribution data with FSC on the X axis and SSC on the Y axis; second two-dimensional distribution data with FSC on the X axis and the cell surface marker measurement on the Y axis; and third two-dimensional distribution data with SSC on the X axis and the cell surface marker measurement on the Y axis. A density distribution function is estimated for each set of two-dimensional distribution data, for example by kernel density estimation, and the results are stored in the storage unit 12 as population data. Thus, when r types of cell surface markers are measured, 3r sets of two-dimensional distribution data are generated, and 3r sets of population data are held in the storage unit 12.
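The construction of the 3r two-dimensional data sets can be sketched as follows. The event layout (one dictionary per measured cell), the marker names `CD34` and `CD45`, and the function name are hypothetical choices for illustration and do not come from the source.

```python
from typing import Dict, List, Sequence, Tuple

Point2D = Tuple[float, float]


def build_2d_projections(
    events: Sequence[Dict[str, float]],
    marker_names: Sequence[str],
) -> Dict[Tuple[str, int], List[Point2D]]:
    """Return three 2D point sets per marker, keyed by (marker, 1..3)."""
    projections: Dict[Tuple[str, int], List[Point2D]] = {}
    for marker in marker_names:
        # First data set: X = FSC, Y = SSC.
        projections[(marker, 1)] = [(e["FSC"], e["SSC"]) for e in events]
        # Second data set: X = FSC, Y = marker fluorescence intensity.
        projections[(marker, 2)] = [(e["FSC"], e[marker]) for e in events]
        # Third data set: X = SSC, Y = marker fluorescence intensity.
        projections[(marker, 3)] = [(e["SSC"], e[marker]) for e in events]
    return projections


# Two hypothetical measured cells and r = 2 hypothetical markers.
events = [
    {"FSC": 1.0, "SSC": 2.0, "CD34": 0.5, "CD45": 3.0},
    {"FSC": 1.5, "SSC": 2.5, "CD34": 0.7, "CD45": 2.0},
]
proj = build_2d_projections(events, ["CD34", "CD45"])
print(len(proj))  # 6, i.e. 3r with r = 2
```

Each of the resulting point sets would then be passed to a density estimator to produce one set of population data.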

The storage unit 12 also holds test result information (known result information) for each of a plurality of cell groups whose disease course is already known. This known result information is described later.

The storage unit 12 further holds the program executed by the control unit 11. This program may be provided on a computer-readable recording medium such as a DVD-ROM and then stored in the storage unit 12. The storage unit 12 also operates as a work memory for the control unit 11.

The operation unit 13 includes a keyboard, a mouse, and the like. The operation unit 13 accepts an operation from the user and outputs information representing the content of the operation to the control unit 11. The output unit 14 is a display, a printer, or another output device, and outputs information according to instructions input from the control unit 11. The analysis information input unit 15 is an interface connected to the flow cytometry instrument 2 and outputs measurement data input from the flow cytometry instrument 2 to the control unit 11.

Next, the operation of the control unit 11 of the present embodiment will be described. The control unit 11 executes the program stored in the storage unit 12. Functionally, as shown in FIG. 2, it includes a processing data generation unit 21, a population data acquisition unit 22, a test unit 23, a classification unit 24, and a result presentation unit 25.

The processing data generation unit 21 acquires the measurement data obtained for the cell group to be processed from the flow cytometry instrument 2 via the analysis information input unit 15. The processing data generation unit 21 then generates processing data whose similarity to the population data stored in the storage unit 12 can be tested. Specifically, this processing data can be generated as follows. Based on the measurement data (a plurality of parameters) obtained by the flow cytometry instrument 2, m-dimensional distribution data is generated. This distribution data is, for example, distribution data of at least (2 + r) dimensions that contains 2 + r items of information: at least FSC (forward scattered light), SSC (side scattered light), and the measurement data for the same r predetermined types of cell surface markers that were used to derive the population data.

In the same way that the population data is obtained from the measurement data of the normal samples, the processing data generation unit 21 uses this distribution data to generate, for each of the r types of cell surface markers, first two-dimensional distribution data with FSC on the X axis and SSC on the Y axis, second two-dimensional distribution data with FSC on the X axis and the cell surface marker measurement on the Y axis, and third two-dimensional distribution data with SSC on the X axis and the cell surface marker measurement on the Y axis. It then estimates a density distribution function for each of these 3r sets of two-dimensional distribution data and outputs the results to the test unit 23 as the processing data. As with the population data, this density distribution function can be estimated by various methods such as adaptive partitioning or kernel density estimation.

The population data acquisition unit 22 reads, from the storage unit 12, the population data corresponding to each of the first to third two-dimensional distribution data for each of the r types of cell surface markers, and outputs it to the test unit 23. The test unit 23 tests the similarity between each set of two-dimensional distribution data in the processing data input from the processing data generation unit 21 (hereinafter written dp) and the corresponding two-dimensional distribution data in the population data input from the population data acquisition unit 22 (hereinafter written de). Specifically, for a pair of distribution data dp(x1, x2, ..., xm) and de(x1, x2, ..., xm) in an m-dimensional space (m is a natural number; here m = 2), the test unit 23 divides the m-dimensional space into a plurality of regions (bins) R1, R2, ... and computes, for each region Ri (i = 1, 2, ..., k), the totals Dpi and Dei of the respective data within that region.

As one example, each region Ri can be the region delimited by xs_j_min < xs_j < xs_j_max (s = 1, 2, ..., m; j = 1, 2, ...), and the regions are set so that they do not overlap one another.
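The per-bin totals can be illustrated for the two-dimensional case with a simple rectangular grid. The sketch below counts points per bin; the fixed grid, the half-open bin boundaries, and the function name `bin_totals` are assumptions for illustration (the source also mentions adaptive partitioning, which is not shown here).

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def bin_totals(points: Sequence[Tuple[float, float]],
               x_edges: Sequence[float],
               y_edges: Sequence[float]) -> List[int]:
    """Count points in each non-overlapping rectangular bin.

    Bins are half-open, [low, high), so no point falls in two bins.
    Points outside the grid are ignored. The result is flattened in
    row-major order: index = ix * (number of y bins) + iy.
    """
    nx, ny = len(x_edges) - 1, len(y_edges) - 1
    totals = [0] * (nx * ny)
    for x, y in points:
        ix = bisect_right(x_edges, x) - 1
        iy = bisect_right(y_edges, y) - 1
        if 0 <= ix < nx and 0 <= iy < ny:
            totals[ix * ny + iy] += 1
    return totals


# Hypothetical points and a 2 x 2 grid over [0, 2) x [0, 2).
sample_points = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.2, 0.1)]
print(bin_totals(sample_points, [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))
# [2, 0, 1, 1]
```

Applying the same grid to the processing data and to the population data gives the paired totals Dpi and Dei for each bin Ri.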

From the totals Dpi and Dei of the data in each region (bin) Ri (i = 1, 2, ..., k), the test unit 23 uses equation (1) to compute the likelihood ratio statistics κn_1, κn_2, and κn_3, one for each set of two-dimensional distribution data of each cell surface marker.
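Equation (1) itself is not reproduced in this text. Purely as an illustration, the sketch below assumes that the statistic has the standard log-likelihood-ratio (G statistic) form, 2 Σ Dpi ln(Dpi / Ei), where Ei is the total expected in bin Ri if the processing data followed the population distribution. This assumed form may differ from the actual equation (1); the function name and the treatment of empty bins are also assumptions.

```python
import math
from typing import Sequence


def likelihood_ratio_statistic(dp_totals: Sequence[float],
                               de_totals: Sequence[float]) -> float:
    """G statistic of processing totals against population totals.

    Assumed form: 2 * sum(Dpi * ln(Dpi / Ei)), with
    Ei = (sum of Dp) * Dei / (sum of De).
    """
    if len(dp_totals) != len(de_totals):
        raise ValueError("both inputs must use the same bins")
    n_p = sum(dp_totals)
    n_e = sum(de_totals)
    stat = 0.0
    for d_p, d_e in zip(dp_totals, de_totals):
        if d_p == 0:
            continue  # a bin with no observations contributes zero
        if d_e == 0:
            raise ValueError("population total is zero in an occupied bin")
        expected = n_p * d_e / n_e
        stat += d_p * math.log(d_p / expected)
    return 2.0 * stat


# Hypothetical bin totals: same shape as the population, then a shifted shape.
print(likelihood_ratio_statistic([10, 20, 30], [1, 2, 3]))
print(likelihood_ratio_statistic([30, 10], [1, 1]) > 0.0)
```

Under this assumed form the statistic is zero when the two binned distributions have the same shape and grows as they diverge.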

When the sample size is large, each of these likelihood ratio statistics κn_1, κn_2, and κn_3 asymptotically follows a chi-square distribution with k-1 degrees of freedom, and its absolute value becomes larger the more the two distributions differ. In this example, for each of the r types of cell surface markers, the test unit 23 obtains the likelihood ratio statistics κn_1, κn_2, and κn_3 from each pair of corresponding two-dimensional distribution data among the first to third two-dimensional distribution data. The test unit 23 then sums κn_1, κn_2, and κn_3 for each marker, obtaining r likelihood ratio statistics κn, one per cell surface marker.
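The accumulation step can be sketched as a per-marker sum that yields one value κn per marker, arranged in a fixed marker order. The marker names and numeric values below are hypothetical, as is the function name.

```python
from typing import Dict, List, Sequence, Tuple


def accumulate_per_marker(
    per_projection: Dict[str, Tuple[float, float, float]],
    marker_order: Sequence[str],
) -> List[float]:
    """Sum the three per-projection statistics of each marker.

    Returns the r accumulated statistics in the given marker order,
    so the result can be used directly as an r-dimensional vector.
    """
    return [sum(per_projection[marker]) for marker in marker_order]


# Hypothetical statistics (kn_1, kn_2, kn_3) for r = 2 markers.
stats = {
    "CD34": (1.0, 2.0, 3.5),
    "CD45": (0.5, 0.25, 0.25),
}
print(accumulate_per_marker(stats, ["CD34", "CD45"]))  # [6.5, 1.0]
```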

The known result information stored in the storage unit 12 will now be described. As illustrated in FIG. 3, the known result information associates, for each of a plurality of cell groups whose disease course is already known, information (I) representing the course of the disease with the likelihood ratio statistic κn. In this example of the present embodiment, the likelihood ratio statistic κn is obtained for each of the r types of cell surface markers. The classification unit 24 performs its processing using this known result information.

That is, the classification unit 24 classifies the cell group to be processed into one of a plurality of predetermined groups according to a predetermined criterion based on the calculation result of the test unit 23. Specifically, the classification unit 24 treats the likelihood ratio statistics κn for the r types of cell surface markers generated from the processing data, and the likelihood ratio statistics κn for the r types of cell surface markers of each of the cell groups included in the known result information, as r-dimensional vectors whose elements are the values of κn, and performs clustering on these vectors. Various widely known clustering methods can be used, such as the furthest-neighbor (complete-linkage) method, which uses the distances between the r-dimensional vectors, or the median method.
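A minimal sketch of furthest-neighbor (complete-linkage) agglomerative clustering over such r-dimensional vectors is shown below. Stopping at a fixed number of clusters, the Euclidean distance, and all numeric values are assumptions for illustration; the source does not state how the number of groups is chosen.

```python
import math
from typing import List, Sequence


def complete_linkage(vectors: Sequence[Sequence[float]],
                     n_clusters: int) -> List[int]:
    """Cluster vectors by complete linkage and return a label per vector."""
    clusters = [[i] for i in range(len(vectors))]

    def cluster_distance(c1: List[int], c2: List[int]) -> float:
        # Complete linkage: distance between the two furthest members.
        return max(math.dist(vectors[a], vectors[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for member in members:
            labels[member] = label
    return labels


# Hypothetical statistic vectors (r = 2): four known cell groups, then the
# cell group being processed as the last entry.
known = [[1.0, 1.0], [1.2, 0.9], [9.0, 10.0], [10.0, 9.0]]
target = [1.1, 1.0]
labels = complete_linkage(known + [target], n_clusters=2)
print(labels[-1] == labels[0])  # True: the target joins the first two groups
```

The disease courses of the known cell groups that share a cluster with the target are what the device would then present.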

By this clustering, for example, the classification unit 24 divides the cell group to be processed and the cell groups of the known result information into a plurality of groups. The result presentation unit 25 outputs, via the output unit 14, information on the cell groups of the known result information that belong to each of the groups obtained by the classification unit 24 (for example, information representing the course of the disease), together with information indicating the group into which the likelihood ratio statistic κn generated from the processing data was classified (that is, the group into which the cell group to be processed was classified).

The result presentation unit 25 may further use the classification result to present survival curves (Kaplan-Meier survival curves) for the cell groups of the known result information that belong to each group.
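For reference, the Kaplan-Meier estimate for one group can be computed as in the following sketch. The follow-up times, the censoring flags, and the function name are hypothetical; the source does not describe how the curves are computed or drawn.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 observed: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each observed event time.

    observed[i] is True when the event occurred at durations[i] and
    False when follow-up ended without the event (censored).
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up (months) for the known cases in one group.
months = [5, 8, 12, 12, 20]
event_seen = [True, False, True, True, False]
for time, prob in kaplan_meier(months, event_seen):
    print(time, round(prob, 3))
```

Computing one such curve per group allows the groups produced by the classification to be compared.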

The present embodiment has the configuration described above and operates as follows. A cell group collected from a subject is measured using the flow cytometry instrument 2. When the cell group classification device 1 acquires the measurement data obtained for the cell group to be processed from the flow cytometry instrument 2, it generates m-dimensional distribution data as the processing data based on this measurement data (a plurality of parameters).

The cell group classification device 1 reads, from the storage unit 12, the population data corresponding to each of the first to third two-dimensional distribution data for the r types of cell surface markers. It then tests the similarity between each set of two-dimensional distribution data in the generated processing data dp and the corresponding two-dimensional distribution data in the population data de read from the storage unit 12. Specifically, the test is performed as follows. For a pair of distribution data dp(x1, x2, ..., xm) and de(x1, x2, ..., xm) in an m-dimensional space (here m = 2), the cell group classification device 1 divides the m-dimensional space into a plurality of regions (bins) R1, R2, ... and computes, for each region Ri (i = 1, 2, ..., k), the totals Dpi and Dei of the respective data within that region. From these totals Dpi and Dei, the cell group classification device 1 computes the likelihood ratio statistic by equation (1). In this example, for each of the r types of cell surface markers, it computes the likelihood ratio statistics κn_1, κn_2, and κn_3 for each pair of corresponding two-dimensional distribution data among the first to third two-dimensional distribution data.

Having obtained the likelihood ratio statistics κn_1, κn_2, and κn_3 for each of the r types of cell surface markers, the cell group classification device 1 sums them for each marker to obtain r likelihood ratio statistics κn, one per cell surface marker.

Next, the cell group classification device 1 performs clustering by a widely known method, using the r-dimensional vector whose elements are the likelihood ratio statistics κn for the r types of cell surface markers generated from the processing data, and the r-dimensional vectors whose elements are the likelihood ratio statistics κn for the r types of cell surface markers of each of the cell groups included in the known result information stored in the storage unit 12.

The cell group classification device 1 then outputs, via the output unit 14, information on the cell groups of the known result information that belong to each of the groups (for example, information representing the course of the disease), together with information indicating the group into which the likelihood ratio statistic κn generated from the processing data was classified (that is, the group into which the cell group to be processed was classified).

Furthermore, in the cell group classification device 1 of the present embodiment, the measurement data of the cell group samples diagnosed as normal may be divided according to the age group of the donor of each sample. In that case, as illustrated in FIG. 4, the storage unit 12 may hold information representing each age group (information that can be specified by the minimum and maximum ages belonging to the age group) in association with the population data generated from the measurement data for that age group (age-group-specific population data).

In this case, the control unit 11 accepts the age of the donor of the cell group to be processed from the operation unit 13 or the like. The population data acquisition unit 22 acquires the age-group-specific population data corresponding to the age group to which the accepted age belongs and outputs it to the test unit 23. The test unit 23 then tests the similarity between the acquired age-group-specific population data and the processing data obtained from the cell group to be processed.
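The age-based selection can be sketched as a lookup over records that hold a minimum age, a maximum age, and the associated population data. The age boundaries, the placeholder strings standing in for population data, and the function name are all hypothetical.

```python
from typing import Any, Sequence, Tuple

# Each record: (minimum age, maximum age, age-group-specific population data).
AgeGroupRecord = Tuple[int, int, Any]


def select_population_data(age: int,
                           age_groups: Sequence[AgeGroupRecord]) -> Any:
    """Return the population data of the age group containing `age`."""
    for min_age, max_age, population in age_groups:
        if min_age <= age <= max_age:
            return population
    raise LookupError(f"no age group covers age {age}")


# Hypothetical age groups; the strings stand in for stored density data.
age_groups = [
    (0, 39, "population_data_0_39"),
    (40, 64, "population_data_40_64"),
    (65, 120, "population_data_65_120"),
]
print(select_population_data(52, age_groups))  # population_data_40_64
```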

In the description so far, the cell group classification device 1 uses, as the distribution data based on the measurement data and as the population data, 3r sets of two-dimensional data derived from m-dimensional distribution data (m = r + 2 when r types of cell surface markers are used). However, the present embodiment is not limited to this. For example, r sets of three-dimensional data (distribution data) may be obtained with FSC on the X axis, SSC on the Y axis, and each cell surface marker on the Z axis, and the test unit 23 may test the similarity between the distribution data based on the measurement data and the population data for the corresponding cell surface marker. Distribution data containing data for a plurality of cell surface markers may also be used. Furthermore, in the present embodiment, the m-dimensional distribution data may be processed as it is. Specifically, in that case, population data corresponding to the m-dimensional distribution data for the r types of cell surface markers is generated in advance and held in the storage unit 12.

The cell group classification device 1 tests the similarity between the m-dimensional distribution data of the generated processing data dp and the m-dimensional distribution data of the population data de read from the storage unit 12. This test is the same as the one already described: for a pair of distribution data dp(x1, x2, ..., xm) and de(x1, x2, ..., xm) in the m-dimensional space, the cell group classification device 1 divides the m-dimensional space into a plurality of regions (bins) R1, R2, ... and computes, for each region Ri (i = 1, 2, ..., k), the totals Dpi and Dei of the respective data within that region. From these totals Dpi and Dei, the cell group classification device 1 computes the likelihood ratio statistic κn by equation (1).

When such processing is performed, the known result information likewise associates one likelihood ratio statistic κn with each cell group, based on the m-dimensional distribution data. The cell group classification device 1 then performs clustering by a widely known method (for example, the furthest neighbor method, also called complete linkage) using this known result information and the likelihood ratio statistic κn generated from the processing data.
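As a rough sketch of the furthest neighbor (complete linkage) clustering mentioned above, the following pure-Python function clusters scalar statistics. The function name, the choice to stop at a fixed number of clusters, and the sample values are assumptions for illustration; the source does not specify how the dendrogram is cut.

```python
def complete_linkage(values, n_clusters):
    """Agglomerative clustering of scalar values with complete linkage.

    Returns a list of clusters, each a list of indices into `values`.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(abs(values[i] - values[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical values: known statistics followed by one newly computed statistic.
kappas = [0.10, 0.12, 0.15, 0.80, 0.90]
groups = complete_linkage(kappas, 2)
```

In this usage, the last entry stands for the statistic of the cell group being processed, and the cluster it falls into determines its group.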

In this example as well, as illustrated in FIG. 4, the storage unit 12 may hold information indicating an age group (information from which the minimum and maximum ages belonging to the age group can be identified) in association with population data generated from the measurement data for that age group (per-age-group population data).

In this case, the control unit 11 accepts the age of the donor of the cell group to be processed from the operation unit 13 or the like, and the population data acquisition unit 22 acquires the per-age-group population data corresponding to the age group to which the accepted age belongs and outputs it to the test unit 24. The test unit 24 then tests the similarity between the acquired per-age-group population data and the processing data obtained from the cell group to be processed.
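The age-based lookup can be pictured with a short sketch. The table layout of `(min_age, max_age, population_data)` entries and the function name are assumptions for illustration; the source only states that each age group is identifiable by its minimum and maximum ages.

```python
def select_population(age, table):
    """Return the population data whose age range contains `age`.

    table: list of (min_age, max_age, population_data) entries with
           inclusive bounds (hypothetical layout).
    """
    for min_age, max_age, population in table:
        if min_age <= age <= max_age:
            return population
    raise LookupError(f"no age group covers age {age}")

# Hypothetical age groups; the strings stand in for stored population data.
table = [(0, 39, "population_under_40"), (40, 64, "population_40_to_64"),
         (65, 120, "population_65_plus")]
selected = select_population(52, table)
```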

According to the present embodiment, the variation among cell types is quantified and analyzed, covering not only abnormal cells but also the other types of cells around them that interact with the abnormal cells, so an analysis that takes the realistic disease state into account can be performed. In addition, generating population data for each age group allows the analysis to take into account that the variation among cell types changes with age.

That is, the cell group classification device 1 according to the embodiment of the present invention performs classification without gating to extract specific cells. It uses the variation in the distribution of cell surface marker signal intensity across all collected cells and quantifies how this differs from the variation in the distribution of cell surface marker signal intensity obtained from cell groups diagnosed as normal. Prognosis prediction is also made possible based on the correlation between this quantification result and prognosis.

Next, an example of the present invention will be described. The following example uses CD34 and CD41a as cell surface markers for data from patients with myelodysplastic syndrome (MDS).

First, 40 samples of cell groups that had been diagnosed as normal in advance, taken from the same tissue as the cell group to be processed, were prepared. For each sample, measurement data for FSC (forward scatter), SSC (side scatter), and the fluorescence intensity of the cell surface marker CD34 were obtained with a flow cytometer. From these data, first two-dimensional distribution data with FSC on the X axis and SSC on the Y axis, second two-dimensional distribution data with FSC on the X axis and the cell surface marker measurement data on the Y axis, and third two-dimensional distribution data with SSC on the X axis and the cell surface marker measurement data on the Y axis were generated.

The two-dimensional distribution data generated here are obtained as follows. FSC, SSC, and the fluorescence intensity of the cell surface marker (measurement data) are assigned to the axes of an orthogonal three-dimensional coordinate system, and the measured data are plotted as three-dimensional distribution data. This three-dimensional distribution is then projected onto the plane containing the FSC and SSC axes, the plane containing the FSC axis and the cell surface marker axis, and the plane containing the SSC axis and the cell surface marker axis. Each projection is a density plot representing the number of data points at each point on the plane. Specifically, as shown in FIG. 5, the second and third two-dimensional distribution data and so on are obtained from the three-dimensional distribution data. The corresponding two-dimensional distribution data of the samples diagnosed as normal were then accumulated (the density values at the same point were added) to obtain accumulated first to third two-dimensional distribution data.
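The projection and accumulation steps might look like the sketch below. The fixed `bin_width` used to discretize each axis, the function names, and the use of `Counter` objects as density plots are assumptions for the example; the source does not state the grid resolution.

```python
from collections import Counter

def project_events(events, bin_width):
    """Project (fsc, ssc, marker) events onto three 2D density plots.

    Returns three Counters keyed by integer grid cell, in the order
    (FSC, SSC), (FSC, marker), (SSC, marker).
    """
    first, second, third = Counter(), Counter(), Counter()
    for fsc, ssc, marker in events:
        # Discretize each measurement onto the grid (bin_width is an assumption).
        f, s, m = (int(v // bin_width) for v in (fsc, ssc, marker))
        first[(f, s)] += 1
        second[(f, m)] += 1
        third[(s, m)] += 1
    return first, second, third

def accumulate(per_sample_plots):
    """Add the density values at the same point across samples."""
    total = Counter()
    for plot in per_sample_plots:
        total.update(plot)  # Counter.update adds counts rather than replacing them
    return total
```

Running `project_events` on each normal sample and passing, say, all of the first plots to `accumulate` would yield the accumulated first two-dimensional distribution data.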

Next, adaptive partitioning was applied to each of the accumulated first to third two-dimensional distribution data to estimate a density distribution function. The adaptive partitioning method proceeds as follows. As illustrated in FIG. 6, the two-dimensional distribution data to be processed is virtually divided into 2 × 2 mutually congruent regions, and whether the densities in the divided regions are equal to one another (first hypothesis) is tested by a chi-square test (S1). Similarly, the two-dimensional distribution data to be processed is virtually divided into 4 × 4 mutually congruent regions, and whether the densities in the divided regions are equal to one another (second hypothesis) is tested by a chi-square test (S2).

If the first or second hypothesis is rejected by the chi-square test, the two-dimensional distribution data being processed is split into 2 × 2 mutually congruent regions, two-dimensional distribution data is generated for each region (S3), and steps S1, S2, and S3 are repeated recursively with each generated set of two-dimensional distribution data as the processing target (S4).

On the other hand, if neither the first nor the second hypothesis for the two-dimensional distribution data being processed is rejected by the chi-square test in steps S1 and S2, that two-dimensional distribution data is not divided (S5).

For a region of two-dimensional distribution data that is not to be divided, the density is regarded as uniform, and the values of all points in the region are replaced by the average of the density values in that region. The processing is repeated until every region has been determined not to be divided. The results (density distribution functions) obtained by this processing for the first to third two-dimensional distribution data were used as the population data for CD34. The same processing was performed for the cell surface marker CD41a to obtain population data for CD41a. In the data obtained in this way, as outlined in FIG. 7, the distribution is smoothed and noise in the data is reduced.
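Steps S1 to S5 and the final averaging can be sketched as a recursive function. This is only an illustration under several assumptions that the source does not state: the density plot is a square grid whose side is a power of two, the chi-square tests use a 5% significance level (critical values 7.815 for 3 degrees of freedom and 24.996 for 15), and the 4 × 4 test is skipped for regions smaller than 4 × 4. All names are hypothetical.

```python
CHI2_CRIT = {3: 7.815, 15: 24.996}  # 5% critical values for df = 3 and df = 15 (assumed level)

def block_sums(grid, r0, c0, size, parts):
    """Sums of the parts x parts congruent sub-blocks of a square region."""
    step = size // parts
    return [sum(grid[r][c]
                for r in range(r0 + i * step, r0 + (i + 1) * step)
                for c in range(c0 + j * step, c0 + (j + 1) * step))
            for i in range(parts) for j in range(parts)]

def rejects_uniform(sums):
    """Chi-square test of the hypothesis that all sub-blocks have equal density."""
    total = sum(sums)
    if total == 0:
        return False  # an empty region is treated as uniform
    expected = total / len(sums)
    stat = sum((s - expected) ** 2 / expected for s in sums)
    return stat > CHI2_CRIT[len(sums) - 1]

def adaptive_partition(grid, r0=0, c0=0, size=None):
    """Recursively split a square grid and flatten undivided regions in place."""
    if size is None:
        size = len(grid)
    # S1: test equality of densities over a 2 x 2 division.
    split = size >= 2 and rejects_uniform(block_sums(grid, r0, c0, size, 2))
    # S2: test equality of densities over a 4 x 4 division.
    if not split and size >= 4:
        split = rejects_uniform(block_sums(grid, r0, c0, size, 4))
    if split:
        # S3, S4: divide into 2 x 2 regions and recurse into each.
        half = size // 2
        for dr in (0, half):
            for dc in (0, half):
                adaptive_partition(grid, r0 + dr, c0 + dc, half)
    else:
        # S5: do not divide; replace every point by the region's mean density.
        mean = block_sums(grid, r0, c0, size, 1)[0] / (size * size)
        for r in range(r0, r0 + size):
            for c in range(c0, c0 + size):
                grid[r][c] = mean
    return grid
```

Because each undivided region is replaced by its own mean, the total of the grid is preserved while small fluctuations inside a region are smoothed out.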

Next, for a cell group obtained from a patient suffering from a tumor, measurement data for FSC (forward scatter), SSC (side scatter), and the fluorescence intensity of the cell surface marker CD34 were obtained with a flow cytometer. From these data, first two-dimensional distribution data (a density plot) with FSC on the X axis and SSC on the Y axis, second two-dimensional distribution data (a density plot) with FSC on the X axis and the cell surface marker measurement data on the Y axis, and third two-dimensional distribution data (a density plot) with SSC on the X axis and the cell surface marker measurement data on the Y axis were generated.

Next, adaptive partitioning was applied to each of these first to third two-dimensional distribution data to estimate a density distribution function. The adaptive partitioning here is the same as the method used to generate the population data, so its description is not repeated.

For the cell group obtained from this patient, the measurement data for the fluorescence intensity of the cell surface marker CD41a were processed in the same way, and density distribution functions for the first to third two-dimensional distribution data for CD41a were obtained.

Then, for the two-dimensional distribution data dp(x1, x2, …, x6) based on the cell group obtained from the patient (data x1 to x3 for CD34 and data x4 to x6 for CD41a) and the population data de(x1, x2, …, x6) (likewise, data x1 to x3 for CD34 and data x4 to x6 for CD41a), each two-dimensional space was divided into a plurality of regions (bins) R1, R2, …, and the sums Dpi and Dei (i = 1, 2, …, k) of the density data in each region (bin) Ri (i = 1, 2, …, k) were computed. Each region Ri was defined, for example, as the region bounded by xj_min < xj < xj_max (j = 1, 2, …), and the regions were set so as not to overlap one another.

Next, using the sums Dpi and Dei of the density data in each region (bin) Ri (i = 1, 2, …, k) for each of the first to third two-dimensional distribution data, the likelihood ratio statistics κn_1, κn_2, …, κn_6 for the first to third two-dimensional distribution data of CD34 and of CD41a were computed by equation (1).

FIG. 8 shows an example of computing the likelihood ratio statistics κn_2 and κn_3 from the density distribution functions based on the second and third two-dimensional distribution data for CD34 of the cell group obtained from the patient (the left and right panels in the upper row) and the density distribution functions based on the second and third two-dimensional distribution data for CD34 of the population data (the left and right panels in the lower row). Here, κn_2 = 0.12 and κn_3 = 0.153.

FIG. 9 shows an example in which the above processing was performed for each of 59 patients. For each patient, it shows the distribution of the likelihood ratio statistics κn_2 and κn_3 obtained from the density distribution functions based on the second and third two-dimensional distribution data for CD34, together with the likelihood ratio statistics κn_5 and κn_6 obtained from the density distribution functions based on the second and third two-dimensional distribution data for CD41a. In the graph of FIG. 9, the horizontal axis represents the patient number (1 to 59) and the vertical axis represents the likelihood ratio statistic.

Some patients had a relatively large median of this distribution (the median of κn_2, κn_3, κn_5, and κn_6). Using 0.5 as the threshold, the patients were classified into two clusters: those whose median exceeds 0.5 and those whose median is 0.5 or less. Hereinafter, the cluster with a median exceeding 0.5 is called Group 1, and the cluster with a median of 0.5 or less is called Group 2.
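The grouping rule above reduces to a median comparison, sketched below. The function name is hypothetical, and apart from 0.12 and 0.153 (the values reported for FIG. 8) the sample values are made up for illustration.

```python
from statistics import median

def classify(kappas, threshold=0.5):
    """Return 1 (Group 1) if the median of the statistics exceeds the threshold, else 2."""
    return 1 if median(kappas) > threshold else 2

# kappa_n_2, kappa_n_3, kappa_n_5, kappa_n_6 for one hypothetical patient.
group = classify([0.12, 0.153, 0.20, 0.30])
```

A median exactly equal to 0.5 falls into Group 2, matching the "0.5 or less" wording.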

FIG. 10 shows survival curves (Kaplan-Meier survival curves) representing the overall survival rate for each of Group 1 and Group 2 described above. When the difference between the survival curves shown in FIG. 10 (horizontal axis: years; vertical axis: overall survival rate) was tested by the generalized Wilcoxon test, p = 0.0408, which was found to be significant at the 5% level. The median survival time was 1.88 years for Group 1 and 4.66 years for Group 2. The 5-year survival rate was 30% for Group 1 and 43.7% for Group 2.

From this, it can also be seen that the likelihood ratio statistic correlates with prognosis. This statistic is the result of performing classification using the variation in the distribution of cell surface marker signal intensity across all collected cells and quantifying its difference from the variation in the distribution of cell surface marker signal intensity obtained from cell groups diagnosed as normal. That is, by performing the above processing for other patients as well, it becomes possible to determine which of the groups with different prognoses a patient belongs to, according to whether the median of the likelihood ratio statistics exceeds 0.5.

Claims

A cell group classification device comprising:
holding means for holding population data based on measurement data of cell group samples diagnosed as normal;
means for generating, based on measurement data obtained for a cell group to be processed, processing data whose similarity to the population data can be tested;
testing means for testing the similarity between the generated processing data and the population data held in the holding means; and
classification means for classifying the cell group to be processed into one of a plurality of predetermined groups according to a predetermined criterion based on the result of the test.
The cell group classification device according to claim 1, wherein
the holding means holds, for each predetermined age group, per-age-group population data obtained based on measurement data of cell group samples that were provided by donors whose ages belong to that age group and that were diagnosed as normal, and
the testing means tests the similarity between the processing data and the per-age-group population data corresponding to the age group to which the age of the donor of the cell group to be processed belongs.
The cell group classification device according to claim 1 or 2, wherein
the measurement data are a plurality of parameters obtained by flow cytometry, and the population data and the processing data are m-dimensional (m is a natural number) distribution data obtained based on those parameters.
A cell group classification method comprising:
a step of acquiring population data based on measurement data of cell group samples diagnosed as normal;
a step of generating, based on measurement data obtained for a cell group to be processed, processing data whose similarity to the population data can be tested;
a step of testing the similarity between the generated processing data and the population data held in the holding means; and
a step of classifying the cell group to be processed into one of a plurality of predetermined groups according to a predetermined criterion based on the result of the test.