WO2005048185A1 - Transductive neuro fuzzy inference method for personalised modelling - Google Patents

Transductive neuro fuzzy inference method for personalised modelling

Info

Publication number
WO2005048185A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
output
input
predicting
rationalised
Application number
PCT/NZ2004/000290
Other languages
French (fr)
Inventor
Nikola Kirilov Kasabov
Qun Song
Original Assignee
Auckland University Of Technology
Application filed by Auckland University Of Technology filed Critical Auckland University Of Technology
Publication of WO2005048185A1 publication Critical patent/WO2005048185A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/043 Architecture, e.g. interconnection topology based on fuzzy logic, fuzzy membership or fuzzy inference, e.g. adaptive neuro-fuzzy inference systems [ANFIS]

Definitions

  • A genetic algorithm (GA) is run on a population of TNFI models for different values of the weights, over several generations.
  • As a fitness function, the root mean square error (RMSE) of a trained model on the training data or on validation data is used.
  • The GA runs over generations of populations, and standard operations are applied, such as binary encoding of the genes (the weights), a roulette-wheel selection criterion, and multi-point crossover.
  • The model with the least error is selected as the best one, and its chromosome, the vector of weights [q1, q2, ..., qp], defines the optimum normalisation ranges for the input variables.
  • TNFIP is applied on the Mackey-Glass (MG) time series prediction task.
  • The following GA parameter values are used: for each input variable, the values from 0.16 to 1 are mapped onto a 4-bit string; the number of individuals in a population is 12; the mutation rate is 0.001; the termination criterion (the maximum number of GA generations) is 100; and the root mean square error (RMSE) on the training data is used as the fitness function.
  • The resulting weight values, the training RMSE and the testing RMSE are shown in Table 2.
  • TNFIP results with the same parameters and the same training and testing data, but without optimisation of the normalisation weights, are also shown in Table 2.
  • Zadeh-Mamdani rules, e.g.:
    IF x1 has a membership degree of 0.68 to a Gaussian function with a center at 0.7 and a standard deviation of 0.2
    and x2 has a membership degree to a Gaussian function with a center at 0.5 and a standard deviation of 0.12
    and x3 has a membership degree of 0.68 to a Gaussian function with a center at 0.14 and a standard deviation of 0.02
    and x4 has a membership degree to a Gaussian function with a center at 0.87 and a standard deviation of 0.2
    THEN y has a membership degree of 0.78 to a Gaussian function with a center at 0.83 and a standard deviation of 0.18, with 10 vectors being in this cluster.
  • The TNFIP is used to develop an application-oriented methodology for medical decision support systems. It is presented here through a case example: personalised (individualised) modelling for the evaluation of the renal function of patients in a renal clinic. Real data is used, and the developed TNFIP system is currently considered for use in a clinical environment.
  • the accurate evaluation of renal function is fundamental to sound nephrology practice.
  • the early detection of renal impairment will allow for the institution of appropriate diagnostic and therapeutic measures, and potentially maximise preservation of intact nephrons.
  • GFR Glomerular filtration rate
  • Screat (serum creatinine) is a metabolite that is filtered by the kidneys, with the residual released into the blood. The creatinine level in the serum is determined by the rate at which it is removed by the kidney and is therefore also a measure of kidney function.
  • Surea (serum urea) is a substance produced in the liver as a means of disposing of ammonia from protein metabolism. It is filtered by the kidney and can be reabsorbed into the bloodstream.
  • Salb (serum albumin) is the protein with the highest concentration in plasma. Decreased serum albumin may result from kidney disease, which allows albumin to escape into the urine. Decreased albumin may also be explained by malnutrition or liver disease.
  • The TNFIP method is applied for the prediction of the GFR of each new patient, where a modified Takagi-Sugeno type of fuzzy rule is used: the output function is of the MDRD type, but the coefficients are calculated for every individual patient (personalised model) with the use of the TNFIP method.
  • Results produced by the MDRD formula (a global regression model), the MLP (a globally trained connectionist model) and DENFIS (a global model that is a set of adaptive local models), all inductive reasoning systems, along with the results produced by using the transductive WKNN method, are also listed in the table.
  • The leave-one-out training-simulating tests were performed for each model on the data set, and Table 3 lists the results, including RMSE (root mean square error), MAE (mean absolute error) and Rn (the number of rules, nodes or neurons) used in each model.
  • MLP: Number of neurons in the hidden layer: 10; Learning algorithm: Levenberg-Marquardt BP algorithm.
  • DENFIS: Dthr (distance threshold): 0.15; MofN: 6; Learning epochs: 60.
  • The TNFIP system gives the best accuracy of GFR evaluation for each individual patient and overall, for the whole data set. No optimisation of the variable normalisation weights was applied (the transformation functions were assumed constant).
  • Variant 2: Using weighted normalisation for the input variables.
  • A personalised model is derived for each patient, and the input variables are weighted according to their importance for the prediction of the output for this patient. This is illustrated in Table 6 for a randomly selected single patient (one sample from the GFR data).
  • Fuzzy rules (six rules) are extracted from this personalised model, as shown in Table 7, that best describe the prediction rules for the area of the problem space where the new input vector is located. Table 7: The fuzzy rules extracted from the personalised model for the person's data from Table 6.
  • TNFIC Transductive Neuro-Fuzzy Inference Method for Classification
  • the TNFIC classifies a data set into a number of classes in the n-dimensional input space.
  • the system is a multi-input multi-output type fuzzy inference system optimized by a steepest descent algorithm (BP).
  • The fuzzy rules that constitute the system can be of Zadeh-Mamdani type, of Takagi-Sugeno type, or any non-linear function.
  • Initial variable normalisation weighting functions f1, f2, ..., fp are defined for the input variables x1, x2, ..., xp to represent their importance for the new input vector xq.
  • Search in the training data set in the input space to find the Nq training examples that are closest to xq.
  • The value for Nq can be pre-defined based on experience, or optimised through the application of an optimisation procedure. Here we assume the former approach.
  • The l-th rule has the form of:
  • The steepest descent algorithm (BP) is then used to obtain the formulas for the optimisation of the parameters nql, δql, αlj, mlj and σlj of the TNFIC such that the value of E from Eq. (29) is minimised.
  • Input variables: j = 1, 2, ..., P;
  • Example 1: TNFIC for the Classification of the Iris Data Set with Optimisation of the Variable Normalisation Weights
  • TNFIC classification results with the same parameters and the same training and testing data, but without variable weight normalisation, are also shown in Table 8. From the results, we can see that the weight of the first variable is much smaller than the weights of the other variables. The weights show the importance of the variables, and the least important variables can be removed from the input for some particular new input vectors. The same experiment is repeated without the first (least important) input variable, and the results improve, as shown in Table 8. If another variable is removed, leaving a total of 2 input variables, the test error increases, so it can be assumed that for this particular ECMC model the optimum number of input variables is 3. For different new input vectors, the normalisation weights of the input variables will differ, pointing to the different importance of these variables for the classification (or prediction) of every new input vector located in a particular part of the problem space.
  • Zadeh-Mamdani rules, e.g.: IF x2 has a membership degree of 0.68 to a Gaussian function with a center at 0.7 and a standard deviation of 0.2 (x2 has an importance factor of 0.5), and x3 has a membership degree to a Gaussian function with a center at 0.5 and a standard deviation of 0.12 (x3 has an importance factor of 0.92), and x4 has a membership degree of 0.68 to a Gaussian function with a center at 0.14 and a standard deviation of 0.02 (x4 has an importance factor of 1), THEN y has a membership degree of 0.78 of belonging to class 2, defined by a Gaussian function with a center at 0.83 and a standard deviation of 0.18, with 10 vectors being in this cluster.
  • The problem used here is mortgage approval, where applicants are defined by 8 input variables: character (0 - doubtful; 1 - good); total asset; equity; mortgage loan; budget surplus; gross income; debt servicing ratio; and term of loan; and one output variable: decision (0 - disapprove; 1 - approve).
  • TNFIC models are created in a leave-one-out mode for every single sample in the data set of 91 samples, and the results are presented in Table 9. The results are compared with those obtained with the use of ECF and MLP as inductive methods.
  • A personalised decision support model is developed for every applicant that best makes the decision for that applicant, and the input variables are weighted to show their importance in this applicant's personalised model. This is illustrated in Table 10, where two of the rules that comprise the personalised decision model are shown:
  • Table 10. A personalised decision model for a loan applicant, the input variables in this model weighted through TNFI, and two of the Zadeh-Mamdani fuzzy rules that comprise the model.
  • Input vector of a randomly selected person, comprising the expression values of the 11 genes selected by M. Shipp: [341 275 20 20 725 237 314 20 20 62.6 192]
  • Outcome correctly predicted by the personalised TNFIC model: Class 2 (died within 5 years).
  • Transductive reasoning is not practical in the case of large data sets D (e.g. millions of data samples) and large numbers of variables (e.g. thousands).
  • A large data set D* defined on a large number of variables V* is transformed into several clusters of data samples, each cluster defining its own list of variables, so that for every new vector xi only the data from the cluster to which xi belongs is used as the data set D (see the general TNFI method), on a much smaller number of variables.
  • the method consists of the following steps:

Abstract

The invention provides a prediction system (100) configured to predict an output from a test input. The system includes a data transformation module configured to transform at least some of the input data to obtain a set of normalised data (110). A rationalising module is configured to apply a rationalising function to the set of normalised data to obtain a set of rationalised input data (115) and rationalised expected output data. A clustering module is configured to apply a clustering function to the set of rationalised data (115). A set of rules (125) is maintained in computer memory. An optimiser module (130) is configured to apply a transformation to the rules (125) based at least partly on the results of the clustering function. A decoder (135) is configured to transform a series of outputs and an output layer (140) is configured to display a set of outputs.

Description

TRANSDUCTIVE NEURO FUZZY INFERENCE METHOD FOR PERSONALISED MODELLING
FIELD OF INVENTION
The invention relates to a Transductive Neuro Fuzzy Inference Method and Uses for Personalised Modelling.
BACKGROUND OF THE INVENTION
Most of the learning models and systems currently available in artificial intelligence are global models, covering the whole problem space. Such models include regression functions, multilayer perceptron neural networks, ANFIS neuro-fuzzy inference systems, and so on. These models are usually difficult to update on new data without using the old data previously used to derive them; they do not take into account the partial information contained in a new vector when they are recalled on it; and they do not take into account the importance of the different variables in different parts of the problem space. Overall, creating a global model (function) that is valid for the whole problem space is a difficult task, and in most cases it is not necessary to solve it. These global models are usually derived through inductive learning methods.
In some connectionist and also fuzzy inference systems, a global model is learned that consists of many local models (e.g., rules representing clusters of data) that collectively cover the whole space and are adjusted incrementally on new data. The output for a new vector is calculated based on the activation of one or several neighbouring local models (rules). Such systems are the evolving connectionist systems.
The inductive learning and inference approach is useful when a global model ("the big picture") of the problem is needed even in its very approximate form. In some models (e.g. ECOS) it is possible to apply incremental, on-line learning to adjust this model on new data and trace its evolution. Unfortunately, despite these advances, the inductive global learning process is less suitable where personalised modelling is required, for example, in clinical and medical applications of learning systems. This problem is particularly acute in determining individual outcomes, diagnoses and treatment regimes for medical decision support systems. In such applications, the focus is not on the global model, but on the individual patient. It is not so important what the global error of a global model over the whole problem space is, but rather - the accuracy of prediction for an individual patient.
Transductive inference systems and methods have been devised to address this problem by estimating a function at a single point of the search space only. For a new input vector that needs to be processed for a prognostic task, the closest examples that form a data subset are derived from an existing data set and/or generated from an existing model. A new model is dynamically created from this subset to approximate the function at the new input vector. An example of these models is the k-nearest neighbour method, where for every new input vector v, the closest k vectors from a training (existing) data set are chosen and the predicted output for the new vector is calculated based on the outputs of these k examples.
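As an illustration, this k-nearest neighbour baseline can be sketched as follows (Python; the function and parameter names are illustrative, not from the specification):

```python
import numpy as np

def knn_transductive_predict(X_train, y_train, x_new, k=5):
    """Predict the output for x_new from the outputs of its k closest
    training vectors (the simple transductive baseline described above)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # distance to every stored example
    nearest = np.argsort(dists)[:k]                  # indices of the k closest vectors
    return y_train[nearest].mean()                   # predicted output for x_new
```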
Unfortunately, currently available transductive inference methods and systems have one or more of the following disadvantages:
(1) The models do not estimate the importance factors for the input variables in every part of the problem space where a new vector is located; it is known that for different groups of patients, for example old versus young, or male versus female, some input variables are more important than others, and if this is taken into account, a more accurate output value can be calculated for the new input vector.
(2) The models created are opaque, making it difficult to explain in terms of rules a predicted value (or a class) for an input vector, thus limiting the explanation power of the systems.
(3) The models fail to work well in a large dimensional space of many variables (for example, as required in bioinformatics applications where thousands of genes, proteins and/or clinical variables are present).
(4) The models assume in general that the data is mapped by a linear function. Such models fail to predict non-linear data sufficiently accurately; and
(5) The models are inflexible, making them unsuitable for applications where there are different numbers of variables or where the data is characterised by missing values.
It is therefore an object of the present invention to provide a transductive inference method and system with variable importance evaluation that overcome the above-mentioned difficulties, or that at least provide the public with a useful choice.
SUMMARY OF THE INVENTION
In a first aspect the present invention provides a method for predicting an output from a test input, comprising the steps of: receiving a set of input data having expected output data; applying a transformation to at least some of the input data to obtain a set of normalised data; applying a rationalising function to the set of normalised data to obtain a set of rationalised input data and rationalised expected output data; applying a clustering function to the set of rationalised data; applying a transformation to a set of rules based at least partly on the results of the clustering function; evaluating the accuracy of the rationalised expected output data; and generating output data.
In broad terms in another aspect the present invention comprises a prediction system configured to predict an output from a test input, the system comprising: a data transformation module configured to transform at least some of the input data to obtain a set of normalised data; a rationalising module configured to apply a rationalising function to the set of normalised data to obtain a set of rationalised input data and rationalised expected output data; a clustering module configured to apply a clustering function to the set of rationalised data; a set of rules maintained in computer memory; an optimiser module configured to apply a transformation to the rules based at least partly on the results of the clustering function; a decoder configured to transform a series of outputs; and an output layer configured to display a set of outputs. The present invention also extends in a still further aspect to a neural network module for carrying out the steps of the first aspect.
The present invention also provides a method for predicting an output from a test input x comprising at least the following steps: a) provide a set D of known global inputs and expected outputs of the used variables; b) select relevant input variables and initialise importance factors for the input variables for the new input vector x (local importance factors); c) perform a transformation of the problem space into a reduced and normalised variable space based on weighted variable normalisation that reflects the local importance of the input variables for the area of the new input vector, thus producing a normalised data set D'; d) rationalise the said set D' to produce a new rationalised local set D'x of inputs and expected outputs that are closely related to the test input x in the variable importance space; e) cluster and partition the rationalised set D'x in the weighted variable normalisation problem space using a clustering algorithm; f) set the initial parameters of the classification/prediction model based on fuzzy rules according to the results of the clustering and the partitioning in steps (c) and (d); g) optimise the variable normalisation weights and the parameters of the fuzzy rules for the model based on the accuracy measured for the data set; h) iterate the process from step c) above until a maximum accuracy model is produced; and i) calculate the output for the provided test input by applying fuzzy inference over the fuzzy rules.
The present invention also provides a system for predicting an output from a test input comprising at least the following: a) an input device for receiving a test input x; b) a storage and retrieval medium to provide a set D of known global and previously stored inputs and expected outputs; c) a variable selection and data transformation module that transforms data from the original space to a weighted variable normalisation space by performing a transformation of the problem space into a normalised variable space based on weighted variable normalisation that reflects the local importance of the input variables for the area of the new input vector, thus producing a normalised data set D'; d) a rationalising module that produces a new rationalised set D'x of inputs and expected outputs that are closely related to the test input x from set D' in the weighted variable normalisation space; e) a clustering module comprising a clustering algorithm for clustering and partitioning the rationalised set D'x; f) a fuzzy rule creation module that creates fuzzy rules and sets initial parameters for the fuzzy rules according to the results of clustering and partitioning in step e); g) an optimising module that optimises the variable normalising weights and the parameters of the fuzzy rules for the model based on the accuracy measured for the data set and feeds back models into the data transformation module of step c) until the accuracy of the model is adequate; h) a fuzzy decoder for calculating the output for the provided test input by applying fuzzy inference over the fuzzy rules for a weighted input vector based on the local importance factors; and i) an output device for outputting the output result from the fuzzy decoder and for analysing the fuzzy rules and clusters.
BRIEF DESCRIPTION OF THE DRAWING
Preferred forms of the method and system of the invention will now be described with reference to the accompanying figures in which:
Figure 1 shows a schematic diagram of the main components of an embodiment of the invention; and
Figure 2 shows case study data associated with the invention.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 shows a schematic diagram of preferred aspects of one form of the invention. The system 100 includes an input layer to which known inputs and expected outputs are passed.
In the above aspects, the set of known inputs and expected outputs are preferably stored in a database maintained in computer memory. In one embodiment, the outputs are membership classes. In a medical context, the membership class may, for example, be a class of patient permitting a classification of a patient or a condition.
In an alternative embodiment, the output is one or more data values or vectors. In medical applications, typical input values are clinical and/or gene patient-specific data and outputs are preferably selected from the group consisting of membership of a group of patients, a risk of an event, a clinical variable not easily directly measured e.g. glomerular filtration rate, a prognostic outcome, a diagnostic outcome, a suggested treatment or treatment regime. In business decision support applications, typical input variables and their corresponding output variables would be: records of applicants for a bank loan and the decision (grant, or don't grant the loan); a set of economic variables and a predicted economic state; a set of financial indexes and a predicted value for an index; etc.
Transformation could be performed on the input data to obtain a set of normalised data, shown as normalised inputs 110. The system could assign an importance factor, or set of importance factors to one or more of the inputs. In one embodiment, the importance factors are normalisation weights. In such a case, the variable importance space is weighted variable normalisation space. The inputs having normalisation weights exceeding a threshold importance factor are selected for subsequent transformation and rationalisation.
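A minimal sketch of this weighted normalisation and threshold-based variable selection, assuming simple min-max scaling of each variable into [0, q_j] (the function names and the threshold parameter are illustrative):

```python
import numpy as np

def weighted_normalise(X, q):
    """Linearly map each variable j into [0, q_j]: a larger normalisation
    weight q_j gives the variable a wider interval, hence more influence
    on distances in the weighted variable normalisation space.
    Assumes no column of X is constant."""
    X_min, X_max = X.min(axis=0), X.max(axis=0)
    return np.asarray(q) * (X - X_min) / (X_max - X_min)

def select_variables(q, threshold):
    """Keep only the inputs whose normalisation weight exceeds the threshold."""
    return np.where(np.asarray(q) > threshold)[0]
```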
Normalised variable space is also known as variable importance space. Variable normalisation weights are also known as local importance factors.
In one embodiment, importance factors in local space may be initialised by assuming they are equal to importance factors determined using a global model, such as an inductive fuzzy neural network. In an alternate embodiment, the importance factors are all initialised to be equal to 1. The model produced from the optimisation is a set of rules.
A rationalising function could then be applied to the normalised inputs 110 to obtain a set of rationalised data, shown as rationalised inputs 115.
The rationalisation process may be in the form of human selection of relevant data based on experience. Alternatively, the rationalisation process may be a computational method as suggested in this invention. A simple example of such a computational method is the k-nearest neighbour (k-NN) transductive decision method described in Mitchell, T.M. (1997). Machine Learning. McGraw-Hill, and Vapnik, V. (1998). Statistical Learning Theory. John Wiley & Sons, Inc.
The rationalised set of inputs and expected outputs that are closely related to the test input represents a sub-set of the original data set that is more closely related to the test input than the original set according to a measured similarity. If desired, the rationalised set can be selected after the initial data set is transformed into the variable importance space through the normalisation procedure and the distance is measured between the new input vector and all data vectors. One embodiment selects criteria for data selection by selecting a minimum number of N closest vectors to the new vector, but all samples that differ from each other by less than 10% are also included.
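One possible reading of this selection criterion, sketched in Python (n_closest and dup_tol are illustrative parameter names; the 10% rule is interpreted here as a distance tolerance in the normalised space):

```python
import numpy as np

def rationalise(X_norm, x_new, n_closest, dup_tol=0.10):
    """Select the n_closest vectors nearest to the new input in the variable
    importance space, then also admit any remaining sample lying within
    dup_tol of an already selected one."""
    d = np.linalg.norm(X_norm - x_new, axis=1)
    order = np.argsort(d)
    chosen = list(order[:n_closest])
    for idx in order[n_closest:]:
        if min(np.linalg.norm(X_norm[idx] - X_norm[c]) for c in chosen) < dup_tol:
            chosen.append(idx)
    return np.asarray(chosen)
```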
The rationalised set can also be selected based on the distance between the new input vector and the data samples in the original input space, after which importance factors are calculated with the use of methods from the art, such as correlation analysis (for prediction models) and signal-to-noise ratio (for classification models). The rationalised set is then transformed into a set in the new space, the variable importance space.
A clustering function could then be applied to create a set of clustered inputs 120. The clustering and partitioning of the rationalised set may be accomplished by using any suitable clustering algorithm in the art, but in a preferred embodiment, clustering and partitioning is performed in the local weighted variable normalisation problem space based on the importance factors. The currently preferred algorithm is ECM, described in Kasabov, N. and Song, Q. (2002). "DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time-series prediction." IEEE Trans. on Fuzzy Systems 10(2): 144-154, which is hereby incorporated by reference.
The system also maintains a set of rules, shown as fuzzy rules 105. The process of creating fuzzy rules may be undertaken separately from the clustering and partitioning or may be undertaken in the same process. The currently preferred ECM algorithm provides the requisite process as part of the partitioning and clustering process.
The system includes an optimiser 130 that is configured to apply a transformation, for example an optimising transformation, to the rules based at least partly on the clustering function described above. The parameters of the fuzzy rules may be optimised by any objective evaluation method that determines the fitness of the data. The currently preferred method is to determine an overall error for the fitness of the data. Optimisation occurs by: (1) changing the weighted normalisation intervals (importance) for the input variables; and (2) changing the parameters of the fuzzy rules, using in both cases error-minimising algorithms in the art. The currently preferred algorithm for optimisation is a steepest descent algorithm. However, there are other well-established algorithms available in the art suitable for application in the practice of the present invention. Following optimisation, control could be passed back to obtain normalised inputs 110, rationalised inputs 115 and clustered inputs 120.
The output from the system and method is calculated using fuzzy decoding algorithms in the art specific for the fuzzy rules used. A decoder 135 applies a fuzzy decoding algorithm. Outputs are then passed to an output layer 140.
For the new input vector, a set of fuzzy rules that represent the rationalised data set is presented along with their activation for the new input vector, thus providing explanation facilities and transparency of the solution.
Typically, the problem space transformation module based on weighted variable normalisation, the clustering module, the fuzzy rule creation module, the optimising module, and the fuzzy decoder form part of a computer implemented neural network 145 comprising an input transformation layer comprising one or more input nodes arranged to receive and normalise input data; a rule base layer comprising one or more rule nodes; an output layer comprising one or more output nodes; and an adaptive component arranged to aggregate two or more selected rule nodes in the rule base layer based on the input data.
The system and the method are preferably dynamic multi-input multi-output neural-fuzzy inference systems and methods, respectively, with a local generalization, in which a fuzzy inference engine is used, for example the Zadeh-Mamdani engine described in Zadeh, L.A. (1965). "Fuzzy Sets." Information and Control 8: 338-353, or the Takagi-Sugeno engine described in Takagi, T. and Sugeno, M. (1985). "Fuzzy identification of systems and its applications to modeling and control." IEEE Trans. on Systems, Man, and Cybernetics 15: 116-132. The local generalization means that in a sub-space of the whole problem space (a local area) a model is created that performs generalization in this area. Gaussian fuzzy membership functions may be applied in each fuzzy rule, for both the antecedent and the consequent parts or for the antecedent part only. A BP (back-propagation) learning algorithm may be used for optimizing the parameters of the fuzzy membership functions. However, other learning algorithms may be employed. An additional learning function may be derived for use in the model.
The distance between vectors x and y is preferably measured in the weighted variable normalisation space as the normalised Euclidean distance, defined as follows:

$$\left\| x - y \right\| = \frac{1}{q} \sqrt{\sum_{i=1}^{q} \left( x_i - y_i \right)^2} \qquad (4)$$

where x, y ∈ Rq.
For example, the distance between two data samples x = (0.3, 0.7) and z = (0.2, 0.4) (all variables being in the range [0, 1]) in the original data space is 0.1581, due to the large difference in the values of variable x2. If the variable importance factors (normalisation weights) are q1 = 0.9 and q2 = 0.3 for the two variables x1 and x2 respectively, then the transformed vectors are x' = (0.27, 0.21) and z' = (0.18, 0.12), so the distance between the two vectors in the variable importance space, applying the same formula (4), is now 0.06, because the difference between the values for x2 is no longer so important (importance 0.3, compared to the importance 0.9 of variable x1). As a partial case, an importance weight for an input variable can be 0 or close to 0, which indicates that this variable is not selected in the local model used to calculate the output value for the new input vector.
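The worked example above can be checked with a short sketch of formula (4) (illustrative code, not from the specification):

```python
import numpy as np

def weighted_distance(x, y, q):
    """Normalised Euclidean distance (4) between two samples after each
    variable is scaled by its importance weight q_j."""
    x, y, q = map(np.asarray, (x, y, q))
    return np.linalg.norm(q * x - q * y) / x.size

print(weighted_distance([0.3, 0.7], [0.2, 0.4], [1.0, 1.0]))  # ~0.1581 (original space)
print(weighted_distance([0.3, 0.7], [0.2, 0.4], [0.9, 0.3]))  # ~0.0636 (importance space)
```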
To partition the weighted variable normalisation space for creating fuzzy rules and obtaining their initial values, an ECM (Evolving Clustering Method) may be applied, such as that described in Kasabov, N. and Song, Q. (2002). "DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time-series prediction." IEEE Trans. on Fuzzy Systems 10(2): 144-154, and the cluster centres and cluster radii are respectively taken as the initial values of the centres and the widths of the Gaussian membership functions. The data in a cluster may be used for creating a linear output function.
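A minimal sketch of this initialisation step, assuming ECM has already produced one centre vector and one scalar radius per cluster (the rule representation shown is an illustrative assumption):

```python
import numpy as np

def rules_from_clusters(centres, radii):
    """Seed one fuzzy rule per ECM cluster: each cluster centre becomes the
    vector of Gaussian centres m_l, and the cluster radius becomes the
    (shared) initial width sigma_l of the membership functions."""
    return [{"m": np.asarray(c), "sigma": max(float(r), 1e-3)}
            for c, r in zip(centres, radii)]
```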
The invention is described below by reference to generic methods and specific application methodologies and systems.
1. Transductive Neuro-Fuzzy Inference Method for Prediction (TNFIP): Training and Simulating Methods
We assume that the TNFIP (Transductive Neuro-Fuzzy Inference Method for Prediction) system is given an input vector xi. The following steps are implemented:
1. Define initial variable normalisation weighting functions f1, f2, ..., fp for the input variables x1, x2, ..., xp to represent their importance for the new input vector xi. In one implementation the initial values are calculated as f1 = f2 = ... = fp, i.e. all variables are of equal importance for the new input vector xi. In another implementation, the functions f1, f2, ..., fp are different linear weighted normalisation functions, so that fj(xj) = qj (xj - xjmin) / (xjmax - xjmin), j = 1, 2, ..., p, which is a linear normalisation of the variable j into the interval [0, qj]. The more important a variable is, the larger its normalisation interval will be and the more it will influence the distance measure between data samples in the transformed space.
2. Transform the initial problem space {x1, x2, ..., xp} into the weighted variable normalisation space {f1, f2, ..., fp}, where all data samples are transformed according to these functions (as a partial case, a function fj is equivalent to its weight qj). The functions (the weights) are subject to optimisation over iterations.
3. Search in the training data set in the transformed space to find the Ni training examples that are closest to xi. The value for Ni can be pre-defined based on experience, or optimised through the application of an optimisation procedure. Here we assume the former approach.
4. Calculate the distances dj, j = 1, 2, ..., Ni, between each of these data samples and xi.
5. Calculate the weights wj = 1 + (min(d) - dj), j = 1, 2, ..., Ni, where min(d) is the minimum value in the distance vector d = [d1, d2, ..., dNi].
6. Use ECM (or another clustering algorithm) to cluster and partition the input sub-space that consists of the Ni selected training examples.
7. Create fuzzy rules and set their initial parameter values according to the results of the ECM clustering procedure.
8. Optimise the parameters of the fuzzy rules following Eqs. (5)-(23).
9. Apply steps 2-6 above for a certain number of iterations (training epochs; the number can be either pre-defined or optimised), thus optimising the parameters of the fuzzy rules in the local model Mi based on the minimum least-square error.
10. Modify the transformation functions f1, f2, ..., fp to optimise them based on the minimum least-square error. Repeat steps 2 to 10 until an optimum set of functions and optimum model parameters are obtained.
11. Calculate the output value yi for the input vector xi by applying fuzzy inference over the set of fuzzy rules that constitute the local model Mi.
12. End of the procedure.
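Steps 3 to 5 of this procedure can be sketched as follows (illustrative Python; X_norm is assumed to hold the training data already mapped into the weighted variable normalisation space):

```python
import numpy as np

def neighbour_weights(X_norm, x_new, n_i=32):
    """Steps 3-5: find the N_i training examples closest to the new vector in
    the transformed space and weight each one as w_j = 1 + (min(d) - d_j),
    so the nearest example gets weight 1 and farther ones slightly less."""
    d = np.linalg.norm(X_norm - x_new, axis=1)
    nearest = np.argsort(d)[:n_i]   # indices of the N_i closest examples
    d_sel = d[nearest]
    return nearest, 1.0 + (d_sel.min() - d_sel)
```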
The objective function and the TNFIP training-simulation procedure are described below:
Consider the system having P inputs (x1, x2, ..., xP) and T outputs (y1, y2, ..., yT). Suppose that it has M fuzzy rules, defined initially through the ECM clustering procedure, and that the l-th rule has the form:

Rl: IF x1 is Fl1 and x2 is Fl2 and ... xP is FlP, THEN y1 is G1l, y2 is G2l, ..., yT is GTl; l = 1, 2, ..., M (Zadeh-Mamdani type) (5)

or:

Rl: IF x1 is Fl1 and x2 is Fl2 and ... xP is FlP, THEN y1 is n1l, y2 is n2l, ..., yT is nTl; l = 1, 2, ..., M (Takagi-Sugeno type) (6)

Here, the Flj are fuzzy sets defined by the following Gaussian type membership function:

$$F_{lj}(x_j) = \alpha_{lj} \exp\left( -\frac{(x_j - m_{lj})^2}{2\sigma_{lj}^2} \right) \qquad (7)$$

and the Gql are of a similar type to the Flj and are defined as:

$$G_{ql}(y_q) = \exp\left( -\frac{(y_q - n_{ql})^2}{2\delta_{ql}^2} \right) \qquad \text{(for Zadeh-Mamdani type)} \qquad (8)$$

or:

$$n_{ql} = b_{ql0} + b_{ql1}\, x_1 + \ldots + b_{qlP}\, x_P \qquad \text{(for Takagi-Sugeno type)} \qquad (9)$$
Using the Modified Centre Average defuzzification procedure, the output values of the system are calculated as follows:

$$f_q(x_i) = \frac{\sum_{l=1}^{M} \dfrac{n_{ql}}{\delta_{ql}^{2}} \prod_{j=1}^{P} \alpha_{lj} \exp\left(-\dfrac{(x_{ij} - m_{lj})^2}{2\sigma_{lj}^2}\right)}{\sum_{l=1}^{M} \dfrac{1}{\delta_{ql}^{2}} \prod_{j=1}^{P} \alpha_{lj} \exp\left(-\dfrac{(x_{ij} - m_{lj})^2}{2\sigma_{lj}^2}\right)} \qquad \text{(for Zadeh-Mamdani type)} \qquad (10)$$

or:

$$f_q(x_i) = \frac{\sum_{l=1}^{M} n_{ql} \prod_{j=1}^{P} \alpha_{lj} \exp\left(-\dfrac{(x_{ij} - m_{lj})^2}{2\sigma_{lj}^2}\right)}{\sum_{l=1}^{M} \prod_{j=1}^{P} \alpha_{lj} \exp\left(-\dfrac{(x_{ij} - m_{lj})^2}{2\sigma_{lj}^2}\right)} \qquad \text{(for Takagi-Sugeno type)} \qquad (11)$$

The TNFIP model minimises the following objective (error) function:

$$E = \frac{1}{2} \sum_{i=1}^{N} \sum_{q=1}^{T} w_i \left[ f_q(x_i) - y_{iq} \right]^2 \qquad (12)$$
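As reconstructed above, equation (10) can be sketched for a single output as follows (the rule dictionary layout and all names are illustrative assumptions):

```python
import numpy as np

def tnfip_output(x, rules):
    """Modified centre average defuzzification, eq. (10), for one output:
    each rule contributes its consequent centre n, weighted by 1/delta^2
    times the product of its antecedent Gaussian membership degrees."""
    num = den = 0.0
    for r in rules:
        phi = np.prod(r["alpha"] * np.exp(-(x - r["m"]) ** 2 / (2 * r["sigma"] ** 2)))
        num += r["n"] / r["delta"] ** 2 * phi
        den += 1.0 / r["delta"] ** 2 * phi
    return num / den

# Example: two rules over two normalised inputs.
rules = [dict(m=np.array([0.7, 0.5]), sigma=np.array([0.2, 0.12]),
              alpha=np.array([1.0, 1.0]), n=0.8, delta=0.18),
         dict(m=np.array([0.2, 0.3]), sigma=np.array([0.15, 0.1]),
              alpha=np.array([1.0, 1.0]), n=0.3, delta=0.2)]
print(tnfip_output(np.array([0.6, 0.45]), rules))
```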
The steepest descent algorithm (BP) is then used to obtain formulas (13)-(17) for the optimisation of the parameters nql, δql, mlj, αlj and σlj of the Zadeh-Mamdani type TNFIP such that the error function E from (12) is minimised; each formula has the generic gradient-descent form θ(k+1) = θ(k) - ηθ ∂E/∂θ. [Equations (13)-(17) were rendered as images in the source and are not individually recoverable.]

The steepest descent algorithm (BP) is also used to obtain formulas (18)-(23) for the optimisation of the parameters nql (i.e. the coefficients bql0, ..., bqlP), mlj, αlj and σlj of the Takagi-Sugeno type TNFIP such that the error function E from (12) is minimised. For example, the update for the constant term of the consequent function is:

$$b_{ql0}(k+1) = b_{ql0}(k) - \eta_b \sum_{i=1}^{N} w_i \,\Phi_l(x_i) \left[ f_q^{(k)}(x_i) - y_{iq} \right] \qquad (18)$$

where Φl(xi) denotes the normalised antecedent activation of rule l for sample xi. [Equations (19)-(23) were likewise rendered as images and are not individually recoverable.] Here ηn, ηδ, ηm, ηα and ησ are the learning rates for updating the parameters nql, δql, mlj, αlj and σlj respectively.
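Since the explicit update formulas are not reproduced here, the following sketch approximates one steepest descent step numerically; it stands in for, and is not identical to, the analytic updates (13)-(23) (all names are illustrative):

```python
import numpy as np

def descent_step(theta, error_fn, lr=0.01, eps=1e-6):
    """One steepest descent update theta <- theta - lr * dE/dtheta, with the
    gradient of the weighted error E approximated by finite differences."""
    grad = np.zeros_like(theta)
    e0 = error_fn(theta)
    for i in range(theta.size):
        probe = theta.copy()
        probe[i] += eps
        grad[i] = (error_fn(probe) - e0) / eps
    return theta - lr * grad
```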
In the TNFI training-simulating algorithm, the following indexes are used:
• Training data samples: i = 1, 2, ..., N;
• Input variables: j = 1, 2, ..., P;
• Output variables: q = 1, 2, ..., T;
• Fuzzy rules: l = 1, 2, ..., M;
• Training epochs: k = 1, 2, ...
Explanation rules can be extracted that apply to the new input vector x and explain the prognosis for this vector in the form of:
(1) Zadeh-Mamdani Rules, e.g.
IF x1 has a membership degree of 0.68 to a Gaussian function with a center at 0.7 and a standard deviation of 0.2 (x1 has an importance factor of 0.9) and x2 has a membership degree to a Gaussian function with a center at 0.5 and standard deviation of 0.12 (x2 has an importance factor of 0.3) THEN y has a membership degree of 0.9 to a Gaussian function with a center at 0.8 and a standard deviation of 0.18, with 15 vectors being in this cluster.
(2) Takagi Sugeno rules, e.g.:
IF x1 has a membership degree of 0.68 to a Gaussian function with a center at 0.7 and a standard deviation of 0.2 (x1 has an importance factor of 0.9) and x2 has a membership degree to a Gaussian function with a center at 0.5 and standard deviation of 0.12 (x2 has an importance factor of 0.3) THEN y is calculated as y = −0.17 + 0.73x1 + 0.58x2 (with 15 vectors being in this cluster).

2. TNFIP Methodology for Time Series Modelling and Prediction
The TNFIP modelling method is used here as part of a methodology for modelling and predicting future values of time series. The methodology is presented through a case study problem of building transductive models for the prediction of the Mackey-Glass (MG) time series data set, which has been used as a benchmark problem in the areas of neural networks, fuzzy systems and hybrid systems. This time series is created with the use of the MG time-delay differential equation defined below:

dx(t)/dt = 0.2 x(t − τ) / (1 + x^10(t − τ)) − 0.1 x(t) (24)
To obtain values at integer time points, the fourth-order Runge-Kutta method was used to find the numerical solution to the above MG equation. Here we assume that: the time step is 0.1; x(0) = 1.2; τ= 17; and x(t) = 0 for t < 0. The task is to predict the values x(t + 85) from input vectors [x(t — 18), x(t — 12), x(t - 6), x(t)] for any value of the time t. For the purpose of a comparative analysis, we also trained other connectionist models applied for inductive inference on the same task. These models are MLP and DENFIS.
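A numerical sketch of this integration is given below; holding the delayed term constant within each Runge-Kutta step is a simplification assumed here, and the function name is illustrative.

```python
import numpy as np

def mackey_glass(n_steps, dt=0.1, tau=17.0, x0=1.2):
    # Integrate dx/dt = 0.2*x(t-tau)/(1 + x(t-tau)**10) - 0.1*x(t) with a
    # fourth-order Runge-Kutta step; x(t) = 0 for t < 0 and x(0) = x0.
    delay = int(round(tau / dt))
    x = np.zeros(n_steps + 1)
    x[0] = x0

    def f(xt, x_lag):
        return 0.2 * x_lag / (1.0 + x_lag ** 10) - 0.1 * xt

    for i in range(n_steps):
        lag = x[i - delay] if i >= delay else 0.0
        k1 = f(x[i], lag)
        k2 = f(x[i] + 0.5 * dt * k1, lag)
        k3 = f(x[i] + 0.5 * dt * k2, lag)
        k4 = f(x[i] + dt * k3, lag)
        x[i + 1] = x[i] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x
```

Sampling x at integer values of t (every 10 steps for a step of 0.1) then yields the input vectors [x(t − 18), x(t − 12), x(t − 6), x(t)] and the targets x(t + 85) used in the experiments.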
Variant 1:
The following experiment was conducted: 200 data points, for t = 4001 to 4200, are extracted from the time series and used as training data; the following 200 data points, for t = 4201 to 4400, are used as simulating data, so that for each simulating data sample a local TNFIP model is created and tested on that sample. Figure 2 displays the target data 200, including training data 205 and simulating data 210. Table 1 lists the prediction results obtained by using the TNFIP method and two other popular methods, MLP (multilayer perceptron) and DENFIS (Dynamic Neuro-Fuzzy Inference System), in terms of RMSE (root mean square error) and MAE (mean absolute error) on the simulating data, as well as the number Rn of rules, rule nodes or neurons used in each model.
Table 1. Simulating results on MG data (no optimisation of the variable normalisation weights)
[Table 1 is rendered only as an image in the source document.]
The following parameter values were used in the models:
MLP: number of neurons in the hidden layer: 16; learning algorithm: Levenberg-Marquardt BP; learning epochs: 100.
DENFIS: Dthr (distance threshold): 0.15; MofN: 4; learning epochs: 60.
TNFIP: Ni: 32; Dthr: 0.20; learning epochs for weight and parameter optimisation for each input vector: 60.
The TNFIP transductive reasoning system performs better than the other, inductive reasoning models. This is a result of the fine-tuning of each local model in TNFIP for each simulated example, derived according to the TNFIP learning procedure. The finely tuned local models achieve better local generalisation.
In the example above, constant transformation functions were used, i.e. f1 = x1, f2 = x2, ..., fP = xP. In the example below we apply optimisation of the weighted normalisation functions with the use of a genetic algorithm (GA), following the steps below.

Variant 2:
1) A GA is run on a population of TNFI models with different values of the weights, over several generations. As a fitness function, the root mean square error (RMSE) of a trained model on the training data or on validation data is used. The GA runs over generations of populations, and standard operations are applied, such as binary encoding of the genes (weights); a roulette-wheel selection criterion; and a multi-point crossover operation.
2) The model with the least error is selected as the best one, and its chromosome (the vector of weights [q1, q2, ..., qP]) defines the optimum normalisation ranges for the input variables.
3) Variables with small weights are removed from the feature set, and the steps above are repeated to find the optimum, minimal set of variables for a particular problem and a particular TNFI model.

The above method is illustrated as follows. TNFIP is applied on the Mackey-Glass (MG) time series prediction task. The following GA parameter values are used: for each input variable, the values from 0.16 to 1 are mapped onto a 4-bit string; the number of individuals in a population is 12; the mutation rate is 0.001; the termination criterion (the maximum number of GA generations) is 100; and the root mean square error (RMSE) on the training data is used as the fitness function. The resulting weight values, the training RMSE and the testing RMSE are shown in Table 2. For comparison, TNFIP results with the same parameters, the same training data and testing data, but without optimisation of the normalisation weights, are also shown in Table 2 (a code sketch of this GA search follows the discussion of Table 2).
Table 2: Comparison between TNFIP without and with optimisation of the variable normalisation weights
[Table 2 is rendered only as an image in the source document.]
With the use of the method, better prediction results are obtained with a significantly smaller number of evolved rule nodes (clusters). This is because better clustering is achieved when different variables are normalised differently, with the normalisation reflecting their importance.
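A sketch of the GA weight search of steps 1 to 3, with the parameter values quoted above, is given below. The fitness_fn callback, which must train a TNFI model with the candidate normalisation weights and return its RMSE on the training data, is an assumption of the sketch and is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
BITS = 4                    # bits per weight, as in the MG experiment
LO, HI = 0.16, 1.0          # weight range mapped onto each bit string

def decode(chrom, n_vars):
    # Map each 4-bit group of the chromosome to a weight in [LO, HI].
    vals = chrom.reshape(n_vars, BITS) @ (2 ** np.arange(BITS)[::-1])
    return LO + (HI - LO) * vals / (2 ** BITS - 1)

def ga_weights(fitness_fn, n_vars, pop=12, gens=100, p_mut=0.001):
    # fitness_fn(weights) -> RMSE of a TNFI model trained with these
    # variable-normalisation weights (lower is better); assumed callback.
    P = rng.integers(0, 2, size=(pop, n_vars * BITS))
    for _ in range(gens):
        fit = np.array([fitness_fn(decode(c, n_vars)) for c in P])
        prob = fit.max() - fit + 1e-9            # roulette wheel: lower
        prob = prob / prob.sum()                 # RMSE, higher probability
        P = P[rng.choice(pop, size=pop, p=prob)]
        for i in range(0, pop - 1, 2):           # multi-point crossover
            a, b = np.sort(rng.choice(n_vars * BITS, 2, replace=False))
            P[i, a:b], P[i + 1, a:b] = P[i + 1, a:b].copy(), P[i, a:b].copy()
        P ^= rng.random(P.shape) < p_mut         # bit-flip mutation
    fit = np.array([fitness_fn(decode(c, n_vars)) for c in P])
    return decode(P[np.argmin(fit)], n_vars)     # best weight vector found
```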
Two types of rules can be extracted for a particular new input vector:
(1) Zadeh-Mamdani rules, e.g.:
IF x1 has a membership degree of 0.68 to a Gaussian function with a center at 0.7 and a standard deviation of 0.2 (x1 has an importance factor of 0.4) and x2 has a membership degree to a Gaussian function with a center at 0.5 and standard deviation of 0.12 (x2 has an importance factor of 0.8) and x3 has a membership degree of 0.68 to a Gaussian function with a center at 0.14 and a standard deviation of 0.02 (x3 has an importance factor of 0.28) and x4 has a membership degree to a Gaussian function with a center at 0.87 and standard deviation of 0.2 (x4 has an importance factor of 0.28) THEN y has a membership degree of 0.78 to a Gaussian function with a center at 0.83 and a standard deviation of 0.18, with 10 vectors being in this cluster.
(2) Takagi-Sugeno rules, e.g.:

IF x1 has a membership degree of 0.68 to a Gaussian function with a center at 0.7 and a standard deviation of 0.2 (x1 has an importance factor of 0.4) and x2 has a membership degree to a Gaussian function with a center at 0.5 and standard deviation of 0.12 (x2 has an importance factor of 0.8) and x3 has a membership degree of 0.68 to a Gaussian function with a center at 0.14 and a standard deviation of 0.02 (x3 has an importance factor of 0.28) and x4 has a membership degree to a Gaussian function with a center at 0.87 and standard deviation of 0.2 (x4 has an importance factor of 0.28) THEN y is calculated as y = −0.25 + 0.93x1 + 0.5x2 (with 10 vectors being in this cluster).
3. TNFIP Methodology for Personalised Medical Decision Support and Prognosis
Here, the TNFIP is used to develop an application-oriented methodology for medical decision support systems. It is presented through a case example: personalised (individualised) modelling for the evaluation of the renal function of patients in a renal clinic. Real data is used, and the developed TNFIP system is currently being considered for use in a clinical environment.
The accurate evaluation of renal function is fundamental to sound nephrology practice. The early detection of renal impairment will allow for the institution of appropriate diagnostic and therapeutic measures, and potentially maximise preservation of intact nephrons.
Glomerular filtration rate (GFR) is traditionally considered the best overall index of renal function in healthy and in diseased people. Most clinicians rely upon the clearance of creatinine (CrCl) as a convenient and inexpensive surrogate for GFR. CrCl can be determined either by timed urine collection or from serum creatinine using equations developed from regression analyses, such as the Cockcroft-Gault formula, but the accuracy of CrCl is limited by methodological imprecision and systematic bias.
Recently, the Modification of Diet in Renal Disease (MDRD) study group developed a new formula to evaluate the GFR more accurately. The formula uses six input variables: age, sex, race, Screat, Salb and Surea, and is defined as follows (the exponents below are reconstructed from the garbled source text):

GFR = 170 × Screat^−0.999 × Age^−0.176 × 0.762 (if female) × 1.180 (if race is black) × Surea^−0.170 × Salb^0.318 (25)
In the formula (25), Screat (serum creatinine) is expected to be filtered in the kidneys, with the residual released into the blood. The creatinine level in the serum is determined by the rate at which it is removed in the kidney and is therefore also a measure of kidney function. Surea (serum urea) is a substance produced in the liver as a means of disposing of ammonia from protein metabolism. It is filtered by the kidney and can be reabsorbed into the bloodstream. Salb (serum albumin) is the protein with the highest concentration in plasma. Decreased serum albumin may result from kidney disease, which allows albumin to escape into the urine. Decreased albumin may also be explained by malnutrition or liver disease.
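For illustration, equation (25) translates directly into code as below. The function name is illustrative, and the units of Screat, Surea and Salb are assumed to match those used when the MDRD formula was fitted, which the source does not restate.

```python
def mdrd_gfr(screat, age, surea, salb, female=False, black=False):
    # Six-variable MDRD estimate of GFR, transcribing equation (25).
    gfr = 170.0 * screat ** -0.999 * age ** -0.176 \
                * surea ** -0.170 * salb ** 0.318
    if female:
        gfr *= 0.762
    if black:
        gfr *= 1.180
    return gfr
```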
However, the formulae above, which constitute global and fixed models, can be misleading as to the presence and progression of renal disease. Here, the TNFIP method is applied for the prediction of the GFR of each new patient, where modified Takagi-Sugeno type fuzzy rules are used: the output function is of the MDRD type, but the coefficients are calculated for every individual patient (personalised model) with the use of the TNFIP method.
Variant 1:
Using the TNFIP on a small GFR data set (93 samples) collected in a hospital in New Zealand, we obtain more accurate results than the MDRD formula. The testing was done with the use of the leave-one-out cross-validation method over the set of 93 samples. The results are listed in Table 3.

Table 3. Comparison between the error of GFR evaluation with the use of the proposed TNFIP transductive reasoning method and the MDRD formula, the MLP, the DENFIS and the transductive weighted k-NN method (WKNN) (preliminary results)
[Table 3 is rendered only as images in the source document.]
For comparison, the results produced by the MDRD formula (a global regression model), the MLP (a globally trained connectionist model) and DENFIS (a global model that is a set of adaptive local models), all inductive reasoning systems, along with the results produced by using the transductive WKNN method, are also listed in the table. The leave-one-out training-simulating tests were performed for each model on the data set, and Table 3 lists the results, including RMSE (root mean square error), MAE (mean absolute error) and Rn (the number of rules, nodes or neurons) used in each model. In the different models, the following parameter values were used:
MLP: number of neurons in the hidden layer: 10; learning algorithm: Levenberg-Marquardt BP; learning epochs: 100.
DENFIS: Dthr (distance threshold): 0.15; MofN: 6; learning epochs: 60.
WKNN: N: 24.
TNFIP: Ni: 24; Dthr: 0.20; learning epochs: 60.
The TNFIP system gives the best accuracy of GFR evaluation for each individual patient and, overall, for the whole data set. No optimisation of the variable normalisation weights was applied (the transformation functions were assumed constant).
Variant 2: Using weighted normalisation for the input variables
The leave-one-out method is applied together with weighted normalisation of the input variables.

Table 4. Comparison of the results of the individual GFR prognosis when the TNFI method is used in its two variants: no weighting applied; and weighting of the input variables applied through a gradient-descent optimisation algorithm. The TNFIP with optimisation is superior to the TNFIP without optimisation.
[Table 4 is rendered only as an image in the source document.]
The average weighting (importance) factors for the variables across all samples, after the leave-one-out method is applied, are shown in Table 5.
Table 5. Average variable importance factors (variable normalisation weights) evaluated with the proposed method
[Table 5 is rendered only as an image in the source document.]
Through using the TNFIP method, a personalised model for each patient is derived, and the input variables are weighted according to their importance for the prediction of the output for this patient. This is illustrated in Table 6 for a randomly selected single patient (one sample from the GFR data).
Table 6. The input data, the weighted variables and the predicted GFR value obtained with the use of a personalised TNFIP model for a single patient.
[Table 6 is rendered only as an image in the source document.]
Fuzzy rules (six rules) are extracted from this personalised model, as shown in Table 7, that best describe the prediction rules for the area of the problem space where the new input vector is located.

Table 7. The fuzzy rules extracted from the personalised model for the person's data from Table 6.
[Table 7 is rendered only as images in the source document.]
4. TNFIC: Transductive Neuro-Fuzzy Inference Method for Classification
The TNFIC classifies a data set into a number of classes in the n-dimensional input space. The system is a multi-input, multi-output fuzzy inference system optimised by a steepest descent algorithm (BP). The fuzzy rules that constitute the system can be of Zadeh-Mamdani type, of Takagi-Sugeno type, or any non-linear function.
Suppose that TNFIC is given an input vector xq; the following steps are implemented:
1) Define initial variable normalisation weighting functions f1, f2, ..., fP for the input variables x1, x2, ..., xP to represent their importance for the new input vector xq. In one implementation, the initial values are calculated as f1 = x1, f2 = x2, ..., fP = xP, i.e. all variables are of equal importance for the new input vector xq. In another implementation, the functions f1, f2, ..., fP are linear weighted normalisation functions, so that fj(xj) = qj (xj − xj,min) / (xj,max − xj,min), j = 1, 2, ..., P, which is a linear normalisation of the variable xj into the interval [0, qj]. The more important a variable is, the larger its normalisation interval will be and the more it will influence the distance measure between data samples in the transformed space.
2) Transform the initial problem space {x1, x2, ..., xP} into the weighted variable normalisation space {f1, f2, ..., fP}, where all data samples are transformed according to these functions (as a partial case, a function fj is a constant equivalent to its weight qj). The functions (the weights) are subject to optimisation over iterations.
3) Search in the training data set in the input space to find the Nq training examples that are closest to xq. The value of Nq can be pre-defined based on experience, or optimised through the application of an optimisation procedure. Here we assume the former approach.
4) Calculate the distances di, i = 1, 2, ..., Nq, between each of these data samples and xq.
5) Calculate the distance weights wi = 1 − (di − min(d)), i = 1, 2, ..., Nq, where min(d) is the minimum value in the distance vector d = [d1, d2, ..., dNq].
6) Use ECM (other clustering algorithms can also be used) to cluster and partition the input sub-space that consists of the Nq selected training samples.
7) Create fuzzy rules and set their initial parameter values according to the results of the ECM clustering procedure.
8) Optimise the parameters of the fuzzy rules in the local model Mq following Equations (26)-(35).
9) Apply points 2-8 above for a number of iterations (training epochs), which can be either pre-defined or optimised, thus optimising the parameters of the fuzzy rules in the local model Mq based on the minimum least-square error.
10) Modify the transformation functions f1, f2, ..., fP to optimise them based on the minimum least-square error. Repeat points 2 to 10 until an optimum set of functions and optimum model parameters are obtained.
11) Calculate the output vector yq = [y1, y2, ..., yT] for the input vector xq by applying fuzzy inference over the set of fuzzy rules that constitute the local model Mq. If ys = max(yq), the input vector xq belongs to class s (a code sketch of this decision step follows the procedure).
12) End of the procedure.
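The decision in step 11 reduces to taking the class with the largest inferred output, as in the sketch below. The flat arrays of rule activations and consequent values are assumptions of this sketch, standing in for the full fuzzy inference over the local model Mq.

```python
import numpy as np

def classify(phi, rule_outputs):
    # phi: firing strengths of the M rules for x_q, shape [M];
    # rule_outputs: consequent values per rule and class, shape [M, T].
    y = phi @ rule_outputs / phi.sum()   # one inferred value per class
    return int(np.argmax(y)), y          # step 11: class s with max y_s
```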
The parameter optimisation procedure is described below:
Consider the system having P inputs, T outputs and fuzzy rules of the Zadeh-Mamdani type, defined initially through the ECM clustering procedure, where the l-th rule has the form:

Rl: IF x1 is Fl1 and x2 is Fl2 and ... and xP is FlP, THEN y1 is Gl1, ..., and yT is GlT.

Here, Flj are fuzzy sets defined by the following Gaussian type membership function (equation (26) is rendered as an image in the source; the form below is reconstructed from the parameter definitions):

μlj(xj) = alj exp(−(xj − mlj)² / (2σlj²)) (26)

and Gls are of a similar type as Flj and are defined as:

Gls(ys) = exp(−(ys − nls)² / (2δls²)) (27)
Using the Modified Centre Average defuzzification procedure, the output values of the system are calculated on an input vector xi = [x1, x2, ..., xP] as follows (equation (28) is rendered as an image in the source; the form below is reconstructed):

fs(xi) = [ Σ(l=1..M) (nls / δls²) Φl(xi) ] / [ Σ(l=1..M) (1 / δls²) Φl(xi) ] (28)

where Φl(xi) = Π(j=1..P) μlj(xij).
Suppose the TNFIC is given a training data pair [xi, yi]; the system minimises the following objective function (a weighted error function):

E = ½ wi Σ(s=1..T) [ fs(xi) − yis ]² (the weights wi are defined in step 5 of the TNFIC procedure) (29)
The steepest descent algorithm (BP) is then used to obtain the formulas for the optimisation of the parameters nls, δls, alj, mlj and σlj of the TNFIC such that the value of E from Eq. (29) is minimised:
[Equations (30)-(34), the steepest-descent update rules for nls, δls, alj, mlj and σlj, and equation (35), defining the rule activation Φs(xi), are rendered only as images in the source document.]
where ηn, ηδ, ηa, ηm and ησ are the learning rates for updating the parameters nls, δls, alj, mlj and σlj respectively.
In the TNFIC training-simulating algorithm, the following indexes are used:
• Training data samples: i = 1, 2, ..., N;
• Input variables: j = 1, 2, ..., P;
• Output variables: s = 1, 2, ..., T;
• Fuzzy rules: l = 1, 2, ..., M;
• Training epochs: k = 1, 2, ....
Example 1: TNFIC for the Classification of Iris data set with Optimisation of the Variable Normalisation Weights
In this section, the TNFIC with weighted variable normalisation using a genetic algorithm (GA) is applied to the Iris data for both classification and feature selection. As with the experiments in section 3, all experiments in this section are repeated 50 times with the same parameters and the results are averaged. 50% of the whole data set is randomly selected as training data and the other 50% as testing data. The initial weight intervals for the four normalised input variables are [0, 1] and are encoded in a 6-bit binary string. The following GA parameters are used for the weight optimisation: number of individuals in a population: 12; mutation rate: 0.005; termination criterion (the maximum number of GA generations): 50; fitness function: the number of created rule nodes. The resulting weight values and the number of errors on the testing data are shown in Table 8. For comparison, TNFIC classification results with the same parameters, the same training data and testing data, but without variable weight normalisation, are also shown in Table 8. From the results, we can see that the weight of the first variable is much smaller than the weights of the other variables. The weights show the importance of the variables, and the least important variables can be removed from the input for some particular new input vectors. The same experiment is repeated without the first (least important) input variable, and the results improve, as shown in Table 8. If another variable is removed, so that the total number of input variables is 2, the test error increases; it can therefore be assumed that for the particular TNFIC model the optimum number of input variables is 3 (a code sketch of this variable-removal loop follows Table 8). For different new input vectors, the normalisation weights of the input variables will be different, pointing to the different importance of these variables for the classification (or prediction) of every new input vector located in a particular part of the problem space.
Table 8: Comparison between TNFIC without and with optimisation of the variable normalisation weights
[Table 8 is rendered only as images in the source document.]
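The variable-removal loop described above can be sketched as a simple backward elimination. Both callbacks, ga_weights_fn (the GA weight optimisation) and score_fn (training a classifier and counting test errors), are assumptions of the sketch and are not part of the source.

```python
import numpy as np

def prune_variables(X, y, ga_weights_fn, score_fn):
    # Backward elimination: repeatedly drop the variable with the smallest
    # GA-optimised normalisation weight while the test error does not grow.
    # ga_weights_fn(X, y) -> one weight per remaining variable;
    # score_fn(X, y) -> number of test errors for a model on these columns.
    keep = list(range(X.shape[1]))
    best_err = score_fn(X[:, keep], y)
    while len(keep) > 1:
        w = ga_weights_fn(X[:, keep], y)
        trial = [v for i, v in enumerate(keep) if i != int(np.argmin(w))]
        err = score_fn(X[:, trial], y)
        if err > best_err:
            break                        # removing more variables hurts
        keep, best_err = trial, err
    return keep
```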
Rules can be extracted for each new input vector x that explain the decision for this vector, as illustrated below:
(1) Zadeh-Mamdani rules, e.g.:

IF x2 has a membership degree of 0.68 to a Gaussian function with a center at 0.7 and a standard deviation of 0.2 (x2 has an importance factor of 0.5) and x3 has a membership degree to a Gaussian function with a center at 0.5 and standard deviation of 0.12 (x3 has an importance factor of 0.92) and x4 has a membership degree of 0.68 to a Gaussian function with a center at 0.14 and a standard deviation of 0.02 (x4 has an importance factor of 1) THEN y has a membership degree of 0.78 to belong to class 2, defined by a Gaussian function with a center at 0.83 and a standard deviation of 0.18, with 10 vectors being in this cluster.

(2) Takagi-Sugeno rules, e.g.:

IF x2 has a membership degree of 0.68 to a Gaussian function with a center at 0.7 and a standard deviation of 0.2 (x2 has an importance factor of 0.5) and x3 has a membership degree to a Gaussian function with a center at 0.5 and standard deviation of 0.12 (x3 has an importance factor of 0.92) and x4 has a membership degree of 0.68 to a Gaussian function with a center at 0.14 and a standard deviation of 0.02 (x4 has an importance factor of 1) THEN y is calculated by the formula y = −0.3 + 0.15x2 − 0.4x3 + 0.5x4, with 10 vectors being in this cluster; if the calculated output value is greater than 1, then the new vector belongs to class 2.

5. TNFIC Methodology for Business Decision Making Systems

TNFIC is used here to develop a novel methodology for business decision support systems. The methodology is presented through a case example.
The problem used here is mortgage approval for applicants, defined by 8 input variables: character (0: doubtful; 1: good); total asset; equity; mortgage loan; budget surplus; gross income; debt servicing ratio; and term of loan, and one output variable: decision (0: disapprove; 1: approve).
TNFIC models are created in a leave-one-out mode for every single sample in the data set of 91 samples, and the results are presented in Table 9. The results are compared with those obtained with the use of ECF and MLP as inductive methods (a code sketch of the leave-one-out protocol follows Table 9).
Table 9. Comparative analysis of TNFIC, ECF and MLP on the business decision support case study data of mortgage approval
TNFIC: errors for class 1: 3; errors for class 2: 1; overall correct: 87 (95.6%)
ECF: errors for class 1: 4; errors for class 2: 2; overall correct: 85 (93.4%)
MLP: errors for class 1: 3; errors for class 2: 2; overall correct: 86 (94.5%)
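The leave-one-out protocol used for Table 9 (and for the GFR study in section 3) can be sketched as follows; the build_and_predict callback, wrapping the TNFIC procedure described above, is an assumption of the sketch.

```python
import numpy as np

def leave_one_out_accuracy(X, y, build_and_predict):
    # For every sample a personalised model is built from all the other
    # samples and tested on the held-out one.
    # build_and_predict(X_train, y_train, x_test) -> predicted class.
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        hits += int(build_and_predict(X[mask], y[mask], X[i]) == y[i])
    return hits / len(X)
```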
Through using TNFIC, a personalised decision support model is developed for every applicant that best makes the decision for that applicant, and the input variables are also weighted, showing their importance within this applicant's personalised model. This is illustrated in Table 10, where two of the rules that comprise the personalised decision model are shown:
Table 10. A personalised decision model for a loan applicant: the input variables weighted through TNFI, and two of the Zadeh-Mamdani fuzzy rules that comprise the model. Input vector of applicant/sample 7: [0 0.07 0.108 0.075 0.236 0.109 0.16 0.32]
Output value: Class 0 (disapprove).
Weights for input variables: [0.99 0.97 0.98 0.99 1.0 0.98 0.97 0.99]
Number of selected training data: 24
Rule 1: if x1 is (Gaussian MF, center: −0.17, STD: 0.30) (importance 0.99), and x2 is (Gaussian MF, center: −0.00, STD: 0.30) (importance 0.97), and x3 is (Gaussian MF, center: −0.12, STD: 0.30) (importance 0.98), and x4 is (Gaussian MF, center: −0.05, STD: 0.21) (importance 0.99), and x5 is (Gaussian MF, center: −0.18, STD: 0.36) (importance 1.0), and x6 is (Gaussian MF, center: −0.05, STD: 0.48) (importance 0.98), and x7 is (Gaussian MF, center: 0.22, STD: 0.13) (importance 0.97), and x8 is (Gaussian MF, center: 0.61, STD: 0.29) (importance 0.99), then y is Class 0.

Rule 2: if x1 is (Gaussian MF, center: 0.71, STD: 0.30) (importance 0.99), and x2 is (Gaussian MF, center: −0.34, STD: 0.30) (importance 0.97), and x3 is (Gaussian MF, center: −0.07, STD: 0.30) (importance 0.98), and x4 is (Gaussian MF, center: 0.03, STD: 0.20) (importance 0.99), and x5 is (Gaussian MF, center: 0.69, STD: 0.36) (importance 1.0), and x6 is (Gaussian MF, center: 0.05, STD: 0.48) (importance 0.98), and x7 is (Gaussian MF, center: −0.01, STD: 0.12) (importance 0.97), and x8 is (Gaussian MF, center: 0.69, STD: 0.28) (importance 0.99), then y is Class 0.
6. TNFIC Methodology for Personalised Modelling and Decision Making for Medical Decision Support Systems
A methodology for personalised prognostic and classification systems is presented here through a case study example using personal gene expression data. As an example, we use public domain data on DLBCL Lymphoma cancer outcome prognosis based on gene expression data, published in Shipp, M.A., Ross, K.N., et al. (2002), "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning", Nature Medicine 8(1): 68-74. There are 7129 gene expression variables with 58 samples: 32 survivals (class 1) and 26 fatal outcomes (class 2), all traced over 5 years. After some pre-processing, 11 genes with high prognostic power were selected. The prognostic system presented in Shipp et al., based on these 11 variables and using inductive support vector machines and other techniques, resulted in 84.5% accuracy measured through the leave-one-out method. Here we suggest that by using a personalised TNFIC modelling methodology on the 11 variables, not only can a better prediction of survival be made, but a personal model can be evolved to explain the results and to be used for personalised treatment and personalised drug design (see Tables 11 and 12).
Table 11. Experimental Results of TNFIC, ECF and SVM on DLBCL Lymphoma data (Leave-one-out validation)
[Table 11 is rendered only as an image in the source document.]
Table 12. A personalised TNFIC model for the survival prediction of a randomly selected person from the data set of Shipp et al.
Input vector of a randomly selected person, comprising the expression of the 11 genes selected by Shipp et al.: [341 275 20 20 725 237 314 20 20 62.6 192]. Outcome correctly predicted by the personalised TNFIC model: Class 2 (died within 5 years).
Weights for input variables: [0.97 0.99 1.0 1.0 0.99 0.98 0.99 0.99 1.0 0.99 0.99]
Number of selected training data samples for the personalised model: 56
7. A Method for Preliminary Variable and Data Set Selection
Transductive reasoning is not practical in the case of large data sets D (e.g. millions of data samples) with a large number of variables (e.g. thousands). Here we propose that a large data set D*, given on a large set of variables V*, is transformed into several clusters of data samples, each cluster defining its own list of variables, so that for every new vector xi only the data from the cluster to which xi belongs is used as the data set D (see the general TNFI method), with a much smaller number of variables. The method consists of the following steps:
1. Starting from the whole data set D*, defined with a set V* of variables, cluster the data into m clusters C1, C2, ..., Cm using ECM or other clustering methods. Each cluster Ci contains a subset Di of mi samples.
2. For each cluster Ci, define a set of variables Vi as a subset of V* (starting from 1 variable) that results in a TNFI model for this cluster with the highest accuracy.
3. The set of m clusters, m data sets and m sets of variables will represent the initial data set D* in a more convenient format for TNFI modelling on any new data vectors, in the following manner: for every new data vector xi for which a model is created through the TNFI methods, the vector is first mapped into the clusters, and the cluster Ci to which the vector belongs is used to create the data set D = Di, with variables Vi, as a starting point of the TNFI methods.

The foregoing describes the invention including preferred forms thereof. Alterations and modifications as will be obvious to those skilled in the art are intended to be incorporated within the scope hereof, as defined by the accompanying claims.

Claims

CLAIMS:
1. A method for predicting an output from a test input comprising the steps of:
receiving a set of input data having expected output data;
applying a transformation to at least some of the input data to obtain a set of normalised data;
applying a rationalising function to the set of normalised data to obtain a set of rationalised input data and rationalised expected output data;
applying a clustering function to the set of rationalised data;
applying a transformation to a set of rules based at least partly on the results of the clustering function;
evaluating the accuracy of the rationalised expected output data; and
generating output data.
2. A method for predicting an output from a test input as claimed in claim 1 further comprising the step of selecting a subset of the set of input data.
3. A method for predicting an output from a test input as claimed in claim 2 further comprising the steps of assigning an importance factor to one or more members of the set of input data and selecting for the subset those members having an importance factor above a threshold importance factor.
4. A method for predicting an output from a test input as claimed in claim 3 wherein the importance factors assigned to respective members of the set of input data are calculated using an inductive fuzzy neural network.
5. A method for predicting an output from a test input as claimed in claim 3 wherein the same importance factor is assigned to members of the set of input data.
6. A method for predicting an output from a test input as claimed in claim 1 wherein the rationalising function comprises a transductive decision method.
7. A method for predicting an output from a test input as claimed in claim 3 wherein the clustering function is performed based at least partly on the importance factors assigned to one or more members of the set of input data.
8. A method for predicting an output from a test input as claimed in claim 3 further comprising the step of assigning a new importance factor to one or more members of the set of input data following the step of evaluating the accuracy of the rationalised expected output data.
9. A method for predicting an output from a test input as claimed in claim 3 further comprising the step of applying a further transformation to the set of rules following the step of evaluating the accuracy of the rationalised expected output data.
10. A method for predicting an output from a test input as claimed in claim 1 wherein the input data comprises clinical data.
11. A method for predicting an output from a test input as claimed in claim 1 wherein the input data comprises gene data.
12. A method for predicting an output from a test input as claimed in claim 1 wherein the input data comprises financial institution loan application data.
13. A method for predicting an output from a test input as claimed in claim 1 wherein the input data comprises economic data.
14. A prediction system configured to predict an output from a test input, the system comprising:
a data transformation module configured to transform at least some of the input data to obtain a set of normalised data;
a rationalising module configured to apply a rationalising function to the set of normalised data to obtain a set of rationalised input data and rationalised expected output data;
a clustering module configured to apply a clustering function to the set of rationalised data;
a set of rules maintained in computer memory;
an optimiser module configured to apply a transformation to the rules based at least partly on the results of the clustering function;
a decoder configured to transform a series of outputs; and
an output layer configured to display a set of outputs.
PCT/NZ2004/000290 2003-11-17 2004-11-17 Transductive neuro fuzzy inference method for personalised modelling WO2005048185A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NZ52957003 2003-11-17
NZ529570 2003-11-17

Publications (1)

Publication Number Publication Date
WO2005048185A1 true WO2005048185A1 (en) 2005-05-26

Family

ID=34588198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NZ2004/000290 WO2005048185A1 (en) 2003-11-17 2004-11-17 Transductive neuro fuzzy inference method for personalised modelling

Country Status (1)

Country Link
WO (1) WO2005048185A1 (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001078003A1 (en) * 2000-04-10 2001-10-18 University Of Otago Adaptive learning system and method
WO2003040949A1 (en) * 2001-11-07 2003-05-15 Biowulf Technologies, Llc Pre-processed Feature Ranking for a support Vector Machine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KASABOV N. ET AL.: "Evolving connectionist systems", Retrieved from the Internet <URL:http://www.aut.ac.nz/reserach_showcase/research_activity_areas/kedri/books.shtml> *
KASABOV N. ET AL.: "Evolving fuzzy neural networks for supervised/unsupervised on-line, knowledge-based learning", IEEE TRANSACTIONS OF SYSTEMS, MAN AND CYBERNETICS, PART B - CYBERNETICS, vol. 3, no. 6, December 2001 (2001-12-01), Retrieved from the Internet <URL:http://www.aut.ac.nz/reserach_showcase/research_activity_areas/kedri/downloads/pdf/kas-smc-2001.pdf> *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2582341B1 (en) 2010-06-16 2016-04-20 Fred Bergman Healthcare Pty Ltd Method for analysing events from sensor data by optimization
CN103106535A (en) * 2013-02-21 2013-05-15 电子科技大学 Method for solving collaborative filtering recommendation data sparsity based on neural network
CN103106535B (en) * 2013-02-21 2015-05-13 电子科技大学 Method for solving collaborative filtering recommendation data sparsity based on neural network
WO2018137203A1 (en) * 2017-01-25 2018-08-02 深圳华大基因研究院 Method for determining population sample biological indicator set and predicting biological age and use thereof
US11062792B2 (en) 2017-07-18 2021-07-13 Analytics For Life Inc. Discovering genomes to use in machine learning techniques
US11139048B2 (en) 2017-07-18 2021-10-05 Analytics For Life Inc. Discovering novel features to use in machine learning techniques, such as machine learning techniques for diagnosing medical conditions
CN110991478A (en) * 2019-10-29 2020-04-10 西安建筑科技大学 Method for establishing thermal comfort model and method and system for setting user preference temperature
CN111898628A (en) * 2020-06-01 2020-11-06 淮阴工学院 Novel T-S fuzzy model identification method
CN111898628B (en) * 2020-06-01 2023-10-03 淮阴工学院 Novel T-S fuzzy model identification method
US20230196095A1 (en) * 2021-04-20 2023-06-22 Shanghaitech University Pure integer quantization method for lightweight neural network (lnn)
US11934954B2 (en) * 2021-04-20 2024-03-19 Shanghaitech University Pure integer quantization method for lightweight neural network (LNN)


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase