WO2017072717A1 - Learning of the structure of Bayesian networks from a complete data set - Google Patents


Info

Publication number
WO2017072717A1
Authority
WO
WIPO (PCT)
Prior art keywords
variables
parent
score
subsets
learning
Prior art date
Application number
PCT/IB2016/056512
Other languages
French (fr)
Inventor
Mauro SCANAGATTA
Cassio POLPO DE CAMPOS
Giorgio CORANI
Marco ZAFFALON
Original Assignee
Supsi
Priority date
Filing date
Publication date
Application filed by Supsi filed Critical Supsi
Publication of WO2017072717A1 publication Critical patent/WO2017072717A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks


Abstract

A device (1) for learning the structure of a Bayesian network related to a plurality of variables, each of the variables being able to assume a plurality of finite states, which comprises: - an initialization module (30) configured to calculate exact scores in the relationships between each one of the variables and the remaining variables; - a computing engine (32) configured to calculate heuristic scores in the relationships of each one of the variables with respect to the pairs of the remaining variables, to select parent subsets having highest score, and to calculate heuristic scores for the parent subsets integrated with the score for one of the remaining variables, thus generating new subsets; - an ordering module (34) configured to order the parent subsets into a list, ordered by score; - an optimization module (40) configured to generate iteratively, for a selectable period of time, a random sequence of the items that are present in the list of the parent subsets, and to calculate an overall score for the parent set that covers the entire set of the variables, while holding the parent set to which the highest score corresponds.

Description

LEARNING OF THE STRUCTURE OF BAYESIAN NETWORKS FROM A COMPLETE DATA SET
The present invention relates to a device and a method for learning the structure of Bayesian networks from a complete data set, particularly but not exclusively useful and practical for learning the structure of Bayesian networks with a large number of variables, which can be used for analysis and correlation of large quantities of data. The device and the method according to the present invention are applicable, for example, to the diagnostics of mobile networks, to the analysis of financial data, to decision analysis, to weather forecasts and to behavior prediction in general.
A Bayesian network is the graphical representation of a probabilistic model constituted by a set of random variables and of their conditional dependencies, represented with the aid of a directed acyclic graph (DAG), also known as a network structure.
A directed acyclic graph is a graph that has nodes and arcs which are directed, but lacks directed cycles. This means that starting from any node of the graph it is not possible to visit the initial node again by moving along the direction of the arcs of the network.
The nodes represent random variables, where each variable can assume a given state or value.
A probability value is associated with the possible states of each variable, which must be mutually exclusive, while the arcs between nodes indicate a conditional dependence relationship between the variables represented by the nodes.
Conditional probability tables (CPT), i.e., tables that contain the probabilities of the values of the node conditioned by the possible combinations of values of the parent nodes, are associated with the nodes that have parents, i.e., nodes that are connected to at least one arc that points to them.
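For illustration only (an invented example, not taken from the patent text): a binary node Rain with a single binary parent Cloudy would have a CPT of the form

Cloudy | P(Rain = true) | P(Rain = false)
false  | 0.1            | 0.9
true   | 0.6            | 0.4

where each row sums to 1 over the mutually exclusive states of Rain.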
In greater detail, a Bayesian network is the graphical representation of the probabilistic model, and makes it possible to represent and analyze a phenomenon being studied in conditions of uncertainty. The probabilistic model represents a probability distribution on a set of discrete random variables X = {X1, X2, ..., Xn}. Bayesian networks, typically denoted by B = (G, Θ), are defined by specifying two components: a qualitative component and a quantitative component.
The qualitative component consists of the above mentioned directed acyclic graph (DAG), denoted by G = (V, ε), in which the nodes V have a one-to-one correspondence with the set of discrete random variables X = {X1, X2, ..., Xn} and the directed arcs ε are ordered pairs of elements of V. Each arc represents the conditional dependence between the nodes that it connects.
The quantitative component consists of a set of conditional probability distributions (CPD) of X, Θ, which are defined as the network parameters. The local conditional probability distributions, associated with each random variable and conditioned by every possible combination of the values assumed by the parent set of the variable, are specified by means of a set of parameters.
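In standard Bayesian network theory (recalled here for reference; the formula is not quoted from the patent text), the two components together encode the joint probability distribution as the product of the local conditional distributions:

P(X1, X2, ..., Xn) = ∏ i=1..n P(Xi | Pa(Xi)),

where Pa(Xi) denotes the parent set of the variable Xi in the graph G.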
In practice, the structure of a Bayesian network codifies the qualitative relationships, represented by means of arcs, that exist between the discrete random variables, represented by means of nodes.
The strength of the relationships that exist between the discrete random variables, i.e., the weight of the arcs that connect the nodes, is quantified by the conditional probability distributions associated with each node.
If a node has many parents or if the parents of a node have many possible states, the associated conditional probability table (CPT) can be very large. The size of a conditional probability table is in fact exponential with respect to the number of parents: for example, a binary node with k binary parents requires a CPT with 2^k rows.
The learning process of a Bayesian network substantially comprises two distinct steps. The first step relates to the learning of the structure of the network, or structural learning, i.e., the relationships between the variables. The second step relates to the learning of the network parameters, or parameter learning, i.e., the conditional probabilities.
In particular, the learning of the structure of a Bayesian network starting from a complete data set is an NP-hard problem.
Various exact algorithms are currently known for learning the structure of a Bayesian network as a function of a score, i.e., algorithms that have the task of finding the structure of the Bayesian network that maximizes the score that depends on the data set. These algorithms are based on different methods that are known in the background art, including for example dynamic programming, the branch and bound method, linear and integer programming, or shortest path heuristics.
Typically, the learning of the structure of the network is performed during two distinct steps. First one proceeds with an identification of the parent set and then with the optimization of the structure.
The parent set identification step produces, for each random variable, i.e., for each node, a list of candidate parent sets, i.e., a list of parent sets suitable to maximize the score of the Bayesian network.
The structure optimization step instead selects, for each node, a parent set among the ones listed in the above cited list and assigns it to the corresponding node, maximizing the score of the resulting structure without introducing cycles.
The typical problem that one must deal with in the parent set identification step is that it is unlikely that this step will admit a polynomial-time algorithm with a guarantee of good quality.
This problem has pushed research toward the development of effective search heuristics. However, usually the in-degree k, which indicates the number of parents per node, is defined beforehand and then the score of all the parent sets is calculated.
It should be noted that a higher in-degree entails a larger search space and consequently makes it possible to obtain a higher score.
These known solutions are not free from drawbacks, among which, in particular, one must include the fact that they require a very long computing time, which is exponential with respect to the in-degree defined beforehand. In practice, the larger the in-degree k, the longer the computing time and the greater the computational burden.
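To make the exponential dependence concrete (a standard combinatorial count, not a figure stated in the patent): with n variables and a fixed in-degree k, the number of candidate parent sets that must be scored for each node is

C(n-1, 1) + C(n-1, 2) + ... + C(n-1, k), which is of the order of n^k,

so every unit increase of k multiplies the scoring work by roughly a factor of n.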
By choosing the in-degree beforehand, the user accepts a compromise between two different goals: minimizing computing time and maximizing the score of the Bayesian network. However, the user can only approximately estimate the impact of the in-degree on computing time and on the score of the Bayesian network. When the number of random variables, and therefore of nodes, is very large, the in-degree is generally set to a small value in order to maintain the feasibility of the structure optimization step.
The aim of the present invention is to overcome the limitations of the background art described above, by devising a device and a method for learning the structure of Bayesian networks from a complete data set that make it possible to obtain effects that are similar to, or better than, those obtainable with solutions of the known type, by making it possible to identify a structure of a Bayesian network of good quality, i.e., optimized as much as possible and with a maximized score, while maintaining computing time at a modest level.
Within the scope of this aim, an object of the present invention is to conceive of a device and a method for learning the structure of Bayesian networks from a complete data set that make it possible to define beforehand the computing time available for the identification of a structure of a Bayesian network of good quality.
Another object of the present invention is to devise a device and method for learning the structure of Bayesian networks from a complete data set that can be applied easily to massive data sets with a large number of variables, for example financial data, commodity data, weather data, webpage access data, and so forth.
Another object of the present invention is to provide a device and method for learning the structure of Bayesian networks from a complete data set that are highly reliable, relatively simple to provide and low cost.
This aim, as well as these and other objects that will become better apparent hereinafter, are achieved by a device for learning the structure of a Bayesian network related to a plurality of variables, each of said variables being able to assume a plurality of finite states, which comprises:
- an initialization module configured to calculate exact scores in the relationships between each one of said variables and the remaining variables;
- a computing engine configured to calculate heuristic scores in the relationships of each one of said variables with respect to the pairs of said remaining variables, to select parent subsets having highest score, and to calculate heuristic scores for said parent subsets integrated with the score for one of said remaining variables, thus generating new subsets;
- an ordering module configured to order said parent subsets into a list, ordered by score;
- an optimization module configured to generate iteratively, for a selectable period of time, a random sequence of the items that are present in said list of said parent subsets, and to calculate an overall score for the parent set that covers the entire set of said variables, while holding the parent set to which the highest score corresponds.
The intended aim and objects are furthermore achieved by a method for learning the structure of a Bayesian network related to a plurality of variables, each one of said variables being able to assume a plurality of finite states, which comprises the steps that consist in: - initializing the exact scores in the relationships between each one of said variables and the remaining variables;
- calculating the heuristic scores in the relationships of each one of said variables with respect to the pairs of said remaining variables, in order to select parent subsets having highest score;
- calculating the heuristic scores for said parent subsets integrated with the score for one of said remaining variables, thus generating new subsets;
- ordering said parent subsets into a list, ordered by score;
- optimizing the structure of the Bayesian network by generating iteratively, for a selectable period of time, a random sequence of the items that are present in said list of said parent subsets, and calculating an overall score for the parent set that covers the entire set of said variables, while holding the parent set to which the highest score corresponds.
Further characteristics and advantages of the invention will become better apparent from the description of a preferred but not exclusive embodiment of the device and of the method for learning the structure of Bayesian networks from a complete data set according to the invention, illustrated by way of nonlimiting example in the accompanying drawings, wherein:
Figure 1 is a block diagram showing schematically the components of an embodiment of the device for learning the structure of Bayesian networks, according to the present invention;
Figure 2 is a view of a known Bayesian network built manually by experts in biology and relating to the diagnosis of pulmonary diseases in children;
Figure 3 is a view of a Bayesian network generated by the device and by the method according to the present invention starting from a complete data set, which can be compared with the network of Figure 2.
With reference to the figures, the device, designated generally by the reference numeral 1, and the method for learning the structure of Bayesian networks from a complete data set according to the invention will now be described in relation to the steps that compose said method.
The first step of the method for learning the structure of Bayesian networks according to the present invention is described hereinafter. The parent set identification step according to the present invention, which uses heuristic assessment of the score of the structure of a Bayesian network, is performed by means of an identification module 20.
Within the scope of this first step, mainly two lists 10, 12 are used. The first list 10 is termed the open list, and is a list that lists the parent sets still to be explored, ordered by their heuristic score.
The second list 12 is termed the closed list, and is a list that lists the parent sets that have already been explored, together with their exact score.
The following functions are then used:
a first function score(P, X) 21, which calculates an exact score according to mutually alternative known methods, for example a BDeu (Bayesian Dirichlet equivalent uniform) score or a BIC (Bayesian information criterion) score which is exact for the set of variables P as parents of the variable X;
a second function c(Y, sk) 22, which stores in a temporary memory area or cache the score sk for a variable Y as a single parent;
a third function s(Y) 23, which retrieves from the cache the score for the variable Y as a single parent;
a fourth function pop(L) 24, which extracts the parent set with the best score from the list L;
a fifth function add(L, P, s) 25, which adds the parent set P to the list L, with the score s;
a sixth function time() 26, which returns the Boolean value true if there is still time available for computational calculation.
Knowing a target variable X and a set of candidate variables Y, the goal of the parent set identification module 20 is to find the subset of Y that yields the best scores as a parent set for X.
As a first step, an initialization module 30 calculates the exact score for each candidate variable Y as a single parent of X with score({Y}, X), adds the set and the score to the closed list 12, and saves the result in the cache memory for retrieval at a later time.
Subsequently, the initialization module 30 proceeds with the addition to the open list 10 of all the pairs of candidate variables Y1 and Y2, assigning to them the heuristic score obtained as s(Y1, X) + s(Y2, X).
At this point the open list 10 is initialized and a computing engine 32 can proceed with the execution of the main cycle, repeating the following steps until all the elements in the open list 10 have been processed, or until the time available for the computational calculation has ended.
Initially, the computing engine 32 extracts from the open list 10 the subset P with the best heuristic score, calculates the exact score for the parent set for said subset P, adding the set and the score to the closed list 12.
The computing engine 32 then proceeds by searching for all the possible expansions of P that are the result of the addition of a single candidate variable Y, after checking that the candidate Y is not already part of a subset P that has already been processed, i.e., that the set of variables P ∪ Y does not already exist in the open list 10 or in the closed list 12.
The computing engine 32 then assigns a heuristic score obtained as s(P,X) + s(Y,X) to each one of these candidate parent sets, and adds these sets to the open list 10.
Finally, a module 34 for ordering the parent set returns the content of the closed list 12, ordered according to the exact score calculated previously.
The operations performed by the identification module 20 are summarized, for greater clarity, in the following lines of pseudocode.
PARENT SET IDENTIFICATION
1: function parentSetIdentification(variable X)
2:   open ← ∅
3:   closed ← ∅
4:   for all candidate variables Y do
5:     sk ← score({Y}, X)
6:     c(Y, sk)
7:     add(closed, Y, sk)
8:   end for
9:   for all pairs of candidate variables Y1 and Y2 do
10:    h ← s(Y1, X) + s(Y2, X)
11:    add(open, {Y1, Y2}, h)
12:  end for
13:  while open ≠ ∅ and time() do
14:    P ← pop(open)
15:    sk ← score(P, X)
16:    for all candidate variables Y ∉ P such that P ∪ Y ∉ open ∪ closed do
17:      h ← sk + s(Y)
18:      add(open, P ∪ Y, h)
19:    end for
20:  end while
21:  return ordered(closed)
22: end function
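For concreteness, the following is a minimal Python sketch of the parent set identification cycle described above. It is an illustrative reading of the pseudocode, not the patented implementation: the functions score (an exact scorer such as BIC or BDeu) and time_left (the time-budget test) are assumed to be supplied by the caller, and the open list is realized here as a binary heap.

import heapq
from itertools import combinations, count

def parent_set_identification(x, candidates, score, time_left):
    # closed: explored parent sets mapped to their exact score.
    closed = {}
    # single: cached score of each candidate Y as single parent of x.
    single = {}
    for y in candidates:
        sk = score(frozenset([y]), x)
        single[y] = sk
        closed[frozenset([y])] = sk

    # The open list is kept as a max-heap (scores negated for Python's
    # min-heap); the counter breaks ties between equal scores.
    tie = count()
    open_list = []
    seen = set(closed)
    for y1, y2 in combinations(candidates, 2):
        p = frozenset([y1, y2])
        heapq.heappush(open_list, (-(single[y1] + single[y2]), next(tie), p))
        seen.add(p)

    while open_list and time_left():
        _, _, p = heapq.heappop(open_list)   # subset P with best heuristic score
        sk = score(p, x)                     # exact score for the extracted subset
        closed[p] = sk
        for y in candidates:
            if y in p:
                continue
            expanded = p | {y}
            if expanded in seen:             # already in open or closed
                continue
            # Heuristic score s(P, X) + s(Y, X) for the expansion.
            heapq.heappush(open_list, (-(sk + single[y]), next(tie), expanded))
            seen.add(expanded)

    # Content of the closed list, ordered by exact score (best first).
    return sorted(closed.items(), key=lambda kv: kv[1], reverse=True)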
The second step of the method for learning the structure of Bayesian networks according to the present invention is described hereinafter. The step of structure optimization of a Bayesian network according to the present invention is performed by means of an optimization module 40.
The optimization module 40 implements the following functions: a first function ancestor(o, p, a) 42, which, given a vector or array a of the ancestors, checks whether p is an ancestor of o; a second function update(o, p, a) 44, which updates the array a of the ancestors or descendants with the parent set p for o;
a third function score(p) 46, which calculates the score of the parent set p; a fourth function randomize(L) 48, which arranges in a random order the items of the list L;
a fifth function time() 50, which returns true if there is still time available for computational calculation.
Knowing, for each variable or node, a list of the most promising parent sets, together with their scores, the purpose of the optimization module 40 of the structure is to find and assign the best possible parent set to each variable or node.
It is noted that, according to the invention, it is necessary to ensure the acyclic nature of the graph, i.e., there can be no directed cycle in the graph, and to maximize the resulting score, which can be broken down as the sum of the scores of the individual parent sets.
In order to fully understand acyclic control, it can be useful to think in terms of ancestors of a variable, i.e., its parents, grandparents, and so forth, right down to the roots of the graph. A cycle is in fact introduced only when a variable or node is connected to one of its ancestors in the graph.
In order to ensure compliance with this constraint, i.e., in order to ensure the absence of directed cycles in the graph, the optimization module 40 can use, in a particularly effective embodiment, a bit array, in particular one bit array per variable or node. Given the variable i, the bit array for the variable i contains, for each position j, the information that indicates whether the variable j is an ancestor of the variable i in the graph or not.
The optimization module 40 repeats the same operation for the descendants of a variable, recursively for all the children of that variable. The optimization module 40 uses the array of descendants to update the array of ancestors following each assignment of a parent set, this assignment being executable by means of binary operations that are very quick in terms of execution.
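As an illustration of this bookkeeping, here is a minimal Python sketch in which each bit array is held as an arbitrary-precision integer, one per variable; the concrete representation is an assumption of the sketch, since the patent does not prescribe a data type.

def bit_positions(mask):
    # Yield the positions of the set bits of an integer bit array.
    pos = 0
    while mask:
        if mask & 1:
            yield pos
        mask >>= 1
        pos += 1

def is_ancestor(i, j, anc):
    # True if variable j is marked as an ancestor of variable i.
    return (anc[i] >> j) & 1 == 1

def assign_parent_set(x, parents, anc, desc):
    # New ancestors of x: the chosen parents and all their ancestors.
    new_anc = 0
    for p in parents:
        new_anc |= (1 << p) | anc[p]
    # x and every descendant of x acquire these ancestors.
    for i in [x] + list(bit_positions(desc[x])):
        anc[i] |= new_anc
    # Every new ancestor acquires x and x's descendants as descendants.
    new_desc = (1 << x) | desc[x]
    for j in bit_positions(new_anc):
        desc[j] |= new_desc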
The main cycle of the method according to the invention can apply the Monte Carlo method, which is known in the background art, to the possible orders of the variables or nodes. At each iteration, the optimization module 40 performs a random permutation, selecting a new order.
Following the selection of this order, the optimization module 40 selects, for each variable, the parent sets that have the best score and which at the same time do not introduce cycles. At the end of the iteration, the optimization module 40 compares the resulting structure, i.e., the choice of a parent set for each variable or node, with the best structure identified up to that moment. When the time available has ended, the optimization module 40 returns that structure with the best score that has been found.
The optimization module 40 deems acceptable a parent set p for a variable x if none of the parent variables is an ancestor of x, i.e., the parent set p does not introduce directed cycles into the graph. This check can be performed rapidly where implemented by means of binary operations on an array of ancestors.
When the optimization module 40 has completed the choice of the parent set, it deals with updating the arrays of the ancestors and descendants.
For this purpose, the optimization module 40 sets, in the ancestor array of the variable x and in those of its descendants, the bits of every parent in p and of its ancestors. It then sets, in the descendant array of every parent in p and in those of its ancestors, the bits of x and of its descendants. In this case also, these operations can be performed rapidly by means of binary operations.
The operations performed by the optimization module 40 are summarized for greater clarity in the following lines of pseudocode.
STRUCTURE OPTIMIZATION
1: function search(parent sets P)
2:   order ← 1:N
3:   bestStructure ← nil
4:   bestScore ← −∞
5:   while time() do
6:     randomize(order)
7:     structure ← empty
8:     score ← 0
9:     for all o ∈ order do
10:      p ← bestParent(o, P, ancestors, descendants)
11:      structure(o) ← p
12:      score += score(p)
13:    end for
14:    if score > bestScore then
15:      bestScore ← score
16:      bestStructure ← structure
17:    end if
18:  end while
19:  return bestStructure
20: end function

1: function bestParent(variable o, parent sets P, ancestors array a, descendants array d)
2:   best ← ∅
3:   for all parentSet ∈ P(o) do
4:     admissible ← true
5:     for all parent ∈ parentSet do
6:       if ancestor(o, parent, a) then
7:         admissible ← false
8:       end if
9:     end for
10:    if admissible then
11:      best ← parentSet
12:      break
13:    end if
14:  end for
15:  update(o, best, a)
16:  update(o, best, d)
17:  return best
18: end function
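Finally, a minimal Python sketch of the Monte Carlo cycle over variable orders, reusing is_ancestor and assign_parent_set from the earlier sketch. The names parent_sets (candidate sets per node, best score first, as frozensets) and scores are assumptions made for illustration; the real module operates on the output of the parent set identification step.

import random
import time

def search(parent_sets, scores, n, budget_seconds):
    deadline = time.monotonic() + budget_seconds
    best_structure, best_score = None, float("-inf")
    order = list(range(n))
    while time.monotonic() < deadline:
        random.shuffle(order)            # new random permutation of the nodes
        anc = [0] * n                    # ancestor bit arrays
        desc = [0] * n                   # descendant bit arrays
        structure, total = {}, 0.0
        for o in order:
            # First admissible candidate: no member may be an ancestor of o.
            # (In practice the empty parent set, always admissible, is
            # assumed to be among the candidates, so a choice always exists.)
            for ps in parent_sets[o]:
                if all(not is_ancestor(o, p, anc) for p in ps):
                    structure[o] = ps
                    total += scores[(o, ps)]
                    assign_parent_set(o, ps, anc, desc)
                    break
        if total > best_score:
            best_score, best_structure = total, dict(structure)
    return best_structure, best_score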
In practice it has been found that the invention fully achieves the intended aim and objects. In particular, it has been found that the device and the method for learning the structure of Bayesian networks from a complete data set thus conceived make it possible to overcome the qualitative limitations of the background art, since they make it possible to identify a structure of a Bayesian network of good quality, i.e., as optimized as possible and with a maximized score, while maintaining computing time at a modest level.
Another advantage of the device and the method for learning the structure of Bayesian networks from a complete data set is that they make it possible to define beforehand the computing time available for identifying a structure of a Bayesian network of good quality.
A further advantage of the device and of the method for learning the structure of Bayesian networks from a complete data set is that they can be applied easily to massive data sets with a large number of variables, for example financial data, commodity data, weather data, webpage access data, and so forth.
Although the device and the method for learning the structure of Bayesian networks from a complete data set according to the invention have been conceived in particular for learning the structure of Bayesian networks with a large number of variables, which can be used for the analysis and correlation of large quantities of data, they can in any case be used more generally for learning the structure of Bayesian networks with any number of variables, even fewer than ten variables if necessary.
The invention thus conceived is susceptible of numerous modifications and variations, all of which are within the scope of the inventive concept. All the details may furthermore be replaced with other technically equivalent elements.
In practice, the materials used, as well as the contingent shapes and dimensions, may be any according to the requirements and the state of the art.
To conclude, the scope of protection of the claims must not be limited by the illustrations or by the preferred embodiments presented in the description as examples, but rather the claims must comprise all the characteristics of patentable novelty that reside within the present invention, including all the characteristics that would be treated as equivalent by the person skilled in the art.
Where technical features mentioned in any claim are followed by reference signs, those reference signs have been included for the sole purpose of increasing the intelligibility of the claims and accordingly such reference signs do not have any limiting effect on the interpretation of each element identified by way of example by such reference signs.

Claims

1. A device (1) for learning the structure of a Bayesian network related to a plurality of variables, each of said variables being able to assume a plurality of finite states, which comprises:
- an initialization module (30) configured to calculate exact scores in the relationships between each one of said variables and the remaining variables;
- a computing engine (32) configured to calculate heuristic scores in the relationships of each one of said variables with respect to the pairs of said remaining variables, to select parent subsets having highest score, and to calculate heuristic scores for said parent subsets integrated with the score for one of said remaining variables, thus generating new subsets;
- an ordering module (34) configured to order said parent subsets into a list, ordered by score;
- an optimization module (40) configured to generate iteratively, for a selectable period of time, a random sequence of the items that are present in said list of said parent subsets, and to calculate an overall score for the parent set that covers the entire set of said variables, while holding the parent set to which the highest score corresponds.
2. The device (1) for learning the structure of a Bayesian network according to claim 1, characterized in that said exact scores are calculated by means of the BDeu, Bayesian Dirichlet equivalent uniform, method or the BIC, Bayesian information criterion, method or other methods that operate according to the same principles.
3. A method for learning the structure of a Bayesian network related to a plurality of variables, each one of said variables being able to assume a plurality of finite states, comprising the steps that consist in:
- initializing the exact scores in the relationships between each one of said variables and the remaining variables;
- calculating the heuristic scores in the relationships of each one of said variables with respect to the pairs of said remaining variables, in order to select parent subsets having highest score;
- calculating the heuristic scores for said parent subsets integrated with the score for one of said remaining variables, thus generating new subsets;
- ordering said parent subsets into a list, ordered by score;
- optimizing the structure of the Bayesian network by generating iteratively, for a selectable period of time, a random sequence of the items that are present in said list of said parent subsets, and calculating an overall score for the parent set that covers the entire set of said variables, while holding the parent set to which the highest score corresponds.
4. The method for learning the structure of a Bayesian network according to claim 3, characterized in that said exact scores are calculated by means of the BDeu, Bayesian Dirichlet equivalent uniform, method or the BIC, Bayesian information criterion, method or other methods that operate according to the same principles.
PCT/IB2016/056512 2015-10-29 2016-10-28 Learning of the structure of bayesian networks from a complete data set WO2017072717A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CH01583/15 2015-10-29
CH01583/15A CH711716A1 (en) 2015-10-29 2015-10-29 Learning the structure of Bayesian networks from a complete data set

Publications (1)

Publication Number Publication Date
WO2017072717A1 true WO2017072717A1 (en) 2017-05-04

Family

ID=57543086

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2016/056512 WO2017072717A1 (en) 2015-10-29 2016-10-28 Learning of the structure of bayesian networks from a complete data set

Country Status (2)

Country Link
CH (1) CH711716A1 (en)
WO (1) WO2017072717A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421565A (en) * 2023-12-18 2024-01-19 中国人民解放军国防科技大学 Markov blanket-based equipment assessment method and device and computer equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090307160A1 (en) * 2008-06-09 2009-12-10 Microsoft Corporation Parallel generation of a bayesian network
US20100198761A1 (en) * 2009-01-30 2010-08-05 Meng Teresa H Systems, methods and circuits for learning of relation-based networks

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7324981B2 (en) * 2002-05-16 2008-01-29 Microsoft Corporation System and method of employing efficient operators for Bayesian network search
JP5135831B2 (en) * 2007-03-15 2013-02-06 富士ゼロックス株式会社 Computing device
US20120185424A1 (en) * 2009-07-01 2012-07-19 Quantum Leap Research, Inc. FlexSCAPE: Data Driven Hypothesis Testing and Generation System
US9864953B2 (en) * 2013-05-30 2018-01-09 President And Fellows Of Harvard College Systems and methods for Bayesian optimization using integrated acquisition functions
US20150142709A1 (en) * 2013-11-19 2015-05-21 Sikorsky Aircraft Corporation Automatic learning of bayesian networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090307160A1 (en) * 2008-06-09 2009-12-10 Microsoft Corporation Parallel generation of a bayesian network
US20100198761A1 (en) * 2009-01-30 2010-08-05 Meng Teresa H Systems, methods and circuits for learning of relation-based networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MALONE BRANDON ET AL: "A Depth-First Branch and Bound Algorithm for Learning Optimal Bayesian Networks", 3 August 2013, NETWORK AND PARALLEL COMPUTING; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER INTERNATIONAL PUBLISHING, CHAM, PAGE(S) 111 - 122, ISBN: 978-3-642-28938-5, ISSN: 0302-9743, XP047267119 *
MARC TEYSSIER ET AL: "Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks", CORR (ARXIV), no. arXiv:1207.1429, 4 July 2012 (2012-07-04), pages 1 - 7, XP055348338 *
MAURO SCANAGATTA ET AL: "Learning Bounded Treewidth Bayesian Networks with Thousands of Variables", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS (NIPS 2015), vol. 28, 7 December 2015 (2015-12-07), pages 1 - 9, XP055348145 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421565A (en) * 2023-12-18 2024-01-19 中国人民解放军国防科技大学 Markov blanket-based equipment assessment method and device and computer equipment
CN117421565B (en) * 2023-12-18 2024-03-12 中国人民解放军国防科技大学 Markov blanket-based equipment assessment method and device and computer equipment

Also Published As

Publication number Publication date
CH711716A1 (en) 2017-05-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16810041

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16810041

Country of ref document: EP

Kind code of ref document: A1