WO2020095321A2 - Dynamic structure neural machine for solving prediction problems with uses in machine learning - Google Patents

Dynamic structure neural machine for solving prediction problems with uses in machine learning Download PDF

Info

Publication number
WO2020095321A2
WO2020095321A2 (PCT/IN2019/050820)
Authority
WO
WIPO (PCT)
Prior art keywords
data
neurons
hidden layer
hidden
data set
Prior art date
Application number
PCT/IN2019/050820
Other languages
French (fr)
Other versions
WO2020095321A8 (en)
WO2020095321A3 (en)
Inventor
Vishwajeet Singh Thakur
Original Assignee
Vishwajeet Singh Thakur
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vishwajeet Singh Thakur filed Critical Vishwajeet Singh Thakur
Publication of WO2020095321A2 publication Critical patent/WO2020095321A2/en
Publication of WO2020095321A3 publication Critical patent/WO2020095321A3/en
Publication of WO2020095321A8 publication Critical patent/WO2020095321A8/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

This invention discloses a new and novel methodology which can be used to solve multiclass classification problems in an automated way. It describes a novel neural network architecture "Dynamic Structure Neural Network (DSNN)", a novel automated learning method "Dynamic Structure Neural Learning (DSNL)" for training DSNN models, and a product "Dynamic Structure Neural Machine (DSNM)" which is a computer implementation of DSNN and DSNL for solving multiclass classification problems, such as Medical Diagnosis, Face Recognition, Sentiment Analysis, Speech Recognition, etc. The system and method given in this invention analyzes any (structured, semi-structured or unstructured) type and form of data that can be vectorized. The novelty of this method is the architecture of the DSNN model and the automated learning method DSNL that simultaneously determines the number of hidden layers, the number of processing units (or neurons) in each hidden layer and their parameters (weights and biases).

Description

COMPLETE SPECIFICATION
Dynamic Structure Neural Machine for Solving Prediction Problems with uses in Machine Learning
The following specification particularly describes the invention and the manner in which it is to be performed
FIELD OF INVENTION: -
This invention relates to the field of artificial intelligence. More particularly, this invention relates to neural networks in the field of information technology.
Particularly, the present invention discloses a new and novel neural network architecture, the “Dynamic Structure Neural Network (DSNN)”, for solving the multiclass classification problem. This invention also discloses a new and novel automated method of training (or learning) Dynamic Structure Neural Networks, “Dynamic Structure Neural Learning (DSNL)”, for solving the multiclass classification problem, and describes a product, the “Dynamic Structure Neural Machine (DSNM)”, which is implementable in hardware for specific problems.
BACKGROUND OF INVENTION :-
Real-world applications involve solving problems where an input data point has to be classified as belonging to one of a pre-defined finite number of classes; such a problem is referred to as a classification problem in the general computer science community. Examples are e-mail classification, face recognition, cancer prediction, etc. There are also real-world applications where a real-valued output has to be predicted for each input data point; these are called regression problems. Some examples of regression problems are stock price prediction, credit rating in banking or insurance, market demand forecasting, etc. Machine learning techniques are commonly applied to solve real-world classification and regression problems. Neural networks are a class of machine learning models that have been successfully applied to many classification and regression problems. Training a neural network model requires data (referred to as training data) and a parameter estimation technique (also known as a training or learning method). One of the early methods to train (multi-layered) neural networks is the back-propagation method, described in D.E. Rumelhart, G.E. Hinton and R.J. Williams, Learning representations by back-propagating errors, Nature, 323, 533-536, 1986. Most of the common (supervised learning) methods proposed thus far are some variation or extension of this work. Although widely popular, the back-propagation method relies on the user/developer to guess an appropriate neural network architecture (i.e., the number of layers and the size of each layer), for which the user relies on trial and error. Also, back-propagation based learning can give a locally optimum solution, as its computation is based on gradients of some error function.
The theory of neural networks states that a feed-forward neural network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function. This is referred to as the Universal Approximation Theorem, and one of its first versions is proved in G. Cybenko, Approximations by superpositions of sigmoidal functions, Mathematics of Control, Signals, and Systems, 2(4), 303-314, 1989. In practice, it is found that neural networks with many hidden layers, also referred to as deep neural networks, tend to perform better on tasks involving large datasets. Training deep neural networks by using trial and error to guess the size and shape of the network is an even more challenging task, as the number of hyperparameters over which trial and error is done is much larger.
It is an object of the present invention to provide a new and novel neural network architecture, the “Dynamic Structure Neural Network (DSNN)”, for solving the multiclass classification problem. Further, it is an object of the invention to provide a new and novel automated learning (training) methodology, “Dynamic Structure Neural Learning (DSNL)”, to train (or learn) DSNN models for solving the multiclass classification problem.
Further, it is an object of the invention to provide a new and novel Stochastic Adaptive Partitioning algorithm which is used to construct the hidden layer of a feed-forward neural network, given the previous layer's output on the entire data set.
Furthermore, it is an object of the invention to provide a product, the “Dynamic Structure Neural Machine (DSNM)”, which is implementable in hardware for specific problems.
DESCRIPTION OF INVENTION -
Neural networks are a class of machine learning models that can be trained to solve classification, regression and other tasks using (training) data. Traditional methods of training neural networks for any task require the size (or architecture) of the neural network to be specified by the user, and that size is fixed during the training / learning process. This is required because the majority of learning methods are based on some variant of the back-propagation technique. The back-propagation method (and its variants) determines the parameter values of all the layers simultaneously. All the parameter values are updated in an iterative manner using steepest-descent logic until a convergence condition is satisfied. Hence the size of the neural network has to be decided by the user before the training / learning process starts and stays fixed thereafter.
This invention is about:
(a) Dynamic Structure Neural Network (DSNN), a new and novel neural network architecture to solve the multiclass classification problem
(b) Dynamic Structure Neural Learning (DSNL), a new and novel automated learning method to train DSNN to solve multiclass classification problems without the user having to specify any learning hyperparameters, including the size of the neural network
(c) Dynamic Structure Neural Machine (DSNM), a computer implementation of DSNN and DSNL to solve real-world machine learning tasks of multiclass classification
Discussion on the terminology used:
I. (Feed-forward) Neural networks consist of multiple layers arranged in a sequential order, with each layer comprising multiple processing units (or neurons).
II. The first (hidden) layer gets the training data as its input, and the output of each layer is then fed as input to the subsequent layer. The output of the final layer is considered the output of the neural network for a given input.
III. Each processing unit i of layer l accepts as input a vector x^l and performs an affine transformation z_i^l = w_i^l * x^l + b_i^l, where w_i^l is a vector of the same dimension as the input x^l and b_i^l is a one-dimensional variable (these will be referred to as the weight vector and bias term, respectively, in the subsequent discussion).
IV. This affine transformation is usually followed by a non-linear transformation y_i^l = f(z_i^l). The functions commonly used as the non-linear transformation are Sigmoid or Tanh, but in practice many other functions are also applied to curtail the output of a neuron to a range.
V. Here, y_i^l, the output of the non-linear transformation, is considered the output of neuron i of layer l.
VI. The weight vectors (w_i^l) and bias terms (b_i^l) of all neurons across all layers form the parameters of the neural network, whose values are determined during the training / learning stage.
VII. In vector geometry parlance, a hyperplane is represented as w * x + b = 0, where w is a vector normal to the hyperplane and b is the intercept term. Any vector x_k which lies on the hyperplane satisfies the equation w * x_k + b = 0.
VIII. We say that a point (vector) x_k lies on the positive side of the hyperplane if w * x_k + b > 0 and on the negative side of the hyperplane if w * x_k + b < 0.
IX. The absolute value of the affine transformation z_k = w * x_k + b, where w is a unit vector, is interpreted as the perpendicular distance of the vector x_k from the hyperplane w * x + b = 0, and the sign of z_k determines on which side of the hyperplane the point x_k lies (a short numeric sketch of this appears after this list).
X. Therefore, the parameters of a neuron (weight vector and bias term) represent a hyperplane, where weight vector of the neuron is normal vector to the hyperplane and bias term of neuron is the intercept term of the hyperplane
XI. The absolute value of the affine transformation that the neuron performs is directly proportional to the perpendicular distance between the input to the neuron and the neuron's hyperplane.
XII. The sign of the output of affine transformation of a neuron signifies if the input is lying on the positive or negative side of the hyperplane of the neuron.
XIII. In the context of multiclass classification problems, output layer comprises the same number of neurons as the number of classes.
XIV. Training neural network models usually involves splitting the input data set into three subsets, called the train, validation and test sets, in the ratio of roughly 80% train, 10% validation and 10% test. The train data set is used to estimate the neural network parameters. The validation data set is used to decide when to stop the training process; it is also used to pick the hyper-parameter values. The test data set is used to estimate the generalization performance of the trained neural network model.
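As a concrete illustration of items VII to XII above, the following short sketch (with hypothetical numeric values, and assuming the numpy library) treats a single neuron's parameters as a hyperplane and computes on which side of the hyperplane an input point lies and its perpendicular distance from it:

    import numpy as np

    # Hypothetical neuron parameters: unit-norm weight vector and bias (intercept) term
    w = np.array([0.6, 0.8])      # ||w|| = 1
    b = -1.0
    x = np.array([2.0, 1.0])      # an input point

    z = np.dot(w, x) + b          # affine transformation performed by the neuron
    side = "positive" if z > 0 else "negative"
    distance = abs(z)             # since ||w|| = 1, |z| is the perpendicular distance
    print(f"z = {z:.2f}; the point lies on the {side} side at distance {distance:.2f}")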
Dynamic Structure Neural Network (DSNN):
DSNN is a new and novel architecture of feedforward neural networks for solving the multiclass classification problem. This neural network architecture is derived from and based on the geometric significance of the role of neurons in hidden layers in achieving the mapping from the input of the neural network to its output.
In the context of multiclass classification, each hidden layer neuron plays a role in separating the training data set into homogenous subsets (i.e. all points in the subset belong to the same class). The hidden layers transform the training data set in the input space, which is typically not linearly separable, into a space where the set of points becomes linearly separable (i.e. a single hyperplane separates the majority of points of one class from the rest of the points).
With this understanding the DSNN architecture for classification problems is proposed as:
- A neural network comprising an input layer, one or more hidden layers and an output layer
- Neurons in each hidden layer are grouped, based on their geometric orientation w.r.t. the input to the hidden layer, as either Frontier neurons or Inner neurons.
- Frontier neurons of each hidden layer are connected to the next hidden layer and the output layer.
- Therefore the output layer receives, as input, the output of Frontier neurons of all the hidden layers.
- Inner neurons of each hidden layer are only connected to the next (hidden or output) layer.
- The definition and determination of Frontier neurons and Inner neurons is explained in the following section which describes the automated training method for DSNN
FIG 1 shows the schematic of the DSNN architecture for multiclass classification problem.
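The following is a minimal forward-pass sketch of the architecture described above, not the patented implementation: the per-layer weight matrices W, bias vectors b and boolean Frontier masks are assumed inputs, and the output layer is fed the Frontier outputs of every hidden layer (plus, as described in the DSNL section below, the full output of the last hidden layer):

    import numpy as np

    def dsnn_forward(x, hidden_layers, W_out, b_out):
        """Sketch of a DSNN forward pass; hidden_layers is a list of dicts with
        keys 'W', 'b' (all neurons of the layer) and 'frontier' (boolean mask)."""
        layer_in = x
        collected = []
        for idx, layer in enumerate(hidden_layers):
            h = np.tanh(layer['W'] @ layer_in + layer['b'])   # affine + non-linearity
            is_last = (idx == len(hidden_layers) - 1)
            keep = np.ones(h.shape, dtype=bool) if is_last else layer['frontier']
            collected.append(h[keep])          # Frontier outputs go to the output layer
            layer_in = h                       # the next layer consumes the full output
        out_in = np.concatenate(collected)     # input to the output layer
        logits = W_out @ out_in + b_out
        return np.exp(logits) / np.exp(logits).sum()          # softmax over the classes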
Dynamic Structure Neural Learning (DSNL):
DSNL is a novel method of automated learning (training) of DSNN model to solve multiclass classification problem.
DSNL constructs each hidden layer, one after the other, from a corresponding SAP-Tree (Stochastic Adaptive Partitioning Tree) data structure. Nodes of the SAP-Tree correspond to hyperplanes. This tree is built with the aim of partitioning the input data set to the hidden layer into smaller subsets which are homogenous (i.e. all points in the subset belong to the same class).
Each node of the SAP-Tree is converted to either a Frontier or an Inner neuron of the hidden layer. A hyperplane which divides a (sub)set of input data points into two subsets, at least one of which is homogenous, is converted to a Frontier neuron. The remaining hyperplanes are converted into Inner neurons.
Figures 2(A), 2(B) and 2(C) explain the iterative partitioning process of the same input data set to the hidden layer into disjoint homogenous subsets, by hyperplanes which represent either Frontier neurons or Inner neurons, thereby resulting in the SAP-Tree data structure.
Specifically, FIG 2(A) shows the schematic of the process of creation of Frontier and Inner neuron hyperplanes. FIG 2(B) shows the schematic of the process of iteratively partitioning the data set into disjoint homogenous subsets. FIG 2(C) shows the schematic of SAP-Tree formation by iterative partitioning of the data set using hyperplanes. After adding a hidden layer, the algorithm decides whether it is required to add yet another hidden layer to the model. The criterion for this is: if the total number of neurons in all the hidden layers is greater than x% of the training data set size, then do not add any new hidden layer. In a general automated learning setting, a good estimate for the value of x, derived from experimental results, is 5%. However, for specific data sets the value of x can be tuned further.
The next hidden layer is constructed from only those data points which do not belong to homogenous subsets of the previous and current hidden layers' SAP-Tree(s).
Finally, the parameters of the output layer are determined (without error back-propagation). For a multiclass classification problem, the size of the output layer is the same as the number of classes. The cost function is the softmax cross-entropy function. The input to the output layer is the output of the Frontier neurons of all the hidden layers and the Inner neurons of the last hidden layer. The parameters of this output layer can be learnt directly using the gradient descent method applied on the cost function and the input data set to the output layer.
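As a sketch of this step only, and under the assumption that the inputs to the output layer have already been collected into a matrix H (one row per training point), the output layer can be fitted by plain gradient descent on the softmax cross-entropy loss; the learning rate and epoch count below are placeholder values:

    import numpy as np

    def train_output_layer(H, y, num_classes, lr=0.1, epochs=200):
        """Gradient descent on the softmax cross-entropy loss for the output layer only.
        H: (m, d) inputs to the output layer; y: integer class labels in [0, num_classes)."""
        m, d = H.shape
        W = np.zeros((num_classes, d))
        b = np.zeros(num_classes)
        Y = np.eye(num_classes)[y]                  # one-hot targets, shape (m, C)
        for _ in range(epochs):
            logits = H @ W.T + b                    # (m, C)
            logits -= logits.max(axis=1, keepdims=True)
            P = np.exp(logits)
            P /= P.sum(axis=1, keepdims=True)       # softmax probabilities
            G = (P - Y) / m                         # gradient of the loss w.r.t. the logits
            W -= lr * (G.T @ H)
            b -= lr * G.sum(axis=0)
        return W, b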
By this stage the dimensions of the neural network are fully determined and fixed, hence a few iterations of the back-propagation algorithm can be run to fine-tune the parameters (weights and biases) of the DSNN model. This is an optional step and usually results in a slight improvement in the accuracy of the DSNN model.
Algorithm: DSNL(D)
D = input training data set
1. Diter = D
2. N = DSNN model
3. Build a SAP-Tree data structure for the current hidden layer using Diter as its input data set
4. Convert the SAP-Tree data structure into hidden layer H with Frontier and Inner Neurons, and add to N
5. Decide if another hidden layer is required; if YES:
(a) D1 = Subset of Diter which does not belong to homogenous subsets of the SAP-Tree
(b) Diter = Current hidden layer's (H) output for subset D1
(c) Go to Step 3
6. Train the output layer using gradient descent on the entire data set D (without error back-propagation)
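A compact sketch of the DSNL loop above is given below; build_sap_tree, tree_to_layer, stop_adding_layers, in_homogeneous_subset, layer_output and train_output_layer_gd are hypothetical helper names standing in for Steps 3 to 6, not functions defined in this specification:

    def dsnl(D):
        """Sketch of DSNL; all helper functions are assumed placeholders."""
        N = []                                    # the DSNN model as a list of hidden layers
        D_iter = D
        while True:
            tree = build_sap_tree(D_iter)         # Step 3: SAP-Tree for the current layer
            N.append(tree_to_layer(tree))         # Step 4: Frontier / Inner neurons
            if stop_adding_layers(N, D):          # Step 5: e.g. total neurons > x% of |D|
                break
            D1 = [p for p in D_iter if not in_homogeneous_subset(tree, p)]
            D_iter = [layer_output(N[-1], p) for p in D1]   # Step 5(b): next layer's input
        output_layer = train_output_layer_gd(N, D)          # Step 6: gradient descent only
        return N, output_layer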
Stochastic Adaptive Partitioning algorithm:
The Stochastic Adaptive Partitioning algorithm is a method to automatically construct the SAP-Tree data structure of a hidden layer from the hidden layer's input data set.
The Stochastic Adaptive Partitioning algorithm is an iterative method which, in each iteration, considers a (sub)set of data points to be partitioned into two subsets by a hyperplane. Data points lying on the positive side of the hyperplane form one subset and the ones lying on the other side form the other subset. It starts the iteration with the hidden layer's input data set and iterates over the smaller subsets recursively.
Algorithm: Stochastic Adaptive Partitioning(D)
1. T = tree
2. DQueue = {D}, a priority queue data structure containing subsets of data points which are to be partitioned
3. Dcurr = Get next element from DQueue
STOP and return T if Dcurr is NULL
4. Compute the normal weight vector of the partitioning hyperplane
5. Compute the bias/intercept term of the partitioning hyperplane
6. Split Dcurr into Dpositive and Dnegative
7. Add the hyperplane computed in steps 4 and 5 to T, and Dpositive and Dnegative to the DQueue, if none of the following conditions are TRUE:
(a) if Dcurr is homogenous (i.e. all points in the subset belong to the same class)
(b) the Information Gain metric, evaluated on the training data set, increases if Dcurr is partitioned further AND the Information Gain metric, evaluated on the union of the training and validation data sets, does not increase if Dcurr is partitioned further
(c) the total number of neurons in all the hidden layers is greater than x% of the training data set size. In a general automated learning setting, a good estimate for the value of x, derived from experimental results, is 5%. However, for specific data sets the value of x can be tuned further.
8. Goto Step 3
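A sketch of this queue-driven loop is given below; compute_weight_vector, compute_bias and should_split are hypothetical placeholders for steps 4, 5 and 7, and each data point is assumed to be a (numpy vector, label) pair:

    from collections import deque

    def stochastic_adaptive_partitioning(D):
        """Sketch of SAP-Tree construction; nodes are kept as a flat list for brevity."""
        tree = []
        queue = deque([D])                        # subsets still to be partitioned
        while queue:
            D_curr = queue.popleft()
            w = compute_weight_vector(D_curr)     # step 4 (returns None if step iii stops)
            if w is None:
                continue
            b = compute_bias(D_curr, w)           # step 5: gini-index based bias term
            D_pos = [(x, y) for (x, y) in D_curr if x @ w + b > 0]
            D_neg = [(x, y) for (x, y) in D_curr if x @ w + b <= 0]
            if should_split(D_curr, D_pos, D_neg):    # step 7: no stopping condition holds
                tree.append((w, b))
                queue.extend([D_pos, D_neg])
        return tree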
Steps 4, 5 and 6 of the algorithm are described below in detail:
Let D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} be the input data to the hidden layer, where x_i ∈ R^d and y_i is the class label such that y_i ∈ {1, 2, ..., C} for all i.
(1) Compute the normal weight vector of the partitioning hyperplane
Input to this subsection is the training data (sub)set D, which contains points belonging to C classes.
(step i) Find the class k which has the most data points; call it the dominating class.
(step ii) Split the training data set D into two subsets:
D_primary = {x_i | x_i ∈ D and y_i = k}, the set of points in D belonging to the dominating class k
D_secondary = {x_i | x_i ∈ D and y_i ≠ k}, the set of points in D which do not belong to class k
Let n1 and n2 be the sizes of the sets D_primary and D_secondary respectively.
(step iii) If the data set D is highly imbalanced, i.e., the ratio n2 / (n1 + n2) is less than a threshold T, then STOP. T is a tuning parameter which can take values in the range [0, 1).
That is, if D contains either all points belonging to one class or a very large majority of points belonging to one class, then do nothing and return. For any particular application, the cross-validation data set is used to determine the ideal value of T.
In an automated learning setting, it is experimentally found that 0.975 is a good estimate of T.
(step iv) Formulate and solve an optimization problem to find the weight vector of the neuron. Formulate an optimization problem, parameterized by a unit vector w, to achieve the following three objectives:
(a) Maximize f(w), the average pairwise difference between the projections of points from D_primary and D_secondary onto w
(b) Minimize g(w), the average pairwise difference between the projections of points from D_primary onto w
(c) Minimize h(w), the average pairwise difference between the projections of points from D_secondary onto w
This optimization problem has multiple objectives, that is, maximization of f(w) and minimization of g(w) and h(w). It also has a constraint that w is a unit vector.
Formulate the following multi-objective optimization problem to achieve the multiple objectives of maximizing f(w) and simultaneously minimizing g(w) and h(w):
Maximize w^T A w
Subject to ||w|| = 1
where A = A1 - λ1 * A2 - λ2 * A3, and A1, A2 and A3 are the symmetric matrices for which w^T A1 w = f(w), w^T A2 w = g(w) and w^T A3 w = h(w).
Where λ1 > 0 and λ2 > 0 are tuning parameters, and cross-validation data is used to determine the ideal values of λ1 and λ2. In an automated setting, it is experimentally found that λ1 and λ2 start at a zero value and slowly increase from 0 to 1 as the network size grows.
The term (w^T A1 w) represents the average pairwise difference between the projections of pairs of points from D_primary and D_secondary respectively onto w. This quantity has to be maximized.
The term (w^T A2 w) represents the average pairwise difference between the projections of pairs of points from D_primary onto w. This quantity has to be minimized.
The term (w^T A3 w) represents the average pairwise difference between the projections of pairs of points from D_secondary onto w. This quantity has to be minimized.
The solution w* of this problem is the leading eigenvector of symmetric matrix A, i.e., the eigenvector corresponding to the largest eigenvalue.
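The exact summation formulas for A1, A2 and A3 appear only as images in the published text; the sketch below therefore assumes one natural construction consistent with the description, namely the average outer product of pairwise differences, so that w^T A w equals the average squared difference of projections, and then takes the leading eigenvector of A = A1 - λ1*A2 - λ2*A3:

    import numpy as np

    def pairwise_diff_scatter(X, Z):
        """Average outer product of pairwise differences between rows of X and rows of Z
        (assumed form; w^T A w is then the average squared projection difference)."""
        A = np.zeros((X.shape[1], X.shape[1]))
        for xi in X:
            for zj in Z:
                d = xi - zj
                A += np.outer(d, d)
        return A / (len(X) * len(Z))

    def exact_discriminant(D_primary, D_secondary, lam1=0.0, lam2=0.0):
        """Leading eigenvector of A = A1 - lam1*A2 - lam2*A3 (exact, non-randomized form)."""
        A1 = pairwise_diff_scatter(D_primary, D_secondary)
        A2 = pairwise_diff_scatter(D_primary, D_primary)
        A3 = pairwise_diff_scatter(D_secondary, D_secondary)
        A = A1 - lam1 * A2 - lam2 * A3
        eigvals, eigvecs = np.linalg.eigh(A)       # A is symmetric
        w_star = eigvecs[:, np.argmax(eigvals)]    # eigenvector of the largest eigenvalue
        return w_star / np.linalg.norm(w_star)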
Stochastic Approximation of the Optimization Problem:
In the above formulation, if the number of samples (n1 and n2) increases, then the amount of computation, which is proportional to (n1*n2 + n1*n1 + n2*n2), also increases. To make the amount of computation linearly proportional to the number of samples (especially in the case of large datasets), a randomized approximation of the optimization problem stated above can be formulated by approximating matrix A with a matrix A_approx, defined as:
A_approx = A1_approx - λ1 * A2_approx - λ2 * A3_approx
Where:
A1_approx is computed by a loop that iterates from k=1 to k=n, where n is the sum of n1 and n2; during every iteration of this loop the indices of elements of D_primary and D_secondary, i.e., i and j, are randomly selected from i ∈ [1, n1] and j ∈ [1, n2] respectively. The term (w^T A1_approx w) represents the random approximation of the average pairwise difference between the projections of pairs of points from D_primary and D_secondary respectively onto w.
A2_approx is computed by a loop that iterates from k=1 to k=n1; during every iteration of this loop the indices of elements of D_primary, i.e., i and j, are randomly selected from i ∈ [1, n1] and j ∈ [1, n1]. The term (w^T A2_approx w) represents the random approximation of the average pairwise difference between the projections of pairs of points from D_primary onto w.
A3_approx is computed by a loop that iterates from k=1 to k=n2; during every iteration of this loop the indices of elements of D_secondary, i.e., i and j, are randomly selected from i ∈ [1, n2] and j ∈ [1, n2]. The term (w^T A3_approx w) represents the random approximation of the average pairwise difference between the projections of pairs of points from D_secondary onto w.
The optimization problem is now transformed into the following form:
Maximize w^T A_approx w
Subject to ||w|| = 1
The solution w* of this problem is the leading eigenvector of the symmetric matrix A_approx, i.e., the eigenvector corresponding to the largest eigenvalue. We refer to this solution w* as the Stochastic Linear Discriminant.
The resulting vector w* becomes the weight vector of the next neuron of the hidden layer.
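Under the same assumed construction, the randomized approximation can be sketched as below, sampling n, n1 and n2 random pairs for A1_approx, A2_approx and A3_approx respectively; the precise sampling formulas in the published text are shown only as images, so this is an illustration rather than the definitive scheme:

    import numpy as np

    def approx_scatter(X, Z, num_pairs, rng):
        """Randomized approximation of the pairwise-difference scatter matrix:
        average outer product over num_pairs randomly chosen (row of X, row of Z) pairs."""
        A = np.zeros((X.shape[1], X.shape[1]))
        for _ in range(num_pairs):
            diff = X[rng.integers(len(X))] - Z[rng.integers(len(Z))]
            A += np.outer(diff, diff)
        return A / num_pairs

    def stochastic_linear_discriminant(D_primary, D_secondary, lam1=0.0, lam2=0.0, seed=0):
        """w* as the leading eigenvector of A_approx = A1_approx - lam1*A2_approx - lam2*A3_approx."""
        rng = np.random.default_rng(seed)
        n1, n2 = len(D_primary), len(D_secondary)
        A = (approx_scatter(D_primary, D_secondary, n1 + n2, rng)
             - lam1 * approx_scatter(D_primary, D_primary, n1, rng)
             - lam2 * approx_scatter(D_secondary, D_secondary, n2, rng))
        eigvals, eigvecs = np.linalg.eigh(A)
        w_star = eigvecs[:, np.argmax(eigvals)]
        return w_star / np.linalg.norm(w_star)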
(2) Compute the bias term of the partitioning hyperplane
The heuristic used to decide the value of the bias term is to choose a value which improves the overall classification accuracy using a minimal number of separating hyperplanes. One approach to implement this heuristic is to choose the bias term which results in the least misclassification error. There are other approaches defined in the field of Information Theory, namely Cross-Entropy/Information-Gain and Gini Index (or Gini Impurity). In practice, any of the following metrics can be used to choose the bias term value:
(a) Misclassification error
(b) Cross Entropy
(c) Gini Index
Below are the steps used to compute the bias term value using the Gini Index metric:
Input to this subsection is the training data (sub)set D and w* which is the weight vector of the neuron computed in the previous subsection.
(step i) Compute the set D_proj which contains the projections of points in D onto the vector w*:
D_proj = { w*^T x_i | x_i ∈ D }
(step ii) Use the gini index metric to pick a value that best splits the set D_proj. The gini index of a set D is defined as:
Gini(D) = 1 - Σ_k (p_k)^2
where p_k is the fraction of samples in D belonging to class k. The higher the value of the gini index, the more randomness in the data set D.
The gini index can be used to pick a value v ∈ D_proj which splits D into two subsets D_1 and D_2 such that
D_1 = {x_i ∈ D | w*^T x_i ≤ v} and D_2 = {x_i ∈ D | w*^T x_i > v}
The weighted gini index of the resulting subsets D_1 and D_2 is defined as:
Weighted_Gini(v) = (p_1 * Gini(D_1)) + (p_2 * Gini(D_2))
where p_1 and p_2 are the fractions of samples from set D that are present in sets D_1 and D_2 respectively, and Gini(D_1) and Gini(D_2) are the gini index values of D_1 and D_2 respectively.
How to pick the value of the bias term?
Among all values in D_proj, pick the value, v_bias, that results in the subsets D_1 and D_2 having the minimum weighted gini index.
(step iii) The bias term of the neuron, whose weight vector is the input w*, is set as the negative value of v_bias, i.e., (-1 * v_bias).
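The bias selection in steps i to iii can be sketched as a simple threshold scan over the projected values; here X is the matrix of points in D and y their class labels, and scanning every unique projection value is one assumed way of realising the heuristic:

    import numpy as np

    def gini(labels):
        """Gini index of a label array: 1 - sum_k p_k^2."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def choose_bias(X, y, w_star):
        """Pick v_bias minimising the weighted gini index of the induced split and
        return its negative as the neuron's bias term."""
        proj = X @ w_star                          # D_proj: projections onto w*
        best_v, best_score = proj[0], np.inf
        for v in np.unique(proj):
            left, right = y[proj <= v], y[proj > v]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_v, best_score = v, score
        return -best_v                             # bias term = -v_bias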
(3) Split Dcurr into two subsets Dpositive and Dnegative
The previous two subsections describe the steps to determine the parameters (weight vector and bias term) of one partitioning. This step describes the recursive approach towards building the SAP-Tree data structure.
Divide the training data (sub)set D into two subsets Dpositive and Dnegative based on whether the points in D lie on the positive or negative side of the hyperplane of the neuron whose weight vector and bias term are computed in subsections (1) and (2).
The Dynamic Structure Neural Machine (DSNM), which is an implementation of DSNN and DSNL, can be applied in a variety of applications, such as a Medical Diagnosis system, a Face Recognition system, Sentiment Analysis of Social Media content, a Speech Recognition system, etc. The hardware implementation allows it to receive a variety of input data, such as camera-based sensor data, data stored in an external storage medium which can be connected via a USB port, and stream (Web, Audio and Video) data. The vectorization module is capable of processing these types of data and converting them into vector form.
FIG 4 shows the hardware implementation of the Dynamic Structure Neural Machine (DSNM). The DSNM consists of an ARM processor based Central Processing Unit, one or more Sensors which feed data to the DSNM, a User-Input mechanism using which the user sends control signals to the DSNM, and a storage mechanism to store and retrieve data, models and other meta information. The central element of the DSNM is the DSNM process, which receives control signals from the user and performs data acquisition, model training or live testing. The DSNM operates in three modes: (i) data acquisition mode, (ii) training or learning mode and (iii) testing or live usage mode. In the data acquisition mode, the user directs the DSNM process to receive data from a sensor or input device(s) and to store it in the internal storage mechanism. In the training or learning mode, the user directs the DSNM to train a neural network model for a particular data set. The DSNM follows the training procedure described in the “DESCRIPTION” section to train a neural network using the training data set. Once the training is done, the trained model is stored using the internal storage mechanism. The testing or live usage mode allows the user to apply an already trained model on test or live data. In this case, the sensor or input device(s) are directed to receive the data, which is then converted into vector form and passed as input to the pre-trained model, which predicts the output value for that particular input. The output or predicted value is then transmitted to the output device(s).
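As an illustration only, the three operating modes can be pictured as a simple dispatch around the DSNM process; every object and helper used here (storage, sensors, output_device, vectorize, dsnl, predict) is a hypothetical placeholder rather than part of the specification:

    def dsnm_process(mode, storage, sensors, output_device):
        """Sketch of the DSNM control flow for its three modes."""
        if mode == "data_acquisition":
            raw = sensors.acquire()                   # read from sensor / input device(s)
            storage.save("dataset", vectorize(raw))   # store vectorized data internally
        elif mode == "training":
            D = storage.load("dataset")
            model = dsnl(D)                           # automated DSNL training
            storage.save("model", model)
        elif mode == "live_usage":
            model = storage.load("model")
            x = vectorize(sensors.acquire())
            output_device.emit(predict(model, x))     # send the prediction to output device(s)
        else:
            raise ValueError(f"unknown mode: {mode}")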
When applied to real-world multiclass classification data sets, DSNM has proven to give high accuracy across data sets from various domains. Examples of classification data sets on which DSNM was applied, with the neural network models trained in an automated manner, are:
(i) Handwritten digits recognition
This is a publicly available data set, called MNIST. It contains 60,000 training and 10,000 test images of ten handwritten digits, each of dimension 28x28. DSNM gave an accuracy of 98.4% on the test dataset.
(ii) Speech commands recognition
This is the Google Speech Commands data set, comprising audio recordings of 30 different types of speech commands. DSNM gave an accuracy of 96.9% on the test set.
(iii) Exotic Particles Search
This is the SUSY data set available on the UCI Machine Learning Repository. Here the problem is to distinguish between a signal process which produces supersymmetric particles and a background process which does not. It contains 4.5 million train data points and 0.5 million test data points. DSNM gave an accuracy of 80.1% on this data set.
(iv) Predictive Maintenance
This is the IDA-2016 challenge data set, present in the UCI Machine Learning repository. The dataset consists of data collected from heavy Scania trucks in everyday usage. The system in focus is the Air Pressure System (APS), which generates pressurised air that is utilized in various functions in a truck, such as braking and gear changes. The dataset's positive class consists of component failures for a specific component of the APS system. The negative class consists of trucks with failures for components not related to the APS.
DSNM gave an accuracy of 99.2% on this data set.
(v) Bank Marketing data set
This data set is available on the UCI Machine Learning repository. The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y). DSNM gave an accuracy of 88.6% on this data set.

Claims

CLAIMS: What is claimed is:
1. A system and method with Artificial Intelligence (AI), Machine Learning comprising:
A computer-implementable Neural Network architecture, referred to as Dynamic Structure Neural Network (DSNN), to perform machine learning task of multiclass classification.
The neural network architecture comprises an input layer, one or more hidden layers and an output layer. Neurons in each hidden layer are grouped as either Frontier neurons or Inner neurons.
Frontier neurons of each hidden layer are connected to the next hidden layer and the output layer. Therefore the output layer receives, as input, the output of Frontier neurons of all the hidden layers
Inner neurons of each hidden layer are only connected to the next (hidden or output) layer.
2. The system and method of claim 1, further comprising:
A method of automated machine learning, referred to as Dynamic Structure Neural Learning (DSNL), which takes as input a multiclass classification data set (or training data) and outputs a DSNN model whose architecture is as mentioned in claim 1.
For a given multiclass classification data set, the DSNL automated learning process determines:
- number of hidden layers in the neural network model
- number of frontier and inner neurons in each hidden layer
- parameters (weights and biases) of all the neurons of all the hidden layers
- the size and parameters of the output layer
Each hidden layer is constructed, one after the other, from a corresponding SAP-Tree (Stochastic Adaptive Partitioning Tree) data structure. Nodes of the SAP-Tree correspond to hyperplanes. This tree is built with the aim of partitioning the input data set to the hidden layer into smaller subsets which are homogenous (i.e. all points in the subset belong to the same class).
Each node of the SAP-Tree is converted to either a Frontier or Inner neuron of the hidden layer. Hyperplane which results in dividing a (sub) set of input data points into two subsets at least one of which is homogenous, is converted to a Frontier neuron. Remaining hyperplanes are converted into Inner neurons.
After adding a hidden layer, the algorithm decides whether it is required to add yet another hidden layer to the model. The next hidden layer is constructed from only those data points which do not belong to homogenous subsets of the (previous and) current hidden layers' SAP-Tree(s).
Finally, the parameters of the output layer are determined using gradient descent method (without error back-propagation).
3. The system and method of claim 2, further comprising:
The Stochastic Adaptive Partitioning algorithm, which is a method to automatically construct the SAP-Tree data structure of a hidden layer from the hidden layer's input data set.
The Stochastic Adaptive Partitioning algorithm is an iterative method which, in each iteration, considers a (sub)set of data points to be partitioned into two subsets by a hyperplane. Data points lying on the positive side of the hyperplane form one subset and the ones lying on the other side form the other subset. It starts the iteration with the hidden layer's input data set and iterates over the smaller subsets recursively.
Partitioning a (sub)set containing data points of two or more classes is done adaptively by splitting these points into two groups, a primary group and a secondary group. The primary group comprises the data points belonging to the class having the maximum number of data points in the (sub)set to be partitioned in the current iteration. All the other data points of the (sub)set are part of the secondary group.
An optimization problem is proposed to find a unit vector on which the two groups of data points project in a way that creates maximum separation between the data points belonging to different groups and (to a certain extent) minimum separation between the points belonging to the same group. A stochastic approximation of this optimization problem is solved and the result is the normal vector of the partitioning hyperplane. The gini index metric is used to find the bias (or intercept term) of the partitioning hyperplane.
A subset which satisfies one of the following two criteria is not divided further:
(a) if the subset is homogenous (i.e. all points in the subset belong to the same class)
(b) if the Information Gain metric, evaluated on the training data set, increases when the (sub)set is partitioned further and the Information Gain metric, evaluated on the union of the training and validation data sets, does not increase when the (sub)set is partitioned further
4. The system and method of claim 1, further comprising:
The computer-implementation of neural network architecture and the automated learning method, referred to as Dynamic Structure Neural Machine (DSNM), which is applicable to real-world Machine Learning tasks, resulting in high accuracy while solving several particular problems such as (i) Handwritten digits recognition (MNIST data set), (ii) Speech commands recognition (Google Speech data set), (iii) Exotic Particles Search (SUSY data set) (iv) Predictive Maintenance (IDA-2016 data set) and (v) Bank Marketing data set; in comparison with similar automated machine learning methods.
PCT/IN2019/050820 2018-11-06 2019-11-05 Dynamic structure neural machine for solving prediction problems with uses in machine learning WO2020095321A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201841041940 2018-11-06
IN201841041940 2018-11-06

Publications (3)

Publication Number Publication Date
WO2020095321A2 true WO2020095321A2 (en) 2020-05-14
WO2020095321A3 WO2020095321A3 (en) 2020-06-25
WO2020095321A8 WO2020095321A8 (en) 2020-07-23

Family

ID=70611488

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2019/050820 WO2020095321A2 (en) 2018-11-06 2019-11-05 Dynamic structure neural machine for solving prediction problems with uses in machine learning

Country Status (1)

Country Link
WO (1) WO2020095321A2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085157A (en) * 2020-07-20 2020-12-15 西安电子科技大学 Prediction model establishing method and device based on neural network and tree model
CN112116143A (en) * 2020-09-14 2020-12-22 贵州大学 Forest pest occurrence probability calculation processing method based on neural network
CN112697435A (en) * 2021-01-26 2021-04-23 山西三友和智慧信息技术股份有限公司 Rolling bearing fault diagnosis method based on improved SELD-TCN network
CN112908446A (en) * 2021-03-20 2021-06-04 张磊 Automatic mixing control method for liquid medicine in endocrinology department
CN113282842A (en) * 2021-01-25 2021-08-20 上海海事大学 Travel purpose identification method based on travel survey of smart phone and artificial neural network particle swarm optimization algorithm
CN113469339A (en) * 2021-06-30 2021-10-01 山东大学 Dimension reduction-based autopilot neural network robustness verification method and system
CN113590748A (en) * 2021-07-27 2021-11-02 中国科学院深圳先进技术研究院 Emotion classification continuous learning method based on iterative network combination and storage medium
CN114239330A (en) * 2021-11-01 2022-03-25 河海大学 Deep learning-based large-span latticed shell structure form creation method
CN115394394A (en) * 2022-10-27 2022-11-25 曹县人民医院 Resident health service reservation method and system based on big data processing technology
CN115534319A (en) * 2022-09-21 2022-12-30 成都航空职业技术学院 3D printing path planning method based on HGEFS algorithm
CN117537951A (en) * 2024-01-10 2024-02-09 西南交通大学 Method and device for detecting internal temperature rise of superconducting suspension based on deep learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7711663B2 (en) * 2006-03-27 2010-05-04 Board Of Trustees Of Michigan State University Multi-layer development network having in-place learning
US9753959B2 (en) * 2013-10-16 2017-09-05 University Of Tennessee Research Foundation Method and apparatus for constructing a neuroscience-inspired artificial neural network with visualization of neural pathways
US20180284735A1 (en) * 2016-05-09 2018-10-04 StrongForce IoT Portfolio 2016, LLC Methods and systems for industrial internet of things data collection in a network sensitive upstream oil and gas environment

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085157A (en) * 2020-07-20 2020-12-15 西安电子科技大学 Prediction model establishing method and device based on neural network and tree model
CN112085157B (en) * 2020-07-20 2024-02-27 西安电子科技大学 Disease prediction method and device based on neural network and tree model
CN112116143A (en) * 2020-09-14 2020-12-22 贵州大学 Forest pest occurrence probability calculation processing method based on neural network
CN112116143B (en) * 2020-09-14 2023-06-13 贵州大学 Forest pest occurrence probability calculation processing method based on neural network
CN113282842A (en) * 2021-01-25 2021-08-20 上海海事大学 Travel purpose identification method based on travel survey of smart phone and artificial neural network particle swarm optimization algorithm
CN112697435A (en) * 2021-01-26 2021-04-23 山西三友和智慧信息技术股份有限公司 Rolling bearing fault diagnosis method based on improved SELD-TCN network
CN112908446A (en) * 2021-03-20 2021-06-04 张磊 Automatic mixing control method for liquid medicine in endocrinology department
CN112908446B (en) * 2021-03-20 2022-03-22 张磊 Automatic mixing control method for liquid medicine in endocrinology department
CN113469339A (en) * 2021-06-30 2021-10-01 山东大学 Dimension reduction-based autopilot neural network robustness verification method and system
CN113469339B (en) * 2021-06-30 2023-09-22 山东大学 Automatic driving neural network robustness verification method and system based on dimension reduction
CN113590748A (en) * 2021-07-27 2021-11-02 中国科学院深圳先进技术研究院 Emotion classification continuous learning method based on iterative network combination and storage medium
CN113590748B (en) * 2021-07-27 2024-03-26 中国科学院深圳先进技术研究院 Emotion classification continuous learning method based on iterative network combination and storage medium
CN114239330A (en) * 2021-11-01 2022-03-25 河海大学 Deep learning-based large-span latticed shell structure form creation method
CN114239330B (en) * 2021-11-01 2022-06-10 河海大学 Deep learning-based large-span latticed shell structure form creation method
CN115534319B (en) * 2022-09-21 2023-08-11 成都航空职业技术学院 3D printing path planning method based on HGEFS algorithm
CN115534319A (en) * 2022-09-21 2022-12-30 成都航空职业技术学院 3D printing path planning method based on HGEFS algorithm
CN115394394A (en) * 2022-10-27 2022-11-25 曹县人民医院 Resident health service reservation method and system based on big data processing technology
CN117537951A (en) * 2024-01-10 2024-02-09 西南交通大学 Method and device for detecting internal temperature rise of superconducting suspension based on deep learning
CN117537951B (en) * 2024-01-10 2024-03-26 西南交通大学 Method and device for detecting internal temperature rise of superconducting suspension based on deep learning

Also Published As

Publication number Publication date
WO2020095321A8 (en) 2020-07-23
WO2020095321A3 (en) 2020-06-25

Similar Documents

Publication Publication Date Title
WO2020095321A2 (en) Dynamic structure neural machine for solving prediction problems with uses in machine learning
Neill An overview of neural network compression
Narkhede et al. A review on weight initialization strategies for neural networks
Zhang et al. An unsupervised parameter learning model for RVFL neural network
Mercioni et al. The most used activation functions: Classic versus current
US11429862B2 (en) Dynamic adaptation of deep neural networks
EP3543917B1 (en) Dynamic adaptation of deep neural networks
US11803744B2 (en) Neural network learning apparatus for deep learning and method thereof
Springenberg et al. Improving deep neural networks with probabilistic maxout units
Li et al. Feature selection using a piecewise linear network
Li et al. Evolutionary extreme learning machine with sparse cost matrix for imbalanced learning
Suneera et al. Performance analysis of machine learning and deep learning models for text classification
Glauner Comparison of training methods for deep neural networks
Yeganejou et al. Improved deep fuzzy clustering for accurate and interpretable classifiers
Bao et al. Cross-entropy pruning for compressing convolutional neural networks
Kim Deep learning
Urgun et al. Composite power system reliability evaluation using importance sampling and convolutional neural networks
Ishii et al. Partially zero-shot domain adaptation from incomplete target data with missing classes
CN116524282A (en) Discrete similarity matching classification method based on feature vectors
Andrade et al. Implementation of Incremental Learning in Artificial Neural Networks.
Shetty et al. Comparative analysis of different classification techniques
Afrasiyabi et al. Energy saving additive neural network
Ye et al. Learning Algorithm in Two-Stage Selective Prediction
Via et al. Training algorithm for dendrite morphological neural network using k-medoids
Keddous et al. Characters Recognition based on CNN-RNN architecture and Metaheuristic

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19882268

Country of ref document: EP

Kind code of ref document: A2

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19882268

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 19882268

Country of ref document: EP

Kind code of ref document: A2