NL2016285A - Device and method for generating a group equivariant convolutional neural network. - Google Patents


Info

Publication number
NL2016285A
NL2016285A
Authority
NL
Netherlands
Prior art keywords
group
symmetry
sampling points
translation
groups
Prior art date
Application number
NL2016285A
Other languages
Dutch (nl)
Other versions
NL2016285B1 (en)
Inventor
Taco Sebastiaan Cohen
Max Welling
Original Assignee
Scyfer B V
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Scyfer B V filed Critical Scyfer B V
Priority to NL2016285A priority Critical patent/NL2016285B1/en
Priority to PCT/NL2017/050084 priority patent/WO2017142397A1/en
Publication of NL2016285A publication Critical patent/NL2016285A/en
Application granted granted Critical
Publication of NL2016285B1 publication Critical patent/NL2016285B1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a device for generating a convolutional neural network, said device comprising a computer device comprising a computer program which, when running on said computer device, defines said convolutional neural network, said convolutional neural network comprising at least one non-translational group convolution layer, said non-translational group convolution layer taking as input a set of learnable filters and an input stack of feature maps, both inputs being a vector-valued function on a discrete set of sampling points having a set of symmetry transformations that form a mathematical group of symmetry transformations, with the exclusion of mathematical groups of symmetry transformations that are pure translation groups, and said non-translational group convolution layer producing as output an output stack of feature maps being a vector-valued function on said mathematical group of symmetry transformations.

Description

Device and method for generating a group equivariant convolutional neural network
Field of the invention
The invention relates to a device for generating a convolutional neural network, a computer program product for generating a convolutional neural network, and a series of devices for solving machine learning problems, including but not limited to classification, regression, segmentation, detection, labelling, and prediction problems.
Background of the invention
In the art, many different types of neural networks are known. These networks consist of a sequence of data processing steps called layers, each of which takes as input one or more outputs from previous layers and/or one or more data points that represent the input to the entire network. Typically, some of the layers employ parameters that are learned from data by a learning algorithm. Neural networks have been proven valuable for modelling data that cannot easily be described by simple mathematical models. Neural networks have a potentially large number of adjustable parameters that can be adapted by a learning algorithm so as to optimise the performance of the network on some task, such as object recognition, speech recognition, or computer-aided diagnosis based on medical scans. Learning is typically performed by gradient-based optimization algorithms, where the gradient is computed using the backpropagation algorithm.
Deep convolutional neural networks (CNNs, convnets) in particular have proven to be very powerful models for sensory data such as images, video, and audio. Although a strong theory of neural network design is currently lacking, a large amount of empirical evidence supports the notion that both convolutional parameter sharing and network depth (among other factors) are important for good predictive performance.
Over the last few years, convolutional neural networks have achieved significant breakthroughs.
Examples of convolutional neural networks and their applications are for instance described in EP2833295. EP2833295 according to its abstract describes a convolutional-neural-network-based classifier, a classifying method using such a classifier, and a method for training the convolutional-neural-network-based classifier. The convolutional-neural-network-based classifier comprises: a plurality of feature map layers, at least one feature map in at least one of the plurality of feature map layers being divided into a plurality of regions; and a plurality of convolutional templates corresponding to the plurality of regions respectively, each of the convolutional templates being used for obtaining a response value of a neuron in the corresponding region. EP2869239 according to its abstract describes systems, methods, and non-transitory computer readable media that can align face images, classify face images, and verify face images by employing a deep neural network (DNN). A 3D-aligned face image can be generated from a 2D face image. An identity of the 2D face image can be classified based on provision of the 3D-aligned face image to the DNN. The identity of the 2D face image can comprise a feature vector. EP2911111 according to its abstract describes an apparatus and a method for lesion detection. The method of lesion detection involves detecting lesion candidates from a medical image, detecting anatomical objects from the medical image, verifying each of the lesion candidates based on anatomical context information comprising information regarding a location relationship between the lesion candidates and the anatomical objects, and removing one or more false-positive lesion candidates from the detected lesion candidates based on a verification result.
Summary of the invention
The invention seeks to provide a collection of convolutional neural network layers that enjoy greater statistical efficiency for problem domains that have symmetry, which means that the network requires fewer training examples to learn to perform its task at a certain level of proficiency in such symmetric problem domains.
To that end, the invention relates to a device for generating a convolutional neural network, said device comprising a computer device comprising a computer program which, when running on said computer device, defines said convolutional neural network, said convolutional neural network comprising at least one non-translational group convolution layer, said non-translational group convolution layer taking as input a set of learnable filters and an input stack of feature maps, both inputs being a vector-valued function on a discrete set of sampling points having a set of symmetry transformations that form a mathematical group of symmetry transformations, with the exclusion of mathematical groups of symmetry transformations that are pure translation groups, and said non-translational group convolution layer producing as output an output stack of feature maps, said output stack of feature maps being a vector-valued function on said mathematical group of symmetry transformations, said non-translational group convolution layer comprising inner products between said input stack of feature maps and each transformed filter from a set of transformed filters, said set of transformed filters comprising learnable filters from said set of learnable filters to which a transformation from said group of symmetry transformations has been applied, and wherein each said inner product produces a result to be stored at a position in said output stack of feature maps, and wherein the result of each inner product is stored in said output stack of feature maps at the position associated with the learnable filter and the symmetry transformation that was applied to the learnable filter to obtain the transformed filter that was used in said inner product.
The invention provides a collection of convolutional neural network layers that enjoy greater statistical efficiency for problem domains that have symmetry, which means that the network requires fewer training examples to learn to perform its task at a certain level of proficiency in such symmetric problem domains. Symmetry in this context means that the correct output for the network does not change (or changes in a predictable, equivariant way) when certain transformations are applied to the input of the network. For example, in an image classification problem, the network has to predict the class of a visual object, which typically does not depend on the position and orientation of the object in the image. As another example, in an image segmentation problem, the output of the network is a segmentation mask, and this mask should undergo a rotation when the input image is rotated.
It was found that known convolutional parameter sharing is effective because there is a translation symmetry in most vision problems: the label function and the data distribution are both approximately invariant to shifts. By using the same parameters to analyze each part of the image, a convolution layer uses far fewer parameters than a fully connected layer, while preserving the capacity to learn many useful data transformations.
It was further found that convolution layers can be used effectively in a deep network because all the layers in such a network are translation equivariant: shifting the image and then feeding it through a number of convolution layers gives the same result as feeding the original image through the same layers and then shifting the result. In other words, the symmetry (translation) is preserved by each layer, which makes it possible to exploit it by parameter-sharing not just in the first but also in higher layers of the network.
It was further found that convolutional layers can be generalized to exploit larger groups of symmetries, including rotations, reflections and glide reflections, thus further increasing the degree of parameter sharing. This is the main invention described herein, and is referred to as a G-convolution or G-correlation layer, the resulting networks being called G-CNNs. Furthermore, other operations typically used in CNNs, such as pooling, bias, and batch normalization layers, are generalized so as to be compatible with (i.e. equivariant in combination with) G-convolution and G-correlation layers.
The word ‘equivariance’ is not commonly used in the deep learning literature, so we briefly discuss it here. Transforming an input x by a transformation g (forming Tg x) and then passing it through a network or layer Φ should give the same result as first mapping x through the network and then transforming the representation: T′g Φ(x) = Φ(Tg x), where T′g denotes the (possibly different) action of g on the output space.
Layer or network φ is then called an equivariant map.
Invariance is a special kind of equivariance where the output transformation T′g is the identity transformation for all g. In deep learning, general equivariance is more useful than invariance because it is impossible for a layer to determine if low-level features are in the right spatial configuration if they are invariant.
To make it easier to introduce the invention, we first describe a model of known CNNs as used in computer vision, and describe their translation equivariance. In this model, stacks of feature maps are modeled as functions f : Z² → R^K supported on a bounded domain supp(f). In this respect, both the input (e.g. an image) and the intermediate representations of the input as produced by the layers of the network are considered to be feature maps. We think of the integers (p, q) in Z² as the coordinates of a (typically square) lattice of sampling points. At each sampling point, the stack of feature maps stores a real K-dimensional vector, where K is the number of color channels of the input image (e.g. K=1 for grayscale, K=3 for color images), or the number of feature maps in higher layers. Although the function f extends to infinity, we only have to store a finite array in computer memory because the function is nonzero only on a small domain supp(f). In this text, whenever we refer to a stack of feature maps or to a vector-valued function on a discrete set of sampling points, we are referring to a function that is zero everywhere outside of a finite support, and hence can be stored in a finite array in computer memory.
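Such a finitely supported stack of feature maps can be sketched as follows (an illustrative NumPy layout; the (K, H, W) array shape and the helper name `sample` are assumptions of this sketch, not part of the claimed subject-matter):

```python
import numpy as np

# A stack of K feature maps on Z^2, stored as a finite (K, H, W) array.
# The function is zero everywhere outside this stored support, so a finite
# array suffices even though f is defined on all of Z^2.
K, H, W = 3, 5, 5                      # e.g. K = 3 colour channels
f = np.zeros((K, H, W))
f[:, 2, 2] = [1.0, 0.5, 0.25]          # nonzero only at sampling point (2, 2)

def sample(f, p, q):
    """Evaluate the stack of feature maps at integer coordinates (p, q);
    returns the zero vector outside the stored support."""
    K, H, W = f.shape
    if 0 <= p < H and 0 <= q < W:
        return f[:, p, q]
    return np.zeros(K)
```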
For transformations of the stack of feature maps, the following notation for a transformation g acting on a stack of feature maps f is introduced: [Lg f](x) = f(g⁻¹x).
The same definition is valid for feature maps on a group G or discrete set of sampling points that will be introduced later. In these cases x is not a point in Z², but a point in said group or discrete set, and the juxtaposition g⁻¹x denotes the group operation (when x is a group element) or the group action (when x is a point in a discrete set of sampling points on which G acts). For feature maps on the groups p4 and p4m, this is visualized in figure 1. A standard inner product on the space of Z² stacks of feature maps is defined by: ⟨f, ψ⟩ = Σ_{x∈Z²} Σ_k f_k(x) ψ_k(x).
Again, the generalization to feature maps on other discrete sets and groups than Z² is straightforward: simply replace the sum over Z² by a sum over said discrete set or group.
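As an illustrative sketch (the names and the circular wrap-around grid are assumptions made here to keep the action exact), the operator Lg and the inner product can be realized for 90-degree rotations as follows; because Lg merely relabels sampling points, the inner product is invariant:

```python
import numpy as np

N = 6  # grid size; a circular (wrap-around) grid keeps the group action exact

def rot_vec(x, r):
    """Apply a rotation by r * 90 degrees to a point x = (a, b) in Z^2."""
    a, b = x
    for _ in range(r % 4):
        a, b = -b, a
    return a, b

def L_rot(f, r):
    """[L_g f](x) = f(g^{-1} x) for g a rotation by r * 90 degrees about
    the origin of the circular N x N grid."""
    out = np.empty_like(f)
    for i in range(N):
        for j in range(N):
            a, b = rot_vec((i, j), -r)
            out[i, j] = f[a % N, b % N]
    return out

def inner(f, g):
    """Standard inner product: sum over all sampling points of the
    pointwise product."""
    return float(np.sum(f * g))

rng = np.random.default_rng(0)
f = rng.standard_normal((N, N))
g = rng.standard_normal((N, N))

# L_g is a relabelling of sampling points, so the inner product is invariant:
assert np.isclose(inner(L_rot(f, 1), L_rot(g, 1)), inner(f, g))
```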
At each layer l of a regular CNN, the CNN takes as input a stack of feature maps (or an image) f : Z² → R^{K_l} and convolves or correlates it with a set of K_{l+1} filters ψ^i : Z² → R^{K_l}, i = 1, …, K_{l+1}:
[f ⋆ ψ^i](x) = Σ_{y∈Z²} Σ_k f_k(y) ψ^i_k(y − x),  [f * ψ^i](x) = Σ_{y∈Z²} Σ_k f_k(y) ψ^i_k(x − y).
If one employs convolution (*) in the forward pass, the correlation (⋆) will appear in the backward pass when computing gradients, and vice versa. Both operations are very similar, and, following standard practice in the literature on CNNs, we will often simply use the word “convolution” to refer to any convolution-type operation including the correlation operation. The invention described herein is not limited to any one particular correlation or convolution-type operation. Here we use the convention that the correlation is used in the forward pass.
Using the substitution y → y + t, and leaving out the summation over feature maps for clarity, we see that a translation t followed by a correlation is the same as a correlation followed by a translation:
[Lt f ⋆ ψ](x) = Σ_y f(y − t) ψ(y − x) = Σ_y f(y) ψ(y + t − x) = [f ⋆ ψ](x − t) = [Lt [f ⋆ ψ]](x).
And so we say that “correlation is an equivariant map for the translation group”, or that “correlation and translation commute”.
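This commutation property can be verified numerically. The following sketch (illustrative, pure NumPy; a circular grid is assumed so that the identity holds exactly without boundary effects) checks that shifting and then correlating equals correlating and then shifting:

```python
import numpy as np

N = 6  # circular N x N grid, so translations wrap around

def shift(f, t):
    """[L_t f](y) = f(y - t)."""
    return np.roll(f, t, axis=(0, 1))

def correlate(f, psi):
    """Translational correlation: out[t] = sum_y f(y) psi(y - t)."""
    out = np.zeros((N, N))
    for tx in range(N):
        for ty in range(N):
            out[tx, ty] = np.sum(f * shift(psi, (tx, ty)))
    return out

rng = np.random.default_rng(0)
f = rng.standard_normal((N, N))
psi = rng.standard_normal((N, N))
t = (2, 3)

# A translation t followed by a correlation equals the correlation followed by t:
assert np.allclose(correlate(shift(f, t), psi), shift(correlate(f, psi), t))
```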
The known translational correlation operation defined before is computed by shifting a filter and then computing an inner product with the stack of input feature maps. By replacing the shift by a more general transformation from some group G, we get the G-correlation that can be applied to a function on Z² (e.g. an image):
[f ⋆ ψ](g) = Σ_{y∈Z²} Σ_k f_k(y) ψ_k(g⁻¹y).
Notice that both the input image f and the filter ψ are functions on the plane Z², but the output feature map f ⋆ ψ is a function on the discrete group G (which may contain translations as a subgroup). Hence, for all layers after the first, the filters must also be functions on G, and the correlation operation becomes
[f ⋆ ψ](g) = Σ_{h∈G} Σ_k f_k(h) ψ_k(g⁻¹h).
The scope of the invention also includes G-convolution-like sums where a different group action than g⁻¹h or hg is used, and further includes G-convolution-like sums where the arguments f and ψ are swapped. All of these operations enjoy similar equivariance properties that are easily derived by direct computation (as shown below for the G-correlation).
The equivariance of both the first-layer G-correlation and the full G-correlation operation is derived using the substitution h → uh:
[Lu f ⋆ ψ](g) = Σ_h f(u⁻¹h) ψ(g⁻¹h) = Σ_h f(h) ψ(g⁻¹uh) = Σ_h f(h) ψ((u⁻¹g)⁻¹h) = [f ⋆ ψ](u⁻¹g) = [Lu [f ⋆ ψ]](g).
For the first-layer correlation, the inputs f and ψ are functions on Z², so Lu f denotes the transformation of such a function, while Lu [f ⋆ ψ] denotes the transformation of the feature map (see figures 1 and 2 for an illustration), which is a function on G. For the full G-correlation, both the inputs f and ψ and the output f ⋆ ψ are functions on G.
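The first-layer G-correlation and its equivariance can be checked numerically for G = p4. The sketch below is illustrative (single channel; a circular grid is assumed to keep the group action exact) and stores the output feature map on p4 as a (4, N, N) array indexed by a rotation and a translation:

```python
import numpy as np

N = 6  # circular grid so that all group actions are exact

def shift(f, t):
    """[L_t f](y) = f(y - t)."""
    return np.roll(f, t, axis=(0, 1))

def rot_vec(x, r):
    """Rotation by r * 90 degrees applied to a point x = (a, b)."""
    a, b = x
    for _ in range(r % 4):
        a, b = -b, a
    return a, b

def rotate(f, r):
    """Rotation about the grid origin: [R_r f](x) = f(rot_vec(x, -r))."""
    out = np.empty_like(f)
    for i in range(N):
        for j in range(N):
            a, b = rot_vec((i, j), -r)
            out[i, j] = f[a % N, b % N]
    return out

def p4_correlate(f, psi):
    """First-layer p4-correlation: out[r, t] = <f, L_{(r,t)} psi>, where the
    transformed filter is psi rotated by r * 90 degrees and then shifted by t.
    The output is a function on the group p4."""
    out = np.zeros((4, N, N))
    for r in range(4):
        psi_r = rotate(psi, r)
        for tx in range(N):
            for ty in range(N):
                out[r, tx, ty] = np.sum(f * shift(psi_r, (tx, ty)))
    return out

def rotate_p4_map(F, u):
    """A rotation u acting on a p4 feature map:
    [L_u F](r, t) = F(r - u, rot_vec(t, -u)), i.e. cyclically permute the
    rotation channels and rotate each spatial plane."""
    return np.stack([rotate(F[(r - u) % 4], u) for r in range(4)])

rng = np.random.default_rng(1)
f = rng.standard_normal((N, N))
psi = rng.standard_normal((N, N))

# Equivariance: rotating the input and then correlating equals correlating
# and then transforming the output feature map on p4.
u = 1
assert np.allclose(p4_correlate(rotate(f, u), psi),
                   rotate_p4_map(p4_correlate(f, psi), u))
```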
More generally, we can define a G-convolution layer for vector-valued functions on any discrete set of sampling points which has a group of symmetries (i.e. a set of transformations that map the set of sampling points onto itself, where the set of transformations is closed (in the mathematical sense) under composition of transformations and inverses). Since for such a “symmetric sampling set” there is a well-defined action of the group on the sampling set, we can also transform a function defined on this sampling set using the operator Lg defined before, which can then be used to define an equivariant G-correlation and G-convolution as shown before (replacing the sum over Z² or G by a sum over said set of sampling points). The G-correlation and G-convolution of functions on Z² and G are special cases of this, because indeed a group of symmetries of Z² acts on Z², and indeed G acts on G by left multiplication. Note further that for a given set of sampling points, there may be multiple groups of symmetries. For instance, a square grid Z² has a group of symmetries that includes translations, rotations and reflections, but also a group that includes only translations, or only translations and rotations. Each of these leads to a different G-convolution or G-correlation layer.
There are many examples where this generic G-convolution or G-correlation is useful, because many types of data exhibit many kinds of symmetry. For example, one could have images sampled on a hexagonal lattice, and a learning problem which has a translation, rotation and reflection symmetry. In this case, one could use a G-CNN with G = p6m, the group of symmetries of a hexagonal lattice. Or one might have a 3D medical scan which has a different resolution along the z-axis than along the x- and y-axes. Such a lattice does not have full 3D rotation symmetry, but it admits transformations such as vertical reflections and horizontal rotations, among others.
In general, depending on the type of data and the symmetry of the problem, one can choose to use any finite group, any Frieze group (groups of symmetries with one independent translation) except the pure 1D translation Frieze group p1 for which the CNN is known, any wallpaper group (groups of symmetries with two independent translations) except the pure 2D translation wallpaper group p1 for which the CNN is known, or any space group (except pure translation groups, for which the CNN is known). The mentioned Frieze, wallpaper and space groups have all been classified mathematically, and there exist exactly 7 Frieze groups, 17 wallpaper groups, and 230 space groups (also including pure translation groups). Precise definitions of these groups can be found in (Coxeter, H. S. M., & Moser, W. O. J. (1980). Generators and Relations for Discrete Groups. Springer Berlin Heidelberg; Schattschneider, D. (1978). The Plane Symmetry Groups: Their Recognition and Notation. American Mathematical Monthly, 85, 439-450; Hahn, T. (2002). International Tables for Crystallography. Springer Netherlands). These references are incorporated by reference as if fully set forth.
It is customary to add a bias term to each feature map in a convolution layer. This can be done for G-convolution layers as well, but this operation must be adapted so as to use only one bias per G-feature map (not one bias per 2D feature map, if the feature map on G is represented as an array of dimension greater than 2). Likewise, batch normalization (Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the International Conference on Machine Learning (ICML), incorporated by reference as if fully set forth.) must be generalized to use only a single bias and scale parameter per G-feature map.
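As a small illustrative sketch of this adaptation (the array layout, with G-feature maps on p4 stored as (number of maps, 4 rotations, height, width), is an assumption of this sketch): a single scalar bias per G-feature map is broadcast over all group elements, rather than one bias per 2D plane:

```python
import numpy as np

# 2 G-feature maps on p4, each stored as a (4, 8, 8) array
F = np.zeros((2, 4, 8, 8))

# One learned bias per G-feature map, broadcast over the whole group
# (all four rotation channels and all spatial positions), not one bias
# per 2D plane:
bias = np.array([0.5, -1.0])
F_biased = F + bias[:, None, None, None]
```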
Known CNNs typically involve layers of non-linearities, which can be used in G-CNNs without modification. While the layer does not change, we show below that it is compatible with G-convolution layers, in the sense that pointwise non-linearities are G-equivariant maps. The nonlinearity ν : R → R is applied to a feature map f : G → R by composition to give the rectified feature map [ν ∘ f](g) = ν(f(g)). Since Lh f = f ∘ h⁻¹, one can see that ν ∘ Lh f = ν ∘ f ∘ h⁻¹ = Lh [ν ∘ f],
so the rectified feature map inherits the transformation properties of the previous layer (i.e. it is G-equivariant, and we may thus stack further G-convolution layers on top).
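This can be checked directly: a pointwise nonlinearity commutes with any transformation that merely relabels sampling points. An illustrative sketch for a p4 feature map (the channel layout is an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((4, 6, 6))   # a p4 feature map: 4 rotation channels

def L(F):
    """An example group action on a p4 feature map: cyclically permute the
    rotation channels and rotate each spatial plane. Any such relabelling
    of sampling points commutes with a pointwise nonlinearity."""
    return np.stack([np.rot90(F[(r - 1) % 4]) for r in range(4)])

relu = lambda x: np.maximum(x, 0.0)

# Pointwise nonlinearity and group action commute:
assert np.allclose(relu(L(F)), L(relu(F)))
```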
In order for instance to reduce the size of the feature maps, pooling can be applied in a known convolutional neural network; such pooling as such is known in the art. In a G-CNN, we must use a generalized form of pooling which we call G-pooling. G-pooling can be used with any of the common pooling operators known as such in the art (as long as they are invariant to permutations of the pooling domain), including max-pooling, sum-pooling, and square-pooling.
In order to simplify the analysis, we split the pooling operation as it is usually conceived of into two steps: the pooling itself (performed without stride), and a subsampling step. The non-strided max-pooling operation applied to a feature map f : G → R can be modeled as an operator P that acts on f as
[P f](g) = max_{k ∈ gU} f(k),
where gU = {gu | u ∈ U} is the g-transformation of some pooling domain U ⊆ G (typically a neighborhood of the identity transformation). In a regular convnet U is usually a 2 × 2 or 3 × 3 square including the origin (0, 0). This operation too commutes with transformations:
[P Lh f](g) = max_{k ∈ gU} f(h⁻¹k) = max_{k ∈ h⁻¹gU} f(k) = [P f](h⁻¹g) = [Lh P f](g).
Since pooling tends to reduce the variation in a feature map, it makes sense to subsample the pooled feature map, or equivalently, to do a “pooling with stride”. In a G-CNN, the notion of “stride” is generalized by subsampling on a subgroup H ⊂ G. That is, H is a subset of G that is itself a group (i.e. closed under multiplication and inversion). The resulting subsampled feature map is then equivariant to H but not equivariant to G.
In a standard convnet, pooling with stride 2 is the same as pooling and then subsampling on H = {(2i, 2j) | (i, j) ∈ Z²}, which is a subgroup of G = Z². For the p4-CNN (p4 being the group of translations and 90 degree rotations), we may for instance subsample on the subgroup H containing all 4 rotations and shifts by multiples of 2^n pixels, after n pooling layers.
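A minimal sketch of such subgroup subsampling for a p4 feature map stored as a (4, H, W) array (this layout is an illustrative assumption):

```python
import numpy as np

F = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)   # a p4 feature map

# Subsampling on the subgroup H of p4 that contains all four rotations and
# the even translations {(2i, 2j)}: keep every rotation channel and take
# stride 2 over both spatial axes.
F_sub = F[:, ::2, ::2]
```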
We can obtain full G-equivariance by choosing our pooling region U to be a subgroup H ⊂ G, so that we are pooling over cosets gH = {gh | h ∈ H}. For example, in a p4-CNN, we can pool over all four rotations at each spatial position (the cosets of the subgroup R of rotations around the origin). Or, in a Z²-CNN, we can pool over the cosets of the subgroup of even horizontal shifts. The resulting coset-pooled feature map is invariant to the right-action of H, because the pooling domains (cosets) are similarly invariant (ghH = gH), so we can arbitrarily choose a set of coset representatives to subsample on. The resulting feature map may be thought of as a function on the quotient space G/H, which carries a natural G-action: g(g′H) = (gg′)H. For p4 and R, we have p4/R = Z², i.e. the feature map resulting from pooling over cosets gR has the same transformation law as the input image.
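Coset pooling over the rotation subgroup R can be sketched as follows (illustrative; np.rot90 stands in for the spatial rotation): the pooled map is a function on p4/R = Z² and transforms like an ordinary planar feature map:

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.standard_normal((4, 8, 8))   # a p4 feature map: 4 rotation channels

def coset_pool(F):
    """Pool over the cosets gR of the rotation subgroup R: take the maximum
    over the four rotation channels at each spatial position. The result is
    a function on the quotient p4/R = Z^2."""
    return F.max(axis=0)

def rotate_p4_map(F, u):
    """A 90-degree rotation u acting on a p4 feature map: cyclically permute
    the rotation channels and rotate each spatial plane (np.rot90 realises a
    rotation about the array centre, which suffices for this check)."""
    return np.stack([np.rot90(F[(r - u) % 4], k=u) for r in range(4)])

# The coset-pooled map transforms like an ordinary planar feature map:
u = 1
assert np.allclose(coset_pool(rotate_p4_map(F, u)),
                   np.rot90(coset_pool(F), k=u))
```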
This concludes our definition of the various layers in a G-CNN (G-convolution or G-correlation, non-linearities and G-pooling, G-batch normalization), and the analysis of their equivariance properties.
To train a group equivariant convolutional network using a gradient-based optimization algorithm, the gradient of a loss function with respect to the parameters of the filters must be computed. As usual, this is done using the backpropagation algorithm. The gradients are easily derived for any G-convolution or G-correlation layer using standard calculus. Here we describe the required computations for the G-correlation layer.
Let feature map k at layer l be denoted
f_k^l = Σ_j f_j^{l−1} ⋆ ψ_j^{l,k},
where f^{l−1} is the previous stack of feature maps and j ranges over its feature maps. At some point in the backprop algorithm, we will have computed the derivative ∂L/∂f_k^l for all k, and we need to compute ∂L/∂f_j^{l−1} to backpropagate to lower layers, as well as ∂L/∂ψ_j^{l,k} to update the parameters. We find that
∂L/∂f_j^{l−1} = Σ_k (∂L/∂f_k^l) ⋆ (ψ_j^{l,k})*,
where the superscript * denotes the involution, ψ*(g) = ψ(g⁻¹), and ψ_j^l = (ψ_j^{l,1}, …, ψ_j^{l,K_{l+1}}) is the set of filter components applied to input feature map j at layer l.
For the gradient with respect to the filters, we obtain:
∂L/∂ψ_j^{l,k} = (∂L/∂f_k^l)* * f_j^{l−1}.
So we see that both the forward and backward passes involve convolution or correlation operations, as is the case in the current generation of convnets. Thus, the backward pass may be implemented using the same convolution and correlation routines that can be used in the forward pass.
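The correspondence between the backward pass and a correlation combined with an involution can be checked numerically in the simplest setting, the cyclic translation group Z_n, where the G-correlation reduces to circular cross-correlation (all names in this sketch are illustrative):

```python
import numpy as np

n = 8
rng = np.random.default_rng(4)
f = rng.standard_normal(n)
psi = rng.standard_normal(n)

def corr(a, b):
    """G-correlation on the cyclic group Z_n: [a ⋆ b](t) = sum_y a(y) b(y - t)."""
    return np.array([np.sum(a * np.roll(b, t)) for t in range(n)])

D = rng.standard_normal(n)                 # upstream gradient dL/d(out)
loss = lambda p: np.dot(D, corr(f, p))     # a loss that is linear in the filter

# Backward pass expressed with the same correlation routine plus the
# involution (index negation): dL/dpsi[u] = corr(D, f)[-u]
grad = corr(D, f)[(-np.arange(n)) % n]

# Check against central finite differences:
eps = 1e-6
num = np.array([(loss(psi + eps * np.eye(n)[u]) - loss(psi - eps * np.eye(n)[u]))
                / (2 * eps) for u in range(n)])
assert np.allclose(grad, num, atol=1e-5)
```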
In an embodiment, said group of symmetry transformations is a symmorphic group, wherein each symmetry transformation in said symmorphic group is the composition of one symmetry transformation from the stabilizer of the origin and one translation. Here, in an embodiment, the stabilizer of the origin is the subgroup of said group of symmetry transformations that contains all symmetry transformations that leave invariant the origin of said discrete set of sampling points. In particular, the origin is a fixed but arbitrarily designated point from said discrete set of sampling points. In particular, the non-translational group convolution layer is implemented by first applying every transformation from the stabilizer of the origin to each filter in order to obtain a set of stabilizer-transformed filters, and then applying a known translational convolution algorithm to said set of stabilizer-transformed filters and said input stack of feature maps.
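For the group p4 (stabilizer of the origin: the four 90-degree rotations), this two-step recipe can be sketched as follows (illustrative, single channel, circular boundary; a practical implementation would instead stack the transformed filters and call an optimized batched convolution routine):

```python
import numpy as np

N = 8

def plane_correlate(f, psi):
    """Known translational correlation on a circular N x N grid:
    out[t] = sum_y f[y] * psi[y - t]."""
    out = np.zeros((N, N))
    for tx in range(N):
        for ty in range(N):
            out[tx, ty] = np.sum(f * np.roll(psi, (tx, ty), axis=(0, 1)))
    return out

def p4_correlate_symmorphic(f, psi):
    """p4-correlation via the two-step recipe: (1) apply every stabilizer
    transformation (here the four 90-degree rotations) to the filter,
    (2) run an ordinary translational correlation with each transformed filter."""
    stabilizer_filters = [np.rot90(psi, k=r) for r in range(4)]              # step 1
    return np.stack([plane_correlate(f, p) for p in stabilizer_filters])     # step 2

rng = np.random.default_rng(5)
f = rng.standard_normal((N, N))
psi = rng.standard_normal((N, N))
out = p4_correlate_symmorphic(f, psi)
```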
In an embodiment, the non-translational group convolution layer is implemented using the Fast Group Fourier Transform (FGFT) and its inverse, the Inverse Fast Group Fourier Transform (IFGFT), further comprising: - applying the FGFT for said non-translational group of symmetry transformations to both said input stack of feature maps and said set of learnable filters, thus obtaining for said input stack of feature maps and each said learnable filter a matrix-valued function on the unitary dual of said non-translational group of symmetry transformations, and - computing the matrix product of said Fourier transform of said input stack of feature maps with each of said Fourier transforms of said set of learnable filters, thus obtaining the Fourier transform of the group convolution of said input stack of feature maps with said learnable filters, and finally computing the Inverse Fast Group Fourier Transform (IFGFT) for said non-translational group of symmetry transformations.
In an embodiment, a learned bias term is added to one or more of said output stack of feature maps. In particular only a single bias term per feature map is added in the output stack of feature maps.
In an embodiment, the discrete set of sampling points is the set of integers Z. In particular, the group of symmetry transformations is the group which consists of all compositions of integer translations and reflections about any point in the discrete set of sampling points, in particular the group consisting of all compositions of integer addition and multiplication by −1.
In an embodiment, the discrete set of sampling points is the set of sampling points in a square lattice. In particular, the group of symmetry transformations is the group p4 which consists of all compositions of integer translations in two orthogonal directions and rotations by multiples of 90 degrees about any point in the discrete set of sampling points.
In an embodiment, the discrete set of sampling points is the set of points in a square lattice. In particular, the group of symmetry transformations is the group p4m which consists of all compositions of integer translations in two orthogonal directions, rotations by multiples of 90 degrees about any point in the discrete set of sampling points, and reflections in mirrors that are aligned with the axes or diagonals of said lattice.
In an embodiment, the discrete set of sampling points is the set of points in a hexagonal lattice. In particular, the symmetry group is the group p6 which consists of compositions of integer translation in the two independent directions of said hexagonal lattice and rotations by multiples of 60 degrees about any point in said hexagonal lattice.
In an embodiment, the discrete set of sampling points is the set of points in a hexagonal lattice. In particular, the group of symmetry transformations is the group p6m which consists of compositions of integer translations in the two independent directions of said hexagonal lattice and rotations by multiples of 60 degrees about any point in said hexagonal lattice and reflections in mirrors that are aligned with the translation axes of said hexagonal lattice.
In an embodiment, the discrete set of sampling points is the set of points in a rhombic lattice, or a square lattice, or a hexagonal lattice, or a rectangular lattice, or a parallelogrammic lattice. In particular, the group of symmetry transformations is, respectively, the group cmm, p4m, p6m, pmm, or p2, or any of the subgroups of these groups that are not pure translation groups, in particular each of the 17 wallpaper groups except the group p1 of pure translations.
In an embodiment, the discrete set of sampling points is the set of points in a three-dimensional cubic lattice. In particular, the group of symmetry transformations is the group which consists of all compositions of integer translations in the three orthogonal directions of said lattice, rotations by multiples of 90 degrees about any axis attached to any point in said lattice.
In an embodiment, the discrete set of sampling points is the set of points in a three-dimensional cubic lattice. In particular, the group of symmetry transformations is the group which consists of all compositions of integer translations in the three orthogonal directions of said lattice, rotations by multiples of 90 degrees about any axis attached to any point in said lattice, and reflections in any axis-aligned mirror.
In an embodiment, the discrete set of sampling points is a three-dimensional orthorhombic or tetragonal lattice. In particular, the group of symmetry transformations is the group which consists of all symmetries of said lattice, or any of the subgroups of these groups that are not pure translation groups.
In an embodiment, the discrete set of sampling points is a three-dimensional Bravais lattice. In particular, the group of symmetry transformations is the group which consists of all symmetries of said lattice, or any of the subgroups of these groups that are not pure translation groups.
In an embodiment, the discrete set of sampling points consist of the elements in any of the 17 wallpaper groups except the pure translation group, or any of the 230 space groups except the pure translation groups, or any of the two-dimensional point groups or any of the three-dimensional point groups.
In an embodiment, the group of symmetry transformations is one of the 6 Frieze groups p11g, p1m1, p11m, p2, p2mg, p2mm, or any of their subgroups that are not pure translation groups.
In an embodiment, the discrete set of sampling points consists of the elements of a square, rectangular, rhombic, parallelogrammic or hexagonal lattice. In particular, the input stack of feature maps comprises pixels of a digital image having one or more color channels.
In an embodiment, G-pooling is applied to at least one of the feature maps from said output stack of feature maps. In particular said pooling is selected from any one of max-pooling, sum pooling, square pooling and a combination thereof.
In an embodiment, the data point is a spatio-temporal signal. In particular said spatio-temporal signal is selected from any one of audio spectrograms, video, and fMRI sequences. More in particular, said discrete set of sampling points comprises points at different points in time which are included in the group convolution.
The invention further pertains to an image classification device comprising an embodiment of the device described above for classification of digital images, wherein said symmetry group is p4m, and wherein said input stack of feature maps of a first non-translational group convolution layer is a digital image, and subsequent non-translational group convolution layers have p4m symmetry, and between each non-translational group convolution layer is a layer of non-linearities.
The invention further pertains to a classification method comprising applying an embodiment of the device to a data item to obtain a classification, wherein said data item is an input stack of feature maps.
The invention further pertains to a data segmentation method comprising applying an embodiment of the device, wherein the output of said convolutional neural network is obtained by G-pooling over said stabilizer of the origin, thus obtaining a segmentation map that transforms in the same way as the input stack of feature maps to the first layer of the network under transformations from said group of symmetry transformations.
The invention further pertains to a regression method comprising applying an embodiment of the device to a data item to obtain a numerical estimate of a quantity associated with the data item.
The invention further pertains to a data analysis method comprising an embodiment of the device, wherein the output of the convolutional neural network is a feature vector that is used for further computational analysis of input of the convolutional neural network.
The person skilled in the art will understand the term “substantially” when used in this application, such as in “substantially encloses” or in “substantially extends up to”. The term “substantially” may also include embodiments with “entirely”, “completely”, “all”, etc. Hence, in embodiments the adjective substantially may also be removed. Where applicable, the term “substantially” may also relate to 90% or higher, such as 95% or higher, especially 99% or higher, even more especially 99.5% or higher, including 100%. The term “comprise” also includes embodiments wherein the term “comprises” means “consists of”.
Furthermore, the terms first, second, third and the like if used in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.
The device and method herein are amongst others described during operation. As will be clear to the person skilled in the art, the invention is not limited to methods of operation or devices in operation.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb "to comprise" and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device or apparatus claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The invention further applies to a device or method or parts thereof comprising one or more of the characterising features described in the description and/or shown in the attached drawings. The invention further pertains to a method or process comprising one or more of the characterising features described in the description and/or shown in the attached drawings.
The various aspects discussed in this patent can be combined in order to provide additional advantages. Furthermore, some of the features can form the basis for one or more divisional applications.
Brief description of the drawings
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying schematic drawings in which corresponding reference symbols indicate corresponding parts, showing an embodiment of a construction element, and showing in:
Figure 1a: a scalar valued function on the group p4, which can be interpreted as a p4-feature map or as a p4-filter;
Figure 1b: the same function after rotation by r, which denotes the 90-degree rotation. The 2D patches associated with the rotations (e, r, r², r³) follow the arrows while undergoing the rotation. The transformation law visualized here follows directly from the p4 group operation and the definition of Lg;
Figure 2a: a scalar valued function on the group p4m, which can be interpreted as a p4m-feature map or as a p4m-filter;
Figure 2b: the same function after rotation by r, which denotes the 90-degree rotation. The 2D patches associated with the rotations (e, r, r², r³) follow the arrows while undergoing the rotation;
Figure 2c: the same function as shown on the left after application of the mirroring operation m. The 2D patches follow the lines without arrowheads while undergoing a flip. The transformation law visualized here follows directly from the p4m group operation and the definition of Lg, and
Figure 3: flow chart describing the G-convolution or G-correlation as computed by G-FFT.
The drawings are not necessarily to scale.
Description of preferred embodiments
The G-convolution and correlation can be implemented straightforwardly using a loop or as a parallel GPU kernel. For each cell in the stack of output feature maps (which has indices for the filter dimension and one or more group dimensions such as rotation, x-translation and y-translation), one has to compute an inner product. The inner product (a sum over products of corresponding elements, as defined above) is computed between the stack of feature maps that is the input to the layer and the transformed filter. Let g be the transformation that corresponds to a given cell in the output feature maps. Then we have to apply g to each un-transformed filter (which together form the set of parameters of the layer). This is done by, for each cell in the array that is to store the transformed filter (this cell being associated with the point h in the discrete sampling set on which the feature map is defined), doing a lookup in the un-transformed filter at the cell associated with the point g⁻¹h. Having done this for each cell in the array that stores the transformed filter, we have computed the transformed filter. The inner product between the g-transformed filter and the input feature map is then computed, and the result is stored in the cell of the output feature map associated with g and the filter that was used.
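As an illustration of this loop, the following is a minimal numpy sketch (not the claimed implementation) of a first-layer p4-correlation for a single-channel input on Z²; the array layout (rotation, y, x) and the helper names are illustrative choices:

```python
import numpy as np

def rotate90(h, n):
    # apply the rotation r^n (n multiples of 90 degrees) to a point h = (a, b)
    a, b = h
    for _ in range(n % 4):
        a, b = -b, a
    return (a, b)

def p4_correlation_first_layer(f, psi):
    """Naive loop implementation of a first-layer p4-correlation.

    f:   input feature map on Z^2, shape (H, W)
    psi: filter on Z^2, shape (k, k) with k odd, centered at the origin
    Output has shape (4, H, W); cell (s, ty, tx) holds the inner product of f
    with the filter transformed by g = (translation (ty, tx), rotation r^s).
    """
    H, W = f.shape
    k = psi.shape[0]
    c = k // 2
    out = np.zeros((4, H, W))
    for s in range(4):                      # rotation part of g
        for ty in range(H):                 # translation part of g
            for tx in range(W):
                acc = 0.0
                for u in range(-c, c + 1):  # filter support around the origin
                    for v in range(-c, c + 1):
                        # lookup in the un-transformed filter at g^{-1} h,
                        # i.e. rotate the offset back by r^{-s}
                        uu, vv = rotate90((u, v), -s)
                        y, x = ty + u, tx + v
                        if 0 <= y < H and 0 <= x < W:
                            acc += f[y, x] * psi[c + uu, c + vv]
                out[s, ty, tx] = acc
    return out

# Example: a unit impulse at the centre of a 3x3 image recovers the
# (rotated) filter values.
f = np.zeros((3, 3)); f[1, 1] = 1.0
psi = np.arange(9, dtype=float).reshape(3, 3)
out = p4_correlation_first_layer(f, psi)
```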
For groups G that are symmorphic (meaning that every transformation in G can be written as the composition of an element of the stabilizer subgroup of the origin and a translation), we can compute the G-correlation using the Z²-correlation as a subroutine. This has the practical advantage that very efficient implementations of the Z²-correlation exist, so that less time has to be spent optimizing the code for the G-correlation.
This works as follows. The G-correlation can be split into a very cheap filter-indexing operation followed by a standard planar convolution.
For example, consider the group p4 = C4 ⋉ Z². A transformation h in p4 can be decomposed into a translation t and a rotation r as h = tr. From this, and using the homomorphism property LtLr = Ltr, the G-correlation can be decomposed into two parts:

[f ⋆ ψ](tr) = Σ_{h ∈ Z²} Σ_k f_k(h) Lt[Lr ψ_k](h)
The first part of the computation is to evaluate Lr ψ for each of the four rotations r about the origin. If ψ is a function on Z², it is clear what is meant by “rotating the filter”. For a filter ψ that is a function on the group C4 ⋉ Z², we can index the group by three integers (θ, τx, τy). Rotating such a function amounts to rotating each 2D plane associated with a value of θ, and cyclically shifting the planes along the θ axis.
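For p4, this filter transformation can be sketched in a few lines of numpy (a minimal illustration; the (θ, y, x) array layout and the direction of the cyclic shift along the θ axis are assumed conventions):

```python
import numpy as np

def rotate_p4_filter(psi):
    """Apply the 90-degree rotation r to a filter defined on p4.

    psi has shape (4, k, k): one k x k plane per rotation angle theta.
    Rotating the filter amounts to (1) rotating each 2D plane by 90 degrees
    and (2) cyclically shifting the planes along the theta axis.
    """
    planes = np.rot90(psi, k=1, axes=(1, 2))   # rotate each spatial plane
    return np.roll(planes, shift=1, axis=0)    # cyclic shift along theta

# Applying the rotation four times returns the original filter,
# since r has order 4 in p4.
psi = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
out = psi
for _ in range(4):
    out = rotate_p4_filter(out)
```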
In general, for other groups G, this transformation law follows directly from the definition of the group operation and the definition of the operator Lg.
The second part of the computation is to evaluate the sum over t and k using an optimized implementation of the 2D convolution. To see how this can be done, note that the sum over h ∈ G amounts to a sum over the θ, τx, and τy axes. The θ axis of both the filter tensor and the feature maps can be folded into the feature axis (k) using a reshape operation. These tensors can then be passed to the convolution routines as if they represented 4·K_l 2D feature maps (for p4, which has a stabilizer containing 4 elements: the rotations around the origin). The resulting 4·K_{l+1} planar feature maps can be reshaped into K_{l+1} feature maps on p4, each of which has a first axis of length 4. This approach naturally generalizes to any symmorphic group.

A third method for computing the G-convolution or G-correlation, which is asymptotically the fastest, is to use a Group Fast Fourier Transform (G-FFT) algorithm and its inverse (G-IFFT), which are sometimes referred to in the literature as generalized FFTs. It is well known in the literature that a Fourier transform can be defined not just for functions on Z², but for functions on any locally compact topological group (which includes the discrete groups we make use of in this work). The Fourier transform of a function on G is an operator-valued function (in practice realized as a matrix-valued function) on what is known as the “unitary dual” of G (which is a discrete space whenever G is discrete). Furthermore, the well-known convolution theorem, which says that the Fourier transform of the convolution of two functions is the product of their Fourier transforms, generalizes to the case of group Fourier transforms.
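The reshape bookkeeping of the second method (folding the θ axis into the feature axis and calling a planar routine) can be sketched as follows; `planar_corr2d` is a deliberately naive stand-in for an optimized 2D correlation routine, and the axis ordering is an assumed convention:

```python
import numpy as np

def planar_corr2d(x, w):
    """Naive stand-in for an optimized 2D correlation routine.
    x: (C_in, H, W), w: (C_out, C_in, k, k) -> (C_out, H-k+1, W-k+1)."""
    C_in, H, W = x.shape
    C_out, _, k, _ = w.shape
    out = np.zeros((C_out, H - k + 1, W - k + 1))
    for o in range(C_out):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
    return out

rng = np.random.default_rng(0)
K_l, K_lp1, k, H, W = 2, 3, 3, 8, 8
# K_l feature maps on p4, each with a theta axis of length 4.
feats = rng.standard_normal((K_l, 4, H, W))
# Pre-transformed filters: one bank of shape (K_lp1, K_l, 4, k, k)
# per output rotation.
filters = rng.standard_normal((4, K_lp1, K_l, 4, k, k))

# Fold the theta axis into the feature (channel) axis on both sides...
x2d = feats.reshape(K_l * 4, H, W)
w2d = filters.reshape(4 * K_lp1, K_l * 4, k, k)
# ...run the planar routine as if these were 4*K_l ordinary 2D feature maps...
y2d = planar_corr2d(x2d, w2d)
# ...and unfold the 4*K_lp1 planar outputs into K_lp1 feature maps on p4.
y = y2d.reshape(4, K_lp1, H - k + 1, W - k + 1)
```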
In the engineering literature, fast algorithms have been developed for computing the G-FFT and G-IFFT (Chirikjian, G. S., & Kyatkin, A. B. (2001). Engineering Applications of Noncommutative Harmonic Analysis (1st ed.). CRC Press; Rockmore, D. (1995). Some applications of generalized FFTs. In L. Finkelstein & W. Kantor (Eds.), Proceedings of the 1995 DIMACS Workshop on Groups and Computation (Vol. 28, pp. 329-370); Kyatkin, A. B., & Chirikjian, G. S. (1999). Pattern matching as a correlation on the discrete motion group. Computer Vision and Image Understanding, 74(1), 22-35; Maslen, D., & Rockmore, D. (1995). Generalized FFTs - a survey of some recent results. In L. Finkelstein & W. Kantor (Eds.), Proceedings of the DIMACS Workshop on Groups and Computation (pp. 183-237); Rockmore, D. N. (2004). Recent progress and applications in group FFTs. NATO Science Series II: Mathematics, Physics and Chemistry, 136, 227-254). These references are incorporated by reference as if fully set forth.
Since we always work with discrete functions that have bounded support, we can use the FFT for a finite group, even though these discrete functions are modeled as functions on an infinite discrete space or group. This can be achieved by appropriately zero-padding the function in the spatial dimensions, as is commonly done with ordinary FFTs as well. The computation of the G-convolution then proceeds as follows: first, the G-FFT of the input feature maps and filters are computed. The resulting spectral functions are multiplied (and since these functions are matrix-valued, multiplication means matrix-multiplication), and summed over the input feature maps. As is known in the literature, the G-correlation would require taking the conjugate transpose of the G-FFT of the filter, relative to the G-convolution. Finally, we apply the inverse G-FFT to obtain the desired result. This is illustrated in figure 3.
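For the simplest case, the cyclic group Z/nZ, the G-FFT reduces to the ordinary DFT and the spectral matrices are 1×1, so the procedure (transform, multiply, conjugate for the correlation, and invert) can be illustrated with numpy alone; this is an illustration of the convolution theorem, not of the non-commutative case:

```python
import numpy as np

# Group convolution on the cyclic group Z/nZ is circular convolution:
# (f * psi)[g] = sum_h f[h] * psi[(g - h) mod n].
n = 8
rng = np.random.default_rng(0)
f = rng.standard_normal(n)
psi = rng.standard_normal(n)

direct = np.array([sum(f[h] * psi[(g - h) % n] for h in range(n))
                   for g in range(n)])

# Convolution theorem: transform, multiply pointwise, transform back.
spectral = np.fft.ifft(np.fft.fft(f) * np.fft.fft(psi)).real

# The correlation instead conjugates the filter's transform, matching the
# conjugate-transpose step in the matrix-valued (non-commutative) case.
corr_direct = np.array([sum(f[h] * psi[(h - g) % n] for h in range(n))
                        for g in range(n)])
corr_spectral = np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(psi))).real
```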
The benefit of using the G-FFT for computing the G-convolution is that it is asymptotically faster, but the other methods have merit because they are easier to implement and require less memory.
The gradient of a loss function with respect to the parameters (the filters) is easily computed for each of the G-correlation algorithms described above. In the first case, the gradient is obtained by a simple application of the rules of calculus, as derived previously. For the G-correlation implementation for symmorphic groups which uses existing planar convolution routines as a subroutine, the gradient of the planar convolution is known. The gradient for the filter-transformation is simply the gradient of an array indexing operation, which is also known. The gradient computation for the G-FFT implementation of the G-correlation is implemented by applying the inverses of the G-IFFT (which is the G-FFT) and the G-FFT (which is the G-IFFT) in reverse order to the output gradient. This will give the input gradient, because the G-FFT and its inverse are linear and unitary (hence the Jacobian of the G-FFT is the conjugate transpose of the G-FFT, which is the G-IFFT, and vice versa for the G-IFFT).
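The gradient of the array indexing (filter-transformation) operation can be sketched as a gather/scatter pair (the function names and the flat-index layout are illustrative): the forward transformation gathers entries of the un-transformed filter, and its gradient scatter-adds the output gradient back:

```python
import numpy as np

def transform_filter(psi_flat, idx):
    # Forward: transformed_filter[i] = psi_flat[idx[i]], where idx[i] is the
    # flat index of g^{-1} h for output cell i (a gather).
    return psi_flat[idx]

def transform_filter_grad(grad_out, idx, n):
    # Backward: the gradient of a gather is a scatter-add into the
    # un-transformed filter's gradient.
    grad_psi = np.zeros(n)
    np.add.at(grad_psi, idx, grad_out)
    return grad_psi

# Example with a 4-element flattened filter and a permutation index
# (a 90-degree rotation of a small filter is such a permutation):
psi = np.array([1.0, 2.0, 3.0, 4.0])
idx = np.array([2, 0, 3, 1])
out = transform_filter(psi, idx)                 # gathers [3., 1., 4., 2.]
g = transform_filter_grad(np.ones(4), idx, 4)    # each entry used once
```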
Experimental
The rotated MNIST dataset (Larochelle, Hugo, Erhan, D, Courville, Aaron, Bergstra, James, and Bengio, Yoshua. An empirical evaluation of deep architectures on problems with many factors of variation, Proceedings of the 24th International Conference on Machine Learning (ICML’07), 2007) contains 62000 randomly rotated handwritten digits. The dataset is split into a training, validation and test sets of size 10000, 2000 and 50000, respectively.
Because data augmentation could potentially reduce the benefits of using G-convolutions, all experiments reported in this section use random rotations on each instance presentation. We found that at least for the rotated MNIST dataset this does not make much difference.
We performed model selection using the validation set, yielding an architecture (Z2CNN, see table 1) that outperforms the models tested by Larochelle et al. (2007) (when trained on 12k and evaluated on 50k), but does not match the previous state of the art, which uses prior knowledge about rotations (Schmidt, Uwe and Roth, Stefan. Learning rotation-aware features: From invariant priors to equivariant descriptors. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2012). The results are listed in the table below.
We then took this architecture and replaced each convolution by a p4-convolution followed by a max-pooling over all 4 rotation angles. The post-rotation-pooling feature maps transform in the same way as the input image (i.e. as a function on Z² as opposed to a function on p4), which means that all the convolutions used in this network have to be first-layer convolutions. The resulting architecture, P4CNNRotPool, has exactly the same number of parameters as the Z2CNN, but performs better: it roughly matches the previous state of the art. Finally, we designed a network, P4CNN (table below), that uses full p4-convolutions. This network clearly outperforms the previous state of the art.
Table 1. Rotated MNIST architectures.
Legend: C(in, out, sz) denotes a convolution layer with the given number of input channels, output channels and kernel size. P4C(in, out, sz) denotes a first-layer P4 convolution while P4C denotes full P4 convolutions, MP(ksize, stride) denotes max pooling and MPR denotes max pooling over all 4 rotation angles.
The work leading to this invention has received funding from the European Union Seventh Framework Program FP7/2007-2013 under grant agreement n° 632913, and by funding of the Innovative Public-Private Partnership in ICT (IPPSI) - KIEM, NWO Physical Sciences, under grant agreement n° IPPSI.KIEM.14.020.
It will also be clear that the above description and drawings are included to illustrate some embodiments of the invention, and not to limit the scope of protection. Starting from this disclosure, many more embodiments will be evident to a skilled person. These embodiments are within the scope of protection and the essence of this invention and are obvious combinations of prior art techniques and the disclosure of this patent.

Claims (24)

1. Inrichting voor het genereren van een convolutioneel neuraal netwerk, waarbij de inrichting een computer inrichting omvat omvattende een computer programma dat, wanneer werkend op de computer inrichting, het convolutioneel neuraal network definieert, waarbij het convolutioneel neuraal netwerk ten minste een niet-translatie groep convolutie laag omvat, waarbij de niet-translatie groep convolutie laag als invoer een set trainbare filters en een invoer stapel met kenmerk lijsten heeft, waarbij beide invoeren een vectorwaarde functies zijn op een discrete set bemonsteringspunten die een set symmetrie transformaties hebben die een mathematische groep van symmetrie transformaties vormen, met uitsluiting van de mathematische groepen van symmetrie transformaties die pure translatiegroepen zijn, en de niet-translatie groep convolutie laag als uitvoer een uitvoer stapel met kenmerk lijsten oplevert, waarbij de uitvoer stapel met kenmerk lijsten een vector-waarde functie is op de mathematische groep van symmetrie transformaties, waarbij de niet-translatie groep convolutie laag inwendige producten (in-producten) omvat van de invoer stapel van kenmerk lijsten met elke getransformeerde filter van een set getransformeerde filters, waarbij de set getransformeerde filters trainbare filters omvat uit de set trainbare filters waarop een transformatie van de groep symmetrie transformaties is toegepast, en waarbij elk van de inwendige producten een resultaat oplevert dat opgeslagen wordt op een positie in de uitvoer stapel met kenmerk lijsten, en waarbij het resultaat van elk inwendig product op geslagen wordt in de uitvoer stapel met kenmerk lijsten op de positie die samenhangt met het trainbare filter en de symmetrie transformatie die was toegepast op het trainbare filter om het getransformeerde filter te verkrijgen dat was toegepast in het inwendige product.A device for generating a convolutional neural network, the device comprising a computer device comprising a computer 
program which, when operating on the computer device, defines the convolutional neural network, the convolutional neural network comprising at least one non-translation group convolution layer includes, where the non-translation group has convolution layer as input a set of trainable filters and an input stack with feature lists, both inputs being vector value functions on a discrete set of sampling points that have a set of symmetry transformations that have a mathematical group of form symmetry transformations, excluding the mathematical groups from symmetry transformations that are pure translation groups, and the non-translation group convolution layer if output produces an output stack with attribute lists, where the output stack with attribute lists is a vector value function on the mathematical group of symmetry transformations, where the non-translation group comprises low internal products (in-products) from the input stack of feature lists with each transformed filter from a set of transformed filters, the set of transformed filters comprising trainable filters from the set of trainable filters to which a transformation of the group of symmetry transformations has been applied, and where each of the internal products yields a result that is stored at a position in the output stack with feature lists, and where the result of each internal product is stored in the output stack of feature lists at the position associated with the trainable filter and the symmetry transformation that was applied to the trainable filter to obtain the transformed filter that was applied in the internal product. 2. 
Inrichting volgens conclusie 1, waarbij de groep symmetrie transformaties een symmorfe groep is, waarbij elke symmetrie transformatie in de symmorfe groep de samenstelling is van een symmetrie transformatie van de stabilisator van de oorsprong en een translatie, waarbij de stabilisator van de oorsprong de subgroep is van de groep van symmetrie transformaties die alle symmetrie transformaties omvat die de oorsprong van de discrete set bemonsteringspunten invariant laat, waarbij de oorsprong een vast maar willekeuring aangewezen punt uit de discrete set bemonsteringspunten is, en waarbij de niet-translatie groep convolutie laag is geïmplementeerd door eerst elke transformatie van de stabilisator van de oorsprong op elk filter toe te passen om een set gestabiliseerde filters te verkrijgen, en dan een bekend translationeel convolutie algoritme toe te passen op de set stabiliseerde getransformeerde filters en de invoer stapel van kenmerk lijsten.The device of claim 1, wherein the group of symmetry transformations is a symmetrical group, wherein each symmetry transformation in the symmetrical group is the composition of a symmetry transformation of the origin stabilizer and a translation, the origin stabilizer being the subgroup is from the group of symmetry transformations that includes all symmetry transformations that leave the origin of the discrete set of sampling points invariant, the origin being a fixed but randomly designated point from the discrete set of sampling points, and where the non-translation group convolution is low implemented by first applying each transformation of the origin stabilizer to each filter to obtain a set of stabilized filters, and then applying a known translational convolution algorithm to the set of stabilized transformed filters and the input stack of feature lists. 3. 
Inrichting volgens conclusie 1 of 2, waarbij de niet-translatie groep convolutie laag is geïmplementeerd onder toepassing van de snelle groep Fourier transformatie (Fast Group Fourier Transform, FGFT) en zijn inverse, de inverse snelle groep Fourier transformatie (Inverse Fast Fourier Transform, IFGFT), verder omvattende: - toepassen van de FGFT voor de niet-translatie groep symmetrie transformaties op zowel de invoer stapel met kenmerk lijsten en de set trainbare filters, om voor de invoer stapel met kenmerk lijsten en elk van de trainbare filters een matrix-waarde functie te verkrijgen, en - berekenen van het matrix product van de Fourier transformatie van de invoer stapel van kenmerk lijsten met elk van de Fourier transformaties van de set trainbare filters, om de Fourier getransformeerde te krijgen van de groep convolutie van de invoer stapel van kenmerk lijsten met de trainbare filters, en uiteindelijk het berekenen van de Inverse snelle groep Fourier getransformeerde (IFGFT) voor de niet-translatie groep symmetrie transformaties.The apparatus of claim 1 or 2, wherein the non-translation group convolution is low implemented using the fast group Fourier transformation (FGF) and its inverse, the inverse fast group Fourier transformation (Inverse Fast Fourier Transform , IFGFT), further comprising: - applying the FGFT for the non-translation group symmetry transformations to both the input stack with feature lists and the set of trainable filters, to provide a matrix for the input stack with feature lists and each of the trainable filters -value function to obtain, and - calculate the matrix product of the Fourier transformation of the input stack of feature lists with each of the Fourier transformations of the set of trainable filters, to get the Fourier transformed from the convolution group of the input stack of feature lists with the trainable filters, and ultimately calculating the Inverse fast group Fourier transformed (IFGFT) for the non-translation group 
symmetry transformations. 4. Inrichting volgens een der voorgaande conclusies, waarbij een getrainde verschuivingsterm (‘bias term’) wordt toegevoegd aan een of meer van de uitvoerstapel kenmerk lijsten, waarbij in het bijzonder slechte een enkele verschuivingsterm per kenmerk lijst wordt toegevoegd in de uitvoer stapel met kenmerk lijsten.Device as claimed in any of the foregoing claims, wherein a trained shift term ('bias term') is added to one or more of the output stack feature lists, wherein in particular only a single shift term per feature list is added in the output stack with feature lists. 5. Inrichting volgens een der voorgaande conclusies, waarbij de discrete set bemonsteringspunten een set van integers Z is, en waarbij de groep symmetrie transformaties de groep is van alle samenstellingen van integer translaties en spiegelingen om een punt in de discrete set bemonsteringspunten, in het bijzonder de groep van alle samenstellingen van integer optellingen of vermenigvuldigingen met -1.Device according to any of the preceding claims, wherein the discrete set of sampling points is a set of integers Z, and wherein the group of symmetry transformations is the group of all assemblies of integer translations and reflections around a point in the discrete set of sampling points, in particular the group of all compositions of integer additions or multiplications by -1. 6. 
Inrichting volgens een of meer der voorgaande conclusies, waarbij de discrete set bemonsteringspunten de set is van bemonsteringspunten in een vierkant rooster, en waarbij de groep van symmetrie transformaties de groep p4 is van alle samenstellingen van integer translaties in twee orthogonale richtingen en rotaties met veelvouden van 90 graden om een punt in de discrete set bemonsteringspunten.Device according to one or more of the preceding claims, wherein the discrete set of sampling points is the set of sampling points in a square grid, and wherein the group of symmetry transformations is the group p4 of all assemblies of integer translations in two orthogonal directions and rotations with multiples of 90 degrees to a point in the discrete set of sampling points. 7. Inrichting volgens een of meer der voorgaande conclusies, waarbij de discrete set bemonsteringspunten de set is van punten in een vierkant rooster, en waarbij de groep van symmetrie transformaties de groep p4m is van alle samenstellingen van integer translaties in twee orthogonale richtingen, rotaties over meervouden van 90 graden om elk punt in de discrete set van bemonsteringspunten.Device according to one or more of the preceding claims, wherein the discrete set of sampling points is the set of points in a square grid, and wherein the group of symmetry transformations is the group p4m of all assemblies of integer translations in two orthogonal directions, rotations over multiples of 90 degrees to each point in the discrete set of sampling points. 8. 
Inrichting volgens een of meer der voorgaande conclusies, waarbij de discrete set bemonsteringspunten de set is van punten in een hexagonaal rooster, en waarbij de symmetrie groep de groep p6 is van samenstellingen van integer translaties in de twee onafhankelijke richtingen van het hexagonale rooster en rotaties over meervouden van 60 graden om elk punt in het hexagonale rooster.A device according to any one of the preceding claims, wherein the discrete set of sampling points is the set of points in a hexagonal lattice, and wherein the symmetry group is the group p6 of assemblies of integer translations in the two independent directions of the hexagonal lattice and rotations over multiples of 60 degrees around each point in the hexagonal grid. 9. Inrichting volgens een of meer der voorgaande conclusies, waarbij de discrete set bemonsteringspunten de set is van punten in een hexagonaal rooster, en waarbij de groep van symmetrie transformaties de groep p6m is van samenstellingen van integer translaties in de twee onafhankelijke richtingen van het hexagonale rooster en rotaties over meervouden van 60 graden om elk punt in het hexagonale rooster en spiegelingen in spiegels die uitgelijnd zijn met de translatie assen van het hexagonale rooster.The device of any one of the preceding claims, wherein the discrete set of sampling points is the set of points in a hexagonal lattice, and wherein the group of symmetry transformations is the group p6m of assemblies of integer translations in the two independent directions of the hexagonal lattice and rotations over multiples of 60 degrees around each point in the hexagonal lattice and reflections in mirrors aligned with the translation axes of the hexagonal lattice. 10. 
Inrichting volgens een of meer der voorgaande conclusies, waarbij de discrete set bemonsteringspunten de set is van punten in een rombisch rooster, of een vierkant rooster, of een hexagonaal rooster, of een rechthoekig rooster, of een parallellogram rooster, en de groep symmetrie transformaties respectievelijk, de groep is van cmm, p4m, p6m, pmm, of p2, of een van de subgroepen van deze groepen die niet pure translatie groepen zijn, in het bijzonder elk van de 17 behang (wallpaper) groepen met uitzondering van de groep pl van pure translaties.Device according to one or more of the preceding claims, wherein the discrete set of sampling points is the set of points in a rhombic lattice, or a square lattice, or a hexagonal lattice, or a rectangular lattice, or a parallelogram lattice, and the group symmetry transformations respectively, the group is of cmm, p4m, p6m, pmm, or p2, or one of the subgroups of these groups that are not pure translation groups, in particular any of the 17 wallpaper groups excluding the group pl of pure translations. 11. Inrichting volgens een of meer der voorgaande conclusies, waarbij de discrete set bemonsteringspunten de set is van punten in een driedimensionaal kubisch rooster, en waarbij de groep van symmetrie transformaties de groep is van alle samenstellingen van integer translaties in drie orthogonale richtingen van het rooster, rotaties om veelvouden van 90 graden om een as gekoppeld met elk punt in het rooster.The device according to any of the preceding claims, wherein the discrete set of sampling points is the set of points in a three-dimensional cubic lattice, and wherein the group of symmetry transformations is the group of all assemblies of integer translations in three orthogonal directions of the lattice , rotations around multiples of 90 degrees around an axis associated with each point in the grid. 12. 
Inrichting volgens een of meer der voorgaande conclusies, waarbij de discrete set bemonsteringspunten de set is van punten in een driedimensionaal kubisch rooster, en waarbij de groep van symmetrie transformaties de groep is van alle samenstellingen van integer translaties in de drie orthogonale richtingen van het rooster, rotaties met meervouden van 90 graden om een as gekoppeld met een punt in het rooster, en spiegelingen in elke as-uitgelijnde spiegel.12. Device according to one or more of the preceding claims, wherein the discrete set of sampling points is the set of points in a three-dimensional cubic lattice, and wherein the group of symmetry transformations is the group of all assemblies of integer translations in the three orthogonal directions of the grid, rotations with multiples of 90 degrees around an axis coupled with a point in the grid, and reflections in each axis-aligned mirror. 13. Inrichting volgens een of meer der voorgaande conclusies, waarbij de discrete set bemonsteringspunten een driedimensionaal orthorombisch or tetragonaal rooster vormen, en waarbij de invoer stapel met kenmerk lijsten volume beeldpunten of “voxels” zijn in een digitaal magnetisch resonantie beeld.Device according to one or more of the preceding claims, wherein the discrete set of sampling points form a three-dimensional orthorombic oretragonal lattice, and wherein the input stack with feature lists are volume pixels or "voxels" in a digital magnetic resonance image. 14. 
Device according to one or more of the preceding claims, wherein the discrete set of sampling points forms a three-dimensional Bravais lattice, and wherein the group of symmetry transformations is the group of all symmetries of the lattice, or a subgroup of this group that is not a pure translation group. 15. Device according to one or more of the preceding claims, wherein the discrete set of sampling points consists of the elements of one of the 17 wallpaper groups excluding the pure translation group, or of one of the 230 space groups excluding the pure translation groups, or of one of the two-dimensional point groups, or of one of the three-dimensional point groups. 16. Device according to one or more of the preceding claims, wherein the group of symmetry transformations is one chosen from the 6 frieze groups p11g, p1m1, p11m, p2, p2mg, p2mm, or one of their subgroups that are not pure translation groups. 17. 
Device according to one or more of the preceding claims, wherein the discrete set of sampling points consists of the elements of a square, rectangular, rhombic, parallelogram, or hexagonal lattice, and wherein the input stack of feature maps comprises the pixels of a digital image with one or more color channels. 18. Device according to one or more of the preceding claims, wherein G-pooling is applied to at least one of the feature maps of the output stack of feature maps, in particular wherein the pooling is chosen from max-pooling, sum-pooling, square pooling, and combinations thereof. 19. Device according to one or more of the preceding claims, wherein the data point is a spatio-temporal signal, in particular a spatio-temporal signal chosen from audio spectrograms, video, and fMRI sequences, and wherein more particularly the discrete sampling points comprise points at different moments in time that are included in the group convolution. 20. 
Image classification device comprising the device according to one or more of the preceding claims, for digital images, wherein the symmetry group is p4m, wherein the input stack of feature maps of a first non-translation group convolution layer is a digital image, subsequent non-translation group convolution layers have p4m symmetry, and between consecutive non-translation group convolution layers there is a layer of non-linearities. 21. Classification method comprising applying the device according to any one of the preceding claims to a data unit to obtain a classification, wherein the data unit is an input stack of feature maps. 22. 
Data segmentation method comprising applying the device according to one or more of the preceding claims including claims 2 and 18, wherein the output of the convolutional neural network is obtained by G-pooling over the stabilizer of the origin, so as to obtain a segmentation map that transforms in the same way as the input stack of feature maps of the first layer of the network under transformations from the group of symmetry transformations. 23. Regression method comprising applying the device according to one or more of the preceding claims to a data unit to obtain a numerical estimate of a quantity associated with the data unit. 24. Data analysis method comprising the device according to one or more of the preceding claims, wherein the output of the convolutional neural network is a feature vector that is used for further computational analysis of the input of the convolutional neural network.
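To make the square-lattice claims concrete, the following sketch (illustrative only, not taken from the patent; all names and shapes are assumptions) implements a first-layer group convolution for the rotation part of p4: the input is correlated with all four 90-degree rotations of one filter, producing one feature plane per rotation, i.e. a function on the group rather than on the pixel grid alone.

```python
import numpy as np

def lift_conv2d(image, filt):
    """Correlate `image` with the four 90-degree rotations of `filt`.

    Returns an array of shape (4, H-h+1, W-w+1): one feature plane per
    rotation, a plain 'valid' correlation with no padding.
    """
    h, w = filt.shape
    H, W = image.shape
    out = np.zeros((4, H - h + 1, W - w + 1))
    for r in range(4):                       # the four rotations in p4 modulo translations
        f = np.rot90(filt, k=r)              # rotate the filter by r * 90 degrees
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                out[r, i, j] = np.sum(image[i:i + h, j:j + w] * f)
    return out
```

Rotating the input image by 90 degrees permutes the four orientation channels (up to a spatial rotation), so the multiset of responses, and hence any pooled statistic over it, is unchanged; this is the equivariance property that the group convolution layers in the claims rely on.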
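The three-dimensional cubic cases (claims 11 and 12) refer to the rotation and reflection symmetries of a cubic lattice. The point-group part can be enumerated mechanically; the sketch below (illustrative, not from the patent) closes the identity under the three 90-degree axis rotations and recovers the standard counts: 24 proper rotations of the cube, 48 elements once reflections are included.

```python
import numpy as np

def cube_rotations():
    """Enumerate the proper rotation group of the cube (order 24) by
    closing {identity} under the 90-degree rotations about x, y, z."""
    rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])   # 90 deg about x
    ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])   # 90 deg about y
    rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])   # 90 deg about z
    group = {tuple(np.eye(3, dtype=int).flatten())}
    frontier = list(group)
    while frontier:                      # breadth-first closure under the generators
        new = []
        for g in frontier:
            m = np.array(g).reshape(3, 3)
            for r in (rx, ry, rz):
                h = tuple((r @ m).flatten())
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return [np.array(g).reshape(3, 3) for g in group]
```

Composing each proper rotation with the point inversion -I (an improper operation in three dimensions) doubles the group to the 48 elements of the full octahedral group, matching the rotation-plus-mirror case of claim 12.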
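The pooling variants named in claim 18 can be sketched in a few lines (again an illustrative assumption, not the patent's implementation): given a p4 feature map with one plane per rotation, G-pooling reduces over the orientation axis with a max, a sum, or a sum of squares.

```python
import numpy as np

def g_pool(features, mode="max"):
    """Pool a (4, H, W) p4 feature map over the orientation axis.

    mode is one of "max", "sum", or "square" (sum of squares); the
    result is a single (H, W) plane that no longer depends on which
    rotation channel a response appeared in.
    """
    if mode == "max":
        return features.max(axis=0)
    if mode == "sum":
        return features.sum(axis=0)
    if mode == "square":
        return (features ** 2).sum(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")
```

On one possible reading of claim 22, pooling of this kind over the stabilizer of the origin collapses the group-valued output back to a planar segmentation map that transforms in the same way as the input under the symmetry group.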
NL2016285A 2016-02-19 2016-02-19 Device and method for generating a group equivariant convolutional neural network. NL2016285B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
NL2016285A NL2016285B1 (en) 2016-02-19 2016-02-19 Device and method for generating a group equivariant convolutional neural network.
PCT/NL2017/050084 WO2017142397A1 (en) 2016-02-19 2017-02-10 Device and method for generating a group equivariant convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
NL2016285A NL2016285B1 (en) 2016-02-19 2016-02-19 Device and method for generating a group equivariant convolutional neural network.

Publications (2)

Publication Number Publication Date
NL2016285A true NL2016285A (en) 2017-08-24
NL2016285B1 NL2016285B1 (en) 2017-09-20

Family ID: 56852328

Family Applications (1)

Application Number Title Priority Date Filing Date
NL2016285A NL2016285B1 (en) 2016-02-19 2016-02-19 Device and method for generating a group equivariant convolutional neural network.

Country Status (2)

Country Link
NL (1) NL2016285B1 (en)
WO (1) WO2017142397A1 (en)


Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190392287A1 (en) 2018-06-22 2019-12-26 Samsung Electronics Co., Ltd. Neural processor
CN109359727B (en) * 2018-12-07 2022-01-11 北京字节跳动网络技术有限公司 Method, device and equipment for determining structure of neural network and readable medium
US11671111B2 (en) 2019-04-17 2023-06-06 Samsung Electronics Co., Ltd. Hardware channel-parallel data compression/decompression
US11211944B2 (en) 2019-04-17 2021-12-28 Samsung Electronics Co., Ltd. Mixed-precision compression with random access
US11880760B2 (en) 2019-05-01 2024-01-23 Samsung Electronics Co., Ltd. Mixed-precision NPU tile with depth-wise convolution
US11967015B2 (en) 2020-02-06 2024-04-23 Apple Inc. Neural rendering
CN111401452B (en) * 2020-03-17 2022-04-26 北京大学 Image classification method of equal-variation convolution network model based on partial differential operator
CN111428407B (en) * 2020-03-23 2023-07-18 杭州电子科技大学 Electromagnetic scattering calculation method based on deep learning
CN112990315B (en) * 2021-03-17 2023-10-20 北京大学 3D shape image classification method of constant-variation 3D convolution network based on partial differential operator
CN114463556B (en) * 2022-01-24 2022-12-16 北京智源人工智能研究院 Equal-variation network training method and device, and image recognition method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014035738A1 (en) * 2012-08-29 2014-03-06 Microsoft Corporation Computer-implemented deep tensor neural network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346622A (en) 2013-07-31 2015-02-11 富士通株式会社 Convolutional neural network classifier, and classifying method and training method thereof
EP2869239A3 (en) 2013-11-04 2015-08-19 Facebook, Inc. Systems and methods for facial representation
KR20150098119A (en) 2014-02-19 2015-08-27 삼성전자주식회사 System and method for removing false positive lesion candidate in medical image


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHIYUAN ZHANG ET AL: "Discriminative Template Learning in Group-Convolutional Networks for Invariant Speech Representations", INTERSPEECH 2015 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION DRESDEN, GERMANY, 6 September 2015 (2015-09-06), pages 3229 - 3233, XP055328747, Retrieved from the Internet <URL:http://www.isca-speech.org/archive/interspeech_2015/papers/i15_3229.pdf> [retrieved on 20161213] *
OREN RIPPEL ET AL: "Spectral Representations for Convolutional Neural Networks", CORR (ARXIV), vol. abs/1506.03767v1, 11 June 2015 (2015-06-11), pages 1 - 10, XP055294145 *
ROBERT GENS ET AL: "Deep Symmetry Networks", 1 January 2014 (2014-01-01), XP055252284, Retrieved from the Internet <URL:http://papers.nips.cc/paper/5424-deep-symmetry-networks.pdf> [retrieved on 20160222] *
SANDER DIELEMAN ET AL: "Exploiting Cyclic Symmetry in Convolutional Neural Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 8 February 2016 (2016-02-08), XP080682215 *
TYLER HIGHLANDER ET AL: "Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 25 January 2016 (2016-01-25), XP080680179 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875904A (en) * 2018-04-04 2018-11-23 北京迈格威科技有限公司 Image processing method, image processing apparatus and computer readable storage medium
CN112200720A (en) * 2020-09-29 2021-01-08 中科方寸知微(南京)科技有限公司 Super-resolution image reconstruction method and system based on filter fusion
CN112200720B (en) * 2020-09-29 2023-08-08 中科方寸知微(南京)科技有限公司 Super-resolution image reconstruction method and system based on filter fusion
CN113033677A (en) * 2021-03-30 2021-06-25 北京有竹居网络技术有限公司 Video classification method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
NL2016285B1 (en) 2017-09-20
WO2017142397A1 (en) 2017-08-24
WO2017142397A8 (en) 2018-07-05

Similar Documents

Publication Publication Date Title
NL2016285B1 (en) Device and method for generating a group equivariant convolutional neural network.
Weiler et al. 3d steerable cnns: Learning rotationally equivariant features in volumetric data
Mitchison et al. Counterfactual computation
Chen et al. Color image analysis by quaternion-type moments
Kilmer et al. Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging
Zhang et al. QSobel: A novel quantum image edge extraction algorithm
Kilmer et al. Factorization strategies for third-order tensors
Solomon et al. SPARCOM: Sparsity based super-resolution correlation microscopy
Granlund An associative perception-action structure using a localized space variant information representation
Wang et al. On the structure of nonlinearities in pose graph SLAM
Shifat-E-Rabbi et al. Radon cumulative distribution transform subspace modeling for image classification
Henriques et al. Fast training of pose detectors in the fourier domain
Huang et al. Solving complex quadratic systems with full-rank random matrices
Montanari et al. A projection pursuit approach to variable selection
Kuang et al. Preconditioning of optimal transport
Garcıa-Osorio et al. Visualization of high-dimensional data via orthogonal curves
Bizzarri et al. Symmetries of discrete curves and point clouds via trigonometric interpolation
van Wyk et al. Kronecker product graph matching
Caraiman et al. New applications of quantum algorithms to computer graphics: the quantum random sample consensus algorithm
Dai et al. IsoExplorer: an isosurface-driven framework for 3D shape analysis of biomedical volume data
Peng et al. Efficient long-range convolutions for point clouds
Krivic et al. Learning missing edges via kernels in partially-known graphs.
Hosny et al. CUDAQuat: new parallel framework for fast computation of quaternion moments for color images applications
Chen Standard deviation normalized summed squared difference for image registration
Naud An accurate MDS-based algorithm for the visualization of large multidimensional datasets

Legal Events

Date Code Title Description
PD Change of ownership

Owner name: QUALCOMM INCORPORATED; US

Free format text: DETAILS ASSIGNMENT: CHANGE OF OWNER(S), ASSIGNMENT; FORMER OWNER NAME: SCYFER B.V.

Effective date: 20190328