EP3620970B1

EP3620970B1 - Method for extracting characteristics of a fingerprint represented by an input image

Info

Publication number: EP3620970B1
Application number: EP19195806.5A
Authority: EP
Inventors: Anthony CAZASNOVES; Cédric THUILLIER
Original assignee: Idemia Identity and Security France SAS
Current assignee: Idemia Identity and Security France SAS
Priority date: 2018-09-06
Filing date: 2019-09-06
Publication date: 2021-10-27
Anticipated expiration: 2039-09-06
Also published as: ES2902391T3; AU2019226224B2; EP3620970A1; US20200082147A1; PL3620970T3; US11087106B2; AU2019226224A1; FR3085775A1; FR3085775B1

Description

GENERAL TECHNICAL FIELD

The present invention relates to the field of biometrics, and in particular proposes a method for extracting desired characteristics of a fingerprint represented by an input image, with a view to biometric processing of the input image.

STATE OF THE ART

Biometric authentication/identification consists of recognizing an individual on the basis of biometric traits of that individual, such as fingerprints (fingerprint recognition), the iris or the face (facial recognition).

Conventional biometric approaches use characteristic information of the biometric trait extracted from the acquired biometric data, called “features”, and learning/classification is carried out by comparing these features.

In particular, in the case of fingerprint recognition, fingertip images are processed so as to extract the characteristics of a print, which can be classified into three categories:

Level 1 defines the general pattern of the print (one of four classes: right loop, left loop, arch and whorl), and the overall layout of the ridges (in particular, an orientation map called the “Ridge Flow Matrix”, or RFM map, is obtained, which represents the general direction of the ridge at each point of the print).
Level 2 defines the particular points of the print called minutiae, which constitute “events” along the ridges (ridge ending, bifurcation, etc.). Conventional recognition approaches essentially use these characteristics.
Level 3 defines more complex information such as the shape of the ridges, skin pores, scars, etc.

The process of extracting the characteristics of a print (in the form of feature maps) is thus called “encoding”; these characteristics make it possible to compose a signature called a “template”, which encodes the information useful for the final classification phase. More precisely, classification is carried out by comparing the feature maps obtained with one or more reference feature maps associated with known individuals.

Today, “encoders” exist that perform this feature extraction efficiently, i.e. algorithms carrying out a set of processing operations:

Image enhancement (contrast enhancement, noise reduction, etc.);
Use of dedicated filters (Gabor filters of different resolutions, differentiators, etc.);
Use of decision methods (binarization thresholding, point extraction, etc.).
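As a rough illustration of the last item in the list above, binarization by thresholding can be sketched as follows. This is a minimal global-threshold example written for this description only: the function name `binarize`, the threshold value 128 and the sample patch are assumptions, and real encoders generally use more elaborate, locally adaptive decisions.

```python
def binarize(gray, threshold=128):
    """Map each grey level (0-255) to 0 or 255 using one global threshold.

    Illustrative only: the threshold value is an assumption, not a value
    taken from any particular encoder.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


# Hypothetical 3x3 greyscale patch.
patch = [[12, 200, 190],
         [30, 220, 25],
         [240, 15, 10]]
binary = binarize(patch)
```

Every pixel of the result is either 0 or 255, which is the kind of two-level image that the classical processing chain works on.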

However, there is now a desire to embed such encoders in consumer equipment such as smartphones, which is very constrained in terms of performance, whereas the above processing chain requires substantial computing power and memory resources. Until now, fingerprint recognition has mainly been implemented on fixed access terminals with dedicated processing means.

One avenue is the use of neural networks, which are already widely used for data classification.

After a machine learning phase (generally supervised, that is to say on a reference database of already classified data), a neural network “learns” and becomes capable on its own of applying the same classification to unknown data.

Convolutional neural networks, or CNNs, are a type of neural network in which the connection pattern between neurons is inspired by the visual cortex of animals. They are thus particularly suited to one particular type of classification, namely image analysis: they effectively allow the recognition of objects or people in images or videos, in particular in security applications (automatic surveillance, threat detection, etc.).

And, in the field of biometric authentication/identification, a CNN can be trained to recognize an individual on the basis of biometric traits of that individual, insofar as these data are handled in the form of images.

In "Deep convolutional neural network for latent fingerprint enhancement" by Jian Li et al. (SIGNAL PROCESSING. IMAGE COMMUNICATION., vol. 60, 1 February 2018, pages 52-63, XP055590662, NL, ISSN: 0923-5965), a pre-processing method is applied to fingerprint images; this method uses CNNs to improve feature extraction. While such approaches have enabled major advances, for example in face recognition, their application to fingerprint recognition runs up against the specific nature of fingerprints, and performance has so far not been convincing. In addition, the size of the neural network must remain limited in order to meet the memory constraints of the aforementioned consumer equipment.

It would therefore be desirable to have a lighter solution for extracting characteristics of a fingerprint that nevertheless performs at least as well as existing solutions.

PRESENTATION OF THE INVENTION

According to a first aspect, the present invention relates to a method for extracting desired characteristics of a fingerprint represented by an input image, the method being defined by claim 1.

Further advantageous features are defined by claims 1-13.

According to a second and a third aspect, the invention proposes a computer program product comprising code instructions for executing a method according to the first aspect for extracting desired characteristics of a fingerprint represented by an input image; and a storage means readable by computer equipment, on which a computer program product comprises code instructions for executing a method according to the first aspect for extracting desired characteristics of a fingerprint represented by an input image.

PRESENTATION OF FIGURES

Other characteristics and advantages of the present invention will become apparent on reading the following description of a preferred embodiment. This description is given with reference to the accompanying drawings, in which:

figure 1 is a diagram of an architecture for implementing the methods according to the invention;
figure 2 represents a first possible convolutional neural network;
figure 3 represents an example of a decompaction block used in embodiments of the method according to the invention;
figure 4 illustrates examples of Atrous-type convolutions;
figure 5 represents an example of an Inception block used in embodiments of the method according to the invention;
figure 6 represents an example of a convolutional neural network for implementing the method according to the invention.

DETAILED DESCRIPTION

Principle and architecture

The present invention proposes a method for extracting desired characteristics of a fingerprint represented by an input image. This method typically consists of an “encoding” of the print, i.e. the desired characteristics to be extracted are typically “biometric” characteristics, that is to say the “final” characteristics making it possible to compose a template of the fingerprint with a view to classification (identification/authentication of an individual, see below). As such, said desired characteristics typically describe minutiae, that is to say they comprise the position and/or orientation of the minutiae. However, it will be understood that the present method is not limited to this embodiment, and any characteristics of possible interest in biometrics can be extracted at the end of this method.

The present method is distinguished in that it proposes a step (a) of binarizing said input image by means of a convolutional neural network, CNN, so as to generate a so-called binary image. Whereas the input image is in color or, typically, in grayscale, the binary image consists only of white or black areas, the white areas representing the ridges and the black areas the valleys between the ridges, and is therefore particularly clear and readable.

The binary image can be seen as a map of “intermediate” characteristics of the input fingerprint (feature map). It is known to binarize a fingerprint image as a “pre-processing” step using image processing algorithms, but it has been discovered that this binarization can be carried out very efficiently with neural networks of limited size that meet the constraints of embedding in consumer equipment such as a smartphone.

More precisely, binarizing the image considerably facilitates the subsequent processing used to extract the desired characteristics of the print (and therefore limits the resources required), while being easy to embed, as will be shown. A complete embedded encoder can thus be obtained with the same performance as known encoders.

The present method is implemented within an architecture such as that shown in figure 1, by means of a server 1 and a client 2. The server 1 is the training equipment (implementing the training of the CNN) and the client 2 is a classification equipment (implementing the present method for extracting desired characteristics of a fingerprint), for example a user terminal.

It is entirely possible for the two items of equipment 1, 2 to be one and the same, but preferably the server 1 is that of a security solution provider, and the client 2 is personal consumer equipment, in particular a smartphone, a personal computer, a tablet, a safe, etc.

In all cases, each item of equipment 1, 2 is typically remote computer equipment connected to a wide area network 10, such as the Internet, for the exchange of data. Each comprises processor-type data processing means 11, 21 and data storage means 12, 22 such as a computer memory, for example a flash memory or a hard disk.

The server 1 stores a training database, i.e. a set of fingerprint images for which a binarized image is already available (and possibly other information such as RFM maps, see below), as opposed to the so-called input images that are to be processed.

The client equipment 2 advantageously comprises a fingerprint scanner 23, so as to be able to acquire said input image directly, typically so that a user can authenticate.

CNN

A CNN generally contains four types of layers that process information successively:

the convolution layer, which processes blocks of the input one after the other;
the non-linear layer, which adds non-linearity to the network and therefore allows much more complex decision functions;
the pooling layer, which groups several neurons into a single neuron;
the fully connected layer, which connects all the neurons of one layer to all the neurons of the previous layer (for classification).

Each non-linear layer NL is often preceded by a batch normalization layer (“BN layer”), so as to accelerate training.

The activation function of the non-linear layer NL is typically the ReLU (Rectified Linear Unit) function, equal to f(x) = max(0, x), and the most widely used pooling layer (denoted POOL) is the AvgPool function, which corresponds to an average of the values of a square (several values are pooled into one).
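The two operations just described can be sketched in a few lines of plain Python. This is a minimal illustration only: the helper names `relu` and `avg_pool`, the 2×2 window and the sample values are assumptions chosen for the example, not details taken from the patent.

```python
def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return max(0.0, x)


def avg_pool(feature_map, size=2):
    """Average pooling over non-overlapping size x size squares of a 2D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(sum(window) / (size * size))
        pooled.append(out_row)
    return pooled


# Hypothetical 4x4 feature map.
fmap = [[1.0, -2.0, 3.0, 4.0],
        [-1.0, 2.0, -3.0, 0.0],
        [0.5, 0.5, 8.0, -8.0],
        [1.0, -1.0, 4.0, 4.0]]
activated = [[relu(v) for v in row] for row in fmap]
pooled = avg_pool(activated)  # [[0.75, 1.75], [0.5, 4.0]]
```

Negative values are set to zero by ReLU, and each 2×2 square of the activated map is then replaced by its average, halving each spatial dimension.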

The convolution layer, denoted CONV, and the fully connected layer, denoted FC, generally correspond to a scalar product between the neurons of the previous layer and the weights of the CNN.

Typical CNN architectures stack a few pairs of CONV → NL layers, then add a POOL layer, and repeat this pattern [(CONV → NL)^p → POOL] until an output vector of sufficiently small size is obtained, then end with two fully connected layers FC.

Here is a typical CNN architecture:

INPUT → [[CONV → NL]^p → POOL]^n → FC → FC

In the present CNN, it will be understood that no FC layer is necessary, since the expected result is not a class but the binary image, which is a feature map.

In general, said CNN comprises a set of successive convolution layers. In a known manner and as explained above, each of said convolution layers can be followed by a batch normalization layer BN and/or a non-linear layer, in particular ReLU, preferably both in that order.

In order to perform the binarization, said set of successive convolution layers has a decreasing filter size and a decreasing number of filters. The decreasing filter size thus allows the image to be merged by iterative reduction. Said set is, as will be seen, arranged at the “end” of the CNN, that is to say at its output: the last convolution layer of said set advantageously has a 1×1 filter size and outputs said binary image.

Recall that a convolution layer is defined by a set of filters (or “kernels”) applied to a block of the input, i.e. a sub-area. The number of filters defines the size of the output vector, and the size of these filters defines the extent of the area considered. Using large filters makes it possible to consider a fairly wide neighborhood but exponentially increases the memory footprint, which is why a balance must be maintained.
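The relationship between the number of filters and the depth of the output can be illustrated with a naive convolution in plain Python. This is a sketch under stated assumptions: a single-channel input, stride 1 and no padding. The function name `conv2d` and the two sample filters are hypothetical. As is usual in CNNs, the operation shown is a sliding dot product (cross-correlation).

```python
def conv2d(image, filters):
    """Apply each k x k filter to a 2D image (stride 1, no padding).

    Returns one output map per filter, so the output depth equals the
    number of filters, while k sets the size of the neighbourhood seen.
    """
    k = len(filters[0])
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    maps = []
    for f in filters:
        fmap = [[sum(f[i][j] * image[r + i][c + j]
                     for i in range(k) for j in range(k))
                 for c in range(out_w)]
                for r in range(out_h)]
        maps.append(fmap)
    return maps


# Hypothetical 6x6 input whose value grows by 1 per column and 6 per row.
image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]    # copies the central pixel
horizontal = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]  # right minus left neighbour
maps = conv2d(image, [identity, horizontal])
```

Two filters produce two 4×4 output maps. With 5×5 filters the same call would read 25 input values per output pixel instead of 9, which is the trade-off between neighborhood size and cost mentioned above.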

Thus, the final convolution with a 1×1 filter size merges the multidimensional information from the previous layers into a feature map of depth 1, which constitutes the binary image.
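A 1×1 convolution with a single filter amounts to a weighted sum of the input channels at each pixel. The following minimal sketch shows this; the function name `conv1x1`, the weights and the sample feature maps are assumptions for illustration, and in a trained network the weights would be learned.

```python
def conv1x1(feature_maps, weights, bias=0.0):
    """Merge C feature maps (each H x W) into one H x W map.

    Each output pixel is the weighted sum of the C input values at the
    same position: no spatial neighbourhood is involved.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[bias + sum(wt * fm[r][c] for wt, fm in zip(weights, feature_maps))
             for c in range(w)]
            for r in range(h)]


# Three hypothetical 2x2 feature maps and one weight per map.
maps = [[[1.0, 2.0], [3.0, 4.0]],
        [[10.0, 20.0], [30.0, 40.0]],
        [[0.0, 1.0], [0.0, 1.0]]]
merged = conv1x1(maps, [1.0, 0.5, -2.0])  # [[6.0, 10.0], [18.0, 22.0]]
```

The spatial size is unchanged and only the depth is reduced, here from three maps to one.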

Note that this last convolution layer can have a single filter, i.e. generate only the binary image, or have a second filter so as to additionally generate a confidence mask associated with said binary image.

Figure 2 represents a first possible binarization CNN with minimal size and memory footprint.

Said CNN is reduced to said set of successive convolution layers, and comprises two “head” convolution layers that create depth.

The first convolution layer has eight filters of size 5×5, the second convolution layer has eight filters of size 3×3, and the last convolution layer (the third), denoted CONV_final, has one filter of size 1×1.

It can thus be seen that the number of filters remains constant at eight before dropping to one, that is to say that in practice only the last layer CONV_final performs the binarization (and it has no other outputs).

This CNN is very attractive given its particularly small size, but if the quality is to be improved, it is preferable to have a strictly decreasing number of filters over the whole set, i.e. a gradual reduction in the number of filters.
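To put the small size of the figure 2 network into numbers, its weights can be counted with a short script. This is a rough sketch under two assumptions that the description does not state: the input is a single-channel grayscale image, and each filter has one bias term. Any batch normalization parameters are ignored. The helper name `conv_params` is hypothetical.

```python
def conv_params(kernel, in_channels, n_filters, bias=True):
    """Number of trainable parameters of one convolution layer."""
    weights = kernel * kernel * in_channels * n_filters
    return weights + (n_filters if bias else 0)


# (filter size, number of filters) for the three layers of figure 2.
layers = [(5, 8), (3, 8), (1, 1)]

in_channels = 1  # assumption: single-channel grayscale input
per_layer = []
for kernel, n_filters in layers:
    per_layer.append(conv_params(kernel, in_channels, n_filters))
    in_channels = n_filters  # the next layer sees one channel per filter

total = sum(per_layer)  # per_layer == [208, 584, 9], total == 801
```

Under these assumptions the whole network holds 801 parameters, most of them in the second layer.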

Decompaction

Thus, with reference to figure 3, instead of the last convolution layer CONV_final alone performing the binarization, a so-called “decompaction” block is provided, which contains a plurality of convolution layers (denoted ${CONV}_{i}^{DEC}$, $i \in [1; n]$, $n \geq 2$, that is to say at least two successive convolution layers, advantageously three, DEC standing for “decompaction”).

The number of filters decreases by a constant step from one convolution layer ${CONV}_{i}^{DEC}$ of the decompaction block to the next, ${CONV}_{i+1}^{DEC}$. The last layer ${CONV}_{n}^{DEC}$ of the decompaction block preferably has a 1×1 filter size, like the final convolution layer CONV_final presented above, but the progressive reduction of the filter size makes it possible to avoid loss of information and therefore to reduce noise. The quality of the binarization is thus appreciably improved.

In the decompaction block, we define ${NB}_{feat\_in}$, the number of feature maps at the input of the block, ${NB}_{feat\_out}$, the number of feature maps at the output of the block, and ${NB}_{step}$, the number of convolution layers in the block (which corresponds to n as defined above). Said constant step is then defined by the formula $step = \frac{{NB}_{feat\_in} - {NB}_{feat\_out}}{{NB}_{step}}$.

For example, with three layers in the block as in the example of Figure 3, two output feature maps (as explained, the binary image and the confidence mask), and eight input feature maps (as at the output of the second convolution layer of the CNN of Figure 2), we obtain step = 2, i.e. the first convolution layer ${CONV}_{1}^{DEC}$ of the CNN of Figure 3 has six filters, the second convolution layer ${CONV}_{2}^{DEC}$ has four filters, and, as expected, the third (final) convolution layer ${CONV}_{3}^{DEC}$ has two filters.
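As an illustration of the step formula, the following minimal Python sketch computes the number of filters of each layer of the decompaction block. The function name `decompaction_filter_counts` is hypothetical, and the sketch assumes that the step is a positive integer, as in the Figure 3 example.

```python
def decompaction_filter_counts(nb_feat_in, nb_feat_out, nb_step):
    """Return the number of filters of each layer CONV_i^DEC, for i = 1..nb_step."""
    # step = (NB_feat_in - NB_feat_out) / NB_step, required here to be a whole number
    step, remainder = divmod(nb_feat_in - nb_feat_out, nb_step)
    if remainder != 0 or step <= 0:
        raise ValueError("this sketch only handles a positive integer step")
    # Each layer has `step` fewer filters than the number of maps it receives.
    return [nb_feat_in - i * step for i in range(1, nb_step + 1)]


# Values of the Figure 3 example: 8 input maps, 2 output maps, 3 layers.
counts = decompaction_filter_counts(nb_feat_in=8, nb_feat_out=2, nb_step=3)
print(counts)  # [6, 4, 2]
```

The last value always equals the number of output feature maps, which is consistent with the final layer producing the binary image and the confidence mask.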

In lower-quality areas of the input image, the decompaction block is found to take more extensive spatial information into account and thus to provide a continuous segmentation. For the same reason, when an occluded area exists in the image, decompaction makes it possible to recover connectivity between the ridges and valleys at the edges of this region devoid of information.

Note that it is entirely possible for the CNN to contain other convolution layers, in particular upstream of and/or in parallel with the decompaction block.

Atrous convolutions

Advantageously, as can be seen in Figure 3, at least one convolution layer of the decompaction block other than the last one, i.e. ${CONV}_{i}^{DEC}$, i ∈ [1; n-1], is of the dilated filter type, called Atrous.

Indeed, to provide a good-quality binarization of a fingerprint, it is necessary to be able to discern the differences between a valley and a skin fold or a scar present on the user's finger.

This decision necessarily requires consolidated information over a fairly large neighborhood, which brings us back to the value of large filters, which unfortunately have a large memory footprint.

The use of Atrous convolution layers (see for example the document Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, from which Figure 4 is taken) removes this limitation. Starting from a filter of small size, for example 3×3, it is possible to extend its field of view by spreading the coefficients used according to a selected spacing, see Figure 4. This can also be seen as using a sparse filter of the final dimension.

To rephrase, whereas in a “normal” convolution, i.e. with a non-dilated filter, the size of the field of view and the size of the filter coincide, in an Atrous convolution, i.e. with a dilated filter, the size of the field of view is greater than the size of the filter because of the spacing between the pixels considered.

In particular, whatever the size of the field of view, a reasonable filter size between 3×3 and 7×7 can be kept, compatible with embedding on consumer equipment.

Preferably, each other convolution layer ${CONV}_{i}^{DEC}$, ∀i ∈ [1; n-1], of said decompaction block is of the dilated filter type, called Atrous (i.e. only the last one is a “normal” convolution; note that a convolution layer with a 1×1 filter cannot be Atrous, since the size of the field of view is then necessarily also 1×1), with a decreasing field of view size.

In the example of Figure 3, the first and second convolution layers, thus of the Atrous type, each have a 3×3 filter size, but their field of view sizes are 9×9 and 5×5 respectively.
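The relation between filter size and field of view can be made concrete. The sketch below assumes the usual definition of a dilated filter, in which a k×k filter with dilation rate d covers k + (k - 1)(d - 1) pixels per side. The text does not state dilation rates, so the values derived here are an inference, and both function names are hypothetical.

```python
def dilation_rate(filter_size, field_of_view):
    """Dilation d such that filter_size + (filter_size - 1) * (d - 1) == field_of_view."""
    gaps = filter_size - 1                # number of intervals between coefficients
    extra = field_of_view - filter_size   # pixels added by the dilation
    if gaps <= 0 or extra < 0 or extra % gaps != 0:
        raise ValueError("no integer dilation rate gives this field of view")
    return extra // gaps + 1


def tap_offsets(filter_size, dilation):
    """Pixel offsets, relative to the centre, sampled along one axis."""
    half = filter_size // 2
    return [(j - half) * dilation for j in range(filter_size)]


# The two Atrous layers of the Figure 3 example: 3x3 filters, 9x9 and 5x5 fields of view.
for fov in (9, 5):
    d = dilation_rate(3, fov)
    print(f"field of view {fov}x{fov}: dilation {d}, offsets {tap_offsets(3, d)}")
```

Under this assumption, both layers still hold only nine coefficients per filter; only the distance between the sampled pixels changes.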

Inception

One problem encountered in fingerprint feature extraction is the deformation of fingers. For the CNN to be robust to this deformation, it is desirable that it can handle different resolutions corresponding to different zoom levels.

Introducing such a “multi-resolution” component is a possibility offered by so-called Inception blocks, the building blocks of the network of the same name described for example in the document Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015, June). Going deeper with convolutions. CVPR, to which those skilled in the art can refer.

Thus, the present CNN advantageously comprises such an Inception block, an advantageous embodiment of which is shown in Figure 5.

A so-called Inception block has a plurality of parallel branches with convolution layers ${CONV}_{k}^{INC 2}$, k ∈ [1; l], l ≥ 2, having different field of view sizes, the different branches providing information at each of the scales. In the example of Figure 4, l = 7, with the following field of view sizes (in branch order): 1×1, 3×3, 5×5, 7×7, 15×15, 30×30, 60×60.

At the end of the Inception block, a concatenation module gathers the feature maps of the different branches.

Preferably, each branch has two layers, including a 1×1 convolution layer (normally at the input; the particular case of the first branch, i.e. k = 1, is discussed below).

Thus, at least one branch of the Inception block (preferably all but one or two, here all branches k ∈ [3; l]) comprises a convolution layer ${CONV}_{k}^{INC 1}$ having a 1×1 filter size, then a convolution layer ${CONV}_{k}^{INC 2}$ of the dilated filter type, called Atrous, again with a filter size between 3×3 and 7×7. More precisely, all Atrous convolution layers with a field of view size up to 15×15 (those of the 3rd, 4th and 5th branches) can have a 3×3 filter size, but beyond that (the 6th branch, where the convolution layer ${CONV}_{6}^{INC 2}$ has a 30×30 field of view, and the 7th branch, where the convolution layer ${CONV}_{7}^{INC 2}$ has a 60×60 field of view), filter sizes of 5×5 and 7×7 respectively are preferably used so as to keep a reasonable spacing between two filter coefficients, ensuring that the information contained in the extended field of view is genuinely used, while keeping a limited memory footprint compatible with embedding on consumer devices.
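To illustrate the memory argument, the sketch below compares, per input channel and per filter, the number of coefficients of a hypothetical dense filter spanning the whole field of view with that of the dilated filter used in branches 3 to 7. The approximate spacing (field of view - 1)/(filter size - 1) is an assumption used only for illustration, since the text does not give dilation rates; the function names are hypothetical.

```python
# (field of view, filter size) of each Atrous branch, as listed in the text.
ATROUS_BRANCHES = {3: (5, 3), 4: (7, 3), 5: (15, 3), 6: (30, 5), 7: (60, 7)}


def coefficient_counts(field_of_view, filter_size):
    """Coefficients of a dense filter spanning the field of view, and of the dilated filter."""
    return field_of_view ** 2, filter_size ** 2


def approx_spacing(field_of_view, filter_size):
    """Approximate distance between neighbouring coefficients (illustrative assumption)."""
    return (field_of_view - 1) / (filter_size - 1)


for branch, (fov, size) in ATROUS_BRANCHES.items():
    dense, dilated = coefficient_counts(fov, size)
    spacing = approx_spacing(fov, size)
    print(f"branch {branch}: dense {dense}, dilated {dilated}, spacing ~{spacing:.1f}")

# Keeping a 3x3 filter for the 60x60 branch would spread its coefficients much further apart
# than the 7x7 filter does.
print(approx_spacing(60, 3), approx_spacing(60, 7))
```

For the 7th branch, the dilated 7×7 filter holds 49 coefficients where a dense 60×60 filter would hold 3600, while the larger filter size keeps the gap between sampled pixels moderate.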

In addition, one branch of the Inception block can comprise a convolution layer ${CONV}_{2}^{INC 1}$ having a 1×1 filter size, then a convolution layer ${CONV}_{2}^{INC 2}$ with a non-dilated filter of size 3×3; and/or one branch of the Inception block comprises a pooling layer ${POOL}_{1}^{INC 1}$, then a convolution layer ${CONV}_{1}^{INC 2}$ having a 1×1 filter size. In the example of Figure 5, both are present.

The 2nd branch corresponds to a field of view of size 3×3, i.e. the filter size necessarily coincides with the field of view size, which is why the convolution is normal and not Atrous.

The first branch corresponds to a field of view of size 1×1, i.e. a filter size of 1×1. This branch could comprise only the 1×1 convolution layer, but preferably that layer is placed in 2nd position and preceded by a pooling layer (typically 3×3 AveragePooling, i.e. averaging over a square of size 3×3) so as to increase the information in this branch.

Each convolution layer ${CONV}_{k}^{INC 1,2}$ can have a relatively high number of filters, for example 32, to create depth. In the example of Figure 5, the convolution layer ${CONV}_{2}^{INC 2}$ with a non-dilated filter of size 3×3 exceptionally has 48 filters, because of the value of the information it encodes (it is the last “non-Atrous” convolution, i.e. the last one that has access to all the information without gaps). Those skilled in the art will know how to adapt the number of filters to the constraints to be met, in particular the memory footprint.

Example of CNN

Preferably, the CNN successively comprises the Inception block or blocks (preferably two) then the decompaction block.

In a particularly preferred embodiment, illustrated by Figure 6, the CNN comprises, in parallel with the decompaction block, a so-called specialization block generating other useful maps, in particular at least one ridge orientation map of the fingerprint represented by said input image, called the RFM map, and where applicable the associated confidence mask. More precisely, the branch produces a sine map and a cosine map, which together encode the RFM.
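How a sine map and a cosine map can jointly encode an orientation can be sketched as follows. The text does not specify the encoding; this example assumes the common double-angle convention, in which a ridge orientation θ (defined modulo π) is stored as (sin 2θ, cos 2θ) so that θ and θ + π yield the same pair. The function names are hypothetical.

```python
import math


def encode_orientation(theta):
    """Encode a ridge orientation (radians, defined modulo pi) as a (sine, cosine) pair."""
    # Assumption: double-angle encoding, so that theta and theta + pi are indistinguishable.
    return math.sin(2.0 * theta), math.cos(2.0 * theta)


def decode_orientation(sin_value, cos_value):
    """Recover the orientation in [0, pi) from one pixel of the sine and cosine maps."""
    return (0.5 * math.atan2(sin_value, cos_value)) % math.pi


theta = math.radians(150.0)
sin_value, cos_value = encode_orientation(theta)
print(math.degrees(decode_orientation(sin_value, cos_value)))
```

One reason such a pair is convenient is that both maps vary continuously across the wrap-around between orientations close to 0 and close to π, which a single angle map would not.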

Indeed, RFM maps generally have a lower resolution than the input image or the binary image (e.g. one eighth), and the separation into two branches makes it possible to accommodate this difference in resolution and to specialize the training for the different maps considered.

There is therefore a “common trunk” made up of the Inception blocks, followed by two branches: the specialization branch (i.e. the specialization block) and the binarization branch (i.e. the decompaction block).

In the example of Figure 6, the decompaction block consists of a pooling layer (e.g. 8×8 AveragePooling, so as to divide the resolution by eight).
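The effect of such a pooling layer on resolution can be sketched in plain Python. This example assumes non-overlapping windows (stride equal to the window size), which is what dividing the resolution by eight implies; the function name and the toy 16×16 image are illustrative only.

```python
def average_pool(image, size=8):
    """Non-overlapping size x size average pooling on an image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    if rows % size or cols % size:
        raise ValueError("image dimensions must be multiples of the pooling size")
    pooled = []
    for r in range(0, rows, size):
        pooled_row = []
        for c in range(0, cols, size):
            # Mean of the size x size block whose top-left corner is (r, c).
            block = [image[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            pooled_row.append(sum(block) / (size * size))
        pooled.append(pooled_row)
    return pooled


# Toy 16x16 image: upper half 0.0, lower half 1.0.
image = [[float(r // 8) for _ in range(16)] for r in range(16)]
pooled = average_pool(image)
print(len(pooled), len(pooled[0]))  # 2 2
print(pooled)                       # [[0.0, 0.0], [1.0, 1.0]]
```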

Such a network is particularly advantageous because of its ability to produce both the binary image and the RFM map, for a reasonable size.

Training and classification

Advantageously, the method begins with a step (a0) of training, by the data processing means 11 of the server 1, the parameters of said CNN from a database of already binarized fingerprint images.

This training can be carried out in a conventional manner, for example using the Keras framework. The training cost function can be composed of a conventional data-fidelity term (mean squared error) and a total variation regularization.
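A cost function of this form can be sketched in plain Python on small images. The text gives neither the weighting between the two terms nor the exact form of the total variation, so the weight `tv_weight`, the anisotropic (sum of absolute neighbour differences) variant and the function names are all assumptions made for illustration.

```python
def mean_squared_error(pred, target):
    """Data-fidelity term: mean of squared pixel differences."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n


def total_variation(pred):
    """Anisotropic total variation: sum of absolute differences between neighbouring pixels."""
    rows, cols = len(pred), len(pred[0])
    horizontal = sum(abs(pred[r][c + 1] - pred[r][c])
                     for r in range(rows) for c in range(cols - 1))
    vertical = sum(abs(pred[r + 1][c] - pred[r][c])
                   for r in range(rows - 1) for c in range(cols))
    return horizontal + vertical


def training_loss(pred, target, tv_weight=0.1):
    """MSE plus weighted total variation (tv_weight is an assumed hyperparameter)."""
    return mean_squared_error(pred, target) + tv_weight * total_variation(pred)


target = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
noisy = [[0.0, 1.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]  # one misclassified pixel
print(training_loss(target, target), training_loss(noisy, target))
```

The regularization term penalizes isolated pixel flips in the predicted image, which fits the homogeneity of ridge and valley regions reported in the tests.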

Note that said database of already binarized fingerprint images can be built using a known binarization algorithm (for example by contrast enhancement), and similarly for the confidence mask. Moreover, in a conventional manner, augmentation algorithms can be applied so as to multiply the size of the training database, to ensure the robustness of the CNN to common acquisition defects.

If the CNN has a specialization branch, that branch can be trained provided the corresponding orientation map is also available for each fingerprint of said database (where applicable, again obtained using a known algorithm).

The trained CNN can be stored, where applicable, on data storage means 22 of the client 2 for use in binarization. Note that the same CNN can be embedded on many clients 2; a single training is sufficient.

In a main step (a), said input image is binarized by the data processing means 21 of the client 2 by means of the embedded CNN, so as to generate the binary image.

Then, in a step (b), said binary image can be processed so as to extract said features of interest from the fingerprint represented by said input image, which may in particular comprise the position and/or orientation of minutiae.

Preferably, the method further comprises a step (c) of identifying or authenticating said individual by comparing the features of interest extracted from the fingerprint represented by said input image with the features of reference fingerprints, which can be implemented in any way known to those skilled in the art.

For example, the client 2 can store the features of the fingerprints of one or more authorized users as reference fingerprints, so as to manage the unlocking of the client equipment 2 (in particular when the input image is acquired directly by an integrated scanner 23): if the extracted features match those expected of an authorized user, the data processing means 21 consider that the individual attempting to authenticate is authorized, and proceed with unlocking.

Alternatively, the client 2 can send the extracted features to a remote database of said reference fingerprint features, for identification of the individual.

Various tests of the present method have been carried out. A database of fingerprint images acquired at a resolution of 500 dpi was built. 90% of the images are dedicated to training, 10% to evaluation. The input image of the network is a patch selected from a random area of the full-resolution image.
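The test protocol can be sketched as follows. The text gives only the 90%/10% proportions and the random patch selection; the shuffling before the split, the seed, the patch size and the function names are assumptions made for this illustration.

```python
import random


def split_dataset(image_ids, train_fraction=0.9, seed=0):
    """Shuffle the image identifiers and split them into training and evaluation sets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(train_fraction * len(ids)))
    return ids[:cut], ids[cut:]


def random_patch(image, patch_size, rng):
    """Crop a square patch at a random position of a full-resolution image (list of rows)."""
    rows, cols = len(image), len(image[0])
    top = rng.randrange(rows - patch_size + 1)
    left = rng.randrange(cols - patch_size + 1)
    return [row[left:left + patch_size] for row in image[top:top + patch_size]]


train_ids, eval_ids = split_dataset(range(100))
print(len(train_ids), len(eval_ids))  # 90 10

# Toy 10x12 "image" whose pixel values encode their own position.
image = [[r * 12 + c for c in range(12)] for r in range(10)]
patch = random_patch(image, patch_size=4, rng=random.Random(1))
print(len(patch), len(patch[0]))  # 4 4
```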

A first test compares, for examples of input images, the corresponding predetermined binary images and the binary images obtained using the minimal CNN of Figure 2.

The results obtained are of good quality: the ridge/valley demarcation obtained is sharp, and the TV (total variation) regularization ensures good homogeneity within each of these categories. There is nevertheless a slight imbalance in the distribution between valleys and ridges. In addition, the minimal CNN sometimes connects ridges by mistake (scars and skin folds may be treated most of the time as ridges).

In a second test, this time using the preferred CNN of Figure 6, the following are compared for the example input images: on the one hand the corresponding predetermined binary images and the binary images obtained, and on the other hand the corresponding predetermined confidence masks and the confidence masks obtained.

The multi-resolution approach combined with the use of larger filters ensures good continuity of the segmentation at skin folds, and a balanced distribution between valleys and ridges is observed, as well as robustness to scars.

Other tests have shown that in lower-quality areas, the decompaction block makes it possible to take more extensive spatial information into account and thus to provide a continuous segmentation. For the same reason, when an occluded area exists in the image, decompaction makes it possible to efficiently recover connectivity between the ridges and valleys at the edges of this region devoid of information.

In addition, the suitability observed for the binarization branch is confirmed on the specialization branch (dedicated to orientation maps such as the RFM).

Computer program product

According to a second and a third aspect, the invention relates to a computer program product comprising code instructions for executing (in particular on the data processing means 11, 21 of the server 1 and/or of the client 2) a method for extracting features of interest from a fingerprint represented by an input image, as well as storage means readable by computer equipment (a memory 12, 22 of the server 1 and/or of the client 2) on which this computer program product is stored.

Claims

A method for extracting features of interest from a fingerprint represented by an input image, the method comprising the implementation, by data processing means (21) of a client (2), of steps of:
(a) Binarisation of said input image by means of a convolutional neural network, CNN, so as to generate a so-called binary image, said CNN comprising a so-called decompaction block of successive convolution layers ( ${CONV}_{i}^{DEC}$
, i ∈ [1; n], n ≥ 2) having a decreasing filtre size and a decreasing number of filtres, such that the number of filtres decreases by a constant step from one convolution layer $({CONV}_{i}^{DEC})$
of the decompaction block to the next one $({CONV}_{i + 1}^{DEC})$
;

(b) Processing said binary image so as to extract said features of interest from the fingerprint represented by said input image.
The method according to claim 1, wherein the last convolution layer $({CONV}_{n}^{DEC})$
of said decompaction block has a 1×1 filtre size and generates in output said binary image.
The method according to claim 2, wherein at least one other convolution layer of said decompaction block is of the dilated filtre type, called Atrous, with a filtre size between 3×3 and 7×7.
The method according to claim 3, wherein each other convolution layer of said decompaction block is of the dilated filtre type, called Atrous, with a decreasing field of vision size.
The method according to one of claims 2 to 4, wherein the last convolution layer $({CONV}_{n}^{DEC})$
of said decompaction block further generates a mask of confidence associated with said binary image.
The method according to one of claims 1 to 5, wherein said CNN comprises at least one so-called Inception block having a plurality of parallel branches with convolution layers (${CONV}_{k}^{INC 2}$, k ∈ [1; l], l ≥ 2) having different field of vision sizes.
The method according to claim 6, wherein at least one branch of the Inception block comprises a convolution layer $({CONV}_{k}^{INC 1})$ having a 1×1 filter size, then a convolution layer $({CONV}_{k}^{INC 2})$ of the dilated filter type, called Atrous, with a filter size between 3×3 and 7×7.
The method according to one of claims 6 and 7, wherein one branch of the Inception block comprises a convolution layer $({CONV}_{2}^{INC 1})$ having a 1×1 filter size, then a convolution layer $({CONV}_{2}^{INC 2})$ with a non-dilated filter of size 3×3; and/or a branch of the Inception block comprises a pooling layer $({POOL}_{1}^{INC 1})$, then a convolution layer $({CONV}_{1}^{INC 2})$ having a 1×1 filter size.
The method according to one of claims 6 to 8, wherein the CNN successively comprises the Inception block(s) then the decompaction block.
The method according to claim 9, wherein the CNN comprises, in parallel with the decompaction block, a so-called specialisation block generating at least one map of orientation of ridges of the fingerprint represented by said input image, called RFM map, said RFM map also being processed in step (b).
The method according to one of claims 1 to 10, comprising a prior step (a0) of training, by data processing means (11) of a server (1), the parameters of said CNN from a database of already binarised fingerprint images.
The method according to one of claims 1 to 11, wherein said features of interest to be extracted from the fingerprint represented by said input image comprise the position and/or orientation of minutiae.
The method according to one of claims 1 to 12, wherein said fingerprint represented by the input image is that of an individual, the method further comprising a step (c) of identifying or authenticating said individual by comparison of features of interest extracted from the fingerprint represented by said input image, with the features of reference fingerprints.
A computer program product comprising code instructions for the execution of a method according to one of claims 1 to 13 for extracting features of interest from a fingerprint represented by an input image, when said program is executed on a computer.
A storage means readable by computer equipment, on which is stored a computer program product comprising code instructions for the execution of a method according to one of claims 1 to 13 for extracting features of interest from a fingerprint represented by an input image.
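
By way of illustration only, the layer layout recited for the decompaction block (a number of filters decreasing by a constant step, a decreasing filter size, dilated layers followed by a final 1×1 layer) can be sketched in Python as follows. The names `ConvLayer` and `decompaction_block`, the filter sizes 7, 5, 3, the dilation factor of 2 and the single output filter are all assumptions chosen for the example; none of them is taken from the claims. The field of vision uses the usual relation for a dilated filter, k + (k − 1)(d − 1), for filter size k and dilation d.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical description of one convolution layer (illustrative only)."""
    filters: int       # number of filters (output feature maps)
    kernel: int        # side of the square filter, e.g. 3 for 3x3
    dilation: int = 1  # 1 means a plain, non-dilated convolution

    @property
    def field_of_view(self) -> int:
        # Usual effective extent of a dilated (Atrous) filter.
        return self.kernel + (self.kernel - 1) * (self.dilation - 1)


def decompaction_block(n: int, step: int, out_filters: int = 1,
                       dilation: int = 2) -> List[ConvLayer]:
    """Lay out n layers whose filter count drops by `step` at each layer."""
    kernels = [7, 5, 3]  # assumed sizes inside the 3x3 to 7x7 range
    if not 2 <= n <= len(kernels) + 1:
        raise ValueError("this sketch supports 2 <= n <= 4")
    if step < 1 or dilation < 1:
        raise ValueError("step and dilation must be positive")
    layers = []
    for i in range(n - 1):
        # Layers before the last one: dilated, with a shrinking filter size.
        filters = out_filters + (n - 1 - i) * step
        layers.append(ConvLayer(filters, kernels[i], dilation))
    # Last layer: 1x1 filters producing the binary image.
    layers.append(ConvLayer(out_filters, 1))
    return layers


block = decompaction_block(n=4, step=5)
for layer in block:
    print(layer.filters, layer.kernel, layer.field_of_view)
```

With n = 4 and a step of 5, this sketch yields 16, 11, 6 and 1 filters, so the count decreases by the same amount at every layer, while both the filter size and the field of vision shrink towards the final 1×1 layer. Passing `out_filters=2` is one conceivable way of leaving room for a confidence mask alongside the binary image, although the claims do not specify how that output is arranged.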