JP7485028B2

JP7485028B2 - Learning device, method and program

Info

Publication number: JP7485028B2
Application number: JP2022529205A
Authority: JP
Inventors: 瑛彦高島; 亮増村; 愛庵
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2020-06-03
Filing date: 2020-06-03
Publication date: 2024-05-16
Anticipated expiration: 2040-06-03
Also published as: JPWO2021245822A1; WO2021245822A1

Description

The present invention relates to a technique for effectively learning unsupervised domain adaptation between image datasets with large domain bias.

In machine learning generally, accuracy drops when inference is performed on a dataset whose domain distribution differs from that of the training dataset (that is, when there is domain bias). There are also cases in which labels for the target task cannot be collected in quantity. Unsupervised domain adaptation has been proposed to address these problems.

Unsupervised domain adaptation is a method that enables effective inference on target data by eliminating the domain bias between labeled source data and unlabeled (unsupervised) target data. For example, in a classification problem, training uses source data with class labels, target data without class labels, and domain labels indicating whether each item belongs to the source or the target, so that the feature distributions of the source and the target become equal. As a result, even when the target task has no class labels, highly accurate inference on the target task is possible by exploiting other data with abundant class labels. A representative method of unsupervised domain adaptation is adversarial domain adaptation (see, for example, Non-Patent Document 1). When class classification and domain classification are trained simultaneously, this method introduces, as the first layer on the domain classification side, a layer that inverts the gradient during backpropagation (a gradient inversion layer). This makes the domains of the source data and the target data hard to distinguish while still learning features that separate the classes, so that domain-independent features can be learned.

Non-Patent Document 1: Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation”, In ICML, 2015

However, conventional adversarial domain adaptation methods cannot learn effectively on problems in which the elements that make up a domain are diverse and complex. Consider, for example, a task of classifying human facial expressions. The factors that vary in human face images are wide-ranging: race, age, the way expressions are formed, face orientation, the lighting environment at capture, and so on. It is difficult to collect data that covers all of these factors with a uniform distribution, so strong domain bias inevitably arises in the dataset. For example, one way to build a facial expression dataset is to collect images from the Internet, but the language of the search keywords used for collection introduces a domain bias in race. Likewise, data collected with a web camera is generally captured under indoor lighting, which introduces a domain bias in the capture environment. Thus, when learning a task with a wide variety of varying factors, such as face images, the domain bias becomes even stronger.

An object of the present invention is to provide a learning device, method, and program that can perform unsupervised domain adaptation learning with higher accuracy than conventional methods.

A learning device according to one aspect of the present invention includes:

a source data storage unit that stores a plurality of source data, each source data being an image together with a class label and a domain label corresponding to the image;

a target data storage unit that stores a plurality of target data, each target data being an image together with a domain label corresponding to the image;

a front-stage network unit that, using a front-stage network that takes an image as input and outputs features of that image, obtains source data features, which are features of the images of the source data read from the source data storage unit, and target data features, which are features of the target data read from the target data storage unit;

a class classification network unit that, using a class classification network that takes source data features as input and outputs a probability distribution over the classes to which the input source data features belong, calculates a probability distribution corresponding to the obtained source data features, calculates the value of a loss function from the calculated probability distribution and the class label, read from the source data storage unit, of the source data corresponding to the obtained source data features, and updates the parameters of the front-stage network and the class classification network by backpropagation so as to minimize the calculated value of the loss function; and

a domain classification network unit that, using a domain classification network that takes source data features or target data features as input and calculates a probability distribution over the domains to which the input features belong, calculates a probability distribution corresponding to the obtained source data features or the obtained target data features, calculates the value of a loss function from the calculated probability distribution and either the domain label, read from the source data storage unit, of the source data corresponding to the obtained source data features or the domain label, read from the target data storage unit, of the target data corresponding to the obtained target data features, and updates the parameters of the front-stage network and the domain classification network by backpropagation so as to minimize the calculated value of the loss function.

The domain classification network includes, as its first layer, a gradient inversion layer that inverts the sign of the gradient when the domain classification network unit updates parameters by backpropagation. The probability distribution calculated by the domain classification network unit is a softmax value calculated in angle space. The domain classification network unit includes:

(a) a domain classification network intermediate calculation unit that, using the obtained source data features or the obtained target data features, performs the calculation of the layers of the domain classification network other than the angle softmax layer, which is the last layer, to obtain an intermediate feature r_s corresponding to the obtained source data features or an intermediate feature r_t corresponding to the obtained target data features; and

(b) an angle softmax layer unit that includes:

(i) a feature normalization unit that, using the intermediate feature r_s or the intermediate feature r_t, calculates the normalized feature r_s/||r_s|| or the normalized feature r_t/||r_t||;

(ii) a parameter normalization unit that, using the parameter z₀ or z₁ of the angle softmax layer, calculates the normalized parameter z₀/||z₀|| or the normalized parameter z₁/||z₁||;

(iii) an inner product calculation unit that, where T denotes transpose, calculates the inner-product-calculated features (cos p_t, cos p_s) corresponding to the target data features, defined by equations (1) and (2), when the calculated normalized feature is the normalized feature r_t/||r_t|| corresponding to the target data features, and calculates the inner-product-calculated features (cos p_t, cos p_s) corresponding to the source data features, defined by equations (3) and (4), when the calculated normalized feature is the normalized feature r_s/||r_s|| corresponding to the source data features;

[Equations (1) to (4) are not reproduced in the source text]

(iv) a margin adding unit that, where m is a predetermined margin parameter, calculates the margin-added features (cos(p_t+m), cos p_s) corresponding to the target data features when the inner-product-calculated features (cos p_t, cos p_s) correspond to the target data features, and calculates the margin-added features (cos p_t, cos(p_s+m)) corresponding to the source data features when the inner-product-calculated features (cos p_t, cos p_s) correspond to the source data features;

(v) a scaling unit that, where s is a predetermined scaling parameter, obtains the scaled features (s・cos(p_t+m), s・cos p_s) corresponding to the target data features when the margin-added features correspond to the target data features, and obtains the scaled features (s・cos p_t, s・cos(p_s+m)) corresponding to the source data features when the margin-added features correspond to the source data features; and

(vi) a softmax calculation unit that, when the scaled features correspond to the target data features, calculates the softmax value for the target domain by equation (5) and the softmax value for the source domain by equation (6), and, when the scaled features correspond to the source data features, calculates the softmax value for the target domain by equation (7) and the softmax value for the source domain by equation (8).

[Equations (5) to (8) are not reproduced in the source text]

Performing the softmax calculation in angle space allows unsupervised domain adaptation to be learned with higher accuracy than before.

FIG. 1 is a diagram showing an example of the functional configuration of the learning device.
FIG. 2 is a diagram for explaining an example of the domain classification network unit.
FIG. 3 is a diagram for explaining an example of the angle softmax layer unit.
FIG. 4 is a diagram showing an example of the processing procedure of the learning method.
FIG. 5 is a diagram showing an example of the functional configuration of a computer.

An embodiment of the present invention is described in detail below. In the drawings, components having the same function are given the same number, and duplicate explanations are omitted.

[Learning device and method]
The processing of the learning device and method corresponds to the ordinary training phase of deep learning (feeding data into the network, computing through the network, calculating the loss function, and updating the parameters).

Each component of the learning device is described below.

As shown in FIG. 1, the learning device includes, for example, a source data storage unit 1, a target data storage unit 2, a front-stage network unit 3, a class classification network unit 4, and a domain classification network unit 5.

The learning method is realized, for example, by the components of the learning device performing the processing of steps S3 to S5 described below and shown in FIG. 4. The processing of steps S3 to S5 is performed at least once for each item of source data and target data stored in the source data storage unit 1 and the target data storage unit 2. The processing of steps S3 to S5 may also be repeated until a predetermined convergence condition is satisfied.

<Source data storage unit 1>
The source data storage unit 1 stores a plurality of source data.

The source data is an image together with a class label and a domain label corresponding to the image. Letting x denote the image, y the class label, and d the domain label, the set of source data D_s can be expressed as follows.

[Expression for D_s is not reproduced in the source text]

<Target data storage unit 2>
The target data storage unit 2 stores a plurality of target data.

The target data is an image together with a domain label corresponding to the image. The set of target data D_t can be expressed as follows.

[Expression for D_t is not reproduced in the source text]

<Front-stage network unit 3>
The front-stage network unit 3 receives as input the images of the source data read from the source data storage unit 1 and the images of the target data read from the target data storage unit 2.

The front-stage network unit 3 converts each input image into a vector. That is, it converts an input source data image into a vector to obtain a source data feature, and converts an input target data image into a vector to obtain a target data feature (step S3). These vectors are multidimensional, for example 1024-dimensional.

The generated source data features are output to the class classification network unit 4 and the domain classification network unit 5. The generated target data features are output to the domain classification network unit 5.

The front-stage network unit 3 performs its calculation using the front-stage network, a network that converts an input image into a vector. The front-stage network consists, for example, of ordinary CNN layers, pooling layers, and fully connected layers stacked in multiple layers.

For example, in one training step (one batch), a vector is generated for each of 32 source data images and 32 target data images, 64 images in total.

Some layers of the front-stage network have parameters. These parameters are updated by backpropagation so as to minimize the values of the loss functions calculated by the class classification network unit 4 and the domain classification network unit 5, which are described later.

<Class classification network unit 4>
The class classification network unit 4 receives as input the source data features obtained by the front-stage network unit 3 and the class labels of the source data read from the source data storage unit 1.

The class classification network unit 4 first uses the class classification network to calculate a probability distribution corresponding to the source data features obtained by the front-stage network unit 3. The class classification network takes source data features as input and converts them into a probability distribution over classes; it consists, for example, of fully connected layers stacked in multiple layers. In the final layer of the class classification network, the probability distribution is calculated using, for example, a softmax function as the activation function.

The class classification network unit 4 then calculates the value of a loss function from this probability distribution and performs backpropagation so as to minimize that value, updating the parameters of each layer of the front-stage network used in the front-stage network unit 3 and of the class classification network used in the class classification network unit 4. As a concrete procedure, the value of the loss function can be calculated, for example, as the cross-entropy between the probability distribution calculated in the final layer and the class label of the input source data.
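As a minimal sketch of the loss calculation just described, the following Python computes a softmax probability distribution from final-layer scores and then the cross-entropy against a class label. The three-class setup, the score values, and the function names are illustrative assumptions and do not come from the patent.

```python
import math

def softmax(scores):
    """Convert raw final-layer scores into a probability distribution."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(probs[label])

# Hypothetical scores for one source data feature in a 3-class problem.
scores = [2.0, 0.5, -1.0]
probs = softmax(scores)
loss = cross_entropy(probs, 0)  # class label 0 is assumed to be correct
```

The loss shrinks as the probability assigned to the correct class approaches 1, which is the quantity that backpropagation drives down when updating the front-stage and class classification networks.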

In this way, the class classification network unit 4 uses the class classification network to calculate a probability distribution corresponding to the source data features obtained by the front-stage network unit 3, calculates the value of a loss function from the calculated probability distribution and the class label, read from the source data storage unit 1, of the source data corresponding to those source data features, and updates the parameters of the front-stage network and the class classification network by backpropagation so as to minimize the calculated value of the loss function (step S4).

<Domain classification network unit 5>
The domain classification network unit 5 receives as input the source data features and target data features obtained by the front-stage network unit 3.

When source data features are input, the domain label read from the source data storage unit 1 is also input to the domain classification network unit 5. When target data features are input, the domain label read from the target data storage unit 2 is also input to the domain classification network unit 5.

The domain classification network unit 5 first uses the domain classification network to calculate a probability distribution corresponding to the source data features or target data features obtained by the front-stage network unit 3. This probability distribution is a softmax value calculated in angle space. The calculation is performed by the domain classification network intermediate calculation unit 51 and the angle softmax layer unit 52, which are described later.

The domain classification network includes, as its first layer, a gradient inversion layer that inverts the sign of the gradient when the domain classification network unit 5 updates parameters by backpropagation. During a backpropagation update, the gradient inversion layer makes a positive gradient negative and a negative gradient positive. The gradient inversion layer makes it possible to train the features so that the domains become harder to classify. The domain classification network also includes, as its last layer, an angle softmax layer for calculating softmax values in angle space. The layers of the domain classification network other than the gradient inversion layer and the angle softmax layer consist, for example, of a stack of fully connected layers.
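The behavior of the gradient inversion layer can be sketched in a few lines of Python: the forward pass leaves the features unchanged, and the backward pass negates every gradient component before it reaches the front-stage network. The class name and the numeric values are hypothetical; a practical implementation would hook into the automatic differentiation of a deep learning framework instead of passing lists by hand.

```python
class GradientInversionLayer:
    """Identity in the forward pass; sign flip in the backward pass."""

    def forward(self, features):
        # Features pass through to the domain classifier unchanged.
        return list(features)

    def backward(self, upstream_grad):
        # Each gradient component has its sign inverted, so the front-stage
        # network is pushed to make the domains harder to classify.
        return [-g for g in upstream_grad]

layer = GradientInversionLayer()
features = [0.5, -1.5, 2.0]                 # hypothetical front-stage output
passed_on = layer.forward(features)
grad_from_domain_side = [0.1, -0.2, 0.3]    # hypothetical domain-loss gradient
grad_to_front_stage = layer.backward(grad_from_domain_side)
```

Because the flip is applied only at the entry to the domain classification network, the domain classifier's own layers still receive ordinary gradients and keep learning to separate the domains, while the front-stage network receives the opposite signal.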

The domain classification network unit 5 then calculates the value of a loss function from this probability distribution and performs backpropagation so as to minimize that value, updating the parameters of each layer of the front-stage network used in the front-stage network unit 3 and of the domain classification network used in the domain classification network unit 5. As described later, the value of the loss function is calculated by the cross-entropy loss function calculation unit 53, and the parameters are updated by the parameter update unit 54.

An example of the processing of the domain classification network unit 5 is described below. As illustrated in FIG. 2, the domain classification network unit 5 includes a domain classification network intermediate calculation unit 51, an angle softmax layer unit 52, a cross-entropy loss function calculation unit 53, and a parameter update unit 54.

<<Domain classification network intermediate calculation unit 51>>
The domain classification network intermediate calculation unit 51 receives as input the source data features or target data features obtained by the front-stage network unit 3.

Using the source data features or target data features, the domain classification network intermediate calculation unit 51 performs the calculation of the layers of the domain classification network other than the angle softmax layer, which is the last layer (in other words, it calculates the values that are input to the angle softmax layer), and thereby obtains intermediate features. The obtained intermediate features are output to the angle softmax layer unit 52.

When source data features are input to the domain classification network intermediate calculation unit 51, intermediate features corresponding to those source data features are obtained.

When target data features are input to the domain classification network intermediate calculation unit 51, intermediate features corresponding to those target data features are obtained.

<<Angle softmax layer unit 52>>
The angle softmax layer unit 52 receives as input the intermediate features obtained by the domain classification network intermediate calculation unit 51, together with the domain label of the source data read from the source data storage unit 1 or the domain label of the target data read from the target data storage unit 2.

When the intermediate features input to the angle softmax layer unit 52 correspond to source data features, the domain label of the source data is input to the angle softmax layer unit 52. When the intermediate features correspond to target data features, the domain label of the target data is input.

Based on the input domain label, the angle softmax layer unit 52 and each of its sub-units can determine whether to perform the processing for source data features or the processing for target data features. That is, when the input domain label indicates the source domain, in other words when the correct domain of the input intermediate features is the source domain, the angle softmax layer unit 52 and its sub-units perform the processing for source data features. When the input domain label indicates the target domain, in other words when the correct domain of the input intermediate features is the target domain, they perform the processing for target data features.

Using the input intermediate features, the angle softmax layer unit 52 performs the calculation of the angle softmax layer of the domain classification network and obtains the angle softmax value corresponding to the input intermediate features. The obtained angle softmax value is also a probability distribution. The obtained angle softmax value is output to the cross-entropy loss function calculation unit 53.

An example of the processing of the angle softmax layer unit 52 is described in detail later.

<<Cross-entropy loss function calculation unit 53>>
The cross-entropy loss function calculation unit 53 receives as input the angle softmax value obtained by the angle softmax layer unit 52, together with the domain label of the source data or the domain label of the target data.

When the angle softmax value input to the cross-entropy loss function calculation unit 53 corresponds to source data features, the domain label of the source data is input to the cross-entropy loss function calculation unit 53. When the angle softmax value corresponds to target data features, the domain label of the target data is input.

The cross-entropy loss function calculation unit 53 calculates the cross-entropy from the angle softmax value and the domain label of the source data or the target data, and obtains the domain error, which is the value of the loss function. The obtained value of the loss function is output to the parameter update unit 54.

<<Parameter update unit 54>>
The parameter update unit 54 receives as input the value of the loss function calculated by the cross-entropy loss function calculation unit 53.

Using the value of the loss function, the parameter update unit 54 updates the parameters of each layer of the front-stage network and the domain classification network by backpropagation.

As described above, the domain classification network includes, as its first layer, a gradient inversion layer that inverts the sign of the gradient when parameters are updated by backpropagation. Therefore, when the parameter update unit 54 performs the processing corresponding to the first layer of the domain classification network (the gradient inversion layer) during a parameter update, the sign of the gradient (the sign of each component of the vector input to the gradient inversion layer) is inverted.

In this way, the domain classification network unit 5 uses a domain classification network, which takes source data features or target data features as input and calculates a probability distribution over the domains to which the input features belong, to calculate a probability distribution corresponding to the source data features or target data features obtained by the front-stage network unit 3. It then calculates the value of a loss function from the calculated probability distribution and either the domain label, read from the source data storage unit 1, of the source data corresponding to the obtained source data features or the domain label, read from the target data storage unit 2, of the target data corresponding to the obtained target data features, and updates the parameters of the front-stage network and the domain classification network by backpropagation so as to minimize the calculated value of the loss function (step S5).

以下、角度ソフトマックスレイヤー部５２の処理の例について説明する。角度ソフトマックスレイヤー部５２は、図３に例示するように、特徴正規化部５２１、パラメータ正規化部５２２、内積計算部５２３、マージン追加部５２４、スケーリング部５２５及びソフトマックス計算部５２６を備えている。Below, we will explain an example of the processing of the angle softmax layer unit 52. As shown in Figure 3, the angle softmax layer unit 52 includes a feature normalization unit 521, a parameter normalization unit 522, an inner product calculation unit 523, a margin addition unit 524, a scaling unit 525, and a softmax calculation unit 526.

<<<特徴正規化部５２１>>>
特徴正規化部５２１には、ドメイン分類ネットワーク中間計算部５１で得られた中間特徴が入力される。

<<<Feature Normalization Unit 521>>>
The intermediate features obtained by the domain classification network intermediate calculation unit 51 are input to the feature normalization unit 521.

特徴正規化部５２１は、中間特徴のL2正規化を行うことで、正規化済み特徴を得る。得られた正規化済み特徴は、内積計算部５２３に出力される。L2正規化はベクトルの長さを１に正規化する処理であり、ベクトルをベクトルの長さで除算することで求まる。The feature normalization unit 521 obtains normalized features by performing L2 normalization on the intermediate features. The obtained normalized features are output to the inner product calculation unit 523. L2 normalization is a process of normalizing the length of a vector to 1, and is obtained by dividing the vector by the length of the vector.
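L2 normalization as described here can be sketched in a few lines. The function name and the zero-vector check are illustrative assumptions; the source does not say how a zero-length vector should be handled.

```python
import math

def l2_normalize(vec):
    """Divide a vector by its Euclidean length so that the result has length 1."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        # Assumption: the source does not cover this case, so fail loudly.
        raise ValueError("cannot L2-normalize a zero vector")
    return [v / norm for v in vec]

r_hat = l2_normalize([3.0, 4.0])  # length of [3, 4] is 5
```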

中間特徴が、ソースデータ特徴に対応する場合と、ターゲットデータ特徴に対応する場合とがある。ソースデータ特徴に対応する中間特徴をr_sと表記し、ターゲットデータ特徴に対応する中間特徴をr_tと表記する。 The intermediate features may correspond to source data features or to target data features. The intermediate features corresponding to the source data features are denoted as r_s, and the intermediate features corresponding to the target data features are denoted as r_t.

中間特徴がソースデータ特徴に対応するものである場合、言い換えれば、正解ドメインがソースドメインである場合、正規化済み特徴はr_s/||r_s||と表記することができる。中間特徴がターゲットデータ特徴に対応するものである場合、言い換えれば、正解ドメインがターゲットドメインである場合、正規化済み特徴はr_t/||r_t||と表記することができる。 If the intermediate features correspond to the source data features, in other words, if the correct domain is the source domain, the normalized features can be written as r_s/||r_s||. If the intermediate features correspond to the target data features, in other words, if the correct domain is the target domain, the normalized features can be written as r_t/||r_t||.

<<<パラメータ正規化部５２２>>>
パラメータ正規化部５２２には、角度ソフトマックスレイヤーのパラメータが入力される。

<<<Parameter Normalization Unit 522>>>
The parameter normalization unit 522 receives the parameters of the angle softmax layer.

パラメータ正規化部５２２は、角度ソフトマックスレイヤーのパラメータのL2正規化を行い、正規化済みパラメータを得る。得られた正規化済みパラメータは、内積計算部５２３に出力される。 The parameter normalization unit 522 performs L2 normalization on the parameters of the angle softmax layer to obtain normalized parameters. The obtained normalized parameters are output to the inner product calculation unit 523.

角度ソフトマックスレイヤーのパラメータには、後述するcos p_tを計算するためのパラメータz₀と、後述するcos p_sを計算するためのパラメータz₁とがある。 The parameters of the angle softmax layer are a parameter z₀ for calculating cos p_t and a parameter z₁ for calculating cos p_s, both of which are described later.

角度ソフトマックスレイヤーのパラメータがz₀である場合、正規化済みパラメータはz₀/||z₀||と表記することができる。角度ソフトマックスレイヤーのパラメータがz₁である場合、正規化済みパラメータはz₁/||z₁||と表記することができる。 If the parameter of the angle softmax layer is z₀, the normalized parameter can be written as z₀/||z₀||. If the parameter of the angle softmax layer is z₁, the normalized parameter can be written as z₁/||z₁||.

<<<内積計算部５２３>>>
内積計算部５２３には、特徴正規化部５２１で得られた正規化済み特徴と、パラメータ正規化部５２２で得られた正規化済みパラメータとが入力される。

<<<Inner Product Calculation Unit 523>>>
The inner product calculation unit 523 receives the normalized features obtained by the feature normalization unit 521 and the normalized parameters obtained by the parameter normalization unit 522.

内積計算部５２３は、ベクトルである正規化済み特徴と、ベクトルである正規化済みパラメータとの内積を計算することで、内積計算済み特徴を得る。得られた内積計算済み特徴は、マージン追加部５２４に出力される。The inner product calculation unit 523 obtains an inner product calculated feature by calculating the inner product of the normalized feature, which is a vector, and the normalized parameter, which is also a vector. The obtained inner product calculated feature is output to the margin addition unit 524.
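Because both inputs have unit length, their inner product equals the cosine of the angle between the feature and the parameter vector. A minimal sketch follows, using z0 as the target-side parameter and z1 as the source-side parameter in line with the description of the parameter normalization unit. The function names and the sample vectors are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def inner_product_features(r, z0, z1):
    """Return (cos p_t, cos p_s) for an intermediate feature r.

    cos p_t is the inner product of normalized z0 and normalized r,
    cos p_s is the inner product of normalized z1 and normalized r.
    """
    r_hat = l2_normalize(r)
    cos_p_t = sum(a * b for a, b in zip(l2_normalize(z0), r_hat))
    cos_p_s = sum(a * b for a, b in zip(l2_normalize(z1), r_hat))
    return cos_p_t, cos_p_s

# A feature aligned with z0 and orthogonal to z1.
cos_p_t, cos_p_s = inner_product_features([1.0, 0.0], [2.0, 0.0], [0.0, 3.0])
```

The same function applies whether r is r_s or r_t. Only the later margin step depends on which domain is correct.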

入力された正規化済み特徴がターゲットデータ特徴に対応する正規化済み特徴r_t/||r_t||である場合、言い換えれば、正解ドメインがターゲットドメインである場合、内積計算部５２３は、以下の式により定義されるcos p_tとcos p_sとを計算する。ここで、Tは転置を意味する。以下の式により定義されるcos p_t, cos p_sから構成されるベクトル(cos p_t, cos p_s)が、正解ドメインがターゲットドメインの場合の内積計算済み特徴である。 When the input normalized feature is the normalized feature r_t/||r_t|| corresponding to a target data feature, in other words, when the correct domain is the target domain, the inner product calculation unit 523 calculates cos p_t and cos p_s defined by the following equations. Here, T denotes transpose. The vector (cos p_t, cos p_s) composed of these two values is the inner product calculated feature for the case where the correct domain is the target domain.

$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_t}{\|r_t\|}, \qquad \cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_t}{\|r_t\|}$$

入力された正規化済み特徴がソースデータ特徴に対応する正規化済み特徴r_s/||r_s||である場合、言い換えれば、正解ドメインがソースドメインである場合、内積計算部５２３は、以下の式により定義されるcos p_tとcos p_sとを計算する。以下の式により定義されるcos p_t, cos p_sから構成されるベクトル(cos p_t, cos p_s)が、正解ドメインがソースドメインの場合の内積計算済み特徴である。 When the input normalized feature is the normalized feature r_s/||r_s|| corresponding to a source data feature, in other words, when the correct domain is the source domain, the inner product calculation unit 523 calculates cos p_t and cos p_s defined by the following equations. The vector (cos p_t, cos p_s) composed of these two values is the inner product calculated feature for the case where the correct domain is the source domain.

$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_s}{\|r_s\|}, \qquad \cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_s}{\|r_s\|}$$

<<<マージン追加部５２４>>>
マージン追加部５２４には、内積計算部５２３で計算された内積計算済み特徴が入力される。

<<<Margin Adding Unit 524>>>
The margin adding unit 524 receives the inner product calculated feature calculated by the inner product calculation unit 523.

入力された内積計算済み特徴がターゲットデータ特徴に対応する場合、言い換えれば、正解ドメインがターゲットドメインである場合、マージン追加部５２４は、内積計算済み特徴の中のcos p_tにマージンを追加する。例えば、マージン追加部５２４は、マージンmが追加されたcos p_tとして、cos(p_t+m)の値を計算する。この場合、内積計算済み特徴の中のcos p_sにはマージンは追加されない。この場合、マージン追加部５２４は、マージン追加済み特徴である(cos(p_t+m), cos p_s)をスケーリング部５２５に出力する。 If the input inner product calculated feature corresponds to a target data feature, in other words, if the correct domain is the target domain, the margin adding unit 524 adds a margin to cos p_t in the inner product calculated feature. For example, the margin adding unit 524 calculates cos(p_t+m) as cos p_t with the margin m added. In this case, no margin is added to cos p_s, and the margin adding unit 524 outputs the margin-added feature (cos(p_t+m), cos p_s) to the scaling unit 525.

入力された内積計算済み特徴がソースデータ特徴に対応する場合、言い換えれば、正解ドメインがソースドメインである場合、マージン追加部５２４は、内積計算済み特徴の中のcos p_sにマージンを追加する。例えば、マージン追加部５２４は、マージンmが追加されたcos p_sとして、cos(p_s+m)の値を計算する。この場合、内積計算済み特徴の中のcos p_tにはマージンは追加されない。この場合、マージン追加部５２４は、マージン追加済み特徴である(cos p_t, cos(p_s+m))をスケーリング部５２５に出力する。 If the input inner product calculated feature corresponds to a source data feature, in other words, if the correct domain is the source domain, the margin adding unit 524 adds a margin to cos p_s in the inner product calculated feature. For example, the margin adding unit 524 calculates cos(p_s+m) as cos p_s with the margin m added. In this case, no margin is added to cos p_t, and the margin adding unit 524 outputs the margin-added feature (cos p_t, cos(p_s+m)) to the scaling unit 525.

なお、mは所定のマージンパラメータである。マージンパラメータmは、0でもよいし、任意の値であってもよい。 Note that m is a predetermined margin parameter. The margin parameter m may be 0 or any other value.

マージン追加の処理により、ドメインの識別をより正確に行うことができる。なお、マージン追加部５２４の処理は行われなくてもよい。The margin addition process allows for more accurate domain identification. Note that the process of the margin addition unit 524 does not have to be performed.
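The margin step can be sketched by recovering the angle with an arccosine, adding m, and taking the cosine again, applied only to the entry of the correct domain. The function names, the index convention (0 = target, 1 = source) and the clamping of the cosine to [-1, 1] are assumptions made for this illustration. The sketch also does not treat the case where the angle plus m exceeds π.

```python
import math

def add_margin(cos_p, m):
    """Return cos(p + m) given cos p, with 0 <= p <= pi."""
    # Clamp to guard against rounding slightly outside [-1, 1] (an assumption,
    # not something the source specifies).
    cos_p = max(-1.0, min(1.0, cos_p))
    return math.cos(math.acos(cos_p) + m)

def add_margin_to_correct_domain(cos_pair, correct_index, m):
    """Add the margin only to the cosine of the correct domain.

    cos_pair is (cos p_t, cos p_s); correct_index is 0 for target, 1 for source.
    """
    out = list(cos_pair)
    out[correct_index] = add_margin(out[correct_index], m)
    return tuple(out)

margin_added = add_margin_to_correct_domain((0.5, 0.2), 0, 0.5)
```

With m = 0 the feature is returned unchanged, which matches the statement that m may be 0.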

<<<スケーリング部５２５>>>
スケーリング部５２５には、マージン追加部５２４で得られたマージン追加済み特徴が入力される。

<<<Scaling Unit 525>>>
The scaling unit 525 receives the margin-added features obtained by the margin adding unit 524.

スケーリング部５２５は、マージン追加済み特徴に対して、スケーリング処理を行い、スケーリング済み特徴を得る。得られたスケーリング済み特徴は、ソフトマックス計算部５２６に出力される。The scaling unit 525 performs a scaling process on the margin-added features to obtain scaled features. The obtained scaled features are output to the softmax calculation unit 526.

入力されたマージン追加済み特徴がターゲットデータ特徴に対応する場合には、言い換えれば、正解ドメインがターゲットドメインである場合には、スケーリング部５２５は、例えば、(s・cos(p_t+m), s・cos p_s)をスケーリング済み特徴とする。 When the input margin-added feature corresponds to a target data feature, in other words, when the correct domain is the target domain, the scaling unit 525 sets, for example, (s・cos(p_t+m), s・cos p_s) as the scaled feature.

入力されたマージン追加済み特徴がソースデータ特徴に対応する場合には、言い換えれば、正解ドメインがソースドメインである場合には、スケーリング部５２５は、(s・cos p_t, s・cos(p_s+m))をスケーリング済み特徴とする。 When the input margin-added feature corresponds to a source data feature, in other words, when the correct domain is the source domain, the scaling unit 525 sets (s・cos p_t, s・cos(p_s+m)) as the scaled feature.

なお、sは所定のスケーリングパラメータであり、例えば1≦s≦100である。 Note that s is a predetermined scaling parameter, for example 1≦s≦100.
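The source does not state why scaling is applied. One commonly cited motivation, offered here as an assumption, is that cosines lie in [-1, 1], so a softmax over two unscaled cosines can never assign more than about 0.88 to one domain. The sketch below illustrates this with s = 30, a value picked for the example from the stated range 1 ≦ s ≦ 100. The helper names are hypothetical.

```python
import math

def softmax2(a, b):
    """Numerically stable softmax over two logits."""
    mx = max(a, b)
    ea, eb = math.exp(a - mx), math.exp(b - mx)
    return ea / (ea + eb), eb / (ea + eb)

def scale(features, s):
    """Multiply every component of the feature by the scaling parameter s."""
    return tuple(s * f for f in features)

best_case = (1.0, -1.0)  # most confident pair of cosines possible
p_unscaled = softmax2(*best_case)[0]              # bounded by 1 / (1 + e^-2)
p_scaled = softmax2(*scale(best_case, 30.0))[0]   # can approach 1
```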

<<<ソフトマックス計算部５２６>>>
ソフトマックス計算部５２６には、スケーリング部５２５で得られたスケーリング済み特徴が入力される。

<<<Softmax Calculation Unit 526>>>
The softmax calculation unit 526 receives the scaled features obtained by the scaling unit 525 as input.

ソフトマックス計算部５２６は、スケーリング済み特徴に基づいてソフトマックス計算をし、角度ソフトマックス値を得る。得られた角度ソフトマックス値は、クロスエントロピー損失関数計算部５３に出力される。ソフトマックス計算としては、一般的なソフトマックスが用いられる。The softmax calculation unit 526 performs a softmax calculation based on the scaled features to obtain an angle softmax value. The obtained angle softmax value is output to the cross-entropy loss function calculation unit 53. A general softmax is used as the softmax calculation.

例えば、正解ドメインがターゲットドメインの場合、入力されるスケーリング済み特徴は、(s・cos(p_t+m), s・cos p_s)となる。このスケーリング済み特徴(s・cos(p_t+m), s・cos p_s)に基づいて、ターゲットドメインに対応するソフトマックス値は以下のように計算される。 For example, when the correct domain is the target domain, the input scaled feature is (s・cos(p_t+m), s・cos p_s). Based on this scaled feature, the softmax value corresponding to the target domain is calculated as follows.

$$\frac{\exp\left(s\cos(p_t+m)\right)}{\exp\left(s\cos(p_t+m)\right) + \exp\left(s\cos p_s\right)}$$

また、この場合、ソースドメインに対応するソフトマックス値は以下になる。 In this case, the softmax value corresponding to the source domain is as follows.

$$\frac{\exp\left(s\cos p_s\right)}{\exp\left(s\cos(p_t+m)\right) + \exp\left(s\cos p_s\right)}$$

これらのソフトマックス値は、ドメイン分類ネットワーク部５に入力されたターゲットデータ特徴が、ターゲットドメイン及びソースドメインに属する確率分布である。 These softmax values form the probability distribution over the target domain and the source domain for the target data features input to the domain classification network unit 5.

また、正解ドメインがソースドメインの場合、ターゲットドメインに対応するソフトマックス値は以下になる。入力されるスケーリング済み特徴は、(s・cos p_t, s・cos(p_s+m))となる。このスケーリング済み特徴(s・cos p_t, s・cos(p_s+m))に基づいて、ターゲットドメインに対応するソフトマックス値は以下のように計算される。 When the correct domain is the source domain, the input scaled feature is (s・cos p_t, s・cos(p_s+m)). Based on this scaled feature, the softmax value corresponding to the target domain is calculated as follows.

$$\frac{\exp\left(s\cos p_t\right)}{\exp\left(s\cos p_t\right) + \exp\left(s\cos(p_s+m)\right)}$$

この場合、ソースドメインに対応するソフトマックス値は以下になる。 In this case, the softmax value corresponding to the source domain is as follows.

$$\frac{\exp\left(s\cos(p_s+m)\right)}{\exp\left(s\cos p_t\right) + \exp\left(s\cos(p_s+m)\right)}$$

これらのソフトマックス値は、ドメイン分類ネットワーク部５に入力されたソースデータ特徴が、ターゲットドメイン及びソースドメインに属する確率分布である。 These softmax values form the probability distribution over the target domain and the source domain for the source data features input to the domain classification network unit 5.

このように、角度空間でのソフトマックス計算を行うことで、教師なしドメイン適応の学習を従来よりも高精度に行うことができる。 In this way, by performing softmax calculations in angle space, unsupervised domain adaptation learning can be performed with higher accuracy than before.
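The whole angle softmax layer (feature and parameter normalization, inner product, margin, scaling, softmax) can be sketched end to end as follows. This is an illustrative reading of the steps described above, not the patent's implementation. The function names, the default values m = 0.5 and s = 30.0, the clamping of the cosines and the max-subtraction inside the softmax are all assumptions.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def angle_softmax(r, z0, z1, correct_is_target, m=0.5, s=30.0):
    """Return (P(target), P(source)) for an intermediate feature r.

    z0 and z1 are the layer parameters for the target and source domains.
    The margin m is added only to the angle of the correct domain.
    """
    r_hat = l2_normalize(r)
    cos_t = sum(a * b for a, b in zip(l2_normalize(z0), r_hat))
    cos_s = sum(a * b for a, b in zip(l2_normalize(z1), r_hat))
    # Clamp against rounding error before taking the arccosine.
    cos_t = max(-1.0, min(1.0, cos_t))
    cos_s = max(-1.0, min(1.0, cos_s))
    if correct_is_target:
        cos_t = math.cos(math.acos(cos_t) + m)
    else:
        cos_s = math.cos(math.acos(cos_s) + m)
    logit_t, logit_s = s * cos_t, s * cos_s
    mx = max(logit_t, logit_s)  # subtract the max for numerical stability
    e_t, e_s = math.exp(logit_t - mx), math.exp(logit_s - mx)
    return e_t / (e_t + e_s), e_s / (e_t + e_s)

# A target-domain feature lying close to z0; smaller s keeps the values readable.
p_t, p_s = angle_softmax([1.0, 0.2], [1.0, 0.0], [0.0, 1.0], True, m=0.5, s=5.0)
```

The margin lowers the probability assigned to the correct domain for the same feature, so the domain classifier has to separate the two domains by a wider angle to reach the same loss.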

従来の敵対的ドメイン適応では、ドメイン学習側の最終レイヤーはソフトマックスで計算を行っていた。これに対し、上記の実施形態のように、顔識別タスクなど学習が困難な分類タスクにおいて効果がでている角度空間でのソフトマックス計算（例えば、参考文献１参照。）をドメイン学習側の最終レイヤーに導入する。In conventional adversarial domain adaptation, the final layer on the domain learning side performs calculations using softmax. In contrast, as in the above embodiment, softmax calculations in angle space (see, for example, Reference 1), which have been effective in classification tasks that are difficult to learn, such as face identification tasks, are introduced to the final layer on the domain learning side.

〔参考文献１〕J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “ArcFace: Additive angular margin loss for deep face recognition”, In CVPR, 2019
[Reference 1] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “ArcFace: Additive angular margin loss for deep face recognition”, In CVPR, 2019

従来のソフトマックス計算よりも効果的に分類タスクを学習できる能力を持つ角度ソフトマックス計算を、敵対的学習の機構に導入することで、よりドメインが分類しにくい学習が行うことができる。 The angle softmax calculation can learn classification tasks more effectively than the conventional softmax calculation. Introducing it into the adversarial learning mechanism makes it possible to learn features for which the domains are harder to classify.

上記の実施形態により、例えば、顔画像などの多種多様な変化的要素を含むデータセット間においても、高精度な教師なしドメイン適応が行え、ターゲットタスクに効果的な分類モデルを構築できる。 The above embodiment enables highly accurate unsupervised domain adaptation, even between datasets containing a wide variety of variable elements, such as facial images, and enables the construction of classification models that are effective for the target task.

[変形例]
以上、本発明の実施の形態について説明したが、具体的な構成は、これらの実施の形態に限られるものではなく、本発明の趣旨を逸脱しない範囲で適宜設計の変更等があっても、本発明に含まれることはいうまでもない。

[Variations]
Although embodiments of the present invention have been described above, the specific configurations are not limited to these embodiments. It goes without saying that appropriate design changes and the like that do not depart from the spirit of the present invention are included in the present invention.

実施の形態において説明した各種の処理は、記載の順に従って時系列に実行されるのみならず、処理を実行する装置の処理能力あるいは必要に応じて並列的にあるいは個別に実行されてもよい。例えば、クラス分類ネットワーク部４によるステップＳ４の処理と、ドメイン分類ネットワーク部５によるステップＳ５の処理とは並列的に行われてもよい。The various processes described in the embodiments may be executed not only in chronological order according to the order described, but also in parallel or individually depending on the processing capacity of the device executing the processes or as necessary. For example, the process of step S4 by the class classification network unit 4 and the process of step S5 by the domain classification network unit 5 may be executed in parallel.

また、学習装置の構成部間のデータのやり取りは直接行われてもよいし、図示していない記憶部を介して行われてもよい。 In addition, data exchange between components of the learning device may be performed directly or via a memory unit not shown.

[プログラム、記録媒体]
上述した各装置の各部の処理をコンピュータにより実現してもよく、この場合は各装置が有すべき機能の処理内容はプログラムによって記述される。そして、このプログラムを図５に示すコンピュータの記憶部１０２０に読み込ませ、演算処理部１０１０、入力部１０３０、出力部１０４０などに動作させることにより、上記各装置における各種の処理機能がコンピュータ上で実現される。

[Program, Recording Medium]
The processing of each unit of each of the above-described devices may be realized by a computer, in which case the processing contents of the functions that each device should have are described by a program. By loading this program into the storage unit 1020 of the computer shown in Figure 5 and causing the arithmetic processing unit 1010, the input unit 1030, the output unit 1040 and so on to operate, the various processing functions of each of the above devices are realized on the computer.

この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体は、例えば、非一時的な記録媒体であり、具体的には、磁気記録装置、光ディスク、等である。 The program describing this processing content can be recorded on a computer-readable recording medium. A computer-readable recording medium is, for example, a non-transitory recording medium, specifically, a magnetic recording device, an optical disk, etc.

また、このプログラムの流通は、例えば、そのプログラムを記録したDVD、CD-ROM等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。 This program may be distributed, for example, by selling, transferring, lending, etc. portable recording media such as DVDs and CD-ROMs on which the program is recorded. Furthermore, this program may be distributed by storing it in a storage device of a server computer and transferring the program from the server computer to other computers via a network.

このようなプログラムを実行するコンピュータは、例えば、まず、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、一旦、自己の非一時的な記憶装置である補助記録部１０５０に格納する。そして、処理の実行時、このコンピュータは、自己の非一時的な記憶装置である補助記録部１０５０に格納されたプログラムを記憶部１０２０に読み込み、読み込んだプログラムに従った処理を実行する。また、このプログラムの別の実行形態として、コンピュータが可搬型記録媒体から直接プログラムを記憶部１０２０に読み込み、そのプログラムに従った処理を実行することとしてもよく、さらに、このコンピュータにサーバコンピュータからプログラムが転送されるたびに、逐次、受け取ったプログラムに従った処理を実行することとしてもよい。また、サーバコンピュータから、このコンピュータへのプログラムの転送は行わず、その実行指示と結果取得のみによって処理機能を実現する、いわゆるASP（Application Service Provider）型のサービスによって、上述の処理を実行する構成としてもよい。なお、本形態におけるプログラムには、電子計算機による処理の用に供する情報であってプログラムに準ずるもの（コンピュータに対する直接の指令ではないがコンピュータの処理を規定する性質を有するデータ等）を含むものとする。A computer that executes such a program, for example, first stores the program recorded on a portable recording medium or the program transferred from a server computer in its own non-transient storage device, the auxiliary recording unit 1050. Then, when executing the process, the computer reads the program stored in its own non-transient storage device, the auxiliary recording unit 1050, into the storage unit 1020, and executes the process according to the read program. In addition, as another execution form of this program, the computer may read the program directly from the portable recording medium into the storage unit 1020 and execute the process according to the program, or, each time a program is transferred from the server computer to this computer, the computer may execute the process according to the received program one by one. In addition, the server computer may not transfer the program to this computer, but may execute the above-mentioned process by a so-called ASP (Application Service Provider) type service that realizes the processing function only by the execution instruction and the result acquisition. Note that the program in this embodiment includes information used for processing by an electronic computer that is equivalent to a program (data that is not a direct command to the computer but has a nature that specifies the processing of the computer, etc.).

また、この形態では、コンピュータ上で所定のプログラムを実行させることにより、本装置を構成することとしたが、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。 In addition, in this embodiment, the device is configured by executing a specific program on a computer, but at least a portion of the processing content may be realized by hardware.

その他、この発明の趣旨を逸脱しない範囲で適宜変更が可能であることはいうまでもない。Needless to say, other modifications may be made without departing from the spirit of this invention.

Claims

ソースデータは、画像、その画像に対応するクラスラベル及びドメインラベルであり、複数のソースデータが記憶されているソースデータ記憶部と、
ターゲットデータは、画像及びその画像に対応するドメインラベルであり、複数のターゲットデータが記憶されているターゲットデータ記憶部と、
画像を入力としてその入力された画像の特徴を出力する前段ネットワークを用いて、前記ソースデータ記憶部から読み込んだソースデータの画像の特徴であるソースデータ特徴と、前記ターゲットデータ記憶部から読み込んだターゲットデータの特徴であるターゲットデータ特徴とを得る前段ネットワーク部と、
ソースデータ特徴を入力としてその入力されたソースデータ特徴が属するクラスの確率分布を出力するクラス分類ネットワークを用いて、前記得られたソースデータ特徴に対応する確率分布を計算し、計算された確率分布と、前記ソースデータ記憶部から読み込んだ、前記得られたソースデータ特徴に対応するソースデータのクラスラベルとを用いて、損失関数の値を計算し、計算された損失関数の値を最小化するように、前記前段ネットワーク及び前記クラス分類ネットワークのパラメータを誤差逆伝搬法により更新するクラス分類ネットワーク部と、
ソースデータ特徴又はターゲット特徴を入力として、その入力されたソースデータ特徴又はターゲット特徴が属するドメインの確率分布を計算するドメイン分類ネットワークを用いて、前記得られたソースデータ特徴又は前記得られたターゲット特徴に対応する確率分布を計算し、計算された確率分布と、前記ソースデータ記憶部から読み込んだ前記得られたソースデータ特徴に対応するソースデータのドメインラベル又は前記ターゲットデータ記憶部から読み込んだ前記得られたターゲットデータ特徴に対応するターゲットデータのドメインラベルとを用いて、損失関数の値を計算し、計算された損失関数の値を最小化するように、前記前段ネットワーク及び前記ドメイン分類ネットワークのパラメータを誤差逆伝搬法により更新するドメイン分類ネットワーク部と、を含み、
前記ドメイン分類ネットワークは、前記ドメイン分類ネットワークのファーストレイヤーとして、前記ドメイン分類ネットワーク部で誤差逆伝搬法によりパラメータを更新する際に勾配の符号を反転する勾配反転レイヤーを含み、
前記ドメイン分類ネットワーク部で計算される確率分布は、角度空間で計算されたソフトマックス値であり、
前記ドメイン分類ネットワーク部は、
(a)前記得られたソースデータ特徴又は前記得られたターゲット特徴を用いて、前記ドメイン分類ネットワークの中の、最後のレイヤーである角度ソフトマックスレイヤー以外のレイヤーの計算を行うことで、前記得られたソースデータ特徴に対応する中間特徴r_s又は前記得られたターゲット特徴に対応する中間特徴r_tを得るドメイン分類ネットワーク中間計算部と、
(b)(i)前記得られたソースデータ特徴に対応する中間特徴r_s又は前記得られたターゲットデータ特徴に対応する中間特徴r_tを用いて、正規化済み特徴r_s/||r_s||又は正規化済み特徴はr_t/||r_t||を計算する特徴正規化部と、(ii)前記角度ソフトマックスレイヤーのパラメータz₀又はz₁を用いて、正規化済みパラメータz₀/||z₀||又は正規化済みパラメータz₁/||z₁||を計算するパラメータ正規化部と、(iii)Tは転置を意味するとして、計算された正規化済み特徴がターゲットデータ特徴に対応する正規化済み特徴r_t/||r_t||である場合、式(1),式(2)により定義される、ターゲットデータ特徴に対応する内積計算済み特徴(cos p_t, cos p_s)を計算し、計算された正規化済み特徴がソースデータ特徴に対応する正規化済み特徴r_s/||r_s||である場合、式(3),式(4)により定義される、ソースデータ特徴に対応する内積計算済み特徴(cos p_t, cos p_s)を計算する内積計算部と、

$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_t}{\|r_t\|} \tag{1}$$
$$\cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_t}{\|r_t\|} \tag{2}$$
$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_s}{\|r_s\|} \tag{3}$$
$$\cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_s}{\|r_s\|} \tag{4}$$

を含む、
学習装置。

A learning device comprising:
a source data storage unit that stores a plurality of source data, each source data consisting of an image and a class label and a domain label corresponding to the image;
a target data storage unit that stores a plurality of target data, each target data consisting of an image and a domain label corresponding to the image;
a front-stage network unit that, using a front-stage network that receives an image as input and outputs features of the input image, obtains source data features, which are features of the image of source data read from the source data storage unit, and target data features, which are features of target data read from the target data storage unit;
a class classification network unit that, using a class classification network that receives source data features as input and outputs a probability distribution of the class to which the input source data features belong, calculates a probability distribution corresponding to the obtained source data features, calculates a value of a loss function using the calculated probability distribution and the class label, read from the source data storage unit, of the source data corresponding to the obtained source data features, and updates parameters of the front-stage network and the class classification network by error backpropagation so as to minimize the calculated value of the loss function; and
a domain classification network unit that, using a domain classification network that receives source data features or target features as input and calculates a probability distribution of the domain to which the input source data features or target features belong, calculates a probability distribution corresponding to the obtained source data features or the obtained target features, calculates a value of a loss function using the calculated probability distribution and either the domain label, read from the source data storage unit, of the source data corresponding to the obtained source data features or the domain label, read from the target data storage unit, of the target data corresponding to the obtained target data features, and updates parameters of the front-stage network and the domain classification network by error backpropagation so as to minimize the calculated value of the loss function,
wherein the domain classification network includes, as its first layer, a gradient inversion layer that inverts the sign of the gradient when the domain classification network unit updates parameters by error backpropagation,
the probability distribution calculated by the domain classification network unit is a softmax value calculated in an angle space, and
the domain classification network unit includes:
(a) a domain classification network intermediate calculation unit that, using the obtained source data features or the obtained target features, performs the calculations of the layers of the domain classification network other than the angle softmax layer, which is the last layer, to obtain an intermediate feature r_s corresponding to the obtained source data features or an intermediate feature r_t corresponding to the obtained target features; and
(b) (i) a feature normalization unit that calculates a normalized feature r_s/||r_s|| or a normalized feature r_t/||r_t|| using the intermediate feature r_s corresponding to the obtained source data features or the intermediate feature r_t corresponding to the obtained target data features, (ii) a parameter normalization unit that calculates a normalized parameter z₀/||z₀|| or a normalized parameter z₁/||z₁|| using a parameter z₀ or z₁ of the angle softmax layer, and (iii) an inner product calculation unit that, where T denotes transpose, calculates the inner product calculated feature (cos p_t, cos p_s) corresponding to the target data features, defined by equations (1) and (2), when the calculated normalized feature is the normalized feature r_t/||r_t|| corresponding to the target data features, and calculates the inner product calculated feature (cos p_t, cos p_s) corresponding to the source data features, defined by equations (3) and (4), when the calculated normalized feature is the normalized feature r_s/||r_s|| corresponding to the source data features.

$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_t}{\|r_t\|} \tag{1}$$
$$\cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_t}{\|r_t\|} \tag{2}$$
$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_s}{\|r_s\|} \tag{3}$$
$$\cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_s}{\|r_s\|} \tag{4}$$

ソースデータ記憶部には、ソースデータは、画像、その画像に対応するクラスラベル及びドメインラベルであり、複数のソースデータが記憶されているとし、
ターゲットデータ記憶部には、ターゲットデータは、画像及びその画像に対応するドメインラベルであり、複数のターゲットデータが記憶されているとし、
前段ネットワーク部が、画像を入力としてその入力された画像の特徴を出力する前段ネットワークを用いて、前記ソースデータ記憶部から読み込んだソースデータの画像の特徴であるソースデータ特徴と、前記ターゲットデータ記憶部から読み込んだターゲットデータの特徴であるターゲットデータ特徴とを得る前段ネットワークステップと、
クラス分類ネットワーク部が、ソースデータ特徴を入力としてその入力されたソースデータ特徴が属するクラスの確率分布を出力するクラス分類ネットワークを用いて、前記得られたソースデータ特徴に対応する確率分布を計算し、計算された確率分布と、前記ソースデータ記憶部から読み込んだ、前記得られたソースデータ特徴に対応するソースデータのクラスラベルとを用いて、損失関数の値を計算し、計算された損失関数の値を最小化するように、前記前段ネットワーク及び前記クラス分類ネットワークのパラメータを誤差逆伝搬法により更新するクラス分類ネットワークステップと、
ドメイン分類ネットワーク部が、ソースデータ特徴又はターゲット特徴を入力として、その入力されたソースデータ特徴又はターゲット特徴が属するドメインの確率分布を計算するドメイン分類ネットワークを用いて、前記得られたソースデータ特徴又は前記得られたターゲット特徴に対応する確率分布を計算し、計算された確率分布と、前記ソースデータ記憶部から読み込んだ前記得られたソースデータ特徴に対応するソースデータのドメインラベル又は前記ターゲットデータ記憶部から読み込んだ前記得られたターゲットデータ特徴に対応するターゲットデータのドメインラベルとを用いて、損失関数の値を計算し、計算された損失関数の値を最小化するように、前記前段ネットワーク及び前記ドメイン分類ネットワークのパラメータを誤差逆伝搬法により更新するドメイン分類ネットワークステップと、を含み、
前記ドメイン分類ネットワークは、前記ドメイン分類ネットワークのファーストレイヤーとして、前記ドメイン分類ネットワーク部で誤差逆伝搬法によりパラメータを更新する際に勾配の符号を反転する勾配反転レイヤーを含み、
前記ドメイン分類ネットワーク部で計算される確率分布は、角度空間で計算されたソフトマックス値であり、
前記ドメイン分類ネットワークステップは、
(a)ドメイン分類ネットワーク中間計算部が、前記得られたソースデータ特徴又は前記得られたターゲット特徴を用いて、前記ドメイン分類ネットワークの中の、最後のレイヤーである角度ソフトマックスレイヤー以外のレイヤーの計算を行うことで、前記得られたソースデータ特徴に対応する中間特徴r_s又は前記得られたターゲット特徴に対応する中間特徴r_tを得るドメイン分類ネットワーク中間計算ステップと、
(b)(i)特徴正規化部が、前記得られたソースデータ特徴に対応する中間特徴r_s又は前記得られたターゲットデータ特徴に対応する中間特徴r_tを用いて、正規化済み特徴r_s/||r_s||又は正規化済み特徴はr_t/||r_t||を計算する特徴正規化ステップと、(ii)パラメータ正規化部が、前記角度ソフトマックスレイヤーのパラメータz₀又はz₁を用いて、正規化済みパラメータz₀/||z₀||又は正規化済みパラメータz₁/||z₁||を計算するパラメータ正規化ステップと、(iii)内積計算部が、Tは転置を意味するとして、計算された正規化済み特徴がターゲットデータ特徴に対応する正規化済み特徴r_t/||r_t||である場合、式(1),式(2)により定義される、ターゲットデータ特徴に対応する内積計算済み特徴(cos p_t, cos p_s)を計算し、計算された正規化済み特徴がソースデータ特徴に対応する正規化済み特徴r_s/||r_s||である場合、式(3),式(4)により定義される、ソースデータ特徴に対応する内積計算済み特徴(cos p_t, cos p_s)を計算する内積計算ステップと、

$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_t}{\|r_t\|} \tag{1}$$
$$\cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_t}{\|r_t\|} \tag{2}$$
$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_s}{\|r_s\|} \tag{3}$$
$$\cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_s}{\|r_s\|} \tag{4}$$

(iv)マージン追加部が、mは所定のマージンパラメータであるとして、計算された内積計算済み特徴(cos p_t, cos p_s)がターゲットデータ特徴に対応する場合、ターゲットデータ特徴に対応するマージン追加済み特徴(cos(p_t+m), cos p_s)を計算し、計算された内積計算済み特徴(cos p_t, cos p_s)がソースデータ特徴に対応する場合、ソースデータ特徴に対応するマージン追加済み特徴である(cos p_t, cos(p_s+m))を計算するマージン追加ステップと、(v)スケーリング部が、sは所定のスケーリングパラメータであるとして、計算されたマージン追加済み特徴がターゲットデータ特徴に対応する場合、ターゲットデータ特徴に対応するスケーリング済み特徴(s・cos(p_t+m), s・cos p_s)を得て、計算されたマージン追加済み特徴がソースデータ特徴に対応する場合、ソースデータ特徴に対応するスケーリング済み特徴(s・cos p_t, s・cos(p_s+m))を得るスケーリングステップと、(vi)ソフトマックス計算部が、得られたスケーリング済み特徴がターゲットデータ特徴に対応する場合、ターゲットドメインに対応するソフトマックス値を式(5)により計算し、ソースドメインに対応するソフトマックス値を式(6)により計算し、得られたスケーリング済み特徴がソースデータ特徴に対応する場合、ターゲットドメインに対応するソフトマックス値を式(7)により計算し、ソースドメインに対応するソフトマックス値を式(8)により計算するソフトマックス計算ステップとを含む角度ソフトマックスレイヤーステップと、

$$\frac{\exp\left(s\cos(p_t+m)\right)}{\exp\left(s\cos(p_t+m)\right) + \exp\left(s\cos p_s\right)} \tag{5}$$
$$\frac{\exp\left(s\cos p_s\right)}{\exp\left(s\cos(p_t+m)\right) + \exp\left(s\cos p_s\right)} \tag{6}$$
$$\frac{\exp\left(s\cos p_t\right)}{\exp\left(s\cos p_t\right) + \exp\left(s\cos(p_s+m)\right)} \tag{7}$$
$$\frac{\exp\left(s\cos(p_s+m)\right)}{\exp\left(s\cos p_t\right) + \exp\left(s\cos(p_s+m)\right)} \tag{8}$$

を含む、
学習方法。

A learning method in which a source data storage unit stores a plurality of source data, each source data consisting of an image and a class label and a domain label corresponding to the image, and a target data storage unit stores a plurality of target data, each target data consisting of an image and a domain label corresponding to the image, the method comprising:
a front-stage network step in which a front-stage network unit, using a front-stage network that receives an image as input and outputs features of the input image, obtains source data features, which are features of the image of source data read from the source data storage unit, and target data features, which are features of target data read from the target data storage unit;
a class classification network step in which a class classification network unit, using a class classification network that receives source data features as input and outputs a probability distribution of the class to which the input source data features belong, calculates a probability distribution corresponding to the obtained source data features, calculates a value of a loss function using the calculated probability distribution and the class label, read from the source data storage unit, of the source data corresponding to the obtained source data features, and updates parameters of the front-stage network and the class classification network by error backpropagation so as to minimize the calculated value of the loss function; and
a domain classification network step in which a domain classification network unit, using a domain classification network that receives source data features or target features as input and calculates a probability distribution of the domain to which the input source data features or target features belong, calculates a probability distribution corresponding to the obtained source data features or the obtained target features, calculates a value of a loss function using the calculated probability distribution and either the domain label, read from the source data storage unit, of the source data corresponding to the obtained source data features or the domain label, read from the target data storage unit, of the target data corresponding to the obtained target data features, and updates parameters of the front-stage network and the domain classification network by error backpropagation so as to minimize the calculated value of the loss function,
wherein the domain classification network includes, as its first layer, a gradient inversion layer that inverts the sign of the gradient when the domain classification network unit updates parameters by error backpropagation,
the probability distribution calculated by the domain classification network unit is a softmax value calculated in an angle space, and
the domain classification network step includes:
(a) a domain classification network intermediate calculation step in which a domain classification network intermediate calculation unit, using the obtained source data features or the obtained target features, performs the calculations of the layers of the domain classification network other than the angle softmax layer, which is the last layer, to obtain an intermediate feature r_s corresponding to the obtained source data features or an intermediate feature r_t corresponding to the obtained target features; and
(b) an angle softmax layer step including:
(i) a feature normalization step in which a feature normalization unit calculates a normalized feature r_s/||r_s|| or a normalized feature r_t/||r_t|| using the intermediate feature r_s corresponding to the obtained source data features or the intermediate feature r_t corresponding to the obtained target data features;
(ii) a parameter normalization step in which a parameter normalization unit calculates a normalized parameter z₀/||z₀|| or a normalized parameter z₁/||z₁|| using a parameter z₀ or z₁ of the angle softmax layer;
(iii) an inner product calculation step in which an inner product calculation unit, where T denotes transpose, calculates the inner product calculated feature (cos p_t, cos p_s) corresponding to the target data features, defined by equations (1) and (2), when the calculated normalized feature is the normalized feature r_t/||r_t|| corresponding to the target data features, and calculates the inner product calculated feature (cos p_t, cos p_s) corresponding to the source data features, defined by equations (3) and (4), when the calculated normalized feature is the normalized feature r_s/||r_s|| corresponding to the source data features;

$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_t}{\|r_t\|} \tag{1}$$
$$\cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_t}{\|r_t\|} \tag{2}$$
$$\cos p_t = \left(\frac{z_0}{\|z_0\|}\right)^{T} \frac{r_s}{\|r_s\|} \tag{3}$$
$$\cos p_s = \left(\frac{z_1}{\|z_1\|}\right)^{T} \frac{r_s}{\|r_s\|} \tag{4}$$

(iv) a margin adding step in which a margin adding unit, where m is a predetermined margin parameter, calculates the margin-added feature (cos(p_t+m), cos p_s) corresponding to the target data features when the calculated inner product calculated feature (cos p_t, cos p_s) corresponds to the target data features, and calculates the margin-added feature (cos p_t, cos(p_s+m)) corresponding to the source data features when the calculated inner product calculated feature (cos p_t, cos p_s) corresponds to the source data features;
(v) a scaling step in which a scaling unit, where s is a predetermined scaling parameter, obtains the scaled feature (s・cos(p_t+m), s・cos p_s) corresponding to the target data features when the calculated margin-added feature corresponds to the target data features, and obtains the scaled feature (s・cos p_t, s・cos(p_s+m)) corresponding to the source data features when the calculated margin-added feature corresponds to the source data features; and
(vi) a softmax calculation step in which a softmax calculation unit, when the obtained scaled feature corresponds to the target data features, calculates the softmax value corresponding to the target domain by equation (5) and the softmax value corresponding to the source domain by equation (6), and, when the obtained scaled feature corresponds to the source data features, calculates the softmax value corresponding to the target domain by equation (7) and the softmax value corresponding to the source domain by equation (8).

$$\frac{\exp\left(s\cos(p_t+m)\right)}{\exp\left(s\cos(p_t+m)\right) + \exp\left(s\cos p_s\right)} \tag{5}$$
$$\frac{\exp\left(s\cos p_s\right)}{\exp\left(s\cos(p_t+m)\right) + \exp\left(s\cos p_s\right)} \tag{6}$$
$$\frac{\exp\left(s\cos p_t\right)}{\exp\left(s\cos p_t\right) + \exp\left(s\cos(p_s+m)\right)} \tag{7}$$
$$\frac{\exp\left(s\cos(p_s+m)\right)}{\exp\left(s\cos p_t\right) + \exp\left(s\cos(p_s+m)\right)} \tag{8}$$

請求項１の学習装置の各部としてコンピュータを機能させるためのプログラム。 A program for causing a computer to function as each part of the learning device of claim 1.