JP2018032213A

JP2018032213A - Information processor, information processing system, information processing method and program

Info

Publication number: JP2018032213A
Application number: JP2016163886A
Authority: JP
Inventors: Kazuhiro Abe (阿部 一博); Kazuto Ohara (大原 一人); Michiaki Mukai (向井 理朗)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2016-08-24
Filing date: 2016-08-24
Publication date: 2018-03-01

Abstract

PROBLEM TO BE SOLVED: To acquire words and phrases that represent the content of a sentence by simple processing. SOLUTION: A feature vector calculation unit calculates a sentence vector representing a feature of a sentence and a word vector representing a feature of each word included in the sentence. A similarity calculation unit calculates the similarity between the sentence vector and each word vector and extracts some of the words included in the sentence based on the similarity. The embodiment can be realized as any of an information processing apparatus, an information processing system, an information processing method, or a program. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, an information processing system, an information processing method, and a program.

Conventionally, techniques for representing a sentence by a plurality of keywords have been proposed in order to simplify a sentence written in natural language, or the understanding of it.

For example, the subject extraction device described in Patent Literature 1 extracts noun phrases that are candidates for a specific subject from text data representing a plurality of sentences and creates noun phrase pairs. It extracts the appearance frequency of each noun phrase and the co-occurrence frequency of each noun phrase pair, obtains the appearance probability of each noun phrase from these frequencies, and calculates a first feature indicating which noun phrase of a pair wins or loses based on the appearance probabilities. The subject extraction device also extracts the appearance frequency of each dependency structure of a noun phrase pair and calculates a second feature indicating which noun phrase of the pair wins or loses in terms of how readily it becomes the dependency target. The subject extraction device then generates a feature vector in which the first feature and the second feature are arranged and inputs it to a classifier trained with the feature vectors of noun phrases contained in learning documents whose specific subjects are known, thereby extracting a noun phrase indicating the specific subject.

Patent Literature 1: JP 2012-173810 A

The subject extraction device described in Patent Literature 1 extracts noun phrases from a sentence as keywords based on relatively simple indicators, such as the appearance frequency of noun phrases and the co-occurrence frequency of noun phrase pairs obtained by morphological analysis of the sentence. However, the importance of the words and phrases that make up a sentence cannot always be explained by simple indicators such as appearance frequency and co-occurrence frequency alone. For example, a sentence may contain redundant words or phrases that appear frequently but contribute little to the meaning of the sentence as a whole. Such words and phrases can cause the content of the sentence to be misrecognized.

The present invention has been made in view of the above points and provides an information processing apparatus, an information processing system, an information processing method, and a program capable of acquiring words and phrases that represent the content of a sentence by simple processing.

(1) The present invention has been made to solve the above problem. One aspect of the present invention is an information processing apparatus including: a feature vector calculation unit that calculates a sentence vector indicating a feature of a sentence and a word vector indicating a feature of each word included in the sentence; and a similarity calculation unit that calculates a similarity between the sentence vector and the word vector and extracts some of the words included in the sentence based on the similarity.

(2) In another aspect of the present invention, in the above information processing apparatus, the feature vector calculation unit calculates a main-point vector indicating the main point of the sentence based on the word vectors of the extracted words.

(3) In another aspect of the present invention, in the above information processing apparatus, the feature vector calculation unit calculates the main-point vector or the sentence vector for a sentence relating to each viewed content item, and calculates a viewing vector by combining the main-point vectors or the sentence vectors across the viewed content items.

(4) In another aspect of the present invention, in the above information processing apparatus, the feature vector calculation unit calculates a second main-point vector indicating the main point of a sentence relating to unviewed content, or a second sentence vector indicating a feature of that sentence; the similarity calculation unit calculates a similarity between the second main-point vector or the second sentence vector and the viewing vector; and the information processing apparatus includes a content selection unit that selects recommended content from the unviewed content based on the similarity.

(5) Another aspect of the present invention is an information processing system including a receiving device and the information processing apparatus of (4), wherein the receiving device receives content, transmits viewing information indicating viewed content to the information processing apparatus, and receives recommended content information on the recommended content from the information processing apparatus.

(6) Another aspect of the present invention is an information processing method in an information processing apparatus, the method including: a feature vector calculation step of calculating a sentence vector indicating a feature of a sentence and a word vector indicating a feature of a word included in the sentence; and a word extraction step of calculating a similarity between the sentence vector and the word vector and extracting some of the words included in the sentence based on the similarity.

(7) Another aspect of the present invention is a program for causing a computer of an information processing apparatus to execute: a feature vector calculation procedure of calculating a sentence vector indicating a feature of a sentence and a word vector indicating a feature of a word included in the sentence; and a word extraction procedure of calculating a similarity between the sentence vector and the word vector and extracting some of the words included in the sentence based on the similarity.
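Aspects (1) to (4) can be pictured with a short sketch. It assumes cosine similarity as the similarity measure, a fixed top-n cut-off for the extracted words, and element-wise averaging both for the main-point vector and for the viewing vector. None of these choices is stated in this section, and the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; an assumed choice of similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean; an assumed way of combining vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def extract_keywords(sentence_vec, word_vecs, top_n=2):
    """Aspect (1): keep the words whose vectors are most similar to the sentence vector."""
    ranked = sorted(word_vecs,
                    key=lambda w: cosine(sentence_vec, word_vecs[w]),
                    reverse=True)
    return ranked[:top_n]

def main_point_vector(keywords, word_vecs):
    """Aspect (2): derive a main-point vector from the extracted words' vectors."""
    return mean_vector([word_vecs[w] for w in keywords])

def viewing_vector(per_content_vectors):
    """Aspect (3): combine the vectors of the viewed content items."""
    return mean_vector(per_content_vectors)

def recommend(view_vec, unviewed):
    """Aspect (4): pick the unviewed item most similar to the viewing vector."""
    return max(unviewed, key=lambda cid: cosine(view_vec, unviewed[cid]))

# Toy 2-dimensional vectors, made up for illustration only.
word_vecs = {"soccer": [0.9, 0.1], "match": [0.8, 0.3], "very": [0.1, 0.9]}
sentence_vec = [1.0, 0.2]

keywords = extract_keywords(sentence_vec, word_vecs)
gist = main_point_vector(keywords, word_vecs)
view = viewing_vector([gist, [0.7, 0.1]])
choice = recommend(view, {"sports_news": [0.9, 0.2], "cooking_show": [0.1, 0.8]})
```

In this toy data the filler word "very" is the least similar to the sentence vector and is dropped, which is the behavior the similarity-based extraction is meant to provide.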

According to the present invention, words and phrases that represent the content of a sentence can be acquired by simple processing.

FIG. 1 is a block diagram showing a configuration example of the information processing apparatus according to the first embodiment.
FIG. 2 is a functional block diagram relating to an example of the pre-learning according to the first embodiment.
FIG. 3 is a diagram showing an example of the sentence data according to the first embodiment.
FIG. 4 is a diagram showing an example of the morpheme data according to the first embodiment.
FIG. 5 is a diagram showing an example of the word data according to the first embodiment.
FIG. 6 is a diagram showing a configuration example of the Doc2Vec unit according to the first embodiment.
FIG. 7 is a diagram showing another configuration example of the Doc2Vec unit according to the first embodiment.
FIG. 8 is a flowchart showing an example of the word vector creation process according to the first embodiment.
FIG. 9 is a flowchart showing an example of the sentence vector creation process according to the first embodiment.
FIG. 10 is a functional block diagram relating to an example of the feature vector management according to the first embodiment.
FIG. 11 is a diagram showing an example of the sentence vector data according to the first embodiment.
FIG. 12 is a diagram showing an example of the word vector data according to the first embodiment.
FIG. 13 is a flowchart showing an example of the feature vector data generation process according to the first embodiment.
FIG. 14 is a functional block diagram relating to an example of the keyword acquisition process according to the first embodiment.
FIG. 15 is a diagram showing an example of the keyword data according to the first embodiment.
FIG. 16 is a flowchart showing an example of the keyword acquisition process according to the first embodiment.
FIG. 17 is a diagram showing an example of a sentence used in the keyword acquisition process according to the first embodiment.
FIG. 18 is a table showing examples of keyword selection according to the first embodiment.
FIG. 19 is a block diagram showing a configuration example of the information processing apparatus according to the second embodiment.
FIG. 20 is a functional block diagram relating to an example of the main-point vector acquisition process according to the second embodiment.
FIG. 21 is a diagram showing an example of the main-point vector data according to the second embodiment.
FIG. 22 is a flowchart showing an example of the main-point vector acquisition process according to the second embodiment.
FIG. 23 is a block diagram showing a configuration example of the information processing apparatus according to the third embodiment.
FIG. 24 is a functional block diagram relating to an example of the viewing vector acquisition process according to the third embodiment.
FIG. 25 is a diagram showing an example of the viewing data according to the third embodiment.
FIG. 26 is a diagram showing an example of the viewing vector data according to the third embodiment.
FIG. 27 is a flowchart showing an example of the viewing vector acquisition process according to the third embodiment.
FIG. 28 is a block diagram showing a configuration example of the information processing system according to the fourth embodiment.
FIG. 29 is a block diagram showing a configuration example of the information processing apparatus according to the fourth embodiment.
FIG. 30 is a functional block diagram relating to an example of the viewing vector acquisition process according to the fourth embodiment.
FIG. 31 is a diagram showing an example of the recommended content data according to the fourth embodiment.
FIG. 32 is a flowchart showing an example of the content recommendation process according to the fourth embodiment.
FIG. 33 is a functional block diagram relating to an example of the advertisement distribution according to the fourth embodiment.
FIG. 34 is a flowchart showing an example of the advertisement distribution process according to the fourth embodiment.
FIG. 35 is a functional block diagram relating to an example of the recommended content distribution according to the fourth embodiment.
FIG. 36 is a flowchart showing an example of the recommended content distribution process according to the fourth embodiment.
FIG. 37 is a block diagram showing an example of the hardware configuration of the information processing apparatus according to each embodiment.

(First Embodiment)
Hereinafter, a first embodiment of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a block diagram showing a configuration example of an information processing apparatus 10 according to the present embodiment.
The information processing apparatus 10 includes a data management unit 10a and a data processing unit 10b.

The data management unit 10a stores and manages the various data used for processing in the data processing unit 10b and the various data obtained by that processing. As hardware resources, the data management unit 10a includes various storage media such as a ROM (Read-Only Memory) and a RAM (Random Access Memory). The data management unit 10a includes a sentence management unit 101 and a keyword management unit 108. The sentence management unit 101 stores sentence data indicating the sentences to be processed, morpheme data indicating the words that make up those sentences, and word data indicating those words. As illustrated in FIG. 3, the sentence data associates a sentence ID (identifier) that identifies each sentence, a sentence tag, and the information of that sentence with one another. As illustrated in FIG. 4, the morpheme data associates the sentence ID of each sentence, the words that make up the sentence, and the word order, which is the position of each word in the sentence. As illustrated in FIG. 5, the word data associates each word with a word tag that identifies the word.
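The three tables held by the sentence management unit 101 can be pictured as follows. This is a minimal sketch: the field names, the integer tags, and the sample rows are hypothetical and are not taken from FIGS. 3 to 5.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SentenceRecord:
    """One row of the sentence data: ID, sentence tag, and the sentence itself."""
    sentence_id: str
    sentence_tag: int
    text: str

@dataclass(frozen=True)
class MorphemeRecord:
    """One row of the morpheme data: a word and its position in a sentence."""
    sentence_id: str
    word: str
    word_order: int

# Word data: each word mapped to the word tag that identifies it.
word_data = {"rain": 0, "falls": 1, "today": 2}

sentences = [SentenceRecord("S001", 0, "rain falls today")]
morphemes = [
    MorphemeRecord("S001", "today", 3),
    MorphemeRecord("S001", "rain", 1),
    MorphemeRecord("S001", "falls", 2),
]

def word_tag_sequence(sentence_id):
    """Return the word tags of one sentence, sorted by word order."""
    rows = sorted(
        (m for m in morphemes if m.sentence_id == sentence_id),
        key=lambda m: m.word_order,
    )
    return [word_data[m.word] for m in rows]

tags = word_tag_sequence("S001")
```

Joining the morpheme data with the word data in this way yields the per-sentence word tag string that later processing consumes.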

The keyword management unit 108 holds the keywords that the similarity calculation unit 107 acquired from the sentence indicated by the sentence data, together with the sentence ID of that sentence data, the keyword vectors, and the similarities. Each similarity is an index value indicating how similar the keyword vector is to the sentence vector of the sentence.

The data processing unit 10b includes a separation unit 102, a Doc2Vec unit 103, an error calculation unit 104, a Weight calculation unit 105, a feature amount acquisition unit 106, and a similarity calculation unit 107. The processing executed by each unit is described later. The data processing unit 10b may also include a sentence analysis unit (not shown) that performs morphological analysis on the sentence indicated by the sentence data and determines the words that make up the sentence and their order in the sentence as the word order. The sentence analysis unit stores morpheme data indicating the determined words and word order in the sentence management unit 101. The data processing unit 10b may include a processing device such as a CPU (Central Processing Unit) and may realize its functions by executing the processing specified by a predetermined control program.

(Pre-learning)
Next, the pre-learning performed by the data processing unit 10b according to the present embodiment will be described. The pre-learning includes a word vector creation process for creating word vectors and a sentence vector creation process for creating sentence vectors. A word vector is a feature vector that quantitatively indicates the meaning of a word as its feature. A sentence vector is a feature vector that quantitatively indicates the meaning of a sentence as its feature. Each word is associated with one word vector of a predetermined number of dimensions, and each sentence is associated with one sentence vector. In the present embodiment, word vectors and sentence vectors have the same number of dimensions.

FIG. 2 is a functional block diagram relating to an example of the pre-learning according to the present embodiment.
The separation unit 102 reads a set of a sentence ID, a sentence tag, and a sentence indicated by the sentence data stored in advance in the sentence management unit 101 and, referring to the morpheme data, reads the words and word order corresponding to that sentence ID. The separation unit 102 then refers to the word data and reads the word tag corresponding to each word. In this way, a word tag string consisting of the word tags of the words that make up the sentence is read for each sentence ID. The separation unit 102 separates one word tag at a time from the word tags it has read and outputs that word tag to the error calculation unit 104 as a teacher signal. From the word tags it has read, the separation unit 102 also separates the word tags of the words that lie within a predetermined range of positions relative to the word tag used as the teacher signal. In the following description, this predetermined range of positions is called the analysis window, and each single word tag separated as a teacher signal is called the target word tag. The separation unit 102 outputs the word tag string consisting of the word tags in the separated analysis window, together with the sentence tag of the sentence containing those word tags, to the Doc2Vec unit 103. The word tags output to the Doc2Vec unit 103 at one time do not include the target word tag (also referred to as the word tag of interest) used as the teacher signal. Word tags are separated as target word tags in the word order read from the morpheme data.
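The separation step can be sketched as a sliding window over the word tag string. The function name, the parameter `k` (number of tags taken on each side of the target), and the truncation of the window at the ends of the sentence are assumptions of this sketch; the text only states that the window is a predetermined range around the target word tag.

```python
def split_by_window(word_tags, sentence_tag, k=2):
    """Yield (context_tags, sentence_tag, target_tag) for every position t.

    The context holds up to k tags on each side of position t and never
    contains the target tag itself; targets are produced in word order.
    """
    for t, target in enumerate(word_tags):
        left = word_tags[max(0, t - k):t]
        right = word_tags[t + 1:t + 1 + k]
        yield left + right, sentence_tag, target

# Hypothetical word tags 10..14 belonging to a sentence with sentence tag 7.
samples = list(split_by_window([10, 11, 12, 13, 14], sentence_tag=7, k=1))
```

Each yielded triple corresponds to one training step: the context and the sentence tag go to the Doc2Vec unit 103, and the target goes to the error calculation unit 104 as the teacher signal.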

Based on each word tag included in the word tag string input from the separation unit 102 and on the input sentence tag, the Doc2Vec unit 103 calculates an output value for the target word tag using a predetermined mathematical model. As the input value for each word tag, for example, the word vector of the word indicated by that word tag is used. The word vector is obtained as the column vector whose elements are the weight coefficients in the column for that word, out of the weight coefficients in the rows and columns of the word weight matrix Weight1 (described later) calculated by the Weight calculation unit 105. As the input value for each sentence tag, for example, the sentence vector of the sentence indicated by that sentence tag is used. The sentence vector is obtained as the column vector whose elements are the weight coefficients in the column for that sentence, out of the weight coefficients in the rows and columns of the sentence weight matrix Weight2 (described later) calculated by the Weight calculation unit 105. As the predetermined mathematical model, the Doc2Vec unit 103 uses, for example, a neural network (NN) to which a softmax function is applied as the activation function. In the present embodiment, the number of element values that make up a word vector, that is, its number of dimensions, is equal to the number of dimensions of a sentence vector. The number of dimensions is, for example, 100 to 300. The Doc2Vec unit 103 acts as a classifier that gives, as the output value for the input word tags and sentence tag, the probability of each word tag into which the input may be classified. The output value is a vector whose element values are the appearance probabilities of the individual word tags. The Doc2Vec unit 103 outputs the calculated output value to the error calculation unit 104.

The error calculation unit 104 calculates, as the error, the difference between a predetermined output value for the target word tag, which is the teacher signal input from the separation unit 102, and the output value for the target word tag input from the Doc2Vec unit 103. As the predetermined output value, the error calculation unit 104 uses, for example, a target output vector in which the element value for the target word tag is 1 and the element values for all other word tags are 0. The error calculation unit 104 outputs the calculated error to the Weight calculation unit 105.
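As a small illustration, the error for one training step can be formed from a one-hot target output vector. The function names are hypothetical, and the squared magnitude at the end is only one possible index of the error size; the text does not say which index is used.

```python
def one_hot(target_tag, vocab_size):
    """Target output vector: 1 for the target word tag, 0 for every other tag."""
    return [1.0 if i == target_tag else 0.0 for i in range(vocab_size)]

def output_error(probabilities, target_tag):
    """Element-wise difference: target output vector minus the Doc2Vec output."""
    target = one_hot(target_tag, len(probabilities))
    return [t - p for t, p in zip(target, probabilities)]

# Hypothetical output of the Doc2Vec unit over a three-word vocabulary.
error = output_error([0.2, 0.7, 0.1], target_tag=1)

# Assumed index of the error magnitude (sum of squared differences).
squared_magnitude = sum(e * e for e in error)
```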

The Weight calculation unit 105 calculates the parameter group Weight, which is used to calculate the output value for the target word tag from the input word tags and sentence tag, so that an index value indicating the magnitude of the error input from the error calculation unit 104 becomes smaller. In the following description, the parameter group Weight may be referred to simply as Weight. To calculate Weight, the Weight calculation unit 105 uses, for example, the known stochastic gradient descent method. Weight is thereby determined so that the probability for the target word tag is maximized toward 1 and the probabilities for the other word tags are minimized toward 0. In other words, the Weight calculation unit 105 can determine Weight so that the appearance probability output for the target word tag, given the input word tags and sentence tag, becomes larger. Out of the calculated Weight, the Weight calculation unit 105 calculates, as the word weight matrix Weight1, a matrix whose rows correspond to the elements of a word vector, which is the input value for each word tag, and whose columns are the word vectors of the individual word tags. Likewise, it calculates, as the sentence weight matrix Weight2, a matrix whose rows correspond to the elements of a sentence vector, which is the input value for a sentence tag, and whose columns are the sentence vectors of the individual sentence tags. It also calculates, as the parameter group Weight3, the set of a matrix whose element values in each row and column are weight values indicating how much the element values of each word vector or sentence vector contribute to the element values of the output value, and a vector whose element values are the bias values for the respective columns of the output value. In the following description, the word weight matrix Weight1, the sentence weight matrix Weight2, and the parameter group Weight3 may be referred to simply as Weight1, Weight2, and Weight3, respectively. The Weight calculation unit 105 outputs the calculated Weight1, Weight2, and Weight3 to the Doc2Vec unit 103. In the word vector creation process, the Weight calculation unit 105 treats Weight1, Weight2, and Weight3 all as variable parameters and calculates them for each word included in each document. Weight3 is a parameter common to all documents and words. In the sentence vector creation process, by contrast, the Weight calculation unit 105 fixes Weight1 and Weight3 as calculated in the word vector creation process and calculates only Weight2, as a variable parameter, for each word. Accordingly, the Doc2Vec unit 103, the error calculation unit 104, and the Weight calculation unit 105 together function as a feature vector calculation unit that calculates feature vectors such as word vectors and sentence vectors.
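The two training phases can be sketched as below. This is not the patent's implementation: it assumes a cross-entropy error index, averages the input vectors into one hidden vector for brevity, and uses hypothetical names (`train`, `update_words`) and made-up toy data. `W1`, `W2`, and `(U, b)` stand in for Weight1, Weight2, and Weight3, stored here with one row per tag rather than one column.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def train(samples, W1, W2, U, b, lr=0.1, epochs=1, update_words=True):
    """Stochastic gradient descent over (context_tags, sentence_tag, target_tag).

    With update_words=False only W2 is updated, mirroring the sentence vector
    creation process in which Weight1 and Weight3 are fixed.
    Returns the mean cross-entropy loss observed during the last epoch.
    """
    loss = 0.0
    for _ in range(epochs):
        loss = 0.0
        for context, s_tag, target in samples:
            inputs = [W1[c] for c in context] + [W2[s_tag]]
            n = len(inputs)
            h = [sum(col) / n for col in zip(*inputs)]           # hidden vector
            z = [sum(u * x for u, x in zip(row, h)) + bias
                 for row, bias in zip(U, b)]
            p = softmax(z)
            loss -= math.log(p[target])
            dz = [pj - (1.0 if j == target else 0.0) for j, pj in enumerate(p)]
            # Gradient w.r.t. the hidden vector, taken before U is changed.
            dh = [sum(dz[j] * U[j][d] for j in range(len(U)))
                  for d in range(len(h))]
            if update_words:
                for j, row in enumerate(U):
                    for d in range(len(h)):
                        row[d] -= lr * dz[j] * h[d]
                    b[j] -= lr * dz[j]
            trainable = inputs if update_words else inputs[-1:]
            for vec in trainable:
                for d in range(len(h)):
                    vec[d] -= lr * dh[d] / n
        loss /= len(samples)
    return loss

random.seed(0)
vocab, n_sentences, dim = 4, 2, 3

def init(rows):
    return [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(rows)]

W1, W2, U, b = init(vocab), init(n_sentences), init(vocab), [0.0] * vocab
samples = [([1], 0, 0), ([0, 2], 0, 1), ([1], 0, 2), ([3], 1, 2), ([2], 1, 3)]

# Word vector creation process: Weight1, Weight2, and Weight3 all vary.
first = train(samples, W1, W2, U, b, epochs=1)
last = train(samples, W1, W2, U, b, epochs=200)

# Sentence vector creation process: only the new sentence vector varies.
frozen_W1 = [row[:] for row in W1]
frozen_U = [row[:] for row in U]
W2_new = init(1)
train([([0, 2], 0, 1)], W1, W2_new, U, b, epochs=20, update_words=False)
```

Passing `update_words=False` is what keeps the word vectors and output-layer parameters untouched while a sentence vector is fitted.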

(Doc2Vec unit)
Next, a configuration example of the Doc2Vec unit 103 according to the present embodiment will be described.
FIG. 6 is a diagram showing a configuration example of the Doc2Vec unit 103 according to the present embodiment.
In the example shown in FIG. 6, the Doc2Vec unit 103 includes a two-layer neural network. The first layer (Layer 1) includes a vector selection unit and a vector holding unit and receives, from the separation unit 102, a word tag string consisting of 2k word tags and one sentence tag. Here, k is a predetermined integer of 1 or more. The 2k word tags indicate the (t-k)-th to (t-1)-th and the (t+1)-th to (t+k)-th words of the sentence and are input sequentially in that order. The vertically long rectangles arranged in the vector selection unit represent input terminals. Each input terminal that receives a word tag extracts, from Weight1 input from the Weight calculation unit 105, the word vector consisting of the element values of the column corresponding to the input word tag, and outputs the extracted word vector to the facing node group of the vector holding unit. Each node group of the vector holding unit is drawn as a vertically long rectangle, and the individual circles arranged vertically in a line inside each rectangle represent nodes. The input terminal that receives the sentence tag extracts, from the matrix indicated by Weight2 input from the Weight calculation unit 105, the sentence vector consisting of the element values of the column corresponding to the input sentence tag, and outputs the extracted sentence vector to the facing node group of the vector holding unit.

Each node group of the vector holding unit receives a word vector or a sentence vector from the facing node group of the vector selection unit. Each individual node in a node group of the vector holding unit receives the element value of the dimension corresponding to that node and outputs it to the corresponding nodes of the second layer (Layer 2). For each node of Layer 2, two values are set: the row vector consisting of the element values of the row corresponding to that node in the matrix U indicated by Weight3, which is input from the Weight calculation unit 105, and the bias value, which is the element value of the row corresponding to that node in the column vector b indicated by Weight3. Each node of Layer 2 multiplies the element value in each column of the set row vector by the corresponding element value of the word vectors and the sentence vector input from the nodes of the vector holding unit and calculates the sum of the resulting products. Each node of Layer 2 then adds the set bias value to the calculated sum and, based on the log probability obtained by this addition, calculates the appearance probability of the word for that dimension.
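Written as a formula, the Layer 2 computation described above amounts to the following. The symbols $x$, $z_j$, $p_j$, $N$, $w_i$, and $d$ are introduced here for illustration only; $U$ and $b$ are the matrix and the bias vector of Weight3. The normalization is written as a softmax, following the activation function named for the Doc2Vec unit 103, and $z_j$ plays the role of the (unnormalized) log probability of word $j$.

```latex
% x: word vectors of the 2k context words and the sentence vector d, stacked
x = \begin{bmatrix} w_{t-k} \\ \vdots \\ w_{t-1} \\ w_{t+1} \\ \vdots \\ w_{t+k} \\ d \end{bmatrix},
\qquad
z_j = \sum_{i} U_{ji}\, x_i + b_j,
\qquad
p_j = \frac{\exp(z_j)}{\sum_{j'=1}^{N} \exp(z_{j'})}
```

Here $N$ is the number of word tags, so $p$ is the vector of appearance probabilities that the Doc2Vec unit 103 outputs.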

Next, another configuration example of the Doc2Vec unit 103 according to the present embodiment will be described.
FIG. 7 is a diagram illustrating another configuration example of the Doc2Vec unit 103 according to the present embodiment.
In the example illustrated in FIG. 7, the Doc2Vec unit 103 includes a three-layer neural network. As in the example illustrated in FIG. 6, Layer 1 includes a vector selection unit and a vector holding unit, and receives from the separation unit 102 a word tag string composed of 2k word tags together with one sentence tag. Each vector selection unit that receives one word tag extracts, from Weight1 input from the Weight calculation unit 105, a word vector composed of the element values of the column corresponding to that word tag, and outputs the extracted word vector to the facing node group of the vector holding unit. The vector selection unit that receives the sentence tag extracts, from the matrix indicated by Weight2 input from the Weight calculation unit 105, a sentence vector composed of the element values of the column corresponding to that sentence tag, and outputs the extracted sentence vector to the facing node group of the vector holding unit.

Each node group of the vector holding unit receives a word vector or the sentence vector from the facing vector selection unit. Each individual node in a node group receives the element value of the dimension corresponding to that node and outputs it to the corresponding node of Layer 2.
The node group of Layer 2 receives the word vectors and the sentence vector from the facing node groups of the vector holding unit. Each individual node of Layer 2 receives the element values of the dimension corresponding to that node. In the example illustrated in FIG. 7, each node of the Layer 2 node group calculates an element average value by averaging the input element values and outputs it to the corresponding node of Layer 3. Instead of the element average value, each node of Layer 2 may calculate an element sum value, which is the sum of the element values input to that node, and output the element sum value to the corresponding node of Layer 3.
For each node of Layer 3, two values are set: a row vector composed of the element values of the row corresponding to that node in the matrix U indicated by Weight3 input from the Weight calculation unit 105, and a bias value, which is the element value of the row corresponding to that node in the column vector b indicated by Weight3. Each node of Layer 3 multiplies the element value of each column of its row vector by the element value input from the corresponding node of Layer 2, and calculates the sum of the products. Each node of Layer 3 then adds its bias value to the sum and, based on the log probability obtained by this addition, calculates the appearance probability of the word associated with that dimension.
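The three-layer configuration of FIG. 7 differs from FIG. 6 only in that Layer 2 reduces the held vectors to a single vector per dimension. A minimal sketch follows; the function name and the `use_sum` flag are hypothetical, and the softmax in Layer 3 is an assumption, since the description only states that the appearance probability is derived from the log probability.

```python
import math

def forward_fig7(weight1, weight2, U, b, word_tags, sentence_tag, use_sum=False):
    """Return (Layer 2 output, Layer 3 probabilities) for one input."""
    # Layer 1: select the column of each tag (matrices are lists of rows).
    vectors = [[row[tag] for row in weight1] for tag in word_tags]
    vectors.append([row[sentence_tag] for row in weight2])
    dim = len(vectors[0])
    # Layer 2: element sum per dimension, divided by the count unless use_sum.
    hidden = [sum(v[i] for v in vectors) for i in range(dim)]
    if not use_sum:
        hidden = [h / len(vectors) for h in hidden]
    # Layer 3: row of U times Layer 2 output, plus bias, then softmax (assumed).
    scores = [sum(u * h for u, h in zip(row, hidden)) + bias
              for row, bias in zip(U, b)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]
```

Because Layer 2 collapses the inputs to d values, each row of U here has only d columns regardless of k, which is the practical difference from the FIG. 6 sketch.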

(Word vector creation process)
Next, the word vector creation process according to the present embodiment will be described.
FIG. 8 is a flowchart illustrating an example of the word vector creation process according to the present embodiment.
(Step S101) It is determined whether, among the sentences indicated by the sentence data and the words indicated by the word data stored in the sentence management unit 101, there remains a sentence tag of an unprocessed sentence and a word tag string composed of the word tags of the words constituting that sentence. When it is determined that one remains (YES in step S101), the process proceeds to step S102. When it is determined that none remains (NO in step S101), the process illustrated in FIG. 8 ends.

(Step S102) The sentence management unit 101 outputs, to the separation unit 102, the sentence tag of one of the unprocessed sentences and a word tag string composed of the word tags of the words constituting that sentence. The word tags in the string are arranged in the order in which the words appear in the document. This order is specified with reference to the word order indicated by the morpheme data. Thereafter, the process proceeds to step S103.
(Step S103) From the word tag string input from the sentence management unit 101, the separation unit 102 separates a target word tag, which is one of the word tags in the string, as the teacher signal to be output to the error calculation unit 104. In the first pass for a given sentence, the separation unit 102 selects the word tag of the first word in the word order as the target word tag. The separation unit 102 separates the word tag string composed of the remaining word tags as the word tag string to be output to the Doc2Vec unit 103. Thereafter, the process proceeds to step S104.
(Step S104) The separation unit 102 outputs the input sentence tag and the separated word tag string to the Doc2Vec unit 103, and outputs the teacher signal to the error calculation unit 104. Thereafter, the process proceeds to step S105.

(Step S105) The Doc2Vec unit 103 calculates an output value as the NN output based on the input sentence tag and word tag string, and outputs the calculated NN output to the error calculation unit 104. Thereafter, the process proceeds to step S106.
(Step S106) The error calculation unit 104 calculates, as the error, the difference between the NN output input from the Doc2Vec unit 103 and the output value based on the target word tag indicated by the teacher signal input from the separation unit 102. The error calculation unit 104 outputs the calculated error to the Weight calculation unit 105. Thereafter, the process proceeds to step S107.
(Step S107) The Weight calculation unit 105 calculates all of Weight1, Weight2, and Weight3 (all Weights) so that the index value of the magnitude of the error input from the error calculation unit 104 becomes smaller. The Weight calculation unit 105 repeats this calculation recursively until the error becomes smaller than a predetermined threshold. The Weight calculation unit 105 then updates all the Weights that the Doc2Vec unit 103 uses to calculate the NN output to the calculated Weights. Thereafter, the process proceeds to step S108.
(Step S108) The separation unit 102 determines whether the input word tag string contains a word tag that has not yet been adopted as the target word tag, that is, as the teacher signal. When it is determined that such a word tag exists (YES in step S108), the separation unit 102 selects, as the target word tag, the unadopted word tag that is next in word order after the word tag adopted at that time. Thereafter, the process proceeds to step S103. When it is determined that no unadopted word tag remains (NO in step S108), the process proceeds to step S101.
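The separation of steps S103 and S108 and the error of step S106 can be sketched as below. The function names are hypothetical. Representing "the output value based on the target word tag" as a one-hot vector is an assumption made for illustration; the description does not fix its form.

```python
def separate(word_tags, t):
    """Step S103: split off the t-th tag as the target; the rest is the input string."""
    target = word_tags[t]
    remaining = word_tags[:t] + word_tags[t + 1:]
    return target, remaining

def training_examples(sentence_tag, word_tags):
    """Steps S103 to S108: adopt each word tag once as the target, in word order."""
    for t in range(len(word_tags)):
        target, remaining = separate(word_tags, t)
        yield sentence_tag, remaining, target

def error(nn_output, target_tag):
    """Step S106: NN output minus a one-hot output for the target tag (assumed form)."""
    return [p - (1.0 if i == target_tag else 0.0)
            for i, p in enumerate(nn_output)]
```

Each tuple produced by `training_examples` corresponds to one pass through steps S104 to S107 for a single sentence.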

(Sentence vector creation process)
Next, the sentence vector creation process according to the present embodiment will be described.
FIG. 9 is a flowchart illustrating an example of the sentence vector creation process according to the present embodiment.
The process illustrated in FIG. 9 includes steps S111 to S118. Steps S111 to S116 and step S118 are the same as steps S101 to S106 and step S108 of FIG. 8, respectively, and the descriptions of those steps apply here.
In the process illustrated in FIG. 9, the process proceeds to step S117 after step S116.
(Step S117) The Weight calculation unit 105 calculates Weight2 so that the index value of the error input from the error calculation unit 104 becomes smaller. The Weight calculation unit 105 repeats this calculation recursively until the error becomes smaller than a predetermined threshold. The Weight calculation unit 105 then updates the Weight2 that the Doc2Vec unit 103 uses to calculate the NN output to the calculated Weight2. In this step, the Doc2Vec unit 103 uses the Weight1 and Weight3 calculated in the word vector creation process to calculate the NN output. Thereafter, the process proceeds to step S118.
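The point of this process is that only Weight2 changes while Weight1 and Weight3 stay fixed. The sketch below illustrates that idea for one sentence and one target word using the averaging configuration of FIG. 7. The update rule is an assumption: the description does not say how Weight2 is recalculated, so plain gradient descent on a softmax cross-entropy error is swapped in here, and the function name, learning rate, and iteration cap are hypothetical.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_sentence_vector(word_vecs, sent_vec, U, b, target,
                        lr=0.5, threshold=0.05, max_iter=1000):
    """Update only the sentence vector; word_vecs (Weight1) and U, b (Weight3) are frozen."""
    n = len(word_vecs) + 1
    sent_vec = list(sent_vec)
    loss = float("inf")
    for _ in range(max_iter):
        # Averaging layer over the context word vectors and the sentence vector.
        hidden = [(sum(w[i] for w in word_vecs) + s) / n
                  for i, s in enumerate(sent_vec)]
        probs = softmax([sum(u * h for u, h in zip(row, hidden)) + bias
                         for row, bias in zip(U, b)])
        loss = -math.log(probs[target])          # error index value (assumed)
        if loss < threshold:                     # stop once below the threshold
            break
        err = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
        # Gradient with respect to the sentence vector: U^T err, scaled by 1/n.
        grad = [sum(err[j] * U[j][i] for j in range(len(U))) / n
                for i in range(len(sent_vec))]
        sent_vec = [s - lr * g for s, g in zip(sent_vec, grad)]
    return sent_vec, loss
```

The returned vector would replace the column of Weight2 that corresponds to the sentence tag; the inputs passed as `word_vecs`, `U`, and `b` are read but never modified.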

(Management of feature vectors)
Next, the management of feature vectors performed by the information processing apparatus 10 according to the present embodiment will be described.
FIG. 10 is a functional block diagram relating to an example of feature vector management according to the present embodiment.
First, the management of sentence vectors as feature vectors will be described.
The sentence management unit 101 associates each sentence ID indicated by the sentence data stored in the unit with its sentence tag and outputs them to the feature amount acquisition unit 106. In response, the sentence management unit 101 receives the sentence ID and a sentence vector from the feature amount acquisition unit 106, and stores the sentence ID and the sentence vector in association with each other. As a result, sentence vector data in which sentence IDs and sentence vectors are associated is formed in the sentence management unit 101, as illustrated in FIG. 11.

The feature amount acquisition unit 106 outputs the sentence tag, input from the sentence management unit 101 in association with the sentence ID, to the Doc2Vec unit 103. In response, the feature amount acquisition unit 106 receives a sentence vector from the Doc2Vec unit 103, associates the sentence vector with the sentence ID, and outputs them to the sentence management unit 101.
The Doc2Vec unit 103 specifies, as the sentence vector, the column vector composed of the element values of the column of the Weight2 set in the unit that corresponds to the sentence tag input from the feature amount acquisition unit 106. The Doc2Vec unit 103 outputs the specified sentence vector to the feature amount acquisition unit 106.

Next, the management of word vectors will be described. The sentence management unit 101 associates each word indicated by the word data stored in the unit with its word tag and outputs them to the feature amount acquisition unit 106. In response, the sentence management unit 101 receives the word and a word vector from the feature amount acquisition unit 106, and stores the word and the word vector in association with each other. As a result, word vector data in which words and word vectors are associated is formed in the sentence management unit 101, as illustrated in FIG. 12.

The feature amount acquisition unit 106 outputs the word tag, input from the sentence management unit 101 in association with the word, to the Doc2Vec unit 103. In response, the feature amount acquisition unit 106 receives a word vector from the Doc2Vec unit 103, associates the word vector with the word, and outputs them to the sentence management unit 101.
The Doc2Vec unit 103 specifies, as the word vector, the column vector composed of the element values of the column of the Weight1 set in the unit that corresponds to the word tag input from the feature amount acquisition unit 106. The Doc2Vec unit 103 outputs the specified word vector to the feature amount acquisition unit 106.

Next, the management of feature vectors according to the present embodiment will be described.
FIG. 13 is a flowchart illustrating an example of the feature vector data generation process according to the present embodiment.
(Step S121) It is determined whether there remains a sentence tag of an unprocessed sentence among the sentences indicated by the sentence data stored in the sentence management unit 101, or a word tag string composed of the word tags of unprocessed words among the words indicated by the word data. When it is determined that one remains (YES in step S121), the process proceeds to step S122. When it is determined that none remains (NO in step S121), the process illustrated in FIG. 13 ends.

(Step S122) The sentence management unit 101 outputs, to the feature amount acquisition unit 106, either a set of the sentence ID and sentence tag of one of the unprocessed sentences or a set of one of the unprocessed words and its word tag. Thereafter, the process proceeds to step S123.
(Step S123) The feature amount acquisition unit 106 determines which of the two has been input: a set of a sentence ID and a sentence tag, or a set of a word and a word tag. When it is determined that a set of a sentence ID and a sentence tag has been input (YES in step S123), the process proceeds to step S124. When it is determined that a set of a word and a word tag has been input (NO in step S123), the process proceeds to step S134.

(Step S124) The feature amount acquisition unit 106 outputs the sentence tag input in association with the sentence ID to the Doc2Vec unit 103. Thereafter, the process proceeds to step S125.
(Step S125) The Doc2Vec unit 103 acquires, from the Weight2 set in the unit, a sentence vector composed of the element values of the column specified by the sentence tag input from the feature amount acquisition unit 106. The Doc2Vec unit 103 outputs the acquired sentence vector to the feature amount acquisition unit 106. Thereafter, the process proceeds to step S126.
(Step S126) The feature amount acquisition unit 106 associates the sentence ID with the sentence vector input from the Doc2Vec unit 103 and outputs them to the sentence management unit 101. Thereafter, the process proceeds to step S127.
(Step S127) The sentence management unit 101 stores (manages) the sentence ID and the sentence vector input from the feature amount acquisition unit 106 in association with each other. The processing target is then changed to one of the other unprocessed sentence tags or one of the other unprocessed word tags. Thereafter, the process proceeds to step S121.

(Step S134) The feature amount acquisition unit 106 outputs the word tag input in association with the word to the Doc2Vec unit 103. Thereafter, the process proceeds to step S135.
(Step S135) The Doc2Vec unit 103 acquires, from the Weight1 set in the unit, a word vector composed of the element values of the column specified by the word tag input from the feature amount acquisition unit 106. The Doc2Vec unit 103 outputs the acquired word vector to the feature amount acquisition unit 106. Thereafter, the process proceeds to step S136.
(Step S136) The feature amount acquisition unit 106 associates the word with the word vector input from the Doc2Vec unit 103 and outputs them to the sentence management unit 101. Thereafter, the process proceeds to step S137.
(Step S137) The sentence management unit 101 stores (manages) the word and the word vector input from the feature amount acquisition unit 106 in association with each other. The processing target is then changed to one of the other unprocessed word tags or one of the other unprocessed sentence tags. Thereafter, the process proceeds to step S121.
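The outcome of steps S121 to S137 is two lookup tables built by reading columns out of the trained Weights. A minimal sketch follows, assuming that tags are zero-based column indices and that the tables are plain dictionaries; both are illustrative choices, and the function names are hypothetical.

```python
def column(matrix, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]

def build_feature_tables(weight1, weight2, sentence_tags, word_tags):
    """sentence_tags: {sentence ID: sentence tag}; word_tags: {word: word tag}."""
    # Steps S124 to S127: sentence vector = column of Weight2 for the sentence tag.
    sentence_vectors = {sid: column(weight2, tag)
                        for sid, tag in sentence_tags.items()}
    # Steps S134 to S137: word vector = column of Weight1 for the word tag.
    word_vectors = {word: column(weight1, tag)
                    for word, tag in word_tags.items()}
    return sentence_vectors, word_vectors
```

The two returned dictionaries play the role of the sentence vector data and the word vector data held by the sentence management unit 101.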

(Keyword acquisition process)
Next, the keyword acquisition process performed by the information processing apparatus 10 according to the present embodiment will be described. FIG. 14 is a functional block diagram relating to an example of the keyword acquisition process according to the present embodiment.
The sentence management unit 101 outputs, to the similarity calculation unit 107, a set of a sentence ID and a sentence vector indicated by the sentence vector data stored in the unit, together with the sets of a word and a word vector for those words, among the words indicated by the word vector data, that constitute the sentence. In this case, the number of word and word vector sets that are output equals the number of words included in the sentence. Alternatively, the sentence management unit 101 may output, to the similarity calculation unit 107, the set of the sentence ID and sentence vector indicated by the sentence vector data stored in the unit together with the sets of a word ID and a word vector for all the words indicated by the word vector data.

The similarity calculation unit 107 receives, from the sentence management unit 101, the set of the sentence ID and sentence vector and the sets of words and word vectors of that sentence. For each word of the sentence, the similarity calculation unit 107 calculates, for example, an inner product as the similarity between the sentence vector and the word vector. The inner product is an index value that becomes larger as the similarity between the two vectors becomes higher. The maximum value of the inner product is 1, and the minimum value is 0. The similarity calculation unit 107 selects as keywords, starting from the word with the highest similarity, the words whose similarity is higher than a predetermined similarity threshold (for example, 0.7). Accordingly, the number of keywords selected from one sentence can be one or zero. Alternatively, the similarity calculation unit 107 may select as keywords a predetermined number of words in descending order of similarity, starting from the word with the highest similarity, either among such words or among all the words in the sentence. The similarity calculation unit 107 associates the sentence ID with each set of a selected keyword, the keyword vector that is its word vector, and its similarity, and outputs them to the keyword management unit 108. The index value of similarity is not limited to the inner product and may be, for example, the sum of squared differences or the sum of absolute differences between the two vectors. For both the sum of squared differences and the sum of absolute differences, a smaller value indicates a higher similarity between the two vectors.
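Keyword selection by similarity can be sketched as follows. The stated range of the inner product suggests vectors of unit length, so this sketch assumes a length-normalized inner product (cosine similarity); that normalization, the function names, and the `top_n` parameter are assumptions, while the 0.7 threshold is the example value given above.

```python
import math

def cosine(a, b):
    """Inner product of the two vectors after normalizing each to unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def squared_distance(a, b):
    """Alternative index value: sum of squared differences (smaller is more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_keywords(sentence_vec, word_vecs, threshold=0.7, top_n=None):
    """Return (word, similarity) pairs in descending order of similarity."""
    sims = {word: cosine(sentence_vec, vec) for word, vec in word_vecs.items()}
    ranked = sorted(sims.items(), key=lambda item: item[1], reverse=True)
    if top_n is not None:                  # variant: a fixed number of top words
        return ranked[:top_n]
    return [(word, s) for word, s in ranked if s > threshold]
```

Calling `select_keywords` with `top_n` set corresponds to the variant that takes a predetermined number of words; leaving it unset applies the threshold and may return an empty list.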

The keyword management unit 108 stores the sentence ID, keyword, keyword vector, and similarity input from the similarity calculation unit 107 in association with one another. As a result, keyword data in which the sentence ID, keyword, keyword vector, and similarity are associated is formed in the keyword management unit 108, as illustrated in FIG. 15.

FIG. 16 is a flowchart illustrating an example of the keyword acquisition process according to the present embodiment.
(Step S141) The sentence management unit 101 determines whether an unprocessed sentence remains among the sentences indicated by the sentence data stored in the unit. When it is determined that one remains (YES in step S141), the process proceeds to step S142. When it is determined that none remains (NO in step S141), the process illustrated in FIG. 16 ends.

(Step S142) The sentence management unit 101 outputs, to the similarity calculation unit 107, the set of the sentence ID and sentence vector of one of the unprocessed sentences indicated by the sentence vector data stored in the unit, together with the sets of a word and a word vector for those words, among the words indicated by the word vector data, that constitute the sentence. Thereafter, the process proceeds to step S143.
(Step S143) For each word of the sentence, the similarity calculation unit 107 calculates the similarity between the sentence vector input from the sentence management unit 101 and the word vector. The similarity calculation unit 107 selects as keywords the words whose similarity is higher than a predetermined similarity threshold (for example, 0.7). Thereafter, the process proceeds to step S144.

(Step S144) The similarity calculation unit 107 outputs the input sentence ID, each selected keyword, the keyword vector that is its word vector, and the similarity of the keyword to the keyword management unit 108. Thereafter, the process proceeds to step S145.
(Step S145) The keyword management unit 108 stores (manages) the sentence ID, keyword, keyword vector, and similarity input from the similarity calculation unit 107 in association with one another. The sentence to be processed is then changed to another unprocessed sentence, and the process proceeds to step S141.

(Keyword selection example)
FIG. 18 is a table showing an example of keyword selection by the information processing apparatus 10 according to the present embodiment. The keywords shown in FIG. 18 were selected by executing the process illustrated in FIG. 16 on the text shown in FIG. 17 (Ministry of Internal Affairs and Communications, 2014 White Paper on Information and Communication, Chapter 6, Information and Communication Policy Trends, page 432). The text shown in FIG. 17 is mainly about the influence of ICT (Information and Communication Technology) on climate change. The keywords shown in FIG. 18 are, in descending order of similarity, greenhouse gas, climate change, global warming, ICT, cooperation, and so on, all of which are words related to the gist of the text. This result indicates that words related to the gist of the text were extracted from the text being processed.

As described above, the information processing apparatus 10 according to the present embodiment includes the Doc2Vec unit 103, the error calculation unit 104, and the Weight calculation unit 105 as a feature vector calculation unit that calculates a sentence vector indicating the features of a sentence and a word vector indicating the features of each word included in the sentence. The information processing apparatus 10 further includes the similarity calculation unit 107, which calculates the similarity between the sentence vector and each word vector and extracts some of the words included in the sentence based on the similarity.
With this configuration, some of the words included in a sentence are selected based on the similarity between the sentence vector indicating the features of the sentence and the word vector of each word. Consequently, the words whose similarity to the sentence is higher than that of the other words are selected as keywords representing the features of the sentence. Words and phrases representing the content of the sentence are therefore obtained by simple processing using an algebraic technique, without complicated statistical processing.

(Second Embodiment)
Next, a second embodiment of the present invention will be described. Configurations and processes that are the same as in the first embodiment are given the same reference numerals, and their descriptions apply here.
FIG. 19 is a block diagram illustrating a configuration example of the information processing apparatus 10 according to the present embodiment.
In the information processing apparatus 10 according to the present embodiment, the data management unit 10a includes a gist vector management unit 110 in addition to the sentence management unit 101 and the keyword management unit 108. The data processing unit 10b includes a feature amount calculation unit 109 in addition to the separation unit 102, the Doc2Vec unit 103, the error calculation unit 104, the Weight calculation unit 105, the feature amount acquisition unit 106, and the similarity calculation unit 107.

(Gist vector acquisition process)
Next, the gist vector acquisition process performed by the information processing apparatus 10 according to the present embodiment will be described. FIG. 20 is a functional block diagram relating to an example of the gist vector acquisition process according to the present embodiment.
Keyword data is stored in the keyword management unit 108 in advance. In the present embodiment, the text used to acquire the keywords indicated by the keyword data is either text that constitutes the content itself or text that accompanies the content. The content is video, audio, text, or any combination of these. Examples of content include television broadcast programs, video-on-demand (VOD) content, advertisement videos, and music. Text constituting the content itself includes, for example, subtitles, characters broadcast by teletext, and the text information of a web page. Text accompanying the content includes, for example, various kinds of text information transmitted along with the content, such as advertising messages, summaries, and explanatory notes. In the example described below, a content ID is used instead of the sentence ID shown in FIG. 15.

For each content item indicated by the keyword data, the keyword management unit 108 outputs the content ID and the keyword vector of each acquired keyword to the feature amount calculation unit 109.
For each content item input from the keyword management unit 108, the feature amount calculation unit 109 calculates the sum of the keyword vectors as the gist vector. The feature amount calculation unit 109 outputs the input content ID and the calculated gist vector to the gist vector management unit 110. The feature amount calculation unit 109 therefore functions as a feature vector calculation unit that calculates a feature vector, such as the gist vector, indicating the gist of the sentence.

The gist vector management unit 110 stores the content ID and the gist vector input from the feature amount calculation unit 109 in association with each other. As a result, gist vector data in which content IDs and gist vectors are associated with each other is formed in the gist vector management unit 110, as illustrated in FIG. 21.
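The summation described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of the embodiment: the names `sum_vectors`, `build_gist_vector_data`, and the sample IDs and vectors are hypothetical, and keyword vectors are assumed to be plain lists of floats of equal dimension.

```python
from typing import Dict, List

Vector = List[float]


def sum_vectors(vectors: List[Vector]) -> Vector:
    """Return the element-wise sum of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(components) for components in zip(*vectors)]


def build_gist_vector_data(keyword_data: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Map each content ID to the sum of its keyword vectors (the gist vector)."""
    return {content_id: sum_vectors(vectors)
            for content_id, vectors in keyword_data.items()}


# Hypothetical keyword data: content ID -> keyword vectors of that content.
keyword_data = {
    "C001": [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0]],
    "C002": [[0.0, 1.0, 0.0]],
}
gist_vector_data = build_gist_vector_data(keyword_data)
```

The resulting dictionary plays the role of the gist vector data of FIG. 21, with one entry per content ID.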

FIG. 22 is a flowchart showing an example of the gist vector acquisition process according to the present embodiment.
(Step S151) The sentence management unit 101 determines whether there is a content ID of unprocessed content among the sentences of the content indicated by the keyword data stored in the unit. If there is (YES in step S151), the process proceeds to step S152. If there is not (NO in step S151), the process shown in FIG. 22 ends.

(Step S152) The sentence management unit 101 identifies one of the unprocessed content IDs indicated by the keyword data stored in the unit and the keyword vectors associated with that content ID, and outputs them to the feature amount calculation unit 109. The process then proceeds to step S153.
(Step S153) For each content item input from the keyword management unit 108, the feature amount calculation unit 109 calculates the sum of the keyword vectors as the gist vector. The process then proceeds to step S154.

(Step S154) The feature amount calculation unit 109 outputs the input content ID and the calculated gist vector to the gist vector management unit 110. The process then proceeds to step S155.
(Step S155) The gist vector management unit 110 stores (manages) the content ID and the gist vector input from the feature amount calculation unit 109 in association with each other. The content to be processed is then changed to the content ID of another unprocessed content item, and the process returns to step S151.

As described above, the information processing apparatus 10 according to the present embodiment includes the feature amount calculation unit 109 as a feature vector calculation unit. The feature amount calculation unit 109 calculates a gist vector indicating the gist of a sentence based on the word vectors of some of the words extracted from the sentence.
With this configuration, the gist vector is calculated based on the word vectors that indicate the features of the keywords, which are the subset of words representing the gist of the sentence. The features of the gist are therefore quantified more accurately than with a sentence vector that represents the sentence as a whole.

(Third embodiment)
Next, a third embodiment of the present invention will be described. Configurations and processes identical to those of the embodiments described above are given the same reference numerals, and their description is incorporated here.
FIG. 23 is a block diagram illustrating a configuration example of the information processing apparatus 10 according to the present embodiment.
In the information processing apparatus 10 according to the present embodiment, the data management unit 10a includes a viewer management unit 111 in addition to the sentence management unit 101, the keyword management unit 108, and the gist vector management unit 110. The data processing unit 10b includes the separation unit 102, the Doc2vec unit 103, the error calculation unit 104, the weight calculation unit 105, the feature amount acquisition unit 106, the similarity calculation unit 107, and the feature amount calculation unit 109.

(Viewing vector acquisition process)
Next, the viewing vector acquisition process performed by the information processing apparatus 10 according to the present embodiment will be described. FIG. 24 is a functional block diagram of an example of the viewing vector acquisition process according to the present embodiment.
Viewing data is stored in the viewer management unit 111 in advance. The viewing data indicates, for each viewer, whether each content item has been viewed. As illustrated in FIG. 25, the viewing data is formed by accumulating sets of a viewer ID, a content ID, and a viewing flag. The viewer ID is information identifying an individual viewer who is a user of the receiving device 20 (FIG. 28). The content ID is information identifying an individual content item. The viewing flag is information indicating whether the content has been viewed by that viewer. In the example shown in FIG. 25, the viewing flag indicates either viewed or unviewed. The viewer management unit 111 may generate the viewing data based on viewing information, received from each receiving device 20, that consists of a viewer ID and the content ID of the content being viewed. In this case, the viewer management unit 111 identifies, among the predetermined content items, the content viewed under each viewer ID based on the received viewing information, sets the viewing flag of the identified content to viewed, and sets the viewing flag of all other content to unviewed. The viewer management unit 111 then forms the viewing data by associating, for each viewer ID, the content ID of each predetermined content item with the viewing flag that was set.
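As a rough illustration of how viewing data of the kind shown in FIG. 25 could be derived from received viewing information, the following Python sketch marks every (viewer ID, content ID) pair reported by a receiving device as viewed and every other pair as unviewed. The function name `build_viewing_data`, the tuple layout, and the sample IDs are assumptions made for this example only.

```python
from typing import Iterable, List, Set, Tuple

# One viewing-data record: (viewer ID, content ID, viewed flag).
ViewingRecord = Tuple[str, str, bool]


def build_viewing_data(
    viewing_info: Iterable[Tuple[str, str]],
    viewer_ids: List[str],
    catalog: List[str],
) -> List[ViewingRecord]:
    """Set the flag to True for reported pairs and False for all others."""
    viewed: Set[Tuple[str, str]] = set(viewing_info)  # duplicates collapse here
    return [
        (viewer_id, content_id, (viewer_id, content_id) in viewed)
        for viewer_id in viewer_ids
        for content_id in catalog
    ]


# Hypothetical viewing information received from receiving devices.
viewing_info = [("V1", "C001"), ("V2", "C002"), ("V1", "C001")]
viewing_data = build_viewing_data(viewing_info, ["V1", "V2"], ["C001", "C002"])
```

Here `catalog` stands in for the predetermined content items, so each viewer ends up with exactly one record per content item.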

When the viewing information received from a receiving device 20 includes the reception time during which the content was received by that receiving device 20, whether the content was viewed may be determined based on the reception time. Here, the reception time may be the time during which the content was presented or the time during which it was recorded (as video or audio). The viewer management unit 111 may determine that the content is unviewed when the ratio of the reception time to the duration of the content is less than a predetermined value (for example, 1/4 to 1/2), and that the content has been viewed when the ratio is equal to or greater than the predetermined value. Alternatively, the viewer management unit 111 may determine that the content is unviewed when the reception time is less than a predetermined time (for example, 3 to 10 minutes), and that the content has been viewed when the reception time is equal to or longer than the predetermined time. The viewer management unit 111 then sets the determined viewing state, that is, unviewed or viewed, as the value of the viewing flag. The viewer management unit 111 generates the viewing data by associating the viewer ID, the content ID of the content, and the viewing flag.
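The two reception-time criteria can be sketched as follows. The function names are hypothetical, and the default thresholds (a ratio of 1/4 and a time of 3 minutes) are merely picked from the example ranges mentioned in the text; an actual system would choose its own values.

```python
def is_viewed_by_ratio(
    reception_seconds: float,
    duration_seconds: float,
    min_ratio: float = 0.25,  # assumed value taken from the 1/4 to 1/2 example range
) -> bool:
    """Viewed if reception time / content duration is at least min_ratio."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return reception_seconds / duration_seconds >= min_ratio


def is_viewed_by_time(
    reception_seconds: float,
    min_seconds: float = 180.0,  # assumed value taken from the 3 to 10 minute example range
) -> bool:
    """Viewed if the reception time is at least min_seconds."""
    return reception_seconds >= min_seconds
```

For a one-hour program, for example, `is_viewed_by_ratio(900, 3600)` treats 15 minutes of reception as viewed, because the ratio equals the threshold, while 10 minutes falls below it.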

The viewer management unit 111 outputs, to the feature amount acquisition unit 106, the viewer ID of each viewer indicated by the viewing data and the content IDs of the content whose associated viewing flag indicates viewed.
The feature amount acquisition unit 106 outputs the content IDs for each viewer input from the viewer management unit 111 to the gist vector management unit 110 and, in response, receives the gist vectors corresponding to those content IDs from the gist vector management unit 110. Here, the gist vector management unit 110 refers to the gist vector data stored in the unit and outputs the gist vector corresponding to each content ID input from the feature amount acquisition unit 106 to the feature amount acquisition unit 106. The feature amount acquisition unit 106 associates the viewer ID input from the viewer management unit 111 with the gist vectors input from the gist vector management unit 110 and outputs them to the feature amount calculation unit 109.

The feature amount calculation unit 109 calculates the sum of the gist vectors input from the feature amount acquisition unit 106 as the viewing vector. The feature amount calculation unit 109 outputs the input viewer ID and the calculated viewing vector to the viewer management unit 111. The feature amount calculation unit 109 therefore functions as a feature vector calculation unit that synthesizes a feature vector, such as the viewing vector, that quantitatively indicates the meaning of the sentences related to the viewed content.

The viewer management unit 111 stores the viewer ID and the viewing vector input from the feature amount calculation unit 109 in association with each other. As a result, viewing vector data in which viewer IDs and viewing vectors are associated with each other is formed in the viewer management unit 111, as illustrated in FIG. 26.
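The flow from viewing data to viewing vector data might be sketched as below. This is an illustration under stated assumptions rather than the embodiment's implementation: viewing data is modeled as (viewer ID, content ID, viewed flag) tuples, gist vectors as lists of floats, and `build_viewing_vector_data` is a hypothetical name. A viewer with no viewed content simply gets no entry, which is a choice made for this sketch.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def build_viewing_vector_data(
    viewing_data: List[Tuple[str, str, bool]],
    gist_vector_data: Dict[str, Vector],
) -> Dict[str, Vector]:
    """Sum, per viewer ID, the gist vectors of the content flagged as viewed."""
    viewing_vectors: Dict[str, Vector] = {}
    for viewer_id, content_id, viewed in viewing_data:
        if not viewed:
            continue  # unviewed content does not contribute
        gist = gist_vector_data[content_id]
        current = viewing_vectors.get(viewer_id)
        if current is None:
            viewing_vectors[viewer_id] = list(gist)
        else:
            viewing_vectors[viewer_id] = [a + b for a, b in zip(current, gist)]
    return viewing_vectors


# Hypothetical sample data.
gist_vector_data = {"C001": [1.0, 2.0], "C002": [3.0, -1.0], "C003": [0.0, 5.0]}
viewing_data = [
    ("V1", "C001", True),
    ("V1", "C002", True),
    ("V1", "C003", False),
    ("V2", "C003", True),
]
viewing_vector_data = build_viewing_vector_data(viewing_data, gist_vector_data)
```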

FIG. 27 is a flowchart illustrating an example of the viewing vector acquisition process according to the present embodiment.
(Step S161) The viewer management unit 111 determines whether there is an unprocessed viewer ID among the viewer IDs indicated by the viewing data stored in the unit. If there is (YES in step S161), the process proceeds to step S162. If there is not (NO in step S161), the process shown in FIG. 27 ends.

(Step S162) The viewer management unit 111 refers to the viewing data stored in the unit and identifies the content IDs of all viewed content associated with one of the unprocessed viewer IDs. The viewer management unit 111 outputs that viewer ID and the identified content IDs to the feature amount acquisition unit 106. The process then proceeds to step S163.
(Step S163) The feature amount acquisition unit 106 outputs the content IDs input from the viewer management unit 111 to the gist vector management unit 110 and, in response, receives the gist vectors associated with those content IDs from the gist vector management unit 110. The feature amount acquisition unit 106 outputs the viewer ID input from the viewer management unit 111 and the gist vectors input from the gist vector management unit 110 to the feature amount calculation unit 109. The process then proceeds to step S164.

(Step S164) The feature amount calculation unit 109 calculates the sum of the gist vectors input from the feature amount acquisition unit 106 as the viewing vector. The process then proceeds to step S165.
(Step S165) The feature amount calculation unit 109 associates the viewer ID input from the feature amount acquisition unit 106 with the calculated viewing vector and outputs them to the viewer management unit 111.
(Step S166) The viewer management unit 111 stores (manages) the viewer ID and the viewing vector input from the feature amount calculation unit 109 in association with each other. The viewer ID to be processed is then changed to another unprocessed viewer ID, and the process returns to step S161.

Note that the viewer ID, as the recommendation unit for viewing, may be associated with an individual viewer, with an individual receiving device, or with a subscriber who has a contract with the content provider. When a viewer ID is associated with a plurality of receiving devices for a common viewer, the viewing data and the viewing vector data may be formed by accumulating data on the content viewed on the plurality of receiving devices that share the viewer or the viewer ID. Instead of the viewer ID, a device ID identifying an individual receiving device or a subscriber ID identifying an individual subscriber may be used as the identification information indicating the recommendation unit. In step S164, instead of using the gist vector of each sentence related to the content, the feature amount calculation unit 109 may calculate the sum of the sentence vectors indicating the features of each sentence as the gist vector.

As described above, in the information processing apparatus 10 according to the present embodiment, the feature amount calculation unit 109, acting as the feature vector calculation unit, calculates a gist vector or a sentence vector for the sentences related to the viewed content. The feature amount calculation unit 109 then calculates the viewing vector by combining the calculated gist vectors or sentence vectors across the content viewed by each recommendation unit.
With this configuration, a viewing vector indicating the features of the sentences as a whole that relate to the content viewed by each recommendation unit is calculated. The features of the content viewed by each viewing unit are therefore quantified by simple processing, without relying on complicated statistical methods that require a large amount of information or computation.

(Fourth embodiment)
Next, a fourth embodiment of the present invention will be described. Configurations and processes identical to those of the embodiments described above are given the same reference numerals, and their description is incorporated here.
FIG. 28 is a block diagram illustrating a configuration example of the information processing system 1 according to the present embodiment.
The information processing system 1 includes an information processing apparatus 10 and a receiving device 20. The information processing apparatus 10 and the receiving device 20 are connected by a network NW and can exchange various data with each other. The network NW is a communication transmission path composed of any one or any combination of, for example, a wide area network such as the Internet or a public communication network, a local area network, and a dedicated line. The network NW may be wireless, wired, or a combination of both. Although FIG. 28 shows one receiving device 20, there are generally a plurality of them. Likewise, FIG. 28 shows one information processing apparatus 10, but there may be a plurality. The information processing apparatus 10 determines recommended content based on viewing information indicating the viewing state of content on the receiving device 20, and provides the receiving device 20 with the content data of the recommended content or with recommended content information, which is information about the recommended content.

The receiving device 20 receives the content data of content selected, through an operation by the viewer who is its user, from the various content transmitted from the content provider's equipment (not shown). The receiving device 20 presents the content of the received content data. Presenting content means displaying the video and text that make up the content and reproducing its audio. The receiving device 20 transmits, via the network NW, viewing information that includes the content ID of the received content data, the reception time, and the viewer ID associated with the receiving device 20. The content ID is identification information that identifies an individual content item. VOD content transmitted by communication is given a content ID that identifies the content. For a broadcast program, for example, the program ID that identifies the broadcast program serves as the content ID. Instead of the content ID, other information that uniquely identifies the content, such as a combination of a broadcast channel and a broadcast time slot, may be used. In this specification, viewing may mean not only one or both of visually perceiving the video or text of the content and listening to its audio, but also one or both of displaying the video or text and reproducing the audio.

The receiving device 20 may be, for example, a communication terminal device that receives, from a content provider's server device (not shown) via the network NW, the content data of various content consisting of video, audio, text, or a combination of these. The communication terminal device is, for example, a multi-function mobile phone (including a so-called smartphone), a tablet terminal device, or a personal computer. The receiving device 20 may also be, for example, a dedicated television receiver that receives the video data and audio data of broadcast programs as content, or a general-purpose communication terminal device equipped with a tuner capable of receiving television broadcast program data.

Next, the configuration of the information processing apparatus 10 according to the present embodiment will be described.
FIG. 29 is a block diagram illustrating a configuration example of the information processing apparatus 10 according to the present embodiment.
In the information processing apparatus 10, the data management unit 10a includes a recommendation management unit 113 in addition to the sentence management unit 101, the keyword management unit 108, the gist vector management unit 110, and the viewer management unit 111. The data processing unit 10b includes the separation unit 102, the Doc2vec unit 103, the error calculation unit 104, the weight calculation unit 105, the feature amount acquisition unit 106, the similarity calculation unit 107, and the feature amount calculation unit 109. The information processing apparatus 10 further includes a content distribution unit 10c.

(Content recommendation process)
Next, the content recommendation process performed by the information processing apparatus 10 according to the present embodiment will be described.
The following description takes as an example the case where the receiving device 20 receives various content data mainly via the network NW and the information processing apparatus 10 treats content not yet viewed on the receiving device 20 as candidates for recommended content. In addition to the viewing vector data described above, the viewer management unit 111 stores in advance viewing data that includes the content IDs of unviewed, distributable content serving as recommendation candidates. The gist vector management unit 110 stores in advance gist vector data formed by associating the content IDs of distributable content with their gist vectors.

FIG. 30 is a functional block diagram of an example of the viewing vector acquisition process according to the present embodiment.
The viewer management unit 111 outputs, to the feature amount acquisition unit 106, the viewing vector associated with the viewer ID of each viewer indicated by the viewing vector data stored in advance. The viewer management unit 111 also identifies the content IDs of the content whose viewing flag associated with that viewer ID is unviewed, and outputs the identified content IDs to the feature amount acquisition unit 106.

The feature amount acquisition unit 106 outputs the content IDs of the content scheduled for distribution, input from the viewer management unit 111, to the gist vector management unit 110 and, in response, receives the gist vectors corresponding to those content IDs from the gist vector management unit 110. Here, the gist vector management unit 110 refers to the gist vector data stored in the unit and outputs the gist vector corresponding to each content ID input from the feature amount acquisition unit 106 to the feature amount acquisition unit 106. The feature amount acquisition unit 106 associates each input content ID with the gist vector input from the gist vector management unit 110, and outputs these together with the viewer ID and the viewing vector input from the viewer management unit 111 to the similarity calculation unit 107.

The similarity calculation unit 107 calculates the similarity between the viewing vector input from the feature amount acquisition unit 106 and the gist vector associated with each content ID. The similarity may be calculated by the same method as described above. The similarity calculation unit 107 associates the calculated similarity with each content ID and outputs these, together with the viewer ID input from the feature amount acquisition unit 106, to the recommendation management unit 113.

The recommendation management unit 113 selects, as the content IDs of recommended content, the content IDs of content whose similarity input from the similarity calculation unit 107 is higher than a predetermined similarity. Alternatively, the recommendation management unit 113 may select, from the selected content IDs or from the input content IDs, the content IDs of a predetermined number of content items in descending order of similarity, starting from the content with the highest similarity, as the content IDs of recommended content. The recommendation management unit 113 stores the viewer ID input from the similarity calculation unit 107 and the selected content IDs in association with each other. As shown in FIG. 31, the recommendation management unit 113 stores recommended content data in which the viewer ID of each viewer is associated with the content IDs of the recommended content for that viewer.
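The selection of recommended content can be sketched as follows. This passage does not restate the similarity measure, so cosine similarity is used here purely as an assumption; the function names, the threshold, and the sample vectors are likewise hypothetical. The sketch supports both selection rules described above: keeping content above a threshold and keeping a fixed number of top-ranked content items.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (0.0 if either vector is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(
    viewing_vector: Vector,
    candidate_gist_vectors: Dict[str, Vector],
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> List[str]:
    """Rank unviewed candidates by similarity, then apply threshold and/or top_n."""
    scored = sorted(
        ((cosine_similarity(viewing_vector, gist), content_id)
         for content_id, gist in candidate_gist_vectors.items()),
        key=lambda pair: pair[0],
        reverse=True,  # highest similarity first
    )
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] > threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return [content_id for _, content_id in scored]


# Hypothetical viewing vector and gist vectors of unviewed content.
viewing_vector = [1.0, 0.0]
candidates = {"C010": [2.0, 0.0], "C011": [0.0, 3.0], "C012": [1.0, 1.0]}
above_threshold = recommend(viewing_vector, candidates, threshold=0.5)
best_one = recommend(viewing_vector, candidates, top_n=1)
```

Because cosine similarity ignores vector length, a viewer who has watched many programs is not automatically scored higher than one who has watched few; other measures would behave differently.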

FIG. 32 is a flowchart showing an example of the content recommendation process according to the present embodiment.
(Step S171) The viewer management unit 111 determines whether there is an unprocessed viewer ID among the viewer IDs indicated by the viewing vector data stored in the unit. If there is (YES in step S171), the process proceeds to step S172. If there is not (NO in step S171), the process shown in FIG. 32 ends.

(Step S172) The viewer management unit 111 refers to the viewing data stored in the unit and identifies the content IDs of all unviewed content associated with one of the unprocessed viewer IDs. The viewer management unit 111 also refers to the viewing vector data stored in the unit and identifies the viewing vector associated with that viewer ID. The viewer management unit 111 outputs the viewer ID, the identified viewing vector, and all the identified content IDs to the feature amount acquisition unit 106. The process then proceeds to step S173.
(Step S173) The feature amount acquisition unit 106 outputs each content ID input from the viewer management unit 111 to the gist vector management unit 110 and, in response, acquires the gist vector associated with that content ID from the gist vector management unit 110. The process then proceeds to step S174.
(Step S174) The feature amount acquisition unit 106 outputs, to the similarity calculation unit 107, the viewer ID and the viewing vector input from the viewer management unit 111 together with the sets of a content ID and the gist vector input from the gist vector management unit 110. The process then proceeds to step S175.

（ステップＳ１７５）類似度算出部１０７は、特徴量取得部１０６から入力された視聴ベクトルと、各コンテンツＩＤに対応付けられた主旨ベクトルとの類似度を算出する。その後、ステップＳ１７６の処理に進む。
（ステップＳ１７６）類似度算出部１０７は、特徴量取得部１０６から入力された視聴者ＩＤと、コンテンツＩＤと算出した類似度とを対応付けてなるセットを推薦管理部１１３に出力する。その後、ステップＳ１７７の処理に進む。
（ステップＳ１７７）推薦管理部１１３は、類似度算出部１０７から入力された視聴者ＩＤと類似度が処理の類似度よりも高いコンテンツのコンテンツＩＤとを関連付けて記憶（管理）する。その後、処理対象の視聴者ＩＤを他の未処理の視聴者ＩＤに変更して、ステップＳ１７１の処理に進む。 (Step S175) The similarity calculation unit 107 calculates the similarity between the viewing vector input from the feature amount acquisition unit 106 and the gist vector associated with each content ID. Thereafter, the process proceeds to step S176.
(Step S176) The similarity calculation unit 107 outputs, to the recommendation management unit 113, a set in which the viewer ID input from the feature amount acquisition unit 106, the content IDs, and the calculated similarities are associated with one another. Thereafter, the process proceeds to step S177.
(Step S177) The recommendation management unit 113 stores (manages) the viewer ID input from the similarity calculation unit 107 in association with the content IDs of the contents whose similarity is higher than a predetermined similarity. Thereafter, the viewer ID to be processed is changed to another unprocessed viewer ID, and the process returns to step S171.
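Steps S175 to S177 can be illustrated with a minimal Python sketch. It assumes cosine similarity as the similarity measure and a fixed threshold; this passage names neither, and the function and variable names (`cosine_similarity`, `recommend_for_viewer`, `threshold`) are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_for_viewer(viewing_vector, gist_vectors, threshold):
    """Return the IDs of unviewed contents whose gist vector is more similar
    to the viewing vector than the threshold (assumed reading of step S177)."""
    recommended = []
    for content_id, gist_vector in gist_vectors.items():
        # S175: similarity between the viewing vector and each gist vector
        if cosine_similarity(viewing_vector, gist_vector) > threshold:
            recommended.append(content_id)
    return recommended


# Toy data: one viewer and three unviewed contents (values are made up).
viewing_vector = [1.0, 0.0]
unviewed_gist_vectors = {"c1": [2.0, 0.0], "c2": [0.0, 3.0], "c3": [1.0, 1.0]}
print(recommend_for_viewer(viewing_vector, unviewed_gist_vectors, threshold=0.5))
# prints ['c1', 'c3']
```

In the flowchart this selection is repeated for each unprocessed viewer ID, and the result is stored by the recommendation management unit 113 against that viewer ID.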

（コンテンツ推薦処理の応用）
次に、本実施形態のコンテンツ推薦処理の一応用例として、広告配信への応用例について説明する。以下に説明する例では、推薦管理部１１３には、視聴者ＩＤとその視聴者への推薦コンテンツである未視聴のコンテンツに関する広告のコンテンツＩＤとを対応付けてなる推薦コンテンツデータを予め記憶させておく。また、コンテンツ配信部１０ｃは、コンテンツ選択部１１４を含んで構成され、コンテンツＩＤ毎にその広告を示すコンテンツデータを予め記憶させておく。 (Application of content recommendation processing)
Next, an application to advertisement distribution will be described as one application example of the content recommendation process of the present embodiment. In the example described below, the recommendation management unit 113 stores in advance recommended content data in which a viewer ID is associated with the content ID of an advertisement for unviewed content recommended to that viewer. The content distribution unit 10c includes a content selection unit 114 and stores in advance, for each content ID, content data representing the advertisement.

図３３は、本実施形態に係る広告配信の一例に係る機能ブロック図である。
受信装置２０は、所定の時点において広告配信要求信号と自装置に予め設定された視聴者ＩＤを情報処理装置１０のコンテンツ配信部１０ｃに送信する。
コンテンツ配信部１０ｃのコンテンツ選択部１１４は、受信装置２０から広告配信要求信号と視聴者ＩＤを受信したことに応じて、受信した視聴者ＩＤを推薦管理部１１３に出力する。コンテンツ選択部１１４には、その応答として推薦管理部１１３から視聴者ＩＤに対応するコンテンツＩＤが入力される。推薦管理部１１３は、自部に記憶された推薦コンテンツデータを参照して、コンテンツ選択部１１４から入力された視聴者ＩＤに対応するコンテンツＩＤを特定し、特定したコンテンツＩＤをコンテンツ選択部１１４に出力する。 FIG. 33 is a functional block diagram according to an example of advertisement distribution according to the present embodiment.
The receiving device 20 transmits an advertisement distribution request signal and a viewer ID preset in the device itself to the content distribution unit 10c of the information processing device 10 at a predetermined time.
In response to receiving the advertisement distribution request signal and the viewer ID from the receiving device 20, the content selection unit 114 of the content distribution unit 10c outputs the received viewer ID to the recommendation management unit 113. In response, the content selection unit 114 receives from the recommendation management unit 113 the content ID corresponding to the viewer ID. The recommendation management unit 113 refers to the recommended content data stored in itself, identifies the content ID corresponding to the viewer ID input from the content selection unit 114, and outputs the identified content ID to the content selection unit 114.

コンテンツ配信部１０ｃは、自部に記憶された広告のコンテンツデータのうち、コンテンツ選択部１１４に入力されたコンテンツＩＤに対応付けられたコンテンツデータを特定し、特定したコンテンツデータを受信装置２０に送信する。
受信装置２０は、情報処理装置１０のコンテンツ配信部１０ｃからコンテンツデータを受信し、受信したコンテンツデータが示す広告を提示する。受信装置２０が広告を提示するタイミング、又は広告配信要求のタイミングは、例えば、放送番組又はその他のコンテンツの非受信中である。その場合には、視聴者による番組その他のコンテンツの視聴が妨げられないうえ、視聴者に広告が視聴される可能性が高くなる。 The content distribution unit 10c identifies, among the advertisement content data stored in itself, the content data associated with the content ID input to the content selection unit 114, and transmits the identified content data to the receiving device 20.
The receiving device 20 receives the content data from the content distribution unit 10c of the information processing device 10 and presents the advertisement indicated by the received content data. The receiving device 20 presents the advertisement, or issues the advertisement distribution request, for example while no broadcast program or other content is being received. In that case, the viewer's viewing of programs and other content is not disturbed, and the advertisement is more likely to be viewed.

図３４は、本実施形態に係る広告配信処理の一例を示すフローチャートである。
（ステップＳ１８１）コンテンツ配信部１０ｃのコンテンツ選択部１１４は、受信装置２０から広告配信要求信号と視聴者ＩＤを受信したか否かを判定する。受信したと判定されるとき（ステップＳ１８１ＹＥＳ）、ステップＳ１８２の処理に進む。受信していないと判定されるとき（ステップＳ１８１ＮＯ）、図３４に示す処理を終了する。 FIG. 34 is a flowchart illustrating an example of the advertisement distribution process according to the present embodiment.
(Step S181) The content selection unit 114 of the content distribution unit 10c determines whether an advertisement distribution request signal and a viewer ID have been received from the receiving device 20. When it is determined that they have been received (step S181: YES), the process proceeds to step S182. When it is determined that they have not been received (step S181: NO), the processing shown in FIG. 34 ends.

（ステップＳ１８２）コンテンツ選択部１１４は、受信した視聴者ＩＤを推薦管理部１１３に出力し、その応答として推薦管理部１１３から視聴者ＩＤに関連付けられている全てのコンテンツＩＤを取得する。その後、ステップＳ１８３の処理に進む。
（ステップＳ１８３）コンテンツ選択部１１４は、取得したコンテンツＩＤをコンテンツ配信部１０ｃに出力する。その後、ステップＳ１８４の処理に進む。
（ステップＳ１８４）コンテンツ配信部１０ｃは、コンテンツ選択部１１４から入力されたコンテンツＩＤに関連付けられた広告を含んだコンテンツデータを広告配信要求信号の送信元である受信装置２０に送信（配信）する。その後、ステップＳ１８１の処理に進む。 (Step S182) The content selection unit 114 outputs the received viewer ID to the recommendation management unit 113, and acquires all content IDs associated with the viewer ID from the recommendation management unit 113 as a response. Thereafter, the process proceeds to step S183.
(Step S183) The content selection unit 114 outputs the acquired content ID to the content distribution unit 10c. Thereafter, the process proceeds to step S184.
(Step S184) The content distribution unit 10c transmits (distributes) content data including an advertisement associated with the content ID input from the content selection unit 114 to the receiving device 20 that is the transmission source of the advertisement distribution request signal. Thereafter, the process proceeds to step S181.
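The lookups in steps S182 to S184 amount to two table lookups, sketched below in Python. The dictionaries stand in for the recommended content data held by the recommendation management unit 113 and the advertisement content data held by the content distribution unit 10c; the function name, keys, and payloads are hypothetical.

```python
def handle_ad_request(viewer_id, recommended_content, ad_content_data):
    """Return the advertisement data to send for one distribution request.

    recommended_content: {viewer_id: [content_id, ...]}  (assumed layout)
    ad_content_data:     {content_id: advertisement payload}  (assumed layout)
    """
    # S182: obtain every content ID associated with the viewer ID
    content_ids = recommended_content.get(viewer_id, [])
    # S183-S184: resolve each content ID to its stored advertisement data
    return [ad_content_data[cid] for cid in content_ids if cid in ad_content_data]


recommended_content = {"viewer-1": ["ad-7", "ad-9"]}
ad_content_data = {"ad-7": "payload of ad 7", "ad-9": "payload of ad 9"}
print(handle_ad_request("viewer-1", recommended_content, ad_content_data))
```

An unknown viewer ID yields an empty list here; the source does not say how that case is handled, so this is only one possible choice.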

次に、本実施形態のコンテンツ推薦処理の他の応用例として、推薦コンテンツ配信への応用例について説明する。以下に説明する例では、視聴者管理部１１１には、視聴者ＩＤ、推薦候補となる未視聴の配信可能なＶＯＤコンテンツを示すコンテンツＩＤ、及び視聴フラグとを対応付けてなる視聴データを予め記憶させておく。また、コンテンツ配信部１０ｃには、予めコンテンツＩＤとそのコンテンツを示すコンテンツデータとを対応付けて記憶させておく。 Next, an application to recommended content distribution will be described as another application example of the content recommendation process of the present embodiment. In the example described below, the viewer management unit 111 stores in advance viewing data in which a viewer ID, a content ID indicating unviewed, distributable VOD content that is a recommendation candidate, and a viewing flag are associated with one another. The content distribution unit 10c stores in advance each content ID in association with the content data representing that content.

図３５は、本実施形態に係る推薦コンテンツ配信の一例に係る機能ブロック図である。
受信装置２０は、ユーザである視聴者の所定の操作を受け付けるとき、コンテンツ配信要求信号と自装置に予め設定された視聴者ＩＤを情報処理装置１０のコンテンツ配信部１０ｃに送信する。
コンテンツ配信部１０ｃは、受信装置２０からコンテンツ配信要求信号と視聴者ＩＤを受信したことに応じて、受信した視聴者ＩＤを視聴者管理部１１１に出力する。コンテンツ配信部１０ｃには、その応答として視聴者管理部１１１から視聴者ＩＤに対応するコンテンツＩＤが入力される。 FIG. 35 is a functional block diagram according to an example of recommended content distribution according to the present embodiment.
Upon accepting a predetermined operation by the viewer, who is the user, the receiving device 20 transmits a content distribution request signal and the viewer ID preset in the device to the content distribution unit 10c of the information processing device 10.
In response to receiving the content distribution request signal and the viewer ID from the receiving device 20, the content distribution unit 10c outputs the received viewer ID to the viewer management unit 111. In response, the content distribution unit 10c receives from the viewer management unit 111 the content ID corresponding to the viewer ID.

視聴者管理部１１１は、自部に記憶された視聴データを参照して、コンテンツ配信部１０ｃから入力された視聴者ＩＤに対応するコンテンツのうち、関連付けられた視聴フラグの値が未視聴であるコンテンツのいずれかのコンテンツＩＤを選択する。視聴者管理部１１１は、選択したコンテンツＩＤをコンテンツ配信部１０ｃに出力する。なお、視聴者管理部１１１は、入力された視聴者ＩＤに対応する視聴ベクトルと、未視聴であるコンテンツのうち、その主旨ベクトルと視聴ベクトルとの類似度が最も高いコンテンツのコンテンツＩＤを選択してもよい。視聴ベクトル、主旨ベクトルは、それぞれ視聴者管理部１１１に記憶された視聴ベクトルデータ、主旨ベクトル管理部１１０に記憶された主旨ベクトルデータを参照して、取得される。
ここで、視聴者管理部１１１は、選択したコンテンツＩＤに対応付けられた視聴フラグの値を視聴済に変更してもよい。 The viewer management unit 111 refers to the viewing data stored in itself and, from among the contents corresponding to the viewer ID input from the content distribution unit 10c, selects the content ID of one of the contents whose associated viewing flag indicates unviewed. The viewer management unit 111 outputs the selected content ID to the content distribution unit 10c. Alternatively, the viewer management unit 111 may select, from among the unviewed contents, the content ID of the content whose gist vector has the highest similarity to the viewing vector corresponding to the input viewer ID. The viewing vector and the gist vector are obtained by referring to the viewing vector data stored in the viewer management unit 111 and the gist vector data stored in the gist vector management unit 110, respectively.
Here, the viewer management unit 111 may change the value of the viewing flag associated with the selected content ID to “viewed”.

コンテンツ配信部１０ｃは、視聴者管理部１１１から入力されたコンテンツＩＤに対応付けられたコンテンツのコンテンツデータを特定し、特定したコンテンツデータをコンテンツ配信要求信号の送信元である受信装置２０に送信する。
受信装置２０は、情報処理装置１０からコンテンツデータを受信し、受信したコンテンツデータが示すコンテンツを提示する。 The content distribution unit 10c identifies the content data of the content associated with the content ID input from the viewer management unit 111 and transmits the identified content data to the receiving device 20 that sent the content distribution request signal.
The receiving device 20 receives content data from the information processing device 10 and presents the content indicated by the received content data.

図３６は、本実施形態に係る推薦コンテンツ配信処理の一例を示すフローチャートである。
（ステップＳ１９１）コンテンツ配信部１０ｃは、受信装置２０からコンテンツ配信要求信号と視聴者ＩＤを受信したか否かを判定する。受信したと判定されるとき（ステップＳ１９１ＹＥＳ）、ステップＳ１９２の処理に進む。受信していないと判定されるとき（ステップＳ１９１ＮＯ）、図３６に示す処理を終了する。 FIG. 36 is a flowchart illustrating an example of recommended content distribution processing according to the present embodiment.
(Step S191) The content distribution unit 10c determines whether a content distribution request signal and a viewer ID have been received from the receiving device 20. When it is determined that they have been received (step S191: YES), the process proceeds to step S192. When it is determined that they have not been received (step S191: NO), the processing shown in FIG. 36 ends.

（ステップＳ１９２）コンテンツ配信部１０ｃは、受信した視聴者ＩＤを視聴者管理部１１１に出力し、その応答として視聴者管理部１１１から視聴者ＩＤに関連付けられている未視聴のコンテンツのコンテンツＩＤのいずれかを取得する。コンテンツ配信部１０ｃは、取得したコンテンツＩＤに関連付けられたコンテンツデータを受信装置２０に送信（配信）する。その後、ステップＳ１９３の処理に進む。
（ステップＳ１９３）視聴者管理部１１１は、自部に記憶する視聴データにおいて、コンテンツ配信部１０ｃからの視聴者ＩＤに応じて選択したコンテンツのコンテンツＩＤに関連付けられた視聴フラグが示す視聴状態の値を未視聴から視聴済に変更する。その後、ステップＳ１９１の処理に進む。 (Step S192) The content distribution unit 10c outputs the received viewer ID to the viewer management unit 111 and, in response, obtains from the viewer management unit 111 one of the content IDs of the unviewed contents associated with the viewer ID. The content distribution unit 10c transmits (distributes) the content data associated with the obtained content ID to the receiving device 20. Thereafter, the process proceeds to step S193.
(Step S193) In the viewing data stored in itself, the viewer management unit 111 changes, from unviewed to viewed, the viewing state indicated by the viewing flag associated with the content ID of the content selected for the viewer ID received from the content distribution unit 10c. Thereafter, the process returns to step S191.
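The selection and flag update of steps S192 and S193 can be sketched as follows, using the optional variant in which the unviewed content most similar to the viewing vector is chosen. Cosine similarity and all names and data layouts (`select_and_mark`, `viewing_data`, and so on) are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def select_and_mark(viewer_id, viewing_data, viewing_vectors, gist_vectors):
    """Pick the unviewed content closest to the viewer's viewing vector and
    flip its viewing flag to viewed. Returns None when nothing is unviewed.

    viewing_data: {viewer_id: {content_id: viewed_flag}}  (assumed layout)
    """
    flags = viewing_data[viewer_id]
    unviewed = [cid for cid, viewed in flags.items() if not viewed]
    if not unviewed:
        return None
    best = max(
        unviewed,
        key=lambda cid: cosine(viewing_vectors[viewer_id], gist_vectors[cid]),
    )
    flags[best] = True  # S193: unviewed -> viewed
    return best


viewing_data = {"v1": {"a": True, "b": False, "c": False}}
viewing_vectors = {"v1": [1.0, 0.0]}
gist_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.2]}
print(select_and_mark("v1", viewing_data, viewing_vectors, gist_vectors))  # prints c
print(select_and_mark("v1", viewing_data, viewing_vectors, gist_vectors))  # prints b
```

Because the flag is updated on every call, repeated requests from the same viewer walk through the remaining unviewed contents in descending order of similarity.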

以上に説明したように、本実施形態に係る情報処理装置１０において、特徴量ベクトル算出部として特徴量算出部１０９は、未視聴のコンテンツに係る文章の主旨を示す第２主旨ベクトル又は当該文章の特徴を示す第２文章ベクトルを、視聴されたコンテンツに係る文章の主旨を示す主旨ベクトル又は当該文章の特徴を示す文章ベクトルとは別個に算出する。また、類似度算出部１０７は、第２主旨ベクトルもしくは第２文章ベクトルと視聴ベクトルとの類似度を算出する。また、本実施形態に係る情報処理装置１０は、コンテンツ選択部として、算出された類似度に基づいて未視聴のコンテンツから推薦コンテンツを選択するコンテンツ配信部１０ｃを備える。
この構成により、視聴単位毎に視聴されたコンテンツに係る文章の特徴を示す視聴ベクトルと、第２主旨ベクトル又は第２文章ベクトルとの類似度が高い文章に係る未視聴のコンテンツが推薦コンテンツとして選択される。そのため、推薦コンテンツとして、視聴単位毎に視聴されたコンテンツと特徴が類似する未視聴のコンテンツが選択される。従って、視聴単位毎の嗜好に沿った未視聴のコンテンツが推薦される。 As described above, in the information processing apparatus 10 according to the present embodiment, the feature amount calculation unit 109, serving as a feature vector calculation unit, calculates a second gist vector indicating the gist of a sentence related to unviewed content, or a second sentence vector indicating a feature of that sentence, separately from the gist vector indicating the gist of a sentence related to viewed content or the sentence vector indicating a feature of that sentence. The similarity calculation unit 107 calculates the similarity between the second gist vector or the second sentence vector and the viewing vector. The information processing apparatus 10 according to the present embodiment also includes, as a content selection unit, the content distribution unit 10c, which selects recommended content from the unviewed content based on the calculated similarity.
With this configuration, unviewed content is selected as recommended content when the second gist vector or second sentence vector of its sentence is highly similar to the viewing vector, which indicates the features of the sentences related to the content viewed in each viewing unit. Unviewed content whose features resemble those of the content viewed in each viewing unit is therefore selected as the recommended content, so that unviewed content matching the preferences of each viewing unit is recommended.

また、本実施形態に係る情報処理システム１は、当該情報処理装置１０と、受信装置２０を備える。受信装置２０は、コンテンツを受信し、視聴されたコンテンツを示す視聴情報を情報処理装置１０に送信し、情報処理装置から推薦コンテンツに関する推薦コンテンツ情報として、そのコンテンツデータ又はコンテンツ通知情報を受信する。
この構成により、受信装置２０において視聴されたコンテンツに文章の特徴を示す視聴ベクトルと、第２主旨ベクトル又は第２文章ベクトルとの類似度が高い文章に係る未視聴のコンテンツに関する推薦コンテンツ情報が、受信装置２０に提供される。受信装置２０の視聴者の嗜好に沿った未視聴のコンテンツに関する情報が、推薦コンテンツ情報として提供される。複雑な統計処理を伴わず比較的簡素な代数的な手法により推薦コンテンツ情報が提供可能なため、その実現に係るコストを低減することができる。 Further, the information processing system 1 according to the present embodiment includes the information processing apparatus 10 and a receiving apparatus 20. The receiving device 20 receives content, transmits viewing information indicating the viewed content to the information processing device 10, and receives the content data or content notification information as recommended content information related to the recommended content from the information processing device.
With this configuration, recommended content information is provided to the receiving device 20 for unviewed content whose sentence has a second gist vector or second sentence vector that is highly similar to the viewing vector, which indicates the features of the sentences related to the content viewed on the receiving device 20. Information about unviewed content that matches the preferences of the viewer of the receiving device 20 is thus provided as recommended content information. Because the recommended content information can be provided by a relatively simple algebraic method without complicated statistical processing, the cost of implementation can be reduced.

以上、図面を参照してこの発明の一実施形態について詳しく説明してきたが、具体的な構成は上述のものに限られることはなく、この発明の要旨を逸脱しない範囲内において様々な設計変更等をすることが可能である。例えば、上述の実施形態において説明した各構成は、任意に組み合わせることができる。 An embodiment of the present invention has been described above in detail with reference to the drawings. However, the specific configuration is not limited to the one described above, and various design changes and the like can be made without departing from the gist of the present invention. For example, the configurations described in the above embodiments can be combined arbitrarily.

上述の実施形態では、主にＤｏｃ２Ｖｅｃ部１０３が単語ベクトル、文章ベクトルを算出する際、ニューラルネットワークを用いる場合を例にしたが、これには限られない。互いに共通のベクトル空間内における特徴ベクトルであって、個々の単語の意味を示す単語ベクトル、個々の文章の意味を示す文章ベクトルを取得できれば、他の数理モデルが用いられてもよい。そのような数理モデルとして、例えば、メディア辞書変換演算モデル（Ｍｅｄｉａ−ｌｅｘｉｃｏｎＴｒａｎｓｆｏｒｍａｔｉｏｎＯｐｅｒａｔｏｒＭｏｄｅｌ）などが用いられてもよい。 In the above-described embodiment, the case where the Doc2Vec unit 103 mainly uses the neural network when calculating the word vector and the sentence vector is described as an example. However, the present invention is not limited to this. Other mathematical models may be used as long as they are feature vectors in a vector space common to each other and a word vector indicating the meaning of each word and a sentence vector indicating the meaning of each sentence can be acquired. As such a mathematical model, for example, a media dictionary conversion operation model (Media-lexicon Transformation Operator Model) may be used.

第４の実施形態では、情報処理装置１０から受信装置２０に提供される推薦コンテンツ情報が広告やＶＯＤコンテンツなどのコンテンツデータである場合を例にしたが、これには限られない。その他の種別の情報が提供されてもよい。例えば、そのコンテンツの概要、人間がそのコンテンツを特定するための情報、そのコンテンツの属性を示す情報などの、いずれか又は任意の組み合わせを含んだコンテンツ通知情報が用いられてもよい。コンテンツを特定するための情報とは、例えば、そのコンテンツのタイトル、サブタイトルなどが含まれる。そのコンテンツがＶＯＤコンテンツである場合には、そのＶＯＤコンテンツを送信可能とするサーバ装置のＵＲＬ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｏｒ）などのアクセス情報が含まれてもよい。そのコンテンツが放送予定の放送番組である場合には、放送局名もしくは放送チャンネル番号、放送時間などの情報が含まれてもよい。そのコンテンツの属性を示す情報には、例えば、ジャンル、出演者、原作者、などの情報が含まれてもよい。コンテンツ通知情報を提供対象とする場合、コンテンツ配信部１０ｃにおいて、コンテンツデータに代えて、もしくはコンテンツデータとともにコンテンツＩＤとコンテンツ通知情報を対応付けて記憶させておく。そして、コンテンツ配信部１０ｃは、コンテンツデータに代えてコンテンツ通知情報を受信装置２０に送信する。 In the fourth embodiment, the recommended content information provided from the information processing device 10 to the receiving device 20 is content data such as an advertisement or VOD content. However, the present invention is not limited to this. Other types of information may be provided. For example, content notification information including any one or any combination of an outline of the content, information for identifying the content by a person, and information indicating an attribute of the content may be used. The information for specifying content includes, for example, the title and subtitle of the content. When the content is VOD content, access information such as a URL (Uniform Resource Locator) of a server device that can transmit the VOD content may be included. When the content is a broadcast program scheduled to be broadcast, information such as a broadcast station name or a broadcast channel number and a broadcast time may be included. The information indicating the attribute of the content may include, for example, information such as genre, performer, and original author. When the content notification information is to be provided, the content distribution unit 10c stores the content ID and the content notification information in association with the content data instead of the content data or together with the content data. 
Then, the content distribution unit 10c transmits content notification information to the receiving device 20 instead of the content data.

なお、上述した実施形態において、形態素データに含まれる各単語、単語データに含まれる各単語、キーワードデータに含まれる各単語として、自立語が用いられ、その他の品詞の単語、つまり付属語が除外されてもよい。自立語とは、独立して特定の意味を有する単語である。例えば、日本語では、品詞が動詞、形容動詞、形容詞、動詞、又は名詞である単語である。英語では、品詞が、動詞、形容詞、副詞、又は名詞である単語である。これにより、独立して特定の意味をなさない単語である付属語が頻出する場合でも、その単語の影響を受けずに文章と単語との関係が解析可能となる。また、ある単語、例えば、動詞、形容動詞、形容詞、副詞などの変化形については、それぞれ区別されずに同一の単語として扱われてもよい。また、複数の単語からなる複合語は、その複合語を構成する各単語とは別個の単語として区別されてもよい。 In the above-described embodiments, independent words may be used as the words included in the morpheme data, the word data, and the keyword data, and words of other parts of speech, that is, attached words, may be excluded. An independent word is a word that has a specific meaning on its own. In Japanese, for example, it is a word whose part of speech is a verb, an adjectival verb, an adjective, or a noun. In English, it is a word whose part of speech is a verb, an adjective, an adverb, or a noun. As a result, even when attached words, which have no specific meaning on their own, appear frequently, the relationship between a sentence and its words can be analyzed without being affected by those words. Inflected forms of a word, such as those of a verb, adjectival verb, adjective, or adverb, may be treated as the same word without being distinguished from one another. A compound word composed of a plurality of words may be treated as a word distinct from each of the words that constitute it.
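The filtering described above can be sketched in Python as follows. The sketch assumes the tokens have already been tagged by some morphological analyzer, which is not shown; the `(surface, lemma, pos)` tuple layout and the part-of-speech labels are hypothetical, and the set below follows the English example in the text.

```python
# Parts of speech treated as independent words in the English example.
INDEPENDENT_POS = {"verb", "adjective", "adverb", "noun"}


def independent_lemmas(tagged_tokens):
    """Keep only independent words and return their base forms, so that
    inflected forms of the same word are counted as one word.

    tagged_tokens: iterable of (surface, lemma, pos) tuples (assumed layout).
    """
    return [lemma for _surface, lemma, pos in tagged_tokens if pos in INDEPENDENT_POS]


tokens = [
    ("The", "the", "determiner"),
    ("cats", "cat", "noun"),
    ("ran", "run", "verb"),
    ("quickly", "quickly", "adverb"),
    ("to", "to", "preposition"),
    ("ice cream", "ice cream", "noun"),  # compound kept as a single word
]
print(independent_lemmas(tokens))
# prints ['cat', 'run', 'quickly', 'ice cream']
```

Treating "ice cream" as one token reflects the option of handling a compound word separately from its constituent words; whether compounds are merged is decided upstream, at tokenization.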

なお、上述した実施形態において、特徴量算出部１０９が、コンテンツ毎に各キーワードベクトルを合成して主旨ベクトルを算出する演算、コンテンツ間で主旨ベクトル又は文章ベクトルを合成して視聴ベクトルを算出する演算が、総和である場合を例にしたが、合成に用いるそれぞれのベクトルによる寄与を統合する演算であれば総和に限られない。そのような演算は、例えば、平均であってもよい。なお、合成に用いられるベクトルの数が１個である場合には、そのベクトルが合成結果としてのベクトルとなる。 In the above-described embodiment, the feature amount calculation unit 109 calculates the gist vector by synthesizing each keyword vector for each content, and calculates the viewing vector by synthesizing the gist vector or sentence vector between the contents. However, the sum is not limited to the sum as long as it is an operation that integrates the contributions of the vectors used in the synthesis. Such a calculation may be, for example, an average. In addition, when the number of vectors used for synthesis is one, the vector becomes a vector as a synthesis result.
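A minimal Python sketch of the two combining operations mentioned above, the sum and the average, is given below. The function name `combine` and its `mode` parameter are hypothetical; the same helper could serve both for combining keyword vectors into a gist vector and for combining gist or sentence vectors into a viewing vector.

```python
def combine(vectors, mode="sum"):
    """Combine equal-length vectors element-wise by sum or by mean."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    total = [sum(v[i] for v in vectors) for i in range(dim)]
    if mode == "sum":
        return total
    if mode == "mean":
        return [x / len(vectors) for x in total]
    raise ValueError(f"unknown mode: {mode}")


keyword_vectors = [[1.0, 2.0], [3.0, 4.0]]
print(combine(keyword_vectors, mode="sum"))   # prints [4.0, 6.0]
print(combine(keyword_vectors, mode="mean"))  # prints [2.0, 3.0]
print(combine([[1.0, 2.0]]))                  # prints [1.0, 2.0]
```

The sum and the mean differ only by a positive scale factor, so if a scale-invariant measure such as cosine similarity is used downstream (an assumption, not stated in this passage), both choices give the same similarity values.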

また、上述した各実施形態に係る情報処理装置１０のハードウェア構成について説明する。図３７は、各実施形態に係る情報処理装置１０のハードウェア構成の一例を示すブロック図である。
情報処理装置１０は、ＣＰＵ１２１、記憶媒体１２２、ドライブ部１２３、入力部１２４、出力部１２５、ＲＯＭ１２６、ＲＡＭ１２７、補助記憶部１２８、及びインタフェース部１２９を含んで構成される。ＣＰＵ１２１、ドライブ部１２３、入力部１２４、出力部１２５、ＲＯＭ１２６、ＲＡＭ１２７、補助記憶部１２８、及びインタフェース部１２９は、バス（母線）１２０を介して相互に接続され、各種のデータが入出力可能である。 The hardware configuration of the information processing apparatus 10 according to each embodiment described above will be described. FIG. 37 is a block diagram illustrating an example of a hardware configuration of the information processing apparatus 10 according to each embodiment.
The information processing apparatus 10 includes a CPU 121, a storage medium 122, a drive unit 123, an input unit 124, an output unit 125, a ROM 126, a RAM 127, an auxiliary storage unit 128, and an interface unit 129. The CPU 121, the drive unit 123, the input unit 124, the output unit 125, the ROM 126, the RAM 127, the auxiliary storage unit 128, and the interface unit 129 are connected to one another via a bus 120 and can exchange various data.

ＣＰＵ１２１は、所定の制御プログラムをＲＯＭ１２６から読み出し、その制御プログラムで指示される処理を実行する。その処理において記憶媒体１２２、ＲＯＭ１２６、ＲＡＭ１２７、補助記憶部１２８のいずれか又はそれらの組において記憶された各種のデータが用いられることや、ＣＰＵ１２１は、実行している処理により生成されるデータを記憶媒体１２２、ＲＯＭ１２６、ＲＡＭ１２７、補助記憶部１２８のいずれか又はそれらの組に記憶させる。これにより、上述したデータ管理部１０ａならびにデータ処理部１０ｂ、又はデータ管理部１０ａ、データ処理部１０ｂならびにコンテンツ配信部１０ｃの機能が実現される。 The CPU 121 reads a predetermined control program from the ROM 126, and executes processing instructed by the control program. In the processing, various data stored in any one of the storage medium 122, ROM 126, RAM 127, auxiliary storage unit 128 or a combination thereof is used, and the CPU 121 stores data generated by the processing being executed. The data is stored in any one of the medium 122, the ROM 126, the RAM 127, the auxiliary storage unit 128, or a set thereof. Thereby, the functions of the data management unit 10a and the data processing unit 10b, or the data management unit 10a, the data processing unit 10b, and the content distribution unit 10c described above are realized.

記憶媒体１２２は、各種のデータを記憶する可搬記憶媒体である。記憶媒体１２２は、例えば、光磁気ディスク、フレキシブルディスク、フラッシュメモリなどである。
ドライブ部１２３は、記憶媒体１２２からの各種データの読み出し又は書き込みを行う機器を含んで構成されるデバイスである。
入力部１２４は、例えば、マウス、キーボードなどのユーザの操作を受け付け、その操作に基づく操作信号をＣＰＵ１２１に出力するデバイスである。
出力部１２５は、例えば、ディスプレイ、スピーカなどＣＰＵ１２１から入力されるデータを人間が認識できる形態で提示するデバイスである。
ＲＯＭ１２６には、例えば、所定の制御プログラムや所定の設定データを予め記憶させておく記憶媒体である。ＲＡＭ１２７には、例えば、ＣＰＵ１２１における処理に用いる各種のデータ、プログラム、ＣＰＵ１２１において生成された各種のデータを一時的に記憶する記憶媒体である。
補助記憶部１２８は、ＨＤＤ（Ｈａｒｄ−ｄｉｓｋＤｒｉｖｅ）、フラッシュメモリなどの記憶媒体であり、例えば、ＣＰＵ１２１の処理に用いる各種データ、ＣＰＵ１２１で生成された各種のデータを記憶する記憶媒体である。インタフェース部１２９は、通信インタフェースを有し、有線又は無線によりネットワークＮＷに接続される。 The storage medium 122 is a portable storage medium that stores various data. The storage medium 122 is, for example, a magneto-optical disk, a flexible disk, a flash memory, or the like.
The drive unit 123 is a device that includes equipment for reading various data from, or writing various data to, the storage medium 122.
The input unit 124 is a device, such as a mouse or a keyboard, that accepts a user operation and outputs an operation signal based on that operation to the CPU 121.
The output unit 125 is a device, such as a display or a speaker, that presents data input from the CPU 121 in a form that a human can recognize.
For example, the ROM 126 is a storage medium that stores a predetermined control program and predetermined setting data in advance. The RAM 127 is a storage medium that temporarily stores, for example, various data and programs used for processing in the CPU 121 and various data generated in the CPU 121.
The auxiliary storage unit 128 is a storage medium such as an HDD (Hard-Disk Drive) or a flash memory. For example, the auxiliary storage unit 128 is a storage medium that stores various data used for the processing of the CPU 121 and various data generated by the CPU 121. The interface unit 129 has a communication interface and is connected to the network NW by wire or wireless.

なお、ＣＰＵ１２１が実行する処理を指示するプログラムは、ＲＯＭ１２６に限られず、記憶媒体１２２や補助記憶部１２８に記憶されたプログラムであってもよいし、ネットワークＮＷからダウンロードされたプログラムであってもよい。そのダウンロードされたプログラムは、記憶媒体１２２や補助記憶部１２８などに記憶されてもよい。記憶媒体１２２、入力部１２４、及び出力部１２５、補助記憶部１２８のいずれか又はそれらの組は、情報処理装置１０のその他の部位とで着脱可能であってもよい。 Note that the program instructing the processing executed by the CPU 121 is not limited to the ROM 126, and may be a program stored in the storage medium 122 or the auxiliary storage unit 128, or a program downloaded from the network NW. . The downloaded program may be stored in the storage medium 122, the auxiliary storage unit 128, or the like. Any of the storage medium 122, the input unit 124, the output unit 125, the auxiliary storage unit 128, or a combination thereof may be detachable from other parts of the information processing apparatus 10.

なお、上述した実施形態における情報処理装置１０及び受信装置２０の一部をコンピュータで実現するようにしてもよい。その場合、この制御機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することによって実現してもよい。なお、ここでいう「コンピュータシステム」とは、情報処理装置１０又は受信装置２０に内蔵されたコンピュータシステムであって、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ−ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、短時間、動的にプログラムを保持するもの、その場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリのように、一定時間プログラムを保持しているものも含んでもよい。また上記プログラムは、前述した機能の一部を実現するためのものであってもよく、さらに前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるものであってもよい。
また、上述した実施形態における情報処理装置１０及び受信装置２０の一部又は全部を、ＬＳＩ（ＬａｒｇｅＳｃａｌｅＩｎｔｅｇｒａｔｉｏｎ）等の集積回路として実現してもよい。情報処理装置１０及び受信装置２０の各機能ブロックは個別にプロセッサ化してもよいし、一部又は全部を集積してプロセッサ化してもよい。また、集積回路化の手法はＬＳＩに限らず専用回路、又は汎用プロセッサで実現してもよい。また、半導体技術の進歩によりＬＳＩに代替する集積回路化の技術が出現した場合、当該技術による集積回路を用いてもよい。 A part of the information processing apparatus 10 and the receiving device 20 in the above-described embodiments may be realized by a computer. In that case, a program for realizing this control function may be recorded on a computer-readable recording medium, and the function may be realized by having a computer system read and execute the program recorded on the recording medium. Here, the “computer system” is a computer system built into the information processing apparatus 10 or the receiving device 20 and includes an OS and hardware such as peripheral devices. The “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. The “computer-readable recording medium” may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside the computer system serving as the server or client in that case. The program may realize only a part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.
A part or all of the information processing apparatus 10 and the receiving device 20 in the above-described embodiments may also be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the information processing apparatus 10 and the receiving device 20 may be made into an individual processor, or some or all of the blocks may be integrated into a single processor. The method of circuit integration is not limited to LSI and may be realized by a dedicated circuit or a general-purpose processor. If an integrated circuit technology that replaces LSI emerges as semiconductor technology advances, an integrated circuit based on that technology may be used.

以上、図面を参照してこの発明の一実施形態について詳しく説明してきたが、具体的な構成は上述のものに限られることはなく、この発明の要旨を逸脱しない範囲内において様々な設計変更等をすることが可能である。 An embodiment of the present invention has been described above in detail with reference to the drawings. However, the specific configuration is not limited to the one described above, and various design changes and the like can be made without departing from the gist of the present invention.

１…情報処理システム、１０…情報処理装置、１０ａ…データ管理部、１０ｂ…データ処理部、１０ｃ…コンテンツ配信部、２０…受信装置、１０１…文章管理部、１０２…分離部、１０３…Ｄｏｃ２Ｖｅｃ部、１０４…誤差算出部、１０５…Ｗｅｉｇｈｔ算出部、１０６…特徴量取得部、１０７…類似度算出部、１０８…キーワード管理部、１０９…特徴量算出部、１１０…主旨ベクトル管理部、１１１…視聴者管理部、１１３…推薦管理部、１１４…コンテンツ選択部、１２０…バス、１２１…ＣＰＵ、１２２…記憶媒体、１２３…ドライブ部、１２４…入力部、１２５…出力部、１２６…ＲＯＭ、１２７…ＲＡＭ、１２８…補助記憶部、１２９…インタフェース部、ＮＷ…ネットワーク DESCRIPTION OF SYMBOLS 1 ... Information processing system, 10 ... Information processing apparatus, 10a ... Data management unit, 10b ... Data processing unit, 10c ... Content distribution unit, 20 ... Receiving device, 101 ... Text management unit, 102 ... Separation unit, 103 ... Doc2Vec unit, 104 ... Error calculation unit, 105 ... Weight calculation unit, 106 ... Feature amount acquisition unit, 107 ... Similarity calculation unit, 108 ... Keyword management unit, 109 ... Feature amount calculation unit, 110 ... Gist vector management unit, 111 ... Viewer management unit, 113 ... Recommendation management unit, 114 ... Content selection unit, 120 ... Bus, 121 ... CPU, 122 ... Storage medium, 123 ... Drive unit, 124 ... Input unit, 125 ... Output unit, 126 ... ROM, 127 ... RAM, 128 ... Auxiliary storage unit, 129 ... Interface unit, NW ... Network

Claims

文章の特徴を示す文章ベクトルと、前記文章に含まれる単語毎の特徴を示す単語ベクトルを算出する特徴ベクトル算出部と、
前記文章ベクトルと前記単語ベクトルとの類似度を算出し、前記類似度に基づいて前記文章に含まれる一部の単語を抽出する類似度算出部と、
を備える情報処理装置。 An information processing apparatus comprising:
a feature vector calculation unit that calculates a sentence vector indicating a feature of a sentence and a word vector indicating a feature of each word included in the sentence; and
a similarity calculation unit that calculates a similarity between the sentence vector and the word vector and extracts some of the words included in the sentence based on the similarity.

前記特徴ベクトル算出部は、
前記一部の単語の単語ベクトルに基づいて前記文章の主旨を示す主旨ベクトルを算出する請求項１に記載の情報処理装置。 The information processing apparatus according to claim 1, wherein the feature vector calculation unit
calculates a gist vector indicating the gist of the sentence based on the word vectors of the some of the words.

前記特徴ベクトル算出部は、
視聴されたコンテンツに係る文章について前記主旨ベクトル又は文章ベクトルを算出し、
前記主旨ベクトル又は前記文章ベクトルを前記視聴されたコンテンツ間で合成して視聴ベクトルを算出する請求項２に記載の情報処理装置。 The information processing apparatus according to claim 2, wherein the feature vector calculation unit
calculates the gist vector or the sentence vector for a sentence related to viewed content, and
calculates a viewing vector by combining the gist vectors or the sentence vectors across the viewed content.

前記特徴ベクトル算出部は、
未視聴のコンテンツに係る文章の主旨を示す第２主旨ベクトル又は当該文章の特徴を示す第２文章ベクトルを算出し、
前記類似度算出部は、
前記第２主旨ベクトルもしくは第２文章ベクトルと前記視聴ベクトルとの類似度を算出し、
前記類似度に基づいて前記未視聴のコンテンツから推薦コンテンツを選択するコンテンツ選択部
を備える請求項３に記載の情報処理装置。 The information processing apparatus according to claim 3, wherein
the feature vector calculation unit calculates a second gist vector indicating the gist of a sentence related to unviewed content or a second sentence vector indicating a feature of that sentence,
the similarity calculation unit calculates a similarity between the second gist vector or the second sentence vector and the viewing vector, and
the information processing apparatus further comprises a content selection unit that selects recommended content from the unviewed content based on the similarity.

受信装置と請求項４に記載の情報処理装置とを備える情報処理システムにおいて、
前記受信装置は、
コンテンツを受信し、視聴されたコンテンツを示す視聴情報を前記情報処理装置に送信し、前記情報処理装置から前記推薦コンテンツに関する推薦コンテンツ情報を受信する
情報処理システム。 An information processing system comprising a receiving device and the information processing apparatus according to claim 4,
wherein the receiving device
receives content, transmits viewing information indicating viewed content to the information processing apparatus, and receives recommended content information related to the recommended content from the information processing apparatus.

情報処理装置における情報処理方法であって、
文章の特徴を示す文章ベクトルと、前記文章に含まれる単語の特徴を示す単語ベクトルを算出する特徴ベクトル算出過程と、
前記文章ベクトルと前記単語ベクトルとの類似度を算出し、前記類似度に基づいて前記文章に含まれる一部の単語を抽出する単語抽出過程と、
を有する情報処理方法。 An information processing method in an information processing apparatus, the method comprising:
a feature vector calculation process of calculating a sentence vector indicating a feature of a sentence and a word vector indicating a feature of a word included in the sentence; and
a word extraction process of calculating a similarity between the sentence vector and the word vector and extracting some of the words included in the sentence based on the similarity.

情報処理装置のコンピュータに
文章の特徴を示す文章ベクトルと、前記文章に含まれる単語の特徴を示す単語ベクトルを算出する特徴ベクトル算出手順、
前記文章ベクトルと前記単語ベクトルとの類似度を算出し、前記類似度に基づいて前記文章に含まれる単語の一部を抽出する単語抽出手順、
を実行させるためのプログラム。 A program for causing a computer of an information processing apparatus to execute:
a feature vector calculation procedure of calculating a sentence vector indicating a feature of a sentence and a word vector indicating a feature of a word included in the sentence; and
a word extraction procedure of calculating a similarity between the sentence vector and the word vector and extracting some of the words included in the sentence based on the similarity.