JP2018060374A - Information processing device, evaluation system and program

Info

Publication number: JP2018060374A
Application number: JP2016197553A
Authority: JP
Inventors: Kosuke Maruyama (丸山 耕輔); Atsushi Ito (伊藤 篤); Yuzuru Suzuki (鈴木 譲); Yoshiyuki Kono (河野 功幸)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2016-10-05
Filing date: 2016-10-05
Publication date: 2018-04-12
Anticipated expiration: 2036-10-05
Also published as: JP6855737B2

Abstract

PROBLEM TO BE SOLVED: To provide a system that supports the evaluation of an interview candidate based on non-verbal information in an interview.
SOLUTION: An information processing device comprises a motion detection unit 230 that identifies parts of a human body shown in video data and detects the motion of each identified part; a non-verbal information extraction unit 240 that, based on the motion of the body parts detected by the motion detection unit 230, extracts behaviors defined as evaluation targets for predetermined evaluation items; a classification unit 245 that classifies the behaviors extracted by the non-verbal information extraction unit 240 according to whether or not they were performed during an utterance; and a reaction evaluation unit 250 that evaluates each evaluation item, based on an evaluation criterion predetermined for that item, in accordance with the behaviors extracted by the non-verbal information extraction unit 240 and the classification by the classification unit 245.
SELECTED DRAWING: Figure 3

Description

The present invention relates to an information processing device, an evaluation system, and a program.

Systems that support the conduct of interviews and the evaluation of interview candidates have been proposed. Patent Document 1 discloses a recruitment and application support system that includes a management system on the recruiter side, a mobile phone or computer terminal with video and videophone functions on the applicant side, and a communication network connecting them. In the prior art described in that document, the management system has a web server that provides job information and application guidance and receives application information, a mail server that sends and receives mail, a staff terminal used for online interviews and for evaluating whether to hire, and information storage and processing means. The computer terminal or mobile phone is used to search job information, to apply with application information that includes text such as the applicant's name, to hold an online interview with the recruiter by videophone using the video function, and to receive the hiring decision by e-mail from the recruiter-side management system.

Patent Document 2 discloses an e-learning system that distributes interview-test teaching material content to learner terminals connected via a communication network. The system described in that document includes a teaching material file that stores interview-test teaching material content by industry and/or job type, and question processing means that searches the content stored in the teaching material file for interview-test questions corresponding to the industry and/or job type entered at the learner terminal, together with multiple example answers to each question and advice for answering, and sends them to the learner terminal.

Patent Document 1: JP 2007-265369 A
Patent Document 2: JP 2008-165123 A

In an interview, not only verbal information, such as the answers of the candidate (the person being interviewed) to the questions of the interviewer (the person conducting the interview), but also non-verbal information, such as the candidate's posture and gestures, is taken into account. Conventional interview support systems, however, have not evaluated non-verbal information.

An object of the present invention is to provide a system that supports the evaluation of an interview candidate based on non-verbal information in an interview.

The information processing device according to claim 1 of the present invention comprises:
a motion detection unit that identifies parts of a human body shown in video data and detects the motion of each identified part;
a behavior extraction unit that, based on the motion of the body parts detected by the motion detection unit, extracts behaviors defined as evaluation targets for predetermined evaluation items;
a classification unit that classifies the behaviors extracted by the behavior extraction unit according to whether or not each behavior was performed during an utterance; and
an evaluation unit that evaluates each evaluation item, based on an evaluation criterion predetermined for that item, in accordance with the behaviors extracted by the behavior extraction unit and the classification by the classification unit.
The information processing device according to claim 2 of the present invention is the information processing device according to claim 1, wherein the evaluation unit, based on the classification by the classification unit, evaluates at least some of the behaviors extracted by the behavior extraction unit differently depending on whether the behavior was performed during an utterance or while not speaking.
The information processing device according to claim 3 of the present invention is the information processing device according to claim 1 or claim 2, wherein the classification unit determines whether or not an utterance is in progress based on a mouth-moving motion among the behaviors extracted by the behavior extraction unit.
The information processing device according to claim 4 of the present invention is the information processing device according to claim 1 or claim 2, wherein the classification unit determines whether or not an utterance is in progress based on audio acquired together with the video data, or based on that audio and a mouth-moving motion among the behaviors extracted by the behavior extraction unit.
The evaluation system according to claim 5 of the present invention comprises:
acquisition means for acquiring video data;
behavior evaluation means for analyzing the video data acquired by the acquisition means and evaluating the behavior of a person shown in the video; and
output means for outputting the evaluation result of the behavior evaluation means,
wherein the behavior evaluation means comprises:
a motion detection unit that identifies parts of a human body shown in the video data and detects the motion of each identified part;
a behavior extraction unit that, based on the motion of the body parts detected by the motion detection unit, extracts behaviors defined as evaluation targets for predetermined evaluation items;
a classification unit that classifies the behaviors extracted by the behavior extraction unit according to whether or not each behavior was performed during an utterance; and
an evaluation unit that evaluates each evaluation item, based on an evaluation criterion predetermined for that item, in accordance with the behaviors extracted by the behavior extraction unit and the classification by the classification unit.
The evaluation system according to claim 6 of the present invention is the evaluation system according to claim 5, wherein the classification unit of the behavior evaluation means determines whether or not an utterance is in progress based on a mouth-moving motion among the behaviors extracted by the behavior extraction unit.
The evaluation system according to claim 7 of the present invention is the evaluation system according to claim 5, further comprising audio recording means for recording audio, wherein the classification unit of the behavior evaluation means determines whether or not an utterance is in progress based on the audio recorded by the audio recording means together with the video data acquired by the acquisition means, or based on that audio and a mouth-moving motion among the behaviors extracted by the behavior extraction unit.
The program according to claim 8 of the present invention causes a computer to function as:
motion detection means for identifying parts of a human body shown in video data and detecting the motion of each identified part;
behavior extraction means for extracting, based on the motion of the body parts detected by the motion detection means, behaviors defined as evaluation targets for predetermined evaluation items;
classification means for classifying the behaviors extracted by the behavior extraction means according to whether or not each behavior was performed during an utterance; and
evaluation means for evaluating each evaluation item, based on an evaluation criterion predetermined for that item, in accordance with the behaviors extracted by the behavior extraction means and the classification by the classification means.
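As an illustration only, the utterance determination described in claims 3 and 4 (mouth movement alone, audio alone, or audio combined with mouth movement) could be sketched as follows. The function name `is_speaking`, the per-frame mouth-opening values, the audio levels, and both thresholds are assumptions made for this sketch; the specification does not define them.

```python
from typing import Optional, Sequence


def is_speaking(
    mouth_openings: Optional[Sequence[float]] = None,
    audio_levels: Optional[Sequence[float]] = None,
    mouth_threshold: float = 0.02,   # hypothetical: minimum variation in mouth opening
    audio_threshold: float = 0.1,    # hypothetical: minimum mean audio level
) -> bool:
    """Decide whether the person is speaking within one short time window.

    Each cue that is supplied must agree; with no cue at all the answer is False.
    """
    cues = []
    if mouth_openings:
        # Mouth cue: the opening must vary enough within the window.
        cues.append(max(mouth_openings) - min(mouth_openings) > mouth_threshold)
    if audio_levels:
        # Audio cue: the mean level must exceed the threshold.
        cues.append(sum(audio_levels) / len(audio_levels) > audio_threshold)
    return bool(cues) and all(cues)
```

Passing only `mouth_openings` corresponds to the mouth-movement case, passing only `audio_levels` to the audio-only case, and passing both to the combined case. Requiring all supplied cues to agree is one possible design choice, not something the claims prescribe.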

According to the invention of claim 1, compared with a configuration that evaluates an interview candidate based on verbal information, evaluation based on behavior as non-verbal information, both during utterances and while not speaking, can be supported.
According to the invention of claim 2, compared with a configuration that evaluates an interview candidate based on verbal information, the distinction between behavior during an utterance and other behavior is reflected in the evaluation of behavior as non-verbal information, so that a more accurate evaluation can be made.
According to the invention of claim 3, whether or not an utterance is in progress can be determined efficiently by using a motion that is already extracted as behavior constituting non-verbal information.
According to the invention of claim 4, whether or not an utterance is in progress can be determined accurately by taking audio information into account.
According to the invention of claim 5, compared with a configuration that evaluates an interview candidate based on verbal information, evaluation based on behavior as non-verbal information, both during utterances and while not speaking, can be supported using the video acquired by the acquisition means.
According to the invention of claim 6, whether or not an utterance is in progress can be determined efficiently by using a motion that is already extracted as behavior constituting non-verbal information.
According to the invention of claim 7, whether or not an utterance is in progress can be determined accurately by taking audio information into account.
According to the invention of claim 8, compared with a configuration that evaluates an interview candidate based on verbal information, a computer executing the program of the present invention can support evaluation based on behavior as non-verbal information, both during utterances and while not speaking.

FIG. 1 is a diagram showing a configuration example of the non-verbal information evaluation system to which the present embodiment is applied.
FIG. 2 is a diagram showing a hardware configuration example of the information processing device.
FIG. 3 is a diagram showing the functional configuration of the information processing device.
FIG. 4 is a diagram showing a hardware configuration example of the terminal device.
FIG. 5 is a diagram showing the functional configuration of the terminal device.
FIG. 6 is a diagram explaining a technique for identifying regions related to a human body using inter-frame features; FIG. 6(A) shows a person sitting on a chair facing sideways in one frame of a video, and FIG. 6(B) shows the same person leaning forward in another frame of the video.
FIG. 7 is a diagram showing an example of an image captured by the video camera in the non-verbal information evaluation system of the present embodiment.

<Configuration of the non-verbal information evaluation system to which the present embodiment is applied>
FIG. 1 is a diagram showing a configuration example of the non-verbal information evaluation system to which the present embodiment is applied. As shown in FIG. 1, the non-verbal information evaluation system 10 according to the present embodiment includes a video camera 100 as a video acquisition device, an information processing device 200 as a video analysis device, and a terminal device 300 as an output device that outputs the results of analysis by the information processing device 200. The video camera 100 and the information processing device 200, and the information processing device 200 and the terminal device 300, are each connected via a network 20.

The network 20 is not particularly limited as long as it allows information to be communicated between the video camera 100 and the information processing device 200 and between the information processing device 200 and the terminal device 300; it may be, for example, the Internet or a LAN (Local Area Network). The communication lines used may be wired or wireless. The network 20 connecting the video camera 100 and the information processing device 200 and the network 20 connecting the information processing device 200 and the terminal device 300 may be the same network or different networks. Although not shown, relay devices such as gateways and hubs for connecting networks and communication lines are provided in the network 20 as appropriate.

The non-verbal information evaluation system 10 of the present embodiment analyzes video of the interview candidate under evaluation, extracts non-verbal information such as motions and facial expressions, and evaluates the candidate based on the extracted non-verbal information. The evaluation may cover a single candidate or several candidates at once. Evaluation items and evaluation content are set according to the purpose, format, and other conditions of the interview. The specific evaluation method of the present embodiment is described later.

In the system shown in FIG. 1, the video camera 100 is an example of means for acquiring video data and films the interview candidate under evaluation. In the present embodiment, the video of the candidate filmed by the video camera 100 is analyzed, and non-verbal information such as motions and facial expressions is extracted. The type and number of video cameras 100 are therefore chosen, according to the number and arrangement of candidates filmed at one time, so that each candidate's motions and expressions can be distinguished. For example, a single video camera 100 may film the candidate from the front, or multiple video cameras 100 may film the candidate from several angles (facing different directions). When several candidates are interviewed at the same time, a single video camera 100 may capture all of them in one frame, or multiple video cameras 100 may film each candidate individually. In the present embodiment, the video camera 100 also has a function of transmitting the captured video as digital data to the information processing device 200 via the network 20.

The information processing device 200 is an example of behavior evaluation means. It is a computer (server) that analyzes the video filmed by the video camera 100, extracts non-verbal information about the interview candidate under evaluation (hereinafter, the evaluation subject), and evaluates it. The information processing device 200 may consist of a single computer or of multiple computers connected to the network 20. In the latter case, the functions of the information processing device 200 of the present embodiment described below are realized by distributed processing across the multiple computers.

FIG. 2 is a diagram showing a hardware configuration example of the information processing device 200. As shown in FIG. 2, the information processing device 200 includes a CPU (Central Processing Unit) 201 serving as control means and arithmetic means, a RAM 202 and a ROM 203, an external storage device 204, and a network interface 205. The CPU 201 performs various kinds of control and arithmetic processing by executing programs stored in the ROM 203. The RAM 202 is used as working memory for control and arithmetic processing by the CPU 201. The ROM 203 stores programs executed by the CPU 201 and various data used in control. The external storage device 204 is realized by, for example, a magnetic disk device or a readable and writable non-volatile semiconductor memory, and stores programs that are loaded into the RAM 202 and executed by the CPU 201 as well as the results of arithmetic processing by the CPU 201. The network interface 205 connects to the network 20 and exchanges data with the video camera 100 and the terminal device 300. The configuration shown in FIG. 2 is merely one example of a hardware configuration that realizes the information processing device 200 with a computer. The specific configuration of the information processing device 200 is not limited to the example shown in FIG. 2 as long as it can realize the functions described below.

FIG. 3 is a diagram showing the functional configuration of the information processing device 200. As shown in FIG. 3, the information processing device 200 includes a video data acquisition unit 210, a region identification unit 220, a motion detection unit 230, a non-verbal information extraction unit 240, a classification unit 245, a reaction evaluation unit 250, and an output unit 260.
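To make the order of the functional units in FIG. 3 concrete, a minimal sketch could chain them as stages. The `Unit` class, `run_units`, and the `passthrough` placeholder are hypothetical names introduced here, and the strictly linear chaining is a simplification: in the description, the reaction evaluation unit 250 uses both the extracted behaviors and their classification.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Unit:
    numeral: int                      # reference numeral used in FIG. 3
    name: str
    process: Callable[[Any], Any]     # placeholder for the unit's real processing


def passthrough(data: Any) -> Any:
    """Stand-in for real processing; returns its input unchanged."""
    return data


UNITS: List[Unit] = [
    Unit(210, "video data acquisition unit", passthrough),
    Unit(220, "region identification unit", passthrough),
    Unit(230, "motion detection unit", passthrough),
    Unit(240, "non-verbal information extraction unit", passthrough),
    Unit(245, "classification unit", passthrough),
    Unit(250, "reaction evaluation unit", passthrough),
    Unit(260, "output unit", passthrough),
]


def run_units(units: List[Unit], video: Any) -> Tuple[Any, List[str]]:
    """Feed the data through each unit in order and record which units ran."""
    data, trace = video, []
    for unit in units:
        data = unit.process(data)
        trace.append(f"{unit.name} {unit.numeral}")
    return data, trace
```

Replacing each `passthrough` with a real function would turn the sketch into a working chain; the trace simply records the order in which the units were applied.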

The video data acquisition unit 210 is realized, for example, in the computer shown in FIG. 2 by the CPU 201 executing a program and controlling the network interface 205. The video data acquisition unit 210 receives video data from the video camera 100 via the network 20. The received video data is stored, for example, in the RAM 202 or the external storage device 204 shown in FIG. 2.

The region identification unit 220 is realized, for example, in the computer shown in FIG. 2 by the CPU 201 executing a program. The region identification unit 220 analyzes the video acquired by the video data acquisition unit 210 and identifies the regions showing the parts of the evaluation subject from which the downstream non-verbal information extraction unit 240 extracts non-verbal information. Specifically, it identifies the region showing the human body as a whole; regions showing the head, torso, arms, hands, fingers, and so on; regions showing the face, eyes, mouth, nose, ears, and so on of the head; regions showing the upper body and the lower body; and regions showing other feature points of the body (hereinafter, the whole body and its parts are referred to without distinction as parts or body parts). All predetermined parts may be identified, or only the parts used in the extraction by the downstream non-verbal information extraction unit 240 and the evaluation by the reaction evaluation unit 250 may be identified, depending on the content of that processing.
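The drawing list mentions identifying body-related regions from inter-frame features, but this section does not give the algorithm. Purely as an assumed stand-in, simple frame differencing can locate the area that changed between two grayscale frames. The function name `changed_region`, the nested-list frame format, and the threshold of 30 are all hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale frame as rows of pixel intensities (0-255)


def changed_region(
    prev: Frame, curr: Frame, threshold: int = 30
) -> Optional[Tuple[int, int, int, int]]:
    """Return the bounding box (x_min, y_min, x_max, y_max) of pixels whose
    intensity changed by more than `threshold` between two frames, or None."""
    changed = [
        (x, y)
        for y, (row_prev, row_curr) in enumerate(zip(prev, curr))
        for x, (a, b) in enumerate(zip(row_prev, row_curr))
        if abs(a - b) > threshold
    ]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs), max(ys))
```

A practical system would more likely rely on a trained person or pose detector; the sketch only shows how a difference between frames can narrow down where a moving body is.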

The motion detection unit 230 is realized, for example, in the computer shown in FIG. 2 by the CPU 201 executing a program. Based on the identification results of the region identification unit 220, the motion detection unit 230 identifies the body part shown in each region and detects the motion of each identified part. Specifically, it detects motions such as head movement, face orientation, movement of facial components (eyes, mouth, and so on), arm and leg movement, body orientation, and movement of the body (walking around and the like). All predetermined motions of predetermined parts may be detected, or only the motions of the parts used in the extraction by the downstream non-verbal information extraction unit 240 and the evaluation by the reaction evaluation unit 250 may be detected, depending on the content of that processing.

The non-verbal information extraction unit 240 is realized, for example, in the computer shown in FIG. 2 by the CPU 201 executing a program. Based on the motion of the parts detected by the motion detection unit 230, the non-verbal information extraction unit 240 extracts, from the evaluation subject's behavior, the behaviors (non-verbal information) used by the reaction evaluation unit 250 to evaluate each evaluation item. In other words, the non-verbal information extraction unit 240 is a behavior extraction unit that extracts behaviors defined as non-verbal information emitted by the evaluation subject. Specifically, it extracts, for example, nodding, turning the face in a particular direction or changing the direction of the face, changes in facial expression, moving the mouth to speak, yawning, movements made while dozing, signaling with the eyes, raising a hand, writing, typing on a keyboard, turning around, and jiggling a leg.
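As one hedged example of turning detected motion into an extracted behavior, nodding could be counted from a series of head pitch angles. The function `count_nods`, the sign convention (positive means the head is tilted down), and both thresholds are assumptions for this sketch and are not taken from the specification.

```python
from typing import Sequence


def count_nods(
    pitch_deg: Sequence[float],
    down_threshold: float = 10.0,    # hypothetical: tilt that counts as "head down"
    return_threshold: float = 3.0,   # hypothetical: tilt that counts as "head back up"
) -> int:
    """Count nods in a sequence of head pitch angles in degrees.

    A nod is counted when the head tilts down past `down_threshold`
    and then returns to at most `return_threshold`.
    """
    nods = 0
    head_down = False
    for angle in pitch_deg:
        if not head_down and angle >= down_threshold:
            head_down = True
        elif head_down and angle <= return_threshold:
            head_down = False
            nods += 1
    return nods
```

Using two thresholds gives hysteresis, so small jitter around a single threshold is not counted as repeated nods.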

The classification unit 245 is realized, for example, in the computer shown in FIG. 2 by the CPU 201 executing a program. The classification unit 245 classifies the behaviors extracted as non-verbal information by the non-verbal information extraction unit 240 into behaviors performed while the evaluation subject is the speaker and behaviors performed while the evaluation subject is not the speaker. That is, it associates speaker information, which identifies the speaker, with each behavior extracted as non-verbal information. The significance of some behaviors extracted as non-verbal information changes depending on whether they were performed while the evaluation subject was speaking or while someone else was speaking. In the present embodiment, therefore, the evaluation subject's behaviors during utterances are distinguished from behaviors while not speaking. This classification may be applied to all of the evaluation subject's behaviors, or only to behaviors whose significance changes depending on whether they were performed during the evaluation subject's own utterance or during another person's utterance.
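The classification can be pictured as labeling each extracted behavior by whether it overlaps a period in which the evaluation subject was speaking. This is a sketch under assumptions: the `Behavior` record, the time-interval representation in seconds, and the rule that any overlap counts as "during an utterance" are all introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Behavior:
    kind: str      # e.g. "nod" or "hand_gesture" (hypothetical labels)
    start: float   # start time in seconds
    end: float     # end time in seconds


def classify_by_speech(
    behaviors: Sequence[Behavior],
    speech_intervals: Sequence[Tuple[float, float]],
) -> List[Tuple[Behavior, bool]]:
    """Pair each behavior with True if it overlaps any interval in which
    the evaluation subject was speaking, otherwise False."""
    labelled = []
    for behavior in behaviors:
        during_speech = any(
            behavior.start < speech_end and speech_start < behavior.end
            for speech_start, speech_end in speech_intervals
        )
        labelled.append((behavior, during_speech))
    return labelled
```

For instance, a nod from 1.0 s to 1.5 s is labeled as not during an utterance when the subject spoke only from 3.5 s to 6.0 s, whereas a hand gesture from 4.0 s to 5.0 s is labeled as during an utterance.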

The reaction evaluation unit 250 is realized, for example, in the computer shown in FIG. 2 by the CPU 201 executing a program. The reaction evaluation unit 250 evaluates the evaluation subject's reactions by applying an evaluation criterion predetermined for each evaluation item to the behaviors extracted as non-verbal information for that item by the non-verbal information extraction unit 240. For behaviors classified by the classification unit 245, the evaluation also takes into account whether or not the behavior was performed during an utterance.
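A minimal sketch of such an evaluation is a lookup table keyed by evaluation item, behavior, and whether the behavior was performed during an utterance. The evaluation items, behavior labels, and point values in `CRITERIA` are invented for illustration; the specification defers its actual evaluation method to a later section.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical criteria: (evaluation item, behavior, during utterance?) -> points.
CRITERIA: Dict[Tuple[str, str, bool], float] = {
    ("attentiveness", "nod", False): 1.0,           # nodding while listening
    ("attentiveness", "yawn", False): -2.0,
    ("expressiveness", "hand_gesture", True): 1.0,  # gesturing while speaking
    ("composure", "leg_jiggling", True): -1.0,
    ("composure", "leg_jiggling", False): -0.5,
}


def evaluate(
    labelled_behaviors: Iterable[Tuple[str, bool]],
    criteria: Dict[Tuple[str, str, bool], float] = CRITERIA,
) -> Dict[str, float]:
    """Sum points per evaluation item for (behavior, during_utterance) pairs.

    Behaviors with no matching criterion contribute nothing."""
    scores: Dict[str, float] = {}
    for kind, during_utterance in labelled_behaviors:
        for (item, crit_kind, crit_flag), points in criteria.items():
            if crit_kind == kind and crit_flag == during_utterance:
                scores[item] = scores.get(item, 0.0) + points
    return scores
```

Here a nod counts toward attentiveness only when the subject is not speaking, while leg jiggling is penalized more heavily during an utterance, which mirrors the idea that the same behavior can be evaluated differently depending on the classification.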

The output unit 260 is realized, for example, in the computer shown in FIG. 2 by the CPU 201 executing a program and controlling the network interface 205. The output unit 260 transmits information on the evaluation results of the reaction evaluation unit 250 to the terminal device 300 via the network 20.

The terminal device 300 is an example of output means and is an information terminal (client) that outputs the evaluation results of the information processing device 200. A device equipped with image display means as its output means, such as a personal computer, tablet terminal, or smartphone, is used as the terminal device 300.

FIG. 4 is a diagram showing an example hardware configuration of the terminal device 300. As shown in FIG. 4, the terminal device 300 includes a CPU 301, a RAM 302, a ROM 303, a display device 304, an input device 305, and a network interface 306. The CPU 301 performs various control and arithmetic processing by executing programs stored in the ROM 303. The RAM 302 is used as working memory for the control and arithmetic processing performed by the CPU 301. The ROM 303 stores the programs executed by the CPU 301 and various data used for control. The display device 304 is, for example, a liquid crystal display, and displays images under the control of the CPU 301. The input device 305 is realized by an input device such as a keyboard, a mouse, or a touch sensor, and accepts input operations from the operator. For example, when the terminal device 300 is a tablet terminal or a smartphone, a touch panel combining a liquid crystal display and a touch sensor functions as both the display device 304 and the input device 305. The network interface 306 connects to the network 20 and transmits and receives data to and from the information processing apparatus 200. The configuration shown in FIG. 4 is merely one example of a hardware configuration that realizes the terminal device 300 with a computer. The specific configuration of the terminal device 300 is not limited to the example shown in FIG. 4, provided that it can realize the functions described below.

FIG. 5 is a diagram showing the functional configuration of the terminal device 300. As shown in FIG. 5, the terminal device 300 of the present embodiment includes an evaluation result acquisition unit 310, a display image generation unit 320, a display control unit 330, and an operation reception unit 340.

The evaluation result acquisition unit 310 is realized, for example, by the CPU 301 executing a program and controlling the network interface 306 in the computer shown in FIG. 4. The evaluation result acquisition unit 310 receives evaluation result data from the information processing apparatus 200 via the network 20. The received evaluation result data is stored, for example, in the RAM 302 of FIG. 4.

The display image generation unit 320 is realized, for example, by the CPU 301 executing a program in the computer shown in FIG. 4. The display image generation unit 320 generates an output image showing the evaluation results, based on the evaluation result data acquired by the evaluation result acquisition unit 310. The layout and display mode of the generated output image can be set according to the evaluation items, the content of the evaluation, and so on. The output image is described in detail later.

The display control unit 330 is realized, for example, by the CPU 301 executing a program in the computer shown in FIG. 4. The display control unit 330 causes the output image generated by the display image generation unit 320 to be displayed, for example, on the display device 304 of the computer shown in FIG. 4. The display control unit 330 also accepts commands relating to display on the display device 304 and performs control such as switching the display based on the accepted commands.

The operation reception unit 340 is realized, for example, by the CPU 301 executing a program in the computer shown in FIG. 4. The operation reception unit 340 accepts input operations that the operator performs with the input device 305. The display control unit 330 then controls the display of the output image and other content on the display device 304 in accordance with the operations accepted by the operation reception unit 340.

<Processing of the Region Identification Unit>
The processing performed by the region identification unit 220 of the information processing apparatus 200 will now be described. From the moving image captured by the video camera 100, the region identification unit 220 identifies the parts involved in the motion of the person shown in the moving image. Various existing image analysis techniques may be applied to this identification. For example, faces and smiles may be identified using existing methods already implemented in digital cameras and similar devices. A region showing a body part can also be identified from portions (regions) of a specific shape appearing in the moving image, or from the arrangement of several such portions. As a further example, identification may be based on inter-frame feature values. Specifically, inter-frame feature values are obtained from the differences between two or more consecutive frames of the moving image data. Examples of inter-frame feature values include color boundaries (edges), the amount of color change, and the direction and amount of movement of the regions these define. The inter-frame feature values are accumulated over a preset period and then classified and merged based on the distance and similarity between the feature values of each frame. This identifies the regions that change together in the moving image, and hence the regions showing body parts.

FIG. 6 is a diagram explaining a method of identifying a region related to the human body using inter-frame feature values. FIG. 6(A) shows one frame of the moving image, in which a person sits on a chair facing sideways, and FIG. 6(B) shows another frame, in which the same person has leaned forward. In this example, the region identification unit 220 first identifies ranges of similar color based on the color boundaries and amounts of change in FIG. 6(A). It then compares the frame of FIG. 6(A) with the frame of FIG. 6(B) and, from the direction and amount of movement of the corresponding color ranges, recognizes that several color ranges are moving together within the region 221 enclosed by the broken-line frame. It identifies this region 221 as the region showing the upper body. As FIGS. 6(A) and 6(B) show, the position and size of the region 221 change with the movement of the color ranges that make up the human body (upper body). Although only the two frames of FIGS. 6(A) and 6(B) are compared here, the region showing the human body may instead be identified from the result of comparing three or more frames and accumulating inter-frame feature values such as changes in the color ranges.
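
As a rough illustration of comparing two frames, the following Python sketch finds the bounding box of the pixels that changed between them. It is a simplified stand-in, not the method of the embodiment: it uses plain grayscale differences rather than tracking color ranges that move together, and the names `changed_region` and `threshold` and the list-of-rows frame representation are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale pixel rows; a stand-in for real video frames


def changed_region(prev: Frame, curr: Frame, threshold: int = 30
                   ) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the pixels whose value changed by
    more than `threshold` between two frames, or None if nothing changed."""
    # Rows that contain at least one changed pixel.
    rows = [y for y, (p, c) in enumerate(zip(prev, curr))
            if any(abs(a - b) > threshold for a, b in zip(p, c))]
    if not rows:
        return None
    # Columns that contain at least one changed pixel.
    cols = [x for x in range(len(curr[0]))
            if any(abs(prev[y][x] - curr[y][x]) > threshold
                   for y in range(len(curr)))]
    return rows[0], cols[0], rows[-1], cols[-1]
```

The returned box is only loosely analogous to the region 221 in FIG. 6. Accumulating differences over three or more frames would follow the same pattern.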

<Processing of the Motion Detection Unit>
The processing performed by the motion detection unit 230 will now be described. The motion detection unit 230 analyzes each region identified by the region identification unit 220 as showing a body part, determines specifically which part is shown, and detects the movement of each identified part. Various existing image analysis techniques may be applied to this detection. The movements detected are those that each identified part can make as a bodily motion. Examples include closing the eyes or opening the mouth, changes in gaze, turning the face up and down or left and right, bending and extending the elbows or swinging the arms, bending and extending the fingers or opening and closing the hands, bending and extending the waist or twisting the torso, bending and extending the knees or swinging the legs, and movement of the whole body by walking or the like. These movements are merely examples; the movements that the non-linguistic information evaluation system 10 of the present embodiment can detect are not limited to those listed above. In the present embodiment, the motion detection unit 230 may detect the movements of all parts identified as regions by the region identification unit 220, or it may limit detection to the movements needed to identify the behaviors extracted by the downstream non-linguistic information extraction unit 240. For example, if the non-linguistic information extraction unit 240 extracts only nodding, it suffices to detect head movements that turn the face up and down.
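
The option of limiting detection to the movements needed by the later extraction stage can be sketched as a simple lookup. The table `REQUIRED_MOTIONS`, the function `motions_to_detect`, and the movement labels are hypothetical names chosen for this example; only the pairing of nodding with vertical head movement is taken from the paragraph above, and the other entries are illustrative.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping from a behavior to the part movements needed to recognize it.
REQUIRED_MOTIONS: Dict[str, Set[str]] = {
    "nod": {"head_pitch"},
    "raise_hand": {"arm_raise"},
    "speech": {"mouth_open_close"},
}


def motions_to_detect(behaviors: Iterable[str]) -> Set[str]:
    """Return the union of the movements needed for the requested behaviors."""
    needed: Set[str] = set()
    for name in behaviors:
        needed |= REQUIRED_MOTIONS.get(name, set())
    return needed
```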

<Processing of the Non-Linguistic Information Extraction Unit>
The processing performed by the non-linguistic information extraction unit 240 will now be described. Based on the movements of the parts detected by the motion detection unit 230, the non-linguistic information extraction unit 240 extracts, as non-linguistic information, meaningful behaviors that the evaluation target person performed consciously or unconsciously. For example, it extracts nodding from a movement that turns the face up and down, speaking or yawning from a movement of the mouth, and raising a hand from a movement that lifts the arm. Non-linguistic information is not extracted from the movement detected by the motion detection unit 230 alone. The extraction also takes into account information such as the movement of the same part before and after the detected movement, the movements of surrounding parts and of other persons, and the scene and context (background) in which the movement was detected. As a specific example, when the face turns up and down repeatedly within a specific time, the movement is extracted as nodding. When the face turns upward and, after some time has passed, moves back down to its original position, the movement is extracted as looking up in thought. When the face turns downward and remains so for some time, the movement is extracted as an indication of dozing. These behaviors and the information taken into account are merely examples; the behaviors that the non-linguistic information evaluation system 10 of the present embodiment can extract as non-linguistic information, and the information it can take into account, are not limited to those described above.
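
The three head-movement cases above can be sketched as a small rule set over a time series of face pitch angles. This is a minimal sketch under stated assumptions, not the extraction logic of the embodiment: the function name `classify_head_motion`, the sign convention (positive pitch means the face is turned up), and the thresholds `tilt_deg`, `nod_window_s`, and `hold_s` are all hypothetical, and the contextual information mentioned in the text is ignored.

```python
from typing import List, Tuple


def classify_head_motion(samples: List[Tuple[float, float]],
                         tilt_deg: float = 10.0,
                         nod_window_s: float = 1.5,
                         hold_s: float = 3.0) -> str:
    """Classify a sequence of (time_s, pitch_deg) samples.

    Positive pitch means the face is turned up, negative means down.
    """
    start = None   # time at which the face first left the neutral zone
    direction = 0  # +1 = up, -1 = down
    for t, pitch in samples:
        if start is None:
            if abs(pitch) >= tilt_deg:
                start, direction = t, (1 if pitch > 0 else -1)
        elif abs(pitch) < tilt_deg:  # face has returned to neutral
            held = t - start
            if direction < 0 and held <= nod_window_s:
                return "nod"
            if direction > 0 and held >= hold_s:
                return "looking_up"
            if direction < 0 and held >= hold_s:
                return "dozing"
            return "unclassified"
    # The face never returned: a long downward hold is treated as dozing.
    if start is not None and direction < 0 and samples[-1][0] - start >= hold_s:
        return "dozing"
    return "unclassified"
```

A quick downward dip counts as a nod, while a long upward or downward hold is read as thinking or dozing respectively. Any real system would tune these thresholds and combine them with the surrounding context.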

<Processing of the Classification Unit>
The processing performed by the classification unit 245 will now be described. The classification unit 245 first detects, among the behaviors extracted as non-linguistic information by the non-linguistic information extraction unit 240, the behavior that represents speech. For example, continuous opening and closing of the mouth can be detected as speech. Next, the classification unit 245 classifies the other behaviors extracted as non-linguistic information by the non-linguistic information extraction unit 240 into behaviors performed while the evaluation target person is the speaker and behaviors performed while the evaluation target person is not the speaker. The significance of an extracted behavior changes with this classification. For example, gesturing during one's own utterance can serve to supplement what is being said, whereas gesturing during another person's utterance can indicate that one is not paying attention to that person. Likewise, touching the hair or shifting the gaze or the direction of the face does not lead to a very negative evaluation if it occurs during one's own utterance, but can be grounds for a strongly negative evaluation if it occurs during another person's utterance. Some behaviors extracted as non-linguistic information may be ones whose significance does not change, so that whether the evaluation target person is speaking can be ignored. Which behaviors of the evaluation target person are classified according to whether they occur during that person's utterance may therefore be set in advance according to the purpose and format of the interview.
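
The two-step classification above can be sketched as follows, assuming that the speech detection step has already produced a list of time intervals in which the evaluation target person was speaking. The `Behavior` record, the function `classify_by_speech`, and the `context_sensitive` set (the behaviors selected in advance for classification) are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Behavior:
    name: str      # e.g. "gesture", "touch_hair"
    time_s: float  # time at which the behavior was observed


def classify_by_speech(behaviors: List[Behavior],
                       speech_intervals: List[Tuple[float, float]],
                       context_sensitive: Set[str]) -> List[Optional[bool]]:
    """For each behavior return True (during own speech), False (not during
    own speech), or None (behavior is not subject to classification)."""
    labels: List[Optional[bool]] = []
    for b in behaviors:
        if b.name not in context_sensitive:
            labels.append(None)
        else:
            labels.append(any(start <= b.time_s <= end
                              for start, end in speech_intervals))
    return labels
```

Returning `None` for behaviors outside the preset set reflects the option of classifying only those behaviors whose significance depends on who is speaking.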

Here, the behavior of a particular evaluation target person has been classified according to whether it occurred during that person's own utterance. The behavior may additionally be classified according to whether it occurred during the interviewer's utterance, during another evaluation target person's utterance (when several interview candidates are interviewed at the same time), or while no one was speaking. For example, when several evaluation target persons (interview candidates) are interviewed together, the evaluation target person who is speaking is identified. That person's behavior is evaluated as behavior during speech, and the behavior of the other evaluation target persons is evaluated differently, as behavior while not speaking.
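
The finer classification described above can be sketched by labeling a point in time with the speaking context as seen from one evaluation target person. The `Context` enumeration, the `Turn` tuple layout, and the function `speech_context` are assumptions made for this illustration, as is the premise that speaker turns are already available as time intervals.

```python
from enum import Enum
from typing import List, Set, Tuple


class Context(Enum):
    SELF = "self_speaking"
    INTERVIEWER = "interviewer_speaking"
    OTHER_CANDIDATE = "other_candidate_speaking"
    SILENCE = "no_one_speaking"


Turn = Tuple[float, float, str]  # (start_s, end_s, speaker_id)


def speech_context(time_s: float, subject: str, turns: List[Turn],
                   interviewers: Set[str]) -> Context:
    """Return who is speaking at `time_s`, from the point of view of `subject`."""
    for start, end, speaker in turns:
        if start <= time_s <= end:
            if speaker == subject:
                return Context.SELF
            if speaker in interviewers:
                return Context.INTERVIEWER
            return Context.OTHER_CANDIDATE
    return Context.SILENCE
```

The same moment yields different contexts for different candidates, which is what allows the speaking candidate and the listening candidates to be evaluated differently.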

<Processing of the Reaction Evaluation Unit>
The processing performed by the reaction evaluation unit 250 will now be described. The reaction evaluation unit 250 evaluates the reaction of the evaluation target person (interview candidate) in the interview based on the non-linguistic information extracted by the non-linguistic information extraction unit 240. For example, as described above, a low (poor) evaluation value is given when the hands move or the direction of the face changes frequently at times other than the evaluation target person's own utterance. When the response to a question begins earlier than a predetermined first reference time (that is, the response is quick), a high (good) evaluation value is given. When it begins later than a predetermined second reference time, which is longer than the first reference time (that is, the response is slow), a low evaluation value is given. Various other evaluation items and evaluation criteria may be set according to the purpose of the interview. For example, a low evaluation value may be given when the evaluation target person looks away from the speaker, looks downward, or moves the hands or feet in ways unrelated to the content or progress of the interview. The overall evaluation of the evaluation target person for the interview is obtained, for example, by summing the evaluation values given each time a behavior extracted as non-linguistic information appears during the interview.
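
The two-threshold response-time rule and the summation of evaluation values can be sketched as follows. The reference times of 2 and 6 seconds and the point values of +1, 0, and -1 are placeholders chosen for this example, since the text does not specify them; `response_time_score` and `overall_score` are likewise hypothetical names.

```python
from typing import Iterable


def response_time_score(delay_s: float,
                        first_ref_s: float = 2.0,
                        second_ref_s: float = 6.0) -> int:
    """Score a response by its delay: quick is good, slow is bad."""
    if first_ref_s >= second_ref_s:
        raise ValueError("the second reference time must exceed the first")
    if delay_s < first_ref_s:
        return 1    # quicker than the first reference time
    if delay_s > second_ref_s:
        return -1   # slower than the second reference time
    return 0        # in between: neither rewarded nor penalized


def overall_score(values: Iterable[int]) -> int:
    """Sum the evaluation values given for each observed behavior."""
    return sum(values)
```

For instance, delays of 1, 4, and 7 seconds would contribute +1, 0, and -1 under these placeholder settings.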

The reaction evaluation unit 250 may perform not only a binary evaluation of whether a behavior extracted as non-linguistic information appeared, but also a multi-level evaluation expressing how strong the reaction was. For multi-level evaluation, the types of behavior defined (extracted) as the non-linguistic information used for each evaluation item (hereinafter, reaction behaviors) and the ways in which those reaction behaviors may appear are set according to the evaluation item. In other words, the same reaction behavior is evaluated differently depending on how it appears. For example, a particular reaction behavior performed once is evaluated differently from one that is repeated several times or continues for a certain time or longer.

In multi-level evaluation, the degree of the evaluation may further be determined from, for example, the type, frequency of appearance, and duration of the reaction behavior for the evaluation item. As an example, suppose that nodding is defined as the reaction behavior used to evaluate the degree of concentration on the interview. In this case, the settings may give a higher evaluation value to nodding repeated several times, or to nodding with a large movement, than to a single light nod.
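
The nodding example can be sketched as a small multi-level scoring function. The function name `nod_score`, the 20-degree boundary for a large nod, and the 0 to 3 scale are assumptions made for illustration; the text only states that repeated or large nods may score higher than a single light nod.

```python
def nod_score(repetitions: int, amplitude_deg: float,
              large_deg: float = 20.0) -> int:
    """Return a multi-level concentration score (0 to 3) for a nodding behavior."""
    if repetitions <= 0:
        return 0                    # no nod observed
    score = 1                       # a single light nod
    if repetitions >= 2:
        score += 1                  # repeated nodding
    if amplitude_deg >= large_deg:
        score += 1                  # nodding with a large movement
    return score
```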

<Application Example>
FIG. 7 is a diagram showing an example of an image captured by the video camera 100 in the non-linguistic information evaluation system 10 of the present embodiment. In the example of FIG. 7, one evaluation target person (interview candidate) appears on the screen. It is assumed here that the interview is between one interviewer and one evaluation target person, so the speaker is always either the interviewer or the evaluation target person. In the illustrated example, the evaluation target person faces the front (toward the video camera 100), and the interviewer is not in the image. In such a scene, movements made while the evaluation target person is not the speaker, and movements that turn the gaze or face away from the front or shift them frequently, are major factors that lower the evaluation. When there are several interviewers, on the other hand, the evaluation target person can be expected to change the direction of the gaze and face often, whether speaking or not, in order to look at each interviewer. The evaluation criteria therefore need to be set in various ways according to the format of the interview and other conditions.
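
One way to picture criteria that depend on the interview format is a lookup table keyed by format, behavior, and speaking state. The table `CRITERIA`, the format names, the behavior label `gaze_shift`, and every point value below are hypothetical; they illustrate only the idea that a gaze shift may be penalized in a one-on-one interview and tolerated in a panel interview.

```python
from typing import Dict

# Hypothetical point values; real criteria would be set per interview format.
CRITERIA: Dict[str, Dict[str, Dict[str, int]]] = {
    "one_on_one": {"gaze_shift": {"speaking": -1, "not_speaking": -3}},
    "panel": {"gaze_shift": {"speaking": 0, "not_speaking": 0}},
}


def behavior_score(interview_format: str, behavior: str, speaking: bool) -> int:
    """Look up the evaluation value for a behavior; unlisted behaviors score 0."""
    key = "speaking" if speaking else "not_speaking"
    return CRITERIA[interview_format].get(behavior, {}).get(key, 0)
```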

<Other Configuration Examples>
The non-linguistic information evaluation system 10 of the present embodiment has been described above, but the specific configuration of the embodiment is not limited to the one described. For example, in the configuration above, the information processing apparatus 200 processes the moving image acquired by the video camera 100, and the terminal device 300, serving as output means, displays the resulting evaluation. Alternatively, the information processing apparatus 200 may analyze a moving image of an interview that was captured separately and stored in a storage device, and evaluate the interview candidate from it.

In the embodiment above, whether the evaluation target person is the speaker is determined from the movement of that person's mouth. Alternatively, audio data may be recorded together with the moving image, and whether the evaluation target person is speaking may be determined by considering, in addition to the mouth movement in the moving image, the speech audio and the linguistic information obtained by voice analysis. Comparing the moving image with the audio data can improve the accuracy with which the speaker is identified. The speaker may also be identified from the speech audio, or from linguistic information obtained from it, alone, without using the mouth movement in the moving image.
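
The three alternatives above (mouth movement only, audio only, or both) can be sketched in one function. The name `is_speaking`, the normalized `audio_level` input, the 0.1 threshold, and the rule that both cues must agree when both are available are assumptions made for this example; the text does not specify how the two cues are combined.

```python
from typing import Optional


def is_speaking(mouth_moving: Optional[bool] = None,
                audio_level: Optional[float] = None,
                level_threshold: float = 0.1) -> bool:
    """Decide whether the evaluation target person is speaking.

    Pass only `mouth_moving` for video-only judgment, only `audio_level`
    for audio-only judgment, or both to require that the cues agree.
    """
    voiced = None if audio_level is None else audio_level >= level_threshold
    if mouth_moving is None and voiced is None:
        raise ValueError("at least one cue is required")
    if mouth_moving is None:
        return voiced
    if voiced is None:
        return mouth_moving
    return mouth_moving and voiced
```

Requiring agreement rejects a moving mouth with no sound (for example, a yawn) and sound with a still mouth (for example, someone else talking), which is one plausible way the comparison could improve accuracy.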

When audio data is used to determine whether the evaluation target person is speaking, a microphone provided in the video camera 100, for example, can be used as the audio recording means. The recorded audio is sent to the information processing apparatus 200 together with the moving image. In the information processing apparatus 200, the moving image data acquisition unit 210 functions as an audio acquisition unit and acquires the audio together with the moving image. When linguistic information obtained by voice analysis is used in addition to the speech audio itself, the information processing apparatus 200 is provided with a voice analysis unit. The voice analysis unit is realized, for example, by the CPU 201 executing a program in the computer shown in FIG. 2. Various existing analysis techniques may be applied as the specific voice analysis technique.

Furthermore, in the present embodiment, the information processing apparatus 200 may also serve as the output means. That is, instead of separating the information processing apparatus 200 and the terminal device 300, the information processing apparatus 200 itself may, for example, include a display device such as a liquid crystal display and display the evaluation results.

Description of Reference Signs: 10 ... non-linguistic information evaluation system, 20 ... network, 100 ... video camera, 200 ... information processing apparatus, 201 ... CPU, 202 ... RAM, 203 ... ROM, 204 ... external storage device, 205 ... network interface, 210 ... moving image data acquisition unit, 220 ... region identification unit, 230 ... motion detection unit, 240 ... non-linguistic information extraction unit, 245 ... classification unit, 250 ... reaction evaluation unit, 260 ... output unit, 300 ... terminal device, 301 ... CPU, 302 ... RAM, 303 ... ROM, 304 ... display device, 305 ... input device, 306 ... network interface, 310 ... evaluation result acquisition unit, 320 ... display image generation unit, 330 ... display control unit, 340 ... operation reception unit

Claims

An information processing apparatus comprising:
a motion detection unit that identifies a part of a human body shown in moving image data and detects a motion of the identified part;
a behavior extraction unit that extracts, based on the motion of the part of the human body detected by the motion detection unit, a behavior defined as an evaluation target for a predetermined evaluation item;
a classification unit that classifies the behavior extracted by the behavior extraction unit according to whether the behavior occurred during speech; and
an evaluation unit that evaluates each evaluation item based on an evaluation criterion predetermined for that evaluation item, according to the behavior extracted by the behavior extraction unit and the classification by the classification unit.

The information processing apparatus according to claim 1, wherein the evaluation unit, based on the classification by the classification unit, evaluates at least some of the behaviors extracted by the behavior extraction unit differently depending on whether the behavior was performed during speech or while not speaking.

The information processing apparatus according to claim 1 or claim 2, wherein the classification unit determines whether speech is in progress based on a mouth-moving motion among the behaviors extracted by the behavior extraction unit.

The information processing apparatus according to claim 1 or claim 2, wherein the classification unit determines whether speech is in progress based on audio acquired together with the moving image data, or based on that audio and a mouth-moving motion among the behaviors extracted by the behavior extraction unit.

An evaluation system comprising:
acquisition means for acquiring moving image data;
behavior evaluation means for analyzing the moving image data acquired by the acquisition means and evaluating the behavior of a person shown in the moving image; and
output means for outputting an evaluation result produced by the behavior evaluation means,
wherein the behavior evaluation means includes:
a motion detection unit that identifies a part of a human body shown in the moving image data and detects a motion of the identified part;
a behavior extraction unit that extracts, based on the motion of the part of the human body detected by the motion detection unit, a behavior defined as an evaluation target for a predetermined evaluation item;
a classification unit that classifies the behavior extracted by the behavior extraction unit according to whether the behavior occurred during speech; and
an evaluation unit that evaluates each evaluation item based on an evaluation criterion predetermined for that evaluation item, according to the behavior extracted by the behavior extraction unit and the classification by the classification unit.

The evaluation system according to claim 5, wherein the classification unit of the behavior evaluation means determines whether speech is in progress based on a mouth-moving motion among the behaviors extracted by the behavior extraction unit.

The evaluation system according to claim 5, further comprising audio recording means for recording audio,
wherein the classification unit of the behavior evaluation means determines whether speech is in progress based on the audio recorded by the audio recording means together with the moving image data acquired by the acquisition means, or based on that audio and a mouth-moving motion among the behaviors extracted by the behavior extraction unit.

A program causing a computer to function as:
motion detection means for identifying a part of a human body shown in moving image data and detecting a motion of the identified part;
behavior extraction means for extracting, based on the motion of the part of the human body detected by the motion detection means, a behavior defined as an evaluation target for a predetermined evaluation item;
classification means for classifying the behavior extracted by the behavior extraction means according to whether the behavior occurred during speech; and
evaluation means for evaluating each evaluation item based on an evaluation criterion predetermined for that evaluation item, according to the behavior extracted by the behavior extraction means and the classification by the classification means.