WO2022209416A1

WO2022209416A1 - Information processing device, information processing system, and information processing method

Info

Publication number: WO2022209416A1
Application number: PCT/JP2022/006852
Authority: WO
Inventors: 拓哉岸本; 乃愛金子; 咲湖安川; 拓田中; 正範勝; 厚志大久保
Original assignee: ソニーグループ株式会社
Priority date: 2021-03-30
Filing date: 2022-02-21
Publication date: 2022-10-06

Abstract

A server device (20) that is an example of one form of an information processing device according to the present disclosure comprises: an input unit (21) to which voices and images of a patient and a doctor are inputted; an extraction unit (for example, a processing unit (22)) that extracts, from the voices and images of the patient and the doctor, a feature amount relating to communication between the patient and the doctor; an estimation unit (for example, the processing unit (22)) that estimates a satisfaction level, dissatisfaction level, or anxiety level of the patient on the basis of the feature amount; and an output unit (23) that outputs the satisfaction level, dissatisfaction level, or anxiety level of the patient.

Description

情報処理装置、情報処理システム及び情報処理方法Information processing device, information processing system and information processing method

　本開示は、情報処理装置、情報処理システム及び情報処理方法に関する。 The present disclosure relates to an information processing device, an information processing system, and an information processing method.

　精神疾患患者などの医療面談で患者状態を診察し、面談による認知療法を行う場合や薬物治療を継続して行う場合、患者は継続的に医師と面談する必要がある。精神疾患患者の場合には、診察時の患者の満足度と治療継続意欲に関連性が高い。つまり、診察時の患者の満足度により、患者が継続して来院または面談する行動に変化があり、診察時の患者の満足度は治療効果に影響する。患者の治療継続意欲が低下すると、治療の中断により治療効果は低下する。 When examining the patient's condition through medical interviews such as those with mental illness, and when conducting cognitive therapy through interviews or continuing drug treatment, the patient must continue to have an interview with the doctor. In the case of psychiatric patients, there is a strong correlation between patient satisfaction at the time of consultation and willingness to continue treatment. In other words, depending on the patient's satisfaction at the time of consultation, there is a change in the patient's behavior of continuing to visit the hospital or have an interview, and the patient's satisfaction at the time of consultation affects the therapeutic effect. When the patient's willingness to continue treatment declines, the treatment effect declines due to interruption of treatment.

米国特許第９０５８８１６号明細書U.S. Pat. No. 9,058,816 米国特許第１００６８６７０号明細書U.S. Patent No. 10068670

　通常、医師は、問診や観察する患者の動態／表出反応（理学所見）によって、患者の満足／不満を推し量っているが、正確に把握することは困難である。近年、患者の満足／不満の定量化をしたいという医師のニーズが顕在化しているが、患者の満足／不満の推定は既存アルゴリズムの転用では難しい。このため、現状、医師が患者の満足／不満を把握することは困難であり、結果として、患者の治療継続意欲が低下し、治療の中断により治療効果が低下してしまう。　Physicians usually estimate the patient's satisfaction/dissatisfaction by interviewing and observing the patient's dynamics/expressive reactions (physical findings), but it is difficult to accurately grasp it. In recent years, physicians' needs to quantify patient satisfaction/dissatisfaction have become apparent, but it is difficult to estimate patient satisfaction/dissatisfaction by diversion of existing algorithms. Therefore, at present, it is difficult for a doctor to grasp a patient's satisfaction/dissatisfaction.

　そこで、本開示では、治療効果を向上させることが可能な情報処理装置、情報処理システム及び情報処理方法を提案する。 Therefore, the present disclosure proposes an information processing device, an information processing system, and an information processing method capable of improving the therapeutic effect.

　本開示の実施形態に係る情報処理装置は、患者及び医師の音声と画像を入力する入力部と、前記患者及び前記医師の音声と画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出する抽出部と、前記特徴量に基づき、前記患者の満足度、不満度又は不安度を推定する推定部と、前記患者の満足度、不満度又は不安度を出力する出力部と、を備える。 An information processing apparatus according to an embodiment of the present disclosure includes an input unit for inputting voices and images of a patient and a doctor, and extracts feature amounts related to communication between the patient and the doctor from the voices and images of the patient and the doctor. an estimating unit for estimating the degree of satisfaction, dissatisfaction or anxiety of the patient based on the feature amount; and an output unit for outputting the degree of satisfaction, dissatisfaction or anxiety of the patient.

　本開示の実施形態に係る情報処理装置は、医師の音声を入力する入力部と、前記医師の音声から、前記医師と患者とのコミュニケーションに関する前記医師の音声特徴量を抽出する抽出部と、前記音声特徴量と、前記医師の返答期待時間に関する情報とに基づいて、前記返答期待時間を学習する切り出し学習部と、を備える。 An information processing apparatus according to an embodiment of the present disclosure includes an input unit for inputting a doctor's voice, an extraction unit for extracting the doctor's voice feature amount related to communication between the doctor and the patient from the doctor's voice, and the a clipping learning unit that learns the expected response time based on the voice feature amount and information about the expected response time of the doctor.

　本開示の実施形態に係る情報処理装置は、患者及び医師の音声と画像を入力する入力部と、前記患者及び医師の音声と画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出する抽出部と、前記特徴量と、前記患者の満足度、不満度又は不安度に関するアンケートとに基づいて、前記患者の満足度、不満度又は不安度を学習する学習部と、を備える。 An information processing apparatus according to an embodiment of the present disclosure extracts feature amounts related to communication between the patient and the doctor from an input unit for inputting voices and images of the patient and the doctor, and from the voices and images of the patient and the doctor. an extraction unit; and a learning unit that learns the patient's satisfaction level, dissatisfaction level, or anxiety level based on the feature quantity and a questionnaire regarding the patient's satisfaction level, dissatisfaction level, or anxiety level.

　本開示の実施形態に係る情報処理システムは、患者及び医師の音声及び画像を取得する情報取得装置と、前記患者及び医師の音声及び画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出する抽出部と、前記特徴量に基づき、前記患者の満足度、不満度又は不安度を推定する推定部と、前記患者の満足度、不満度又は不安度を表示する表示部と、を備える。 An information processing system according to an embodiment of the present disclosure includes an information acquisition device that acquires voices and images of a patient and a doctor, and extracts feature amounts related to communication between the patient and the doctor from the voices and images of the patient and the doctor. an extraction unit that estimates the degree of satisfaction, dissatisfaction or anxiety of the patient based on the feature quantity; and a display unit that displays the degree of satisfaction, dissatisfaction or anxiety of the patient.

　本開示の実施形態に係る情報処理方法は、コンピュータが、患者及び医師の音声及び画像を取得し、前記患者及び医師の音声及び画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出し、前記特徴量に基づき、前記患者の満足度、不満度又は不安度を推定し、前記患者の満足度、不満度又は不安度を表示する、ことを含む。 In an information processing method according to an embodiment of the present disclosure, a computer acquires voices and images of a patient and a doctor, and extracts feature amounts related to communication between the patient and the doctor from the voices and images of the patient and the doctor. and estimating the degree of satisfaction, dissatisfaction or anxiety of the patient based on the feature amount, and displaying the degree of satisfaction, dissatisfaction or anxiety of the patient.

本開示の実施形態に係る情報処理システムの概略構成の一例を示す第１の図である。1 is a first diagram illustrating an example of a schematic configuration of an information processing system according to an embodiment of the present disclosure; FIG. 本開示の実施形態に係る情報処理システムの概略構成の一例を示す第２の図である。2 is a second diagram illustrating an example of a schematic configuration of an information processing system according to an embodiment of the present disclosure; FIG. 本開示の実施形態に係る情報処理システムの各部の概略構成の一例を示す図である。It is a figure showing an example of a schematic structure of each part of an information processing system concerning an embodiment of this indication. 本開示の実施形態に係る満足度（又は不満度）推定処理の流れの一例を示すフローチャートである。6 is a flowchart showing an example of the flow of satisfaction (or dissatisfaction) estimation processing according to the embodiment of the present disclosure; 本開示の実施形態に係る全体の処理の流れを説明するための図である。FIG. 4 is a diagram for explaining the flow of overall processing according to an embodiment of the present disclosure; FIG. 本開示の実施形態に係る全体の処理における各種処理の流れを説明するための第１の図である。FIG. 4 is a first diagram for explaining the flow of various processes in the overall process according to the embodiment of the present disclosure; 本開示の実施形態に係る全体の処理における各種処理の流れを説明するための第２の図である。FIG. 7 is a second diagram for explaining the flow of various processes in the overall process according to the embodiment of the present disclosure; 本開示の実施形態に係る全体の処理における各種処理の流れを説明するための第３の図である。FIG. 9 is a third diagram for explaining the flow of various processes in the overall process according to the embodiment of the present disclosure; 本開示の実施形態に係る全体の処理における各種処理の流れを説明するための第４の図である。FIG. 10 is a fourth diagram for explaining the flow of various processes in the overall process according to the embodiment of the present disclosure; 本開示の実施形態に係る視線交錯推定処理の一例を説明するための第１の図である。FIG. 4 is a first diagram for explaining an example of a line-of-sight intersection estimation process according to an embodiment of the present disclosure; 本開示の実施形態に係る視線交錯推定処理の一例を説明するための第２の図である。FIG. 7 is a second diagram for explaining an example of the line-of-sight intersection estimation process according to the embodiment of the present disclosure; 本開示の実施形態に係る視線交錯推定処理の一例を説明するための第３の図である。FIG. 13 is a third diagram for explaining an example of the line-of-sight intersection estimation process according to the embodiment of the present disclosure; 本開示の実施形態に係る医師用の表示画像の一例を説明するための図である。FIG. 4 is a diagram for explaining an example of a display image for a doctor according to the embodiment of the present disclosure; FIG. 本開示の実施形態に係るシステム応用サービスの一例を説明するための図である。1 is a diagram for explaining an example of a system application service according to an embodiment of the present disclosure; FIG. 本開示の実施形態に係るハードウェアの概略構成の一例を示す図である。1 is a diagram illustrating an example of a schematic configuration of hardware according to an embodiment of the present disclosure; FIG.

　以下に、本開示の実施形態について図面に基づいて詳細に説明する。なお、この実施形態により本開示に係る装置、方法及びシステム等が限定されるものではない。また、以下の各実施形態において、基本的に同一の部位には同一の符号を付することにより重複する説明を省略する。 Below, embodiments of the present disclosure will be described in detail based on the drawings. Note that the apparatus, method, system, etc. according to the present disclosure are not limited by this embodiment. Further, in each of the following embodiments, basically the same parts are denoted by the same reference numerals, thereby omitting duplicate descriptions.

　以下に説明される１又は複数の実施形態（実施例、変形例を含む）は、各々が独立に実施されることが可能である。一方で、以下に説明される複数の実施形態は少なくとも一部が他の実施形態の少なくとも一部と適宜組み合わせて実施されてもよい。これら複数の実施形態は、互いに異なる新規な特徴を含み得る。したがって、これら複数の実施形態は、互いに異なる目的又は課題を解決することに寄与し得、互いに異なる効果を奏し得る。なお、各実施形態において、機能や実施場所等はそれぞれ異なっていてもよい。 Each of one or more embodiments (including examples and modifications) described below can be implemented independently. On the other hand, at least some of the embodiments described below may be implemented in combination with at least some of the other embodiments as appropriate. These multiple embodiments may include novel features that differ from each other. Therefore, these multiple embodiments can contribute to solving different purposes or problems, and can produce different effects. In addition, in each embodiment, the function, implementation place, etc. may be different from each other.

　以下に示す項目順序に従って本開示を説明する。
　１．実施形態
　１－１．情報処理システムの概略構成の一例
　１－２．情報処理システムの各部の概略構成の一例
　１－３．満足度（又は不満度）推定処理の流れの一例
　１－４．全体の処理の流れの一例
　１－５．全体の処理における各種処理の流れの一例
　１－６．視線交錯推定処理の一例
　１－７．医師用の表示画像の一例
　１－８．システム応用サービスの一例
　１－９．効果
　２．他の実施形態
　３．ハードウェア構成の一例
　４．付記 The present disclosure will be described according to the order of items shown below.
1. Embodiment 1-1. Example of schematic configuration of information processing system 1-2. Example of schematic configuration of each part of information processing system 1-3. Example of flow of satisfaction level (or dissatisfaction level) estimation process 1-4. Example of overall processing flow 1-5. Example of flow of various processes in overall process 1-6. Example of line-of-sight intersection estimation process 1-7. Example of display image for doctor 1-8. Example of system application service 1-9. Effect 2. Other Embodiments 3. An example of hardware configuration4. Supplementary note

　＜１．実施形態＞
　＜１－１．情報処理システムの概略構成の一例＞
　本実施形態に係る情報処理システム１０の概略構成の一例について図１及び図２を参照して説明する。図１及び図２は、本実施形態に係る情報処理システム１０の概略構成の一例を示す図である。図１及び図２の例では、情報処理システム１０は、治療継続を支援する治療継続支援システム（医療面談継続システム）として機能する。 <1. embodiment>
<1-1. Example of schematic configuration of information processing system>
An example of a schematic configuration of an information processing system 10 according to this embodiment will be described with reference to FIGS. 1 and 2. FIG. 1 and 2 are diagrams showing an example of a schematic configuration of an information processing system 10 according to this embodiment. In the example of FIGS. 1 and 2, the information processing system 10 functions as a treatment continuation support system (medical interview continuation system) that supports treatment continuation.

　図１に示すように、情報処理システム１０は、サーバ装置２０と、情報取得装置３０と、医師端末装置４０と、患者端末装置５０とを備えている。これらのサーバ装置２０、情報取得装置３０、医師端末装置４０及び患者端末装置５０は、有線及び無線の両方又は一方の通信網６０を介して通信可能に接続されている。通信網６０としては、例えば、インターネットやＷＡＮ（Wide　Area　Network）、ＬＡＮ（Local　Area　Network）、衛星通信網等が用いられる。サーバ装置２０、医師端末装置４０及び患者端末装置５０は、それぞれ情報処理装置に相当する。 As shown in FIG. 1, the information processing system 10 includes a server device 20, an information acquisition device 30, a doctor terminal device 40, and a patient terminal device 50. These server device 20 , information acquisition device 30 , doctor terminal device 40 and patient terminal device 50 are communicably connected via a wired and/or wireless communication network 60 . As the communication network 60, for example, the Internet, WAN (Wide Area Network), LAN (Local Area Network), satellite communication network, etc. are used. The server device 20, the doctor terminal device 40, and the patient terminal device 50 each correspond to an information processing device.

　サーバ装置２０は、情報取得装置３０から各種情報を受信し、また、医師端末装置４０や患者端末装置５０等にそれぞれ各種情報を送信する。また、サーバ装置２０は、受信した各種情報に対して各種処理を実行し、処理済の各種情報を医師端末装置４０や患者端末装置５０等に適宜送信する。サーバ装置２０としては、例えば、コンピュータ装置が用いられる。 The server device 20 receives various types of information from the information acquisition device 30, and also transmits various types of information to the doctor terminal device 40, the patient terminal device 50, and the like. In addition, the server device 20 performs various processes on the received various information, and appropriately transmits the processed various information to the doctor terminal device 40, the patient terminal device 50, and the like. For example, a computer device is used as the server device 20 .

　情報取得装置３０は、医師と患者が面談する面談室内において、患者及び医師の音声と画像等のセンシングデータを取得し、取得したセンシングデータをサーバ装置２０に送信する。情報取得装置３０としては、例えば、マイクやカメラ等が用いられる。マイクやカメラ等の装置は、例えば、医師と患者が面談する面談室内に設けられる。マイクやカメラは、医師及び患者に共通に設けられてもよく、また、医師及び患者毎に個別に設けられてもよい。マイクは、音を電気信号に変換して取得する。カメラは、周囲に位置する被写体からの光を集光して撮像面に光像を形成し、撮像面に形成された光像を電気的な画像信号に変換することによって画像を取得する。なお、情報取得装置３０は、面談室以外に設けられてもよく、例えば、オンライン面談の場合には、患者側及び医師側それぞれに設けられてもよい。面談の実施場所は、面談室に限られるものではなく、例えば、オンライン面談のように互いに異なる場所であってよく、様々な場所であってもよい。 The information acquisition device 30 acquires sensing data such as voices and images of the patient and the doctor in the interview room where the doctor and the patient meet, and transmits the acquired sensing data to the server device 20 . For example, a microphone, a camera, or the like is used as the information acquisition device 30 . Devices such as microphones and cameras are installed, for example, in an interview room where a doctor and a patient interview. A microphone and a camera may be provided commonly for the doctor and the patient, or may be provided individually for each doctor and the patient. A microphone converts sound into an electrical signal and captures it. A camera gathers light from a subject located in the surroundings to form an optical image on an imaging surface, and acquires an image by converting the optical image formed on the imaging surface into an electrical image signal. Note that the information acquisition device 30 may be provided in a room other than the interview room. For example, in the case of an online interview, the information acquisition device 30 may be provided on both the patient side and the doctor side. The location of the interview is not limited to the interview room, and may be different locations such as an online interview, or various locations.

　医師端末装置４０は、サーバ装置２０から各種情報を受信し、また、受信した各種情報を表示して医師に提示する。この医師端末装置４０は、医師によって用いられる。医師端末装置４０としては、例えば、パーソナルコンピュータ（例えば、デスクトップパソコンやノートパソコン、タブレット端末等）やスマートフォン等が用いられる。 The doctor terminal device 40 receives various information from the server device 20, and displays and presents the received various information to the doctor. This doctor terminal device 40 is used by a doctor. As the doctor terminal device 40, for example, a personal computer (for example, a desktop computer, a notebook computer, a tablet terminal, etc.), a smart phone, or the like is used.

　患者端末装置５０は、サーバ装置２０から各種情報を受信し、また、受信した各種情報を表示して患者に提示する。この患者端末装置５０は、患者によって用いられる。患者端末装置５０としては、医師端末装置４０と同様、例えば、パーソナルコンピュータやスマートフォン等が用いられる。 The patient terminal device 50 receives various information from the server device 20, and displays and presents the received various information to the patient. This patient terminal device 50 is used by a patient. Similar to the doctor terminal device 40, the patient terminal device 50 is, for example, a personal computer, a smart phone, or the like.

　ここで、サーバ装置２０は、病院内に設けられてもよく、また、病院外に設けられてもよい。また、サーバ装置２０は、例えば、クラウドコンピューティングにより実現されてもよい。情報取得装置３０も、病院内に設けられてもよく、また、病院外に設けられてもよい。情報取得装置３０は、通信網６０に直接接続されてもよく、また、医師端末装置４０を介して通信網６０に接続されてもよい。なお、図１の例では、医師端末装置４０及び患者端末装置５０、また、情報取得装置３０の数はそれぞれ一つであるが、その数は限定されるものではなく、１又は複数であってもよい。 Here, the server device 20 may be provided inside the hospital, or may be provided outside the hospital. Also, the server device 20 may be realized by, for example, cloud computing. The information acquisition device 30 may also be provided inside the hospital or outside the hospital. The information acquisition device 30 may be directly connected to the communication network 60 or may be connected to the communication network 60 via the doctor terminal device 40 . In the example of FIG. 1, the number of the doctor terminal device 40, the number of the patient terminal device 50, and the number of the information acquisition device 30 are respectively one, but the number is not limited, and may be one or more. good too.

　図２に示すように、サーバ装置２０は、情報取得装置３０（録音＋録画）から事前に患者や医者の話者の音声情報を取得した上で、話者ごとに音源の解析（例えば、周波数解析）を行い、患者音声情報と医師音声情報とを分離する。その後、サーバ装置２０は、医師音声情報から医師の返答期待時間を特定し、特定した返答期待時間における患者や医者の音声情報や顔画像情報等に基づく特徴量（例えば、音声特徴量、顔画像特徴量、視差交錯複合特徴量等）と、カルテや患者アンケート等のテキストデータとを併せて、学習モデル（例えば、推論モデル）に入力し、患者の満足度や不満度、抑うつ度、また、不安度等をスコアリングする。スコアリングする方法として、例えば、事前に録音＋録画するときに、医師が患者の状態を診て、リアルタイムに時間スタンプとマーカーを付記し、スコアを入力することで、画像と音声を取得時に学習モデルを構築するための教師ＤＡＴＡとすることができる。また、本装置を、教師ＤＡＴＡを生成するための装置として使用できる。リアルタイムで入植できない場合には、例えば、録画＋録音ＤＡＴＡに時間スタンプとマーカーを付記することで教師ＤＡＴＡを生成する。 As shown in FIG. 2, the server device 20 acquires the voice information of the speaker of the patient or the doctor in advance from the information acquisition device 30 (recording + recording), and then analyzes the sound source (for example, frequency analysis) to separate the patient voice information and the doctor voice information. After that, the server device 20 identifies the doctor's expected response time from the doctor's voice information, and the feature amount (for example, voice feature amount, face image, etc.) based on the patient's or doctor's voice information, facial image information, etc. Characteristic value, parallax complex composite feature value, etc.) and text data such as medical charts and patient questionnaires are input into a learning model (for example, an inference model), and the patient's satisfaction, dissatisfaction, depression, and Anxiety, etc. are scored. As a method of scoring, for example, when recording + recording in advance, the doctor examines the patient's condition, adds a time stamp and marker in real time, and inputs the score, so that the image and sound can be learned at the time of acquisition. It can be a teacher DATA for building a model. Also, this device can be used as a device for generating teacher DATA. If real-time settlement is not possible, teacher DATA is generated by, for example, adding a time stamp and a marker to video+recording DATA.

　医師の返答期待時間は、例えば、医師の音声情報において医師が患者からの返答を期待する部分の時間である。また、音声特徴量、顔画像特徴量、視差交錯複合特徴量等の特徴量は、医療面談において、患者と医師とのコミュニケーションに関する特徴量である。学習モデルは、例えば、畳み込みニューラルネットワーク（ＤＮＮ）やディープラーニング（Deep　learning）で学習したモデルである。例えば、返答を期待する部分の時間を学習モデルで生成するときの教師ＤＡＴＡは、医師がリアルタイムで時間スタンプを音声ＤＡＴＡに付記することで生成することができる。 The doctor's expected response time is, for example, the part of the doctor's voice information in which the doctor expects a response from the patient. Further, the feature amounts such as the voice feature amount, the face image feature amount, the parallax composite feature amount, etc. are the feature amounts related to the communication between the patient and the doctor in the medical interview. A learning model is, for example, a model learned by a convolutional neural network (DNN) or deep learning. For example, when the learning model is used to generate the time of the part where a response is expected, the teacher DATA can be generated by adding a time stamp to the voice DATA in real time by the doctor.

　サーバ装置２０は、満足度や不満度、抑うつ度、不安度等のスコア結果を医師端末装置４０に送信する。医師端末装置４０は、スコア結果を表示して医師に提示する。また、サーバ装置２０は、例えば、スコア結果を元に患者へのアプローチ（例えば、メールやＳＮＳ（Social　Networking　Service）等のやり取り）を行い、患者報告アウトカム電子システム（ｅＰＲＯ）に対するモニタリング・フィードバックを含めたアプリケーションを提供する。患者報告アウトカム電子システムとしては、例えば、患者日誌システム等がある。なお、メール等による患者へのアプローチは、満足度や不満度、抑うつ度、不安度等に応じて実行される。 The server device 20 transmits score results such as satisfaction, dissatisfaction, depression, and anxiety to the doctor terminal device 40 . The doctor terminal device 40 displays the score result and presents it to the doctor. In addition, the server device 20, for example, approaches the patient based on the score results (e.g., exchanging e-mail, SNS (Social Networking Service), etc.), including monitoring and feedback to the patient report outcome electronic system (ePRO) provide a better application. Electronic patient reporting outcomes systems include, for example, patient diary systems. The approach to the patient by e-mail or the like is carried out according to the degree of satisfaction, the degree of dissatisfaction, the degree of depression, the degree of anxiety, and the like.

　ここで、患者の治療継続意欲が低下すると、通院・治療の中断により治療効果は低下する。そこで、満足度や不満度、抑うつ度、不安度等を患者の画像と音声により推定し、医療面談時に医師が次回来院を患者に促したり、又は、在宅の患者に来院や教育コンテンツ視聴、アプリ使用等を促したりすることで、治療効果を向上させることができる。特に、患者の満足や不満度、抑うつ度、不安度等を定量的な情報として可視化することで、医師は満足度や不満度、抑うつ度、不安度等を客観的に把握することができる。また、医師の主観が入らないことで、患者へのアプローチ方法が医師毎にばらつくことを抑えることができる。なお、音声情報には、医師や看護師と患者の音声が混ざっていることから、患者のみ、医師のみのように音源分離が行われる。 Here, if the patient's willingness to continue treatment declines, the treatment effect will decline due to the interruption of hospital visits and treatment. Therefore, by estimating the degree of satisfaction, dissatisfaction, depression, anxiety, etc. from the patient's image and voice, the doctor can encourage the patient to visit the hospital next time during the medical interview, or the patient at home can visit the hospital, watch educational content, and use the app. The therapeutic effect can be improved by promoting the use or the like. In particular, by visualizing the patient's satisfaction, dissatisfaction, depression, anxiety, etc. as quantitative information, the doctor can objectively grasp the satisfaction, dissatisfaction, depression, anxiety, etc. In addition, since the doctor's subjectivity is not included, it is possible to suppress variations in the method of approaching the patient from doctor to doctor. Since the voice information includes the voices of the doctor, the nurse, and the patient, sound source separation is performed for the patient only and the doctor only.

　また、患者の満足度や不満度、抑うつ度、不安度等は医療面談時の医師と患者の相互作用によって変化しており、患者の音声のみでは判定の精度は向上しない。そこで、１つもしくは複数のマイクで患者と医師の音声を分離し、医師と患者の発話のタイミングや応答時間を解析パラメータに取り入れることで、より正確に満足度や不満度、抑うつ度、不安度等を推定することができる。さらに、音声ではなく、患者の顔画像を利用することでより患者の満足度や不満度、抑うつ度、不安度等を正確に推定することができる。全体像から顔部分を顔認識技術により推定し、患者の顔画像を拡大して詳細に記録することで、顔の小さな変化を正確に抽出できるようになる。また、医師と患者の視線を推定し、視線の交差や方向から医師と患者のアイコンタクト（視線の交錯）の度合いと頻度を推定し、解析パラメータに取り入れることで、より正確に患者の満足度や不満度、抑うつ度、不安度を推定することができる。視線の交錯とは、二者間で視線が一致する状態であり、例えば、心理学では、二者間で互いに相手の目に意識的な視線を向け見つめ合う状態であると定義されている。加えて、患者に装着したセンサまたは画像解析から取得した心拍や心拍変動、発汗、血圧等のバイタルデータを解析パラメータに取り入れることで、より正確に患者の満足度や不満度、抑うつ度、不安度を推定することができる。 In addition, the patient's satisfaction, dissatisfaction, depression, anxiety, etc. change depending on the interaction between the doctor and the patient during the medical interview, and the patient's voice alone does not improve the accuracy of the judgment. Therefore, by separating the patient's and doctor's voices with one or more microphones and incorporating the timing and response time of the doctor's and patient's utterances into the analysis parameters, satisfaction, dissatisfaction, depression, and anxiety can be obtained more accurately. etc. can be estimated. Furthermore, by using the patient's face image instead of the voice, the patient's degree of satisfaction, dissatisfaction, depression, anxiety, etc. can be estimated more accurately. By estimating the facial part from the overall image using facial recognition technology, and by enlarging the patient's facial image and recording it in detail, it will be possible to accurately extract small facial changes. In addition, by estimating the line of sight of the doctor and the patient, estimating the degree and frequency of eye contact (crossing of the line of sight) between the doctor and the patient from the line of sight intersection and direction, and incorporating it into the analysis parameters, the patient's satisfaction can be more accurately calculated. , degree of dissatisfaction, degree of depression, and degree of anxiety can be estimated. Interlacing of gazes is a state in which the gazes of two persons coincide, and is defined, for example, in psychology as a state in which two persons consciously direct their gazes to each other's eyes and gaze at each other. In addition, by incorporating vital data such as heart rate, heart rate variability, perspiration, and blood pressure acquired from sensors attached to the patient or image analysis into analysis parameters, the patient's satisfaction, dissatisfaction, depression, and anxiety levels can be more accurately calculated. can be estimated.

　＜１－２．情報処理システムの各部の概略構成の一例＞
　本実施形態に係る情報処理システム１０の各部、すなわち、サーバ装置２０、医師端末装置４０及び患者端末装置５０の概略構成の一例について図３を参照して説明する。図３は、本実施形態に係る情報処理システム１０の各部の概略構成の一例を示す図である。 <1-2. Example of schematic configuration of each part of information processing system>
An example of the schematic configuration of each part of the information processing system 10 according to the present embodiment, that is, the server device 20, the doctor terminal device 40, and the patient terminal device 50 will be described with reference to FIG. FIG. 3 is a diagram showing an example of a schematic configuration of each part of the information processing system 10 according to this embodiment.

　図３に示すように、サーバ装置２０は、入力部２１と、処理部２２と、出力部２３とを備える。処理部２２は、音声特徴量抽出部２２ａと、切り出し学習モデル２２ｂと、音声・画像特徴量抽出部２２ｃと、学習モデル２２ｄとを有する。処理部２２は、抽出部や推定部、学習部等に相当する。 As shown in FIG. 3, the server device 20 includes an input unit 21, a processing unit 22, and an output unit 23. The processing unit 22 includes an audio feature amount extraction unit 22a, a clipping learning model 22b, an audio/image feature amount extraction unit 22c, and a learning model 22d. The processing unit 22 corresponds to an extraction unit, an estimation unit, a learning unit, and the like.

　入力部２１は、情報取得装置３０により取得された患者及び医師の音声情報及び画像情報を受信し、サーバ装置２０に入力する。処理部２２は、入力部２１により入力された患者及び医師の音声情報及び画像情報に基づき、患者と医師とのコミュニケーションに関する特徴量を抽出し、抽出した特徴量に基づき、患者の満足度や不満度、抑うつ度等を推定する。出力部２３は、処理部２２により推定された患者の満足度、不満度及び抑うつ度に関する情報を出力して医師端末装置４０や患者端末装置５０等に送信する。なお、処理部２２は、満足度、不満度及び抑うつ度のいずれか又は全てを推定することが可能である。 The input unit 21 receives voice information and image information of the patient and doctor acquired by the information acquisition device 30 and inputs them to the server device 20 . The processing unit 22 extracts a feature amount related to communication between the patient and the doctor based on the voice information and image information of the patient and the doctor input by the input unit 21, and based on the extracted feature amount, the patient's degree of satisfaction and dissatisfaction. Estimate degree of depression, degree of depression, etc. The output unit 23 outputs information about the patient's degree of satisfaction, degree of dissatisfaction, and degree of depression estimated by the processing unit 22, and transmits the information to the doctor terminal device 40, the patient terminal device 50, and the like. In addition, the processing unit 22 can estimate any one or all of the degree of satisfaction, the degree of dissatisfaction and the degree of depression.

　例えば、処理部２２は、音声特徴量抽出部２２ａにより患者及び医師の音声情報から医師の音声情報を抽出し、抽出した医師の音声情報から医師の返答期待時間を切り出し学習モデル２２ｂにより特定する。その後、処理部２２は、音声・画像特徴量抽出部２２ｃにより、患者及び医師の音声情報及び患者情報から、返答期待時間における患者及び医師の音声特徴量及び画像特徴量を抽出し、学習モデル２２ｄにより患者の満足度、不満度及び抑うつ度を推定する。なお、この処理の流れについて詳しくは後述する。 For example, the processing unit 22 extracts the doctor's voice information from the patient's and the doctor's voice information using the voice feature extraction unit 22a, extracts the doctor's expected response time from the extracted doctor's voice information, and specifies it using the learning model 22b. After that, the processing unit 22 uses the voice/image feature amount extraction unit 22c to extract the voice feature amount and the image feature amount of the patient and the doctor at the expected reply time from the voice information and the patient information of the patient and the doctor, and extracts the voice feature amount and the image feature amount of the patient and the doctor, to estimate patient satisfaction, dissatisfaction and depression. The details of the flow of this process will be described later.

　ここで、切り出し学習モデル２２ｂは、返答期待時間を求めるための学習モデルである。また、学習モデル２２ｄは、患者の満足度や不満度、抑うつ度等を求めるための学習モデルである。これらの切り出し学習モデル２２ｂや学習モデル２２ｄとしては、例えば、ディープラーニング（ＤＬ）のモデルや畳み込みニューラルネットワーク（ＣＮＮ）のモデル等を用いることが可能である。 Here, the cutout learning model 22b is a learning model for obtaining the expected response time. Also, the learning model 22d is a learning model for obtaining the patient's degree of satisfaction, degree of dissatisfaction, degree of depression, and the like. As these cutout learning model 22b and learning model 22d, for example, a deep learning (DL) model, a convolutional neural network (CNN) model, or the like can be used.

　また、上述の処理部２２等の各機能部は、ハードウェア及びソフトウェアの両方又はどちらか一方により構成されてもよい。それらの構成は、特に限定されるものではない。例えば、前述の各機能部は、ＣＰＵ（Central　Processing　Unit）やＭＰＵ（Micro　Processing　Unit）等のコンピュータによって、ＲＯＭに予め記憶されたプログラムがＲＡＭ等を作業領域として実行されることにより実現されてもよい。また、各機能部は、例えば、ＡＳＩＣ（Application　Specific　Integrated　Circuit）やＦＰＧＡ（Field-Programmable　Gate　Array）等の集積回路により実現されてもよい。切り出し学習モデル２２ｂや学習モデル２２ｄは、例えば、各種のストレージ等により格納されてもよい。 Also, each functional unit such as the processing unit 22 described above may be configured by either or both of hardware and software. Their configuration is not particularly limited. For example, each of the functional units described above may be realized by executing a program stored in advance in ROM using a computer such as a CPU (Central Processing Unit) or MPU (Micro Processing Unit) using RAM as a work area. good. Also, each functional unit may be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array). The clipped learning model 22b and the learning model 22d may be stored, for example, in various types of storage.

　なお、入力部２１は、患者及び医師の音声情報及び画像情報に加え、情報取得装置３０により取得された患者のバイタル情報（バイタルデータ）を受信し、サーバ装置２０に入力してもよい。情報取得装置３０としては、例えば、カメラやマイク等と同期したバイタルセンサが用いられる。バイタルセンサは、患者の心拍や心拍変動、発汗、血圧等のいずれか又は全てを電気信号に変換し、それらの値をバイタル情報として取得する。このバイタルセンサとしては、例えば、スマートウォッチ等のウェアラブルデバイスを用いることが可能である。このウェアラブルデバイスは、例えば、カメラやマイクと同期してバイタル情報を検出する。 It should be noted that the input unit 21 may receive the patient's vital information (vital data) acquired by the information acquisition device 30 in addition to the voice information and image information of the patient and the doctor, and input it to the server device 20 . As the information acquisition device 30, for example, a vital sensor synchronized with a camera, a microphone, or the like is used. The vital sensor converts one or all of the patient's heartbeat, heartbeat variability, perspiration, blood pressure, etc., into electrical signals, and acquires these values as vital information. As this vital sensor, for example, a wearable device such as a smart watch can be used. This wearable device detects vital information in synchronization with, for example, a camera and a microphone.

　医師端末装置４０は、制御部４１と、通信部４２と、表示部４３と、操作部４４とを有する。制御部４１は、通信部４２や表示部４３等の各部を制御する。通信部４２は、通信網６０を介する外部装置との通信を可能にする。表示部４３は、各種情報を表示する。この表示部４３は、例えば、液晶ディスプレイや有機ＥＬ（Electro-Luminescence）ディスプレイ等の表示装置により実現される。操作部４４は、医師等の操作者からの入力操作を受け付ける。この操作部４４は、例えば、タッチパネルやボタン等の入力装置により実現される。 The doctor terminal device 40 has a control unit 41, a communication unit 42, a display unit 43, and an operation unit 44. The control unit 41 controls each unit such as the communication unit 42 and the display unit 43 . The communication unit 42 enables communication with external devices via the communication network 60 . The display unit 43 displays various information. The display unit 43 is implemented by, for example, a display device such as a liquid crystal display or an organic EL (Electro-Luminescence) display. The operation unit 44 receives an input operation from an operator such as a doctor. The operation unit 44 is implemented by, for example, an input device such as a touch panel or buttons.

　患者端末装置５０は、医師端末装置４０と同様、制御部５１と、通信部５２と、表示部５３と、操作部５４とを有する。制御部５１は、通信部５２や表示部５３等の各部を制御する。通信部５２は、通信網６０を介する外部装置との通信を可能にする。表示部５３は、各種情報を表示する。この表示部５３は、例えば、液晶ディスプレイや有機ＥＬディスプレイ等の表示装置により実現される。操作部５４は、患者等の操作者からの入力操作を受け付ける。この操作部５４は、例えば、タッチパネルやボタン等の入力装置により実現される。 The patient terminal device 50 has a control unit 51, a communication unit 52, a display unit 53, and an operation unit 54, similar to the doctor terminal device 40. The control unit 51 controls each unit such as the communication unit 52 and the display unit 53 . The communication unit 52 enables communication with external devices via the communication network 60 . The display unit 53 displays various information. The display unit 53 is realized by, for example, a display device such as a liquid crystal display or an organic EL display. The operation unit 54 receives an input operation from an operator such as a patient. The operation unit 54 is implemented by, for example, an input device such as a touch panel or buttons.

　＜１－３．満足度（又は不満度）推定処理の流れの一例＞
　本実施形態に係る満足度（又は不満度）推定処理の流れの一例について図４を参照して説明する。図４は、本実施形態に係る満足度（又は不満度）推定処理の流れの一例を示すフローチャートである。 <1-3. An example of the flow of satisfaction (or dissatisfaction) estimation processing>
An example of the flow of satisfaction (or dissatisfaction) estimation processing according to this embodiment will be described with reference to FIG. FIG. 4 is a flowchart showing an example of the flow of satisfaction (or dissatisfaction) estimation processing according to this embodiment.

　図４に示すように、音声特徴量抽出部２２ａは、患者・医師間の音声情報を分離し（ステップＳ１１）、患者・医師の音声特徴量を抽出する（ステップＳ１２）。例えば、音声特徴量抽出部２２ａは、入力部２１により入力された患者・医師間の音声情報を解析し、患者・医師間の音声情報から患者の音声情報と医師の音声情報とを分離する。そして、音声特徴量抽出部２２ａは、患者の音声情報から患者の音声特徴量を抽出し、医師の音声情報から医師の音声特徴量を抽出する。 As shown in FIG. 4, the voice feature amount extraction unit 22a separates the voice information between the patient and the doctor (step S11), and extracts the voice feature amount of the patient and the doctor (step S12). For example, the voice feature amount extraction unit 22a analyzes the voice information between the patient and the doctor input by the input unit 21, and separates the voice information of the patient and the voice information of the doctor from the voice information between the patient and the doctor. Then, the speech feature amount extraction unit 22a extracts the patient's speech feature amount from the patient's speech information, and extracts the doctor's speech feature amount from the doctor's speech information.

　処理部２２は、返答期待時間としてタグ付けされた（返答期待時間により意味付けられた）音声特徴量（例えば、医師の音声特徴量）で学習された切り出し学習モデル２２ｂを用いて、音声特徴量（例えば、医師の音声特徴量）を学習し（ステップＳ１３）、返答期待時間を特定する（ステップＳ１４）。返答期待時間をタグ付けする方法として、例えば、事前に録音＋録画するときに、医師が患者の状態を診て、医師がリアルタイムに時間スタンプとマーカーを付記し、タグを入力することで、画像と音声を取得時に学習モデルを構築するための教師ＤＡＴＡとすることができる。また、本装置を、教師ＤＡＴＡを生成するための装置として使用できる。リアルタイムで入力できない場合には、録画＋録音ＤＡＴＡに時間スタンプとマーカーを付記することで教師ＤＡＴＡを生成する。 The processing unit 22 uses the clipped learning model 22b trained with the speech feature quantity (for example, doctor's speech feature quantity) tagged as the expected response time (given meaning by the expected reply time), and uses the speech feature quantity (For example, doctor's speech feature quantity) is learned (step S13), and the expected response time is specified (step S14). As a method of tagging the expected response time, for example, when recording + recording in advance, the doctor examines the patient's condition, adds a time stamp and marker in real time, and inputs the tag, so that the image and speech can be used as teacher DATA for building a learning model when acquiring speech. Also, this device can be used as a device for generating teacher DATA. If it is not possible to input in real time, teacher DATA is generated by appending a time stamp and a marker to the video/recording DATA.

　音声・画像特徴量抽出部２２ｃは、患者・医師間の音声・画像情報（音声情報及び画像情報）から、返答期待時間における特徴量（患者・医師の音声・画像特徴量）を抽出する（ステップＳ１５）。例えば、音声・画像特徴量抽出部２２ｃは、返答期待時間における患者・医師間の音声・画像情報を解析し、返答期待時間における患者の音声・画像情報と医師の音声・画像情報とを分離する。そして、音声・画像特徴量抽出部２２ｃは、返答期待時間における患者の音声・画像情報から返答期待時間における患者の音声・画像特徴量を抽出し、返答期待時間における医師の音声・画像情報から、返答期待時間における医師の音声・画像特徴量を抽出する。 The voice/image feature amount extraction unit 22c extracts the feature amount (voice/image feature amount of the patient/doctor) at the expected response time from the voice/image information (voice information and image information) between the patient and the doctor (step S15). For example, the voice/image feature amount extraction unit 22c analyzes voice/image information between the patient and the doctor at the expected response time, and separates the patient's voice/image information from the doctor's voice/image information at the expected response time. . Then, the voice/image feature amount extraction unit 22c extracts the voice/image feature amount of the patient at the expected response time from the voice/image information of the patient at the expected response time, and extracts the voice/image feature amount of the patient at the expected response time. We extract voice/image features of the doctor at the expected response time.

　処理部２２は、満足度／不満度にてタグ付けされた（満足度／不満度により意味付けられた）音声・画像特徴量で学習された学習モデル２２ｄを用いて、患者及び医師の音声・画像特徴量を学習し（ステップＳ１６）、患者の不満度又は満足度を出力する（ステップＳ１７）。 The processing unit 22 uses a learning model 22d trained with voice/image feature values tagged with satisfaction/dissatisfaction (meaning by satisfaction/dissatisfaction), and uses voice/image features of patients and doctors. The image feature amount is learned (step S16), and the patient's degree of dissatisfaction or satisfaction is output (step S17).

　＜１－４．全体の処理の流れの一例＞
　本実施形態に係る全体の処理の流れの一例について図５を参照して説明する。図５は、本実施形態に係る全体の処理の流れを説明するための図である。この全体の処理は、基本的に、サーバ装置２０の処理部２２により実行される。 <1-4. Example of overall processing flow>
An example of the overall processing flow according to this embodiment will be described with reference to FIG. FIG. 5 is a diagram for explaining the overall processing flow according to this embodiment. This overall processing is basically executed by the processing unit 22 of the server device 20 .

　図５に示すように、患者音声が入力され（ステップＳ２１）、医師音声が入力される（ステップＳ２２）。これらの患者音声及び医師音声は混ざった状態で音声情報として入力される。このため、話者の音源が話者毎に分離される（ステップＳ２３）。 As shown in FIG. 5, the patient's voice is input (step S21) and the doctor's voice is input (step S22). These patient's voice and doctor's voice are mixed and input as voice information. Therefore, the sound sources of the speakers are separated for each speaker (step S23).

　また、患者の上半身画像が入力され（ステップＳ２４）、医師の上半身画像が入力される（ステップＳ２５）。切り出し学習器により処理が実行され（ステップＳ２６）、切り出し時間・回数が求められ（ステップＳ２７）、切り出し音声素因子が求められ（ステップＳ２８）、切り出し顔画像素因子が求められる（ステップＳ２９）。また、切り出し音声素因子に基づく返答応答時間が求められる（ステップＳ３０）。 Also, the patient's upper body image is input (step S24), and the doctor's upper body image is input (step S25). Processing is executed by the cutout learner (step S26), the cutout time and number of cutouts are obtained (step S27), the cutout phoneme factor is obtained (step S28), and the cutout face image element factor is obtained (step S29). Also, a reply response time based on the cut-out phoneme factor is obtained (step S30).

　また、カメラ位置が入力され（ステップＳ３１）、カメラ、患者及び医師のそれぞれの位置が分析され（ステップＳ３２）、視線交錯複合素因子が求められる（ステップＳ３３）。その視線交錯複合素因子に基づく視線交錯頻度・率が求められ（ステップＳ３４）、頭部注視に関する情報が求められる（ステップＳ３５）。 Also, the camera position is input (step S31), the positions of the camera, the patient, and the doctor are analyzed (step S32), and the line-of-sight intersection complex factor is obtained (step S33). A line-of-sight intersection frequency/rate based on the line-of-sight intersection complex factor is obtained (step S34), and information on head gaze is obtained (step S35).

　また、切り出し音声素因子、切り出し顔画像素因子及び視線交錯複合素因子に基づいて、満足度学習器により満足度が推定され（ステップＳ３６）、不満度学習器により不満度が推定され（ステップＳ３７）、抑うつ度学習器により抑うつ度が推定され（ステップＳ３８）、満足度、不満度及び抑うつ度（抑鬱度）が求められる（ステップＳ３９）。 Further, based on the cut-out phoneme factor, the cut-out face image element factor, and the line-of-sight intersection complex factor, the satisfaction level is estimated by the satisfaction level learning machine (step S36), and the dissatisfaction level is estimated by the dissatisfaction level learning machine (step S37). ), the depression degree is estimated by the depression degree learning device (step S38), and satisfaction, dissatisfaction and depression (depression degree) are obtained (step S39).

　なお、上記の返答、切り出し時間・回数、満足度、不満度、抑うつ度（抑鬱度）、視線交錯頻度・率、頭部注視に関する情報等は、例えば、医師端末装置４０に送信される（ステップＳ４０）。医師端末装置４０は、表示部４３により上記の各種情報を表示する。これにより、医師は上記の各種情報を視認することができる。 In addition, the above-mentioned reply, extraction time/number of times, degree of satisfaction, degree of dissatisfaction, degree of depression (degree of depression), frequency/rate of crossing eyes, information on head gaze, etc. are transmitted to the doctor terminal device 40, for example (step S40). The doctor terminal device 40 displays the above various information on the display unit 43 . This allows the doctor to visually recognize the above various information.

　＜１－５．全体の処理における各種処理の流れの一例＞
　本実施形態に係る全体の処理（図５参照）における各種処理の流れの一例について図６から図９を参照して説明する。図６から図９は、それぞれ本実施形態に係る全体の処理における各種処理の流れを説明するための図である。この各種処理も、基本的に、サーバ装置２０の処理部２２により実行される。 <1-5. Example of Flow of Various Processes in Overall Processing>
An example of the flow of various processes in the overall process (see FIG. 5) according to this embodiment will be described with reference to FIGS. 6 to 9. FIG. 6 to 9 are diagrams for explaining the flow of various processes in the overall process according to this embodiment. These various processes are also basically executed by the processing unit 22 of the server device 20 .

　各種処理を行う学習器の構成では、学習データの種類として、医療面談時の患者と医師の二者の画像と音声が用いられる。医療面談は、例えば、２０～６０分程度の長い時間行われる。サーバ装置２０の処理部２２は、学習器（例えば、切り出し学習モデル２２ｂや学習モデル２２ｄ等）を有しており、医療面談中の二者の会話のどの部分を用いるかを特定し、その部分の素要因指標（例えば、素因子）を用いることを可能とする。音声情報や画像情報のうち、患者の満足度等の解析に用いる部分を限定することで、処理時間やコストを低減できる。また、医療面談において、多くは事実確認に時間が使われ、医師はその事実確認に対する同意を患者に求める。患者の同意の多さが面談の満足度につながるとする医師の経験則により、同意の非言語化行動である視線の一致（例えば、視線交錯率や視線交錯時間等）を学習因子に取り込むことで、学習器の正答率を高めることができる。以下、各種処理毎に詳しく説明する。 In the configuration of the learning device that performs various types of processing, images and voices of both the patient and the doctor at the time of the medical interview are used as the type of learning data. A medical interview takes a long time, for example, 20 to 60 minutes. The processing unit 22 of the server device 20 has a learning device (for example, the cutout learning model 22b, the learning model 22d, etc.), specifies which part of the conversation between the two during the medical interview is to be used, and It is possible to use the prime factor indices (eg, prime factors) of . Processing time and costs can be reduced by limiting the portion of the voice information and image information that is used for analysis of the patient's degree of satisfaction and the like. Also, in medical interviews, much time is spent on fact-checking, and the doctor asks the patient for consent to the fact-checking. Based on the doctor's empirical rule that the amount of consent of the patient leads to the satisfaction of the interview, eye contact (e.g., eye contact rate, eye contact time, etc.), which is a non-verbalized behavior of consent, is incorporated into the learning factor. , the correct answer rate of the learner can be increased. Each type of processing will be described in detail below.

　（切り出し時間）
　図６に示すように、医師音声が入力され（ステップＳ５１）、医師音声情報から素因子が抽出され（ステップＳ５２）、切り出し学習器に入力される。また、医師の返答期待時間タグが切り出し学習器に入力される（ステップＳ５３）。切り出し学習器により返答期待時間因子が抽出され（ステップＳ５４）、切り出し学習器から返答期待時間因子が出力されて（ステップＳ５５）、切り出し時間が得られる（ステップＳ５６）。 (Cut out time)
As shown in FIG. 6, a doctor's voice is input (step S51), prime factors are extracted from the doctor's voice information (step S52), and input to a clipping learner. In addition, the doctor's expected response time tag is input to the extraction learner (step S53). An expected response time factor is extracted by the clipping learner (step S54), the expected response time factor is output from the clipping learner (step S55), and a clipping time is obtained (step S56).

　なお、切り出し学習器は、切り出し学習モデル２２ｂを実現し、サーバ装置２０の処理部２２に含まれる。この切り出し学習器では、学習が進み、所望の正解率（再現率）を得ることが可能になれば、ステップＳ５３を実行しなくてもよい。正確率を向上させるためには、学習を繰り返し行うことが望ましい。また、医師の返答期待時間タグは、例えば、医師によって予め設定される。この場合、医師は、医師端末装置４０の操作部４４を入力操作し、返答期待時間タグを設定してもよい。返答期待時間タグは、返答期待時間に関する情報の一例である。 Note that the clipping learning device implements the clipping learning model 22 b and is included in the processing unit 22 of the server device 20 . In this clipping learning device, if learning progresses and a desired accuracy rate (recall rate) can be obtained, step S53 need not be executed. In order to improve the accuracy rate, it is desirable to repeat learning. Also, the doctor's expected response time tag is set in advance by the doctor, for example. In this case, the doctor may operate the operation unit 44 of the doctor terminal device 40 to set the expected response time tag. The expected response time tag is an example of information related to the expected response time.

　このような切り出し時間の推定では、医療面談時において、患者と医師の二者の音声が患者と医師の音声とに分離され、医師の音声が入力される。この医師の音声が周波数やピッチ等に基づいて分析され、特徴量である素因子（素要因指標）を複数抽出する。医師により患者の返答を期待する部分の時間が患者の音声情報や画像情報にタグ付けされる。音声分析された素因子と医師の返答期待時間タグが切り出し学習器に入力され、返答期待時間因子が切り出し時間として出力される。この切り出し時間は、全データのうち患者の満足度等を推定するために使用するデータを特定するための時間である。これにより、患者の満足度等を推定するために使用するデータ量を抑えることが可能になるので、処理速度を向上させることができる。 In such a cut-out time estimation, during a medical interview, the voices of the patient and the doctor are separated into the voices of the patient and the doctor, and the doctor's voice is input. This doctor's voice is analyzed based on frequency, pitch, etc., and a plurality of prime factors (prime factor indices), which are feature quantities, are extracted. The portion of time the physician expects the patient to respond is tagged to the patient's audio and visual information. The voice-analyzed prime factor and the doctor's expected response time tag are input to an extraction learner, and the expected response time factor is output as an extraction time. This cut-out time is the time for specifying the data used for estimating the degree of satisfaction of the patient among all the data. This makes it possible to reduce the amount of data used for estimating the patient's satisfaction level, etc., thereby improving the processing speed.

　（切り出し音声）
　図７に示すように、医師音声が入力され（ステップＳ６１）、医師音声情報から素因子が抽出されて分析される（ステップＳ６２）。患者音声が入力され（ステップＳ６３）、患者音声情報から素因子が抽出されて分析される（ステップＳ６４）。医師音声に基づく素因子、患者音声に基づく素因子、切り出し時間に基づいて、素因子が抽出されて分析され、切り出される（ステップＳ６５）。医師切り出し音声因子が得られ（ステップＳ６６）、患者切り出し音声因子が得られる（ステップＳ６７）。 (Clipped audio)
As shown in FIG. 7, a doctor's voice is input (step S61), and prime factors are extracted from the doctor's voice information and analyzed (step S62). A patient's voice is input (step S63), and prime factors are extracted from the patient's voice information and analyzed (step S64). Prime factors are extracted, analyzed, and cut out based on the doctor's voice-based prime factor, the patient's voice-based prime factor, and the clipping time (step S65). A doctor-cut speech factor is obtained (step S66), and a patient-cut speech factor is obtained (step S67).

　なお、医師切り出し顔画像因子及び患者切り出し顔画像因子も、上記のステップＳ６１～Ｓ６７と同様な手順で得られる。つまり、医師画像及び患者画像（例えば、医師上半身画像や患者上半身画像）が入力され、素因子の抽出及び分析により、切り出し時間における素因子、すなわち、医師切り出し顔画像因子及び患者切り出し顔画像因子が得られる。例えば、サーバ装置２０の処理部２２は、医師画像や患者画像から画像認識処理により顔や目等を認識することが可能であり、医師切り出し顔画像因子及び患者切り出し顔画像因子を取得することができる。 The doctor's cut-out face image factor and the patient's cut-out face image factor are also obtained in the same procedure as steps S61 to S67 above. That is, a doctor's image and a patient's image (for example, a doctor's upper body image and a patient's upper body image) are input, and by extracting and analyzing prime factors, the prime factors at the extraction time, that is, the doctor's extracted face image factor and the patient's extracted face image factor are obtained. can get. For example, the processing unit 22 of the server device 20 can recognize faces, eyes, and the like from doctor images and patient images by image recognition processing, and can acquire doctor-cut face image factors and patient-cut face image factors. can.

　（患者の満足度／不満度）
　図８に示すように、視線交錯複合素因子が満足度学習器（又は不満足学習器）に入力され（ステップＳ７１）、切り出し音声素因子が満足度学習器（又は不満足学習器）に入力され（ステップＳ７２）、切り出し顔画像素因子が満足度学習器（又は不満足学習器）に入力される（ステップＳ７３）。さらに、患者アンケート（例えば、質問票等）の満足度又は不満度に関する教師データが満足度学習器（又は不満足学習器）に入力される（ステップＳ７４）。これらの教師データ、切り出し音声素因子及び切り出し顔画像素因子に基づいて、満足度学習器（又は不満足学習器）により処理が実行され（ステップＳ７５）、患者の満足度又は不満度の個々の推定値が求められる（ステップＳ７６）。 (Patient satisfaction/dissatisfaction)
As shown in FIG. 8, the gaze crossing complex factor is input to the satisfaction level learner (or dissatisfaction learner) (step S71), and the clipped phoneme factor is input to the satisfaction level learner (or dissatisfaction learner) ( Step S72), the clipped face image element factor is input to a satisfaction level learner (or dissatisfaction level learner) (step S73). Further, teacher data regarding the degree of satisfaction or dissatisfaction of the patient questionnaire (for example, questionnaire) is input to the satisfaction learning device (or dissatisfaction learning device) (step S74). Based on these teacher data, clipped phoneme factors, and clipped face image elements, processing is performed by a satisfaction learner (or dissatisfaction learner) (step S75), and individual estimation of the patient's satisfaction or dissatisfaction. A value is obtained (step S76).

　なお、満足度学習器（又は不満足学習器）は、学習モデル２２ｄの一部を実現し、サーバ装置２０の処理部２２に含まれる。この満足度学習器（又は不満足学習器）では、学習が進み、所望の正解率（再現率）を得ることが可能になれば、ステップＳ７４を実行しなくてもよい。正確率を向上させるためには、学習を繰り返し行うことが望ましい。教師データとしては、患者アンケート以外にも、例えば、カルテ等を用いることが可能である。また、学習に用いる教師データは、例えば、医師が患者の状態を診て、医師がリアルタイムに時間スタンプと満足度のスコアを付記して入力することで、画像と音声を取得時に学習モデルを構築することができる。また、本装置を、教師ＤＡＴＡを生成するための装置として使用できる。リアルタイムで入力できない場合には、例えば、録画＋録音ＤＡＴＡに時間スタンプと満足度のスコアを付記することで教師データを生成することもできる。 Note that the satisfaction level learner (or the dissatisfaction learner) implements part of the learning model 22 d and is included in the processing unit 22 of the server device 20 . In this satisfaction level learning device (or dissatisfaction learning device), if learning progresses and a desired accuracy rate (recall rate) can be obtained, step S74 need not be executed. In order to improve the accuracy rate, it is desirable to repeat learning. In addition to patient questionnaires, for example, charts and the like can be used as training data. In addition, the teacher data used for learning is, for example, a doctor who examines the patient's condition, and the doctor adds a time stamp and satisfaction score in real time, and the learning model is built when the image and sound are acquired. can do. Also, this device can be used as a device for generating teacher DATA. If real-time input is not possible, teacher data can be generated by, for example, adding a time stamp and a satisfaction score to the recorded + recorded DATA.

　ここで、視線交錯複合素因子は、視線交錯率や交錯時間等である。つまり、視線交錯複合素因子は、視線交錯率及び交錯時間の両方又は一方でもよく、視線交錯情報に相当する。また、患者の満足度や不満度を推定する際、切り出し音声素因子や切り出し顔画像素因子としては、基本的に、患者切り出し音声素因子や患者切り出し顔画像素因子を用いるが、これに限るものではなく、例えば、必要に応じて医師切り出し音声素因子や医師切り出し顔画像素因子を追加して用いてもよい。また、視線交錯複合素因子の他に、カメラやマイクと同期したバイタルセンサによる患者のバイタルデータ（心拍や心拍変動、発汗、血圧等）を追加して用いてもよい。 Here, the line-of-sight crossing complex factor is the line-of-sight crossing rate, crossing time, etc. That is, the line-of-sight intersection complex factor may be both or one of the line-of-sight intersection rate and the line-of-sight intersection time, and corresponds to line-of-sight intersection information. In addition, when estimating the patient's degree of satisfaction or dissatisfaction, basically, the patient's extracted speech element and the patient's extracted face image element are used as the extracted speech element and the extracted face image element, but they are limited to these. For example, doctor-cut speech element factors and doctor-cut face image element factors may be added and used as needed. In addition to the line-of-sight intersection complex factor, the patient's vital data (heart rate, heart rate variability, perspiration, blood pressure, etc.) obtained by a vital sensor synchronized with a camera and a microphone may be additionally used.

　（患者の抑うつ度）
　図９に示すように、視線交錯複合素因子が抑うつ度学習器に入力され（ステップＳ８１）、切り出し音声素因子が抑うつ度学習器に入力され（ステップＳ８２）、切り出し顔画像素因子が抑うつ度学習器に入力される（ステップＳ８３）。さらに、患者アンケート（例えば、ＰＨＱ－９：Patient　Health　Questionnaire－９等）の抑うつ度に関する教師データが抑うつ度学習器に入力される（ステップＳ８４）。抑うつ度学習器により処理が実行され（ステップＳ８５）、患者の抑うつ度の推定値が求められる（ステップＳ８６）。 (Patient's degree of depression)
As shown in FIG. 9, the gaze crossing complex factor is input to the depression level learning machine (step S81), the clipped phoneme factor is input to the depression level learning machine (step S82), and the clipped face image element is input to the depression level learning machine (step S82). It is input to the learning device (step S83). Furthermore, teacher data regarding the degree of depression in a patient questionnaire (eg, PHQ-9: Patient Health Questionnaire-9) is input to the depression degree learner (step S84). Processing is performed by the depression level learning device (step S85), and an estimated value of the patient's depression level is obtained (step S86).

　なお、抑うつ度学習器は、学習モデル２２ｄの一部を実現し、サーバ装置２０の処理部２２に含まれる。この抑うつ度学習器では、学習が進み、所望の正解率（再現率）を得ることが可能になれば、ステップＳ８４を実行しなくてもよい。正確率を向上させるためには、学習を繰り返し行うことが望ましい。教師データとしては、患者アンケート以外にも、例えば、カルテを用いることが可能である。 It should be noted that the depression degree learning device implements a part of the learning model 22d and is included in the processing unit 22 of the server device 20. In this depression degree learning device, if learning progresses and a desired accuracy rate (recall rate) can be obtained, step S84 need not be executed. In order to improve the accuracy rate, it is desirable to repeat learning. In addition to patient questionnaires, for example, charts can be used as training data.

　以上のように、視線交錯複合素因子（例えば、視線交錯率や視線交錯時間等）と、音声素因子と、画像素因子とを学習器（例えば、満足度学習器や不満度学習器、抑うつ度学習器等）に入力することで、患者の満足度、不満度及び抑うつ度の推定値を得ることができる。学習器には、患者アンケートから満足度や不満度、抑うつ度等に関する情報が教師データとして入力され、学習モデルが生成されている。 As described above, the gaze-crossing complex factor (eg, gaze-crossing rate, gaze-crossing time, etc.), the phoneme factor, and the picture element factor are combined into learners (eg, satisfaction level learner, dissatisfaction level learner, depression degree learner, etc.), it is possible to obtain estimates of the patient's satisfaction, dissatisfaction and depression. A learning model is generated by inputting information on satisfaction, dissatisfaction, depression, etc. from patient questionnaires into the learning machine as teacher data.

　＜１－６．視線交錯推定処理の一例＞
　本実施形態に係る視線交錯推定処理の一例について図１０から図１２を参照して説明する。図１０から図１２は、それぞれ本実施形態に係る視線交錯推定処理の一例を説明するための図である。視線交錯推定処理は、基本的に、サーバ装置２０の処理部２２により実行される。この視線交錯推定処理は、患者及び医師の方位推定と、その方位推定結果に基づいて患者及び医師の視線交錯推定とを実現する。 <1-6. Example of line-of-sight intersection estimation processing>
An example of the line-of-sight intersection estimation process according to the present embodiment will be described with reference to FIGS. 10 to 12. FIG. 10 to 12 are diagrams for explaining an example of the line-of-sight intersection estimation processing according to the present embodiment. The line-of-sight intersection estimation process is basically executed by the processing unit 22 of the server device 20 . This line-of-sight crossing estimation processing realizes the azimuth estimation of the patient and the doctor, and the line-of-sight crossing estimation of the patient and the doctor based on the azimuth estimation result.

　図１０に示すように、患者及び医師の方位推定では、二台のカメラ（カメラ１、カメラ２）を仮想円（設置サークル）上に配置し、その仮想円上に患者及び医師の位置を定義する。図１０の例では、カメラ位置から患者と医師の個々の顔の全球方位を推定する。これにより、例えば、患者全方位推定角や医師全方位推定角等を得ることができる。 As shown in FIG. 10, in patient and doctor orientation estimation, two cameras (camera 1 and camera 2) are placed on a virtual circle (installation circle), and the positions of the patient and doctor are defined on the virtual circle. do. In the example of FIG. 10, the global orientation of the patient's and doctor's individual faces is estimated from the camera positions. As a result, for example, the patient's omnidirectional estimated angle, the doctor's omnidirectional estimated angle, and the like can be obtained.

　図１１に示すように、患者及び医師の方位推定では、二台のカメラ（カメラ１、カメラ２）を仮想円（設置サークル）の中心に配置し、その仮想円上に患者及び医師の位置を定義する。図１１の例では、カメラ位置から患者と医師の個々の顔の全球方位を推定する。これにより、例えば、患者全方位推定角や医師全方位推定角等を得ることができる。 As shown in FIG. 11, in estimating the orientation of a patient and a doctor, two cameras (camera 1 and camera 2) are arranged at the center of a virtual circle (installation circle), and the positions of the patient and the doctor are placed on the virtual circle. Define. In the example of FIG. 11, the global orientation of the patient's and doctor's individual faces is estimated from the camera positions. As a result, for example, the patient's omnidirectional estimated angle, the doctor's omnidirectional estimated angle, and the like can be obtained.

　図１２に示すように、患者及び医師の視線交錯推定では、推定された全球方位角と、各カメラ（例えば、カメラ１やカメラ２等）からの画像からリアルタイムで顔画像分析により顔の方位を推定する。なお、図１２の例では、カメラから患者の角度がαであり、カメラから医師の角度がβであり、患者から見た医師角度は１８０－β－γであり、視線角度がθである。 As shown in FIG. 12, in the line-of-sight crossing estimation of the patient and the doctor, the estimated global azimuth angle and the image from each camera (for example, camera 1, camera 2, etc.) are analyzed in real time to determine the face direction. presume. In the example of FIG. 12, the patient's angle from the camera is α, the doctor's angle from the camera is β, the doctor's angle seen from the patient is 180-β-γ, and the line-of-sight angle is θ.

　例えば、推定された顔の方位である患者の視線ベクトル（例えば、患者の視線角度に基づくベクトル）が、仮想円上において医師が占める円弧範囲内に入っている場合、患者の顔は医師を向いていることになる。同様に、推定された顔の方位である医師の視線ベクトル（例えば、医師の視線角度に基づくベクトル）が、仮想円上において患者が占める円弧範囲内に入っている場合、医師の顔は患者を向いていることになる。このような場合、患者の顔と医師の顔が向かい合っており、患者及び医師の視線が交錯していることになる。なお、２次元又は３次元で顔方位を求めることが可能である。 For example, if the patient's gaze vector (e.g., the vector based on the patient's gaze angle), which is the estimated face orientation, falls within the arc range occupied by the doctor on the virtual circle, the patient's face faces the doctor. It means that Similarly, if the doctor's gaze vector (e.g., the vector based on the doctor's gaze angle), which is the estimated face orientation, falls within the arc range occupied by the patient on the virtual circle, the doctor's face is aligned with the patient. It will be suitable. In such a case, the patient's face and the doctor's face face each other, and the lines of sight of the patient and the doctor intersect. Note that it is possible to obtain the face orientation in two dimensions or three dimensions.

　患者及び医師の視線交錯を推定する場合、例えば、前述の返答期待時間因子の前後１０秒の画像から患者及び医師の顔方位を求め、求めた患者及び医師の顔方位を用いて、互いの顔方位の視線交錯時間及び視線交錯時間率を算出する。視線交錯時間は、例えば、設置サークル上の医師が占める円弧範囲と患者の視線角度が一致している時間である。視線交錯時間率は、例えば、視線交錯時間が返答期待時間の前後１０秒、すなわち２０秒のうちどの程度を占有するかを示す占有率であり、あるいは、視線交錯時間が返答期待時間のうちどの程度を占有するかを示す占有率である。 When estimating the line-of-sight crossing of the patient and the doctor, for example, the face directions of the patient and the doctor are obtained from the images of 10 seconds before and after the response expected time factor, and the obtained face directions of the patient and the doctor are used to determine the mutual face The line-of-sight crossing time and the line-of-sight crossing time rate of the azimuth are calculated. The line-of-sight crossing time is, for example, the time during which the arc range occupied by the doctor on the installation circle and the line-of-sight angle of the patient match. The line-of-sight crossing time rate is, for example, an occupancy rate indicating how much the line-of-sight crossing time occupies within 10 seconds before and after the expected response time, that is, 20 seconds, or how long the line-of-sight crossing time is within the expected response time. It is the occupancy rate that indicates whether the degree is occupied.

　なお、患者及び医師の視線交錯を推定する方法としては、図１０や図１１以外の配置を用いることも可能であり、例えば、図１１において、全方位（３６０度）を撮像可能なカメラを仮想円（設置サークル）の中心に１台だけ配置してもよい。また、全方位を撮像可能なカメラを患者と医師との中間に１台だけ配置してもよい。この場合、医療面談がオンライン面談である場合でも、患者及び医師の視線交錯（視線の交じり）を検出することができる。 As a method of estimating the line-of-sight intersection of the patient and the doctor, it is possible to use arrangements other than those shown in FIGS. 10 and 11. For example, in FIG. Only one unit may be placed in the center of the circle (installation circle). Alternatively, only one camera capable of imaging in all directions may be arranged between the patient and the doctor. In this case, even if the medical interview is an online interview, it is possible to detect crossing of the gazes of the patient and the doctor.

　また、患者の満足度等を推定するための特徴量としては、視線（視線交錯）以外にも、切り出し時間における患者の全球方位角から患者の顔が医師に向いていることや医師の顔より下方に向いていることを判定する複合素因子や、医師の返答期待時間の開始時刻から患者が発話するまでの応答時間（医師の質問に対して患者が発話するまでの応答時間）の複合素因子、切り出し時間における顔画像から生成した心拍（脈拍）や心拍揺らぎ等の複合素因子を用いてもよい。視線交錯は、医師がリアルタイムに時間スタンプとタグを入力することで生成することもできる。 In addition to the line of sight (crossing of lines of sight), other feature values for estimating the patient's satisfaction level include the fact that the patient's face is facing the doctor from the patient's global azimuth angle at the time of extraction, and the fact that the patient's face is facing the doctor. Composite factors for determining downward facing, and composite factors of the response time from the start time of the doctor's expected response time until the patient speaks (response time until the patient speaks to the doctor's question) A complex factor such as a heartbeat (pulse) generated from the face image at the cut-out time or heartbeat fluctuation may be used. Gaze intersections can also be generated by the physician entering time stamps and tags in real time.

　また、患者の満足度等を推定するための特徴量としては、例えば、眉毛の動きや額の皺等を用いて、患者の満足度等を推定してもよい。心拍以外にも、体温や血圧、呼吸等が用いられてもよい。これらの体温や血圧、呼吸等は、例えば、顔画像から求められてもよく、あるいは、バイタルセンサにより求められてもよい。また、患者の満足度等を推定するための特徴量としては、例えば、カメラやマイクと同期したバイタルセンサによる患者のバイタルデータ（心拍や心拍変動、発汗、血圧等）が用いられてもよい。 In addition, as a feature amount for estimating the patient's satisfaction level, for example, eyebrow movement, forehead wrinkles, etc. may be used to estimate the patient's satisfaction level. Besides heartbeat, body temperature, blood pressure, respiration, etc. may be used. These body temperature, blood pressure, respiration, etc. may be obtained from, for example, a face image, or may be obtained from a vital sensor. In addition, as the feature amount for estimating the patient's degree of satisfaction, for example, the patient's vital data (heart rate, heart rate variability, perspiration, blood pressure, etc.) obtained by a vital sensor synchronized with a camera or microphone may be used.

　また、画像や音声等の素因子（特徴量）としては、例えば、ＯｐｅｃＣＶ（Open　Source　Computer　Vision　Library）やＬｉｂｒｏｓａ（音楽と音声の解析のためのパイソン（python）パッケージ）、ＯｐｅｎＦａｃｅ（ディープニューラルネットワークにより顔認証を行うオープンソース）等で抽出される因子を用いることが可能である。 In addition, as prime factors (feature values) such as images and sounds, for example, OpecCV (Open Source Computer Vision Library), Librosa (python package for music and sound analysis), OpenFace (by deep neural network It is possible to use the factor extracted by open source that performs face authentication).

　＜１－７．医師用の表示画像の一例＞
　本実施形態に係る医師用の表示画像の一例について図１３を参照して説明する。図１３は、本実施形態に係る医師用の表示画像の一例を説明するための図である。医師用の表示画像は、例えば、ＵＩ（ユーザインターフェース）画像である。 <1-7. Example of Display Image for Doctor>
An example of a display image for a doctor according to this embodiment will be described with reference to FIG. 13 . FIG. 13 is a diagram for explaining an example of a display image for a doctor according to this embodiment. The display image for doctors is, for example, a UI (user interface) image.

　図１３に示すように、表示画像は、満足度、不満度及び抑うつ度の情報、また、各種情報を含む。図１３の例では、満足度、不満度及び抑うつ度がレーダグラフにより示される。また、図１３の例では、「リアルタイムの切り出しタイミング：時間、ＯＮ－ＯＦＦ」や「視線一致率：表示（％、時間）」、「切り出し積算個数」、「頭部の位置（頭部、下部）」、「返答スピード」、「経過時間」等の情報が示される。これらの情報は、図５に示すような処理により求められた情報から得られる。このような表示画像は、例えば、医師端末装置４０の表示部４３により表示される。これにより、医師は上記の各種情報を視認することができる。 As shown in FIG. 13, the display image includes information on satisfaction, dissatisfaction and depression, as well as various types of information. In the example of FIG. 13, the degree of satisfaction, the degree of dissatisfaction and the degree of depression are indicated by radar graphs. Further, in the example of FIG. 13, "real-time extraction timing: time, ON-OFF", "line-of-sight matching rate: display (%, time)", "accumulated number of extractions", "head position (head, lower part)" )”, “response speed”, and “elapsed time” are displayed. These pieces of information are obtained from the information obtained by the processing shown in FIG. Such a display image is displayed by the display unit 43 of the doctor terminal device 40, for example. This allows the doctor to visually recognize the above various information.

　なお、図１３に示す表示画像は、あくまでも例示であり、各種情報を提示するための表示画像は他の態様であってもよい。また、例えば、上記情報以外の情報も提示することが可能であり、図５に示すような処理により求められた情報をそのまま提示してもよい。 Note that the display image shown in FIG. 13 is merely an example, and the display image for presenting various information may be in another form. Further, for example, it is possible to present information other than the above information, and the information obtained by the processing shown in FIG. 5 may be presented as it is.

　＜１－８．システム応用サービスの一例＞
　本実施形態に係るシステム応用サービスの一例について図１４を参照して説明する。図１４は、本実施形態に係るシステム応用サービスの一例を説明するための図である。システム応用サービスは、スケジュールシステムとして機能する。このスケジュールシステムは、サーバ装置２０により実現される。 <1-8. Example of system application service>
An example of the system application service according to this embodiment will be described with reference to FIG. FIG. 14 is a diagram for explaining an example of a system application service according to this embodiment. The system application service functions as a scheduling system. This schedule system is implemented by the server device 20 .

　図１４に示すように、学習器（例えば、図８や図９等）により処理が実行され（ステップＳ９１）、満足度、不満度及び抑うつ度（抑鬱度）が求められる（ステップＳ９２～Ｓ９４）。満足度、不満度及び抑うつ度のいずれか又は全てに基づいて、医療面談に対する患者の印象（気持ち）が満足群又は不満群に分類され（ステップＳ９５）、満足群結果（医療面談に対する患者の印象が満足である）がスケジュールシステムに入力され（ステップＳ９６）、不満群結果（医療面談に対する患者の印象が不満足である）がスケジュールシステムに入力される（ステップＳ９７）。 As shown in FIG. 14, processing is executed by a learning device (eg, FIG. 8, FIG. 9, etc.) (step S91), and satisfaction, dissatisfaction and depression (depression) are obtained (steps S92 to S94). . Based on one or all of the degree of satisfaction, degree of dissatisfaction, and degree of depression, the patient's impression (feeling) of the medical interview is classified into a satisfaction group or a dissatisfaction group (step S95), and the satisfaction group result (patient's impression of the medical interview is satisfied) is input to the scheduling system (step S96), and the dissatisfaction group result (patient's impression of the medical interview is unsatisfactory) is input to the scheduling system (step S97).

　例えば、満足度、不満度及び抑うつ度は、個々の所定値に対して大きいか小さいか判断され（大小が判断され）、その判断結果に応じて、医療面談に対する患者の印象が満足群又は不満群に分類される。一例として、満足度が第１の所定値以上であり、不満度が第２の所定値より小さく、抑うつ度が第３の所定値より小さければ、医療面談に対する患者の印象が満足群に分類される。逆に、満足度が第１の所定値より小さく、不満度が第２の所定値以下であり、抑うつ度が第３の所定値以下であれば、医療面談に対する患者の印象が不満群に分類される。 For example, the degree of satisfaction, the degree of dissatisfaction, and the degree of depression are judged whether they are large or small with respect to each predetermined value (largeness is judged), and depending on the judgment result, the patient's impression of the medical interview is satisfied or dissatisfied. classified into groups. As an example, if the degree of satisfaction is greater than or equal to a first predetermined value, the degree of dissatisfaction is less than a second predetermined value, and the degree of depression is less than a third predetermined value, the patient's impression of the medical interview is classified into the satisfaction group. be. Conversely, if the degree of satisfaction is less than the first predetermined value, the degree of dissatisfaction is less than or equal to the second predetermined value, and the degree of depression is less than or equal to the third predetermined value, the patient's impression of the medical interview is classified into the dissatisfied group. be done.

　なお、前述のように、満足度、不満度及び抑うつ度の全てを用いて分類を行う以外にも、満足度、不満度及び抑うつ度のいずれか一つ又は二つを用いて分類を行ってもよい。分類に用いる要素を少なくすることで、処理速度を向上させることができる。一方、分類に用いる要素を増やすことで、分類の精度を向上させることができる。 As described above, in addition to classifying using all of satisfaction, dissatisfaction and depression, one or both of satisfaction, dissatisfaction and depression may be used for classification. good too. Processing speed can be improved by reducing the number of elements used for classification. On the other hand, the accuracy of classification can be improved by increasing the number of elements used for classification.

　満足群結果に基づき、スケジュールシステムにより処理が実行され（ステップＳ９８）、医療面談日時や場所等を知らせる事前連絡が患者と医師に送られる（ステップＳ９９）。例えば、事前連絡はメールにより医師端末装置４０や患者端末装置５０に送られ、医師端末装置４０の表示部４３や患者端末装置５０の表示部５３により表示される（ステップＳ１００、Ｓ１０１）。医師及び患者は、事前連絡に基づくスケジュールで医療面談を行う（ステップＳ１０２、Ｓ１０３）。 Based on the results of the satisfaction group, the scheduling system executes processing (step S98), and an advance notice is sent to the patient and doctor informing them of the date and place of the medical interview (step S99). For example, the advance notification is sent to the doctor terminal device 40 and the patient terminal device 50 by e-mail, and displayed on the display section 43 of the doctor terminal device 40 and the display section 53 of the patient terminal device 50 (steps S100 and S101). The doctor and the patient have a medical interview according to the schedule based on the advance notice (steps S102, S103).

　一方、不満群結果に基づき、スケジュールシステムにより処理が実行され（ステップＳ１０４）、医師の変更等の高頻度連絡（高頻度に連絡）が患者に送られる（ステップＳ１０５）。例えば、高頻度連絡はメールにより患者端末装置５０に送られ、患者端末装置５０の表示部５３により表示される（ステップＳ１０１）。 On the other hand, based on the dissatisfaction group results, the scheduling system executes processing (step S104), and high-frequency contact (high-frequency contact) such as change of doctor is sent to the patient (step S105). For example, the high-frequency contact is sent by e-mail to the patient terminal device 50 and displayed on the display unit 53 of the patient terminal device 50 (step S101).

　また、不満群結果に基づき、別の医師が患者に案内される（ステップＳ１０６）。例えば、別の医師を案内する案内はメールにより患者端末装置５０に送られ、患者端末装置５０の表示部５３により表示される。別の医師は、データベースに登録されている複数の医師から選択される。 Also, another doctor is guided to the patient based on the dissatisfied group result (step S106). For example, a guide to another doctor is sent to the patient terminal device 50 by e-mail and displayed on the display unit 53 of the patient terminal device 50 . Another doctor is selected from a plurality of doctors registered in the database.

　患者は、別の医師を案内する案内を受けた場合、その案内による医師が複数である場合、それらの医師から希望の医師を選択する（ステップＳ１０７）。患者が別の医師を選択すると、その新たな医師に対して医療面談日時や場所等を知らせる事前連絡が送られる。なお、元の医師に対して医療面談キャンセルの事前連絡が送られる。 When the patient receives a guide to another doctor, and there are multiple doctors according to the guide, the patient selects the desired doctor from those doctors (step S107). If the patient selects another doctor, the new doctor will be sent advance notice of the date, time and location of the medical interview. In addition, a prior notice of cancellation of the medical interview is sent to the original doctor.

　このようなシステム応用サービスによれば、医師は、患者の満足度や不満度、抑うつ度等を把握することが可能になり、医療面談時に次回来院を患者に促したり、又は、在宅の患者に来院や教育コンテンツ視聴、アプリ使用等を促したりすることで、治療効果を向上させることができる。また、事前連絡により医師や患者は医療面談日時や場所等を把握することが可能になるので、医師や患者の利便性を向上させることができる。 According to such a system application service, the doctor can grasp the patient's degree of satisfaction, dissatisfaction, depression, etc. By encouraging patients to visit the hospital, watch educational content, use apps, etc., it is possible to improve the effectiveness of treatment. In addition, advance notice enables doctors and patients to grasp the date, time and place of medical consultations, thereby improving the convenience of doctors and patients.

　なお、前述のように、患者の満足度、不満度及び抑うつ度のいずれか又は全てを推定して表示することが可能であるが、これに限るものではなく、例えば、患者の不安度を推定して表示してもよい。つまり、患者の満足度、不満度、不安度及び抑うつ度のいずれか又は全てを推定して表示してもよい。例えば、患者の不安度が表示されれば、医師は、患者の不安度を把握することが可能になり、医療面談時に次回来院を患者に促したりすることで、治療効果をより向上させることができる。 In addition, as described above, it is possible to estimate and display any or all of the patient's satisfaction, dissatisfaction, and depression, but the present invention is not limited to this. may be displayed as That is, any or all of the patient's satisfaction, dissatisfaction, anxiety and depression may be estimated and displayed. For example, if the patient's anxiety level is displayed, the doctor can grasp the patient's anxiety level, and by encouraging the patient to come to the hospital next time during the medical consultation, the treatment effect can be further improved. can.

　＜１－９．効果＞
　以上説明したように、本実施形態によれば、入力部２１が患者及び医師の音声と画像を入力し、抽出部（例えば、処理部２２）は、患者及び医師の音声と画像から患者と医師とのコミュニケーションに関する特徴量を抽出し、推定部（例えば、処理部２２）は、その抽出した特徴量に基づいて患者の満足度、不満度又は不安度を推定し、出力部２３は患者の満足度、不満度又は不安度を出力する。これにより、患者の満足度、不満度又は不安度を医師端末装置４０等によって表示し、可視化することが可能になる。したがって、医師は患者の満足度、不満度又は不安度を把握し、患者の治療継続意欲の低下による治療の中断を抑えることが可能になるので、治療効果を向上させることができる。 <1-9. Effect>
As described above, according to the present embodiment, the input unit 21 inputs voices and images of the patient and the doctor, and the extraction unit (for example, the processing unit 22) extracts the patient and the doctor from the voices and images of the patient and the doctor. The estimation unit (for example, the processing unit 22) estimates the patient's satisfaction level, dissatisfaction level, or anxiety level based on the extracted feature amount, and the output unit 23 outputs the patient's satisfaction degree, dissatisfaction or anxiety. As a result, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be displayed and visualized by the doctor terminal device 40 or the like. Therefore, the doctor can grasp the patient's degree of satisfaction, dissatisfaction, or anxiety, and can prevent interruption of treatment due to the patient's decreased willingness to continue treatment, thereby improving the treatment effect.

　また、推定部は、患者の満足度、不満度又は不安度に加え、患者の抑うつ度を推定し、出力部２３は、患者の満足度、不満度又は不安度に加え、患者の抑うつ度を出力してもよい。これにより、患者の満足度、不満度又は不安度に加え、患者の抑うつ度を医師端末装置４０等によって表示し、可視化することが可能になる。したがって、医師は患者の満足度、不満度又は不安度に加え、患者の抑うつ度を把握し、患者の治療継続意欲の低下による治療の中断を確実に抑えることが可能になるので、治療効果を確実に向上させることができる。 Further, the estimation unit estimates the patient's degree of depression in addition to the patient's degree of satisfaction, dissatisfaction or anxiety, and the output unit 23 estimates the degree of depression of the patient in addition to the degree of satisfaction, dissatisfaction or anxiety of the patient. may be output. This makes it possible to display and visualize the degree of depression of the patient in addition to the degree of satisfaction, dissatisfaction or anxiety of the patient by the doctor terminal device 40 or the like. Therefore, doctors can grasp the patient's degree of depression in addition to the patient's degree of satisfaction, dissatisfaction or anxiety, and can reliably prevent interruption of treatment due to a decrease in the patient's willingness to continue treatment. can definitely be improved.

　また、抽出部は、患者の音声と医師の音声とを分離し、患者の音声特徴量及び医師の音声特徴量を抽出してもよい。これにより、患者の音声と医師の音声とを分離しない場合に比べ、精度高い処理を実現することが可能となり、結果として、患者の満足度、不満度又は不安度を精度高く推定することができる。 In addition, the extraction unit may separate the patient's voice and the doctor's voice, and extract the patient's voice feature amount and the doctor's voice feature amount. As a result, it is possible to realize highly accurate processing compared to the case where the patient's voice and the doctor's voice are not separated, and as a result, it is possible to estimate the patient's degree of satisfaction, dissatisfaction, or anxiety with high accuracy. .

　また、抽出部は、医師の音声特徴量に基づいて医師の返答期待時間を求め、求めた返答期待時間における特徴量を求めてもよい。これにより、患者及び医師の音声と画像から、医師の返答期待時間だけの特徴量を求めればよくなるので、処理速度を向上させることができる。 In addition, the extraction unit may obtain the doctor's expected response time based on the doctor's speech feature amount, and obtain the feature amount at the obtained expected response time. As a result, it is only necessary to obtain the feature amount for the doctor's expected response time from the patient's and doctor's voices and images, so that the processing speed can be improved.

　また、抽出部は、医師の返答期待時間により意味付けられた医師の音声特徴量で学習された切り出し学習モデル２２ｂを用いて、返答期待時間を求めてもよい。これにより、返答期待時間を精度高く求めることができる。 In addition, the extracting unit may obtain the expected response time by using the extraction learning model 22b that is learned with the doctor's speech feature quantity that is given meaning by the doctor's expected response time. As a result, the expected response time can be obtained with high accuracy.

　また、返答期待時間及び医師の音声特徴量に基づいて、切り出し学習モデル２２ｂを生成する学習部をさらに備えてもよい。これにより、切り出し学習モデル２２ｂを適切に生成することができる。 In addition, a learning unit may be further provided that generates the extraction learning model 22b based on the expected response time and the doctor's speech feature amount. Thereby, the clipped learning model 22b can be generated appropriately.

　また、推定部は、患者の満足度、不満度又は不安度により意味付けられた特徴量で学習された学習モデル２２ｄを用いて、患者の満足度、不満度又は不安度を推定してもよい。これにより、患者の満足度、不満度又は不安度を精度高く求めることができる。 In addition, the estimating unit may estimate the patient's degree of satisfaction, dissatisfaction, or anxiety using a learning model 22d that has been learned with feature values assigned meaning by the degree of satisfaction, dissatisfaction, or anxiety of the patient. . Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.

　また、患者の満足度、不満度又は不安度に関するアンケート又は医師によるスコアリング結果、及び特徴量に基づいて、学習モデル２２ｄを生成する学習部をさらに備えてもよい。これにより、学習モデル２２ｄを適切に生成することができる。 Further, a learning unit may be provided that generates the learning model 22d based on the results of a questionnaire regarding patient satisfaction, dissatisfaction, or anxiety, or the results of scoring by a doctor, and feature amounts. Thereby, the learning model 22d can be appropriately generated.

　また、推定部は、患者の画像から顔画像を推定し、顔画像より特徴量を抽出してもよい。これにより、患者の満足度、不満度又は不安度を精度高く求めることができる。 Also, the estimation unit may estimate a facial image from the patient's image and extract the feature amount from the facial image. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.

　また、抽出部は、特徴量として、患者及び医師の視線交錯に関する視線交錯情報を抽出してもよい。これにより、患者の満足度、不満度又は不安度を精度高く求めることができる。 In addition, the extracting unit may extract, as the feature quantity, gaze crossing information regarding the gaze crossing of the patient and the doctor. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.

　また、推定部は、視線交錯情報に基づいて、患者及び医師のアイコンタクトの度合い及び頻度の一方又は両方を求め、求めたアイコンタクトの度合い及び頻度の一方又は両方に基づいて、患者の満足度、不満度又は不安度を推定してもよい。これにより、患者の満足度、不満度又は不安度をより精度高く求めることができる。 Further, the estimating unit obtains one or both of the degree and frequency of eye contact between the patient and the doctor based on the eye-crossing information, and calculates the satisfaction level of the patient based on one or both of the obtained degree and frequency of eye contact. , dissatisfaction or anxiety may be estimated. As a result, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with higher accuracy.

　また、入力部２１は、患者の画像を取得する第１カメラの位置及び医師の画像を取得する第２カメラの位置を入力し、抽出部は、患者、医師、第１カメラ及び第２カメラの個々の位置、患者の画像及び医師の画像に基づいて、視線交錯情報を抽出してもよい。これにより、視線交錯情報を精度高く求めることができる。 Further, the input unit 21 inputs the position of the first camera that acquires the patient's image and the position of the second camera that acquires the doctor's image, and the extraction unit inputs the position of the patient, the doctor, the first camera, and the second camera. Eye-crossing information may be extracted based on individual positions, patient images, and doctor images. Thereby, the line-of-sight crossing information can be obtained with high accuracy.

　また、患者、医師、第１カメラ及び第２カメラは、それぞれ同じ仮想円上に位置してもよい。これにより、視線交錯情報を容易に求めることができる。 Also, the patient, the doctor, the first camera, and the second camera may each be positioned on the same virtual circle. Thereby, the line-of-sight crossing information can be easily obtained.

　また、患者及び医師は、それぞれ同じ仮想円上に位置し、第１カメラ及び第２カメラは、それぞれ仮想円の中心に位置してもよい。これにより、視線交錯情報を容易に求めることができる。 Also, the patient and the doctor may be positioned on the same virtual circle, and the first camera and the second camera may be positioned at the center of the virtual circle. Thereby, the line-of-sight crossing information can be easily obtained.

　また、抽出部は、特徴量として、患者の顔が医師に向いていること又は医師の顔より下方に向いていることを示す特徴量を抽出してもよい。これにより、患者の満足度、不満度又は不安度を精度高く求めることができる。 In addition, the extraction unit may extract, as the feature amount, a feature amount indicating that the patient's face is facing the doctor or facing downward from the doctor's face. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.

　また、抽出部は、特徴量として、医師の質問に対して患者が発話するまでの応答時間を抽出してもよい。これにより、患者の満足度、不満度又は不安度を精度高く求めることができる。 In addition, the extraction unit may extract, as a feature amount, the response time until the patient speaks to the doctor's question. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.

　また、抽出部は、特徴量として、患者の画像から患者の心拍又は心拍揺らぎを抽出してもよい。これにより、患者の満足度、不満度又は不安度を精度高く求めることができる。 The extraction unit may also extract the patient's heartbeat or heartbeat fluctuation from the patient's image as the feature amount. Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.

　また、入力部２１は、患者及び医師の音声と画像に加え、患者のバイタル情報を入力し、抽出部は、患者及び医師の音声と画像に加え、患者のバイタル情報から、特徴量を抽出してもよい。これにより、患者の満足度、不満度又は不安度を精度高く求めることができる。 In addition, the input unit 21 inputs the voice and images of the patient and the doctor as well as the vital information of the patient, and the extraction unit extracts the feature amount from the voice and images of the patient and the doctor as well as the vital information of the patient. may Accordingly, the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety can be obtained with high accuracy.

　＜２．他の実施形態＞
　上述した実施形態（又は変形例）に係る処理は、上記実施形態以外にも種々の異なる形態（変形例）にて実施されてよい。例えば、上記実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。 <2. Other Embodiments>
The processing according to the above-described embodiments (or modifications) may be implemented in various different forms (modifications) other than the above embodiments. For example, among the processes described in the above embodiments, all or part of the processes described as being automatically performed can be manually performed, or the processes described as being performed manually can be performed manually. All or part of this can also be done automatically by known methods. In addition, information including processing procedures, specific names, various data and parameters shown in the above documents and drawings can be arbitrarily changed unless otherwise specified. For example, the various information shown in each drawing is not limited to the illustrated information.

　また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Also, each component of each device illustrated is functionally conceptual and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to the one shown in the figure, and all or part of them can be functionally or physically distributed and integrated in arbitrary units according to various loads and usage conditions. Can be integrated and configured.

　また、上述した実施形態（又は変形例）は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。また、本明細書に記載された効果はあくまで例示であって限定されるものでは無く、他の効果があってもよい。 In addition, the above-described embodiments (or modifications) can be appropriately combined within a range that does not contradict the processing content. Also, the effects described in this specification are only examples and are not limited, and other effects may be provided.

　＜３．ハードウェア構成の一例＞
　上述した実施形態（又は変形例）に係るサーバ装置２０や医師端末装置４０、患者端末装置５０等の情報機器の具体的なハードウェア構成例について説明する。実施形態（又は変形例）に係るサーバ装置２０や医師端末装置４０、患者端末装置５０等の情報機器は、例えば、図１５に示すような構成のコンピュータ５００によって実現されてもよい。図１５は、実施形態（又は変形例）に係るサーバ装置２０や医師端末装置４０、患者端末装置５０等の情報機器の機能を実現するハードウェアの構成例を示す図である。 <3. Example of hardware configuration>
Specific hardware configuration examples of information devices such as the server device 20, the doctor terminal device 40, and the patient terminal device 50 according to the above-described embodiment (or modification) will be described. Information devices such as the server device 20, the doctor terminal device 40, and the patient terminal device 50 according to the embodiment (or modification) may be implemented by, for example, a computer 500 configured as shown in FIG. FIG. 15 is a diagram showing a configuration example of hardware that implements the functions of information devices such as the server device 20, the doctor terminal device 40, and the patient terminal device 50 according to the embodiment (or modification).

　図１５に示すように、コンピュータ５００は、ＣＰＵ５１０、ＲＡＭ５２０、ＲＯＭ（Read　Only　Memory）５３０、ＨＤＤ（Hard　Disk　Drive）５４０、通信インターフェイス５５０及び入出力インターフェイス５６０を有する。コンピュータ５００の各部は、バス５７０によって接続される。 As shown in FIG. 15, the computer 500 has a CPU 510, a RAM 520, a ROM (Read Only Memory) 530, a HDD (Hard Disk Drive) 540, a communication interface 550 and an input/output interface 560. The parts of computer 500 are connected by bus 570 .

　ＣＰＵ５１０は、ＲＯＭ５３０又はＨＤＤ５４０に格納されたプログラムに基づいて動作し、各部の制御を行う。例えば、ＣＰＵ５１０は、ＲＯＭ５３０又はＨＤＤ５４０に格納されたプログラムをＲＡＭ５２０に展開し、各種プログラムに対応した処理を実行する。 The CPU 510 operates based on programs stored in the ROM 530 or HDD 540 and controls each section. For example, the CPU 510 loads programs stored in the ROM 530 or HDD 540 into the RAM 520 and executes processes corresponding to various programs.

　ＲＯＭ５３０は、コンピュータ５００の起動時にＣＰＵ５１０によって実行されるＢＩＯＳ（Basic　Input　Output　System）等のブートプログラムや、コンピュータ５００のハードウェアに依存するプログラム等を格納する。 The ROM 530 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 510 when the computer 500 is started, a program depending on the hardware of the computer 500, and the like.

　ＨＤＤ５４０は、ＣＰＵ５１０によって実行されるプログラム、及び、かかるプログラムによって使用されるデータ等を非一時的に記録する、コンピュータが読み取り可能な記録媒体である。具体的には、ＨＤＤ５４０は、プログラムデータ５４１の一例である本開示に係る情報処理プログラムを記録する記録媒体である。 The HDD 540 is a computer-readable recording medium that non-temporarily records programs executed by the CPU 510 and data used by such programs. Specifically, the HDD 540 is a recording medium that records an information processing program according to the present disclosure, which is an example of the program data 541 .

　通信インターフェイス５５０は、コンピュータ５００が外部ネットワーク５８０（一例としてインターネット）と接続するためのインターフェイスである。例えば、ＣＰＵ５１０は、通信インターフェイス５５０を介して、他の機器からデータを受信したり、ＣＰＵ５１０が生成したデータを他の機器へ送信したりする。 The communication interface 550 is an interface for connecting the computer 500 to an external network 580 (Internet as an example). For example, CPU 510 receives data from another device or transmits data generated by CPU 510 to another device via communication interface 550 .

　入出力インターフェイス５６０は、入出力デバイス５９０とコンピュータ５００とを接続するためのインターフェイスである。例えば、ＣＰＵ５１０は、入出力インターフェイス５６０を介して、キーボードやマウス等の入力デバイスからデータを受信する。また、ＣＰＵ５１０は、入出力インターフェイス５６０を介して、ディスプレイやスピーカーやプリンタ等の出力デバイスにデータを送信する。 The input/output interface 560 is an interface for connecting the input/output device 590 and the computer 500 . For example, CPU 510 receives data from an input device such as a keyboard or mouse via input/output interface 560 . The CPU 510 also transmits data to an output device such as a display, speaker, or printer via the input/output interface 560 .

　なお、入出力インターフェイス５６０は、所定の記録媒体（メディア）に記録されたプログラム等を読み取るメディアインターフェイスとして機能してもよい。メディアとしては、例えば、ＤＶＤ（Digital　Versatile　Disc）、ＰＤ（Phase　change　rewritable　Disk）等の光学記録媒体、ＭＯ（Magneto-Optical　disk）等の光磁気記録媒体、テープ媒体、磁気記録媒体、又は、半導体メモリ等が用いられる。 Note that the input/output interface 560 may function as a media interface for reading programs and the like recorded on a predetermined recording medium (media). Examples of media include optical recording media such as DVD (Digital Versatile Disc) and PD (Phase change rewritable Disk), magneto-optical recording media such as MO (Magneto-Optical disk), tape media, magnetic recording media, or semiconductor A memory or the like is used.

　ここで、例えば、コンピュータ５００が実施形態に係るサーバ装置２０、医師端末装置４０又は患者端末装置５０として機能する場合、コンピュータ５００のＣＰＵ５１０は、ＲＡＭ５２０上にロードされた情報処理プログラムを実行することにより、サーバ装置２０、医師端末装置４０又は患者端末装置５０が有する各機能の全てや一部を実現する。また、ＨＤＤ５４０には、本開示に係る情報処理プログラムやデータ（例えば、客観データや主観データ、客観スコアデータ、主観スコアデータ、スコア画像等）が格納される。なお、ＣＰＵ５１０は、プログラムデータ５４１をＨＤＤ５４０から読み取って実行するが、他の例として、外部ネットワーク５８０を介して、他の装置からこれらのプログラムを取得するようにしてもよい。 Here, for example, when the computer 500 functions as the server device 20, the doctor terminal device 40, or the patient terminal device 50 according to the embodiment, the CPU 510 of the computer 500 executes the information processing program loaded on the RAM 520. , the server device 20, the doctor terminal device 40, or the patient terminal device 50. The HDD 540 also stores information processing programs and data (eg, objective data, subjective data, objective score data, subjective score data, score images, etc.) according to the present disclosure. Although CPU 510 reads and executes program data 541 from HDD 540 , as another example, these programs may be obtained from another device via external network 580 .

　＜４．付記＞
　なお、本技術は以下のような構成も取ることができる。
（１）
　患者及び医師の音声と画像を入力する入力部と、
　前記患者及び前記医師の音声と画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出する抽出部と、
　前記特徴量に基づき、前記患者の満足度、不満度又は不安度を推定する推定部と、
　前記患者の満足度、不満度又は不安度を出力する出力部と、
を備える情報処理装置。
（２）
　前記推定部は、前記患者の満足度、不満度又は不安度に加え、前記患者の抑うつ度を推定し、
　前記出力部は、前記患者の満足度、不満度又は不安度に加え、前記患者の抑うつ度を出力する、
　上記（１）に記載の情報処理装置。
（３）
　前記抽出部は、前記患者の音声と前記医師の音声とを分離し、前記患者の音声特徴量及び前記医師の音声特徴量を抽出する、
　上記（１）又は（２）に記載の情報処理装置。
（４）
　前記抽出部は、前記医師の音声特徴量に基づいて前記医師の返答期待時間を求め、求めた前記返答期待時間における前記特徴量を求める、
　上記（３）に記載の情報処理装置。
（５）
　前記抽出部は、前記返答期待時間により意味付けられた前記医師の音声特徴量で学習された切り出し学習モデルを用いて、前記返答期待時間を求める、
　上記（４）に記載の情報処理装置。
（６）
　前記返答期待時間及び前記医師の音声特徴量に基づいて、前記切り出し学習モデルを生成する学習部をさらに備える、
　上記（５）に記載の情報処理装置。
（７）
　前記推定部は、前記患者の満足度、不満度又は不安度により意味付けられた前記特徴量で学習された学習モデルを用いて、前記患者の満足度、不満度又は不安度を推定する、
　上記（１）から（６）のいずれか一つに記載の情報処理装置。
（８）
　前記患者の満足度、不満度又は不安度に関するアンケート又は前記医師によるスコアリング結果、及び前記特徴量に基づいて、前記学習モデルを生成する学習部をさらに備える、
　上記（７）に記載の情報処理装置。
（９）
　前記推定部は、前記患者の画像から顔画像を推定し、前記顔画像より前記特徴量を抽出する、
　上記（１）から（８）のいずれか一つに記載の情報処理装置。
（１０）
　前記抽出部は、前記特徴量として、前記患者及び前記医師の視線交錯に関する視線交錯情報を抽出する、
　上記（１）から（９）のいずれか一つに記載の情報処理装置。
（１１）
　前記推定部は、前記視線交錯情報に基づいて、前記患者及び前記医師のアイコンタクトの度合い及び頻度の一方又は両方を求め、求めた前記アイコンタクトの度合い及び頻度の一方又は両方に基づいて、前記患者の満足度、不満度又は不安度を推定する、
　上記（１０）に記載の情報処理装置。
（１２）
　前記入力部は、前記患者の画像を取得する第１カメラの位置及び前記医師の画像を取得する第２カメラの位置を入力し、
　前記抽出部は、前記患者、前記医師、前記第１カメラ及び前記第２カメラの個々の位置、前記患者の画像及び前記医師の画像に基づいて、前記視線交錯情報を抽出する、
　上記（１０）又は（１１）に記載の情報処理装置。
（１３）
　前記患者、前記医師、前記第１カメラ及び前記第２カメラは、それぞれ同じ仮想円上に位置する、
　上記（１２）に記載の情報処理装置。
（１４）
　前記患者、前記医師は、それぞれ同じ仮想円上に位置し、
　前記第１カメラ及び前記第２カメラは、それぞれ前記仮想円の中心に位置する、
　上記（１２）に記載の情報処理装置。
（１５）
　前記抽出部は、前記特徴量として、前記患者の顔が前記医師に向いていること又は前記医師の顔より下方に向いていることを示す特徴量を抽出する、
　上記（１）から（１４）のいずれか一つに記載の情報処理装置。
（１６）
　前記抽出部は、前記特徴量として、前記医師の質問に対して前記患者が発話するまでの応答時間を抽出する、
　上記（１）から（１５）のいずれか一つに記載の情報処理装置。
（１７）
　前記抽出部は、前記特徴量として、前記患者の画像から前記患者の心拍又は心拍揺らぎを抽出する、
　上記（１）から（１６）のいずれか一つに記載の情報処理装置。
（１８）
　前記入力部は、前記患者及び前記医師の音声と画像に加え、前記患者のバイタル情報を入力し、
　前記抽出部は、前記患者及び前記医師の音声と画像に加え、前記患者のバイタル情報から、前記特徴量を抽出する、
　上記（１）から（１７）のいずれか一つに記載の情報処理装置。
（１９）
　医師の音声を入力する入力部と、
　前記医師の音声から、前記医師と患者とのコミュニケーションに関する前記医師の音声特徴量を抽出する抽出部と、
　前記音声特徴量と、前記医師の返答期待時間に関する情報とに基づいて、前記返答期待時間を学習する切り出し学習部と、
を備える情報処理装置。
（２０）
　患者及び医師の音声と画像を入力する入力部と、
　前記患者及び前記医師の音声と画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出する抽出部と、
　前記特徴量と、前記患者の満足度、不満度又は不安度に関するアンケートとに基づいて、前記患者の満足度、不満度又は不安度を学習する学習部と、
を備える情報処理装置。
（２１）
　患者及び医師の音声及び画像を取得する情報取得装置と、
　前記患者及び前記医師の音声及び画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出する抽出部と、
　前記特徴量に基づき、前記患者の満足度、不満度又は不安度を推定する推定部と、
　前記患者の満足度、不満度又は不安度を表示する表示部と、
を備える情報処理システム。
（２２）
　コンピュータが、
　患者及び医師の音声及び画像を取得し、
　前記患者及び前記医師の音声及び画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出し、
　前記特徴量に基づき、前記患者の満足度、不満度又は不安度を推定し、
　前記患者の満足度、不満度又は不安度を表示する、
ことを含む情報処理方法。
（２３）
　上記（１）から（２０）のいずれか一つに記載の情報処理装置を用いる情報処理方法。
（２４）
　上記（１）から（２０）のいずれか一つに記載の情報処理装置を備える情報処理システム。 <4. Note>
Note that the present technology can also take the following configuration.
(1)
an input unit for inputting voices and images of a patient and a doctor;
an extraction unit that extracts a feature amount related to communication between the patient and the doctor from the voices and images of the patient and the doctor;
An estimating unit that estimates the patient's satisfaction, dissatisfaction, or anxiety based on the feature amount;
an output unit that outputs the patient's satisfaction level, dissatisfaction level or anxiety level;
Information processing device.
(2)
The estimation unit estimates the degree of depression of the patient in addition to the degree of satisfaction, dissatisfaction or anxiety of the patient,
The output unit outputs the degree of depression of the patient in addition to the degree of satisfaction, dissatisfaction or anxiety of the patient,
The information processing apparatus according to (1) above.
(3)
The extraction unit separates the patient's voice and the doctor's voice, and extracts the patient's voice feature amount and the doctor's voice feature amount.
The information processing apparatus according to (1) or (2) above.
(4)
The extraction unit obtains the doctor's expected response time based on the doctor's voice feature amount, and obtains the feature amount at the obtained expected response time.
The information processing apparatus according to (3) above.
(5)
The extracting unit obtains the expected response time using a clipping learning model learned with the doctor's voice feature value assigned meaning by the expected response time.
The information processing apparatus according to (4) above.
(6)
Further comprising a learning unit that generates the clipping learning model based on the expected response time and the doctor's speech feature amount,
The information processing apparatus according to (5) above.
(7)
The estimating unit estimates the patient's satisfaction, dissatisfaction or anxiety using a learning model learned with the feature value assigned by the patient's satisfaction, dissatisfaction or anxiety.
The information processing apparatus according to any one of (1) to (6) above.
(8)
A learning unit that generates the learning model based on the patient's satisfaction, dissatisfaction or anxiety questionnaire or scoring results by the doctor and the feature amount.
The information processing apparatus according to (7) above.
(9)
The estimation unit estimates a facial image from the patient's image and extracts the feature amount from the facial image.
The information processing apparatus according to any one of (1) to (8) above.
(10)
The extracting unit extracts line-of-sight crossing information related to line-of-sight crossing of the patient and the doctor as the feature amount.
The information processing apparatus according to any one of (1) to (9) above.
(11)
The estimation unit obtains one or both of the degree and frequency of eye contact between the patient and the doctor based on the eye-crossing information, and based on one or both of the obtained degree and frequency of eye contact, estimate patient satisfaction, dissatisfaction or anxiety;
The information processing apparatus according to (10) above.
(12)
The input unit inputs a position of a first camera that acquires an image of the patient and a position of a second camera that acquires an image of the doctor,
The extraction unit extracts the line-of-sight crossing information based on the individual positions of the patient, the doctor, the first camera and the second camera, the patient's image and the doctor's image,
The information processing apparatus according to (10) or (11) above.
(13)
the patient, the doctor, the first camera and the second camera are positioned on the same virtual circle,
The information processing device according to (12) above.
(14)
The patient and the doctor are positioned on the same virtual circle,
The first camera and the second camera are each positioned at the center of the virtual circle,
The information processing device according to (12) above.
(15)
The extraction unit extracts, as the feature amount, a feature amount indicating that the patient's face is facing the doctor or facing downward from the doctor's face.
The information processing apparatus according to any one of (1) to (14) above.
(16)
The extraction unit extracts a response time until the patient speaks to the doctor's question as the feature quantity,
The information processing apparatus according to any one of (1) to (15) above.
(17)
The extraction unit extracts the patient's heartbeat or heartbeat fluctuation from the patient's image as the feature amount.
The information processing apparatus according to any one of (1) to (16) above.
(18)
The input unit inputs vital information of the patient in addition to the voice and image of the patient and the doctor,
The extraction unit extracts the feature amount from the patient's vital information in addition to the voice and image of the patient and the doctor.
The information processing apparatus according to any one of (1) to (17) above.
(19)
an input unit for inputting a doctor's voice;
an extracting unit that extracts the doctor's voice feature amount related to communication between the doctor and the patient from the doctor's voice;
a clipping learning unit that learns the expected response time based on the audio feature amount and information about the expected response time of the doctor;
Information processing device.
(20)
an input unit for inputting voices and images of a patient and a doctor;
an extraction unit that extracts a feature amount related to communication between the patient and the doctor from the voices and images of the patient and the doctor;
A learning unit that learns the patient's satisfaction level, dissatisfaction level, or anxiety level based on the feature amount and a questionnaire regarding the patient's satisfaction level, dissatisfaction level, or anxiety level;
Information processing device.
(21)
an information acquisition device that acquires voices and images of patients and doctors;
an extraction unit that extracts a feature amount related to communication between the patient and the doctor from the voices and images of the patient and the doctor;
An estimating unit that estimates the patient's satisfaction, dissatisfaction, or anxiety based on the feature amount;
a display unit that displays the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety;
An information processing system comprising
(22)
the computer
Acquiring voice and images of patients and doctors,
Extracting a feature amount related to communication between the patient and the doctor from the voice and image of the patient and the doctor,
Estimate the degree of satisfaction, dissatisfaction or anxiety of the patient based on the feature amount,
indicating the patient's level of satisfaction, dissatisfaction or anxiety;
information processing method, including
(23)
An information processing method using the information processing apparatus according to any one of (1) to (20) above.
(24)
An information processing system comprising the information processing apparatus according to any one of (1) to (20) above.

　１０　　情報処理システム
　２０　　サーバ装置
　２１　　入力部
　２２　　処理部
　２２ａ　音声特徴量抽出部
　２２ｂ　切り出し学習モデル
　２２ｃ　音声・画像特徴量抽出部
　２２ｄ　学習モデル
　２３　　出力部
　３０　　情報取得装置
　４０　　医師端末装置
　４１　　制御部
　４２　　通信部
　４３　　表示部
　４４　　操作部
　５０　　患者端末装置
　５１　　制御部
　５２　　通信部
　５３　　表示部
　５４　　操作部
　６０　　通信網
　５００　コンピュータ
　５１０　ＣＰＵ
　５２０　ＲＡＭ
　５３０　ＲＯＭ
　５４０　ＨＤＤ
　５４１　プログラムデータ
　５５０　通信インターフェイス
　５６０　入出力インターフェイス
　５７０　バス
　５８０　外部ネットワーク
　５９０　入出力デバイス REFERENCE SIGNS LIST 10 information processing system 20 server device 21 input unit 22 processing unit 22a voice feature quantity extraction unit 22b clipping learning model 22c voice/image feature quantity extraction unit 22d learning model 23 output unit 30 information acquisition device 40 doctor terminal device 41 control unit 42 communication Part 43 Display Part 44 Operation Part 50 Patient Terminal Device 51 Control Part 52 Communication Part 53 Display Part 54 Operation Part 60 Communication Network 500 Computer 510 CPU
520 RAM
530 ROMs
540 HDD
541 program data 550 communication interface 560 input/output interface 570 bus 580 external network 590 input/output device

Claims

　患者及び医師の音声と画像を入力する入力部と、
　前記患者及び前記医師の音声と画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出する抽出部と、
　前記特徴量に基づき、前記患者の満足度、不満度又は不安度を推定する推定部と、
　前記患者の満足度、不満度又は不安度を出力する出力部と、
を備える情報処理装置。 an input unit for inputting voices and images of a patient and a doctor;
an extraction unit that extracts a feature amount related to communication between the patient and the doctor from the voices and images of the patient and the doctor;
An estimating unit that estimates the patient's satisfaction, dissatisfaction, or anxiety based on the feature amount;
an output unit that outputs the patient's satisfaction level, dissatisfaction level or anxiety level;
Information processing device.
　前記推定部は、前記患者の満足度、不満度又は不安度に加え、前記患者の抑うつ度を推定し、
　前記出力部は、前記患者の満足度、不満度又は不安度に加え、前記患者の抑うつ度を出力する、
　請求項１に記載の情報処理装置。 The estimation unit estimates the degree of depression of the patient in addition to the degree of satisfaction, dissatisfaction or anxiety of the patient,
The output unit outputs the degree of depression of the patient in addition to the degree of satisfaction, dissatisfaction or anxiety of the patient,
The information processing device according to claim 1 .
　前記抽出部は、前記患者の音声と前記医師の音声とを分離し、前記患者の音声特徴量及び前記医師の音声特徴量を抽出する、
　請求項１に記載の情報処理装置。 The extraction unit separates the patient's voice and the doctor's voice, and extracts the patient's voice feature amount and the doctor's voice feature amount.
The information processing device according to claim 1 .
　前記抽出部は、前記医師の音声特徴量に基づいて前記医師の返答期待時間を求め、求めた前記返答期待時間における前記特徴量を求める、
　請求項３に記載の情報処理装置。 The extraction unit obtains the doctor's expected response time based on the doctor's voice feature amount, and obtains the feature amount at the obtained expected response time.
The information processing apparatus according to claim 3.
　前記抽出部は、前記返答期待時間により意味付けられた前記医師の音声特徴量で学習された切り出し学習モデルを用いて、前記返答期待時間を求める、
　請求項４に記載の情報処理装置。 The extracting unit obtains the expected response time using a clipping learning model learned with the doctor's voice feature value assigned meaning by the expected response time.
The information processing apparatus according to claim 4.
　前記返答期待時間及び前記医師の音声特徴量に基づいて、前記切り出し学習モデルを生成する学習部をさらに備える、
　請求項５に記載の情報処理装置。 Further comprising a learning unit that generates the clipping learning model based on the expected response time and the doctor's speech feature amount,
The information processing device according to claim 5 .
　前記推定部は、前記患者の満足度、不満度又は不安度により意味付けられた前記特徴量で学習された学習モデルを用いて、前記患者の満足度、不満度又は不安度を推定する、
　請求項１に記載の情報処理装置。 The estimating unit estimates the patient's satisfaction, dissatisfaction or anxiety using a learning model learned with the feature value assigned by the patient's satisfaction, dissatisfaction or anxiety.
The information processing device according to claim 1 .
　前記患者の満足度、不満度又は不安度に関するアンケート又は前記医師によるスコアリング結果、及び前記特徴量に基づいて、前記学習モデルを生成する学習部をさらに備える、
　請求項７に記載の情報処理装置。 A learning unit that generates the learning model based on the patient's satisfaction, dissatisfaction or anxiety questionnaire or scoring results by the doctor and the feature amount.
The information processing apparatus according to claim 7.
　前記推定部は、前記患者の画像から顔画像を推定し、前記顔画像より前記特徴量を抽出する、
　請求項１に記載の情報処理装置。 The estimation unit estimates a facial image from the patient's image and extracts the feature amount from the facial image.
The information processing device according to claim 1 .
　前記抽出部は、前記特徴量として、前記患者及び前記医師の視線交錯に関する視線交錯情報を抽出する、
　請求項１に記載の情報処理装置。 The extracting unit extracts line-of-sight crossing information related to line-of-sight crossing of the patient and the doctor as the feature amount.
The information processing device according to claim 1 .
　前記推定部は、前記視線交錯情報に基づいて、前記患者及び前記医師のアイコンタクトの度合い及び頻度の一方又は両方を求め、求めた前記アイコンタクトの度合い及び頻度の一方又は両方に基づいて、前記患者の満足度、不満度又は不安度を推定する、
　請求項１０に記載の情報処理装置。 The estimation unit obtains one or both of the degree and frequency of eye contact between the patient and the doctor based on the eye-crossing information, and based on one or both of the obtained degree and frequency of eye contact, estimate patient satisfaction, dissatisfaction or anxiety;
The information processing apparatus according to claim 10.
　前記入力部は、前記患者の画像を取得する第１カメラの位置及び前記医師の画像を取得する第２カメラの位置を入力し、
　前記抽出部は、前記患者、前記医師、前記第１カメラ及び前記第２カメラの個々の位置、前記患者の画像及び前記医師の画像に基づいて、前記視線交錯情報を抽出する、
　請求項１０に記載の情報処理装置。 The input unit inputs a position of a first camera that acquires an image of the patient and a position of a second camera that acquires an image of the doctor,
The extraction unit extracts the line-of-sight crossing information based on the individual positions of the patient, the doctor, the first camera and the second camera, the patient's image and the doctor's image,
The information processing apparatus according to claim 10.
　前記患者、前記医師、前記第１カメラ及び前記第２カメラは、それぞれ同じ仮想円上に位置する、
　請求項１２に記載の情報処理装置。 the patient, the doctor, the first camera and the second camera are positioned on the same virtual circle,
The information processing apparatus according to claim 12.
　前記患者、前記医師は、それぞれ同じ仮想円上に位置し、
　前記第１カメラ及び前記第２カメラは、それぞれ前記仮想円の中心に位置する、
　請求項１２に記載の情報処理装置。 The patient and the doctor are positioned on the same virtual circle,
The first camera and the second camera are each positioned at the center of the virtual circle,
The information processing apparatus according to claim 12.
　前記抽出部は、前記特徴量として、前記患者の顔が前記医師に向いていること又は前記医師の顔より下方に向いていることを示す特徴量を抽出する、
　請求項１に記載の情報処理装置。 The extraction unit extracts, as the feature amount, a feature amount indicating that the patient's face is facing the doctor or facing downward from the doctor's face.
The information processing device according to claim 1 .
　前記抽出部は、前記特徴量として、前記医師の質問に対して前記患者が発話するまでの応答時間を抽出する、
　請求項１に記載の情報処理装置。 The extraction unit extracts a response time until the patient speaks to the doctor's question as the feature quantity,
The information processing device according to claim 1 .
　前記抽出部は、前記特徴量として、前記患者の画像から前記患者の心拍又は心拍揺らぎを抽出する、
　請求項１に記載の情報処理装置。 The extraction unit extracts the patient's heartbeat or heartbeat fluctuation from the patient's image as the feature amount.
The information processing device according to claim 1 .
　前記入力部は、前記患者及び前記医師の音声と画像に加え、前記患者のバイタル情報を入力し、
　前記抽出部は、前記患者及び前記医師の音声と画像に加え、前記患者のバイタル情報から、前記特徴量を抽出する、
　請求項１に記載の情報処理装置。 The input unit inputs vital information of the patient in addition to the voice and image of the patient and the doctor,
The extraction unit extracts the feature amount from the patient's vital information in addition to the voice and image of the patient and the doctor.
The information processing device according to claim 1 .
　患者及び医師の音声と画像を入力する入力部と、
　前記患者及び前記医師の音声と画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出する抽出部と、
　前記特徴量と、前記患者の満足度、不満度又は不安度に関するアンケート又は前記医師によるスコアリング結果とに基づいて、前記患者の満足度、不満度又は不安度を学習する学習部と、
を備える情報処理装置。 an input unit for inputting voices and images of a patient and a doctor;
an extraction unit that extracts a feature amount related to communication between the patient and the doctor from the voices and images of the patient and the doctor;
a learning unit that learns the degree of satisfaction, degree of dissatisfaction, or degree of anxiety of the patient based on the feature amount and a questionnaire regarding the degree of satisfaction, degree of dissatisfaction, or degree of anxiety of the patient or a scoring result by the doctor;
Information processing device.
　患者及び医師の音声及び画像を取得する情報取得装置と、
　前記患者及び前記医師の音声及び画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出する抽出部と、
　前記特徴量に基づき、前記患者の満足度、不満度又は不安度を推定する推定部と、
　前記患者の満足度、不満度又は不安度を表示する表示部と、
を備える情報処理システム。 an information acquisition device that acquires voices and images of patients and doctors;
an extraction unit that extracts a feature amount related to communication between the patient and the doctor from the voices and images of the patient and the doctor;
An estimating unit that estimates the patient's satisfaction, dissatisfaction, or anxiety based on the feature amount;
a display unit that displays the patient's degree of satisfaction, degree of dissatisfaction, or degree of anxiety;
An information processing system comprising
　コンピュータが、
　患者及び医師の音声及び画像を取得し、
　前記患者及び前記医師の音声及び画像から、前記患者と前記医師とのコミュニケーションに関する特徴量を抽出し、
　前記特徴量に基づき、前記患者の満足度、不満度又は不安度を推定し、
　前記患者の満足度、不満度又は不安度を表示する、
ことを含む情報処理方法。 the computer
Acquiring voice and images of patients and doctors,
Extracting a feature amount related to communication between the patient and the doctor from the voice and image of the patient and the doctor,
Estimate the degree of satisfaction, dissatisfaction or anxiety of the patient based on the feature amount,
indicating the patient's level of satisfaction, dissatisfaction or anxiety;
information processing method, including