JP2016126569A

JP2016126569A - Behavior recognition device, method, and program

Info

Publication number: JP2016126569A
Application number: JP2015000409A
Authority: JP
Inventors: 公海高橋; Masami Takahashi; 真人松尾; Masato Matsuo
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-01-05
Filing date: 2015-01-05
Publication date: 2016-07-11

Abstract

PROBLEM TO BE SOLVED: To enable behavior to be recognized without using teacher data or a manually created knowledge base, and to thereby accurately recognize behavior without requiring huge effort and time and costs, even when the behavior to be recognized changes with a situation such as a place or time.

SOLUTION: The present invention is provided with a behavioral knowledge base storage unit 34 in which a combination of the behavior of a person and the thing, place, situation, time, etc., that is an object of the behavior is written in language information. First, a detection value pertaining to the object of the behavior is acquired from a sensor, the acquired detection value is analyzed, and the detection values obtained at the same time of day are integrated, the result of which is converted into language information representing the object. Language information representing the corresponding behavior is retrieved from the behavioral knowledge base storage unit 34 on the basis of this converted language information representing the object, and the one among the retrieved language information that has highest occurrence probability is selected and outputted in documented form.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a behavior recognition device, method, and program for recognizing the behavior of a person, an animal, or the like.

Advances in sensor technology and device miniaturization have brought a wide range of sensor devices, including smartphones, into daily life, and in recent years many studies have used such sensors to recognize human behavior and situations. Techniques for observing and recognizing human behavior include those that use acceleration sensors to recognize basic behaviors such as “sitting”, “walking”, and “running”, and those that use many sensors attached to people or installed in the environment to recognize somewhat more complex behaviors such as “making tea” and “cooking” (see Patent Document 1 or Non-Patent Documents 1 to 4).

Patent Document 1: Japanese Patent No. 4679677

Non-Patent Document 1: Philipose, M., et al., “Inferring activities from interactions with objects”, Pervasive Computing, 3(4), 50-57, 2004.
Non-Patent Document 2: Takuya Maekawa et al., “Action Recognition Method Using a Sensor Device with a Camera Worn on the Wrist”, IEICE Transactions B, 95(11), 1480-1490, 2012.
Non-Patent Document 3: Bao, L. and S. S. Intille, “Activity recognition from user-annotated acceleration data”, Proceedings of the Second International Conference on Pervasive Computing, 3001, 1-17, 2004.
Non-Patent Document 4: Takuya Maekawa et al., “Tag and Think: Estimating Objects for Sensor Nodes Attached to Objects”, IPSJ Transactions, Vol. 49, No. 6, pp. 1896-1906 (June 2008).

However, many of the techniques described in Patent Document 1 and Non-Patent Documents 1 to 3 are based on the results of learning from teacher data (labeled training data), so sensing data representing the correct answer for each target behavior must be prepared in advance as teacher data. Because sensing data varies considerably between individuals, subjects must perform the same behavior many times so that features can be extracted and used as teacher data. Building the database therefore takes a great deal of effort and time, and is costly.

The technique described in Non-Patent Document 4 does not prepare teacher data in advance, but it requires a knowledge model for recognizing behavior to be written by hand, so it too is very costly to implement. Increasing the number of behaviors to be recognized therefore takes enormous effort, and as a result the number of recognizable behaviors is limited to a dozen or so. Thus, although the behaviors people perform in daily life are extremely diverse, the prior art has not been able to recognize a large number of them.

The present invention has been made in view of the above circumstances. Its object is to provide a behavior recognition device, method, and program that can recognize behavior without using teacher data or a manually created knowledge base, and that can therefore recognize behavior accurately without great effort, time, or cost, even when the behavior to be recognized changes with circumstances such as place and time.

To achieve the above object, the present invention takes the following measures. Namely, as the knowledge model for recognizing behavior, it uses a model built from data on sites used by an unspecified large number of people, such as text data on the Web.

In general, text data on the Web contains information about a wide variety of human behaviors, and describes in language the results of people recognizing and interpreting the targets of those behaviors. For example, information such as “when”, “where”, and “how it interacts with people” can be extracted for a given target, as the kind of knowledge that people hold as common sense. More specifically, with “scissors” people usually perform the behavior “cut with scissors”. This is common-sense knowledge for humans, but it means that, using “scissors” as a clue, the behavior a person is likely to perform can be estimated from the results of analyzing Web text data.

On the other hand, Web text data and sensing data related to human behavior are heterogeneous. Linking raw sensing data to human behavior normally requires teacher data or a manually created knowledge base. In the present invention, however, sensing data related to the targets of behavior that exist in the real world, such as things, places, and times, is converted into text data using well-known techniques, and the converted text data is used as a clue for recognizing behavior. For example, an object recognition application that takes an image as input and outputs the names of the things shown in it, or an application that converts GPS (Global Positioning System) latitude and longitude into an address or landmark, acts as an intermediary between text data and the sensing data representing targets of behavior such as things, places, and times.
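As an illustration of the "intermediary" role described above, the following is a minimal sketch of converting a GPS reading into a place name. The landmark table, its coordinates, the function names, and the 0.5 km cutoff are all hypothetical; the source only says that a well-known conversion application is used and does not specify one.

```python
import math

# Hypothetical landmark table: name -> (latitude, longitude) in degrees.
LANDMARKS = {
    "home": (35.0000, 139.0000),
    "office": (35.0100, 139.0200),
    "supermarket": (34.9950, 139.0100),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def place_to_text(lat, lon, max_km=0.5):
    """Return the nearest landmark name, or None if none lies within max_km."""
    name, dist = min(
        ((n, haversine_km(lat, lon, la, lo)) for n, (la, lo) in LANDMARKS.items()),
        key=lambda item: item[1],
    )
    return name if dist <= max_km else None
```

The returned string (for example "home") would then play the role of the text data that is looked up in the behavior knowledge base.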

That is, the things, places, times, and so on that are the targets of a person's behavior are converted into text data, and if that text data or similar words are contained in the behavior knowledge base built from the Web text data, the person's behavior can be recognized. Therefore, by using, for example, a behavior knowledge base built from Web text data so as to contain many combinations of general human behaviors and their targets, together with general-purpose applications that convert sensing data representing the targets of behavior into text data, a wide variety of behaviors can be recognized, without being limited to the behavior of a specific person and even when one person's behavior changes with circumstances such as place and time.

The present invention focuses on the above points. In its first aspect, the invention uses a behavior knowledge base storage unit, in which combinations of behaviors that can occur in the real world and the things, places, times, and so on that are the targets of those behaviors are described in language information, and a sensor that outputs detection values relating to the targets of behavior. While a person is acting, a detection value relating to the target of the behavior is acquired from the sensor, and target detection means converts the acquired detection value into language information representing the target. Then, on the basis of the converted language information representing the target, behavior selection means retrieves language information representing the corresponding behavior from the behavior knowledge base storage unit and outputs the retrieval result.

A second aspect of the present invention further includes behavior knowledge base creation means. From data in which people have interpreted and verbalized real-world situations, this means statistically extracts data representing combinations of behaviors that can occur and the things, places, times, and so on that are their targets, and stores the extracted data in the behavior knowledge base storage unit in a form described in language information.
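The statistical extraction in the second aspect can be pictured roughly as counting how often each combination of target and behavior occurs. The sketch below assumes that some upstream parser (not shown and not specified in the source) has already reduced the text to (target word, case, predicate) triples; the function name and the `min_count` cutoff are hypothetical.

```python
from collections import Counter

def build_knowledge_base(triples, min_count=2):
    """Count (target, case, predicate) triples and keep the frequent ones.

    triples: iterable of (target_word, case, predicate) tuples, assumed to
    come from an upstream text parser. min_count is an assumed cutoff.
    """
    counts = Counter(triples)
    return [
        {"predicate": pred, "case": case, "example": target, "frequency": n}
        for (target, case, pred), n in counts.most_common()
        if n >= min_count
    ]
```

Rare combinations are dropped here on the assumption that they are noise; a real system might instead keep every combination and rely on the stored frequency at selection time.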

In a third aspect of the present invention, when the target detection means obtains a plurality of detection values relating to the things, places, times, and so on that are the targets of behavior, it integrates or separates these detection values on the basis of information representing the strength of their correlation.

In a fourth aspect of the present invention, when the behavior selection means retrieves from the behavior knowledge base storage unit a plurality of pieces of language information representing behaviors that correspond to the converted language information representing a target of behavior (a thing, place, time, or the like), it selects from among them the one that is also retrieved for the language information of another target whose detection value was obtained in the same time period.

In a fifth aspect of the present invention, the behavior knowledge base storage unit stores, together with the data describing in language information the combinations of possible behaviors and their targets (things, places, times, and so on), information representing the appearance frequency of that data. When the behavior selection means retrieves from the behavior knowledge base storage unit a plurality of pieces of language information representing behaviors that correspond to the converted language information representing the target, it refers to the stored appearance frequencies and selects, from among the retrieved pieces of language information, one that satisfies a preset appearance frequency condition.

According to the first aspect of the present invention, when a person acts, the thing, place, time, or other target of the behavior is detected by a sensor, the detection value is converted into language information, and the behavior knowledge base storage unit is accessed using this language information as a key, whereby the corresponding behavior is identified. The behavior knowledge base storage unit describes, in language information, general combinations of behaviors that can occur in the real world and the things, places, times, and so on that are their targets. Consequently, the behavior of an unspecified large number of people, and new behaviors taken by a specific person, which were difficult to recognize with methods that rely on teacher data or a manually created knowledge base, can be recognized with high probability.

For example, in the conventional technique of recognizing behavior from sensing data such as acceleration, not only is the number of recognizable behaviors limited, but the influence of individual differences is also considerable. Sensor detection values may differ even when exactly the same behavior is performed. In particular, when behavior recognition is applied to watching over elderly people, situations in which part of the body is hard to move because of illness or injury are quite likely, and the conventional technique has difficulty recognizing behavior accurately in such situations. In the first aspect of the present invention, by contrast, the things, places, times, and so on that a person interacts with while acting serve as the clues, on the reasoning that, for example, a person using “scissors” is very likely cutting something, so recognition is less susceptible to individual differences. In everyday situations such as the workplace or daily life, most people perform the same behavior when the target of the behavior is the same, so behavior can be recognized particularly accurately.
Incidentally, the conventional technique using teacher data or the like could handle only a few tens of behaviors as recognition targets. According to the first aspect of the present invention, behaviors can be recognized in numbers that are orders of magnitude larger.

The behavior recognition results obtained by the first aspect of the present invention can be applied to computer-based understanding of a person's situation, behavior support, information retrieval and recommendation, and so on. One example of computer-based behavior support is to search Web pages and the like on the basis of the behavior the user is performing now or is likely to perform, and to provide information that assists the user. For example, when a sequence of behaviors such as “put clothes in the washing machine”, “add detergent”, and “start the washing machine” is recognized, the behaviors are entered into a search engine as a query, and a useful page such as “Tips for washing and drying” is recommended from the search results, allowing the user to carry out the behavior more smoothly.

For watching-over applications, a service can also be realized that automatically records the behavior of an elderly person and, as necessary, provides information to those watching over them, such as family members and visiting caregivers. If behavior records are accumulated and the elderly person's habitual behaviors are extracted, a state such as “something always done habitually has not been done today” can be detected and an alert issued, so that the elderly person's situation can be shared among multiple parties such as family and caregivers.
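The "habitual behavior not done today" check could be sketched as below. The source does not say how a habit is defined; here a behavior is assumed to be habitual if it was recorded on at least a given fraction of past days, and the function name and the 0.8 ratio are hypothetical.

```python
from collections import Counter

def missing_habits(history, today, min_ratio=0.8):
    """Return habitual behaviors that have not been observed today.

    history: list of sets of recognized behaviors, one set per past day.
    today:   set of behaviors recognized so far today.
    A behavior counts as habitual if it appears on at least min_ratio of
    the past days (an assumed definition, not taken from the source).
    """
    if not history:
        return set()
    day_counts = Counter(b for day in history for b in set(day))
    habitual = {b for b, n in day_counts.items() if n / len(history) >= min_ratio}
    return habitual - set(today)
```

A non-empty result would be the trigger for notifying family members or caregivers.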

According to the second aspect of the present invention, the behavior recognition device can autonomously create or update the data stored in the behavior knowledge base storage unit. The behavior knowledge base stored in the behavior knowledge base storage unit can therefore be further enriched in response to changes in, or an increase in the variety of, the behaviors to be recognized.

According to the third aspect of the present invention, when a plurality of detection values relating to targets of behavior are obtained, these detection values are integrated or separated on the basis of information representing the strength of their correlation, for example the detection time or time period. As a result, the context in which the person acted can be grasped in more detail, which improves the accuracy of behavior recognition.
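One simple way to integrate or separate detections by time is sketched below, assuming the correlation measure is just the gap between consecutive detection times. The function name and the 60-second window are hypothetical; the source names detection time only as an example of correlation information.

```python
def group_by_time(detections, window_s=60):
    """Group detections whose gap to the previous detection is small.

    detections: iterable of (timestamp_seconds, label) pairs.
    Returns a list of label lists; a new group starts whenever the gap to
    the previous detection exceeds window_s (an assumed threshold).
    """
    groups = []
    for ts, label in sorted(detections):
        if groups and ts - groups[-1][-1][0] <= window_s:
            groups[-1].append((ts, label))   # integrate with current group
        else:
            groups.append([(ts, label)])     # separate into a new group
    return [[label for _, label in g] for g in groups]
```

For example, a kettle detected at 0 s and a teacup at 20 s would be treated as one context, while a book detected at 300 s would start a separate one.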

According to the fourth aspect of the present invention, when behavior a person takes in a single time period relates to a plurality of targets, the behavior that is common among those retrieved from the behavior knowledge base for each of these targets is recognized as the behavior taken by that person. The person's behavior can therefore be recognized more accurately.
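Selecting the behavior common to several targets amounts to a set intersection. The following is a minimal sketch under that reading; the function name and the example candidates are hypothetical.

```python
def common_behaviors(candidates_per_target):
    """Return the behaviors retrieved for every target in the same time period.

    candidates_per_target: dict mapping a target word to the collection of
    candidate predicates retrieved for it from the knowledge base.
    """
    sets = [set(c) for c in candidates_per_target.values()]
    if not sets:
        return set()
    return set.intersection(*sets)
```

With hypothetical candidates {"boil", "pour"} for a kettle and {"pour", "drink", "wash"} for a teacup, only "pour" survives. If the intersection is empty, a fallback such as frequency-based selection would be needed; the source does not describe that case here.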

According to the fifth aspect of the present invention, when language information representing a plurality of behaviors is retrieved for one target, the one whose appearance frequency satisfies a predetermined condition, for example the one with the highest appearance frequency, or the one for which the appearance frequency of the retrieved behavior is closest to the appearance frequency stored in the behavior knowledge base, is recognized as the behavior taken by that person. As a result, a more probable behavior can be recognized as that person's behavior by referring to the past search history.
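The "highest appearance frequency" condition mentioned above could look like the sketch below. The function name, the optional minimum-frequency cutoff, and the example frequencies are hypothetical and only illustrate one possible preset condition.

```python
def select_by_frequency(candidates, min_frequency=1):
    """Pick the most frequent candidate behavior.

    candidates: list of (predicate, appearance_frequency) pairs.
    Returns the predicate with the highest frequency among those meeting
    min_frequency (an assumed extra condition), or None if none qualifies.
    """
    eligible = [c for c in candidates if c[1] >= min_frequency]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c[1])[0]
```

Ties are resolved in favor of the first candidate listed, which is simply how `max` behaves; the source does not say how ties should be handled.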

That is, according to the present invention, it is possible to provide a behavior recognition device, method, and program that can recognize behavior without using teacher data or a manually created knowledge base, and can therefore recognize behavior accurately without great effort, time, or cost, even when the behavior to be recognized changes with circumstances such as place and time.

FIG. 1 is a block diagram showing the functional configuration of a behavior recognition device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the data stored in the behavior knowledge base storage unit of the behavior recognition device shown in FIG. 1.
FIG. 3 is a flowchart showing the procedure and content of the sensor processing executed by the sensor processing unit of the behavior recognition device shown in FIG. 1.
FIG. 4 is a diagram showing an example of the data stored in the sensing data storage unit of the behavior recognition device shown in FIG. 1.
FIG. 5 is a flowchart showing the procedure and content of the data analysis processing executed by the data analysis processing unit of the behavior recognition device shown in FIG. 1.
FIG. 6 is a diagram showing an example of the data stored in the analysis result storage unit of the behavior recognition device shown in FIG. 1.
FIG. 7 is a flowchart showing the procedure and content of the data analysis processing executed by the text conversion processing unit of the behavior recognition device shown in FIG. 1.
FIG. 8 is a diagram showing an example of the data stored in the conversion result storage unit of the behavior recognition device shown in FIG. 1.
FIG. 9 is a flowchart showing the procedure and content of the data analysis processing executed by the behavior search processing unit of the behavior recognition device shown in FIG. 1.
FIG. 10 is a diagram showing an example of the data stored in the search result storage unit of the behavior recognition device shown in FIG. 1.
FIG. 11 is a flowchart showing the procedure and content of the data analysis processing executed by the behavior determination processing unit of the behavior recognition device shown in FIG. 1.
FIG. 12 is a diagram showing an example of the data stored in the behavior data storage unit of the behavior recognition device shown in FIG. 1.

Embodiments of the present invention will be described below with reference to the drawings.
[One Embodiment]
(Configuration)
FIG. 1 is a block diagram showing the functional configuration of a behavior recognition device according to an embodiment of the present invention.
The behavior recognition device is, for example, a server computer operated by a service provider, and includes a communication interface unit 1, a control unit 2, and a storage unit 3. Under the control of the control unit 2, the communication interface unit 1 communicates with user terminals and provider terminals (not shown) via a communication network.

The storage unit 3 uses, as its storage medium, a writable and readable nonvolatile memory such as an HDD (Hard Disk Drive) or SSD (Solid State Drive). As the storage areas needed to implement this embodiment, it includes a sensing data storage unit 31, an analysis result storage unit 32, a conversion result storage unit 33, a behavior knowledge base storage unit 34, a search result storage unit 35, and a behavior data storage unit 36.

The sensing data storage unit 31 is used to store sensing data on targets of behavior and the like that has been received by the sensor processing unit 211, described later. The analysis result storage unit 32 is used to store information representing the results of the data analysis processing unit 212, described later, analyzing the sensing data stored in the sensing data storage unit 31. The conversion result storage unit 33 is used to store the information obtained when the text conversion processing unit 213, described later, converts the sensing data analysis results stored in the analysis result storage unit 32 into text data.

The behavior knowledge base storage unit 34 stores behavior knowledge base information, generated in advance from text data on the Web, that describes in language information combinations of human behaviors that can occur in the real world and the things, places, times, and so on that are their targets. For example, as shown in FIG. 2, an example word representing the target, a case element, and an appearance frequency are stored in association with each of a plurality of predicates representing behaviors. The behavior knowledge base storage unit 34 need not be provided inside the behavior recognition device; it may instead be provided in a separate database server or the like, which the behavior recognition device accesses via a communication network to retrieve the necessary information.
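The record layout described above (predicate, example word, case element, appearance frequency) can be sketched as a small data structure with a lookup by example word. The class name, field names, and all sample values are hypothetical; they are not the contents of FIG. 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnowledgeEntry:
    predicate: str   # word expressing the behavior, e.g. "cut"
    example: str     # word expressing the target, e.g. "scissors"
    case: str        # case element linking the two, e.g. Japanese "de" or "wo"
    frequency: int   # appearance frequency in the source text

# Hypothetical sample entries for illustration only.
KB = [
    KnowledgeEntry("cut", "scissors", "de", 120),
    KnowledgeEntry("hold", "scissors", "wo", 30),
    KnowledgeEntry("read", "book", "wo", 200),
]

def search(kb, text):
    """Return every entry whose example word matches the converted text."""
    return [e for e in kb if e.example == text]
```

Because each entry carries its own frequency, the result of `search` already contains what a later selection step would need to rank the candidates.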

The search result storage unit 35 is used to store the information obtained when the behavior search processing unit 221, described later, searches the behavior knowledge base storage unit 34 on the basis of the text data stored in the conversion result storage unit 33. The behavior data storage unit 36 is used to store behavior data describing in language the most plausible behavior, selected by the behavior determination processing unit 222, described later, from the information representing the search results stored in the search result storage unit 35.

The control unit 2 includes a central processing unit (CPU) and, as the processing functions needed to realize this embodiment, a target detection unit 21 and a behavior selection unit 22. The target detection unit 21 has a sensor processing unit 211, a data analysis processing unit 212, and a text conversion processing unit 213. The behavior selection unit 22 has a behavior search processing unit 221, a behavior determination processing unit 222, and an output generation processing unit 223. In addition to these processing functions, the control unit 2 also includes a behavior knowledge base creation unit that creates behavior knowledge base information and stores it in the behavior knowledge base storage unit 34. Each of these processing functions is realized by having the CPU execute a program stored in a program memory (not shown).

The sensor processing unit 211 receives, via the communication interface unit 1, sensing data transmitted from a user terminal (not shown) carried by the user whose behavior is to be recognized, and stores it in the sensing data storage unit 31. The sensing data consists of, for example, data obtained by sensing the things, places, situations, environment, time, and so on with which the user interacts when acting.

The data analysis processing unit 212 analyzes the sensing data stored in the sensing data storage unit 31 and generates data that is easy to handle in subsequent processing such as text conversion. It then stores the analyzed sensing data in the analysis result storage unit 32.

The text conversion processing unit 213 converts the analyzed sensing data stored in the analysis result storage unit 32 into meaningful text (character strings) and stores the converted text data in the conversion result storage unit 33.

The behavior search processing unit 221 searches the behavior knowledge base storage unit 34 using the text stored in the conversion result storage unit 33 as the example word, reads out the predicates representing behaviors that correspond to the text, together with information indicating their case elements and appearance frequencies, and stores them in the search result storage unit 35.

The behavior determination processing unit 222 selects, from the predicates representing behaviors stored in the search result storage unit 35, the predicate representing the most plausible behavior, referring to information such as the appearance frequency stored in the behavior knowledge base storage unit 34 in association with each predicate, and stores the selected predicate in the behavior data storage unit 36.

The output generation processing unit 223 generates a natural sentence expressing the behavior based on the predicate stored in the behavior data storage unit 36. It then transmits the generated sentence to an operator terminal used by, for example, a provider of an information recommendation service or a provider of medical or nursing care services.

The behavior knowledge base creation unit (not shown) extracts, periodically or at an arbitrary timing, combinations of general human behaviors and the objects, places, situations, times, and the like that those behaviors target from text data on the Web, and adds the extracted combinations to the behavior knowledge base storage unit 34.

(Operation)
Next, the operation of the behavior recognition device configured as described above will be described.
(1) Creation and update of the behavior knowledge base
Under the control of the behavior knowledge base creation unit, the behavior recognition device creates and updates the behavior knowledge base periodically or at an arbitrary timing, based on, for example, Web text data. The knowledge base stores case frames, which organize the relations between predicates and case elements. FIG. 2 shows an example of the behavior knowledge base stored in the behavior knowledge base storage unit 34. For example, one case frame of the predicate "read" could be: {person, I, child, ...} reads {book, newspaper, novel, picture book, ...}.

As illustrated in FIG. 2, the behavior knowledge base storage unit 34 stores, for each predicate, the case elements, examples, and appearance frequencies in organized form. Methods for creating such a knowledge base are described in detail in, for example, Daisuke Kawahara and Sadao Kurohashi, "A Fully-Lexicalized Probabilistic Model for Japanese Syntactic and Case Structure Analysis", In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL2006), pp.176-183, 2006, and Daisuke Kawahara et al., 「高性能計算環境を用いたWebからの大規模格フレーム構築」 (large-scale case frame construction from the Web using a high-performance computing environment), Information Processing Society of Japan, SIG on Natural Language Processing 171-12, pp.67-73, 2006.
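The layout described above (predicate, case element, example, appearance frequency) can be pictured as a flat list of records. The following is a minimal sketch only: the class name `CaseFrameEntry`, the romanized case markers, and all counts are hypothetical and are not taken from FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseFrameEntry:
    predicate: str   # verb describing the behavior, e.g. "read"
    case: str        # case marker of the slot, e.g. "wo" (object) or "ga" (subject)
    example: str     # word observed filling the slot, e.g. "book"
    frequency: int   # corpus count (hypothetical values below)


# Hypothetical entries; real counts would come from the Web corpus.
KNOWLEDGE_BASE = [
    CaseFrameEntry("read", "ga", "child", 120),
    CaseFrameEntry("read", "wo", "book", 950),
    CaseFrameEntry("read", "wo", "newspaper", 610),
    CaseFrameEntry("wipe", "wo", "window", 340),
]


def entries_for(example: str) -> list[CaseFrameEntry]:
    """Return every entry whose example word equals the given object name."""
    return [e for e in KNOWLEDGE_BASE if e.example == example]
```

Keeping one record per (predicate, case, example) triple makes the later lookup by object name a simple filter over the list.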

(2) Acquisition of sensing data
When the user acts, the objects, places, situations, environment, and the like targeted by the behavior are detected by sensors and transmitted from the user terminal to the behavior recognition device.
For example, in an environment where RFID tags are attached to various household objects, when a user wearing an RFID reader performs the behavior "wipe the window with a rag", the RFID reader detects the id information (sensor observation values) of the RFID tags attached to the rag and the window. The detected id information of the rag is then transmitted from the user terminal to the behavior recognition device together with the detection time and the user id stored in advance in the user terminal.

In the behavior recognition device, acquisition of the sensing data (observation values) is controlled by the sensing processing unit 211 as follows. FIG. 3 is a flowchart showing the processing procedure and contents.

That is, when sensing data transmitted from the user terminal is received by the communication interface unit 1, the user id contained in the sensing data is first extracted in step S11, and step S12 determines whether that user id belongs to a user registered in advance in a user information storage unit (not shown) as a behavior recognition target. If the user is registered, the sensing data is received in step S13 and stored in the sensing data storage unit 31 in step S14. The sensing data is then read from the sensing data storage unit 31 in step S15 and passed to the data analysis processing unit 212. If the user id contained in the received sensing data is not registered, the sensing data is neither received nor stored; the data acquisition interval is adjusted in step S16, and the process returns to step S12.
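The registration check in steps S11 to S14 can be sketched as follows. This is an illustration under assumptions: the names `SensingRecord`, `REGISTERED_USERS`, and `accept` are hypothetical, the in-memory list merely stands in for the sensing data storage unit 31, and the interval adjustment of step S16 is omitted.

```python
from typing import NamedTuple


class SensingRecord(NamedTuple):
    detected_at: str   # detection time
    user_id: str       # id of the user carrying the terminal
    tag_id: str        # RFID tag id (sensor observation value)


REGISTERED_USERS = {"user01"}            # hypothetical user information store
sensing_store: list[SensingRecord] = []  # stands in for storage unit 31


def accept(record: SensingRecord) -> bool:
    """Store the record only if its user id is registered (steps S11 to S14)."""
    if record.user_id not in REGISTERED_USERS:
        return False          # unregistered: nothing is stored
    sensing_store.append(record)
    return True
```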

FIG. 4 shows an example of the sensing data accumulated in the sensing data storage unit 31 by repeating the acquisition process described above. As shown in the figure, each item of sensing data consists of a detection time, a user id, and an RFID tag id.

(3) Data analysis processing
Each time sensing data is passed from the sensing processing unit 211, the data analysis processing unit 212 analyzes it as follows. FIG. 5 is a flowchart showing the processing procedure and contents.

That is, when the sensing data is input in step S21, it is read in step S22 and analyzed in step S23. For example, multiple sensor observation values obtained for one user at the same time are integrated. The analyzed sensing data is stored in the analysis result storage unit 32 in step S24, then read out in step S25 and passed to the text conversion processing unit 213. FIG. 6 shows an example of the analyzed sensing data stored in the analysis result storage unit 32, for the case where multiple items of sensing data acquired at the same time have been integrated.
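One simple way to integrate observations that share a user and a detection time is to group them by that pair. The sketch below assumes records shaped as (detection time, user id, tag id) tuples; the function name `integrate` and the sample values are hypothetical.

```python
from collections import defaultdict


def integrate(records):
    """Group tag ids by (user id, detection time), preserving first-seen order."""
    grouped = defaultdict(list)
    for detected_at, user_id, tag_id in records:
        key = (user_id, detected_at)
        if tag_id not in grouped[key]:   # ignore repeated reads of the same tag
            grouped[key].append(tag_id)
    return dict(grouped)


# Hypothetical observations: two tags read at the same time, one read later.
records = [
    ("10:00:00", "user01", "tag-001"),
    ("10:00:00", "user01", "tag-002"),
    ("10:05:00", "user01", "tag-001"),
]
merged = integrate(records)
```

Grouping on exact timestamps is the narrowest reading of "the same time"; a real system would probably bucket readings into short windows instead.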

Other analysis methods are also conceivable, for example building a network from information such as the objects and places a user touches almost simultaneously when acting and extracting communities from it, thereby identifying the objects and places that frequently co-occur when the user acts. The data analysis is not limited to the methods above; any implementation suffices as long as it analyzes the sensing data so that the situation in which the user acted is represented accurately.

(4) Text conversion processing
When the analyzed sensing data is passed from the data analysis processing unit 212, the text conversion processing unit 213 converts the RFID tag ids in the analyzed sensing data into text data (character strings) representing the names of the corresponding objects, as follows. FIG. 7 is a flowchart showing the processing procedure and contents.

That is, when the sensing data is passed in step S31, a temporary variable i is first initialized (= 0) in step S32, and step S33 determines whether the i-th data item exists. If it does, the i-th data item is read in step S34 and converted into text data (a character string) in step S35. For example, the RFID tag ids shown in FIG. 6 are converted into the names of objects actually present in the home. The converted text data is stored in the conversion result storage unit 33 in step S36. FIG. 8 shows an example of the converted text data.

When conversion of the i-th data item is complete, the temporary variable i is incremented in step S37 and the process returns to step S33 to repeat the conversion. When step S33 determines that no i-th data item remains, the process moves to step S38, where the text data converted so far is read from the conversion result storage unit 33 and passed to the behavior selection unit 22.

The module that converts sensor values in the sensing data into character strings can be realized, for example, by storing the correspondence between sensor values and names in a database in advance and performing the conversion by searching that database. Alternatively, publicly available applications can be used to convert sensor values or waveforms into character strings. Specific examples include applications that convert GPS latitude and longitude into addresses or landmarks, and object recognition applications that take an image as input and output the names of the objects shown in it.
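The database-lookup variant of the conversion can be sketched with a plain mapping from tag id to object name. The table contents and the names `TAG_NAMES` and `to_text` are hypothetical; skipping ids with no registered name is an assumption, since the text does not say how unknown ids are handled.

```python
# Hypothetical correspondence table between RFID tag ids and object names.
TAG_NAMES = {"tag-001": "rag", "tag-002": "window"}


def to_text(tag_ids):
    """Convert tag ids to object names, skipping ids with no known name."""
    names = []
    for tag_id in tag_ids:              # mirrors the loop over the i-th data item
        name = TAG_NAMES.get(tag_id)
        if name is not None:            # assumption: unknown ids are dropped
            names.append(name)
    return names
```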

(5) Behavior search processing
When the converted text data is passed from the text conversion processing unit 213, the behavior search processing unit 221 searches the behavior knowledge base storage unit 34 for information representing the corresponding behaviors, as follows. FIG. 9 is a flowchart showing the processing procedure and contents.

That is, when the converted text data (character string) is passed in step S41, the temporary variable i is first initialized (= 0) in step S42, and step S43 determines whether an i-th data item exists in the behavior knowledge base storage unit 34. If it does, the i-th data item is read in step S44, and in step S45 it is matched against the converted text data (character string). If the two match, the "case", "predicate", and "appearance frequency" associated with the i-th data item are read from the behavior knowledge base storage unit 34 in step S46 as a behavior data candidate and stored in the search result storage unit 35. FIG. 10 shows an example of the behavior data candidates stored in the search result storage unit 35.

If the two do not match, no behavior data candidate is stored. When matching against the i-th data item is complete, i is incremented in step S47, and the process returns to step S43 to repeat the series of matching steps S43 to S46.

When no unsearched data remains in the behavior knowledge base storage unit 34, the process moves to step S48, where the behavior data candidates stored in the search result storage unit 35 are sorted, for example by time slot. In step S49, the sorted candidates are read from the search result storage unit 35 and passed to the behavior determination processing unit 222.
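The matching loop of steps S43 to S47 amounts to a linear scan of the knowledge base. The sketch below assumes exact string matching and rows shaped as (example, case, predicate, frequency); the rows, the romanized case markers, and the function name `search` are hypothetical, and the sorting of step S48 is left out.

```python
# Each knowledge-base row: (example, case, predicate, frequency).
# All values are hypothetical.
KNOWLEDGE_BASE = [
    ("rag", "wo", "squeeze", 500),
    ("rag", "de", "wipe", 300),
    ("window", "wo", "wipe", 400),
    ("window", "wo", "open", 350),
]


def search(object_names):
    """Collect (object, case, predicate, frequency) rows whose example matches."""
    wanted = set(object_names)
    candidates = []
    for example, case, predicate, frequency in KNOWLEDGE_BASE:  # i-th entry
        if example in wanted:                                   # exact match
            candidates.append((example, case, predicate, frequency))
    return candidates
```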

(6) Behavior determination processing
When the behavior data candidates are passed from the behavior search processing unit 221, the behavior determination processing unit 222 selects the most likely behavior data from among them, as follows. FIG. 11 is a flowchart showing the processing procedure and contents.

That is, when the behavior data candidates are passed in step S51, step S52 first determines whether the candidates involve only one action target or several. If there is only one, the candidate with the highest appearance frequency is selected in step S54, and the "case", "predicate", and "appearance frequency" constituting the selected behavior data are stored in the behavior data storage unit 36 in step S56.

If the candidates involve several action targets, step S53 determines whether a behavior common to those targets exists. If no common behavior exists, the candidate with the highest appearance frequency is selected in step S54, and the "case", "predicate", and "appearance frequency" constituting the selected behavior data are stored in the behavior data storage unit 36 in step S56.

If a behavior common to the targets exists, the common behavior with the highest appearance frequency is selected in step S55. The "case", "predicate", and "appearance frequency" constituting the selected behavior data are then stored in the behavior data storage unit 36 in step S56. FIG. 12 shows an example of the stored behavior data.

A specific example of this selection follows. When the search results shown in FIG. 10 are obtained for a single object, "rag", the predicate "squeeze", which has the highest appearance frequency, is selected. When multiple objects, "rag" and "window", are acquired at the same time or in the same time slot, behaviors related to both "rag" and "window" are retrieved, and among them the predicate "wipe", which has the highest appearance frequency, is selected.
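The branching of steps S52 to S55 can be sketched as below. The candidate tuples and their frequencies are hypothetical, chosen only so that the outcome mirrors the "squeeze" and "wipe" example in the text. Ranking a shared predicate by the sum of its frequencies across objects is an assumption; the text does not say how frequencies from different objects are combined.

```python
def decide(candidates):
    """candidates: list of (object, case, predicate, frequency) tuples."""
    objects = {c[0] for c in candidates}
    if len(objects) > 1:
        # Predicates that appear for every detected object (step S53).
        common = set.intersection(
            *({c[2] for c in candidates if c[0] == obj} for obj in objects)
        )
        if common:
            # Assumption: rank a shared predicate by its summed frequency (S55).
            totals = {p: sum(c[3] for c in candidates if c[2] == p) for p in common}
            return max(totals, key=totals.get)
    # Single object, or no shared predicate: highest frequency wins (S54).
    return max(candidates, key=lambda c: c[3])[2]


# Hypothetical candidates for one object and for two objects.
rag_only = [("rag", "wo", "squeeze", 500), ("rag", "de", "wipe", 300)]
rag_and_window = rag_only + [("window", "wo", "wipe", 400), ("window", "wo", "open", 350)]
```

With these values, `decide(rag_only)` picks the most frequent predicate for the rag, while `decide(rag_and_window)` picks the only predicate both objects share.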

If the two objects attach to the predicate through the same case element, however, unnatural Japanese such as "窓に雑巾にする", in which both "window" and "rag" carry the same case particle "ni", is generated. To generate natural Japanese in such cases, it is advisable to select a behavior that relates to both objects but in which each object takes a different case element. More natural Japanese can also be generated by introducing rules such as not adopting predicates connected through the nominative case (ga-case) or the no-case, and not adopting light verbs unless the action target is a sahen (suru-verb) noun.
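Two of the rules above, requiring different case elements for the two objects and rejecting the ga-case and no-case, can be expressed as a filter over the candidates. This is a sketch under assumptions: it handles exactly two objects, the case markers are romanized, the sample rows are hypothetical, and the light-verb rule is not implemented.

```python
EXCLUDED_CASES = {"ga", "no"}  # nominative and no-case slots are not adopted


def usable_shared_predicates(candidates):
    """Return predicates shared by two objects through different, allowed cases."""
    by_object = {}
    for obj, case, predicate, _freq in candidates:
        if case not in EXCLUDED_CASES:
            by_object.setdefault(obj, set()).add((predicate, case))
    if len(by_object) != 2:
        return set()
    first, second = by_object.values()
    return {
        p1
        for p1, c1 in first
        for p2, c2 in second
        if p1 == p2 and c1 != c2
    }


# Hypothetical candidates: "wipe" uses different cases, "make" uses "ni" twice,
# and "break" reaches the window only through the excluded ga-case.
sample = [
    ("rag", "de", "wipe", 300),
    ("window", "wo", "wipe", 400),
    ("rag", "ni", "make", 50),
    ("window", "ni", "make", 40),
    ("window", "ga", "break", 30),
    ("rag", "de", "break", 5),
]
shared = usable_shared_predicates(sample)
```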

When the search results contain very few behaviors that relate to both objects, or when such behaviors appear only rarely in the corpus, it is judged that separate actions are being performed almost simultaneously, and the most frequent predicate is selected for each object individually. For "personal computer" and "sweets", for example, the result is "use the personal computer" and "eat sweets". The criterion for judging that separate actions are being performed simultaneously is whether the behavior search finds any action that can relate to both objects at all, or whether one is found within a range set by a frequency threshold; if none is found, the behavior is determined separately for each action.

The threshold is determined from an empirically predetermined value or a value calculated from statistics. Empirically, it is advisable to set it at around the top 20% of appearance frequencies among the search results of the behavior search processing unit 221. It is not limited to this, however: values calculated from statistics may also be used, such as the mean, median, mode, exponential average, a trimmed mean computed after removing outliers (extreme values) that deviate from the mean by several standard deviations or more, a moving average, a value several standard deviations below the mean, or other predetermined values obtained from past cases. Although a method of determining behavior by appearance frequency has been described here, the criterion for determining the likely behavior is not limited to frequency in the corpus.
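The "top 20%" rule of thumb can be read as a rank cutoff over the frequencies in the search results. That reading is an assumption, as are the function names and the sample frequencies below; the text only states the approximate position of the threshold.

```python
import math


def frequency_threshold(frequencies, top_fraction=0.2):
    """Return the lowest frequency that still falls within the top fraction."""
    ranked = sorted(frequencies, reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[keep - 1]


def is_joint_action(shared_frequency, all_frequencies):
    """True if a shared predicate is frequent enough to count as one action."""
    return shared_frequency >= frequency_threshold(all_frequencies)


# Hypothetical appearance frequencies from one search result.
freqs = [900, 700, 400, 300, 200, 120, 80, 50, 20, 10]
```

When `is_joint_action` returns False for every shared predicate, the objects would be treated as belonging to separate actions and the most frequent predicate chosen for each one individually.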

Finally, in step S57, the behavior data stored in the behavior data storage unit 36 is read out and passed to the output generation processing unit 223.

(7) Output generation processing
When the behavior determination processing unit 222 has selected the data representing the most likely behavior, the output generation processing unit 223 generates a natural sentence expressing the selected behavior data. For example, when the single predicate "squeeze" is selected for "rag", the sentence "squeeze the rag" is generated; when a predicate is selected for the two objects "rag" and "window", the natural sentence "wipe the window with a rag" is generated. The generated sentence is then transmitted from the communication interface unit 1 to an operator terminal used by, for example, a provider of an information recommendation service or a provider of medical or nursing care services.
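Sentence generation from a predicate and its case slots can be sketched with a small template. The mapping used here (the wo-case rendered as a direct object, the de-case as "with a ...") is a hypothetical English stand-in for Japanese particle attachment, and the function name `render` is likewise an assumption.

```python
def render(predicate, slots):
    """slots: mapping of case marker to object name, e.g. {"wo": "window"}."""
    sentence = predicate
    if "wo" in slots:                       # accusative: direct object
        sentence += f" the {slots['wo']}"
    if "de" in slots:                       # instrumental: "with ..."
        sentence += f" with a {slots['de']}"
    return sentence
```

For instance, `render("wipe", {"wo": "window", "de": "rag"})` yields the two-object sentence from the text.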

(Effects)
As detailed above, one embodiment includes the behavior knowledge base storage unit 34, in which combinations of human behaviors and the objects, places, situations, times, and the like that they target are described as linguistic information. Detection values concerning the target of a behavior are first acquired from sensors, the acquired values are analyzed and those obtained at the same time are integrated, and the result is converted into linguistic information representing the target. Based on this linguistic information, linguistic information representing the corresponding behaviors is retrieved from the behavior knowledge base storage unit 34, and the item with the highest appearance probability among those retrieved is selected, rendered as a sentence, and output.

Consequently, behaviors of an unspecified large number of people and new behaviors taken by a specific person, which were difficult to recognize with methods that rely on teacher data or a manually created knowledge base, can also be recognized with high probability.

In addition, the behavior knowledge base creation unit statistically extracts, from data on the Web, data representing combinations of human behaviors and the objects, places, situations, times, and the like that they target, and stores the extracted data in the behavior knowledge base storage unit 34 in a form described as linguistic information. The data in the behavior knowledge base storage unit 34 can therefore be created and updated autonomously within the behavior recognition device, so that the knowledge base can be further enriched as the behaviors to be recognized change or increase in variety.

Furthermore, when multiple detection values concerning the objects, places, situations, times, and the like targeted by a behavior are obtained, those obtained at the same time are integrated. This makes it possible to grasp in more detail the context in which a person acted, and thereby to improve the accuracy of behavior recognition.

Furthermore, when multiple items of linguistic information representing behaviors are retrieved from the behavior knowledge base storage unit 34 for the converted linguistic information representing a target (an object, place, situation, time, or the like), the item selected is one that is shared with the linguistic information representing behaviors corresponding to another target whose detection value was obtained in the same time slot. Thus, when a behavior taken by a person in a given time slot relates to several targets, the behavior common to those retrieved from the knowledge base for each of the targets is recognized as the behavior the person took. This makes it possible to recognize human behavior more accurately.

Furthermore, when multiple items of linguistic information representing behaviors are retrieved from the behavior knowledge base storage unit 34, the appearance frequencies stored in the behavior knowledge base storage unit 34 are consulted and the item with the highest appearance frequency is selected. Thus, among the linguistic information representing multiple behaviors, the item whose appearance frequency satisfies a predetermined condition is recognized as the behavior the person took. As a result, the more probable behavior can be recognized as the person's behavior by referring to the past search history.
[Other Embodiments]
The search frequency may be calculated each time linguistic information representing a behavior is retrieved from the behavior knowledge base storage unit 34, and, from the retrieved linguistic information representing multiple behaviors, the item whose search frequency is closest to the appearance frequency stored in the behavior knowledge base storage unit 34 may be selected.

In addition, the installation location and configuration of the behavior recognition device, the processing procedures and contents of the target detection unit and the behavior selection unit, the types of sensors and the structure of their detection values, the structure of the behavior knowledge base stored in the behavior knowledge base storage unit, and so on can be modified in various ways without departing from the gist of the present invention.

In short, the present invention is not limited to the embodiment exactly as described, and at the implementation stage the constituent elements can be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be removed from all those shown in the embodiment. Constituent elements from different embodiments may also be combined as appropriate.

DESCRIPTION OF SYMBOLS 1 ... Behavior recognition device, 2 ... Control unit, 3 ... Storage unit, 21 ... Target detection unit, 22 ... Behavior selection unit, 31 ... Sensing data storage unit, 32 ... Analysis result storage unit, 33 ... Conversion result storage unit, 34 ... Behavior knowledge base storage unit, 35 ... Search result storage unit, 36 ... Behavior data storage unit, 211 ... Sensing processing unit, 212 ... Data analysis processing unit, 213 ... Text conversion processing unit, 221 ... Behavior search processing unit, 222 ... Behavior determination processing unit, 223 ... Output generation processing unit.

Claims

A behavior recognition device connectable to a behavior knowledge base storage unit, in which combinations of behaviors that can occur in the real world and the targets of those behaviors are described as linguistic information, and to a sensor that outputs detection values concerning the targets of the behaviors, the device comprising:
target detection means for acquiring from the sensor, in the course of a behavior, a detection value concerning the target of that behavior, and converting the acquired detection value into linguistic information representing the target; and
behavior selection means for retrieving from the behavior knowledge base storage unit, based on the converted linguistic information representing the target, linguistic information representing the corresponding behavior, and outputting it.

The behavior recognition device according to claim 1, further comprising means for statistically extracting, from data in which people have interpreted and verbalized real-world situations, data representing combinations of the behaviors that can occur and their targets, and storing the extracted data in the behavior knowledge base storage unit in a form described as linguistic information.

The behavior recognition device according to claim 1 or 2, wherein, when multiple detection values concerning the target of the behavior are obtained, the target detection means integrates or separates those detection values based on information representing the strength of their correlation.

The behavior recognition device according to any one of claims 1 to 3, wherein, when multiple items of linguistic information representing behaviors corresponding to the converted linguistic information representing the target are retrieved from the behavior knowledge base storage unit, the behavior selection means selects, from among them, an item shared with the linguistic information representing behaviors corresponding to the linguistic information of another target whose detection value was obtained in the same time slot.

The action recognition apparatus according to any one of claims 1 to 3, wherein, in a case where the action knowledge base storage unit stores, together with the data describing in language information the combinations of the possible actions and the targets of those actions, information representing the appearance frequency of that data, the action selection means, when a plurality of pieces of language information representing actions corresponding to the converted language information representing the target are retrieved from the action knowledge base storage unit, refers to the appearance frequency stored in the action knowledge base storage unit and selects, from among the pieces of language information representing the plurality of actions, one that satisfies a preset appearance-frequency condition.
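A minimal sketch of frequency-based selection follows. It assumes the preset condition is "the highest appearance frequency, subject to an optional minimum count", which matches the abstract's description of selecting the candidate with the highest occurrence probability. The table `FREQUENCIES`, its counts, and the function `select_by_frequency` are hypothetical.

```python
# Hypothetical stored data: (target, action) -> appearance frequency (count).
FREQUENCIES = {
    ("cup", "drink"): 120,
    ("cup", "wash"): 45,
    ("cup", "pour"): 30,
}


def select_by_frequency(target, frequencies, min_count=0):
    """Pick the most frequent action for `target`, or None if nothing qualifies.

    Assumed condition: highest count among candidates with count >= min_count.
    """
    candidates = {
        action: n
        for (t, action), n in frequencies.items()
        if t == target and n >= min_count
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


best = select_by_frequency("cup", FREQUENCIES)
```

Other conditions, such as returning every action above a threshold, would satisfy the same claim language; the single-best choice is only one option.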

An action recognition method executed by an action recognition apparatus connectable to an action knowledge base storage unit, in which combinations of actions that can occur in the real world and the targets of those actions are described in language information, and to a sensor that outputs detection values relating to the targets of the actions, the method comprising:
a step in which the action recognition apparatus acquires from the sensor, in the course of an action, a detection value relating to the target of that action, and converts the acquired detection value into language information representing the target; and
a step in which the action recognition apparatus retrieves and outputs, from the action knowledge base storage unit, language information representing the corresponding action on the basis of the converted language information representing the target.
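The two steps of the method can be sketched end to end as follows. This is an illustrative outline under stated assumptions, not the patented implementation: the sensor is assumed to report tag identifiers, `SENSOR_TO_WORD` is a hypothetical table converting those identifiers into target words, and `KNOWLEDGE_BASE` is a hypothetical mapping from target words to action words.

```python
# Hypothetical conversion table: sensor detection value -> target word.
SENSOR_TO_WORD = {"tag-001": "cup", "tag-002": "kettle"}

# Hypothetical knowledge base: target word -> list of action words.
KNOWLEDGE_BASE = {"cup": ["drink", "wash"], "kettle": ["boil"]}


def to_language(detection_value, mapping):
    """Step 1: convert a detection value into language information for the target."""
    return mapping.get(detection_value)


def recognize(detection_value, mapping, knowledge_base):
    """Step 2: retrieve the actions associated with the detected target."""
    target = to_language(detection_value, mapping)
    if target is None:
        return []  # unknown detection value: no target word, so no candidates
    return knowledge_base.get(target, [])


actions = recognize("tag-002", SENSOR_TO_WORD, KNOWLEDGE_BASE)
```

When several actions are returned for one target, a further selection step, for example by appearance frequency or by agreement with other detected targets, would be needed to output a single action.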

A program that causes a computer included in the action recognition apparatus according to any one of claims 1 to 5 to execute the processing performed by the means included in that action recognition apparatus.