JP7129531B2

JP7129531B2 - Information processing method and information processing system

Info

Publication number: JP7129531B2
Application number: JP2021136495A
Authority: JP
Inventors: アブドゥルラーマンアブドゥルガニ; 大資玉城
Original assignee: Exa Wizards Inc
Current assignee: Exa Wizards Inc
Priority date: 2020-11-30
Filing date: 2021-08-24
Publication date: 2022-09-01
Anticipated expiration: 2040-11-30
Also published as: JP2022086997A

Description

The present invention relates to an information processing method and an information processing system for transmitting and receiving captured images.

Patent Document 1 describes an image transmission device that transmits image data to an image recording device connected to a network. This image transmission device selects an original image to be transmitted based on a user operation and generates a reduced image by shrinking the selected original image. The image transmission device detects the communication environment, transmits the reduced image when in a first communication environment, and transmits the original image when in a second communication environment.

In recent years, techniques for transmitting images captured by a camera or the like to a server device, as in the image transmission device described in Patent Document 1, and techniques for transmitting images from a server device to a user's terminal device or the like have come into wide use.

JP 2012-217166 A

A system that transmits and receives images over a network has the problem that transmitting and receiving a large number of images increases the communication load. To suppress this increase, images are, for example, compressed before transmission. However, if the destination device performs some kind of image processing, performing that processing on compressed images may reduce its accuracy.

The present invention has been made in view of these circumstances, and its object is to provide an information processing method and an information processing system that can be expected to suppress the increase in communication load caused by transmitting and receiving images.

In an information processing method according to one embodiment, each of a plurality of imaging devices detects a person from an image captured by its imaging unit; detects the person's face from the captured image based on the person detection result; detects the expression or orientation of the face based on the face detection result; counts, for each person appearing in the images captured by the imaging unit, the number of images in which that person appears and which satisfy a first condition concerning the detected facial expression or orientation; and selects images from the plurality of captured images so that the number of images of each person is approximately equal, and transmits them to a server device. The server device receives the plurality of images captured by the plurality of imaging devices; counts, for each person appearing in the received images, the number of images in which that person appears and which satisfy a second condition; and selects images from the plurality of images captured by the plurality of imaging devices so that the number of images of each person is approximately equal.

According to one embodiment, an increase in communication load due to image transmission and reception can be expected to be suppressed.

FIG. 1 is a schematic diagram for explaining an overview of the information processing system according to the present embodiment.
FIG. 2 is a block diagram showing the configuration of the camera according to the present embodiment.
FIG. 3 is a block diagram showing the configuration of the server device according to the present embodiment.
FIG. 4 is a block diagram showing the configuration of the terminal device according to the present embodiment.
FIG. 5 is a flowchart showing the procedure of the image selection processing performed by the camera according to the present embodiment.
FIG. 6 is a schematic diagram for explaining same-scene determination by the camera.
FIG. 7 is a schematic diagram showing an example of the metadata attached to an image transmitted by the camera.
FIG. 8 is a flowchart showing the procedure of the image selection processing performed by the server device according to the present embodiment.
FIG. 9 is a flowchart showing the procedure of the image transmission processing performed by the server device according to the present embodiment.
FIG. 10 is a schematic diagram showing an example of the search condition setting screen displayed by the terminal device.
FIG. 11 is a flowchart showing the procedure of the processing performed by the terminal device according to the present embodiment.

A specific example of an information processing system according to an embodiment of the present invention is described below with reference to the drawings. The present invention is not limited to these examples; it is defined by the claims and is intended to include all modifications within the meaning and scope equivalent to the claims.

<System Overview>
FIG. 1 is a schematic diagram for explaining an overview of the information processing system according to the present embodiment. In this system, one or more cameras 1 installed in a facility 100 transmit captured images to a server device 3, and the server device 3 transmits the images received from the cameras 1 to users' terminal devices 5. In the illustrated example, the facility 100 is a nursery school, kindergarten, or the like. A camera 1 installed in such a facility 100 automatically and repeatedly, for example at a predetermined interval, photographs the interior of the facility 100 and the users of the facility 100 (children, in this example). The camera 1 has a function for communicating over a network such as a mobile phone network or the Internet, and transmits the captured images to the server device 3.

The server device 3 receives images from the camera 1 over the network and stores and accumulates them in a storage device. The server device 3 also transmits the accumulated images to a terminal device 5, such as a smartphone or PC (personal computer), associated with a user of the facility 100 (in this example, a child, a child's guardian, a nursery teacher, or the like). As a result, a guardian who has left a child at a facility 100 such as a nursery school or kindergarten can, for example, view on their own terminal device 5 images captured by the camera 1 of the child spending time at the facility 100. Likewise, a nursery teacher at the facility 100 can obtain such images on the terminal device 5 without having to photograph the children as part of their duties, and can use them, for example, to create a photo album or to post photos within the facility 100.

For example, a camera 1 installed in the facility 100 automatically captures several thousand to several hundred thousand images per day, and even if all of these images were provided to users, it would not be easy for users to manage them. The information processing system according to the present embodiment therefore selects images as appropriate from the large number captured by the camera 1, and the server device 3 stores and accumulates, for example, several tens to several hundred images per day. The server device 3 also selects from the accumulated images, for example, images suited to each user and transmits them to the terminal device 5. As a result, the number of images that a user views or obtains on the terminal device 5 is on the order of several tens to several hundred per day. These image counts are examples and are not limiting.

If all of the many images captured by the camera 1 were transmitted to the server device 3, the amount of communication between the camera 1 and the server device 3 would increase, which could increase the communication load on the network. The information processing system according to the present embodiment therefore uses a camera 1 capable of image processing and information processing, performs the image selection on the camera 1, and transmits to the server device 3 only the small number of images selected from the large number captured. If the number of images transmitted from the camera 1 to the server device 3 is reduced to, for example, one hundredth of all captured images, the load on the network between the camera 1 and the server device 3 can be expected to fall to about one hundredth of the load when all images are transmitted.

The camera 1 according to the present embodiment performs, for example, processing to detect people (children) in captured images, selects images in which a person appears, and removes images in which no person appears. Among the images in which a person appears, the camera 1 may also remove images that are inappropriate from a privacy standpoint, such as images in which the person is wearing a diaper, is changing clothes, or is naked. The camera 1 may further perform, on images in which a person appears, processing to detect the person's face, the facial expression, the face orientation, and so on, and select images based on the results. This allows the camera 1, for example, to select images showing a smiling face turned toward the front and transmit them to the server device 3. The camera 1 may also select the images to be transmitted to the server device 3 based on various conditions other than these.
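The camera-side selection described in this paragraph can be sketched as a chain of filters. The following is a minimal illustration only: the `Frame` record, its field names, and the function `select_for_upload` are hypothetical and stand in for the outputs of the detectors, which are not implemented here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Frame:
    """Hypothetical per-image record holding detector outputs."""
    name: str
    people: int = 0                 # number of people detected in the image
    inappropriate: bool = False     # flagged as a privacy-sensitive image
    smiling_front_faces: int = 0    # faces that are smiling and facing front


def select_for_upload(frames: Iterable[Frame]) -> List[Frame]:
    """Keep only frames that pass every camera-side check."""
    selected = []
    for frame in frames:
        if frame.people == 0:
            continue  # no person appears: remove
        if frame.inappropriate:
            continue  # inappropriate from a privacy standpoint: remove
        if frame.smiling_front_faces == 0:
            continue  # no smiling, front-facing face: not selected
        selected.append(frame)
    return selected


frames = [
    Frame("empty_room"),
    Frame("changing", people=1, inappropriate=True, smiling_front_faces=1),
    Frame("looking_away", people=2),
    Frame("smile", people=1, smiling_front_faces=1),
]
print([f.name for f in select_for_upload(frames)])
```

Ordering the cheapest and most selective check first (here, the person check) means later, more expensive checks run on fewer images; the order shown is one reasonable choice, not one prescribed by the text.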

The server device 3 according to the present embodiment stores and accumulates the images received from the camera 1, performs various kinds of image processing and information processing on them, and transmits images to the users' terminal devices 5. For example, the server device 3 may perform action recognition to determine what the person in an image is doing, and select images of specific scenes, such as a child crawling or eating, for transmission to the terminal device 5. The server device 3 may also perform processing to determine who the person in an image is, and transmit images showing a child to the terminal device 5 of that child's guardian. When a person appears small relative to the whole image, the server device 3 may also extract the image region showing the person, perform image processing that increases the resolution of the extracted region and enlarges it, and transmit the enlarged image to the terminal device 5. The server device 3 may also select the images to be transmitted to the terminal device 5 based on various conditions other than these.

In the information processing system according to the present embodiment, the number of selected images in which each user appears is counted per user, and images are selected so that the number of images per user is uniform. This equalization may be performed by the camera 1 or by the server device 3. In the present embodiment, the camera 1 transmits to the server device 3 images selected so as to equalize the number of images per user, and the server device 3 likewise selects and stores images so as to equalize the number of images per user. This assumes that, when a plurality of cameras 1 are installed in one facility 100, each camera 1 performs equalization and the server device 3 performs equalization across the images received from the plurality of cameras 1. When only one camera 1 is installed in a facility 100, the equalization may be performed by only one of the camera 1 and the server device 3.
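One way to picture the equalization of per-person image counts is a greedy selection that always serves the person who currently has the fewest selected images. The text does not specify the selection algorithm, so the sketch below is an assumption; the function name `balance_selection`, the `budget` parameter, and the tie-breaking rules are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def balance_selection(images: Dict[str, Sequence[str]], budget: int) -> List[str]:
    """Greedily pick up to `budget` images so per-person counts stay close.

    `images` maps an image ID to the IDs of the people shown in it.
    """
    counts: Counter = Counter()      # selected-image count per person
    remaining = dict(images)
    chosen: List[str] = []
    while remaining and len(chosen) < budget:
        candidates = {p for people in remaining.values() for p in people}
        if not candidates:
            break  # only images without people are left
        # Person with the fewest selected images so far (ties broken by ID).
        neediest = min(candidates, key=lambda p: (counts[p], p))
        pool = [i for i, people in remaining.items() if neediest in people]
        # Prefer the image that adds least to people who already have many.
        best = min(pool, key=lambda i: (sum(counts[p] for p in remaining[i]), i))
        chosen.append(best)
        counts.update(remaining.pop(best))
    return chosen


images = {"a": ["p1"], "b": ["p1"], "c": ["p1"], "d": ["p2"], "e": ["p1", "p2"]}
print(balance_selection(images, budget=2))
```

The same routine could run once on each camera over its own images and again on the server over the images received from all cameras, which mirrors the two-stage equalization described above.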

Images may be transmitted from the server device 3 to the terminal device 5 in any manner. For example, the server device 3 may transmit the images captured that day to the terminal device 5 at a set time each day. This is a so-called push-type method of information transmission. Alternatively, the server device 3 may classify and store images by date and time or the like, and transmit images in response to a request from the terminal device 5. This is a so-called pull-type method, and the terminal device 5 may be able to specify conditions for the images it requests. The conditions may include, for example, a condition specifying the date and time of capture, a condition specifying an expression such as a smiling or crying face, a condition specifying an action such as crawling or eating, and a condition specifying a particular person such as the user's own child. The server device 3 selects, from the accumulated images, those that satisfy the specified conditions and transmits them to the requesting terminal device 5.
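The pull-type request can be illustrated as a filter over stored image records, where each condition is optional and an unspecified condition matches everything. The record layout, the field names, and the `search` function below are hypothetical; the text lists the kinds of conditions but not a data format.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class StoredImage:
    """Hypothetical metadata kept by the server for one image."""
    image_id: str
    shot_on: date
    expression: str            # e.g. "smile", "crying"
    action: str                # e.g. "crawling", "eating"
    person_ids: FrozenSet[str]


@dataclass(frozen=True)
class SearchCondition:
    """Conditions sent by the terminal; None means 'not specified'."""
    shot_on: Optional[date] = None
    expression: Optional[str] = None
    action: Optional[str] = None
    person_id: Optional[str] = None


def search(images: Iterable[StoredImage], cond: SearchCondition) -> List[StoredImage]:
    """Return the stored images that satisfy every specified condition."""
    return [
        img for img in images
        if (cond.shot_on is None or img.shot_on == cond.shot_on)
        and (cond.expression is None or img.expression == cond.expression)
        and (cond.action is None or img.action == cond.action)
        and (cond.person_id is None or cond.person_id in img.person_ids)
    ]


store = [
    StoredImage("i1", date(2021, 8, 24), "smile", "eating", frozenset({"p1"})),
    StoredImage("i2", date(2021, 8, 24), "crying", "crawling", frozenset({"p2"})),
    StoredImage("i3", date(2021, 8, 25), "smile", "crawling", frozenset({"p1", "p2"})),
]
hits = search(store, SearchCondition(expression="smile", person_id="p2"))
print([img.image_id for img in hits])
```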

In the information processing system according to the present embodiment, a user can also supply an image they have taken themselves as a condition, and thereby request that the server device 3 transmit images showing the same person as the person in that image (a person with the same or similar features). For example, the user performs an operation that causes the terminal device 5 to load an image of their own child; the terminal device 5 extracts the features of the person in this image and transmits data representing the extracted features to the server device 3. Based on the feature data received from the terminal device 5, the server device 3 compares it with the features of the people in the stored images, selects images showing a person whose features match or are similar, and transmits them to the terminal device 5. This can be expected to make it easier for a user to obtain images of a desired person from among many images.
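The server-side comparison of features can be sketched as a similarity search. The text does not say how features are represented or compared; the sketch below assumes, purely for illustration, that each person is represented by a numeric feature vector and that similarity is measured by cosine similarity against a threshold. The function names and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_matching_images(
    query: Sequence[float],
    stored: Dict[str, List[Sequence[float]]],
    threshold: float = 0.9,  # assumed cut-off for "match or similar"
) -> List[str]:
    """Return IDs of images in which at least one person resembles the query.

    `stored` maps an image ID to the feature vectors of the people shown in it.
    """
    return [
        image_id
        for image_id, people in stored.items()
        if any(cosine_similarity(query, feat) >= threshold for feat in people)
    ]


stored = {
    "a": [[0.9, 0.1]],
    "b": [[0.0, 1.0]],
    "c": [[0.0, 1.0], [1.0, 0.0]],
}
print(find_matching_images([1.0, 0.0], stored))
```

Because only the extracted feature data is sent from the terminal device 5, the user's own photograph does not need to leave the terminal in this arrangement.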

This example has described photographing children at a facility 100 such as a nursery school or kindergarten, but application of the information processing system according to the present embodiment is not limited to such facilities. The facility 100 may be, for example, a commercial facility such as an amusement park or theme park, a venue where sports or other events are held, or the user's home. The subject photographed by the camera 1 also need not be a person, and may be, for example, an animal at a zoo or a pet at home.

<Device Configuration>
FIG. 2 is a block diagram showing the configuration of the camera 1 according to the present embodiment. The camera 1 is a device that may be called an edge camera, an AI (Artificial Intelligence) edge camera, an edge AI camera, or the like, and contains, together with an imaging unit 11 that captures images, an information processing device 20 that performs advanced arithmetic processing.

The imaging unit 11 includes a lens, an image sensor, and the like. The image sensor is, for example, a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor; it captures the image focused by the lens and outputs the resulting image data. In the present embodiment, the imaging unit 11 outputs the image data to the information processing device 20 without compressing it, as uncompressed image data in, for example, a bitmap format.

The information processing device 20 controls the operation of each part of the camera 1 and performs various kinds of processing that use the images captured by the imaging unit 11. The information processing device 20 according to the present embodiment includes a processing unit 21, an input/output unit 22, a storage unit 23, a communication unit 24, and the like. The processing unit 21 is configured using an arithmetic processing device such as a CPU (Central Processing Unit), MPU (Micro-Processing Unit), or GPU (Graphics Processing Unit), together with ROM (Read Only Memory), RAM (Random Access Memory), and the like. By reading and executing a program 23a stored in the storage unit 23, the processing unit 21 performs various kinds of processing, such as controlling image capture by the imaging unit 11 and selecting among the captured images.

The input/output unit 22 exchanges data with the imaging unit 11. The input/output unit 22 is connected to the imaging unit 11 via, for example, a signal line, and inputs and outputs data by serial or parallel communication over that line. The input/output unit 22 sends data such as control commands from the processing unit 21 to the imaging unit 11 and passes image data input from the imaging unit 11 to the processing unit 21.

The storage unit 23 is configured using a nonvolatile memory element such as flash memory or EEPROM (Electrically Erasable Programmable Read Only Memory). The storage unit 23 stores the various programs executed by the processing unit 21 and the various data required for its processing. In the present embodiment, the storage unit 23 stores the program 23a executed by the processing unit 21. The storage unit 23 may also store data such as images captured by the imaging unit 11.

In the present embodiment, the program 23a is written to the storage unit 23, for example, at the manufacturing stage of the camera 1. Alternatively, the camera 1 may obtain, by communication, a program 23a distributed by a remote server device or the like. The program 23a may also be provided recorded on a recording medium such as a memory card or optical disc, in which case the camera 1 reads the program 23a from the recording medium and stores it in the storage unit 23. Alternatively, a writing device may read the program 23a recorded on a recording medium and write it to the storage unit 23 of the camera 1. The program 23a may be provided by distribution over a network or may be provided recorded on a recording medium.

The communication unit 24 communicates with various devices via a network N such as a mobile phone network, a wireless LAN (Local Area Network), or the Internet. In the present embodiment, the communication unit 24 communicates with the server device 3 and transmits data of images captured by the imaging unit 11 to the server device 3. The communication unit 24 transmits data supplied by the processing unit 21 to other devices and passes data received from other devices to the processing unit 21.

In the processing unit 21 of the camera 1 according to the present embodiment, a person detection unit 21a, an inappropriate image detection unit 21b, a face detection unit 21c, a facial expression/orientation detection unit 21d, an image selection unit 21e, an image transmission processing unit 21f, and the like are implemented as software functional units when the processing unit 21 reads and executes the program 23a stored in the storage unit 23.

The person detection unit 21a performs processing to detect people appearing in the images captured by the imaging unit 11. The camera 1 according to the present embodiment generates compressed image data by compressing the uncompressed captured image data obtained from the imaging unit 11, and the person detection unit 21a performs person detection on this compressed image. The person detection unit 21a detects people in an image using, for example, a learning model trained in advance by machine learning. The learning model is trained in advance to accept, for example, image data as input and to output the image region in which a person appears (data such as coordinates indicating that region). The learning model is trained using, for example, training data in which image data is associated with data indicating the image region in which a person appears. When the people to be detected in images captured at a facility 100 such as a nursery school or kindergarten are children, creating the training data from image data showing children can be expected to improve the accuracy of child detection. The person detection unit 21a inputs a compressed version of the image captured by the imaging unit 11 to the learning model, obtains the person detection result output by the model, and passes the result to the face detection unit 21c. When no person is detected in an image, that is, when no person appears in the image, the person detection unit 21a removes the image and excludes it from subsequent processing.

The inappropriate image detection unit 21b performs processing to detect and remove, from the images captured by the imaging unit 11, images judged to be inappropriate. In the present embodiment, when the camera 1 is installed in, for example, a nursery school or kindergarten as the facility 100, inappropriate images include images bearing on the privacy of the children shown, such as images showing a child wearing a diaper, images showing a naked child (including a child naked only above or only below the waist), and images showing a child changing clothes. The inappropriate image detection unit 21b determines whether an image is inappropriate using, for example, a learning model trained in advance by machine learning. The learning model is trained in advance to accept, for example, image data as input and to output a numerical value, such as an appropriateness score, indicating how appropriate the image is. The learning model is trained using, for example, training data in which image data is associated with a label indicating whether the image is appropriate (for example, label 1 if appropriate and label 0 if inappropriate). Having the learning model learn the features of inappropriate images in advance enables it to identify images with the same or similar features. In the present embodiment, the inappropriate image detection unit 21b inputs the uncompressed image data captured by the imaging unit 11 to the learning model, obtains the appropriateness score output by the model, and judges whether the image is appropriate according to whether the score exceeds a predetermined threshold. The inappropriate image detection unit 21b removes images judged not to be appropriate, that is, inappropriate, and excludes them from subsequent processing.
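The threshold decision on the appropriateness value can be sketched as follows. The learning model itself is not reproduced; the `score` callable stands in for it, and the function name, the example scores, and the threshold of 0.5 are hypothetical values chosen for illustration.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def remove_inappropriate(
    images: Iterable[T],
    score: Callable[[T], float],  # stand-in for the trained model's output
    threshold: float,
) -> List[T]:
    """Keep only images whose appropriateness exceeds the threshold."""
    return [img for img in images if score(img) > threshold]


# Hypothetical appropriateness values in [0, 1]; label 1 means appropriate.
scores = {"playing": 0.95, "changing_clothes": 0.20, "borderline": 0.50}
print(remove_inappropriate(scores, scores.__getitem__, threshold=0.5))
```

Because the comparison is "exceeds", an image scoring exactly at the threshold is treated as inappropriate and removed, which errs on the side of protecting privacy.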

The face detection unit 21c uses the person detection result from the person detection unit 21a to detect the faces of people appearing in the images captured by the imaging unit 11. The face detection unit 21c detects faces in an image using, for example, a learning model trained in advance by machine learning. The learning model is trained in advance to accept, for example, image data and a person detection result as input and to output the image region in which the person's face appears. Alternatively, data obtained by extracting from the image the region in which the person appears may be used as the input to the learning model. The learning model is trained using, for example, training data in which image data and person detection result data are associated with data indicating the image region in which the person's face appears. The face detection unit 21c inputs the uncompressed image data captured by the imaging unit 11 and the person detection result data from the person detection unit 21a to the learning model, obtains the face detection result output by the model, and passes it to the facial expression/orientation detection unit 21d.

The facial expression orientation detection unit 21d uses the face detection result from the face detection unit 21c to detect the facial expression of the detected face and to detect the orientation of the detected face. In the present embodiment, the facial expression orientation detection unit 21d detects a smile as the facial expression, but this is not a limitation, and expressions other than a smile may be detected. In the expression detection process, the facial expression orientation detection unit 21d detects the facial expression of the person appearing in the image using, for example, a learning model that has been machine-learned in advance. The learning model is trained in advance to accept, for example, image data and a face detection result as inputs and to output the degree of certainty that the facial expression of the person appearing in the image is a smile. Alternatively, data obtained by extracting from the image the image region in which the person's face appears may be used as the input to the learning model. The learning model is trained using, for example, teacher data in which image data and face detection result data are associated with a label indicating whether the facial expression of the person appearing in the image is a smile (for example, label 1 for a smile and label 0 otherwise). In the present embodiment, the facial expression orientation detection unit 21d inputs the uncompressed image data captured by the imaging unit 11 and the face detection result data from the face detection unit 21c to the learning model, acquires the degree of certainty of a smile output by the learning model, and determines whether the expression is a smile according to whether the acquired degree of certainty exceeds a predetermined threshold.

In the face orientation detection process, the facial expression orientation detection unit 21d detects, for example, the positions of the eyes, mouth, nose, and the like from the detected face, and determines the face orientation based on, for example, the positional relationship of the detected parts. The facial expression orientation detection unit 21d may, for example, output the face orientation as a numerical angle within a range of ±90° to the left and right with the frontal orientation set to 0°, may output binary information indicating whether the face is facing the front, may output the degree of certainty that the face is facing the front, or may output other information as the face orientation detection result.
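
As an illustration of deriving an angle from the positional relationship of facial parts, the following is a minimal sketch and not the method of the embodiment. It assumes 2D landmark coordinates for the two eyes and the nose are already available, and it maps the horizontal offset of the nose from the eye midpoint to an angle in the ±90° range with an arcsine. The function names, the arcsine mapping, and the 15° frontal tolerance are all assumptions made for this example.

```python
import math


def estimate_yaw_degrees(left_eye, right_eye, nose):
    """Rough left/right face angle in degrees from (x, y) landmark positions.

    0 means frontal; the result lies within -90..+90.
    The arcsine mapping is a crude heuristic assumed for this sketch.
    """
    eye_mid_x = (left_eye[0] + right_eye[0]) / 2.0
    half_eye_distance = abs(right_eye[0] - left_eye[0]) / 2.0
    if half_eye_distance == 0:
        raise ValueError("eye landmarks must have distinct x positions")
    # Horizontal nose offset relative to the eye midpoint, clamped to [-1, 1].
    offset = (nose[0] - eye_mid_x) / half_eye_distance
    offset = max(-1.0, min(1.0, offset))
    return math.degrees(math.asin(offset))


def is_frontal(yaw_degrees, tolerance_degrees=15.0):
    """Binary output: True when the face is within the assumed tolerance of 0 degrees."""
    return abs(yaw_degrees) <= tolerance_degrees
```

The same angle can serve either as the numerical output or, through `is_frontal`, as the binary output described above.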

Alternatively, the facial expression orientation detection unit 21d may detect the orientation of a person's face using a learning model that has been machine-learned in advance. The learning model is trained in advance to accept, for example, image data and a face detection result as inputs and to output the degree of certainty that the face of the person appearing in the image is facing the front. Alternatively, data obtained by extracting from the image the image region in which the person's face appears may be used as the input to the learning model. The learning model is trained using, for example, teacher data in which image data and face detection result data are associated with a label indicating whether the face of the person appearing in the image is facing the front (for example, label 1 for frontal and label 0 otherwise). In the present embodiment, the facial expression orientation detection unit 21d inputs the uncompressed image data captured by the imaging unit 11 and the face detection result data from the face detection unit 21c to the learning model, acquires the degree of certainty, output by the learning model, that the face is facing the front, and determines whether the face is facing the front according to whether the acquired degree of certainty exceeds a predetermined threshold.

Based on the facial expression and orientation detected by the facial expression orientation detection unit 21d, the image selection unit 21e sorts images into those to be transmitted to the server device 3 and those to be removed without being transmitted. In the present embodiment, the image selection unit 21e treats, for example, an image in which the person's face is facing the front and the expression is a smile as an image to be transmitted to the server device 3, and removes all other images. This selection condition is an example and is not a limitation. The image selection unit 21e provides the images selected for transmission to the image transmission processing unit 21f.
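
The example selection condition (frontal face and smiling expression) can be sketched as follows. This is a minimal illustration under assumed names: the `Detection` record, the field names, and the threshold value of 0.5 are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical per-image detection results produced on the camera."""
    smile_certainty: float    # degree of certainty that the expression is a smile
    frontal_certainty: float  # degree of certainty that the face is frontal


def should_transmit(detection, smile_threshold=0.5, frontal_threshold=0.5):
    """Keep an image only if both certainties exceed their thresholds."""
    return (detection.smile_certainty > smile_threshold
            and detection.frontal_certainty > frontal_threshold)


def select_for_transmission(images):
    """images: list of (image_id, Detection); returns the IDs to transmit."""
    return [image_id for image_id, det in images if should_transmit(det)]
```

Images that fail either condition are simply left out of the returned list, which corresponds to removing them without transmission.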

The image transmission processing unit 21f transmits to the server device 3 the images that the image selection unit 21e has selected for transmission. For each image to be transmitted, the image transmission processing unit 21f also acquires data on the person detection result from the person detection unit 21a, the face detection result from the face detection unit 21c, and the facial expression and orientation detection results from the facial expression orientation detection unit 21d, attaches these data to the image as so-called metadata, and transmits them to the server device 3. The data attached to the image may also include information such as the date and time the image was captured, the camera ID assigned to the camera 1, and the facility ID of the facility 100 in which the camera 1 is installed.
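
One way to picture the metadata attached to an image is as a small structured record serialized alongside the image data. The sketch below uses JSON for this purpose; the key names, the JSON encoding, and the example identifiers are assumptions for illustration, since the description does not specify a metadata format.

```python
import json
from datetime import datetime


def build_metadata(person_boxes, face_boxes, smile_certainty,
                   frontal_certainty, captured_at, camera_id, facility_id):
    """Collect detection results and capture information into one record.

    All key names are hypothetical; boxes are (left, top, right, bottom).
    """
    return {
        "person_boxes": [list(box) for box in person_boxes],
        "face_boxes": [list(box) for box in face_boxes],
        "smile_certainty": smile_certainty,
        "frontal_certainty": frontal_certainty,
        "captured_at": captured_at.isoformat(),
        "camera_id": camera_id,
        "facility_id": facility_id,
    }


def encode_metadata(metadata):
    """Serialize the record to bytes so it can be sent with the image."""
    return json.dumps(metadata, sort_keys=True).encode("utf-8")
```

On the receiving side, `json.loads` on the decoded bytes would recover the same record, so later processing can reuse the camera-side results.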

The machine learning of the various learning models used by the camera 1 according to the present embodiment may be performed by the camera 1 or by a device other than the camera 1. A learning model performs a predetermined operation on input values and outputs the result, and data such as the coefficients and thresholds of the functions that define this operation are stored in the storage unit 23 as the learning model. The learning model has, for example, a neural network structure in which a plurality of neurons are interconnected. A neuron is an element that performs an operation on a plurality of inputs and outputs a single value as the result. Each neuron holds information such as the weighting coefficients and thresholds used in the operation. A neural network learning model includes an input layer that accepts one or more data inputs, an intermediate layer that performs arithmetic processing on the data accepted by the input layer, and an output layer that aggregates the results of the intermediate layer and outputs one or more values. Machine learning processes such as deep learning and reinforcement learning set appropriate values for the coefficients, thresholds, and the like of each neuron in the neural network using a large amount of teacher data given in advance. The learning model used by the camera 1 is, for example, a trained model obtained by applying deep learning with teacher data to a neural network learning model, and is trained by a technique such as gradient descent, stochastic gradient descent, or error backpropagation. Details of the machine learning process are omitted because it is existing technology. The learning model need not have a neural network structure and may be, for example, an SVM (Support Vector Machine) or a decision tree.
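
The description of a neuron as an element that combines several inputs with weighting coefficients and a threshold, and of a network as input, intermediate, and output layers, can be sketched in a few lines. This is a generic textbook illustration, not the model used by the camera 1; the sigmoid activation and all numeric values are assumptions.

```python
import math


def neuron(inputs, weights, threshold):
    """Weighted sum of the inputs minus a threshold, passed through a sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, thresholds):
    """Apply one neuron per (weights, threshold) pair to the same inputs."""
    return [neuron(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


def forward(inputs, hidden_layer, output_layer):
    """Input layer -> one intermediate layer -> output layer.

    Each layer argument is a (weight_rows, thresholds) pair; these stored
    coefficients and thresholds are what constitutes the trained model.
    """
    hidden = layer(inputs, *hidden_layer)
    return layer(hidden, *output_layer)
```

Training by gradient descent or backpropagation would adjust the values in `weight_rows` and `thresholds`; the sketch covers only the forward computation.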

In the present embodiment, the camera 1 performs the person detection process of the person detection unit 21a on a compressed image obtained by compressing the captured image, and performs the other processes on the uncompressed image. The image transmitted from the camera 1 to the server device 3 is uncompressed image data. However, the detection process of the person detection unit 21a may be performed on the uncompressed image, and the detection process of the inappropriate image detection unit 21b, the face detection unit 21c, or the facial expression orientation detection unit 21d may be performed on a compressed image. The image transmitted from the camera 1 to the server device 3 is preferably uncompressed but may be a compressed image. Alternatively, the camera 1 may compress the image with a lossless compression method and transmit it to the server device 3, in which case the server device 3 decompresses the received compressed image to obtain the original image.
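
The lossless alternative can be illustrated with a round trip through a general-purpose lossless compressor. The sketch below uses DEFLATE via Python's standard `zlib` module purely as an example; the description does not name a compression method, and the function names are hypothetical.

```python
import zlib


def compress_for_transmission(raw_image_bytes):
    """Camera side: losslessly compress the raw image bytes before sending."""
    return zlib.compress(raw_image_bytes)


def restore_on_server(payload):
    """Server side: decompress the payload to recover the original bytes exactly."""
    return zlib.decompress(payload)
```

Because the round trip reproduces the input byte for byte, image processing on the server operates on the same data as if the image had been sent uncompressed.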

FIG. 3 is a block diagram showing the configuration of the server device 3 according to the present embodiment. The server device 3 according to the present embodiment includes a processing unit 31, a storage unit (storage) 32, a communication unit (transceiver) 33, and the like. The present embodiment is described on the assumption that a single server device performs the processing, but a plurality of server devices may perform the processing in a distributed manner.

The processing unit 31 is configured using an arithmetic processing device such as a CPU, MPU, or GPU, a ROM, a RAM, and the like. By reading and executing the server program 32a stored in the storage unit 32, the processing unit 31 performs various processes such as selecting images received from the camera 1, storing and accumulating the selected images, and transmitting the selected images to the user's terminal device 5.

The storage unit 32 is configured using a large-capacity storage device such as a hard disk. The storage unit 32 stores the various programs executed by the processing unit 31 and the various data required for the processing of the processing unit 31. In the present embodiment, the storage unit 32 stores the server program 32a executed by the processing unit 31, and is provided with an image storage unit 32b that stores and accumulates images received from the camera 1 and a user DB (database) 32c that stores information about users.

In the present embodiment, the server program 32a is provided in a form recorded on a recording medium 99 such as a memory card or an optical disc, and the server device 3 reads the server program 32a from the recording medium 99 and stores it in the storage unit 32. However, the server program 32a may instead be written to the storage unit 32 at the manufacturing stage of the server device 3, for example. The server device 3 may also acquire, through communication, a server program 32a distributed by another remote server device or the like. Alternatively, a writing device may read the server program 32a recorded on the recording medium 99 and write it to the storage unit 32 of the server device 3. The server program 32a may be provided by distribution via a network or in a form recorded on the recording medium 99.

The image storage unit 32b stores images that have been received from the camera 1 and selected by the server device 3. The image storage unit 32b classifies and stores the images according to, for example, the ID of the facility 100 in which the camera 1 is installed, the ID of each camera 1 when a plurality of cameras 1 are installed, and the date and time at which each image was captured. In the present embodiment, uncompressed images are transmitted from the camera 1 to the server device 3; however, once the server device 3 has finished its various detection and determination processes on a received image, the image stored in the image storage unit 32b and the image transmitted to the terminal device 5 may be compressed versions of the received image.

The user DB 32c is a database that stores information about users who have registered to use the service provided by the information processing system according to the present embodiment. The user DB 32c stores, in association with one another, information such as the name and ID of a guardian as a user, the name and ID of the child, the ID of the facility 100, and the image destination (for example, the ID of the terminal device 5 or an e-mail address). The user DB 32c may also store images such as facial photographs of persons who can be subjects of images captured by the camera 1 (children in this example), or data on the children's features extracted from such images.

The communication unit 33 of the server device 3 communicates with various devices via a network N that includes a mobile phone communication network, a wireless LAN, the Internet, and the like. In the present embodiment, the communication unit 33 communicates with the camera 1 and the terminal device 5 via the network N. The communication unit 33 transmits data provided by the processing unit 31 to other devices and provides data received from other devices to the processing unit 31.

The storage unit 32 may be an external storage device connected to the server device 3. The server device 3 may be a multicomputer including a plurality of computers, or may be a virtual machine constructed virtually by software. The server device 3 is not limited to the above configuration and may include, for example, a reading unit that reads information stored in a portable storage medium, an input unit that accepts operation inputs, or a display unit that displays images.

In the server device 3 according to the present embodiment, the processing unit 31 reads and executes the server program 32a stored in the storage unit 32, whereby an image reception processing unit 31a, a behavior determination unit 31b, an ID assigning unit 31c, an image selection unit 31d, an image correction unit 31e, an image transmission processing unit 31f, and the like are realized in the processing unit 31 as software functional units.

The image reception processing unit 31a performs processing to receive, at the communication unit 33, the images transmitted by the camera 1. For example, based on the data attached to a received image, the image reception processing unit 31a classifies the image in association with the capture date and time, the ID of the camera 1 that captured it, the ID of the facility 100 in which the camera 1 is installed, and the like, and temporarily stores it in the storage unit 32. Images temporarily stored in the storage unit 32 by the image reception processing unit 31a are sorted by the image selection process and are either stored and accumulated in the image storage unit 32b of the storage unit 32 or erased from the storage unit 32.

The behavior determination unit 31b determines what kind of behavior the person appearing in an image received from the camera 1 is performing. The behavior determination unit 31b determines the behavior of the person appearing in the image using, for example, a learning model that has been machine-learned in advance. The learning model is trained in advance to accept, for example, image data as input and to output the degree of certainty that the behavior of the person appearing in the image is a predetermined behavior. In the present embodiment, a learning model is created in advance for each behavior, for example a learning model that outputs the degree of certainty that the person is crawling and a learning model that outputs the degree of certainty that the person is eating. The learning model is trained using, for example, teacher data in which image data is associated with a label indicating whether the person appearing in the image is performing the predetermined behavior. In the present embodiment, the behavior determination unit 31b inputs the image data received from the camera 1 to each learning model and acquires the degree of certainty of each behavior output by the learning models. The behavior determination unit 31b compares the degrees of certainty for the plurality of behaviors and determines that the behavior with the highest degree of certainty is the behavior being performed by the person appearing in the image. The behavior determination unit 31b attaches data on the determination result to the image.
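
The comparison across per-behavior learning models amounts to taking the behavior with the highest degree of certainty. In the minimal sketch below, each model is represented by a plain callable that returns a certainty for an image; the function name and the dictionary layout are assumptions for illustration.

```python
def determine_behavior(image, models):
    """Run one model per behavior and pick the behavior with the highest certainty.

    models: dict mapping a behavior name to a callable image -> certainty.
    Returns (best_behavior, certainties) so the result can be attached to the image.
    """
    if not models:
        raise ValueError("at least one behavior model is required")
    certainties = {name: model(image) for name, model in models.items()}
    best_behavior = max(certainties, key=certainties.get)
    return best_behavior, certainties
```

Returning the full dictionary of certainties as well as the winning behavior keeps the values available for later scoring.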

The ID assigning unit 31c assigns an ID that identifies the person appearing in an image received from the camera 1. The ID assigning unit 31c, for example, extracts the feature quantity of the face of the person appearing in the image and compares it with a face image registered in the user DB 32c or with the feature quantity extracted from that image. The ID assigning unit 31c searches for a user (child) for whom a feature quantity matching or similar to that of the face appearing in the camera 1 image is registered in the user DB 32c, and attaches the ID of that user to the image. The ID assigning unit 31c extracts the feature quantity of the face appearing in the image captured by the camera 1 using, for example, a learning model that accepts a face image as input and outputs multidimensional vector information as the feature quantity of the face. The same learning model can be used when registering facial feature quantities in the user DB 32c. The ID assigning unit 31c calculates, for example, the distance between the two vectors corresponding to two feature quantities, and can determine that the registered entry whose distance is at or below a threshold and is the smallest has matching or similar facial features.
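
The matching rule (distance at or below a threshold, and smallest among the candidates) is a nearest-neighbor search over the registered feature vectors. The sketch below assumes Euclidean distance and a dictionary of registered vectors keyed by user ID; both choices, and the function name, are assumptions, since the description only says "distance or the like".

```python
import math


def assign_id(face_vector, registered_vectors, max_distance):
    """Return the user ID whose registered vector is closest to face_vector.

    registered_vectors: dict mapping user ID -> feature vector of equal length.
    Returns None when no registered vector is within max_distance.
    """
    best_id = None
    best_distance = None
    for user_id, vector in registered_vectors.items():
        distance = math.dist(face_vector, vector)  # Euclidean distance
        if distance <= max_distance and (best_distance is None or distance < best_distance):
            best_id = user_id
            best_distance = distance
    return best_id
```

A `None` result corresponds to a face for which no registered user is similar enough, in which case no ID would be attached.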

The image selection unit 31d sorts images that have been received from the camera 1 and subjected to behavior determination by the behavior determination unit 31b and ID assignment by the ID assigning unit 31c, based on the information attached to each image. In the present embodiment, the image selection unit 31d uses not only the information attached by the processing of the behavior determination unit 31b and the ID assigning unit 31c but also the information attached to the image as a result of the various processes performed in the camera 1 to decide whether to store or remove the image. The image selection unit 31d scores the image based on information included in the attached data, such as the degree of certainty of a smile, the degree of certainty regarding face orientation, and the degree of certainty regarding behavior determination, and sorts the image according to whether its score exceeds a threshold. As an example of scoring, when the degree of certainty of a smile is in the range of 0 to 1, the value from 0 to 10 obtained by multiplying the degree of certainty by 10 and rounding to the nearest integer can be used as the smile score. The behavior determination unit 31b can score each of a plurality of pieces of information and use the sum of the scores as the score of the image. The scoring method is not limited to this, and various methods may be adopted. The behavior determination unit 31b may apply weighting according to the type of information, for example by scoring a smile out of 20 points and face orientation out of 10 points.
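
The scoring example (multiply a 0 to 1 certainty by the maximum points, round to the nearest integer, and sum across information types) can be sketched as below. The function names and dictionary keys are hypothetical. Rounding is written as floor(x + 0.5) because Python's built-in `round` rounds exact halves to the nearest even integer, which differs from the round-half-up behavior described.

```python
import math


def certainty_to_score(certainty, max_points=10):
    """Scale a 0..1 certainty to 0..max_points, rounding half up."""
    return math.floor(certainty * max_points + 0.5)


def image_score(certainties, max_points_by_type):
    """Sum the per-type scores; max_points_by_type carries the weighting.

    certainties: dict such as {"smile": 0.87, "frontal": 0.42}.
    max_points_by_type: dict such as {"smile": 20, "frontal": 10}.
    """
    return sum(certainty_to_score(value, max_points_by_type[key])
               for key, value in certainties.items())
```

With a smile scored out of 20 and face orientation out of 10, a smile certainty contributes twice as much to the total as an equal orientation certainty, which is the weighting effect described.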

Based on the IDs assigned by the ID assigning unit 31c, the image selection unit 31d also counts the number of images captured of each user (child) and adjusts the number of images selected so that the number of images stored and accumulated in the image storage unit 32b over a predetermined period, such as one day, is equalized across users. In the present embodiment, the image selection unit 31d adjusts the number of images for each user by raising or lowering the threshold that is compared with the image score calculated in the scoring described above. For example, for a user with few images captured by the camera 1 (received from the camera 1), the image selection unit 31d lowers the threshold compared with the score, which increases the likelihood that images of that user are selected and increases the number of images selected. For a user with many images, the image selection unit 31d raises the threshold compared with the score, which reduces the likelihood that images of that user are selected and decreases the number of images selected.
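
The per-user threshold adjustment can be sketched as follows. The description does not say by how much the threshold moves or what count is treated as "few" or "many", so the fixed step, the target count, and the function names are all assumptions for this illustration.

```python
def adjusted_threshold(base_threshold, image_count, target_count, step=2):
    """Lower the threshold for users with few images, raise it for users with many."""
    if image_count < target_count:
        return base_threshold - step
    if image_count > target_count:
        return base_threshold + step
    return base_threshold


def select_images(scored_images, counts_by_user, base_threshold, target_count):
    """scored_images: list of (image_id, user_id, score); returns image IDs to keep.

    An image is kept when its score exceeds the threshold adjusted for its user.
    """
    kept = []
    for image_id, user_id, score in scored_images:
        count = counts_by_user.get(user_id, 0)
        threshold = adjusted_threshold(base_threshold, count, target_count)
        if score > threshold:
            kept.append(image_id)
    return kept
```

In this sketch two images with the same score can be treated differently: the one showing a rarely photographed user passes the lowered threshold, while the one showing a frequently photographed user does not.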

The image selection unit 31d compares the score calculated for each image with the threshold, and stores and accumulates images whose scores exceed the threshold in the image storage unit 32b. The image selection unit 31d may sort each image as it is received from the camera 1, or may sort the images received that day together at a predetermined timing, such as once a day. In either case, the server device 3 temporarily stores the images received from the camera 1 in the storage unit 32 until they are sorted by the image selection unit 31d. The image selection unit 31d may erase (remove) from the storage unit 32 any image that, as a result of the comparison between score and threshold, it has decided not to store in the image storage unit 32b.

The image correction unit 31e performs image correction on images received from the camera 1. For example, when an image received from the camera 1 is unclear because it is out of focus, the image correction unit 31e performs image processing to sharpen the image. In addition, for a user who appears at the edge of an image, for example, the image correction unit 31e extracts an image region covering the user and a predetermined surrounding range, enlarges the extracted region, and treats it as a new image. To keep the quality of the enlarged image from degrading, the image correction unit 31e enlarges the image using a so-called super-resolution technique, which increases resolution by interpolating pixel values between pixels. A detailed description of super-resolution is omitted because it is existing technology; super-resolution techniques based on deep learning have become widespread in recent years, and the server device 3 according to the present embodiment may use such a technique. The image correction unit 31e may correct an image at any timing, for example when the image is received from the camera 1 or after the image selection unit 31d has selected the image for storage in the image storage unit 32b.
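
The extraction of a user and a predetermined surrounding range can be illustrated by computing a crop rectangle that is clamped to the image bounds, which matters precisely when the user is near the edge. The sketch covers only the region computation; the enlargement and super-resolution steps are not shown. The box layout, the margin parameter, and the function name are assumptions.

```python
def crop_box(person_box, image_size, margin):
    """Expand a (left, top, right, bottom) box by margin pixels, clamped to the image.

    image_size is (width, height). The result is the region to extract
    before it is enlarged and treated as a new image.
    """
    left, top, right, bottom = person_box
    width, height = image_size
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )
```

Clamping means the extracted region is smaller than the nominal margin on the side where the user touches the image edge, so the user ends up off-center in the crop unless further adjustment is applied.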

The image transmission processing unit 31f transmits the images stored in the image storage unit 32b to the user's terminal device 5. The image transmission processing unit 31f may transmit any number of images to the terminal device 5 at any timing and by any method. For example, the image transmission processing unit 31f may transmit images of a child to the terminal device 5 of that child's guardian at a predetermined timing once a day. In doing so, the image transmission processing unit 31f identifies the user (child) appearing in the image by checking the ID attached to the image by the ID assigning unit 31c against the IDs registered in the user DB 32c, and transmits the image to the terminal device 5 associated with that user. The image transmission processing unit 31f may also, for example, read images from the image storage unit 32b in response to a request from the terminal device 5 and transmit them to the requesting terminal device 5.

FIG. 4 is a block diagram showing the configuration of the terminal device 5 according to the present embodiment. The terminal device 5 according to the present embodiment includes a processing unit 51, a storage unit (storage) 52, a communication unit (transceiver) 53, a display unit (display) 54, an operation unit 55, and the like. The terminal device 5 is a device used by a user, such as a family member or guardian who watches over the subject, and may be configured using an information processing device such as a smartphone, a tablet terminal, or a personal computer.

処理部５１は、ＣＰＵ又はＭＰＵ等の演算処理装置、ＲＯＭ及び等を用いて構成されている。処理部５１は、記憶部５２に記憶されたプログラム５２ａを読み出して実行することにより、サーバ装置３から送信される画像を受信する処理、受信した画像を表示する処理、及び、サーバ装置３が記憶している画像を検索する処理等の種々の処理を行う。 The processing unit 51 is configured using an arithmetic processing unit such as a CPU or MPU, a ROM, and the like. By reading out and executing the program 52a stored in the storage unit 52, the processing unit 51 performs various kinds of processing, such as receiving images transmitted from the server device 3, displaying the received images, and searching the images stored in the server device 3.

記憶部５２は、例えばフラッシュメモリ等の不揮発性のメモリ素子を用いて構成されている。記憶部５２は、処理部５１が実行する各種のプログラム、及び、処理部５１の処理に必要な各種のデータを記憶する。本実施の形態において記憶部５２は、処理部５１が実行するプログラム５２ａを記憶している。本実施の形態においてプログラム５２ａは遠隔のサーバ装置等により配信され、これを端末装置５が通信にて取得し、記憶部５２に記憶する。ただしプログラム５２ａは、例えば端末装置５の製造段階において記憶部５２に書き込まれてもよい。例えばプログラム５２ａは、メモリカード又は光ディスク等の記録媒体９８に記録されたプログラム５２ａを端末装置５が読み出して記憶部５２に記憶してもよい。例えばプログラム５２ａは、記録媒体９８に記録されたものを書込装置が読み出して端末装置５の記憶部５２に書き込んでもよい。プログラム５２ａは、ネットワークを介した配信の態様で提供されてもよく、記録媒体９８に記録された態様で提供されてもよい。 The storage unit 52 is configured using a non-volatile memory element such as a flash memory, for example. The storage unit 52 stores various programs executed by the processing unit 51 and various data necessary for the processing of the processing unit 51. In the present embodiment, the storage unit 52 stores a program 52a executed by the processing unit 51. In the present embodiment, the program 52a is distributed by a remote server device or the like, and the terminal device 5 acquires it through communication and stores it in the storage unit 52. However, the program 52a may instead be written into the storage unit 52 at the manufacturing stage of the terminal device 5, for example. As another example, the terminal device 5 may read the program 52a recorded on a recording medium 98, such as a memory card or an optical disc, and store it in the storage unit 52. As yet another example, a writing device may read the program 52a recorded on the recording medium 98 and write it into the storage unit 52 of the terminal device 5. The program 52a may be provided by distribution via a network, or may be provided recorded on the recording medium 98.

通信部５３は、携帯電話通信網、無線ＬＡＮ及びインターネット等を含むネットワークＮを介して、種々の装置との間で通信を行う。本実施の形態において通信部５３は、ネットワークＮを介して、サーバ装置３との間で通信を行う。通信部５３は、処理部５１から与えられたデータを他の装置へ送信すると共に、他の装置から受信したデータを処理部５１へ与える。 The communication unit 53 communicates with various devices via a network N including a mobile phone communication network, a wireless LAN, the Internet, and the like. In the present embodiment, the communication unit 53 communicates with the server device 3 via the network N. The communication unit 53 transmits data given from the processing unit 51 to other devices, and gives data received from other devices to the processing unit 51.

表示部５４は、液晶ディスプレイ等を用いて構成されており、処理部５１の処理に基づいて種々の画像及び文字等を表示する。操作部５５は、ユーザの操作を受け付け、受け付けた操作を処理部５１へ通知する。例えば操作部５５は、機械式のボタン又は表示部５４の表面に設けられたタッチパネル等の入力デバイスによりユーザの操作を受け付ける。また例えば操作部５５は、マウス及びキーボード等の入力デバイスであってよく、これらの入力デバイスは端末装置５に対して取り外すことが可能な構成であってもよい。 The display unit 54 is configured using a liquid crystal display or the like, and displays various images, characters, etc. based on the processing of the processing unit 51 . The operation unit 55 receives user operations and notifies the processing unit 51 of the received operations. For example, the operation unit 55 accepts user operations through input devices such as mechanical buttons or a touch panel provided on the surface of the display unit 54 . Further, for example, the operation unit 55 may be an input device such as a mouse and a keyboard, and these input devices may be detachable from the terminal device 5 .

また本実施の形態に係る端末装置５は、記憶部５２に記憶されたプログラム５２ａを処理部５１が読み出して実行することにより、画像受信処理部５１ａ及び画像検索処理部５１ｂ等がソフトウェア的な機能部として処理部５１に実現される。なおプログラム５２ａは、本実施の形態に係る情報処理システムに専用のプログラムであってもよく、インターネットブラウザ又はウェブブラウザ等の汎用のプログラムであってもよい。 In the terminal device 5 according to the present embodiment, the program 52a stored in the storage unit 52 is read out and executed by the processing unit 51, so that the image reception processing unit 51a, the image search processing unit 51b, and the like function as software. It is implemented in the processing unit 51 as a unit. The program 52a may be a program dedicated to the information processing system according to the present embodiment, or may be a general-purpose program such as an Internet browser or web browser.

画像受信処理部５１ａは、サーバ装置３が送信する画像を通信部５３にて受信する処理を行う。画像受信処理部５１ａは、例えばサーバ装置３がプッシュ送信する画像を受信した場合に、表示部５４にメッセージ表示等を行うことによって、画像の受信を通知する処理を行う。また画像受信処理部５１ａは、サーバ装置３から受信した画像を記憶部５２に記憶すると共に、表示部５４に表示する処理を行う。 The image reception processing unit 51a performs processing for receiving, at the communication unit 53, an image transmitted by the server device 3. For example, when it receives an image push-transmitted by the server device 3, the image reception processing unit 51a notifies the user of the reception by displaying a message or the like on the display unit 54. The image reception processing unit 51a also stores the image received from the server device 3 in the storage unit 52 and displays it on the display unit 54.

画像検索処理部５１ｂは、サーバ装置３が画像記憶部３２ｂに記憶して蓄積した複数の画像の中から、利用者が望む条件の画像を検索するための処理を行う。画像検索処理部５１ｂは、例えば検索条件の入力画面を表示部５４に表示して、利用者による検索条件の入力を受け付ける。画像検索処理部５１ｂは、例えば笑顔の画像、ハイハイ等の特定の行動をしている画像、又は、特定の日時の画像等のように、利用者から種々の検索条件の入力を受け付ける。画像検索処理部５１ｂは、受け付けた検索条件を含む検索要求をサーバ装置３へ送信する。この検索要求に応じてサーバ装置３が検索条件に適合する画像を画像記憶部３２ｂから抽出し、抽出した一又は複数の画像を要求元の端末装置５へ送信する。画像検索処理部５１ｂは、サーバ装置３から検索結果として送信された画像を受信して、表示部５４に表示する。 The image search processing unit 51b performs processing for searching, from among the plurality of images stored and accumulated by the server device 3 in the image storage unit 32b, for images that satisfy conditions desired by the user. The image search processing unit 51b displays, for example, a search condition input screen on the display unit 54 and receives input of search conditions from the user. The image search processing unit 51b accepts various search conditions from the user, such as images of a smiling face, images of a specific action such as crawling, or images from a specific date and time. The image search processing unit 51b transmits a search request including the accepted search conditions to the server device 3. In response to this search request, the server device 3 extracts images that match the search conditions from the image storage unit 32b and transmits the one or more extracted images to the requesting terminal device 5. The image search processing unit 51b receives the images transmitted as search results from the server device 3 and displays them on the display unit 54.

＜カメラ１による画像選別処理＞ <Image selection processing by camera 1>
本実施の形態に係る情報処理システムでは、施設１００に設置されたカメラ１が周期的に撮影を行っている。カメラ１の撮影周期は、例えば１秒に１回～１分に１回等とすることができる。なお本実施の形態においてカメラ１は、静止画像として撮影を行うものとするが、動画像として撮影を行ってもよく、この場合には撮影周期は動画像のフレームレート等により定まる。またカメラ１は、１日中（２４時間）に亘って継続的に撮影を行ってもよいが、例えば施設１００の営業時間内等に限って撮影を行ってもよい。カメラ１による撮影の開始及び停止は、例えば予め時刻が設定されてもよく、また例えば施設１００の従業員の操作等によって行われてもよい。 In the information processing system according to this embodiment, the camera 1 installed in the facility 100 periodically takes pictures. The imaging cycle of the camera 1 can be, for example, from once per second to once per minute. In the present embodiment, the camera 1 is assumed to shoot still images, but it may shoot moving images, in which case the shooting cycle is determined by the frame rate of the moving images or the like. The camera 1 may take pictures continuously throughout the day (24 hours), or only during the business hours of the facility 100, for example. The start and stop of photographing by the camera 1 may be set to predetermined times, for example, or may be triggered by an operation by an employee of the facility 100, for example.

本実施の形態に係るカメラ１は、撮影した画像が所定の条件（第１の条件）を満たすか否かを判定することによって画像を選別し、所定の条件を満たすと判定した画像をサーバ装置３へ送信し、所定の条件を満たさないと判定した画像を破棄（除去）する。本実施の形態においてカメラ１が判定する条件には、画像中に人が写されていること、不適切な画像ではない事、画像中に写された人の顔の表情が笑顔であること、及び、顔の向きが正面向きであること等の条件が含まれる。 The camera 1 according to the present embodiment selects images by determining whether or not a captured image satisfies a predetermined condition (first condition), transmits images determined to satisfy the predetermined condition to the server device 3, and discards (removes) images determined not to satisfy it. In the present embodiment, the conditions checked by the camera 1 include that a person is captured in the image, that the image is not inappropriate, that the facial expression of the person captured in the image is a smile, and that the face is facing forward.

図５は、本実施の形態に係るカメラ１が行う画像選別処理の手順を示すフローチャートである。本実施の形態に係るカメラ１の処理部２１は、撮影部１１にて撮影を行う（ステップＳ１）。処理部２１は、撮影により得られた画像を記憶部２３に一時的に記憶する（ステップＳ２）。処理部２１の人検出部２１ａは、撮影した画像を圧縮した圧縮画像を生成する（ステップＳ３）。なお画像を圧縮する方法には、どのような方法が採用されてもよい。人検出部２１ａは、生成した圧縮画像を基に、この画像に写されている人を検出する人検出処理を行う（ステップＳ４）。このときに人検出部２１ａは、画像から人を検出する学習済の学習モデルを利用し、圧縮画像を学習モデルへ入力して、学習モデルが出力する検出結果を取得する。なお人検出処理の終了後、人検出部２１ａは生成した圧縮画像を破棄してよい。人検出部２１ａは、ステップＳ４の処理の結果に基づいて、撮影画像に人が写されているか否かを判定する（ステップＳ５）。人が写されていない場合（Ｓ５：ＮＯ）、人検出部２１ａは、記憶部２３に一時的に記憶した撮影画像を除去して（ステップＳ１２）、ステップＳ１へ処理を戻す。 FIG. 5 is a flowchart showing the procedure of image selection processing performed by camera 1 according to the present embodiment. The processing unit 21 of the camera 1 according to the present embodiment performs photographing by the photographing unit 11 (step S1). The processing unit 21 temporarily stores the captured image in the storage unit 23 (step S2). The human detection unit 21a of the processing unit 21 generates a compressed image by compressing the photographed image (step S3). Any method may be adopted as the method of compressing the image. Based on the generated compressed image, the person detection unit 21a performs a person detection process for detecting a person appearing in this image (step S4). At this time, the human detection unit 21a uses a trained learning model for detecting a person from an image, inputs the compressed image to the learning model, and acquires the detection result output by the learning model. Note that after the human detection process ends, the human detection unit 21a may discard the generated compressed image. The person detection unit 21a determines whether or not a person is shown in the captured image based on the result of the processing in step S4 (step S5). If no person is captured (S5: NO), the person detection unit 21a removes the captured image temporarily stored in the storage unit 23 (step S12), and returns the process to step S1.

撮影画像に人が写されている場合（Ｓ５：ＹＥＳ）、処理部２１の不適切画像検出部２１ｂは、撮影画像が不適切な画像であるか否かを判定する（ステップＳ６）。このときに不適切画像検出部２１ｂは、画像の適切度を出力する学習済の学習モデルを用い、撮影部１１が撮影した（非圧縮の）画像を学習モデルへ入力し、学習モデルが出力する適切度を取得する。不適切画像検出部２１ｂは、取得した適切度が閾値を超えない画像を不適切と判定することができる。撮影画像が不適切であると判定した場合（Ｓ６：ＹＥＳ）、不適切画像検出部２１ｂは、記憶部２３に一時的に記憶した撮影画像を除去して（ステップＳ１２）、ステップＳ１へ処理を戻す。 If a person is shown in the captured image (S5: YES), the inappropriate image detection unit 21b of the processing unit 21 determines whether or not the captured image is an inappropriate image (step S6). At this time, the inappropriate image detection unit 21b uses a trained learning model that outputs the appropriateness of an image, inputs the (uncompressed) image captured by the imaging unit 11 to the learning model, and acquires the appropriateness output by the learning model. The inappropriate image detection unit 21b can determine that an image whose acquired appropriateness does not exceed a threshold is inappropriate. If it determines that the captured image is inappropriate (S6: YES), the inappropriate image detection unit 21b removes the captured image temporarily stored in the storage unit 23 (step S12) and returns the process to step S1.

撮影画像が不適切ではないと判定した場合（Ｓ６：ＮＯ）、処理部２１の顔検出部２１ｃは、ステップＳ４の人検出処理の結果に基づいて、撮影画像から人の顔を検出する顔検出処理を行う（ステップＳ７）。このときに顔検出部２１ｃは、画像及び人の検出結果に基づいてこの画像に写された人の顔を検出する学習済の学習モデルを利用し、撮影部１１が撮影した（非圧縮の）画像を学習モデルへ入力し、学習モデルが出力する顔検出結果を取得する。 When it is determined that the captured image is not inappropriate (S6: NO), the face detection unit 21c of the processing unit 21 performs face detection processing for detecting a person's face in the captured image based on the result of the person detection processing in step S4 (step S7). At this time, the face detection unit 21c uses a trained learning model that detects the face of a person captured in an image based on the image and the person detection result, inputs the (uncompressed) image captured by the imaging unit 11 to the learning model, and acquires the face detection result output by the learning model.

次いで、処理部２１の表情向き検出部２１ｄは、ステップＳ７の顔検出処理の結果に基づいて、人の顔の表情を検出する処理を行う（ステップＳ８）。ここで本実施の形態において表情向き検出部２１ｄは、撮影画像に写された人の表情が笑顔である確信度を算出する。表情向き検出部２１ｄは、画像に写された人の表情が笑顔である確信度を出力する学習済の学習モデルを利用し、撮影部１１が撮影した（非圧縮の）画像を学習モデルへ入力し、学習モデルが出力する笑顔の確信度を取得する。表情向き検出部２１ｄは、取得した確信度が閾値を超えるか否かに応じて、画像に写された人の表情が笑顔であるか否かを判定することができる。 Next, the facial expression orientation detection unit 21d of the processing unit 21 performs processing for detecting the facial expression of a person based on the result of the face detection processing in step S7 (step S8). Here, in the present embodiment, facial expression direction detection unit 21d calculates the degree of certainty that the facial expression of a person captured in a photographed image is a smile. The facial expression orientation detection unit 21d uses a learned learning model that outputs a degree of confidence that the facial expression of a person captured in the image is a smile, and inputs the (uncompressed) image captured by the imaging unit 11 to the learning model. and obtain the confidence level of the smile output by the learning model. The facial expression orientation detection unit 21d can determine whether or not the facial expression of the person captured in the image is a smile, depending on whether or not the acquired certainty factor exceeds a threshold value.

また表情向き検出部２１ｄは、ステップＳ７の顔検出処理の結果に基づいて、人の顔の向きを検出する処理を行う（ステップＳ９）。ここで本実施の形態において表情向き検出部２１ｄは、撮影画像に写された人の顔が正面向きである確信度を算出する。表情向き検出部２１ｄは、画像に写された人の顔の向きが正面である確信度を出力する学習済の学習モデルを利用し、撮影部１１が撮影した（非圧縮の）画像を学習モデルへ入力し、学習モデルが出力する確信度を取得する。表情向き検出部２１ｄは、取得した確信度が閾値を超えるか否かに応じて、画像に写された人の顔の向きが正面であるか否かを判定することができる。 The facial expression orientation detection unit 21d also performs processing for detecting the orientation of a person's face based on the result of the face detection processing in step S7 (step S9). In the present embodiment, the facial expression orientation detection unit 21d calculates the degree of certainty that the face of the person captured in the image is facing forward. The facial expression orientation detection unit 21d uses a trained learning model that outputs the degree of certainty that the face of a person captured in an image is facing forward, inputs the (uncompressed) image captured by the imaging unit 11 to the learning model, and acquires the degree of certainty output by the learning model. The facial expression orientation detection unit 21d can determine whether or not the face of the person captured in the image is facing forward according to whether or not the acquired degree of certainty exceeds a threshold.

処理部２１の画像選別部２１ｅは、ステップＳ８の表情検出処理の結果及びステップＳ９の顔の向き検出処理の結果に基づいて、画像に写された人の顔が笑顔であり且つ正面を向いているか否かを判定する（ステップＳ１０）。画像に写された人の顔が笑顔であり且つ正面を向いている場合（Ｓ１０：ＹＥＳ）、処理部２１の画像送信処理部２１ｆは、この画像をサーバ装置３へ送信し（ステップＳ１１）、ステップＳ１へ処理を戻す。なおこのときに画像送信処理部２１ｆが送信する画像のデータは、撮影部１１が撮影した非圧縮の画像であり、ステップＳ４の人検出処理、ステップＳ７の顔検出処理、ステップＳ８の表情検出処理及びステップＳ９の向き検出処理等の結果に関する情報がメタデータとして付されたものである。また、画像に写された人の顔が笑顔ではない又は正面を向いていない場合（Ｓ１０：ＮＯ）、画像選別部２１ｅは、この画像を除去して（ステップＳ１２）、ステップＳ１へ処理を戻す。 Based on the result of the facial expression detection processing in step S8 and the result of the face orientation detection processing in step S9, the image selection unit 21e of the processing unit 21 determines whether or not the face of the person in the image is smiling and facing forward (step S10). When the face is smiling and facing forward (S10: YES), the image transmission processing unit 21f of the processing unit 21 transmits this image to the server device 3 (step S11) and returns the process to step S1. The image data transmitted by the image transmission processing unit 21f at this time is the uncompressed image captured by the imaging unit 11, with information on the results of the person detection processing in step S4, the face detection processing in step S7, the facial expression detection processing in step S8, the orientation detection processing in step S9, and the like attached as metadata. If the face of the person in the image is not smiling or is not facing forward (S10: NO), the image selection unit 21e removes this image (step S12) and returns the process to step S1.
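The flow of steps S3 to S12 can be summarized in code. The following Python fragment is only an illustrative sketch and not the implementation of the embodiment: the `Models` container, the `select_image` function, the `compress` callable, and all threshold values are hypothetical names and numbers introduced here, and the trained learning models are represented by plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) region in the image


@dataclass
class Models:
    """Hypothetical stand-ins for the trained learning models of steps S4 to S9."""
    detect_people: Callable[[bytes], List[Box]]              # S4, fed the compressed image
    appropriateness: Callable[[bytes], float]                # S6, fed the uncompressed image
    detect_faces: Callable[[bytes, List[Box]], List[Box]]    # S7
    smile_confidence: Callable[[bytes, List[Box]], float]    # S8
    frontal_confidence: Callable[[bytes, List[Box]], float]  # S9


def select_image(raw: bytes, compress: Callable[[bytes], bytes], models: Models,
                 min_appropriateness: float = 0.5, min_smile: float = 0.5,
                 min_frontal: float = 0.5) -> Optional[dict]:
    """Return metadata when the image should be sent (S11), or None to discard it (S12)."""
    people = models.detect_people(compress(raw))          # S3 and S4
    if not people:                                        # S5: NO
        return None
    # S6: an appropriateness that does not exceed the threshold means "inappropriate".
    if models.appropriateness(raw) <= min_appropriateness:
        return None
    faces = models.detect_faces(raw, people)              # S7
    smile = models.smile_confidence(raw, faces)           # S8
    frontal = models.frontal_confidence(raw, faces)       # S9
    if smile > min_smile and frontal > min_frontal:       # S10: YES
        return {"people": people, "faces": faces, "smile": smile, "frontal": frontal}
    return None                                           # S10: NO
```

Only the person detection runs on the compressed copy; every later check receives the uncompressed image, which mirrors the division described in the text.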

なお本実施の形態に係るカメラ１が画像を選別する際に判定する条件は、画像中に人が写されていること、不適切な画像ではない事、画像中に写された人の顔の表情が笑顔であること、及び、顔の向きが正面向きであることに限らない。例えばカメラ１が撮影を行う周期が短い場合、同じシーンを撮影した類似の画像が複数枚撮影され、これら複数枚の画像が全て条件を満たすと判定されてサーバ装置３へ送信されることが生じ得る。そこでカメラ１は、時系列的に連続する画像について、同じシーンであるか否かの判定を行い、同じシーンを撮影した複数の画像についてはこの中から代表の１枚を選別してサーバ装置３へ送信してもよい。 Note that the conditions checked by the camera 1 according to the present embodiment when selecting images are not limited to a person being captured in the image, the image not being inappropriate, the facial expression of the captured person being a smile, and the face facing forward. For example, when the shooting cycle of the camera 1 is short, a plurality of similar images of the same scene may be captured, and all of them may be determined to satisfy the conditions and be transmitted to the server device 3. The camera 1 may therefore determine whether images that are consecutive in time series show the same scene and, for a plurality of images of the same scene, select one representative image from among them and transmit it to the server device 3.

図６は、カメラ１による同一シーン判定を説明するための模式図である。本実施の形態に係るカメラ１は、時系列的に連続する２つの画像を比較し、両画像に写されている人の数の変化と、写されている人の画像間での移動距離とに基づいて、２つの画像が同一シーンであるか否かを判定する。本実施の形態においてカメラ１は、２つの画像において写されている人の数が変化しておらず、且つ、写されている人の画像間での移動距離が閾値以下である場合に、２つの画像が同一シーンであると判定する。またカメラ１は、２つの画像において写されている人の数が変化するか、又は、写されている人の画像間での移動距離が閾値を超える場合に、２つの画像が同一シーンではないと判定する。 FIG. 6 is a schematic diagram for explaining the same-scene determination by the camera 1. The camera 1 according to the present embodiment compares two images that are consecutive in time series and determines whether or not the two images show the same scene, based on the change in the number of people captured in the two images and on the distance the captured people have moved between the images. In the present embodiment, the camera 1 determines that the two images show the same scene when the number of people captured has not changed and the distance moved by the captured people between the images is equal to or less than a threshold. Conversely, the camera 1 determines that the two images do not show the same scene when the number of people captured changes or when the distance moved by the captured people between the images exceeds the threshold.

例えばカメラ１は、図６上段に示した時刻ｔ１に撮影された画像１と、図６中段に示した次の時刻ｔ２に撮影された画像２とを比較し、両画像には共に２人の人が写されており、各人の移動距離が閾値以下であると判定して、画像１及び画像２は同一シーンであると判断することができる。また例えば図６中段に示した時刻ｔ２に撮影された画像２と、図６下段に示した次の時刻ｔ３に撮影された画像３とを比較し、画像３に写されている人が３人に増えていること、及び、画像２から画像３の間での人の移動距離が閾値を超えることを判定し、画像２及び画像３は同一シーンではないと判断することができる。 For example, the camera 1 compares image 1, captured at time t1 and shown in the upper part of FIG. 6, with image 2, captured at the next time t2 and shown in the middle part of FIG. 6. It determines that two people are captured in both images and that the distance moved by each person is equal to or less than the threshold, and can therefore judge that image 1 and image 2 show the same scene. As another example, the camera 1 compares image 2, captured at time t2 and shown in the middle part of FIG. 6, with image 3, captured at the next time t3 and shown in the lower part of FIG. 6. It determines that the number of people captured in image 3 has increased to three and that the distance moved by a person between image 2 and image 3 exceeds the threshold, and can therefore judge that image 2 and image 3 do not show the same scene.
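The same-scene rule above reduces to two checks: an unchanged person count and a per-person movement at or below a threshold. The Python sketch below illustrates this under stated assumptions. The text does not say how a person in one image is paired with the same person in the next, so pairing by sorted box centers is an assumption made here for brevity, and `same_scene` and `max_move` are hypothetical names.

```python
from math import hypot
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) of a detected person


def _center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def same_scene(prev: List[Box], curr: List[Box], max_move: float = 50.0) -> bool:
    """True when the person count is unchanged and nobody moved more than max_move pixels."""
    if len(prev) != len(curr):
        return False
    # Assumption: people are paired by sorting box centers (left to right).
    # A real system would need a proper tracking or matching step.
    pairs = zip(sorted(map(_center, prev)), sorted(map(_center, curr)))
    return all(hypot(cx - px, cy - py) <= max_move for (px, py), (cx, cy) in pairs)
```

With the FIG. 6 example in mind, two frames with two slightly shifted people yield `True`, while a frame in which a third person appears yields `False` regardless of movement.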

カメラ１は、同一シーンであると判断した複数の画像について、複数の画像から１つの画像を選別する処理を行う。このときにカメラ１は、例えば各画像について笑顔の確信度及び正面を向いている確信度等に基づくスコアリングを行い、最も高いスコアが付された画像を選別する。なお画像の選別方法は、スコアリングによるものに限らず、どのような方法が採用されてもよい。例えば時系列で連続する複数の画像について、最初のタイミング、中央のタイミング又は最後のタイミング等の所定タイミングの画像を選別してもよい。また例えば、画像に写されている人の大きさ、画像全体の面積に対する人が占める割合等に基づいて画像を選別してもよい。 The camera 1 performs processing for selecting one image from a plurality of images determined to be the same scene. At this time, the camera 1 performs scoring based on, for example, the degree of certainty of a smiling face and the degree of certainty of facing forward for each image, and selects the image with the highest score. Note that the image selection method is not limited to scoring, and any method may be adopted. For example, among a plurality of images that are continuous in time series, an image at a predetermined timing such as the first timing, the middle timing, or the last timing may be selected. Alternatively, for example, the images may be selected based on the size of the person in the image, the ratio of the person to the area of the entire image, or the like.
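As a small illustration of the score-based selection, the following Python sketch picks one representative from a group of same-scene images. The equal weighting of the two degrees of certainty and the dictionary keys `smile` and `frontal` are assumptions introduced here; the text leaves the scoring formula open.

```python
from typing import Dict, List


def pick_representative(images: List[Dict], w_smile: float = 0.5,
                        w_frontal: float = 0.5) -> Dict:
    """Return the image with the highest weighted score among same-scene images."""
    if not images:
        raise ValueError("at least one image is required")
    # Hypothetical score: weighted sum of the smile and frontal confidences.
    return max(images, key=lambda m: w_smile * m["smile"] + w_frontal * m["frontal"])
```

Replacing the `key` function with the capture time, or with the fraction of the image area occupied by the person, would give the other selection methods mentioned above.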

また、カメラ１は、画像に写されている人の顔の特徴を抽出することで人を識別し、１日等の所定期間に撮影された人毎にサーバ装置３へ送信する画像として選別した画像の枚数をカウントし、所定期間にサーバ装置３へ送信する画像数を均一化するように、選別する画像の枚数を調整してもよい。なおカメラ１による人の顔の識別は、施設１００の利用者の誰であるかまでを識別する必要はない（即ち、サーバ装置３のユーザＤＢ３２ｃに登録された利用者との一致を判断する必要はない）。カメラ１は、例えば撮影した画像に対して笑顔の確信度及び正面を向いている確信度等に基づくスコアリングを行い、スコアが閾値を超える画像をサーバ装置３へ送信する。このときにカメラ１は、人毎にカウントした画像の枚数に基づいて、例えばサーバ装置３へ送信した画像の枚数が多い人についてはスコアと比較する閾値を上げ、枚数が少ない人については閾値を下げる等の処理を行うことで、サーバ装置３へ送信する画像の枚数を調整することができる。なお送信する画像の枚数の調整方法はこれに限るものではなく、どのような方法が採用されてもよい。 The camera 1 may also identify people by extracting the facial features of the people captured in the images, count, for each person photographed during a predetermined period such as one day, the number of images selected for transmission to the server device 3, and adjust the number of images it selects so that the number of images transmitted to the server device 3 during the predetermined period is evened out across people. The identification of a person's face by the camera 1 does not need to go as far as identifying which user of the facility 100 the person is (that is, there is no need to determine a match with a user registered in the user DB 32c of the server device 3). The camera 1 scores each captured image based on, for example, the degree of certainty of a smile and the degree of certainty that the face is facing forward, and transmits images whose score exceeds a threshold to the server device 3. At this time, based on the number of images counted for each person, the camera 1 can adjust the number of images transmitted to the server device 3 by, for example, raising the threshold compared with the score for a person for whom many images have been transmitted and lowering the threshold for a person for whom few images have been transmitted. The method of adjusting the number of images to be transmitted is not limited to this, and any method may be adopted.
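One way to picture this balancing is a per-person counter that shifts the score threshold. The Python sketch below is an assumption-laden illustration: the class name `SendBalancer`, the linear adjustment rule, and every numeric default are hypothetical, and `person_key` stands for whatever anonymous identifier the camera derives from facial features.

```python
from collections import Counter


class SendBalancer:
    """Raises the score threshold for people with many sent images, lowers it for few."""

    def __init__(self, base: float = 0.6, step: float = 0.05, target: int = 10,
                 lowest: float = 0.3, highest: float = 0.9) -> None:
        self.base, self.step, self.target = base, step, target
        self.lowest, self.highest = lowest, highest
        self.sent = Counter()  # images sent per person in the current period

    def threshold_for(self, person_key: str) -> float:
        # Hypothetical rule: shift the threshold linearly with the distance from the target.
        shifted = self.base + (self.sent[person_key] - self.target) * self.step
        return min(self.highest, max(self.lowest, shifted))

    def should_send(self, person_key: str, score: float) -> bool:
        if score > self.threshold_for(person_key):
            self.sent[person_key] += 1
            return True
        return False

    def reset(self) -> None:
        """Call at the start of each period (for example, once a day)."""
        self.sent.clear()
```

Clamping the threshold between `lowest` and `highest` keeps a rarely photographed person from receiving very poor images and a frequently photographed one from being shut out entirely; both bounds are design choices of this sketch, not of the source.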

また、カメラ１は、サーバ装置３へ送信する画像に対して、人の検出結果、人の顔の検出結果、顔の表情の検出結果及び顔の向きの検出結果等の情報をメタデータとして付す処理を行う。図７は、カメラ１が送信する画像に付すメタデータの一例を示す模式図である。本例においてカメラ１は、「画像名」、「撮影日時」、「子供フラグ」、「顔検出結果」、「笑顔の確信度」及び「正面の確信度」等の情報をメタデータとして画像に付してサーバ装置３へ送信する。「画像名」は、カメラ１が撮影した画像に対して一意に付される名称であり、例えば所定の文字及び数字等を組み合わせた文字列が適宜に設定される。なお「画像名」はメタデータに含まれていなくてもよい。また撮影日時に基づく名称を画像名とする場合には、メタデータに「撮影日時」の情報が含まれていなくてもよい。「撮影日時」は、撮影部１１により画像が撮影された日時を示す情報であり、カメラ１内の時計機能等に基づいて情報が付される。 The camera 1 also attaches information such as the person detection result, the face detection result, the facial expression detection result, and the face orientation detection result as metadata to each image to be transmitted to the server device 3. FIG. 7 is a schematic diagram showing an example of the metadata attached to an image transmitted by the camera 1. In this example, the camera 1 attaches information such as "image name", "shooting date and time", "child flag", "face detection result", "degree of certainty of smile", and "degree of certainty of front" to the image as metadata and transmits it to the server device 3. The "image name" is a name uniquely given to an image captured by the camera 1; for example, a character string combining predetermined letters and numbers is set as appropriate. The "image name" does not have to be included in the metadata. When a name based on the shooting date and time is used as the image name, the metadata does not have to include the "shooting date and time" information. The "shooting date and time" is information indicating the date and time at which the image was captured by the imaging unit 11, and is attached based on the clock function in the camera 1 or the like.

「人検出結果」は、カメラ１の人検出部２１ａによる検出結果の情報である。本例では画像中に人が検出された領域を、ｘ座標、ｙ座標、幅（ｗ）及び高さ（ｈ）の４つの値で示している。「顔検出結果」は、カメラ１の顔検出部２１ｃによる検出結果の情報である。本例では人検出結果と同様に、画像中に検出された人の顔の領域を、ｘ座標、ｙ座標、幅（ｗ）及び高さ（ｈ）の４つの値で示している。なお画像に複数の人が検出された場合には、人検出結果及び顔検出結果として複数の領域の情報が画像に付されてよい。 “Human detection result” is information on the detection result by the human detection unit 21 a of the camera 1 . In this example, an area in which a person is detected in the image is indicated by four values of x-coordinate, y-coordinate, width (w) and height (h). “Face detection result” is information on the detection result by the face detection unit 21 c of the camera 1 . In this example, similarly to the human detection result, the human face area detected in the image is indicated by four values of x coordinate, y coordinate, width (w) and height (h). When multiple people are detected in the image, information on multiple areas may be attached to the image as the result of human detection and the result of face detection.

「子供フラグ」は、検出された人が子供であるか、大人であるかを示すフラグである。例えばカメラ１は、撮影した画像に写された人が子供であるか大人であるかを判定する処理を行ってもよく、この処理を行う場合に処理結果として子供であるか否かを示すフラグをメタデータとして画像に付してもよい。画像に写された人が子供であるか否かの判定は、例えば画像に写された人の大きさ、身長等を算出して行うことができ、また例えば学習済の学習モデルを利用して子供であるか否かの判定を行ってもよい。学習モデルは、例えば画像及び人検出結果を入力として受け付けて、画像に写された人が子供である確信度を出力するように予め機械学習が行われたものとすることができる。なお本実施の形態においては、保育園又は幼稚園等の施設１００にて子供の写真を撮影することを目的としており、画像に写されている人が大人であると判定された場合には、サーバ装置３へ送信せずに破棄してもよく、この場合には顔検出、表情検出及び向き検出等の処理を行わなくてよい。 The "child flag" is a flag indicating whether the detected person is a child or an adult. For example, the camera 1 may perform processing to determine whether the person captured in the image is a child or an adult and, when it does so, may attach a flag indicating whether or not the person is a child to the image as metadata. Whether or not the person captured in the image is a child can be determined, for example, by calculating the size, height, or the like of the person captured in the image, or by using a trained learning model. The learning model can be one that has undergone machine learning in advance so as to accept, for example, an image and a person detection result as input and output the degree of certainty that the person captured in the image is a child. In the present embodiment, the purpose is to photograph children at a facility 100 such as a nursery school or kindergarten. When the person captured in the image is determined to be an adult, the image may therefore be discarded without being transmitted to the server device 3, in which case processing such as face detection, facial expression detection, and orientation detection need not be performed.

「笑顔の確信度」は、カメラ１の表情向き検出部２１ｄによる表情検出の結果の情報であり、０から１までの数値情報である。同様に、「正面の確信度」は、カメラ１の表情向き検出部２１ｄによる顔の向き検出の結果の情報であり、０から１までの数値情報である。これらの数値情報は、検出処理に用いる学習済の学習モデルが出力する値である。 The "degree of certainty of smile" is information on the result of facial expression detection by the facial expression orientation detection unit 21d of the camera 1, and is a numerical value from 0 to 1. Similarly, the "degree of certainty of front" is information on the result of face orientation detection by the facial expression orientation detection unit 21d of the camera 1, and is a numerical value from 0 to 1. These numerical values are the values output by the trained learning models used for the detection processing.
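The metadata items described for FIG. 7 can be modeled as a small record. The Python sketch below shows one possible representation. The field names, the JSON serialization, and the sample values are assumptions made for illustration, since the source specifies neither an encoding nor identifiers.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ImageMetadata:
    """Hypothetical record for the metadata items described for FIG. 7."""
    image_name: str
    captured_at: str                 # shooting date and time, e.g. ISO 8601 text
    person_boxes: List[List[int]]    # one [x, y, w, h] per detected person
    face_boxes: List[List[int]]      # one [x, y, w, h] per detected face
    child_flag: bool
    smile_confidence: float          # 0 to 1, output of the expression model
    frontal_confidence: float        # 0 to 1, output of the orientation model

    def __post_init__(self) -> None:
        for name in ("smile_confidence", "frontal_confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


# Sample values are invented for illustration only.
sample = ImageMetadata("img_0001", "2021-08-24T10:15:00", [[40, 30, 120, 260]],
                       [[70, 40, 50, 50]], True, 0.92, 0.81)
```

Storing the boxes as lists allows several people or faces to be recorded for a single image, as the text permits.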

＜サーバ装置３による画像選別処理＞ <Image selection processing by server device 3>
本実施の形態に係る情報処理システムでは、施設１００に設置された一又は複数のカメラ１が撮影して選別した画像をサーバ装置３へ送信する。カメラ１からの画像を受信したサーバ装置３は、受信した画像が所定の条件（第２の条件）を満たすか否かを判定することによって画像を選別し、所定の条件を満たすと判定した画像を画像記憶部３２ｂに記憶して蓄積すると共に、利用者の端末装置５へ送信する処理を行う。本実施の形態においてサーバ装置３は、カメラ１が画像に付したメタデータに含まれる情報及びサーバ装置３が画像に基づいて判定した種々の判定結果に基づくスコアリングを行い、各画像について算出したスコアが閾値を超えることを条件として画像を選別する。スコアリングには、カメラ１による人の顔の表情及び向き等の検出結果、並びに、サーバ装置３による行動判定の結果等の情報が用いられる。 In the information processing system according to the present embodiment, one or more cameras 1 installed in the facility 100 transmit the images they have captured and selected to the server device 3. The server device 3 that has received an image from the camera 1 selects images by determining whether or not the received image satisfies a predetermined condition (second condition), stores and accumulates images determined to satisfy the predetermined condition in the image storage unit 32b, and transmits them to the terminal device 5 of the user. In the present embodiment, the server device 3 performs scoring based on the information included in the metadata attached to the image by the camera 1 and on various determination results obtained by the server device 3 from the image, and selects images on the condition that the score calculated for each image exceeds a threshold. The scoring uses information such as the detection results of facial expression and orientation from the camera 1 and the result of the action determination by the server device 3.

図８は、本実施の形態に係るサーバ装置３が行う画像選別処理の手順を示すフローチャートである。本実施の形態に係るサーバ装置３の処理部３１の画像受信処理部３１ａは、施設１００に設置された一又は複数のカメラ１から画像を受信したか否かを判定する（ステップＳ２１）。画像を受信していない場合（Ｓ２１：ＮＯ）、画像受信処理部３１ａは、カメラ１からの画像を受信するまで待機する。 FIG. 8 is a flow chart showing the procedure of image selection processing performed by the server device 3 according to the present embodiment. The image reception processing unit 31a of the processing unit 31 of the server device 3 according to the present embodiment determines whether images have been received from one or more cameras 1 installed in the facility 100 (step S21). If no image has been received (S21: NO), the image reception processing unit 31a waits until an image from the camera 1 is received.

カメラ１からの画像を受信した場合（Ｓ２１：ＹＥＳ）、処理部３１の行動判定部３１ｂは、受信した画像に写された人の行動を判定する処理を行う（ステップＳ２２）。このときに行動判定部３１ｂは、画像に写された人が所定の行動を行っている確信度を出力する学習済の学習モデルを複数用い、複数の行動についての確信度を取得し、確信度をメタデータとして画像に付す。次いで処理部３１のＩＤ付与部３１ｃは、画像に写された人を識別するＩＤをメタデータとして画像に付与する処理を行う（ステップＳ２３）。このときにＩＤ付与部３１ｃは、画像から写されている人の顔の特徴量を抽出し、ユーザＤＢ３２ｃに登録されている利用者の特徴量との比較を行うことで、画像に写されている人と登録済の利用者のＩＤとの対応を判定する。 When an image is received from the camera 1 (S21: YES), the action determination unit 31b of the processing unit 31 performs processing for determining the action of the person captured in the received image (step S22). At this time, the action determination unit 31b uses a plurality of trained learning models, each of which outputs the degree of certainty that the person captured in the image is performing a predetermined action, acquires the degrees of certainty for the plurality of actions, and attaches them to the image as metadata. Next, the ID assigning unit 31c of the processing unit 31 performs processing for attaching, as metadata, an ID that identifies the person captured in the image (step S23). At this time, the ID assigning unit 31c extracts the facial feature amount of the person captured in the image and compares it with the feature amounts of the users registered in the user DB 32c, thereby determining which registered user ID corresponds to the person captured in the image.

次いで処理部３１の画像選別部３１ｄは、画像にメタデータとして付された種々の条件に基づいて画像選別処理を行う（ステップＳ２４）。このときに画像選別部３１ｄは、画像に付された種々の条件に基づいて、この画像に対するスコアリングを行い、画像のスコアが閾値を超えるか否かに基づいて画像を選別する。画像選別処理の結果に基づき、画像選別部３１ｄは、この画像を記憶部３２の画像記憶部３２ｂに記憶するか否かを判定する（ステップＳ２５）。記憶しないと判定した場合（Ｓ２５：ＮＯ）、画像選別部３１ｄは、この画像を破棄して（ステップＳ２６）、ステップＳ２１へ処理を戻す。 Next, the image selection unit 31d of the processing unit 31 performs image selection processing based on various conditions attached to the images as metadata (step S24). At this time, the image selection unit 31d performs scoring for this image based on various conditions attached to the image, and selects the image based on whether or not the score of the image exceeds the threshold. Based on the result of the image selection process, the image selection unit 31d determines whether or not to store this image in the image storage unit 32b of the storage unit 32 (step S25). If it is determined not to be stored (S25: NO), the image selection unit 31d discards this image (step S26) and returns the process to step S21.

If it is determined that the image is to be stored (S25: YES), the image correction unit 31e of the processing unit 31 corrects the image as necessary (step S27). Here, the image correction unit 31e performs, for example, image processing that sharpens an unclear image, processing that extracts and enlarges a predetermined range of the image, and processing that increases resolution by a super-resolution technique. The processing unit 31 then stores the corrected image in the image storage unit 32b of the storage unit 32 (step S28) and returns the process to step S21. In this flowchart the server device 3 selects each image as it is received from the camera 1, but the invention is not limited to this: the server device 3 may store all received images and then, at a predetermined timing such as once a day, select among all the stored images and discard the unnecessary ones.

The server device 3 according to the present embodiment scores an image based on, for example, the smile confidence and the confidence that the face is facing the front, both attached to the image as metadata by the camera 1, and the confidences for predetermined actions obtained by the action determination in step S22. For example, the server device 3 can give an image a higher score the more the person in it is smiling and the more directly the person faces the front. The server device 3 can also score an image according to which of the plurality of actions has the highest confidence. In that case the server device 3 may assign a score defined for each action, such as 10 points for crawling and 9 points for eating, or may use the highest confidence multiplied by 10 as the score, or may determine the score by some other method.

Furthermore, the server device 3 may assign a score based on the number of people in the image, the positions at which they appear, their sizes in the image, and the like, obtained from, for example, the person detection result and the face detection result. For example, the server device 3 can reduce the score of an image in which a person appears small at the edge, and increase the score of an image in which a person appears large at the center.
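
The scoring criteria described above can be combined as in the following sketch. Only the per-action points (10 for crawling, 9 for eating) and the "highest confidence multiplied by 10" rule come from the text. The metadata field names, the weights of 5.0, the bonus and penalty of 2.0, and the occupancy cut-offs are assumptions chosen for illustration.

```python
ACTION_POINTS = {"crawling": 10.0, "eating": 9.0}  # the two values given in the text

def score_image(meta, use_action_table=True):
    """Score one image from its metadata (hypothetical field names)."""
    # Higher smile and frontal-face confidences give a higher score.
    score = 5.0 * meta["smile"] + 5.0 * meta["frontal"]  # weights are assumed

    # Score from the action with the highest confidence.
    action, confidence = max(meta["actions"].items(), key=lambda kv: kv[1])
    if use_action_table:
        score += ACTION_POINTS.get(action, 0.0)
    else:
        score += 10.0 * confidence  # highest confidence multiplied by 10

    # Adjust for where and how large the person appears (assumed cut-offs).
    if meta["centered"] and meta["occupancy"] >= 0.3:
        score += 2.0
    elif not meta["centered"] and meta["occupancy"] < 0.1:
        score -= 2.0
    return score

example = {
    "smile": 0.8,
    "frontal": 0.6,
    "actions": {"crawling": 0.7, "eating": 0.2},
    "centered": True,
    "occupancy": 0.4,
}
example_score = score_image(example)
```

An image is then kept when its score exceeds the threshold used in step S24; the threshold value itself is not given in the text.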

The server device 3 according to the present embodiment also counts, for each user (child) appearing in the images, the number of images selected for storage in the image storage unit 32b during a predetermined period such as one day, and adjusts the number of images selected so that the number of stored images is equalized across users. The server device 3 scores images as described above and selects them according to whether the score exceeds a threshold; for an image showing a user who has few images, for example, it lowers the threshold against which the score is compared, which raises the likelihood that the image is selected and stored in the image storage unit 32b. Alternatively, if the server device 3 is configured to perform image selection in a batch at a predetermined timing such as the end of the day, it may select, for each user, a predetermined number of the images showing that user in descending order of score and store those images in the image storage unit 32b.
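
Both balancing strategies described above (lowering the threshold for users with few images, and batch selection of a fixed number per user) can be sketched as follows. The numeric values `base`, `relief`, `few`, and `per_user` are hypothetical, as is the representation of an image as a `(user_id, score)` pair.

```python
from collections import defaultdict

def threshold_for(user_id, stored_counts, base=15.0, relief=3.0, few=2):
    """Lower the threshold for a user who has few stored images so far."""
    return base - relief if stored_counts.get(user_id, 0) < few else base

def select_streaming(images, base=15.0):
    """Select images one by one as they arrive, updating per-user counts."""
    counts, kept = defaultdict(int), []
    for user_id, score in images:
        if score > threshold_for(user_id, counts, base):
            kept.append((user_id, score))
            counts[user_id] += 1
    return kept

def select_batch(images, per_user=2):
    """Batch variant: keep the top `per_user` scores of each user."""
    by_user = defaultdict(list)
    for user_id, score in images:
        by_user[user_id].append(score)
    return {u: sorted(s, reverse=True)[:per_user] for u, s in by_user.items()}

day = [("a", 16), ("a", 13), ("a", 14), ("b", 13), ("b", 11)]
streamed = select_streaming(day)
batched = select_batch(day)
```

In the streaming variant the third image of user "a" is rejected because that user already has two stored images and the full threshold applies again, while user "b" still benefits from the lowered threshold.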

In the information processing system according to the present embodiment, both the camera 1 and the server device 3 can adjust the number of images per user. However, when a plurality of cameras 1 are installed in the facility 100, the server device 3 receives images from the plurality of cameras 1. The server device 3 can therefore count the number of images per user over all the images received from the plurality of cameras 1 and adjust the per-user numbers accordingly. Each camera 1, by contrast, adjusts the per-user number of images only for the images it has captured itself.

The server device 3 may score images using any criteria other than those described above. The server device 3 may also select images without scoring them. Any method may be adopted for the selection of images by the server device 3. The server device 3 selects the images received from one or more cameras 1 according to predetermined conditions, stores and accumulates the selected images in the image storage unit 32b, and transmits them to the terminal device 5 of the user (guardian).

FIG. 9 is a flowchart showing the procedure of the image transmission processing performed by the server device 3 according to the present embodiment. The image transmission processing unit 31f of the processing unit 31 of the server device 3 determines whether the timing for transmitting images, for example once a day, has arrived (step S41). When the transmission timing has arrived (S41: YES), the image transmission processing unit 31f reads out the images that have been selected and stored in the image storage unit 32b of the storage unit 32 (step S42). The image transmission processing unit 31f transmits the read images to the corresponding terminal devices 5 (step S43) and returns the process to step S41. At this time, based on the metadata attached to each read image, the image transmission processing unit 31f looks up in the user DB 32c the terminal device 5 registered in association with the user (child) appearing in the image, and transmits the image to that terminal device 5.

When the transmission timing has not arrived (S41: NO), the image transmission processing unit 31f determines whether an image search request has been received from a terminal device 5 (step S44). If no search request has been received (S44: NO), it returns the process to step S41. If a search request has been received (S44: YES), the image transmission processing unit 31f acquires the image search conditions included in the request (step S45). The image selection unit 31d of the processing unit 31 selects, from the images stored in the image storage unit 32b, the images that match the search conditions acquired in step S45 (step S46). The image transmission processing unit 31f transmits the images selected from the image storage unit 32b to the terminal device 5 that issued the search request (step S47) and returns the process to step S41.
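
One pass through the loop of FIG. 9 can be sketched as below. The function name, the dictionary keys, the tag-based matching of search conditions, and the `send` callback are hypothetical; the text specifies only the order of steps S41 to S47.

```python
def transmission_step(send_time_reached, search_request, image_store, user_db, send):
    """One pass through the loop of FIG. 9 (S41 to S47).

    image_store: list of dicts with hypothetical keys "person_id" and "tags".
    user_db: maps person ID -> terminal ID. send(terminal_id, image) delivers.
    """
    if send_time_reached:                                   # S41: YES
        for image in image_store:                           # S42
            terminal = user_db.get(image["person_id"])      # look up via metadata
            if terminal is not None:
                send(terminal, image)                       # S43
        return "scheduled"
    if search_request is None:                              # S44: NO
        return "idle"
    wanted = set(search_request["conditions"])              # S45
    hits = [im for im in image_store if wanted <= set(im["tags"])]  # S46
    for image in hits:
        send(search_request["terminal"], image)             # S47
    return "search"

store = [
    {"person_id": "child_a", "tags": ["smile", "crawling"]},
    {"person_id": "child_b", "tags": ["eating"]},
]
terminals = {"child_a": "t1", "child_b": "t2"}
sent = []
transmission_step(True, None, store, terminals,
                  lambda t, im: sent.append((t, im["person_id"])))
```

Treating a search condition as a tag that must be present is a simplification; a real implementation would more likely compare the stored confidences against the requested conditions.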

<Image Search Processing by Terminal Device 5>
In the information processing system according to the present embodiment, images selected by the camera 1 and the server device 3 from among the images captured by the camera 1 installed in the facility 100 are transmitted to the user's terminal device 5. For example, the camera 1 captures images from 6:00 a.m. to 6:00 p.m., and the server device 3 transmits the images selected that day to the terminal device 5 at 7:00 p.m. On receiving images transmitted from the server device 3, the terminal device 5 notifies the user that images have been received and displays the received images on the display unit 54 in response to the user's operation or the like.

In the information processing system according to the present embodiment, the user can also operate the terminal device 5 to search the images stored and accumulated in the image storage unit 32b of the server device 3, and acquire the images matching the search conditions for display on the terminal device 5. For example, when the image search item is selected on a menu screen, the terminal device 5 displays a search condition setting screen on the display unit 54 and accepts the user's input of search conditions.

FIG. 10 is a schematic diagram showing an example of the search condition setting screen displayed by the terminal device 5. In the illustrated screen, the title string "Search condition settings" is displayed at the top, and one or more settable conditions are listed below it. The terminal device 5 displays on the screen, for example, a check box for setting whether to use the "Use registered image" condition. The "Use registered image" condition is a condition for searching for images showing a person who has the same or similar features as the person in a registered image, where, for example, a guardian has registered in advance an image of the child that the guardian took. In this example, a button labeled "Register image" is displayed next to the check box and the "Use registered image" string. When this button is operated, the terminal device 5 displays an image selection screen, a capture screen, or the like to acquire an image for registration, transmits the acquired image or a feature amount extracted from it to the server device 3, and registers the image in the user DB 32c of the server device 3.

In this example, check boxes for selecting facial expressions such as "Smile" and "Crying face" are displayed on the search condition setting screen under "Expression settings". Check boxes for selecting actions such as "Crawling" and "Eating" are likewise displayed under "Action settings". The terminal device 5 accepts the setting of search conditions relating to facial expression and action according to whether these check boxes are checked.

The illustrated search condition setting screen is only an example, and the settable conditions are not limited to those shown; various other conditions may be adopted. For example, conditions such as the date and time at which an image was captured, the size of a person relative to the image (occupancy ratio), or the number of people in the image may be settable.

The terminal device 5 transmits to the server device 3 a search request containing information on the search conditions set on the search condition setting screen. The server device 3 selects among the images stored in the image storage unit 32b based on the search conditions contained in the search request, and transmits the selected images to the requesting terminal device 5 (see steps S44 to S47 in FIG. 9). On receiving one or more images from the server device 3 as the search result, the terminal device 5 displays them arranged in, for example, a list or a matrix. In doing so, the terminal device 5 may calculate, for example, the degree of match between the search conditions and each result image, and display the images in order of the calculated degree of match. The user may also be able to set a condition on the display order. For example, when "Smile" is set as a search condition, the terminal device 5 can display the images in descending order of smile confidence.
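
The ordering of search results by degree of match can be sketched as follows. The text does not define how the degree of match is calculated, so taking the mean of the confidences for the requested conditions is an assumption, as are the function names and the result layout.

```python
def match_degree(meta, conditions):
    """Mean confidence over the requested conditions (an assumed definition)."""
    if not conditions:
        return 0.0
    return sum(meta.get(c, 0.0) for c in conditions) / len(conditions)

def order_results(results, conditions):
    """Sort result images so the best match is displayed first."""
    return sorted(results, key=lambda r: match_degree(r["meta"], conditions),
                  reverse=True)

results = [
    {"name": "a.jpg", "meta": {"smile": 0.4}},
    {"name": "b.jpg", "meta": {"smile": 0.9}},
    {"name": "c.jpg", "meta": {"smile": 0.7}},
]
ordered = [r["name"] for r in order_results(results, ["smile"])]
```

With the single condition "smile", this reduces to the example in the text: images are displayed in descending order of smile confidence.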

FIG. 11 is a flowchart showing the procedure of the processing performed by the terminal device 5 according to the present embodiment. The image reception processing unit 51a of the processing unit 51 of the terminal device 5 determines whether images transmitted by the server device 3 at a predetermined timing, for example once a day, have been received (step S61). When such images have been received (S61: YES), the image reception processing unit 51a displays, for example on the display unit 54 of the terminal device 5, a message notifying the user that images have been received (step S62). The image reception processing unit 51a determines whether it has accepted, as the user's operation on the displayed message, an operation to display the received images (step S63). If no such operation has been accepted (S63: NO), it returns the process to step S61. If the operation has been accepted (S63: YES), the image reception processing unit 51a displays the one or more images received from the server device 3 on the display unit 54 (step S64) and returns the process to step S61.

When images transmitted at the predetermined timing have not been received (S61: NO), the image search processing unit 51b of the processing unit 51 determines whether it has accepted the user's setting of image search conditions (step S65). If no search conditions have been set (S65: NO), it returns the process to step S61. If search conditions have been set (S65: YES), the image search processing unit 51b transmits an image search request containing those conditions to the server device 3 (step S66). The image search processing unit 51b then determines whether it has received the search result from the server device 3 in response to the request (step S67). If not (S67: NO), it waits until the search result is received. When the search result is received (S67: YES), the image search processing unit 51b displays the one or more images received from the server device 3 as the search result on the display unit 54 (step S68) and returns the process to step S61.

<Summary>
In the information processing system according to the present embodiment configured as described above, the information processing device 20 of the camera 1 detects a predetermined target in the images captured by the imaging unit 11, determines whether each image containing the predetermined target satisfies a first condition, selects the images that satisfy the first condition and transmits them to the server device 3, and causes the server device 3 to select the images that satisfy a second condition. Because the camera 1 selects images in advance based on the first condition, the amount of images transmitted from the camera 1 to the server device 3 can be expected to decrease. The predetermined target detected in a captured image need not be a person and may be, for example, an animal or a plant.

In the information processing system according to the present embodiment, the information processing device 20 of the camera 1 detects a person in an image as the predetermined target, detects the person's face in the image based on the person detection result, and detects the facial expression or orientation based on the face detection result. The first condition used for image selection in the camera 1 includes a condition on the facial expression or orientation. Images that are unsuitable in terms of the expression or orientation of the person shown can thus be removed in advance, so that only suitable images are transmitted from the camera 1 to the server device 3.

In the information processing system according to the present embodiment, the information processing device 20 of the camera 1 removes, from the captured images, images that are inappropriate with respect to a person's privacy, based on the person detection result. Images that are inappropriate in terms of privacy can thus be removed in advance, which suppresses their transmission from the camera 1 to the server device 3.

In the information processing system according to the present embodiment, the information processing device 20 of the camera 1 generates a compressed image by compressing the image captured by the imaging unit 11 and performs person detection on the compressed image, performs face detection, detection of facial expression or orientation, and the like on the uncompressed image, and transmits the uncompressed image from the camera 1 to the server device 3. Processing that requires comparatively little accuracy can thus be expected to run fast on the compressed image, while the remaining processing can be expected to be highly accurate because it is based on the uncompressed image. Transmitting the uncompressed image from the camera 1 to the server device 3 can likewise be expected to allow the server device 3 to perform highly accurate processing.
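
The two-stage use of compressed and uncompressed images can be sketched as follows. The detectors are passed in as hypothetical callables, and `downsample` is only a stand-in for compression: the text does not say which compression method the camera 1 uses.

```python
def downsample(image, step=2):
    """Stand-in for compression: keep every `step`-th pixel in each axis."""
    return [row[::step] for row in image[::step]]

def camera_pipeline(image, detect_person, analyze_face, first_condition):
    """Return (image, metadata) to transmit, or None if the image is dropped."""
    small = downsample(image)
    if not detect_person(small):          # coarse, fast stage on the small image
        return None
    face = analyze_face(image)            # fine stage on the full-resolution image
    if face is None or not first_condition(face):
        return None
    return image, face                    # the uncompressed image is what is sent

# Hypothetical 4x4 image and trivial stand-in detectors.
frame = [[0] * 4 for _ in range(4)]
result = camera_pipeline(
    frame,
    detect_person=lambda im: True,
    analyze_face=lambda im: {"smile": 0.9},
    first_condition=lambda face: face["smile"] >= 0.5,
)
```

The face analysis result is returned alongside the image so that it can be attached as metadata for the server device 3.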

In the information processing system according to the present embodiment, the information processing device 20 of the camera 1 attaches information on the results of the detection, determination, and the like that it performed on an image to that image as metadata, and transmits the image to the server device 3. The server device 3 that receives the image from the camera 1 can thus use the results of the detection, determination, and the like performed by the camera 1 in its own processing.

In the information processing system according to the present embodiment, the information processing device 20 of the camera 1 counts the number of images for each person appearing in the images, and uses as the first condition that images be selected so that the number of selected images is approximately the same for each person. The images captured by the camera 1 can thus be selected and transmitted to the server device 3 with the number of images per person equalized.

In the information processing system according to the present embodiment, the server device 3 determines the action of a person included in an image received from the information processing device 20 of the camera 1 and, based on the determination result, selects the images that satisfy the second condition. Images can thus be selected based on the actions of the people appearing in the images captured by the camera 1.

In the information processing system according to the present embodiment, the server device 3 acquires information on the results of the detection, determination, and the like performed by the information processing device 20 of the camera 1, scores the images based on the acquired information, and selects images based on the score of each image. The server device 3 can thus select images using not only its own detection and determination results but also those obtained by the camera 1.

In the information processing system according to the present embodiment, the server device 3 attaches identification information, such as an ID identifying the person appearing in an image, to the image as metadata. This makes it easy to determine which user appears in each image stored in the image storage unit 32b of the server device 3 and to extract images accordingly.

In the information processing system according to the present embodiment, the server device 3 counts the number of images for each person appearing in the images, and uses as the second condition that images be selected so that the number of selected images is approximately the same for each person. The images captured by one or more cameras 1 and transmitted to the server device 3 can thus be selected and stored in the image storage unit 32b with the number of images per person equalized.

In the information processing system according to the present embodiment, the server device 3 extracts from an image the image region in which a person appears, and generates an image in which the resolution of the extracted region is increased. An enlarged image of the person shown in the image can thus be generated while suppressing the loss of image quality caused by enlargement.

In the information processing system according to the present embodiment, the camera 1 captures images of the facility 100, the server device 3 attaches identification information, such as an ID identifying a user of the facility 100, to each image, and transmits each image bearing a user's identification information to the terminal device 5 associated with that user of the facility 100. The server device 3 can thus transmit images showing each user to the appropriate terminal device 5.

In the present embodiment, the facility 100 in which the camera 1 is installed is a nursery school, kindergarten, or the like, and the camera 1 captures images of the children using the facility 100; however, the facility 100 is not limited to a nursery school or kindergarten and may be any kind of facility. The facility 100 may be indoors or outdoors. Also, although the present embodiment shows a configuration in which the imaging unit 11 of the camera 1 and the information processing device 20 are integrated, this is not a limitation, and the imaging unit 11 and the information processing device 20 may be separate. For example, the camera 1 and the information processing device 20 may be connected by wire or wirelessly, with the camera 1 supplying captured images to the information processing device 20 and the information processing device 20 selecting images based on the first condition and transmitting them to the server device 3.

The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present invention is indicated by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1 camera
3 server device
5 terminal device
11 imaging unit
20 information processing device
21 processing unit
21a person detection unit
21b inappropriate image detection unit
21c face detection unit
21d expression and orientation detection unit
21e image selection unit
21f image transmission processing unit
22 input/output unit
23 storage unit
23a program
24 communication unit
31 processing unit
31a image reception processing unit
31b action determination unit
31c ID assigning unit
31d image selection unit
31e image correction unit
31f image transmission processing unit
32 storage unit
32a server program
32b image storage unit
32c user DB
33 communication unit
51 processing unit
51a image reception processing unit
51b image search processing unit
52 storage unit
52a program
53 communication unit
54 display unit
55 operation unit
98, 99 recording medium
100 facility
N network

Claims

An information processing method in which:
each of a plurality of imaging devices
detects a person in an image captured by an imaging unit,
detects the person's face in the image captured by the imaging unit based on the person detection result,
detects the expression or orientation of the face based on the face detection result,
counts, for each person appearing in the images captured by the imaging unit, the number of images in which that person appears and which satisfy a first condition relating to the detected facial expression or orientation, and
selects images from the plurality of images captured by the imaging unit so that the number of images of each person is approximately the same, and transmits the selected images to a server device; and
the server device
receives the plurality of images captured by the plurality of imaging devices,
counts, for each person appearing in the received images, the number of images in which that person appears and which satisfy a second condition, and
selects images from the plurality of images captured by the plurality of imaging devices so that the number of images of each person is approximately the same.

The information processing method according to claim 1, wherein the imaging device removes, from the images captured by the imaging unit, images that are inappropriate with respect to a person's privacy, based on the person detection result.

The information processing method according to claim 1 or claim 2, wherein
the imaging device:
generates a compressed image by compressing the image captured by the imaging unit;
detects a person based on the compressed image;
determines whether the first condition is satisfied based on the uncompressed image captured by the imaging unit; and
transmits the uncompressed image to the server device.
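The split between detection on a compressed image and judgment on the uncompressed image can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: pixel subsampling stands in for compression, images are plain nested lists, and `detect_people` and `meets_first_condition` are hypothetical callables supplied by the caller.

```python
def downscale(image, factor):
    """Stand-in for compression: keep every factor-th pixel in each direction."""
    return [row[::factor] for row in image[::factor]]

def scale_box(box, factor):
    """Map a (left, top, right, bottom) box back to full-resolution coordinates."""
    return tuple(v * factor for v in box)

def process_frame(image, detect_people, meets_first_condition, factor=4):
    """Return the uncompressed frame if it should be sent, otherwise None.

    detect_people(small_image) -> list of boxes in reduced coordinates.
    meets_first_condition(full_image, box) -> bool, evaluated at full resolution.
    """
    small = downscale(image, factor)                 # cheap input for person detection
    boxes = [scale_box(b, factor) for b in detect_people(small)]
    if any(meets_first_condition(image, b) for b in boxes):
        return image                                 # transmit the uncompressed image
    return None
```

The design point is that the inexpensive step (finding people) tolerates reduced quality, while the step that is sensitive to fine detail (expression or orientation) sees the original pixels.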

The information processing method according to any one of claims 1 to 3, wherein
the imaging device transmits information regarding the results of the detection and determination performed on an image to the server device together with that image.

The information processing method according to any one of claims 1 to 4, wherein
the server device determines a behavior of a person included in an image received from the imaging device, and
the second condition includes a condition regarding the determined behavior.

The information processing method according to any one of claims 1 to 5, wherein
the server device:
acquires information regarding the results of the detection or determination by the imaging device; and
scores the images based on the acquired information, and
the second condition includes a condition regarding the scoring result.
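As a rough illustration of scoring based on the information acquired from the imaging device, the sketch below assumes that each image arrives with a small dictionary of detection results. The keys `smile` and `frontal`, the weights, and the threshold are all hypothetical; the claim does not define a scoring formula.

```python
def score_image(info, weights=None):
    """Weighted sum of detection results, each assumed to lie in [0, 1]."""
    weights = weights or {"smile": 0.6, "frontal": 0.4}   # illustrative weights only
    return sum(w * float(info.get(key, 0.0)) for key, w in weights.items())

def meets_second_condition(info, min_score=0.5):
    """One possible 'condition regarding the scoring result': a minimum score."""
    return score_image(info) >= min_score
```

Images that pass `meets_second_condition` would then be the ones counted per person on the server side.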

The information processing method according to any one of claims 1 to 6, wherein
the server device attaches, to an image, identification information that identifies a person included in that image.

The information processing method according to any one of claims 1 to 7, wherein
the server device:
extracts, from a selected image, an image region in which a person appears; and
generates an image in which the resolution of the extracted image region is increased.
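The extract-then-enlarge step can be shown with a small sketch on nested-list images. Nearest-neighbour pixel repetition is used here purely as a placeholder for whatever resolution-enhancement technique the server device actually applies; the function names are hypothetical.

```python
def crop(image, box):
    """Extract the (left, top, right, bottom) region; right and bottom are exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def upscale_nearest(image, factor):
    """Enlarge by repeating each pixel factor times horizontally and vertically."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def enlarge_person_region(image, box, factor=2):
    """Crop the person region, then raise its pixel resolution."""
    return upscale_nearest(crop(image, box), factor)
```

Cropping before enlarging keeps the more expensive step limited to the region that contains the person.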

The information processing method according to any one of claims 1 to 8, wherein
the server device:
attaches, to a selected image, identification information that identifies a user of a facility who appears in that image; and
transmits the image to which the identification information of the user is attached to a terminal device associated with that user.

An information processing system comprising:
a plurality of imaging devices, each having: a first detection unit that detects a person in images captured by an imaging unit; a second detection unit that detects the person's face in the images captured by the imaging unit, based on the person detection result of the first detection unit; a third detection unit that detects an expression or orientation of the face, based on the face detection result of the second detection unit; a counting unit that counts, for each person appearing in the images captured by the imaging unit, the number of images in which that person appears and which satisfy a first condition regarding the detected facial expression or orientation; and a transmission unit that selects images from the plurality of images captured by the imaging unit so that the number of images of each person is approximately the same, and transmits the selected images; and
a server device having: a receiving unit that receives the plurality of images transmitted by the plurality of imaging devices; a counting unit that counts, for each person appearing in the received images, the number of images in which that person appears and which satisfy a second condition; and a selection unit that selects images from the plurality of images captured by the plurality of imaging devices so that the number of images of each person is approximately the same.