JP2021082301A

JP2021082301A - Neck-mounted device

Info

Publication number: JP2021082301A
Application number: JP2020201692A
Authority: JP
Inventors: 真人藤野; Masato Fujino
Original assignee: Fairy Devices Inc
Current assignee: Fairy Devices Inc
Priority date: 2020-12-04
Filing date: 2020-12-04
Publication date: 2021-05-27

Abstract

To provide a neck-mounted device comprising an imaging unit, which is easy for a wearer to operate.

SOLUTION: A neck-mounted device 100 that is worn around the user's neck comprises: a first arm part 10 and a second arm part 20 that can be arranged across the neck; an imaging unit 60 provided on the first arm part 10; and a non-contact type sensor unit 70 that is provided on the second arm part 20 and receives input of information on control of the imaging unit 60, where the content of a control command based on the input information of the sensor unit 70 changes according to an image captured by the imaging unit 60.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a neck-mounted device worn around a user's neck. More specifically, it relates to a neck-mounted device suited to imaging the wearer's surroundings.

In recent years, wearable devices that can be attached to any part of a user's body to sense the state of the user and of the surrounding environment have attracted attention. Wearable devices are known in various forms, for example devices worn on the user's arm, around the eyes, at the ears, around the neck, or on the user's clothing. Analyzing the user information collected by such a wearable device yields information that is useful to the wearer and to others.

As one kind of wearable device, a device that is worn around the user's neck and records speech uttered by the wearer or the wearer's interlocutor is known (Patent Document 1). Patent Document 1 discloses a voice processing system including a mounting portion worn by the user, the mounting portion having at least three voice acquisition units (microphones) that acquire voice data for beamforming. The system described in Patent Document 1 also includes an imaging unit and is configured to image the area in front of the user while worn. Patent Document 1 further proposes identifying the presence and position of another speaker, or estimating the orientation of the user's face, from the image recognition result for the captured image, and controlling the directivity of the voice acquisition units according to that position or orientation.

Patent Document 1: JP-A-2019-134441

The system of Patent Document 1 includes an operation unit that receives operation input from the user. As an example of the operation unit, Patent Document 1 cites a camera button that receives input instructing the imaging unit to start or stop imaging, and proposes that the operation unit may also be implemented as a touch slider that receives touch and slide operations.

However, because a neck-mounted device is worn in the wearer's blind spot (around the neck), receiving operation of the imaging unit through a physical button such as a camera button, as in Patent Document 1, makes the imaging unit hard for the wearer to operate and makes it difficult to start or stop imaging at the required moment. In addition, when a neck-mounted device receives the instruction to start imaging through a physical button, the entire device shakes when the button is pressed, so the captured image may be blurred or a range other than the intended one may be captured. It is conceivable to recommend holding the device steady with one hand while pressing the physical button with the other in order to suppress this shaking, but requiring the wearer to use both hands during imaging reduces the convenience of a neck-mounted device. Furthermore, an input means that requires direct manual contact, such as a physical button, makes imaging difficult when, for example, the wearer's hands are dirty or when touching the device directly is undesirable for hygiene reasons, as in a medical setting. Finally, a camera can shoot in various ways, such as still images, video, slow motion, and panorama, but on a neck-mounted device without a display there is a limit to how well physical buttons can be used to select one of these shooting methods.

It is also conceivable to keep the imaging unit activated at all times in order to solve the problems described above, but this is impractical because it greatly increases consumption of the battery mounted in the device and prevents the device from being used continuously for a long time.

Accordingly, a main object of the present invention is to provide a neck-mounted device including an imaging unit that is easy for the wearer to operate.

As a result of diligent study of means for solving the problems of the prior art, the inventor of the present invention found that, in a neck-mounted device, placing an imaging unit (camera) on one of two arm portions arranged on either side of the wearer's neck, and placing a non-contact sensor unit for detecting information related to control of the imaging unit on the other, makes the imaging unit easier for the wearer to operate. The inventor conceived that the problems of the prior art can be solved on the basis of this finding, and completed the present invention. Specifically, the present invention has the following configuration.

The present invention relates to a neck-mounted device worn around a user's neck. The neck-mounted device according to the present invention includes a first arm portion and a second arm portion that can be arranged on either side of the neck. An imaging unit is provided on the first arm portion, and a non-contact sensor unit is provided on the second arm portion. Information detected by the sensor unit is used to control the imaging unit. Examples of the non-contact sensor unit include, but are not limited to, proximity sensors and gesture sensors of the optical, ultrasonic, magnetic, capacitive, or thermal type.

With the above configuration, the imaging unit provided on one arm portion of the neck-mounted device is controlled through the non-contact sensor unit provided on the other arm portion, so the wearer can easily capture still images and moving images (hereinafter simply "images") even when the imaging unit and the sensor unit are located in the wearer's blind spot. Placing the imaging unit and the sensor unit on separate arm portions also makes it less likely that the wearer's fingers enter the imaging range of the imaging unit, which helps keep the fingers out of the captured image. Controlling the imaging unit through a non-contact sensor unit also suppresses shaking of the entire device during shooting, which avoids blurred images and images of a range other than the intended one. Further, adopting a non-contact sensor unit allows the wearer to shoot easily even in situations where the wearer cannot touch the device directly. Because a gesture sensor can accept various commands according to the shape and movement of the fingers, it also becomes easy, for example, to select one of various shooting methods by gesture. Alternatively, a microphone mounted on the neck-mounted device may be activated based on the detection information of the sensor unit, and various commands may then be input to the device by voice recognition through the microphone.

In the neck-mounted device according to the present invention, the imaging unit is preferably activated based on input information from the sensor unit. Specifically, when the imaging unit is in a sleep state (power supply stopped), it preferably enters an activated state (power supplied) when the sensor unit detects predetermined information. This eliminates the need to keep the imaging unit activated at all times. Moreover, because a sensor unit generally consumes less power than an imaging unit, battery consumption of the neck-mounted device as a whole can be reduced.
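The wake-on-detection behavior described above can be sketched as a small state machine. This is only an illustration: the class `ImagingUnit`, the function `on_sensor_event`, and the event name `"hand_near"` are hypothetical and do not appear in the specification, which does not define what the "predetermined information" is.

```python
from enum import Enum


class PowerState(Enum):
    SLEEP = "sleep"    # power supply stopped
    ACTIVE = "active"  # power supplied


class ImagingUnit:
    """Hypothetical stand-in for the imaging unit; starts in the sleep state."""

    def __init__(self) -> None:
        self.state = PowerState.SLEEP

    def wake(self) -> None:
        self.state = PowerState.ACTIVE


def on_sensor_event(camera: ImagingUnit, event: str,
                    wake_events: frozenset = frozenset({"hand_near"})) -> bool:
    """Wake the camera if it is asleep and the sensor reported a wake event.

    Returns True only when this call moved the camera from sleep to active.
    The set of wake events is an assumption made for this sketch.
    """
    if camera.state is PowerState.SLEEP and event in wake_events:
        camera.wake()
        return True
    return False
```

In this sketch only the low-power sensor path runs while the camera sleeps, which is the source of the battery saving the paragraph describes.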

The neck-mounted device according to the present invention preferably has a substantially U shape in plan view, in which the first arm portion and the second arm portion are connected behind the wearer's neck. That is, the device extends halfway around the neck, from both sides of the neck to the rear (back) side. In this case, the imaging unit is preferably provided on the tip surface of the first arm portion and the sensor unit on the tip surface of the second arm portion. Providing the imaging unit and the sensor unit on the tip surfaces of the respective arm portions makes it easier to image the area in front of the wearer and easier for the wearer to operate the imaging unit through the sensor unit.

In the neck-mounted device according to the present invention, the optical axis of the imaging unit is preferably perpendicular to the tip surface of the first arm portion or tilted upward relative to it. More specifically, the optical axis being perpendicular to the tip surface of the first arm portion means that the optical axis is substantially horizontal when the tip surface is held vertical. Making the optical axis substantially horizontal in this way brings the image captured by the imaging unit close to the scene the wearer actually sees. The optical axis being tilted upward relative to the tip surface of the first arm portion means that the optical axis is tilted upward from the horizontal when the tip surface is held vertical. Tilting the optical axis of the imaging unit, which is located at the wearer's neck, upward in this way makes it easier to image the face and mouth of the person conversing with the wearer (the interlocutor). In particular, given the structure of the human body, it is relatively easy to point the optical axis of an imaging unit located at the neck to the left or right or vertically downward by turning or bending the body, but relatively difficult to point it vertically upward. Tilting the optical axis upward from the horizontal in advance therefore allows the space above to be imaged without forcing the wearer into an unnatural posture.

In the neck-mounted device according to the present invention, the vertical angle of view of the imaging unit is preferably 100 degrees or more. Using a wide-angle lens with a wide vertical angle of view in this way makes it easier for the imaging unit, located at the wearer's neck, to capture the interlocutor's face, mouth, and chest and, if necessary, the whole body.
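To give a feel for what a 100-degree vertical angle of view covers, the following sketch computes the height of the scene captured at a given distance. It assumes an ideal pinhole camera with a horizontal optical axis and ignores the distortion of a real wide-angle lens; the function name `vertical_coverage` is hypothetical and not part of the specification.

```python
import math


def vertical_coverage(distance_m: float, vertical_fov_deg: float) -> float:
    """Height (in metres) of the scene covered at distance_m from the lens.

    Ideal pinhole model: height = 2 * d * tan(fov / 2). Real wide-angle
    lenses deviate from this, so treat the result as a rough estimate.
    """
    if not 0.0 < vertical_fov_deg < 180.0:
        raise ValueError("vertical_fov_deg must be between 0 and 180")
    return 2.0 * distance_m * math.tan(math.radians(vertical_fov_deg) / 2.0)
```

Under these assumptions, `vertical_coverage(1.0, 100.0)` is roughly 2.4 m, so at a conversational distance of about one metre the frame is tall enough to contain a standing person, consistent with the whole-body case mentioned above.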

In the neck-mounted device according to the present invention, the tip surface of the first arm portion, on which the imaging unit is provided, is preferably inclined so that it forms an acute angle with the lower edge of the first arm portion. With this design, the tip surface of the first arm portion tends to stand vertical when the device is worn, and the imaging unit provided on it can efficiently image a wide range. If the extension line of the first arm portion pointed toward the interlocutor's eyes when the device is worn, the interlocutor would have a stronger sense of being photographed and might feel uncomfortable. By designing the housing so that, when worn, the extension line of the first arm portion points toward the ground while the tip surface stands vertical and the optical axis of the imaging unit is horizontal or tilted upward, the face and mouth of the interlocutor can be imaged effectively while reducing that discomfort.

In the neck-mounted device according to the present invention, the content of the control command based on the input information of the sensor unit may change according to the image captured by the imaging unit. For example, when the sensor unit detects a specific gesture, the control command issued to the imaging unit for that gesture can differ depending on whether the imaging unit is capturing a landscape or a person. When a given gesture is detected, panoramic shooting may be started if a landscape is being captured, while the subject's face may be autofocused if a person is being captured, so that the same gesture takes on a different meaning depending on the shooting situation. Changing the meaning of gestures according to the shooting situation in this way allows a wider variety of control commands to be input to the neck-mounted device.
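The scene-dependent interpretation of a gesture can be sketched as a lookup keyed on both the gesture and the current scene type. Everything named here is hypothetical: the gesture label `"swipe"`, the scene labels, the command strings, and the function `resolve_command` are assumptions for illustration, and the scene label is presumed to come from an image classifier that is not shown.

```python
from typing import Optional

# (gesture, scene) -> command. The two entries mirror the landscape/person
# example in the text; the labels themselves are made up for this sketch.
COMMAND_TABLE = {
    ("swipe", "landscape"): "start_panorama",
    ("swipe", "person"): "autofocus_face",
}


def resolve_command(gesture: str, scene: str) -> Optional[str]:
    """Return the command for this gesture in this scene, or None if unmapped."""
    return COMMAND_TABLE.get((gesture, scene))
```

Keying the table on the pair rather than on the gesture alone is what lets one gesture carry several meanings, so the number of distinct commands can grow without requiring the wearer to learn more gestures.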

According to the present invention, a neck-mounted device including an imaging unit that is easy for the wearer to operate can be provided.

FIG. 1 is a perspective view showing an embodiment of the neck-mounted device. FIG. 2 is a side view schematically showing the neck-mounted device as worn. FIG. 3 is a cross-sectional view schematically showing the positions where the sound collecting units are provided. FIG. 4 is a block diagram showing an example of the functional configuration of the neck-mounted device. FIG. 5 schematically shows beamforming processing for acquiring the voices of the wearer and the interlocutor.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings. The present invention is not limited to the embodiments described below and includes modifications of them that are obvious to those skilled in the art.

FIG. 1 shows an embodiment of a neck-mounted device 100 according to the present invention, and FIG. 2 shows the neck-mounted device 100 as worn. As shown in FIG. 1, the neck-mounted device 100 includes a left arm portion 10, a right arm portion 20, and a central integrated portion 30. The left arm portion 10 and the right arm portion 20 extend forward from the left end and the right end of the central integrated portion 30, respectively, so that the device as a whole is substantially U-shaped in plan view. To wear the neck-mounted device 100, as shown in FIG. 2, the central integrated portion 30 is placed against the back of the wearer's neck, and the left arm portion 10 and the right arm portion 20 are allowed to hang from the sides of the neck toward the chest so that the entire device hooks around the neck.

The left arm portion 10 and the right arm portion 20 are each provided with a plurality of sound collecting units (microphones) 41 to 45. The sound collecting units 41 to 45 are arranged mainly to acquire the voices of the wearer and the interlocutor. At a minimum, a first sound collecting unit 41 and a second sound collecting unit 42 are provided on the left arm portion 10, and a third sound collecting unit 43 and a fourth sound collecting unit 44 are provided on the right arm portion 20. Optionally, one or more additional sound collecting units may be provided on the left arm portion 10 and the right arm portion 20. In the example shown in FIG. 1, the left arm portion 10 is provided with a fifth sound collecting unit 45 in addition to the first sound collecting unit 41 and the second sound collecting unit 42. The sound signals acquired by the sound collecting units 41 to 45 are transmitted to a control unit 80 (see FIG. 4) provided in the central integrated portion 30, where predetermined analysis processing is performed. The central integrated portion 30 houses the control system, including an electronic circuit containing the control unit 80 and a battery (not shown).

The sound collecting units 41 to 45 are provided toward the front of the left arm portion 10 and the right arm portion 20 (on the wearer's chest side). Specifically, assuming that the neck-mounted device 100 is worn around the neck of a typical adult male (neck circumference of 35 to 37 cm), the device is preferably designed so that at least the first to fourth sound collecting units 41 to 44 are located forward of the wearer's neck (on the chest side). The neck-mounted device 100 is intended to collect the voices of the wearer and the interlocutor at the same time, and arranging the sound collecting units 41 to 44 forward of the wearer's neck allows not only the wearer's voice but also the interlocutor's voice to be acquired appropriately. In addition, when the sound collecting units 41 to 44 are arranged forward of the wearer's neck, the voice of a person standing behind the wearer is blocked by the wearer's body and is less likely to reach the sound collecting units 41 to 44 directly. A person standing behind the wearer is presumed not to be conversing with the wearer, so blocking that person's voice means that the physical arrangement of the sound collecting units 41 to 44 itself suppresses noise.

The first to fourth sound collecting units 41 to 44 are arranged on the left arm portion 10 and the right arm portion 20 so as to be bilaterally symmetric. That is, the quadrilateral formed by the line segment connecting the first sound collecting unit 41 and the second sound collecting unit 42, the line segment connecting the third sound collecting unit 43 and the fourth sound collecting unit 44, the line segment connecting the first sound collecting unit 41 and the third sound collecting unit 43, and the line segment connecting the second sound collecting unit 42 and the fourth sound collecting unit 44 is line-symmetric. Specifically, in the present embodiment, the quadrilateral is a trapezoid whose short side is the line segment connecting the first sound collecting unit 41 and the third sound collecting unit 43. However, the quadrilateral is not limited to a trapezoid, and the sound collecting units 41 to 44 may also be arranged so that it is a rectangle or a square.
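The symmetry condition on the four sound collecting units can be checked numerically. The sketch below assumes a plan-view coordinate system whose y axis lies on the device's midline, so that mirror symmetry means negating x; the function name and the coordinates used in the usage note are hypothetical and are not dimensions taken from the specification.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def is_mirror_symmetric(left: Sequence[Point], right: Sequence[Point],
                        tol: float = 1e-9) -> bool:
    """True if each right-arm point is the mirror image (x -> -x) of the
    corresponding left-arm point, within tolerance tol."""
    if len(left) != len(right):
        return False
    return all(abs(lx + rx) <= tol and abs(ly - ry) <= tol
               for (lx, ly), (rx, ry) in zip(left, right))
```

For example, with made-up positions in centimetres, units 41 and 42 at `(-4, 10)` and `(-6, 4)` and units 43 and 44 at `(4, 10)` and `(6, 4)` satisfy the check, and the 41-to-43 spacing (8) is shorter than the 42-to-44 spacing (12), which gives the trapezoid described in the embodiment. Equal spacings would give the rectangle or square variant.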

The left arm portion 10 is further provided with an imaging unit 60. Specifically, the imaging unit 60 is provided on the tip surface 12 of the left arm portion 10 and can capture still images and moving images of the area in front of the wearer. Images acquired by the imaging unit 60 are transmitted to the control unit 80 in the central integrated portion 30 and stored as image data. Images acquired by the imaging unit 60 may also be transmitted to a server device via the Internet. As described in detail later, it is also possible to identify the position of the interlocutor's mouth from an image acquired by the imaging unit 60 and to perform processing that emphasizes the voice emitted from that mouth (beamforming processing).

The right arm portion 20 is further provided with a non-contact sensor unit 70. The sensor unit 70 is arranged on the tip surface 22 of the right arm portion 20, mainly to detect movement of the wearer's hand in front of the neck-mounted device 100. The detection information of the sensor unit 70 is used mainly to control the imaging unit 60, for example to activate the imaging unit 60 or to start and stop shooting. For example, the imaging unit 60 may be controlled when the sensor unit 70 detects that an object such as the wearer's hand has come close to it, or when the sensor unit 70 detects that the wearer has made a predetermined gesture within its detection range. In the present embodiment, the imaging unit 60 is arranged on the tip surface 12 of the left arm portion 10 and the sensor unit 70 on the tip surface 22 of the right arm portion 20, but the positions of the imaging unit 60 and the sensor unit 70 may be interchanged.

The detection information of the sensor unit 70 can also be used to activate the imaging unit 60, the sound collecting units 41 to 45, and/or the control unit 80 (main CPU). For example, in a state where the sensor unit 70, the sound collecting units 41 to 45, and the control unit 80 are always active and the imaging unit 60 is stopped, the imaging unit 60 can be activated when the sensor unit 70 detects a specific gesture (condition 1). Under condition 1, the imaging unit 60 can also be activated when the sound collecting units 41 to 45 detect a specific sound. Alternatively, in a state where the sensor unit 70 and the sound collecting units 41 to 45 are always active and the control unit 80 and the imaging unit 60 are stopped, either or both of the control unit 80 and the imaging unit 60 can be activated when the sensor unit 70 detects a specific gesture (condition 2). Under condition 2 as well, the control unit 80 and the imaging unit 60 can be activated when the sound collecting units 41 to 45 detect a specific sound. Alternatively, in a state where only the sensor unit 70 is always active and the sound collecting units 41 to 45, the control unit 80, and the imaging unit 60 are stopped, any of the sound collecting units 41 to 45, the control unit 80, and the imaging unit 60 can be activated when the sensor unit 70 detects a specific gesture (condition 3). Of conditions 1 to 3, the reduction in power consumption is greatest for condition 3, followed by condition 2 and then condition 1.
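Conditions 1 to 3 can be summarized as a table of which units stay powered while the device is idle. The sketch below uses the number of stopped units as a crude proxy for power saving; that proxy, the short unit names, and the function names are assumptions for illustration, since the specification gives no power figures.

```python
ALL_UNITS = ("sensor", "microphones", "controller", "camera")

# Units that remain active while idle, per condition in the text.
ALWAYS_ON = {
    1: {"sensor", "microphones", "controller"},
    2: {"sensor", "microphones"},
    3: {"sensor"},
}


def stopped_units(condition: int) -> set:
    """Units that are stopped while idle under the given condition."""
    return set(ALL_UNITS) - ALWAYS_ON[condition]


def rank_by_saving() -> list:
    """Conditions ordered from most to fewest stopped units (proxy only)."""
    return sorted(ALWAYS_ON, key=lambda c: len(stopped_units(c)), reverse=True)
```

With this table `rank_by_saving()` yields the order 3, 2, 1, matching the ordering stated above. The trade-off, which the table also makes visible, is that under condition 3 a voice trigger is unavailable until the gesture has woken the microphones.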

As shown in the side view of FIG. 2, in the present embodiment the housing of the neck-mounted device 100 is designed so that, ideally, the tip surface 12 of the left arm portion 10 (and the tip surface 22 of the right arm portion 20) is vertical when the device is worn. That is, the neck-mounted device 100 is worn so that the left arm portion 10 and the right arm portion 20 hang slightly downward from the back of the neck toward the area in front of the collarbones, where the tip surfaces 12 and 22 are located. In this state, the tip surfaces 12 and 22 are preferably substantially parallel to the vertical direction (within ±10 degrees).

To make the tip surfaces 12 and 22 stand vertical as described above, the tip surfaces 12 and 22 of the arm portions 10 and 20 are inclined relative to the respective lower edges 13 and 23. In FIG. 2, the angle formed by the tip surfaces 12, 22 and the lower edges 13, 23 (the inclination angle of the tip surface) is denoted θ₁. In FIG. 2, the straight line S is a line parallel to the tip surfaces 12 and 22, and the line L is the extension of the lower edges 13 and 23 of the arm portions 10 and 20. The inclination angle θ₁ of the tip surfaces 12 and 22 is an acute angle, preferably 40 to 85 degrees, and particularly preferably 50 to 80 degrees or 60 to 80 degrees. Inclining the tip surfaces 12 and 22 relative to the lower edges 13 and 23 of the arm portions 10 and 20 in this way makes the tip surfaces 12 and 22 tend to be vertical when the device is worn. The imaging unit 60 and the sensor unit 70 provided on the tip surfaces 12 and 22 can therefore efficiently image or sense the area in front of the wearer.

In FIG. 2, the straight line A indicates the optical axis of the imaging unit 60. The optical axis (principal axis) is the axis of symmetry passing through the center of the lens of the imaging unit 60. As shown in FIG. 2, assuming that the tip surface 12 of the left arm portion 10 is vertical when the device is worn, the optical axis A of the imaging unit 60 is preferably substantially horizontal (±10 degrees). Because the optical axis A is then substantially horizontal in the worn state, it is substantially parallel to the wearer's line of sight when the wearer faces forward, and the image captured by the imaging unit 60 is close to the scene the wearer actually sees. More specifically, in FIG. 2 the angle formed by the tip surface 12 of the left arm portion and the optical axis A of the imaging unit 60 is denoted θ₂. This inclination angle θ₂ of the optical axis A is preferably 75 to 115 degrees or 80 to 100 degrees, and particularly preferably 85 to 95 degrees or 90 degrees.

Further, in FIG. 2, the straight line A' indicates another example of the optical axis of the imaging unit 60. As shown in FIG. 2, assuming that the tip surface 12 of the left arm portion 10 is vertical when the device is worn, the optical axis A' of the imaging unit 60 is preferably inclined upward with respect to the horizontal (corresponding to the straight line A in FIG. 2). As described above, the tip surfaces 12 and 22 of the arm portions 10 and 20 are located near the front of the wearer's clavicles when the device is worn, and directing the optical axis A' of the imaging unit 60 upward makes it easier to image the face and mouth of an interlocutor. In addition, tilting the optical axis A' of the imaging unit upward with respect to the horizontal in advance makes it possible to image the space above in the vertical direction without forcing the wearer into an unnatural posture. More specifically, in FIG. 2, the angle formed by the tip surface 12 of the left arm portion and the optical axis A' of the imaging unit 60 (the inclination angle of the optical axis) is denoted by θ₃. The inclination angle θ₃ of the optical axis A' is preferably 30 to 85 degrees, and particularly preferably 40 to 80 degrees or 50 to 80 degrees, so that the axis faces upward when the device is worn.
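The relationship between the angle measured against the tip surface (θ₂ or θ₃) and the resulting upward tilt of the optical axis can be sketched as follows. This is an illustrative calculation only: the function name and the `tilt_back_deg` parameter are hypothetical, and the sketch assumes the angle is measured from the upward direction along the tip surface toward the front of the wearer, which is consistent with 90 degrees being horizontal and smaller angles pointing upward.

```python
def optical_axis_elevation(theta_deg: float, tilt_back_deg: float = 0.0) -> float:
    """Elevation of the optical axis above horizontal, in degrees.

    theta_deg:     angle between the tip surface and the optical axis
                   (theta_2 or theta_3 in the text).
    tilt_back_deg: assumed deviation of the tip surface from vertical,
                   positive when its upper edge leans back toward the wearer.
    """
    # With a vertical tip surface, 90 degrees means a horizontal axis,
    # and every degree below 90 raises the axis by one degree.
    return 90.0 - theta_deg + tilt_back_deg


if __name__ == "__main__":
    for theta in (90.0, 85.0, 50.0, 30.0):
        print(theta, "->", optical_axis_elevation(theta), "degrees above horizontal")
```

Under these assumptions, the stated range of 30 to 85 degrees for θ₃ corresponds to an optical axis raised by 5 to 60 degrees when the tip surface is exactly vertical.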

Further, as shown in FIG. 2, the extension lines of the lower edges 13, 23 and the upper edges 14, 24 of the arm portions 10 and 20 both point downward, toward the ground. For this reason, an interlocutor facing the wearer is less likely to get the impression that his or her face is being photographed by the imaging unit 60 provided on the tip surface 12 of the left arm portion 10. In this way, even when the imaging unit 60 images the face and mouth of the interlocutor, the interlocutor is less likely to feel uncomfortable. On the other hand, as described above, the present embodiment is designed so that the tip surface 12 of the left arm portion 10 stands substantially vertically when the device is worn and the optical axis of the imaging unit 60 arranged on the tip surface 12 faces upward. Therefore, although the interlocutor is unlikely to get the impression of being photographed, the imaging unit 60 can in fact image the interlocutor's face and mouth effectively.

Further, as a structural feature of the neck-mounted device 100, the left arm portion 10 and the right arm portion 20 have flexible portions 11 and 21 near their connections to the central integrated portion 30. The flexible portions 11 and 21 are made of a flexible material such as rubber or silicone. Therefore, when the neck-mounted device 100 is worn, the left arm portion 10 and the right arm portion 20 fit easily around the wearer's neck and over the shoulders. The wiring that connects the sound collecting units 41 to 45 and the operation unit 50 to the control unit 80 also passes through the flexible portions 11 and 21.

Further, the central integrated portion 30 has a hanging portion 31 that extends further downward than the left arm portion 10 and the right arm portion 20. Providing the hanging portion 31 in the central integrated portion 30 secures space for housing the control system circuits. The control system circuits are concentrated in the central integrated portion 30. For this reason, taking the total weight of the neck-mounted device 100 as 100%, the central integrated portion 30 accounts for 40 to 80%, or 50 to 70%, of the weight. Placing this heavy central integrated portion 30 at the back of the wearer's neck improves stability when the device is worn. In addition, placing the heavy central integrated portion 30 close to the wearer's trunk reduces the load that the weight of the whole device places on the wearer.

FIG. 3 schematically shows the cross-sectional shapes of the left arm portion 10 and the right arm portion 20 at the locations where the sound collecting units 41 to 45 are provided. As shown in FIG. 3, in a preferred embodiment, the left arm portion 10 and the right arm portion 20 have a substantially rhombic cross section at the locations where the sound collecting units 41 to 45 are provided. The left arm portion 10 and the right arm portion 20 have inclined surfaces 10a and 20a, respectively, that face the wearer's head (more specifically, the wearer's mouth). That is, a line normal to each of the inclined surfaces 10a and 20a points toward the wearer's head. The sound collecting units 41 to 45 are provided on the inclined surfaces 10a and 20a of the left arm portion 10 and the right arm portion 20. Arranging the sound collecting units 41 to 45 on the inclined surfaces 10a and 20a in this way makes it easier for the voice emitted from the wearer's mouth to reach each of the sound collecting units 41 to 45 in a straight line. Further, as shown in FIG. 3, wind noise and similar sounds generated around the wearer are less likely to enter the sound collecting units 41 to 45 directly, so such noise can be physically suppressed. In the example shown in FIG. 3, the left arm portion 10 and the right arm portion 20 have a rhombic cross section, but the cross section is not limited to this and may be triangular, pentagonal, or another polygonal shape, as long as it has inclined surfaces 10a and 20a facing the wearer's head.

FIG. 4 is a block diagram showing the functional configuration of the neck-mounted device 100. As shown in FIG. 4, the neck-mounted device 100 has a first sound collecting unit 41 through a fifth sound collecting unit 45, an operation unit 50, an imaging unit 60, a sensor unit 70, a control unit 80, a storage unit 81, and a communication unit 82. The first sound collecting unit 41, the second sound collecting unit 42, the fifth sound collecting unit 45, the operation unit 50, and the imaging unit 60 are arranged in the left arm portion 10; the third sound collecting unit 43, the fourth sound collecting unit 44, and the sensor unit 70 are arranged in the right arm portion 20; and the control unit 80, the storage unit 81, and the communication unit 82 are arranged in the central integrated portion 30. In addition to the functional configuration shown in FIG. 4, the neck-mounted device 100 can be equipped as appropriate with modules found in general portable information terminals, such as a sound emitting unit (speaker) and sensors such as a gyro sensor, an acceleration sensor, or a GPS sensor.

Known microphones such as dynamic microphones or condenser microphones may be used as the sound collecting units 41 to 45. Each of the sound collecting units 41 to 45 converts sound into an electrical signal, amplifies the electrical signal with an amplifier circuit, converts it into digital information with an A/D conversion circuit, and outputs it to the control unit 80. One purpose of the neck-mounted device 100 of the present invention is to acquire not only the voice of the wearer but also the voices of one or more interlocutors present around the wearer. Therefore, omnidirectional (non-directional) microphones are preferably used as the sound collecting units 41 to 45 so that sound generated around the wearer can be collected widely.

The operation unit 50 accepts operation input from the wearer. A known switch circuit, touch panel, or the like can be used as the operation unit 50. The operation unit 50 accepts, for example, an operation instructing the start or stop of voice input, an operation instructing power ON or OFF of the device, an operation instructing the speaker volume to be raised or lowered, and other operations needed to realize the functions of the neck-mounted device 100. Information input through the operation unit 50 is transmitted to the control unit 80.

The imaging unit 60 acquires image data of still images or moving images. A general digital camera may be used as the imaging unit 60. The imaging unit 60 is composed of, for example, a photographing lens, a mechanical shutter, a shutter driver, a photoelectric conversion element such as a CCD image sensor unit, a digital signal processor (DSP) that reads out the amount of charge from the photoelectric conversion element and generates image data, and an IC memory. The imaging unit 60 preferably also includes an autofocus sensor (AF sensor) that measures the distance from the photographing lens to the subject, and a mechanism for adjusting the focal length of the photographing lens according to the distance detected by the AF sensor. The type of AF sensor is not particularly limited, and a known passive sensor such as a phase difference sensor or a contrast sensor may be used. An active sensor that directs infrared light or ultrasonic waves at the subject and receives the reflected light or reflected waves can also be used as the AF sensor. The image data acquired by the imaging unit 60 is supplied to the control unit 80 and stored in the storage unit 81, where it undergoes predetermined image analysis processing, or it is transmitted to a server device over the Internet via the communication unit 82.

Further, the imaging unit 60 preferably includes a so-called wide-angle lens. Specifically, the vertical angle of view of the imaging unit 60 is preferably 100 to 180 degrees, and particularly preferably 110 to 160 degrees or 120 to 150 degrees. Making the vertical angle of view of the imaging unit 60 wide in this way allows at least the region from the interlocutor's head to chest to be imaged, and in some cases the interlocutor's whole body. The horizontal angle of view of the imaging unit 60 is not particularly limited, but a wide angle of about 100 to 160 degrees is preferable.
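As a rough illustration of why a wide vertical angle of view matters for a camera worn at clavicle height, the following sketch checks whether a point on the interlocutor falls inside the vertical field of view. The function name, the pinhole-style geometry, and the example distances are assumptions made for illustration and do not come from the specification; lens distortion is ignored.

```python
import math


def in_vertical_view(height_offset_m, distance_m, fov_deg, axis_elevation_deg=0.0):
    """Return True if a point lies inside the camera's vertical angle of view.

    height_offset_m:    height of the point relative to the lens (positive = above).
    distance_m:         horizontal distance from the lens to the point.
    fov_deg:            full vertical angle of view of the imaging unit.
    axis_elevation_deg: upward tilt of the optical axis from horizontal.
    """
    # Elevation angle of the target as seen from the lens.
    target_deg = math.degrees(math.atan2(height_offset_m, distance_m))
    # The point is visible when it is within half the angle of view of the axis.
    return abs(target_deg - axis_elevation_deg) <= fov_deg / 2.0


if __name__ == "__main__":
    # Hypothetical interlocutor standing 1 m away: head 0.3 m above the lens,
    # feet 1.4 m below it.
    print(in_vertical_view(0.3, 1.0, fov_deg=120))   # head
    print(in_vertical_view(-1.4, 1.0, fov_deg=120))  # feet
```

In this simplified model, a point 1.4 m below the lens at a distance of 1 m sits about 54 degrees below horizontal, so it is covered by a 120-degree vertical angle of view with a horizontal axis but not by a 100-degree one.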

Further, since the imaging unit 60 generally consumes a large amount of power, it is preferably activated only when needed and kept in a sleep state otherwise. Specifically, activation of the imaging unit 60 and the start or stop of image capture are controlled based on the detection information from the sensor unit 70, and the imaging unit 60 may be returned to the sleep state once a certain time has elapsed after image capture stops.

The sensor unit 70 is a non-contact detection device for detecting the movement of an object such as the wearer's fingers. Examples of the sensor unit 70 are a proximity sensor and a gesture sensor. A proximity sensor detects, for example, that the wearer's fingers have come within a predetermined range. A known type such as an optical, ultrasonic, magnetic, capacitive, or heat-sensing proximity sensor can be used. A gesture sensor detects, for example, the motion or shape of the wearer's fingers. An example of a gesture sensor is an optical sensor that irradiates the target with light from an infrared LED and detects the motion or shape of the target by capturing changes in the reflected light with a light receiving element. The detection information from the sensor unit 70 is transmitted to the control unit 80 and is used mainly to control the imaging unit 60. The sound collecting units 41 to 45 can also be controlled based on the detection information from the sensor unit 70. Since the sensor unit 70 generally consumes little power, it is preferably kept active at all times while the power of the neck-mounted device 100 is ON.

The control unit 80 performs arithmetic processing that controls the other elements of the neck-mounted device 100. A processor such as a CPU can be used as the control unit 80. The control unit 80 basically reads a program stored in the storage unit 81 and executes predetermined arithmetic processing according to this program. The control unit 80 can also write the results of computation under the program to the storage unit 81 and read them back as appropriate. As described in detail later, the control unit 80 has a voice analysis unit 80a, a voice processing unit 80b, an input analysis unit 80c, an imaging control unit 80d, and an image analysis unit 80e, which mainly perform the control processing of the imaging unit 60 and the beamforming processing. These elements 80a to 80e are basically realized as software functions. However, they may instead be realized as hardware circuits.

The storage unit 81 is an element for storing information used in the arithmetic processing of the control unit 80 and the results of that processing. Specifically, the storage unit 81 stores a program that causes a general-purpose portable information communication terminal to function as the voice input device according to the present invention. When this program is started by an instruction from the user, the control unit 80 executes processing according to the program. The storage function of the storage unit 81 can be realized by non-volatile storage such as an HDD or an SSD. The storage unit 81 may also function as a memory for writing or reading intermediate results of the arithmetic processing by the control unit 80. The memory function of the storage unit 81 can be realized by volatile memory such as RAM or DRAM. The storage unit 81 may also store ID information unique to the user who possesses the device. The storage unit 81 may also store an IP address, which is the identification information of the neck-mounted device 100 on the network.

Further, the storage unit 81 may store a trained model used in the beamforming processing by the control unit 80. The trained model is an inference model obtained by performing machine learning such as deep learning or reinforcement learning, for example on a server device in the cloud. Specifically, in the beamforming processing, sound data acquired by a plurality of sound collecting units is analyzed to identify the position or direction of the sound source that generated the sound. For this purpose, for example, a large number of data sets (training data), each pairing the position information of a sound source with the data obtained when sound from that source was acquired by a plurality of sound collecting units, are accumulated in the server device, and machine learning is performed with this training data to create a trained model in advance. Then, when an individual neck-mounted device 100 acquires sound data with its plurality of sound collecting units, it can identify the position or direction of the sound source efficiently by referring to this trained model. The neck-mounted device 100 can also update the trained model at any time by communicating with the server device.

The communication unit 82 is an element for wireless communication with a server device in the cloud or with another neck-mounted device. To communicate with a server device or another neck-mounted device over the Internet, the communication unit 82 may use a communication module for wireless communication under a known mobile communication standard such as 3G (W-CDMA), 4G (LTE/LTE-Advanced), or 5G, or under a wireless LAN scheme such as Wi-Fi (registered trademark). To communicate directly with another neck-mounted device, the communication unit 82 can also use a communication module for short-range wireless communication such as Bluetooth (registered trademark) or NFC.

Next, the beamforming processing is described in detail with reference to FIG. 5. When the user wears the neck-mounted device 100 of the embodiment shown in FIG. 1, at least four sound collecting units 41 to 44 are located on the chest side of the wearer's neck, as shown in FIGS. 5(a) and 5(b). The fifth sound collecting unit 45 collects sound in an auxiliary manner and is not an essential element, so its description is omitted here. In the present embodiment, the first sound collecting unit 41 through the fourth sound collecting unit 44 are all omnidirectional microphones, and they constantly collect mainly the voice emitted from the wearer's mouth as well as other environmental sounds around the wearer. To reduce power consumption, the sound collecting units 41 to 44 and the control unit 80 may be kept stopped and started when the sensor unit 70 detects a specific gesture or the like. The environmental sounds include the voices of interlocutors located around the wearer. When the wearer and/or an interlocutor speaks, voice data is acquired by each of the sound collecting units 41 to 44. Each of the sound collecting units 41 to 44 outputs its voice data to the control unit 80.

The voice analysis unit 80a of the control unit 80 analyzes the voice data acquired by the sound collecting units 41 to 44. Specifically, the voice analysis unit 80a identifies the position or direction in space of the sound source that emitted the voice, based on the voice data from the sound collecting units 41 to 44. For example, when a trained model obtained by machine learning is installed in the neck-mounted device 100, the voice analysis unit 80a can identify the position or direction of the sound source from the voice data of the sound collecting units 41 to 44 by referring to that trained model. Alternatively, since the distances between the sound collecting units 41 to 44 are known, the voice analysis unit 80a may determine the distance from each of the sound collecting units 41 to 44 to the sound source based on the differences in the time at which the voice reached each unit, and identify the spatial position or direction of the sound source from those distances by triangulation.
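The idea of locating a source from arrival-time differences can be illustrated with a simplified two-microphone example. Note that this sketch uses a far-field approximation (a plane wave arriving at two microphones) rather than the distance-based triangulation described above, and the function names, the brute-force cross-correlation, and the speed-of-sound constant are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def estimate_lag(a, b, max_lag):
    """Return the lag (in samples) by which signal b trails signal a.

    A brute-force cross-correlation: the lag with the highest score wins.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delta_t_s, mic_spacing_m, c=SPEED_OF_SOUND_M_S):
    """Far-field arrival angle relative to the broadside of a two-mic pair."""
    ratio = c * delta_t_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding and noise
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    mic_a = [0, 0, 1, 0, 0, 0]
    mic_b = [0, 0, 0, 0, 1, 0]          # same impulse, two samples later
    lag = estimate_lag(mic_a, mic_b, max_lag=3)
    delta_t = lag / 48_000              # hypothetical 48 kHz sample rate
    print(lag, arrival_angle_deg(delta_t, mic_spacing_m=0.1))
```

With four microphones at known positions, several such pairwise estimates could be combined to resolve a direction in three dimensions; that step is omitted here.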

Further, the voice analysis unit 80a determines whether the position or direction of the sound source identified by the above processing matches the position or direction estimated to be the mouth of the wearer or the mouth of an interlocutor. For example, the positional relationship between the neck-mounted device 100 and the wearer's mouth, and between the neck-mounted device 100 and an interlocutor's mouth, can be assumed in advance, so when the sound source is located within the assumed range it may be judged to be the mouth of the wearer or the interlocutor. Conversely, when the sound source is located significantly below, above, or behind the neck-mounted device 100, it can be judged not to be the mouth of the wearer or an interlocutor.
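The range check described above can be sketched as a simple classifier over an estimated source direction. Every threshold below is a hypothetical placeholder, since the specification gives no numeric ranges, and the coordinate convention (azimuth 0 straight ahead of the wearer, elevation 90 straight up from the device) is likewise an assumption.

```python
def classify_source(azimuth_deg, elevation_deg):
    """Label a sound source as 'wearer', 'interlocutor', or 'noise'.

    azimuth_deg:   horizontal angle, 0 = straight ahead of the wearer,
                   +/-180 = directly behind.
    elevation_deg: vertical angle seen from the device, 90 = straight up.
    All numeric ranges are illustrative assumptions.
    """
    # The wearer's mouth sits almost directly above a device worn at the neck.
    if elevation_deg >= 60.0:
        return "wearer"
    # An interlocutor is assumed to stand in front, at roughly head height.
    if abs(azimuth_deg) <= 60.0 and -10.0 <= elevation_deg <= 45.0:
        return "interlocutor"
    # Sources far below, behind, or off to the side are treated as noise.
    return "noise"


if __name__ == "__main__":
    print(classify_source(0.0, 80.0))    # wearer
    print(classify_source(10.0, 20.0))   # interlocutor
    print(classify_source(170.0, 10.0))  # noise
```

In practice such ranges would have to be tuned to the actual housing geometry and could be replaced by the trained model mentioned earlier.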

Next, the voice processing unit 80b of the control unit 80 emphasizes or suppresses sound components contained in the voice data based on the position or direction of the sound source identified by the voice analysis unit 80a. Specifically, when the position or direction of the sound source matches the position or direction estimated to be the mouth of the wearer or an interlocutor, the sound component emitted from that source is emphasized. On the other hand, when the position or direction of the sound source does not match the mouth of the wearer or an interlocutor, the sound component emitted from that source is regarded as noise and may be suppressed. In this way, the present invention acquires sound data from all directions using a plurality of omnidirectional microphones and performs beamforming processing in which specific sound components are emphasized or suppressed by voice processing in the software of the control unit 80. This makes it possible to acquire the wearer's voice and the interlocutor's voice at the same time and to emphasize the sound components of those voices as needed.
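One classic way to emphasize sound from a chosen direction in software is delay-and-sum beamforming. The specification does not state which beamforming algorithm is used, so the following is only a minimal sketch of that general technique, with integer sample delays and hypothetical names.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its delay and average the result.

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-channel delay in samples for the target direction, i.e. how
              much later the target sound arrives at that microphone.
    Sound from the target direction adds coherently; other sound does not.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d  # read ahead to undo this channel's arrival delay
            if 0 <= idx < len(ch):
                acc += ch[idx]
        out.append(acc / len(channels))
    return out


if __name__ == "__main__":
    mic_a = [0, 0, 1, 0, 0, 0]
    mic_b = [0, 0, 0, 0, 1, 0]  # same impulse arriving two samples later
    print(delay_and_sum([mic_a, mic_b], [0, 2]))  # steered at the source
    print(delay_and_sum([mic_a, mic_b], [0, 0]))  # not steered
```

When the delays match the source, the impulse is reconstructed at full amplitude; with mismatched delays it is smeared into two half-amplitude copies, which is the suppression effect in its simplest form.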

Further, as shown in FIG. 5(b), when acquiring the voice of an interlocutor, it is preferable to activate the imaging unit 60 and image the interlocutor. Specifically, the wearer makes a predetermined gesture with his or her fingers within the detection range of the non-contact sensor unit 70. Gestures include performing a predetermined motion with the fingers and forming a predetermined shape with the fingers. When the sensor unit 70 detects the finger motion, the input analysis unit 80c of the control unit 80 analyzes the detection information from the sensor unit 70 and determines whether the wearer's finger gesture matches a preset gesture. For example, predetermined gestures relating to control of the imaging unit 60 are set in advance, such as a gesture for activating the imaging unit 60, a gesture for starting image capture by the imaging unit 60, and a gesture for stopping image capture, and the input analysis unit 80c determines, based on the detection information from the sensor unit 70, whether the wearer's gesture matches one of these predetermined gestures.

Next, the imaging control unit 80d of the control unit 80 controls the imaging unit 60 based on the analysis result of the input analysis unit 80c. For example, when the input analysis unit 80c determines that the wearer's gesture matches the gesture for activating the imaging unit 60, the imaging control unit 80d activates the imaging unit 60. When, after the imaging unit 60 has been activated, the input analysis unit 80c determines that the wearer's gesture matches the gesture for starting image capture, the imaging control unit 80d controls the imaging unit 60 to start capturing images. Further, when, after image capture has started, the input analysis unit 80c determines that the wearer's gesture matches the gesture for stopping image capture, the imaging control unit 80d controls the imaging unit 60 to stop capturing images. The imaging control unit 80d may return the imaging unit 60 to the sleep state once a certain time has elapsed after image capture stops.
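The activate, start, stop, and sleep sequence above behaves like a small state machine. The following sketch models it under stated assumptions: the class name, the gesture labels, and the 30-second default timeout are hypothetical, and gestures are assumed to have already been recognized by the input analysis step.

```python
from enum import Enum


class CamState(Enum):
    SLEEP = "sleep"
    STANDBY = "standby"      # activated but not capturing
    RECORDING = "recording"


class ImagingController:
    """Toy model of gesture-driven control of the imaging unit."""

    def __init__(self, sleep_timeout_s=30.0):
        self.state = CamState.SLEEP
        self.sleep_timeout_s = sleep_timeout_s  # assumed value
        self._stopped_at = None

    def on_gesture(self, gesture, now):
        """Apply a recognized gesture; gestures invalid in this state are ignored."""
        if gesture == "activate" and self.state is CamState.SLEEP:
            self.state = CamState.STANDBY
        elif gesture == "start" and self.state is CamState.STANDBY:
            self.state = CamState.RECORDING
            self._stopped_at = None
        elif gesture == "stop" and self.state is CamState.RECORDING:
            self.state = CamState.STANDBY
            self._stopped_at = now  # start the sleep countdown

    def tick(self, now):
        """Return to sleep once the timeout has elapsed after a stop."""
        if (
            self.state is CamState.STANDBY
            and self._stopped_at is not None
            and now - self._stopped_at >= self.sleep_timeout_s
        ):
            self.state = CamState.SLEEP
            self._stopped_at = None


if __name__ == "__main__":
    cam = ImagingController(sleep_timeout_s=10.0)
    for t, g in [(0.0, "activate"), (1.0, "start"), (5.0, "stop")]:
        cam.on_gesture(g, now=t)
    cam.tick(now=15.0)
    print(cam.state)
```

Ignoring out-of-order gestures (for example a start gesture while asleep) keeps an accidental hand movement from triggering capture, which is one plausible reading of the staged sequence described in the text.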

The image analysis unit 80e of the control unit 80 analyzes the image data of still images or moving images acquired by the imaging unit 60. For example, by analyzing the image data, the image analysis unit 80e can identify the distance from the neck-mounted device 100 to the interlocutor's mouth and the positional relationship between the two. The image analysis unit 80e can also determine whether the interlocutor is speaking by analyzing, based on the image data, whether the interlocutor's mouth is open or whether it is opening and closing. The analysis results of the image analysis unit 80e are used in the beamforming processing described above. Specifically, using the analysis results of the image data from the imaging unit 60 in addition to the analysis results of the voice data collected by the sound collecting units 41 to 44 improves the accuracy of identifying the position and direction in space of the interlocutor's mouth. Moreover, analyzing the movement of the interlocutor's mouth in the image data and identifying that the interlocutor is speaking improves the accuracy of the processing that emphasizes the voice emitted from the interlocutor's mouth.
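The mouth-movement cue can be illustrated with a very small heuristic that counts open/closed transitions across video frames. The input is assumed to be a per-frame "mouth opening" ratio produced by some face landmark detector that is not shown here; the function name and both thresholds are hypothetical.

```python
def is_speaking(mouth_openings, open_threshold=0.2, min_transitions=2):
    """Guess whether a person is speaking from per-frame mouth opening ratios.

    mouth_openings:  sequence of values where 0.0 means closed and larger
                     values mean a wider opening (assumed detector output).
    open_threshold:  ratio above which the mouth counts as open (assumed).
    min_transitions: minimum number of open/closed changes to count as speech.
    A mouth that stays open without moving is not treated as speech.
    """
    states = [value > open_threshold for value in mouth_openings]
    # Count how often the open/closed state flips between consecutive frames.
    transitions = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return transitions >= min_transitions


if __name__ == "__main__":
    print(is_speaking([0.0, 0.5, 0.1, 0.6, 0.0]))  # mouth opening and closing
    print(is_speaking([0.5, 0.5, 0.5]))            # open but static
```

A result like this could then gate the emphasis step, for example by emphasizing a frontal sound source only while the imaged interlocutor appears to be speaking.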

The voice data processed by the voice processing unit 80b and the image data acquired by the imaging unit 60 are stored in the storage unit 81. The control unit 80 can also transmit the processed voice data and the image data to a server device on the cloud or to another neck-mounted device 100 via the communication unit 82. Based on the voice data received from the neck-mounted device 100, the server device can perform speech-to-text conversion, translation, statistical processing, and any other language processing. The image data acquired by the imaging unit 60 can also be used to improve the accuracy of this language processing. The server device can further improve the accuracy of a trained model by using the voice data and image data received from the neck-mounted device 100 as training data for machine learning. Wearers may also hold a remote call by exchanging voice data between neck-mounted devices 100. In that case, the neck-mounted devices 100 may exchange voice data directly via proximity wireless communication, or may exchange it over the Internet via the server device.

This specification has mainly described an embodiment in which the neck-mounted device 100 includes the voice analysis unit 80a, the voice processing unit 80b, and the image analysis unit 80e as functional components and executes the beamforming process locally. However, any or all of the functions of the voice analysis unit 80a, the voice processing unit 80b, and the image analysis unit 80e can be assigned to a server device on the cloud that is connected to the neck-mounted device 100 via the Internet. In this case, for example, the neck-mounted device 100 may transmit the voice data acquired by the sound collecting units 41 to 45 to the server device, and the server device may identify the position or direction of the sound source, or perform voice processing that emphasizes the voice of the wearer or the interlocutor and suppresses other noise. The image data acquired by the imaging unit 60 may likewise be transmitted from the neck-mounted device 100 to the server device and analyzed there. In this case, the neck-mounted device 100 and the server device together constitute a voice processing system.

It is also possible to control the shooting method of the imaging unit 60 based on the detection information from the sensor unit 70. Shooting methods of the imaging unit 60 include, for example, still image shooting, video shooting, slow-motion shooting, panoramic shooting, time-lapse shooting, and timer shooting. When the sensor unit 70 detects a movement of the fingers, the input analysis unit 80c of the control unit 80 analyzes the detection information of the sensor unit 70 and determines whether the wearer's finger gesture matches a preset gesture. For example, a unique gesture is set for each shooting method of the imaging unit 60, and the input analysis unit 80c determines, based on the detection information of the sensor unit 70, whether the wearer's gesture matches one of these preset gestures. The imaging control unit 80d then controls the shooting method of the imaging unit 60 based on the analysis result of the input analysis unit 80c. For example, when the input analysis unit 80c determines that the wearer's gesture matches the gesture for still image shooting, the imaging control unit 80d controls the imaging unit 60 to shoot a still image. Alternatively, when the input analysis unit 80c determines that the wearer's gesture matches the gesture for video shooting, the imaging control unit 80d controls the imaging unit 60 to shoot a video. In this way, the shooting method of the imaging unit 60 can be specified by the wearer's gesture.

In the embodiment described above, the detection information from the sensor unit 70 is mainly used to control the imaging unit 60, but it can also be used to control the sound collecting units 41 to 45. For example, unique gestures for starting or stopping sound collection by the sound collecting units 41 to 45 are set in advance, and the input analysis unit 80c determines, based on the detection information of the sensor unit 70, whether the wearer's gesture matches one of these preset gestures. When a gesture for starting or stopping sound collection is detected, the sound collecting units 41 to 45 start or stop collecting sound accordingly.

In the embodiment described above, the imaging unit 60 is controlled mainly based on the detection information from the sensor unit 70, but it can also be controlled based on voice information input to the sound collecting units 41 to 45. Specifically, the voice analysis unit 80a analyzes the voice acquired by the sound collecting units 41 to 45. That is, it performs speech recognition on the voice of the wearer or the interlocutor and determines whether that voice relates to control of the imaging unit 60. The imaging control unit 80d then controls the imaging unit 60 based on the result of this voice analysis. For example, when a predetermined voice input for starting shooting is received by the sound collecting units 41 to 45, the imaging control unit 80d activates the imaging unit 60 and starts shooting. When a predetermined voice input specifying a shooting method of the imaging unit 60 is received by the sound collecting units 41 to 45, the imaging control unit 80d controls the imaging unit 60 to execute the specified shooting method. It is also possible to activate the sound collecting units 41 to 45 based on the detection information from the sensor unit 70 and then control the imaging unit 60 based on the voice information input to them.
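The step of deciding whether recognized speech "relates to control of the imaging unit 60" could, in the simplest case, be a phrase lookup on the speech recognition output. The sketch below assumes the recognizer has already produced a text transcript; the phrases, the command tuples, and the function name `parse_voice_command` are all hypothetical, since the specification only refers to "predetermined" voice inputs.

```python
# Hypothetical phrases mapped to (action, shooting method) pairs.
VOICE_COMMANDS = {
    "start shooting": ("start", None),
    "stop shooting": ("stop", None),
    "take a photo": ("shoot", "still"),
    "record video": ("shoot", "video"),
    "panorama": ("shoot", "panorama"),
}


def parse_voice_command(transcript):
    """Return the imaging command contained in a transcript, or None.

    None means the utterance is not related to control of the imaging unit,
    so it would be handled as ordinary conversation.
    """
    # Normalize case and collapse repeated whitespace before matching.
    text = " ".join(transcript.lower().split())
    for phrase, command in VOICE_COMMANDS.items():
        if phrase in text:
            return command
    return None
```

For example, `parse_voice_command("Please start shooting now")` yields the start command, while an utterance containing none of the phrases yields `None` and leaves the imaging unit untouched.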

It is also possible to change the content of the control command based on the input information of the sensor unit 70 according to the image captured by the imaging unit 60. Specifically, the image analysis unit 80e first analyzes the image acquired by the imaging unit 60. For example, based on feature points included in the image, the image analysis unit 80e identifies whether the image shows a person, whether it shows a specific subject (such as an artificial or natural object), or the circumstances in which the image was captured (such as shooting location, shooting time, and weather). A person included in the image may be classified by gender or age, or may be identified as an individual.

Next, patterns of control commands based on finger gestures are stored in the storage unit 81 for each type of image (type of person, subject, or situation). The same gesture may correspond to different control commands depending on the type of image. Specifically, one and the same gesture may serve as a control command to focus on a person's face when a person appears in the image, and as a control command to take a panoramic shot of the surroundings of a natural object when a distinctive natural object appears in the image. The meaning of a gesture can also be varied by detecting from the image the gender or age of the person shown, whether the subject is an artificial or natural object, or the shooting location, time, weather, and so on. The input analysis unit 80c then refers to the image analysis result of the image analysis unit 80e, identifies the meaning that corresponds to that result for the gesture detected by the sensor unit 70, and generates the control command to be input to the neck-mounted device 100. By changing the meaning of a gesture according to the content of the image in this way, a wide variety of control commands can be input to the device by gesture, depending on the shooting situation and purpose.
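The context-dependent lookup described above, in which one gesture resolves to different control commands depending on the image type, can be sketched as a table keyed by the pair of gesture and image category. This is a minimal illustration, not the stored pattern format of the storage unit 81: the gesture label `"swipe"`, the category and command names, and the fallback table `DEFAULT_COMMANDS` are assumptions introduced for the example.

```python
# (gesture, image category) -> control command. All labels are hypothetical.
COMMAND_TABLE = {
    ("swipe", "person"): "focus_on_face",
    ("swipe", "natural_object"): "panorama",
}

# Assumed fallback when no context-specific entry exists for a gesture.
DEFAULT_COMMANDS = {"swipe": "shoot_still"}


def resolve_command(gesture, image_category):
    """Pick the control command for a gesture given the image analysis result.

    gesture: label produced from the sensor unit's detection information.
    image_category: label produced by the image analysis (person, object, ...).
    Returns None if the gesture is unknown.
    """
    command = COMMAND_TABLE.get((gesture, image_category))
    if command is None:
        command = DEFAULT_COMMANDS.get(gesture)
    return command
```

With this table, the same `"swipe"` gesture resolves to `"focus_on_face"` when a person is in the frame and to `"panorama"` when a natural object is in the frame, matching the two cases given in the text. Finer distinctions such as age, location, or weather could be added by widening the key.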

The embodiments of the present invention have been described above with reference to the drawings in order to express the content of the invention. However, the present invention is not limited to the above embodiments, and encompasses modifications and improvements that are obvious to those skilled in the art based on the matters described in this specification.

10 ... Left arm part
11 ... Flexible part
12 ... Tip surface
13 ... Lower edge
14 ... Upper edge
20 ... Right arm part
21 ... Flexible part
22 ... Tip surface
23 ... Lower edge
24 ... Upper edge
30 ... Central accumulation part
31 ... Drooping part
41 ... First sound collecting unit
42 ... Second sound collecting unit
43 ... Third sound collecting unit
44 ... Fourth sound collecting unit
45 ... Fifth sound collecting unit
50 ... Operation unit
60 ... Imaging unit
70 ... Sensor unit
80 ... Control unit
80a ... Voice analysis unit
80b ... Voice processing unit
80c ... Input analysis unit
80d ... Imaging control unit
80e ... Image analysis unit
81 ... Storage unit
82 ... Communication unit
100 ... Neck-mounted device

Claims

A neck-mounted device worn around a user's neck, comprising:
a first arm part and a second arm part that can be arranged at positions across the neck;
an imaging unit provided on the first arm part; and
a non-contact sensor unit provided on the second arm part for receiving input of information relating to control of the imaging unit,
wherein the content of a control command based on the input information of the sensor unit changes according to an image captured by the imaging unit.