WO2024111475A1 - Information processing apparatus, information processing method, and information processing program - Google Patents

Information processing apparatus, information processing method, and information processing program

Info

Publication number
WO2024111475A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
information processing
distance value
face
dimensional coordinate
Prior art date
Application number
PCT/JP2023/041042
Other languages
French (fr)
Japanese (ja)
Inventor
Tsutomu Ichinose (一ノ瀬 勉)
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2024111475A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S15/00 Systems using the reflection or reradiation of acoustic waves, e.g. sonar systems
    • G01S15/86 Combinations of sonar systems with lidar systems; Combinations of sonar systems with systems not using wave reflection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/302 Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays
    • H04N13/305 Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays using lenticular lenses, e.g. arrangements of cylindrical lenses
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/366 Image reproducers using viewer tracking

Definitions

  • This disclosure relates to an information processing device, an information processing method, and an information processing program.
  • A representative display for naked-eye stereoscopic viewing is the light field display, typified by the lenticular method.
  • When a stereoscopic image is displayed, the viewpoint positions of the user's right and left eyes are detected, and light rays are focused at those viewpoint positions to generate an image for the right eye and an image for the left eye.
  • As a technique for detecting the viewpoint position, a method has been proposed that detects feature points in an image including the user's face and tracks the viewpoint position based on those feature points.
  • A known method calculates a distance value based on the distance between the user's right and left eyes (the interocular distance) taken as feature points.
  • However, this method is affected by individual differences in interocular distance, so errors may occur in the distance value.
  • Methods have therefore been proposed to eliminate errors in distance values caused by individual differences.
  • Nevertheless, the conventional technology above has room for improvement in appropriately correcting errors in distance values.
  • Errors in distance values can also occur when camera images are affected by disturbances such as lighting; the conventional technology does not take such disturbances into account, and therefore cannot be said to correct errors in distance values appropriately.
  • Consequently, the conventional technology does not necessarily enable appropriate tracking of the viewpoint position.
  • This disclosure therefore proposes an information processing device, an information processing method, and an information processing program that can achieve appropriate tracking of the viewpoint position.
  • To solve the above problem, an information processing device according to one embodiment of the present disclosure includes: an imaging unit that captures an image of a user to obtain a captured image; an ultrasonic device installed near the imaging unit to detect the user; and an information processing unit that acquires three-dimensional coordinate information of the user's face shown in the captured image, determines whether the user's face has been detected continuously for a predetermined time based on the three-dimensional coordinate information, and, based on a determination that the user's face has been detected continuously for the predetermined time, corrects a first distance value indicated by the three-dimensional coordinate information based on a second distance value derived from a signal acquired from the ultrasonic device.
  • Fig. 1 is a diagram showing an example of the appearance of a stereoscopic display device according to the embodiment.
  • Fig. 2 is a block diagram showing an example of the system configuration of the stereoscopic display device according to the embodiment.
  • Fig. 3 is a block diagram showing an example of the configuration of an information processing device according to the embodiment.
  • Fig. 4 is a diagram showing an outline of distance measurement by face detection.
  • Fig. 5 is a diagram showing the relationship between the directivity of a camera and the directivity of an ultrasonic device.
  • Fig. 6 is a diagram illustrating a specific example of correction determination processing based on a face frame.
  • Fig. 7 is an explanatory diagram illustrating a response delay.
  • Fig. 8 is a flowchart showing the processing procedure of the information processing device according to the embodiment.
  • Fig. 9 is a diagram showing the relationship between panel temperature and ambient temperature.
  • Fig. 10 is a flowchart showing the calculation procedure for a distance value Zu.
  • Fig. 11 is a block diagram showing an example of the hardware configuration of a computer corresponding to the information processing device according to the embodiment.
  • The information processing related to the technology proposed in this disclosure corrects both the distance-value error caused by individual differences in interocular distance and the distance-value error caused by disturbances such as lighting. When the face frame obtained by detecting the user's face in an image captured by a camera lies in the central area of the image, the difference between the distance measured by a separate ultrasonic device, which is less susceptible to disturbances such as lighting, and the distance estimated from face detection is used as a scale factor.
  • The following describes this information processing in detail.
  • [Example of the appearance of a stereoscopic display device] Fig. 1 is a diagram showing an example of the appearance of the stereoscopic display device 1 according to the embodiment.
  • The stereoscopic display device 1 is, for example, about the same size as a notebook personal computer, but it can also be made smaller or larger.
  • The stereoscopic display device 1 corresponds to a spatial reproduction display device that can provide a stereoscopic experience to the naked eye without an attached tool such as a wearable device equipped with liquid crystal shutters.
  • The stereoscopic display device 1 has a base 2 and a display 3 standing upright from the base 2.
  • The stereoscopic display device 1 has a camera 4 above the display 3, positioned so that the camera 4 can capture an image of a user located in front of the display 3.
  • The stereoscopic display device 1 also has an ultrasonic device 5 that uses ultrasonic waves to measure the distance to the user.
  • The stereoscopic display device 1 can display, for example, lenticular-method stereoscopic images on the display 3.
  • In outline, the stereoscopic display device 1 detects the viewpoint position of a naked-eye user, who wears no dedicated stereoscopic viewing device, using images captured by the camera 4.
  • The stereoscopic display device 1 then generates images for the right and left eyes (parallax images) from light rays focused at the viewpoint positions of the respective eyes, and displays the generated images on the display 3, which is equipped with a lenticular lens.
  • As a result, the user can view stereoscopic images without using a wearable device with liquid crystal shutters, a head-mounted display (HMD), or the like.
  • In the example of Fig. 1, the camera 4 is integrated with the stereoscopic display device 1, but a configuration in which the camera 4 is externally attached to the stereoscopic display device 1 may also be adopted.
  • In that case, the camera 4 is preferably installed near the display 3 so that its view includes a user watching the display 3, as shown in Fig. 4.
  • Specifically, the camera 4 may be installed at any position from which the user can be imaged, that is, a position at which the user falls within the captured image.
  • The camera 4 may be a high-speed camera capable of high-speed imaging.
  • Likewise, the ultrasonic device 5 is integrated into the stereoscopic display device 1 in Fig. 1, but a configuration in which it is externally attached may also be adopted.
  • In that case, the ultrasonic device 5 is preferably installed near the camera 4 so as to detect the user.
  • Specifically, the ultrasonic device 5 may be installed directly above the camera 4, as shown in Fig. 4.
  • In other words, the ultrasonic device 5 is installed in a positional relationship such that the distance between the ultrasonic device 5 and the user equals the distance between the camera 4 and the user.
  • The ultrasonic device 5 is less responsive than the camera 4. Specifically, because the ultrasonic device 5 measures distance with ultrasonic waves, which travel far more slowly than light, its response speed is slower than that of the camera 4, which exploits the characteristics of light. On the other hand, the ultrasonic device 5 does not rely on the distance between feature points detected in the captured image (e.g., the interocular distance); it calculates distance values from the characteristics of sound, so the distance values it produces contain little error from individual differences or external disturbances and can be regarded as highly accurate. In other words, the sound-derived distance values obtained by the ultrasonic device 5 are more accurate than the feature-point-derived distance values obtained via the camera 4.
  • [Example of the configuration of the stereoscopic display device] Fig. 2 is a block diagram showing an example of the system configuration of the stereoscopic display device 1 according to the embodiment.
  • The stereoscopic display device 1 generally includes an information processing device 100 and a parallax image processing unit 20.
  • The information processing device 100 outputs information indicating the user's viewpoint position, for example the three-dimensional coordinates of the viewpoint position, to the subsequent parallax image processing unit 20. The configuration and operation of the information processing device 100 are detailed later.
  • The parallax image processing unit 20 has a spatial viewpoint coordinate generation unit 21, a parallax image generation unit 22, and a parallax image display unit 23.
  • The spatial viewpoint coordinate generation unit 21 converts the three-dimensional coordinates indicating the viewpoint position output from the information processing device 100 into viewpoint coordinates at a spatial position by applying a known method. More specifically, the viewpoint coordinates are obtained by converting the camera coordinates corresponding to the camera 4 into rendering coordinates with a known coordinate-system conversion method.
  • The known conversion method may include a translation of the coordinate system, an optical-axis correction, conversion to a world coordinate system, and so on, as sketched below.
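  • As a rough illustration of such a conversion, the following Python sketch applies an optical-axis rotation followed by a translation into a display-centered world frame. The matrices, offsets, and units are hypothetical values chosen for the example, not values taken from this disclosure.

```python
import numpy as np

def camera_to_world(p_cam: np.ndarray, r_axis: np.ndarray,
                    t_offset: np.ndarray) -> np.ndarray:
    """Convert a viewpoint position from camera coordinates to a
    display-centered world (rendering) frame: optical-axis rotation
    followed by a translation of the coordinate system."""
    return r_axis @ p_cam + t_offset

# Hypothetical setup: camera 20 mm above the display origin, tilted 5
# degrees downward around the x-axis; viewpoint 600 mm in front.
theta = np.deg2rad(5.0)
r_axis = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(theta), -np.sin(theta)],
                   [0.0, np.sin(theta), np.cos(theta)]])
t_offset = np.array([0.0, -20.0, 0.0])
print(camera_to_world(np.array([10.0, 5.0, 600.0]), r_axis, t_offset))
```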
  • The parallax image generation unit 22 generates a stereoscopic image by generating light rays (images) corresponding to the viewpoint coordinates in space.
  • The parallax image display unit 23 is a device that presents stereoscopic video by continuously displaying the parallax images generated by the parallax image generation unit 22, and corresponds to the display 3 described above.
  • [Configuration example of the information processing device] Fig. 3 is a block diagram showing an example of the configuration of the information processing device 100 according to the embodiment.
  • The information processing device 100 includes an image sensor 101, an ultrasonic sensor 102, a face detection unit 103, a correction determination unit 104, an error calculation unit 105, and a multiplier 106.
  • In this example, the image sensor 101, ultrasonic sensor 102, face detection unit 103, correction determination unit 104, error calculation unit 105, and multiplier 106 are included in the information processing device 100, but the configuration is not limited to this example.
  • The image sensor 101, which is an example of an imaging unit, is, for example, a CMOS (Complementary Metal Oxide Semiconductor) sensor. Other sensors such as a CCD (Charge Coupled Device) may also be used as the image sensor 101.
  • The image sensor 101 captures an image of the user positioned in front of the display 3, more specifically the area around the user's face, and acquires the captured image.
  • The captured image acquired by the image sensor 101 is output after A/D (Analog to Digital) conversion.
  • The image sensor 101 corresponds to the camera 4.
  • An A/D converter or the like may be implemented on the image sensor 101, or may be provided between the image sensor 101 and the face detection unit 103.
  • The image sensor 101 according to the embodiment is configured to be capable of capturing images at a high frame rate.
  • For example, the image sensor 101 is capable of capturing images at 1000 fps (frames per second) or more.
  • In the following, the image sensor 101 is described as capturing images at 1000 fps.
  • The ultrasonic sensor 102 is a device used to measure the distance to a user located in front of the display 3. For example, the ultrasonic sensor 102 transmits an ultrasonic signal. The ultrasonic sensor 102 may output the transmission timing and reception timing of the ultrasonic signal to the error calculation unit 105 as information based on the ultrasonic signal. Although the ultrasonic sensor 102 is slower and covers a narrower angle than the image sensor 101, it enables direct and highly accurate distance measurement without relying on image recognition. The ultrasonic sensor 102 corresponds to the ultrasonic device 5.
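  • The following is a minimal sketch of the time-of-flight principle that such transmit/receive timestamps support. The function name and the fixed sound speed of 343 m/s (air at about 20°C) are illustrative assumptions; the embodiment described later refines the sound speed using the display's ambient temperature.

```python
def distance_from_echo(t_transmit: float, t_receive: float,
                       speed_of_sound: float = 343.0) -> float:
    """Estimate the sensor-to-user distance [m] from an ultrasonic echo.

    The round trip covers the distance twice, so the one-way distance is
    c * (t_receive - t_transmit) / 2. The fixed 343 m/s assumes air at
    about 20°C; a temperature-aware refinement appears later in the text.
    """
    return speed_of_sound * (t_receive - t_transmit) / 2.0

# Example: an echo received 3.5 ms after transmission -> about 0.60 m.
print(distance_from_echo(0.0, 0.0035))
```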
  • The face detection unit 103 performs face detection on the captured image acquired by the image sensor 101 and, from the face detection result, generates and acquires face detection information including position information of the user's face and face frame information.
  • Fig. 4 shows an overview of distance measurement by face detection.
  • Fig. 4 shows a scene in which the distance to the user U is measured by a camera 4 (image sensor 101) and an ultrasonic device 5 (ultrasonic sensor 102) externally attached to the stereoscopic display device 1.
  • The camera 4 is installed on top of the display 3 of the stereoscopic display device 1, and the ultrasonic device 5 is installed directly above the camera 4, so the distances from the camera 4 and from the ultrasonic device 5 to the user U are adjusted to be equal.
  • The face detection unit 103 calculates three-dimensional position information of the face as the position information of the user U's face.
  • This may be expressed as three-dimensional coordinate information (Xf, Yf, Zf).
  • The X-coordinate component "Xf" of the three-dimensional coordinate information (Xf, Yf, Zf) is the component horizontal to the ground surface, as shown in Fig. 4, and is expressed as horizontal information (Xf).
  • The Y-coordinate component "Yf" is the component vertical to the ground surface, and is expressed as vertical information (Yf).
  • The Z-coordinate component "Zf" is the depth component with respect to the two-dimensional plane formed by the horizontal information (Xf) and vertical information (Yf), and is distance information indicating the distance between the user U and the camera 4.
  • The distance information (Zf) corresponds to the distance value Zf indicating the distance between the user U and the camera 4.
  • The horizontal information (Xf) and vertical information (Yf) correspond to information indicating the viewpoint position of the user U, for example the two-dimensional coordinates of the viewpoint position.
  • The horizontal information (Xf) and vertical information (Yf) include the coordinates of the right eye and the left eye of the user U in the captured image, and are used when generating viewpoint coordinates in space.
  • The face detection unit 103 outputs the horizontal information (Xf) and vertical information (Yf) to the parallax image processing unit 20 as tracking data.
  • The distance information (Zf) is a distance value (distance value Zf) to the user U calculated from the distance (e.g., the interocular distance) between multiple feature points (e.g., the right eye and the left eye) detected in the captured image. Therefore, as explained above, the distance value Zf may contain an error relative to the true distance owing to individual differences in interocular distance.
  • Because the distance value Zf is also information indicating the viewpoint position of the user U, it is used when generating viewpoint coordinates in space.
  • The face detection unit 103 outputs the distance value Zf to the error calculation unit 105 and the multiplier 106 so that correction processing can be performed on it.
  • The face detection unit 103 also outputs the face frame information to the correction determination unit 104.
  • The distance value Zf (an example of the first distance value) is corrected using the distance value Zu (an example of the second distance value) obtained by the ultrasonic device 5. Since the distance value Zu is calculated from the characteristics of sound, it contains little error from individual differences or disturbances, and can be treated as a true value indicating the correct distance to the user U.
  • The correction determination unit 104 determines, based on the three-dimensional coordinate information (Xf, Yf, Zf), whether the user's face has been detected continuously for a predetermined time, and if so, determines that correction processing may be performed. This processing is performed to absorb the distance measurement delay caused by the difference between the response time of the image sensor 101 and that of the ultrasonic sensor 102. For example, the correction determination unit 104 determines whether the user's face has been detected continuously throughout this delay. Note that the distance measurement delay here refers to the delay time between the image sensor 101 and the ultrasonic sensor 102.
  • In short, the camera 4 (image sensor 101) offers fast response over a wide detection range, while the ultrasonic device 5 (ultrasonic sensor 102) offers highly accurate distance values but slower response over a narrower range.
  • The correction determination process performed by the correction determination unit 104 prevents a decrease in distance measurement accuracy that these differing characteristics could otherwise cause.
  • That is, this correction determination process exploits the benefit of using the ultrasonic device 5 in combination (its detection range is narrower than that of the camera 4) while resolving the issue the ultrasonic device 5 has (poorer responsiveness than the camera 4).
  • The correction determination process is described in detail below.
  • Fig. 5 is a diagram showing the relationship between the directivity of the camera 4 and the directivity of the ultrasonic device 5.
  • Fig. 5 shows a detection range AR4 according to the directivity of the camera 4 and a detection range AR5 according to the directivity of the ultrasonic device 5.
  • The detection range AR4 may be set to a wider angle than the detection range AR5 in order to expand the viewing range of the user U. This allows the camera 4 to maintain high face detection performance even if the user U moves up, down, left, or right. However, widening the detection range AR4 can reduce resolution because of the wide-angle lens, cause the lighting conditions on the user U's face to fluctuate with the user's position, and introduce errors into the three-dimensional coordinate information (Xf, Yf, Zf), in particular increasing the error in the distance value Zf.
  • The correction determination unit 104 therefore sets the imaging range corresponding to the detection range AR5, which is narrower than the detection range AR4, as a face position determination frame FL12, and uses as one of the conditions for permitting correction processing the determination that the face frame FL11 detected by the face detection unit 103 is contained within the face position determination frame FL12. This point is explained in more detail using Fig. 6.
  • Fig. 6 is a diagram showing a specific example of the correction determination process based on the face frame.
  • Fig. 6(a) shows a captured image IM1, which is an example of an image captured by the image sensor 101.
  • The captured image IM1 includes the user U.
  • The face detection unit 103 detects the face of the user U in the captured image IM1. As a result of the face detection, a face frame FL11 is set in the area including the face, as shown in Fig. 6(b), and face frame information indicating the area of the face frame FL11 is obtained.
  • Any known method may be used to detect the face of the user U, for example one that utilizes features of the captured image IM1.
  • The face detection unit 103 outputs the face frame information to the correction determination unit 104.
  • When the correction determination unit 104 acquires the face frame information, it determines whether the face frame FL11 indicated by the face frame information is within the face position determination frame FL12.
  • The face position determination frame FL12 corresponds to the imaging range generated according to the detection range AR5 of the ultrasonic device 5.
  • The correction determination unit 104 may hold information indicating this imaging range in advance, or it may identify that information when it acquires the face frame information.
  • The process in which the correction determination unit 104 determines whether the face frame FL11 is within the face position determination frame FL12 takes advantage of the strength of the ultrasonic device 5 (its detection range is narrower than that of the camera 4) and eliminates the disadvantages of the wide-angle detection range AR4, as the sketch below illustrates.
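  • A minimal sketch of this containment check follows; the rectangle representation and the example coordinates are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    """Axis-aligned rectangle in image coordinates (pixels)."""
    left: float
    top: float
    right: float
    bottom: float

def face_frame_within(fl11: Rect, fl12: Rect) -> bool:
    """True when the detected face frame FL11 lies entirely inside the
    face position determination frame FL12 (the image region matching
    the ultrasonic detection range AR5)."""
    return (fl11.left >= fl12.left and fl11.top >= fl12.top
            and fl11.right <= fl12.right and fl11.bottom <= fl12.bottom)

# Example with a hypothetical 1920x1080 image and a centered FL12:
fl12 = Rect(660, 240, 1260, 840)
fl11 = Rect(800, 400, 1000, 640)
print(face_frame_within(fl11, fl12))  # True -> continue the judgment
```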
  • When the correction determination unit 104 determines that the face frame FL11 is within the face position determination frame FL12, it continues the correction determination process using another condition for permitting the execution of the correction process.
  • As noted above, the ultrasonic device 5 responds more slowly than the camera 4, because the speed of sound is slower than the speed of light.
  • The camera 4 uses the characteristics of light to capture images, enabling high-speed distance measurement (measurement of the distance value Zf) by the face detection unit 103, whereas the ultrasonic device 5 uses the characteristics of sound to measure distance (measurement of the distance value Zu) and is therefore less responsive than the camera 4.
  • Because the ultrasonic device 5 is less responsive than the camera 4, a delay in distance measurement occurs.
  • Let the time required for the camera 4 to acquire an image and for the face detection unit 103 to calculate the distance value Zf from that image be the required time t1.
  • Let the time required for the ultrasonic device 5 to calculate the distance value Zu using ultrasonic waves be the required time t2.
  • Then required time t1 < required time t2 holds.
  • Let time TM1 be the time at which the required time t1 elapses and time TM2 the time at which the required time t2 elapses. Since time TM1 is earlier than time TM2, the time difference TM2 - TM1 is a period during which the ultrasonic device 5 remains engaged in distance measurement longer than the camera 4, and can be regarded as the delay time of the distance measurement.
  • Fig. 7 is an explanatory diagram for explaining this response delay.
  • Fig. 7 shows the transition of the distance value Zf and of the distance value Zu over time.
  • Fig. 7 also shows the delay time DTM based on the time difference TM2 - TM1.
  • The delay time DTM caused by the responsiveness of the ultrasonic device 5 is used as the other condition for permitting the execution of the correction process.
  • When the correction determination unit 104 judges that the face frame FL11 is within the face position determination frame FL12, it uses the delay time DTM to judge whether to execute the correction process.
  • Specifically, the correction determination unit 104 determines whether the delay time DTM has elapsed with the face frame FL11 remaining within the face position determination frame FL12. When it determines that the delay time DTM has elapsed, it permits the execution of the correction process; for example, it outputs a signal permitting the correction process to the error calculation unit 105.
  • This check, which requires the face frame FL11 to stay within the face position determination frame FL12 for the delay time DTM, eliminates the ultrasonic device 5's problem of poor responsiveness; a sketch follows.
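  • A minimal sketch of such a gating check, assuming a monotonic clock and an illustrative DTM of 50 ms; the class and parameter names are hypothetical.

```python
import time

class CorrectionGate:
    """Permits correction only after the face frame has stayed inside the
    determination frame for the delay time DTM (hypothetical class)."""

    def __init__(self, delay_time_s: float):
        self.delay_time_s = delay_time_s   # DTM
        self.inside_since = None           # when FL11 entered FL12

    def update(self, frame_inside: bool, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        if not frame_inside:
            self.inside_since = None       # condition broken: start over
            return False
        if self.inside_since is None:
            self.inside_since = now        # FL11 just entered FL12
        # Permit correction once the state has persisted for DTM.
        return (now - self.inside_since) >= self.delay_time_s

gate = CorrectionGate(delay_time_s=0.05)   # e.g. DTM = 50 ms (illustrative)
print(gate.update(True, now=0.00))         # False: frame just entered
print(gate.update(True, now=0.06))         # True: held for DTM, permit
```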
  • The error calculation unit 105 calculates the distance value Zu based on the ultrasonic signal acquired from the ultrasonic sensor 102. For example, the error calculation unit 105 calculates the distance value Zu from the time between transmission and reception of the ultrasonic signal. The error calculation unit 105 then executes the correction process using the distance value Zf input from the face detection unit 103 and the distance value Zu. Specifically, the error calculation unit 105 calculates the absolute value (absolute error) of the difference between the distance value Zf and the distance value Zu.
  • The error calculation unit 105 then calculates the error rate SFZ (relative error), the ratio of that absolute value to the distance value Zu, taking the distance value Zu as the true value.
  • The error calculation unit 105 outputs the error rate SFZ to the multiplier 106.
  • The multiplier 106 corrects the error-containing distance value Zf by multiplying the distance value Zf input from the face detection unit 103 by the error rate SFZ.
  • The multiplier 106 then outputs the corrected distance value Zf to the parallax image processing unit 20 as tracking data.
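  • The following sketch illustrates this correction stage. The text defines the error rate SFZ from the absolute difference between Zf and Zu with Zu as the true value; for the product Zf x SFZ to yield the corrected distance, SFZ must act as a scale factor, so the sketch adopts the common reading SFZ = Zu / Zf (whose deviation from 1 equals the relative error). This reading is an assumption, not the patent's exact formula.

```python
def compute_scale_factor(zf: float, zu: float) -> float:
    """SFZ with the ultrasonic distance Zu as the true value. Reading
    assumed here: SFZ = Zu / Zf, so |Zf - Zu| / Zu (the relative error
    named in the text) determines how far SFZ deviates from 1."""
    return zu / zf

def correct_distance(zf: float, sfz: float) -> float:
    """The multiplier 106: corrected distance = Zf x SFZ."""
    return zf * sfz

# Face detection over-estimates the distance by 10% (e.g. a user whose
# interocular distance is narrower than the assumed average):
zf, zu = 660.0, 600.0
sfz = compute_scale_factor(zf, zu)
print(correct_distance(zf, sfz))  # 600.0 -> matches the ultrasonic value
```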
  • Fig. 8 is a flowchart showing the processing procedure of the information processing device 100 according to the embodiment.
  • The correction process by the information processing device 100 starts as signal processing after the stereoscopic display device 1 is powered on and initial settings are completed.
  • Here, the stereoscopic display device 1 is described as including a processor, a memory, the camera 4, the ultrasonic device 5, and so on as components, at least some of which constitute a PC.
  • A captured image is acquired via the image sensor 101, and the acquired image is supplied to the face detection unit 103 (step S801).
  • The image sensor 101 constantly feeds captured images to the face detection unit 103 via the memory. For example, if the camera 4 is a high-speed camera with a frame rate of 1000 fps, images are captured into the memory at 1 ms intervals.
  • The face detection unit 103 determines whether the user's face has been detected by face detection processing on the captured image (step S802). If the user's face has not been detected (step S802; No), the process returns to step S801.
  • If the user's face has been detected (step S802; Yes), the face detection unit 103 outputs information generated from the face detection result, specifically face detection information including the three-dimensional coordinate information (Xf, Yf, Zf) of the user's face and the face frame information (step S803).
  • For example, the face detection unit 103 outputs the distance value Zf, the distance information included in the three-dimensional coordinate information (Xf, Yf, Zf), to the error calculation unit 105 and the multiplier 106.
  • The face detection unit 103 also outputs the face frame information to the correction determination unit 104.
  • The correction determination unit 104 determines whether the face frame FL11 indicated by the face frame information is within the face position determination frame FL12 (step S804). If it determines that the face frame FL11 is not within the face position determination frame FL12 (step S804; No), the process returns to step S801.
  • If it determines that the face frame FL11 is within the face position determination frame FL12 (step S804; Yes), the correction determination unit 104 determines whether the delay time DTM has elapsed with the face frame FL11 remaining within the face position determination frame FL12 (step S805). If that state has not continued for the delay time DTM (step S805; No), the process returns to step S804.
  • If the correction determination unit 104 determines that the delay time DTM has elapsed with the face frame FL11 within the face position determination frame FL12 (step S805; Yes), it outputs a signal permitting the execution of the correction process to the error calculation unit 105 (step S806).
  • The error calculation unit 105 acquires an ultrasonic signal from the ultrasonic sensor 102 and calculates the distance value Zu from information based on the acquired signal (step S807), for example from the time between transmission and reception of the ultrasonic signal. The error calculation unit 105 may obtain the speed of sound according to the ambient temperature of the display 3 and use it to calculate a more accurate distance value Zu; a method that takes the ambient temperature into account is described later. The error calculation unit 105 then calculates the error rate SFZ from the distance values Zf and Zu and outputs it to the multiplier 106 (step S808).
  • The multiplier 106 corrects the error of the distance value Zf relative to the distance value Zu based on the error rate SFZ (step S809). Specifically, the multiplier 106 multiplies the distance value Zf input from the face detection unit 103 by the error rate SFZ to correct the error-containing distance value Zf.
  • The multiplier 106 outputs the three-dimensional position information (Xf, Yf, Zf x SFZ) obtained by the correction process in step S809 to the parallax image processing unit 20 as tracking data (step S810).
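  • Putting the steps together, the following sketch mirrors the Fig. 8 loop. Every object (camera, face detector, ultrasonic sensor, gate) is a hypothetical stand-in for the corresponding unit, and the behavior outside the permitted-correction path is an assumption of the sketch.

```python
def tracking_loop(camera, face_detector, ultrasonic, gate):
    """Sketch of the Fig. 8 procedure (steps S801-S810). camera,
    face_detector, ultrasonic, and gate are hypothetical stand-ins for
    the image sensor 101, face detection unit 103, ultrasonic sensor
    102, and correction determination unit 104."""
    sfz = 1.0                                    # no correction yet
    while True:
        image = camera.capture()                 # S801
        face = face_detector.detect(image)       # S802
        if face is None:
            continue                             # S802 No -> back to S801
        xf, yf, zf = face.coords                 # S803
        if gate.update(face.frame_inside):       # S804-S805
            zu = ultrasonic.measure()            # S806-S807
            sfz = zu / zf                        # S808 (assumed reading of SFZ)
        # S809-S810: corrected tracking data for the parallax image
        # processing unit 20 (keeping the last scale factor between
        # permitted corrections is an assumption of this sketch).
        yield (xf, yf, zf * sfz)
```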
  • The speed of sound, the speed at which sound travels through the air, is affected by the air temperature.
  • Specifically, the speed of sound is 331.5 meters per second at 1 atmosphere and 0°C, and increases or decreases by 0.6 meters per second for every 1°C change in temperature.
  • Using this relationship, the error calculation unit 105 can calculate the distance value Zu with the following formula (1), which from the definitions below amounts to distance = sound speed x one-way travel time: L = (331.5 + 0.6 x T) x (t / 2) x 100, the factor 100 converting meters to centimeters.
  • L (cm) is a value indicating the distance value Zu.
  • T (°C) is a value indicating the air temperature.
  • t (sec) is the time from when the ultrasonic device 5 transmits an ultrasonic signal until it receives the echo. Therefore, "t/2" is the time it takes the ultrasonic wave to reach the user.
  • A temperature based on the ambient temperature of the display 3 is used as the actual air temperature T.
  • This is because the panel of the display 3 exhibits certain temperature characteristics due to heat generation, so the ambient temperature of the display 3 is expected to follow those characteristics, and because the ultrasonic device 5 is installed near the display 3 as shown in Fig. 4.
  • Fig. 9 shows the relationship between the panel temperature and the ambient temperature.
  • Fig. 9 shows a table TB giving the relationship between the panel temperature, i.e. the temperature generated at the panel, and the ambient temperature that arises around the display 3 under the influence of the panel temperature.
  • The panel temperature may be detected by a temperature sensor provided on the panel.
  • From this relationship, the following formula (2) can be derived for estimating the ambient temperature T2 of the display 3 from the panel temperature T1, the actual measured value of the current panel temperature.
  • That is, the ambient temperature T2 is approximated from the panel temperature T1 as shown in formula (2).
  • The error calculation unit 105 applies the current panel temperature T1 detected by the temperature sensor to formula (2) to estimate the ambient temperature T2 of the display 3.
  • The error calculation unit 105 then corrects the outside air temperature T0 using the temperature difference T1 - T2 between the panel temperature T1 and the ambient temperature T2.
  • Specifically, the error calculation unit 105 corrects the outside air temperature T0 by adding the temperature difference T1 - T2 to the outside air temperature T0.
  • The error calculation unit 105 uses the corrected outside air temperature T0 as the air temperature T in formula (1) above, and solves formula (1) to obtain "L", i.e., the distance value Zu.
  • Because the distance value Zu calculated in this way accounts for the temperature to which the ultrasonic device 5 is exposed by being placed near the display 3 (the ambient temperature of the display 3 raised by the panel's heat generation), it can be regarded as more accurate than a value computed with a generic temperature of 15°C.
  • Fig. 10 is a flowchart showing the procedure for calculating the distance value Zu.
  • Fig. 10 shows a procedure for correcting the temperature characteristic that the ultrasonic device 5 exhibits under the influence of the panel of the display 3.
  • First, the error calculation unit 105 acquires the outside air temperature T0 (step S1001).
  • For example, the error calculation unit 105 may acquire the outside air temperature T0 when the panel is powered on.
  • The outside air temperature T0 here is the air temperature of the space in which the stereoscopic display device 1 is placed.
  • The error calculation unit 105 also acquires the current panel temperature T1 (step S1002) and applies the panel temperature T1 to formula (2) to estimate the ambient temperature T2 of the display 3 (step S1003).
  • The error calculation unit 105 then calculates the temperature difference T1 - T2 between the panel temperature T1 and the ambient temperature T2, and corrects the outside air temperature T0 by adding the temperature difference T1 - T2 to it (step S1004).
  • This method is based on Newton's law of cooling.
  • The corrected outside air temperature T0 can be regarded as the temperature of the space in which the stereoscopic display device 1 is placed, that is, the space through which the ultrasonic waves travel.
  • The error calculation unit 105 calculates the sound speed c using the corrected outside air temperature T0 as the sound propagation temperature (step S1005). Specifically, the error calculation unit 105 substitutes the corrected outside air temperature T0 for T in the sound speed formula to obtain the sound speed c.
  • Finally, the error calculation unit 105 applies the sound speed c to formula (1) above to calculate the distance value Zu (step S1006).
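  • A sketch of the Fig. 10 procedure under stated assumptions: the sound speed formula is the one given above, while formula (2) is not reproduced in this text, so a hypothetical linear fit stands in for it.

```python
def sound_speed(temp_c: float) -> float:
    """Speed of sound in air [m/s]: 331.5 m/s at 0°C, +/- 0.6 m/s per °C."""
    return 331.5 + 0.6 * temp_c

def ambient_from_panel(panel_temp_t1: float) -> float:
    """Stand-in for formula (2), which estimates the ambient temperature
    T2 of the display from the panel temperature T1. This linear fit is
    hypothetical; the disclosure derives the real relation from table TB."""
    return 0.5 * panel_temp_t1 + 10.0

def distance_zu(outside_temp_t0: float, panel_temp_t1: float,
                echo_round_trip_s: float) -> float:
    """Fig. 10 procedure (steps S1001-S1006); returns Zu in meters."""
    t2 = ambient_from_panel(panel_temp_t1)                 # S1003
    corrected_t0 = outside_temp_t0 + (panel_temp_t1 - t2)  # S1004
    c = sound_speed(corrected_t0)                          # S1005
    return c * echo_round_trip_s / 2.0                     # S1006, formula (1)

# Example: 25°C room, 40°C panel, 3.5 ms echo round trip -> about 0.62 m.
print(distance_zu(25.0, 40.0, 0.0035))
```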
  • The distance value Zu obtained in step S1006 is then available within the information processing device 100; as described for step S807, it is acquired by the error calculation unit 105.
  • Fig. 11 is a block diagram showing an example of the hardware configuration of a computer corresponding to the information processing device 100 according to the embodiment. Note that Fig. 11 is only an example, and the configuration need not be limited to that shown in Fig. 11.
  • The computer 1000 has a CPU (Central Processing Unit) 1100, RAM (Random Access Memory) 1200, ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600.
  • The CPU 1100 operates according to programs stored in the ROM 1300 or the HDD 1400 and controls each component. For example, the CPU 1100 loads programs stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes the processes corresponding to those programs.
  • The ROM 1300 stores boot programs such as the BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts up, as well as programs that depend on the hardware of the computer 1000.
  • The HDD 1400 is a computer-readable recording medium that non-transitorily records programs executed by the CPU 1100 and the data used by those programs. Specifically, the HDD 1400 records program data 1450.
  • The program data 1450 is an example of an information processing program for realizing the information processing method according to the embodiment of the present disclosure, together with the data used by that program.
  • The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (e.g., the Internet).
  • Via the communication interface 1500, the CPU 1100 receives data from other devices and transmits data it has generated to other devices.
  • The input/output interface 1600 is an interface for connecting an input/output device 1650 to the computer 1000.
  • The CPU 1100 receives data from an input device such as a keyboard or mouse via the input/output interface 1600.
  • The CPU 1100 also transmits data to an output device such as a display device, speaker, or printer via the input/output interface 1600.
  • The input/output interface 1600 may also function as a media interface that reads programs and the like recorded on specific recording media.
  • Examples of such media include optical recording media such as DVDs (Digital Versatile Discs) and PDs (Phase change rewritable Disks), magneto-optical recording media such as MOs (Magneto-Optical disks), tape media, magnetic recording media, and semiconductor memories.
  • The CPU 1100 of the computer 1000 executes the information processing program loaded into the RAM 1200, thereby implementing the processing functions of the units shown in Fig. 3. That is, the CPU 1100, the RAM 1200, and the like cooperate with the software (the information processing program loaded into the RAM 1200) to realize the information processing method of the information processing device 100 according to the embodiment.
  • (1) An information processing device comprising: an imaging unit that captures an image of a user and acquires a captured image; an ultrasonic device installed near the imaging unit so as to detect the user; and an information processing unit that acquires three-dimensional coordinate information of the user's face shown in the captured image, determines whether the user's face has been detected continuously for a predetermined period of time based on the three-dimensional coordinate information, and, based on a determination that the user's face has been detected continuously for the predetermined period of time, corrects a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal acquired from the ultrasonic device.
  • (2) The information processing device above, wherein the information processing unit corrects an error of the first distance value with respect to the second distance value based on the second distance value.
  • (3) The information processing device according to (2), wherein the information processing unit calculates an error rate, with the second distance value taken as the true value, using the difference between the first distance value and the second distance value, and corrects the error of the first distance value with respect to the second distance value based on the error rate.
  • (4) The information processing device according to (1), wherein the imaging unit is installed near the display so that the captured image includes a user viewing the display, and the ultrasonic device is installed near the display.
  • (5) The information processing device above, wherein the information processing unit calculates the second distance value according to an ambient temperature of the display based on a signal acquired from the ultrasonic device.
  • (6) The information processing device according to (5), wherein the information processing unit estimates the ambient temperature based on a panel temperature detected by a temperature sensor provided on a panel of the display, calculates a sound velocity according to a correction temperature based on a temperature difference between the panel temperature and the ambient temperature, and calculates the second distance value based on the calculated sound velocity.
  • (7) The information processing device above, further comprising, as the display, a stereoscopic display on which a stereoscopic image generated using the viewpoint position of the user is displayed.
  • (8) The information processing device above, wherein the predetermined time represents a delay time that is the difference between the time required for the imaging unit to respond and the time required for the ultrasonic device to respond, and wherein the information processing unit determines, as the determination of whether the user's face has been detected, whether the user's face has been detected continuously during the delay time.
  • (9) An information processing method in which a computer executes a process of: acquiring three-dimensional coordinate information of the face of a user shown in an image of the user captured by an imaging unit; determining whether the user's face has been detected continuously for a predetermined period of time based on the three-dimensional coordinate information; and, based on a determination that the user's face has been detected continuously for the predetermined period of time, correcting a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal obtained from an ultrasonic device installed near the imaging unit so as to detect the user.
  • (10) An information processing program for causing a computer to function as an information processing unit that: acquires three-dimensional coordinate information of the face of a user shown in an image of the user captured by an imaging unit; determines whether the user's face has been detected continuously for a predetermined period of time based on the three-dimensional coordinate information; and, based on a determination that the user's face has been detected continuously for the predetermined period of time, corrects a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal obtained from an ultrasonic device installed near the imaging unit so as to detect the user.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

An information processing apparatus in one embodiment according to the present disclosure comprises: an imaging unit that captures an image of a user and acquires the captured image; an ultrasonic device that is installed near the imaging unit so as to detect the user; and an information processing unit that acquires three-dimensional coordinate information pertaining to the face of the user appearing in the captured image, assesses, on the basis of the three-dimensional coordinate information, whether the face of the user is detected continuously for a prescribed duration, and, on the basis of an assessment that the face of the user is detected continuously for the prescribed duration, corrects a first distance value indicated by the three-dimensional coordinate information on the basis of a second distance value that is based on a signal acquired from the ultrasonic device.

Description

Information processing device, information processing method, and information processing program
This disclosure relates to an information processing device, an information processing method, and an information processing program.
In recent years, various technologies have been proposed for displaying stereoscopic images on displays. Among them, so-called naked-eye stereoscopic display has been proposed as a technology that does not require an attached tool such as a wearable device equipped with liquid crystal shutters. For example, a display used for naked-eye stereoscopic display is the light field display, typified by the lenticular method.
When displaying a stereoscopic image on a light field display, the viewpoint positions of the user's right and left eyes are detected, and light rays are focused at those positions to generate an image for the right eye and an image for the left eye. For example, as a technique for detecting the viewpoint position, a method has been proposed that detects feature points in an image including the user's face and tracks the viewpoint position based on those feature points.
In naked-eye stereoscopic display on a non-wearable display, the parallax between the right-eye and left-eye images must be changed according to changes in the distance between the user and the display. It is therefore important to accurately obtain a distance value representing the user's viewing distance. For example, a known method calculates a distance value based on the distance between the user's right and left eyes (the interocular distance) taken as feature points. However, this method is affected by individual differences in interocular distance, and errors may occur in the distance value. In response to this issue, methods have been proposed to eliminate errors in distance values caused by individual differences.
JP 2013-134599 A
However, the conventional technology above has room for improvement in appropriately correcting errors in distance values. For example, errors in distance values can occur when camera imaging is affected by disturbances such as lighting; the conventional technology does not take such disturbances into account, and therefore cannot be said to correct errors in distance values appropriately. For this reason, the conventional technology does not necessarily enable appropriate tracking of the viewpoint position.
In view of this, this disclosure proposes an information processing device, an information processing method, and an information processing program that can achieve appropriate tracking of the viewpoint position.
To solve the above problem, an information processing device according to one embodiment of the present disclosure includes: an imaging unit that captures an image of a user to obtain a captured image; an ultrasonic device installed near the imaging unit to detect the user; and an information processing unit that acquires three-dimensional coordinate information of the user's face shown in the captured image, determines whether the user's face has been detected continuously for a predetermined time based on the three-dimensional coordinate information, and, based on a determination that the user's face has been detected continuously for the predetermined time, corrects a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal acquired from the ultrasonic device.
Fig. 1 is a diagram showing an example of the appearance of a stereoscopic display device according to the embodiment.
Fig. 2 is a block diagram showing an example of the system configuration of the stereoscopic display device according to the embodiment.
Fig. 3 is a block diagram showing an example of the configuration of an information processing device according to the embodiment.
Fig. 4 is a diagram showing an outline of distance measurement by face detection.
Fig. 5 is a diagram showing the relationship between the directivity of a camera and the directivity of an ultrasonic device.
Fig. 6 is a diagram showing a specific example of correction determination processing based on a face frame.
Fig. 7 is an explanatory diagram illustrating a response delay.
Fig. 8 is a flowchart showing the processing procedure of the information processing device according to the embodiment.
Fig. 9 is a diagram showing the relationship between panel temperature and ambient temperature.
Fig. 10 is a flowchart showing the calculation procedure for a distance value Zu.
Fig. 11 is a block diagram showing an example of the hardware configuration of a computer corresponding to the information processing device according to the embodiment.
An embodiment of the present disclosure is described in detail below with reference to the drawings. Note that the information processing device, information processing method, and information processing program according to the present disclosure are not limited to this embodiment. In the following embodiment, identical parts are given identical reference numerals, and duplicated descriptions are omitted.
[Embodiment]
[1. Introduction]
There is a demand for appropriate tracking of the viewpoint position by adaptively correcting errors in distance values caused by individual differences in the interocular distance determined from the eye positions obtained by detecting the user's face, as well as errors in distance values caused by disturbances such as lighting. There is also a demand for realizing more accurate and stable viewpoint tracking without contact or wearable equipment, and for making the correction process for distance values as fast as possible.
The information processing related to the technology proposed in this disclosure corrects errors in distance values due to individual differences in interocular distance and errors due to disturbances such as lighting: when the face frame obtained by detecting the user's face in an image captured by a camera lies in the central area of the image, the difference between the distance measured by a separate ultrasonic device, which is less susceptible to disturbances such as lighting, and the distance estimated from face detection is used as a scale factor. This information processing is described in detail below.
[2. Example of the appearance of the stereoscopic display device]
Fig. 1 is a diagram showing an example of the appearance of the stereoscopic display device 1 according to the embodiment. The stereoscopic display device 1 is, for example, about the same size as a notebook personal computer, but can also be made smaller or larger. The stereoscopic display device 1 corresponds to a spatial reproduction display device that can provide a stereoscopic experience to the naked eye without an attached tool such as a wearable device equipped with liquid crystal shutters.
The stereoscopic display device 1 has a base 2 and a display 3 standing upright from the base 2. The stereoscopic display device 1 has a camera 4 above the display 3, and is configured so that the camera 4 can capture an image of a user positioned in front of the display 3. The stereoscopic display device 1 also has an ultrasonic device 5 that uses ultrasonic waves to measure the distance to the user.
The stereoscopic display device 1 can display, for example, stereoscopic images using the lenticular method on the display 3. In outline, the stereoscopic display device 1 detects the viewpoint position of a naked-eye user who is not using a dedicated wearable device for stereoscopic display, using images captured by the camera 4. The stereoscopic display device 1 then generates images (parallax images) for the right and left eyes from light rays focused at the viewpoint positions of the respective eyes, and displays the generated images on the display 3, which is equipped with a lenticular lens. As a result, the user can view stereoscopic images without using a wearable device with liquid crystal shutters, a head-mounted display (HMD), or the like.
[3. Relationship between the camera and the ultrasonic device]
In the example of FIG. 1, the camera 4 is integrated with the stereoscopic display device 1, but a configuration in which the camera 4 is externally attached to the stereoscopic display device 1 may also be adopted. In this case, the camera 4 is preferably installed near the display 3 so that the captured image includes a user viewing the display 3, for example, as shown in FIG. 4. Specifically, the camera 4 may be installed at a position where it can capture an image of the user, that is, a position where the user falls within the captured image. The camera 4 may also be a high-speed camera capable of high-speed imaging.
In the example of FIG. 1, the ultrasonic device 5 is likewise integrated into the stereoscopic display device 1, but a configuration in which it is externally attached may also be adopted. In this case, the ultrasonic device 5 is preferably installed near the camera 4 so as to detect the user. Specifically, the ultrasonic device 5 may be installed directly above the camera 4, as shown in FIG. 4. In other words, the ultrasonic device 5 is installed in a positional relationship such that the distance between the ultrasonic device 5 and the user is substantially equal to the distance between the camera 4 and the user.
The ultrasonic device 5 is less responsive than the camera 4. Specifically, because the ultrasonic device 5 measures distance with ultrasonic waves, which travel far more slowly than light, its response speed is slower than that of the camera 4, which uses the characteristics of light. On the other hand, the ultrasonic device 5 does not rely on the distance between feature points detected in a captured image (for example, the interocular distance) but calculates the distance value from sound characteristics as described above, so the distance value obtained by the ultrasonic device 5 contains little error due to individual differences or external disturbances and can be regarded as highly accurate. In other words, the sound-derived distance value obtained by the ultrasonic device 5 is more accurate than the feature-point-derived distance value obtained via the camera 4.
[4. Example of the configuration of the stereoscopic display device]
FIG. 2 is a block diagram showing an example of the system configuration of the stereoscopic display device 1 according to the embodiment. The stereoscopic display device 1 generally includes an information processing device 100 and a parallax image processing unit 20. The information processing device 100 outputs information indicating the user's viewpoint position, for example, the three-dimensional coordinates of the viewpoint position, to the downstream parallax image processing unit 20. The configuration, operation examples, and other details of the information processing device 100 are described later.
The parallax image processing unit 20 has a spatial viewpoint coordinate generation unit 21, a parallax image generation unit 22, and a parallax image display unit 23. The spatial viewpoint coordinate generation unit 21 applies a known method to convert the three-dimensional coordinates indicating the viewpoint position output from the information processing device 100 into viewpoint coordinates in space. More specifically, the viewpoint coordinates are obtained by converting the camera coordinates corresponding to the camera 4 into rendering coordinates by a known coordinate-system conversion method, which may include translation of the coordinate system, optical-axis correction, conversion to a world coordinate system, and the like. The parallax image generation unit 22 generates a stereoscopic image by generating the light rays (images) corresponding to the viewpoint coordinates in space. The parallax image display unit 23 is a device that presents stereoscopic video by continuously displaying the parallax images generated by the parallax image generation unit 22, and corresponds to the display 3 described above.
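As a rough illustration of the kind of camera-to-rendering coordinate conversion the spatial viewpoint coordinate generation unit 21 performs, the sketch below applies an optical-axis correction followed by a rigid rotation and translation. The patent leaves the concrete transform to known methods, so every name and value here (camera_to_world, R, t, optical_axis_offset) is a hypothetical placeholder, not the disclosed implementation.

```python
import numpy as np

def camera_to_world(viewpoint_cam, R, t, optical_axis_offset):
    """Convert a 3D viewpoint position from camera coordinates to world
    (rendering) coordinates: optical-axis correction, then rotation and
    translation. All parameters are illustrative placeholders."""
    p = np.asarray(viewpoint_cam, dtype=float) - optical_axis_offset
    return R @ p + t

# Hypothetical calibration: camera mounted 25 cm above the display origin.
R = np.eye(3)                          # no rotation in this toy setup
t = np.array([0.0, 0.25, 0.0])         # translation to the world origin (m)
offset = np.array([0.01, 0.0, 0.0])    # small lateral optical-axis offset (m)

print(camera_to_world([0.0, 0.0, 0.6], R, t, offset))  # -> [-0.01  0.25  0.6 ]
```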
[5. Configuration example of the information processing device]
FIG. 3 is a block diagram showing an example of the configuration of the information processing device 100 according to the embodiment. As shown in FIG. 3, the information processing device 100 includes an image sensor 101, an ultrasonic sensor 102, a face detection unit 103, a correction determination unit 104, an error calculation unit 105, and a multiplier 106.
The example of FIG. 3 shows the image sensor 101, the ultrasonic sensor 102, the face detection unit 103, the correction determination unit 104, the error calculation unit 105, and the multiplier 106 included in the information processing device 100, but the configuration is not limited to this example.
The image sensor 101, an example of an imaging unit, is, for example, a CMOS (Complementary Metal Oxide Semiconductor) sensor. Other sensors such as a CCD (Charge Coupled Device) may also be used as the image sensor 101. The image sensor 101 captures an image of a user positioned in front of the display 3, more specifically of the area around the user's face, and acquires the captured image. The captured image acquired by the image sensor 101 is A/D (Analog to Digital) converted and then output. The image sensor 101 corresponds to the camera 4.
Although not shown in the figure, an A/D converter or the like may be implemented on the image sensor 101 or provided between the image sensor 101 and the face detection unit 103. The image sensor 101 according to the embodiment is configured to be capable of capturing images at a high frame rate; as an example, it can capture images at 1000 fps (frames per second) or more. In the embodiment, the image sensor 101 is described as capturing images at 1000 fps.
The ultrasonic sensor 102 is a device used to measure the distance to a user positioned in front of the display 3. For example, the ultrasonic sensor 102 transmits an ultrasonic signal. As information based on the ultrasonic signal, the ultrasonic sensor 102 may output the timing at which the ultrasonic signal was transmitted and the timing at which it was received to the error calculation unit 105. Although the ultrasonic sensor 102 is slower and has a narrower angle than the image sensor 101, it enables direct and highly accurate distance measurement that does not depend on image recognition. The ultrasonic sensor 102 corresponds to the ultrasonic device 5.
The face detection unit 103 performs face detection based on the captured image acquired by the image sensor 101 and, based on the face detection result, generates and acquires face detection information including position information of the user's face and face frame information.
FIG. 4 shows an overview of distance measurement by face detection. FIG. 4 shows a scene in which the distance to the user U is measured by each of the camera 4 (image sensor 101) and the ultrasonic device 5 (ultrasonic sensor 102), both externally attached to the stereoscopic display device 1. In the example of FIG. 4, the camera 4 is installed at the top of the display 3 of the stereoscopic display device 1 and the ultrasonic device 5 is installed directly above the camera 4, so that the distances from the camera 4 and from the ultrasonic device 5 to the user U are adjusted to be equal.
For example, the face detection unit 103 calculates three-dimensional position information of the face as the position information of the face of the user U; hereinafter this may be written as three-dimensional coordinate information (Xf, Yf, Zf). The X-coordinate component "Xf" included in the three-dimensional coordinate information (Xf, Yf, Zf) is the component horizontal to the ground surface, as shown in FIG. 4, and is written as horizontal information (Xf). The Y-coordinate component "Yf" is the component vertical to the ground surface and is written as vertical information (Yf). The Z-coordinate component "Zf" is the depth component with respect to the two-dimensional space formed by the horizontal information (Xf) and the vertical information (Yf), and is distance information indicating the distance between the user U and the camera 4. That is, the distance information (Zf) corresponds to the distance value Zf indicating the distance between the user U and the camera 4.
Here, the horizontal information (Xf) and the vertical information (Yf) correspond to information indicating the viewpoint position of the user U, for example, the two-dimensional coordinates of the viewpoint position. For example, the horizontal information (Xf) and the vertical information (Yf) include the coordinates of the right eye and of the left eye of the user U in the captured image, and are used when generating the viewpoint coordinates in space. The face detection unit 103 therefore outputs the horizontal information (Xf) and the vertical information (Yf) to the parallax image processing unit 20 as tracking data.
The distance information (Zf) is a distance value to the user U (distance value Zf) calculated based on the distance (for example, the interocular distance) between multiple feature points (for example, the right eye and the left eye) detected in the captured image. Therefore, as explained above, the distance value Zf may contain an error relative to the true distance value due to individual differences in interocular distance.
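The patent does not spell out how Zf is derived, but a common pinhole-camera reading is that depth is inferred from the interocular distance in pixels together with an assumed physical interocular distance. The sketch below follows that reading; the focal length, eye coordinates, and population-average IPD are all hypothetical values, and the fixed IPD is precisely where the individual-difference error enters.

```python
ASSUMED_IPD_MM = 63.0  # assumed population-average interocular distance (mm)

def estimate_zf(eye_left_px, eye_right_px, focal_length_px):
    """Estimate the camera-to-face distance Zf (mm) from the pixel distance
    between the detected eyes, under a pinhole camera model."""
    ipd_px = ((eye_left_px[0] - eye_right_px[0]) ** 2 +
              (eye_left_px[1] - eye_right_px[1]) ** 2) ** 0.5
    return focal_length_px * ASSUMED_IPD_MM / ipd_px

# A user whose true IPD is wider than 63 mm is estimated as closer than
# they really are -- the individual-difference error this disclosure corrects.
print(estimate_zf((610, 340), (710, 342), focal_length_px=1400))  # ~882 mm
```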
Since the distance value Zf is also information indicating the viewpoint position of the user U, it too is used when generating the viewpoint coordinates in space. However, if it is output as tracking data while containing an error, the correct viewpoint position in space cannot be obtained and appropriate tracking of the viewpoint position cannot be achieved. If the viewpoint position cannot be tracked appropriately, a natural stereoscopic image is not generated, which gives the user U a sense of discomfort. The distance value Zf therefore needs to be corrected. For this reason, the face detection unit 103 outputs the distance value Zf to the error calculation unit 105 and the multiplier 106 so that a correction process is applied to it. The face detection unit 103 also outputs the face frame information to the correction determination unit 104.
In the correction process, the distance value Zf (an example of a first distance value) is corrected using the distance value Zu (an example of a second distance value) obtained by the ultrasonic device 5. Since the distance value Zu is calculated from sound characteristics, it contains little error due to individual differences or disturbances, and it can be treated as a true value indicating the correct distance to the user U.
Returning to FIG. 3, the correction determination unit 104 determines, based on the three-dimensional coordinate information (Xf, Yf, Zf), whether the user's face has been detected continuously for a predetermined time; if so, it determines that the correction process may be performed. This processing is performed to absorb the distance-measurement delay caused by the difference between the time the image sensor 101 requires to respond and the time the ultrasonic sensor 102 requires to respond. Therefore, for example, the correction determination unit 104 determines whether the user's face has been detected continuously for the duration of that delay. The distance-measurement delay here is the delay time between the image sensor 101 and the ultrasonic sensor 102.
Furthermore, the camera 4 (image sensor 101) and the ultrasonic device 5 (ultrasonic sensor 102) differ in characteristics such as detection range and responsiveness (response speed). The correction determination process by the correction determination unit 104 is therefore also performed to prevent these characteristics from degrading distance-measurement accuracy. This correction determination process exploits the advantage of using the ultrasonic device 5 in combination (its detection range is narrower than that of the camera 4) while resolving the issue the ultrasonic device 5 has (its responsiveness is inferior to that of the camera 4). The correction determination process is described in detail below.
First, a method for controlling the influence of the detection-range characteristics on distance-measurement accuracy is described with reference to FIG. 5. FIG. 5 is a diagram showing the relationship between the directivity of the camera 4 and the directivity of the ultrasonic device 5: a detection range AR4 corresponding to the directivity of the camera 4 and a detection range AR5 corresponding to the directivity of the ultrasonic device 5.
As shown in FIG. 5, the detection range AR4 may be set to a wider angle than the detection range AR5 in order to expand the viewing range of the user U. This allows the camera 4 to achieve high face-detection performance even when the user U moves up, down, left, or right. However, widening the detection range AR4 may cause a decrease in resolution due to the use of a wide-angle lens, fluctuations in the lighting conditions on the face of the user U depending on the user's position, and increased errors in the three-dimensional coordinate information (Xf, Yf, Zf), in particular in the distance value Zf.
The correction determination unit 104 therefore sets the imaging range corresponding to the detection range AR5, which is narrower than the detection range AR4, as a face position determination frame FL12, and makes the presence of the face frame FL11 detected by the face detection unit 103 within the face position determination frame FL12 one of the conditions for permitting execution of the correction process. This point is explained in more detail with reference to FIG. 6, which shows a specific example of the correction determination process based on the face frame.
FIG. 6(a) shows a captured image IM1, an example of an image captured by the image sensor 101. The captured image IM1 includes the user U.
The face detection unit 103 detects the face of the user U using the captured image IM1. As a result of the face detection, a face frame FL11 is set in the region including the face, as shown in FIG. 6(b), and face frame information indicating the region of the face frame FL11 is obtained. A known method, such as one that utilizes features of the captured image IM1, can be applied to detect the face of the user U. The face detection unit 103 outputs the face frame information to the correction determination unit 104.
Upon acquiring the face frame information, the correction determination unit 104 determines whether the face frame FL11 indicated by the face frame information is within the face position determination frame FL12. The face position determination frame FL12 corresponds to the imaging range arising from the detection range AR5 of the ultrasonic device 5. The correction determination unit 104 may hold information indicating this imaging range in advance, or it may identify the information indicating the imaging range when it acquires the face frame information.
The process by which the correction determination unit 104 determines whether the face frame FL11 is within the face position determination frame FL12 exploits the advantage of the ultrasonic device 5 (its detection range is narrower than that of the camera 4) and eliminates the disadvantages of the widened detection range AR4.
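Since both frames are axis-aligned rectangles, the containment test reduces to four comparisons. The sketch below is a minimal illustration with hypothetical frame coordinates; the (x, y, w, h) representation is an assumption, not the disclosed data format.

```python
def frame_inside(face_frame, judgment_frame):
    """Return True if face_frame (x, y, w, h) lies entirely inside
    judgment_frame (x, y, w, h), i.e. FL11 is within FL12."""
    fx, fy, fw, fh = face_frame
    jx, jy, jw, jh = judgment_frame
    return (fx >= jx and fy >= jy and
            fx + fw <= jx + jw and fy + fh <= jy + jh)

# FL12: hypothetical central region matching the ultrasonic range AR5
# in a 1280x720 captured image.
FL12 = (480, 200, 320, 320)
print(frame_inside((560, 260, 140, 180), FL12))  # True: condition satisfied
```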
When the correction determination unit 104 determines that the face frame FL11 is within the face position determination frame FL12, it continues the correction determination process using another condition for permitting execution of the correction process.
As explained above, the ultrasonic device 5 is known to respond more slowly than the camera 4, because the speed of sound is slower than the speed of light. That is, while the camera 4 captures images using the characteristics of light, enabling high-speed distance measurement (measurement of the distance value Zf) by the face detection unit 103, the ultrasonic device 5 measures distance (the distance value Zu) using the characteristics of sound and is therefore less responsive than the camera 4.
When the ultrasonic device 5 is thus less responsive than the camera 4, a distance-measurement delay occurs. For example, suppose the camera 4 and the ultrasonic device 5 start processing at the same time. Let the time required for the camera 4 to acquire a captured image and for the face detection unit 103 to calculate the distance value Zf from it be a required time t1, and let the time required for the ultrasonic device 5 to calculate the distance value Zu using ultrasonic waves be a required time t2. In this example, because of the difference between the speed of light and the speed of sound, t1 < t2 holds.
If the time corresponding to the required time t1, that is, the time at which the distance value Zf is obtained, is denoted time TM1, and the time corresponding to the required time t2, that is, the time at which the distance value Zu is obtained, is denoted time TM2, a time difference arises between them. Specifically, since time TM1 is earlier than time TM2, the difference TM2 - TM1 is the period by which the ultrasonic device 5 takes longer to measure distance than the camera 4, and can be regarded as the distance-measurement delay time.
This response delay is explained with reference to FIG. 7, which shows the transition of the distance value Zf and of the distance value Zu over time, together with a delay time DTM based on the time difference TM2 - TM1. In this example, the delay time DTM arising from the responsiveness of the ultrasonic device 5 is used as the other condition for permitting execution of the correction process. For example, when the correction determination unit 104 determines that the face frame FL11 is within the face position determination frame FL12, it uses the delay time DTM to determine whether to execute the correction process.
Specifically, when the correction determination unit 104 determines that the face frame FL11 is within the face position determination frame FL12, it determines whether the delay time DTM has elapsed with the face frame FL11 remaining within the face position determination frame FL12. When it determines that the delay time DTM has elapsed, it permits execution of the correction process; for example, it outputs a signal permitting execution of the correction process to the error calculation unit 105.
The process by which the correction determination unit 104 determines whether the delay time DTM has elapsed with the face frame FL11 within the face position determination frame FL12 eliminates the issue the ultrasonic device 5 has (inferior responsiveness).
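One way to realize this second condition is a small state machine that permits correction only after the containment condition has held continuously for the delay time DTM. The sketch below is an assumption about how such a gate could be implemented, and the DTM value is hypothetical.

```python
import time

DTM = 0.04  # hypothetical delay time in seconds (ultrasonic minus camera latency)

class CorrectionGate:
    """Allow correction only after the face frame has stayed inside the
    determination frame continuously for the delay time DTM."""
    def __init__(self, delay_s=DTM):
        self.delay_s = delay_s
        self.inside_since = None

    def update(self, frame_is_inside, now=None):
        now = time.monotonic() if now is None else now
        if not frame_is_inside:
            self.inside_since = None          # any dropout restarts the wait
            return False
        if self.inside_since is None:
            self.inside_since = now           # condition just became true
        return (now - self.inside_since) >= self.delay_s
```

Each camera frame would call update(); any frame in which the face frame leaves the determination frame restarts the wait, matching the requirement that the state be continuous for the delay time.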
Returning to FIG. 3, when execution of the correction process is permitted by the correction determination unit 104, the error calculation unit 105 calculates the distance value Zu based on the ultrasonic signal acquired from the ultrasonic sensor 102; for example, it calculates the distance value Zu from the time between transmission and reception of the ultrasonic signal. The error calculation unit 105 then executes the correction process using the distance value Zf input from the face detection unit 103 and the distance value Zu. Specifically, the error calculation unit 105 calculates the absolute value of the difference between the distance value Zf and the distance value Zu (the absolute error), and then, taking the distance value Zu as the true value, calculates the error rate SFZ (the relative error), which is the ratio of that absolute value to the distance value Zu. The error calculation unit 105 outputs the error rate SFZ to the multiplier 106.
The multiplier 106 corrects the error-containing distance value Zf by multiplying the distance value Zf input from the face detection unit 103 by the error rate SFZ, and outputs the corrected distance value Zf to the parallax image processing unit 20 as tracking data.
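Taken literally, multiplying Zf by a relative error would not itself yield a distance, so the arithmetic admits more than one reading. A reading consistent with the scale-factor description earlier in this disclosure and with the multiplier stage is that SFZ is the ratio Zu/Zf captured at calibration time, so that Zf × SFZ reproduces Zu. The sketch below follows that assumed reading; the function names and numeric values are hypothetical, not the patent's verbatim arithmetic.

```python
def compute_sfz(zf_mm, zu_mm):
    """Scale factor such that zf * SFZ matches the ultrasonic true value
    at the moment of calibration (assumed reading of the error rate)."""
    return zu_mm / zf_mm

def correct_zf(zf_mm, sfz):
    """Multiplier stage: rescale subsequent face-detection distances."""
    return zf_mm * sfz

sfz = compute_sfz(zf_mm=882.0, zu_mm=950.0)   # calibrated once, while gated
print(correct_zf(910.0, sfz))                 # later frames reuse the factor
```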
[6. Processing procedure of the information processing device]
Next, the processing procedure of the information processing device 100 according to the embodiment is described. FIG. 8 is a flowchart showing the processing procedure of the information processing device 100 according to the embodiment. For example, the correction process by the information processing device 100 starts with signal processing that follows initial settings after the stereoscopic display device 1 is powered on. In the following description, the stereoscopic display device 1 includes a processor, memory, the camera 4, the ultrasonic device 5, and the like as components, and at least some of these components constitute a PC.
When the process starts, a captured image is acquired via the image sensor 101 and supplied to the face detection unit 103 (step S801). The image sensor 101 constantly inputs captured images to the face detection unit 103 via memory. For example, if the camera 4 is a 1000 fps high-speed camera, an image is written to memory every 1 ms.
The face detection unit 103 determines whether the user's face was detected by the face detection process based on the captured image (step S802). If the face detection unit 103 did not detect the user's face (step S802; No), the process returns to step S801.
On the other hand, if the face detection unit 103 detected the user's face (step S802; Yes), information generated from the face detection result is output, specifically face detection information including the three-dimensional coordinate information (Xf, Yf, Zf) of the user's face and the face frame information (step S803). For example, the face detection unit 103 outputs the distance value Zf, the distance information included in the three-dimensional coordinate information (Xf, Yf, Zf), to the error calculation unit 105 and the multiplier 106, and outputs the face frame information to the correction determination unit 104.
The correction determination unit 104 determines whether the face frame FL11 indicated by the face frame information is within the face position determination frame FL12 (step S804). If the correction determination unit 104 determines that the face frame FL11 is not within the face position determination frame FL12 (step S804; No), the process returns to step S801.
On the other hand, if the correction determination unit 104 determines that the face frame FL11 is within the face position determination frame FL12 (step S804; Yes), it determines whether a period equal to the delay time DTM has elapsed with the face frame FL11 remaining within the face position determination frame FL12 (step S805). If the correction determination unit 104 determines that this state did not continue for the delay time DTM (step S805; No), the process returns to step S804.
On the other hand, if the correction determination unit 104 determines that the delay time DTM has elapsed with the face frame FL11 within the face position determination frame FL12 (step S805; Yes), a signal permitting execution of the correction process is output to the error calculation unit 105 (step S806).
The error calculation unit 105 acquires an ultrasonic signal from the ultrasonic sensor 102 and calculates the distance value Zu from information based on the acquired ultrasonic signal (step S807); for example, from the time between transmission and reception of the ultrasonic signal. The error calculation unit 105 may also obtain the speed of sound corresponding to the ambient temperature of the display 3 and calculate a more accurate distance value Zu based on this speed of sound. The method for calculating the distance value Zu in consideration of the ambient temperature is described later.
The error calculation unit 105 calculates the error rate SFZ based on the distance value Zf and the distance value Zu (step S808). Specifically, the error calculation unit 105 calculates the absolute value of the difference between the distance value Zf and the distance value Zu, calculates the error rate SFZ as the ratio of that absolute value to the distance value Zu, and outputs the error rate SFZ to the multiplier 106.
The multiplier 106 corrects the error of the distance value Zf relative to the distance value Zu based on the error rate SFZ (step S809). Specifically, the multiplier 106 multiplies the distance value Zf input from the face detection unit 103 by the error rate SFZ, thereby correcting the error-containing distance value Zf.
Then, the multiplier 106 outputs the three-dimensional position information (Xf, Yf, Zf × SFZ) obtained by the correction process of step S809 to the parallax image processing unit 20 as tracking data (step S810).
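Tying the steps of FIG. 8 together, the sketch below shows one plausible shape of the gating-and-correction loop. Every callable and attribute (read_frame, detect_face, measure_zu, gate, emit, face.zf, and so on) is a hypothetical stand-in for the sensors and processing units described above, and the form of the error rate follows the same assumed Zu/Zf reading as the earlier sketch.

```python
def tracking_loop(read_frame, detect_face, measure_zu, gate, emit):
    """One pass per camera frame over steps S801-S810; all callables are
    hypothetical stand-ins for the sensors and processing units."""
    sfz = 1.0                                  # identity until first calibration
    while True:
        image = read_frame()                   # S801: 1 ms period at 1000 fps
        face = detect_face(image)              # S802/S803: face + frame info
        if face is None:
            continue
        if gate.update(face.inside_judgment_frame):   # S804/S805 conditions
            zu = measure_zu()                  # S806/S807: ultrasonic distance
            sfz = zu / face.zf                 # S808: assumed error-rate form
        emit(face.xf, face.yf, face.zf * sfz)  # S809/S810: tracking data out
```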
[7. Distance value calculation method]
The method by which the error calculation unit 105 calculates the distance value Zu is described below. First, the speed of sound, that is, the speed at which sound travels through air, is known to be affected by the air temperature. For example, the speed of sound is 331.5 meters per second at 1 atmosphere and 0°C, and it increases or decreases by 0.6 meters per second per 1°C of temperature change. The formula c = 331.5 + 0.6T (m/sec), where T is the air temperature in degrees Celsius, is therefore commonly used. Since 15°C is generally adopted as room temperature, 340 m/sec is often used as a typical speed of sound.
The error calculation unit 105 can use this speed-of-sound formula to calculate the distance value Zu by the following equation (1).
(Equation 1)  L = (331.5 + 0.6T) × 100 × t / 2 ... (1)
"L (cm)" is the distance value Zu. "T (°C)" is the air temperature. "t (sec)" is the time from when the ultrasonic device 5 transmits an ultrasonic signal until it receives the echo; "t/2" is therefore the time the ultrasonic wave takes to reach the user.
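Equation (1) in code form: the factor of 100 converts the speed of sound from m/sec to cm/sec so that L comes out in centimeters, and t/2 converts the round trip into a one-way time. The sample inputs are hypothetical.

```python
def distance_zu_cm(t_round_trip_s, temp_c):
    """Equation (1): L = (331.5 + 0.6*T) * 100 * t / 2, with L in cm."""
    speed_cm_per_s = (331.5 + 0.6 * temp_c) * 100.0  # speed of sound in cm/sec
    return speed_cm_per_s * t_round_trip_s / 2.0     # one-way distance

print(distance_zu_cm(t_round_trip_s=0.004, temp_c=15.0))  # ~68 cm at 15 deg C
```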
Here, as described above, since 15°C is generally adopted as the air temperature, it would be conceivable to calculate the distance value Zu with T (°C) = 15. However, because the speed of sound depends on the air temperature, an accurate distance value Zu may not be obtained if T (°C) = 15 is used as a fixed value. A value indicating the actual air temperature, rather than the generic 15°C, therefore needs to be supplied as T.
As this actual air temperature, a temperature based on the ambient temperature of the display 3 is preferably used. This is because the panel of the display 3 exhibits certain temperature characteristics due to heat generation, so the ambient temperature of the display 3 is expected to substantially follow these characteristics, and because, as shown in FIG. 4, the ultrasonic device 5 is installed near the display 3.
FIG. 9 shows the relationship between the panel temperature and the ambient temperature as a table TB, relating the panel temperature, that is, the temperature generated in the panel, to the ambient temperature generated around the display 3 under the influence of the panel temperature. The panel temperature may be detected by a temperature sensor provided on the panel.
Using the relationship between the panel temperature and the ambient temperature shown in table TB, the following equation (2) is derived for estimating the ambient temperature T2 of the display 3 from the panel temperature T1, the actual measured value of the current panel temperature. That is, the ambient temperature T2 is approximated from the panel temperature T1 as in equation (2).
(Equation 2)  T2 = 0.0388(T1)³ − 1.4252(T1)² + 12.132(T1) + 13.437 ... (2)
The error calculation unit 105 applies the current panel temperature T1 detected by the temperature sensor to equation (2) to estimate the ambient temperature T2 of the display 3. The error calculation unit 105 then corrects the outside air temperature T0 using the temperature difference T1 - T2 between the panel temperature T1 and the ambient temperature T2; for example, it corrects the outside air temperature T0 by adding the temperature difference T1 - T2 to it. The error calculation unit 105 then adopts the corrected outside air temperature T0 as the air temperature T in equation (1) and solves equation (1) to obtain the value of L, that is, the distance value Zu.
The distance value Zu calculated in this way takes into account the air temperature to which the ultrasonic device 5 is exposed by being installed near the display 3 (the ambient temperature of the display 3 due to heat generated by the panel), and is therefore more accurate than a value obtained using the typical air temperature of 15°C.
[8. Distance value calculation processing procedure]
Next, the processing procedure for calculating the distance value Zu when the ambient temperature of the display 3 is taken into consideration is described. FIG. 10 is a flowchart showing the procedure for calculating the distance value Zu; it shows the procedure for correcting the temperature characteristic that the ultrasonic device 5 exhibits under the influence of the panel of the display 3.
First, the error calculation unit 105 acquires the outside air temperature T0 (step S1001); for example, it may acquire the outside air temperature T0 when the panel is powered on. The outside air temperature T0 here is the air temperature of the space in which the stereoscopic display device 1 is placed.
The error calculation unit 105 also acquires the current panel temperature T1 (step S1002) and applies the panel temperature T1 to equation (2) to estimate the ambient temperature T2 of the display 3 (step S1003).
Next, the error calculation unit 105 calculates the temperature difference T1 - T2 between the panel temperature T1 and the ambient temperature T2, and corrects the outside air temperature T0 by adding the temperature difference T1 - T2 to it (step S1004). This approach is based on Newton's law of cooling. The corrected outside air temperature T0 can be regarded as the temperature of the space in which the stereoscopic display device 1 is placed, that is, the space through which the ultrasonic waves travel.
The error calculation unit 105 therefore calculates the speed of sound c using the corrected outside air temperature T0 as the sound-transmission temperature (step S1005). Specifically, the error calculation unit 105 adopts the corrected outside air temperature T0 as T in the speed-of-sound formula and calculates the speed of sound c.
The error calculation unit 105 then applies the speed of sound c to equation (1) above and calculates the distance value Zu (step S1006).
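Combining steps S1001 through S1006, the sketch below is a minimal illustration of the temperature-corrected calculation of Zu. The sample sensor readings are hypothetical, and the valid input range of the cubic fit in equation (2) follows table TB, which is not reproduced here.

```python
def ambient_from_panel(t1):
    """Equation (2): cubic fit from panel temperature T1 to ambient T2."""
    return 0.0388 * t1**3 - 1.4252 * t1**2 + 12.132 * t1 + 13.437

def corrected_zu_cm(t0_outside_c, t1_panel_c, t_round_trip_s):
    t2 = ambient_from_panel(t1_panel_c)        # S1003: estimate ambient temp
    t_corr = t0_outside_c + (t1_panel_c - t2)  # S1004: Newton's-cooling correction
    c = 331.5 + 0.6 * t_corr                   # S1005: speed of sound in m/sec
    return c * 100.0 * t_round_trip_s / 2.0    # S1006: equation (1), cm

# Illustrative inputs only; realistic values depend on table TB.
print(corrected_zu_cm(t0_outside_c=25.0, t1_panel_c=10.0, t_round_trip_s=0.004))
```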
Although not shown, the distance value Zu obtained in step S1006 is input to the information processing device 100. As described for step S807, the input distance value Zu is acquired by the error calculation unit 105.
[9. Hardware configuration]
An example of the hardware configuration of a computer corresponding to the information processing device 100 according to the embodiment described above is explained with reference to FIG. 11. FIG. 11 is a block diagram showing an example of such a hardware configuration. Note that FIG. 11 shows one example, and the configuration is not limited to that shown in FIG. 11.
As shown in FIG. 11, the computer 1000 has a CPU (Central Processing Unit) 1100, a RAM (Random Access Memory) 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The units of the computer 1000 are connected by a bus 1050.
The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each unit. For example, the CPU 1100 loads programs stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes the processing corresponding to the various programs.
The ROM 1300 stores boot programs such as the BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts up, as well as programs that depend on the hardware of the computer 1000.
The HDD 1400 is a computer-readable recording medium that non-temporarily records programs executed by the CPU 1100 and the data used by such programs. Specifically, the HDD 1400 records program data 1450, which is an example of an information processing program for realizing the information processing method according to an embodiment of the present disclosure and of the data used by that program.
The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet). For example, the CPU 1100 receives data from other devices and transmits data generated by the CPU 1100 to other devices via the communication interface 1500.
The input/output interface 1600 is an interface for connecting an input/output device 1650 to the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600, and transmits data to an output device such as a display device, a speaker, or a printer via the input/output interface 1600. The input/output interface 1600 may also function as a media interface that reads programs and the like recorded on a predetermined recording medium. Examples of such media include optical recording media such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), magneto-optical recording media such as an MO (Magneto-Optical disk), tape media, magnetic recording media, and semiconductor memories.
For example, when the computer 1000 functions as the information processing device 100 according to the embodiment, the CPU 1100 of the computer 1000 executes the information processing program loaded onto the RAM 1200, thereby realizing the various processing functions executed by each unit shown in FIG. 3. That is, the CPU 1100, the RAM 1200, and the like realize the information processing method of the information processing device 100 according to the embodiment in cooperation with software (the information processing program loaded onto the RAM 1200).
[10. Summary]
Although embodiments of the present disclosure have been described above, the technical scope of the present disclosure is not limited to the embodiments as described, and various modifications are possible without departing from the gist of the present disclosure. Components of the embodiments and modifications may also be combined as appropriate.
The effects described for each embodiment in this specification are merely examples and are not limiting; other effects may also be obtained.
The present disclosure can also adopt the following configurations.
(1)
an imaging unit that captures an image of a user and acquires a captured image;
an ultrasonic device disposed near the imaging unit so as to detect the user;
Acquire three-dimensional coordinate information of the face of the user shown in the captured image;
determining whether the user's face has been detected continuously for a predetermined period of time based on the three-dimensional coordinate information;
and an information processing unit that corrects a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal acquired from the ultrasonic device based on a determination that the user's face has been detected continuously for a predetermined period of time.
(2)
The information processing device according to (1), wherein the information processing unit corrects an error of the first distance value with respect to the second distance value based on the second distance value.
(3)
The information processing unit includes:
calculating an error rate when the second distance value is set as a true value by using a difference between the first distance value and the second distance value;
The information processing device according to (2), further comprising: correcting an error of the first distance value with respect to the second distance value based on the error rate.
(4)
Further equipped with a display,
the imaging unit is installed near the display so that the captured image includes a user viewing the display;
The information processing device according to (1), wherein the ultrasonic device is installed near the display.
(5)
The information processing device according to (4), wherein the information processing unit calculates the second distance value according to an ambient temperature of the display based on a signal acquired from the ultrasonic device.
(6)
The information processing unit includes:
estimating the ambient temperature based on a panel temperature detected by a temperature sensor provided on a panel of the display;
Calculating a sound velocity according to a correction temperature based on a temperature difference between the panel temperature and the ambient temperature;
The information processing device according to (5), wherein the second distance value is calculated based on the calculated sound speed.
(7)
The information processing device according to (4), further comprising, as the display, a stereoscopic display on which a stereoscopic image generated using a viewpoint position of the user is displayed.
(8)
the predetermined time represents a delay time that is a difference between a time required for the imaging unit to respond and a time required for the ultrasonic device to respond,
The information processing device according to (1), wherein the information processing unit determines whether the user's face has been detected continuously during the delay time as the determination of whether the user's face has been detected.
(9)
The computer
Acquire three-dimensional coordinate information of a face of the user shown in an image of the user captured by an imaging unit;
determining whether the user's face has been detected continuously for a predetermined period of time based on the three-dimensional coordinate information;
An information processing method that executes a process of correcting a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal obtained from an ultrasonic device installed near the imaging unit to detect the user, based on a determination that the user's face has been detected continuously for a predetermined period of time.
(10)
Computer,
Acquire three-dimensional coordinate information of a face of the user shown in an image of the user captured by an imaging unit;
determining whether the user's face has been detected continuously for a predetermined period of time based on the three-dimensional coordinate information;
An information processing program for functioning as an information processing unit that, based on a determination that the user's face has been detected continuously for a predetermined period of time, corrects a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal obtained from an ultrasonic device installed near the imaging unit to detect the user.
REFERENCE SIGNS LIST
1    Stereoscopic display device
2    Base
3    Display
4    Camera
5    Ultrasonic device
100  Information processing device
101  Image sensor
102  Ultrasonic sensor
103  Face detection unit
104  Correction determination unit
105  Error calculation unit
106  Multiplier

Claims (10)

1.  An information processing apparatus comprising:
    an imaging unit that captures an image of a user and acquires a captured image;
    an ultrasonic device disposed near the imaging unit so as to detect the user; and
    an information processing unit that
    acquires three-dimensional coordinate information of the face of the user shown in the captured image,
    determines whether the user's face has been detected continuously for a predetermined period of time based on the three-dimensional coordinate information, and
    corrects, based on a determination that the user's face has been detected continuously for the predetermined period of time, a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal acquired from the ultrasonic device.
  2.  前記情報処理部は、前記第2の距離値に基づいて、前記第2の距離値に対する前記第1の距離値の誤差を補正する
     請求項1に記載の情報処理装置。
    The information processing device according to claim 1 , wherein the information processing unit corrects an error of the first distance value with respect to the second distance value based on the second distance value.
  3.  前記情報処理部は、
     前記第1の距離値と、前記第2の距離値との差分を用いて、前記第2の距離値を真値とした場合の誤差率を算出し、
     前記誤差率に基づいて、前記第2の距離値に対する前記第1の距離値の誤差を補正する
     請求項2に記載の情報処理装置。
    The information processing unit includes:
    calculating an error rate when the second distance value is set as a true value by using a difference between the first distance value and the second distance value;
    The information processing apparatus according to claim 2 , further comprising: correcting an error of the first distance value with respect to the second distance value based on the error rate.
  4.  ディスプレイをさらに備え、
     前記撮像部は、前記撮像画像が前記ディスプレイを視聴するユーザを含むように前記ディスプレイの近傍に設置され、
     前記超音波デバイスは、前記ディスプレイの近傍に設置される
     請求項1に記載の情報処理装置。
    Further equipped with a display,
    the imaging unit is installed near the display so that the captured image includes a user viewing the display;
    The information processing apparatus according to claim 1 , wherein the ultrasonic device is installed near the display.
  5.  前記情報処理部は、前記超音波デバイスから取得した信号に基づいて、前記ディスプレイの周辺温度に応じた前記第2の距離値を算出する
     請求項4に記載の情報処理装置。
    The information processing device according to claim 4 , wherein the information processing unit calculates the second distance value according to an ambient temperature of the display based on a signal acquired from the ultrasonic device.
  6.  前記情報処理部は、
     前記ディスプレイのパネルに備えられた温度センサが検出したパネル温度に基づいて、前記周辺温度を推定し、
     前記パネル温度と前記周辺温度との温度差に基づく補正温度に応じた音速を算出し、
     前記算出した音速に基づいて、前記第2の距離値を算出する
     請求項5に記載の情報処理装置。
    The information processing unit includes:
    estimating the ambient temperature based on a panel temperature detected by a temperature sensor provided on a panel of the display;
    Calculating a sound velocity according to a correction temperature based on a temperature difference between the panel temperature and the ambient temperature;
    The information processing device according to claim 5 , wherein the second distance value is calculated based on the calculated sound speed.
  7.  前記ディスプレイとして、前記ユーザの視点位置を用いて生成された立体視画像が表示される立体視ディスプレイを備える
     請求項4に記載の情報処理装置。
    The information processing device according to claim 4 , further comprising, as the display, a stereoscopic display on which a stereoscopic image generated using the viewpoint position of the user is displayed.
  8.  前記所定時間は、前記撮像部が応答に要する時間と前記超音波デバイスが応答に要する時間の差分である遅延時間を表し、
     前記情報処理部は、前記ユーザの顔が検出されたか否かの前記判定として、前記遅延時間の間、継続的に前記ユーザの顔が検出されたか否かを判定する
     請求項1に記載の情報処理装置。
    the predetermined time represents a delay time that is a difference between a time required for the imaging unit to respond and a time required for the ultrasonic device to respond,
    The information processing device according to claim 1 , wherein the information processing unit determines whether the user's face has been detected by determining whether the user's face has been detected continuously during the delay time.
  9.  コンピュータが、
     撮像部により撮像されたユーザの撮像画像が示す前記ユーザの顔の3次元座標情報を取得し、
     前記3次元座標情報に基づいて、所定時間継続的に前記ユーザの顔が検出されたか否かを判定し、
     所定時間継続的に前記ユーザの顔が検出されているという判定に基づいて、前記3次元座標情報が示す第1の距離値を、前記ユーザを検出するように前記撮像部の近傍に設置される超音波デバイスから取得した信号に基づく第2の距離値に基づいて補正する
     処理を実行する情報処理方法。
    The computer
    Acquire three-dimensional coordinate information of a face of the user shown in an image of the user captured by an imaging unit;
    determining whether the user's face has been detected continuously for a predetermined period of time based on the three-dimensional coordinate information;
    An information processing method that executes a process of correcting a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal obtained from an ultrasonic device installed near the imaging unit to detect the user, based on a determination that the user's face has been detected continuously for a predetermined period of time.
  10.  コンピュータを、
     撮像部により撮像されたユーザの撮像画像が示す前記ユーザの顔の3次元座標情報を取得し、
     前記3次元座標情報に基づいて、所定時間継続的に前記ユーザの顔が検出されたか否かを判定し、
     所定時間継続的に前記ユーザの顔が検出されているという判定に基づいて、前記3次元座標情報が示す第1の距離値を、前記ユーザを検出するように前記撮像部の近傍に設置される超音波デバイスから取得した信号に基づく第2の距離値に基づいて補正する
     情報処理部
     として機能させるための情報処理プログラム。
    Computer,
    Acquire three-dimensional coordinate information of a face of the user shown in an image of the user captured by an imaging unit;
    determining whether the user's face has been detected continuously for a predetermined period of time based on the three-dimensional coordinate information;
    An information processing program for functioning as an information processing unit that, based on a determination that the user's face has been detected continuously for a predetermined period of time, corrects a first distance value indicated by the three-dimensional coordinate information based on a second distance value based on a signal obtained from an ultrasonic device installed near the imaging unit to detect the user.
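Claims 5 and 6 turn the ultrasonic echo into a distance by way of a temperature-corrected sound speed. A minimal sketch of that calculation follows, assuming a fixed panel self-heating offset and the standard linear approximation of the speed of sound in air; both constants are illustrative assumptions, not values taken from the patent.

```python
# Assumed panel-minus-ambient temperature difference in degrees Celsius.
PANEL_SELF_HEATING_C = 10.0


def estimate_ambient_temperature(panel_temp_c: float) -> float:
    # Claim 6: the ambient temperature is estimated from the panel
    # temperature; a fixed self-heating offset stands in for the
    # patent's correction model here.
    return panel_temp_c - PANEL_SELF_HEATING_C


def speed_of_sound(temp_c: float) -> float:
    # Standard first-order approximation for sound speed in air (m/s).
    return 331.3 + 0.606 * temp_c


def second_distance_value(panel_temp_c: float, time_of_flight_s: float) -> float:
    # The ultrasonic pulse travels to the user and back, so the distance
    # is half of speed times round-trip time.
    ambient_c = estimate_ambient_temperature(panel_temp_c)
    return speed_of_sound(ambient_c) * time_of_flight_s / 2.0


# Example: a 40 C panel (ambient ~30 C) and a 5.8 ms echo give about 1.01 m.
print(second_distance_value(40.0, 0.0058))
```

At 30 C the sound speed is about 349.5 m/s versus 343 m/s at 20 C, a difference of nearly 2 percent, which is why the claims derive the second distance value from a temperature-corrected sound speed rather than a fixed constant.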
PCT/JP2023/041042 2022-11-25 2023-11-15 Information processing apparatus, information processing method, and information processing program WO2024111475A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022188652 2022-11-25
JP2022-188652 2022-11-25

Publications (1)

Publication Number Publication Date
WO2024111475A1

Family

ID=91195651

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/041042 WO2024111475A1 (en) 2022-11-25 2023-11-15 Information processing apparatus, information processing method, and information processing program

Country Status (1)

Country Link
WO (1) WO2024111475A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1062552A (en) * 1996-08-14 1998-03-06 Mitsubishi Heavy Ind Ltd Distance measuring apparatus
JP2012209677A (en) * 2011-03-29 2012-10-25 Kyocera Corp Portable electronic apparatus
JP2014112757A (en) * 2012-12-05 2014-06-19 Nlt Technologies Ltd Stereoscopic image display device
US20150326968A1 (en) * 2014-05-08 2015-11-12 Panasonic Intellectual Property Management Co., Ltd. Directivity control apparatus, directivity control method, storage medium and directivity control system
WO2020130048A1 (en) * 2018-12-21 2020-06-25 京セラ株式会社 Three-dimensional display device, head-up display system, and moving object
