WO2024123603A1 - Verification of head-mounted display performance - Google Patents

Info

Publication number
WO2024123603A1
Authority
WO
WIPO (PCT)
Prior art keywords
hmd
image
target
temporal signal
display
Prior art date
Application number
PCT/US2023/081989
Other languages
French (fr)
Inventor
Zhiheng Jia
Narges NOORI
Xiao YUAN
Luke SONG
Jeffrey Neil Margolis
Chao GUO
Original Assignee
Google Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google Llc filed Critical Google Llc
Publication of WO2024123603A1 publication Critical patent/WO2024123603A1/en

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01M TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M 11/00 Testing of optical apparatus; Testing structures by optical methods not otherwise provided for
    • G01M 11/02 Testing optical properties
    • G01M 11/0207 Details of measuring devices
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/01 Head-up displays
    • G02B 27/017 Head mounted
    • G02B 27/0172 Head mounted characterised by optical features
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/62 Optical apparatus specially adapted for adjusting optical elements during the assembly of optical systems
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/01 Head-up displays
    • G02B 27/0101 Head-up displays characterised by optical features
    • G02B 2027/0138 Head-up displays characterised by optical features comprising image capture systems, e.g. camera
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/01 Head-up displays
    • G02B 27/017 Head mounted
    • G02B 2027/0178 Eyeglass type
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N 17/002 Diagnosis, testing or measuring for television systems or their details for television cameras

Definitions

  • Verification of an extended reality device involves constructing a metric for benchmarking a component of the device operation. Nevertheless, verification of individual components of the extended reality device operation is not indicative of the user experience as a whole.
  • Implementations described herein are related to a system and method for providing an end-to-end verification of a head-mounted display (HMD) for use in an extended reality system, that is, an augmented reality (AR)/virtual reality (VR)/mixed reality (MR) system.
  • the system includes a linear rail station configured to guide the HMD along a path, a jig configured to mount the HMD in the linear rail station, and an eye proxy camera configured to simulate an eye by forming an image of a display of the HMD.
  • the linear rail station may translate the HMD along a linear path.
  • the linear rail station may rotate the HMD along an angular path.
  • a world-facing camera of the HMD is directed at a stationary target placed in front of the linear rail station.
  • the world-facing camera is part of an end-to-end pipeline that renders an image of the target in a display of the HMD.
  • the linear rail station then moves the HMD along a path - linear or angular.
  • the eye proxy camera forms an image of the display of the HMD and a position of the target in the display is detected.
  • a verification metric such as pixel-to-world alignment error and/or jitter may then be based on the position of the target in the display as the HMD is moved along the path.
  • a method can include capturing an image of a target using a world-facing camera of a head-mounted display (HMD) as the HMD is moved along a path, the image of the target corresponding to a location along the path and being displayed in a display of the HMD.
  • the method can also include generating, using an eye proxy camera, a position of the image of the target while displayed within the display of the HMD.
  • the method can further include generating a verification metric based on the position of the image of the target.
  • a system can include a linear rail station configured to guide a head-mounted display (HMD) along a path.
  • the system can also include a jig configured to mount the HMD in the linear rail station.
  • the system can further include an eye proxy camera configured to simulate an eye by forming an image of a display of the HMD.
  • the system can further include processing circuitry.
  • the processing circuitry can be configured to capture an image of a target using a world-facing camera of the HMD as the HMD is moved along the path, the image of the target corresponding to a location along the path and being displayed in a display of the HMD.
  • the processing circuitry can also be configured to generate, using the eye proxy camera, a position of the image of the target while displayed within the display of the HMD.
  • the processing circuitry can further be configured to generate a verification metric based on the position of the image of the target.
  • a computer program product can include a nontransitory storage medium, the computer program product including code that, when executed by processing circuitry, causes the processing circuitry to perform a method.
  • the method can include capturing an image of a target using a world-facing camera of a head-mounted display (HMD) as the HMD is moved along a path, the image of the target corresponding to a location along the path and being displayed in a display of the HMD.
  • the method can also include generating, using an eye proxy camera, a position of the image of the target while displayed within the display of the HMD.
  • the method can further include generating a verification metric based on the position of the image of the target.
  • FIG. 1A is a diagram that illustrates an example linear rail station.
  • FIG. 1B is a diagram that illustrates a specific example of a linear rail station.
  • FIG. 2 is a diagram that illustrates an example linear movement of a head-mounted display (HMD) past a target in the linear rail station.
  • FIG. 3A is a diagram that illustrates an example target in the form of a modified marker.
  • FIG. 3B is a diagram that illustrates an example pixel-to-world alignment error resulting from a misaligned display.
  • FIG. 4 is a diagram that illustrates an example target in the form of a world-locked virtual sphere.
  • FIG. 5A is a plot that illustrates an example temporal signal generated based on a pixel-to-world alignment error over time from movement of an HMD along a path.
  • FIG. 5B is a plot that illustrates an example frequency space representation of the temporal signal of FIG. 5A.
  • FIG. 6 is a diagram that illustrates an example electronic environment in which the improved techniques described herein may be implemented.
  • FIG. 7 is a flow chart that illustrates an example method of performing end-to-end verification of a head-mounted display (HMD), according to disclosed implementations.
  • Verification of an extended reality device involves constructing a metric for benchmarking a component of the extended reality device operation.
  • a component can include any of 6DoF for pose estimation, LSR for late-stage rendering, and pass-through for rendering real-world objects combined with virtual objects, for example.
  • the components of a world-facing camera configured to capture an image on the world side of the HMD, the 6DoF for pose estimation, the LSR for late-stage rendering, and the pass-through for rendering real-world objects combined with virtual objects are referred to as an end-to-end pipeline.
  • the end-to-end pipeline includes depth estimation, display calibration, and other components.
  • a technical problem with the above is that verification of such individual components of the extended reality device operation is not indicative of the user experience as a whole. In other words, a process for performing an end-to-end verification of an extended reality device operation that encompasses all components of the operation does not exist.
  • a technical solution to the above-described technical problem includes performing an end-to-end verification of an extended reality device operation that encompasses all components of the operation.
  • the end-to-end verification is performed on an apparatus that includes a linear rail station that can move an extended reality device, e.g., a head-mounted display (HMD), in three degrees of freedom: translation, pitch, and yaw.
  • the apparatus includes a jig to mount the HMD and a pair of eye proxy cameras that are configured to generate images of the displays of the HMD as a user would observe.
  • the metrics used in the end-to-end verification of an extended reality device operation include pixel-to-world alignment error and jitter.
  • Pixel-to-world alignment error is defined as a difference between positions of a fixed target as imaged by the HMD as the HMD is moved in the apparatus, e.g., translated along a rail or rotated in the jig.
  • Jitter is defined as the pixel-to-world alignment error filtered over a particular frequency band, e.g., between 1 Hz and 30 Hz; this frequency band is where users can be sensitive to the jittery motion of the image of the target.
  • the definitions of pixel-to-world alignment error and jitter further depend on whether the end-to-end verification is performed with or without pass-through, e.g., whether the target can be seen directly through the HMD.
  • the target is a marker surrounded by a circle.
  • An example of such a marker is an ALVAR marker.
  • the circle surrounding the marker, or the real-world circle, is used to locate the center of the marker.
  • the HMD then produces an image of the marker/circle on a display, which in turn is imaged by an eye proxy camera.
  • the center of the image of the marker/circle on the display is concentric with the center of the real-world circle.
  • the center of the image of the marker/circle on the display is apart from the center of the real-world circle by a distance.
  • This distance may vary over time as a result of accumulation of errors in various components of the end-to-end pipeline, e.g., drift in 6DoF even if the HMD is stationary; this is the pixel-to-world alignment error.
  • the jitter may then be computed by performing a Fourier transform of the pixel-to-world alignment error to produce a frequency distribution of the error; setting the amplitudes of the distribution outside a particular frequency band to zero to produce a filtered distribution; performing an inverse Fourier transform on the filtered distribution to produce a transformed error; and reporting either the total power or the 95th percentile value of the norm (e.g., absolute value) of the transformed error as the jitter.
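The transform-filter-invert sequence described above can be sketched in Python with NumPy; the function name, the real-valued FFT, and the default band edges are illustrative choices, not part of the disclosure:

```python
import numpy as np

def jitter_metric(error, sample_rate_hz, band=(1.0, 30.0), percentile=95):
    """Band-limit a uniformly sampled pixel-to-world alignment-error
    signal (in pixels) and report a percentile of its absolute value."""
    spectrum = np.fft.rfft(error)  # Fourier transform of the error signal
    freqs = np.fft.rfftfreq(len(error), d=1.0 / sample_rate_hz)
    # Zero the amplitudes outside the frequency band to form a bandpass filter.
    spectrum[(freqs < band[0]) | (freqs > band[1])] = 0.0
    # Inverse transform yields the filtered (transformed) error.
    filtered = np.fft.irfft(spectrum, n=len(error))
    return float(np.percentile(np.abs(filtered), percentile))
```

For the total-power variant mentioned above, one would instead sum the squared magnitudes of the retained frequency bins.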
  • the target is a virtual world-locked sphere produced by the HMD as a virtual object.
  • the virtual world-locked sphere would not move whether the HMD is stationary or is moved in the apparatus.
  • the pixel-to-world alignment error is then defined as the difference between the triangulated position of the virtual world-locked sphere at a time t and the baseline position in the image space.
  • the jitter is defined as the filtered pixel-to-world alignment error similarly as for the pass-through case.
  • a technical advantage of disclosed implementations is that an end-to-end verification of an HMD is achieved that encompasses the entirety of the user experience, rather than a single component that may or may not encompass any part of the user experience.
  • the jitter verification metric measures movement of a display image over frequencies to which a user may be especially sensitive.
  • FIG. 1A is a diagram that illustrates an example linear rail station 100.
  • the linear rail station 100 is configured to move a head-mounted display (HMD) 125 over any of three degrees of freedom in front of a target that is imaged by the HMD 125.
  • the linear rail station 100 includes a linear rail 110, a jig 120 for mounting the HMD 125 on the linear rail station 100, an eye proxy camera 130, and processing circuitry 140.
  • the linear rail 110 is configured to guide the HMD 125 along a linear path in one direction.
  • the linear rail 110 is a metal rail.
  • the linear rail 110 includes a motor 115 configured to move the jig 120 on the linear rail 110 along the linear path.
  • the motor 115 is configured to move the jig 120 on the linear rail 110 at a specified, constant speed. In some implementations, the speed is adjustable.
  • the jig 120 is configured to mount the HMD 125 in the linear rail station 100.
  • the jig 120 is further configured to allow the HMD to be moved in any of two rotational directions, e.g., pitch and yaw.
  • the motor 115 is configured to move the HMD 125 in either of the two rotational directions. In some implementations, the motor 115 is configured to move the jig 120 at a constant angular speed in one or both of the angular directions.
  • the eye proxy camera 130 is configured to simulate an eye by forming an image of a display of the HMD 125.
  • the eye proxy camera 130 is placed behind the display of the HMD 125 such that an exit pupil of the eye proxy camera 130 is placed approximately where a pupil of a user’s eye would be when using the HMD 125, and a focal length of the eye proxy camera 130 is approximately that of the user’s eye when looking at the display of the HMD 125.
  • the eye proxy camera 130 is also attached to the jig 120 such that the eye proxy camera 130 moves with the HMD 125 synchronously; that is, there is no motion between the HMD 125 and the eye proxy camera 130.
  • the processing circuitry 140 is configured to capture an image of a target 150 using a world-facing camera of the HMD 125, generate a position of the image of the target 150 while displayed within a display of the HMD 125, and generate a verification metric based on the position of the target.
  • the processing circuitry 140 is located on the HMD 125 and is the processing circuitry used by the HMD 125. In some implementations, however, the processing circuitry 140 is external to the HMD 125 and is contained in, e.g., a computer connected to the HMD 125.
  • the processing circuitry 140 also provides the end-to-end pipeline apart from the world-facing camera of the HMD 125 for rendering an image on the display of the HMD 125. That is, the processing circuitry provides components for rendering an image on the display of the HMD 125 such as depth estimation, display calibration, pose estimation, and late-stage rendering (LSR).
  • FIG. 1B is a diagram that illustrates a specific example of a linear rail station.
  • FIG. 2 is a diagram that illustrates an example linear movement of a head-mounted display (HMD) 210 past a target 230 in the linear rail station, e.g., linear rail station 100. It is noted that a path 240 can include rotational motion instead of, or in addition to, linear motion.
  • the HMD 210 includes a world-facing camera 215 which is configured to capture an image of the target 230.
  • the image of the target 230 is displayed in a display of the HMD 210.
  • the eye-proxy camera 220 then captures an image of the display. In doing so, a location of the image of the target 230 within the display of the HMD 210 may be determined.
  • the HMD 210 is moved in a linear rail (e.g., linear rail 110) along a path 240 as the world-facing camera 215 captures images of the target 230.
  • the location of the image of the target 230 in the display of the HMD 210 may change. Because the HMD 210 is moving along the path 240 over time, e.g., at a substantially constant velocity, the location of the image of the target 230 changes over time.
  • the HMD 210 provides a see-through image of the target as well as that rendered by the end-to-end pipeline including capture by the world-facing camera 215.
  • the location of the image of the target 230 is substantially coincident (e.g., within 5%) with the location of the see-through image of the target 230.
  • there is a misalignment between what is rendered by the end-to-end pipeline and the display of the HMD 210.
  • the pixel-to-world alignment error defines a temporal signal. This is because the pixel-to-world alignment error changes over time as the location of the image of the target 230 rendered with the end-to-end pipeline changes with time as the HMD 210 is moved along the path 240.
  • the temporal signal represents movement of the image of the target 230 within the display.
  • a user looking at the image may experience discomfort when the image is moving with a certain amount of rapidity. This movement may be characterized as belonging to a particular frequency band. Accordingly, such a particular frequency band may define a passband filter over which a jitter verification metric may be defined. Such a jitter verification metric may be indicative of a level of user discomfort when using the HMD 210. Further details of the jitter metric are discussed with regard to FIGs. 5A and 5B.
  • FIG. 3A is a diagram that illustrates an example target 300 in the form of a modified marker. As shown in FIG. 3A, the target 300 includes a marker 310 surrounded by a circle 320.
  • the marker 310 as shown in FIG. 3A is an ALVAR marker which is used with AR systems for performing tracking in three dimensions. In some implementations, however, a general ArUco marker may be used. These markers are 2D binary-encoded fiducial patterns designed to be quickly located by computer vision systems.
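As an illustrative sketch (not a step recited in the disclosure), the center of such a fiducial can be estimated from the four corner points a detector returns, e.g., the corner arrays produced by OpenCV's `cv2.aruco` module; the helper below assumes only NumPy and hypothetical corner input:

```python
import numpy as np

def marker_center(corners):
    """Estimate a fiducial marker's center as the mean of its four
    detected corner points, given in (x, y) pixel coordinates."""
    pts = np.asarray(corners, dtype=float).reshape(-1, 2)
    return pts.mean(axis=0)
```

The surrounding circle's detected center can then be compared against this marker center, as described in the following paragraphs.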
  • the circle 320 surrounds the marker 310 such that the center 325 of the circle 320 is substantially coincident with a center of the marker 310.
  • the image of the circle 320 and hence the location of the image of the center 325 defines the location of the target 300.
  • FIG. 3B is a diagram that illustrates an example pixel-to-world alignment error 380 resulting from a misaligned display.
  • the eye proxy camera (not pictured here) forms an image of a see-through image 360 of a marker and circle and an image 370 of the marker and circle on the display from the end-to-end pipeline.
  • Processing circuitry determines a baseline position 365 of the see-through image and an apparent position 375 of the image of the world-facing camera in the display.
  • the apparent position 375 of the image is the apparent position of the center of the marker in the respective images.
  • the apparent position 375 is the center of the circle surrounding the marker.
  • the apparent position 375 and the baseline position 365 are, in some implementations, expressed in terms of display coordinates.
  • a verification metric, the pixel-to-world alignment error 380, may then be defined as a difference between the apparent position 375 of the center of the marker in the world-camera image and the baseline position 365.
  • the difference is defined as a norm of the difference in display coordinates (e.g., a Euclidean norm).
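Under these definitions, the per-frame error reduces to the Euclidean norm of the offset in display coordinates; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def alignment_error(apparent_px, baseline_px):
    """Pixel-to-world alignment error: Euclidean norm of the offset
    between the rendered target center and the see-through baseline
    center, both expressed in display coordinates."""
    offset = np.asarray(apparent_px, dtype=float) - np.asarray(baseline_px, dtype=float)
    return float(np.linalg.norm(offset))
```

Evaluating this at each frame as the HMD moves along the path yields the temporal signal discussed with regard to FIG. 5A.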
  • FIG. 4 is a diagram that illustrates a scenario 400 in which a verification metric is obtained without a see-through image. Rather, a world-locked virtual sphere 410 is rendered on an HMD 420.
  • the world-locked virtual sphere 410 is configured to represent a virtual object at a fixed position in world space. Nevertheless, because of misalignment in the display, the apparent position of the world-locked virtual sphere 410 will change as the HMD 420 is moved within the linear rail station or even when the HMD is stationary.
  • the eye proxy camera 425 captures a first image of the display.
  • the eye proxy camera captures a second image of the display.
  • processing circuitry, e.g., processing circuitry 140 of FIG. 1A, estimates a baseline world coordinate C_t1 of the world-locked virtual sphere 410.
  • the eye proxy camera 425 captures an image of the display of the HMD 420 at each time step t2, t3, ..., tN.
  • the processing circuitry estimates a world coordinate C_t of the world-locked virtual sphere 410.
  • the above analysis applies whether or not the HMD is moving along a path.
  • defects in other parts of the end-to-end pipeline such as depth estimation, display calibration, pose estimation, and LSR can cause the rendered image in the display to move with time.
  • when the display is perfectly aligned, the world coordinates satisfy C_t = C_t1 for all times t. When the display is misaligned, however, the world coordinates C_t change with time t. The pixel-to-world alignment error at time t is then a difference between the world coordinates expressed in image space.
  • the difference is a norm of the difference C_t - C_t1.
  • the norm is a Euclidean norm.
  • the result is a temporal signal representing the pixel-to-world alignment error over time.
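For this virtual-sphere case, the temporal signal can be sketched as the norm of C_t - C_t1 at each time step; the helper below is a hypothetical illustration that takes the first estimate as the baseline:

```python
import numpy as np

def sphere_error_signal(centers):
    """centers: (N, 2) image-space estimates C_t of the world-locked
    virtual sphere at times t1..tN; row 0 is the baseline C_t1.
    Returns the pixel-to-world alignment error at each time step."""
    c = np.asarray(centers, dtype=float)
    # Euclidean norm of C_t - C_t1 per row; the first entry is zero.
    return np.linalg.norm(c - c[0], axis=1)
```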
  • the jitter metric may be derived from this temporal signal by applying a bandpass filter over a specified range of frequencies (e.g., 1 Hz to 30 Hz) to the temporal signal, and selecting a specified percentile (e.g., 95th percentile) of the resulting filtered signal to represent the jitter. This is illustrated in FIGs. 5A and 5B.
  • FIG. 5A is a plot 500 that illustrates an example temporal signal 510 generated based on a pixel-to-world alignment error over time from movement of an HMD along a path.
  • the pixel-to-world alignment error can result from a see-through image or a non-see-through image.
  • the pixel-to-world alignment error is defined as a difference between a detected position of an end-to-end rendered image of a target and a detected position of a see-through image of the target (see, e.g., FIG. 3B).
  • the pixel-to-world alignment error is defined as a difference between an estimated position of a world-locked virtual sphere in image space at time t and that at time t1, e.g., at an initial timestep.
  • the temporal signal 510 is represented in FIG. 5A as a continuous signal but in reality, the temporal signal 510 is the result of many discrete measurements made over time.
  • the measurements of the pixel-to-world alignment error are made at a specified frequency, e.g., 10 Hz, 20 Hz, 50 Hz, 100 Hz, 200 Hz, 500 Hz, 1000 Hz, as the HMD is moved along a path in the linear rail station.
  • the motion is linear, e.g., along a rail.
  • the motion is along a yaw or pitch angular direction.
  • the pixel-to-world alignment error as a metric may be a single number based on the temporal signal 510.
  • the pixel-to-world alignment error can be a mean, a root-mean-square, a median, or a percentile of the values of the temporal signal 510.
  • the pixel-to-world alignment error may be a 95th percentile of the values of the temporal signal 510.
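The single-number reductions listed above can be sketched as follows; the `mode` keyword and function name are illustrative, not part of the disclosure:

```python
import numpy as np

def alignment_summary(signal, mode="p95"):
    """Reduce the temporal pixel-to-world alignment-error signal to a
    single verification number: mean, RMS, median, or 95th percentile."""
    s = np.asarray(signal, dtype=float)
    reducers = {
        "mean": float(s.mean()),
        "rms": float(np.sqrt(np.mean(s ** 2))),
        "median": float(np.median(s)),
        "p95": float(np.percentile(s, 95)),
    }
    return reducers[mode]
```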
  • the temporal signal 510 may be represented in frequency space.
  • a frequency space representation of the temporal signal 510 can provide insight into the nature of the motion of the image location in the display as the HMD moves along a path in the linear rail station. Specifically, the frequency space representation can provide a jitter verification metric. Such a frequency space representation is provided in FIG. 5B.
  • FIG. 5B is a plot 550 that illustrates an example frequency space representation of the temporal signal 510 of FIG. 5A.
  • the frequency space representation of the temporal signal 510 is denoted as a frequency signal 560 in FIG. 5B.
  • the frequency signal 560 is derived from the temporal signal via a discrete Fourier transform, e.g., a fast Fourier transform. Accordingly, although the frequency signal 560 is presented as a continuous curve, it is in fact a discrete curve that may be very highly sampled.
  • the frequency signal is divided into at least three frequency regions: a drift region 572 between 0 and 0.1 Hz, a swim region 574 between 0.1 Hz and 1 Hz, and a jitter region 570 between 1 Hz and 30 Hz. There is a high-frequency region above the jitter region 570, e.g., above 30 Hz, that is unlabeled. It is noted that the boundaries between the regions 570, 572, and 574 can correspond to other frequencies.
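One way to sketch this split into drift, swim, and jitter regions is to sum the spectral power falling in each band; the band edges below follow the example figure, and the helper itself is a hypothetical illustration:

```python
import numpy as np

# Band edges (Hz) following the drift/swim/jitter regions of FIG. 5B.
DEFAULT_BANDS = (("drift", 0.0, 0.1), ("swim", 0.1, 1.0), ("jitter", 1.0, 30.0))

def band_powers(signal, sample_rate_hz, bands=DEFAULT_BANDS):
    """Spectral power of the temporal signal in each named band."""
    spectrum = np.fft.rfft(np.asarray(signal, dtype=float))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    power = np.abs(spectrum) ** 2
    return {name: float(power[(freqs >= lo) & (freqs < hi)].sum())
            for name, lo, hi in bands}
```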
  • the part of the frequency signal 560 corresponding to the jitter region 570 is of interest.
  • the frequencies corresponding to the jitter region 570 are those which may provide the most discomfort to a user.
  • a new frequency signal is defined that is the frequency signal 560 in the jitter region 570 and is zero outside of the jitter region 570. In this way, a bandpass filter is defined for frequencies in the jitter region 570.
  • a discrete inverse Fourier transform is applied to the new frequency signal in which frequencies outside of the jitter region 570 have zero amplitude.
  • the result is a filtered temporal signal.
  • the jitter verification metric is defined in terms of the filtered temporal signal, e.g., a mean, a root-mean-square, a median, a percentile, etc. In some implementations, the jitter verification metric is defined as the 95th percentile of the filtered temporal signal.
  • FIG. 6 is a diagram that illustrates an example electronic apparatus in which the above-described technical solution may be implemented.
  • Processing circuitry 620 is configured to perform end-to-end verification of an HMD by evaluating a verification metric corresponding to the HMD as the HMD moves within a linear rail station.
  • the processing circuitry 620 includes a network interface 622, one or more processing units 624, and memory 626.
  • the network interface 622 includes, for example, Ethernet adaptors, Token Ring adaptors, and the like, for converting electronic and/or optical signals received from the network to electronic form for use by the processing circuitry 620.
  • the set of processing units 624 includes one or more processing chips and/or assemblies.
  • the memory 626 includes both volatile memory (e.g., RAM) and non-volatile (nontransitory) memory, such as one or more ROMs, disk drives, solid state drives, and the like.
  • one or more of the components of the processing circuitry 620 can be, or can include processors (e.g., processing units 624) configured to process instructions stored in the memory 626. Examples of such instructions as depicted in FIG. 6 include a target image manager 630, an eye proxy position manager 640, and a verification metric manager 650. Further, as illustrated in FIG. 6, the memory 626 is configured to store various data, which is described with respect to the respective managers that use such data.
  • the target image manager 630 is configured to capture an image (target image data 632) of a target for display in an HMD.
  • the target image data 632 is captured using a world-facing camera of the HMD.
  • the target image data 632 is captured as the HMD is moved along a path within a linear rail station.
  • the target includes an ArUco marker.
  • the target includes an ALVAR marker.
  • the marker is surrounded by a circle such that the center of the circle defines the position of the target.
  • the eye proxy position manager 640 is configured to generate, using an eye proxy camera, a position of the image of the target (eye proxy position data 642) while displayed within the display of the HMD.
  • the eye proxy position data 642 represents an image of the center of the circle surrounding a marker.
  • the verification metric manager 650 is configured to generate a verification metric (verification metric data 652) based on the eye proxy position data 642.
  • a verification metric is a metric that defines a degree of error in the end-to-end pipeline.
  • One example of a verification metric is the pixel-to-world alignment error as defined previously.
  • Another example of a verification metric is a jitter as defined previously.
  • the verification metric is a pixel-to-world alignment error and is based on a temporal signal that is formed from a difference between the image position and a baseline image position of the target as the HMD is moved along a path within the linear rail station.
  • the verification metric is a jitter value that is derived from the temporal signal by applying a bandpass filter to the temporal signal.
  • a baseline image position is an image position that is obtained in conjunction with an end-to-end pipeline that is functioning without defect.
  • the see-through image is aligned with the image in the display when the end-to-end pipeline is operating without defect.
  • the baseline image position in the see-through image case is the position of the see-through image.
  • the components (e.g., modules, processing units 624) of the processing circuitry 620 can be configured to operate based on one or more platforms (e.g., one or more similar or different platforms) that can include one or more types of hardware, software, firmware, operating systems, runtime libraries, and/or so forth.
  • the components of the processing circuitry 620 can be configured to operate within a cluster of devices (e.g., a server farm). In such an implementation, the functionality and processing of the components of the processing circuitry 620 can be distributed to several devices of the cluster of devices.
  • the components of the processing circuitry 620 can be, or can include, any type of hardware and/or software configured to process attributes.
  • one or more portions of the components of the processing circuitry 620 shown in FIG. 6 can be, or can include, a hardware-based module (e.g., a digital signal processor (DSP), a field programmable gate array (FPGA), a memory), a firmware module, and/or a software-based module (e.g., a module of computer code, a set of computer-readable instructions that can be executed at a computer).
  • one or more portions of the components of the processing circuitry 620 can be, or can include, a software module configured for execution by at least one processor (not shown).
  • the functionality of the components can be included in different modules and/or different components than those shown in FIG. 6, including combining functionality illustrated as two components into a single component.
  • the components of the processing circuitry' 620 can be configured to operate within, for example, a data center (e.g., a cloud computing environment), a computer system, one or more server/host devices, and/or so forth.
  • the components of the processing circuitry 620 can be configured to operate within a netw ork.
  • the components of the processing circuitry 620 can be configured to function yvithin various ty pes of netw ork environments that can include one or more devices and/or one or more server devices.
  • the network can be, or can include, a local area network (LAN), a wide area network (WAN), and/or so forth.
  • the network can be, or can include, a wireless network and/or a wired network implemented using, for example, gateway devices, bridges, switches, and/or so forth.
  • the network can include one or more segments and/or can have portions based on various protocols such as Internet Protocol (IP) and/or a proprietary protocol.
  • the network can include at least a portion of the Internet.
  • the memory 626 can be any type of memory such as a random-access memory, a disk drive memory, flash memory, and/or so forth.
  • the memory 626 can be implemented as more than one memory component (e.g., more than one RAM component or disk drive memory) associated with the components of the processing circuitry 620.
  • the memory 626 can be a database memory.
  • the memory 626 can be, or can include, a non-local memory.
  • the memory 626 can be, or can include, a memory shared by multiple devices (not shown).
  • the memory 626 can be associated with a server device (not shown) within a network and configured to serve the components of the processing circuitry 620. As illustrated in FIG. 6, the memory 626 is configured to store various data, including target image data 632, eye proxy position data 642, and verification metric data 652.
  • FIG. 7 is a flow chart depicting an example method 700 of performing end-to-end verification of an HMD according to the above-described improved techniques.
  • the method 700 may be performed by software constructs described in connection with FIG. 6, which reside in memory 626 of the processing circuitry 620 and are run by the set of processing units 624.
  • the target image manager 630 captures an image of a target using a world-facing camera of a head-mounted display (HMD) as the HMD is moved along a path, the image of the target corresponding to a location along the path and being displayed in a display of the HMD.
  • the eye proxy position manager 640 generates, using an eye proxy camera, a position of the image of the target while displayed within the display of the HMD.
  • the verification metric manager 650 generates a verification metric based on the position of the image of the target.
  • a method comprising: capturing an image of a target using a world-facing camera of a head-mounted display (HMD) as the HMD is moved along a path, the image of the target corresponding to a location of the HMD along the path and being displayed in a display of the HMD; determining, using an eye proxy camera, a position of the image of the target while displayed within the display of the HMD; and generating a verification metric based on the position of the image of the target.
  • generating the verification metric includes: generating a temporal signal based on a difference between the position of the image and a baseline image position of the target on the HMD; and generating a jitter value for the HMD by applying a bandpass filter to the temporal signal.
  • Clause 4 The method as in clause 3, wherein the target includes a marker surrounded by a circle, and wherein the position of the image is located at an image of a center of the circle in the display of the HMD.
  • applying the bandpass filter to the temporal signal includes: generating a frequency spectrum of the temporal signal by computing a Fourier transform of the temporal signal; setting amplitudes of the frequency spectrum to zero outside of a specified frequency range to produce a filtered frequency spectrum; and computing an inverse Fourier transform of the filtered frequency spectrum to produce a filtered temporal signal.
  • Clause 7 The method as in clause 6, wherein the specified frequency range is about 1 Hz to about 30 Hz.
  • a computer program product comprising a nontransitory storage medium, the computer program product including code that, when executed by processing circuitry, causes the processing circuitry to perform the method as in clause 1.
  • Clause 11 A system, comprising: a rail station configured to guide a head-mounted display (HMD) along a path; a jig configured to mount the HMD in the rail station; an eye proxy camera configured to simulate an eye by forming an image of a display of the HMD; and processing circuitry configured to: capture an image of a target using a world-facing camera of the HMD as the HMD is moved along the path, the image of the target corresponding to a location of the HMD along the path and being displayed in a display of the HMD; determine, using the eye proxy camera, a position of the image of the target while displayed within the display of the HMD; and generate a verification metric based on the position of the image of the target.
  • Clause 12 The system as in clause 11, wherein the processing circuitry configured to generate the verification metric is further configured to: generate a temporal signal based on a difference between the position of the image and a baseline image position of the target; and generate a jitter value for the head-mounted display by applying a bandpass filter to the temporal signal.
  • Clause 13 The system as in clause 12, wherein the processing circuitry configured to generate the temporal signal is further configured to: determine the baseline image position of the target as a location of a see-through image; and form, as the temporal signal, a norm of the difference between the position of the image and the baseline image position.
  • Clause 14 The system as in clause 13, wherein the target includes a marker surrounded by a circle, and wherein the position of the image is located at an image of a center of the circle in the display of the HMD.
  • Clause 16 The system as in any of clauses 12 to 15, wherein the processing circuitry configured to apply the bandpass filter to the temporal signal is further configured to: generate a frequency spectrum of the temporal signal by computing a Fourier transform of the temporal signal; set amplitudes of the frequency spectrum to zero outside of a specified frequency range to produce a filtered frequency spectrum; and compute an inverse Fourier transform of the filtered frequency spectrum to produce a filtered temporal signal.
  • Clause 17 The system as in clause 16, wherein the specified frequency range is about 1 Hz to about 30 Hz.
  • Clause 18 The system as in any of clauses 16 or 17, wherein the jitter value is determined by computing a 95th percentile value of the filtered temporal signal.
  • Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • These computer programs include machine instructions for a programmable processor and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language.
  • nontransitory machine-readable medium refers to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • the systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.


Abstract

Techniques include performing an end-to-end verification of an extended reality device operation that encompasses all components of the operation. The end-to-end verification is performed on an apparatus that includes a linear rail station (100) that can move an extended reality device, e.g., a head-mounted display, HMD, (125) in three degrees of freedom: translation, pitch, and yaw. In addition, the apparatus includes a jig (120) to mount the HMD and a pair of eye proxy cameras (130) that are configured to generate images of the displays of the HMD as a user would observe.

Description

VERIFICATION OF HEAD-MOUNTED DISPLAY PERFORMANCE
BACKGROUND
[0001] Verification of an extended reality device involves constructing a metric for benchmarking a component of the device operation. Nevertheless, verification of individual components of the extended reality device operation is not indicative of the user experience as a whole.
SUMMARY
[0002] Implementations described herein are related to a system and method for providing an end-to-end verification of a head-mounted display (HMD) for use in an extended reality system, that is, an augmented reality (AR)/virtual reality (VR)/mixed reality (MR) system. The system includes a linear rail station configured to guide the HMD along a path, a jig configured to mount the HMD in the linear rail station, and an eye proxy camera configured to simulate an eye by forming an image of a display of the HMD. In some implementations, the linear rail station may translate the HMD along a linear path. In some implementations, the linear rail station may rotate the HMD along an angular path. In some implementations, a world-facing camera of the HMD is directed at a stationary target placed in front of the linear rail station. The world-facing camera is part of an end-to-end pipeline that renders an image of the target in a display of the HMD. The linear rail station then moves the HMD along a path - linear or angular. As the HMD is moved along the path, the eye proxy camera forms an image of the display of the HMD and a position of the target in the display is detected. A verification metric such as pixel-to-world alignment error and/or jitter may then be based on the position of the target in the display as the HMD is moved along the path.
[0003] In one general aspect, a method can include capturing an image of a target using a world-facing camera of a head-mounted display (HMD) as the HMD is moved along a path, the image of the target corresponding to a location along the path and being displayed in a display of the HMD. The method can also include generating, using an eye proxy camera, a position of the image of the target while displayed within the display of the HMD. The method can further include generating a verification metric based on the position of the image of the target.
[0004] In another general aspect, a system can include a linear rail station configured to guide a head-mounted display (HMD) along a path. The system can also include a jig configured to mount the HMD in the linear rail station. The system can further include an eye proxy camera configured to simulate an eye by forming an image of a display of the HMD. The system can further include processing circuitry. The processing circuitry can be configured to capture an image of a target using a world-facing camera of the HMD as the HMD is moved along the path, the image of the target corresponding to a location along the path and being displayed in a display of the HMD. The processing circuitry can also be configured to generate, using the eye proxy camera, a position of the image of the target while displayed within the display of the HMD. The processing circuitry can further be configured to generate a verification metric based on the position of the image of the target.
[0005] In another general aspect, a computer program product can include a nontransitory storage medium, the computer program product including code that, when executed by processing circuitry, causes the processing circuitry to perform a method. The method can include capturing an image of a target using a world-facing camera of a head-mounted display (HMD) as the HMD is moved along a path, the image of the target corresponding to a location along the path and being displayed in a display of the HMD. The method can also include generating, using an eye proxy camera, a position of the image of the target while displayed within the display of the HMD. The method can further include generating a verification metric based on the position of the image of the target.
[0006] The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1A is a diagram that illustrates an example linear rail station.
[0008] FIG. 1B is a diagram that illustrates a specific example of a linear rail station.
[0009] FIG. 2 is a diagram that illustrates an example linear movement of a head-mounted display (HMD) past a target in the linear rail station.
[0010] FIG. 3A is a diagram that illustrates an example target in the form of a modified marker.
[0011] FIG. 3B is a diagram that illustrates an example pixel-to-world alignment error resulting from a misaligned display.
[0012] FIG. 4 is a diagram that illustrates an example target in the form of a world-locked virtual sphere.
[0013] FIG. 5A is a plot that illustrates an example temporal signal generated based on a pixel-to-world alignment error over time from movement of an HMD along a path.
[0014] FIG. 5B is a plot that illustrates an example frequency space representation of the temporal signal of FIG. 5A.
[0015] FIG. 6 is a diagram that illustrates an example electronic environment in which the improved techniques described herein may be implemented.
[0016] FIG. 7 is a flow chart that illustrates an example method of performing end-to-end verification of a head-mounted display (HMD), according to disclosed implementations.
DETAILED DESCRIPTION
[0017] Verification of an extended reality device involves constructing a metric for benchmarking a component of the extended reality device operation. Such a component can include any of 6DoF for pose estimation, LSR for late-stage rendering, and pass-through for rendering real-world objects combined with virtual objects, for example.
[0018] The combination of a world-facing camera configured to capture an image on the world side of the HMD, the 6DoF for pose estimation, the LSR for late-stage rendering, and the pass-through for rendering real-world objects combined with virtual objects is referred to as an end-to-end pipeline. In some implementations, the end-to-end pipeline includes depth estimation, display calibration, and other components.
[0019] A technical problem with the above is that verification of such individual components of the extended reality device operation is not indicative of the user experience as a whole. In other words, a process for performing an end-to-end verification of an extended reality device operation that encompasses all components of the operation does not exist.
[0020] In accordance with the implementations described herein, a technical solution to the above-described technical problem includes performing an end-to-end verification of an extended reality device operation that encompasses all components of the operation. The end-to-end verification is performed on an apparatus that includes a linear rail station that can move an extended reality device, e.g., a head-mounted display (HMD), in three degrees of freedom: translation, pitch, and yaw. In addition, the apparatus includes a jig to mount the HMD and a pair of eye proxy cameras that are configured to generate images of the displays of the HMD as a user would observe.
[0021] The metrics used in the end-to-end verification of an extended reality device operation include pixel-to-world alignment error and jitter. Pixel-to-world alignment error is defined as a difference between positions of a fixed target as imaged by the HMD as the HMD is moved in the apparatus, e.g., translated along a rail or rotated in the jig. Jitter is defined as the pixel-to-world alignment error filtered over a particular frequency band, e.g., between 1 Hz and 30 Hz; this frequency band is where users can be sensitive to the jittery motion of the image of the target. The definitions of pixel-to-world alignment error and jitter further depend on whether the end-to-end verification is performed with or without pass-through, e.g., whether the target can be seen directly through the HMD.
[0022] With see-through, the target is a marker surrounded by a circle. An example of such a marker is an ALVAR marker. The circle surrounding the marker, or the real-world circle, is used to locate the center of the marker. The HMD then produces an image of the marker/circle on a display, which in turn is imaged by an eye proxy camera. In a perfectly aligned system, the center of the image of the marker/circle on the display is co-centric with the center of the real-world circle. In misaligned systems, however, the center of the image of the marker/circle on the display is apart from the center of the real-world circle by a distance. This distance may vary over time as a result of accumulation of errors in various components of the end-to-end pipeline, e.g., drift in 6DoF even if the HMD is stationary; this is the pixel-to-world alignment error. The jitter may then be computed by performing a Fourier transform of the pixel-to-world alignment error to produce a frequency distribution of the error; setting the amplitudes of the distribution outside a particular frequency band to zero to produce a filtered distribution; performing an inverse Fourier transform on the filtered distribution to produce a transformed error; and reporting either the total power or the 95th percentile value of the norm (e.g., absolute value) of the transformed error as the jitter.
[0023] Without see-through, the target is a virtual world-locked sphere produced by the HMD as a virtual object. In a perfectly aligned HMD, the virtual world-locked sphere would not move whether the HMD is stationary or is moved in the apparatus. When the HMD is stationary, the baseline position of the virtual world-locked sphere is its initial position at t=0 in the image space. When the HMD is moved in the apparatus, a baseline position of the virtual world-locked sphere is determined from two consecutive images of the virtual world-locked sphere taken at t=t0 and at a next time step t=t1. That is, the baseline position is determined from a triangulation of centers of circles representing the images of the virtual world-locked sphere at t=t0 and t=t1. This triangulation may be repeated for subsequent time steps until the motion is stopped at t=tN. The pixel-to-world alignment error is then defined as the difference between the triangulated position of the virtual world-locked sphere at a time t and the baseline position in the image space. The jitter is defined as the filtered pixel-to-world alignment error similarly as for the pass-through case.
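For illustration only, the triangulation step in this no-see-through case can be sketched as a standard linear (DLT) two-view triangulation. The projection matrices, point values, and function names below are assumptions of this sketch (e.g., that the eye proxy camera's projection matrix at each rail position is known from calibration) and are not specified by the implementations described herein.

```python
import numpy as np

def triangulate(P0, P1, x0, x1):
    """Linear (DLT) triangulation of one 3D point from two views.

    P0, P1: 3x4 camera projection matrices at two rail positions
            (assumed known from calibration).
    x0, x1: (u, v) pixel centers of the sphere image in each view.
    Returns the triangulated 3D point in world coordinates.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.vstack([
        x0[0] * P0[2] - P0[0],
        x0[1] * P0[2] - P0[1],
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

Under this sketch, the same routine applied to the images at t=t0 and t=t1 yields the baseline coordinate Ct1, and reapplying it at each later time step yields Ct, from which the alignment error follows.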
[0024] A technical advantage of disclosed implementations is that end-to-end verification of an HMD is achieved that encompasses the entirety of the user experience rather than a single component that may or may not encompass any part of the user experience. For example, the jitter verification metric measures movement of a display image over frequencies to which a user may be especially sensitive.
[0025] FIG. 1A is a diagram that illustrates an example linear rail station 100. The linear rail station 100 is configured to move a head-mounted display (HMD) 125 over any of three degrees of freedom in front of a target that is imaged by the HMD 125. As shown in FIG. 1A, the linear rail station 100 includes a linear rail 110, a jig 120 for mounting the HMD 125 on the linear rail station 100, an eye proxy camera 130, and processing circuitry 140.
[0026] The linear rail 110 is configured to guide the HMD 125 along a linear path in one direction. In some implementations, the linear rail 110 is a metal rail. In some implementations, the linear rail 110 includes a motor 115 configured to move the jig 120 on the linear rail 110 along the linear path. In some implementations, the motor 115 is configured to move the jig 120 on the linear rail 110 at a specified, constant speed. In some implementations, the speed is adjustable.
[0027] The jig 120 is configured to mount the HMD 125 in the linear rail station 100. The jig 120 is further configured to allow the HMD to be moved in any of two rotational directions, e.g., pitch and yaw. The motor 115 is configured to move the HMD 125 in either of the two rotational directions. In some implementations, the motor 115 is configured to move the jig 120 at a constant angular speed in one or both of the angular directions.
[0028] The eye proxy camera 130 is configured to simulate an eye by forming an image of a display of the HMD 125. The eye proxy camera 130 is placed behind the display of the HMD 125 such that an exit pupil of the eye proxy camera 130 is placed approximately where a pupil of a user’s eye would be when using the HMD 125, and a focal length of the eye proxy camera 130 is approximately that of the user’s eye when looking at the display of the HMD 125. The eye proxy camera 130 is also attached to the jig 120 such that the eye proxy camera 130 moves with the HMD 125 synchronously; that is, there is no motion between the HMD 125 and the eye proxy camera 130.
[0029] The processing circuitry 140 is configured to capture an image of a target 150 using a world-facing camera of the HMD 125, generate a position of the image of the target 150 while displayed within a display of the HMD 125, and generate a verification metric based on the position of the image of the target. In some implementations, the processing circuitry 140 is located on the HMD 125 and is the processing circuitry used by the HMD 125. In some implementations, however, the processing circuitry 140 is external to the HMD 125 and is contained in, e.g., a computer connected to the HMD 125.
[0030] The processing circuitry 140 also provides the end-to-end pipeline apart from the world-facing camera of the HMD 125 for rendering an image on the display of the HMD 125. That is, the processing circuitry provides components for rendering an image on the display of the HMD 125 such as depth estimation, display calibration, pose estimation, and late-stage rendering (LSR).
[0031] FIG. 1B is a diagram that illustrates a specific example of a linear rail station.
[0032] FIG. 2 is a diagram that illustrates an example linear movement of a head-mounted display (HMD) 210 past a target 230 in the linear rail station, e.g., linear rail station 100. It is noted that a path 240 can include not only linear motion but also rotational motion instead or in addition.
[0033] As shown in FIG. 2, the HMD 210 includes a world-facing camera 215 which is configured to capture an image of the target 230. The image of the target 230 is displayed in a display of the HMD 210. The eye-proxy camera 220 then captures an image of the display. In doing so, a location of the image of the target 230 within the display of the HMD 210 may be determined.
[0034] The HMD 210 is moved in a linear rail (e.g., linear rail 110) along a path 240 as the world-facing camera 215 captures images of the target 230. As the HMD 210 is moved along the path 240, the location of the image of the target 230 in the display of the HMD 210 may change. Because the HMD 210 is moving along the path 240 over time, e.g., at a substantially constant velocity, the location of the image of the target 230 changes over time.
[0035] In some implementations, the HMD 210 provides a see-through image of the target as well as that rendered by the end-to-end pipeline including capture by the world-facing camera 215. When the display of the HMD 210 is in alignment with what is rendered by the end-to-end pipeline, the location of the image of the target 230 is substantially coincident with (e.g., within 5% of) the location of the see-through image of the target 230. When there is a misalignment, however, between what is rendered by the end-to-end pipeline and the display of the HMD 210, there is a difference between the location of the image of the target 230 and the location of the see-through image of the target 230. This difference is the pixel-to-world alignment error.
[0036] In some implementations, the pixel-to-world alignment error defines a temporal signal. This is because the pixel-to-world alignment error changes over time as the location of the image of the target 230 rendered with the end-to-end pipeline changes with time as the HMD 210 is moved along the path 240. The temporal signal represents movement of the image of the target 230 within the display. A user looking at the image may experience discomfort when the image is moving with a certain amount of rapidity. This movement may be characterized as belonging to a particular frequency band. Accordingly, such a particular frequency band may define a bandpass filter over which a jitter verification metric may be defined. Such a jitter verification metric may be indicative of a level of user discomfort when using the HMD 210. Further details of the jitter metric are discussed with regard to FIGs. 5A and 5B.
[0037] FIG. 3A is a diagram that illustrates an example target 300 in the form of a modified marker. As shown in FIG. 3A, the target 300 includes a marker 310 surrounded by a circle 320.
[0038] The marker 310 as shown in FIG. 3A is an ALVAR marker which is used with AR systems for performing tracking in three dimensions. In some implementations, however, a general ArUco marker may be used. These markers are 2D binary-encoded fiducial patterns designed to be quickly located by computer vision systems.
[0039] The circle 320 surrounds the marker 310 such that the center 325 of the circle 320 is substantially coincident with a center of the marker 310. The image of the circle 320 and hence the location of the image of the center 325 defines the location of the target 300.
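The center 325 can be recovered from edge points detected on the image of the circle 320. As one illustrative approach, an algebraic least-squares circle fit yields the center directly; the function name and point values here are assumptions of this sketch, not the detection method mandated by the implementations.

```python
import numpy as np

def fit_circle_center(xs, ys):
    """Least-squares (Kasa) circle fit to detected edge points.

    A circle (x - a)^2 + (y - b)^2 = r^2 rewrites linearly as
    2*a*x + 2*b*y + c = x^2 + y^2 with c = r^2 - a^2 - b^2,
    so (a, b, c) follows from one linear least-squares solve.
    Returns the center (a, b) and the radius.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    A = np.column_stack([2.0 * xs, 2.0 * ys, np.ones_like(xs)])
    rhs = xs**2 + ys**2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    radius = np.sqrt(c + a**2 + b**2)
    return (a, b), radius
```

The fitted center then serves as the detected position of the target image in display coordinates.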
[0040] FIG. 3B is a diagram that illustrates an example pixel-to-world alignment error 380 resulting from a misaligned display. The eye proxy camera (not pictured here) forms an image of a see-through image 360 of a marker and circle and an image 370 of the marker and circle on the display from the end-to-end pipeline.
[0041] Processing circuitry (e.g., processing circuitry 140 of FIG. 1) determines a baseline position 365 of the see-through image and an apparent position 375 of the image of the world-facing camera in the display. In some implementations and as shown in FIG. 3B, the apparent position 375 of the image is the apparent position of the center of the marker in the respective images. In some implementations, the apparent position 375 is the center of the circle surrounding the marker. The apparent position 375 and the baseline position 365 are, in some implementations, expressed in terms of display coordinates.
[0042] A verification metric, the pixel-to-world alignment error 380, may then be defined as a difference between the apparent position 375 of the center of the markers in the world-camera image and the baseline position 365. In some implementations, the difference is defined as a norm of the difference in display coordinates (e.g., a Euclidean norm).
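As a minimal sketch (array and function names assumed, not from the source), the per-frame pixel-to-world alignment error is the Euclidean norm of the difference in display coordinates, and collecting it over frames produces the temporal signal:

```python
import numpy as np

def alignment_error_signal(apparent, baseline):
    """Per-frame pixel-to-world alignment error.

    apparent, baseline: (N, 2) arrays of (x, y) display coordinates of
    the rendered-image position and the see-through baseline position.
    Returns an (N,) temporal signal of Euclidean norms, one per frame.
    """
    diff = np.asarray(apparent, dtype=float) - np.asarray(baseline, dtype=float)
    return np.linalg.norm(diff, axis=1)
```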
[0043] FIG. 4 is a diagram that illustrates a scenario 400 in which a verification metric is obtained without a see-through image. Rather, a world-locked virtual sphere 410 is rendered on an HMD 420. The world-locked virtual sphere 410 is configured to represent a virtual object at a fixed position in world space. Nevertheless, because of misalignment in the display, the apparent position of the world-locked virtual sphere 410 will change as the HMD 420 is moved within the linear rail station or even when the HMD is stationary.
[0044] In the scenario 400, at time t0 (beginning of the motion of the HMD 420), the eye proxy camera 425 captures a first image of the display. At a next time step, at time t1, the eye proxy camera captures a second image of the display. Based on the first image and the second image, e.g., via a triangulation, processing circuitry (e.g., processing circuitry 140 of FIG. 1) estimates baseline world coordinate Ct1 of the world-locked virtual sphere 410. As the HMD 420 moves within the linear rail station beyond time t1, the eye proxy camera 425 captures an image of the display of the HMD 420 at each time step t2, t3, ..., tN. For each time t, the processing circuitry estimates a world coordinate Ct of the world-locked virtual sphere 410.
[0045] It is noted that, in some implementations, the above analysis applies whether or not the HMD is moving along a path. For example, defects in other parts of the end-to-end pipeline such as depth estimation, display calibration, pose estimation, and LSR can cause the rendered image in the display to move with time.
[0046] Again, if the display of the HMD were perfectly aligned, then the world coordinates Ct1 = Ct for all times t. When the display is misaligned, however, the world coordinates Ct change with time t. The pixel-to-world alignment error at time t is then a difference between the world coordinates expressed in image space. In some implementations, the difference is a norm of the difference Ct - Ct1. In some implementations, the norm is a Euclidean norm.
[0047] When the normed difference ||Ct - Ct1|| is plotted as a function of time, the result is a temporal signal representing the pixel-to-world alignment error over time. The jitter metric may be derived from this temporal signal by applying a bandpass filter over a specified range of frequencies (e.g., 1 Hz to 30 Hz) to the temporal signal, and selecting a specified percentile (e.g., 95th percentile) of the resulting filtered signal to represent the jitter. This is illustrated in FIGs. 5A and 5B.
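This filter-and-percentile recipe can be sketched as follows, assuming uniformly sampled measurements and an ideal frequency-domain filter (amplitudes zeroed outside the band, as described); the sampling rate, band edges, and function name are illustrative assumptions.

```python
import numpy as np

def jitter_metric(signal, fs, f_lo=1.0, f_hi=30.0, pct=95.0):
    """Bandpass-filter a temporal error signal and report a percentile
    of the filtered signal as the jitter value.

    signal: uniformly sampled alignment-error signal.
    fs: sampling rate in Hz.
    """
    spec = np.fft.rfft(signal)                        # frequency spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0       # zero outside the band
    filtered = np.fft.irfft(spec, n=len(signal))      # filtered temporal signal
    return np.percentile(np.abs(filtered), pct)
```

For example, for an error signal containing a 2-pixel, 5 Hz oscillation plus a constant offset and an 80 Hz component, sampled at 200 Hz, the reported jitter is close to 2 pixels, since only the in-band oscillation survives the filter.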
[0048] FIG. 5A is a plot 500 that illustrates an example temporal signal 510 generated based on a pixel-to-world alignment error over time from movement of an HMD along a path. The pixel-to-world alignment error can result from a see-through image or a non-see-through image. In the see-through image case, the pixel-to-world alignment error is defined as a difference between the detected position of an end-to-end rendered image of a target and a detected position of a see-through image of the target (see, e.g., FIG. 3B). In the non-see-through image case, the pixel-to-world alignment error is defined as a difference between an estimated position of a world-locked virtual sphere in image space at time t and that at time t1, e.g., at an initial timestep.
[0049] The temporal signal 510 is represented in FIG. 5A as a continuous signal, but in reality the temporal signal 510 is a result of many discrete measurements made over time. The measurements of the pixel-to-world alignment error are made at a specified frequency, e.g., 10 Hz, 20 Hz, 50 Hz, 100 Hz, 200 Hz, 500 Hz, or 1000 Hz, as the HMD is moved along a path in the linear rail station. In some implementations, the motion is linear, e.g., along a rail. In some implementations, the motion is along a yaw or pitch angular direction.
[0050] The pixel-to-world alignment error as a metric may be a single number based on the temporal signal 510. For example, the pixel-to-world alignment error can be a mean, a root-mean-square, a median, or a percentile of the values of the temporal signal 510. In some implementations, the pixel-to-world alignment error may be a 95th percentile of the values of the temporal signal 510.
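Any of these single-number summaries is a one-line reduction over the sampled signal; the values below are synthetic, for illustration only:

```python
import numpy as np

# Stand-in for the sampled pixel-to-world alignment error (arbitrary units).
signal = np.array([0.1, 0.3, 0.2, 0.5, 0.4, 0.2, 0.6, 0.3])

mean_err = signal.mean()
rms_err = np.sqrt(np.mean(signal ** 2))
median_err = np.median(signal)
p95_err = np.percentile(signal, 95)  # e.g., the 95th-percentile metric
```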
[0051] The temporal signal 510 may be represented in frequency space. A frequency space representation of the temporal signal 510 can provide insight into the nature of the motion of the image location in the display as the HMD moves along a path in the linear rail station. Specifically, the frequency space representation can provide a jitter verification metric. Such a frequency space representation is provided in FIG. 5B.
[0052] FIG. 5B is a plot 550 that illustrates an example frequency space representation of the temporal signal 510 of FIG. 5A. The frequency space representation of the temporal signal 510 is denoted as a frequency signal 560 in FIG. 5B. In some implementations, the frequency signal 560 is derived from the temporal signal via a discrete Fourier transform, e.g., a fast Fourier transform. Accordingly, although the frequency signal 560 is presented as a continuous curve, in reality it is a discrete curve that may be very highly sampled.
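A sketch of obtaining the frequency signal via a fast Fourier transform; the sampling rate and the synthetic 8 Hz error component are assumptions chosen for illustration:

```python
import numpy as np

fs = 100.0                                  # assumed measurement rate, Hz
t = np.arange(1024) / fs
# Synthetic alignment-error signal with a single 8 Hz component.
temporal_signal = 0.3 * np.sin(2 * np.pi * 8.0 * t)

# Discrete (fast) Fourier transform of the real-valued temporal signal.
spectrum = np.abs(np.fft.rfft(temporal_signal))
freqs = np.fft.rfftfreq(len(temporal_signal), d=1.0 / fs)
peak_freq = freqs[np.argmax(spectrum)]      # dominant frequency, near 8 Hz
```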
[0053] As shown in FIG. 5B, the frequency signal is divided into at least three frequency regions: a drift region 572 between 0 and 0.1 Hz, a swim region 574 between 0.1 Hz and 1 Hz, and a jitter region 570 between 1 Hz and 30 Hz. There is a high-frequency region above the jitter region 570, e.g., above 30 Hz, that is unlabeled. It is noted that the boundaries between the regions 570, 572, and 574 can correspond to other frequencies.
[0054] In some implementations, the part of the frequency signal 560 corresponding to the jitter region 570 is of particular interest, because the frequencies in the jitter region 570 are those that may cause the most discomfort to a user. Accordingly, to define a jitter verification metric that represents the effect of frequencies in the jitter region 570 on a user, a new frequency signal is defined that equals the frequency signal 560 within the jitter region 570 and is zero outside of the jitter region 570. In this way, a bandpass filter is defined for frequencies in the jitter region 570.
[0055] To determine the jitter verification metric, a discrete inverse Fourier transform is applied to the new frequency signal in which frequencies outside of the jitter region 570 have zero amplitude. The result is a filtered temporal signal. The jitter verification metric is defined in terms of the filtered temporal signal, e.g., a mean, a root-mean-square, a median, a percentile, etc. In some implementations, the jitter verification metric is defined as the 95th percentile of the filtered temporal signal.
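The filter-then-percentile procedure can be sketched as follows. The function name, and the choice of taking the percentile of the filtered signal's absolute value, are illustrative assumptions rather than details fixed by the specification:

```python
import numpy as np

def jitter_metric(signal, fs, band=(1.0, 30.0), pct=95):
    # Frequency spectrum of the temporal signal (real-input FFT).
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Zero the amplitudes outside the jitter band: a brick-wall bandpass.
    spectrum[(freqs < band[0]) | (freqs > band[1])] = 0.0
    # The inverse transform yields the filtered temporal signal.
    filtered = np.fft.irfft(spectrum, n=len(signal))
    # Summarize by a percentile of the filtered signal's magnitude.
    return np.percentile(np.abs(filtered), pct)
```

For example, a slow 0.5 Hz drift superimposed on a 10 Hz oscillation would be suppressed by the filter, leaving a metric close to the amplitude of the 10 Hz component alone.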
[0056] FIG. 6 is a diagram that illustrates an example electronic apparatus in which the above-described technical solution may be implemented. Processing circuitry 620 is configured to perform end-to-end verification of an HMD by evaluating a verification metric corresponding to the HMD as the HMD moves within a linear rail station.
[0057] The processing circuitry 620 includes a network interface 622, one or more processing units 624, and memory 626. The network interface 622 includes, for example, Ethernet adaptors, Token Ring adaptors, and the like, for converting electronic and/or optical signals received from the network to electronic form for use by the processing circuitry 620. The set of processing units 624 include one or more processing chips and/or assemblies. The memory 626 includes both volatile memory (e.g., RAM) and non-volatile (nontransitory) memory, such as one or more ROMs, disk drives, solid state drives, and the like.
[0058] In some implementations, one or more of the components of the processing circuitry 620 can be, or can include processors (e.g., processing units 624) configured to process instructions stored in the memory 626. Examples of such instructions as depicted in FIG. 6 include a target image manager 630, an eye proxy position manager 640, and a verification metric manager 650. Further, as illustrated in FIG. 6, the memory 626 is configured to store various data, which is described with respect to the respective managers that use such data.
[0059] The target image manager 630 is configured to capture an image (target image data 632) of a target for display in an HMD. In some implementations, the target image data 632 is captured using a world-facing camera of the HMD. In some implementations, the target image data 632 is captured as the HMD is moved along a path within a linear rail station. In some implementations, the target includes an ArUco marker. In some implementations, the target includes an ALVAR marker. In some implementations, the marker is surrounded by a circle such that the center of the circle defines the position of the target.
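Decoding an ArUco or ALVAR marker is typically delegated to a library, but the "center of the surrounding circle" step can be sketched as a centroid computation. The dark-marker-on-light-background assumption and the function name are illustrative only:

```python
import numpy as np

def circle_center(gray):
    # Crude segmentation: treat pixels darker than the image mean as the
    # marker/circle region (assumes a dark target on a light background).
    mask = gray < gray.mean()
    ys, xs = np.nonzero(mask)
    # The target position is the centroid of the segmented region, which
    # for a symmetric circular target coincides with the circle's center.
    return xs.mean(), ys.mean()
```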
[0060] The eye proxy position manager 640 is configured to generate, using an eye proxy camera, a position of the image of the target (eye proxy position data 642) while displayed within the display of the HMD. In some implementations, the eye proxy position data 642 represents an image of the center of the circle surrounding a marker.
[0061] The verification metric manager 650 is configured to generate a verification metric (verification metric data 652) based on the eye proxy position data 642. A verification metric is a metric that defines a degree of error in the end-to-end pipeline. One example of a verification metric is the pixel-to-world alignment error as defined previously. Another example of a verification metric is a jitter as defined previously. In some implementations, the verification metric is a pixel-to-world alignment error and is based on a temporal signal that is formed from a difference between the image position and a baseline image position of the target as the HMD is moved along a path within the linear rail station. In some implementations, the verification metric is a jitter value that is derived from the temporal signal by applying a bandpass filter to the temporal signal.
[0062] It is noted that a baseline image position is an image position that is obtained in conjunction with an end-to-end pipeline that is functioning without defect. For example, in the see-through image case, the see-through image is aligned with the image in the display when the end-to-end pipeline is operating without defect. Accordingly, the baseline image position in the see-through image case is the position of the see-through image.

[0063] The components (e.g., modules, processing units 624) of the processing circuitry 620 can be configured to operate based on one or more platforms (e.g., one or more similar or different platforms) that can include one or more types of hardware, software, firmware, operating systems, runtime libraries, and/or so forth. In some implementations, the components of the processing circuitry 620 can be configured to operate within a cluster of devices (e.g., a server farm). In such an implementation, the functionality and processing of the components of the processing circuitry 620 can be distributed to several devices of the cluster of devices.
[0064] The components of the processing circuitry 620 can be, or can include, any type of hardware and/or software configured to process attributes. In some implementations, one or more portions of the components shown in the components of the processing circuitry 620 in FIG. 6 can be, or can include, a hardware-based module (e.g., a digital signal processor (DSP), a field programmable gate array (FPGA), a memory), a firmware module, and/or a software-based module (e.g., a module of computer code, a set of computer-readable instructions that can be executed at a computer). For example, in some implementations, one or more portions of the components of the processing circuitry 620 can be, or can include, a software module configured for execution by at least one processor (not shown). In some implementations, the functionality of the components can be included in different modules and/or different components than those shown in FIG. 6, including combining functionality illustrated as two components into a single component.
[0065] Although not shown, in some implementations, the components of the processing circuitry 620 (or portions thereof) can be configured to operate within, for example, a data center (e.g., a cloud computing environment), a computer system, one or more server/host devices, and/or so forth. In some implementations, the components of the processing circuitry 620 (or portions thereof) can be configured to operate within a network. Thus, the components of the processing circuitry 620 (or portions thereof) can be configured to function within various types of network environments that can include one or more devices and/or one or more server devices. For example, the network can be, or can include, a local area network (LAN), a wide area network (WAN), and/or so forth. The network can be, or can include, a wireless network and/or a wireless network implemented using, for example, gateway devices, bridges, switches, and/or so forth. The network can include one or more segments and/or can have portions based on various protocols such as Internet Protocol (IP) and/or a proprietary protocol. The network can include at least a portion of the Internet.
[0066] In some implementations, the memory 626 can be any type of memory such as a random-access memory, a disk drive memory, flash memory, and/or so forth. In some implementations, the memory 626 can be implemented as more than one memory component (e.g., more than one RAM component or disk drive memory) associated with the components of the processing circuitry 620. In some implementations, the memory 626 can be a database memory. In some implementations, the memory 626 can be, or can include, a non-local memory. For example, the memory 626 can be, or can include, a memory shared by multiple devices (not shown). In some implementations, the memory 626 can be associated with a server device (not shown) within a network and configured to serve the components of the processing circuitry 620. As illustrated in FIG. 6, the memory 626 is configured to store various data, including target image data 632, eye proxy position data 642, and verification metric data 652.
[0067] FIG. 7 is a flow chart depicting an example method 700 of performing end-to-end verification of an HMD according to the above-described improved techniques. The method 700 may be performed by software constructs described in connection with FIG. 6, which reside in memory 626 of the processing circuitry 620 and are run by the set of processing units 624.
[0068] At 702, the target image manager 630 captures an image of a target using a world-facing camera of a head-mounted display (HMD) as the HMD is moved along a path, the image of the target corresponding to a location along the path and being displayed in a display of the HMD.
[0069] At 704, the eye proxy position manager 640 generates, using an eye proxy camera, a position of the image of the target while displayed within the display of the HMD.
[0070] At 706, the verification metric manager 650 generates a verification metric based on the position of the image of the target.
[0071] Clause 1. A method, comprising: capturing an image of a target using a world-facing camera of a head-mounted display (HMD) as the HMD is moved along a path, the image of the target corresponding to a location of the HMD along the path and being displayed in a display of the HMD; determining, using an eye proxy camera, a position of the image of the target while displayed within the display of the HMD; and generating a verification metric based on the position of the image of the target.
[0072] Clause 2. The method as in clause 1, wherein generating the verification metric includes: generating a temporal signal based on a difference between the position of the image and a baseline image position of the target on the HMD; and generating a jitter value for the HMD by applying a bandpass filter to the temporal signal.
[0073] Clause 3. The method as in clause 2, wherein generating the temporal signal includes: determining the baseline image position of the target in the HMD as a location of a see-through image; and forming, as the temporal signal, a norm of the difference between the position of the image and the baseline image position.
[0074] Clause 4. The method as in clause 3, wherein the target includes a marker surrounded by a circle, and wherein the position of the image is located at an image of a center of the circle in the display of the HMD.
[0075] Clause 5. The method as in clause 4, wherein the marker includes an ALVAR marker.
[0076] Clause 6. The method as in any of clauses 2 to 5, wherein applying the bandpass filter to the temporal signal includes: generating a frequency spectrum of the temporal signal by computing a Fourier transform of the temporal signal; setting amplitudes of the frequency spectrum to zero outside of a specified frequency range to produce a filtered frequency spectrum; and computing an inverse Fourier transform of the filtered frequency spectrum to produce a filtered temporal signal.
[0077] Clause 7. The method as in clause 6, wherein the specified frequency range is about 1 Hz to about 30 Hz.
[0078] Clause 8. The method as in any of clauses 6 or 7, wherein the jitter value is determined by computing a 95th percentile value of the filtered temporal signal.
[0079] Clause 9. The method as in any of clauses 1 to 8, wherein the HMD is translated along a linear rail of a linear rail station.
[0080] Clause 10. A computer program product comprising a nontransitory storage medium, the computer program product including code that, when executed by processing circuitry, causes the processing circuitry to perform the method as in clause 1.

[0081] Clause 11. A system, comprising: a rail station configured to guide a head-mounted display (HMD) along a path; a jig configured to mount the HMD in the rail station; an eye proxy camera configured to simulate an eye by forming an image of a display of the HMD; and processing circuitry configured to: capture an image of a target using a world-facing camera of the HMD as the HMD is moved along the path, the image of the target corresponding to a location of the HMD along the path and being displayed in a display of the HMD; determine, using the eye proxy camera, a position of the image of the target while displayed within the display of the HMD; and generate a verification metric based on the position of the image of the target.
[0082] Clause 12. The system as in clause 11, wherein the processing circuitry configured to generate the verification metric is further configured to: generate a temporal signal based on a difference between the position of the image and a baseline image position of the target; and generate a jitter value for the head-mounted display by applying a bandpass filter to the temporal signal.
[0083] Clause 13. The system as in clause 12, wherein the processing circuitry configured to generate the temporal signal is further configured to: determine the baseline image position of the target as a location of a see-through image; and form, as the temporal signal, a norm of the difference between the position of the image and the baseline image position.
[0084] Clause 14. The system as in clause 13, wherein the target includes a marker surrounded by a circle, and wherein the position of the image is located at an image of a center of the circle in the display of the HMD.
[0085] Clause 15. The system as in clause 14, wherein the marker includes an ALVAR marker.
[0086] Clause 16. The system as in any of clauses 12 to 15, wherein the processing circuitry configured to apply the bandpass filter to the temporal signal is further configured to: generate a frequency spectrum of the temporal signal by computing a Fourier transform of the temporal signal; set amplitudes of the frequency spectrum to zero outside of a specified frequency range to produce a filtered frequency spectrum; and compute an inverse Fourier transform of the filtered frequency spectrum to produce a filtered temporal signal.
[0087] Clause 17. The system as in clause 16, wherein the specified frequency range is about 1 Hz to about 30 Hz.

[0088] Clause 18. The system as in any of clauses 16 or 17, wherein the jitter value is determined by computing a 95th percentile value of the filtered temporal signal.
[0089] Clause 19. The system as in any of clauses 11 to 18, wherein the HMD is translated along a linear rail of a linear rail station.
[0090] Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
[0091] These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "non-transitory machine-readable medium" and "non-transitory computer-readable medium" refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
[0092] To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.

[0093] The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), and the Internet.
[0094] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
[0095] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the specification.
[0096] It will also be understood that when an element is referred to as being on, connected to, electrically connected to, coupled to, or electrically coupled to another element, it may be directly on, connected or coupled to the other element, or one or more intervening elements may be present. In contrast, when an element is referred to as being directly on, directly connected to or directly coupled to another element, there are no intervening elements present. Although the terms directly on, directly connected to, or directly coupled to may not be used throughout the detailed description, elements that are shown as being directly on, directly connected or directly coupled can be referred to as such. The claims of the application may be amended to recite example relationships described in the specification or shown in the figures.
[0097] While certain features of the described implementations have been illustrated as described herein, many modifications, substitutions, changes and equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the scope of the implementations. It should be understood that they have been presented by way of example only, not limitation, and various changes in form and details may be made. Any portion of the apparatus and/or methods described herein may be combined in any combination, except mutually exclusive combinations. The implementations described herein can include various combinations and/or sub-combinations of the functions, components and/or features of the different implementations described.
[0098] In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.

Claims

WHAT IS CLAIMED IS:
1. A method, comprising: capturing an image of a target using a world-facing camera of a headmounted display (HMD) as the HMD is moved along a path, the image of the target corresponding to a location of the HMD along the path and being displayed in a display of the HMD; determining, using an eye proxy camera, a position of the image of the target while displayed within the display of the HMD; and generating a verification metric based on the position of the image of the target.
2. The method as in claim 1, wherein generating the verification metric includes: generating a temporal signal based on a difference between the position of the image and a baseline image position of the target on the HMD; and generating a jitter value for the HMD by applying a bandpass filter to the temporal signal.
3. The method as in claim 2, wherein generating the temporal signal includes: determining the baseline image position of the target in the HMD as a location of a see-through image; and forming, as the temporal signal, a norm of the difference between the position of the image and the baseline image position.
4. The method as in claim 3, wherein the target includes a marker surrounded by a circle, and wherein the position of the image is located at an image of a center of the circle in the display of the HMD.
5. The method as in claim 4, wherein the marker includes an ALVAR marker.

6. The method as in claim 2, wherein applying the bandpass filter to the temporal signal includes: generating a frequency spectrum of the temporal signal by computing a Fourier transform of the temporal signal; setting amplitudes of the frequency spectrum to zero outside of a specified frequency range to produce a filtered frequency spectrum; and computing an inverse Fourier transform of the filtered frequency spectrum to produce a filtered temporal signal.

7. The method as in claim 6, wherein the specified frequency range is about 1 Hz to about 30 Hz.

8. The method as in claim 6, wherein the jitter value is determined by computing a 95th percentile value of the filtered temporal signal.

9. The method as in claim 1, wherein the HMD is translated along a linear rail of a linear rail station.

10. A computer program product comprising a nontransitory storage medium, the computer program product including code that, when executed by processing circuitry, causes the processing circuitry to perform the method as in claim 1.

11. A system, comprising: a rail station configured to guide a head-mounted display (HMD) along a path; a jig configured to mount the HMD in the rail station; an eye proxy camera configured to simulate an eye by forming an image of a display of the HMD; and processing circuitry configured to: capture an image of a target using a world-facing camera of the HMD as the HMD is moved along the path, the image of the target corresponding to a location of the HMD along the path and being displayed in a display of the HMD; determine, using the eye proxy camera, a position of the image of the target while displayed within the display of the HMD; and generate a verification metric based on the position of the image of the target.
12. The system as in claim 11, wherein the processing circuitry configured to generate the verification metric is further configured to: generate a temporal signal based on a difference between the position of the image and a baseline image position of the target; and generate a jitter value for the head-mounted display by applying a bandpass filter to the temporal signal.

13. The system as in claim 12, wherein the processing circuitry configured to generate the temporal signal is further configured to: determine the baseline image position of the target as a location of a see-through image; and form, as the temporal signal, a norm of the difference between the position of the image and the baseline image position.

14. The system as in claim 13, wherein the target includes a marker surrounded by a circle, and wherein the position of the image is located at an image of a center of the circle in the display of the HMD.

15. The system as in claim 14, wherein the marker includes an ALVAR marker.

16. The system as in claim 12, wherein the processing circuitry configured to apply the bandpass filter to the temporal signal is further configured to: generate a frequency spectrum of the temporal signal by computing a Fourier transform of the temporal signal; set amplitudes of the frequency spectrum to zero outside of a specified frequency range to produce a filtered frequency spectrum; and compute an inverse Fourier transform of the filtered frequency spectrum to produce a filtered temporal signal.

17. The system as in claim 16, wherein the specified frequency range is about 1 Hz to about 30 Hz.

18. The system as in claim 16, wherein the jitter value is determined by computing a 95th percentile value of the filtered temporal signal.

19. The system as in claim 11, wherein the HMD is translated along a linear rail of a linear rail station.
PCT/US2023/081989 2022-12-05 2023-12-01 Verification of head-mounted display performance WO2024123603A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263386129P 2022-12-05 2022-12-05
US63/386,129 2022-12-05

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10277893B1 (en) * 2017-06-22 2019-04-30 Facebook Technologies, Llc Characterization of optical distortion in a head mounted display
US20190137334A1 (en) * 2016-04-28 2019-05-09 Iix Inc. Unevenness evaluation method and unevenness evaluation apparatus
CN110320007A (en) * 2019-06-21 2019-10-11 上海翊视皓瞳信息科技有限公司 Intelligent vision dresses product allomeric function detection system and method
CN113138066A (en) * 2020-01-16 2021-07-20 舜宇光学(浙江)研究院有限公司 External distortion detection method, system and platform thereof and electronic equipment
