WO2015030623A1 - Methods and systems for locating substantially planar surfaces of 3D scene - Google Patents

Info

Publication number: WO2015030623A1
Authority: WIPO (PCT)
Application number: PCT/RU2013/000761
Other languages: French (fr)
Inventor: Evgeny Sergeevich SOLOGUB
Original assignee: 3Divi Company
Application filed by 3Divi Company
Priority to PCT/RU2013/000761
Publication of WO2015030623A1

Classifications

    • G06T7/00 Image analysis
    • G06T7/12 Edge-based segmentation
    • G06T17/10 Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20164 Salient point detection; Corner detection
    • H04N13/271 Image signal generators wherein the generated image signals comprise depth maps or disparity maps

Abstract

The technology described herein allows for locating room surfaces, such as a floor, walls, and a ceiling, based on depth maps obtained by a depth-sensing device. The room surfaces can be located through a multi-step process including, for example, the steps of obtaining a depth map, selecting characteristic pixels from a plurality of pixels pertaining to the depth map, calculating 3D coordinates with respect to the selected characteristic pixels, determining clusters of the 3D coordinates, generating a plurality of candidate planes based at least in part on the clusters, selecting planes associated with the floor, walls, and ceiling from the plurality of candidate planes, and determining 3D coordinates of the selected planes. The 3D coordinates or related data of the located room surfaces can be utilized in virtual reality simulation or rendering of 3D images.

Description

METHODS AND SYSTEMS FOR LOCATING SUBSTANTIALLY PLANAR SURFACES OF 3D SCENE
TECHNICAL FIELD
[0001] This disclosure relates generally to human-computer interfaces involving depth sensing and, more particularly, to technology for locating substantially planar surfaces of a three-dimensional (3D) scene, such as room walls, floor, and ceiling, and for determining coordinates of these surfaces based on processing of depth maps.
DESCRIPTION OF RELATED ART
[0002] The approaches described in this section could be pursued, but are not necessarily approaches that have previously been conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
[0003] Technologies associated with human-computer interaction have evolved significantly over the last several decades. Today, various electronic devices, such as computers, game consoles, and smart phones, are controlled through a wide range of input devices and associated interfaces. Keyboards, keypads, pointing devices, joysticks, remote controllers, and touchscreens are just some examples of input devices that can be used to interact with electronic devices. One of the rapidly growing technologies in the field of human-computer interaction is gesture recognition, which enables users to interact with electronic devices naturally, using body language rather than mechanical devices. Gesture recognition technology implies that users make inputs or generate commands using gestures or motions of hands, arms, fingers, legs, head, and so forth. For example, using the concept of gesture recognition, it is possible to point a finger at a computer screen and cause the cursor to move accordingly.
[0004] Gesture recognition technology is based on the use of depth maps generated by a 3D camera or a depth-sensing device (or simply a depth sensor). The depth maps may be processed and interpreted by a control system, such as a computer or a game console, to generate various commands based on identification of user gestures or motions. Gesture recognition technology is successfully used in gaming software applications and virtual reality applications. In either case, depth maps may be processed to generate a user avatar and translate user motions or gestures into motions and gestures of the avatar being displayed on a display screen.
[0005] In conventional systems, the 3D space within which the user is present is neither analyzed nor interpreted based on depth maps. Hence, the user motions are not always reliably and accurately translated into motions of the user avatar. For example, when a user walks within the 3D environment (e.g., a room), the user avatar repeats the user motions, but the walking speed or step distance of the user avatar may be far from the real user speed or real step distance. Further, in traditional systems, the user height may not be interpreted correctly based on the depth maps, and thus the visual interpretation of the user avatar standing on a virtual floor or ground may not be proper and accurate. In view of at least the foregoing drawbacks, there is still a need in the art for improvements of gesture recognition technology.
SUMMARY
[0006] This summary is provided to introduce a selection of concepts in a simplified form that are further described in the Detailed Description below. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0007] The present disclosure refers to methods and systems allowing for accurately locating substantially planar surfaces of a 3D space, including a floor (ground), walls, and a ceiling of a room (building, premises). The present methods and systems also allow for determining corresponding coordinates or related data associated with the floor, walls, and ceiling based on processing and analysis of depth maps. The coordinates of located planar surfaces can be further used in virtual reality simulation or computer games, including rendering 3D images or 3D video. The present technology overcomes, or helps in overcoming, drawbacks of prior art systems associated with inaccurate visualization of virtual reality having an avatar. For example, accurately determining coordinates of a floor of the 3D scene within which the user is present helps in more accurately translating user motions or gestures into motions and gestures of a user avatar. Some other applications of the present technology may include interior design software, augmented reality applications, and so forth.
[0008] According to various embodiments of the present disclosure, provided are a system and method for locating at least one substantially planar surface of 3D space. The system may include a depth-sensing device configured to obtain depth maps in real time. The system may further include a control system implemented as a computing device operatively coupled to the depth-sensing device and configured to process and analyze the depth maps. In certain embodiments, the depth-sensing device and the control system may be integrated into a single unit. In yet other
embodiments, the depth-sensing device and/or the control system may be a part of a game console or a virtual reality simulator. Further, there may be provided multiple depth-sensing devices linked to a single control system.
[0009] In operation, according to various embodiments of the present disclosure, the control system (e.g., a processor or computing means) receives at least one depth map from the depth-sensing device. The depth map is associated with the 3D space, within which one or more users and/or various objects may be present. The 3D space may relate to a room, a part of a building, premises, or an otherwise limited space domain. The depth map includes a plurality of pixels, and each of the pixels includes two-dimensional (2D) coordinates and a depth value for each given 2D coordinate. For example, the 2D coordinates may include a first coordinate associated with an abscissa axis value (i.e., X) and a second coordinate associated with an ordinate axis value (i.e., Y). The pixel represented as {X, Y, Z} would then include a depth value related to the x and y coordinates.
[0010] The control system further analyzes the depth map and selects "characteristic" pixels from the plurality of pixels. The characteristic pixels are characteristic for at least one substantially planar surface of the 3D space. The substantially planar surface may refer to a side wall, rear wall, floor, ceiling, or related surfaces. The selection of characteristic pixels may involve selecting groups of pixels that have a substantially common first coordinate (X), while for the second coordinate (Y) the following rule is applied: with the increase of the second coordinate values (Y), the associated depth value (Z) also increases. In another example, the selecting of characteristic pixels may involve selecting those groups of pixels that have a substantially common first coordinate (X), while for the second coordinate (Y) the following rule is applied: with the increase of the second coordinate values (Y), the associated depth value (Z) decreases. In yet another example, the selecting of characteristic pixels may involve selecting those groups of pixels that have a substantially common second coordinate (Y), while for the first coordinate (X) the following rule is applied: with the increase of the first coordinate values (X), the associated depth value (Z) increases, or vice versa. In yet another example, the selecting of characteristic pixels may involve selecting those groups of pixels that simply have a substantially common depth value (Z).
[0011] Further, the control system may calculate 3D coordinates with respect to the selected characteristic pixels. The calculation may be based on a number of parameters including, for example, a pixel dimension, focal length, coordinates of optical axis related to the depth-sensing device, and so forth.
[0012] The control system may then analyze all 3D coordinates and determine (locate, calculate) clusters of the 3D coordinates. The clusters may refer to groups of 3D coordinates that are similar to each other or, in other words, lie within a predetermined proximity of each other. For example, for a certain 3D coordinate of a given cluster, all other 3D coordinates of the same cluster lie within a predetermined range. Accordingly, the clusters may be characterized by an elemental area.
[0013] Further, the control system may generate (calculate) a plurality of candidate planes based at least in part on the clusters. The candidate planes may be calculated by (i) selecting a candidate pixel per cluster, and (ii) running, for example, Random Sample Consensus (RANSAC) process with respect to the candidate pixels for all clusters.
[0014] The control system may then select one or more planes associated with the substantially planar surfaces of the 3D space from among the plurality of candidate planes. In an example embodiment, the lowest candidate plane may be selected. The lowest candidate plane means that its second coordinate (Y) has the smallest (or highest) value among all candidate planes. In other embodiments, the lowest candidate plane means that its first coordinate (X) has the smallest or highest value among all candidate planes. Further, the control system may determine 3D coordinates of the selected plane(s) associated with the substantially planar surfaces of the 3D space. In other words, the control system may determine coordinates of depth map pixels related to the selected candidate plane(s). Said 3D coordinates of the selected plane(s) may then be used in rendering images or video involving a 3D virtual reality of the 3D space.
[0015] Thus, the present technology provides multiple benefits including improved and more accurate virtual reality simulation as well as better gaming experience, which includes enhanced representation of 3D virtual reality and representation of user avatar. Other features, aspects, examples, and embodiments are described below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] Embodiments are illustrated by way of example, and not by limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
[0017] FIG. 1 shows an example 3D scene suitable for implementation of a real time human-computer interface employing the present technology for detecting at least one substantially planar surface of the 3D scene.
[0018] FIG. 2 shows exemplary coordinate systems that can be used by a control system.
[0019] FIG. 3 shows an example depth map associated with an example 3D scene.
[0020] FIG. 4 shows a coordinate system having coordinates of selected characteristic pixels according to an example embodiment.
[0021] FIG. 5 shows an example 3D coordinate system having clusters of 3D coordinates associated with selected characteristic pixels.
[0022] FIG. 6 shows an example 3D coordinate system and one of candidate planes created by RANSAC process.
[0023] FIG. 7 shows another example 3D scene suitable for implementation of the present technology for detecting at least one substantially planar surface of the 3D scene.
[0024] FIG. 8 shows the 3D scene of FIG. 7 and also various clusters of 3D coordinates associated with selected characteristic pixels.
[0025] FIG. 9 shows a high-level block diagram of an environment suitable for implementing methods for locating at least one substantially planar surface of a 3D scene.
[0026] FIG. 10 is a process flow diagram showing an example method for locating at least one substantially planar surface of a 3D scene.
[0027] FIG. 11 is a diagrammatic representation of an example machine in the form of a computer system within which a set of instructions for the machine to perform any one or more of the methodologies discussed herein is executed.
DETAILED DESCRIPTION
[0028] The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These example embodiments, which are also referred to herein as "examples," are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other
embodiments can be utilized, or structural, logical, and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is therefore not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents. In this document, the terms "a" and "an" are used, as is common in patent documents, to include one or more than one. In this document, the term "or" is used to refer to a nonexclusive "or," such that "A or B" includes "A but not B," "B but not A," and "A and B," unless otherwise indicated.
[0029] The techniques of the embodiments disclosed herein may be implemented using a variety of technologies. For example, the methods described herein may be implemented in software executing on a computer system or in hardware utilizing either a combination of microprocessors, controllers or other specially designed application-specific integrated circuits (ASICs), programmable logic devices, or various combinations thereof. In particular, the methods described herein may be implemented by a series of computer-executable instructions residing on a storage medium such as a disk drive, solid-state drive or on a computer-readable medium.
Introduction & Terminology
[0030] The embodiments described herein relate to computer-implemented methods and corresponding systems for determining and/or locating one or more substantially planar surfaces of a 3D environment, including at least one of the following: a floor, a wall, and a ceiling.
[0031] The term "depth-sensing device," as used herein, may refer to any suitable electronic device capable of generating depth maps of a 3D space. Some examples of depth-sensing devices include a depth sensor, depth-sensitive camera, 3D camera, video camera configured to process images to generate depth maps, and so forth. The depth-sensing device may also be characterized by an optical axis (e.g., a central axis of a lens related to the depth-sensing device, along which axis there is some degree of rotational symmetry). The depth-sensing device may also be characterized by a focal length. The depth-sensing device may also be characterized by the resolution of the generated depth maps or by pixel dimensions.
[0032] The term "depth map," as used herein, may refer to an image or image channel that contains information relating to the distance of the surfaces of scene objects from a viewpoint, i.e., from the depth-sensing device. This term is related to and may be analogous to depth buffer, Z-buffer, Z-buffering, Z-depth, and so forth. The "Z" in these latter terms relates to a convention that the central axis of view of the depth-sensing device is in the direction of the device's Z-axis (normal to the XY-plane), and may not relate to the absolute Z-axis of a scene. Accordingly, in this disclosure, the depth map may include a plurality of pixel values {X, Y, Z}, where X and Y represent values of two-dimensional (2D) coordinates associated with an XY orthogonal coordinate system coinciding with the image of the depth map. The value Z is associated with the depth value for the given XY coordinates.
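As an illustration only, a depth map of this kind could be held in memory roughly as in the following sketch; the 480x640 resolution and the millimeter depth range are assumptions made for the example, not values taken from this disclosure.

```python
import numpy as np

# A depth map held as an H x W array of depth values Z (here in millimeters);
# the column index plays the role of X and the row index the role of Y.
depth_map = np.random.uniform(500.0, 4000.0, size=(480, 640)).astype(np.float32)

# A single depth-map pixel {X, Y, Z} is then the triple (x, y, depth_map[y, x]).
x, y = 320, 240
pixel = (x, y, float(depth_map[y, x]))
print(pixel)
```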
[0033] The term "control system," as used herein, may refer to any suitable computing apparatus or system configured to process data, such as depth maps, user inputs, and so forth. Some examples of control system may include a desktop computer, laptop computer, tablet computer, gaming console, audio system, video system, cellular phone, smart phone, personal digital assistant, set-top box, television set, smart television system, in-vehicle computer, infotainment system, head-mounted display, head-coupled display, helmet-mounted display, wearable computer having a display (e.g., a head-mounted computer with a display), and so forth. In certain
embodiments, the control system may be incorporated into or operatively coupled to a game console, infotainment system, television device, and so forth. In yet more embodiments, the control system and the depth-sensing device may be integrated into a single device. In certain embodiments, the term "control system" may be simplified to, or interchangeably referred to as, "computing device," "processing means," or simply a "processor."
[0034] The term "substantially planar surfaces," as used herein, refers to at least a part of the following: a floor, a ground, a wall, a ceiling, or other surfaces pertaining to a limited space region, including a building, premises, room, and so forth.
[0035] The depth maps can be processed by the control system to determine or locate at least one substantially planar surface of the 3D space (scene) and/or to determine characteristics (e.g., coordinates) of the at least one substantially planar surface. The depth maps can also be processed by the control system to locate a user present within the 3D space and also the user's body parts, including the head and limbs. In certain embodiments, the control system may identify and interpret user motions and gestures. Further, the depth maps, when processed, may be used to generate a virtual skeleton of the user. The virtual skeleton may be used for creating a user avatar. The characteristics of the at least one substantially planar surface may be used in software for simulation of virtual reality, interior design, etc.
[0036] The term "virtual reality" may refer to a computer-simulated environment that can simulate physical presence in places in the real world, as well as in imaginary worlds. Most current virtual reality environments are primarily visual experiences, but some simulations may include additional sensory information, such as sound through speakers or headphones. Some advanced, haptic systems may also include tactile information, generally known as force feedback, in medical and gaming applications.
[0037] The term "avatar," as used herein, may refer to a visible
representation of a user's body in a virtual reality world. An avatar can resemble the user's physical body, or be entirely different, but typically it corresponds to the user's position, movement and gestures, allowing the user to see their own virtual body, as well as for other users to see and interact with them.
[0038] The term "coordinate system," as used herein, may refer to 2D coordinate system or 3D coordinate system, for example, a 2D or 3D Cartesian coordinate system.
Principles for Locating 3D Scene Floor
[0039] With reference now to the drawings, FIG. 1 shows an example 3D scene 100 (e.g., a room) suitable for implementation of a real time human-computer interface employing the present technology for locating a floor of the 3D scene. In particular, there is shown a control system 110 employing one or more depth-sensing devices and/or one or more video cameras configured to generate depth maps of at least a part of the scene 100. The control system 110 may implement the floor locating technology based on the depth maps as described herein. The detailed description of the control system 110 and its components is given below with reference to FIG. 9.
[0040] The control system 110 is secured to a wall such that various objects of the scene 100 and also a floor 120 may be in the field of view of the depth-sensing device(s) and/or video camera(s) pertaining to the control system 110. The scene objects may include, as shown in the figure, a chair 130, table 140, and user 150. It should be clear that these objects are merely examples and this disclosure shall not be limited to just these example scene objects. The present technology allows for accurately locating the floor 120 and its coordinates regardless of the fact that other objects, such as the objects 130-150, are present within the scene 100.
[0041] It should also be clear to those skilled in the art that the control system 110, and in particular its depth-sensing device(s), need not be disposed on the wall, but may be positioned on any other surface such as a table, secured to a TV display or game console, and so forth. In yet other embodiments, the control system 110, and in particular its depth-sensing device(s) and/or video camera(s), may be worn by the user 150 (e.g., in the form of a head-mounted computer or the like).
[0042] FIG. 2 shows a scene 200 and exemplary coordinate systems that can be used by the control system 110. More specifically, the control system 110 may utilize a 3D Cartesian coordinate system XYZ, whose origin may be, for example, at the center of the depth-sensing device or a specific lens.
The abscissa axis X and ordinate axis Y may be normal to the axis Z and coincide with the plane of the depth map images.
[0043] As shown in this figure, there is another 3D Cartesian coordinate system X'Y'Z' associated with the floor 120. In most applications, the coordinate system XYZ and the coordinate system X'Y'Z' are misaligned with respect to each other. For example, the axis Z may be inclined with respect to the axis Z' and form an angle 210. The present technology may provide reliable results when the angle 210 is in the range of 0° to 90°.
[0044] In operation, the control system 110 obtains depth maps bound to the coordinate system XYZ, which depth maps illustrate at least a part of the scene including at least a part of the floor 120. FIG. 3 shows an example depth map 300 associated with the example scene 100. The depth map 300 includes pixels related to the user 150, pixels 310 related to the floor 120, pixels 320 related to a wall, and so forth. The pixels 310 associated with the floor 120 are characterized by an increase of the depth value Z with an increase of the coordinate Y for a given and fixed coordinate X. Accordingly, in operation, the control system 110 processes the depth map(s), such as the depth map 300, and selects "characteristic" pixels which are characteristic for the floor 120. In other words, the characteristic pixels are groups of pixels, wherein each group of pixels includes one common coordinate value X, while the depth values Z change with the change of the coordinate value Y. In one example, for a given coordinate value X, the depth values Z may substantially increase with the increase of the coordinate value Y. In another example, for said given coordinate value X, the depth values Z may substantially decrease with the decrease of the coordinate value Y. In yet another example, for said given coordinate value X, the depth values Z may substantially decrease with the increase of the coordinate value Y, or vice versa.
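A minimal sketch of one way this column-wise selection rule could be applied to a depth map array follows; the strict monotonicity test and the zero-as-invalid convention are assumptions made for the example, not requirements stated in this disclosure.

```python
import numpy as np

def select_floor_characteristic_pixels(depth_map):
    """Select pixels whose depth Z increases as the image coordinate Y increases,
    scanning each column (i.e., each fixed X) of the depth map independently."""
    height, width = depth_map.shape
    selected = []  # (x, y, z) triples
    for x in range(width):
        column = depth_map[:, x]
        for y in range(1, height):
            z_prev, z = column[y - 1], column[y]
            if z > z_prev > 0:  # depth grows with Y; zero is treated as "no data"
                selected.append((x, int(y), float(z)))
    return selected
```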
[0045] Accordingly, pixels related to other objects, such as the user 150 or the wall, are not considered and not included into the group of characteristic pixels.
[0046] Further, the control system 110 may calculate 3D coordinates with respect to the selected characteristic pixels (i.e., the pixels 310). The 3D coordinates may be calculated based on transformations corresponding to the process of obtaining depth maps. For example, the 3D coordinates may be calculated using the following equations:
X = (x - o_x) * s_x * z / f_x
Y = (y - o_y) * s_y * z / f_y
Z = z
(Equation No. 1)
where x and y are the characteristic pixel coordinates as taken from the depth map, and z is the depth value for the given xy coordinates of the characteristic pixel. X, Y, Z are the "global" coordinates (see FIG. 2). The values s_x and s_y are the pixel dimensions, the values f_x and f_y are the focal distance, and the values o_x and o_y are the coordinates of the optical axis of the depth-sensing device.
[0047] FIG. 4 shows a coordinate system illustrating coordinates of selected characteristic pixels according to an example embodiment. More specifically, FIG. 4 illustrates the XYZ coordinate system discussed above and a plurality of 3D coordinates 410 pertaining to the selected characteristic pixels. As shown in the figure, there is a gap in the 3D coordinates at the location of the user 150, whose pixels were omitted when the characteristic pixels were selected.
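A brief sketch of this back-projection is given below. It follows the standard pinhole-camera relation consistent with the parameters named for Equation No. 1 (pixel dimensions s_x, s_y; focal distance f_x, f_y; optical-axis coordinates o_x, o_y); the numeric values in the usage line are purely illustrative assumptions.

```python
def pixel_to_3d(x, y, z, sx, sy, fx, fy, ox, oy):
    """Back-project a depth-map pixel (x, y) with depth value z into camera-space
    coordinates (X, Y, Z), per Equation No. 1 (pinhole model assumed)."""
    X = (x - ox) * sx * z / fx
    Y = (y - oy) * sy * z / fy
    return X, Y, z

# Illustrative call: the sensor parameters below are assumed, not taken from the patent.
print(pixel_to_3d(400, 300, 2500.0, sx=0.0093, sy=0.0093, fx=5.9, fy=5.9, ox=320, oy=240))
```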
[0048] Further, the control system 110 may analyze the 3D coordinates and then generate or determine clusters of the 3D coordinates 410. The clusters may be determined based on a wide range of principles. In the given example, the clusters may be created by the coordinates X and Y, and for each cluster the smallest Y coordinate may be selected. In other words, the clusters may be created by selecting those characteristic pixels that have similar 3D coordinates. The 3D coordinates of each cluster may be located within an area having a predetermined size. FIG. 5 shows an example 3D coordinate system 500 having clusters 510 of 3D coordinates associated with the selected characteristic pixels 410. For each cluster 510, a candidate pixel 520 may be calculated or determined. The candidate pixel 520 may represent a statistical average of all pixels from a given cluster 510, or it may be the pixel with the smallest Y coordinate. Those skilled in the art should understand that various methodologies can be applied for selecting the candidate pixels.
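One way the clustering step might look in code is sketched below, using a simple grid binning of the X and Y coordinates; the cell size and the smallest-Y candidate rule follow options mentioned in paragraph [0048], while the grid binning itself is an assumption made for illustration.

```python
def cluster_3d_points(points_3d, cell_size=250.0):
    """Group 3D points into clusters by binning X and Y into cells of a
    predetermined size, then pick one candidate point per cluster
    (here, the point with the smallest Y coordinate)."""
    clusters = {}
    for point in points_3d:  # each point is an (X, Y, Z) triple
        key = (int(point[0] // cell_size), int(point[1] // cell_size))
        clusters.setdefault(key, []).append(point)
    candidates = [min(members, key=lambda p: p[1]) for members in clusters.values()]
    return clusters, candidates
```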
[0049] In the following operation steps, the control system 110 may generate a plurality of candidate planes based on the clusters 510. More specifically, the plurality of candidate planes can be calculated based on the candidate pixels discussed above. In an example embodiment, the candidate planes may be calculated using the following equation:
Ax + By + Cz + D = 0 (Equation No. 2)
where x, y, z are the coordinates of a candidate pixel, and A, B, C, D are predetermined constants.
[0050] The plurality of candidate planes may then be determined using, for example, Random Sample Consensus (RANSAC). RANSAC is an iterative process for estimating parameters from a set of observed data. A basic assumption of this process is that the observed data consists of "inliers," i.e., data whose distribution can be explained by a certain model, though it may be subject to noise, and "outliers," which are data that do not fit said model. The outliers can come, for example, from extreme values of noise, from erroneous measurements, or from incorrect hypotheses about the interpretation of data. The RANSAC process also assumes that, given a set of inliers, there exists a procedure which can estimate the parameters of a model that optimally explains or fits this data. In the present technology, the RANSAC process may rely on the number of pixels associated with a given candidate plane and a predetermined deviation, which may result in determining a plurality of planes. Among this plurality of planes, candidate planes are selected according to a model which implies that the candidate planes have an area size greater than a predetermined threshold.
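A compact sketch of such a RANSAC plane estimation over candidate points is shown below; the iteration count, deviation threshold, and minimum-inlier criterion (a stand-in for the area-size threshold mentioned above) are illustrative assumptions.

```python
import numpy as np

def ransac_plane(points, iterations=200, deviation=30.0, min_inliers=50, seed=None):
    """Estimate a plane Ax + By + Cz + D = 0 from noisy 3D points using RANSAC.
    Returns ((A, B, C, D), inlier_mask) with (A, B, C) a unit normal, or None."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best, best_count = None, 0
    for _ in range(iterations):
        sample = pts[rng.choice(len(pts), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # degenerate (collinear) sample, draw again
        normal /= norm
        d = -normal.dot(sample[0])
        inliers = np.abs(pts @ normal + d) < deviation  # distance of each point to the plane
        count = int(inliers.sum())
        if count >= min_inliers and count > best_count:
            best, best_count = ((normal[0], normal[1], normal[2], d), inliers), count
    return best
```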
[0051] It should be clear to those skilled in the art that not only the RANSAC process, but also other statistical algorithms, approximation algorithms, heuristic mathematical algorithms, machine learning algorithms, and so forth, may be used to determine the plurality of candidate planes.
[0052] FIG. 6 shows an example 3D coordinate system 600 and one of the candidate planes 610 created by the use of the RANSAC process. As shown in the figure, there are pixels 520-A, which have 3D coordinates above the candidate plane 610, pixels 520-B, which have 3D coordinates below the candidate plane 610, and pixels 520-C, whose 3D coordinates substantially fit the candidate plane 610.
[0053] Further, the control system 110 may analyze all of the candidate planes and select one of them as the most characteristic and representative of the floor 120. In an example, the control system 110 may select the lowest of the candidate planes. The lowest candidate plane means that its ordinate axis value Y is the lowest among all candidate planes. In another example, an average or median plane of all candidate planes may be selected. In yet other examples, other criteria or rules may be developed for selecting one of the candidate planes.
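As a rough illustration of the "lowest plane" rule, assuming each candidate is stored together with its inlier points (an assumption about the data layout, not a requirement of this disclosure):

```python
def select_floor_plane(candidate_planes):
    """candidate_planes: list of (plane_coefficients, inlier_points) pairs.
    Pick the candidate whose inlier points sit lowest along the Y axis
    (smallest mean Y) -- one possible reading of the 'lowest plane' criterion."""
    def mean_y(candidate):
        _, inlier_points = candidate
        return sum(p[1] for p in inlier_points) / len(inlier_points)
    return min(candidate_planes, key=mean_y)
```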
[0054] Further, the control system 110 may optionally determine 3D coordinates of the selected candidate plane associated with the floor 120. As will be appreciated by those skilled in the art, this may be accomplished by a number of methods. However, perhaps the simplest one is the use of the following equation (which is derived from Equation No. 2):
-(A/C) * x - (B/C) * y - D/C = z
(Equation No. 3)
[0055] The control system 110 may simply apply {x, y, z} values taken from the depth map pixels to this equation to determine whether or not a given pixel of the depth map pertains to the selected candidate plane. In other words, all or a part of the depth map pixels may be analyzed using Equation No. 3 to find those that relate to the floor 120. The depth map pixels associated with the floor 120 according to these principles may be considered floor coordinates or floor data. This data can then be utilized in virtual reality simulation applications or gaming software applications. In particular, this data can be used when 3D images or 3D video are rendered for virtual reality simulation.
[0056] It should also be noted that the method described herein may determine floor data with a controllable discrepancy. However, the level of discrepancy is relatively low for the present technology, which allows for obtaining a reliable and accurate location of the floor on the depth maps.
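A small sketch of how Equation No. 3 could be used to test whether a back-projected depth-map pixel belongs to the selected plane; the tolerance value is an assumption, and a nonzero C coefficient is assumed.

```python
def is_on_plane(X, Y, Z, A, B, C, D, tolerance=30.0):
    """Test whether a 3D point lies near the plane Ax + By + Cz + D = 0 by comparing
    its depth with the depth predicted by Equation No. 3 (assumes C != 0)."""
    predicted_z = -(A / C) * X - (B / C) * Y - D / C
    return abs(Z - predicted_z) <= tolerance
```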
[0057] Also, in certain embodiments, the floor 120 may be detected on the depth maps only once, e.g., at the beginning of operation of the control system 110. Alternatively, in other embodiments, the floor 120 may be detected and floor data may be determined dynamically or repeatedly (e.g., in embodiments where the depth-sensing device moves a lot).
Principles for Locating 3D Scene Walls, Floor and Ceiling
[0058] The present technology is applicable not only to locating a floor, but also to locating other substantially planar surfaces of the 3D scene, including walls and a ceiling. With reference to FIG. 7, there is shown an example 3D scene 700 (substantially similar to what is shown in FIG. 1). The 3D scene 700 represents a room having a floor 120, right side wall 710 (with respect to an optical axis of the depth-sensing device of the control system 110), left side wall 720, rear wall 730, and a ceiling 740. Similarly to other embodiments, various objects may be present within the 3D scene, such as a user 150, a chair 130, and the like. The control system 110 is secured to a wall 750 such that at least a part of the floor 120, walls 710-730, and/or ceiling 740 is within the field of view of the depth-sensing device(s) and/or video camera(s) pertaining to the control system 110. The floor 120, walls 710-730, and/or ceiling 740 may have planar or substantially planar shapes.
[0059] Similarly to FIG. 2, the control system 110 may utilize a 3D
Cartesian coordinate system XYZ, whose origin may coincide with the center of the depth-sensing device or a specific lens. The abscissa axis X and ordinate axis Y may be normal to the axis Z and coincide with the plane of the depth map images. The axis Z may represent an optical axis of the depth-sensing device.
[0060] In operation, the control system 110 obtains depth maps bound to the coordinate system XYZ, which depth maps illustrate at least a part of the scene including at least a part of the floor 120, walls 710-730, and/or ceiling 740. The depth map pixels associated with the floor 120 are characterized by an increase of the depth value Z with an increase of the coordinate Y for a given and fixed coordinate X. The depth map pixels associated with the right wall 710 are characterized by an increase of the depth value Z with a decrease of the coordinate X for a given and fixed coordinate Y. The depth map pixels associated with the left wall 720 are characterized by an increase of the depth value Z with an increase of the coordinate X for a given and fixed coordinate Y. The depth map pixels associated with the rear wall 730 are characterized by a substantially similar depth value Z over a wide range of coordinates X and Y. The depth map pixels associated with the ceiling 740 are characterized by an increase of the depth value Z with a decrease of the coordinate Y for a given and fixed coordinate X.
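The sign rules of paragraph [0060] could be applied through local depth gradients, as in the sketch below; the gradient threshold and the label precedence (later assignments overwrite earlier ones) are assumptions made only for illustration.

```python
import numpy as np

def label_characteristic_pixels(depth_map, grad_threshold=1.0):
    """Label depth-map pixels as candidate floor / ceiling / left-wall / right-wall /
    rear-wall pixels based on the signs of the local depth gradients."""
    dz_dy, dz_dx = np.gradient(depth_map.astype(float))  # gradients along Y (rows) and X (cols)
    labels = np.full(depth_map.shape, "none", dtype=object)
    labels[dz_dy > grad_threshold] = "floor"        # Z grows as Y grows, X fixed
    labels[dz_dy < -grad_threshold] = "ceiling"     # Z grows as Y shrinks, X fixed
    labels[dz_dx < -grad_threshold] = "right_wall"  # Z grows as X shrinks, Y fixed
    labels[dz_dx > grad_threshold] = "left_wall"    # Z grows as X grows, Y fixed
    flat = (np.abs(dz_dx) <= grad_threshold) & (np.abs(dz_dy) <= grad_threshold)
    labels[flat] = "rear_wall"                      # Z roughly constant over X and Y
    return labels
```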
[0061] Similar to the operation described above, the control system 110 processes the depth map(s), such as the depth map 300, and selects
"characteristic" pixels which are characteristic for at least one of the floor 120, walls 710-730, and ceiling 740. The characteristic pixels can be grouped into pixel groups having one common property or feature as described above. The pixel groups can include a group of pixels related to the floor, a group of pixels related to the left wall, a group of pixels related to the right wall, a group of pixels related to the rear wall, and/or a group of pixels related to the ceiling. [0062] Further, the control system 110 may individually calculate 3D coordinates with respect to the selected characteristic pixels (i.e. the pixels 310) for each surface (floor, walls, ceiling). For example, the 3D coordinates may be calculated using Equation No.l (see above). The control system 110 may then analyze 3D coordinate and generate or determine clusters of the 3D coordinates. The clusters may be determined based on a wide range of principles. In the given example, the clusters may be created by coordinates X and Y and for the each cluster the smallest Y coordinate may be selected. In other words, the clusters may be created by selecting those characteristic pixels that have similar 3D coordinates. 3D coordinates of each cluster may be located within an area having a predetermined size. FIG. 8 shows multiple exemplary pixel clusters of the 3D scene 700. There can be found clusters 810 of characteristic pixels associated with the floor 120, clusters 820 of
characteristic pixels associated with the left wall 720, clusters 830 of characteristic pixels associated with the rear wall 730. Those skilled in the art will appreciate that there may be a greater or smaller number of pixel clusters.
[0063] The control system 110 may further generate a plurality of candidate planes based on the pixel clusters. To this end, Equation No. 2 can be utilized as described above. The plurality of candidate planes may be determined using, for example, the RANSAC process.
[0064] Further, the control system 110 may analyze all of the candidate planes and select those planes that are most characteristic or representative of the surfaces 120, 710-740. In other words, one candidate plane per surface 120, 710-740 can be selected. In an example, the control system 110 may select the lowest of the candidate planes for the floor 120 and the highest candidate plane for the ceiling 740. The lowest candidate plane means that its ordinate axis value Y is the lowest among all candidate planes. The highest candidate plane means that its ordinate axis value Y is the highest among all candidate planes. Similarly, for the walls 710 and 720, the rightmost and leftmost candidate planes can be selected, respectively. With respect to the rear wall 730, the furthest plane can be selected.
[0065] Further, the control system 110 may optionally determine 3D coordinates of the selected candidate planes. As described above, Equation No. 3 can be utilized to this end. The coordinates of the surfaces 120, 710-740 can then be utilized in virtual reality simulation applications, gaming software applications, graphic and design software, and so forth.
Control System
[0066] FIG. 9 shows a high-level block diagram of an environment 900 suitable for implementing methods for locating and determining coordinates (or related data) of at least one substantially planar surface of 3D scene(s). As shown in this figure, there is provided the control system 110, which may comprise one or more depth-sensing devices. In the shown example, the control system 110 may have a depth sensor 910 configured to dynamically capture depth maps. In various embodiments, the depth sensor 910 may include an infrared (IR) projector to generate modulated light, and an IR camera to capture 3D images of reflected modulated light. Alternatively, the depth sensor 910 may include two digital stereo cameras enabling it to generate depth maps. In yet additional embodiments, the depth sensor 910 may include time-of-flight sensors or integrated digital video cameras together with depth sensors.
[0067] In some example embodiments, the control system 110 may optionally include a color video camera 920 to capture a series of 2D images in addition to the 3D imagery already created by the depth sensor 910. The series of 2D images captured by the color video camera 920 may be used to facilitate identification of the user, of user gestures or motions, of user emotions, of the floor 120, and so forth. In yet more embodiments, only the color video camera 920, and not the depth sensor 910, can be used to generate depth maps. It should also be noted that the depth sensor 910 and the color video camera 920 can be either stand-alone devices or encased within a single housing together with the remaining components of the control system 110.
[0068] Furthermore, the control system 110 may also comprise a computing unit 930, such as a processor or a Central Processing Unit (CPU), for processing and analyzing at least the depth maps as described herein. The computing unit 930 may also generate virtual reality, i.e. render 3D images of virtual reality simulation which images can be shown to the user 150 via a display device. In certain embodiments, the computing unit 930 may run game software.
[0069] In certain embodiments, the control system 110 may optionally include at least one motion sensor 940 such as a movement detector, accelerometer, gyroscope, magnetometer, or the like. The motion sensor 940 may determine whether or not the control system 110, and more specifically the depth sensor 910, is moved or differently oriented with respect to the 3D scene or its surfaces. If it is determined that the control system 110 or its elements have been moved, said at least one substantially planar surface may be detected again. In certain embodiments, when the depth sensor 910 and/or the color video camera 920 are separate devices not present in a single housing with other elements of the control system 110, the depth sensor 910 and/or the color video camera
920 may include internal motion sensor(s) 940.
[0070] The control system 110 also includes a communication module 950 configured to communicate with one or more optional peripheral electronic devices 960 using wireless or wired interface. These electronic devices 960 may refer, for example, to computers (e.g., laptop computers, tablet computers), displays, audio systems, video systems, television systems, set- top boxes, gaming consoles, game pads, entertainment systems, home appliances, and so forth.
[0071] The control system 110 may also include a bus 970 interconnecting the depth sensor 910, optional color video camera 920, computing unit 930, optional motion sensor 940, and communication module 950. Those skilled in the art will understand that the control system 110 may include other modules or elements, such as a power module, user interface, housing, control key pad, memory, etc., but these modules and elements are not shown so as not to burden the description of the present technology.
[0072] The communication between the control system 110 (i.e., via the communication module 950) and the optional electronic devices 960 can be performed via a network 980. The network 980 can be a wireless or wired network, or a combination thereof. Some examples of the network 980 include the Internet, local intranet, PAN (Personal Area Network), LAN (Local Area Network), WAN (Wide Area Network), MAN (Metropolitan Area Network), virtual private network (VPN), storage area network (SAN), frame relay connection, Advanced Intelligent Network (AIN) connection, synchronous optical network (SONET) connection, digital T1, T3, E1 or E3 line, Digital Data Service (DDS) connection, DSL (Digital Subscriber Line) connection, Ethernet connection, ISDN (Integrated Services Digital Network) line, ATM (Asynchronous Transfer Mode) connection, FDDI (Fiber
Distributed Data Interface) or CDDI (Copper Distributed Data Interface) connection. Furthermore, the network 980 may also include links to any of a variety of wireless networks including WAP (Wireless Application Protocol), GPRS (General Packet Radio Service), GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access), TDMA (Time Division Multiple Access), cellular phone networks, CDPD (cellular digital packet data), RIM (Research in Motion, Limited) duplex paging network, Bluetooth radio, or an IEEE 802.11-based radio frequency network.
Example of Operation
[0073] FIG. 10 is a process flow diagram showing an example method 1000 for locating at least one substantially planar surface 120, 710-740 within a 3D space. The method 1000 may be performed by processing logic that may comprise hardware (e.g., dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine), or a combination of both. The method 1000 can be performed by the units/devices discussed above with reference to FIG. 9. Each of these units or devices may comprise processing logic. It will be appreciated by one of ordinary skill in the art that examples of the foregoing units/devices may be virtual, and instructions said to be executed by a unit/device may in fact be retrieved and executed by a processor. The foregoing units/devices may also include memory cards, servers, and/or computer discs. Although various modules may be configured to perform some or all of the various steps described herein, fewer or more units may be provided and still fall within the scope of example embodiments.
[0074] As shown in FIG. 10, the method 1000 may commence at operation 1010 with receiving, by a processor of the control system 110, at least one depth map of the 3D scene having at least one substantially planar surface. The depth maps may be created by the depth sensor 910 and/or video camera 920. As was discussed above, the depth map includes a plurality of pixels, and each of the pixels includes 2D coordinates and a depth value (Z). Said 2D coordinates include a first coordinate (e.g., X) and a second coordinate (e.g., Y).
[0075] At operation 1020, the processor of the control system 110 selects characteristic pixels from the plurality of pixels. The characteristic pixels are characteristic for said at least one substantially planar surface. The selection may be based on locating specific groups of pixels which, for a given first coordinate (X), have increasing (or decreasing) depth values (Z) with the increase (or decrease) of the second coordinate (Y), or vice versa. The selection may also be based on locating pixels having substantially similar depth values (Z) regardless of the other coordinates.
[0076] At operation 1030, the processor of the control system 110 calculates 3D coordinates with respect to the selected characteristic pixels. The calculation may be based, for example, on Equation No. 1 as discussed above and rely on such parameters as pixel dimensions, focal length, position of optical axis, etc.
[0077] At operation 1040, the processor of the control system 110 determines clusters of the 3D coordinates. The clusters may be related to
"clouds" of pixels being located within a predetermined virtual area.
Accordingly, all or most of the pixels related to the at least one substantially planar surface are grouped in clusters based on a predetermined rule.
[0078] At operation 1050, the processor of the control system 110 generates a plurality of candidate planes based at least in part on the clusters. The candidate planes may be generated (calculated) based, for example, on
RANSAC iterative process with respect to specific candidate pixels taken from each cluster.
[0079] At operation 1060, the processor of the control system 110 selects a plane associated with the floor 120 of the 3D space from among the plurality of candidate planes. In an example, the lowest (or highest, rearmost, leftmost, or rightmost) candidate plane may be selected.
[0080] At operation 1070, the processor of the control system 110 may optionally determine 3D coordinates of the selected plane associated with the at least one substantially planar surface 120, 710-740. More specifically, the pixels of the depth map that relate to the selected plane, and thus to the floor 120, one of the walls 710-730, and/or the ceiling 740, can be determined. The parameters of these pixels may then be considered as coordinates or related data. This data can then be used in virtual reality simulation and rendering of 3D images.
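Tying operations 1010 through 1070 together, a hypothetical driver could chain the helper sketches given earlier in this description (select_floor_characteristic_pixels, pixel_to_3d, cluster_3d_points, ransac_plane, select_floor_plane); it assumes those earlier sketches are available, and the sensor parameters and thresholds remain illustrative assumptions.

```python
def locate_floor(depth_map, sx, sy, fx, fy, ox, oy):
    """Sketch of the FIG. 10 flow for the floor case, reusing the helpers sketched
    earlier in this document; returns the selected plane coefficients or None."""
    # Operations 1010-1020: receive the depth map and select characteristic pixels.
    characteristic = select_floor_characteristic_pixels(depth_map)
    # Operation 1030: back-project the characteristic pixels into 3D coordinates.
    points_3d = [pixel_to_3d(x, y, z, sx, sy, fx, fy, ox, oy) for x, y, z in characteristic]
    # Operation 1040: group the 3D coordinates into clusters and pick candidate points.
    _, candidates = cluster_3d_points(points_3d)
    # Operation 1050: fit a candidate plane to the cluster candidates with RANSAC.
    result = ransac_plane(candidates)
    if result is None:
        return None
    plane, inlier_mask = result
    inlier_points = [p for p, keep in zip(candidates, inlier_mask) if keep]
    # Operations 1060-1070: only one candidate plane was fitted in this sketch, so it is
    # selected directly; with several candidates, select_floor_plane would pick the lowest.
    return select_floor_plane([(plane, inlier_points)])[0]
```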
Example of Computing Device
[0081] FIG. 11 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system 1100, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. In example embodiments, the machine operates as a standalone device, or can be connected (e.g., networked) to other machines. In a networked deployment, the machine can operate in the capacity of a server, a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a desktop computer, laptop computer, tablet computer, cellular telephone, portable music player, web appliance, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that separately or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
[0082] The example computer system 1100 includes one or more processors 1102 (e.g., a central processing unit (CPU), graphics processing unit (GPU), or both), main memory 1104, and static memory 1106, which communicate with each other via a bus 1108. The computer system 1100 can further include a video display unit 1110 (e.g., a liquid crystal display). The computer system 1100 also includes at least one input device 1112, such as an alphanumeric input device (e.g., a keyboard), cursor control device (e.g., a mouse), microphone, digital camera, video camera, and so forth. The computer system 1100 also includes a disk drive unit 1114, signal generation device 1116 (e.g., a speaker), and network interface device 1118.
[0083] The disk drive unit 1114 includes a computer-readable medium 1120 that stores one or more sets of instructions and data structures (e.g., instructions 1122) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 1122 can also reside, completely or at least partially, within the main memory 1104 and/or within the processors 1102 during execution by the computer system 1100. The main memory 1104 and the processors 1102 also constitute machine-readable media. The instructions 1122 can further be transmitted or received over the network 1124 via the network interface device 1118 utilizing any one of a number of well-known transfer protocols (e.g., Hyper Text Transfer Protocol (HTTP), CAN, Serial, and Modbus).
[0084] While the computer-readable medium 1120 is shown in an example embodiment to be a single medium, the term "computer-readable medium" should be understood to include either a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "computer-readable medium" shall also be understood to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine, and that causes the machine to perform any one or more of the methodologies of the present application. The "computer-readable medium" may also be capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term "computer-readable medium" shall accordingly be understood to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media may also include, without limitation, hard disks, flash memory cards, digital video disks, random access memory (RAM), read only memory (ROM), and the like.
[0085] The example embodiments described herein may be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions may be written in a computer programming language or may be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions may be executed on a variety of hardware platforms and for interfaces associated with a variety of operating systems. Although not limited thereto, computer software programs for implementing the present method may be written in any number of suitable programming languages such as, for example, C, C++, C#, .NET, Java, JavaScript, or Python, as well as with any other compilers, assemblers, interpreters, or other computer languages or platforms.
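For illustration only, the processing pipeline described above (characteristic-pixel selection, back-projection to 3D coordinates, clustering, candidate-plane fitting, and plane selection) might be sketched in Python roughly as follows. The sketch assumes a pinhole depth camera with focal lengths (fx, fy) and an optical center (cx, cy); the column-wise pixel-selection heuristic, the fixed-size grid clustering, and all names and parameter values are illustrative assumptions rather than the claimed implementation.

```python
# Illustrative sketch only -- not the claimed implementation. It assumes a
# pinhole depth camera, replaces the claimed characteristic-pixel selection
# with a simple "bottom-most valid pixel per column" heuristic, and clusters
# with a fixed-size grid. All names and parameters are hypothetical.
import numpy as np


def backproject(u, v, z, fx, fy, cx, cy):
    """Convert a pixel (u, v) with depth z into 3D camera-space coordinates."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])


def locate_candidate_plane(depth_map, fx, fy, cx, cy,
                           cell=0.25, iterations=200, tolerance=0.02):
    _, w = depth_map.shape

    # 1. Select characteristic pixels: here, the bottom-most valid depth
    #    reading in each image column, a crude stand-in for floor pixels.
    points = []
    for u in range(w):
        valid = np.flatnonzero(depth_map[:, u] > 0)
        if valid.size == 0:
            continue
        v = valid[-1]
        points.append(backproject(u, v, depth_map[v, u], fx, fy, cx, cy))
    points = np.asarray(points)

    # 2. Cluster the 3D points into grid cells of a predetermined size and
    #    keep one candidate point per cluster.
    keys = np.floor(points / cell).astype(int)
    _, first_index = np.unique(keys, axis=0, return_index=True)
    candidates = points[first_index]
    if len(candidates) < 3:
        return None

    # 3. Fit candidate planes from random triples of candidate points
    #    (a basic RANSAC loop) and keep the best-supported plane.
    rng = np.random.default_rng(0)
    best_plane, best_support = None, 0
    for _ in range(iterations):
        a, b, c = candidates[rng.choice(len(candidates), 3, replace=False)]
        normal = np.cross(b - a, c - a)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        normal /= norm
        offset = -np.dot(normal, a)
        support = int(np.sum(np.abs(candidates @ normal + offset) < tolerance))
        if support > best_support:
            best_plane, best_support = (normal, offset), support
    return best_plane            # (unit normal, plane offset d)
```

A fuller version would retain every sufficiently supported candidate plane and then choose among them, for example taking the lowest plane as the floor and the highest as the ceiling, as described above.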
Conclusion
[0086] Thus, methods and systems for locating a floor, walls, and ceiling of a 3D scene based on depth maps have been described. Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

1. A method for locating one or more surfaces of a three-dimensional (3D) space, the method comprising:
receiving, by a processor, at least one depth map associated with the 3D space, wherein the at least one depth map includes a plurality of pixels, each of the pixels includes two-dimensional (2D) coordinates and a depth value, wherein the 2D coordinates include a first coordinate and a second coordinate;
selecting, by the processor, characteristic pixels from the plurality of pixels, wherein the characteristic pixels are characteristic for at least one substantially planar surface of the 3D space;
calculating, by the processor, 3D coordinates with respect to the selected characteristic pixels;
determining, by the processor, clusters of the 3D coordinates;
determining, by the processor, a plurality of candidate planes based at least in part on the clusters; and
selecting, by the processor, at least one plane associated with the at least one substantially planar surface of the 3D space among the plurality of candidate planes.
2. The method of claim 1, further comprising determining, by the processor, coordinates of the at least one selected plane associated with the at least one substantially planar surface of the 3D space.
3. The method of claim 1, further comprising determining, by the processor, pixels of the plurality of pixels that correspond to the at least one selected plane associated with the at least one substantially planar surface of the 3D space.
4. The method of claim 1, wherein the selecting of the characteristic pixels includes selecting groups of pixels, wherein each group of pixels includes one common first coordinate, while the depth values substantially increase with the increase of the second coordinate within the group.
5. The method of claim 1, wherein the selecting of the characteristic pixels includes selecting groups of pixels, wherein each group of pixels includes one common first coordinate, while the depth values substantially decrease with the increase of the second coordinate within the group.
6. The method of claim 1, wherein the selecting of the characteristic pixels includes selecting groups of pixels, wherein each group of pixels includes pixels having substantially one and the same depth value.
7. The method of claim 1, wherein the calculating of the 3D coordinates with respect to the selected pixels is based at least in part on a dimension of the pixels.
8. The method of claim 1, wherein the calculating of the 3D coordinates with respect to the selected pixels is based at least in part on a focal length of a depth camera, which captured the at least one depth map.
9. The method of claim 1, wherein the calculating of the 3D coordinates with respect to the selected pixels is based at least in part on coordinates of an optical axis of a depth camera, which captured the at least one depth map.
10. The method of claim 1, wherein the determining of the clusters includes selecting sets of the characteristic pixels having similar 3D coordinates.
11. The method of claim 1, wherein each cluster has a predetermined area size.
12. The method of claim 1, wherein the determining of the plurality of candidate planes further includes selecting, by the processor, one candidate pixel per cluster.
13. The method of claim 12, further comprising calculating, by the processor, the plurality of candidate planes based at least in part on the candidate pixels.
14. The method of claim 12, wherein the calculating of the plurality of candidate planes is based at least in part on a Random Sample Consensus (RANSAC) process.
15. The method of claim 1, wherein the selecting of the at least one plane associated with the at least one substantially planar surface of the 3D space among the plurality of candidate planes includes selecting the lowest candidate plane.
16. The method of claim 15, wherein the lowest candidate plane has the smallest coordinate of an ordinate axis with respect to the candidate pixels.
17. The method of claim 15, wherein the lowest candidate plane has the highest coordinate of an ordinate axis with respect to the candidate pixels.
18. The method of claim 15, wherein the lowest candidate plane has the smallest coordinate of an abscissa axis with respect to the candidate pixels.
19. The method of claim 1, wherein the selecting of the at least one plane associated with the at least one substantially planar surface of the 3D space among the plurality of candidate planes includes selecting the highest candidate plane.
20. The method of claim 19, wherein the highest candidate plane has the greatest coordinate of an ordinate axis with respect to the candidate pixels.
21. The method of claim 19, wherein the highest candidate plane has the greatest coordinate of an abscissa axis with respect to the candidate pixels.
22. The method of claim 1, further comprising transmitting, by the processor, data related to the plane associated with the at least one substantially planar surface of the 3D space to a remote device for facilitating simulation of a virtual reality.
23. The method of claim 1, further comprising transmitting, by the processor, data related to the at least one plane associated with the at least one substantially planar surface of the 3D space to a remote device for facilitating rendering of 3D still images or 3D video.
24. The method of claim 1, wherein the at least one substantially planar surface of the 3D space includes a floor.
25. The method of claim 1, wherein the at least one substantially planar surface of the 3D space includes a side wall, which extends substantially along an optical axis of a depth camera, which captured the at least one depth map.
26. The method of claim 1, wherein the at least one substantially planar surface of the 3D space includes a rear wall, which is substantially perpendicular to an optical axis of a depth camera, which captured the at least one depth map.
27. The method of claim 1, wherein the at least one substantially planar surface of the 3D space includes a ceiling.
28. A method for locating room surfaces of a three-dimensional (3D) space, the method comprising:
receiving, by a processor, at least one depth map associated with the 3D space, wherein the at least one depth map includes a plurality of pixels, each of the pixels includes two-dimensional (2D) coordinates and a depth value, wherein the 2D coordinates include a first coordinate and a second coordinate;
selecting, by the processor, characteristic pixels from the plurality of pixels, wherein the characteristic pixels are characteristic for at least two walls, a floor and a ceiling of the 3D space;
calculating, by the processor, 3D coordinates with respect to the selected characteristic pixels;
determining, by the processor, clusters of the 3D coordinates;
determining, by the processor, a plurality of candidate planes based at least in part on the clusters; and
selecting, by the processor, planes associated with the at least two walls, a floor and a ceiling of the 3D space among the plurality of candidate planes.
29. A system for locating at least one room surface of a 3D space, the system comprising:
a depth-sensing device configured to obtain at least one depth map of the 3D space, wherein the at least one depth map includes a plurality of pixels, each of the pixels includes 2D coordinates and a depth value, wherein the 2D coordinates include a first coordinate and a second coordinate; and
a computing unit communicatively coupled to the depth-sensing device, wherein the computing unit is configured to:
select characteristic pixels from the plurality of pixels, wherein the characteristic pixels are characteristic for at least one substantially planar surface of the 3D space;
calculate 3D coordinates with respect to the selected characteristic pixels;
determine clusters of the 3D coordinates;
determine a plurality of candidate planes based at least in part on the clusters; and
select at least one plane associated with the at least one substantially planar surface of the 3D space among the plurality of candidate planes.
30. The system of claim 29, wherein the computing unit is further configured to determine pixels of the depth map, which relate to the selected at least one plane associated with the at least one substantially planar surface of the 3D space.
31. The system of claim 29, wherein the computing unit is further configured to render 3D images or 3D video based at least in part on the 3D coordinates.
32. A non-transitory processor-readable medium having instructions stored thereon, which, when executed by one or more processors, cause the one or more processors to implement a method for locating one or more surfaces of a three-dimensional (3D) space, the method comprising:
receiving, by a processor, at least one depth map associated with the 3D space, wherein the at least one depth map includes a plurality of pixels, each of the pixels includes two-dimensional (2D) coordinates and a depth value, wherein the 2D coordinates include a first coordinate and a second coordinate;
selecting, by the processor, characteristic pixels from the plurality of pixels, wherein the characteristic pixels are characteristic for at least one substantially planar surface of the 3D space;
calculating, by the processor, 3D coordinates with respect to the selected characteristic pixels;
determining, by the processor, clusters of the 3D coordinates;
determining, by the processor, a plurality of candidate planes based at least in part on the clusters; and
selecting, by the processor, at least one plane associated with the at least one substantially planar surface of the 3D space among the plurality of candidate planes.
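Purely for illustration, the plane-selection step recited in claims 15-21 can be sketched as follows; the helper and its data layout are hypothetical assumptions and are not taken from the application. It assumes each candidate plane carries the 3D points that support it and that the ordinate (Y) axis of camera space points upward, so the lowest candidate plane corresponds to the floor and the highest to the ceiling.

```python
# Hypothetical helper, continuing the earlier sketch: pick the lowest and
# highest candidate planes by the mean ordinate of their supporting points.
from typing import List, Tuple
import numpy as np

# (unit normal, offset d, Nx3 array of supporting candidate points)
Plane = Tuple[np.ndarray, float, np.ndarray]


def split_floor_and_ceiling(planes: List[Plane]) -> Tuple[Plane, Plane]:
    """Return (floor, ceiling): the lowest and highest candidate planes."""
    by_height = sorted(planes, key=lambda p: float(p[2][:, 1].mean()))
    return by_height[0], by_height[-1]
```

With an image-style ordinate axis that points downward, the comparison direction flips; the ordering used here is therefore an assumption of the sketch rather than a requirement of the claims.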
PCT/RU2013/000761 2013-09-02 2013-09-02 Methods and systems for locating substantially planar surfaces of 3d scene WO2015030623A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/RU2013/000761 WO2015030623A1 (en) 2013-09-02 2013-09-02 Methods and systems for locating substantially planar surfaces of 3d scene

Publications (1)

Publication Number Publication Date
WO2015030623A1 (en) 2015-03-05

Family

ID=52587026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2013/000761 WO2015030623A1 (en) 2013-09-02 2013-09-02 Methods and systems for locating substantially planar surfaces of 3d scene

Country Status (1)

Country Link
WO (1) WO2015030623A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4947347A (en) * 1987-09-18 1990-08-07 Kabushiki Kaisha Toshiba Depth map generating method and apparatus
JPH07174538A (en) * 1993-12-20 1995-07-14 Minolta Co Ltd Image input camera
US20100289817A1 (en) * 2007-09-25 2010-11-18 Metaio Gmbh Method and device for illustrating a virtual object in a real environment
US20110102550A1 (en) * 2008-04-02 2011-05-05 Eykona Technologies Ltd. 3d imaging system
US20130093852A1 (en) * 2011-10-12 2013-04-18 Board Of Trustees Of The University Of Arkansas Portable robotic device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102018219754A1 (en) * 2018-11-19 2020-05-20 BSH Hausgeräte GmbH Interaction device for controlling a household appliance
CN114666804A (en) * 2022-03-28 2022-06-24 北京四维图新科技股份有限公司 Method, device and equipment for selecting base station erection coordinates based on different environmental scenes
CN114666804B (en) * 2022-03-28 2023-06-23 北京四维图新科技股份有限公司 Method, device and equipment for selecting base station erection coordinates based on different environmental scenes

Similar Documents

Publication Publication Date Title
US10761612B2 (en) Gesture recognition techniques
US11237625B2 (en) Interaction engine for creating a realistic experience in virtual reality/augmented reality environments
US10960298B2 (en) Boolean/float controller and gesture recognition system
US20130010071A1 (en) Methods and systems for mapping pointing device on depth map
US9766713B2 (en) System and method for providing user interface tools
US20190018567A1 (en) Input device for vr/ar applications
EP3072033B1 (en) Motion control of a virtual environment
US20150070274A1 (en) Methods and systems for determining 6dof location and orientation of head-mounted display and associated user movements
JP7008730B2 (en) Shadow generation for image content inserted into an image
US20150187108A1 (en) Augmented reality content adapted to changes in real world space geometry
US20140009384A1 (en) Methods and systems for determining location of handheld device within 3d environment
EP2814000A1 (en) Image processing apparatus, image processing method, and program
CN108431734A (en) Touch feedback for non-touch surface interaction
CN112154405A (en) Three-dimensional push notification
CN115335894A (en) System and method for virtual and augmented reality
CN108776544A (en) Exchange method and device, storage medium, electronic equipment in augmented reality
Lee et al. Tunnelslice: Freehand subspace acquisition using an egocentric tunnel for wearable augmented reality
WO2015030623A1 (en) Methods and systems for locating substantially planar surfaces of 3d scene
CN117716322A (en) Augmented Reality (AR) pen/hand tracking
Coleca et al. Real-time skeleton tracking for embedded systems
US11948237B2 (en) System and method for mimicking user handwriting or other user input using an avatar
WO2013176574A1 (en) Methods and systems for mapping pointing device on depth map
CN117899456A (en) Display processing method, device, equipment and medium of two-dimensional assembly
CN115937284A (en) Image generation method, device, storage medium and program product
CN116934959A (en) Particle image generation method and device based on gesture recognition, electronic equipment and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13892510

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13892510

Country of ref document: EP

Kind code of ref document: A1