AU2011265572A1 - Structured light system for robust geometry acquisition - Google Patents

Structured light system for robust geometry acquisition

Info

Publication number
AU2011265572A1
Authority
AU
Australia
Prior art keywords
light sources
spatio
reference point
image sensor
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2011265572A
Inventor
David John Battle
Donald James Bone
David John Maunder
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2011265572A priority Critical patent/AU2011265572A1/en
Priority to PCT/AU2012/001587 priority patent/WO2013091016A1/en
Publication of AU2011265572A1 publication Critical patent/AU2011265572A1/en
Legal status: Abandoned

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01B: MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B 11/00: Measuring arrangements characterised by the use of optical techniques
    • G01B 11/24: Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • G01B 11/25: Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures by projecting a pattern, e.g. one or more lines, moiré fringes, on the object
    • G01B 11/2513: Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures by projecting a pattern, e.g. one or more lines, moiré fringes, on the object, with several lines being projected in more than one direction, e.g. grids, patterns
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00: Programme-controlled manipulators
    • B25J 9/16: Programme controls
    • B25J 9/1694: Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J 9/1697: Vision controlled systems
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 2219/00: Program-control systems
    • G05B 2219/30: Nc systems
    • G05B 2219/37: Measurements
    • G05B 2219/37131: Moire pattern, diffraction grating, fringe

Abstract

STRUCTURED LIGHT SYSTEM FOR ROBUST GEOMETRY ACQUISITION

Disclosed is a method of determining at least two coordinates (790) of a reference point (780) on an object in three-dimensional space in a scene captured by an image sensor. The object is irradiated simultaneously by a plurality of spatio-temporally modulated light sources (710, 720), at least one of the spatio-temporally modulated light sources being modulated at a different spatio-temporal frequency to another of the plurality. The method generates a composite phase signal (630, 640) on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space, and captures (910) the composite phase signal at the reference point with the image sensor. A processing arrangement determines, from the captured composite phase signal, a set of measured positioning parameters (measured carrier phases ψ) used in the determination of at least two coordinates of the reference point from the plurality of light sources, wherein the set of measured positioning parameters is determined independently of a position of the image sensor. The method then determines the at least two coordinates of the reference point using the set of measured positioning parameters from the plurality of light sources.

Description

S&F Ref: P019423

AUSTRALIA, PATENTS ACT 1990, COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan
Actual Inventor(s): David John Battle, David John Maunder, Donald James Bone
Address for Service: Spruson & Ferguson, St Martins Tower, Level 35, 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Structured light system for robust geometry acquisition

The following statement is a full description of this invention, including the best method of performing it known to me/us:

STRUCTURED LIGHT SYSTEM FOR ROBUST GEOMETRY ACQUISITION

TECHNICAL FIELD

The current invention relates generally to the photographic acquisition of detailed geometric information regarding a scene and, in particular, to the use of modulated light sources to make this information robust and independent of imaging system calibrations. Applications of the invention include metrology, robot part picking, reverse engineering and geometry-based post-processing of digital images.

BACKGROUND

Digital images represent projections of the three-dimensional world into two dimensions from the particular viewpoint of a camera. There are many situations, however, where it is desirable for a human, computer or mobile robot to possess additional information regarding a captured scene, such as the relative distances, or depths, of objects contained within the scene. The ability to record object distances in an image allows a photographer, for example, to selectively blur features in the background so as to enhance the salience of foreground objects. Computer security systems employing image analysis algorithms are greatly assisted in segmenting objects of interest where appropriate geometric information is available. Accurate knowledge of scene geometry is also important in the case of mobile robots, which may be required to negotiate and handle complex objects in the real world.

Several methods are known for acquiring depth information from scenes. Most of these belong to one of the three general categories of time of flight (TOF), depth from defocus (DFD), and triangulation methods. In TOF methods, light propagation is timed through projection, reflection and reception to directly measure object distances. In DFD methods, variations of blur size throughout the depth of field of a camera are used to gauge approximate ranges. Whereas these two methods involve considerable expense or complexity to achieve moderate accuracy and speed, triangulation methods, which relate lateral object displacements to depth through straightforward triangulation, are known to be fast, accurate and inexpensive.

Within the triangulation category of depth capture methods, there are both passive and active branches. Passive triangulation methods essentially amount to stereo vision, wherein cameras record disparities in feature locations between multiple viewpoints. Problems arise, however, when the scene itself lacks sufficient feature points to permit unambiguous triangulation, in which case the depth map becomes sparse and loses robustness. In active stereo methods, one camera is replaced by a light source projecting a specially designed, or structured, pattern of illumination. This approach has important advantages over passive stereo methods, because the projected patterns make up for any lack of features in the scene and also improve robustness against variations in ambient lighting.
Notwithstanding the considerable advances made in structured light technologies to date, there are still significant problems that limit the effectiveness of even the most advanced systems. Chief amongst these is a lack of robustness due to shadows and occlusion, which can be expected in any real-world scene. Loss of 3D information through shadows and occlusion comes as a direct result of a loss of dimensionality in the captured image. In conventional structured light systems, scenes lose dimensionality even before their images have been captured, on account of using a single direction of illumination. For surface orientations oblique to the illumination and/or the camera vectors, the reliability of reconstructed depths is also seriously compromised by shadowing and intensity spreading. Loss of information in captured geometries poses particular problems in robotics, where the analysis of object shapes determines how they are negotiated or manipulated.

While multi-projector and multi-camera systems have been put forward as a means of improving the performance of structured light systems, it has generally proved difficult to de-multiplex and fuse the resulting information to form coherent geometry estimates. Sequential pattern projection is common, but this makes systems slow, and still requires a data fusion step. Aside from these problems, projectors of the type used for precision structured lighting are usually cumbersome and expensive. Multiple-projector systems, therefore, tend to be bulky, with little chance of mobile deployment.

Another problem with existing structured light technology is the distortion inherent in camera and projector optics. On account of X and Y scene coordinates being inferred from sensor pixel positions, rather than actually being measured, the accuracy of triangulated Z coordinates is only as good as the distortion calibrations. This poses difficulties, for example, when strongly distorting fisheye lenses are used to obtain a wide field of view.

In summary, a system is desired that is capable of utilising multiple sources of illumination without being cumbersome or expensive. Such a system would be expected to scale well with the number of sources, implying that the sources themselves should be simple and inexpensive, with minimal communication and control requirements. Such an improved depth ranging system would also be independent of the kind of optical distortion that currently needs to be calibrated out of depth calculations, implying that the use of optical elements should be minimised and that depth calculations should not rely on implicit correspondences between pixel positions and scene coordinates. Lastly, an improved depth ranging system should be capable of acquiring information from multiple sources simultaneously, and of efficiently fusing the information into a coherent geometric description of the scene. Such a system would then constitute a geometry acquisition system, rather than simply a depth mapping system.

Simplifications in projector design, in which virtually no optics are employed, have recently been proposed in "Development of a 3D vision range sensor using equiphase light section method", Kumagai, M., Journal of Robotics and Mechatronics, vol. 17, no. 2, pp. 110-115, 2005. These simplifications involve projecting light through a rotating mask of sinusoidally varying transparency such that the resulting illumination is spatio-temporally modulated.
When successive video frames are processed, the phases of sinusoidal components in the intensity, calculated with respect to some angular datum, can be used to estimate angular displacements. By associating these measured angular displacements with calibrated angular displacements of pixels within the field of view of the camera, the depths of scene points can be triangulated for each camera-source pair. Unfortunately, this scheme is not ideal, in that the estimated scene depths remain strongly dependent on both the location of the camera and its inherent calibration parameters.

SUMMARY

According to one aspect of the present disclosure, there is provided a method of determining at least two coordinates of a reference point on an object in three-dimensional space in a scene captured by an image sensor, the object being irradiated simultaneously by a plurality of spatio-temporally modulated light sources, at least one of the plurality of spatio-temporally modulated light sources being modulated at a different spatio-temporal frequency to another one of said plurality of spatio-temporally modulated light sources, the method comprising the steps of: generating a composite phase signal on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space; capturing the composite phase signal at the reference point with the image sensor; determining from the captured composite phase signal a set of measured positioning parameters, used in the determination of at least two coordinates of the reference point, from the plurality of light sources, wherein the set of measured positioning parameters is determined independent of a position of the image sensor; and determining at least two coordinates at the reference point using the set of measured positioning parameters from the plurality of light sources.

Preferably, one of the at least two coordinates is a depth coordinate.

Desirably, each of the plurality of light sources is characterised by at least one known positioning parameter with respect to a reference line through said reference point.

Advantageously, the difference in spatio-temporal frequency of the at least one light source results from at least one of: (i) a different spatial frequency of a pattern on the at least one light source; (ii) a different rotation velocity of the at least one light source; and (iii) a different orientation of the at least one light source.

Desirably, each light source comprises multiple intersecting patterns to create a two-dimensional signal. Preferably, the patterns are orthogonal. In a specific implementation, each said light source may comprise a rotating pattern surrounding the light source.

Desirably, the composite phase signal forms a wavefront that is radial to the corresponding light source. Typically, the patterns are sinusoidal. Generally, the measured positioning parameters comprise an angular displacement from the light source.

Most typically, the object is in a three-dimensional space and the method determines the three-dimensional coordinates of the reference point in the three-dimensional space. Desirably, the positioning parameters are measured with respect to a reference line through each of the plurality of spatio-temporally modulated light sources, thereby being independent of a position of the image sensor.
According to another aspect of the present disclosure, there is provided a robotic system comprising: a robotic manipulator arranged for operation in association with an object in three-dimensional space; an image sensor arranged for imaging a scene formed at least by the object; a plurality of spatio-temporally modulated light sources configured to simultaneously illuminate the scene, at least one of said plurality of spatio-temporally modulated light sources being modulated at a different spatio-temporal frequency to another one of said plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space; and a computing device connected to the robotic manipulator, the image sensor and each of the light sources and configured to: generate a composite phase signal on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources; capture the composite phase signal at a reference point on the object with the image sensor; determine from the captured composite phase signal a set of measured positioning parameters, used in the determination of at least two coordinates of the reference point, from the plurality of light sources, wherein the set of measured positioning parameters is determined independent of a position of the image sensor; determine the at least two coordinates of the reference point using the set of measured positioning parameters from the plurality of light sources; and control a position of the robotic manipulator based on the determined coordinates of the reference point.

Preferably, the image sensor is mounted upon the robotic manipulator. Alternatively, the image sensor may be located at the reference point, where desirably the image sensor can be a photodiode.

Other aspects are also disclosed.

BRIEF DESCRIPTION OF THE DRAWINGS

At least one embodiment of the present invention will now be described with reference to the following drawings, in which:

Fig. 1 illustrates the intersections of iso-phase planes generated by a pair of cylindrical spatio-temporally modulated light sources;

Fig. 2 is a plan view of a pair of spatio-temporally modulated sources in relation to two objects in the scene and two possible camera viewpoints;

Figs. 3A and 3B illustrate a typical intensity signal projected by a spatio-temporally modulated light source comprising two superimposed sinusoidal carriers with distinct periods, and the appearance of the signal in the Fourier domain;

Figs. 4A and 4B illustrate a skew sinusoidal pattern possessing modulation in both horizontal (X) and vertical (Y) orientations, along with mappings of such a pattern into cylindrical and spherical geometries in one implementation according to the present disclosure;

Fig. 5 is a visualisation of an iso-phase surface of a spherically mapped skew sinusoidal mask and of how light rays emanating from the centre of the sphere are mapped to spatial coordinates with varying azimuth and elevation angles;

Figs. 6A and 6B illustrate the construction of a pattern possessing skew-orthogonal sinusoidal components with two distinct periods, along with mappings of such a pattern into cylindrical and spherical geometries in an implementation according to the present disclosure;
Fig. 7 illustrates the deployment of two spatio-temporally modulated light sources, each projecting multiple skewed sinusoidal patterns into a 3-D scene according to a preferred implementation;

Fig. 8 illustrates the typical convergence of the reconstruction algorithm when fusing data from multiple spatio-temporal light sources in another implementation;

Fig. 9 is a schematic block diagram illustrating the sequence of steps in processing captured frames from a video camera into estimates of scene geometry according to a preferred implementation;

Fig. 10 is a schematic illustration of a typical robotic part-picking application involving multiple spatio-temporally modulated light sources; and

Figs. 11A and 11B form a schematic block diagram of a general-purpose computer system upon which the arrangements described can be practised.

DETAILED DESCRIPTION INCLUDING BEST MODE

Fig. 10 illustrates a robotic system 1000 in which a manipulator 1060 controlled by a computer 1070 is tasked with handling various objects 1030. The system 1000, therefore, needs to know or otherwise estimate or determine precise spatial locations of the objects 1030, particularly in association with the manipulator 1060, whose location in the 3D space will be known. Also shown in Fig. 10 is a video camera 1050 operating as an image sensor for capturing images of the scene in which the objects 1030 are located at a high frame rate. Typically in such a robotic system 1000, the image sensor (camera) 1050 may be conveniently mounted to a peripheral limb of the manipulator 1060. The system 1000 also includes multiple light sources 1010, 1020 and 1040 configured at known locations around the periphery of the scene for substantially simultaneous irradiation of the scene. These light sources 1010, 1020 and 1040 illuminate the scene coincidently, but are spatio-temporally modulated on account of radiating intensities that are functions of both position and time. At least two such light sources are required according to the present disclosure.

The multiple light sources are preferably modulated at different carrier frequencies, as the resulting diversity of illumination can improve robustness to shadows and occlusions, as discussed above. The difference in frequency between any two light sources may arise from any one or a combination of: (i) a different spatial frequency (or corresponding wavelength) of a pattern of one of the light sources; (ii) a different rotation (or angular) velocity of one of the light sources; and (iii) a different orientation of one of the light sources (for example, one light source having a different axis of rotation to another).

The arrangements presently disclosed utilise L such spatio-temporally modulated sources (where L is generally greater than 1, and equal to 3 in the example of Fig. 10) to achieve robustness against occlusions and shadows, as well as partial or complete independence (depending on the specific implementation) of the calibration parameters of the camera 1050. Importantly, the scene geometry afforded by the objects 1030 is estimated in the coordinate frame of the sources 1010, 1020 and 1040, which is stationary.

Figs. 11A and 11B depict the computer system 1070, which may be implemented using a general-purpose computer, and upon which the various arrangements described can be practised.
As seen in Fig. 11A, the computer system 1070 includes: a computer module 1101; input devices such as a keyboard 1102, a mouse pointer device 1103, a scanner 1126, the camera 1050, and a microphone 1180; and output devices including the light sources 1010, 1020, 1040, a display device 1114 and loudspeakers 1117. An external Modulator-Demodulator (Modem) transceiver device 1116 may be used by the computer module 1101 for communicating to and from a communications network 1120 via a connection 1121. The communications network 1120 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 1121 is a telephone line, the modem 1116 may be a traditional "dial-up" modem. Alternatively, where the connection 1121 is a high-capacity (e.g., cable) connection, the modem 1116 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 1120.

The computer module 1101 typically includes at least one processor unit 1105 and a memory unit 1106. For example, the memory unit 1106 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 1101 also includes a number of input/output (I/O) interfaces including: an audio-video interface 1107 that couples to the video display 1114, loudspeakers 1117 and microphone 1180; an I/O interface 1113 that couples to the keyboard 1102, mouse 1103, scanner 1126, camera 1050 and optionally a joystick or other human interface device (not illustrated); and an interface 1108 for the external modem 1116 and light sources 1010, 1020 and 1040. In some implementations, the modem 1116 may be incorporated within the computer module 1101, for example within the interface 1108. The computer module 1101 also has a local interface 1111, which permits coupling of the computer system 1070 via a connection 1123 to the manipulator 1060.

The I/O interfaces 1108, 1111 and 1113 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 1109 are provided and typically include a hard disk drive (HDD) 1110. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 1112 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 1070.

The components 1105 to 1113 of the computer module 1101 typically communicate via an interconnected bus 1104 and in a manner that results in a conventional mode of operation of the computer system 1070 known to those in the relevant art. For example, the processor 1105 is coupled to the system bus 1104 using a connection 1118. Likewise, the memory 1106 and optical disk drive 1112 are coupled to the system bus 1104 by connections 1119. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or like computer systems.

The methods of coordinate determination may be implemented using the computer system 1070, wherein the processes of Figs. 1 to 10, to be described, may be implemented as one or more software application programs 1133 executable within the computer system 1070.
In particular, the steps of the methods of depth mapping and coordinate determination are effected by instructions 1131 (see Fig. 11B) in the software 1133 that are carried out within the computer system 1070. The software instructions 1131 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the depth and coordinate determination methods, and a second part and the corresponding code modules manage a user interface between the first part and the user.

The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 1070 from the computer readable medium, and then executed by the computer system 1070. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 1070 preferably effects an advantageous apparatus for coordinate determination, and associated depth determination.

The software 1133 is typically stored in the HDD 1110 or the memory 1106. The software is loaded into the computer system 1070 from a computer readable medium, and executed by the computer system 1070. Thus, for example, the software 1133 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 1125 that is read by the optical disk drive 1112. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 1070 preferably effects an apparatus for determining 3D coordinates and/or depth.

In some instances, the application programs 1133 may be supplied to the user encoded on one or more CD-ROMs 1125 and read via the corresponding drive 1112, or alternatively may be read by the user from the network 1120. Still further, the software can also be loaded into the computer system 1070 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 1070 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 1101. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 1101 include radio or infra-red transmission channels, as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

The second part of the application programs 1133 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 1114.
Through manipulation of typically the keyboard 1102 and the mouse 1103, a user of the computer system 1070 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilising speech prompts output via the loudspeakers 1117 and user voice commands input via the microphone 1180.

Fig. 11B is a detailed schematic block diagram of the processor 1105 and a "memory" 1134. The memory 1134 represents a logical aggregation of all the memory modules (including the HDD 1109 and semiconductor memory 1106) that can be accessed by the computer module 1101 in Fig. 11A.

When the computer module 1101 is initially powered up, a power-on self-test (POST) program 1150 executes. The POST program 1150 is typically stored in a ROM 1149 of the semiconductor memory 1106 of Fig. 11A. A hardware device such as the ROM 1149 storing software is sometimes referred to as firmware. The POST program 1150 examines hardware within the computer module 1101 to ensure proper functioning, and typically checks the processor 1105, the memory 1134 (1109, 1106), and a basic input-output systems software (BIOS) module 1151, also typically stored in the ROM 1149, for correct operation. Once the POST program 1150 has run successfully, the BIOS 1151 activates the hard disk drive 1110 of Fig. 11A. Activation of the hard disk drive 1110 causes a bootstrap loader program 1152 that is resident on the hard disk drive 1110 to execute via the processor 1105. This loads an operating system 1153 into the RAM memory 1106, upon which the operating system 1153 commences operation. The operating system 1153 is a system-level application, executable by the processor 1105, to fulfil various high-level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.

The operating system 1153 manages the memory 1134 (1109, 1106) to ensure that each process or application running on the computer module 1101 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 1070 of Fig. 11A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 1134 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 1070 and how such memory is used.

As shown in Fig. 11B, the processor 1105 includes a number of functional modules including a control unit 1139, an arithmetic logic unit (ALU) 1140, and a local or internal memory 1148, sometimes called a cache memory. The cache memory 1148 typically includes a number of storage registers 1144-1146 in a register section. One or more internal busses 1141 functionally interconnect these functional modules. The processor 1105 typically also has one or more interfaces 1142 for communicating with external devices via the system bus 1104, using a connection 1118. The memory 1134 is coupled to the bus 1104 using a connection 1119.

The application program 1133 includes a sequence of instructions 1131 that may include conditional branch and loop instructions.
The program 1133 may also include data 1132 which is used in execution of the program 1133. The instructions 1131 and the data 1132 are stored in memory locations 1128, 1129, 1130 and 1135, 1136, 1137, respectively. Depending upon the relative size of the instructions 1131 and the memory locations 1128-1130, a particular instruction may be stored in a single memory location, as depicted by the instruction shown in the memory location 1130. Alternatively, an instruction may be segmented into a number of parts, each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 1128 and 1129.

In general, the processor 1105 is given a set of instructions which are executed therein. The processor 1105 then waits for a subsequent input, to which the processor 1105 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 1102, 1103, data received from an external source across the communications network 1120, data retrieved from one of the storage devices 1106, 1109, or data retrieved from a storage medium 1125 inserted into the corresponding reader 1112, all depicted in Fig. 11A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 1134.

The disclosed depth and coordinate measurement arrangements use input variables 1154, which are stored in the memory 1134 in corresponding memory locations 1155, 1156, 1157. The arrangements produce output variables 1161, which are stored in the memory 1134 in corresponding memory locations 1162, 1163, 1164. Intermediate variables 1158 may be stored in memory locations 1159, 1160, 1166 and 1167.

Referring to the processor 1105 of Fig. 11B, the registers 1144, 1145, 1146, the arithmetic logic unit (ALU) 1140, and the control unit 1139 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 1133. Each fetch, decode, and execute cycle comprises: (a) a fetch operation, which fetches or reads an instruction 1131 from a memory location 1128, 1129, 1130; (b) a decode operation in which the control unit 1139 determines which instruction has been fetched; and (c) an execute operation in which the control unit 1139 and/or the ALU 1140 execute the instruction. Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 1139 stores or writes a value to a memory location 1132.

Each step or sub-process in the processes of Figs. 1 to 10 is associated with one or more segments of the program 1133, and is performed by the register section 1144-1146, the ALU 1140, and the control unit 1139 in the processor 1105 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 1133.

The methods or parts thereof may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub-functions of depth and coordinate mapping. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
Fig. 1 illustrates one arrangement in which L = 2 and two spatio-temporally modulated light sources 110 and 120 achieve camera independence in the X and Z coordinates. While still relying on camera calibration for coordinate estimates in the Y-axis, such a system is useful in 2-D applications.

Generally, spatial modulation of the l-th of L sources is achieved by virtue of a grey-scale mask whose transmittance is a linear sum of M superposed sinusoidal functions of the circumferential angle θ_l 170 in the X-Z plane as illustrated. In the arrangement depicted in Fig. 1 (L = 2, M = 1), the masks are mapped onto cylinders with horizontal circumferential frequencies of K_lm cycles around the circumference. Hence, the intensity radiated by the l-th source is given by

I_l ∝ Σ_{m=1}^{M} [1 + cos(K_lm θ_l)],

which is a linear sum of sinusoidal terms, each offset (in this instance by a value of one) to maintain overall positivity.

In its static condition, the pattern of light intensity radiated through each mask from a line or point source on an axis of the mask is proportional to the mask transmittance. It should be noted that, although the light source described would ordinarily radiate in all directions, with an attendant rapid loss of intensity with range, it is straightforward to constrain the angle of illumination to any desired value by using suitable internal reflectors. The central aspect of importance here is that there are no refracting optics in the light path of the source. Unlike conventional projectors, therefore, which use lenses and are thus limited to finite depths of field, the spatio-temporal light source described radiates an unfocussed, diverging field.

Generally, for a light source network comprising L spatio-temporal sources, the temporal component of modulation for the l-th source is achieved by rotating its cylindrical mask at a velocity of N_l revolutions per second, where the specific value of N_l is characteristic of the l-th source.

The far-field intensity of the l-th source then comprises a rotating sum of sinusoidal carriers whose phases can be directly related to the instantaneous mechanical angle θ_l 170 through which the cylinder has rotated with respect to the datum θ_0 180, which is common to all sources. The oscillation frequency of each carrier intensity is therefore determined by the horizontal circumferential frequency K_lm and the number of revolutions per second N_l, as follows:

f_lm = K_lm N_l (Hz).

Although, for simplicity, this and later implementations to be described involve physically rotating the mask to achieve the desired spatio-temporal modulation, similar results can be achieved using solid-state spatial light modulators, provided the far-field intensities remain sinusoidal with a linear phase proportional to the mechanical (azimuth) angle θ_l of the respective light source. Henceforth, phases of sinusoidal carrier intensities will be referred to in terms of the electrical angles ψ_lm, which are related to the mechanical (azimuth) angles θ_l by

ψ_lm = K_lm (θ_l + 2π N_l t_n) (radians),

where t_n is an instant in time. By virtue of sweeping sinusoidal light patterns at predictable frequencies and phases, the spatio-temporal sources 110 and 120 encode points in the scene such that their azimuth angles with respect to each source can be readily estimated. This is accomplished by demodulating the pixel time histories across successive frames captured by the camera 1050 and determining the phase(s) of each carrier.
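By way of illustration, this demodulation step can be sketched numerically. The following Python fragment is a minimal sketch only, not the implementation of the specification: the carrier frequencies (21 Hz and 27 Hz, matching the example of Fig. 3B below), the frame rate, the record length and the noise level are all assumed values. It synthesises the time history of a single pixel irradiated by two carriers and recovers each carrier's amplitude and wrapped phase by linear least squares, exploiting the fact that the carrier frequencies f_lm = K_lm N_l are known beforehand:

```python
import numpy as np

# Assumed carrier frequencies (Hz), f_lm = K_lm * N_l, one carrier per
# source (L = 2, M = 1), and an assumed camera frame rate; illustrative only.
f_carriers = [21.0, 27.0]
frame_rate = 240.0                 # frames per second (assumed)
n_frames = 128
t = np.arange(n_frames) / frame_rate

# Synthesise a pixel time history: DC background plus two carriers with
# (unknown) phases to be recovered, plus sensor noise.
true_phases = [0.7, -1.9]          # radians (ground truth for the demo)
signal = 2.0 + sum(np.cos(2 * np.pi * f * t + p)
                   for f, p in zip(f_carriers, true_phases))
signal += 0.05 * np.random.default_rng(0).standard_normal(n_frames)

# Algebraic (matrix) demodulation: since the frequencies are known, model
# the signal as DC plus cos/sin pairs and solve by linear least squares.
columns = [np.ones_like(t)]
for f in f_carriers:
    columns += [np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)]
A = np.stack(columns, axis=1)
coeffs, *_ = np.linalg.lstsq(A, signal, rcond=None)

# Convert each cos/sin pair to a carrier amplitude and wrapped phase.
for i, f in enumerate(f_carriers):
    c, s = coeffs[1 + 2 * i], coeffs[2 + 2 * i]
    amplitude = np.hypot(c, s)
    phase = np.arctan2(-s, c)      # cos(wt + p) = cos(p)cos(wt) - sin(p)sin(wt)
    print(f"carrier {f:5.1f} Hz: amplitude {amplitude:.3f}, phase {phase:+.3f} rad")
```

Because the design matrix columns are known exactly, this fit does not suffer the spectral leakage of a short-record FFT, which motivates the algebraic approach described next.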
De-multiplexing multiple carriers is relatively straightforward on account of different sources using different values of either K_lm or N_l, which enables a form of frequency-division multiplexing, as illustrated in Figs. 3A and 3B.

Fig. 2 is a plan view of the pair of spatio-temporal sources 210 and 220, such as those depicted in Fig. 1, in relation to objects forming a scene 230, which in this case is the same as that shown in Fig. 10. Fig. 2 shows coordinate axes 290 by which positioning parameters of the source locations 210 and 220 are known or determinable. With knowledge of the source locations 210 and 220, and of the angles θ_1 270 and θ_2 240 relative to the angular datum θ_0 280, this arrangement permits direct triangulation of scene points 250 in X and Y, being at least two coordinates in the 3D system, without regard to either the calibration parameters of a camera 295 or its location 260, 265, which is free to vary.

Fig. 3A illustrates what the temporal history, or time signal 310, of a single camera pixel might resemble when two carriers are present, according to the arrangement depicted in Figs. 1 and 2. Fig. 3B is the temporal spectrum of the time signal in Fig. 3A computed using a fast Fourier transform (FFT). Fig. 3B firstly shows the presence of the two sinusoidal signals 330 and 340 having different frequencies. In view of the spread of peaks at DC (0 Hz) and at 21 Hz and 27 Hz, Fig. 3B secondly shows the limitations of FFT techniques in estimating signal parameters from short time records. In particular, the FFT approach displays poor resolution of closely spaced frequencies in comparison to algebraic techniques, especially when the spectrum becomes more crowded with carriers. In view of this poorer resolution, together with interference from the background (DC) illumination 320 and mutual interference between the sinusoidal components 330 and 340, the preferred implementation uses an algebraic (matrix) approach to estimating carrier amplitudes and phases. Demodulating the camera frame data using an algebraic approach gives superior phase estimation performance on account of the frequencies being precisely known beforehand.

Though moving parts are generally undesirable in practical devices, the necessary frequency and phase stability for the above implementation may be readily achieved using a combination of low-drift oscillators and synchronous motors. Similarly, there are numerous means of achieving the necessary spatio-temporal synchronisation between the light sources 210, 220 and the camera 295, ranging from sophisticated solutions employing digital compasses and wireless communication, to simpler wired solutions applicable in industrial situations where the illumination arrangements rarely change.

Referring again to Fig. 1, iso-phase planes 130 and 140 are shown radiating from the spatio-temporal sources 110 and 120. On such planes, the phases of sinusoidal carriers radiated by a given source are constant, and thus the corresponding wavefront is radial. Due to the integer K_lm cycles around the circumference of each mask, there will also be K_lm such planes corresponding to any measured carrier phase ψ_lm. As a consequence of this phase-wrapping phenomenon, there will be ambiguities in associating phase measurements with the correct iso-phase planes, which are needed to triangulate scene coordinates. As illustrated in Fig. 1, a line of intersection 150 between the iso-phase planes 130 and 140 may be no more valid than the line 160.
Solutions to this problem involve selecting both fine and coarse periods for the M superposed sinusoids projected by each source such that the azimuth angles θ_l become unambiguous. When this is achieved, the correct iso-phase planes 150 and 160 are unambiguously identifiable, and the scene coordinates are localised to lie on intersections between the planes. As evident from Fig. 1, the elevations of points above and below the X-Z plane remain ambiguous. This means that the desirable goal of camera independence is not completely achieved by this first implementation with respect to the third dimension. This implementation still offers significant advantages for applications requiring independence in a two-dimensional plane, for example, a robotic arm applicator confined to a two-dimensional plane.

In another implementation, camera independence may be achieved by adding additional sources projecting rotating patterns surrounding the light sources in the vertical plane, and hence intersecting the line 150 to provide unambiguous localisation. Such an arrangement will not be discussed further on account of it being inferior to the next (and preferred) implementation, which utilises additional degrees of freedom available in the original pattern, thereby reducing the number of sources required to achieve true camera independence.

Fig. 4A illustrates a sinusoidal pattern 430 similar to that discussed in Fig. 1; however, this pattern is skewed, or tilted, with respect to the X and Y axes. By virtue of this skew, the pattern 430 can be considered to have spatial frequencies in both horizontal and vertical directions, the aim of which is to resolve the ambiguous elevations of the first implementation.

Fig. 4B shows a skew-sinusoidal pattern 430 mapped to both cylindrical 410 and spherical 420 geometries, where the condition of integral circumferential cycles is observed in each case. Further discussion will focus on the spherical geometry 420 on account of its advantages in providing a direct mapping between its vertical phase component and the elevation angle φ, as well as the sphere having surfaces normal to the light path, and thus being less likely to introduce unwanted refraction into the projected intensities.

In Fig. 5, a single spherical mask 510 is illustrated with respect to the global coordinate frame 550. Whereas the iso-phase surfaces for the first arrangement took the form of vertical planes, the iso-phase surfaces for this spherical skew pattern take the form of helical coils 520. Rays projected from the centre of the sphere 510 through the helix 520 intersect scene points p 560 possessing identical phase. As the spherical skew pattern rotates around its axis, in this case the Y-axis of the coordinate frame 550, the iso-phase surfaces 520 rotate with the pattern, encoding the angular displacements represented by the azimuth angle θ_l 540 and the elevation angle φ_l 530 into the projected intensities I_lm according to

I_lm ∝ 1 + cos(K_lm θ_l + V_lm φ_l),

where V_lm is the vertical equivalent of the horizontal circumferential frequency K_lm.

Given the two angles to be resolved and the single phase measurement available, θ_l and φ_l cannot be resolved using a single skew-sinusoidal pattern. For this reason, the preferred implementation comprises at least two sinusoidal patterns 610 and 620 skewed in opposite directions, as illustrated in Fig. 6A.
These patterns, corresponding to positive and negative values of V_lm, are summed to construct or create a composite pattern 630, being a two-dimensional signal. The pattern 630 is an example of a composite signal, being a composite of the signal patterns 610 and 620 as impinging upon the object. As seen in Fig. 6A, the patterns are orientated orthogonal to each other, thus creating a cross-hatched composite pattern. By arranging for the skew-sinusoidal components to have the distinct circumferential frequencies K_l1 and K_l2, their respective contributions to the intensity I_l can be easily separated in temporal frequency, as previously discussed in relation to demodulation. Once the sinusoidal carriers are separated and their unwrapped phases ψ_l1 and ψ_l2 have been determined throughout the scene, the associated azimuth and elevation angles θ_l and φ_l are determined for all points with respect to the l-th source 640 in Fig. 6B by solving the following system of equations:

[ψ_l1]   [K_l1  V_l1] [θ_l]
[ψ_l2] = [K_l2  V_l2] [φ_l]

As a consequence of using oppositely skewed sinusoidal components, Fig. 7 shows an example implementation 700 in a three-dimensional system 790 where respective iso-phase surfaces 740 and 750 wind around a known positioned spatio-temporal source 710 in opposite directions. The positioning of the light sources 710 and 720 provides for a predetermined geometric arrangement of the sources relative to the object or reference point 780 within a three-dimensional space. The intersections of these contra-rotating surfaces define a ray 760 which identifies the azimuth angle θ_1 and elevation angle φ_1 of points, such as a point 780 in the scene, with respect to the l-th source 710. The addition of a second source (l = 2) 720 at a known position in the space 790, with either different values of K_lm or simply different rotational speeds N_l, introduces additional rays 770, which intersect the first ray 760 to uniquely localise 3D points in the scene, particularly the point 780 as illustrated.

Complete camera independence, with respect to the third dimension, can thus be achieved by using two or more spatio-temporal light sources of the type described. The question of uniqueness for the calculated angles still arises, as the measured phases for the various carriers will generally be wrapped. This issue can be handled in the manner previously discussed, whereby sinusoidal patterns are augmented with additional components of differing periods to resolve the ambiguity. Wrapped phase means that a phase angle and the corresponding phase angle rotated by 2π radians are not disambiguated. As such, following from the patterns discussed with reference to Figs. 6A and 6B, as seen in Fig. 7, the point 780 is irradiated with a composite wrapped phase signal produced by the light sources 710, 720, with each light source being characterised by at least one known positioning parameter with respect to a reference line, such as 760, 770, through the reference point 780.
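To make the localisation of Fig. 7 concrete, the following sketch (illustrative Python only; the source positions, skew frequencies and scene point are assumed values, the measured phases are taken to be already unwrapped, and one possible spherical angle convention is chosen) solves the 2x2 system above for the azimuth and elevation angles seen from each of two sources, and then localises the scene point as the closest approach of the two resulting rays:

```python
import numpy as np

def angles_from_phases(psi, K, V):
    """Solve [psi1, psi2] = [[K1, V1], [K2, V2]] @ [theta, phi] for the
    azimuth theta and elevation phi (radians); phases assumed unwrapped."""
    A = np.array([[K[0], V[0]],
                  [K[1], V[1]]], dtype=float)
    theta, phi = np.linalg.solve(A, np.asarray(psi, dtype=float))
    return theta, phi

def ray_direction(theta, phi):
    """Unit direction for azimuth theta about the Y axis and elevation phi."""
    return np.array([np.cos(phi) * np.sin(theta),
                     np.sin(phi),
                     np.cos(phi) * np.cos(theta)])

def closest_point_between_rays(p1, d1, p2, d2):
    """Midpoint of the shortest segment between rays p1 + s*d1 and p2 + t*d2."""
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w = p1 - p2
    s = (b * (d2 @ w) - c * (d1 @ w)) / (a * c - b * b)
    t = (a * (d2 @ w) - b * (d1 @ w)) / (a * c - b * b)
    return 0.5 * ((p1 + s * d1) + (p2 + t * d2))

# Assumed example: two sources with known positions; each projects a
# contra-skewed pair of carriers (positive and negative V).
sources = [np.array([-1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])]
K, V = (16, 18), (16, -18)
target = np.array([0.3, 0.2, 2.0])   # ground truth, used only to fake phases

rays = []
for p in sources:
    dx, dy, dz = target - p
    theta = np.arctan2(dx, dz)                    # true azimuth of the target
    phi = np.arctan2(dy, np.hypot(dx, dz))        # true elevation
    psi = (K[0] * theta + V[0] * phi, K[1] * theta + V[1] * phi)
    th, ph = angles_from_phases(psi, K, V)        # recover angles from phases
    rays.append((p, ray_direction(th, ph)))

print(closest_point_between_rays(*rays[0], *rays[1]))   # ~ [0.3, 0.2, 2.0]
```

In the noise-free case the two rays intersect exactly at the scene point; with noisy phases the closest-approach midpoint gives a simple direct triangulation, which is refined by the weighted reconstruction described next.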
The remaining aspect of the present disclosure to be discussed is the reconstruction algorithm, which is responsible for forming the overall geometry estimate based on information from L distributed spatio-temporal sources. As discussed above, the angles θ_l and φ_l can, in principle, be determined directly for each source. This, however, ignores the influence of noise in the intensity measurements, and also the non-uniformity in data quality to be expected in real-world measurements. For reasons mentioned in the Background discussion, much of the raw data captured will be unreliable. Some sources will be shadowed in certain parts of the scene, while specular surfaces may reflect the light from other sources away from the camera. To reconstruct reliable estimates of the scene geometry, the complete data ensemble from the L distributed sources must be weighted and combined according to its reliability.

The remaining component of the present disclosure is a minimisation algorithm (e.g. Newton's algorithm) designed to reconstruct scene geometries such that modelled carrier phases match the measured phases in an overall least-squares sense. For a single sinusoidal intensity component I_lm and a known geometry comprising the coordinate triplets (x, y, z), the forward data model, mapping scene coordinates to estimated phases ψ̂_lm, takes the form

ψ̂_lm = K_lm atan((x − x_l)/(z − z_l)) + V_lm atan((y − y_l)/(z − z_l)),

where (x_l, y_l, z_l) is the location of the l-th source.

The cost function for the least-squares minimisation is calculated over the M sinusoidal carriers of each of the L spatio-temporal sources. The spectral intensities estimated in the demodulation step, exemplified in Fig. 3B by I_1 350 and I_2 360, are used to weight the respective phase estimates such that those associated with stronger signals take precedence over noisier ones. This is the underlying principle whereby geometric diversity in the positioning of multiple sources is able to improve the robustness of geometry estimates. The overall weighted cost function to be minimised is given by

χ² = Σ_{l=1}^{L} Σ_{m=1}^{M} I_lm [ψ_lm − K_lm atan((x − x_l)/(z − z_l)) − V_lm atan((y − y_l)/(z − z_l))]².

The cost function derivatives necessary for calculating the coordinate increments at each iteration of the minimisation algorithm (e.g. Newton's algorithm) include the vectors of first derivatives ∇χ² and the matrices of second derivatives ∇∇χ² for each scene point p, given respectively by

∇χ² = [∂χ²/∂x, ∂χ²/∂y, ∂χ²/∂z]ᵀ, where p = [x, y, z]ᵀ,

and

∇∇χ² = [ ∂²χ²/∂x²   ∂²χ²/∂x∂y  ∂²χ²/∂x∂z ]
       [ ∂²χ²/∂x∂y  ∂²χ²/∂y²   ∂²χ²/∂y∂z ]
       [ ∂²χ²/∂x∂z  ∂²χ²/∂y∂z  ∂²χ²/∂z²  ].

All components of ∇χ² and ∇∇χ² have straightforward analytic forms, which makes them simple to recalculate during each iteration. Given the non-linearity of the cost function associated with the arctangent functions, the step length in the error minimisation is scaled by a fraction γ so that piece-wise quadratic approximations to the cost function remain reasonably accurate. The refinement step for the scene coordinate estimation then takes the form

p̂ ← p̂ − γ [∇∇χ²]⁻¹ ∇χ²,

where the initial estimate p̂_0 can be constructed using direct triangulation, as practised in the prior art, without regard to camera calibration. Generally, the better the initial estimate of p, the fewer steps are required to reduce the squared error below the desired tolerance.

Fig. 8 is a typical plot of the convergence of the above reconstruction algorithm, in which it can be observed that the squared error 820 is monotonically reduced on each iteration 810. The minimum attainable residual error 830 is a function of the signal-to-noise ratio of the input image frames, as well as the accuracy of the source locations.

Fig. 9 summarises a data processing architecture 900 representative of a process of the preferred implementation of the geometry acquisition system described herein. At the commencement of the process 900, before geometry estimation can begin, a certain minimum number of frames are acquired at step 910 from the camera system of Fig. 10, for example, to permit the phases of sinusoidal illumination components to be estimated. The scene geometry is then initialised, as indicated by the dashed arrow connection 912, to a starting estimate at step 920. This starting estimate can take the form of a regular Cartesian grid of pixels having some uniform (or user-specified) default depth. Alternatively, on the assumption that the scene has not changed substantially from the previous estimation cycle (i.e. assuming small movements or a sufficiently high frame rate), the processing system can use the result of the previous calculation to initialise the current estimate. In yet another implementation, the carrier phase outputs 935 arising from the sinusoidal fit 930 can be used in conjunction with the camera data to triangulate the approximate coordinates of the camera pixels as an initial geometry estimate.

Following demodulation of the carrier phases 935 from the sinusoidal fitting performed at step 930, the measured carrier phases 935 are compared in step 990 with the modelled phases generated from the current geometry estimate determined in step 980, or step 920 in the first iteration. The comparison of step 990 calculates errors of the carrier phases with respect to the current geometry estimate. If the sum of squared errors is less than a prescribed threshold, being a convergence test performed at step 995, the reconstruction of the scene geometry halts, and the process 900 proceeds to acquire the next frame at step 999 for processing in the next cycle, as indicated by the dashed line 998. Where the error has not converged at step 995, the next stage of the processing, being the most numerically intensive, involves the calculation of derivatives of the cost function at step 950. In step 960, the derivatives are weighted according to the carrier amplitudes 940 found during the sinusoidal fitting step 930 and used to construct the Newton increment in step 970, a computation sketched below.
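A minimal sketch of this weighted refinement for a single scene point follows. This is illustrative Python only, not the specification's implementation: the source positions, carrier frequencies, weights and step fraction γ are assumed values, and the Gauss-Newton approximation to the Hessian ∇∇χ² is substituted for the full analytic second derivatives for brevity:

```python
import numpy as np

def model_phase(p, src, K, V):
    """Forward model: psi_hat = K*atan2(x-xl, z-zl) + V*atan2(y-yl, z-zl)."""
    dx, dy, dz = p - src
    return K * np.arctan2(dx, dz) + V * np.arctan2(dy, dz)

def model_gradient(p, src, K, V):
    """Analytic gradient of the modelled phase with respect to p = (x, y, z)."""
    dx, dy, dz = p - src
    gx = dx * dx + dz * dz
    gy = dy * dy + dz * dz
    return np.array([K * dz / gx,
                     V * dz / gy,
                     -K * dx / gx - V * dy / gy])

def reconstruct(p0, carriers, psi_meas, weights, gamma=0.5, iters=40):
    """Weighted least-squares refinement of one scene point (damped
    Newton-style step with a Gauss-Newton Hessian approximation)."""
    p = p0.astype(float)
    for _ in range(iters):
        H, g = np.zeros((3, 3)), np.zeros(3)
        for (src, K, V), psi, w in zip(carriers, psi_meas, weights):
            r = psi - model_phase(p, src, K, V)   # weighted phase residual
            J = model_gradient(p, src, K, V)
            H += w * np.outer(J, J)               # Gauss-Newton Hessian term
            g += w * r * J                        # proportional to -grad(chi^2)
        p += gamma * np.linalg.solve(H, g)        # scaled refinement step
    return p

# Assumed example: two sources (L = 2), each projecting two contra-skewed
# carriers (M = 2); all positions and frequencies are illustrative only.
carriers = [(np.array([-1.0, 0.0, 0.0]), 16.0,  18.0),
            (np.array([-1.0, 0.0, 0.0]), 17.0, -19.0),
            (np.array([ 1.0, 0.0, 0.0]), 21.0,  23.0),
            (np.array([ 1.0, 0.0, 0.0]), 22.0, -24.0)]
truth = np.array([0.3, 0.2, 2.0])
psi_meas = [model_phase(truth, s, K, V) for s, K, V in carriers]
weights = [1.0, 1.0, 0.8, 0.8]   # demodulated carrier amplitudes (assumed)
p_hat = reconstruct(np.array([0.0, 0.0, 1.5]), carriers, psi_meas, weights)
print(p_hat)                     # converges towards [0.3, 0.2, 2.0]
```

With the amplitude weights applied, carriers that are shadowed or weakly reflected contribute little to H and g, which is how the geometric diversity of the sources improves robustness.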
The Newton increment is then scaled and added to the current geometry estimate in step 980 to provide the updated geometry estimate to step 990. The iterative process 900 then continues, with the subsequently calculated error being less than that of the preceding iteration. On convergence, the variance of the geometry estimates is greatly improved over straightforward triangulation, on account of the data being fused from multiple geometrically diverse sources of illumination, independently of any camera or projector calibration.

INDUSTRIAL APPLICABILITY

The arrangements described are applicable to the computer and data processing industries, and particularly to the measurement of depth in 3D environments using a single imaging device.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive. For example, whilst the implementation of Fig. 10 illustrates the image sensor as a camera 1050 preferably mounted to the robotic arm 1060, the sensor may be implemented as a simple light detector, such as a photodiode, positioned at the reference point in the 3D scene. In such an implementation, the sensor does not detect light from the sources reflected from the 3D scene, but rather the light from the spatio-temporal source as the light directly impinges upon the scene. This arrangement can be useful for detecting motion in the scene, being a situation where the depth may be undergoing variation.

(Australia Only) In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (20)

1. A method of determining at least two coordinates of a reference point on an object in three-dimensional space in a scene captured by an image sensor, said object being irradiated simultaneously by a plurality of spatio-temporally modulated light sources, at least one of said plurality of spatio-temporally modulated light sources being modulated at a different spatio-temporal frequency to another one of said plurality of spatio-temporally modulated light sources, said method comprising the steps of: generating a composite phase signal on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space; capturing the composite phase signal at the reference point with the image sensor; determining from the captured composite phase signal a set of measured positioning parameters, used in the determination of at least two coordinates of the reference point, from the plurality of light sources, wherein the set of measured positioning parameters is determined independent of a position of the image sensor; and determining at least two coordinates at the reference point using the set of measured positioning parameters from the plurality of light sources.
2. A method according to claim 1, wherein one of the at least two coordinates is a depth coordinate.
3. A method according to claim 1, wherein each of said plurality of light sources is characterised by at least one known positioning parameter with respect to a reference line through said reference point.
4. A method according to claim 1, wherein the difference in spatio-temporal frequency of the at least one light source results from at least one of: (i) a different spatial frequency of a pattern on the at least one light source; (ii) a different rotation velocity of the at least one light source; and (iii) a different orientation of the at least one light source.
5. A method according to claim 1, wherein each said light source comprises multiple intersecting patterns to create a two-dimensional signal.
6. A method according to claim 5, wherein the patterns are orthogonal.
7. A method according to claim 1, wherein each said light source comprises a rotating pattern surrounding the light source.
8. A method according to claim 1, wherein the composite phase signal forms a wavefront that is radial to the corresponding light source.
9. A method according to claim 1, wherein the patterns are sinusoidal.
10. A method according to claim 1, wherein the measured positioning parameters comprise an angular displacement from the light source.
11. A method according to claim 1, wherein the object is in a three-dimensional space and the method determines the three-dimensional coordinates of the reference point in the three-dimensional space.
12. A method according to claim 1, wherein the positioning parameters are measured with respect to a reference line through each of the plurality of spatio-temporally modulated light sources, thereby being independent of a position of the image sensor.
13. A robotic system comprising: a robotic manipulator arranged for operation in association with an object in three-dimensional space; an image sensor arranged for imaging a scene formed at least by the object; a plurality of spatio-temporally modulated light sources configured to simultaneously illuminate the scene, at least one of said plurality of spatio-temporally modulated light sources being modulated at a different spatio-temporal frequency to another one of said plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space; a computing device connected to the robotic manipulator, the image sensor and each of the light sources and configured to: generate a composite phase signal on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources; capture the composite phase signal at a reference point on the object with the image sensor; determine from the captured composite phase signal a set of measured positioning parameters, used in the determination of at least two coordinates of the reference point, from the plurality of light sources, wherein the set of measured positioning parameters is determined independent of a position of the image sensor; determine the at least two coordinates of the reference point using the set of measured positioning parameters from the plurality of light sources; and control a position of the robotic manipulator based on the determined coordinates of the reference point.
14. A robotic system according to claim 13, wherein the image sensor is mounted upon the robotic manipulator.
15. A robotic system according to claim 13, wherein the image sensor is located at the reference point.
16. A robotic system according to claim 15, wherein the image sensor is a photodiode.
17. A computer readable storage medium having a program recorded thereon, the program being executable by computer apparatus to determine at least two coordinates of a reference point on an object in three-dimensional space in a scene captured by an image sensor, said object being irradiated simultaneously by a plurality of spatio-temporally modulated light sources, at least one of said plurality of spatio-temporally modulated light sources being modulated at a different spatio-temporal frequency to another one of said plurality of spatio-temporally modulated light sources, said program comprising: code for generating a composite phase signal on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space; code for capturing the composite phase signal at the reference point with the image sensor; code for determining from the captured composite phase signal a set of measured positioning parameters, used in the determination of at least two coordinates of the reference point, from the plurality of light sources, wherein the set of measured positioning parameters is determined independent of a position of the image sensor; and code for determining at least two coordinates at the reference point using the set of measured positioning parameters from the plurality of light sources.
18. Computer apparatus for determining at least two coordinates of a reference point on an object in three-dimensional space in a scene captured by an image sensor, said object being irradiated simultaneously by a plurality of spatio-temporally modulated light sources, at least one of said plurality of spatio-temporally modulated light sources being modulated at a different spatio-temporal frequency to another one of said plurality of spatio-temporally modulated light sources, said apparatus comprising: means for generating a composite phase signal on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space; means for capturing the composite phase signal at the reference point with the image sensor; means for determining from the captured composite phase signal a set of measured positioning parameters, used in the determination of at least two coordinates of the reference point, from the plurality of light sources, wherein the set of measured positioning parameters is determined independent of a position of the image sensor; and means for determining at least two coordinates at the reference point using the set of measured positioning parameters from the plurality of light sources.
19. A method of determining at least two coordinates of a reference point on an object in three-dimensional space in a scene captured by an image sensor, said method being substantially as described herein with reference to the drawings.
20. A robotic system substantially as described herein with reference to the drawings.

Dated this 23rd of December 2011
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
Spruson & Ferguson
AU2011265572A 2011-12-23 2011-12-23 Structured light system for robust geometry acquisition Abandoned AU2011265572A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2011265572A AU2011265572A1 (en) 2011-12-23 2011-12-23 Structured light system for robust geometry acquisition
PCT/AU2012/001587 WO2013091016A1 (en) 2011-12-23 2012-12-21 Structured light system for robust geometry acquisition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2011265572A AU2011265572A1 (en) 2011-12-23 2011-12-23 Structured light system for robust geometry acquisition

Publications (1)

Publication Number Publication Date
AU2011265572A1 (en)

Family

ID=48667520

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2011265572A Abandoned AU2011265572A1 (en) 2011-12-23 2011-12-23 Structured light system for robust geometry acquisition

Country Status (2)

Country Link
AU (1) AU2011265572A1 (en)
WO (1) WO2013091016A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112815832A (en) * 2019-11-15 2021-05-18 中国科学院长春光学精密机械与物理研究所 Measuring camera coordinate system calculation method based on 3D target

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9389315B2 (en) 2012-12-19 2016-07-12 Basf Se Detector comprising a transversal optical sensor for detecting a transversal position of a light beam from an object and a longitudinal optical sensor sensing a beam cross-section of the light beam in a sensor region
JP6440696B2 (en) 2013-06-13 2018-12-19 ビーエーエスエフ ソシエタス・ヨーロピアBasf Se Detector for optically detecting the orientation of at least one object
JP2016529474A (en) 2013-06-13 2016-09-23 ビーエーエスエフ ソシエタス・ヨーロピアBasf Se Detector for optically detecting at least one object
CN105210190A (en) 2013-06-13 2015-12-30 巴斯夫欧洲公司 Optical detector and method for manufacturing the same
US9665182B2 (en) 2013-08-19 2017-05-30 Basf Se Detector for determining a position of at least one object
US9557856B2 (en) 2013-08-19 2017-01-31 Basf Se Optical detector
JP6478492B2 (en) * 2014-06-27 2019-03-06 キヤノン株式会社 Image processing apparatus and method
KR102397527B1 (en) 2014-07-08 2022-05-13 바스프 에스이 Detector for determining a position of at least one object
US10094927B2 (en) 2014-09-29 2018-10-09 Basf Se Detector for optically determining a position of at least one object
JP6637980B2 (en) * 2014-12-09 2020-01-29 ビーエーエスエフ ソシエタス・ヨーロピアBasf Se Optical detector
WO2016120392A1 (en) 2015-01-30 2016-08-04 Trinamix Gmbh Detector for an optical detection of at least one object
US10955936B2 (en) 2015-07-17 2021-03-23 Trinamix Gmbh Detector for optically detecting at least one object
US10412283B2 (en) 2015-09-14 2019-09-10 Trinamix Gmbh Dual aperture 3D camera and method using differing aperture areas
US10401496B2 (en) 2015-09-30 2019-09-03 Ams Sensors Singapore Pte. Ltd. Optoelectronic modules operable to collect distance data via time-of-flight and triangulation
US11211513B2 (en) 2016-07-29 2021-12-28 Trinamix Gmbh Optical sensor and detector for an optical detection
CN109923372B (en) 2016-10-25 2021-12-21 特里纳米克斯股份有限公司 Infrared optical detector employing integrated filter
CN109891265B (en) 2016-10-25 2023-12-01 特里纳米克斯股份有限公司 Detector for optically detecting at least one object
US11860292B2 (en) 2016-11-17 2024-01-02 Trinamix Gmbh Detector and methods for authenticating at least one object
KR102452770B1 (en) 2016-11-17 2022-10-12 트리나미엑스 게엠베하 A detector for optically detecting at least one object
CN108205817B (en) * 2016-12-20 2021-11-09 东莞前沿技术研究院 Method, device and system for obtaining target curved surface
DE102017203390B4 (en) * 2017-03-02 2019-12-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Deflectometer and method for determining the topography of an object
EP3612805A1 (en) 2017-04-20 2020-02-26 trinamiX GmbH Optical detector
CN110998223B (en) 2017-06-26 2021-10-29 特里纳米克斯股份有限公司 Detector for determining the position of at least one object
CN108844459B (en) * 2018-05-03 2020-07-03 华中科技大学无锡研究院 Calibration method and device of blade digital sample plate detection system
JP2021021633A (en) * 2019-07-26 2021-02-18 ソニーセミコンダクタソリューションズ株式会社 Distance measuring device, distance measuring system, and method for adjusting distance measuring device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6549288B1 (en) * 1998-05-14 2003-04-15 Viewpoint Corp. Structured-light, triangulation-based three-dimensional digitizer
ATE376165T1 (en) * 2000-01-10 2007-11-15 Massachusetts Inst Technology APPARATUS AND METHOD FOR SURFACE CONTOUR MEASURING
DE10219054B4 (en) * 2002-04-24 2004-08-26 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for determining the spatial coordinates of an object
US7440590B1 (en) * 2002-05-21 2008-10-21 University Of Kentucky Research Foundation System and technique for retrieving depth information about a surface by projecting a composite image of modulated light patterns
DK2438397T3 (en) * 2009-06-01 2019-01-28 Dentsply Sirona Inc Method and device for three-dimensional surface detection with a dynamic frame of reference
JP5460341B2 (en) * 2010-01-06 2014-04-02 キヤノン株式会社 Three-dimensional measuring apparatus and control method thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112815832A (en) * 2019-11-15 2021-05-18 中国科学院长春光学精密机械与物理研究所 Measuring camera coordinate system calculation method based on 3D target
CN112815832B (en) * 2019-11-15 2022-06-07 中国科学院长春光学精密机械与物理研究所 Measuring camera coordinate system calculation method based on 3D target

Also Published As

Publication number Publication date
WO2013091016A1 (en) 2013-06-27

Similar Documents

Publication Publication Date Title
AU2011265572A1 (en) Structured light system for robust geometry acquisition
Holroyd et al. A coaxial optical scanner for synchronous acquisition of 3D geometry and surface reflectance
Kolb et al. Time‐of‐flight cameras in computer graphics
Stoykova et al. 3-D time-varying scene capture technologies—A survey
Hansard et al. Time-of-flight cameras: principles, methods and applications
Jordt-Sedlazeck et al. Refractive calibration of underwater cameras
Pagani et al. Structure from motion using full spherical panoramic cameras
US8213707B2 (en) System and method for 3D measurement and surface reconstruction
Bouguet et al. 3D photography using shadows in dual-space geometry
US20190121224A1 (en) Calibration of projection systems
US10085012B2 (en) Single-view feature-less depth and texture calibration
JP2010231780A (en) Method for estimating 3d pose of specular object
Wong et al. Recovering light directions and camera poses from a single sphere
Valgma 3D reconstruction using Kinect v2 camera
Hwang et al. Sparse ellipsometry: portable acquisition of polarimetric SVBRDF and shape with unstructured flash photography
Bergamasco et al. Parameter-free lens distortion calibration of central cameras
Zhao et al. Three‐dimensional face modeling technology based on 5G virtual reality binocular stereo vision
Popescu et al. The modelcamera: a hand-held device for interactive modeling
Mallik et al. A multi-sensor information fusion approach for efficient 3D reconstruction in smart phone
Shim et al. Performance evaluation of time-of-flight and structured light depth sensors in radiometric/geometric variations
JP5441752B2 (en) Method and apparatus for estimating a 3D pose of a 3D object in an environment
CN116295113A (en) Polarization three-dimensional imaging method integrating fringe projection
Albarelli et al. High-coverage 3D scanning through online structured light calibration
CN113483669B (en) Multi-sensor pose calibration method and device based on three-dimensional target
CN106934861B (en) Object three-dimensional reconstruction method and device

Legal Events

Date Code Title Description
MK5 Application lapsed section 142(2)(e) - patent request and compl. specification not accepted