GB2550344A - Locating and augmenting object features in images - Google Patents

Locating and augmenting object features in images

Info

Publication number
GB2550344A
Authority
GB
United Kingdom
Prior art keywords
feature
captured image
data
image
mask
Prior art date
Legal status
Granted
Application number
GB1608424.6A
Other versions
GB2550344B (en)
GB201608424D0 (en)
Inventor
Freeman Russell
Jose Garcia Sopo Maria
Current Assignee
Holition Ltd
Original Assignee
Holition Ltd
Priority date
Filing date
Publication date
Application filed by Holition Ltd filed Critical Holition Ltd
Priority to GB1608424.6A priority Critical patent/GB2550344B/en
Publication of GB201608424D0 publication Critical patent/GB201608424D0/en
Priority to PCT/GB2017/050568 priority patent/WO2017149315A1/en
Priority to US16/082,172 priority patent/US11741639B2/en
Priority to EP17715963.9A priority patent/EP3423990A1/en
Publication of GB2550344A publication Critical patent/GB2550344A/en
Application granted granted Critical
Publication of GB2550344B publication Critical patent/GB2550344B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • H04N2005/2726Means for inserting a foreground image in a background image, i.e. inlay, outlay for simulating a person's appearance, e.g. hair style, glasses, clothes

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

A method of augmenting image data including a visible feature of an object, comprising receiving image data S5-3, storing a plurality of masks corresponding to masked portions of the image, and sampling pixel values at predefined locations of the image data S5-5. At least one mask is selected based on the sampled pixel values S5-11, and pixel values in the masked portion of the image are modified based on colourisation parameters S5-15. The modified image is then output S5-17. Preferably, the masks are grouped based on visible features or aspects of visible features. A mask may be selected based on generated feature descriptors S5-9 from the sampled pixel values. Preferably, shape model data associated with the object and having a plurality of labelled points, at least one of which corresponds to the feature, is stored and modified to fit the object in the captured image (figs 6, 7).

Description

Locating and Augmenting Object Features in Images

Field of the Invention

[0001] This invention relates to an image processing system, and more particularly to techniques for locating and augmenting object features in images.
Background of the Invention

[0002] Choosing a new wearable product, such as clothing, eyewear, headwear and cosmetics, is often a tedious and time-consuming process, and conventionally is only possible in a retail environment where physical specimens are available for the customer to try. An important consideration for a customer trying on any new wearable product is seeing how it looks as they move around, taking momentary opportunity to view themselves wearing the product from particular angles or with particular expressions.
[0003] Utilising the mass availability of handheld, or other, computing devices to make real-time virtual try-on of new wearable products possible in any environment has the potential to radically change the way the customer finds the perfect product. Three main challenges for any such system are first, locating and tracking the features of a subject or object in a live captured image data stream, second, augmenting a virtual wearable product accurately and realistically in place over the live images, and finally to do all this in real-time, particularly on devices having limited hardware capabilities.
[0004] What is desired are real-time augmentation systems that provide improved accuracy, processing efficiency and realism for a better user experience.
Statements of the Invention

[0005] Aspects of the present invention are set out in the accompanying claims.
[0006] In one aspect, the present invention provides a computer-implemented method of augmenting image data, the method comprising receiving data of an image captured by a camera, the captured image including a region having a visible feature of an object; storing masking data defining a plurality of masks, each mask defining a respective masked portion of the region of the captured image; sampling pixel values at predefined locations of the captured image data; selecting at least one stored mask based on the sampled pixel values; modifying pixel values in the or each selected masked portion of the region of the captured image based on colourisation parameters; and outputting the captured image with the modified pixel values for display.
[0007] In another aspect, the present invention provides a system comprising means for performing the above methods. In a further aspect, there is provided a computer program arranged to carry out the above method when executed by a programmable device.
Brief Description of the Drawings

[0008] There now follows, by way of example only, a detailed description of embodiments of the present invention, with reference to the figures identified below.
[0009] Figure 1 is a block diagram showing the main components of an augmented reality system according to an embodiment of the invention.
[0010] Figure 2 is a schematic illustration of an exemplary data structure of a trained model including a global shape and a plurality of sub-shapes.
[0011] Figures 3A and 3B schematically illustrate examples of data processed and generated by the texture model training module during the training process.
[0012] Figure 4, which comprises Figures 4A and 4B, schematically illustrates further examples of data processed and generated by the texture model training module during the training process.
[0013] Figure 5 is a flow diagram illustrating the main processing steps performed by the system of Figure 1 to track and augment an object in a captured image according to an embodiment.
[0014] Figure 6 is a flow diagram illustrating exemplary processing steps performed by the tracking module of Figure 1 to determine and track the location of the object in the captured image.
[0015] Figure 7 is a flow diagram illustrating exemplary sub-processing steps performed by the tracking module to refine the object shape during the tracking process.
[0016] Figure 8 is a flow diagram illustrating processing steps performed by a feature detector of the tracking module of Figure 1 to generate feature descriptors.
[0017] Figure 9 is a diagram of an example of a computer system on which one or more of the functions of the embodiment may be implemented.
Detailed Description of Embodiments of the Invention

[0018] Referring to Figure 1, an augmented reality system 1 is schematically illustrated. The augmented reality system 1 comprises a tracking module 3 that automatically processes image data of a scene captured by a camera 5 to detect and determine the location of an object in the captured scene. A colourisation module 7 of the system 1 modifies captured image data of the detected object, based on colourisation parameters 9 corresponding to one or more virtual wearable products, for example retrieved from a data store 9a, which may be remote from the system 1. A user interface (not shown) may be provided to receive user input selection of the one or more virtual wearable products to try on. The augmented image data is then output to a display 11. Alternatively or additionally, the tracking module 3 may be configured to output image frames as captured to the display 11, where the colourisation module 7 is configured to output the regions of modified pixels to the display 11, over the captured pixels of respective regions in the captured image frame. Preferably, the operations are conducted in real time, or near real time.
[0019] The tracking module 3 includes an object detector 13 that automatically detects and determines the location of a predefined object in the captured image data based on a trained shape model 15. A plurality of object detectors may be provided, each configured to detect the presence of a respective different type of object in the captured image data. Instead or alternatively, the object detector 13 may be configured to identify the presence of one or more types of objects in the captured image data. In this embodiment, the trained shape model 15 includes a global shape model 15a and a plurality of sub-shape models 15b for a trained object shape, for example as described in the applicant’s earlier application GB2516739. The trained shape model 15 may be stored in the data store 17a of the system 1. It is appreciated that the object detector 13 can implement any known shape model based algorithm.
[0020] Figure 2 is a schematic illustration of an exemplary data structure of a trained shape model 15, including a global shape 15a and a plurality of sub-shapes 15b. As shown, the exemplary data structure of the shape model 15 is an array of (x,y) coordinates, each coordinate associated with a respective labelled point of the global shape 15a. Figure 3A schematically illustrates a plurality of defined labelled points 19 overlaid on a representation of a reference image. Preferably, a common set of labelled points is used by the processing modules of the system 1, so that vertex and texture coordinate data can be shared across a common reference plane. Each sub-shape model 15b may be associated with a respective subset of the (x,y) coordinates, each subset thereby defining a plurality of labelled points 19 of the respective sub-shape. The subsets of labelled points 19 for each sub-shape may overlap.
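By way of illustration only, the exemplary data structure described above could be held in memory as follows; this is a non-limiting sketch, and the field names and point indices are hypothetical rather than part of the described embodiment.

```python
import numpy as np

# Hypothetical in-memory form of the trained shape model 15: a global shape 15a
# stored as an (m, 2) array of (x, y) coordinates, one row per labelled point 19,
# and each sub-shape 15b as a subset of indices into that array.
global_shape = np.zeros((80, 2), dtype=np.float32)   # e.g. eighty labelled points

sub_shapes = {
    "left_eye": np.array([20, 21, 22, 23, 24, 25]),   # indices into global_shape
    "lips":     np.array([48, 49, 50, 51, 52, 53, 54]),
}

def sub_shape_coords(name):
    """Return the (x, y) coordinates of the labelled points of one sub-shape."""
    return global_shape[sub_shapes[name]]
```

Note that the subsets may overlap, exactly as described above for the labelled points 19 shared between neighbouring sub-shapes.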
[0021] A shape model training module 21a may generate and store a plurality of trained shape models 15 associated with respective different types of objects in the model data store 17a, for example based on input training data (training images) and the defined plurality of labelled points 19. Any known technique may be used to generate the shape model 15. For example, the applicant’s above-referenced application GB2516739 discusses use of the Active Shape Modelling (ASM) technique to generate a global shape model 15a and associated sub-shape models 15b, each including a plurality of modes of variation as determined by the shape model training module 21a from the training data. Each mode describes deviations from a mean shape 15a’, 15b’ of the respective shape model 15a, 15b, the deviations differing for each respective mode. It will be appreciated that the precise data structure of the shape models 15 will depend on the particular shape modelling technique that is implemented by the system 1.
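The following sketch illustrates how a mean shape and modes of variation might be derived from aligned training shapes using PCA; it is an assumed, simplified stand-in for the ASM training performed by the shape model training module 21a, not the applicant's exact procedure.

```python
import numpy as np

def train_shape_model(aligned_shapes, n_modes=10):
    """aligned_shapes: (N, 2m) array, each row a flattened, aligned training shape.

    Returns the mean shape and the first n_modes eigenvectors, i.e. the modes
    of variation describing deviations from the mean shape, with their variances.
    """
    mean_shape = aligned_shapes.mean(axis=0)
    centred = aligned_shapes - mean_shape
    # Eigen-decomposition of the shape covariance via SVD of the centred data.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    modes = vt[:n_modes]                                    # (n_modes, 2m)
    variances = (singular_values[:n_modes] ** 2) / len(aligned_shapes)
    return mean_shape, modes, variances
```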
[0022] Each shape model 15 may also include a global shape regression coefficient matrix 15c for each global shape 15a, and at least one sub-shape regression coefficient matrix 15d for each associated sub-shape 15b. As is known in the art, the regression coefficient matrices 15c,15d define an approximation of a trained function that can be applied, for example during a tracking phase, to bring the features of a candidate object shape from respective estimated locations to determined “real” positions in an input image. The regression coefficient matrices 15c,15d may be generated by the shape model training module 21a in the training process and define respective trained functions which relate the texture around an estimated shape and the displacement between their estimated positions and the final position where the shape features are truly located. The shape model training module 21a can be configured to compute the respective regression coefficient matrices 15c,15d based on any known regression analysis technique, such as principal component regression (PCR), linear regression, least squares, etc. The plurality of regression coefficient matrices 15c,15d form parts of the trained shape model 15 stored in the model data store 17a.
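To illustrate the role of a regression coefficient matrix, the sketch below applies a trained linear regressor to texture features sampled around a current shape estimate to obtain a displacement towards the "real" positions; the function signature is hypothetical and the regressor could equally come from PCR, linear regression or least squares as noted above.

```python
import numpy as np

def apply_regressor(shape_estimate, texture_features, coeff_matrix, bias=None):
    """shape_estimate: (2m,) flattened (x, y) coordinates of the estimated shape.
    texture_features: (d,) feature vector extracted around the estimate.
    coeff_matrix: (2m, d) trained regression coefficient matrix (15c or 15d).

    Returns the updated shape, moved towards the true feature positions.
    """
    displacement = coeff_matrix @ texture_features
    if bias is not None:
        displacement += bias
    return shape_estimate + displacement
```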
[0023] In this embodiment, the shape model training module 21a also generates data defining a normalised object mesh 23, formed by a plurality of polygonal regions, each polygonal region defined by at least three labelled points 19 and representing a polygonal face of the two-dimensional mesh 23. It is appreciated that the normalised mesh may instead define three-dimensional polygonal regions. The shape model training module 21a may be configured to generate the mesh of triangular regions by performing triangulation of the labelled points 19 of the global mean shape 15a’, for example as discussed in the applicant’s co-pending application GB2518589. Figure 3B schematically illustrates an example of a resulting normalised object mesh 23 of a trained face object model, generated from the eighty labelled feature points 19 shown in Figure 3A, corresponding to the vertices 19’ of the mesh that are numbered in sequence. The normalised object mesh 23 may be stored as a data structure including a first data array consisting of an indexed listing of the labelled points defined by x and y coordinates relative to a common two dimensional reference plane, and a second data array consisting of a listing of polygon faces defined by indices of three or more labelled points in the first data array. For example, the first data array may be an indexed listing of m vertices: [x0, y0, x1, y1, ..., xm, ym], each index corresponding to a different labelled feature point. The second data array may be a listing of n exemplary polygon faces: [1/2/20, 1/21/5, ..., 92/85/86], each polygon face defined by indices of three vertices in the first data array. The normalised object mesh 23 may be stored in the model data store 17a of the system 1.
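As an illustration of the two-array mesh representation, the sketch below triangulates the labelled points of the mean shape using SciPy's Delaunay triangulation, which is assumed here as one possible triangulation method rather than the one used by the embodiment.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_normalised_mesh(mean_shape_points):
    """mean_shape_points: (m, 2) array of labelled points of the global mean shape.

    Returns the two arrays described above: a flattened vertex list
    [x0, y0, x1, y1, ...] and a face list of vertex-index triples.
    """
    vertices = mean_shape_points.reshape(-1)           # first data array
    faces = Delaunay(mean_shape_points).simplices      # second data array, (n, 3)
    return vertices, faces
```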
[0024] In this exemplary embodiment, the augmented reality system 1 simulates the visible appearance of one or more virtual wearable products applied to respective predefined features or feature areas of a detected object in the captured image frame. In the virtual try-on context, the object feature(s) may be facial features of a person’s face, hairstyle of a person’s head, clothing or footwear items on a person’s body, style or pattern of clothing, etc. It will be appreciated that aspects of the invention may be applicable to image augmentation in other contexts involving any type of object with visible features, such as medical imaging to detect, track and augment the display of internal body organs.
[0025] Improved processing efficiency and enhanced realism are achieved by defining and providing a mask library 25 storing a plurality of user-defined feature masks 27, which are used by the colourisation module 7 to determine the specific region or regions of pixels of the captured image data to be processed for colourisation. The mask library 25 may be stored in a texture data store 17b of the system 1. The stored feature masks 27 are arranged into groups, each group 27’ associated with a particular visible feature or aspect of a visible feature in an image, and each individual feature mask 27 is associated with a variation of the associated visible feature or aspect. The variation may include one or more of shape, pattern, colour, size, density, intensity, brightness, etc. Figures 4A and 4B schematically illustrate a plurality of exemplary groups 27’ of feature masks stored in the mask library 25. A first exemplary group of feature masks 27’-1 is associated with variations of applied makeup around the eyes of a person’s face. A second exemplary group of feature masks 27’-2 is associated with variations of applied makeup in respective cheek areas of a person’s face. A third exemplary group of feature masks 27’-3 is associated with variations of a person’s lips and/or variations of applied makeup to the lips of a person’s face. A fourth exemplary group of feature masks 27’-4 is associated with variations of an item of clothing worn on the upper torso of a person’s body, in particular the sleeve length. A fifth exemplary group of feature masks 27’-5 is associated with variations of patterns of an item of clothing worn by a person. A sixth exemplary group of feature masks 27’-6 is associated with variations of hairstyle of a person’s head. Each mask 27 may define a contiguous region of pixels or a plurality of discontinuous regions of pixels.
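One possible in-memory arrangement of the grouped mask library is sketched below; the group names and file names are hypothetical examples corresponding to the exemplary groups described above.

```python
import cv2

# Hypothetical layout of the mask library 25: each group 27' gathers the
# grey-scale masks for one visible feature or aspect, each mask 27 being one variation.
mask_library = {
    "eye_makeup":    ["eye_soft.png", "eye_winged.png", "eye_smoky.png"],
    "cheek_blush":   ["cheek_round.png", "cheek_angled.png"],
    "lips":          ["lips_thin.png", "lips_full.png", "lips_gloss.png"],
    "sleeve_length": ["sleeve_short.png", "sleeve_three_quarter.png", "sleeve_long.png"],
}

def load_group(group_name):
    """Load every mask variation in a group as grey-scale image data."""
    return [cv2.imread(path, cv2.IMREAD_GRAYSCALE) for path in mask_library[group_name]]
```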
[0026] Many masks can be compounded together to produce a particular desired virtual look or appearance, which consists of multiple layers of virtually applied products, in multiple application styles. The masks 27 may include black and white pixel data. Preferably, the masks 27 are grey-scale image data, for example including black pixels defining portions of a corresponding texture data file 33 that are not to be included in the colourisation process, white pixels defining portions of the corresponding texture data file 33 that are to be included at 100% intensity, and grey pixels defining portions of the corresponding texture data file 33 that are to be included at an intensity defined by the associated grey value. The white and grey pixels are referred to as the masked data regions. In this way, different masks 27 can be provided for various blurring effects.
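The intensity semantics of the grey-scale masks (black excluded, white at full intensity, grey in between) can be expressed as a per-pixel blend weight, as in the following sketch; the blending rule shown is an assumed, simplified example.

```python
import numpy as np

def blend_with_mask(captured_region, augmented_region, grey_mask):
    """captured_region, augmented_region: (h, w, 3) uint8 image regions.
    grey_mask: (h, w) uint8 mask; 0 = excluded, 255 = full intensity.

    Grey values act as fractional intensities, which also realises the
    blurred mask edges described above.
    """
    weight = grey_mask.astype(np.float32)[..., None] / 255.0
    out = captured_region * (1.0 - weight) + augmented_region * weight
    return out.astype(np.uint8)
```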
[0027] The tracking module 3 includes a visible feature detector 29 that automatically identifies the presence or absence of one or more predefined visible features of the detected object in the captured image. The feature detector 29 processes captured image data of the detected object and selects a matching feature mask 27 for each visible feature detected in the captured image, based on pixel values sampled from locations of the captured image data that are predefined for each feature. A corresponding plurality of feature sampling points 31 are user-defined for each group of feature masks 27’. The feature sampling points 31 may be a selected subset of the labelled points 19 of the trained global shape model 15a, or may be defined relative to the labelled points 19. Figure 4A schematically illustrates exemplary sets of feature sampling points 31-1 to 31-3 defined for respective groups of feature masks 27’. The feature detector 29 generates a feature descriptor 32 of the detected visible feature from the sampled pixel values, and uses a trained classifier 33 to identify the feature mask 27 that matches the detected visible feature, based on the generated descriptor 32. A classifier training module 21b may be provided to train the classifier 33 based on training image data. The training image data may include synthetic images that are generated by the colourisation module 7 from a reference image, where the renderer 7b outputs image data that is augmented using a respective one of the feature masks 27. Suitable feature descriptors such as HOG, SIFT, SURF, FAST, BRIEF, ORB, BRISK, FREAK, or the like, and image classifiers based on PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), SVM (Support Vector Machines), neural networks, etc., are of a type that is known per se, and need not be described further.
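A minimal sketch of the descriptor and classification step is given below, using HOG descriptors from scikit-image and a linear SVM from scikit-learn as assumed, interchangeable choices from the lists above; the embodiment is not limited to these.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def train_mask_classifier(training_patches, mask_ids):
    """training_patches: equally-sized grey-scale patches, e.g. synthetic images
    rendered with each feature mask.  mask_ids: identifier of the mask used for each."""
    descriptors = [hog(p, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2)) for p in training_patches]
    classifier = SVC(kernel="linear")
    classifier.fit(np.array(descriptors), np.array(mask_ids))
    return classifier

def select_feature_mask(classifier, patch):
    """Return the identifier of the feature mask matching the detected visible feature."""
    descriptor = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2))
    return classifier.predict(descriptor.reshape(1, -1))[0]
```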
[0028] The tracking module 3 passes the captured image data to the colourisation module 7, together with the determined location of the target object in that image frame and data identifying the selected feature mask 27 for each detected visible feature. Each individual feature mask 27 may include a unique identifier 27a that can be output by the trained classifier 33 and used by the colourisation module 7 to retrieve the associated mask data 27b during the colourisation process. Each feature mask 27 may also include data 27c defining a subset of the normalised object mesh 23 that is determined based on the associated feature mask 27, such that the colourisation module 7 performs efficient and accurate modification of the pixel values within the masked regions of the captured image data. The mesh subset data 27c may be generated in a texture training process by a texture training module (not shown). In this way, the colourisation module 7 determines a subset of polygonal faces of the normalised object mesh 23 corresponding to an identified feature mask 27 from the mesh subset data 27c included in the feature mask 27.
[0029] The colourisation module 7 modifies the pixel values of the or each selected masked region of the captured image data to augment the associated visible feature with the appearance of the virtual wearable product, based on colourisation parameters 9 such as pixel value adjustment properties and/or identification of texture data 33 that is representative of the appearance of a virtual wearable product. The texture data 33 may include image data or a mathematical model that can be used to generate an array of augmentation values to be applied by the colourisation module 7 to the selected masked regions of the captured image data.
[0030] The colourisation module 7 may include a plurality of shader modules 7a that determine and apply image colourisation to selected regions of captured image data and/or texture data files 33. The output of a shader module 7a is sent to a renderer 7b that augments the underlying object in the captured image from the camera 5 with the specified virtual wearable product. As will be described in more detail below, each shader module 7a can be based on predefined sets of subshader modules to be applied in sequence, for example based on selected sets of colourisation parameters 9. The colourisation module 7 may also include a transform module 7c that receives data defining the location of labelled features points in the common reference plane, determined by the tracking module 3 for a captured image. The determined coordinates from the camera image data define the positions of the polygonal regions of a normalised object mesh 23 that matches the detected object.
[0031] The transform module 7c determines a mapping from the vertices of a selected region of a trained mesh 23 to vertices of the corresponding tracked labelled points. The transform module 7c uses the determined mapping to transform the selected mask data 27b (and/or texture data 33) for the particular feature, into respective “warped” versions that can be processed by the shader modules 7a. The renderer 7b may be configured to overlay the respective augmented masked image data of each feature according to the common reference plane, and in conjunction with an alpha blended shader sub-module (not shown), performs an alpha blend of the respective regions of augmented image data. The final result is obtained by the renderer 7b applying the blended result back onto the object represented by the captured image data from the camera 5, and output to the display 11.
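The mapping from trained mesh vertices to tracked vertices can be realised as a piecewise affine warp applied face by face; the OpenCV-based sketch below is an assumption about one possible implementation of the transform module 7c.

```python
import cv2
import numpy as np

def warp_triangle(src_img, dst_img, src_tri, dst_tri):
    """Warp one triangular face of src_img (e.g. mask data 27b or texture data 33
    in the normalised reference plane) onto dst_img at the tracked location.

    src_tri, dst_tri: (3, 2) vertex coordinates of the same mesh face in the
    reference plane and in the captured image, respectively.
    """
    h, w = dst_img.shape[:2]
    matrix = cv2.getAffineTransform(src_tri.astype(np.float32),
                                    dst_tri.astype(np.float32))
    warped = cv2.warpAffine(src_img, matrix, (w, h),
                            flags=cv2.INTER_LINEAR,
                            borderMode=cv2.BORDER_REFLECT)
    # Restrict the result to the destination triangle only.
    region = np.zeros((h, w), dtype=np.uint8)
    cv2.fillConvexPoly(region, dst_tri.astype(np.int32), 255)
    dst_img[region > 0] = warped[region > 0]
    return dst_img
```

A practical implementation would typically restrict the warp to each triangle's bounding box for speed; the sketch warps the full image per face purely for clarity.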
[0032] In this way, the colourisation module 7 uses the image data coordinates from the detected object, referenced by the mesh subsets 27c, as texture coordinates to the mask data 27b and texture data files 33, for each feature mask 27 associated with a respective set of colourisation parameters 9 for one or more selected virtual wearable products, transformed according to the tracked labelled point locations, and rendered over the captured image data, resulting in the visual effect of morphing the selected product(s) to the object in a real-time augmented reality display. It will be appreciated that the processing modules of the colourisation module 7 may include calls to a set of predefined functions provided by a Graphics Processing Unit (GPU) of the system 1. Advantageously, the present embodiment provides for more efficient GPU usage, as only the masked portions of the respective texture data files and captured image data are transmitted to the GPU for processing.
[0033] The system 1 may be implemented by any suitable computing device of a type that is known per se, such as a desktop computer, laptop computer, a tablet computer, a smartphone such as an iOS™, Blackberry™ or Android™ based smartphone, a ‘feature’ phone, a personal digital assistant (PDA), or any processor-powered device with suitable user input, camera and display means. Additionally or alternatively, the display 11 can include an external computing device, such as a mobile phone, tablet PC, laptop, etc. in communication with a host device for example via a data network (not shown), for example a terrestrial cellular network such as a 2G, 3G or 4G network, a private or public wireless network such as a WiFi™-based network and/or a mobile satellite network or the Internet.
[0034] The processing of data by the training modules 21a, 21b may be referred to as “offline” preprocessing, as the training processes are typically carried out in advance of the real-time image processing by the tracking module 3.
[0035] The tracking process performed by the tracking module 3 in the system 1 will now be described in more detail with reference to Figure 5, which shows the steps of an example computer-implemented tracking and augmentation process in an embodiment of the present invention. As shown in Figure 5, at step S5-1, the tracking module 3 may perform an initialisation sub-process based on received data of an initial captured image from the camera, for example as described in the applicant’s above-referenced application GB2518589. At step S5-3, the initialised tracking module 3 receives captured image data from the camera 5, which can be an image in a sequence of images or video frames. At step S5-5, the tracking module 3 determines the location of a detected object in the captured image. An exemplary object tracking sub-process is described with reference to Figure 6, for the shape model 15 illustrated in Figure 1. Referring to Figure 6, at step S6-1, the tracking module 3 determines if an object was previously detected and located for tracking in a prior image or video frame. In subsequent iterations of the tracking process, the tracking module 3 may determine that the object was previously detected and located, for example from tracking data (not shown) stored by the system 1, the tracking data including a determined global object shape of the detected object, which can be used as the initialised global object shape for the current captured image. As this is the first time the tracking process is executed, processing proceeds to step S6-3 where the captured image data is processed by the object detector 13 to detect an object in the image and to output a bounding box of an approximate location for the detected object. At step S6-5, the tracking module 3 initialises the detected object shape using the trained global shape model 15a and the corresponding global shape regression coefficient matrix 15c retrieved from the model data store 17a, based on the image data within the identified bounding box.
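Steps S6-3 and S6-5 could be sketched as follows, with an OpenCV cascade detector standing in for the object detector 13 (an assumption) and the normalised mean shape placed into the detected bounding box as the initial global object shape.

```python
import cv2
import numpy as np

def initialise_shape(frame_gray, detector, mean_shape):
    """detector: e.g. a cv2.CascadeClassifier, assumed here as a stand-in for the
    object detector 13.  mean_shape: (m, 2) mean shape normalised to [0, 1].

    Returns an initial global object shape placed inside the first bounding box,
    or None if no object was detected (step S6-3).
    """
    boxes = detector.detectMultiScale(frame_gray)
    if len(boxes) == 0:
        return None
    x, y, w, h = boxes[0]
    # Scale and translate the normalised mean shape into the bounding box (S6-5).
    return mean_shape * np.array([w, h]) + np.array([x, y])
```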
[0100] At step S6-7, the tracking module 3 performs processing to refine the initialised global object shape using the trained sub-shape models 15b and the corresponding cascading regression coefficient matrices 15d for each sub-shape model 15b. This processing is described in more detail with reference to Figure 7. As shown in Figure 7, at step S7-1, the refinement process starts with the tracking module 3 computing and adjusting to the nearest plausible shape fitting the global shape model. The weightings of the eigenvectors, or parameters, of the model for the computed plausible shape should remain within the scope of valid shapes, a valid shape being defined as one whose parameters lie within predefined boundaries. Given the shape computed for the previous frame, it is checked whether the output from the sub-shape regression coefficient matrices, computed independently, fits the global shape model definition before proceeding further. Accordingly, at step S7-3, it is determined whether the percentage of parameters out of bounds is greater than a predefined threshold. If so, tracking of the object is considered to be lost, the refinement process is terminated, and processing may return to step S5-3 where a new captured image is received from the camera for processing. Otherwise, the tracking module 3 proceeds to adjust the object shape to fit the global shape model 15a, at step S7-3.
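The plausibility check and adjustment described above could be expressed as follows, projecting the shape onto the model's modes of variation and limiting each parameter; the ±3 standard deviation bound is a conventional ASM choice and is assumed here rather than specified by the embodiment.

```python
import numpy as np

def fraction_out_of_bounds(shape, mean_shape, modes, variances, k=3.0):
    """Project a shape onto the model's modes of variation and return the
    fraction of parameters lying outside +/- k standard deviations."""
    params = modes @ (shape.reshape(-1) - mean_shape)
    limits = k * np.sqrt(variances)
    return np.mean(np.abs(params) > limits)

def clamp_to_valid_shape(shape, mean_shape, modes, variances, k=3.0):
    """Adjust the shape to the nearest plausible shape under the global model."""
    params = modes @ (shape.reshape(-1) - mean_shape)
    limits = k * np.sqrt(variances)
    params = np.clip(params, -limits, limits)
    return (mean_shape + modes.T @ params).reshape(shape.shape)
```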
[0101] At step S7-5, the tracking module 3 computes a similarity transformation between the adjusted shape and the mean shape 15a’. At step S7-7, the captured image is transformed based on the computed similarity transformation. At step S7-9, the tracking module 3 calculates a conversion of the adjusted shape to the transformed image. At step S7-11, the tracking module 3 determines a plurality of candidate sub-shapes from the current adjusted global shape, based on the sub-shape models 15b as discussed above. The candidate sub-shapes are then updated by iteratively applying the corresponding cascading sub-shape regression coefficient matrices 15d to the sub-shape, starting with the highest, most generalised cascade level.
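Steps S7-5 to S7-9 could be sketched as below, where OpenCV's estimateAffinePartial2D is assumed as one way of computing a similarity (rotation, uniform scale and translation) between the adjusted shape and the mean shape 15a'.

```python
import cv2
import numpy as np

def similarity_to_mean(adjusted_shape, mean_shape):
    """Estimate the 2x3 similarity transform mapping the adjusted shape onto
    the mean shape 15a' (step S7-5)."""
    matrix, _ = cv2.estimateAffinePartial2D(
        adjusted_shape.astype(np.float32), mean_shape.astype(np.float32))
    return matrix

def transform_image_and_shape(image, shape, matrix):
    """Apply the similarity to the captured image (S7-7) and convert the
    adjusted shape into the transformed image's coordinates (S7-9)."""
    h, w = image.shape[:2]
    warped = cv2.warpAffine(image, matrix, (w, h))
    ones = np.ones((len(shape), 1))
    converted = np.hstack([shape, ones]) @ matrix.T
    return warped, converted
```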
[0102] Accordingly, at step S7-13, the tracking module 3 selects a first of the candidate sub-shapes. The tracking module 3 then determines at step S7-15 a BRIEF descriptor for the candidate sub-shape, based on the transformed image at the current cascade level. At step S7-17, the tracking module 3 updates the current candidate sub-shape based on the corresponding sub-shape regression coefficient matrix 15d at the current cascade level, retrieved from the model data store 17a. This updating step will depend on the particular regression analysis technique implemented by the system 1 to apply the trained function defined by the sub-shape regression coefficient matrix 15d to the sub-shape data values. At step S7-19, the tracking module 3 determines if there is another candidate sub-shape to process and returns to step S7-13 to select the next sub-shape to be processed at the current cascade level. Once all of the candidate sub-shapes have been processed at the current cascade level, the tracking module 3 determines at step S7-20 if there is another cascade level to process, and processing returns to step S7-13 where the sub-shape refinement process is repeated for the next cascade level.
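The cascade over sub-shapes and cascade levels could take the following form, where describe_brief() is a hypothetical helper standing in for the BRIEF descriptor computation of step S7-15.

```python
def refine_sub_shapes(warped_image, candidate_sub_shapes, cascade_matrices,
                      describe_brief):
    """candidate_sub_shapes: dict of sub-shape name -> (k, 2) point array.
    cascade_matrices: dict of sub-shape name -> list of regression matrices 15d,
    ordered from the most generalised cascade level to the finest.
    describe_brief: callable returning a feature vector for a sub-shape (hypothetical).
    """
    n_levels = len(next(iter(cascade_matrices.values())))
    for level in range(n_levels):                               # S7-20: next cascade level
        for name, sub_shape in candidate_sub_shapes.items():    # S7-13 / S7-19
            features = describe_brief(warped_image, sub_shape)  # S7-15
            update = cascade_matrices[name][level] @ features   # S7-17
            candidate_sub_shapes[name] = sub_shape + update.reshape(-1, 2)
    return candidate_sub_shapes
```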
[0103] When it is determined at step S7-21 that all of the sub-shapes have been processed for all of the cascade levels of the sub-shape regression coefficient matrices 15d, then at step S7-23, the tracking module 3 may check that the refined sub-model meets predefined accuracy thresholds, before completing the object refinement process. Returning to Figure 6, the tracking module 3 determines at step S6-9 whether refinement of the detected object sub-shapes within the acceptable parameters was successfully achieved at step S6-7. If not, for example if it was determined at step S7-3 or step S7-23 that tracking of the object was lost, then processing can return to step S5-3, where a new captured image is received from the camera for processing in a new iteration by the tracking module 3. Otherwise, if the tracking module 3 determines that acceptable sub-shape refinement was achieved by the processing at step S6-7, then at step S6-11, the tracking module 3 optionally applies an exponential smoothing process to the object shape, based on the object shape detected in the previous frame when available. Exponential smoothing can be carried out on the estimated object shape data in order to produce smoothed data for presentation purposes, based on the following exemplary equation:
St = α·xt + (1 − α)·St−1, where St−1 is the previous object shape determined from the previous frame, St is the smoothed version of the current estimated object shape xt, and α is a weighting value which is adapted automatically during runtime. It will be appreciated that this smoothing technique advantageously provides for improved visualisation of the estimated shape(s), so that forecasts need not be obtained to make predictions of where the object might be in the next frame. The complex environments in which the invention aims to operate include unknown lighting conditions and movements of both the camera and the tracked object, which would require very complicated motion models, and there is no ground truth of the real position or measurement to be used in the update step of more complicated tracking strategies such as Kalman filtering.
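The smoothing equation translates directly into code; the embodiment does not specify how α is adapted at runtime, so the movement-based adaptation shown below is purely an assumed example.

```python
import numpy as np

def smooth_shape(current_shape, previous_smoothed, base_alpha=0.5):
    """Exponential smoothing: St = alpha * xt + (1 - alpha) * St-1.

    The adaptation of alpha from the inter-frame displacement is an assumed
    heuristic: large movements favour the new estimate, small movements favour
    the smoothed history.
    """
    if previous_smoothed is None:
        return current_shape
    movement = np.linalg.norm(current_shape - previous_smoothed, axis=1).mean()
    alpha = np.clip(base_alpha + movement / 10.0, 0.1, 0.95)
    return alpha * current_shape + (1.0 - alpha) * previous_smoothed
```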
[0104] Referring back to Figure 5, after the tracking module 3 has determined at step S5-7 that the object detector 13 has successfully tracked the location of a detected object in the captured image and generated or updated an instance of the object shape model 15’ with the refined locations of the labelled points 19, then at step S5-9, the feature detector 29 generates one or more feature descriptors 32 for respective predefined feature areas of the tracked object. This processing is described in more detail with reference to Figure 8. As shown in Figure 8, at step S8-1, the feature detector 29 computes an affine transformation of the captured image to the global mean shape 15a’ to obtain a warped instance of the captured image. At step S8-3, the feature detector 29 may normalise the warped image by applying photometric normalisation, for example to compensate for different lighting conditions. The result of these steps is an instance of the captured image uniformly warped to the trained, and thus static, model estimate. At step S8-5, the feature detector 29 may perform edge-preserving smoothing of the warped image data, for example based on the Bilateral filter, the Guided filter, anisotropic diffusion, or the like, to smooth away textures whilst retaining sharp edges.
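Steps S8-1 to S8-5 could be implemented along the following lines, with histogram equalisation standing in for the photometric normalisation and OpenCV's bilateral filter for the edge-preserving smoothing; both are assumed choices among those mentioned above.

```python
import cv2
import numpy as np

def normalise_for_feature_detection(frame, tracked_points, mean_shape, size):
    """Warp the captured image to the global mean shape 15a' (S8-1), apply a
    simple photometric normalisation (S8-3) and edge-preserving smoothing (S8-5).

    size: (width, height) of the normalised reference plane.  Histogram
    equalisation and the bilateral filter are assumed, non-limiting choices.
    """
    matrix, _ = cv2.estimateAffine2D(tracked_points.astype(np.float32),
                                     mean_shape.astype(np.float32))
    warped = cv2.warpAffine(frame, matrix, size)
    grey = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    equalised = cv2.equalizeHist(grey)          # photometric normalisation
    return cv2.bilateralFilter(equalised, d=9, sigmaColor=75, sigmaSpace=75)
```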
[0105] At step S8-6, the feature detector 29 identifies the next visible feature (or aspect) of the tracked object to be processed, this being a first feature the first time the sub-process is executed. For example, each selected virtual wearable product may be associated with one or more visible features or aspects to be detected. Alternatively or additionally, the feature detector 29 may be configured to automatically determine the presence or absence of a visible feature or aspect in the captured image. At step S8-7, the feature detector 29 retrieves the stored plurality of feature sampling points 31 defined for the current visible feature, for example from the data store 17b. At step S8-9, the feature detector 29 samples pixel values from the captured image at the locations defined by the retrieved feature sampling points 31. For example, a selection of ten labelled points 19 around the eye region of a face object may be defined as feature sampling points 31-1 associated with the first exemplary group of feature masks 27’-1 illustrated in Figure 4A. As another example, a grid of twelve sampling points may be defined relative to labelled points 19 around a cheek area of a face object for the second exemplary group of feature masks 27’-2. The corresponding locations of each feature sampling point 31 can be determined from the stored instance of the object shape model 15’. At step S8-11, the feature detector 29 generates a feature descriptor 32 for the current feature area, based on the sampled pixel values. It will be appreciated that the precise data structure and composition of the feature descriptor 32 will depend on the particular type of descriptor that is implemented by the feature detector 29. At step S8-13, the feature detector 29 determines if there is another predefined feature area to process, and if so, processing returns to step S8-6 to identify the next visible feature (or aspect) to be processed.
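Step S8-9, sampling pixel values at the locations given by the feature sampling points 31, could be sketched as follows.

```python
import numpy as np

def sample_feature_points(normalised_image, sampling_points):
    """sampling_points: (k, 2) array of (x, y) locations in the normalised image,
    taken from or defined relative to the tracked labelled points 19.

    Returns the k sampled pixel values used to build the feature descriptor 32.
    """
    h, w = normalised_image.shape[:2]
    xs = np.clip(np.round(sampling_points[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(sampling_points[:, 1]).astype(int), 0, h - 1)
    return normalised_image[ys, xs]
```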
[0106] Referring back to Figure 5, the feature detector 29 identifies at step S5-11 a matching feature mask 27 for each predefined feature area, by passing each respective feature descriptor through the trained classifier 33. For example, the trained classifier 33 may output an identifier 27a of the selected feature mask 27 determined to be the closest match to the particular visible feature or aspect of the visible feature in the captured image. The tracking module 3 may pass the captured image data and the identifiers 27a of the selected feature masks to the colourisation module 7 to complete the tracking process.
[0107] At step S5-13, the colourisation module 7 retrieves the mask data 27b of each selected feature mask 27 from the data store 17b. The colourisation module 7 may then process each polygonal region of the mesh subset 27c from the or each retrieved feature mask 27, to determine a set of transformation values by mapping the coordinates of the vertices of the selected mask mesh subset to the location of the corresponding tracked labelled point determined by the tracking module 3, and apply the transformation to the masked data to generate corresponding warped masked data for the selected masked region. At step S5-15, the colourisation module 7 applies the image colourisation to the captured image data by modifying pixel values in the respective selected masked regions of the captured image data, based on colourisation parameters 9, for example corresponding to one or more virtual try-on products, retrieved from the data store 9a. The colourisation module 7 may also retrieve one or more texture data files 33 as identified by the selected set of colourisation parameters 9. Optionally, the colourisation module 7 may also apply the determined transformation values to the retrieved region of texture data to generate a corresponding warped texture data region. The colourisation module 7 applies the one or more image colourisation adjustments to the warped masked image data region using the one or more shader modules 7a. The renderer 7b may receive and overlay all of the modified regions of image data as a sequence of layered data to be applied to the captured image data, and perform an alpha blend of the modified image data regions. The renderer 7b overlays the blended results on the original captured image data for output to the display 11, at step S5-17. At step S5-19, the tracking module 3 determines whether there is another captured image frame to process and, if so, processing returns to step S5-3 to repeat the tracking and colourisation processes for the next frame.
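The layered compositing performed by the renderer 7b at steps S5-15 to S5-17 could be sketched as follows; treating each selected masked region as an alpha-blended layer is an assumed simplification of the shader modules 7a.

```python
import numpy as np

def composite_layers(frame, layers):
    """layers: ordered list of (augmented, weight) pairs, where `augmented` is an
    (h, w, 3) image of modified pixel values and `weight` is an (h, w) float array
    in [0, 1] derived from the warped grey-scale mask data 27b.

    Each layer is alpha-blended over the running result, so several virtual
    products can be applied on top of each other before output (S5-17).
    """
    out = frame.astype(np.float32)
    for augmented, weight in layers:
        w = weight[..., None]
        out = out * (1.0 - w) + augmented.astype(np.float32) * w
    return np.clip(out, 0, 255).astype(np.uint8)
```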
Computer Systems

[0108] The modules described herein, such as the tracking and colourisation modules, may be implemented by a computer system or systems, such as computer system 1000 as shown in Figure 9. Embodiments of the present invention may be implemented as programmable code for execution by such computer systems 1000. After reading this description, it will become apparent to a person skilled in the art how to implement the invention using other computer systems and/or computer architectures.
[0109] Computer system 1000 includes one or more processors, such as processor 1004. Processor 1004 may be any type of processor, including but not limited to a special purpose or a general-purpose digital signal processor. Processor 1004 is connected to a communication infrastructure 1006 (for example, a bus or network). Various software implementations are described in terms of this exemplary computer system. After reading this description, it will become apparent to a person skilled in the art how to implement the invention using other computer systems and/or computer architectures.
[0110] Computer system 1000 also includes a user input interface 1003 connected to one or more input device(s) 1005 and a display interface 1007 connected to one or more display(s) 1009. Input devices 1005 may include, for example, a pointing device such as a mouse or touchpad, a keyboard, a touchscreen such as a resistive or capacitive touchscreen, etc. After reading this description, it will become apparent to a person skilled in the art how to implement the invention using other computer systems and/or computer architectures, for example using mobile electronic devices with integrated input and display components.
[0111] Computer system 1000 also includes a main memory 1008, preferably random access memory (RAM), and may also include a secondary memory 1010. Secondary memory 1010 may include, for example, a hard disk drive 1012 and/or a removable storage drive 1014, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. Removable storage drive 1014 reads from and/or writes to a removable storage unit 1018 in a well-known manner. Removable storage unit 1018 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 1014. As will be appreciated, removable storage unit 1018 includes a computer usable storage medium having stored therein computer software and/or data.
[0112] In alternative implementations, secondary memory 1010 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 1000. Such means may include, for example, a removable storage unit 1022 and an interface 1020. Examples of such means may include a program cartridge and cartridge interface (such as that previously found in video game devices), a removable memory chip (such as an EPROM, or PROM, or flash memory) and associated socket, and other removable storage units 1022 and interfaces 1020 which allow software and data to be transferred from removable storage unit 1022 to computer system 1000. Alternatively, the program may be executed and/or the data accessed from the removable storage unit 1022, using the processor 1004 of the computer system 1000.
[0113] Computer system 1000 may also include a communication interface 1024. Communication interface 1024 allows software and data to be transferred between computer system 1000 and external devices. Examples of communication interface 1024 may include a modem, a network interface (such as an Ethernet card), a communication port, a Personal Computer Memory Card International Association (PCMCIA) slot and card, etc. Software and data transferred via communication interface 1024 are in the form of signals 1028, which may be electronic, electromagnetic, optical, or other signals capable of being received by communication interface 1024. These signals 1028 are provided to communication interface 1024 via a communication path 1026. Communication path 1026 carries signals 1028 and may be implemented using wire or cable, fibre optics, a phone line, a wireless link, a cellular phone link, a radio frequency link, or any other suitable communication channel. For instance, communication path 1026 may be implemented using a combination of channels.
[0114] The terms "computer program medium" and "computer usable medium" are used generally to refer to media such as removable storage drive 1014, a hard disk installed in hard disk drive 1012, and signals 1028. These computer program products are means for providing software to computer system 1000. However, these terms may also include signals (such as electrical, optical or electromagnetic signals) that embody the computer program disclosed herein.
[0115] Computer programs (also called computer control logic) are stored in main memory 1008 and/or secondary memory 1010. Computer programs may also be received via communication interface 1024. Such computer programs, when executed, enable computer system 1000 to implement embodiments of the present invention as discussed herein. Accordingly, such computer programs represent controllers of computer system 1000. Where the embodiment is implemented using software, the software may be stored in a computer program product 1030 and loaded into computer system 1000 using removable storage drive 1014, hard disk drive 1012, or communication interface 1024, to provide some examples.
[0116] Alternative embodiments may be implemented as control logic in hardware, firmware, or software or any combination thereof.
Alternative Embodiments

[0117] It will be understood that embodiments of the present invention are described herein by way of example only, and that various changes and modifications may be made without departing from the scope of the invention. Further alternative embodiments may be envisaged, which nevertheless fall within the scope of the following claims.
[0118] For example, it will be appreciated that although the respective processes and associated processing modules are described as separate embodiments, aspects of the described embodiments can be combined to form further embodiments. For example, alternative embodiments may comprise one or more of the object tracking and object colourisation and augmentation aspects described in the above embodiments.
[0119] As yet another alternative, the training modules, tracking module and/or the colourisation module may be provided as one or more distributed computing modules or processing services on a remote server that is in communication with the augmented reality system via a data network. Additionally, as those skilled in the art will appreciate, the tracking module and/or the colourisation module functionality may be provided as one or more application programming interfaces (APIs) accessible by an application program executing on the augmented reality system, or as a plug-in module, extension, embedded code, etc., configured to communicate with the application program.

Claims (18)

1. A computer-implemented method of augmenting image data, the method comprising: receiving data of an image captured by a camera, the captured image including a region having a visible feature of an object; storing masking data defining a plurality of masks, each mask defining a respective masked portion of the region of the captured image; sampling pixel values at predefined locations of the captured image data; selecting at least one stored mask based on the sampled pixel values; modifying pixel values in the or each selected masked portion of the region of the captured image based on colourisation parameters; and outputting the captured image with the modified pixel values for display.
2. The method of claim 1, wherein each mask defines variations of the appearance of the visible feature.
3. The method of claim 1 or 2, wherein the plurality of masks are arranged in groups, each group associated with a respective visible feature or aspect of a visible feature in the captured image.
4. The method of any preceding claim, wherein selecting at least one stored mask comprises generating a feature descriptor based on the sampled pixel values, and identifying a selected one of the stored masks based on the feature descriptor.
5. The method of claim 4, further comprising using a trained classifier to identify a stored mask.
6. The method of any preceding claim, further comprising storing shape model data defining a representation of the object shape, the shape representation identifying locations of a plurality of labelled points, at least a subset of said labelled points associated with the visible feature of the object.
7. The method of claim 6, further comprising determining a location of said object in the captured image.
8. The method of claim 7, wherein determining a location comprises modifying an instance of the shape model data to fit the object in the captured image.
9. The method of claim 8, wherein pixel values are sampled at predefined locations defined by or relative to said labelled points.
10. The method of claim 9, wherein each mask corresponds to at least one polygonal region defined by three or more vertices, wherein each vertex is associated with a corresponding labelled point.
11. The method of claim 10, determining a transformation of the at least one polygonal region of a mask based on determined coordinates of the corresponding labelled points of the modified instance of the shape model.
12. The method of claim 11, wherein the colourisation parameters comprise data defining at least one texture image defining values to augment said one or more masked regions of said captured image.
13. The method of claim 12, applying the determined transformation to corresponding regions of the texture image data.
14. The method of any preceding claim, wherein at least one masked region comprises a plurality of discontinuous regions of pixels.
15. The method of any preceding claim, wherein the colourisation parameters define values to augment said masked image data.
16. The method of any preceding claim, wherein said image is one of a captured sequence of images, and wherein the object is tracked from one image to the next image in the sequence.
17. A system comprising means for performing the method of any one of claims 1 to 16.
18. A storage medium comprising machine readable instructions stored thereon for causing a computer system to perform a method in accordance with any one of claims 1 to 16.
GB1608424.6A 2016-03-02 2016-05-13 Locating and augmenting object features in images Active GB2550344B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
GB1608424.6A GB2550344B (en) 2016-05-13 2016-05-13 Locating and augmenting object features in images
PCT/GB2017/050568 WO2017149315A1 (en) 2016-03-02 2017-03-02 Locating and augmenting object features in images
US16/082,172 US11741639B2 (en) 2016-03-02 2017-03-02 Locating and augmenting object features in images
EP17715963.9A EP3423990A1 (en) 2016-03-02 2017-03-02 Locating and augmenting object features in images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1608424.6A GB2550344B (en) 2016-05-13 2016-05-13 Locating and augmenting object features in images

Publications (3)

Publication Number Publication Date
GB201608424D0 GB201608424D0 (en) 2016-06-29
GB2550344A 2017-11-22
GB2550344B GB2550344B (en) 2020-06-03

Family

ID=56320347

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1608424.6A Active GB2550344B (en) 2016-03-02 2016-05-13 Locating and augmenting object features in images

Country Status (1)

Country Link
GB (1) GB2550344B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10255681B2 (en) * 2017-03-02 2019-04-09 Adobe Inc. Image matting using deep learning
CN111340921B (en) * 2018-12-18 2024-07-16 北京京东尚科信息技术有限公司 Dyeing method, dyeing device, computer system and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030223622A1 (en) * 2002-05-31 2003-12-04 Eastman Kodak Company Method and system for enhancing portrait images
US20100214288A1 (en) * 2009-02-25 2010-08-26 Jing Xiao Combining Subcomponent Models for Object Image Modeling
US20120177288A1 (en) * 2009-08-04 2012-07-12 Vesalis Image-processing method for correcting a target image with respect to a reference image, and corresponding image-processing device
US8433107B1 (en) * 2011-12-28 2013-04-30 Arcsoft (Hangzhou) Multimedia Technology Co., Ltd. Method of enhancing a nose area of an image and related computing device
US20130314437A1 (en) * 2012-05-22 2013-11-28 Sony Corporation Image processing apparatus, image processing method, and computer program
GB2517270A (en) * 2013-07-30 2015-02-18 Holition Ltd Locating and augmenting object features in images

Also Published As

Publication number Publication date
GB2550344B (en) 2020-06-03
GB201608424D0 (en) 2016-06-29
