GB2575165A

GB2575165A - Object identification system

Info

Publication number: GB2575165A
Application number: GB201906635A
Authority: GB
Inventors: Oscar Thomas Wood Billy
Original assignee: Owlett Ltd
Current assignee: Owlett Ltd
Priority date: 2018-05-13
Filing date: 2019-05-10
Publication date: 2020-01-01
Anticipated expiration: 2039-05-10
Also published as: GB201906635D0; GB2575165B

Abstract

A digital device 3 includes a camera 1 and a sensor 2. When the sensor detects an object within the field of view of the camera a processing system is informed to control the camera to capture an image of the object, and to identify the object based on the captured image. An audio speaker could inform the user about the object, and any text thereon may be read and communicated to the user. An enclosure 6 for placing the objects could be provided and includes a tactile pad 9 to help a visually-impaired user to place the item correctly for the camera. End walls 9, 23 further aid in the placement of the camera system and the object to be identified. The dominant colour of the object could also be identified, and the image cropped so as to emphasise the object.

Description

OBJECT IDENTIFICATION SYSTEM

The present disclosure relates to a system arranged to identify an object, and a method of identifying an object.

A commonly occurring issue with people that have trouble seeing is difficulty identifying common household objects, for instance identifying the contents of a tin can, identifying food they could be allergic to, or identifying the colour of wool.

Devices that can identify items, and provide an output perceptible by a blind or partially sighted person are known. These can either rely on image analysis, or scanning of barcodes or other indicia provided on the object. However, these devices can be difficult to use, and unreliable.

According to a first aspect of the invention, there is provided a system arranged to identify an object, the system including a device having: a camera having a field of view; a sensor arranged to detect an object in the field of view of the camera; and a processing system arranged to: receive an input indicative of the sensor detecting an object in the field of view; in response to receiving the input, control the camera to capture an image of the object; and cause processing of the image to identify the object based on the captured image.

Since the image is captured in response to detecting an object in the field of view, no user input is required, and the device is simple to use. Furthermore, an image is only captured when required.

The processing system may be arranged to: use the sensor to determine a distance from the camera to the object. The processing system may optionally be arranged to identify the object based at least in part on the captured image and the determined distance. By measuring the distance from the camera to the object, the device can determine the size of the object, so it can identify the object with improved accuracy.

The device may include an output device arranged to provide a non-visual output indicating the object. A non-visual output can be perceived by a blind or partially sighted user. The output device may comprise a speaker, arranged to provide a verbal output.

The processing system may be arranged to cause recognition of any text on the image of the object. The processing system may be arranged to cause processing of the image to identify the object based at least in part on the captured image. The processing system may be arranged to identify the dominant colour of the object.

The processing system may be arranged to identify the object based on a combination of two or more of the recognised text, the object identification, and the dominant colour.

The device may be arranged to be handheld.

The device may be battery powered and sufficiently small to fit into a user’s pocket such that the user can use the device whilst out of the house, for example at a supermarket. It may be that instead of the sensor arranged to detect an object in the field of view of the camera, the processing system is arranged to control the camera to capture an image of the object in response to a user input, such as pressing a button.

The system is able to taken an image using an embedded camera after detecting an object placed in front of it using a sensor, process the image to identify its characteristics and output it verbally to the user.

The system may include an enclosure for locating an object relative to a camera, the enclosure including: a camera area for locating the device; and a tactile guide for locating an object in a field of view of the camera.

By providing a tactile guide, the enclosure allows a blind or partially sighted person to easily use a camera to take an image of an object. The image can then be used to help identify the object and provide details of the object to the user.

The tactile guide may be located a predetermined distance from the camera area. This ensures that the object is always a fixed distance from the camera. When the image of the object is analysed, this can optionally be used to help to establish the size of the object, thus making the identification of the object more accurate.

The tactile guide may include a tactile pad or area provided on a surface of the guide.

The enclosure may further include a base, wherein the camera area and tactile guide are provided on a top surface of the base. Providing the camera area and guide on the same base helps keep a fixed arrangement between the two.

The tactile guide may include a vertically extending portion. This can help to support an object placed in the enclosure. The vertically extending portion may be of known size. Therefore, when the image of the object is analysed, this can optionally help to establish the size of the object, thus making the identification of the object more accurate. The vertically extending portion may comprise a wall extending perpendicular to a direction between the camera area and tactile guide.

The use of the enclosure in combination with the device enables a user to easily place an object in the field of view of the camera. The accuracy of the image analysis process may also be enhanced, as discussed in relation to the first and second aspects.

The device may be integral with the enclosure.

Preferably, the device may be:

• Housebound (static) • Simple to use • Stable • Identifiable by touch • Connected to the mains electricity (as opposed to a battery) • Small enough to fit comfortably on a desk • Placed in a specially designed enclosure for easy operation

The processing system may be arranged to: receive an input indicative of known dimensions of the enclosure; and process the image to identify the object based at least in part on the captured image and the known dimensions. This can help provide accurate identification of the object. The known dimensions may comprises one or more of: a distance from the camera to the tactile guide; and a height of a vertically extending portion of the enclosure

The processing system may be arranged to pre-process the captured image. The preprocessing may include one or more of: cropping the image, emphasising the object to be identified, and edge detection.

The processing system may comprise a server. The image may be sent from the device to the server. The image may be sent wirelessly. The server may be arranged to perform at least part of the pre-processing.

The processing system may comprise an image analysis server. The image analysis server may be arranged to cause processing of the image to identify the object based at least in part on the captured image.

The processing system may comprise a text recognition server arranged to cause recognition of any text on the object.

The processing system may comprise a server configured to process the captured image to identify the dominant colour of the object.

The image may be sent from the first server to one or more of: the image analysis server, the text recognition server, and the dominant colour identification server.

One or more of the servers may be cloud based.

Pre-processing of the image, such as cropping, emphasising the object to be identified, or edge detection, reduces the amount of processing which needs to be done by the image analysis module and/or text recognition module and/or dominant colour recognition module. This reduces the processing time associated with the image processing which leads to a faster overall response time from the system.

According to a second aspect of the invention, there is provided a method of identifying an object comprising: determining if an object is provided in a field of view of a camera; in response to determining that an object is in the field of view of the camera, capturing an image of the object; and based at least in part on the image, identifying the object.

Since the image is captured in response to detecting an object in the field of view, no user input is required, and the device is simple to use, and an image is only captured when required.

Alternatively, the image may be captured in response to a user input such as pressing a button.

The method may further comprise: determining a distance from the camera to the object using the sensor, the object may be identified based at least in part on the determined distance. By measuring the distance from the camera to the object, the device can determine the size of the object, so it can identify the object with improved accuracy.

The method may include providing a non-visual output indicating the object. The output may comprise an verbal output. A non-visual output can be perceived by a blind or partially sighted user.

The method may include identifying the object using the processing system described in relation to the first aspect.

According to a third aspect of the invention, there is provided computer instructions that, when executed by a processor, perform the steps of the second aspect.

The system, method and computer program of the above aspects may be used in various use cases, such as:

• Identifying the contents of a package of food before it is eaten;

• Identifying the colour of wool before it is used; and • Identifying food that could potentially set off an allergy

According to yet a further aspect of the invention, there is provided a system arranged to identify an object, the system including a device having: a camera having a field of view; and a processing system arranged to: receive an input; in response to receiving the input, control the camera to capture an image of the object; perform processing on the image in order to reduce the size the image such that the portion of the image occupied by the object is increased and/or emphasise the object; cause processing of the image to identify the object based on the processed image.

Embodiments of the disclosure may be further understood with reference to the following numbered clauses:

1. A static device that can identify objects placed in front of it using an inbuilt camera and processing system.

2. A device according to clause 1, in which the device knows that an object is placed in front of it using a built in sensor.

3. A device according to clause 1, in which the output is represented verbally to the user.

4. A device according to clause 1, which is plugged into a wall socket.

5. A device according to clause 1 which is placed in an enclosure designed to help a visually impaired user operate it.

6. An enclosure according to clause 5, which can be identified by touch

The device provides a physical device that can identify an object placed in front of it using a camera that feeds an image to an image description algorithm. The output from which is returned to the user verbally.

It will be appreciated that any feature discussed in relation to a particular aspect of the invention may be applied to any other aspect.

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

Figure 1 shows a device for identifying objects;

Figure 2 shows the device as placed within a specially designed enclosure according to an embodiment of the invention;

Figure 3 schematically illustrates the operating components of the device of Figure 1; and

Figure 4 illustrates a method of identifying an object.

Figures 1 and 3 illustrate a device 3 that can be used to identify an object (not shown) placed in front of the device 3. A camera 1 is attached to the front of the device 3.

When the object is placed in front of the camera 1, as detected by the sensor 2, the camera 1 takes an image. This is then processed to detect factors of the object photographed, such as its shape, colour, and the contents of any text detected. This is then output to the user using a speaker 4, installed internally in the device 3. The operation of the device 3 will be discussed in more detail below, in relation to Figures 3 and 4.

All of the above components take power from a power connection 5, taking power from an electrical plug (not shown) connected between the device and a mains power socket (not shown).

In the example shown in Figure 1, the device 3 comprises a housing 10 that encloses the camera 1, sensor 2, speaker 4, and a processing system 11 that is used to analyse the image of the object taken by the camera 1.

The housing 11 defines a front face 12 of the device 3. The camera 1 and sensor 2 are arranged to face out of the front face 12, in a forward direction defining the field of view of the camera 1. In the front face 12, a first opening 13 is formed for the camera 1, to allow the camera 1 to take images of objects placed in the field of view of the camera. A second opening 14 is similarly provided for the sensor 2, to allow the sensor to detect objects placed in front of the device 3.

The housing 11 also defines a rear face 15, opposite the front face 12, and sides 16a, 16b between the front and rear faces 12, 15. The housing also defines a base 17 for the device 3 to rest on, and a top 18 opposite the base

In some, but not all, examples, a further opening (not shown) may be formed for the speaker 4. This may be formed in the front or rear face 12, 15, the sides 16a,b, or the top 18. Similarly, the power connection may be formed in the front or rear face 12, 15, the sides 16a,b, or the top 18.

The camera 1 may be any suitable camera for capturing in an image. For example, the camera may be CMOS image sensor. Alternatively, the camera may be a CCD sensor, or any other suitable camera.

The sensor 2 may be any type of suitable sensor that can detect the presence of an object. For example, the sensor 2 may be a LIDAR sensor, radar sensor, ultra sound sensor, or infrared sensor. The device 3 may also include a source 2a. The source 2a emits an appropriate signal. The sensor 2 then detects reflections of the signal from the object. For example, where the sensor is a LIDAR sensor, the source 2a may comprise a pulsed laser, or where the sensor 2 comprises an ultrasound sensor, the source 2a may be an ultrasound source. In some examples, passive sensing technologies, that can detect the object without use of a source, may be used.

The device may be designed to fit in a physical enclosure 6. The enclosure 6 includes a space 19 designed to fit the device 7, a tactile pad 8 which can be used by the visually impaired user to identify where the object to be identified should be placed. A wall 9 is used where an object can be propped up if necessary. The wall 9 also provides a guide to the height and width of the image taken by the camera 1.

As shown in Figure 2, the enclosure 6 includes a rigid base 20, extending along a length from a first end 21 to a second end 22. The device 3 is provided on a top surface 20a of the base 20, adjacent the first end 21 of the base 20, with the front face 12 facing in a direction along the length of the base 20 to the second end 22. The tactile pad 8 is provided on the top surface 20a of the base 20, at or near the second end 22. The tactile pad 8 extends a portion of the length of the base 20, and a portion of the width. The tactile pad 8 is positioned and sized to ensure that objects placed on the pad are in the field of view of the camera 1 (and sensor 2).

The wall 9 is a rigid wall, provided at the second end 22 of the base 6. The wall 9 extends vertically from the base 20, across the width of the base 6. A second rigid wall 23 is provided at the first end 21 of the base. The second wall 23 extends vertically from the base 20, and across the width of the base 20. The rear face 15 of the device 3 abuts the second wall 23. Thus the second wall 23 acts as a guide to locate the device 3. The second wall 23 does not extend as high from the base 20 as the first wall 9. This provides further differentiation between the first and second ends 21, 22 of the enclosure 6.

The power socket on the device 3 should be provided so as not to interfere with the second wall.

The base 20 and walls 9, 23 may be made of any suitable rigid material. For example, plastics, metals or wood. The pad 8 may be formed of any suitable tactile material. For example, the pad 8 may be rubber, fabric, or another material provided on the top surface of the base 20 (or in a recess formed in the base 20). Alternatively, the pad may be formed by a recess or raised area in the material of the base 20.

The operation of the device 3 shown in Figure 1, placed in the enclosure 6 shown in Figure 2, will now be discussed with reference to Figures 3 and 4.

Figure 3 illustrates the processing system 11 in more detail. The processing system 11 includes a processing unit 24 (for example an Intel® X86 processor such as an 15, 17 processor or the like) a memory 25, a camera driver 26, sensor driver 27 and speaker driver 28, and a communications interface 29, connected to each other via a system bus 30.

The memory 25 is subdivided into program storage 31 and data storage 32. The processing unit 24 can access the memory 25 via the system bus 30, to access program code stored in the program storage 31, to instruct it what steps to perform and to access data stored in the data storage 32, when needed. The processing unit 24 may also receive data from the camera driver 26 and sensor driver 27 through the system bus 30, and process it in accordance with the program code, and may control the camera driver 26, sensor driver 27 and speaker driver 28 through the system bus 30.

The communications interface 29 may enable any suitable wired or wireless communication protocol, such as WiFi, Bluetooth, or any other suitable method.

The program code and data may be delivered to memory 24 in any suitable manner. This may be through the communications interface 29 or otherwise. For example, the program code may be installed on the device from a CDROM; a DVD ROM / RAM (including -R/-RW or +R/+RW); a separate hard drive; a memory (including a USB drive; an SD card; a compact flash card or the like); a transmitted signal (including an Internet download, ftp file transfer of the like); a wire; etc.

It will be appreciated that although reference is made to a memory 25 it is possible that the memory 25 could be provided by a variety of devices. For example, the memory may be provided by a cache memory, a RAM memory, a local mass storage device such as the hard disk, any of these connected to the processing system 11 over a network connection. As discussed above, the processing unit 24 can access the memory 25 via the system bus 30 and, if necessary, communications interface 29, to access program code to instruct it what steps to perform and also to access data to be processed.

Similarly, the processing unit 24 and any of the drivers 26, 27, 28 may also be accessible via the bus 30, and, if necessary, communications interface 29.

The program storage 31 includes a sensor module 39 and an image capture module 33. The sensor module 39 controls the sensor source 2a to emit signals at a regular basis. The sensor module 39 also receives the return signal from the sensor 2, and processes it.

The sensor module 39 is able to detect when an object is placed in front of the camera 1. For example, detection of a reflection within a predefined time window may indicate that an object is placed in front of the camera 1. On the other hand, detection of a reflection outside the window, or no reflection, may indicate that no object is placed in front of the camera.

Where the device 3 is to be used in an enclosure 6 as described above, the predefined window in which detection of reflected signals indicates the presence of an object may be selected such that reflections from the wall 9 fall outside the time window. Alternatively, the wall 9 may be made of or coated with a material that does not reflect the emitted signal.

In further examples, the predefined window may be further selected based on the position of the tactile pad 8 within the enclosure.

Using, for example, time of flight calculations, the sensor module 39 is able to process the detected signals from the sensor 2 to determine a distance to the object. It will be appreciated that the procedure for detecting the presence of an object may be based on the time of flight or the calculated distance, as these parameters are analogous. Other distance calculation techniques may also be used.

In response to detecting the object, the image capture module 31 controls the camera 1 to capture an image of the object. The image capture module 31 may capture further images if instructed to do so. This may be under control of the image capture module 33, in response to a user input, or in response to the module 39 detecting that the object is removed and a new object provided. Otherwise, only a single image is captured.

The program storage 31 also includes an image analysis module 34. This processes the image captured by the camera 1, to determine certain characteristics of the object. For example, this may be colour, outline shape, and dimensions. The image processing may also generate natural language descriptions of the images captured.

The image analysis module may also use optionally the distance of the object from the camera 1, determined by the sensor module 39 using time of flight calculations, to determine the dimensions, scale and other characteristics of the image. In some embodiments, this may be omitted.

The natural language description of the object may be generated using image analysis programs such as https://cloudsight, ai/ (available as at the priority date of this application).

The colour of the object may be determined by suitable processing of the output of the CMOS sensor. For example, where the sensor includes different pixels sensitive to different colours, this may be used to determine the colour. Alternatively, a spectrum of the object may be generated by the output of the camera 1.

The program storage 31 may also include a text recognition module 36. This can analyse the image taken by the camera 1 to recognise any text on the object. This may be, for example, by optical character recognition, or other suitable techniques. For example, the text recognition may be through a program such as available at https://cloud.***.com/vision/docs/ocr (as at the priority date of this application).

In the embodiment discussed above, the raw image, as captured by the device, is analysed to identify the object. However, in other embodiments, the image may first be pre-processed, before object identification. The pre-processing steps are arranged to detect and then emphasise the key object in the image which is the object to be identified. The pre-processing steps may include cropping the image such that it is as small as possible whilst still containing the entire object to be identified, altering the contrast of the image to emphasise the object, and using edge detection to identify the boundaries of the key object in the image.

Cropping the image may remove areas around the outer edge of the image. Therefore, an object in the centre of the image becomes more prominent in the image. Altering the contrast may cause items foreground (and/or objects in focus) to be highlighted and emboldened. Edge detection may identify and optionally highlight the edge of objects in the image.

In some examples, the image may be cropped based on the edge detection. For example, the image may be cropped such that the limits of the image are tightly fitted to an edge identified within the image. In further examples, an initial cropping stage may occur prior to edge detection, and then further cropping may occur after edge detection. Edge detection and altering the contrast may occur in any suitable order, before or after cropping.

Image analysis, text recognition and colour detection may then be performed on the pre-processed image.

Image analysis module 34 generates a description of the object. The text recognition module 36 detects and reads any text on the image. The colour detection module identifies the dominant colour of the emphasised object. This data is then presented to the user.

An object identification module 37, stored in the program storage 31, controls the speaker 4 to provide a verbal indication of the object. This may be by simply outputting of the information determined by the image analysis module 34, colour detection module, and text recognition module 36. For example, the natural language description, colour and text may be provided through the speaker 4.

In other examples, the object identification module 37 may carry out further processing of the data provided from the image analysis module 34 and text recognition module 36. For example, the data determined may be compared to image reference data 38 stored in the data storage 32, to provide further information on the object. The image reference data 38 may be predetermined and/or may be updated by use of the device.

In the example discussed above, the processing and pre-processing is carried out by the processing system 11 of the device 3. In other examples, these steps may be carried out by different entities.

For example, an image analysis server may implement the image analysis module 34, a text recognition server may implement the text recognition module 36, and a dominant colour detection server may implement the dominant colour detection module. The pre-processing may occur at the device, or at one or more of the servers, or at a combination thereof.

In yet further examples, a further server may be provided. The server may be cloud based. The unprocessed image may be sent from the device 3 to the server, and the server may forward the image to the text recognition server, dominant colour detection server, and image analysis server. Pre-processing steps, such as cropping, object emphasis, such as via contrast adjustment, and edge detection may be carried out at the server.

The server may first forward the pre-processed image to the image analysis server, since the image analysis takes more time than the text recognition and colour detection. The server may then forward the pre-processed image to the text recognition server since the text recognition takes more time than the colour detection. The server may finally forward the pre-processed image to the colour detection server.

Each server may send the result back to the device so that the results can be output to the user.

Figure 4 schematically illustrates the method 50 of identifying the object, as discussed above. In a first step 51, the presence of an object is detected. In a second step 52, the spacing of the object from the camera is determined. In a third step 53, an image is captured. At a fourth step 54, the object is identified, and an output provided through the speaker 4. It will be appreciated that the step of determining the spacing 52 may be carried out before, after or at the same time as the step of capturing the image 53.

The system discussed above is given by way of example only.

The tactile pad 8 is just one example a tactile guide that can be provided. Any suitable guiding mechanism that helps a partially sighted or blind user place an object in the field of view of the camera 1 may be used. For example, the guide may be formed by a border defining an area or other tactile markings defining the area. In further examples, vibration of electrically controlled devices provided in the base 20 or other suitable methods may be used to define an area.

In some examples, the guide may include a rotating platform on which the object is placed. This may either enable the processing system 11 to control rotation of the platform, to allow images of the object to be taken from different points of view, or allow the user to control the platform.

The guide may include tactile markers to indicate which way the camera is facing.

In the above example, the tactile pad 8 extends across only part of the width of the base 20. This is by way of example only, and the guide may extend the full width of the base 20, if this is in the field of view of the camera 1. Multiple tactile pads may be provided on the top surface 20a of the base 20 and/or a surface of the wall 9 facing the camera 1.

The enclosure 6 may include a tactile guide to assist in locating the area 19 in which the device 3 including the camera 1 is placed. In the above example, the second wall 23 helps to locate the camera area 19. Any of the above techniques for providing a tactile guide 8 for the object may be used to define the camera area 19. However, it will be appreciated that the guide to locate the camera area 19 should be differentiated from the guide for locating the object in a manner perceptible by a blind or partially sighted person. For example, different tactile characteristics may be used to identify the camera area 19 and the area for placing the object, or different tactile markings may be used, or braille markings may be used.

In the above example, the walls 9, 23 extend the full width of the base 20. It may be that one of both of the walls 9, 23 extend only part of the width of the base 20. Furthermore, any suitable vertically extending column or member may be provided instead of a wall 9, 23. In some examples, the wall 9, 23 or vertically extending members may not be provided at the end of the base 20. Instead, the walls 9, 23 or vertical members may simply define an operating area between the device 3 and the tactile pad 8 or other guide.

In the example discussed above, the device 3 is connected to mains power through a power connection 5. In other examples, the enclosure 6 may include a power connection, and a socket for connecting to the device 3. Furthermore, in some examples, the device 3 may be battery operated, or capable of battery or mains operation.

Separate sensor systems may be provided for detecting the object, and determining the distance to the object. The system for determining the distance to the object may only be triggered when an object is detected.

In some embodiments, the sensor 2 may be partially or wholly included in the enclosure 6. For example, the wall 9 may include an emitter that transmits a signal to a sensor 2 in the device 3. The processing system 11 of the device 3 is then able to detect when the beam is broken by an object. The sensor 2 may also be in the second wall 23, if it is not blocked by the device 3.

The device 3 may be integral with the enclosure, or may be removable. Where the device is integral, guides may not be required to indicate the area 19 for the camera. The device 3 may also be used separately from the enclosure.

Furthermore, in some examples, the device may be a user device having software installed to allow it to operate in combination with the enclosure. For example the device may be a mobile phone, tablet or laptop of the user. Therefore, the user device provides at least part of the processing system

In the example discussed above the device 3 includes the camera 1, sensor 2, speaker 4 and processing system 11. It will be appreciated that in other examples, these components may be distributed across a number of devices that communicate by wired or wireless means. In practice, the speaker 4 and processing system 11 may be provided outside the enclosure, such that only the camera 1 and sensor 2 are provided in the enclosure. For example, the camera 1 and/or sensor 2 may be provide in a module that connects to a user’s computer, phone or other device, which provides the speaker and processing system.

The sensor 2 may also be omitted all together. In some examples, the camera 1 may be operated to perform the function of the sensor 2, in addition to image taking. In other examples, the function of the sensor 2 may be omitted, and the device 3 may be operated to capture images in response to a user command. This may be provided over the communications interface, or through a control input/output system (not shown) included on the device, such as a button.

In systems without the sensor 2, the device 3 may make use of the relative scale of the enclosure 6 to help identify the object. For example the system may use the distance between the camera 1 and the tactile pad 8 and/or the height of the vertical wall 9, both of which are known, to determine the relative dimensions of the object. The information about the size of the enclosure 6 may be stored as enclosure reference data 35 in the data storage 32, for use by the image analysis module 34.

Alternatively, the sensor 2 may detect further parameters of the object, to be used by the object identification module 37.

In the example discussed above, the output of the object identification module 37 is provided through the speaker 4. However, in other example, alternative outputs, perceptible by a blind or partially sighted user, may be provided instead of or as well as the speaker. For example, the device may include a braille printer that can print braille on paper or stickers to be fixed to the object.

The image, text and object identification techniques discussed above are given by way of example only. Any suitable technique may be used for identifying the object. Furthermore, additional sensors may be provided to determine further features or information that can be provided to the user. For example, there could be temperature 5 sensors or gas sensors that can help determine if food is spoilt.

Claims

1. A system arranged to identify an object, the system including a device having:

a camera having a field of view;

a sensor arranged to detect an object in the field of view of the camera; and a processing system arranged to:

receive an input indicative of the sensor detecting an object in the field of view;

in response to receiving the input, control the camera to capture an image of the object; and cause processing of the image to identify the object based on the captured image.

2. The system of claim 1, wherein the processing system is arranged to:

use the sensor to determine a distance from the camera to the object.

3. The system of claim 1 or claim 2, including:

an output device arranged to provide a non-visual output indicating the object, optionally wherein the output device comprises a speaker, arranged to provide a verbal output.

4. The system of any preceding claim, wherein the processing system is arranged to cause recognition of any text on the object.

5. The system of any preceding claim, wherein the processing system is arranged to cause processing of the image to identify the object based at least in part on the captured image.

6. The system of any preceding claim, wherein the processing system is arranged to identify the dominant colour of the object.

7. The system of any preceding claim, wherein the device is arranged to be handheld.

8. The system of any preceding claim, including an enclosure for locating an object relative to a camera, the enclosure including:

a camera area for locating the device; and a tactile guide for locating an object in a field of view of the camera, optionally wherein the tactile guide is located a predetermined distance from the camera area, and/or wherein the tactile guide includes a tactile pad or area provided on a surface of the guide.

9. The system of claim 8, wherein the enclosure further includes a base, wherein the camera area and tactile guide are provided on a top surface the base.

10. The system of claim 8 or claim 9, wherein the tactile guide includes a vertically extending portion, optionally wherein the vertically extending portion comprises a wall extending perpendicular to a direction between the camera area and tactile guide.

11. The system of any of claims 8 to 10, wherein the processing system is arranged to:

receive an input indicative of known dimensions of the enclosure; and process the image to identify the object based at least in part on the captured image and the known dimensions, optionally wherein the known dimensions comprises one or more of:

a distance from the camera to the tactile guide; and a height of a vertically extending portion of the enclosure.

12. The system of any preceding claim, wherein the processing system is configured to pre-process the captured image.

13. The system of claim 12 wherein the pre-processing includes one or more of:

cropping the image;

emphasising the object to be identified; and edge detection.

14. The system of any preceding claim wherein the processing system comprises a server, and wherein the image is sent from the device to the server.

15. The system of claim 14 when dependent on claim 13, wherein the server is configured to perform at least part of the pre-processing.

16. The system of claim 4, or any of claims 5 to 15 when dependent on claim 4, wherein the processing system comprises a text recognition server arranged to cause recognition of any text on the object.

17. The system of claim 5, or any of claims 6 to 16 when dependent on claim 5, wherein the processing system comprises an image analysis server arranged to cause processing of the image to identify the object based at least in part on the captured image.

18. The system of claim 6, or any of claims 7 to 17 when dependent on claim 6, wherein the processing system comprises a server configured to process the captured image to identify the dominant colour of the object.

19. The system of claim 14, wherein the image is sent from the server to one or more of:

the text recognition server of claim 16;

the image analysis server of claim 17; and the server of claim 18.

20. A method of identifying an object comprising:

determining if an object is provided in a field of view of a camera;

in response to determining that an object is in the field of view of the camera, capturing an image of the object; and based at least in part on the image, identifying the object.

21. The method of claim 20, comprising:

determining a distance from the camera to the object using the sensor.

22. The method of any claim 20 or claim 21, including:

providing a non-visual output indicating the object, optionally wherein the output comprises a verbal output.

23. The method of any of claims 20 to 22, wherein the object is identified using the processing system of any of claims 1 to 6, or any of claims 11 to 19.

5

24. Computer instructions that, when executed by a processor, perform the steps of any of claims 20 to 23.

25. A system arranged to identify an object, the system including a device having:

a camera having a field of view; and

10 a processing system arranged to:

receive an input;

in response to receiving the input, control the camera to capture an image of the object;

perform processing on the image in order to reduce the size of 15 the image such that the portion of the image occupied by the object is increased and/or emphasise the object;

cause processing of the image to identify the object based on the processed image.