WO2011134106A1 - Method and device for identifying user inputs - Google Patents

Method and device for identifying user inputs Download PDF

Info

Publication number
WO2011134106A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
user input
center
choices
input
Prior art date
Application number
PCT/CN2010/000593
Other languages
French (fr)
Other versions
WO2011134106A8 (en)
Inventor
Peng Qin
Original Assignee
Thomson Licensing
Shangguan, Sinan
Du, Lin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing, Shangguan, Sinan, Du, Lin filed Critical Thomson Licensing
Priority to PCT/CN2010/000593 priority Critical patent/WO2011134106A1/en
Publication of WO2011134106A1 publication Critical patent/WO2011134106A1/en
Publication of WO2011134106A8 publication Critical patent/WO2011134106A8/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/041Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F3/042Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means
    • G06F3/0425Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means using a single imaging device like a video camera for tracking the absolute position of a single or a plurality of objects with respect to an imaged reference surface, e.g. video camera imaging a display or a projection screen, a table or a wall surface, on which a computer generated image is displayed or projected
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text


Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Position Input By Displaying (AREA)

Abstract

A method is provided for identifying user inputs in a system, wherein said system comprises a display device and at least one camera for capturing user inputs. In response to a set of at least two explicit or implicit choices arranged in a first portion of the display area of said display device, the method comprises the steps of obtaining a first user input; obtaining a second user input; and determining which choice among the set of at least two explicit or implicit choices is chosen based on the position relationship between said first user input and said second user input.

Description

METHOD AND DEVICE FOR IDENTIFYING USER INPUTS
TECHNICAL FIELD
The present invention relates to user interface, and more particularly, relating to a method and a device for identifying user inputs.
BACKGROUND
Many devices and methods are available for providing a user-machine interface. The user-machine interface, which is also called the user interface (UI), enables the machine or device to obtain a user's inputs and to execute the instructions corresponding to the obtained inputs.
Generally, a UI system comprises three main components, i.e. a processing unit, a display unit and an input unit. Taking a television with a connected set-top box (STB) as an example, the TV is used as the display device for displaying the electronic program guide (EPG), the STB is used as the processing device for outputting the EPG to the TV and processing the user's inputs, and the STB remote is used as the input device. The user uses the STB remote to input instructions, e.g. channel up/down, volume up/down, viewing the next channel's information etc., with the help of the EPG displayed on the TV.
As the technology evolves, more and more UI systems are likely to adopt gesture recognition technologies as the user input method. This kind of input method makes the user feel like he is operating a real object compared to traditional user input methods, so the user experience is improved. However, these UI systems require the user to perform training or calibration before actual operation, and the training or calibration is usually time consuming. To make things worse, a change of the user's position in front of the UI system may affect the accuracy of recognition of the user input, which means that once a user changes his position, he may have to carry out the training program again.
Therefore, a new method for user inputs is desired.
SUMMARY
According to an aspect of the present invention, a method is provided for identifying user inputs in a system, wherein said system comprises a display device and at least one camera for capturing user inputs, and in response to a set of at least two explicit or implicit choices arranged in a first portion of the display area of said display device, the method comprises the steps of obtaining a first user input; obtaining a second user input; and determining which choice among the set of the at least two explicit or implicit choices is chosen based on the position relationship between said first user input and said second user input.
According to another aspect of the present invention, a device is provided for identifying user inputs. The device comprises a communication module configured to receive user inputs captured by at least one camera; and a processing module configured to, in response to a set of at least two explicit or implicit choices arranged in a first portion of the display area of a display device, determine which choice is chosen based on the position relationship between a first user input and a second user input, wherein said first user input and said second user input are obtained through said communication module.
This aspect of the present invention provides an effective and efficient input method for the user without requiring complex training each time the user changes his position.
It is to be understood that more aspects and advantages of the invention will be found in the following detailed description of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are included to provide a further understanding of the invention, illustrate embodiments of the invention together with the description which serves to explain the principle of the invention. Therefore, the invention is not limited to the embodiments. In the drawings:
Fig. 1 is a diagram illustrating a system for user inputs according to an embodiment of the present invention;
Fig. 2 is a diagram illustrating an example of a menu comprising 4 options according to the present embodiment;
Fig. 3 is a diagram illustrating an example of a trajectory of the infrared emitter in a camera according to the present embodiment;
Fig. 4A, 4B, 4C, 4D, 4E, 4F, 4G, 4H and 4I are diagrams illustrating examples of possible menus according to the present embodiment;
Fig. 5 is a flow chart illustrating a user input method according to the present embodiment;
Fig. 6A, 6B, 6C and 6D are screen snapshots illustrating an example of the steps of getting the gravity center from raw data when a user presses the button of the infrared emitter in front of the cameras according to the present embodiment;
Fig. 7 is a diagram illustrating an example of coordinates system, in which the center of the first input and the occurrence position of the second input are shown, according to the present embodiment.
DETAILED DESCRIPTION
An embodiment of the present invention will now be described in detail in conjunction with the drawings. In the following description, some detailed descriptions of known functions and configurations may be omitted for clarity and conciseness.
The present invention aims to provide an input method that makes the user feel more like he is operating an object in the real world, for example pressing a particular button among several buttons displayed on a display device, unlike the traditional operation in the prior art where the user uses a remote to move a selection pointer to the visual button on the screen that he wants to execute and then presses the 'OK' button on the remote. The method of the present invention uses two consecutive inputs of a user for generating a computer-comprehensible instruction, wherein the latter input is recognized and converted to the instruction based on the former input, so that when a user moves his position relative to the display device, the accuracy of input is maintained without requiring him to perform a complex training/recalibration process.
An embodiment of the present invention is placed in a system 100 as shown in Fig. 1. The system 100 comprises a display device 101, two cameras 102, 103 mounted on the display device 101, a processing device 104 and an infrared emitter (not shown). As an example, the display device 101 is a PC display or a TV, the processing device is a PC main unit or a STB, and the two cameras are mounted on the left-top and right-top of the TV, respectively. It shall be noted that some components may be integrated into one device according to the actual implementation; for example, the cameras 102, 103, the display device 101 and the processing device 104 may be integrated into a single device. It shall also be noted that the two cameras can be mounted at places other than the left-top and right-top of the TV, e.g. the left-bottom and right-bottom of the TV, or on the desk supporting the TV.
The functions of these components are as follows:
— The display device 101 is used to display information/prompts to the user. The displayed information/prompt may relate to a later user instruction, which is derived from the user's two inputs made by using the infrared emitter. For example, the displayed information/prompt is a menu comprising 4 options for the user to choose from, and Fig. 2 shows one possible implementation.
— The two cameras 102, 103 are used to receive and recognize the input of the infrared emitter. For example, the user holds the infrared emitter, with the button pressed, and strokes from left to right, and each camera will record the trajectory of the infrared emitter as shown in Fig. 3. Preferably, in order to catch the infrared light of the infrared emitter more accurately, light filters can be attached to the surface of the lenses of the cameras. Although the present embodiment uses two cameras, an implementation of the present invention using only one camera (whether a stereo camera or not) is also possible. Herein, an example of how to capture the infrared input by using a camera is introduced below: when a user presses the button of an infrared emitter, a camera will capture the original image as shown in Fig. 6A. Based on the raw data of the original image, image processing is performed, for example smoothing as shown in Fig. 6B, binarization as shown in Fig. 6C, and computation of the gravity center of the image as shown in Fig. 6D. This gravity center is then used to represent the infrared point. It shall be noted that the above is just an example showing how to capture the infrared input; other input capture methods are also possible.
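As a concrete illustration of the smoothing/binarization/gravity-center pipeline described above, the following is a minimal Python sketch. It is not taken from the patent: the 3x3 box filter and the brightness threshold are assumptions, and a real implementation could use different filtering.

```python
import numpy as np

def gravity_center(frame, threshold=200.0):
    """Reduce one raw camera frame (cf. Fig. 6A) to a single infrared point.

    frame: 2-D array of grey-level pixel intensities.
    Returns the (x, y) gravity center of the bright infrared blob, or
    None if no pixel exceeds the brightness threshold.
    The 3x3 box filter and the threshold value are assumptions.
    """
    img = np.asarray(frame, dtype=float)
    h, w = img.shape

    # Smoothing (cf. Fig. 6B): a simple 3x3 box filter to suppress sensor noise.
    padded = np.pad(img, 1, mode="edge")
    smooth = sum(padded[dy:dy + h, dx:dx + w]
                 for dy in range(3) for dx in range(3)) / 9.0

    # Binarization (cf. Fig. 6C): keep only pixels bright enough to be the emitter.
    ys, xs = np.nonzero(smooth >= threshold)
    if xs.size == 0:
        return None

    # Gravity center of the binary blob (cf. Fig. 6D).
    return float(xs.mean()), float(ys.mean())
```

Each captured frame would be reduced to one such point, and the points of a gesture stored in a buffer, as described for step 502 below.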
— The infrared emitter is used for emitting infrared light when a button on it is pressed. When the button is kept pressed, the infrared light is kept on, so the user can use the infrared emitter to make gesture inputs with the button pressed. The gestures may comprise press, push, pull, stroke up, stroke down, stroke left, stroke right, stroke circle, stroke arc etc. Below is an example of definitions for user gesture actions. Herein, we define the horizontal direction as the X-axis, the vertical direction as the Y-axis, and the Z-axis as perpendicular to both the X-axis and the Y-axis.
Gesture Definition
Press: The Z-axis values have a small change, for example reduced by roughly 0 to 5 cm.
Push: The Z-axis values have a big change, for example reduced by more than 40 cm.
Pull: The Z-axis values have a big change, for example increased by more than 40 cm.
Stroke Up: Not a push or pull gesture, and the Y-axis values have a bigger change than the X-axis values; the values decrease.
Stroke Down: Not a push or pull gesture, and the Y-axis values have a bigger change than the X-axis values; the values increase.
Stroke Left: Not a push or pull gesture, and the X-axis values have a bigger change than the Y-axis values; the values decrease.
Stroke Right: Not a push or pull gesture, and the X-axis values have a bigger change than the Y-axis values; the values increase.
Stroke Circle: Not a push or pull gesture. If we set a virtual coordinate system based on the center of this gesture's image, the trace crosses all four virtual axes.
Stroke Arc: Not a push or pull gesture. If we set a virtual coordinate system based on the center of this gesture's image, the trace crosses three consecutive virtual axes.
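A hedged Python sketch of how these rules could be applied to a completed gesture is shown below. The 5 cm and 40 cm thresholds come from the table; requiring a small X/Y change for "press", the sign conventions, and the omission of the stroke-circle and stroke-arc cases are simplifying assumptions.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) position of the emitter, in cm

def classify_gesture(points: List[Point3D]) -> str:
    """Classify a completed gesture using the rules in the table above.

    The circle/arc cases are omitted for brevity; thresholds follow the
    table, everything else is an illustrative assumption.
    """
    if len(points) < 2:
        return "press"  # a single captured point is treated as a press (assumption)

    xs, ys, zs = zip(*points)
    dx, dy, dz = xs[-1] - xs[0], ys[-1] - ys[0], zs[-1] - zs[0]

    # Press: Z reduced by at most ~5 cm and almost no movement in X/Y (assumption).
    if -5.0 <= dz <= 0.0 and abs(dx) < 5.0 and abs(dy) < 5.0:
        return "press"
    # Push / pull: big change along the Z-axis (more than 40 cm).
    if dz <= -40.0:
        return "push"
    if dz >= 40.0:
        return "pull"

    # Strokes: compare the changes along the X- and Y-axes.
    if abs(dy) > abs(dx):
        return "stroke up" if dy < 0 else "stroke down"
    return "stroke left" if dx < 0 else "stroke right"
```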
— The processing device 104 comprises the following functions:
1. generating menus for display on the display device 101, where each menu displayed on the display device 101 may comprise two or more buttons/choices/options for the user to choose from;
2. storing a mapping between computer-comprehensible instructions and buttons/choices. Taking the menu in Fig. 2 as an example, assume that options 1, 2, 3 and 4 respectively represent the instructions RESTART, LOG OFF, STAND BY and SHUT DOWN. The processing device 104 stores commands for performing these actions, and stores a mapping for this menu;
3. determining which button/choice/option is selected by the user based on a calculation over the user's two consecutive inputs obtained by the two cameras. Details about the determination are described below.
Fig. 5 is a flow chart illustrating a user input method according to the present embodiment.
— At the step 501, a menu having two or more buttons is displayed on the display device 101. The buttons are arranged in such a way that each button is put at a position around the center of the display area for the menu and the positions of all buttons are distinct from each other. In most cases, the display area for the menu is the whole screen of the display device 101, and consequently the center of the display area for the menu is the center of the screen of the display device 101; but in a few cases, for example as shown in Fig. 4I, the display area for the menu is a rectangular area at the right-top of the TV screen. Herein, the reason for making the "positions of all buttons distinct from each other" is to make it easy and accurate for the user to make gesture inputs in the air. Fig. 4A, 4B, 4C, 4D, 4E, 4F, 4G, 4H and 4I show some examples of such arrangements that make the positions of all buttons distinct from each other. It shall be noted that the shapes of the buttons do not matter; the shape can be a circle, rectangle, triangle, square etc. Preferably, the distinct positions comprise left-top, left-center, left-bottom, bottom-center, right-bottom, right-center, right-top and top-center relative to the center point of the display area for the menu. It shall be noted that under some circumstances the display of the menu is redundant; for example, a default menu is set either by the manufacturer or by the user, and the user knows the locations/positions of all buttons of the default menu, so he can perform the input without needing the menu to be displayed on the display device 101.
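For illustration only, such a menu together with the mapping stored by the processing device 104 could be represented as a simple table from distinct positions to instructions. The assignment of the four options of Fig. 2 to particular positions below is an assumption, as are the names; the instructions reuse the RESTART / LOG OFF / STAND BY / SHUT DOWN example given above.

```python
# Hypothetical layout for a four-option menu in the style of Fig. 2:
# every button occupies a distinct position around the center of the
# menu's display area, and each position maps to one
# computer-comprehensible instruction (assumed assignment).
MENU_FIG2 = {
    "left-top": "RESTART",        # option 1 (per the example in step 502 below)
    "right-top": "LOG OFF",       # option 2 (assumed position)
    "left-bottom": "STAND BY",    # option 3 (assumed position)
    "right-bottom": "SHUT DOWN",  # option 4 (assumed position)
}
```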
—At the step 502, a first user input is obtained. For example, the user holds the infrared emitter, with the button pressed, and makes a gesture of stroking from left to right. The purpose of the first input is to help both the user and the processing device 104 obtain information about a center. The center is used for assisting the user to make a second input and assisting the processing device 104 to determine which button the user intends to press. Specifically, the center of the first user input is defined as the reference center, with reference to which the second input is made. For the user, he needs to estimate the center of his first input. For straight line, circle, arc line and dot inputs, it is easy to estimate a rough center. After he estimates the center of his first input, he is able to make reference to the estimated center and make a second gesture (e.g. a press action) in the spatial region corresponding to the button he wants to press. Still taking the menu in Fig. 2 as an example, assuming option 1 is what the user wants to press, the user just needs to make the second gesture input in a spatial region located to the left-top of the estimated center in the vertical plane. Although the center of the first input is only roughly estimated, because the buttons/options are arranged at locations distinct from each other, it is hard for the user to make a wrong input, and consequently the input accuracy can be guaranteed. For the processing device 104, the center of the first input is calculated as the average of all points collected by the two cameras. Here, the coordinates of the center of the first input are obtained as (Xcr, Ycr). Specifically, assume a straight-line gesture is inputted. The start point of a gesture is where the user presses the button of the infrared emitter, and the stop point of the gesture is where the user releases the button of the infrared emitter. During the period of the gesture, each camera collects the raw images of the infrared light output by the infrared emitter. We use the gravity center to represent every infrared point, and store these gravity centers in a buffer. For every gesture, there will be two sets of data collected by the two cameras. We use the camera that collected more data as the source. For example, the left camera records N points in buffer ArrayL, and the right camera records M points in buffer ArrayR. If N is no less than M, then we use ArrayL as the source. The center can be calculated as below:
Xcr = (X1 + X2 + ... + XN) / N, Ycr = (Y1 + Y2 + ... + YN) / N, where (Xi, Yi), i = 1, ..., N, are the gravity centers stored in the source buffer.
Although we use the camera that collected more data as the source in this example, it shall be noted that using a combination of the data from both cameras as the source is also possible.
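A minimal sketch of this center computation is given below, assuming each buffer already holds the per-frame gravity centers produced by the capture step; the function and variable names are illustrative, not from the patent.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # gravity center of one captured frame, in pixels

def reference_center(array_l: List[Point], array_r: List[Point]) -> Point:
    """Compute the reference center (Xcr, Ycr) of the first gesture.

    Following the text above, the buffer of the camera that collected
    more gravity centers is used as the source, and the center is the
    plain average of those points. Combining both buffers is a variant.
    """
    source = array_l if len(array_l) >= len(array_r) else array_r
    if not source:
        raise ValueError("no points were captured for the first gesture")
    xcr = sum(x for x, _ in source) / len(source)
    ycr = sum(y for _, y in source) / len(source)
    return xcr, ycr
```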
— At the step 503, a second user input is obtained following the obtaining of the first input. Herein, in a preferred embodiment, a restriction on time can be added between the obtaining of the first input and the second input, i.e. if the second input is not obtained within a predefined time period, e.g. 5 seconds, after obtaining the first input, the second input is deemed invalid. In this embodiment, the second input is a press action or a pull action. In addition, it shall be noted that the second input can also be a short stroke or a small circle; a long stroke and a big circle are also possible. We use its center for the determination in step 504.
— At the step 504, the processing device 104 determines which button is chosen based on the first input and the second input. To be specific, in this example, the determination is based on the position relationship between the center of the first input and the occurrence position of the second input. Here, we assume a menu having four buttons as shown in Fig. 2 is used. Therefore, we only need to determine which one of the group consisting of left-top, right-top, left-bottom and right-bottom the position relationship is. This can be easily realized in a coordinate system. Assume the camera's coordinate origin (0, 0) is the left-top corner as shown in Fig. 7. After the first input gesture, we get the center (X0, Y0). If the press action happens at (X1, Y1), then based on its position relative to (X0, Y0) we can know which button the press action corresponds to. Taking Fig. 4F as another example, after obtaining the center of the first input and the occurrence position of the second input, the processing device 104 needs to determine which one of the group consisting of left-top, center-top, right-top, right-center, right-bottom, center-bottom, left-bottom and left-center the position relationship is. This can be realized by evenly dividing the area into 9 blocks, wherein the center of the area coincides with the center of the first input.
After the chosen button is determined, the processing device 104 determines and executes the corresponding instruction based on the mapping information between the buttons of the menu and the computer-comprehensible instructions.
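In the four-button case this reduces to a sign comparison between the two points; the sketch below is one possible reading of it, reusing the kind of position-to-instruction mapping illustrated after step 501 (the menu contents and coordinates are assumptions). The eight-position case of Fig. 4F would instead divide the area into nine equal blocks around (X0, Y0).

```python
def chosen_instruction(center, press, menu):
    """Map the second input to a button/instruction (four-button case).

    center: (X0, Y0), reference center of the first gesture.
    press:  (X1, Y1), occurrence position of the second gesture (press).
    menu:   mapping from relative positions to instructions (assumed layout).
    The camera origin (0, 0) is the left-top corner, as in Fig. 7, so a
    smaller Y means "top" and a smaller X means "left".
    """
    x0, y0 = center
    x1, y1 = press
    vertical = "top" if y1 < y0 else "bottom"
    horizontal = "left" if x1 < x0 else "right"
    return menu[f"{horizontal}-{vertical}"]

# Usage with a hypothetical four-button menu in the style of Fig. 2:
menu = {"left-top": "RESTART", "right-top": "LOG OFF",
        "left-bottom": "STAND BY", "right-bottom": "SHUT DOWN"}
print(chosen_instruction((320.0, 240.0), (250.0, 180.0), menu))  # -> RESTART
```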
According to a variant of the present embodiment, the method of the present invention is used for selecting a portion of an image, wherein the image is divided into several selectable portions, and the division of the image into portions can follow the arrangement of the buttons described above.
According to a variant of the present embodiment, the gesture type of the first input is fixed to a particular one, e.g. a left-to-right straight line, or to a particular group of gestures. The merit of using a fixed gesture type is that it allows the input method of the present invention to coexist with traditional gesture input methods, because the fixed gesture type is carefully selected to be distinct from the other gestures already in use and thus indicates that the input method of the present invention will be used.
According to the present embodiment, a computer-comprehensible instruction is generated by two consecutive gesture inputs, i.e. a former input that is used to obtain the reference center and a latter input that is used to locate which button is chosen. However, according to a variant of the present embodiment, the former input does not need to be provided for every instruction. In one variant of the present embodiment, the user wants the processing device to consecutively execute two or more instructions while he stays in the same place, e.g. sitting on a sofa. He can make several gesture inputs, the first of which is used as the former input while the succeeding ones are used as latter inputs. For example, when the user wants the computer to execute 4 instructions, he just needs to make 5 gesture inputs. According to another variant of the present embodiment, the processing device uses the center of the previous gesture input as the reference center when a user makes a gesture input. According to another variant of the present embodiment, the menu is caused to be displayed upon a user's gesture input, for example a pull gesture. We can store the occurrence position of the pull gesture input as the reference center during the life-span of this menu. Therefore, during the life-span of the menu, only one gesture input is enough to cause the processing device to generate an instruction for execution.
According to a variant of the present embodiment, a prompt indicating that the second user input is expected is displayed on the screen of the display device after the step 502. In one example, the prompt may be displayed at the top-center of the screen.
According to a variant of the present embodiment, the input is not limited to the infrared emitter; it can be extended to use only a gesture recognition method. According to a variant of the present embodiment, the capture is not limited to stereo cameras; it can be extended to use only one camera that has a built-in depth sensor.
The principle of the present invention is to use the relative positions of the press gesture and the other gesture's center to determine which event is triggered. It is appreciated that a person skilled in the art can contemplate other variants or implementations after reading the description, and these variants and implementations shall fall within the scope of the principle of the present invention.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations shall fall in the scope of the invention.

Claims

1. A method for identifying user inputs in a system, characterized in that said system comprises a display device and at least one camera for capturing user inputs, and in that in response to a set of at least two explicit or implicit choices arranged in a first portion of display area of said display device, the method comprising the steps of
obtaining a first user input;
obtaining a second user input; and
determining which choice among the set of the at least two explicit or implicit choices is chosen based on the position relationship between said first user input and said second user input.
2. The method of the claim 1 , characterized in that the determination step further comprises
determining a first center and a second center for said first user input and said second user input separately; and
determining which choice is chosen based on the position relationship between the first center and the second center.
3. The method of the claim 2, characterized in that the position relationship is one of a positioning group comprising left-top, left-center, left-bottom, right-top, right-center, right- bottom, top-center and bottom-center.
4. The method of the claim 3, characterized in that there exists a mapping indicating the position for each of said set of at least two explicit or implicit choices, wherein, each choice corresponds to at least one position of said positioning group, and none of said positioning group corresponds to two or more choices.
5. The method of the claim 1 , characterized in that after the determination step another set of at least two choices is displayed in a second portion of display area of said display device, wherein, the method further comprises obtaining a third user input; and
determining which choice among said another set of at least two choices is chosen based on position relationship between said first user input and said third user input.
6. The method of the claim 1, characterized in that the choices are implicit, and the user knows how the at least two choices are arranged in said first portion before making his inputs.
7. The method of the claim 1 , characterized in that the choices are explicit, and the method further comprises displaying said set of at least two explicit choices in said first portion of display area of said display device.
8. The method of the claim 1, characterized in that the choice is a button that corresponds to a particular instruction, and the method further comprises executing the instruction corresponding to the chosen button after the determination.
9. The method of the claim 1 , characterized in that the choice is a portion of an image for user to select.
10. The method of any of claims 1 to 9, characterized in that the user inputs comprise stroke, circle, arc, rectangle, triangle, square, press, pull and push.
11. A device for identifying user inputs, characterized by comprising
a communication module configured to receive user inputs captured by at least one camera; and
a processing module configured to, in response to a set of at least two explicit or implicit choices arranged in a first portion of display area of a display device, determine which choice is chosen based on position relationship between a first user input and a second user input, wherein said first user input and said second user input are obtained through said communication module.
PCT/CN2010/000593 2010-04-29 2010-04-29 Method and device for identifying user inputs WO2011134106A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/000593 WO2011134106A1 (en) 2010-04-29 2010-04-29 Method and device for identifying user inputs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/000593 WO2011134106A1 (en) 2010-04-29 2010-04-29 Method and device for identifying user inputs

Publications (2)

Publication Number Publication Date
WO2011134106A1 true WO2011134106A1 (en) 2011-11-03
WO2011134106A8 WO2011134106A8 (en) 2012-01-19

Family

ID=44860729

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/000593 WO2011134106A1 (en) 2010-04-29 2010-04-29 Method and device for identifying user inputs

Country Status (1)

Country Link
WO (1) WO2011134106A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0981362A (en) * 1995-09-12 1997-03-28 Casio Comput Co Ltd Data input device
JP2006139615A (en) * 2004-11-12 2006-06-01 Access Co Ltd Display device, menu display program, and tab display program
US20090289904A1 (en) * 2008-05-20 2009-11-26 Tae Jin Park Electronic device with touch device and method of executing functions thereof


Also Published As

Publication number Publication date
WO2011134106A8 (en) 2012-01-19

Similar Documents

Publication Publication Date Title
US9329714B2 (en) Input device, input assistance method, and program
JP6390799B2 (en) Input device, input method, and program
WO2019033957A1 (en) Interaction position determination method and system, storage medium and smart terminal
WO2011142317A1 (en) Gesture recognition device, method, program, and computer-readable medium upon which program is stored
EP2040156A2 (en) Image processing
JP6344530B2 (en) Input device, input method, and program
JP6062416B2 (en) Information input device and information display method
US9804667B2 (en) Electronic apparatus
US10713488B2 (en) Inspection spot output apparatus, control method, and storage medium
US20140132725A1 (en) Electronic device and method for determining depth of 3d object image in a 3d environment image
US11199946B2 (en) Information processing apparatus, control method, and program
US9400575B1 (en) Finger detection for element selection
JP5627314B2 (en) Information processing device
KR101807516B1 (en) Apparatus And Method Controlling Digital Device By Recognizing Motion
KR101321274B1 (en) Virtual touch apparatus without pointer on the screen using two cameras and light source
WO2021004413A1 (en) Handheld input device and blanking control method and apparatus for indication icon of handheld input device
US9489077B2 (en) Optical touch panel system, optical sensing module, and operation method thereof
EP3088991B1 (en) Wearable device and method for enabling user interaction
JP5646532B2 (en) Operation input device, operation input method, and program
WO2011134106A1 (en) Method and device for identifying user inputs
JP5080409B2 (en) Information terminal equipment
JP6686319B2 (en) Image projection device and image display system
JP2018097443A (en) Input system and input program
KR101272458B1 (en) virtual touch apparatus and method without pointer on the screen
JP5645530B2 (en) Information processing apparatus and control method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10850439

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10850439

Country of ref document: EP

Kind code of ref document: A1