WO2011134106A1 - Method and device for identifying user inputs - Google Patents

Method and device for identifying user inputs Download PDF

Info

Publication number
WO2011134106A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
user input
center
choices
input
Prior art date
Application number
PCT/CN2010/000593
Other languages
French (fr)
Other versions
WO2011134106A8 (en)
Inventor
Peng Qin
Original Assignee
Thomson Licensing
Shangguan, Sinan
Du, Lin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing, Shangguan, Sinan, Du, Lin filed Critical Thomson Licensing
Priority to PCT/CN2010/000593 priority Critical patent/WO2011134106A1/en
Publication of WO2011134106A1 publication Critical patent/WO2011134106A1/en
Publication of WO2011134106A8 publication Critical patent/WO2011134106A8/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/041Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F3/042Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means
    • G06F3/0425Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means using a single imaging device like a video camera for tracking the absolute position of a single or a plurality of objects with respect to an imaged reference surface, e.g. video camera imaging a display or a projection screen, a table or a wall surface, on which a computer generated image is displayed or projected
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text


Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Position Input By Displaying (AREA)

Abstract

A method is provided for identifying user inputs in a system, wherein said system comprises a display device and at least one camera for capturing user inputs. In response to a set of at least two explicit or implicit choices arranged in a first portion of the display area of said display device, the method comprises the steps of obtaining a first user input; obtaining a second user input; and determining which choice among the set of at least two explicit or implicit choices is chosen based on the position relationship between said first user input and said second user input.

Description

METHOD AND DEVICE FOR IDENTIFYING USER INPUTS
TECHNICAL FIELD
The present invention relates to user interface, and more particularly, relating to a method and a device for identifying user inputs.
BACKGROUND
Many devices and methods are available for providing a user-machine interface. The user-machine interface, which is also called the user interface (UI), enables the machine or device to obtain a user's inputs and to execute the instructions corresponding to the obtained inputs.
Generally, a UI system comprises three main components, i.e. a processing unit, a display unit and an input unit. Taking a television with a connected set-top box (STB) as an example, the TV is used as the display device for displaying the electronic program guide (EPG), the STB is used as the processing device for outputting the EPG to the TV and processing the user's inputs, and the STB remote is used as the input device. The user uses the STB remote to input instructions, e.g. channel up/down, volume up/down, viewing the next channel's information etc., with the help of the EPG displayed on the TV.
As the technology evolves, more and more UI systems are likely to adopt gesture recognition technologies as the user input method. This kind of input method makes the user feel like he is operating a real object compared to traditional user input methods, so the user experience is improved. However, these UI systems require the user to perform training or calibration before actual operation, and the training or calibration is usually time consuming. To make things worse, a change of the user's position in front of the UI system may affect the accuracy of recognition of the user input, which means that once a user changes his position, he may have to carry out the training program again.
Therefore, a new method for user inputs is desired.
SUMMARY
According to an aspect of the present invention, a method is provided for identifying user inputs in a system, wherein said system comprises a display device and at least one camera for capturing user inputs, and in response to a set of at least two explicit or implicit choices arranged in a first portion of the display area of said display device, the method comprises the steps of obtaining a first user input; obtaining a second user input; and determining which choice among the set of the at least two explicit or implicit choices is chosen based on the position relationship between said first user input and said second user input.
According to another aspect of the present invention, a device is provided for identifying user inputs. The device comprises a communication module configured to receive user inputs captured by at least one camera; and a processing module configured to, in response to a set of at least two explicit or implicit choices arranged in a first portion of the display area of a display device, determine which choice is chosen based on the position relationship between a first user input and a second user input, wherein said first user input and said second user input are obtained through said communication module.
This aspect of the present invention provides an effective and efficient input method for the user without requiring complex training each time the user changes his position.
It is to be understood that more aspects and advantages of the invention will be found in the following detailed description of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are included to provide a further understanding of the invention, illustrate embodiments of the invention together with the description which serves to explain the principle of the invention. Therefore, the invention is not limited to the embodiments. In the drawings:
Fig. 1 is a diagram illustrating a system for user inputs according to an embodiment of the present invention;
Fig. 2 is a diagram illustrating an example of a menu comprising 4 options according to the present embodiment;
Fig. 3 is a diagram illustrating an example of a trajectory of the infrared emitter in a camera according to the present embodiment;
Fig. 4A, 4B, 4C, 4D, 4E, 4F, 4G, 4H and 4I are diagrams illustrating examples of possible menus according to the present embodiment;
Fig. 5 is a flow chart illustrating a user input method according to the present embodiment;
Fig. 6A, 6B, 6C and 6D are screen snapshots illustrating an example of the steps of getting the gravity center from raw data when a user presses the button of the infrared emitter in front of the cameras according to the present embodiment;
Fig. 7 is a diagram illustrating an example of coordinates system, in which the center of the first input and the occurrence position of the second input are shown, according to the present embodiment.
DETAILED DESCRIPTION
An embodiment of the present invention will now be described in detail in conjunction with the drawings. In the following description, some detailed descriptions of known functions and configurations may be omitted for clarity and conciseness.
The present invention aims to provide an input method that makes the user feel more like he is operating an object in the real world, for example pressing a particular button among several buttons displayed on a display device, unlike the traditional operation in the prior art where the user uses a remote to move a selection pointer to the visual button on the screen that he wants to execute and then presses the 'OK' button on the remote. The method of the present invention uses two consecutive inputs of a user for generating a computer-comprehensible instruction, wherein the latter input is recognized and converted to the instruction based on the former input, so that when a user moves his position relative to the display device, the accuracy of input is maintained without requiring him to perform a complex training/recalibration process.
An embodiment of the present invention is placed in a system 100 as shown in Fig. 1. The system 100 comprises a display device 101, two cameras 102, 103 mounted on the display device 101, a processing device 104 and an infrared emitter (not shown). As an example, the display device 101 is a PC display or a TV, the processing device is a PC main unit or a STB, and the two cameras are mounted on the left-top and right-top of the TV, respectively. It shall be noted that some components may be integrated into one device according to the actual implementation; for example, the cameras 102, 103, the display device 101 and the processing device 104 may be integrated into a single device. It shall also be noted that the two cameras can be mounted at places other than the left-top and right-top of the TV, e.g. the left-bottom and right-bottom of the TV, or on the desk supporting the TV.
The functions of these components are as follows:
— The display device 101 is used to display information/prompts to the user. The displayed information/prompt may relate to a later user instruction, which is derived from the user's two inputs made by using the infrared emitter. For example, the displayed information/prompt is a menu comprising 4 options for the user to choose from, and Fig. 2 shows one possible implementation.
— The two cameras 102, 103 are used to receive and recognize the input of the infrared emitter. For example, the user holds the infrared emitter, with the button pressed, and strokes from left to right, and each camera will record the trajectory of the infrared emitter as shown in Fig. 3. Preferably, in order to catch the infrared light of the infrared emitter more accurately, light filters can be attached to the surface of the lenses of the cameras. Although the present embodiment uses two cameras, an implementation of the present invention using only one camera (whether a stereo camera or not) is also possible. Herein, an example of how to capture the infrared input by using a camera is introduced below: when a user presses the button of an infrared emitter, a camera will capture the original image as shown in Fig. 6A. Based on the raw data of the original image, image processing is performed, for example smoothing as shown in Fig. 6B, binarization as shown in Fig. 6C, and computation of the gravity center of the image as shown in Fig. 6D. This gravity center is then used to represent the infrared point. It shall be noted that the above is just an example showing how to capture the infrared input; other input capture methods are also possible.
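As a concrete illustration of the smoothing/binarization/gravity-center pipeline described above, the following is a minimal Python sketch. It is not taken from the patent: the 3x3 box filter and the brightness threshold are assumptions, and a real implementation could use different filtering.

```python
import numpy as np

def gravity_center(frame, threshold=200.0):
    """Reduce one raw camera frame (cf. Fig. 6A) to a single infrared point.

    frame: 2-D array of grey-level pixel intensities.
    Returns the (x, y) gravity center of the bright infrared blob, or
    None if no pixel exceeds the brightness threshold.
    The 3x3 box filter and the threshold value are assumptions.
    """
    img = np.asarray(frame, dtype=float)
    h, w = img.shape

    # Smoothing (cf. Fig. 6B): a simple 3x3 box filter to suppress sensor noise.
    padded = np.pad(img, 1, mode="edge")
    smooth = sum(padded[dy:dy + h, dx:dx + w]
                 for dy in range(3) for dx in range(3)) / 9.0

    # Binarization (cf. Fig. 6C): keep only pixels bright enough to be the emitter.
    ys, xs = np.nonzero(smooth >= threshold)
    if xs.size == 0:
        return None

    # Gravity center of the binary blob (cf. Fig. 6D).
    return float(xs.mean()), float(ys.mean())
```

Each captured frame would be reduced to one such point, and the points of a gesture stored in a buffer, as described for step 502 below.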
— The infrared emitter is used for emitting infrared light when a button on it is pressed. When the button is kept pressed, the infrared light is kept on, so the user can use the infrared emitter to make gesture inputs with the button pressed. The gestures may comprise press, push, pull, stroke up, stroke down, stroke left, stroke right, stroke circle, stroke arc etc. Below is an example of definitions for user gesture actions. Herein, we define the horizontal direction as the X-axis, the vertical direction as the Y-axis, and the Z-axis as perpendicular to both the X-axis and the Y-axis.
Gesture Definition
Press: The Z-axis values have a small change, for example reduced by roughly 0 to 5 cm.
Push: The Z-axis values have a big change, for example reduced by more than 40 cm.
Pull: The Z-axis values have a big change, for example increased by more than 40 cm.
Stroke Up: Not a push or pull gesture, and the Y-axis values have a bigger change than the X-axis values; the values decrease.
Stroke Down: Not a push or pull gesture, and the Y-axis values have a bigger change than the X-axis values; the values increase.
Stroke Left: Not a push or pull gesture, and the X-axis values have a bigger change than the Y-axis values; the values decrease.
Stroke Right: Not a push or pull gesture, and the X-axis values have a bigger change than the Y-axis values; the values increase.
Stroke Circle: Not a push or pull gesture. If we set a virtual coordinate system based on the center of this gesture's image, the trace crosses all four virtual axes.
Stroke Arc: Not a push or pull gesture. If we set a virtual coordinate system based on the center of this gesture's image, the trace crosses three consecutive virtual axes.
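A hedged Python sketch of how these rules could be applied to a completed gesture is shown below. The 5 cm and 40 cm thresholds come from the table; requiring a small X/Y change for "press", the sign conventions, and the omission of the stroke-circle and stroke-arc cases are simplifying assumptions.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) position of the emitter, in cm

def classify_gesture(points: List[Point3D]) -> str:
    """Classify a completed gesture using the rules in the table above.

    The circle/arc cases are omitted for brevity; thresholds follow the
    table, everything else is an illustrative assumption.
    """
    if len(points) < 2:
        return "press"  # a single captured point is treated as a press (assumption)

    xs, ys, zs = zip(*points)
    dx, dy, dz = xs[-1] - xs[0], ys[-1] - ys[0], zs[-1] - zs[0]

    # Press: Z reduced by at most ~5 cm and almost no movement in X/Y (assumption).
    if -5.0 <= dz <= 0.0 and abs(dx) < 5.0 and abs(dy) < 5.0:
        return "press"
    # Push / pull: big change along the Z-axis (more than 40 cm).
    if dz <= -40.0:
        return "push"
    if dz >= 40.0:
        return "pull"

    # Strokes: compare the changes along the X- and Y-axes.
    if abs(dy) > abs(dx):
        return "stroke up" if dy < 0 else "stroke down"
    return "stroke left" if dx < 0 else "stroke right"
```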
— The processing device 104 comprises the following functions:
1. generating menus for display on the display device 101, where each menu displayed on the display device 101 may comprise two or more buttons/choices/options for the user to choose from;
2. storing a mapping between computer-comprehensible instructions and buttons/choices. Taking the menu in Fig. 2 as an example, assume that options 1, 2, 3 and 4 respectively represent the instructions RESTART, LOG OFF, STAND BY and SHUT DOWN. The processing device 104 stores commands for performing these actions, and stores a mapping for this menu;
3. determining which button/choice/option is selected by the user based on a calculation over the user's two consecutive inputs obtained by the two cameras. Details about the determination are described below.
Fig. 5 is a flow chart illustrating a user input method according to the present embodiment.
— At the step 501, a menu having two or more buttons is displayed on the display device 101. The buttons are arranged in such a way that each button is put at a position around the center of the display area for the menu and the positions of all buttons are distinct from each other. In most cases, the display area for the menu is the whole screen of the display device 101, and consequently the center of the display area for the menu is the center of the screen of the display device 101; but in a few cases, for example as shown in Fig. 4I, the display area for the menu is a rectangular area at the right-top of the TV screen. Herein, the reason for making the "positions of all buttons distinct from each other" is to make it easy and accurate for the user to make gesture inputs in the air. Fig. 4A, 4B, 4C, 4D, 4E, 4F, 4G, 4H and 4I show some examples of such arrangements that make the positions of all buttons distinct from each other. It shall be noted that the shapes of the buttons do not matter; the shape can be a circle, rectangle, triangle, square etc. Preferably, the distinct positions comprise left-top, left-center, left-bottom, bottom-center, right-bottom, right-center, right-top and top-center relative to the center point of the display area for the menu. It shall be noted that under some circumstances the display of the menu is redundant; for example, a default menu is set either by the manufacturer or by the user, and the user knows the locations/positions of all buttons of the default menu, so he can perform the input without needing the menu to be displayed on the display device 101.
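For illustration only, such a menu together with the mapping stored by the processing device 104 could be represented as a simple table from distinct positions to instructions. The assignment of the four options of Fig. 2 to particular positions below is an assumption, as are the names; the instructions reuse the RESTART / LOG OFF / STAND BY / SHUT DOWN example given above.

```python
# Hypothetical layout for a four-option menu in the style of Fig. 2:
# every button occupies a distinct position around the center of the
# menu's display area, and each position maps to one
# computer-comprehensible instruction (assumed assignment).
MENU_FIG2 = {
    "left-top": "RESTART",        # option 1 (per the example in step 502 below)
    "right-top": "LOG OFF",       # option 2 (assumed position)
    "left-bottom": "STAND BY",    # option 3 (assumed position)
    "right-bottom": "SHUT DOWN",  # option 4 (assumed position)
}
```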
—At the step 502, a first user input is obtained. For example, the user holds the infrared emitter, with the button pressed, and makes a gesture of stroking from left to right. The purpose of the first input is to help both the user and the processing device 104 obtain information about a center. The center is used for assisting the user to make a second input and assisting the processing device 104 to determine which button the user intends to press. Specifically, the center of the first user input is defined as the reference center, with reference to which the second input is made. For the user, he needs to estimate the center of his first input. For straight line, circle, arc line and dot inputs, it is easy to estimate a rough center. After he estimates the center of his first input, he is able to make reference to the estimated center and make a second gesture (e.g. a press action) in the spatial region corresponding to the button he wants to press. Still taking the menu in Fig. 2 as an example, assuming option 1 is what the user wants to press, the user just needs to make the second gesture input in a spatial region located to the left-top of the estimated center in the vertical plane. Although the center of the first input is only roughly estimated, because the buttons/options are arranged at locations distinct from each other, it is hard for the user to make a wrong input, and consequently the input accuracy can be guaranteed. For the processing device 104, the center of the first input is calculated as the average of all points collected by the two cameras. Here, the coordinates of the center of the first input are obtained as (Xcr, Ycr). Specifically, assume a straight-line gesture is inputted. The start point of a gesture is where the user presses the button of the infrared emitter, and the stop point of the gesture is where the user releases the button of the infrared emitter. During the period of the gesture, each camera collects the raw images of the infrared light output by the infrared emitter. We use the gravity center to represent every infrared point, and store these gravity centers in a buffer. For every gesture, there will be two sets of data collected by the two cameras. We use the camera that collected more data as the source. For example, the left camera records N points in buffer ArrayL, and the right camera records M points in buffer ArrayR. If N is no less than M, then we use ArrayL as the source. The center can be calculated as below:
Xcr = (X1 + X2 + ... + XN) / N, Ycr = (Y1 + Y2 + ... + YN) / N, where (Xi, Yi), i = 1, ..., N, are the gravity centers stored in the source buffer.
Although we use the camera that collected more data as the source in this example, it shall be noted that using a combination of the data from both cameras as the source is also possible.
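A minimal sketch of this center computation is given below, assuming each buffer already holds the per-frame gravity centers produced by the capture step; the function and variable names are illustrative, not from the patent.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # gravity center of one captured frame, in pixels

def reference_center(array_l: List[Point], array_r: List[Point]) -> Point:
    """Compute the reference center (Xcr, Ycr) of the first gesture.

    Following the text above, the buffer of the camera that collected
    more gravity centers is used as the source, and the center is the
    plain average of those points. Combining both buffers is a variant.
    """
    source = array_l if len(array_l) >= len(array_r) else array_r
    if not source:
        raise ValueError("no points were captured for the first gesture")
    xcr = sum(x for x, _ in source) / len(source)
    ycr = sum(y for _, y in source) / len(source)
    return xcr, ycr
```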
— At the step 503, a second user input is obtained following the obtaining of the first input. Herein, in a preferred embodiment, a restriction on time can be added between the obtaining of the first input and the second input, i.e. if the second input is not obtained within a predefined time period, e.g. 5 seconds, after obtaining the first input, the second input is deemed invalid. In this embodiment, the second input is a press action or a pull action. In addition, it shall be noted that the second input can also be a short stroke or a small circle; a long stroke and a big circle are also possible. We use its center for the determination in step 504.
— At the step 504, the processing device 104 determines which button is chosen based on the first input and the second input. To be specific, in this example, the determination is based on the position relationship between the center of the first input and the occurrence position of the second input. Here, we assume a menu having four buttons as shown in Fig. 2 is used. Therefore, we only need to determine which one of the group consisting of left-top, right-top, left-bottom and right-bottom the position relationship is. This can be easily realized in a coordinate system. Assume the camera's coordinate origin (0, 0) is the left-top corner as shown in Fig. 7. After the first input gesture, we get the center (X0, Y0). If the press action happens at (X1, Y1), then based on its position relative to (X0, Y0) we can know which button the press action corresponds to. Taking Fig. 4F as another example, after obtaining the center of the first input and the occurrence position of the second input, the processing device 104 needs to determine which one of the group consisting of left-top, center-top, right-top, right-center, right-bottom, center-bottom, left-bottom and left-center the position relationship is. This can be realized by evenly dividing the area into 9 blocks, wherein the center of the area coincides with the center of the first input.
After the chosen button is determined, the processing device 104 determines and executes the corresponding instruction based on the mapping information between the buttons of the menu and the computer-comprehensible instructions.
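In the four-button case this reduces to a sign comparison between the two points; the sketch below is one possible reading of it, reusing the kind of position-to-instruction mapping illustrated after step 501 (the menu contents and coordinates are assumptions). The eight-position case of Fig. 4F would instead divide the area into nine equal blocks around (X0, Y0).

```python
def chosen_instruction(center, press, menu):
    """Map the second input to a button/instruction (four-button case).

    center: (X0, Y0), reference center of the first gesture.
    press:  (X1, Y1), occurrence position of the second gesture (press).
    menu:   mapping from relative positions to instructions (assumed layout).
    The camera origin (0, 0) is the left-top corner, as in Fig. 7, so a
    smaller Y means "top" and a smaller X means "left".
    """
    x0, y0 = center
    x1, y1 = press
    vertical = "top" if y1 < y0 else "bottom"
    horizontal = "left" if x1 < x0 else "right"
    return menu[f"{horizontal}-{vertical}"]

# Usage with a hypothetical four-button menu in the style of Fig. 2:
menu = {"left-top": "RESTART", "right-top": "LOG OFF",
        "left-bottom": "STAND BY", "right-bottom": "SHUT DOWN"}
print(chosen_instruction((320.0, 240.0), (250.0, 180.0), menu))  # -> RESTART
```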
According to a variant of the present embodiment, the method of the present invention is used for selecting a portion of an image, wherein the image is divided into several selectable portions, and the division of the image into portions can follow the arrangement of the buttons described above.
According to a variant of the present embodiment, the gesture type of the first input is fixed to a particular one, e.g. a left-to-right straight line, or to a particular group of gestures. The merit of using a fixed gesture type is that it allows the input method of the present invention to coexist with traditional gesture input methods, because the fixed gesture type is carefully selected to be distinct from the other gestures already in use and thus indicates that the input method of the present invention will be used.
According to the present embodiment, a computer-comprehensible instruction is generated by two consecutive gesture inputs, i.e. a former input that is used to obtain the reference center and a latter input that is used to locate which button is chosen. However, according to a variant of the present embodiment, the former input does not need to be provided for every instruction. In one variant of the present embodiment, the user wants the processing device to consecutively execute two or more instructions while he stays in the same place, e.g. sitting on a sofa. He can make several gesture inputs, the first of which is used as the former input while the succeeding ones are used as latter inputs. For example, when the user wants the computer to execute 4 instructions, he just needs to make 5 gesture inputs. According to another variant of the present embodiment, the processing device uses the center of the previous gesture input as the reference center when a user makes a gesture input. According to another variant of the present embodiment, the menu is caused to be displayed upon a user's gesture input, for example a pull gesture. We can store the occurrence position of the pull gesture input as the reference center during the life-span of this menu. Therefore, during the life-span of the menu, only one gesture input is enough to cause the processing device to generate an instruction for execution.
According to a variant of the present embodiment, a prompt indicating that the second user input is expected is displayed on the screen of the display device after the step 502. In one example, the prompt may be displayed at the top-center of the screen.
According to a variant of the present embodiment, the input is not limited to the infrared emitter; it can be extended to use only a gesture recognition method. According to a variant of the present embodiment, the capture is not limited to stereo cameras; it can be extended to use only one camera that has a built-in depth sensor.
The principle of the present invention is to use the relative positions of the press gesture and the other gesture's center to determine which event is triggered. It is appreciated that a person skilled in the art can contemplate other variants or implementations after reading the description, and these variants and implementations shall fall within the scope of the principle of the present invention.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations shall fall in the scope of the invention.

Claims

1. A method for identifying user inputs in a system, characterized in that said system comprises a display device and at least one camera for capturing user inputs, and in that in response to a set of at least two explicit or implicit choices arranged in a first portion of display area of said display device, the method comprising the steps of
obtaining a first user input;
obtaining a second user input; and
determining which choice among the set of the at least two explicit or implicit choices is chosen based on the position relationship between said first user input and said second user input.
2. The method of the claim 1 , characterized in that the determination step further comprises
determining a first center and a second center for said first user input and said second user input separately; and
determining which choice is chosen based on the position relationship between the first center and the second center.
3. The method of the claim 2, characterized in that the position relationship is one of a positioning group comprising left-top, left-center, left-bottom, right-top, right-center, right- bottom, top-center and bottom-center.
4. The method of the claim 3, characterized in that there exists a mapping indicating the position for each of said set of at least two explicit or implicit choices, wherein, each choice corresponds to at least one position of said positioning group, and none of said positioning group corresponds to two or more choices.
5. The method of the claim 1 , characterized in that after the determination step another set of at least two choices is displayed in a second portion of display area of said display device, wherein, the method further comprises obtaining a third user input; and
determining which choice among said another set of at least two choices is chosen based on position relationship between said first user input and said third user input.
6. The method of the claim 1, characterized in that the choices are implicit, and the user knows how the at least two choices are arranged in said first portion before making his inputs.
7. The method of the claim 1 , characterized in that the choices are explicit, and the method further comprises displaying said set of at least two explicit choices in said first portion of display area of said display device.
8. The method of the claim 1, characterized in that the choice is a button that corresponds to a particular instruction, and the method further comprises executing the instruction corresponding to the chosen button after the determination.
9. The method of the claim 1 , characterized in that the choice is a portion of an image for user to select.
10. The method of any of claims 1 to 9, characterized in that the user inputs comprise stroke, circle, arc, rectangle, triangle, square, press, pull and push.
11. A device for identifying user inputs, characterized by comprising
a communication module configured to receive user inputs captured by at least one camera; and
a processing module configured to, in response to a set of at least two explicit or implicit choices arranged in a first portion of display area of a display device, determine which choice is chosen based on position relationship between a first user input and a second user input, wherein said first user input and said second user input are obtained through said communication module.
PCT/CN2010/000593 2010-04-29 2010-04-29 Method and device for identifying user inputs WO2011134106A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/000593 WO2011134106A1 (en) 2010-04-29 2010-04-29 Method and device for identifying user inputs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/000593 WO2011134106A1 (en) 2010-04-29 2010-04-29 Method and device for identifying user inputs

Publications (2)

Publication Number Publication Date
WO2011134106A1 true WO2011134106A1 (en) 2011-11-03
WO2011134106A8 WO2011134106A8 (en) 2012-01-19

Family

ID=44860729

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/000593 WO2011134106A1 (en) 2010-04-29 2010-04-29 Method and device for identifying user inputs

Country Status (1)

Country Link
WO (1) WO2011134106A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0981362A (en) * 1995-09-12 1997-03-28 Casio Comput Co Ltd Data input device
JP2006139615A (en) * 2004-11-12 2006-06-01 Access Co Ltd Display device, menu display program, and tab display program
US20090289904A1 (en) * 2008-05-20 2009-11-26 Tae Jin Park Electronic device with touch device and method of executing functions thereof


Also Published As

Publication number Publication date
WO2011134106A8 (en) 2012-01-19

Similar Documents

Publication Publication Date Title
US9329714B2 (en) Input device, input assistance method, and program
JP6390799B2 (en) Input device, input method, and program
WO2019033957A1 (en) Interaction position determination method and system, storage medium and smart terminal
WO2011142317A1 (en) Gesture recognition device, method, program, and computer-readable medium upon which program is stored
EP2040156A2 (en) Image processing
JP6344530B2 (en) Input device, input method, and program
JP6062416B2 (en) Information input device and information display method
US9804667B2 (en) Electronic apparatus
US10713488B2 (en) Inspection spot output apparatus, control method, and storage medium
US20140132725A1 (en) Electronic device and method for determining depth of 3d object image in a 3d environment image
US11199946B2 (en) Information processing apparatus, control method, and program
US9400575B1 (en) Finger detection for element selection
JP5627314B2 (en) Information processing device
KR101807516B1 (en) Apparatus And Method Controlling Digital Device By Recognizing Motion
KR101321274B1 (en) Virtual touch apparatus without pointer on the screen using two cameras and light source
WO2021004413A1 (en) Handheld input device and blanking control method and apparatus for indication icon of handheld input device
US9489077B2 (en) Optical touch panel system, optical sensing module, and operation method thereof
EP3088991B1 (en) Wearable device and method for enabling user interaction
JP5646532B2 (en) Operation input device, operation input method, and program
WO2011134106A1 (en) Method and device for identifying user inputs
JP5080409B2 (en) Information terminal equipment
JP6686319B2 (en) Image projection device and image display system
JP2018097443A (en) Input system and input program
KR101272458B1 (en) virtual touch apparatus and method without pointer on the screen
JP5645530B2 (en) Information processing apparatus and control method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10850439

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10850439

Country of ref document: EP

Kind code of ref document: A1