WO2021134795A1 - Handwriting recognition of hand motion without physical media - Google Patents

Handwriting recognition of hand motion without physical media Download PDF

Info

Publication number
WO2021134795A1
WO2021134795A1 (application PCT/CN2020/070329)
Authority
WO
WIPO (PCT)
Prior art keywords
data set
dimensional
tracking point
stylus
physical object
Prior art date
Application number
PCT/CN2020/070329
Other languages
French (fr)
Inventor
Wenhui Jia
Zhenhua LAI
Elizabeth T. EDWARD
Ye Chen
Original Assignee
Byton Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Byton Limited
Priority to PCT/CN2020/070329
Publication of WO2021134795A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60KARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K35/00Instruments specially adapted for vehicles; Arrangement of instruments in or on vehicles
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60KARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K35/00Instruments specially adapted for vehicles; Arrangement of instruments in or on vehicles
    • B60K35/10Input arrangements, i.e. from user to vehicle, associated with vehicle functions or specially adapted therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304Detection arrangements using opto-electronic means
    • G06F3/0325Detection arrangements using opto-electronic means using a plurality of light emitters or reflectors or a plurality of detectors forming a reference frame from which to derive the orientation of the object, e.g. by triangulation or on the basis of reference deformation in the picked up image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/0354Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of 2D relative movements between the device, or an operating part thereof, and a plane or surface, e.g. 2D mice, trackballs, pens or pucks
    • G06F3/03545Pens or stylus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/22Character recognition characterised by the type of writing
    • G06V30/226Character recognition characterised by the type of writing of cursive writing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60KARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K2360/00Indexing scheme associated with groups B60K35/00 or B60K37/00 relating to details of instruments or dashboards
    • B60K2360/146Instrument input by gesture
    • B60K2360/14643D-gesture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/038Indexing scheme relating to G06F3/038
    • G06F2203/0381Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer

Definitions

  • the disclosed embodiments relate generally to handwriting recognition and in particular, but not exclusively, to handwriting recognition that recognizes hand motions made in the air rather than writing on a physical medium.
  • Fig. 1 illustrates an embodiment 100 of traditional handwriting.
  • a hand 102 grasps a stylus 104 and positions it so that its tip 106 is in contact with a writing medium 108. While grasping stylus 104 this way, hand 102 moves tip 106 across the medium 108 to form drawings, characters (e.g., Chinese, Japanese, or Korean characters), letters and numbers, symbols, or other things on the medium.
  • embodiments in which stylus 104 is a pen or pencil and medium 108 is paper are essentially the way handwriting has been done for centuries.
  • handwriting has begun to be captured electronically.
  • This electronic capture usually takes one of two forms.
  • handwriting is written on paper the traditional way and the paper is then electronically scanned.
  • the electronic form of the writing can either retain the form of the original handwriting (e.g., a bitmap or image of the paper) , or can be subjected to techniques such as optical character recognition (OCR) to turn the handwriting into computer-editable letters or words.
  • medium 108 can be an electronic device such as a display or tablet
  • stylus 104 can be a special stylus whose tip 106 is moved over the surface of the tablet or display.
  • Some tablets dispense with the need for a separate stylus; in these tablets the writer can instead use a fingertip, such as tip 110 of their index finger, to write directly on the tablet surface.
  • this form of electronic handwriting directly generates electronic versions of handwritten letters, numbers, characters, or drawings without the need for first putting them on paper.
  • the directly-generated electronic handwriting can either retain its original handwritten form (e.g., a bitmap or image) , or can be subjected to techniques such as optical character recognition (OCR) to turn the writing into computer-editable data.
  • Embodiments are disclosed of an apparatus for handwriting recognition.
  • the apparatus includes a camera or an array of cameras capable of capturing three-dimensional images.
  • a computer includes a communication interface, a processor coupled to the communication interface, and memory and storage coupled to the processor.
  • the computer's communication interface is communicatively coupled to the camera or array of cameras.
  • a trace processing module runs on the computer, where it executes instructions that cause the computer to use the camera or the array of cameras to capture a raw data set of three-dimensional coordinates of a moving tracking point on a physical object.
  • the raw data set represents a trace of the movement of the tracking point through three-dimensional space.
  • the trace processing module forms a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates. It then forms a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, where the projected data set represents a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface. And the trace processing module passes the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
  • Embodiments are disclosed of a system including a vehicle and a handwriting system positioned in the vehicle.
  • the handwriting system includes a camera or an array of cameras capable of capturing three-dimensional images.
  • a computer includes a communication interface, a processor coupled to the communication interface, and memory and storage coupled to the processor.
  • the communication interface is communicatively coupled to the camera or array of cameras.
  • a trace processing module runs on the computer, where it executes instructions that cause the computer to use the camera or the array of cameras to capture a raw data set of three-dimensional coordinates of a moving tracking point on a physical object.
  • the raw data set represents a trace of the movement of the tracking point through three-dimensional space.
  • the trace processing module forms a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates. It then forms a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, where the projected data set represents a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface. And the trace processing module passes the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
  • One or more vehicle systems are coupled via the communication interface to the handwriting system so that a letter or word recognized by the handwriting recognition module can be transmitted as an input to one or more vehicle systems.
  • Embodiments are disclosed of a process for handwriting recognition.
  • the process includes capturing a raw data set of three-dimensional coordinates of a moving tracking point on a physical object.
  • the raw data set represents a trace of the movement of the tracking point through three-dimensional space.
  • the process also includes forming a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates; forming a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, the projected data set representing a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface; and transmitting the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
  • Fig. 1 is a view of an embodiment of prior art handwriting on a physical medium.
  • Fig. 2 is a block diagram of an embodiment of a system for recognizing handwriting without use of, or contact with, a physical medium.
  • Figs. 3A–3B are diagrams illustrating different embodiments of physical objects and their potential tracking points.
  • Figs. 4A–4B are diagrams illustrating embodiments of using perspective projection to determine an imaging surface.
  • Fig. 5 is a flowchart illustrating an embodiment of the operation of the handwriting system of Fig. 2.
  • Fig. 6 is a flowchart illustrating another embodiment of the operation of the handwriting system of Fig. 2.
  • Fig. 7 is a flowchart illustrating another embodiment of the operation of the handwriting system of Fig. 2.
  • Fig. 8 is a diagram showing an embodiment of the handwriting system of Fig. 2 implemented in a vehicle.
  • Embodiments of an apparatus, system, and process are described below for recognizing handwriting implemented by three-dimensional motion of a hand or other physical object through the air.
  • the disclosed handwriting recognition requires no two-dimensional physical medium on which to write.
  • the embodiments capture a raw data set of three-dimensional coordinates of a moving tracking point on a physical object.
  • the raw data set thus represents a trace of the movement of the tracking point through three-dimensional space.
  • a reduced data set of three-dimensional coordinates is then formed by removing outliers from the raw data set.
  • a projected data set is then formed by projecting the reduced data set of three-dimensional coordinates onto an imaging surface.
  • the projected data set represents a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface.
  • the projected data set is then transmitted to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
  • Fig. 2 illustrates an embodiment of a handwriting system 200 that neither needs nor requires any medium, two-dimensional or otherwise, on which to write; handwriting system 200 can be described as a system for “writing in the air.”
  • System 200 includes a computer 202 having a processor 205 coupled to a communication interface 206 through which the computer can exchange commands and data with one or more external sensors such as cameras 204.
  • computer 202 also includes other components typically associated with a computer, such as memory and storage.
  • Communication interface 206 also couples computer 202 to a user display 218 as well as sensing elements such as microphone 220, stylus 222, and physical button 224.
  • Communication interface 206 also couples computer 202 to various further systems such as systems 1–3.
  • Sensors are used to capture the three-dimensional trace, or trajectory, of a tracking point on a physical object.
  • Different embodiments of system 200 can use different types of sensors, and different numbers of sensors, to capture the three-dimensional trace of the tracking point.
  • the sensors are cameras 204, which can include an array of three individual cameras 204a–204c, but other embodiments can use more or fewer cameras than shown. Some embodiments can use a single camera, camera 204b for instance, depending on the type of camera used. Each camera has a field of view, shown cross-hatched in the figure; when there is an array of multiple cameras, the field of view is defined by the overlapping fields of view of the multiple cameras.
  • the one or more cameras 204 are positioned so that a tracking point on a physical object is within the fields of view of however many cameras are needed to capture three-dimensional data.
  • three-dimensional data can be captured by a single camera, for instance if the camera is a time-of-flight (TOF) camera or a structured-light camera that can capture three-dimensional information on its own.
  • each individual camera can only capture two-dimensional information
  • multiple two-dimensional cameras can be used to form an array of cameras that simulate a stereoscopic camera that can capture three-dimensional images.
  • the cameras can all be positioned on the same plane to simplify the camera design, as well as to simplify any required parallax calculations.
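  • As a hedged illustration of why coplanar two-dimensional cameras simplify the parallax calculation, the sketch below recovers the depth of a tracking point from the disparity between an idealized, rectified stereo pair; the focal length and baseline values are hypothetical and not taken from the patent.

```python
def depth_from_disparity(x_left: float, x_right: float,
                         f: float = 800.0, b: float = 0.06) -> float:
    """Depth Z (meters) of a point seen at horizontal pixel coordinates
    x_left and x_right by two coplanar, rectified cameras with focal
    length f (pixels) and baseline b (meters)."""
    disparity = x_left - x_right        # larger disparity means a closer point
    if disparity <= 0:
        raise ValueError("point must lie in front of both cameras")
    return f * b / disparity            # pinhole stereo relation Z = f * b / d

# Example: a fingertip imaged 12 pixels apart in the two views is about 4 m away.
print(depth_from_disparity(512.0, 500.0))
```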
  • the physical object is a human hand 201 and the tracking point, that is, the point on hand 201 whose trace or trajectory will be tracked by the cameras, is the tip 203 of the index finger.
  • tracking point 203 can move three-dimensionally without restriction, as long as it remains within the field of view and depth of field of the camera or camera array.
  • cameras 204, whether a single 3-D camera or an array of multiple cameras that together provide 3-D information, can generate a set of three-dimensional coordinates for each point in the trace or trajectory of tracking point 203 as it moves three-dimensionally within the field of view and depth of field of the cameras.
  • the three-dimensional trace can be characterized by a set of three-dimensional coordinates indicating the position of the tracking point at different times.
  • each point in the tracking point's trace can be characterized by a set of three coordinates: (x, y, z) in a Cartesian coordinate system, (r, θ, z) in a cylindrical coordinate system, and so on.
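  • As a minimal sketch of how such a timestamped trace point might be represented (the class and field names below are illustrative, not from the patent), a cylindrical sample can be converted to Cartesian coordinates as follows:

```python
import math
from dataclasses import dataclass

@dataclass
class TraceSample:
    """One timestamped point in the tracking point's 3-D trace (illustrative)."""
    t: float  # capture time in seconds
    x: float
    y: float
    z: float

def from_cylindrical(t: float, r: float, theta: float, z: float) -> TraceSample:
    """Convert an (r, theta, z) cylindrical sample to Cartesian coordinates."""
    return TraceSample(t=t, x=r * math.cos(theta), y=r * math.sin(theta), z=z)

# Two successive samples of a fingertip trace, about 30 ms apart.
trace = [from_cylindrical(0.00, 0.40, 0.10, 0.55),
         from_cylindrical(0.03, 0.41, 0.12, 0.55)]
```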
  • one or both of the start and end triggers can be sounds-for instance, specific words or phrases (e.g., “start writing” and “end writing” ) uttered by the user or someone else-picked up by microphone 220 and processed for recognition by sound recognition module 210.
  • the start trigger and end trigger sounds can be the same sound, but in another embodiment the start trigger and the end trigger need not be the same sound.
  • one or both of the start and end triggers can be the activation of button 224; button 224 can be a physical button activated by the user by physically pressing it, or it can be a virtual button displayed on a screen and activated by touching the screen or by pointing at the virtual button on the screen with a finger or a stylus.
  • one or both of the start and end triggers can be a wired or wireless electronic signal created by pressing a button on stylus 222 which can include a transmitter to transmit the wired or wireless signal to transceiver 208.
  • one or both of the start and end triggers can be contextual. For instance, if the user invokes a function, or a sub-function within another function, that uses text as input-e.g., text entry of addresses in a vehicle navigation system-then invoking the function or sub-function (i.e., the context) automatically triggers the start of 3-D data collection and terminating the function triggers the end of 3-D data collection.
  • Other embodiments can also mix any of the above start and end triggers; one embodiment, for instance, might use a contextual start trigger and a button, physical or virtual, as an end trigger.
  • Processor 205 is also coupled to communication interface 206 and can run various modules such as a sound recognition module 210, a gesture recognition module 212, a trace/image processing module 214, and a handwriting recognition module 216.
  • Modules 210–216 perform different functions during operation of system 200. Among other things:
  • sound recognition module 210 can be used to recognize one or more sounds received from microphone 220 signaling the start and end of data collection.
  • gesture recognition module 212 can be used to identify the tracking point on the human hand, such as a fingertip in one embodiment. In embodiments where the physical object is a human hand, gesture recognition module 212 can also be used to recognize gestures, motions, or transitions from one gesture to another, that trigger the start and end of image collection by cameras 204.
  • Software to implement gesture recognition is commercially available from vendors such as Sony Depthsensing Solutions (formerly Softkinetic) of Belgium, a subsidiary of Sony Corp. of Tokyo, Japan.
  • Trace/image processing module 214 processes the raw three-dimensional data set received from cameras 204 that tracks the trace or trajectory of tracking point 203 as it moves through three-dimensional space.
  • trace/image processing module 214 can create a reduced three-dimensional data set by filtering the raw three-dimensional data to remove outliers, and can form a projected data set by performing a perspective projection of the reduced 3-D data set onto an imaging surface.
  • the operation of trace/image processing module 214 is further described below in connection with Figs. 4A–7.
  • the projected data set can be passed to handwriting recognition module 216 so that the letter, number, character, symbol, etc., traced out three-dimensionally by tracking point 203 can be recognized and turned into electronic form.
  • Software to implement the handwriting recognition module is commercially available from vendors such as MyScript of France, or Jietong (Beijing Jietong Huasheng Technology Co., Ltd. ) of Beijing, China.
  • computer 202 can include more than one processor, and the different modules shown can run on different processors or combinations of processors, including each module running on its own processor.
  • a flat or curved user display 218 is also coupled to computer 202 via communication interface 206.
  • User display 218 provides real-time or near-real-time feedback to the user as cameras 204 capture the trajectory of tracking point 203. For instance, a user will move tracking point 203 in such a way as to spell out a letter, word, character, etc. As the user moves the tracking point through space and cameras 204 pick up its three-dimensional trajectory, a two-dimensional projection of the trajectory can be shown on user display 218 to provide feedback.
  • the physical object can be a stylus and the tracking point can be the tip of the stylus (see Fig. 3B) .
  • Systems 1–3 can be further systems coupled to computer 202 so that they can be controlled by input created by recognition of the handwriting spelled out by tracking point 203.
  • systems 1–3 can be one or more of a navigation system, a stereo system, a messaging system, a car status and maintenance system, and the like.
  • Figs. 3A–3B illustrate embodiments of potential physical objects and tracking points.
  • Fig. 3A illustrates an embodiment in which the physical object is a human hand 300 and the tracking point is the tip 302 of the hand’s index finger.
  • other tracking points on hand 300 can be used, for instance the tips of other fingers besides the index finger, a knuckle 304, or some other point on the hand.
  • Fig. 3B illustrates an embodiment in which the physical object is a stylus 352 and the tracking point is the stylus tip 354.
  • tracking point 354 can be recognized by the system using a known pattern on tracking point 354.
  • the stylus 352 can have a transmitter (not shown) and have a button 356 which, when pressed, activates the transmitter to transmit a signal to transceiver 208 to trigger the start and end of a collection cycle-that is, the time during which trace or trajectory data for the tracking point is collected by cameras 204.
  • Fig. 4A illustrates an embodiment of perspective projection 400.
  • the perspective projection is illustrated for an embodiment with a single camera 204b such as a time-of-flight camera, but the projection would be similar for an embodiment with an array of two or more cameras.
  • camera 204b has an optical axis 401.
  • as tracking point 203 traces letters, numbers, symbols, characters, etc., in the air, it would ideally be confined to a two-dimensional plane normal to optical axis 401, such as plane 402.
  • Plane 402 can be characterized by its normal vector n1; saying that plane 402 is normal to optical axis 401 is equivalent to saying that normal vector n1 is parallel to the optical axis. If that were to happen, camera 204 could simply capture two-dimensional data of the trace of tracking point 203 in plane 402 and directly use that 2-D data for handwriting recognition.
  • each point 404 can be characterized by a set of three coordinates: (x, y, z) in a Cartesian coordinate system, (r, θ, z) in a cylindrical coordinate system, etc. Together, the data set of points 404 forms what is sometimes known as a “point cloud.”
  • imaging plane 402 might not be the best plane for obtaining a two-dimensional representation of the point cloud that can then be used for handwriting recognition. It can therefore be beneficial to compute a different imaging surface onto which points 404 can be projected so that a better two-dimensional representation can be obtained.
  • the imaging surface is an imaging plane 406 that is then equivalent to the imaging plane of a virtual camera 408 that is in a different position than actual camera 204b; that is, normal vector n2 of imaging plane 406 is parallel to optical axis 410 of virtual camera 408.
  • Imaging plane 406 can be rotated, translated, or both rotated and translated, relative to imaging plane 402. This is known in the computer science and computer graphics arts as a perspective projection.
  • Imaging plane 406, as characterized for instance by its normal vector n2, can be computed or determined differently in different embodiments.
  • imaging plane 406 can be computed by applying a least-squares fit to the points in the data set of points 404.
  • other embodiments can use other methods, such as singular value decomposition or principal component analysis.
  • Still other embodiments can use a K-means clustering method to define three centroids of the data set of points 404. Because three points define a plane, the three centroids then define an imaging plane. Methods such as K-means clustering are less computationally demanding than some of the other methods, so that K-means clustering can be useful in an iterative determination of the imaging plane where the calculation is repeated several times.
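  • As a hedged sketch (not the patent's own code) of two such approaches, the functions below derive an imaging plane from the point cloud: the first by a least-squares/PCA fit using the singular value decomposition, the second by a small K-means run whose three centroids define the plane; the numpy-based implementation and all names are illustrative assumptions.

```python
import numpy as np

def fit_plane_svd(points: np.ndarray):
    """Least-squares / PCA plane fit for an (N, 3) point cloud.
    Returns (centroid, unit normal); the normal is the direction of
    least variance, i.e. the last right singular vector."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

def fit_plane_three_centroids(points: np.ndarray, iters: int = 10, seed: int = 0):
    """Cheaper alternative: run a tiny K-means with k = 3 and use the three
    centroids (three points determine a plane) to define the imaging plane."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), 3, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((points[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([points[labels == k].mean(axis=0)
                            if np.any(labels == k) else centers[k]
                            for k in range(3)])
    normal = np.cross(centers[1] - centers[0], centers[2] - centers[0])
    return centers.mean(axis=0), normal / np.linalg.norm(normal)
```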
  • Fig. 4B illustrates another embodiment of perspective projection 450.
  • Perspective projection 450 is in most respects similar to perspective projection 400; the primary difference is the shape of the imaging surface.
  • the imaging surface is a plane 406.
  • the imaging surface need not be a plane but can instead be an arbitrary surface 452, with the shape of surface 452 selected depending on the application. For instance, if user display 218 is curved a user might naturally be inclined to follow the curvature of the display with their finger. As a result, it might be advantageous, and might result in a better handwriting recognition and display quality, if the imaging surface is curved like imaging surface 452 instead of planar like imaging surface 406.
  • a curved imaging surface 452 can be computed by fitting a surface of a prescribed mathematical form to the data set of points 404. As with imaging plane 406, imaging surface 452 can be rotated, translated, or both rotated and translated, relative to imaging plane 402. Because imaging surface 452 is not planar it has multiple normal vectors n2, such that imaging surface 452 is equivalent to the imaging planes of multiple virtual cameras 408, all except one of which are in different positions than actual camera 204b.
  • 2-D data can be extracted from non-planar imaging surface 452 by, for example, mapping the projections of points 404 onto surface 452 using a surface-conforming coordinate system. But in embodiments using a handwriting recognition program that can accept 3-D input, the projections of points 404 onto surface 452 could be used directly without the need for further mapping into a two-dimensional coordinate system such as a surface-conforming coordinate system. With handwriting recognition programs that accept 3-D input it might also be possible to skip projection onto surface 452, or any other surface, entirely and simply use the raw 3-D data set.
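  • One hedged way to fit such a curved imaging surface of a prescribed mathematical form, assuming a low-order polynomial in the camera's x-y coordinates, is a linear least-squares fit like the sketch below; the quadratic form is an assumed choice, since the patent leaves the surface's form open.

```python
import numpy as np

def fit_quadratic_surface(points: np.ndarray) -> np.ndarray:
    """Fit z = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2 to an (N, 3)
    point cloud by linear least squares and return the six coefficients."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    design = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coeffs
```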
  • Fig. 5 illustrates an embodiment of a process 500 for using handwriting system 200.
  • the process begins at block 502.
  • the process listens or watches for a start trigger indicating the beginning of data collection.
  • the process inquires whether a start trigger has been detected. If at block 506 no start trigger has been detected the process returns to block 504. But if at block 506 a start trigger has been detected the process proceeds to block 508.
  • the process determines the location of the tracking point on the physical object.
  • determination of the tracking point can be done by a gesture recognition module.
  • the tracking point can be a fingertip of the human hand.
  • a user can wear an item on their hand, such as a glove, ring, or other item with a specific material, pattern, light, or color that the process can identify as the tracking point.
  • the tracking point can be identified in one embodiment by putting a known pattern on the stylus tip so that the system will recognize its location.
  • the process then moves to block 510 where it forms a raw data set by collecting 3-D tracking data for the three-dimensional movement of the tracking point.
  • the raw data set includes three-dimensional coordinates of the trace or trajectory of the tracking point in three-dimensional space.
  • the 3-D tracking point data collected at block 510 can be projected onto a fixed two-dimensional plane, and at block 514 the projected two-dimensional data is shown on a display (see Fig. 2) that is visible to the user.
  • the process moves on to block 516 where it determines the projection mode that has been selected.
  • the process has three projection modes: an adaptive online mode, an adaptive off-line mode, and a fixed mode.
  • the adaptive online mode is described below in connection with Fig. 6; the adaptive off-line mode is described below in connection with Fig. 7.
  • if at block 516 the process determines that the fixed projection mode has been selected, the process proceeds to block 518 where it queries whether an end trigger has been detected, signaling the end of collection of 3-D tracking point data. If at block 518 an end trigger has not been detected, the process returns to block 510 where it continues collecting 3-D tracking point data for the raw data set. But if at block 518 an end trigger has been detected, the process proceeds to block 520.
  • the process creates a reduced data set by filtering outliers from the raw data set collected at block 510.
  • the reduced data set contains three-dimensional data from which the outliers have been removed or subtracted.
  • the 3-D tracking point data in the raw data set can include data that is not part of what the user is trying to handwrite and is therefore irrelevant or not useful for handwriting recognition. For example, between the start trigger and the beginning of an actual letter traced by the tracking point there might be some irrelevant hand motion that in turn creates irrelevant motion of the tracking point. And there can also be tracking point motion that creates irrelevant data between the end of what is being written and the end trigger.
  • these outliers can be filtered or removed from the raw data set by excluding data from a time period at the beginning of the collection cycle and a time period at the end of the collection cycle.
  • the first 10–40 ms of data from the beginning of the raw data set and the last 10–40 ms of data from the end of the raw data set can be removed.
  • the removal periods at the beginning and end of the raw data set can have, but need not have, the same length.
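  • A hedged sketch of this time-window filtering is shown below; the helper name and the 25 ms default are illustrative (the 10–40 ms windows above are one embodiment), and samples are assumed to be time-sorted (t, x, y, z) tuples.

```python
def trim_collection_edges(samples, head_ms: float = 25.0, tail_ms: float = 25.0):
    """Drop samples captured within head_ms of the start trigger and within
    tail_ms of the end trigger; `samples` is a time-sorted list of
    (t_seconds, x, y, z) tuples."""
    if not samples:
        return []
    t_start, t_end = samples[0][0], samples[-1][0]
    return [s for s in samples
            if t_start + head_ms / 1000.0 <= s[0] <= t_end - tail_ms / 1000.0]
```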
  • the process creates a projected data set containing two-dimensional data resulting from projection of the reduced data set produced at block 520 onto a fixed two-dimensional imaging plane.
  • the projected data set contains two-dimensional data instead of three-dimensional data.
  • the two-dimensional imaging plane onto which the three-dimensional data from the reduced data set is projected is fixed and known in advance.
  • the fixed two-dimensional imaging plane can be a plane normal to the optical axis of the camera, or normal to the optical axis of one of the cameras in an embodiment with a camera array.
  • the fixed two-dimensional imaging plane can be a plane substantially parallel to the plane of the user display.
  • the imaging plane used at block 522 can be the same imaging plane used at block 512, but in other embodiments it need not be the same imaging plane.
  • the projected data set will generally be a set of two-dimensional coordinate pairs, which can be expressed as physical spatial coordinates or as camera pixel coordinates.
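  • As a hedged sketch of producing those two-dimensional coordinate pairs, the function below projects the reduced data set onto a plane defined by an origin and unit normal and expresses the result in an in-plane basis; it uses a simple orthographic projection rather than a full perspective projection through a virtual camera center, and all names are illustrative.

```python
import numpy as np

def project_to_plane_2d(points: np.ndarray, origin: np.ndarray,
                        normal: np.ndarray) -> np.ndarray:
    """Project an (N, 3) reduced data set onto the plane through `origin`
    with unit `normal`, returning (N, 2) in-plane coordinates."""
    normal = normal / np.linalg.norm(normal)
    # Build an orthonormal in-plane basis (u, v) perpendicular to the normal.
    helper = np.array([1.0, 0.0, 0.0])
    if abs(normal @ helper) > 0.9:
        helper = np.array([0.0, 1.0, 0.0])
    u = np.cross(normal, helper)
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    rel = points - origin
    return np.column_stack([rel @ u, rel @ v])
```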
  • the process smooths the projected 2-D data in the projected data set and at block 525 the process can transform and/or format the smoothed 2-D data so that it complies with the format required by the handwriting recognition engine. For example, some handwriting recognition engines might set the 2-D origin differently from the 2-D projection described, meaning that the data must be transformed or formatted to follow the convention.
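  • The smoothing and reformatting step might look like the sketch below, which applies a simple moving average and then shifts the trace into a top-left-origin, y-down pixel-style convention; both the filter choice and the assumed target convention are illustrative, since the required format depends on the handwriting recognition engine.

```python
import numpy as np

def smooth_and_format(points_2d: np.ndarray, window: int = 5) -> np.ndarray:
    """Smooth an (N, 2) projected trace with a moving average, then map it
    into an assumed top-left-origin, y-down coordinate convention."""
    kernel = np.ones(window) / window
    smoothed = np.column_stack([np.convolve(points_2d[:, 0], kernel, mode="same"),
                                np.convolve(points_2d[:, 1], kernel, mode="same")])
    shifted = smoothed - smoothed.min(axis=0)             # origin at the corner
    shifted[:, 1] = shifted[:, 1].max() - shifted[:, 1]   # flip the y-axis downward
    return shifted
```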
  • the process transmits the smoothed and transformed/formatted two-dimensional data to a handwriting recognition module for recognition.
  • the handwriting recognition module analyzes the data received from block 526 to recognize the handwriting.
  • the process transmits the recognized handwriting as an input or command to a further system (e.g., systems 1-3 in Fig. 2) and at block 532 the process shows the recognized handwriting on the user display. From block 530 the process then returns to block 504, where it listens or watches for a start trigger indicating the beginning of another collection cycle.
  • Fig. 6 illustrates an embodiment of an adaptive online perspective projection process 600 that can be used with process 500.
  • Projection process 600 begins at block 516 (see Fig. 5) when block 516 determines that adaptive online projection is the projection mode selected.
  • Adaptive online projection process 600 is essentially an iterative real-time or near-real-time process for determining an imaging surface and projecting the 3-D data from the reduced data set onto the determined imaging surface.
  • the process creates a reduced data set from the raw data set by filtering or removing outliers from the 3-D tracking point data collected so far at block 510.
  • block 602 can filter or remove outliers as described above for block 520, but in other embodiments block 602 need not use the same outlier filtering as block 520.
  • the process calculates an imaging surface from the reduced data set, i.e., from the existing 3-D tracking point data minus outliers. As discussed above for Figs. 4A–4B, if the imaging surface will be planar it can be determined, in one embodiment, by applying a least-squares fit to the reduced set of 3-D tracking point data. The iterative nature of this process means that the imaging surface computation might have to be done many times, in which case a less computationally demanding way of computing an imaging plane, such as K-means, can be used.
  • the process checks whether an end trigger has been detected, signaling the end of collection of 3-D tracking point data at block 510. If at block 606 no end trigger has been detected the process moves to block 608, which updates the raw data set of 3-D tracking point data to include newly collected data, and then returns to block 604 where it computes a new projection plane that accounts for the most recently collected 3-D tracking point data. But if an end trigger is detected at block 606, the process moves to block 610 where it outputs the final adaptive imaging plane.
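  • A hedged sketch of this iterative loop is shown below; the stream, trigger, and plane-fitting callables are illustrative stand-ins (for instance, the K-means-based fit sketched earlier could serve as `fit_plane`), and real outlier filtering would replace the placeholder.

```python
import numpy as np

def adaptive_online_plane(sample_batches, end_trigger, fit_plane):
    """Re-fit the imaging plane each time new 3-D samples arrive and return
    the final plane once the end trigger fires. `sample_batches` yields lists
    of (x, y, z) samples, `end_trigger()` returns True when collection stops,
    and `fit_plane` maps an (N, 3) array to a plane estimate."""
    raw = []
    plane = None
    for batch in sample_batches:
        raw.extend(batch)
        reduced = raw  # placeholder: outlier filtering would go here
        if len(reduced) >= 3:
            plane = fit_plane(np.asarray(reduced, dtype=float))
        if end_trigger():
            break
    return plane
```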
  • the process creates a projected data set containing two-dimensional data by projecting the reduced data set produced at block 602 onto the two-dimensional plane determined at block 610.
  • the projected data set will generally be a set of two-dimensional coordinate pairs, which can be expressed as physical spatial coordinates or as camera pixel coordinates.
  • the process smooths the 2-D data in the projected data set and at block 615 the process can transform and/or format the smoothed 2-D data so that it complies with the format required by the handwriting recognition engine. For example, some handwriting recognition engines might set the 2-D origin differently from the 2-D projection described, meaning that the data must be transformed or formatted to follow the convention.
  • the process transmits the smoothed and transformed/formatted two-dimensional data to a handwriting recognition module for recognition.
  • the handwriting recognition module analyzes the data received from block 616 to recognize the handwriting.
  • the process transmits the recognized handwriting as an input or command to a further system (e.g., systems 1–3 in Fig. 2) and at block 624 also shows the recognized handwriting on the user display. From block 620 the process then moves to block 622, which returns the process to block 504 (see Fig. 5) to listen or watch for a start trigger indicating the beginning of another data collection cycle.
  • Fig. 7 illustrates an embodiment of an adaptive offline perspective projection process 700 that can be used with process 500.
  • process 700 waits until all the 3-D trace data is collected, then does a single computation to determine the imaging plane. This is in contrast to process 600, which iteratively calculates the imaging plane as the 3-D data is collected instead of waiting until the raw data set is complete.
  • Projection process 700 begins at block 516 (see Fig. 5) when block 516 determines that adaptive offline projection is the projection mode selected.
  • the process checks whether an end trigger has been detected, signaling the end of collection of 3-D tracking point data at block 510. If at block 702 no end trigger has been detected the process returns to block 510 where it continues to collect trace data. But if at block 702 the process detects an end trigger, the process moves to block 704, where it creates a reduced data set from the raw data set by filtering or removing outliers from the 3-D tracking point data collected at block 510. In one embodiment block 704 can filter or remove outliers as described above for block 520, but in other embodiments block 704 need not use the same outlier filtering as block 520.
  • the process calculates an imaging surface from the reduced data set-i.e., from the existing 3-D tracking point data minus outliers.
  • if the imaging surface will be a plane, it can be determined, in one embodiment, by applying a least-squares fit to the reduced set of 3-D tracking point data.
  • the non-iterative nature of this process means that the imaging surface computation need only be done once, in which case a more accurate but more computationally demanding way of computing the imaging surface can be used.
  • the process creates a projected data set containing two-dimensional data by projecting the reduced data set produced at block 704 onto the imaging surface determined at block 706.
  • the projected data set will generally be a set of two-dimensional coordinate pairs, which can be expressed as physical spatial coordinates or as camera pixel coordinates.
  • the process smooths the 2-D data in the projected data set and at block 711 the process can transform and/or format the smoothed 2-D data so that it complies with the format required by the underlying handwriting recognition engine. For example, some handwriting recognition engines might set the 2-D origin differently from the 2-D projection described, meaning that the data must be transformed or formatted to follow the convention.
  • the process transmits the smoothed and transformed/formatted two-dimensional data to a handwriting recognition module for recognition.
  • the handwriting recognition module analyzes the data received from block 712 to recognize the handwriting.
  • the process transmits the recognized handwriting as an input or command to a further system (e.g., systems 1–3 in Fig. 2) and at block 720 also shows the recognized handwriting on the user display. From block 716 the process then moves to block 718, which returns the process to block 504 (see Fig. 5) to listen or watch for a start trigger indicating the beginning of another data collection cycle.
  • Fig. 8 illustrates an embodiment of a vehicle implementation 800 of a system such as system 200.
  • vehicle 802 is a passenger sedan, but in other embodiments vehicle 802 can be any other type of vehicle: a sedan with more or fewer doors than shown, a sport utility vehicle (SUV), a van or minivan, a pickup truck, a commercial truck, etc.
  • vehicle control unit (VCU) 804 is the vehicle’s main computer and performs all the functions of computer 202, although in other embodiments the functions can be performed by a computer other than VCU 804, such as a separate computer specially designated for that purpose.
  • VCU 804 can be positioned anywhere in vehicle 802, but other components are positioned inside the vehicle cabin. Cameras or camera arrays are positioned in the cabin so that at least one of the driver’s hands is within the camera’s field of view. In other embodiments the cameras can be placed so that one or more passenger hands are within the field of view.
  • camera 806 is positioned on the vehicle ceiling roughly in the middle of the cabin, so that it looks down between the front seats. In that position, camera 806 can capture a driver's right hand (in a vehicle with a left-side driving station) or the driver's left hand (in a vehicle with a right-side driving station).
  • a camera 807 can be positioned in the vehicle dashboard.
  • a physical button 808 and a microphone 810 can be positioned on the vehicle dashboard, where the driver can easily reach the button and where the microphone can easily pick up the driver’s voice or other sounds made by the driver.
  • a display 814 can also be positioned on the dashboard where it can be easily seen by at least the driver.
  • Other embodiments can, of course, position all these components differently than shown. With all the components in place, the system can operate as described above.
  • machine learning or collective learning can be applied, for instance via back end server processing coupled to the handwriting system, to learn a particular user's handwriting and improve its recognition, to improve computation of imaging planes for projecting user handwriting onto, and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Combustion & Propulsion (AREA)
  • Multimedia (AREA)
  • Chemical & Material Sciences (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)
  • Position Input By Displaying (AREA)

Abstract

Embodiments are disclosed of a process for handwriting recognition. The process includes capturing a raw data set of three-dimensional coordinates of a moving tracking point on a physical object. The raw data set represents a trace of the movement of the tracking point through three-dimensional space. The process also includes forming a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates; forming a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, the projected data set representing a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface; and transmitting the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set. Other embodiments are disclosed and claimed.

Description

HANDWRITING RECOGNITION OF HAND MOTION WITHOUT PHYSICAL MEDIA TECHNICAL FIELD
The disclosed embodiments relate generally to handwriting recognition and in particular, but not exclusively, to handwriting recognition that recognizes hand motions made in the air rather than writing on a physical medium.
BACKGROUND
Fig. 1 illustrates an embodiment 100 of traditional handwriting. In traditional handwriting, a hand 102 grasps a stylus 104 and positions it so that its tip 106 is in contact with a writing medium 108. While grasping stylus 104 this way, hand 102 moves tip 106 across the medium 108 to form drawings, characters (e.g., Chinese, Japanese, or Korean characters), letters and numbers, symbols, or other things on the medium. Embodiments in which stylus 104 is a pen or pencil and medium 108 is paper are essentially the way handwriting has been done for centuries.
More recently, handwriting has begun to be captured electronically. This electronic capture usually takes one of two forms. In the first form, handwriting is written on paper the traditional way and the paper is then electronically scanned. Once scanned, the electronic form of the writing can either retain the form of the original handwriting (e.g., a bitmap or image of the paper) , or can be subjected to techniques such as optical character recognition (OCR) to turn the handwriting into computer-editable letters or words.
In the second form of electronic handwriting, instead of paper, medium 108 can be an electronic device such as a display or tablet, and stylus 104 can be a special stylus whose tip 106 is moved over the surface of the tablet or display. Some tablets dispense with the need for a separate stylus; in these tablets the writer can instead use a fingertip, such as tip 110 of their index finger, to write directly on the tablet surface. Whether it uses a stylus or a fingertip, this form of electronic handwriting directly generates electronic versions of handwritten letters, numbers, characters, or drawings without the need for first putting them on paper. And as with writing on paper, the directly-generated electronic handwriting can either retain its original handwritten form (e.g., a bitmap or image), or can be subjected to techniques such as optical character recognition (OCR) to turn the writing into computer-editable data.
But whether medium 108 is paper or electronic, and whether a stylus or a fingertip is used, all these handwriting methods have one thing in common: the writing is always two-dimensional. In every case stylus tip 106, or fingertip 110 in electronic embodiments that don't use a stylus, must be in contact with medium 108. This need for contact between stylus tip 106 and medium 108, or fingertip 110 and medium 108, constrains tip 106 to the two-dimensional surface of medium 108, and also constrains tip 106 within the bounds of medium 108. Another constraint associated with traditional handwriting is that the user must be able to find medium 108, and must hold onto it or otherwise secure it while writing on it.
SUMMARY
Embodiments are disclosed of an apparatus for handwriting recognition. The apparatus includes a camera or an array of cameras capable of capturing three-dimensional images. A computer includes a communication interface, a processor coupled to the communication interface, and memory and storage coupled to the processor. The computer's communication interface is communicatively coupled to the camera or array of cameras. A trace processing module runs on the computer, where it executes instructions that cause the computer to use the camera or the array of cameras to capture a raw data set of three-dimensional coordinates of a moving tracking point on a physical object. The raw data set represents a trace of the movement of the tracking point through three-dimensional space.
The trace processing module forms a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates. It then forms a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, where the projected data set represents a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface. And the trace processing module passes the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
Embodiments are disclosed of a system including a vehicle and a handwriting system positioned in the vehicle. The handwriting system includes a camera or an array of cameras capable of capturing three-dimensional images. A computer includes a communication interface, a processor coupled to the communication interface, and memory and storage coupled to the processor. The communication interface is communicatively coupled to the camera or array of cameras. A trace processing module runs on the computer, where it executes instructions that cause the computer to use the camera or the array of cameras to capture a raw data set of three-dimensional coordinates of a moving tracking point on a physical object. The raw data set represents a trace of the movement of the tracking point through three-dimensional space.
The trace processing module forms a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates. It then forms a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, where the projected data set represents a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface. And the trace processing module passes the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
One or more vehicle systems are coupled via the communication interface to the handwriting system so that a letter or word recognized by the handwriting recognition module can be transmitted as an input to one or more vehicle systems.
Embodiments are disclosed of a process for handwriting recognition. The process includes capturing a raw data set of three-dimensional coordinates of a moving tracking point on a physical object. The raw data set represents a trace of the movement of the tracking point through three-dimensional space. The process also includes forming a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates; forming a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, the projected data set representing a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface; and transmitting the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
BRIEF DESCRIPTION OF THE DRAWINGS
Non-limiting and non-exhaustive embodiments of the present invention are described with reference to the following figures, wherein like reference numerals refer to like parts in all the drawings and views unless otherwise specified.
Fig. 1 is a view of an embodiment of prior art handwriting on a physical medium.
Fig. 2 is a block diagram of an embodiment of a system for recognizing handwriting without use of, or contact with, a physical medium.
Figs. 3A–3B are diagrams illustrating different embodiments of physical objects and their potential tracking points.
Figs. 4A–4B are diagrams illustrating embodiments of using perspective projection to deter-mine an imaging surface.
Fig. 5 is a flowchart illustrating an embodiment of the operation of the handwriting system of Fig. 2.
Fig. 6 is a flowchart illustrating another embodiment of the operation of the handwriting system of Fig. 2.
Fig. 7 is a flowchart illustrating another embodiment of the operation of the handwriting system of Fig. 2.
Fig. 8 is a diagram showing an embodiment of the handwriting system of Fig. 2 implemented in a vehicle.
DETAILED DESCRIPTION
Embodiments of an apparatus, system, and process are described below for recognizing handwriting implemented by three-dimensional motion of a hand or other physical object through the air. In other words, the disclosed handwriting recognition requires no two-dimensional physical medium on which to write. The embodiments capture a raw data set of three-dimensional coordinates of a moving tracking point on a physical object. The raw data set thus represents a trace of the movement of the tracking point through three-dimensional space. A reduced data set of three-dimensional coordinates is then formed by removing outliers from the raw data set. A projected data set is then formed by projecting the reduced data set of three-dimensional coordinates onto an imaging surface. The projected data set represents a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface. The projected data set is then transmitted to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
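As a compact, hedged illustration of that overall flow (not the patent's own implementation), the sketch below runs a captured raw data set through crude outlier removal, a least-squares imaging plane, projection to two dimensions, and a stand-in recognition engine; the function name, the trigger-edge trimming, and the `recognizer` callable are all illustrative assumptions.

```python
import numpy as np

def recognize_air_handwriting(raw_trace: np.ndarray, recognizer) -> str:
    """End-to-end sketch: `raw_trace` is the (N, 3) raw data set captured
    between the start and end triggers; `recognizer` stands in for a
    handwriting recognition engine that accepts 2-D stroke coordinates."""
    # Crude outlier removal near the start/end triggers.
    reduced = raw_trace[5:-5] if len(raw_trace) > 10 else raw_trace
    centroid = reduced.mean(axis=0)
    _, _, vt = np.linalg.svd(reduced - centroid)   # least-squares imaging plane
    u, v = vt[0], vt[1]                            # in-plane basis vectors
    projected = np.column_stack([(reduced - centroid) @ u,
                                 (reduced - centroid) @ v])
    return recognizer(projected)                   # recognized letter or word
```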
Fig. 2 illustrates an embodiment of a handwriting system 200 that neither needs nor requires any medium, two-dimensional or otherwise, on which to write; handwriting system 200 can be described as a system for “writing in the air.”
System 200 includes a computer 202 having a processor 205 coupled to a communication interface 206 through which the computer can exchange commands and data with one or more external sensors such as cameras 204. Although not shown in the drawing, computer 202 also includes other components typically associated with a computer, such as memory and storage. Communication interface 206 also couples computer 202 to a user display 218 as well as sensing elements such as microphone 220, stylus 222, and physical button 224. Communication interface 206 also couples computer 202 to various further systems such as systems 1–3.
Sensors are used to capture the three-dimensional trace, or trajectory, of a tracking point on a physical object. Different embodiments of system 200 can use different types of sensors, and different numbers of sensors, to capture the three-dimensional trace of the tracking point. In the illustrated embodiment the sensors are cameras 204, which can include an array of three individual cameras 204a–204c, but other embodiments can use more or fewer cameras than shown. Some embodiments can use a single camera, camera 204b for instance, depending on the type of camera used. Each camera has a field of view, shown cross-hatched in the figure; when there is an array of multiple cameras, the field of view is defined by the overlapping fields of view of the multiple cameras. Generally, the one or more cameras 204 are positioned so that a tracking point on a physical object is within the fields of view of however many cameras are needed to capture three-dimensional data. In some embodiments, three-dimensional data can be captured by a single camera, for instance if the camera is a time-of-flight (TOF) camera or a structured-light camera that can capture three-dimensional information on its own. In embodiments where each individual camera can only capture two-dimensional information, multiple two-dimensional cameras can be used to form an array of cameras that simulate a stereoscopic camera that can capture three-dimensional images. In embodiments with multiple cameras 204a, 204b, 204c, the cameras can all be positioned on the same plane to simplify the camera design, as well as to simplify any required parallax calculations.
In the illustrated embodiment the physical object is a human hand 201 and the tracking point, that is, the point on hand 201 whose trace or trajectory will be tracked by the cameras, is the tip 203 of the index finger. In other embodiments other parts of hand 201 could function as the tracking point (see Fig. 3A), for example, other fingers. Generally, tracking point 203 can move three-dimensionally without restriction, as long as it remains within the field of view and depth of field of the camera or camera array. In one embodiment, cameras 204, whether a single 3-D camera or an array of multiple cameras that provide 3-D information, can generate a set of three-dimensional coordinates for each point in the trace or trajectory of tracking point 203 as it moves three-dimensionally within the field of view and depth of field of the cameras. The three-dimensional trace can be characterized by a set of three-dimensional coordinates indicating the position of the tracking point at different times. In one embodiment, each point in the tracking point’s trace can be characterized by a set of three coordinates: (x, y, z) in a Cartesian coordinate system, (r, θ, z) in a cylindrical coordinate system, and so on.
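Purely as an illustration, and not as part of the disclosed embodiments, the following Python sketch shows one way such a time-stamped set of three-dimensional coordinates might be accumulated in software; the Trace3D name and the (t, x, y, z) layout are assumptions made only for this and the later sketches.

```python
# Illustrative sketch: accumulate the 3-D trace of the tracking point as
# time-stamped Cartesian samples. The (t, x, y, z) layout is an assumption
# used only for these sketches, not a format specified by the disclosure.
import numpy as np

class Trace3D:
    def __init__(self):
        self._samples = []          # list of (t, x, y, z) rows

    def add(self, t, x, y, z):
        """Record one sampled position of the tracking point."""
        self._samples.append((t, x, y, z))

    def as_array(self) -> np.ndarray:
        """Return the raw data set as an (N, 4) array with columns (t, x, y, z)."""
        return np.asarray(self._samples, dtype=float)

# Usage: each camera frame (or fused stereo measurement) adds one sample.
raw = Trace3D()
raw.add(0.000, 0.12, 0.34, 0.87)
raw.add(0.033, 0.13, 0.35, 0.86)
```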
A transceiver 208 that can transmit, receive, or both transmit and receive signals, either by wire or wirelessly, is coupled to communication interface 206. Additional sensing elements can be coupled to computer 202 to provide functions supporting system 200. In the illustrated embodiment a microphone 220, a physical button 224, and a stylus 222 can be coupled to computer 202. In the illustrated embodiment microphone 220 and physical button 224 can be directly hard-wired to communication interface 206, while stylus 222 can communicate with computer 202 wirelessly via transceiver 208. These three elements (microphone 220, stylus 222, and physical button 224) can be used in various embodiments of the invention to signal the start, the end, or both the start and the end of a data collection cycle. In one embodiment, for instance, microphone 220 can be used to pick up a start trigger that starts capture of trace information by cameras 204, and also to pick up an end trigger that ends collection of the trace information.
In one embodiment, one or both of the start and end triggers can be sounds, for instance specific words or phrases (e.g., “start writing” and “end writing”) uttered by the user or someone else, picked up by microphone 220 and processed for recognition by sound recognition module 210. In one embodiment the start trigger and end trigger sounds can be the same sound, but in another embodiment the start trigger and the end trigger need not be the same sound. In another embodiment one or both of the start and end triggers can be the activation of button 224; button 224 can be a physical button activated by the user by physically pressing it, or it can be a virtual button displayed on a screen and activated by touching the screen or by pointing at the virtual button on the screen with a finger or a stylus. In still another embodiment, one or both of the start and end triggers can be a wired or wireless electronic signal created by pressing a button on stylus 222, which can include a transmitter to transmit the wired or wireless signal to transceiver 208. In yet another embodiment, one or both of the start and end triggers can be contextual. For instance, if the user invokes a function, or a sub-function within another function, that uses text as input (e.g., text entry of addresses in a vehicle navigation system), then invoking the function or sub-function (i.e., the context) automatically triggers the start of 3-D data collection and terminating the function triggers the end of 3-D data collection. Other embodiments can also mix any of the above start and end triggers; one embodiment, for instance, might use a contextual start trigger and a button, physical or virtual, as an end trigger.
Processor 205 is also coupled to communication interface 206 and can run various modules such as a sound recognition module 210, a gesture recognition module 212, a trace/image processing module 214, and a handwriting recognition module 216. Modules 210–216 perform different functions during operation of system 200. Among other things:
- In an embodiment of system 200 where sounds are used as start and end triggers to start and end data collection by cameras 204, sound recognition module 210 can be used to recognize one or more sounds received from microphone 220 signaling the start and end of data collection.
- In an embodiment of system 200 in which the physical object is a human hand, gesture recognition module 212 can be used to identify the tracking point on the human hand, such as a fingertip in one embodiment. In embodiments where the physical object is a human hand, gesture recognition module 212 can also be used to recognize gestures, motions, or transitions from one gesture to another that trigger the start and end of image collection by cameras 204. Software to implement gesture recognition is commercially available from vendors such as Sony Depthsensing Solutions (formerly Softkinetic) of Belgium, a subsidiary of Sony Corp. of Tokyo, Japan.
- Trace/image processing module 214 processes the raw three-dimensional data set received from cameras 204 that tracks the trace or trajectory of tracking point 203 as it moves through three-dimensional space. Among other things, trace/image processing module 214 can create a reduced three-dimensional data set by filtering the raw three-dimensional data to remove outliers, and can form a projected data set by performing a perspective projection to project the reduced 3-D data set onto an imaging surface. The operation of trace/image processing module 214 is further described below in connection with Figs. 4A–7.
- Finally, after the trace/image data has been processed by trace/image processing module 214, the projected data set can be passed to handwriting recognition module 216 so that the letter, number, character, symbol, etc., traced out three-dimensionally by tracking point 203 can be recognized and turned into electronic form. Software to implement the handwriting recognition module is commercially available from vendors such as MyScript of Nantes, France, or Jietong (Beijing Jietong Huasheng Technology Co., Ltd.) of Beijing, China.
Although the illustrated embodiment shows modules 210–216 running on a single processor 205, in other embodiments computer 202 can include more than one processor, and the different modules shown can run on different processors or combinations of processors, including each module running on its own processor.
A flat or curved user display 218 is also coupled to computer 202 via communication interface 206. User display 218 provides real-time or near-real-time feedback to the user as cameras 204 capture the trajectory of tracking point 203. For instance, a user will move tracking point 203 in such a way as to spell out a letter, word, character, etc. As the user moves the tracking point through space and cameras 204 pick up its three-dimensional trajectory, a two-dimensional projection of the trajectory can be shown on user display 218 to provide feedback. In an alternative embodiment the physical object can be a stylus and the tracking point can be the tip of the stylus (see Fig. 3B) .
Systems 1–3 can be further systems coupled to computer 202 so that they can be controlled by input created by recognition of the handwriting spelled out by tracking point 203. In a vehicle embodiment, for example, systems 1–3 can be one or more of a navigation system, a stereo system, a messaging system, a car status and maintenance system, and the like.
Figs. 3A–3B illustrate embodiments of potential physical objects and tracking points. Fig. 3A illustrates an embodiment in which the physical object is a human hand 300 and the tracking point is the tip 302 of the hand’s index finger. In other embodiments, other tracking points on hand 300 can be used, for instance the tips of other fingers besides the index finger, a knuckle 304, or some other point on the hand. Fig. 3B illustrates an embodiment in which the physical object is a stylus 352 and the tracking point is the stylus tip 354. In an embodiment using a stylus, tracking point 354 can be recognized by the system using a known pattern on tracking point 354. Also in an embodiment in which the physical object is a stylus, stylus 352 can have a transmitter (not shown) and a button 356 which, when pressed, activates the transmitter to transmit a signal to transceiver 208 to trigger the start and end of a collection cycle, that is, the time during which trace or trajectory data for the tracking point is collected by cameras 204.
Fig. 4A illustrates an embodiment of perspective projection 400. For simplicity, the perspective projection is illustrated for an embodiment with a single camera 204b such as a time-of-flight camera, but the projection would be similar for an embodiment with an array of two or more cameras.
In perspective projection 400, camera 204b has an optical axis 401. As tracking point 203 traces letters, numbers, symbols, characters, etc., in the air, it would ideally be confined to a two-dimensional plane normal to optical axis 401, such as plane 402. Plane 402 can be characterized by its normal vector n1; saying that plane 402 is normal to optical axis 401 is equivalent to saying that normal vector n1 is parallel to the optical axis. If that were the case, camera 204b could simply capture two-dimensional data of the trace of tracking point 203 in plane 402 and directly use that 2-D data for handwriting recognition. But tracking point 203 is not confined to plane 402 and instead moves three-dimensionally, so its trace or trajectory through three-dimensional space can be represented by a data set of three-dimensional points 404. In one embodiment, each point 404 can be characterized by a set of three coordinates: (x, y, z) in a Cartesian coordinate system, (r, θ, z) in a cylindrical coordinate system, etc. Together, the data set of points 404 forms what is sometimes known as a “point cloud.”
Given the three-dimensional distribution of points 404, imaging plane 402 might not be the best plane for obtaining a two-dimensional representation of the point cloud that can then be used for handwriting recognition. It can therefore be beneficial to compute a different imaging surface onto which points 404 can be projected so that a better two-dimensional representation can be obtained. In the illustrated embodiment the imaging surface is an imaging plane 406 that is equivalent to the imaging plane of a virtual camera 408 in a different position than actual camera 204b; that is, normal vector n2 of imaging plane 406 is parallel to optical axis 410 of virtual camera 408. Imaging plane 406 can be rotated, translated, or both rotated and translated, relative to imaging plane 402. This is known in the computer science and computer graphics arts as a perspective projection.
Imaging plane 406, as characterized for instance by its normal vector n2, can be computed or determined differently in different embodiments. In one embodiment, imaging plane 406 can be computed by applying a least-squares fit to the points in the data set of points 404. Other embodiments can use other methods, such as singular value decomposition or principal component analysis. Still other embodiments can use a K-means clustering method to define three centroids of the data set of points 404. Because three points define a plane, the three centroids then define an imaging plane. Methods such as K-means clustering are less computationally demanding than some of the other methods, so K-means clustering can be useful in an iterative determination of the imaging plane where the calculation is repeated several times.
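As a hedged illustration of the least-squares approach (one possible implementation, not necessarily the one used in any embodiment), a best-fit plane can be obtained by centering the point cloud and taking, via singular value decomposition, the direction of least variance as the plane normal:

```python
# Illustrative sketch: least-squares plane fit to a 3-D point cloud.
# The plane passes through the centroid; its normal is the singular vector
# associated with the smallest singular value of the centered points.
import numpy as np

def fit_imaging_plane(points: np.ndarray):
    """points: (N, 3) array of (x, y, z) coordinates from the reduced data set.
    Returns (centroid, unit normal) of the best-fit plane."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]                       # direction of least variance
    return centroid, normal / np.linalg.norm(normal)
```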
Fig. 4B illustrates another embodiment of perspective projection 450. Perspective projection 450 is in most respects similar to perspective projection 400; the primary difference is the shape of the imaging surface. In the embodiment of perspective projection 400, the imaging surface is a plane 406. In other embodiments the imaging surface need not be a plane but can instead be an arbitrary surface 452, with the shape of surface 452 selected depending on the application. For instance, if user display 218 is curved a user might naturally be inclined to follow the curvature of the display with their finger. As a result, it might be advantageous, and might result in better handwriting recognition and display quality, if the imaging surface is curved like imaging surface 452 instead of planar like imaging surface 406. In one embodiment a curved imaging surface 452 can be computed by fitting a surface of a prescribed mathematical form to the data set of points 404. As with imaging plane 406, imaging surface 452 can be rotated, translated, or both rotated and translated, relative to imaging plane 402. Because imaging surface 452 is not planar it has multiple normal vectors n2, such that imaging surface 452 is equivalent to the imaging planes of multiple virtual cameras 408, all except one of which are in different positions than actual camera 204b.
Current handwriting recognition programs are based on 2-D input, because existing handwriting devices are all 2-D devices. In embodiments where 2-D input is needed for the handwriting recognition program, 2-D data can be extracted from non-planar imaging surface 452 by, for example, mapping the projections of points 404 onto surface 452 using a surface-conforming coordinate system. But in embodiments using a handwriting recognition program that can accept 3-D input, the projections of points 404 onto surface 452 could be used directly without the need for further mapping into a two-dimensional coordinate system such as a surface-conforming coordinate system. With handwriting recognition programs that accept 3-D input it might also be possible to skip projection onto surface 452, or any other surface, entirely and simply use the raw 3-D data set.
Fig. 5 illustrates an embodiment of a process 500 for using handwriting system 200. The process begins at block 502. At block 504, the process listens or watches for a start trigger indicating the beginning of data collection. At block 506 the process inquires whether a start trigger has been detected. If at block 506 no start trigger has been detected the process returns to block 504. But if at block 506 a start trigger has been detected the process proceeds to block 508.
At block 508, the process determines the location of the tracking point on the physical object. When the physical object is a human hand, determination of the tracking point can be done by a gesture recognition module. As previously described, in one embodiment if the physical object is a human hand the tracking point can be a fingertip of the hand. In another embodiment, a user can wear an item on their hand, such as a glove, ring, or other item with a specific material, pattern, light, or color that the process can identify as the tracking point. When the physical object is a stylus and the tracking point is the stylus tip, the tracking point can be identified in one embodiment by putting a known pattern on the stylus tip so that the system will recognize its location.
The process then moves to block 510 where it forms a raw data set by collecting 3-D tracking data for the three-dimensional movement of the tracking point. In one embodiment, the raw data set includes three-dimensional coordinates of the trace or trajectory of the tracking point in three-dimensional space. To provide real-time or near-real-time feedback to the user, at block 512 the 3-D tracking point data collected at block 510 can be projected onto a fixed two-dimensional plane, and at block 514 the projected two-dimensional data is shown on a display (see Fig. 2) that is visible to the user. From block 512, the process moves on to block 516 where it determines the projection mode that has been selected. In the illustrated embodiment the process has three projection modes: an adaptive online mode, an adaptive offline mode, and a fixed mode. The adaptive online mode is described below in connection with Fig. 6; the adaptive offline mode is described below in connection with Fig. 7.
If at block 516 the process determines that the fixed projection mode has been selected, the process proceeds to block 518 where it queries whether an end trigger has been detected, signaling the end of collection of 3-D tracking point data. If at block 518 an end trigger has not been detected, the process returns to block 510 where it continues collecting 3-D tracking point data for the raw data set. But if at block 518 an end trigger has been detected, the process proceeds to block 520.
At block 520, the process creates a reduced data set by filtering outliers from the raw data set collected at block 510. In other words:
Reduced Data Set = Raw Data Set – Outliers
The reduced data set, then, contains three-dimensional data from which the outliers have been removed or subtracted. In some cases, the 3-D tracking point data in the raw data set can include data that is not part of what the user is trying to handwrite and is therefore irrelevant or not useful for handwriting recognition. For example, between the start trigger and the beginning of an actual letter traced by the tracking point there might be some irrelevant hand motion that in turn creates irrelevant motion of the tracking point. And there can also be tracking point motion that creates irrelevant data between the end of what is being written and the end trigger. In one embodiment, these outliers can be filtered or removed from the raw data set by excluding data from a time period at the beginning of the collection cycle and a time period at the end of the collection cycle. For instance, the first 10–40 ms of data from the beginning of the raw data set and the last 10–40 ms of data from the end of the raw data set can be removed. In different embodiments the removal periods at the beginning and end of the raw data set can have, but need not have, the same length.
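A minimal sketch of this trimming step follows, assuming the (t, x, y, z) sample layout used in the earlier sketch and treating the trim windows as configurable values rather than amounts fixed by the disclosure:

```python
# Illustrative sketch: form the reduced data set by trimming samples from the
# start and end of the collection cycle. samples is an (N, 4) array whose
# columns are (t, x, y, z); the default trim windows are assumptions only.
import numpy as np

def reduce_data_set(samples: np.ndarray, trim_start_s: float = 0.02,
                    trim_end_s: float = 0.02) -> np.ndarray:
    if samples.size == 0:
        return samples
    t = samples[:, 0]
    # Keep only samples outside the trim windows at both ends of the cycle.
    keep = (t - t[0] >= trim_start_s) & (t[-1] - t >= trim_end_s)
    return samples[keep]
```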
At block 522, the process creates a projected data set containing two-dimensional data resulting from projection of the reduced data set produced at block 520 onto a fixed two-dimensional imaging plane. In other words:
Projected Data Set = 2-D Projection of Reduced Data Set.
The projected data set, then, contains two-dimensional data instead of three-dimensional data. The fixed two-dimensional plane onto which the three-dimensional data from the reduced data set is projected is fixed and known in advance. In one embodiment, for instance, the fixed two-dimensional imaging plane can be a plane normal to the optical axis of the camera, or normal to the optical axis of one of the cameras in an embodiment with a camera array. In another embodiment the fixed two-dimensional imaging plane can be a plane substantially parallel to the plane of the user display. In some embodiments, the imaging plane used at block 522 can be the same imaging plane used at block 512, but in other embodiments it need not be the same imaging plane. The projected data set will generally be a set of two-dimensional coordinate pairs, which can be expressed as physical spatial coordinates or as camera pixel coordinates.
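For illustration, a generic pinhole-style perspective projection onto the imaging plane of a virtual camera might be sketched as below. The virtual camera position (eye), unit optical-axis direction (n), focal distance, and in-plane basis construction are all assumptions made for the sketch; the disclosure only requires that the reduced data set be projected onto the imaging plane.

```python
# Illustrative sketch: perspective projection of 3-D points onto the imaging
# plane of a virtual camera. Parameter names are assumptions for the sketch.
import numpy as np

def perspective_project(points: np.ndarray, eye: np.ndarray,
                        n: np.ndarray, focal: float = 1.0) -> np.ndarray:
    """points: (N, 3); eye: virtual camera position; n: unit optical axis
    pointing toward the scene; focal: eye-to-imaging-plane distance.
    Returns (N, 2) plane coordinates. Points are assumed to lie in front
    of the virtual camera (positive depth)."""
    helper = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(helper, n)) > 0.9:           # avoid a near-parallel helper
        helper = np.array([0.0, 1.0, 0.0])
    u_axis = np.cross(n, helper)
    u_axis /= np.linalg.norm(u_axis)
    v_axis = np.cross(n, u_axis)               # completes the in-plane basis
    d = points - eye
    depth = d @ n                               # distance along the optical axis
    u = focal * (d @ u_axis) / depth
    v = focal * (d @ v_axis) / depth
    return np.stack([u, v], axis=1)
```

In the fixed projection mode the plane is known in advance, so eye and n could come directly from the physical camera's pose; in the adaptive modes they would instead be derived from the plane fitted to the reduced data set.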
At block 524 the process smooths the projected 2-D data in the projected data set, and at block 525 the process can transform and/or format the smoothed 2-D data so that it complies with the format required by the handwriting recognition engine. For example, some handwriting recognition engines might set the 2-D origin differently from the 2-D projection described, meaning that the data must be transformed or formatted to follow the convention. At block 526 the process transmits the smoothed and transformed/formatted two-dimensional data to a handwriting recognition module for recognition. At block 528, the handwriting recognition module analyzes the data received from block 526 to recognize the handwriting. At block 530 the process transmits the recognized handwriting as an input or command to a further system (e.g., systems 1–3 in Fig. 2) and at block 532 the process shows the recognized handwriting on the user display. From block 530 the process then returns to block 504, where it listens or watches for a start trigger indicating the beginning of another collection cycle.
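The description does not name a specific smoothing method; one hedged possibility for block 524 is a simple moving-average filter over the projected 2-D coordinates, with the window length chosen arbitrarily here:

```python
# Illustrative sketch: smooth the projected 2-D trace with a moving average.
# The window length (5 samples) is an arbitrary choice, not from the patent.
import numpy as np

def smooth_2d(points_2d: np.ndarray, window: int = 5) -> np.ndarray:
    """points_2d: (N, 2). Returns a moving-average-smoothed copy."""
    n = len(points_2d)
    if n < window:
        return points_2d.copy()
    pad = window // 2
    # Edge-pad so the smoothed trace keeps the same number of samples.
    padded = np.pad(points_2d, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    out = np.column_stack([
        np.convolve(padded[:, 0], kernel, mode="valid"),
        np.convolve(padded[:, 1], kernel, mode="valid"),
    ])
    return out[:n]
```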
Fig. 6 illustrates an embodiment of an adaptive online perspective projection process 600 that can be used with process 500. Projection process 600 begins at block 516 (see Fig. 5) when block 516 determines that adaptive online projection is the selected projection mode. Adaptive online projection process 600 is essentially an iterative real-time or near-real-time process for determining an imaging surface and projecting the 3-D data from the reduced data set onto the determined imaging surface.
At block 602, the process creates a reduced data set from the raw data set by filtering or removing outliers from the 3-D tracking point data collected so far at block 510. In one embodiment block 602 can filter or remove outliers as described above for block 520, but in other embodiments block 602 need not use the same outlier filtering as block 520. At block 604 the process calculates an imaging surface from the reduced data set, that is, from the existing 3-D tracking point data minus outliers. As discussed above for Figs. 4A–4B, if the imaging surface will be planar it can be determined, for instance, by applying a least-squares fit to the reduced set of 3-D tracking point data. The iterative nature of this process means that the imaging surface computation might have to be done many times, in which case a less computationally demanding way of computing an imaging plane, such as K-means clustering, can be used, as in the sketch below.
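A hedged sketch of the K-means variant follows; scikit-learn is used here purely for convenience, and nothing in the disclosure ties the embodiment to that library.

```python
# Illustrative sketch: derive an adaptive imaging plane from three K-means
# centroids of the point cloud (three points define a plane).
import numpy as np
from sklearn.cluster import KMeans

def plane_from_kmeans(points: np.ndarray):
    """points: (N, 3) reduced data set. Returns (point on plane, unit normal)."""
    centroids = KMeans(n_clusters=3, n_init=10).fit(points).cluster_centers_
    p0, p1, p2 = centroids
    normal = np.cross(p1 - p0, p2 - p0)   # plane through the three centroids
    return p0, normal / np.linalg.norm(normal)
```

A real implementation would also guard against the degenerate case in which the three centroids are nearly collinear, since the cross product then gives an unusable normal.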
At block 606, the process checks whether an end trigger has been detected, signaling the end of collection of 3-D tracking point data at block 510. If at block 606 no end trigger has been detected the process moves to block 608, which updates the raw data set of 3-D tracking point data to include newly collected data, and then returns to block 604 where it computes a new projection plane that accounts for the most recently collected 3-D tracking point data. But if an end trigger is detected at block 606, the process moves to block 610 where it outputs the final adaptive imaging plane.
At block 612, the process creates a projected data set containing two-dimensional data by projecting the reduced data set produced at block 602 onto the two-dimensional plane determined at block 610. The projected data set will generally be a set of two-dimensional coordinate pairs, which can be expressed as physical spatial coordinates or as camera pixel coordinates. At block 614 the process smooths the 2-D data in the projected data set and at block 615 the process can transform and/or format the smoothed 2-D data so that it complies with the format required by the handwriting recognition engine. For example, some handwriting recognition engines might set the 2-D origin differently from the 2-D projection described, meaning that the data must be transformed or formatted to follow the convention. At block 616 the process transmits the smoothed and transformed/formatted two-dimensional data to a handwriting recognition module for recognition. At block 618, the handwriting recognition module analyzes the data received from block 616 to recognize the handwriting. At block 620 the process transmits the recognized handwriting as an input or command to a further system (e.g., systems 1–3 in Fig. 2) and at block 624 also shows the recognized handwriting on the user display. From block 620 the process then moves to block 622, which returns the process to block 504 (see Fig. 5) to listen or watch for a start trigger indicating the beginning of another data collection cycle.
Fig. 7 illustrates an embodiment of an adaptive offline perspective projection process 700 that can be used with process 500. The primary difference between processes 600 and 700 is that process 700 waits until all the 3-D trace data is collected, then does a single computation to determine the imaging plane. This is in contrast to process 600, which iteratively calculates the imaging plane as the 3-D data is collected instead of waiting until the raw data set is complete.
Projection process 700 begins at block 516 (see Fig. 5) when block 516 determines that adaptive offline projection is the selected projection mode. At block 702, the process checks whether an end trigger has been detected, signaling the end of collection of 3-D tracking point data at block 510. If at block 702 no end trigger has been detected the process returns to block 510 where it continues to collect trace data. But if at block 702 the process detects an end trigger, the process moves to block 704, where it creates a reduced data set from the raw data set by filtering or removing outliers from the 3-D tracking point data collected at block 510. In one embodiment block 704 can filter or remove outliers as described above for block 520, but in other embodiments block 704 need not use the same outlier filtering as block 520.
At block 706 the process calculates an imaging surface from the reduced data set, that is, from the existing 3-D tracking point data minus outliers. As discussed above for Fig. 4, if the imaging surface will be a plane it can be determined, for instance, by applying a least-squares fit to the reduced set of 3-D tracking point data. The non-iterative nature of this process means that the imaging surface computation need only be done once, in which case a more accurate but more computationally demanding way of computing the imaging surface can be used.
At block 708, the process creates a projected data set containing two-dimensional data by projecting the reduced data set produced at block 704 onto the imaging surface determined at block 706. The projected data set will generally be a set of two-dimensional coordinate pairs, which can be expressed as physical spatial coordinates or as camera pixel coordinates. At block 710 the process smooths the 2-D data in the projected data set and at block 711 the process can transform and/or format the smoothed 2-D data so that it complies with the format required by the underlying handwriting recognition engine. For example, some handwriting recognition engines might set the 2-D origin differently from the 2-D projection described, meaning that the data must be transformed or formatted to follow the convention. At block 712 the process transmits the smoothed and transformed/formatted two-dimensional data to a handwriting recognition module for recognition. At block 714, the handwriting recognition module analyzes the data received from block 712 to recognize the handwriting. At block 716 the process transmits the recognized handwriting as an input or command to a further system (e.g., systems 1–3 in Fig. 2) and at block 720 also shows the recognized handwriting on the user display. From block 716 the process then moves to block 718, which returns the process to block 504 (see Fig. 5) to listen or watch for a start trigger indicating the beginning of another data collection cycle.
Fig. 8 illustrates an embodiment of a vehicle implementation 800 of a system such as system 200. In the illustrated embodiment the vehicle 802 is a passenger sedan, but in other embodiments vehicle 802 can be any other type of vehicle: a sedan with more or fewer doors than shown, a sport utility vehicle (SUV), a van or minivan, a pickup truck, a commercial truck, etc.
All the elements of system 200 are positioned in vehicle 802. In the illustrated embodiment, vehicle control unit (VCU) 804 is the vehicle’s main computer and performs all the functions of computer 202, although in other embodiments the functions can be performed by a computer other than VCU 804, such as a separate computer specially designated for that purpose.
VCU 804 can be positioned anywhere in vehicle 802, but other components are positioned inside the vehicle cabin. Cameras or camera arrays are positioned in the cabin so that at least one of the driver’s hands is within the camera’s field of view. In other embodiments the cameras can be placed so that one or more passenger hands are within the field of view. In the illustrated embodiment, camera 806 is positioned on the vehicle ceiling roughly in the middle of the cabin, so that it looks down between the front seats. In that position, camera 806 can capture a driver’s right hand (in a vehicle with a left-side driving station) or the driver’s left hand (in a vehicle with a right-side driving station). Instead or in addition, a camera 807 can be positioned in the vehicle dashboard.
Other components are also positioned within the vehicle cabin. In the illustrated embodiment a physical button 808 and a microphone 810 can be positioned on the vehicle dashboard, where the driver can easily reach the button and where the microphone can easily pick up the driver’s voice or other sounds made by the driver. A display 814 can also be positioned on the dashboard where it can be easily seen by at least the driver. Other embodiments can, of course, position all these components differently than shown. With all the components in place, the system can operate as described above.
Although not shown in Figs. 5–7, it can be useful to couple the described handwriting system to machine learning or “collective learning” to improve the system’s performance over time. For instance, in various embodiments machine learning or collective learning can be applied, for example via back-end server processing coupled to the handwriting system, to learn a particular user’s handwriting and improve its recognition, to improve computation of the imaging planes onto which user handwriting is projected, and so on.
The above description of embodiments is not intended to be exhaustive or to limit the invention to the described forms. Specific embodiments of, and examples for, the invention are described herein for illustrative purposes, but various modifications are possible.

Claims (35)

  1. An apparatus for handwriting recognition, the apparatus comprising:
    a camera or an array of cameras capable of capturing three-dimensional images;
    a computer including a communication interface, a processor coupled to the communication interface, and memory and storage coupled to the processor, the communication interface being communicatively coupled to the camera or array of cameras;
    a trace processing module running on the computer, wherein the trace processing module executes instructions that cause the computer to:
    use the camera or the array of cameras to capture a raw data set of three-dimensional coordinates of a moving tracking point on a physical object, the raw data set representing a trace of the movement of the tracking point through three-dimensional space;
    form a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates;
    form a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, the projected data set representing a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface; and
    transmit the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
  2. The apparatus of claim 1 wherein:
    the physical object is a human hand and the tracking point is a fingertip of the human hand; or
    the physical object is a stylus and the tracking point is a tip of the stylus.
  3. The apparatus of claim 2, further comprising a gesture recognition module communicatively coupled to the communication interface, wherein if the physical object is a human hand the gesture recognition module, using images captured by the camera or array of cameras:
    identifies the tracking point; and
    identifies a start trigger to start capture of the raw data set and an end trigger to end capture of the raw data set, wherein the start trigger, the end trigger, or both, is a gesture, a motion, or a transition from one gesture to another.
  4. The apparatus of claim 1, further comprising one or more of a microphone, a virtual or physical button, and a receiver coupled to the communication interface to receive a start trigger to start capture of the raw data set and to receive an end trigger to end capture of the raw data set.
  5. The apparatus of claim 4 wherein the start trigger is one or more of:
    a sound captured by the microphone;
    activation of the virtual or physical button;
    activation of a context that uses the recognized letter or word as input;
    if the physical object is a stylus, a wired or wireless electronic signal transmitted to the receiver by a transmitter on the stylus; and
    if the physical object is a stylus, recognition of a pattern on or near the tracking point.
  6. The apparatus of claim 4 wherein the end trigger is one or more of:
    a sound captured by the microphone;
    activation of the virtual or physical button;
    de-activation of a context that uses the recognized letter or word as input;
    if the physical object is a stylus, a wired or wireless electronic signal transmitted to the receiver by a transmitter on the stylus.
  7. The apparatus of claim 1 wherein the trace processing module smooths the projected data set before transmitting the projected data set to the handwriting recognition module.
  8. The apparatus of claim 7 wherein the trace processing module transforms, re-formats, or both transforms and re-formats the smoothed data set before transmitting the smoothed data set to the handwriting recognition module.
  9. The apparatus of claim 1 wherein the imaging surface is:
    a fixed two-dimensional plane known before collection of the raw data set; or
    an adaptive two-dimensional plane determined from the reduced data set.
  10. The apparatus of claim 9 wherein the adaptive two-dimensional plane is computed by a least-squares fit of the three-dimensional coordinates in the reduced data set.
  11. The apparatus of claim 1, further comprising a handwriting recognition module to recognize a letter or word formed by the set of two-dimensional coordinates.
  12. The apparatus of claim 11, further comprising one or more further systems communicatively coupled to the communication interface, wherein a letter or word recognized by the handwriting recognition module is transmitted as an input to one or more further systems.
  13. A system comprising:
    a vehicle;
    a handwriting system positioned in the vehicle, the handwriting system comprising:
    a camera or an array of cameras capable of capturing three-dimensional images;
    a computer including a communication interface, a processor coupled to the communication interface, and memory and storage coupled to the processor, the communication interface being communicatively coupled to the camera or array of cameras;
    a trace processing module running on the computer, wherein the trace processing module executes instructions that cause the computer to:
    use the camera or the array of cameras to capture a raw data set of three-dimensional coordinates of a moving tracking point on a physical object, the raw data set representing a trace of the movement of the tracking point through three-dimensional space,
    form a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates,
    form a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, the projected data set representing a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface, and
    transmit the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set; and
    one or more vehicle systems coupled via the communication interface to the handwriting system, wherein a letter or word recognized by the handwriting recognition module is transmitted as an input to one or more vehicle systems.
  14. The system of claim 13 wherein:
    the physical object is a human hand and the tracking point is a fingertip of the human hand; or
    the physical object is a stylus and the tracking point is a tip of the stylus.
  15. The system of claim 14, further comprising a gesture recognition module communicatively coupled to the communication interface, wherein if the physical object is a human hand the gesture recognition module, using images captured by the camera or array of cameras:
    identifies the tracking point; and
    identifies a start trigger to start capture of the raw data set and an end trigger to end capture of the raw data set, wherein the start trigger, the end trigger, or both, is a gesture, a motion, or a transition from one gesture to another.
  16. The system of claim 13, further comprising one or more of a microphone, a virtual or physical button, and a receiver coupled to the communication interface to receive a start trigger to start capture of the raw data set and to receive an end trigger to end capture of the raw data set.
  17. The system of claim 16 wherein the start trigger is one or more of:
    a sound captured by the microphone;
    activation of the virtual or physical button;
    activation of a context that uses the handwriting system;
    if the physical object is a stylus, a wired or wireless electronic signal transmitted to the receiver by a transmitter on the stylus; and
    if the physical object is a stylus, recognition of a pattern on or near the tracking point.
  18. The system of claim 16 wherein the end trigger is one or more of:
    a sound captured by the microphone;
    activation of the virtual or physical button;
    deactivation of a context that uses the handwriting system;
    if the physical object is a stylus, a wired or wireless electronic signal transmitted to the receiver by a transmitter on the stylus.
  19. The system of claim 13 wherein the trace processing module smooths the projected data set before transmitting the projected data set to the handwriting recognition module.
  20. The system of claim 19 wherein the trace processing module transforms, re-formats, or both transforms and re-formats the smoothed data set before transmitting the smoothed data set to the handwriting recognition module.
  21. The system of claim 13 wherein the imaging surface is:
    a fixed two-dimensional plane known before collection of the raw data set; or
    an adaptive two-dimensional plane determined from the reduced data set.
  22. The system of claim 21 wherein the adaptive two-dimensional plane is computed by a least-squares fit of the three-dimensional coordinates in the reduced data set.
  23. The system of claim 13 wherein the one or more vehicle systems include at least one of a stereo system, a navigation system, a messaging system, or a vehicle maintenance and status system.
  24. A process comprising:
    capturing a raw data set of three-dimensional coordinates of a moving tracking point on a physical object, the raw data set representing a trace of the movement of the tracking point through three-dimensional space;
    forming a reduced data set of three-dimensional coordinates by removing outliers from the raw data set of three-dimensional coordinates;
    forming a projected data set by projecting the reduced data set of three-dimensional coordinates onto an imaging surface, the projected data set representing a two-dimensional projection of the trace of the movement of the tracking point onto the imaging surface; and
    transmitting the projected data set to a handwriting recognition module for recognition of a letter or word formed by the set of two-dimensional coordinates in the projected data set.
  25. The process of claim 24 wherein capturing the raw data set comprises using a camera or array of cameras to capture the three-dimensional motion of the tracking point.
  26. The process of claim 24 wherein:
    the physical object is a human hand and the tracking point is a fingertip of the human hand; or
    the physical object is a stylus and the tracking point is a tip of the stylus.
  27. The process of claim 24, further comprising:
    starting capture of the raw data set using a start trigger; and
    ending capture of the raw data set using an end trigger.
  28. The process of claim 27 wherein the start trigger is one or more of:
    a sound;
    activation of a virtual or physical button;
    activation of a context that uses the recognized letter or word as input;
    if the physical object is a human hand, a gesture, motion, or transition from one gesture to another;
    if the physical object is a stylus, a wired or wireless electronic signal; and
    if the physical object is a stylus, recognition of a pattern on or near the tracking point.
  29. The process of claim 27 wherein the end trigger is one or more of:
    a sound;
    activation of a virtual or physical button;
    deactivation of a context that uses the recognized letter or word as input;
    if the physical object is a human hand, a gesture, motion, or transition from one gesture to another; and
    if the physical object is a stylus, a wired or wireless electronic signal.
  30. The process of claim 24, further comprising smoothing the projected data set before transmitting the projected data set to the handwriting recognition module.
  31. The process of claim 30, further comprising transforming, re-formatting, or both transforming and re-formatting the smoothed data set before transmitting the smoothed data set to the handwriting recognition module.
  32. The process of claim 24 wherein the imaging surface is:
    a fixed two-dimensional plane known before collection of the raw data set; or
    an adaptive two-dimensional plane determined from the reduced data set.
  33. The process of claim 32 wherein the adaptive plane is computed by a least-squares fit of the three-dimensional coordinates in the reduced data set.
  34. The process of claim 24, further comprising recognizing the letter or word formed by the set of two-dimensional coordinates.
  35. The process of claim 34, further comprising transmitting the recognized letter or word as an input to a further system.
PCT/CN2020/070329 2020-01-03 2020-01-03 Handwriting recognition of hand motion without physical media WO2021134795A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/070329 WO2021134795A1 (en) 2020-01-03 2020-01-03 Handwriting recognition of hand motion without physical media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/070329 WO2021134795A1 (en) 2020-01-03 2020-01-03 Handwriting recognition of hand motion without physical media

Publications (1)

Publication Number Publication Date
WO2021134795A1 true WO2021134795A1 (en) 2021-07-08

Family

ID=76686265

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/070329 WO2021134795A1 (en) 2020-01-03 2020-01-03 Handwriting recognition of hand motion without physical media

Country Status (1)

Country Link
WO (1) WO2021134795A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5511133A (en) * 1991-01-24 1996-04-23 Mitsubishi Denki Kabushiki Kaisha Method and apparatus of recognizing a moving object
CN1512298A * 2002-12-26 2004-07-14 皇家飞利浦电子股份有限公司 Method for three dimension hand writing identification and its system
CN107077735A (en) * 2014-10-28 2017-08-18 惠普发展公司,有限责任合伙企业 Three dimensional object is recognized
CN107992792A (en) * 2017-10-16 2018-05-04 华南理工大学 A kind of aerial handwritten Chinese character recognition system and method based on acceleration transducer

Similar Documents

Publication Publication Date Title
US9235269B2 (en) System and method for manipulating user interface in vehicle using finger valleys
CN106502570B (en) Gesture recognition method and device and vehicle-mounted system
US10242255B2 (en) Gesture recognition system using depth perceptive sensors
US7340077B2 (en) Gesture recognition system using depth perceptive sensors
US9959463B2 (en) Gesture recognition system using depth perceptive sensors
EP2635953B1 (en) Robust video-based handwriting and gesture recognition for in-car applications
US20110254765A1 (en) Remote text input using handwriting
CN103977539B (en) Cervical vertebra rehabilitation health care auxiliary training system
WO2005114556A2 (en) Sign based human-machine interaction
CN107992792A (en) A kind of aerial handwritten Chinese character recognition system and method based on acceleration transducer
CN105579319A (en) System and method for identifying handwriting gestures in an in-vehicle information system
US20130120250A1 (en) Gesture recognition system and method
Munich et al. Visual input for pen-based computers
KR20120068253A (en) Method and apparatus for providing response of user interface
Sachara et al. Free-hand gesture recognition with 3D-CNNs for in-car infotainment control in real-time
CN114821810A (en) Static gesture intention recognition method and system based on dynamic feature assistance and vehicle
CN106200896B (en) Gesture recognition system and recognition methods for Vehicular intelligent equipment
WO2021134795A1 (en) Handwriting recognition of hand motion without physical media
Yeom et al. [POSTER] Haptic Ring Interface Enabling Air-Writing in Virtual Reality Environment
CN112926454B (en) Dynamic gesture recognition method
KR101068281B1 (en) Portable information terminal and content control method using rear finger movement and gesture recognition
CN110390281B (en) Sign language recognition system based on sensing equipment and working method thereof
JP2018195052A (en) Image processing apparatus, image processing program, and gesture recognition system
CN111580686B (en) Three-dimensional stroke recognition method integrating ultrasonic positioning and inertial measurement unit
CN111444771A (en) Gesture preposing real-time identification method based on recurrent neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20910610

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.10.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20910610

Country of ref document: EP

Kind code of ref document: A1