EP1412910A1 - Method and apparatus for extracting information from a target area within a two-dimensional graphical object in an image - Google Patents

Method and apparatus for extracting information from a target area within a two-dimensional graphical object in an image

Info

Publication number
EP1412910A1
EP1412910A1 (application EP02733754A)
Authority
EP
European Patent Office
Prior art keywords
points
image
plane
target area
predetermined features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP02733754A
Other languages
German (de)
French (fr)
Inventor
Markus Andreasson
Andreas BJÖRKLUND
Martin SJÖLIN
Karl ÅSTRÖM
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anoto Group AB
Original Assignee
C Technologies AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by C Technologies AB filed Critical C Technologies AB
Publication of EP1412910A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/16Image preprocessing
    • G06V30/166Normalisation of pattern dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Definitions

  • the present invention relates to the fields of computer vision, digital image processing, object recognition, and image-producing hand-held devices. More specifically, the present invention relates to a method and an apparatus for extracting information from a target area within a two-dimensional graphical object having a plurality of predetermined features with known characteristics in a predetermined first plane.
  • Computer vision systems for object recognition, image registration, 3D object reconstruction, etc. are known from e.g. US-B1-6,226,396, US-B1-6,192,150 and US-B1-6,181,815.
  • a fundamental problem in computer vision systems is determining the correspondence between two sets of feature points extracted from a pair of images of the same object from two different views. Despite large efforts, the problem is still difficult to solve automatically, and a general solution is yet to be found. Most of the difficulties lie in differences in illumination, perspective distortion, background noise, and so on. The solution will therefore have to be adapted to individual cases where all known information has to be accounted for.
  • an objective of the invention is to facilitate detection of a known two-dimensional object in an image so as to allow extraction of desired information which is stored in a target area within the object, even if the image is recorded in an unpredictable environment and, thus, at unknown angle, rotation and lighting conditions.
  • Another objective is to provide a universal detection method, which is adaptable to a variety of known objects with a minimum of adjustments. Still another objective is to provide a detection method, which is efficient in terms of computing power and memory usage and which, therefore, is particularly suitable for hand-held image-recording devices.
  • a method for extracting information from a target area within a two-dimensional graphical object having a plurality of predetermined features with known characteristics in a first plane.
  • the method involves: reading an image in which said object is located in a second plane, said second plane being a priori unknown; in said image, identifying a plurality of candidates to said predetermined features in said second plane; from said identified plurality of feature candidates, calculating a transformation matrix for projective mapping between said second and first planes; transforming said target area of said object from said second plane into said first plane, and processing said target area so as to extract said information.
  • the apparatus according to the invention may be a hand-held device that is used for detecting and interpreting a known two-dimensional object in the form of a sign in a single image, which is recorded at unknown angle, rotation and lighting conditions.
  • the feature identification may be based on the edges of the sign. This provides for a solution, which is adaptable to most already existing signs, since the features are as general as possible and common to most signs.
  • an edge detector based on the Gaussian kernel may be used. Once all edge points have been identified, they will be grouped together into lines. The Gaussian kernel may also be used for locating the gradient of the edge points.
  • the corner points on the inside of the edges are then used as feature point candidates. These corner points are obtained from the intersection of the lines, which run along the edges.
  • an algorithm, for example based on the algorithm commonly known as RANSAC, may be executed in order to verify that the features are in the right configuration and to calculate a transformation matrix. After ensuring that the features are in the proper geometric configuration, any target area of the object can be transformed, extracted and interpreted with, for example, an OCR interpreter, a barcode interpreter or a sign identificator.
  • FIG 1 is a schematic view of an image-recording apparatus according to the invention in the form of a hand-held device
  • FIG 1a is a schematic view of the image-recording apparatus of FIG 1 as well as a computer environment, in which the apparatus may be used
  • FIG 2 is a block diagram, which illustrates important parts of the image-recording apparatus shown in FIG 1,
  • FIG 3 is a flowchart diagram which illustrates the overall steps, which are carried out through the method according to the invention.
  • FIG 4 is a flowchart diagram which illustrates one of the steps of FIG 3 in more detail
  • FIG 5 is a graph for illustrating a smoothing and derivative mask, which is applied to a recorded image during one step of the method illustrated in FIGs 3 and 4,
  • FIGs 6-17 are photographs illustrating the processing of a recorded image during different steps of the method illustrated in FIGs 3 and 4.
  • in section A, a general overview of the method and apparatus according to an embodiment is given.
  • Section C provides an explanation of how to obtain the transformation matrix or homography matrix, once feature point correspondences have been identified.
  • Section E describes a line-detecting algorithm.
  • Section F provides a description of the kind of information that can be obtained from lines.
  • the homography matrix can be computed, which is done using a RANSAC algorithm, as explained in Section G.
  • Section H describes how to extract the desired information from the target area.
  • section I addresses a few alternative embodiments.
  • the sign 100 is intended to look as ordinary as any sign.
  • the target area 101 from which information is to be extracted and interpreted, is the area with the numbers "12345678" and is indicated by a dashed frame in FIG 1.
  • the sign 100 does not hold very much information that can be used as features.
  • the sign 100 is surrounded by a frame. The edges of this frame give rise to lines. The embodiment is based on using these lines as features. However, any kind of feature can be used as long as a total of at least four feature points can be distinguished.
  • FIG 1 illustrates an image-producing hand-held device 300, which implements the apparatus according to the embodiment and by means of which the method according to the embodiment may be performed.
  • the hand-held device 300 has a casing 1 having approximately the same shape as a conventional highlighter pen.
  • One short side of the casing has a window 2, through which images are recorded for various image-based functions of the hand-held device.
  • the casing 1 contains an optics part, an electronics part and a power supply.
  • the optics part comprises a number of light sources 6 such as light emitting diodes, a lens system 7 and an optical image sensor 8, which constitutes the interface with the electronics part.
  • the light emitting diodes 6 are intended to illuminate a surface of the object (sign) 100, which at each moment lies within the range of vision of the window 2.
  • the lens system 7 is intended to project an image of the surface onto the light-sensitive sensor 8 as correctly as possible.
  • the optical sensor 8 can consist of an area sensor, such as a CMOS sensor or a CCD sensor with a built-in A/D converter. Such sensors are commercially available.
  • the optical sensor 8 may produce VGA images ("Video Graphics Array") in 640x480 resolution and 24-bit color depth.
  • the optics part forms a digital camera.
  • the power supply of the hand-held device 300 is a battery 12, but it can alternatively be a mains connection or a USB cable (not shown).
  • the electronics part comprises a processing device 20 with storage means, such as memory 21.
  • the processing device 20 may be implemented by a commercially available microprocessor such as a CPU ("Central Processing Unit") or a DSP ("Digital Signal Processor").
  • the processing device 20 may be implemented as an ASIC ("Application-Specific Integrated Circuit"), a gate array, as discrete analog and digital components, or in any combination thereof.
  • the storage means 21 includes various types of memory, such as a work memory (RAM) and a read-only memory (ROM).
  • Associated programs 22 for carrying out the method according to the preferred embodiment are stored in the storage means 21.
  • the storage means 21 comprises a set of object feature definitions 23 and a set of inner camera parameters 24, the purpose of which will be described in more detail later. Recorded images are stored in an area 25 of the storage means 21.
  • the hand-held device 300 may be connected to a computer 200 through a transmission link 301.
  • the computer 200 may be an ordinary personal computer with circuits and programs, which allow communication with the hand-held device 300 through a communication interface 210.
  • the electronics part may also comprise a transceiver 26 for transmitting information to/from the computer 200.
  • the transceiver 26 is preferably adapted for short-range radio communication in accordance with, e.g., the Bluetooth standard in the 2.4 GHz ISM band ("Industrial, Scientific and Medical").
  • the transceiver can, however, alternatively be adapted for infrared communication (such as IrDA - "Infrared Data Association", as indicated by broken lines at 26').
  • the embodiment comprises at least one of an OCR module 29 or a barcode module 29'.
  • such modules 29 or 29' are implemented as program code 22, which is stored in the storage means 21 and is executed by the processing device 20.
  • the extracted information can be used in many different ways, either internally in the hand-held device 300 or externally in the computer 200 after having been transferred across the transmission link 301.
  • Exemplifying but not limiting use cases include a custodian who verifies where and when he was at different locations during his night shift, by capturing images of generally identical signs 100 containing different information while walking around the protected premises; a shop assistant using the hand-held device 300 for stocktaking purposes; tracking of goods in industrial areas; or registering license plate numbers of cars and other vehicles.
  • the hand-held device 300 may advantageously provide other image-based services, such as scanner functionality and mouse functionality.
  • the scanner functionality may be used to record text.
  • the user moves the input unit 300 across the text, which he wants to record.
  • the optical sensor 8 records images with partially overlapping contents.
  • the images are assembled by the processing device 20.
  • Each character in the composite image is localized, and, using for instance neural network software in the processing device 20, its corresponding ASCII character is determined.
  • the text converted in this way to character-coded format can be stored, in the form of a text string, in the hand-held device 300 or be transferred to the computer 200 across the link 301.
  • the scanner functionality is described in greater detail in the Applicant's Patent Publication No. WO98/20446, which is incorporated herein by reference.
  • a line in a plane is represented by the equation ax + by + c = 0, where different choices of a, b and c give rise to different lines.
  • An equivalence class of vectors under this equivalence relationship is known as a homogeneous vector.
  • the set of equivalence classes of vectors in R^3 − (0,0,0)^T forms the projective space P^2.
  • the notation −(0,0,0)^T means that the vector (0,0,0)^T is excluded.
  • the point is represented as a 3-vector (x, y, 1)^T by adding a final coordinate of 1 to the 2-vector.
  • (kx, ky, k)(a, b, c)^T = 0, which means that the vector k(x, y, 1)^T represents the same point as (x, y, 1)^T for any non-zero constant k.
  • the set of vectors k(x, y, 1)^T is considered to be the homogeneous representation of the point (x, y)^T in R^2.
  • This vector represents the point (x_1/x_3, x_2/x_3)^T in R^2, provided x_3 ≠ 0.
  • a point represented as a homogeneous vector is therefore also an element of the projective space P^2.
  • a projectivity is an invertible mapping h from P^2 → P^2 such that x_1, x_2 and x_3 lie on the same line if and only if h(x_1), h(x_2) and h(x_3) do (see Hartley, R., and Zisserman, A., "Multiple View Geometry in Computer Vision", Cambridge University Press, 2000).
  • a projectivity is also called a collineation, a projective transformation, or a homography.
  • a camera is a mapping from the 3D world to the 2D image. This mapping can be written as x = PX.
  • X is the homogeneous representation of the point in the 3D world coordinate frame.
  • x is the corresponding homogeneous representation of the point in the 2D image coordinate frame.
  • P is the 3x4 homogeneous camera projection matrix.
  • K is the 3x3 calibration matrix, which contains the inner parameters of the camera.
  • R is the 3x3 rotation matrix and t is the 3x1 translation vector. This factorization will be used below.
  • if the camera is calibrated, the calibration matrix K will be known, and we can obtain even more information.
  • the two first columns of the rotation matrix R are then equivalent to the two first columns of K^-1 H.
  • H has eight degrees of freedom. Since we are working in 2D, every point has constraints in two directions, and hence every point correspondence has two degrees of freedom. This means that a lower bound of four corresponding points in the two different coordinate frames is needed to compute the homography matrix H. This section will show different ways of solving the equation for H.
  • Singular Value Decomposition (SVD): In real life we usually don't get the position of the points to be exact, because of noise in the image. The solution to H will therefore be inexact. To get an H that is more accurate, we can use more than four point correspondences and then solve an over-determined system. If, on the other hand, the points are exact, the system will give rise to equations that are linearly dependent on each other, and we will once again end up with eight equations that are linearly independent.
  • the matrix A can be decomposed into A = UDV^T, where U and V are orthogonal matrices and D is a diagonal matrix holding the singular values of A.
  • the equation of the lines is not used when computing the homography matrix.
  • Steps 41 and 42 of FIG 4 are described in this section, whereas step 43 will be described in the next section.
  • Edges are defined as points where the gradients of the image are large in terms of gray-scale, color, intensity or luminescence. Once all the edge points in an image have been obtained, they can be analyzed to see how many of them lie on a straight line. These points can then be used as the foundation of a line.
  • σ is the standard deviation (or the width of the kernel) and x is the distance from the point under investigation.
  • the gradient of a point in the image is a vector that points in the direction, in which the intensity in the image at the current point decreases the most. This vector is in the same direction as the normal to the possible line. Therefore, the gradient of all edge points has to be found.
  • to locate the gradients, the derivative of the Gaussian kernel in 2D, ∂G_σ(x, y)/∂x, is applied to the image around the edge points.
  • (x, y) is the distance from the edge point.
  • the y coefficient can be extracted.
  • the normal of the line has the same direction as the gradient.
  • the a and b coefficients of the line have been obtained.
  • the equation for the line will be normalized, so that the normal of the line has length 1: a^2 + b^2 = 1.
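To make this edge-and-gradient step concrete, here is a minimal numpy/scipy sketch of how edge points and their unit gradients (a, b) might be obtained with Gaussian derivative filters; the function name, σ and the edge threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_points_and_gradients(img, sigma=1.0, edge_thresh=10.0):
    """Find edge points and their unit gradients (a, b) with Gaussian
    derivative filters; smoothing and differentiation are applied as a
    single separable mask (cf. FIG 5)."""
    gx = gaussian_filter(img, sigma, order=(0, 1))   # derivative along x
    gy = gaussian_filter(img, sigma, order=(1, 0))   # derivative along y
    mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(mag > edge_thresh)           # edge points
    a = gx[ys, xs] / mag[ys, xs]                     # unit normal of the
    b = gy[ys, xs] / mag[ys, xs]                     # possible line
    # c follows from requiring the line to pass through the point:
    c = -(a * xs + b * ys)
    return np.column_stack([xs, ys]), np.column_stack([a, b]), c
```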
  • the proposed line should run through the points. Since the image will be blurred, these constraints must be fulfilled only within a certain threshold. The threshold will of course depend on the circumstances under which the picture was taken, the resolution of the image, and the object in the picture. Since all the data for the points is known, all that has to be done is to group the points together and adapt lines to them (step 42 in FIG 4).
  • the following algorithm is used according to the preferred embodiment: for a certain number of loops,
  • Step 3: see if these points have the same gradient as p, using (a_n, b_n) · (a, b)^T > (1 − thres2);
  • Step 6: if at least a certain number of points satisfy these conditions, define these points to be a line; end. Repeat with the remaining points.
  • This algorithm selects a point at random.
  • the equation of the line that this point might be a part of is already known. Now, the algorithm finds all other points that have the same gradient and lie on the same line as the first point. Both these checks have to be carried out within a certain threshold.
  • the algorithm checks if the point is closer than the distance thres1 to the line.
  • the algorithm checks if the gradients of the two points are the same. If they are, then the product of the gradients should be 1. Once again, because of inaccuracy, it is sufficient if the product is larger than (1 − thres2). Since the edge points are not exactly located, and since the gradients will not have the exact value, a new line is computed in step 4.
  • This line is computed, using SVD, from all the points which satisfy the conditions in step 2 and step 3.
  • step 2 and step 3 are repeated. To increase the accuracy even further, one more recursion takes place.
  • the values of the threshold numbers will have to be decided depending on the actual application, as is readily realized by a person skilled in the art.
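A possible reading of this grouping loop, sketched in numpy below; the names thres1 and thres2 follow the text, while the remaining parameter values and the omission of the SVD refit (steps 4-5) are assumptions made for brevity.

```python
import numpy as np

def group_lines(pts, normals, thres1=1.5, thres2=0.05,
                min_points=50, max_loops=200, seed=0):
    """Group edge points into lines (steps 1-3 and 6 of the loop above;
    the SVD refit and recursion of steps 4-5 are omitted)."""
    rng = np.random.default_rng(seed)
    pts = pts.astype(float)
    remaining = np.arange(len(pts))
    lines = []
    for _ in range(max_loops):
        if len(remaining) < min_points:
            break
        i = rng.choice(remaining)            # step 1: pick a point p at random
        n = normals[i]                       # its (a, b) is already known
        c = -(n @ pts[i])
        d = np.abs(pts[remaining] @ n + c)   # step 2: distance to the line
        g = normals[remaining] @ n           # step 3: gradient agreement
        inliers = remaining[(d < thres1) & (g > 1.0 - thres2)]
        if len(inliers) >= min_points:       # step 6: accept as a line
            lines.append((n[0], n[1], c, inliers))
            remaining = np.setdiff1d(remaining, inliers)
    return lines
```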
  • FIG 8 shows the lines 104 that were found, and the edge points 103 that were used in the example above.
  • Consecutive edge points: by coincidence, it is possible that the line-detecting algorithm produces a line that is actually made up from a lot of small edges that lie on a straight line. For example, edges of characters written on a straight line may give rise to such a line. If only lines consisting of consecutive edge points are of interest, it is desired to eliminate these other lines. One way of doing this is to take the mean point of all the edge points in the line. From this point, extrapolate a few more points along the line. Now check the differences in intensity on both sides of the line at the chosen points. If the differences in intensities at the points do not exceed a certain threshold, the line is not constructed from consecutive edge points.
  • FIG 14 shows an enlargement of the result of the algorithm, which checks for consecutive edge points, applied to the line 109 at the bottom of the numbers "12345678". Here, the algorithm gave a negative result: the line is not made up of consecutive edge points.
  • FIG 15 is an enlargement of the same algorithm applied to the line 110 at the bottom of the frame. Here, the algorithm gave a positive result: the edge points are consecutive.
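A sketch of this consecutive-edge-points test: sample points are extrapolated from the mean point along the line, and intensities are compared on both sides. The offsets, the sample count and the threshold are illustrative assumptions, and bounds checking is omitted.

```python
import numpy as np

def is_consecutive(img, line_pts, a, b, offset=2.0, diff_thresh=15.0):
    """Test whether a line is built from consecutive edge points by
    comparing intensities on both sides of it at sample points."""
    mean = line_pts.mean(axis=0)
    d = np.array([-b, a])                 # direction along the line
    n = np.array([a, b])                  # unit normal of the line
    samples = mean + np.outer(np.linspace(-20.0, 20.0, 9), d)
    hits = 0
    for p in samples:                     # extrapolated points on the line
        p1 = np.rint(p + offset * n).astype(int)
        p2 = np.rint(p - offset * n).astype(int)
        diff = abs(float(img[p1[1], p1[0]]) - float(img[p2[1], p2[0]]))
        hits += diff > diff_thresh        # clear contrast across the line?
    return hits == len(samples)
```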
  • once the feature candidates in the image have been obtained, they must be matched to features from the original sign, which have known coordinates. If four feature candidates have been found, their coordinates can be matched with the corresponding object feature point coordinates stored in the area 23 of the storage means 21, and the homography matrix H can be computed. Since more candidates than the intended features will probably be found, a verification procedure has to be carried out. This procedure must verify that the selected feature point correspondences have been matched correctly. Thus, if there are a lot of candidates for possible feature points, the homography matrix should be computed many times and verified every time, to check whether it is the proper point correspondence or not.
  • this matching procedure is optimized by using the RANSAC algorithm of Fischler and Bolles (see Fischler, M. A., and Bolles, R. C., "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography", Comm. Assoc. Comp. Mach., 24(6):381-395, 1981).
  • the RANdom SAmple Consensus algorithm (RANSAC) is an estimation algorithm that is able to work with very large sets of putative correspondences.
  • the best way to determine the homography matrix H is to compute H for all possible combinations, verify every solution, and then pick the best one.
  • the most common way to verify H is by using more feature points. In this case, even more than the four feature points from the original object have to be known. The remaining points from the original object can then be transformed into the image coordinate system. Thereafter, a verification procedure can be performed to check whether the points have been found in the image. The more extra features that are found, the higher the likelihood that the correct set of point correspondences has been picked.
  • if the camera is calibrated, it is possible to verify the putative homography matrix with the inner camera parameters 24 stored in the storage means 21 (see the discussion in earlier sections).
  • the homography matrix is a homogeneous matrix and is only determined up to a scale. If the object has points in exactly the same configuration as the feature-and-verification points, except rotated and/or scaled, the verification procedure will give rise to exactly the same values as if the correct point correspondences had been found. Therefore it is important to choose feature points that are as distinct as possible.
  • RANSAC is based on randomization. If even more information is available, then obviously this should be used to optimize the RANSAC algorithm. Some restrictions that might be added are the following.
  • Stop if the solution is found: instead of repeating the calculations in the procedure a specific number of times, it is possible to stop if the verification indicates that a good solution has been found. To determine if a solution is good or not, a statement can be made that if at least a certain number of feature points in the verification procedure have been found, then this must be the correct homography matrix. If the inner parameters of the camera are used as the verification procedure, a stop can be made if r_1 and r_2 are very close to having the same length and being orthogonal.
  • Collinear feature points: the constraint that no three of the chosen feature points may be collinear can be included in the RANSAC algorithm. After the four points have been picked by randomization, it is possible to check if three of them are collinear, before proceeding with computing the homography matrix. Combined with the next two restrictions, this check is very time efficient.
  • Convex hull: the convex hull of an arbitrary set S of points is the smallest convex polygon P_ch for which each point in S is either on the boundary of P_ch or in its interior.
  • Two of the most common algorithms used to compute the convex hull are Graham's scan and Jarvis's march. Both these algorithms use a technique called "rotational sweep" (see Cormen, T. H., Leiserson, C. E., and Rivest, R. L., "Introduction to Algorithms", The Massachusetts Institute of Technology, 1990, page 898).
  • these algorithms will also provide the order of the vertices, as they appear on the hull, in counterclockwise order. Graham's scan runs in O(n log n) time, as opposed to Jarvis's march, which runs in O(nh) time, where n is the number of points and h is the number of vertices.
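For illustration, a compact Graham's scan ("rotational sweep") in Python; the O(n log n) cost is dominated by the sort. This is a generic sketch, not code from the patent.

```python
import numpy as np

def graham_scan(points):
    """Convex hull by Graham's rotational sweep; returns the hull
    vertices in counterclockwise order."""
    pts = [tuple(p) for p in points]
    start = min(pts, key=lambda p: (p[1], p[0]))       # lowest point
    def key(p):                                        # sort by polar angle
        return (np.arctan2(p[1] - start[1], p[0] - start[0]),
                (p[0] - start[0]) ** 2 + (p[1] - start[1]) ** 2)
    rest = sorted((p for p in pts if p != start), key=key)
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    hull = [start]
    for p in rest:
        while len(hull) > 1 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()                                 # drop non-left turns
        hull.append(p)
    return hull
```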
  • RANSAC: the principle of RANSAC is to choose four points by randomization, match them with four putative corresponding points also chosen by randomization, and then discard these points and choose new ones. It is possible to modify this algorithm and include some systematic operations. Once the two sets of four points have been selected, all the possible combinations of matching between these points can be tested, which means that there are 4! = 24 matchings to test per selection, as in the sketch below.
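A hedged sketch of this systematic variant: four image candidates and four object features are drawn at random, the collinearity restriction is applied, and all 4! = 24 matchings are tried. compute_H and verify are hypothetical placeholders, standing in for the DLT solver of section C and one of the verification procedures described above.

```python
import numpy as np
from itertools import combinations, permutations

def collinear3(quad, eps=1e-6):
    """True if any three of the four points are (nearly) collinear."""
    for i, j, k in combinations(range(4), 3):
        a, b, c = quad[i], quad[j], quad[k]
        if abs((b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])) < eps:
            return True
    return False

def ransac_match(img_pts, obj_pts, compute_H, verify, n_iter=500, seed=0):
    """Randomized matching of feature candidates to object features."""
    rng = np.random.default_rng(seed)
    best = (-np.inf, None)
    for _ in range(n_iter):
        qi = img_pts[rng.choice(len(img_pts), 4, replace=False)]
        qo = obj_pts[rng.choice(len(obj_pts), 4, replace=False)]
        if collinear3(qi) or collinear3(qo):
            continue                          # collinearity restriction
        for perm in permutations(range(4)):   # all 4! = 24 matchings
            H = compute_H(qo[list(perm)], qi)
            score = verify(H)
            if score > best[0]:
                best = (score, H)
    return best[1]
```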
  • Another method of reducing the computing time is to suppose that the image is taken more or less perpendicular to the target. Thus, lines which cross each other at 90 degrees will cross each other at an angle close to 90 degrees in the image. By looking for such almost perpendicular lines, it is possible to rapidly determine lines suitable for the transformation. If no such lines are found, the system continues as outlined above.
  • the computation time may be decreased by downsampling of the image.
  • the image is divided by a grid comprising, for example, every second line of pixels in the x and y directions.
  • the presence of a line on the grid is determined by testing only pixels on the grid.
  • the presence of a line may then be verified by testing all pixels along the supposed line.
  • any area from the image can be extracted, so that it will seem as if the picture was taken from a position located right in front of it.
  • all the points from within the area of interest will be transformed to the image plane in the resolution of choice. Since the image is a discrete coordinate frame, it is made up of pixels at integer positions. The transformed points will probably not be integers, though. Therefore, a bilinear interpolation (see e.g. Heckbert, P. S., "Graphics Gems IV", Academic Press, Inc., 1994) has to be made to obtain the intensity from the image.
  • the transformed image can be recovered from either the gray-scale intensity, or all three intensity levels can be obtained from the original picture in color.
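The extraction step might look as follows in numpy, assuming H maps first-plane (target) coordinates into image coordinates as in section B. The output size is an assumption (the width of 128 pixels follows the example resolution below; the height is ours), the image is taken to be gray-scale, and bounds checking is omitted.

```python
import numpy as np

def extract_target(img, H, width=128, height=32):
    """Rectify the target area: map a regular grid in the first plane
    through H into the image and sample with bilinear interpolation."""
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    q = H @ grid                              # into image coordinates
    u, v = q[0] / q[2], q[1] / q[2]           # de-homogenize
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    fu, fv = u - u0, v - v0
    # Weighted mean of the four surrounding pixels (bilinear).
    out = (img[v0, u0]         * (1 - fu) * (1 - fv)
         + img[v0, u0 + 1]     * fu       * (1 - fv)
         + img[v0 + 1, u0]     * (1 - fu) * fv
         + img[v0 + 1, u0 + 1] * fu       * fv)
    return out.reshape(height, width)
```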
  • FIG 16 shows the target area 101 of the image 102 in FIG 6, found by the algorithms above.
  • the target area 101' has been transformed, so that e.g. OCR or barcode interpretation can follow (steps 36 and 37 of FIG 3).
  • a resolution of 128 pixels in the x direction was chosen.
  • the computer 200 may be connected, in a conventional manner, to a local area network or a global area network such as the Internet, which allows the extracted information to be forwarded to still other applications outside the hand-held device 300 and the computer 200.
  • the extracted information may be communicated through a mobile telephone, which is operatively connected to the hand-held device 300 by IrDA, Bluetooth or cable (not shown in the drawings). While several embodiments of the invention have been described above, it is pointed out that the invention is not limited to these embodiments. It is expressly stated that the different features as outlined above may be combined in other manners than explicitly described, and such combinations are included within the scope of the invention, which is only limited by the appended patent claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

A method is presented for extracting information from a target area (101) within a two-dimensional graphical object (100) having a plurality of predetermined features (23) with known characteristics in a first plane. An image (102) is read where the object (100) is located in a second plane, which is a priori unknown. A plurality of candidates (108) for the features in the second plane are identified in the image. A transformation matrix (H) for projective mapping between the second and first planes is calculated from the identified feature candidates. The target area (101) of the object is transformed from the second plane into the first plane. Finally, the target area is processed so as to extract the information.

Description

METHOD AND APPARATUS FOR EXTRACTING INFORMATION FROM A TARGET AREA WITHIN A TWO-DIMENSIONAL GRAPHICAL OBJECT IN
AN IMAGE
Field of the Invention
Generally speaking, the present invention relates to the fields of computer vision, digital image processing, object recognition, and image-producing hand-held devices. More specifically, the present invention relates to a method and an apparatus for extracting information from a target area within a two-dimensional graphical object having a plurality of predetermined features with known characteristics in a predetermined first plane.
Background of the Invention
Computer vision systems for object recognition, image registration, 3D object reconstruction, etc., are known from e.g. US-B1-6,226,396, US-B1-6,192,150 and US-B1-6,181,815. A fundamental problem in computer vision systems is determining the correspondence between two sets of feature points extracted from a pair of images of the same object from two different views. Despite large efforts, the problem is still difficult to solve automatically, and a general solution is yet to be found. Most of the difficulties lie in differences in illumination, perspective distortion, background noise, and so on. The solution will therefore have to be adapted to individual cases where all known information has to be accounted for.
In recent years, advanced computer vision systems have become available also in hand-held devices. Modern hand-held devices are provided with VGA sensors, which generate images consisting of 640x480 pixels. The high resolution of these sensors makes it possible to take pictures of objects with enough accuracy to process the images with satisfying results. However, an image taken from a hand-held device gives rise to rotations and perspective effects. Therefore, in order to extract and interpret the desired information within the image, a projective transformation is needed. Such a projective transformation requires at least four different point correspondences where no three points are collinear.
Summary of the Invention
In view of the above, an objective of the invention is to facilitate detection of a known two-dimensional object in an image so as to allow extraction of desired information which is stored in a target area within the object, even if the image is recorded in an unpredictable environment and, thus, at unknown angle, rotation and lighting conditions.
Another objective is to provide a universal detection method, which is adaptable to a variety of known objects with a minimum of adjustments. Still another objective is to provide a detection method, which is efficient in terms of computing power and memory usage and which, therefore, is particularly suitable for hand-held image-recording devices.
Generally, the above objectives are achieved by a method and an apparatus according to the attached independent patent claims.
Thus, according to the invention, a method is provided for extracting information from a target area within a two-dimensional graphical object having a plurality of predetermined features with known characteristics in a first plane. The method involves: reading an image in which said object is located in a second plane, said second plane being a priori unknown; in said image, identifying a plurality of candidates to said predetermined features in said second plane; from said identified plurality of feature candidates, calculating a transformation matrix for projective mapping between said second and first planes; transforming said target area of said object from said second plane into said first plane, and processing said target area so as to extract said information.
The apparatus according to the invention may be a hand-held device that is used for detecting and interpreting a known two-dimensional object in the form of a sign in a single image, which is recorded at unknown angle, rotation and lighting conditions. To locate the known sign in such an image, specific features of the sign are identified. The feature identification may be based on the edges of the sign. This provides for a solution, which is adaptable to most already existing signs, since the features are as general as possible and common to most signs. To find lines that are based on the edges of the sign, an edge detector based on the Gaussian kernel may be used. Once all edge points have been identified, they will be grouped together into lines. The Gaussian kernel may also be used for locating the gradient of the edge points. The corner points on the inside of the edges are then used as feature point candidates. These corner points are obtained from the intersection of the lines, which run along the edges.
In an alternative embodiment, if there are other very significant features in the sign (e.g., dots of a specific gray-scale, color, intensity or luminescence), these can be used instead of or in addition to the edges, since such significant features are easy to detect.
Once a specific number of feature candidates have been identified, an algorithm, for example based on the algorithm commonly known as RANSAC, may be executed in order to verify that the features are in the right configuration and to calculate a transformation matrix. After ensuring that the features are in the proper geometric configuration, any target area of the object can be transformed, extracted and interpreted with, for example, an OCR interpreter, a barcode interpreter or a sign identificator. Other objectives, characteristics and advantages of the present invention will appear from the following detailed disclosure, from the attached subclaims as well as from the drawings.
Brief Description of the Drawings
A preferred embodiment of the present invention will now be described in more detail, reference being made to the enclosed drawings, in which:
FIG 1 is a schematic view of an image-recording apparatus according to the invention in the form of a hand-held device,
FIG 1a is a schematic view of the image-recording apparatus of FIG 1 as well as a computer environment, in which the apparatus may be used, FIG 2 is a block diagram, which illustrates important parts of the image-recording apparatus shown in FIG 1,
FIG 3 is a flowchart diagram which illustrates the overall steps, which are carried out through the method according to the invention,
FIG 4 is a flowchart diagram which illustrates one of the steps of FIG 3 in more detail,
FIG 5 is a graph for illustrating a smoothing and derivative mask, which is applied to a recorded image during one step of the method illustrated in FIGs 3 and
4, and
FIGs 6-17 are photographs illustrating the processing of a recorded image during different steps of the method illustrated in FIGs 3 and 4.
Detailed Disclosure of an Embodiment
The rest of this specification has the following disposition:
In section A, a general overview of the method and apparatus according to an embodiment is given.
To better understand the material covered by this specification, an introduction to projective geometry in terms of homogeneous notation and camera projection matrix is described in section B. Section C provides an explanation of how to obtain the transformation matrix or homography matrix, once feature point correspondences have been identified.
An explanation of which kind of features should be chosen and why is found in Section D. Section E describes a line-detecting algorithm.
Section F provides a description of the kind of information that can be obtained from lines.
Once the feature points have been identified, the homography matrix can be computed, which is done using a RANSAC algorithm, as explained in Section G.
Section H describes how to extract the desired information from the target area.
Finally, section I addresses a few alternative embodiments.
A. General Overview
An embodiment of the invention will now be described, where the object to be recognized and read from is a sign 100, as shown at the bottom of FIG 1. It is to be emphasized, however, that the invention is not limited to signs only. The sign 100 is intended to look as ordinary as any sign. The target area 101, from which information is to be extracted and interpreted, is the area with the numbers "12345678" and is indicated by a dashed frame in FIG 1. As can be seen, the sign 100 does not hold very much information that can be used as features. As with many other signs, the sign 100 is surrounded by a frame. The edges of this frame give rise to lines. The embodiment is based on using these lines as features. However, any kind of feature can be used as long as a total of at least four feature points can be distinguished. If the sign holds any special features (e.g., dots of a specific color), then these can be used instead of or in addition to the frame, since they are usually easier to detect.
FIG 1 illustrates an image-producing hand-held device 300, which implements the apparatus according to the embodiment and by means of which the method according to the embodiment may be performed. The hand-held device 300 has a casing 1 having approximately the same shape as a conventional highlighter pen. One short side of the casing has a window 2, through which images are recorded for various image-based functions of the hand-held device.
Principally, the casing 1 contains an optics part, an electronics part and a power supply.
The optics part comprises a number of light sources 6 such as light emitting diodes, a lens system 7 and an optical image sensor 8, which constitutes the interface with the electronics part. The light emitting diodes 6 are intended to illuminate a surface of the object (sign) 100, which at each moment lies within the range of vision of the window 2. The lens system 7 is intended to project an image of the surface onto the light-sensitive sensor 8 as correctly as possible. The optical sensor 8 can consist of an area sensor, such as a CMOS sensor or a CCD sensor with a built-in A/D converter. Such sensors are commercially available. The optical sensor 8 may produce VGA images ("Video Graphics Array") in 640x480 resolution and 24-bit color depth. Hence, the optics part forms a digital camera. In this example, the power supply of the hand-held device 300 is a battery 12, but it can alternatively be a mains connection or a USB cable (not shown).
As shown in more detail in FIG 2, the electronics part comprises a processing device 20 with storage means, such as memory 21. The processing device 20 may be implemented by a commercially available microprocessor such as a CPU ("Central Processing Unit") or a DSP ("Digital Signal Processor"). Alternatively, the processing device 20 may be implemented as an ASIC ("Application-Specific Integrated Circuit"), a gate array, as discrete analog and digital components, or in any combination thereof. The storage means 21 includes various types of memory, such as a work memory (RAM) and a read-only memory (ROM). Associated programs 22 for carrying out the method according to the preferred embodiment are stored in the storage means 21. Additionally, the storage means 21 comprises a set of object feature definitions 23 and a set of inner camera parameters 24, the purpose of which will be described in more detail later. Recorded images are stored in an area 25 of the storage means 21.
As shown in FIG 1a, the hand-held device 300 may be connected to a computer 200 through a transmission link 301. The computer 200 may be an ordinary personal computer with circuits and programs, which allow communication with the hand-held device 300 through a communication interface 210. To this end, the electronics part may also comprise a transceiver 26 for transmitting information to/from the computer 200. The transceiver 26 is preferably adapted for short-range radio communication in accordance with, e.g., the Bluetooth standard in the 2.4 GHz ISM band ("Industrial, Scientific and Medical"). The transceiver can, however, alternatively be adapted for infrared communication (such as IrDA - "Infrared Data Association", as indicated by broken lines at 26').
Once the target area has been transformed into the first plane, it can be subjected to OCR or barcode interpretation, so as to extract the information searched for (steps 36 and 37 in FIG 3). To this end, the embodiment comprises at least one of an OCR module 29 or a barcode module 29'. Advantageously, such modules 29 or 29' are implemented as program code 22, which is stored in the storage means 21 and is executed by the processing device 20.
The extracted information can be used in many different ways, either internally in the hand-held device 300 or externally in the computer 200 after having been transferred across the transmission link 301.
Exemplifying but not limiting use cases include a custodian who verifies where and when he was at different locations during his night shift, by capturing images of generally identical signs 100 containing different information while walking around the protected premises; a shop assistant using the hand-held device 300 for stocktaking purposes; tracking of goods in industrial areas; or registering license plate numbers of cars and other vehicles.
The hand-held device 300 may advantageously provide other image-based services, such as scanner functionality and mouse functionality.
The scanner functionality may be used to record text. The user moves the input unit 300 across the text, which he wants to record. The optical sensor 8 records images with partially overlapping contents. The images are assembled by the processing device 20. Each character in the composite image is localized, and, using for instance neural network software in the processing device 20, its corresponding ASCII character is determined. The text converted in this way to character-coded format can be stored, in the form of a text string, in the hand-held device 300 or be transferred to the computer 200 across the link 301. The scanner functionality is described in greater detail in the Applicant's Patent Publication No. WO98/20446, which is incorporated herein by reference.
B. Introduction to projective geometry
Homogeneous coordinates
A line in a plane is represented by the equation ax + by + c = 0, where different choices of a, b and c give rise to different lines. The vector representation of this line is l = (a, b, c)^T. On the other hand, the equation (ka)x + (kb)y + kc = 0 also represents the same line for a non-zero constant k. Therefore the correspondence between lines and vectors is not one-to-one, since two vectors related by an overall scaling are considered to be equal. An equivalence class of vectors under this equivalence relationship is known as a homogeneous vector. The set of equivalence classes of vectors in R^3 − (0,0,0)^T forms the projective space P^2. The notation −(0,0,0)^T means that the vector (0,0,0)^T is excluded.
A point represented by the vector x = (x, y)^T lies on the line l = (a, b, c)^T if and only if ax + by + c = 0. This equation can be written as an inner product of two vectors, (x, y, 1)(a, b, c)^T = 0. Here, the point is represented as a 3-vector (x, y, 1)^T by adding a final coordinate of 1 to the 2-vector. Using the same terminology as above, we notice that (kx, ky, k)(a, b, c)^T = 0, which means that the vector k(x, y, 1)^T represents the same point as (x, y, 1)^T for any non-zero constant k. Hence the set of vectors k(x, y, 1)^T is considered to be the homogeneous representation of the point (x, y)^T in R^2. An arbitrary homogeneous vector representative of a point is of the form x = (x_1, x_2, x_3)^T.
This vector represents the point (x_1/x_3, x_2/x_3)^T in R^2, provided x_3 ≠ 0. A point represented as a homogeneous vector is therefore also an element of the projective space P^2. A special case of a point x = (x_1, x_2, x_3)^T in P^2 is when x_3 = 0. This does not represent a finite point in R^2. In P^2 these points are known as ideal points, or points at infinity. The set of all ideal points is represented by x = (x_1, x_2, 0)^T. This set lies on a single line known as the line at infinity, denoted by the vector l_inf = (0, 0, 1)^T. By calculation, one verifies that l_inf^T x = (0, 0, 1)(x_1, x_2, 0)^T = 0.
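A tiny numeric check of the incidence relation and the scale invariance just described, with illustrative values:

```python
import numpy as np

l = np.array([1.0, 2.0, -4.0])     # the line x + 2y - 4 = 0 as (a, b, c)^T
x = np.array([2.0, 1.0, 1.0])      # the point (2, 1)^T as (x, y, 1)^T

assert np.isclose(x @ l, 0)        # incidence: ax + by + c = 0
assert np.isclose((5 * x) @ l, 0)  # kx represents the same point
```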
Homographies or projective mappings
When points are being mapped from one plane to another, the ultimate goal is to find a single function that maps every point from the first plane uniquely to a point in the other plane.
A projectivity is an invertible mapping h from P^2 → P^2 such that x_1, x_2 and x_3 lie on the same line if and only if h(x_1), h(x_2) and h(x_3) do (see Hartley, R., and Zisserman, A., "Multiple View Geometry in Computer Vision", Cambridge University Press, 2000). A projectivity is also called a collineation, a projective transformation, or a homography.
This mapping can also be written as h(x) = Hx, where x, h(x) ∈ P^2 and H is a non-singular 3x3 matrix. H is called a homography matrix. From now on we will denote x' = h(x), which gives us:

(x'_1, x'_2, x'_3)^T = [h_1 h_2 h_3; h_4 h_5 h_6; h_7 h_8 h_9] (x_1, x_2, x_3)^T,

or just x' = Hx.
Since both x' and x are homogeneous representations of points, H may be multiplied by an arbitrary non-zero constant without altering the homography transformation. This means that H is only determined up to a scale. A matrix like this is called a homogeneous matrix. Consequently, H has only eight degrees of freedom, and the scale can be chosen such that one of its elements (e.g., h_9) can be assumed to be 1. However, if the coordinate origin is mapped to a point at infinity by H, it can be proven that h_9 = 0, and scaling H so that h_9 = 1 can therefore lead to unstable results. Another way of choosing a representation for a homography matrix is to require that ||H|| = 1.
Camera Projection Matrix
A camera is a mapping from the 3D world to the 2D image. This mapping can be written as x = PX. X is the homogeneous representation of the point in the 3D world coordinate frame, x is the corresponding homogeneous representation of the point in the 2D image coordinate frame. P is the 3x4 homogeneous camera projection matrix. For a complete derivation of P, see Hartley, R., and Zisserman, A., "Multiple View Geometry in Computer Vision", Cambridge University Press, 2000, pages 139-144, where the camera projection matrix for the basic pinhole camera is derived. P can be factorized as:
P = KR[I | -t].
In this case, K is the 3x3 calibration matrix, which contains the inner parameters of the camera. R is the 3x3 rotation matrix and t is the 3x1 translation vector. This factorization will be used below.
On planes Suppose we are only interested in mapping points from the world coordinate frame that lie in the same plane π. Since we are free to choose our world coordinate frame as we please, we can for instance define π:Z=0. This reduces the equation above. If we denote the columns in the camera projection matrix with p,. , we get:
x = PX = [p1 p2 p3 p4](X, Y, 0, 1)^T = [p1 p2 p4](X, Y, 1)^T. The mapping between the points xπ = (X, Y, 1)^T on π and their corresponding points x' in the image is thus a regular planar homography x' = Hxπ, where H = [p1 p2 p4].
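As a concrete illustration of this reduction, the toy sketch below builds P = KR[I | -t] and checks that dropping the third column gives the plane-to-image homography; the values of K, R and t are made-up assumptions of the example:

```python
import numpy as np

# Illustrative calibration, rotation, and translation (assumed values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, -5.0])

# P = K R [I | -t], the 3x4 camera projection matrix.
P = K @ R @ np.hstack([np.eye(3), -t[:, None]])

# Dropping the third column (the Z direction) gives the plane-to-image
# homography H = [p1 p2 p4] for world points with Z = 0.
H = P[:, [0, 1, 3]]

X = np.array([0.5, 1.0, 0.0, 1.0])        # world point on the plane Z = 0
x_full = P @ X
x_plane = H @ np.array([0.5, 1.0, 1.0])   # same point via the homography
print(np.allclose(x_full, x_plane))        # True
```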
Additional constraints
If we have a calibrated camera, the calibration matrix K will be known, and we can obtain even more information. Since
P = KR[I | -t],
and the calibration matrix K is invertible, we can get:

K^-1 P = R[I | -t] = [r1 r2 r3 -Rt], and similarly K^-1 H = [K^-1 h1, K^-1 h2, K^-1 h3],

where h_i denotes the i-th column of H.
The two first columns of the rotation matrix R are equivalent to the two first columns of K^-1 H. Denote these two columns with r1 and r2, and we get r_i = K^-1 h_i for i = 1, 2. Since the rotation matrix is orthogonal, r1 and r2 should be orthogonal and of unit length. However, as we have mentioned before, H is only determined up to scale, which means that r1 and r2 will not be normalized, but they should still be of the same length.
Conclusion: With a calibrated camera we obtain two additional constraints on H:

r1^T r2 = 0 and |r1| = |r2|,

where r_i = K^-1 h_i.
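These two constraints are cheap to evaluate. The following hypothetical helper, which is not from the original text, returns the two residuals; values near zero support a putative H:

```python
import numpy as np

def calibration_residuals(H, K):
    """With a calibrated camera, r1 = K^-1 h1 and r2 = K^-1 h2 should be
    orthogonal and of equal length (H being known only up to scale)."""
    r1 = np.linalg.solve(K, H[:, 0])
    r2 = np.linalg.solve(K, H[:, 1])
    return r1 @ r2, np.linalg.norm(r1) - np.linalg.norm(r2)
```

How close to zero the residuals must be is a threshold choice, just as for the other verification procedures described below.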
C. Solving for the homography matrix H
The first thing to consider, when solving the equation for the homography matrix H, is how many corresponding points x' ↔ x are needed. As we mentioned in section B, H has eight degrees of freedom. Since we are working in 2D, every point is constrained in two directions, and hence every point correspondence contributes two degrees of freedom. This means that a lower bound of four corresponding points in the two different coordinate frames is needed to compute the homography matrix H. This section will show different ways of solving the equation for H.
The Direct Linear Transformation (DLT) algorithm
For every point correspondence, we have the equation x'_i = Hx_i. Note that since we are working with homogeneous vectors, x'_i and Hx_i may differ up to scale. The equation can also be expressed as a vector cross product, x'_i × Hx_i = 0. This form is easier to work with, since the scale factor is removed. If we denote the j-th row of H with h_j^T, then Hx_i can be expressed as:

Hx_i = (h1^T x_i, h2^T x_i, h3^T x_i)^T.
Using the same terminology as in section B, and writing x'_i = (x'_i, y'_i, w'_i)^T, the cross product above can be expressed as:

x'_i × Hx_i = (y'_i h3^T x_i - w'_i h2^T x_i, w'_i h1^T x_i - x'_i h3^T x_i, x'_i h2^T x_i - y'_i h1^T x_i)^T = 0.

Since h_j^T x_i = x_i^T h_j for j = 1..3, we can rearrange the equation and obtain:

[     0^T      -w'_i x_i^T    y'_i x_i^T ] ( h1 )
[  w'_i x_i^T      0^T       -x'_i x_i^T ] ( h2 ) = 0.
[ -y'_i x_i^T   x'_i x_i^T       0^T     ] ( h3 )
We are now facing three linear equations with eight unknown elements (the nine elements in H minus one because of the scale factor) . However, since the third row is linearly dependent on the other two rows, only two of the equations provide us with useful information. Therefore every point correspondence gives us two equations. If we use four point correspondences we will get eight equations with eight unknown elements. This system can now be solved using Gaussian elimination. Another way of solving the system is by using SVD, as will be described below.
Singular Value Decomposition (SVD)

In real life, the positions of the points will usually not be exact, because of noise in the image. The solution for H will therefore be inexact. To get an H that is more accurate, we can use more than four point correspondences and then solve an over-determined system. If, on the other hand, the points are exact, the system will give rise to equations that are linearly dependent on each other, and we will once again end up with eight linearly independent equations.
If we have n point correspondences, we can denote the set of equations with Ah = 0, where A is a 2n x 9 matrix and h = (h1^T, h2^T, h3^T)^T is the 9-vector of the entries of H.
One way of solving this system is by minimizing the Euclidean norm ||Ah|| instead, subject to the constraint ||h|| = k, where k is a non-zero constant. This last constraint is imposed because H is homogeneous. Minimization of the norm ||Ah|| is then the same as solving the optimization problem:

min ||Ah|| subject to ||h|| = 1.
A solution to this problem can be obtained by SVD. A detailed description of SVD is given in Golub, G. H., and Van Loan, C. F., "Matrix Computations", 3rd ed., The Johns Hopkins University Press, Baltimore, MD, 1996.
Using SVD, the matrix A can be decomposed into:

A = USV^T,

where the last column of V (the right singular vector corresponding to the smallest singular value) gives the solution for h.
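A compact implementation of the DLT with the SVD solution might look as follows. This is a numpy sketch; the example point sets and their ordering are assumptions of the illustration, not of the method:

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H such that dst ~ H src, from n >= 4 point
    correspondences given as n x 2 arrays of inhomogeneous points."""
    A = []
    for (x, y), (xp, yp) in zip(src, dst):
        X = [x, y, 1.0]
        # Two independent rows of x' x (H x) = 0 per correspondence
        # (the third row is linearly dependent on these two; w' = 1 here).
        A.append([0.0, 0.0, 0.0] + [-v for v in X] + [yp * v for v in X])
        A.append(X + [0.0, 0.0, 0.0] + [-xp * v for v in X])
    A = np.asarray(A)
    # min ||A h|| subject to ||h|| = 1: the right singular vector of A
    # belonging to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)

src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = np.array([[10, 10], [110, 20], [100, 120], [5, 100]], dtype=float)
H = dlt_homography(src, dst)
p = H @ np.array([1.0, 1.0, 1.0])
print(p[:2] / p[2])   # approximately (100, 120)
```

With more than four correspondences the same function solves the over-determined system in the least-squares sense described above.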
Restrictions on the corresponding points
If three points, out of the four point correspondences, are collinear, they will give rise to an under-determined system (see Hartley, R., and Zissermann, A., "Multiple View Geometry in computer vision", Cambridge University Press, 2000, page 74), and the solution from the SVD will be degenerate. We are therefore restricted, when picking our feature points, not to choose collinear points.
D. Feature restrictions
An important question is how to find features in objects. Since the results preferably should be applicable to already existing signs, it is desirable to find features that are in common use and easy to detect in an image. A good feature should fulfill as many of the following criteria as possible:
• Be easy to detect,
• Be easy to distinguish,
• Be located in a useful configuration.
In this section, a few different kinds of features that can be used to compute the homography matrix H are identified. The features should somehow be associated with points, since point correspondences are used to compute H. Feature finding programs, where the user can just change a few constants stored in the object feature area (23), are therefore desirable. Lines can also serve as features, since a homography can be computed from line correspondences in much the same way as from point correspondences (see Hartley, R., and Zissermann, A., "Multiple View Geometry in computer vision", Cambridge University Press, 2000, page 15).
It is even possible to mix feature points and lines when computing the homography matrix. There are, however, some more constraints involved when doing this, since points and lines are dependent on one another. As has been shown in section C, four points, and similarly four lines, hold eight degrees of freedom. Three lines and one point are geometrically equivalent to four points, since three non-concurrent lines define a triangle, and the vertices of the triangle uniquely define three points. Similarly, three non-collinear points and one line are equivalent to four lines, which have eight degrees of freedom. However, two points and two lines cannot be used to compute the homography matrix. The reason is that a total of five lines and five points can be determined uniquely from the two points and the two lines. The problem, however, is that four out of the five lines are concurrent, and four out of the five points are collinear. These two systems are therefore degenerate and cannot be used to compute the homography matrix.
Choose corner points
In the preferred embodiment, the equation of the lines is not used when computing the homography matrix.
Instead, the intersections of the lines are computed, and thus only points are used in the calculations. One of the reasons for doing this is the proportions of the coordinates (a, b and c) of the lines. In an image of VGA resolution, the values of the coordinates of a normalized line (see next section) will be

0 ≤ |a|, |b| ≤ 1, but
0 ≤ |c| ≤ √(640² + 480²) = 800.

This means that the c coordinate is not in proportion with the a and b coordinates. The effect of this is that a slight variation of the gradient of the line (i.e., the a and b coordinates) might result in a large variation of the component c. This makes it hard to verify line correspondences.
The problem with these disproportionate coordinates does not disappear when the intersection points of the lines are used instead of the parameters of the lines; it has merely been moved. It is rather a way of normalizing the parameters, so that they can easily be compared with each other in the verification procedure.
E. Line Detection

With reference to FIGs 4 and 5, details about how to determine feature point candidates (i.e., step 33 in FIG 3) will now be given. Steps 41 and 42 of FIG 4 are described in this section, whereas step 43 will be described in the next section. Edges are defined as points where the gradients of the image are large in terms of gray-scale, color, intensity or luminescence. Once all the edge points in an image have been obtained, they can be analyzed to see how many of them lie on a straight line. These points can then be used as the foundation of a line.
Edge points extraction
There are several different ways of extracting edge points from the image. Most of them are based on thresholding, region growing, and region splitting and merging (see Gonzalez, R. C., and Woods, R. E., "Digital Image Processing", Addison Wesley, Reading, MA, 1993, page 414). In practice, it is common to run a mask through the image. An edge is defined as the intersection of two different homogeneous regions. Therefore, the masks are usually based on the computation of a local derivative operation. Digital images generally contain an undetermined amount of noise as a result of sampling. Therefore, a smoothing mask is preferably applied before the derivative mask to reduce the noise. A smoothing mask which gives very nice results is the Gaussian kernel Gσ:

Gσ(x) = 1/(√(2π)σ) · e^(-x²/2σ²),

where σ is the standard deviation (or the width of the kernel) and x is the distance from the point under investigation.
Instead of first running a smoothing mask over the image and then taking its derivative, it is advantageous to simply take the convolution of the image with the derivative of the Gaussian kernel:

d/dx Gσ(x) = -x/(√(2π)σ³) · e^(-x²/2σ²).

FIG 5 shows d/dx Gσ(x) for σ = 1.2. Since images are 2D, the filter is used in both the x and the y directions. To distinguish the edge points n, the filtered points f(n), i.e. the result of the convolution of the image with the derivative of the Gaussian kernel, are selected where |f(n)| > thres, where thres is a chosen threshold. In FIG 7, all the edge points detected from an original image 102 (FIG 6) are marked with a "+" sign, as indicated by reference numeral 103. A Gaussian kernel with σ = 1.2 and thres = 5 has been used here.
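The edge extraction step can be sketched as follows, using numpy and scipy.ndimage; the kernel radius and the parameter defaults are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import convolve1d

def gaussian_kernels(sigma, radius=None):
    """Sampled 1D Gaussian G_sigma and its derivative d/dx G_sigma."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))   # assumed cut-off
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return g, -x / sigma**2 * g

def edge_points(image, sigma=1.2, thres=5.0):
    """Return edge point indices and the gradient components fx, fy."""
    img = image.astype(float)
    g, dg = gaussian_kernels(sigma)
    # Separable 2D filtering: derivative along one axis, Gaussian
    # smoothing along the other, giving the image gradient.
    fx = convolve1d(convolve1d(img, g, axis=0), dg, axis=1)
    fy = convolve1d(convolve1d(img, g, axis=1), dg, axis=0)
    magnitude = np.hypot(fx, fy)
    return np.argwhere(magnitude > thres), fx, fy
```

The returned gradient components fx and fy are reused below, where the gradient direction supplies the normal (a, b) of a candidate line.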
Extraction of line information
Once all the edge points have been obtained, it is possible to find the equation of the line they might be a part of. The gradient of a point in the image is a vector that points in the direction in which the intensity of the image at the current point changes the most. This vector has the same direction as the normal of the possible line. Therefore, the gradient of all edge points has to be found. To extract the x coefficient of the edge point, the derivative of the Gaussian kernel in 2D,

∂/∂x Gσ(x, y) = -x/(2πσ⁴) · e^(-(x²+y²)/2σ²),

is applied to the image around the edge points. In this mask, (x, y) is the distance from the edge point. Typically a range of a few σ is used, where σ is the standard deviation.
Similarly, the y coefficient can be extracted. As mentioned above, the normal of the line has the same direction as the gradient. Hence, the a and b coefficients of the line have been obtained. The last coordinate c can easily be computed, since ax + by + c = 0. Preferably, the equation of the line is normalized, so that the normal of the line has length 1:

(a, b, c) → (a, b, c)/√(a² + b²).

This means that the c coordinate will have the same value as the distance from the line to the origin.
Cluster edge points into lines

To find out if edge points are parts of a line, constraints on the points have to be applied. There are two major constraints:

• The points should have the same gradient.
• The proposed line should run through the points.

Since the image will be blurred, these constraints need only be fulfilled within a certain threshold. The threshold will of course depend on the circumstances under which the picture was taken, the resolution of the image, and the object in the picture. Since all the data for the points is known, all that has to be done is to group the points together and adapt lines to them (step 42 in FIG 4). The following algorithm is used according to the preferred embodiment. For a certain number of loops:
Step 1: Randomly select a point p = (x, y, 1)^T, with the line data l = (a, b, c)^T;
Step 2: Find all other points p_n = (x_n, y_n, 1)^T, with the line data l_n = (a_n, b_n, c_n)^T, which lie on the same line, using: |p_n^T l| < thres1;
Step 3: See if these points have the same gradient as p, using: (a_n, b_n)(a, b)^T > (1 - thres2);
Step 4: From all the points p_n that satisfy the conditions in step 2 and step 3, adapt a new line, l = (a, b, c)^T, using SVD. Repeat steps 2-3;
Step 5: Repeat steps 2-4 twice;
Step 6: If at least a certain number of points satisfy these conditions, define these points to be a line;
End. Repeat with the remaining points.
This algorithm selects a point at random. The equation of the line that this point might be a part of is already known. Now, the algorithm finds all other points that have the same gradient and lie on the same line as the first point. Both these checks have to be carried out within a certain threshold. In step 2, the algorithm checks if the point is closer than the distance thres1 to the line. In step 3, the algorithm checks if the gradients of the two points are the same. If they are, the scalar product of the gradients should be 1. Once again, because of inaccuracy, it is sufficient if the product is larger than (1 - thres2). Since the edge points are not exactly located, and since the gradients will not have exact values, a new line is computed in step 4. This line is computed, using SVD, from all the points which satisfy the conditions in step 2 and step 3, in the following way. The points are supposed to satisfy the condition (x, y, 1)(a, b, c)^T = 0. Therefore, an n x 3 matrix A consisting of these points can be composed, and the optimization problem

min ||Al|| subject to ||l|| = 1

can be solved using SVD in similarity with section C. To obtain better accuracy, step 2 and step 3 are repeated. To increase the accuracy even further, one more recursion takes place. The values of the threshold numbers will have to be decided depending on the actual application, as is readily realized by a man skilled in the art.
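A sketch of this clustering loop is given below. The parameter names thres1 and thres2 follow the steps above; seeding the initial line from the point's gradient, and all defaults, are assumptions of this example:

```python
import numpy as np

def fit_line_svd(pts_h):
    """Fit l = (a, b, c)^T by min ||A l||, ||l|| = 1, where the rows of A
    are homogeneous points (x, y, 1); rescale so that a^2 + b^2 = 1."""
    _, _, Vt = np.linalg.svd(pts_h)
    l = Vt[-1]
    return l / np.hypot(l[0], l[1])

def cluster_lines(pts_h, normals, n_loops=200, thres1=2.0, thres2=0.1,
                  min_points=30, seed=0):
    """pts_h: n x 3 homogeneous edge points; normals: n x 2 unit gradients."""
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(pts_h))
    lines = []
    for _ in range(n_loops):
        if len(remaining) < min_points:
            break
        i = rng.choice(remaining)                       # step 1
        a, b = normals[i]
        line = np.array([a, b, -(a * pts_h[i, 0] + b * pts_h[i, 1])])
        members = np.array([i])
        for _ in range(3):                              # steps 2-5
            on_line = np.abs(pts_h[remaining] @ line) < thres1
            same_grad = normals[remaining] @ line[:2] > 1.0 - thres2
            members = remaining[on_line & same_grad]
            if len(members) < 3:
                break
            line = fit_line_svd(pts_h[members])         # step 4
            if line[:2] @ normals[i] < 0:               # keep the normal
                line = -line                            # along the gradient
        if len(members) >= min_points:                  # step 6
            lines.append(line)
            remaining = np.setdiff1d(remaining, members)
    return lines
```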
FIG 8 shows the lines 104 that were found, and the edge points 103 that were used in the example above.
If the used edge points are left out, it is easier to see how good an approximation the estimated lines are; see FIG 9.
F. Information gained from lines
To compute the homography matrix H, four corresponding points, from the two coordinate frames, are needed. Since many lines are available, additional information can be provided.
Cross points
Common features in signs are corners. However, there are usually a lot of corners in a sign that are of no interest; for instance, if there is text in the sign, the characters will give rise to a lot of corners of no interest. Now, when the lines that are formed by edges have been obtained, the corner points of the edges can easily be computed (step 43 of FIG 4) by taking the cross product of two lines:

x_c = l_i × l_j.

The vector x_c will be the homogeneous representation of the point in which the lines l_i and l_j intersect. If the third coordinate of x_c = 0, then x_c is a point at infinity, and the lines l_i and l_j are parallel.
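In code, the intersection is a single cross product; the following minimal sketch (with an assumed tolerance eps) also handles the parallel case:

```python
import numpy as np

def cross_point(l1, l2, eps=1e-9):
    """Intersection of two lines as the cross product of their 3-vectors.
    Returns None for (near-)parallel lines, whose intersection is ideal."""
    xc = np.cross(l1, l2)
    if abs(xc[2]) < eps:
        return None           # point at infinity: the lines are parallel
    return xc[:2] / xc[2]     # inhomogeneous image coordinates

l1 = np.array([1.0, 0.0, -2.0])   # the vertical line x = 2
l2 = np.array([0.0, 1.0, -3.0])   # the horizontal line y = 3
print(cross_point(l1, l2))         # [2. 3.]
```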
These cross points, combined with the information from the lines, will provide even more information. A verification whether the lines actually have edge points at the cross points, or whether the intersection is in the extension of the lines, can be applied. This information can then be compared with the feature points searched for, since information is known as regards whether or not they are supposed to have edge points at the cross points. In this way, cross points that are of no interest can be eliminated. Points that are of no interest can be of different origin. One possibility is that they are cross points that are supposed to be there, but are not used in this particular case. Another possibility is that they are generated by lines, which are not supposed to exist but which nevertheless have originated because of disturbing elements in the image.
In FIG 10, all cross points are marked with a "+" sign, as seen at 105. The actual corners of the frame are marked with a "*" sign, as seen at 106.
Parallel lines
Another common feature in signs is frames, which give rise to parallel lines. If only lines originating from frames are of interest, then all lines can be discarded that do not have a parallel counterpart, i.e. a line with a normal in the opposite direction close to itself. Since the image is transformed, parallel lines in the 3D world scene might not appear to be parallel in the 2D image scene. However, lines which are close to each other will still be parallel within a certain margin of error. The result of an algorithm that finds parallel lines 107, 107' is shown in FIG 11. When all the sets of parallel lines have been found, it is possible to figure out which lines are candidates for being a line corresponding to the inside edge of a frame. If the cross products of all these lines are computed, a set of points that are putative candidates for inside corner points of a frame is obtained, as marked by "*" characters at 108 in FIG 12.
Consecutive edge points

By coincidence, it is possible that the line-detecting algorithm produces a line that is actually made up from a lot of small edges that lie on a straight line. For example, edges of characters written on a straight line may give rise to such a line. If only lines consisting of consecutive edge points are of interest, it is desirable to eliminate these other lines. One way of doing this is to take the mean point of all the edge points in the line. From this point, extrapolate a few more points along the line. Then check the differences in intensity on both sides of the line at the chosen points. If the differences in intensity at the points do not exceed a certain threshold, the line is not constructed from consecutive edge points. A sketch of such a check is given below.
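One possible form of this check, under the stated assumptions (grayscale image, normalized line so that a² + b² = 1, and illustrative sampling parameters), is the following:

```python
import numpy as np

def is_consecutive_edge_line(image, line, member_pts, n_samples=7,
                             offset=2.0, thres=20.0):
    """Sample points along the line around the members' mean and compare
    the intensity on both sides of the line at each sample point."""
    a, b, _ = line
    direction = np.array([-b, a])       # unit vector along the line
    normal = np.array([a, b])           # unit normal of the line
    mean = member_pts[:, :2].mean(axis=0)
    img = image.astype(float)
    h, w = img.shape
    for s in np.linspace(-n_samples, n_samples, 2 * n_samples + 1):
        p = mean + s * direction
        p1 = np.round(p + offset * normal).astype(int)   # one side
        p2 = np.round(p - offset * normal).astype(int)   # other side
        if not (0 <= p1[0] < w and 0 <= p1[1] < h
                and 0 <= p2[0] < w and 0 <= p2[1] < h):
            return False
        if abs(img[p1[1], p1[0]] - img[p2[1], p2[0]]) < thres:
            return False   # no intensity step here: a gap in the edge
    return True
```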
With this algorithm, not only will lines that originate from non-consecutive edge points be eliminated; the algorithm will also eliminate thin lines in the image. This is a positive effect if only edge lines originating from thick frames are used as features. In FIG 13, the same algorithms as used earlier have been applied to the image 102 displayed in FIG 6. The only difference in the algorithms is that no check has been carried out as regards whether the lines consist of consecutive edge points along edges.
FIG 14 shows an enlargement of the result of the algorithm, which checks for consecutive edge points, applied to the line 109 at the bottom of the numbers "12345678". The algorithm gave a negative result, in terms of whether the edge points were consecutive or not. FIG 15 is an enlargement of the same algorithm applied to the line 110 at the bottom of the frame. Here, the algorithm gave a positive result, the edge points being consecutive.
G. Computing the homography matrix H
Once the feature candidates in the image have been obtained, they must be matched to features from the original sign, which have known coordinates. If four feature candidates have been found, their coordinates can be matched with the corresponding object feature point coordinates stored in the area 23 of the storage means 21, and the homography matrix H can be computed. Since more candidates than the intended features will probably be found, a verification procedure has to be carried out. This procedure must verify that the selected feature point correspondences have been matched correctly. Thus, if there are a lot of candidates for possible feature points, the homography matrix should be computed many times and verified every time, to check whether it is the proper point correspondence or not.
Advantageously, this matching procedure is optimized by using the RANSAC algorithm of Fischler and Bolles (see Fischler, M. A., and Bolles, R. C., "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography", Comm. Assoc. Comp. Mach., 24(6):381-395, 1981).
RANSAC
The RANdom SAmple Consensus algorithm (RANSAC) is an estimation algorithm that is able to work with very large sets of putative correspondences. The most exhaustive way to determine the homography matrix H would be to compute H for all possible combinations, verify every solution, and then select the hypothesis with the best outcome from the verification. This, however, quickly becomes too time consuming; RANSAC instead repeatedly draws random samples of correspondences, as described below.
A 5th feature
The most common way to verify H is by using more feature points. In this case, even more than the four feature points from the original object have to be known. The remaining points from the original object can then be transformed into the image coordinate system. Thereafter, a verification procedure can be performed to check whether the points are found in the image. The more extra features that are found, the higher the likelihood that the correct set of point correspondences has been picked.
Inner Parameters of Camera
If the camera is calibrated, it is possible to verify the putative homography matrix against the inner camera parameters 24 stored in the storage means 21 (see the discussion in earlier sections). This puts even more constraints on the chosen feature points. If the points represent the corners of a rectangle, then the first and second columns, r1 and r2, will give rise to the same values if the points are matched correctly, up to an error of rotation of the rectangle of 180 degrees. This is obvious, since if a rectangle is rotated 180 degrees, it will give rise to exactly the same rectangle. Similarly, a square can be rotated 90, 180 or 270 degrees and still give rise to exactly the same square. In all these cases, r1 and r2 will still be orthogonal.
Although this verification procedure might give a rotation error, if the corners of a rectangle are used as feature points, it is still very useful, since rectangles are common features. The rotation error can easily be checked later on.
Verification errors

Depending on how the feature points are chosen, errors may still occur when the feature points are verified. As mentioned above, the homography matrix is a homogeneous matrix and is only determined up to a scale. If the object has points in exactly the same configuration as the feature-and-verification points, except rotated and/or scaled, the verification procedure will give rise to exactly the same values as if the correct point correspondences had been found. Therefore it is important to choose feature points that are as distinct as possible.
Restrictions on RANSAC
RANSAC is based on randomization. If even more information is available, then obviously this should be used to optimize the RANSAC algorithm. Some restrictions that might be added are the following.
Stop if the solution is found

Instead of repeating the calculations in the procedure a specific number of times, it is possible to stop if the verification indicates that a good solution has been found. To determine if a solution is good or not, a criterion can be set up that if at least a certain number of feature points have been found in the verification procedure, then this must be the correct homography matrix. If the inner parameters of the camera are used in the verification procedure, a stop can be made if r1 and r2 are very close to having the same length and being orthogonal.
Collinear feature points

The constraint that only sets of feature points in which no three points are collinear may be used can be included in the RANSAC algorithm. After the four points have been picked by randomization, it is possible to check if three of them are collinear, before proceeding with computing the homography matrix. Combined with the next two restrictions, this check is very time efficient.

Convex Hull
The convex hull of an arbitrary set S of points is the smallest convex polygon P_ch for which each point in S is either on the boundary of P_ch or in its interior. Two of the most common algorithms used to compute the convex hull are Graham's scan and Jarvis's march. Both these algorithms use a technique called "rotational sweep" (see Cormen, T. H., Leiserson, C. E., and Rivest, R. L., "Introduction to Algorithms", The MIT Press, 1990, page 898). When computing the convex hull, these algorithms will also provide the order of the vertices, as they appear on the hull, in counterclockwise order. Graham's scan runs in O(n lg n) time, as opposed to Jarvis's march, which runs in O(nh) time, where n is the number of points and h is the number of vertices.
Since projective mappings are line preserving, they must also preserve the convex hull. In a set of four points where no three points are collinear, the convex hull will consist of either three or four of the points. This means that in two sets of corresponding points, the convex hulls will both consist of either three or four points. A check for this, after the two sets of four points have been chosen, can be included in the RANSAC algorithm.
Systematic search
The principle of RANSAC is to choose four points by randomization, match them with four putative corresponding points also chosen by randomization, and then discard these points and choose new ones. It is possible to modify this algorithm and include some systematic operations. Once the two sets of four points have been selected, all the possible combinations of matching between these points can be tested. This means that there are
4! = 24 different combinations to try. If the restrictions above are included, this number can be reduced considerably. First of all, make sure that no three of the four points in each set are collinear. Secondly, check if both sets have the same number of points in their convex hulls. If they do, the order of the points on the hull will also be obtained, and the points can then only be matched with each other in either three or four different ways, depending on how many points the hulls consist of.
Thus, out of 24 possible combinations, only 0, 3 or 4 putative point correspondences remain. Of course, computing the convex hull and making sure that no three points are collinear takes time, but it is insignificant compared to computing the homography matrix 24 times. A sketch combining these restrictions is given below.
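The sketch below combines the collinearity check, the convex-hull restriction and the cyclic matchings for the four-point-hull (convex quadrilateral) case, reusing dlt_homography from the sketch in section C. The scoring callback verify stands in for any of the verification procedures above; all names and defaults are assumptions of the example:

```python
import numpy as np
from itertools import combinations
from scipy.spatial import ConvexHull

def has_collinear_triple(pts, tol=1e-6):
    """True if any three of the 2D points are (nearly) collinear."""
    for a, b, c in combinations(pts, 3):
        area2 = (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])
        if abs(area2) < tol:
            return True
    return False

def ransac_match(model_pts, cand_pts, verify, n_iter=500, seed=0):
    """Match 4 model points against random 4-subsets of the candidates.
    Only the four-point-hull (convex quadrilateral) case is handled."""
    rng = np.random.default_rng(seed)
    if has_collinear_triple(model_pts):
        return None
    model_hull = ConvexHull(model_pts).vertices   # counterclockwise order
    if len(model_hull) != 4:
        return None
    best_H, best_score = None, -np.inf
    for _ in range(n_iter):
        idx = rng.choice(len(cand_pts), size=4, replace=False)
        quad = cand_pts[idx]
        if has_collinear_triple(quad):
            continue                               # degenerate sample
        hull = ConvexHull(quad).vertices
        if len(hull) != 4:
            continue                               # hull sizes must agree
        # Projective maps preserve the hull order, so only the 4 cyclic
        # rotations of the hull need to be tried instead of all 4! = 24.
        for shift in range(4):
            order = np.roll(hull, shift)
            H = dlt_homography(model_pts[model_hull], quad[order])
            score = verify(H)
            if score > best_score:
                best_H, best_score = H, score
    return best_H
```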
Another method of reducing the computing time is to suppose that the image is taken more or less perpendicular to the target. Thus, lines which cross each other at 90 degrees will cross each other at an angle close to 90 degrees in the image. By looking for such almost perpendicular lines, it is possible to rapidly determine lines suitable for the transformation. If no such lines are found, the system continues as outlined above .
Finding and extracting lines from an image often consumes much time and processing power. For the purpose of the present invention, the computation time may be decreased by downsampling the image. Thus, the image is divided by a grid comprising, for example, every second line of pixels in the x and y directions. The presence of a line on the grid is determined by testing only pixels on the grid. The presence of a line may then be verified by testing all pixels along the supposed line.
H. Extraction of the target area
Once the homography matrix is known, any area of the image can be extracted so that it will seem as if the picture was taken from a position right in front of it. To do this extraction, all the points from within the area of interest are transformed to the image plane in the resolution of choice. Since the image is a discrete coordinate frame, it is made up of pixels at integer positions. The transformed points will generally not fall on integer positions, though. Therefore, a bilinear interpolation (see e.g. Heckbert, P. S., "Graphics Gems IV", Academic Press, Inc., 1994) has to be made to obtain the intensity from the image. The transformed image can be recovered from either the gray-scale intensity, or all three intensity levels can be obtained from the original picture in color.
FIG 16 shows the target area 101 of the image 102 in FIG 6, found by the algorithms above.
In FIG 17, the target area 101' has been transformed, so that e.g. OCR or barcode interpretation can follow (steps 36 and 37 of FIG 3). In this example, a resolution of 128 pixels in the x direction was chosen.
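A rectification sketch along these lines could read as follows; grayscale input, the output size, and the clamping of samples to the image border are simplifying assumptions of this example:

```python
import numpy as np

def extract_target_area(image, H, width, height):
    """Rectify a target area: map each pixel of the output grid through H
    into the source image and sample it bilinearly. H is assumed to map
    output-grid coordinates (the rectified target area at the chosen
    resolution) to source image coordinates."""
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    mapped = H @ pts
    u = mapped[0] / mapped[2]
    v = mapped[1] / mapped[2]
    # Integer and fractional parts; samples outside the image are clamped.
    u0 = np.clip(np.floor(u).astype(int), 0, image.shape[1] - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, image.shape[0] - 2)
    fu, fv = u - u0, v - v0
    img = image.astype(float)
    out = (img[v0, u0] * (1 - fu) * (1 - fv)
           + img[v0, u0 + 1] * fu * (1 - fv)
           + img[v0 + 1, u0] * (1 - fu) * fv
           + img[v0 + 1, u0 + 1] * fu * fv)
    return out.reshape(height, width)
```

For the example of FIG 17, width would be the chosen 128 pixels; for a color image, the same interpolation would be applied per channel.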
I. Alternative embodiments

The invention has been described above with reference to an embodiment. However, other embodiments than the one disclosed above are equally possible within the scope of the invention, as defined by the appended patent claims. In particular, it is observed that the invention may be embodied in other portable devices than the one described above, for instance mobile telephones, personal digital assistants (PDA), palm-top computers, organizers, communicators, etc.
Moreover, it is possible, within the scope of the invention, to perform some of the steps of the inventive method in the external computer 200 rather than in the hand-held device 300 itself. For instance, it is possible to transfer the transformed target area 101 as a digital image (JPEG, GIF, TIFF, BMP, EPS, etc) across the link 301 to the computer 200, which then will perform the actual processing of the transformed target area 101 so as to extract the desired information (OCR text, barcode, etc . ) .
Of course, the computer 200 may be connected, in a conventional manner, to a local area network or a global area network such as the Internet, which allows the extracted information to be forwarded to still other applications outside the hand-held device 300 and the computer 200. Alternatively, the extracted information may be communicated through a mobile telephone which is operatively connected to the hand-held device 300 by IrDA, Bluetooth or cable (not shown in the drawings). While several embodiments of the invention have been described above, it is pointed out that the invention is not limited to these embodiments. It is expressly stated that the different features as outlined above may be combined in other manners than explicitly described, and such combinations are included within the scope of the invention, which is only limited by the appended patent claims.

Claims

1. A method of extracting information from a target area (101) within a two-dimensional graphical object (100) having a plurality of predetermined features (23) with known characteristics in a first plane, characterized by the steps of: reading an image (102) in which said object (100) is located in a second plane, said second plane being a priori unknown; in said image, identifying a plurality of candidates (108) to said predetermined features (23) in said second plane; from said identified plurality of feature candidates, calculating a transformation matrix (H) for projective mapping between said second and first planes; transforming said target area (101) of said object from said second plane into said first plane; and processing said target area so as to extract said information.
2. A method as in claim 1, wherein said plurality of predetermined features (23) are read from memory (21) before said plurality of feature candidates (108) are identified.
3. A method as in claim 1 or 2 , wherein said plurality of predetermined features (23) includes at least four features.
4. A method as in claim 3 , wherein said at least four predetermined features are four points, four lines, three points and one line, or one point and three lines.
5. A method as in claim 3, said at least four predetermined features being four points, wherein said plurality of feature candidates (108) are identified by: locating edge points (103) as points in said image (102) with large gradients; clustering said edge points into lines (104) ; and determining said plurality of feature candidates as points of intersection (105, 106, 108) between any two of said lines.
6. A method as in claim 5, wherein said points of intersection (105, 106, 108) are at four corner points of a frame in said two-dimensional graphical object (100).
7. A method as in any preceding claim, wherein said transformation matrix (H) is calculated by: among said identified plurality of feature candidates, randomly selecting as many feature candidates as in said plurality of predetermined features (23); computing a hypothetical transformation matrix for said randomly selected candidates and said plurality of predetermined features; verifying the hypothetical transformation matrix; repeating the above steps a number of times; and selecting as said transformation matrix (H) the particular hypothetical transformation matrix with the best outcome from the verifying step.
8. A method as in claim 6 or 7 , wherein the hypothetical transformation matrix is verified by means of at least one additional predetermined feature.
9. A method as in any of claims 6-8, wherein said plurality of predetermined features (23) comprises at least four points and wherein said step of randomly selecting is limited to a set of four feature candidates that does not include three collinear points.
10. A method as in claim 9, wherein said step of randomly selecting is further limited by calculating the convex hull of said feature candidates.
11. A method as in any preceding claim, wherein said plurality of predetermined features (23) includes at least one point having a gray-scale, color, intensity or luminescence value which is distinctly different from surrounding points in said two-dimensional graphical object (100) .
12. A method as in any preceding claim, wherein said two-dimensional graphical object (100) is a sign.
13. A method as in any preceding claim, wherein said step of processing involves optical character recognition (OCR) of said target area (101) .
14. A method as in any preceding claim, wherein said step of processing involves barcode interpretation of said target area (101) .
15. A method as in any preceding claim, wherein said step of processing involves transfer of said target area (101) to an external computer (200).
16. A method as in any preceding claim, wherein said first plane is the image plane of said read image (102) .
17. A method as in any of claims 1-15, wherein said first plane is the image plane of a previously read image .
18. A method as in claim 17, wherein said plurality of predetermined features (23) are obtained by direct measurement at said previously read image.
19. A computer program product directly loadable into an internal memory (21) of a processing device (20) , the computer program product comprising program code (22) for performing the steps of any of claims 1-18 when executed by said processing device.
20. A computer program product as defined in claim 19, embodied on a computer-readable medium (21).
21. A hand-held image-producing apparatus (300) having storage means (21) and a processing device (20) , the storage means containing program code (22) for performing the steps of any of claims 1-18 when executed by said processing device.
22. An apparatus for extracting information from a target area (101) within a two-dimensional graphical object (100) having a plurality of predetermined features (23) with known characteristics in a first plane, the apparatus comprising an image sensor (8), a processing device (20) and storage means (21), characterized by a first area (25) in said storage means (21), said first area being adapted to store an image (102), as recorded by said image sensor (8), in which said object (100) is located in a second plane, said second plane being a priori unknown; and a second area (23) in said storage means (21), said second area being adapted to store said plurality of predetermined features; wherein: said processing device (20) is adapted to read said image (102) from said first area (25); read said plurality of predetermined features from said second area (23); identify, in said image, a plurality of candidates to said features in said second plane; calculate, from said identified feature candidates, a transformation matrix (H) for projective mapping between said second and first planes; transform said target area (101) of said object from said second plane into said first plane; and, after transformation, extract said information from said target area.
23. An apparatus according to claim 22, further comprising an optical character recognition (OCR) module (29) adapted to extract said information from said target area (101) .
24. An apparatus according to claim 22, further comprising a barcode interpretation module (29') adapted to extract said information from said target area (101) .
25. An apparatus according to any of claims 22-24 in the form of a hand-held device (300) .
26. An apparatus according to any of claims 22-24, wherein said apparatus involves a hand-held device (300) and a computer (200) .
27. Use of a handheld apparatus according to any one of claims 22 to 26 for extraction of information from an image taken by said handheld apparatus by means of the methods of any one of claims 1 to 18.
EP02733754A 2001-06-07 2002-06-07 Method and apparatus for extracting information from a target area within a two-dimensional graphical object in an image Withdrawn EP1412910A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
SE0102021A SE522437C2 (en) 2001-06-07 2001-06-07 Method and apparatus for extracting information from a target area within a two-dimensional graphic object in an image
SE0102021 2001-06-07
PCT/SE2002/001098 WO2002099738A1 (en) 2001-06-07 2002-06-07 Method and apparatus for extracting information from a target area within a two-dimensional graphical object in an image

Publications (1)

Publication Number Publication Date
EP1412910A1 true EP1412910A1 (en) 2004-04-28

Family

ID=20284397

Family Applications (1)

Application Number Title Priority Date Filing Date
EP02733754A Withdrawn EP1412910A1 (en) 2001-06-07 2002-06-07 Method and apparatus for extracting information from a target area within a two-dimensional graphical object in an image

Country Status (3)

Country Link
EP (1) EP1412910A1 (en)
SE (1) SE522437C2 (en)
WO (1) WO2002099738A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI20055111A0 (en) 2005-03-11 2005-03-11 Nokia Corp Creating information for a calendar application in an electronic device
JP4280729B2 (en) * 2005-05-31 2009-06-17 キヤノン株式会社 Irradiation field region extraction method and radiation imaging apparatus
CN109358646B (en) * 2018-07-26 2020-11-24 北京航空航天大学 Missile autonomous formation random control system modeling method with multiplicative noise

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5481621A (en) * 1992-05-28 1996-01-02 Matsushita Electric Industrial Co., Ltd. Device and method for recognizing an image based on a feature indicating a relative positional relationship between patterns
EP0858637B1 (en) * 1995-10-27 2002-09-11 Licentia OY Scanning and interpretation device and method for reading and interpreting signs and characters
AU9676298A (en) * 1997-10-01 1999-04-23 Island Graphics Corporation Image comparing system
US6009198A (en) * 1997-11-21 1999-12-28 Xerox Corporation Method for matching perceptual shape similarity layouts across multiple 2D objects
WO2000036565A1 (en) * 1998-12-16 2000-06-22 Miller Michael I Method and apparatus for processing images with regions representing target objects

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
BURNS B.; HANSON A.; RISEMAN E.: "Extracting Straight Lines", IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 8, no. 4, July 1986 (1986-07-01), pages 425 - 455, XP001016046 *
FUJISAWA H.; SAKO H.; OKADA Y.; SEONG-WHAN LEE: "Information capturing camera and developmental issues", ICDAR '99. PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, 20 September 1999 (1999-09-20), pages 205 - 208, XP010351192 *
HARTLEY R.; ZISSERMAN A.: "Multiple View Geometry in Computer Vision", 2000, CAMBRIDGE UNIVERSITY PRESS, NEW YORK, USA *
LIEBOWITZ D.; ZISSERMAN A.: "Metric rectification for perspective images of planes", IEEE COMPUTER SOCIETY CONFERENCE, 23 June 1998 (1998-06-23), SANTA BARBARA, CA, pages 482 - 488, XP010291688 *
SCHAFFALITZKY F.; ZISSERMAN A.: "Geometric Grouping of Repeated Elements within Images", SHAPE, CONTOUR AND GROUPING IN COMPUTER VISION, LNCS 1681, pages 165 - 181 *
See also references of WO02099738A1 *
SUK T.; FLUSSER J.: "Convex Layers: A new tool for Recognition of Projectively Deformed Point Sets", CAIP'99, LNCS 1689, 1999, pages 454 - 461 *

Also Published As

Publication number Publication date
SE0102021L (en) 2002-12-08
SE0102021D0 (en) 2001-06-07
SE522437C2 (en) 2004-02-10
WO2002099738A1 (en) 2002-12-12

Similar Documents

Publication Publication Date Title
US20030030638A1 (en) Method and apparatus for extracting information from a target area within a two-dimensional graphical object in an image
US7218773B2 (en) Pose estimation method and apparatus
Robertson et al. An Image-Based System for Urban Navigation.
US7313289B2 (en) Image processing method and apparatus and computer-readable storage medium using improved distortion correction
US8467596B2 (en) Method and apparatus for object pose estimation
US20200380229A1 (en) Systems and methods for text and barcode reading under perspective distortion
US20160371855A1 (en) Image based measurement system
JP5261501B2 (en) Permanent visual scene and object recognition
Elibol et al. A new global alignment approach for underwater optical mapping
US9036037B1 (en) System and method for pattern detection and camera calibration
KR102608956B1 (en) A method for rectifying a sequence of stereo images and a system thereof
US20150104068A1 (en) System and method for locating fiducials with known shape
Tsigkas et al. Markerless detection of ancient rock carvings in the wild: rock art in Vathy, Astypalaia
Karimi et al. A new method for automatic and accurate coded target recognition in oblique images to improve augmented reality precision
Ventura et al. Structure and motion in urban environments using upright panoramas
WO2002099738A1 (en) Method and apparatus for extracting information from a target area within a two-dimensional graphical object in an image
Belo et al. Digital assistance for quality assurance: Augmenting workspaces using deep learning for tracking near-symmetrical objects
CN109741389A (en) One kind being based on the matched sectional perspective matching process of region base
Andreasson et al. Non-iterative vision-based interpolation of 3D laser scans
Alkaabi et al. Iterative corner extraction and matching for mosaic construction
Yaman et al. Performance evaluation of similarity measures for dense multimodal stereovision
WO2006105465A2 (en) Automated alignment of spatial data sets using geometric invariant information and parameter space clustering
Gu et al. Pose ambiguity elimination algorithm for 3c components assembly pose estimation in point cloud
JP7478628B2 (en) Image processing device, control method, and control program
US20240242318A1 (en) Face deformation compensating method for face depth image, imaging device, and storage medium

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20040107

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

17Q First examination report despatched

Effective date: 20050406

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20060510