EP1756521A2 - Method for encoding and serving geospatial or other vector data as images - Google Patents

Method for encoding and serving geospatial or other vector data as images

Info

Publication number
EP1756521A2
Authority
EP
European Patent Office
Prior art keywords
layer
image
data
location
data block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05725818A
Other languages
German (de)
French (fr)
Inventor
Blaise Aguera Y Arcas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seadragon Software Inc
Original Assignee
Seadragon Software Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/803,010 (US7133054B2)
Priority claimed from US10/854,117 (US7042455B2)
Application filed by Seadragon Software Inc filed Critical Seadragon Software Inc
Publication of EP1756521A2

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38 Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3863 Structures of map data
    • G01C21/3867 Geometry of map features, e.g. shape points, polygons or for simplified maps
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38 Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3863 Structures of map data
    • G01C21/387 Organisation of map data, e.g. version management or database structures
    • G01C21/3878 Hierarchical structures, e.g. layering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document
    • G06V30/422 Technical drawings; Geographical maps
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B29/00 Maps; Plans; Charts; Diagrams, e.g. route diagram
    • G09B29/10 Map spot or coordinate position indicators; Map reading aids
    • G09B29/106 Map spot or coordinate position indicators; Map reading aids using electronic means

Definitions

  • One or more embodiments of the present invention relate to an extension of these selectively de-compressible image compression and transmission technologies to geospatial or schematic data.
  • the one or more embodiments combine and extend methods described in the following documents, which are included in an appendix of this specification: (1) "Method for Spatially Encoding Large Texts, Metadata, and Other Coherently Accessed Non-Image Data"; (2) "Methods And Apparatus For Navigating An Image"; (3) "System and Method For The
  • the invention provides a method of transmitting information indicative of an image comprising transmitting one or more nodes of information as a first image, transmitting a second image including information indicative of vectors defining characteristics to be utilized for display at predetermined locations in the first image, and transmitting a third image comprising a mapping between the first and second images such that a receiver of the first and second images can correlate the first and second images to utilize the vectors at the predetermined locations.
  • the first image is a map and the second image is a set of vectors defining visual data that is only displayed at predetermined levels of detail.
  • the first image is a map.
  • the second image includes hyperlinks.
  • the first image is a map
  • the second image includes a set of vectors and wherein plural ones of the vectors are located at locations corresponding to locations on the first image wherein the vectors are to be applied, and plural ones of the vectors are located at locations on the second image which do not correspond to the locations on the first image wherein the vectors are to be applied.
  • the method further comprises utilizing an efficient packing algorithm to construct the second image to decrease an amount of space between a location on the second image at which one or more vectors appear, and a location on the first image where the one or more vectors are to be applied.
  • the vectors include information to launch a node or sub-node.
  • the invention provides a method of rendering an image comprising receiving a first, second, and third set of data from a remote computer, the first data set being representative of an image, the second being representative of vectors defining characteristics of the image at prescribed locations, and the third serving to prescribe the locations.
  • the prescribed locations are street locations on a map.
  • the vectors represent sub-nodes and include information indicative of under what conditions the sub-nodes should launch.
  • the vectors include hyperlinks to at least one of the group consisting of: external content, such as advertising materials, and/or embedded visual content.
  • the vectors include hyperlinks to advertising materials.
  • the vectors include information specifying a rendering method for portions of an image at predetermined locations in the image.
  • the invention provides a method, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; and providing a second layer of the image, the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer substantially corresponding to a location in the first layer of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of the first layer.
  • each data block describes at least one characteristic of the feature corresponding to each data block.
  • the method further comprises providing a third layer of the image, the third layer including pointers, each pointer corresponding to a respective one of the features and a respective one of the data blocks.
  • each pointer indicates the location of each pointer's corresponding data block with respect to each pointer's location.
  • the describing comprises providing text data for at least one feature.
  • the describing comprises providing a graphical illustration of at least one feature.
  • the describing comprises providing geometric data indicative of at least one feature.
  • the describing comprises providing two-dimensional or three-dimensional shape or contour information for at least one feature.
  • the describing comprises providing color information for at least one feature.
  • the describing comprises providing advertising or hyperlinking information relating to at least one feature.
  • the describing comprises providing at least one link to an external web site relating to at least one feature.
  • the describing comprises providing embedded visual content relating to at least one feature.
  • the describing comprises providing advertising information relating to at least one feature.
  • the describing comprises: providing schematic detail of a road segment.
  • the describing comprises: providing schematic detail for at least one of the group consisting of: at least one road, at least one park, a topography of a region, a hydrography of a body of water, at least one building, at least one public restroom, at least one wireless fidelity station, at least one power line, and at least one stadium.
  • the invention provides an apparatus including a processing unit operating under the control of one or more software programs that are operable to cause the processing unit to execute actions, including: providing a first layer of an image, the first layer including features of the image having locations within the first layer; and providing a second layer of the image, the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer substantially corresponding to a location in the first layer of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of the first layer.
  • the invention provides a storage medium containing one or more software programs that are operable to cause a processing unit to execute actions, including: providing a first layer of an image, the first layer including features of the image having locations within the first layer; and providing a second layer of the image, the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer substantially corresponding to a location in the first layer of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of the first layer.
  • the invention provides a method, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; providing a second layer of the image, the second layer including data blocks corresponding to and describing respective ones of the features, each data block being in a location in the second layer at least substantially corresponding to a location in the first layer of the feature corresponding to each data block; and providing a third layer of the image, the third layer including pointers having locations in the third layer, each pointer corresponding to a respective one of the features and a respective one of the data blocks, the location of each pointer in the third layer at least substantially corresponding to the location in the first layer of the feature corresponding to each pointer.
  • the second layer and the third layer each have a size and shape corresponding to a size and a shape of the first layer.
  • the method further comprises: forming a map image from a combination of the first layer, the second layer, and the third layer.
  • the method further comprises: flattening data in the map image.
  • each pointer indicates the location of each pointer's corresponding data block with respect to each pointer's location.
  • the indicating comprises identifying an offset in two dimensions.
  • each dimension of the offset is expressed in units corresponding to an integral number of pixels, e.g. 2 or 4.
  • the indicating comprises identifying an offset as a one-dimensional distance along a Hilbert curve.
  • the offset along the one-dimensional curve is expressed in units of pixels.
  • the offset along the one-dimensional curve is expressed in units corresponding to an integral number of pixels.
  • the offset along the one-dimensional curve is expressed in units corresponding to integral multiples of pixels.
  • placing each data block comprises: locating each data block employing a packing algorithm to achieve a maximum proximity of each data block to a target location for each data block in the second layer, the target location in the second layer corresponding to the location in the first layer of the feature corresponding to each data block.
  • the packing algorithm ensures that no two data blocks in the second layer overlap each other.
  • the maximum proximity is determined based on a shortest straight-line distance between each data block's location and the target location for each data block.
  • the maximum proximity is determined based on a sum of absolute values of offsets in each of two dimensions between each data block's location and the target location for each data block.
  • the maximum proximity is determined based on a minimum Hilbert curve length between each data block's location and the target location for each data block.
  • the invention provides a storage medium containing one or more software programs that are operable to cause a processing unit to execute actions, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; providing a second layer of the image, the second layer including data blocks corresponding to and describing respective ones of the features, each data block being in a location in the second layer at least substantially corresponding to a location in the first layer of the feature corresponding to each data block; and providing a third layer of the image, the third layer including pointers having locations in the third layer, each pointer corresponding to a respective one of the features and a respective one of the data blocks, the location of each pointer in the third layer at least substantially corresponding to the location in the first layer of the feature corresponding to each pointer.
  • the invention provides an apparatus including a processing unit operating under the control of one or more software programs that are operable to cause the processing unit to execute actions, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; providing a second layer of the image, the second layer including data blocks corresponding to and describing respective ones of the features, each data block being in a location in the second layer at least substantially corresponding to a location in the first layer of the feature corresponding to each data block; and providing a third layer of the image, the third layer including pointers having locations in the third layer, each pointer corresponding to a respective one of the features and a respective one of the data blocks, the location of each pointer in the third layer at least substantially corresponding to the location in the first layer of the feature corresponding to each pointer.
  • FIG. 1 illustrates a prerendered layer of a roadmap image including a plurality of features suitable for description in data blocks in accordance with one or more embodiments of the present invention
  • FIG. 2 illustrates the roadmap of FIG. 1 and the pointers and data blocks corresponding to the respective road segments in a region having a low concentration of road segments in accordance with one or more embodiments of the present invention
  • FIG. 3 illustrates a concentrated set of road segments belonging to a plurality of roads with a main road as well as pointers and data blocks corresponding to the road segments in a region having a high concentration of intersections in accordance with one or more embodiments of the present invention
  • FIG. 4 illustrates test output of a greedy rectangle packing algorithm for three cases in accordance with one or more embodiments of the present invention
  • FIG. 5A is an image of binary 8-bit data taken from a dense region of roadmap data image of the Virgin Islands before the flattening of such data in accordance with one or more embodiments of the present invention
  • FIG. 5B is an image of binary 8-bit data taken from a dense region of roadmap data image of the Virgin Islands after the flattening of such data in accordance with one or more embodiments of the present invention
  • FIG. 6 illustrates a first-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
  • FIG. 7 illustrates a second-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
  • FIG. 8 illustrates a third-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
  • FIG. 9 illustrates a fourth-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
  • FIG. 10 illustrates a fifth-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention
  • FIG. 11 depicts an image of one of the U.S. Virgin Islands which incorporates 4-pixel by 4-pixel size data blocks for use in accordance with one or more embodiments of the present invention
  • FIG. 12 depicts an image of one of the U.S. Virgin Islands which incorporates 6-pixel by 6-pixel size data blocks for use in accordance with one or more embodiments of the present invention.
  • FIG. 13 depicts an image of one of the U.S. Virgin Islands which incorporates 8-pixel by 8-pixel size data blocks for use in accordance with one or more embodiments of the present invention.
  • the various aspects of the present invention may be applied in contexts other than encoding and/or serving map data. Indeed, the extent of images and implementations for which the present invention may be employed are too numerous to list in their entirety.
  • the features of the present invention may be used to encode and/or transmit images of the human anatomy, complex topographies, engineering diagrams such as wiring diagrams or blueprints, gene ontologies, etc. It has been found, however, that the invention has particular applicability to encoding and/or serving images in which the elements thereof are of varying levels of detail or coarseness. Therefore, for the purposes of brevity and clarity, the various aspects of the present invention will be discussed in connection with a specific example, namely, encoding and/or serving of images of a map.
  • In (2), the concept of continuous multi-scale roadmap rendering was introduced.
  • the basis for one or more embodiments of the invention of (2) is a pre-rendered "stack" of images of a roadmap or other vector-based diagram at different resolutions, in which categories of visual elements (e.g. classes of roads, including national highway, state highway, and local road) are rendered with different visual weights at different resolutions.
  • The choice of image resolutions and their blending coefficients can vary depending upon the zoom scale. The net result is that a user on the client side can navigate through a large map (e.g. all roads in the United States), zooming and panning continuously, without experiencing any visual discontinuities, such as categories of roads appearing or disappearing as the zoom scale is changed.
  • the most relevant categories can be accentuated. For example, when zoomed out to view the entire country, the largest highways can be strongly weighted, making them stand out clearly, while at the state level, secondary highways can also be weighted strongly enough to be clearly visible.
  • all roads are clearly visible, and in the preferred embodiment for geospatial data, all elements are preferably shown at close to their physically correct scale.
  • a maximum reasonable resolution for these most detailed pre-rendered images may be about 15 meters/pixel. However, it is desirable from the user's standpoint to be able to zoom in farther.
  • Increasing the resolution of the pre-rendered image stack to allow deeper zooming would be inefficient: first, because the images become prohibitively large (a single Universal Transverse Mercator zone image at 15 meters/pixel may already be in the gigapixel range); second, because a pre-rendered image is an inefficient representation for the kind of very sparse black-and-white data normally associated with high-resolution roadmap rendering; and third, because the client may require the "real" vector data for performing computational tasks beyond a static visual presentation.
  • a route guidance system may highlight a road or change its color as displayed to a user on a monitor or in print media. This can be done on the client side only if the client has access to vector data, as opposed to a pre-rendered image alone.
  • Vector data may also include street names, addresses, and other information which the client preferably has the flexibility to lay out and render selectively. Pre-rendering street name labels into the map image stack is undesirable, as these labels are preferably drawn in different places and are preferably provided with different sizes depending on the precise location and scale of the client view. Different label renditions should not blend into one another as the user zooms. Pre-rendering such data would also eliminate any flexibility with regard to font.
  • vector data (where we use the term generically to refer both to geometric and other information, such as place names) is both beneficial to the client in its own right, and a more efficient representation of the information than pre-rendered imagery, when the desired rendering resolution is high.
  • the complete vector data may become prohibitively large and complex, making the pre-rendered image the more efficient representation.
  • some subset of the vector data is beneficial, such as the names of major highways. This subset of the vector data may be included in a low resolution data layer associated with the low resolution pre-rendered layer, with more detailed vector data available in data layers associated with higher resolution pre-rendered layers.
  • One or more embodiments of the present invention extend the methods introduced in (1) to allow spatial vector data to be encoded and transmitted selectively and incrementally to the client, possibly in conjunction with the pre-rendered imagery of (2). In the prior art, this would be accomplished using a geospatial database. The database would need to include all relevant vector data, indexed spatially. Such databases present many implementation challenges. In one or more embodiments of the present invention, instead of using conventional databases, we use spatially addressable images, such as those supported by JPEG2000/JPIP, to encode and serve the vector data.
  • the prerendered layer is preferably a pre-computed literal rendition of the roadmap, as per (2).
  • the pointer layer preferably includes 2*2 pixel blocks which are preferably located in locations within the pointer layer that correspond closely, and sometimes identically, to the locations, within the pre-rendered layer, of the respective features that the pointers correspond to.
  • the data layer preferably consists of n*m pixel blocks centered on or positioned near the 2*2 pointers which refer to them.
  • the prerendered layer may also be in 24-bit color, or in any other color space or bit depth.
  • the prerendered layer, the pointer layer, and the data layer are in essence two-dimensional memory spaces for storing various quantities of binary data.
  • These three layers preferably correspond to a common two-dimensional image region which is the subject of a roadmap or other two-dimensional image representation to a client.
  • the terms "size" and "shape" of a layer generally correspond to the size and shape, respectively, of the two-dimensional image which the data in that layer relates to.
  • the prerendered layer, the pointer layer, and the data layer forming a particular map image, for instance, have "sizes" and "shapes" in the two-dimensional image (that is formed from these three layers) that are at least very close to, or possibly identical to, one another.
  • the stored data for the three layers are distributed within a physical memory of a data processing system.
  • the pertinent "features" in the prerendered layer may be road segments. In a map having 10 road segments, pointer 1 in the pointer layer would correspond to road segment 1 in the prerendered layer and to data block 1 in the data layer.
  • pointer 1 is preferably in a location within the pointer layer that corresponds closely, and perhaps identically, to the location of road segment 1 (or more generally "feature 1") within the prerendered layer.
  • the size and shape of the three map layers preferably correspond closely to one another to make the desired associations of entries in the respective map layers as seamless as possible within a data processing system configured to access any of the layers and any of the entries in the layers, as needed. It will be appreciated that while the discussion herein is primarily directed to maps formed from three layers of data, the present invention could be practiced while using fewer or more than three layers of data, and all such variations are intended be within the scope of the present invention.
  • Because the three map layers are preferably of equal size and in registration with each other, they can be overlaid in different colors (red, green, blue on a computer display, or cyan, magenta, yellow for print media) to produce a single color image.
  • FIGS. 1-3 may be displayed in color (either on an electronic display or on print media), and may be stored on the server side as a single color JPEG2000. However, for the sake of simplicity, FIGS. 1-3 are presented in black and white in this application. Preferably, only the prerendered layer would actually be visible in this form on the client's display.
  • FIG. 1 illustrates a prerendered layer of a roadmap including a plurality of features numbered 102 through 124.
  • the features shown are all road segments. However, features may include many other entities such as sports arenas, parks, large buildings and so forth.
  • the region shown in FIG. 1 is included for illustrative purposes and does not correspond to any real-world city or street layout.
  • FIG. 2 illustrates the roadmap of FIG. 1 as well as the pointers and data blocks corresponding to the respective road segments in a region having a low concentration of road segments in accordance with one or more embodiments of the present invention.
  • Road segment 102 is shown in FIG. 2 and the other road segments from FIG. 1 are reproduced in FIG. 2. However, due to space limitations, the reference numerals for the other eleven road segments (104-124) are not shown in FIG. 2.
  • pointers are shown as dark grey blocks, and data blocks are shown as larger light grey blocks.
  • FIG. 2 illustrates a region having a relatively low concentration of road segments per unit area
  • pointers (202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222 and 224) and data blocks (242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262 and 264) can be placed in locations within the second layer (pointer layer) and third layer (data layer) of map 200, respectively, that correspond reasonably closely to the locations within the prerendered layer of map 200 of the respective features to which they correspond.
  • FIG. 3 illustrates a concentrated set of road segments of a plurality of minor roads 320 and a smaller number of main roads 310 as well as pointers and data blocks corresponding to their respective road segments in a region having a high concentration of road segments in accordance with one or more embodiments of the present invention.
  • Reference numeral 330 refers to all of the pointers, and reference numeral 340 refers to all of the data blocks.
  • the concentration of features is too high to enable all of the pointers or all of the data blocks to be located in locations within their respective layers that correspond exactly to the locations of the features in layer one that they correspond to.
  • the degree of offset for the pointer locations may be minor or significant depending upon the degree of crowding.
  • the concentration of road segments in FIG. 3 precludes placing the data blocks in locations in layer three closely corresponding to the locations of the respective features within layer one that the data blocks correspond to.
  • data blocks 340 are distributed as close as possible to their corresponding pointers, making use of a nearby empty area 350 which is beneficially devoid of features, thus allowing the data blocks 340 to overflow into empty area 350.
  • Empty area 350 may be any type of area that does not have a significant concentration of features having associated pointers and data blocks, such as, for example, a body of water or a farm.
  • a packing algorithm may be employed to efficiently place data blocks 340 within map 300. This type of algorithm is discussed later in this application, and the discussion thereof is therefore not repeated in this section.
  • the client can request from the server the relevant portions of all three image layers, as shown.
  • the prerendered layer is generally the only one of the three image layers that displays image components representing the physical layout of a geographical area.
  • the other two image layers preferably specify pointers and data blocks corresponding to features in the prerendered layer.
  • the pointer image consists of 2x2 pixel blocks aligned on a 2x2 pixel grid, each of which specifies an (x,y) vector offset (with the x and y components of the vector each comprising a 16-bit integer, hence two pixels each) from its own location to the beginning (top left corner) of a corresponding data block in the data layer.
  • the corresponding data block, in turn, can begin with two 16-bit values (four pixels) specifying the data block width and height.
  • the width is specified first, and is constrained to have a magnitude of at least 2 pixels, hence avoiding ambiguities in reading the width and height.
  • the remainder of the data block can be treated as binary data which may contain any combination of vectors, text, or other information.
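For illustration only, the pointer-following just described can be sketched in a few lines of Python. The numpy layer access, the big-endian pairing of 8-bit pixels into 16-bit values, the row layout of the 2x2 pointer block, and the unsigned treatment of offsets are all assumptions made for the sketch, not details fixed by this description:

```python
import numpy as np

def read_u16(layer, y, x):
    # Pair two adjacent 8-bit pixels into one 16-bit value
    # (big-endian pairing assumed).
    return (int(layer[y, x]) << 8) | int(layer[y, x + 1])

def follow_pointer(pointer_layer, data_layer, py, px):
    """Decode the data block referenced by the 2x2 pointer block whose
    top-left pixel is at (py, px). Assumes the first row of the pointer
    block holds the 16-bit x offset and the second row the 16-bit y
    offset, both measured from the pointer's own location (signed
    offsets omitted for brevity)."""
    dx = read_u16(pointer_layer, py, px)
    dy = read_u16(pointer_layer, py + 1, px)
    by, bx = py + dy, px + dx             # top-left corner of the data block
    w = read_u16(data_layer, by, bx)      # width is stored first and is >= 2,
    h = read_u16(data_layer, by, bx + 2)  # so the header parses unambiguously
    block = data_layer[by:by + h, bx:bx + w]
    payload = block.reshape(-1)[4:]       # raster-order pixels after the header
    return bytes(payload.astype(np.uint8))
```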
  • data blocks may contain street-map information including street names, address ranges, and vector representations.

Compression:
  • One existing solution involves sending requests to a spatial database for all relevant textual/vector information within a window of interest.
  • the server replies with a certain volume of text.
  • Existing spatial database systems send back the information substantially as plain text.
  • wavelet compression can be applied, thereby enabling the server to satisfy the data request while sending a much smaller quantity of data than would existing systems.
  • Areas with no data storage that are located between data-storage areas on the data and pointer layers create very little waste, unlike the case in which the image is transmitted uncompressed in raster order, because these areas have zero complexity and can be compressed into a very small number of bits in the wavelet representation.
  • Typical browsing patterns involve gradual zooming and panning.
  • Multi-resolution streaming image browsing technologies are designed to work well in this context.
  • Complete information can be transmitted for the window of interest, and partial information can be transmitted for areas immediately surrounding the window of interest.
  • Upon panning or other movement preferably only the relevant new information is transmitted (the "delta"). All of this can be done in a very efficient manner.
  • An especially large data block, for example, may be partially transmitted well before the window of interest intersects the data block's anchor point.
  • the pointer layer shows how distant a data block is from the pointer it corresponds to.
  • the data blocks can be centered directly on the pointer positions. In this case, all data would be perfectly local. In urban areas, however, data begins to "crowd", and data blocks may be in positions offset from the respective pointers (in the pointer image) and the respective features (in the prerendered image) that the data blocks correspond to.
  • an upper limit can be imposed on the maximum displacement of a data block from the feature to which it corresponds in the prerendered image.
  • Crowding may be visible in the data layer and in the pointer layer.
  • the appropriate resolutions for different classes of data may vary over space, so that for example small-street vector data might be encodable at 30 meters/pixel in rural areas but only at 12 meters/pixel in urban areas.
  • the pointer and data images make data crowding easy to detect and correct, either visually or using data-processing algorithms.
  • the resulting hierarchy of data images can help ensure high-performance vector browsing even in a low-bandwidth setting, since the amount of data needed for any given view can be controlled so as to not exceed an upper bound. This kind of upper bound is extremely difficult to enforce, or even to define, in existing geospatial databases.
  • One or more aspects of the present invention concern mapping the geospatial database problem onto the remote image browsing problem.
  • a great deal of engineering has gone into making remote image browsing work efficiently.
  • this includes optimization of caching, bandwidth management, memory usage for both the client and the server, and file representation on the server side.
  • This technology is mature and available in a number of implementations, as contrasted with conventional geospatial database technology.
  • one or more embodiments of the present invention contemplate bringing about efficient cooperation between a suitably arranged geospatial database and remote image browsing technology that interacts with the geospatial database. Further, in one or more embodiments of the present invention, only a single system need then be used for both image and data browsing, with only a simple adapter for the data on the client side. The foregoing is preferable to having two quasi-independent complex systems, one for image browsing and another for data browsing.
  • The Hilbert curve is sometimes also called the Hilbert-Peano curve.
  • the Hilbert curve belongs to a family of recursively defined curves known as space-filling curves (see http://mathworld.wolfram.com/HilbertCurve.html, or for the original reference, Hilbert, D., "Über die stetige Abbildung einer Linie auf ein Flächenstück," Math. Ann. 38, 459-460, 1891, which is incorporated by reference herein).
  • Hilbert curves of order 1, 2, 3, 4 and 5 are illustrated in FIGS. 6, 7, 8, 9, and 10, respectively.
  • in the limit of increasing order, the one-dimensional curve fills the entire unit square (formally, it becomes dense in the unit square).
  • the nth order curve visits 4^n points on the unit square. For the first order case (4^1 = 4), these points are the corners of the square.
  • Using bit manipulation, there are known rapid algorithms for inter-converting between path length on the nth order Hilbert curve and (x,y) coordinates (see Warren, Henry S., Hacker's Delight).
  • the Hilbert curve is relevant to the problem of encoding the pointer image because it provides a convenient way of encoding a two-dimensional vector (having components x and y) as a single number d, while preserving the "neighborhood relationship" fairly well.
  • the neighborhood relationship means that as the vector position changes slowly, d tends to also change slowly, since, generally, points whose "d" values are close together are also close in two-dimensional space. However, this relationship does not always hold. For example, in the order-2 case, when moving from (1,0) to (2,0), the path distance "d" goes from 1 to 14. It is not possible to fill 2D space with a 1D curve and always preserve the neighborhood relationship.
  • the value of every pixel in an 8-bit image can be taken to be a path distance on the 4th order Hilbert curve, thus encoding a vector in the range (0,0)-(15,15), i.e. anywhere on a 16*16 grid.
  • the Hilbert coded pixels will have a value which is less than 16 if x and y are both less than 4, and a value less than 4 if x and y are both less than 2.
  • Because the data packing algorithm preferably packs data blocks as close as possible to the anchor point (where the pointer will be inserted), vectors with small values are common. Moreover, if these vectors are Hilbert-coded, this will translate into small pixel values in the pointer image and hence better image compression performance.
  • the value 256 equals 2^8, and the value 4096 equals 2^12.
  • the Hilbert coding algorithm is modified to accommodate signed vectors, where x and y values range over both positive and negative numbers.
  • the modification involves specifying two extra bits along with the path distance d, identifying the vector quadrant. These two bits are sign bits for x and y.
  • the sign bits are assigned to the two lowest-order bit positions of the output value, so that the numerical ranges of coded vectors in each of the quadrants are approximately equal.
  • vectors with x and y values between -128 and +127 inclusive can be coded using a 7th order Hilbert curve for each quadrant.
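For illustration, the mapping itself can be sketched using the standard bit-manipulation formulation of the Hilbert curve, together with the signed-quadrant packing described above. The folding of negative coordinates (-1 to 0, -2 to 1, and so on) is an assumption chosen so that the range -128 to +127 fits a 7th order curve:

```python
def xy2d(order, x, y):
    """Map (x, y) on a 2**order x 2**order grid to the path distance d
    along the Hilbert curve (standard bit-manipulation formulation)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:          # reflect, then swap: rotates the quadrant
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d

def encode_signed(order, x, y):
    """Pack a signed vector: the two sign bits occupy the lowest-order
    bit positions so that coded values in all four quadrants stay small
    when |x| and |y| are small. Negative coordinates fold as
    -1 -> 0, -2 -> 1, ... (an assumed folding)."""
    sx, mx = (1, -x - 1) if x < 0 else (0, x)
    sy, my = (1, -y - 1) if y < 0 else (0, y)
    return (xy2d(order, mx, my) << 2) | (sx << 1) | sy
```

As a consistency check, at order 2 this mapping takes (1,0) to d=1 and (2,0) to d=14, matching the neighborhood-relationship example above; at order 7, encode_signed yields 16-bit values suitable for a two-pixel field.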
  • the pointer and data layers are precomputed, just as the prerendered layer is.
  • Precomputation for the pointer and data layers consists of encoding all of the relevant vector data into data blocks, and packing both the pointers and data blocks as efficiently as possible into their respective layers.
  • features tend to be well separated, resulting in large empty areas in the pointer and data images. Where pointers do occur, they preferably fall precisely on the feature to which they refer, and their corresponding data blocks are in turn often centered precisely on the pointer.
  • In dense urban areas, however (see FIG. 3), features crowd together, and such exact placement is not always possible.
  • a "greedy algorithm" is used to insert a new rectangle as close as possible to a target point, proceeding by numbered steps (a sketch of such a placement routine follows this discussion).
  • this algorithm ultimately succeeds in placing a rectangle provided that, somewhere in the image, an empty space exists which meets or exceeds the dimensions of the rectangle to be placed.
  • This algorithm is "greedy" in the sense that it places a single rectangle at a time. The greedy algorithm does not attempt to solve the holistic problem of packing n rectangles as efficiently as possible.
  • a holistic algorithm would include defining an explicit measure of packing efficiency, specifying the desired tradeoff between minimizing wasted space and minimizing distance between rectangles and their "target points".
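A placement routine in the spirit of this greedy approach can be sketched as follows; the boolean occupancy grid, the center-of-rectangle distance measure, and the bounded search radius are assumptions made for the sketch:

```python
import numpy as np

def place_rect(occupied, w, h, tx, ty, radius=100):
    """Insert a w x h rectangle as close as possible to the target point
    (tx, ty), scanning candidate top-left corners in order of increasing
    distance from the target and taking the first collision-free one."""
    H, W = occupied.shape
    candidates = []
    for y in range(max(0, ty - radius), min(H - h, ty + radius) + 1):
        for x in range(max(0, tx - radius), min(W - w, tx + radius) + 1):
            d2 = (x + w / 2 - tx) ** 2 + (y + h / 2 - ty) ** 2
            candidates.append((d2, y, x))
    for _, y, x in sorted(candidates):
        if not occupied[y:y + h, x:x + w].any():
            occupied[y:y + h, x:x + w] = True   # claim the space
            return x, y
    return None   # no empty space of this size within the search radius
```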
  • FIG. 4 demonstrates the output of the basic packing algorithm for three cases.
  • the algorithm sequentially places a number of rectangles as near as possible to a common point.
  • This solution to the rectangle packing problem is provided by way of example.
  • In the left example of the three, most of the rectangles are small and narrow.
  • In the center example of the three, large and at least substantially square rectangles are used.
  • In the right example, a mix of small and large rectangles is employed.
  • pointer data is organized into two-pixel by two-pixel (meaning two pixels along a row and two pixels along a column) units.
  • each pointer is preferably 2x2 (the notation being rows x columns).
  • the row size and the column size of pointers may vary.
  • pointers may be represented by a single 24-bit color pixel, using 12th order Hilbert coding.
  • For the data block, the block area in square pixels is determined by the amount of data which will fit in the block, but this area can fit into rectangles of many different shapes.
  • a 24-byte data block (including 4 bytes of width and height information, and 20 bytes of arbitrary data) can be represented exactly as 1x24, 2x12, 3x8, 4x6, 6x4, 8x3, or 12x2.
  • the data block can also be represented, with one byte left over, within a 5-pixel by 5-pixel block (or "5x5").
  • the specifications for a valid ceiling factorization are that its area meet or exceed the length of the data block in question, and that no row or column be entirely wasted. For example, 7x4 or 3x9 are not preferred ceiling factorizations, as they can be reduced to 6x4 and 3x8, respectively.
  • block dimensions may be selected based only on a ceiling factorization of the data length.
  • "squarer" blocks such as 4x6) pack better than oblique ones (such as 2x12).
  • the simplest data-block-sizing algorithm can thus select either 4x6 or 5x5, depending on how it trades off "squareness" against wasted bytes.
  • More sophisticated block size selection algorithms may pick block dimensions adaptively, as part of the search for empty space near the target point.
  • steps 1 and 4 of the algorithm above are then modified as follows: 1) sort the ceiling factorizations having the needed data length by desirability, with preference for squarer factorizations and possibly a penalty for wasted bytes.
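A block-shape chooser consistent with this description might be sketched as below; the particular squareness measure and waste penalty are assumptions chosen for illustration:

```python
import math

def ceiling_factorizations(length):
    """All w x h shapes whose area meets or exceeds `length` with no
    entirely wasted row or column (e.g. 24 -> 1x24, 2x12, 3x8, 4x6,
    5x5, 6x4, 8x3, 12x2, 24x1; 7x4 is excluded as reducible to 6x4)."""
    shapes = [(w, math.ceil(length / w)) for w in range(1, length + 1)]
    # A shape is irreducible only if shrinking either dimension would
    # drop the area below `length`.
    return [(w, h) for w, h in shapes
            if (w - 1) * h < length and w * (h - 1) < length]

def pick_block_shape(payload_bytes, waste_penalty=1.0):
    """Choose a shape trading off squareness against wasted bytes.
    `payload_bytes` excludes the 4-byte width/height header."""
    length = payload_bytes + 4                  # header occupies four pixels
    def badness(shape):
        w, h = shape
        return max(w, h) / min(w, h) + waste_penalty * (w * h - length)
    return min(ceiling_factorizations(length), key=badness)
```

With the default penalty, a 20-byte payload (24 bytes with the header) selects 4x6; lowering the penalty toward zero shifts the preference to the squarer 5x5 at the cost of one wasted byte.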
  • Each of the three map layers — the prerendered layer, the pointer layer, and the data layer — is preferably stored as a JPEG2000 or similar spatially-accessible representation. However, the permissible conditions for data compression are different for different ones of the three layers.
  • Compression for the prerendered road layer need not be lossless, but it is beneficial for it to have reasonable perceptual accuracy when displayed.
  • At 15m/pixel we have found 0.5 bit/pixel lossy wavelet compression to be fully adequate.
  • the pointer and data layers are compressed losslessly, as they contain data which the client needs accurate reconstruction of. Lossless compression is not normally very efficient. Typical digital imagery, for example, is not usually compressible losslessly by more than a factor of about two at best. Techniques have been developed (as described in the "Flattening" section below) to achieve much higher lossless compression rates for the data and pointer layers, while still employing standard wavelet-based JPEG2000 compression.
  • an "allocate" function is defined to allocate a given number of bytes (corresponding to pixels). This allocate function preferably differs from the analogous conventional memory allocation function (in C, "malloc") in three ways.
  • the allocate function disclosed herein is passed not only a desired number of pixels to allocate, but also a target position on the data image.
  • the desired number of pixels are allocated as close as possible to the target position, while avoiding any previously allocated pixels.
Low data overhead:

  • One or more embodiments of the data image explored to date need some auxiliary data to be encoded. In the preliminary version, this data includes the block dimensions, stored as 16-bit values for width and height. Thus, the overhead was 4 pixels per allocated chunk of data.
  • One or more embodiments of the present invention simplify data packing by setting the fundamental spatial unit of data allocation to be an n*m pixel block, where n and m are small but each at least 2, aligned on a grid with spacing n*m. These blocks may thus be considered "superpixels".
  • a single allocated chunk typically has more than n*m bytes, and the chunk must therefore span multiple blocks.
  • blocks are preferably chained.
  • the first two pixels of a block preferably comprise a pointer (which may be Hilbert-encoded, as described above), pointing to the next block in the chain.
  • a pointer which may be Hilbert-encoded, as described above
  • Vectors can be specified in grid units relative to the current block, so that for example, if a block specifies the vector (+1,0), it would mean that the chunk continues in the next block to the right; if a block specifies (-2,-1), it would mean that the chunk continues two blocks to the left and one block up.
  • a (0,0) vector (equivalent to a null pointer) may be used to indicate that the current block is the last in the chain.
  • Data overhead in this scheme may be high if the block size is very small.
  • two of the four pixels per block serve as pointers to the next block, making the overhead data one half of the total data for the block.
  • the chunk allocation algorithm works by allocating n*m blocks sequentially. For k bytes, ceil(k/(n*m-2)) blocks are needed, since two pixels of each block are reserved for the next-block pointer. Allocation of a block can consist of locating the nearest empty block to the target point and marking it as full. After the required number of blocks are allocated, data and next-block pointers are then written to these blocks. "Nearest" may be defined using a variety of measures, but four choices with useful properties are: 1) Euclidean (L2) norm. This will select the block with the shortest straight-line distance to the target, filling up blocks in concentric rings. 2) Manhattan (L1) norm, i.e. the sum of the absolute values of the offsets in each of the two dimensions. 3) Hilbert curve norm, i.e. the path length along a Hilbert curve between the two blocks. 4) Spiral ordering, which visits candidate blocks in a fixed expanding sequence around the target.
  • This norm has the advantages that, like the Hilbert curve norm, it uniquely defines the "nearest" non-allocated block. Assuming that there are no collisions with pre-existing full blocks, sequential blocks are adjacent, thereby forming an expanding spiral around the target.
  • an allocator can take into account not only the distance of each block from the target point, but also the distance of each block from the previous block. The same measure can be used for measuring these two distances. Alternatively, different measures can be used, and these two distances can be added or otherwise combined with a relative weighting factor to give a combined distance. Weighing distance from the previous block heavily in deciding on the position of the next block favors coherence, while weighing absolute distance to the target point heavily favors spatial localization. Hence this scheme allows coherence and localization to be traded off in any manner desired by adjusting a single parameter.
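A compact sketch of such an allocator follows. The set-of-free-blocks bookkeeping and the specific linear blend of the two distances are assumptions; the single coherence parameter plays the role of the relative weighting factor described above:

```python
import math

def allocate_chunk(free, k, target, n, coherence=0.5):
    """Allocate ceil(k / (n*n - 2)) chained n x n superpixel blocks for a
    k-byte chunk. `free` is a set of (bx, by) grid coordinates of empty
    blocks; each block is chosen by a weighted sum of its distance to
    the target and its distance to the previously allocated block."""
    needed = math.ceil(k / (n * n - 2))   # 2 pixels per block hold the chain pointer
    chain, prev = [], target
    for _ in range(needed):
        def cost(b):
            dt = math.dist(b, target)     # favors spatial localization
            dp = math.dist(b, prev)       # favors coherence with the previous block
            return (1 - coherence) * dt + coherence * dp
        block = min(free, key=cost)
        free.remove(block)                # mark the block as full
        chain.append(block)
        prev = block
    return chain   # chain[i+1] - chain[i] gives the relative vector written
                   # into block i; the last block stores the (0, 0) terminator
```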
  • Block size defines the memory granularity, analogously to "memory alignment" in an ordinary 1D (one-dimensional) memory. Large block sizes decrease data overhead, as the next-block pointers use a fraction 2/(n*m) of the allocated space (2/n^2 for square n*n blocks); and they also improve coherence. However, large block sizes increase wasted space, since whole blocks are allocated at a time. Wasted space may in turn worsen spatial localization.
  • the appropriate choice of block size depends on the distribution of expected chunk lengths, as well as the spatial distribution of target points. Making the best choice is complex, and should in general be done by experiment using typical data.
  • FIGS. 11-13 show data images (enhanced for high contrast) for one of the U.S. Virgin Islands using 4*4 blocks (FIG. 11), 6*6 blocks (FIG. 12) and 8*8 blocks (FIG. 13). Wasted space is drawn as white in FIGS. 11-13 for the sake of clarity (though in practice, for improved compression performance, wasted space is assigned value zero, or black). Clearly the 8*8 blocks both waste a great deal of space and offer poor localization, whereas the 4x4 blocks waste much less space and localize better. The 4*4 block image of FIG. 11 also compresses to a smaller file size than the other two.
  • FIG. 5 shows the same densely populated region of a data image before flattening (FIG. 5A) and after flattening (FIG. 5B).
  • the data image used in FIG. 5 is a roadmap data image of the Virgin Islands. It is noted that FIG. 5B has been deliberately darkened so as to be more visible in this application. In FIG. 5B as presented, the rectangular image as a whole is a faint shade of grey. Moreover, a small amount of the pixel value variation highly evident in FIG. 5A is still visible in FIG. 5B, mostly in the bottom half of the image.
  • the consistency of the pixel values throughout the vast majority of the pixels of FIG. 5B bears witness to the effectiveness of the "flattening" of the data of FIG. 5A.
  • before flattening, the data image has full 8-bit dynamic range, and exhibits high frequencies and structured patterns that make it compress very poorly (in fact, a lossless JPEG2000-compressed version of this image is no smaller than the original raw size).
  • after flattening, the corresponding lossless JPEG2000-compressed version of the image achieves better than 3:1 compression.
  • "Flattening" can consist of a number of simple data transformations, including the following (this is the complete list of transformations applied in the example of FIG. 5):

The Flattening Technique Applied to FIG. 5
  • 16-bit unsigned values, such as the width or height of a data block, would normally be encoded using a high-order byte and a low-order byte.
  • We may use 16 bits because the values to be encoded occasionally exceed 255 (the 8-bit limit) by some unspecified amount. In the majority of cases, however, these values do not exceed 255.
  • In those cases, the high-order byte would be zero. Frequent zero high-order bytes followed by significant low-order bytes account for much of the 2-pixel periodicity apparent in parts of FIG. 5A.
  • the left eight columns represent the first pixel of the pair, previously the high-order byte, and the rightmost eight columns represent the second pixel, previously the low-order byte.
  • the range of accessible values (0-65535) remains unchanged, but the two bytes become much more symmetric.
  • for original values below 256, the two bytes each assume values below 16.
  • Similar bit-interleaving techniques apply to 32-bit or larger integer values. These techniques are also extensible to signed quantities. For variables in which the sign changes frequently, as occurs for differential coding of a road vector, a sign bit can be assigned to position 0, and the absolute value can be encoded in alternating bytes as above.
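A minimal sketch of the interleaving transform for unsigned 16-bit values; the exact bit ordering within each output byte is an assumption:

```python
def interleave16(v):
    """Spread the bits of a 16-bit value across two bytes so that small
    values yield two small, similar bytes."""
    hi = lo = 0
    for i in range(8):
        hi |= ((v >> (2 * i + 1)) & 1) << i   # odd-position bits
        lo |= ((v >> (2 * i)) & 1) << i       # even-position bits
    return hi, lo

def deinterleave16(hi, lo):
    """Exact inverse of interleave16."""
    v = 0
    for i in range(8):
        v |= ((hi >> i) & 1) << (2 * i + 1)
        v |= ((lo >> i) & 1) << (2 * i)
    return v
```

For any value below 256 only bits 0-7 are populated, so both output bytes fall below 16, replacing the zero/large byte pairs responsible for the 2-pixel periodicity of FIG. 5A.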
  • road vector data may be represented at greater than pixel precision.
  • Arbitrary units smaller than a pixel can instead be used, or equivalently, sub-pixel precision can be implemented using fixed point arithmetic in combination with the above techniques.
  • 4 sub-pixel bits are used, for 1/16 pixel precision.
  • Because each data block is 2 or more pixels wide, we can subtract 2 from the data width before encoding. More significantly, both pointers and any position vectors encoded in a data block are specified in pixels relative to the pointer position, rather than in absolute coordinates. This not only greatly decreases the magnitude of the numbers to encode, it also allows a portion of the data image to be decoded and rendered vectorially in a local coordinate system without regard for the absolute position of this portion.
  • the JPEG2000 representation of the map data (including lossy pre-rendered roadmap image, lossless pointer layer, and lossless data layer) is actually smaller than the compressed ZIP file representing the original data as tabulated text.
  • This file is part of the United States Census Bureau's 2002 TIGER/Line database.
  • the new representation is ready to serve interactively to a client, with efficient support for continuously pannable and zoomable spatial access.
  • the original prerendered multiscale map invention introduced in document (2) (which is attached hereto as Exhibit B) included not a single prerendered image, but a stack of such images, rendered at progressively coarser resolutions and with rescaled weights for lines (or other visible features).
  • One or more embodiments of the present invention can be extended to include pointer and data images corresponding to the coarser prerendered roadmap images, in which only a subset of the original vector objects are represented.
  • statewide pointer and data images, which are at much lower resolution than those used for prerendered images, might only include data for state and national highways, excluding all local roads.
  • coarser data may also be "abstracts", for example specifying only road names, not vectors.
  • Images at different resolutions might include varying mixtures or subsets of the original data, or abstracted versions. This technique both allows all of the relevant data to fit into the smaller coarse images, and provides the client with the subset of the vector information relevant for navigation at that scale.
  • the prerendered images may also be in color. Further, the prerendered images may be displayed by the client in color even if they are single-channel images, since the vector data can be used to draw important roads in different colors than the prerendered material. Finally, the prerendered images may omit certain features or roads present in the vector data, relying on the client to composite the image and vector material appropriately.
  • the present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for progressively rendering arbitrarily large or complex visual content in a zooming environment while maintaining good user responsiveness and high frame rates. Although it is necessary in some situations to temporarily degrade the quality of the rendition to meet these goals, the present invention largely masks this degradation by exploiting well-known properties of the human visual system.
  • GUIs (graphical computer user interfaces)
  • visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
  • zoomable components such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent.
  • any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning.
  • Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s; recent movies continue the trend.
  • a number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability").
  • the prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have
  • Voss zooming user interface framework
  • This patent is specifically about Voss's approach to object tiling, level-of-detail blending, and render queueing.
  • a multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an image pyramid).
  • the present invention involves both strategies for prioritizing the (potentially slow) rendition of the parts of the image pyramid relevant to the current display, and strategies for presenting the user with a smooth, continuous perception of the rendered content based on partial information, i.e. only the currently available subset of the image pyramid.
  • these strategies make near-optimal use of the available computing power or bandwidth, while masking, to the extent possible, any image degradation resulting from incomplete image pyramids. Spatial and temporal blending are exploited to avoid discontinuities or sudden changes in image sharpness.
• An objective of the present invention is to allow sampled (i.e. "pixellated") visual content to be rendered in a zooming user interface without degradation in ultimate image quality relative to conventional trilinear interpolation.
  • a further objective of the present invention is to allow arbitrarily large or complex visual content to be viewed in a zooming user interface.
  • a further objective of the present invention is to enable near- immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
  • a further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
  • a further objective of the present invention is to allow the graceful degradation of image quality by continuous blurring when detailed visual content is as yet unavailable, either because the information needed to render it is unavailable, or because rendition is still in progress.
  • a further objective of the present invention is to gracefully increase image quality by gradual sharpening when renditions of certain parts of the visual content first become available.
  • zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome.
  • One such limitation is on the size of a document that can be "opened” from a computer application, as traditionally the entirety of such a document must be “loaded” before viewing or editing can begin.
  • this limitation is felt, because all of the document information must be transferred to short-term memory from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open” command and being able to begin viewing or editing unacceptably long.
  • Still digital images both provide an excellent example of this problem, and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming the problem.
• Table 1 shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
• the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1.
  • the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images are often called levels of detail, or LODs for short.
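• As an illustration of the resizing hierarchy just described, the following minimal Python sketch (not part of the patent text; the nested-list image and 2x2 box averaging are simplifying assumptions) builds the full chain of levels of detail from a 512x512 image down to 1x1:

    # Illustrative sketch only: build a factor-of-two image pyramid (LODs).
    # A real implementation would use an image library and a proper resampling filter.
    def downsample(img):
        """Halve resolution by averaging each 2x2 block of pixels."""
        n = len(img) // 2
        return [[(img[2*y][2*x] + img[2*y][2*x+1] +
                  img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
                 for x in range(n)] for y in range(n)]

    def build_pyramid(img):
        """Return the list of levels: 512x512, 256x256, ..., 2x2, 1x1."""
        levels = [img]
        while len(levels[-1]) > 1:
            levels.append(downsample(levels[-1]))
        return levels

    pyramid = build_pyramid([[0.0] * 512 for _ in range(512)])
    print([len(level) for level in pyramid])  # [512, 256, 128, ..., 2, 1]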
  • a low-resolution image serves as a "predictor" for the next higher resolution.
  • This allows the entire image hierarchy to be encoded very efficiently — more efficiently, in fact, than would usually be possible with a non- hierarchical representation of the high-resolution image alone. If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in the repository, then a natural consequence is that as the image is transferred across the data link to the cache, the user can obtain a low- resolution overview of the entire image very rapidly; finer and finer details will then "fill in” as the transmission progresses. This is known as incremental or progressive transmission.
  • an image browsing system can be made that is not only capable of viewing images of arbitrarily large size, but is also capable of navigating (i.e. zooming and panning) through such images efficiently at any level of detail.
  • Previous models of document access are by nature serial, meaning that the entirety of an information object is transmitted in linear order.
  • This model is random-access, meaning that only selected parts of the information object are requested, and these requests may be made in any order and over an extended period of time, i.e. over the course of a viewing session.
  • the computer and the repository now engage in an extended dialogue, paralleling the user's "dialogue" with the document as viewed on the display.
  • each level of detail is the basic unit of transmission.
  • the size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile.
  • the resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1.
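• The tile bookkeeping implied above can be sketched in a few lines of Python (a hypothetical 64x64 nominal tile is assumed, matching the example used later in this document); note the single-tile "tip" of the pyramid:

    import math

    TILE = 64  # nominal tile size in pixels (assumed for illustration)

    def tile_grid(width, height):
        """Tiles along each axis at one LOD; edge tiles may be smaller when
        the dimensions are not exact multiples of the nominal tile size."""
        return math.ceil(width / TILE), math.ceil(height / TILE)

    size = 512
    while size >= 1:
        print(size, tile_grid(size, size))  # 512 -> (8, 8), ..., 64 and below -> (1, 1)
        size //= 2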
  • the JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
  • This includes (but is not limited to) large texts, maps or other vector graphics, spreadsheets, video, and mixed documents such as web pages.
  • Our discussion thus far has also implicitly considered a viewing-only application, i.e. one in which only the actions or methods corresponding to opening and drawing need be defined.
  • Clearly other methods may be desirable, such as the editing commands implemented by paint programs for static images, the editing commands implemented by word processors for texts, etc.
  • the process of rendering a tile, once obtained, is trivial, since the information (once decompressed) is precisely the pixel-by-pixel contents of the tile.
  • the speed bottleneck is normally the transfer of compressed data to the computer (e.g. downloading).
  • the speed bottleneck is in the rendition of tiles; the information used to make the rendition may already be stored locally, or may be very compact, so that downloading no longer causes delay.
  • this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection or because the rendition process is itself computationally intensive is irrelevant.
  • a complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub- documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on.
  • documents form a tree, a structure in which each document has pointers to a collection of sub- documents, or children, each of which is contained within the spatial boundary of the parent document.
• We call each such document a node, borrowing from programming terminology for trees.
• some nodes may be static images which can be edited using painting-like commands, while other nodes may be editable text, and still others may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment — a "supernode" — which can be navigated by zooming and panning.
• There are a number of immediate consequences for a well-implemented zooming user interface, including: - It is able to browse very large documents without downloading them in their entirety from the repository; thus even documents larger than the available short-term memory, or whose size would otherwise be prohibitive, can be viewed without limitation.
• - Content is only downloaded as needed during navigation, resulting in optimally efficient use of the available bandwidth.
• - Because zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
• - High-resolution displays no longer imply shrinking text and images to small (sometimes illegible) sizes; depending on the level of zooming, they either allow more content to be viewed at once, or they allow content to be viewed at normal size and higher fidelity.
• - The vision impaired can easily navigate the same content as normally sighted people, simply by zooming in farther.
• variable names f and g. f refers to the sampling density of a tile relative to the display, defined in #1.
• Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at some LOD to the linear tiling grid size at the next lower LOD. This is in general presumed to be constant over the image pyramid.
• the client's first priority will be to fill in this "resolution hole". If more than one level of detail is missing in the hole, then requests for all levels of detail with f < 1, plus the next higher level of detail (to allow LOD blending — see #5), are queued in increasing order. At first glance, one might suppose that this introduces unnecessary overhead, because only the finest of these levels of detail is strictly required to render the current view; the coarser levels of detail are redundant, in that they define a lower-resolution image on the display. However, these coarser levels cover a larger area — in general, an area considerably larger than the display.
  • the coarsest level of detail for any node in fact includes only a single tile by construction, so a client rendering any view of a node will invariably queue this "outermost" tile first.
  • robustness we mean that the client is never "at a loss” regarding what to display in response to a user's panning and zooming, even if there is a large backlog of tile requests waiting to be filled.
  • the client simply displays the best (i.e. highest resolution) image available for every region on the display. At worst, this will be the outermost tile, which is the first tile ever requested in connection with the node. Therefore, every spatial part of the node will always be renderable based on the first tile request alone; all subsequent tile requests can be considered incremental refinements.
  • foveated tile request queuing usually reflects the user's implicit prioritization for visual information during inward zooms. Furthermore, because the user's eye generally spends more time looking at regions near the center of the display than the edge, residual blurriness at the display edge is less noticeable than near the center. The transient, relative increase in sharpness near the center of the display produced by zooming in using foveal tile request order also mirrors the natural consequences of zooming out — see Figure 4. The figure shows two alternate "navigation paths": in the top row, the user remains stationary while viewing a single document (or node) occupying about two thirds of the display, which we assume can be displayed at very high resolution.
• In the second row we follow what happens if the user were to zoom in on the shaded square before the image displayed in the top row is fully refined. Tiles at higher levels of detail are again queued, but in this case only those that are partially or fully visible. Refinement progresses to a point comparable to that of the top row (in terms of number of visible tiles on the display).
  • the third row shows what is available if the user then zooms out again, and how the missing detail is filled in.
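• A minimal sketch of this foveated request ordering, under assumed data structures (each tile is a (lod, center_x, center_y) tuple, with lower lod meaning coarser): coarser levels are requested first, and within a level, tiles nearest the display center come first:

    import math

    def queue_tiles(visible_tiles, display_center):
        """Order tile requests coarse-to-fine; within an LOD, prioritize
        tiles nearest the display center (foveation). The tile tuple
        layout here is a hypothetical representation."""
        def priority(tile):
            lod, x, y = tile
            return (lod, math.hypot(x - display_center[0], y - display_center[1]))
        return sorted(visible_tiles, key=priority)

    tiles = [(2, 100, 100), (1, 300, 200), (2, 310, 230), (1, 50, 50)]
    print(queue_tiles(tiles, display_center=(320, 240)))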
  • every small constant interval of time corresponds to a constant percent change in the opacity; for example, the new tile may become 20% more opaque at every frame, which results in the sequence of opacities over consecutive frames 20%, 36%, 49%, 59%, 67%, 74%, 79%, 83%, 87%, 89%, 91%, 93%, etc.
  • the exponential never reaches 100%, but in practice, the opacity becomes indistinguishable from 100% after a short interval.
  • An exponential blend has the advantage that the greatest increase in opacity occurs near the beginning of the blending-in, which makes the new information visible to the user quickly while still preserving acceptable temporal continuity.
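• The 20%-per-frame sequence above follows from closing a fixed fraction of the remaining gap to full opacity at each frame, as this short sketch (illustrative only) reproduces:

    # Sketch of the exponential blend described above: each frame closes a
    # fixed fraction of the remaining gap to full opacity.
    def blend_sequence(rate=0.2, frames=12):
        opacity = 0.0
        out = []
        for _ in range(frames):
            opacity += rate * (1.0 - opacity)
            out.append(round(opacity * 100))
        return out

    print(blend_sequence())  # [20, 36, 49, 59, 67, 74, 79, 83, 87, 89, 91, 93]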
  • Part (b) is a rectangle in which the opacities of two opposing edges are different; then the opacity over the interior is simply a linear interpolation based on the shortest distance of each interior point from the two edges.
• Part (c) shows a bilinear method for interpolating opacity over a triangle, when the opacities of all three corners a, b and c may be different.
  • every interior point p subdivides the triangle into three sub-triangles as shown, with areas A, B and C.
• the opacity at p is then simply a weighted sum of the opacities at the corners, where the weights are the fractional areas of the three sub-triangles (i.e. opacity(p) = (A·a + B·b + C·c) / (A + B + C), each corner weighted by the area of the sub-triangle opposite it), as sketched below.
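• A sketch of this barycentric interpolation (assuming, as is conventional, that each corner's weight is the area of the sub-triangle opposite that corner):

    def triangle_area(p, q, r):
        """Unsigned area of triangle pqr."""
        return abs((q[0]-p[0])*(r[1]-p[1]) - (r[0]-p[0])*(q[1]-p[1])) / 2.0

    def interp_opacity(p, a, b, c, oa, ob, oc):
        """Barycentric interpolation of corner opacities oa, ob, oc at p."""
        A = triangle_area(p, b, c)   # sub-triangle opposite corner a
        B = triangle_area(p, a, c)   # sub-triangle opposite corner b
        C = triangle_area(p, a, b)   # sub-triangle opposite corner c
        return (A*oa + B*ob + C*oc) / (A + B + C)

    # At the centroid, the three weights are equal:
    print(interp_opacity((1, 1), (0, 0), (3, 0), (0, 3), 0.0, 0.6, 0.9))  # 0.5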
• this strategy causes the relative level of detail visible to the user to be a continuous function, both over the display area and in time. Both spatial seams and temporal discontinuities are thereby avoided, presenting the user with a visual experience reminiscent of an optical instrument bringing a scene continuously into focus. For navigating large documents, the speed with which the scene comes into focus is a function of the bandwidth of the connection to the repository, or the speed of tile rendition, whichever is slower. Finally, in combination with the foveated prioritization of innovation #2, the continuous level of detail is biased in such a way that the central area of the display is brought into focus first.
• Generalized linear-mipmap-linear LOD blending.
  • each tile shard has an opacity as drawn, which has been spatially averaged with neighboring tile shards at the same level of detail for spatial smoothness, and temporally averaged for smoothness over time.
• the target opacity is 100% if the level of detail undersamples the display, i.e. f < 1 (see #1).
  • the target opacity is decreased linearly (or using any other monotonic function) such that it goes to zero if the oversampling is g-fold.
  • this causes continuous blending over a zoom operation, ensuring that the perceived level of detail never changes suddenly.
  • the number of blended levels of detail in this scheme can be one, two, or more. A number larger than two is transient, and caused by tiles at more than one level of detail not having been fully blended in temporally yet.
  • a single level is also usually transient, in that it normally occurs when a lower-than-ideal LOD is "standing in” at 100% opacity for higher LODs which have yet to be downloaded or constructed and blended in.
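• One plausible realization of this opacity ramp in code (the linear-in-log-f form is an assumption; the text permits any monotonic function that reaches zero at g-fold oversampling):

    import math

    def target_opacity(f, g=2.0):
        """Target opacity of the finest drawn LOD given its sampling
        density f relative to the display. f < 1 (undersampling): fully
        opaque. f >= g (g-fold oversampling): fully transparent. The ramp
        linear in log(f) is one plausible monotonic choice."""
        if f <= 1.0:
            return 1.0
        if f >= g:
            return 0.0
        return 1.0 - math.log(f) / math.log(g)

    for f in (0.5, 1.0, 1.4, 2.0):
        print(f, round(target_opacity(f), 2))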
• the simplest reference implementation for rendering the set of tile shards for a node is to use the so-called "painter's algorithm": all tile shards are rendered in back-to-front order, that is, from coarsest (lowest LOD) to finest (highest LOD which oversamples the display less than g-fold).
  • the target opacities of all but the highest LOD are 100%, though they may transiently be rendered at lower opacity if their temporal blending is incomplete.
  • the highest LOD has variable opacity, depending on how much it oversamples the display, as discussed above.
• this reference implementation is not optimal, in that it may render shards which are then fully obscured by subsequently rendered shards. More optimal implementations are possible through the use of data structures and algorithms analogous to those used for hidden surface removal in 3D graphics.
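• A compact sketch of the reference painter's algorithm (the shard dictionary fields and the draw callback are hypothetical names, not the patent's API):

    def render_node(shards, draw):
        """Reference painter's algorithm: draw tile shards from coarsest to
        finest LOD. Each shard's drawn opacity is its target opacity
        modulated by any incomplete temporal blend."""
        for shard in sorted(shards, key=lambda s: s["lod"]):
            draw(shard, shard["target_opacity"] * shard["temporal_blend"])

    render_node(
        [{"lod": 1, "target_opacity": 1.0, "temporal_blend": 1.0},
         {"lod": 2, "target_opacity": 0.4, "temporal_blend": 0.6}],
        lambda shard, opacity: print(shard["lod"], round(opacity, 2)))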
• Motion anticipation. During rapid zooming or panning, it is especially difficult for tile requests to keep up with demand. Yet during these rapid navigation patterns, the zooming or panning motion tends to be locally well-predicted by linear extrapolation (i.e. it is difficult to make sudden reversals or changes in direction). Thus we exploit this temporal motion coherence to generate tile requests slightly ahead of time, improving visual quality.
  • the virtual viewport relaxes over a brief interval of time back to the real viewport.
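• A sketch of the linear extrapolation behind motion anticipation (the (x, y, log-zoom) viewport representation and the lookahead constant are assumptions for illustration):

    def anticipated_viewport(pos, vel, lookahead=0.25):
        """Linear extrapolation of panning/zooming motion: request tiles for
        a 'virtual viewport' slightly ahead of the real one. pos and vel are
        (x, y, log_zoom) tuples; lookahead is in seconds."""
        return tuple(p + v * lookahead for p, v in zip(pos, vel))

    print(anticipated_viewport((100.0, 50.0, 2.0), (40.0, 0.0, 1.2)))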
  • the present invention relates generally to multiresolution imagery. More specifically, the invention is a system and method for efficiently blending together visual representations of content at different resolutions or levels of detail in real time. The method ensures perceptual continuity even in highly dynamic contexts, in which the data being visualized may be changing, and only partial data may be available at any given time.
  • the invention has applications in a number of fields, including (but not limited to) zooming user interfaces (ZUIs) for computers.
  • the present invention is a general approach to the dynamic display of such multiresolution visual data on one or more 2D displays (such as CRTs or LCD screens).
  • wavelet decomposition of a large digital image (e.g. as used in the JPEG2000 image format).
  • This decomposition takes as its starting point the original pixel data, normally an array of samples on a regular rectangular grid. Each sample usually represents a color or luminance measured at a point in space corresponding to its grid coordinates. In some applications the grid may be very large, e.g. tens of thousands of samples (pixels) on a side, or more.
• the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1.
• the granularity may change at different scales, but here, for example and without limitation, we will assume that g is constant over the "image pyramid". Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images or scales are often called levels of detail, or LODs for short. At first glance it may seem as if the storage requirements for this series of differently-sized images might be greater than for the high-resolution image alone, but in fact this is not the case: a low-resolution image serves as a "predictor" for the next higher resolution.
  • each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile. Hence if we assume 64x64 pixel tiles, the 512x512 pixel image considered earlier has 8x8 tiles at its highest level of detail, 4x4 at the 256x256 level, 2x2 at the 128x128 level, and a single tile at the remaining levels of detail.
  • the JPEG2000 image format includes the features just described for representing tiled, multiresolution and random-access images.
• If a tiled JPEG2000 image is being viewed interactively by a client on a 2D display of limited size and resolution, then some particular set of adjacent tiles, at a certain level of detail, is needed to produce an accurate rendition. In a dynamic context, however, these may not all be available. Tiles at coarser levels of detail often will be available, however, particularly if the user began with a broad overview of the image. Since tiles at coarser levels of detail span a much wider area spatially, it is likely that the entire area of interest is covered by some combination of available tiles. This implies that the image resolution available will not be constant over the display area.
  • this technique relies on compositing in the framebuffer — meaning that, at intermediate points during the drawing operation, the regions drawn do not have their final appearance; this makes it necessary to use double-buffering or related methods and perform the compositing off-screen to avoid the appearance of flickering resolution.
• this technique can only be used for an opaque rendition — it is not possible, for example, to ensure that the final rendition has 50% opacity everywhere, allowing other content to "show through". This is because the painter's algorithm relies precisely on the effect of one "layer of paint" (i.e. level of detail) fully obscuring the one underneath; it is not known in advance where a level of detail will be obscured, and where not.
  • the present invention resolves these issues, while preserving all the advantages of the painter's algorithm.
  • One of these advantages is the ability to deal with any kind of LOD tiling, including non-rectangular or irregular tilings, as well as irrational grid tilings, for which I am filing a separate provisional patent application.
• Tilings generally consist of a subdivision, or tessellation, of the area containing the visual content into polygons.
  • the multiplicative factor by which their sizes differ is the granularity g, which we will assume (but without limitation) to be a constant.
  • the improved algorithm consists of four stages.
  • a composite grid is constructed in the image's reference frame from the superposition of the visible parts of all of the tile grids in all of the levels of detail to be drawn.
• If the irrational tiling innovation (detailed in a separate provisional patent application) is used, this results in an irregular composite grid, shown schematically in Figure 1.
• the grid is further augmented by grid lines corresponding to the x- and y-values which would be needed to draw the tile "blending flaps" at each level of detail (not shown in Figure 1, because the resulting grid would be too dense and visually confusing).
• This composite grid, which can be defined by a sorted list of x- and y-values for the grid lines, has the property that the vertices of all of the rectangles and triangles that would be needed to draw all visible tiles (including their blending flaps) lie at the intersection of an x and a y grid line. Let there be n grid lines parallel to the x-axis and m grid lines parallel to the y-axis. We then construct a two-dimensional n × m table, with entries corresponding to the squares of the grid.
  • Each grid entry has two fields: an opacity, which is initialized to zero, and a list of references to specific tiles, which is initially empty.
• the second stage is to walk through the tiles, sorted by decreasing level of detail (opposite to the naïve implementation). Each tile covers an integral number of composite grid squares. For each of these squares, we check to see if its table entry has an opacity less than 100%, and if so, we add the current tile to its list and increase the opacity accordingly.
  • the per-tile opacity used in this step is stored in the tile data structure.
  • the third stage of the algorithm is a traversal of the composite grid in which tile shard opacities at the composite grid vertices are adjusted by averaging with neighboring vertices at the same level of detail, followed by readjustment of the vertex opacities to preserve the summed opacity at each vertex (normally 100%).
  • This implements a refined version of the spatial smoothing of scale described in a separate provisional patent application.
  • the composite grid is in general denser than the 3x3 grid per tile defined in innovation #4, especially for low-resolution tiles. (At the highest LOD, by construction, the composite gridding will be at least as fine as necessary.) This allows the averaging technique to achieve greater smoothness in apparent level of detail, in effect by creating smoother blending flaps consisting of a larger number of tile shards. Finally, in the fourth stage the composite grid is again traversed, and the tile shards are actually drawn.
• Although this algorithm involves multiple passes over the data and a certain amount of bookkeeping, it results in far better performance than the naïve algorithm, because much less drawing must take place in the end; every tile shard rendered is visible to the user, though sometimes at low opacity. Some tiles may not be drawn at all. This contrasts with the naïve algorithm, which draws every tile intersecting with the displayed area in its entirety.
• An additional advantage of this algorithm is that it allows partially transparent nodes to be drawn, simply by changing the total opacity target from 100% to some lower value. This is not possible with the naïve algorithm, because every level of detail except the most detailed must be drawn at full opacity in order to completely "paint over" any underlying, still lower resolution tiles.
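• The opacity-accumulation walk of the second stage might be sketched as follows (the tile and grid representations are simplified guesses; blending flaps and the stage-three smoothing are omitted):

    def assign_shards(tiles, n, m):
        """Stage-2 sketch: walk tiles finest-to-coarsest, accumulating
        opacity per composite-grid square so that fully obscured shards
        are never drawn. Each tile dict has 'lod', 'opacity', and
        'squares' (the (i, j) composite-grid squares it covers)."""
        opacity = [[0.0] * m for _ in range(n)]
        draw_lists = [[[] for _ in range(m)] for _ in range(n)]
        for tile in sorted(tiles, key=lambda t: -t["lod"]):  # finest first
            for (i, j) in tile["squares"]:
                if opacity[i][j] < 1.0:
                    share = min(tile["opacity"], 1.0 - opacity[i][j])
                    draw_lists[i][j].append((tile, share))
                    opacity[i][j] += share
        return draw_lists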
  • the composite grid can be constructed in the usual manner; it may be larger than the grid would have been for the unrotated case, as larger coordinate ranges are visible along a diagonal.
  • the composite grid squares outside the viewing area need not be updated during the traversal in the second or third stages, or drawn in the fourth stage. Note that a number of other implementation details can be modified to optimize performance; the algorithm is presented here in a form that makes its operation and essential features easiest to understand. A graphics programmer skilled in the art can easily add the optimizing implementation details.
  • each level of detail can be drawn immediately as it is completed, with the correct opacity, thus requiring only the storage of a single tile identity per shard at any one time.
• Another exemplary optimization is that the total opacity rendering left to do, expressed in terms of (area) x (remaining opacity), can be tracked, so that the algorithm can quit early if everything has already been drawn; then low levels of detail need not be "visited" at all if they are not needed.
  • the algorithm can be generalized to arbitrary polygonal tiling patterns by using a constrained Delaunay triangulation instead of a grid to store vertex opacities and tile shard identifiers.
• This data structure efficiently creates a triangulation whose edges contain every edge in all of the original LOD grids; accessing a particular triangle or vertex is an efficient operation, which can take place in of order n·log(n) time (where n is the number of vertices or triangles added).
  • FIGURE 1 A SYSTEM AND METHOD FOR MULTIPLE NODE DISPLAY
  • the present invention relates to zooming user interfaces (ZUI) for computers.
• Most present day graphical computer user interfaces are designed using visual components of a fixed spatial scale. The visual content can be manipulated by zooming in or out or otherwise navigating through it.
• the precision with which coordinates of various objects can be represented is limited by the number of bits, usually between 16 and 64, designated to represent such coordinates; this limited representational size imposes limited precision.
• In the context of the zooming user interface, the user is easily able to zoom in, causing the area which previously covered only a single pixel to fill the entire display. Conversely, the user may zoom out, causing the contents of the entire display to shrink to the size of a single pixel. Since each zoom in or out may multiply or divide the xy coordinates by numerous orders of magnitude, just a few such zooms completely exhaust the precision available with a 64 bit floating point number, for example. Thereafter, round-off causes noticeable degradation of image quality.
  • a further objective of the present invention is to allow zooming in immediately after zooming out to behave analogously to the "forward" button of a web browser, letting the user precisely undo the effects of an arbitrarily long zoom-out.
• a further objective of the present invention is to allow a node, a visual object as defined more precisely below, to have a very large number of child nodes (for example, up to 10^28).
  • a further objective of the present invention is to allow a node to generate its own children programmatically on the fly, enabling content to be defined, created or modified dynamically during navigation.
  • a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if the data are stored at a remote location and shared over a low- bandwidth network.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
  • Each node preferably has its own coordinate system and rendering method, but may be contained within a parent node, and may be represented in the coordinate system and rendering method of the parent node.
  • a node is only "launched" when the zooming results in an appropriate level of detail.
  • the launching of the node causes the node to be represented in its own coordinate system and/or rendering method, rather than in the coordinate system and/or rendering method of a different node.
• Prior to being launched, the node is either represented in the coordinate system of the parent node, or not represented at all.
  • the precision of a coordinate system is a function of the zoom level of detail of what is being displayed. This allows a variable level of precision, up to and including the maximum permissible by the memory of the computer in which the system operates.
  • FIG. 1 is a depiction of visual content on a display
  • FIG. 2 is an image of the visual content of FIG. 1 at a different level of detail
  • FIG. 3 is a representation of an embodiment of the invention.
  • FIG. 4 is an exemplary embodiment of the invention showing plural nodes on a display
• FIG. 5 is a tree diagram corresponding to the exemplary embodiment shown in FIG. 4.
• the exemplary universe in turn contains 2D objects, or nodes, which have a visual representation, and may also be dynamic or interactive (e.g. video clips, applications, editable text documents, CAD drawings, or still images).
• For a node to be visible it must be associated with a rendering method, which is able to draw it in whole or in part on some area of the display.
  • Each node is also endowed with a local coordinate system of finite precision. For illustrative purposes, we assume a node is rectangular and represented by a local coordinate system.
• These two parameters, the rendering method and the coordinate system, specify how to display the node and the positions of items in the node.
• Each node may have zero or more child nodes, which are addressed by reference.
  • the node need not, and generally does not, contain all the information of each child node, but instead only an address providing information necessary to obtain the child node.
  • the nodes are displayed on the screen, as shown, for example in FIG. 1.
  • node is the basic unit of functionality in the present invention. Most nodes manifest visually on the user's display during navigation, and some nodes may also be animated and/or respond to user input.
  • Nodes are hierarchical, in that a node may contain child nodes. The containing node is then called a parent node. When a parent node contains a child node, the child's visual manifestation is also contained within the parent's visual manifestation.
  • Each node has a logical coordinate system, such that the entire extent of the node is contained within an exemplary rectangle defined in this logical coordinate system; e.g. a node may define a logical coordinate system such that it is contained in the rectangle (0,0)-(100,100).
• Each node may have the following data defining its properties:
o the node's logical coordinate system, including its logical size (100 x 100 in the above example);
o the identities, positions and sizes of any child nodes, specified in the (parent) node's logical coordinates;
o optionally, any necessary user data;
and executable code defining these operations or "methods":
o initialization of the node's data based on "construction arguments";
o rendering all or a portion of the node's visual appearance (the output of this method is a rendered tile);
o optionally, responding to user input, such as keyboard or mouse events.
  • the executable code defines a "node class", and may be shared among many "node instances". Node instances differ in their data content. Hence a node class might define the logic needed to render a JPEG image.
  • the "construction arguments" given to the initialization code would then include the URL of the JPEG image to display.
  • a node displaying a particular image would be an instance of the JPEG node class.
  • Plural instances of a node may be viewable in the same visual content, similar to the way a software application may be instantiated numerous times simultaneously.
  • buttons may all have the same behavior, and hence all be instances of the same node class; the images may all be in the same format and so also be instances of a common node class, etc. This also simplifies rearranging the layout — the parent node can easily move or resize the child nodes.
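• The class/instance distinction might look like the following sketch (the class name, fields and method signatures are hypothetical illustrations, not the patent's API):

    class JpegNode:
        """Hypothetical node class: shared logic for rendering a JPEG image
        node. Instances differ only in their construction arguments/data."""
        def __init__(self, url, logical_size=(100, 100)):
            self.url = url                    # construction argument
            self.logical_size = logical_size  # node's logical coordinate system
            self.children = []                # (child, position, size) references

        def render_tile(self, region, lod):
            """Render part of the node's appearance (stub: a real class would
            decode the JPEG at self.url and resample the requested region)."""
            return f"tile({self.url}, {region}, lod={lod})"

    # Two instances of the same node class, differing only in data:
    photo_a = JpegNode("http://example.com/a.jpg")
    photo_b = JpegNode("http://example.com/b.jpg")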
  • Fig. 1 shows a node 105 which may be the image of a portion of the city.
  • Node 105 may contain child nodes 101-103.
  • Node 101 may be an image of a building in the city, node 102 could be an image of a playground, and node 103 might be a sports arena.
• nodes 101-103 are relatively small, so they can be represented as a small darkened area with no detail in node 105, located at the correct location in the coordinate system of node 105. Only the coordinate system and the rendering method of node 105 are needed.
• sports arena node 103 would now be displayed not as a darkened area with no detail in the coordinate system of node 105, but rather, it would be "launched" to be displayed using its own coordinate system and rendering method. When displayed using its own coordinate system and rendering method, details such as seating, the field of play, etc. would be individually shown. Other functions discussed above, and associated with the node 103, would also begin executing at the point when node 103 is launched.
  • the particular navigation condition that causes the launching of node 103, or any node for that matter, is a function of design choice and is not critical to the present invention.
  • the precision with which the node 103 will be displayed is the combined precision of the coordinate system utilized by node 105, as well as that of node 103.
  • the combined precision will be 16 bits because the coordinate system of node 103 is only utilized to specify the position of items in node 103, but the overall location of node 103 within node 105 is specified within the coordinate system of node 105. Note that this nesting may continue repeatedly if sports arena 103 itself contains additional nodes within it. For example, one such node 201 may in fact be a particular concession stand within the sports arena. It is represented without much detail in the coordinate system and rendering method of node 103.
  • node 201 will launch. If it is displayed using 8 bits of precision, those 8 bits will specify where within the node 201 coordinate system particular items are to be displayed. Yet, the location of node 201 within node 103 will be maintained to 8 bits of precision within the coordinate system of node 103, the location of which will in turn be maintained within the coordinate system of node 105 using 8 bits. Hence, items within node 201 will ultimately be displayed using 24 bits of precision.
  • the precision at which visual content may ultimately be displayed is limited only by the memory capacity of the computer.
  • the ultimate precision with which visual content in a node is displayed after that node is launched is effectively the combined precision of all parent nodes and the precision of the node that has launched.
  • the precision may increase as needed limited only by the storage capacity of the computer, which is almost always much more than sufficient.
  • the increased precision is only utilized when necessary, because if the image is at an LOD that does not require launching, then in accordance with the above description, it will only be displayed with the precision of the node within which it is contained if that node has been launched.
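• A sketch of how nested fixed-precision coordinates compound (one-dimensional and using exact rationals for clarity; the 8-bits-per-level choice follows the example above):

    from fractions import Fraction

    def combined_position(chain, bits=8):
        """Each element of 'chain' is an integer coordinate (0 .. 2**bits - 1)
        within its node, each node nested inside the previous one. The exact
        position accumulates 'bits' of precision per level, as in the
        8 + 8 + 8 = 24 bit example above."""
        scale = Fraction(1)
        pos = Fraction(0)
        for coord in chain:
            scale /= 2**bits
            pos += coord * scale
        return pos

    # Three nested 8-bit coordinates -> 24 bits of effective precision:
    print(combined_position([200, 17, 93]))  # exact rational position in [0, 1)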
• An additional issue solved by the present invention relates to a system for maintaining the spatial interrelationship among all nodes during display. More particularly, during dynamic navigation such as zooming and panning, many different coordinate systems are being used to display potentially different nodes. Some nodes, as explained above, are being displayed merely as an image in the coordinate system of other nodes, and some are being displayed in their own coordinate systems. Indeed, the entire visual display may be populated with nodes displayed at different positions in different coordinate systems, and the coordinate systems and precisions used for the various nodes may vary during navigation as nodes are launched.
  • the present invention provides a technique for propagating relative location information among all of the nodes and for updating that information when needed so that each node will "know" the proper position in the overall view at which it should render itself.
  • the expanded node definition includes a field which we term the "view" field, and which is used by the node to locate itself relative to the entire display.
  • the view field represents, in the coordinates of that node, the visible area of the node — that is, the image of the display rectangle in the node's coordinates. This rectangle may only partially overlap the node's area, as when the node is partially off-screen.
  • the view field cannot always be kept updated for every node, as we cannot necessarily traverse the entire directed graph of nodes in real time as navigation occurs.
• the stack structure is defined thus: Stack<Address> viewStack; where this stack is a global variable of the client (the computer connected to the display).
• the viewStack will specify the addresses of a sequence of nodes "pierced" by a point relative to the display, which we will take in our exemplary implementation to be the center of the display. This sequence must begin with the root node, but may be infinite, and therefore must be truncated. In an exemplary embodiment, the sequence is truncated when the nodes "pierced" become smaller than some minimum size, defined as minimumArea.
• the current view is then represented by the view fields of all of the nodes in the viewStack, each of which specifies the current view in terms of the node's local coordinate system.
  • the last element's view field does not, however, specify the user's viewpoint relative to the entire universe, but only relative to its local coordinates.
• the view field of the root node does specify where in the universe the user is looking. Nodes closer to the "fine end" of the viewStack thus specify the view position with increasing precision, but relative to progressively smaller areas in the universe. This is shown conceptually in FIG. 3.
  • node 303 provides the most accurate indication of where the user is looking, since its coordinate system is the "finest", but node 301 provides information, albeit not as fine, on a much larger area of the visual content.
• Changing the view during any navigation operation proceeds as follows. Because the last node in the viewStack has the most precise representation of the view, the first step is to alter the view field of this last node; this altered view is taken to be the correct new view, and any other visible nodes must follow along. The second step is to propagate the new view "upward" toward the root node, which entails making progressively smaller and smaller changes to the view fields of nodes earlier in the stack. If the user is deeply zoomed, then at some point in the upward propagation the alteration to the view may be so small that it ceases to be accurately representable; upward propagation stops at this node. At each stage of the upward propagation, the change is also propagated downward to other visible nodes.
• First the last node's parent's view is modified; then, in the downward propagation, the views of the last node's "siblings" are updated.
  • the downward propagation is halted, as before, when the areas of "cousin nodes" become smaller than minimumArea, or when a node falls entirely offscreen.
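• The upward step of this propagation might be sketched as follows (the normalized child coordinates and field names are assumptions made for illustration):

    class Node:
        def __init__(self, origin, size):
            self.origin = origin  # position in parent coordinates
            self.size = size      # extent in parent coordinates
            self.view = None      # display rectangle in local [0, 1] coords

    def propagate_up(view_stack, new_deepest_view):
        """Set the deepest node's view, then map it up toward the root,
        rewriting each ancestor's view field in its own coordinates."""
        view_stack[-1].view = new_deepest_view
        for child, parent in zip(reversed(view_stack), reversed(view_stack[:-1])):
            (x0, y0), (x1, y1) = child.view
            s, (ox, oy) = child.size, child.origin
            parent.view = ((ox + x0 * s, oy + y0 * s), (ox + x1 * s, oy + y1 * s))

    root = Node((0.0, 0.0), 1.0)
    child = Node((0.25, 0.25), 0.5)
    propagate_up([root, child], ((0.0, 0.0), (1.0, 1.0)))
    print(root.view)  # ((0.25, 0.25), (0.75, 0.75))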
  • FIGs. 4 and 5 The foregoing technique involves translating the layout of the various nodes into a tree, which conceptually is illustrated in FIGs. 4 and 5. As can be seen from FIGs. 4 and 5, there is a corresponding tree for a particular displayed set of nodes, and the tree structure may be used to propagate the view information as previously described.
  • a panning operation may move the last node far enough away that it no longer belongs in the viewStack.
  • zooming in might enlarge a child to the extent that a lengthening of the viewStack is required, or zooming out might bring the last node's area below a minimum area requiring a truncation of the viewStack.
  • identity of the last node changes.
  • the present invention relates to methods and apparatus for navigating, such as zooming and panning, over an image of an object in such a way as to provide the appearance of smooth, continuous navigational movement.
  • visual components may be represented and manipulated such that they do not have a fixed spatial scale on the display; indeed, the visual components may be panned and/or zoomed in or out.
• the ability to zoom in and out on an image is desirable in connection with, for example, viewing maps, browsing through text layouts such as newspapers, viewing digital photographs, viewing blueprints or diagrams, and viewing other large data sets.
• Many existing computer applications, such as Microsoft Word, Adobe Photoshop, Adobe Acrobat, etc., include zoomable components.
  • the zooming capability provided by these computer applications is a peripheral aspect of a user's interaction with the software and the zooming feature is only employed occasionally.
• These computer applications permit a user to pan over an image smoothly and continuously (e.g., utilizing scroll bars or the cursor to translate the viewed image left, right, up or down).
• a significant problem with such computer applications is that they do not permit a user to zoom smoothly and continuously. Indeed, they provide zooming in discrete steps, such as 10%, 25%, 50%, 75%, 100%, 150%, 200%, 500%, etc. The user selects the desired zoom using the cursor and, in response, the image changes abruptly to the selected zoom level.
  • FIGS. 1-4 are examples of images that one may obtain from the MapQuest website in response to a query for a regional map of Long Island, NY, U.S.A.
  • the MapQuest website permits the user to zoom in and zoom out to discrete levels, such as 10 levels.
  • FIG. 1 is a rendition at zoom level 5, which is approximately 100 meters/pixel.
  • FIG. 2 is an image at a zoom level 6, which is about 35 meters/pixel.
  • FIG. 3 is an image at a zoom level 7, which is about 20 meters/pixel.
  • FIG. 4 is an image at a zoom level 9, which is about 10 meters/pixel.
• A1 roads, primary highways; A2, primary roads; A3, state highways, secondary roads, and connecting roads; A4, local streets, city streets and rural roads; and A5, dirt roads.
  • These roads may be considered the elements of an overall object (i.e., a roadmap).
• the coarseness of the road elements manifests because there are considerably more A4 roads than A3 roads, there are considerably more A3 roads than A2 roads, and there are considerably more A2 roads than A1 roads.
• the physical dimensions of the roads (e.g., their widths) vary significantly.
• A1 roads may be about 16 meters wide
  • A2 roads may be about 12 meters wide
  • A3 roads may be about 8 meters wide
  • A4 roads may be about 5 meters wide
  • A5 roads may be about 2.5 meters wide.
• the MapQuest computer application deals with these varying levels of coarseness by displaying only the road categories deemed appropriate at a particular zoom level. For example, a nation-wide view might only show A1 roads, while a state-wide view might show A1 and A2 roads, and a county-wide view might show A1, A2 and A3 roads. Even if MapQuest were modified to allow continuous zooming of the roadmap, this approach would lead to the sudden appearance and disappearance of road categories during zooming, which is confusing and visually displeasing.
  • methods and apparatus are contemplated to perform various actions, including: zooming into or out of an image having at least one object, wherein at least some elements of at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels associated with the zooming.
  • At least some elements of the at least one object may also be scaled up and/or down in a way that is physically proportional to one or more zoom levels associated with the zooming.
  • the elements of the object may be of varying degrees of coarseness.
• the coarseness of the elements of a roadmap object manifests because there are considerably more A4 roads than A3 roads, there are considerably more A3 roads than A2 roads, and there are considerably more A2 roads than A1 roads.
  • Degree of coarseness in road categories also manifests in such properties as average road length, frequency of intersections, and maximum curvature.
  • the coarseness of the elements of other image objects may manifest in other ways too numerous to list in their entirety.
  • the scaling of the elements in a given predetermined image may be physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level of the given predetermined image.
  • the object may be a roadmap
  • the elements of the object may be roads
  • the varying degrees of coarseness may be road hierarchies.
  • the scaling of a given road in a given predetermined image may be physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level of the given predetermined image.
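• One plausible reading of this scaling rule as code (the clamp form and the minimum legible width are assumptions; the actual curve of FIG. 14 may differ):

    def displayed_width_px(physical_width_m, zoom_m_per_px, min_px=0.5):
        """At detailed zooms the road is drawn physically proportionally
        (width / zoom); once that would fall below a minimum legible width,
        scaling becomes non-physical and the width is clamped so coarse
        roads stay visible when zoomed far out."""
        return max(physical_width_m / zoom_m_per_px, min_px)

    for z in (5, 20, 100, 334):
        print(z, displayed_width_px(16.0, z))  # an A1 highway, ~16 m wide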
  • methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of a roadmap; receiving one or more user navigation commands including zooming information at the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation.
• methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of at least one object, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving one or more user navigation commands including zooming information at the client terminal; blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal.
  • methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of a roadmap to a client terminal over a communications channel; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation.
  • methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of at least one object to a client terminal over a communications channel, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; blending two of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal.
  • FIG. 1 is an image taken from the MapQuest website, which is at a zoom level 5;
  • FIG. 2 is an image taken from the MapQuest website, which is at a zoom level 6;
  • FIG. 3 is an image taken from the MapQuest website, which is at a zoom level 7;
  • FIG. 4 is an image taken from the MapQuest website, which is at a zoom level 9;
  • FIG. 5 is an image of Long Island produced at a zoom level of about 334 meters/pixel in accordance with one or more aspects of the present invention
  • FIG. 6 is an image of Long Island produced at a zoom level of about 191 meters/pixel in accordance with one or more further aspects of the present invention.
  • FIG. 7 is an image of Long Island produced at a zoom level of about 109.2 meters/pixel in accordance with one or more further aspects of the present invention.
  • FIG. 8 is an image of Long Island produced at a zoom level of about 62.4 meters/pixel in accordance with one or more further aspects of the present invention.
  • FIG. 10 is an image of Long Island produced at a zoom level of about 20.4 meters/pixel in accordance with one or more further aspects of the present invention.
  • FIG. 11 is an image of Long Island produced at a zoom level of about 11.7 meters/pixel in accordance with one or more further aspects of the present invention.
  • FIG. 12 is a flow diagram illustrating process steps that may be carried out in order to provide smooth and continuous navigation of an image in accordance with one or more aspects of the present invention
  • FIG. 13 is a flow diagram illustrating further process steps that may be carried out in order to smoothly navigate an image in accordance with various aspects of the present invention
• FIG. 14 is a log-log graph of a line width in pixels versus a zoom level in meters/pixel illustrating physical and non-physical scaling in accordance with one or more further aspects of the present invention.
  • FIG. 15 is a log-log graph illustrating variations in the physical and non-physical scaling of FIG. 14.
  • FIGS. 16A-D illustrate respective antialiased vertical lines whose endpoints are precisely centered on pixel coordinates
  • FIGS. 17A-C illustrate respective antialiased lines on a slant, with endpoints not positioned to fall at exact pixel coordinates
  • FIG. 18 is the log-log graph of line width versus zoom level of FIG. 14 including horizontal lines indicating incremental line widths, and vertical lines spaced such that the line width over the interval between two adjacent vertical lines changes by no more than two pixels.
• FIGS. 5-11 show a series of images representing the road system of Long Island, NY, U.S.A., where each image is at a different zoom level (or resolution).
• the image 100A of the roadmap illustrated in FIG. 5 is at a zoom level that may be characterized by units of physical length/pixel (or physical linear size/pixel).
• the zoom level, z, represents the actual physical linear size that a single pixel of the image 100A represents.
  • the zoom level is about 334 meters/pixel.
• FIG. 6 is an image 100B of the same roadmap as FIG. 5, at a zoom level of about 191 meters/pixel.
  • FIGS. 5-11 Another significant feature of the present invention as illustrated in FIGS. 5-11 is that little or no detail abruptly appears or disappears when zooming from one level to another level.
• the roadmap 100D of FIG. 8 includes at least A1 highways such as 102, A3 secondary roads such as 104, and A4 local roads such as 106. Yet these details, even the A4 local roads 106, may still be seen in image 100A of FIG. 5, which is substantially zoomed out in comparison with the image 100D of FIG. 8.
• the A1, A2, A3, and A4 roads may be distinguished from one another. Even differences between A1 primary highways 102 and A2 primary roads 108 may be distinguished vis-a-vis the relative weight given to such roads in the rendered image 100A.
  • the user may wish to gain a general sense of what primary highways exist and in what directions they extend. This information may readily be obtained even though the A4 local roads 106 are also depicted.
• FIGS. 12-13 are flow diagrams illustrating process steps that are preferably carried out by the one or more computing devices and/or related equipment.
• Although the process flow may be carried out by commercially available computing equipment (such as Pentium-based computers), any of a number of other techniques may be employed to carry out the process steps without departing from the spirit and scope of the present invention as claimed.
• the hardware employed may be implemented utilizing any other known or hereinafter developed technologies, such as standard digital circuitry, analog circuitry, any of the known processors that are operable to execute software and/or firmware programs, one or more programmable digital devices or systems, such as programmable read only memories (PROMs), programmable array logic devices (PALs), any combination of the above, etc.
• FIG. 12 illustrates an embodiment of the invention in which a plurality of images are prepared (each at a different zoom level or resolution), action 200, and two or more of the images are blended together to achieve the appearance of smooth navigation, such as zooming (action 206).
  • a service provider would expend the resources to prepare a plurality of pre-rendered images (action 200) and make the images available to a user's client terminal over a communications channel, such as the Internet (action 202).
  • the pre-rendered images may be an integral or related part of an application program that the user loads and executes on his or her computer.
  • in response to user-initiated navigation commands (action 204), such as zooming commands, the client terminal is preferably operable to blend two or more images in order to produce an intermediate resolution image that coincides with the navigation command (action 206).
  • This blending may be accomplished by a number of methods, such as the well-known trilinear interpolation technique described by Lance Williams, Pyramidal Parametrics, Computer Graphics, Proc. SIGGRAPH 83, 17(3): 1-11 (1983), the entire disclosure of which is incorporated herein by reference.
  • Other approaches to image interpolation are also useful in connection with the present invention, such as bicubic-linear interpolation, and still others may be developed in the future.
  • the present invention does not require or depend on any particular one of these blending methods.
  • the user may wish to navigate to a zoom level of 62.4 meters/pixel.
  • this zoom level may be between two of the pre-rendered images (e.g., in this example between zoom level 50 meters/pixel and zoom level 75 meters/pixel)
  • the desired zoom level of 62.4 meters/pixel may be achieved using the trilinear interpolation technique.
  • any zoom level between 50 meters/pixel and 75 meters/pixel may be obtained utilizing a blending method as described above, which if performed quickly enough provides the appearance of smooth and continuous navigation.
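The interpolation just described can be made concrete with a short sketch (Python; the function and variable names are hypothetical, and the invention does not depend on this particular blending method). The blending coefficient is interpolated in log-zoom space, in the spirit of trilinear interpolation:

    import math

    def blend_zoom_levels(img_fine, z_fine, img_coarse, z_coarse, z_target):
        # Blend two pre-rendered images (already resampled to a common
        # display grid) to approximate an intermediate zoom level.
        # Interpolating alpha in log(zoom) space matches the geometric
        # spacing of mipmap-like image stacks.
        alpha = (math.log(z_target) - math.log(z_fine)) / \
                (math.log(z_coarse) - math.log(z_fine))
        alpha = min(max(alpha, 0.0), 1.0)
        return (1.0 - alpha) * img_fine + alpha * img_coarse

For the example above, blending the 50 meters/pixel and 75 meters/pixel pre-rendered images with z_target = 62.4 gives an alpha of roughly 0.55.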
  • the blending technique may be carried through to other zoom levels, such as the 35.7 meters/pixel level illustrated in FIG. 9. In such case, the blending technique may be performed as between the pre-rendered images of 30 meters/pixel and 50 meters/pixel of the example discussed thus far.
  • the above blending approach may be used when the computing power of the processing unit on which the invention is carried out is not high enough to (i) perform the rendering operation in the first instance, and/or (ii) perform image rendering "just-in-time" or "on the fly" (for example, in real time) to achieve a high image frame rate for smooth navigation.
  • other embodiments of the invention contemplate use of known, or hereinafter developed, high power processing units that are capable of rendering at the client terminal for blending and/or high frame rate applications.
  • FIG. 13 illustrates the detailed steps and/or actions that are preferably conducted to prepare one or more images in accordance with the present invention.
  • the information is obtained regarding the image object or objects using any of the known or hereinafter developed techniques.
  • image objects have been modeled using appropriate primitives, such as polygons, lines, points, etc.
  • UTM: Universal Transverse Mercator
  • the model is usually in the form of a list of line segments (in any coordinate system) that make up the roads in the zone.
  • the list may be converted into an image in the spatial domain (a pixel image) using any of the known or hereinafter developed rendering processes so long as it incorporates certain techniques for determining the weight (e.g., apparent or real thickness) of a given primitive in the pixel (spatial) domain.
  • the rendering processes should incorporate certain techniques for determining the weight of the lines that model the roads of the roadmap in the spatial domain. These techniques will be discussed below.
  • the elements of the object are classified.
  • the classification may take the form of recognizing already existing categories, namely, A1, A2, A3, A4, and A5. Indeed, these road elements have varying degrees of coarseness and, as will be discussed below, may be rendered differently based on this classification.
  • mathematical scaling is applied to the different road elements based on the zoom level. As will be discussed in more detail below, the mathematical scaling may also vary based on the element classification.
  • the pre-set pixel width approach dictates that every road is a certain pixel width, such as one pixel in width on the display.
  • Major roads, such as highways, may be emphasized by making them two pixels wide, etc.
  • this approach makes the visual density of the map change as one zooms in and out. At some level of zoom, the result might be pleasing, e.g., at a small-size county level. As one zooms in, however, roads would not thicken, making the map look overly sparse. Further, as one zooms out, roads would run into each other, rapidly forming a solid nest in which individual roads would be indistinguishable.
  • the images are produced in such a way that at least some image elements are scaled up and/or down either (i) physically proportional to the zoom level; or (ii) non-physically proportional to the zoom level, depending on parameters that will be discussed in more detail below.
  • scaling being "physically proportional to the zoom level" means that the number of pixels representing the road width increases or decreases with the zoom level as the size of an element would appear to change with its distance from the human eye.
  • a may be set to a power law other than -1, and d' may be set to a physical linear size other than the actual physical linear size d.
  • p may represent the displayed width of a road in pixels and d' may represent an imputed width in physical units
  • non-physically proportional to the zoom level means that the road width in display pixels increases or decreases with the zoom level in a way other than being physically proportional to the zoom level, i.e., a ≠ -1.
  • the scaling is distorted in a way that achieves certain desirable results.
  • linear size means one-dimensional size.
  • the linear sizes of the elements of an object may involve length, width, radius, diameter, and/or any other measurement that one can read off with a ruler on the Euclidean plane.
  • the thickness of a line, the length of a line, the diameter of a circle or disc, the length of one side of a polygon, and the distance between two points are all examples of linear sizes.
  • the "linear size" in two dimensions is the distance between two identified points of an object on a 2D Euclidean plane.
  • Any power law a < 0 will cause the rendered size of an element to decrease as one zooms out, and increase as one zooms in. When a < -1, the rendered size of the element will decrease faster than it would with proportional physical scaling as one zooms out. Conversely, when -1 < a < 0, the size of the rendered element decreases more slowly than it would with proportional physical scaling as one zooms out.
  • p(z), for a given length of a given object, is permitted to be substantially continuous so that during navigation the user does not experience a sudden jump or discontinuity in the size of an element of the image (as opposed to the conventional approaches that permit the most extreme discontinuity: a sudden appearance or disappearance of an element during navigation).
  • p(z) preferably decreases monotonically with zooming out, such that zooming out causes the elements of the object to become smaller (e.g., roads to become thinner), and such that zooming in causes the elements of the object to become larger. This gives the user a sense of physicality about the object(s) of the image.
  • (i) the scaling of the road widths may be physically proportional to the zoom level when zoomed in (e.g., up to about 0.5 meters/pixel); (ii) the scaling of the road widths may be non-physically proportional to the zoom level when zoomed out (e.g., above about 0.5 meters/pixel); and (iii) the scaling of the road widths may be physically proportional to the zoom level when zoomed further out (e.g., above about 50 meters/pixel or higher depending on parameters which will be discussed in more detail below).
  • a = -1.
  • z0 = 0.5 meters/pixel, or 2 pixels/meter, which when expressed as a map scale on a 15 inch display (with 1600x1200 pixel resolution) corresponds to a scale of about 1:2600.
  • d = 16 meters, which is a reasonable real physical width for A1 roads; the rendered road will appear at its actual size when one is zoomed in (0.5 meters/pixel or less).
  • the rendered line is about 160 pixels wide.
  • the rendered line is 32 pixels wide.
  • when -1 < a < 0, the width of the rendered road decreases more slowly than it would with proportional physical scaling as one zooms out.
  • this permits the A1 road to remain visible (and distinguishable from other smaller roads) as one zooms out. For example, as shown in FIG.
  • the width of the rendered line using physical scaling would have been about 0.005 pixels at a zoom level of about 3300 meters/pixel, rendering it virtually invisible.
  • the width of the rendered line is about 0.8 pixels at a zoom level of 3300 meters/pixel, rendering it clearly visible.
  • the value for z1 is chosen to be the most zoomed-out scale at which a given road still has "greater than physical" importance.
  • the resolution would be approximately 3300 meters/pixel, or 3.3 kilometers/pixel. If one looks at the entire world, then there may be no reason for U.S. highways to assume enhanced importance relative to the view of the country alone.
  • beyond z1, the scaling of the road widths is again physically proportional to the zoom level, but preferably with a large d' (much greater than the real width d) for continuity of p(z).
  • z1 and the new value for d' are preferably chosen in such a way that, at the outer scale z1, the rendered width of the line will be a reasonable number of pixels.
  • A1 roads may be about ½ pixel wide, which is thin but still clearly visible; this corresponds to an imputed physical road width of 1650 meters, or 1.65 kilometers.
  • p(z) has six parameters: z0, z1, d0, d1, d2 and a.
  • z0 and z1 mark the scales at which the behavior of p(z) changes.
  • below z0, zooming is physical (i.e., the exponent of z is -1), with a physical width of d0, which preferably corresponds to the real physical width d.
  • beyond z1, zooming is again physical, but with a physical width of d1, which in general does not correspond to d.
  • between z0 and z1, the rendered line width scales with a power law of a, which can be a value other than -1.
  • requiring p(z) to be continuous, specifying z0, z1, d0 and d2 is sufficient to uniquely determine d1 and a, as is clearly shown in FIG. 14.
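The six-parameter law can be gathered into one function. The sketch below assumes one consistent reading of the fragments above, in which d2 is the rendered width in pixels at z1; d1 and the exponent a then follow from continuity. Under this reading, the A1-road example reproduces the 32-pixel width at z0 and the half-pixel width at z1 quoted in the text:

    import math

    def rendered_width_px(z, z0, z1, d0, d2):
        # z  : zoom level in physical units per pixel (e.g. meters/pixel)
        # z0 : zoomed-in breakpoint; below it, physical scaling at real width d0
        # z1 : zoomed-out breakpoint; beyond it, physical scaling resumes
        # d0 : real physical width of the element (e.g. 16 m for an A1 road)
        # d2 : rendered width in pixels at z1 (an assumed parameterization)
        p_z0 = d0 / z0                                 # pixel width at z0
        a = math.log(d2 / p_z0) / math.log(z1 / z0)    # continuity fixes a
        d1 = d2 * z1                                   # imputed width beyond z1
        if z <= z0:
            return d0 / z                  # physical scaling, true width
        if z >= z1:
            return d1 / z                  # physical scaling, imputed width
        return p_z0 * (z / z0) ** a        # power law, typically -1 < a < 0

    # A1 example: rendered_width_px(0.5, 0.5, 3300, 16, 0.5)  -> 32.0 pixels
    #             rendered_width_px(3300, 0.5, 3300, 16, 0.5) -> 0.5 pixels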
  • the approach discussed above with respect to Al roads may be applied to the other road elements of the roadmap object.
  • An example of applying these scaling techniques to the A1, A2, A3, A4, and A5 roads is illustrated in the log-log graph of FIG. 15.
  • z0 = 0.5 meters/pixel for all roads, although it may vary from element to element depending on the context.
  • the dotted lines all have a slope of -1 and represent physical scaling at different physical widths. From the top down, the corresponding physical widths of these dotted lines are: 1.65 kilometers, 312 meters, 100 meters, 20 meters, 16 meters, 12 meters, 8 meters, 5 meters, and 2.5 meters.
  • drawing the antialiased vertical lines of FIGS. 16A-D could also be accomplished by alpha-blending two images, one (image A) in which the line is 1 pixel wide, and the other (image B) in which the line is 3 pixels wide.
  • Alpha blending assigns to each pixel on the display (1 - alpha)*(corresponding pixel in image A) + alpha*(corresponding pixel in image B).
  • as alpha is varied between zero and one, the effective width of the rendered line varies smoothly between one and three pixels.
  • This alpha-blending approach only produces good visual results in the most general case if the difference between the two rendered line widths in images A and B is one pixel or less; otherwise, lines may appear haloed at intermediate widths.
  • This same approach can be applied to rendering points, polygons, and many other primitive graphical elements at different linear sizes.
  • the 1.5 pixel-wide line (FIG. 16B) and the 2 pixel-wide line (FIG. 16C) can be constructed by alpha-blending between the 1 pixel wide line (FIG. 16A) and the 3 pixel wide line (FIG. 16D).
  • a 1 pixel wide line (FIG. 17A), a 2 pixel wide line (FIG. 17B) and a 3 pixel wide line (FIG. 17C) are illustrated in an arbitrary orientation.
  • the same principle applies to the arbitrary orientation of FIGS. 17A-C as to the case where the lines are aligned exactly to the pixel grid, although the spacing of the line widths between which to alpha-blend may need to be finer than two pixels for good results.
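A minimal sketch of this width-blending rule, assuming (as the text suggests) that the effective width varies approximately linearly in alpha between the two rendered widths:

    def blend_for_width(img_w1, img_w2, w1, w2, w_target):
        # Approximate a line of width w_target from renders at widths
        # w1 < w_target < w2 using (1 - alpha)*A + alpha*B. Works best
        # when w2 - w1 is small, per the haloing caveat above.
        alpha = (w_target - w1) / (w2 - w1)
        return (1.0 - alpha) * img_w1 + alpha * img_w2

    # e.g. a 1.5-pixel line from 1- and 3-pixel renders uses alpha = 0.25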
  • FIG. 18 is substantially similar to FIG. 14 except that FIG. 18 includes a set of horizontal lines and vertical lines.
  • the horizontal lines indicate line widths between 1 and 10 pixels, in increments of one pixel.
  • the vertical lines are spaced such that line width over the interval between two adjacent vertical lines changes by no more than two pixels.
  • the vertical lines represent a set of zoom values suitable for pre-rendition, wherein alpha-blending between two adjacent such prerendered images will produce characteristics nearly equivalent to rendering the lines representing roads at continuously variable widths.
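One simple way to derive such a set of zoom values, given a width function p(z) such as the rendered_width_px sketch earlier, is to scan zoom space on a fine multiplicative grid and start a new pre-render level just before the width drifts past the allowed increment. This procedure is illustrative; the text does not prescribe a particular method:

    def prerender_zooms(z_min, z_max, width_fn, max_dw=2.0, step=1.01):
        # Emit zoom levels such that the rendered line width changes by
        # no more than max_dw pixels between adjacent levels (max_dw=2.0
        # matches the two-pixel spacing rule of FIG. 18).
        zooms = [z_min]
        z = z_min
        while z * step < z_max:
            if abs(width_fn(z * step) - width_fn(zooms[-1])) > max_dw and z > zooms[-1]:
                zooms.append(z)        # the next step would exceed the bound
            z *= step
        zooms.append(z_max)
        return zooms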
  • This tiling technique may be employed for resolving an image at a particular zoom level, even if that level does not coincide with a pre-rendered image. If each image in the somewhat larger set of resolutions is pre-rendered at the appropriate resolution and tiled, then the result is a complete system for zooming and panning navigation through a roadmap of arbitrary complexity, such that all lines appear to vary in width continuously in accordance with the scaling equations disclosed herein.
  • the user enjoys the appearance of smooth and continuous navigation through the various zoom levels. Further, little or no detail abruptly appears or disappears when zooming from one level to another. This represents a significant advancement over the state of the art.
  • the various aspects of the present invention may be applied in numerous products, such as interactive software applications over the Internet, automobile-based software applications and the like.
  • the present invention may be employed by an Internet website that provides maps and driving directions to client terminals in response to user requests.
  • various aspects of the invention may be employed in a GPS navigation system in an automobile.
  • the invention may also be incorporated into medical imaging equipment, whereby detailed information concerning, for example, a patient's circulatory system, nervous system, etc. may be rendered and navigated as discussed hereinabove.
  • the applications of the invention are too numerous to list in their entirety, yet a skilled artisan will recognize that they are contemplated herein and fall within the scope of the invention as claimed.
  • the present invention may also be utilized in connection with other applications in which the rendered images provide a means for advertising and otherwise advancing commerce. Additional details concerning these aspects and uses of the present invention may be found in U.S. Provisional Patent Application No.
  • METHOD FOR SPATIALLY ENCODING LARGE TEXTS, METADATA, AND OTHER COHERENTLY ACCESSED NON-IMAGE DATA
  • image compression standards such as JPEG2000/JPIP have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel.
  • the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate.
  • the present invention relates to an extension of these selectively decompressable image compression and transmission technologies to textual or other non-image data.
  • a large text, e.g. the book Ulysses, by James Joyce.
  • We can format this text by putting each chapter in its own column, with columns for sequential chapters arranged left-to-right. Columns are assumed to have a maximum width in characters, e.g. 100.
  • Figure 2 shows the entire text of Ulysses encoded as an image in this fashion, with each textual character corresponding to a single pixel.
  • the pixel intensity value in Figure 1 is simply the ASCII code of the corresponding character.
  • JPEG2000 is used as a lossy compression format, meaning that the decoded image bytes are not necessarily identical to the original bytes. Clearly if the image bytes represent text, lossy compression is not acceptable.
  • One of the design goals of JPEG2000 was, however, to support lossless compression efficiently, as this is important in certain sectors of the imaging community (e.g. medical and scientific). Lossless compression ratios for photographic images are typically only around 2:1, as compared with visually acceptable lossy images, which can usually easily be compressed by 24:1.
  • Image compression, both lossy and lossless, can operate best on images that have good spatial continuity, meaning that the differences between the intensity values of adjacent pixels are minimized.
  • the raw ASCII encoding is clearly not optimal from this perspective.
  • One very simple way to improve the encoding is to reorder characters by frequency in the text, or simply in the English language, from highest to lowest: code 0 remains empty space, code 1 becomes the space character, and codes 2 onward are e, t, a, o, i, n, s, r, h, l, etc.
  • Figures 2 and 3 compare text-images with ASCII encoding and with this kind of character frequency encoding.
  • the file size is 1.6MB, barely larger than the raw ASCII text file (1.5MB) and 37% smaller than the ASCII encoded text-image.
  • the compressed file size can drop well below the ASCII text file size.
  • the further optimizations can include, but are not limited to: using letter transition probabilities (Markov-1) to develop the encoding, instead of just frequencies (Markov-0); and encoding as pixels the delta or difference between one character and the next, rather than the characters themselves.
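A sketch of the basic (Markov-0) frequency encoding described above; the function name and the single wrapped-column layout are illustrative simplifications, not the exact scheme behind the Ulysses figures:

    from collections import Counter
    import numpy as np

    def text_to_image(text, column_width=100):
        # Code 0 stays empty space, code 1 is the space character, and
        # codes 2 onward go to characters from most to least frequent.
        freq = Counter(c for c in text if c != ' ')
        code = {' ': 1}
        for rank, (ch, _) in enumerate(freq.most_common(), start=2):
            code[ch] = min(rank, 255)   # 8-bit image: clamp huge alphabets
        n_rows = -(-len(text) // column_width)         # ceiling division
        img = np.zeros((n_rows, column_width), dtype=np.uint8)
        for i, ch in enumerate(text):
            img[i // column_width, i % column_width] = code[ch]
        return img, code    # the code table must travel with the image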
  • image compression standards such as JPEG2000/JPIP have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel.
  • when such images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate.
  • the present invention relates to an extension of these selectively decompressable image compression and transmission technologies to geospatial or schematic data. It combines and extends methods introduced in previous applications: (1) "Method for spatially encoding large texts, metadata, and other coherently accessed non-image data", attached as Exhibit A, and (2) "Methods and Apparatus for Navigating an Image", attached as Exhibit B.
  • In (2), the concept of continuous multiscale roadmap rendering was introduced.
  • the basis for the invention of (2) is a pre-rendered "stack" of images of a roadmap or other vector-based diagram at different resolutions, in which categories of visual elements (e.g. classes of road, including national highway, state highway, and local road) are rendered with different visual weights at different resolutions.
  • a user on the client side can navigate through a large map (e.g. all roads in the United States), zooming and panning continuously, without experiencing any visual discontinuities, such as categories of roads appearing or disappearing as the zoom scale is changed.
  • a maximum reasonable resolution for these most detailed pre-rendered images may be about 15 meters/pixel; however, it is desirable from the user's standpoint to be able to zoom in farther.
  • Pre-rendering at higher detail is not desirable for several reasons: first, because the file sizes on the server side become prohibitive (a single Universal Transverse Mercator zone image at 15 meters/pixel may already be in the gigapixel range); second, because a pre-rendered image is an inefficient representation for the kind of very sparse black-and-white data normally associated with high-resolution map rendering; and third, because the client may require the "real" vector data for performing computational tasks beyond static visual presentation.
  • a route guidance system may highlight a road or change its color; this can be done on the client side only if the client has access to vector data, as opposed to a pre-rendered image alone.
  • Vector data may also include street names, addresses, and other information which the client must have the flexibility to lay out and render selectively. Pre-rendering street name labels into the map image stack is clearly undesirable, as these labels must be drawn in different places and sizes depending on the precise location and scale of the client view; different label renditions should not blend into one another as the user zooms. Pre-rendering such data would also eliminate any flexibility with regard to font.
  • vector data (where we use the term generically to refer both to geometric and other information, such as place names) is both important to the client in its own right, and a more efficient representation of the information than pre-rendered imagery, when the desired rendering resolution is high. Note, however, that if a large area is to be rendered at low resolution, the vector data may become prohibitively large and complex, making the pre-rendered image the more efficient representation. Even at low resolution, however, some subset of the vector data is necessary, such as the names of major highways.
  • the present invention extends the methods introduced in (1) to allow spatial vector data to be encoded and transmitted selectively and incrementally to the client, possibly in conjunction with the pre-rendered imagery of (2).
  • the database would need to include all relevant vector data, indexed spatially.
  • Such databases present many implementation challenges.
  • three images or channels are used for representing the map data, each with 8 bit depth: the prerendered layer is a precomputed literal rendition of the roadmap, as per (2); the pointer layer consists of 2*2 pixel blocks positioned at or very near the roadmap features to which they refer, typically intersections; the data layer consists of n*m pixel blocks centered on or positioned near the 2*2 pointers which refer to them.
  • Figures 2-3 show the prerendered layer alone, for comparison and orientation. The region shown is King County, in Washington state, which includes Seattle and many of its suburbs.
  • Figures 3a and 3b are closeups from suburban and urban areas of the map, respectively.
  • Figure 3a: Closeup of suburban area of King County.
  • Figure 3b: Closeup of urban area of King County. If the user navigates to the view of the map shown in Figure 3a, then the client will request from the server the relevant portions of all three image layers, as shown.
  • the prerendered layer (shown in yellow) is the only one of the three displayed on the screen as is. The other two specify the vector data.
  • the pointer image consists of 2x2 pixel blocks aligned on a 2x2 pixel grid, each of which specifies an (x,y) vector offset (with the x and y components of the vector each comprising a 16-bit integer, hence two pixels each) from its own location to the beginning (top left corner) of a corresponding data block in the data layer.
  • the corresponding data block begins with two 16-bit values (four pixels) specifying the data block width and height.
  • the width is specified first, and is constrained to be at least 2, hence avoiding ambiguities in reading the width and height.
  • the remainder of the data block can be treated as binary data which may contain any combination of vectors, text, or other information.
  • data blocks contain streetmap information including street names, address ranges, and vector representations.
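To make the layer mechanics concrete, here is a client-side decoding sketch. The details not pinned down above (big-endian pixel pairs, signed offsets, x offset in the pointer's top row and y offset in its bottom row) are assumptions made for illustration only:

    def read_u16(layer, y, x):
        # Two adjacent 8-bit pixels -> one 16-bit value (big-endian assumed).
        return (int(layer[y][x]) << 8) | int(layer[y][x + 1])

    def to_signed16(v):
        # Two's-complement reading of a 16-bit value (sign handling assumed).
        return v - 0x10000 if v >= 0x8000 else v

    def fetch_data_block(pointer_layer, data_layer, py, px):
        # Follow the 2x2 pointer block whose top-left pixel is (py, px)
        # to its data block and return the payload bytes.
        dx = to_signed16(read_u16(pointer_layer, py, px))        # top row
        dy = to_signed16(read_u16(pointer_layer, py + 1, px))    # bottom row
        by, bx = py + dy, px + dx
        w = read_u16(data_layer, by, bx)   # width >= 2: always readable first

        def pixel(i):                      # i-th pixel of the block, row-major
            return int(data_layer[by + i // w][bx + i % w])

        h = (pixel(2) << 8) | pixel(3)     # height occupies pixels 2 and 3
        return bytes(pixel(i) for i in range(4, w * h))

Reading the block row-major is what makes the width-at-least-2 constraint matter: the width always occupies the first two pixels of the first row, after which the positions of the remaining header and payload pixels are well defined.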
  • the pointer and data layers are precomputed, just as the prerendered layer is. Precomputation for the pointer and data layers consists of encoding all of the relevant vector data into data blocks, and packing both the pointers and data blocks as efficiently as possible into their respective images.
  • Efficient rectangle packing is a computationally difficult problem; however, there are numerous approximate algorithms for solving it in the computational geometry literature, and the present invention does not stipulate any particular one of these.
  • the "greedy algorithm” used to insert a new rectangle as close as possible to a target point then proceeds as follows: Attempt to insert the rectangle centered on the target point.
  • Figure 4 demonstrates the output of the basic packing algorithm for three cases. In each case, the algorithm sequentially placed a number of rectangles as near as possible to a common point. This solution to the rectangle packing problem is provided by way of example only.
  • Figure 4 Test output of the greedy rectangle packing algorithm. On the left, predominantly small, skinny rectangles; in the center, large, square rectangles; and on the right, a mixture.
  • pointer/data block pairs are thus inserted in random order.
  • Other orderings may further improve packing efficiency in certain circumstances; for example, inserting large blocks before small ones may minimize wasted space.
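A sketch of a greedy insertion of this kind, using a square-ring, near-to-far search around the target point (the search order and the brute-force overlap test are illustrative; the text does not stipulate a particular packing algorithm):

    def fits(occupied, x, y, w, h, W, H):
        # True if a w x h rectangle at (x, y) stays on the W x H canvas
        # and overlaps no already-placed rectangle.
        if x < 0 or y < 0 or x + w > W or y + h > H:
            return False
        return not any(x < ox + ow and ox < x + w and
                       y < oy + oh and oy < y + h
                       for ox, oy, ow, oh in occupied)

    def greedy_insert(occupied, tx, ty, w, h, W, H, max_radius=512):
        # Try the target point itself, then rings of increasing radius.
        for r in range(max_radius):
            for dx in range(-r, r + 1):
                for dy in range(-r, r + 1):
                    if max(abs(dx), abs(dy)) != r:
                        continue           # perimeter of the ring only
                    x, y = tx + dx - w // 2, ty + dy - h // 2
                    if fits(occupied, x, y, w, h, W, H):
                        occupied.append((x, y, w, h))
                        return x, y
        return None                        # no room within the search radius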
  • Pointers are always 2x2 (our notation is rows x columns); however, for data blocks, there is freedom in selecting an aspect ratio: the required block area in square pixels is determined by the amount of data which must fit in the block, but this area can fit into rectangles of many different shapes.
  • a 24 byte data block (including 4 bytes of width and height information, and 20 bytes of arbitrary data) can be represented exactly as 1x24, 2x12, 3x8, 4x6, 6x4, 8x3, or 12x2. (24x1 is disqualified, as the block width must be at least 2 for the 2-byte width to be decoded before the block dimensions are known on the client side, as described above.)
  • the block can also be represented, with one byte left over, as 5x5.
  • We refer to the set of all factorizations listed above, in addition to the approximate factorization 5x5, as "ceiling factorizations".
  • block dimensions may be selected based only on a ceiling factorization of the data length; in general, "squarer" blocks (such as 4x6) pack better than oblique ones (such as 2x12).
  • the simplest data block sizing algorithm would thus select either 4x6 or 5x5, depending on how it trades off "squareness" against wasted bytes. More sophisticated block size selection algorithms may pick block dimensions adaptively, as part of the search for empty space near the target point.
  • steps 1 and 4 of the algorithm above are then modified as follows: Sort the ceiling factorizations of the required data length by desirability, with preference for squarer factorizations and possibly a penalty for wasted bytes. Attempt to place rectangles of dimensions given by each ceiling factorization in turn at target point p. If any of these insertions succeeds, the algorithm ends.
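The ceiling-factorization step can be sketched as follows; ranking by squareness first and wasted bytes second is one reasonable choice among those the text allows:

    import math

    def ceiling_factorizations(n):
        # Enumerate (rows, cols) block shapes with area >= n and cols >= 2
        # (so the 2-byte width is always readable first), ranked squarest
        # first, then by fewest wasted bytes.
        shapes = set()
        for rows in range(1, n + 1):
            cols = math.ceil(n / rows)
            if cols >= 2:
                shapes.add((rows, cols))
        return sorted(shapes,
                      key=lambda s: (abs(s[0] - s[1]), s[0] * s[1] - n, s))

    # ceiling_factorizations(24)[:3] -> [(5, 5), (4, 6), (6, 4)]

With this weighting, a 24-byte block prefers 5x5 (one wasted byte) over the exact 4x6, and oblique shapes such as 2x12 rank last.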
  • Each of the three map layers (prerendered roads, pointers and data) is stored as a JPEG2000 or similar spatially-accessible representation. However, the storage requirements for the three layers differ.
  • the prerendered road layer need not be lossless; it is only necessary for it to have reasonable perceptual accuracy when displayed. At 15m/pixel, we have found 0.5 bit/pixel lossy wavelet compression to be fully adequate.
  • the left eight columns represent the first pixel of the pair, previously the high-order byte; the rightmost eight columns represent the second pixel, previously the low-order byte.
  • the range of accessible values (0-65535) remains unchanged, but the two bytes become much more symmetric.
  • for original values below 256, the two bytes each assume values < 16.
  • Similar techniques apply to 32-bit or larger integer values.
  • These techniques are also extensible to signed quantities. For variables in which the sign changes frequently, as occurs for differential coding of a road vector, a sign bit can be assigned to position 0, and the absolute value encoded in alternating bytes as above. Note that to be drawn convincingly, road vector data must often be represented at greater than pixel precision.
  • Arbitrary units smaller than a pixel can instead be used, or equivalently, subpixel precision can be implemented using fixed point in combination with the above techniques.
  • 4 subpixel bits are used, for 1/16 pixel precision.
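The byte-symmetrizing encoding described above can be read as an alternating-bit (interleaved) split of each 16-bit value; the sketch below implements that reading, under which values below 256 do yield two bytes that are each below 16:

    def split_interleaved(v):
        # Even-position bits of the 16-bit value v go to byte a,
        # odd-position bits to byte b (an interleaved reading of the
        # bit-assignment described in the text).
        a = b = 0
        for i in range(8):
            a |= ((v >> (2 * i)) & 1) << i
            b |= ((v >> (2 * i + 1)) & 1) << i
        return a, b

    def join_interleaved(a, b):
        # Inverse of split_interleaved.
        v = 0
        for i in range(8):
            v |= ((a >> i) & 1) << (2 * i)
            v |= ((b >> i) & 1) << (2 * i + 1)
        return v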
  • the JPEG2000 representation (including lossy pre-rendered roadmap image, lossless pointer layer, and lossless data layer) is actually smaller than the compressed ZIP file representing the original data as tabulated text. (This file is part of the United States Census Bureau's 2002 TIGER/Line database.) Unlike the original ZIP, however, the new representation is ready to serve interactively to a client, with efficient support for continuously pannable and zoomable spatial access.
  • the original prerendered multiscale map invention introduced in [2] included not a single prerendered image, but a stack of such images, rendered at progressively coarser resolutions and with rescaled weights for lines (or other visible features).
  • the present invention can be extended to include pointer and data images corresponding to the coarser prerendered roadmap images, in which only a subset of the original vector objects are represented.
  • pointer and data images which are at much lower resolution than those of Figures 1-3 might only include data for state and national highways, excluding all local roads.
  • coarser data may also be "abstracts", for example specifying only road names, not vectors.
  • Images at different resolutions might include varying mixtures or subsets of the original data, or abstracted versions. This technique both allows all of the relevant data to fit into the smaller coarse images, and provides the client with the subset of the vector information relevant for navigation at that scale.
  • although the implementation outlined above suggests an 8-bit greyscale prerendered map image at every resolution, the prerendered images may also be in color. Further, the prerendered images may be displayed by the client in color even if they are single-channel images, since the vector data can be used to draw important roads in different colors than the prerendered material. Finally, the prerendered images may omit certain features or roads present in the vectorial data, relying on the client to composite the image and vectorial material appropriately.

Abstract

A system and method are disclosed which include providing a first layer of an image (320), the first layer including features of the image having locations within the first layer; and providing a second layer of the image (330), the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer (330) substantially corresponding to a location in the first layer (320) of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of said first layer.

Description

METHOD FOR ENCODING AND SERVING GEOSPATIAL OR OTHER VECTOR DATA AS IMAGES
BACKGROUND OF THE INVENTION Recently, image compression standards such as JPEG2000/JPIP (See e.g. David Taubman's implementation of Kakadu, available on the Kakadu software web site: www.kakadusoftware.com) have been introduced to meet a demanding engineering goal: to enable very large images (i.e. giga-pixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel. When such images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and only sending across the communication channel data that is relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate. Prior technologies are typically limited to the incremental and selective transmission of discretely sampled images. The present invention extends this representation and transmission model to include vector data, hyperlinks, and other spatially localized features.
SUMMARY OF THE INVENTION One or more embodiments of the present invention relate to an extension of these selectively de-compressible image compression and transmission technologies to geospatial or schematic data. The one or more embodiments combine and extend methods described in the following documents, which are included in an appendix of this specification: (1) "Method for Spatially Encoding Large Texts, Metadata, and Other Coherently Accessed Non-Image Data"; (2) "Methods And Apparatus For Navigating An Image"; (3) "System and Method For The
Efficient, Dynamic And Continuous Display Of MultiResolution Visual Data"; (4) "System and Method For Foveated, Seamless, Progressive Rendering In A Zooming User Interface"; and (5) "System and Method for Multiple Node Display". The appendix included with this filing forms part of the description of this patent application. It is noted that the methods and apparatus described thus far and/or described later in this document may be achieved utilizing any of the known technologies, such as standard digital circuitry, analog circuitry, any of the known processors that are operable to execute software and/or firmware programs, programmable digital devices or systems, programmable array logic devices, or any combination of the above. One or more embodiments of the invention may also be embodied in a software program for storage in a suitable storage medium and execution by a processing unit.
According to one aspect, the invention provides a method of transmitting information indicative of an image comprising transmitting one or more nodes of information as a first image, transmitting a second image including information indicative of vectors defining characteristics to be utilized for display at predetermined locations in the first image, and transmitting a third image comprising a mapping between the first and second images such that a receiver of the first and second images can correlate the first and second images to utilize the vectors at the predetermined locations. Preferably, the first image is a map and the second image is a set of vectors defining visual data that is only displayed at predetermined levels of detail. Preferably, the first image is a map. Preferably, the second image includes hyperlinks. Preferably, the first image is a map, and the second image includes a set of vectors and wherein plural ones of the vectors are located at locations corresponding to locations on the first image wherein the vectors are to be applied, and plural ones of the vectors are located at locations on the second image which do not correspond to the locations on the first image wherein the vectors are to be applied. Preferably, the method further comprises utilizing an efficient packing algorithm to construct the second image to decrease an amount of space between a location on the second image at which one or more vectors appear, and a location on the first image where the one or more vectors are to be applied. Preferably, the vectors include information to launch a node or sub-node.
According to another aspect, the invention provides a method of rendering an image comprising receiving a first, second, and third set of data from a remote computer, the first data set being representative of an image, the second being representative of vectors defining characteristics of the image at prescribed locations, and the third serving to prescribe the locations. Preferably, the prescribed locations are street locations on a map. Preferably, the vectors represent sub-nodes and include information indicative of under what conditions the sub-nodes should launch. Preferably, the vectors include hyperlinks to at least one of the group consisting of: external content, such as advertising materials, and/or embedded visual content. Preferably, the vectors include hyperlinks to advertising materials. Preferably, the vectors include information specifying a rendering method for portions of an image at predetermined locations in the image.
According to another aspect, the invention provides a method, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; and providing a second layer of the image, the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer substantially corresponding to a location in the first layer of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of the first layer. Preferably, each data block describes at least one characteristic of the feature corresponding to each data block. Preferably, the method further comprises providing a third layer of the image, the third layer including pointers, each pointer corresponding to a respective one of the features and a respective one of the data blocks. Preferably, each pointer indicates the location of each pointer's corresponding data block with respect to each pointer's location. Preferably, the describing comprises providing text data for at least one feature. Preferably, the describing comprises providing a graphical illustration of at least one feature. Preferably, the describing comprises providing geometric data indicative of at least one feature. Preferably, the describing comprises providing two-dimensional or three-dimensional shape or contour information for at least one feature. Preferably, the describing comprises providing color information for at least one feature. Preferably, the describing comprises providing advertising or hyperlinking information relating to at least one feature. Preferably, the describing comprises providing at least one link to an external web site relating to at least one feature. Preferably, the describing comprises providing embedded visual content relating to at least one feature. Preferably, the describing comprises providing advertising information relating to at least one feature. Preferably, the describing comprises: providing schematic detail of a road segment. Preferably, the describing comprises: providing schematic detail for at least one of the group consisting of: at least one road, at least one park, a topography of a region, a hydrography of a body of water, at least one building, at least one public restroom, at least one wireless fidelity station, at least one power line, and at least one stadium. According to yet another aspect, the invention provides an apparatus including a processing unit operating under the control of one or more software programs that are operable to cause the processing unit to execute actions, including: providing a first layer of an image, the first layer including features of the image having locations within the first layer; and providing a second layer of the image, the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer substantially corresponding to a location in the first layer of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of the first layer.
According to yet another aspect, the invention provides a storage medium containing one or more software programs that are operable to cause a processing unit to execute actions, including: providing a first layer of an image, the first layer including features of the image having locations within the first layer; and providing a second layer of the image, the second layer including data blocks corresponding to respective ones of the features; each data block being in a location in the second layer substantially corresponding to a location in the first layer of the feature corresponding to each data block, wherein a size and shape of the second layer substantially correspond to a size and shape of the first layer. According to another aspect, the invention provides a method, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; providing a second layer of the image, the second layer including data blocks corresponding to and describing respective ones of the features, each data block being in a location in the second layer at least substantially corresponding to a location in the first layer of the feature corresponding to each data block; and providing a third layer of the image, the third layer including pointers having locations in the third layer, each pointer corresponding to a respective one of the features and a respective one of the data blocks, the location of each pointer in the third layer at least substantially corresponding to the location in the first layer of the feature corresponding to each pointer. Preferably, the second layer and the third layer each have a size and shape corresponding to a size and a shape of the first layer. Preferably, the method further comprises: forming a map image from a combination of the first layer, the second layer, and the third layer. Preferably, the method further comprises: flattening data in the map image. Preferably, each pointer indicates the location of each pointer's corresponding data block with respect to each pointer's location. Preferably, the indicating comprises identifying an offset in two dimensions. Preferably, each dimension of the offset is expressed in units corresponding to an integral number of pixels, e.g. 2 or 4. Preferably, the indicating comprises identifying an offset as a one-dimensional distance along a Hilbert curve. Preferably, the offset along the one-dimensional curve is expressed in units of pixels. Preferably, the offset along the one-dimensional curve is expressed in units corresponding to an integral number of pixels. Preferably, the offset along the one-dimensional curve is expressed in units corresponding to integral multiples of pixels. Preferably, placing each data block comprises: locating each data block employing a packing algorithm to achieve a maximum proximity of each data block to a target location for each data block in the second layer, the target location in the second layer corresponding to the location in the first layer of the feature corresponding to each data block. Preferably, the packing algorithm ensures that no two data blocks in the second layer overlap each other. Preferably, the maximum proximity is determined based on a shortest straight-line distance between each data block's location and the target location for each data block.
Preferably, the maximum proximity is determined based on a sum of absolute values of offsets in each of two dimensions between each data block's location and the target location for each data block. Preferably, the maximum proximity is determined based on a minimum Hilbert curve length between each data block's location and the target location for each data block. According to another aspect, the invention provides a storage medium containing one or more software programs that are operable to cause a processing unit to execute actions, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; providing a second layer of the image, the second layer including data blocks corresponding to and describing respective ones of the features, each data block being in a location in the second layer at least substantially corresponding to a location in the first layer of the feature corresponding to each data block; and providing a third layer of the image, the third layer including pointers having locations in the third layer, each pointer corresponding to a respective one of the features and a respective one of the data blocks, the location of each pointer in the third layer at least substantially corresponding to the location in the first layer of the feature corresponding to each pointer.
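For reference, the standard iterative conversion from an (x, y) cell to its distance d along a Hilbert curve is sketched below; the claims cover the use of such a distance as a pointer offset or proximity measure, not this particular implementation:

    def xy_to_hilbert(n, x, y):
        # Map cell (x, y) on an n x n grid (n a power of two) to its
        # distance d along the Hilbert curve. Nearby cells receive
        # nearby d values, which is what makes the curve attractive
        # for encoding short two-dimensional offsets as one number.
        d = 0
        s = n // 2
        while s > 0:
            rx = 1 if (x & s) else 0
            ry = 1 if (y & s) else 0
            d += s * s * ((3 * rx) ^ ry)
            if ry == 0:                    # rotate/flip the quadrant
                if rx == 1:
                    x, y = n - 1 - x, n - 1 - y
                x, y = y, x
            s //= 2
        return d

    # xy_to_hilbert(16, 0, 0) == 0; xy_to_hilbert(16, 1, 0) == 1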
According to another aspect, the invention provides an apparatus including a processing unit operating under the control of one or more software programs that are operable to cause the processing unit to execute actions, comprising: providing a first layer of an image, the first layer including features of the image having locations within the first layer; providing a second layer of the image, the second layer including data blocks corresponding to and describing respective ones of the features, each data block being in a location in the second layer at least substantially corresponding to a location in the first layer of the feature corresponding to each data block; and providing a third layer of the image, the third layer including pointers having locations in the third layer, each pointer corresponding to a respective one of the features and a respective one of the data blocks, the location of each pointer in the third layer at least substantially corresponding to the location in the first layer of the feature corresponding to each pointer. Other aspects, features, advantages, etc. will become apparent to one skilled in the art when the description of the preferred embodiments of the invention herein is taken in conjunction with the accompanying drawings. BRIEF DESCRIPTION OF THE DRAWINGS
For the purposes of illustrating the various aspects of the invention, there are shown in the drawings forms that are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown. FIG. 1 illustrates a prerendered layer of a roadmap image including a plurality of features suitable for description in data blocks in accordance with one or more embodiments of the present invention;
FIG. 2 illustrates the roadmap of FIG. 1 and the pointers and data blocks corresponding to the respective road segments in a region having a low concentration of road segments in accordance with one or more embodiments of the present invention;
FIG. 3 illustrates a concentrated set of road segments belonging to a plurality of roads with a main road as well as pointers and data blocks corresponding to the road segments in a region having a high concentration of intersections in accordance with one or more embodiments of the present invention; FIG. 4 illustrates test output of a greedy rectangle packing algorithm for three cases in accordance with one or more embodiments of the present invention;
FIG. 5A is an image of binary 8-bit data taken from a dense region of a roadmap data image of the Virgin Islands before the flattening of such data in accordance with one or more embodiments of the present invention; FIG. 5B is an image of binary 8-bit data taken from a dense region of a roadmap data image of the Virgin Islands after the flattening of such data in accordance with one or more embodiments of the present invention;
FIG. 6 illustrates a first-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention;
FIG. 7 illustrates a second-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention;
FIG. 8 illustrates a third-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention; FIG. 9 illustrates a fourth-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention;
FIG. 10 illustrates a fifth-order Hilbert curve for mapping a two-dimensional pointer vector onto a one-dimensional distance, d, along the Hilbert curve, in accordance with one or more embodiments of the present invention;
FIG. 11 depicts an image of one of the U.S. Virgin Islands which incorporates 4-pixel by 4-pixel size data blocks for use in accordance with one or more embodiments of the present invention; FIG. 12 depicts an image of one of the U.S. Virgin Islands which incorporates 6-pixel by 6-pixel size data blocks for use in accordance with one or more embodiments of the present invention; and
FIG. 13 depicts an image of one of the U.S. Virgin Islands which incorporates 8-pixel by 8-pixel size data blocks for use in accordance with one or more embodiments of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
It is noted that the various aspects of the present invention that will be discussed below may be applied in contexts other than encoding and/or serving map data. Indeed, the extent of images and implementations for which the present invention may be employed are too numerous to list in their entirety. For example, the features of the present invention may be used to encode and/or transmit images of the human anatomy, complex topographies, engineering diagrams such as wiring diagrams or blueprints, gene ontologies, etc. It has been found, however, that the invention has particular applicability to encoding and/or serving images in which the elements thereof are of varying levels of detail or coarseness. Therefore, for the purposes of brevity and clarity, the various aspects of the present invention will be discussed in connection with a specific example, namely, encoding and/or serving of images of a map.
In (2), the concept of continuous multi-scale roadmap rendering was introduced. The basis for one or more embodiments of the invention of (2) is a pre-rendered "stack" of images of a roadmap or other vector-based diagram at different resolutions, in which categories of visual elements (e.g. classes of roads, including national highway, state highway, and local road) are rendered with different visual weights at different resolutions.
During client/server interaction, corresponding areas of more than one of these images can be downloaded, and the client's display can show a blended combination of these areas. Blending coefficients and the choice of image resolutions can be varied depending upon the zoom scale. The net result is that a user on the client side can navigate through a large map (e.g. all roads in the United States), zooming and panning continuously, without experiencing any visual discontinuities, such as categories of roads appearing or disappearing as the zoom scale is changed.
Rather, at every scale, the most relevant categories can be accentuated. For example, when zoomed out to view the entire country, the largest highways can be strongly weighted, making them stand out clearly, while at the state level, secondary highways can also be weighted strongly enough to be clearly visible. When the user zooms in to the point where the most detailed pre-rendered image is being used, all roads are clearly visible, and in the preferred embodiment for geospatial data, all elements are preferably shown at close to their physically correct scale. A maximum reasonable resolution for these most detailed pre-rendered images may be about 15 meters/pixel. However, it is desirable from the user's standpoint to be able to zoom in farther.
However, pre-rendering at still higher levels of detail is not desirable for several reasons: first, because the file sizes on the server side become prohibitively large (a single Universal
Transverse Mercator zone image at 15 meters/pixel may already be in the gigapixel range); second, because a pre-rendered image is an inefficient representation for the kind of very sparse black-and-white data normally associated with high-resolution roadmap rendering; and third, because the client may require the "real" vector data for performing computational tasks beyond a static visual presentation.
For example, a route guidance system may highlight a road or change its color as displayed to a user on a monitor or in print media. This can be done on the client side only if the client has access to vector data, as opposed to a pre-rendered image alone. Vector data may also include street names, addresses, and other information which the client preferably has the flexibility to lay out and render selectively. Pre-rendering street name labels into the map image stack is undesirable, as these labels are preferably drawn in different places and are preferably provided with different sizes depending on the precise location and scale of the client view. Different label renditions should not blend into one another as the user zooms. Pre-rendering such data would also eliminate any flexibility with regard to font. To summarize, vector data (where we use the term generically to refer both to geometric and other information, such as place names) is both beneficial to the client in its own right, and a more efficient representation of the information than pre-rendered imagery, when the desired rendering resolution is high. However, if a large area is to be rendered at low resolution, the complete vector data may become prohibitively large and complex, making the pre-rendered image the more efficient representation. Even at low resolution, however, some subset of the vector data is beneficial, such as the names of major highways. This subset of the vector data may be included in a low resolution data layer associated with the low resolution pre-rendered layer, with more detailed vector data available in data layers associated with higher resolution pre-rendered layers.
One or more embodiments of the present invention extend the methods introduced in (1) to allow spatial vector data to be encoded and transmitted selectively and incrementally to the client, possibly in conjunction with the pre-rendered imagery of (2). In the prior art, this would be accomplished using a geospatial database. The database would need to include all relevant vector data, indexed spatially. Such databases present many implementation challenges. In one or more embodiments of the present invention, instead of using conventional databases, we use spatially addressable images, such as those supported by JPEG2000/JPIP, to encode and serve the vector data.
Multiple Image Map Data Representation
In one or more embodiments, three images or channels are used for representing the map data, each with 8-bit depth. The prerendered layer is preferably a pre-computed literal rendition of the roadmap, as per (2). The pointer layer preferably includes 2*2 pixel blocks which are preferably located in locations within the pointer layer that correspond closely, and sometimes identically, to the locations, within the pre-rendered layer, of the respective features that the pointers correspond to. And the data layer preferably consists of n*m pixel blocks centered on or positioned near the 2*2 pointers which refer to them. The prerendered layer may also be in 24-bit color, or in any other color space or bit depth.
It is noted that the prerendered layer, the pointer layer, and the data layer are in essence two-dimensional memory spaces for storing various quantities of binary data. These three layers preferably correspond to a common two-dimensional image region which is the subject of a roadmap or other two-dimensional image representation to a client. As used herein, the terms "size" and "shape" of a layer generally correspond to the size and shape, respectively, of the two-dimensional image which the data in that layer relates to.
Preferably, the prerendered layer, the pointer layer, and the data layer forming a particular map image, for instance, have "sizes" and "shapes" in the two-dimensional image (that is formed from these three layers) that are at least very close to, or possibly identical to, one another. This is preferably true however the stored data for the three layers are distributed within a physical memory of a data processing system. In one embodiment, the pertinent "features" in the prerendered layer may be road segments. In a map having 10 road segments, pointer 1 in the pointer layer would correspond to road segment 1 in the prerendered layer and to data block 1 in the data layer. Pointer 2 would correspond to road segment 2 and to data block 2, and so forth, with pointer "n," in each case corresponding to road segment "n" and to data block "n" for n=1 to n=10. Moreover, pointer 1 is preferably in a location within the pointer layer that corresponds closely, and perhaps identically, to the location of road segment 1 (or more generally "feature 1") within the prerendered layer.
The various map layers (pre-rendered, pointer, and data) can be thought of as being superimposed on each other from the standpoint of readiness of association of an entry
(feature, pointer, or data block) in any layer to the corresponding entry (feature, pointer, or data block) in any other layer of these three layers. Thus, the size and shape of the three map layers preferably correspond closely to one another to make the desired associations of entries in the respective map layers as seamless as possible within a data processing system configured to access any of the layers and any of the entries in the layers, as needed. It will be appreciated that while the discussion herein is primarily directed to maps formed from three layers of data, the present invention could be practiced while using fewer or more than three layers of data, and all such variations are intended to be within the scope of the present invention.
Because the three map layers are preferably of equal size and in registration with each other, they can be overlaid in different colors (red, green, blue on a computer display, or cyan, magenta, yellow for print media) to produce a single color image. FIGS. 1-3 may be displayed in color (either on an electronic display or on print media), and may be stored on the server side as a single color JPEG2000. However, for the sake of simplicity, FIGS. 1-3 are presented in black and white in this application. Preferably, only the prerendered layer would actually be visible in this form on the client's display.
FIG. 1 illustrates a prerendered layer of a roadmap including a plurality of features numbered 102 through 124. For the sake of simplicity, in FIG. 1, the features shown are all road segments. However, features may include many other entities such as sports arenas, parks, large buildings and so forth. The region shown in FIG. 1 is included for illustrative purposes and does not correspond to any real-world city or street layout.
FIG. 2 illustrates the roadmap of FIG. 1 as well as the pointers and data blocks corresponding to the respective road segments in a region having a low concentration of road segments in accordance with one or more embodiments of the present invention. Road segment 102 and the other road segments from FIG. 1 are reproduced in FIG. 2; however, due to space limitations, the reference numerals for the other eleven road segments (104-124) are not shown. Throughout FIGS. 2 and 3, pointers are shown as dark grey blocks, and data blocks are shown as larger light grey blocks.
Because FIG. 2 illustrates a region having a relatively low concentration of road segments per unit area, there is no difficulty locating pointers (202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222 and 224) at locations in the second layer (pointer layer) of map 200 corresponding closely, and possibly identically, to the locations in the first layer (prerendered layer) of map 200 of the respective features to which the pointers correspond. Similarly, data blocks (242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262 and 264) can be placed in locations within the third layer (data layer) of map 200 that correspond reasonably closely to the locations within the prerendered layer of map 200 of the respective features to which the data blocks correspond.
FIG. 3 illustrates a concentrated set of road segments of a plurality of minor roads 320 and a smaller number of main roads 310 as well as pointers and data blocks corresponding to their respective road segments in a region having a high concentration of road segments in accordance with one or more embodiments of the present invention. Reference numeral 330 refers to all of the pointers, and reference numeral 340 refers to all of the data blocks.
In the exemplary region of FIG. 3, the concentration of features is too high to enable all of the pointers or all of the data blocks to be located in locations within their respective layers that correspond exactly to the locations of the features in layer one that they correspond to. The degree of offset for the pointer locations may be minor or significant depending upon the degree of crowding. However, the concentration of road segments in FIG. 3 precludes placing the data blocks in locations in layer three closely corresponding to the locations of the respective features within layer one that the data blocks correspond to. Accordingly, data blocks 340 are distributed as close as possible to their corresponding pointers, making use of a nearby empty area 350 which is beneficially devoid of features, thus allowing the data blocks 340 to overflow into empty area 350. Empty area 350 may be any type of area that does not have a significant concentration of features having associated pointers and data blocks, such as, for example, a body of water or a farm. A packing algorithm may be employed to efficiently place data blocks 340 within map 300. This type of algorithm is discussed later in this application, and the discussion thereof is therefore not repeated in this section.
If the user navigates to the view of the map 100 shown in FIG. 1, the client can request from the server the relevant portions of all three image layers, as shown. The prerendered layer is generally the only one of the three image layers that displays image components representing the physical layout of a geographical area. The other two image layers preferably specify pointers and data blocks corresponding to features in the prerendered layer.
In one embodiment, the pointer image consists of 2x2 pixel blocks aligned on a 2x2 pixel grid, each of which specifies an (x,y) vector offset (with the x and y components of the vector each comprising a 16-bit integer, hence two pixels each) from its own location to the beginning (top left corner) of a corresponding data block in the data layer.
In this embodiment, the corresponding data block, in turn, can begin with two 16-bit values (four pixels) specifying the data block width and height. Preferably, the width is specified first, and is constrained to have a magnitude of at least 2 pixels, hence avoiding ambiguities in reading the width and height. The remainder of the data block can be treated as binary data which may contain any combination of vectors, text, or other information. In the examples of FIGS. 2-3, data blocks may contain street-map information including street names, address ranges, and vector representations.
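As a concrete illustration of this encoding, the following is a minimal sketch of decoding a pointer and its data block from 8-bit image arrays. The byte order (high-order pixel first), the raster layout of the four pointer pixels, the signedness of the offsets, and all names are illustrative assumptions; the description above fixes only the 2x2 pointer block, the 16-bit components, and the width-first header.

```python
import numpy as np

def _int16(hi, lo):
    """Combine two 8-bit pixels into a signed 16-bit value (assumed byte order)."""
    v = (int(hi) << 8) | int(lo)
    return v - 65536 if v >= 32768 else v

def read_pointer(pointer_img, px, py):
    """Decode the 2x2 pointer block with top-left pixel (px, py).

    Assumes the x offset occupies the top row and the y offset the bottom
    row; the offset is relative to the pointer's own location.
    """
    dx = _int16(pointer_img[py, px], pointer_img[py, px + 1])
    dy = _int16(pointer_img[py + 1, px], pointer_img[py + 1, px + 1])
    return px + dx, py + dy                 # top-left corner of the data block

def read_data_block(data_img, bx, by):
    """Read a data block: 16-bit width, then 16-bit height, then payload."""
    w = (int(data_img[by, bx]) << 8) | int(data_img[by, bx + 1])

    def px(i):
        # i-th pixel of the block in raster order within its w-pixel-wide rows;
        # width >= 2 guarantees the width itself is readable first
        return int(data_img[by + i // w, bx + i % w])

    h = (px(2) << 8) | px(3)
    payload = bytes(px(i) for i in range(4, w * h))   # skip 4 header pixels
    return w, h, payload
```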
Compression:
In this section, the advantages of one or more embodiments of the above-discussed pointer layer and data layer combination over existing approaches are presented. One existing solution involves sending requests to a spatial database for all relevant textual/vector information within a window of interest. The server then replies with a certain volume of text. Existing spatial database systems send back the information substantially as plain text.
In one or more embodiments of the present invention, however, wavelet compression can be applied, thereby enabling the server to satisfy the data request while sending a much smaller quantity of data than would existing systems. Areas with no data storage that are located between data-storage areas on the data and pointer layers create very little waste, as they would if the image were transmitted uncompressed in raster order, because these areas have zero complexity and can be compressed into a very small number of bits in the wavelet representation.
Exploitation of Spatial Coherence:
Typical browsing patterns involve gradual zooming and panning. Multi-resolution streaming image browsing technologies are designed to work well in this context. Complete information can be transmitted for the window of interest, and partial information can be transmitted for areas immediately surrounding the window of interest. Upon panning or other movement, preferably only the relevant new information is transmitted (the "delta"). All of this can be done in a very efficient manner. An especially large data block, for example, may be partially transmitted well before the window of interest intersects the data block's anchor point.
Performance:
In one or more embodiments of the present invention, the pointer layer shows how distant a data block is from the pointer it corresponds to. We recall that in rural areas, the data blocks can be centered directly on the pointer positions. In this case, all data would be perfectly local. In urban areas, however, data begins to "crowd", and data blocks may be in positions offset from the respective pointers (in the pointer image) and the respective features (in the prerendered image) that the data blocks correspond to. In one or more embodiments of the present invention, when a map is generated, an upper limit can be imposed on the maximum displacement of a data block from the feature to which it corresponds in the prerendered image. This in turn limits the maximum area, and hence the maximum complexity, of a portion of a data layer relevant to a window of interest of a given size in the prerendered image. For example, if the maximum displacement is 32 pixels, then the window in the data image need only be 32 pixels bigger on each side than the window of interest in the prerendered image.
If the data density increases above the point where the packing is possible, one can always increase the resolution, for example, from 15 meters/pixel to 12 meters/pixel. This gives the data more "breathing space". It is noted that data of different kinds can also be distributed among different levels of detail. Thus, for example, excessive crowding of the data at a resolution of 40 meters/pixel implies that some class of that data might be better stored at the 30 meter/pixel level.
Crowding may be visible in the data layer and in the pointer layer. The appropriate resolutions for different classes of data may vary over space, so that for example small-street vector data might be encodable at 30 meters/pixel in rural areas but only at 12 meters/pixel in urban areas. In short, the pointer and data images make data crowding easy to detect and correct, either visually or using data-processing algorithms. The resulting hierarchy of data images can help ensure high-performance vector browsing even in a low-bandwidth setting, since the amount of data needed for any given view can be controlled so as to not exceed an upper bound. This kind of upper bound is extremely difficult to enforce, or even to define, in existing geospatial databases.
Implementation Convenience:
One or more aspects of the present invention concern mapping the geospatial database problem onto the remote image browsing problem. A great deal of engineering has gone into making remote image browsing work efficiently. In addition to basic compression technology, this includes optimization of caching, bandwidth management, memory usage for both the client and the server, and file representation on the server side. This technology is mature and available in a number of implementations, as contrasted with conventional geospatial database technology.
Accordingly, one or more embodiments of the present invention contemplate bringing about efficient cooperation between a suitably arranged geospatial database and remote image browsing technology that interacts with the geospatial database. Further, in one or more embodiments of the present invention, only a single system need then be used for both image and data browsing, with only a simple adapter for the data on the client side. The foregoing is preferable to having two quasi-independent complex systems, one for image browsing and another for data browsing.
Alternative Method for Representing Pointers
In one or more alternative embodiments of the present invention, we consider the Hilbert curve, sometimes also called the Hilbert-Peano curve. The Hilbert curve belongs to a family of recursively defined curves known as space-filling curves (see http://mathworld.wolfram.com/HilbertCurve.html, or for the original reference, Hilbert, D. "Über die stetige Abbildung einer Linie auf ein Flächenstück." Math. Ann. 38, 459-460, 1891, which is incorporated by reference herein). Hilbert curves of order 1, 2, 3, 4 and 5 are illustrated in FIGS. 6, 7, 8, 9, and 10, respectively.
As is evident in the high-order limit, the one-dimensional curve fills the entire unit square (formally, becomes dense in the unit square). The nth order curves visit 4^n points on the unit square. For the first order case (4^1), these points are the corners of the square. For the purposes related to the present invention, it is preferred to have the nth order curve visit all of the integer coordinates in the square (0,0)-(2^n - 1, 2^n - 1). Using bit manipulation, there are known rapid algorithms for inter-converting between path length on the nth order Hilbert curve and (x,y) coordinates (see Warren, Henry S. Jr., Hacker's Delight, Addison-Wesley 2003, chapter 14, which is hereby incorporated herein by reference). For example, for the second order curve, this inter-conversion would map from the left column to the right or vice versa in the following table: TABLE 1
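The table itself is not reproduced in this text, but the inter-conversion it tabulates can be sketched as follows. This is the standard iterative formulation of the algorithm; the bit-manipulation tricks in Hacker's Delight are faster but longer, and the function names here are illustrative. The parameter n is the grid side length, 2^order.

```python
def xy2d(n, x, y):
    """Map grid coordinates (x, y) to path length d on the Hilbert curve
    filling an n-by-n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant as needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def d2xy(n, d):
    """Inverse mapping: path length d back to grid coordinates (x, y)."""
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y
```

For the second order curve (n = 4), xy2d(4, 1, 0) returns 1 and xy2d(4, 2, 0) returns 14, matching the discontinuity discussed below.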
The Hilbert curve is relevant to the problem of encoding the pointer image because it provides a convenient way of encoding a two-dimensional vector (having components x and y) as a single number d, while preserving the "neighborhood relationship" fairly well. The neighborhood relationship means that as the vector position changes slowly, d tends to also change slowly, since, generally, points whose "d" values are close together are also close in two-dimensional space. However, this relationship does not always hold. For example, in the order-2 case, when moving from (1,0) to (2,0), the path distance "d" goes from 1 to 14. It is not possible to fill 2D space with a 1D curve and always preserve the neighborhood relationship.
Representing the path distance d of the nth order Hilbert curve requires 2*n (2 multiplied by n) bits. When employing a two-dimensional representation, the x and y coordinates each require n bits to represent a point located on the path distance d. Hence "Hilbert coding" a vector re-packages the pair of numbers (x,y) as a single number d, but both the input (x,y) and the output d use the same number of bits, 2n. Small (x,y) vectors encode to small values of d. In fact, it can be observed that the nth order Hilbert curve is nothing more than the lower-left quarter of the (n+1)th order curve.
Hence, the value of every pixel in an 8-bit image can be taken to be a path distance on the 4th order Hilbert curve, thus encoding a vector in the range (0,0)-(15,15), i.e. anywhere on a 16*16 grid. Instead of packing 4 bits of x in the low-order nibble and 4 bits of y in the high-order nibble, the Hilbert coded pixels will have a value which is less than 16 if x and y are both less than 4, and a value less than 4 if x and y are both less than 2. Because the data packing algorithm preferably packs data blocks as close as possible to the anchor point (where the pointer will be inserted), vectors with small values are common. Moreover, if these vectors are Hilbert-coded, this will translate into small pixel values in the pointer image and hence better image compression performance.
In one embodiment, we make use of 16-bit images or 24-bit images, which can encode (x,y) vectors on 256*256 or 4096*4096 grids, respectively. The value 256 equals 2^8, and the value 4096 equals 2^12.
In one embodiment, the Hilbert coding algorithm is modified to accommodate signed vectors, where x and y values range over both positive and negative numbers. The modification involves specifying two extra bits along with the path distance d, identifying the vector quadrant. These two bits are sign bits for x and y. The absolute values of x and y are then Hilbert coded as usual. (To avoid double coverage, x=0 and y=0 belong to the positive quadrant, and absolute values for negative x or y are computed as -1-x or -1-y.) In this embodiment, the sign bits are assigned to the two lowest-order bit positions of the output value, so that the numerical ranges of coded vectors in each of the quadrants are approximately equal. Hence, for the 16-bit image example, vectors with x and y values between -128 and +127 inclusive can be coded using a 7th order Hilbert curve for each quadrant. Vectors with x and y between -64 and +63 are assigned pixel values that can be represented with 14 bits, where 2^14 = 16384. If x and y are between -8 and 7, then the values are less than 2^8 = 256.
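A sketch of this signed variant, building on the xy2d/d2xy helpers above. The assignment of x's sign to bit 0 and y's sign to bit 1 is an illustrative assumption; the text fixes only that the two sign bits occupy the lowest-order positions.

```python
def encode_signed(order, x, y):
    """Hilbert-code a signed (x, y) vector; sign bits in the low 2 bits."""
    sx = 0 if x >= 0 else 1
    sy = 0 if y >= 0 else 1
    ax = x if sx == 0 else -1 - x        # the -1-x convention noted above
    ay = y if sy == 0 else -1 - y
    d = xy2d(1 << order, ax, ay)
    return (d << 2) | (sy << 1) | sx

def decode_signed(order, v):
    sx, sy = v & 1, (v >> 1) & 1
    ax, ay = d2xy(1 << order, v >> 2)
    x = ax if sx == 0 else -1 - ax
    y = ay if sy == 0 else -1 - ay
    return x, y
```

With order=7, inputs in [-64, +63] yield codes below 2^14 and inputs in [-8, +7] yield codes below 256, consistent with the figures given above.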
Packing Algorithm
In one or more embodiments, the pointer and data layers are precomputed, just as the prerendered layer is. Precomputation for the pointer and data layers consists of encoding all of the relevant vector data into data blocks, and packing both the pointers and data blocks as efficiently as possible into their respective layers. In rural or sparse suburban areas (see FIG. 2), features tend to be well separated, resulting in large empty areas in the pointer and data images. Where pointers do occur, they preferably fall precisely on the feature to which they refer, and their corresponding data blocks are in turn often centered precisely on the pointer. In dense urban areas, however (see FIG. 3), features are often too close together for the pointers and data blocks to fit into locations closely corresponding to the locations of their corresponding features in the prerendered layer. It is therefore helpful to use a rectangle packing algorithm to attempt to place pointers and data blocks as close to their desired positions as possible without any overlaps. The results are evident in FIG. 3. Empty area 350 is filled with data blocks 340 corresponding to features present along road 310 at intersections with plurality of roads 320. Because urban areas are typically surrounded by sparser areas (suburbs, mountains, or bodies of water), it is possible to place urban data blocks somewhere on the map, in reasonable proximity to the urban areas whose features they correspond to.
In general, even within a densely settled city there are enough empty spaces that this "spillover" is not overly severe. The higher the rate of spillover, the less well-localized the map vector data becomes. Spillover generally decreases drastically as the resolution of the data layer image is increased. It is beneficial to find a resolution at which efficiency and non-locality are appropriately balanced. In North America, 15m/pixel is generally a good choice. The resolution of 15m/pixel is "overkill" in rural areas, but near cities, this choice of resolution tends to limit spillover.
Efficient rectangle packing is a computationally difficult problem. However, there are numerous approximate algorithms in the computational geometry literature for solving it, and the present invention is not limited to any particular one of these. Otherwise stated, one or more of the rectangle-packing algorithms, described in the computational geometry literature and known to those of skill in the art, may be employed in conjunction with one or more embodiments of the present invention to place data blocks within the data layer.
A preferred algorithm which has already been used in conjunction with one or more embodiments of the present invention involves a hierarchical "rectangle tree", which enables conducting the following operations rapidly: testing whether a given rectangle intersects any other rectangle already in the tree; inserting a non-overlapping rectangle; and finding the complete set of "empty corners" (i.e. corners abutting already-inserted rectangles that border on empty space) in a ring of radius r0 <= r < r1 around a target point p. A "greedy algorithm" is used to insert a new rectangle as close as possible to a target point and then proceeds as follows:
1) Attempt to insert the rectangle centered on the target point. If this succeeds, algorithm ends.
2) Otherwise, define radius r0 to be half the minimum of the length or width of the rectangle, and r1 = r0*2.
3) Find all "empty corners" between r0 and r1, and sort by increasing radius.
4) Attempt to place the rectangle at each of these corners in sequence, and on success, algorithm ends.
5) If none of the attempted insertions succeeds, set r0 to r1, set r1 to 2*r0, and go to step 3.
In a preferred embodiment, this algorithm ultimately succeeds in placing a rectangle provided that, somewhere in the image, an empty space exists which meets or exceeds the dimensions of the rectangle to be placed. This algorithm is "greedy" in the sense that it places a single rectangle at a time. The greedy algorithm does not attempt to solve the holistic problem of packing n rectangles as efficiently as possible. A holistic algorithm would include defining an explicit measure of packing efficiency, specifying the desired tradeoff between minimizing wasted space and minimizing distance between rectangles and their target points. The greedy algorithm is less optimal, but does not require explicitly specifying this tradeoff, as is clear from the algorithm description above. FIG. 4 demonstrates the output of the basic packing algorithm for three cases; in each case, the algorithm sequentially places a number of rectangles as near as possible to a common point. This solution to the rectangle packing problem is provided by way of example. In the left-most case, most of the rectangles are small and narrow. In the center example of the three, large and at least substantially square rectangles are used. And in the right-most example, a mix of small and large rectangles is employed.
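A minimal sketch of this greedy insertion loop follows. It substitutes a brute-force overlap test for the hierarchical rectangle tree and samples candidate positions on each ring rather than enumerating the exact set of "empty corners"; all names are illustrative assumptions, not the patent's implementation.

```python
class GreedyPacker:
    """Places axis-aligned rectangles near target points without overlap."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.placed = []                          # (x, y, w, h) tuples

    def _free(self, x, y, w, h):
        """True if the rectangle is in bounds and overlaps nothing placed."""
        if x < 0 or y < 0 or x + w > self.width or y + h > self.height:
            return False
        return all(x + w <= px or px + pw <= x or
                   y + h <= py or py + ph <= y
                   for px, py, pw, ph in self.placed)

    def insert_near(self, w, h, tx, ty):
        # r = 0 tries the rectangle centered on the target (step 1);
        # subsequent rings of doubling radius mimic steps 2-5.
        r0, r1 = 0, max(min(w, h) // 2, 1)
        while r0 <= max(self.width, self.height):
            for r in range(r0, r1):
                offsets = {(r, 0), (-r, 0), (0, r), (0, -r),
                           (r, r), (r, -r), (-r, r), (-r, -r)}
                for dx, dy in sorted(offsets):
                    x, y = tx + dx - w // 2, ty + dy - h // 2
                    if self._free(x, y, w, h):
                        self.placed.append((x, y, w, h))
                        return (x, y)
            r0, r1 = r1, 2 * r1
        return None                               # no space found
```

For example, GreedyPacker(1024, 1024).insert_near(6, 4, 500, 500) places a 6x4 block centered on (500, 500) in an empty image, and subsequent calls with the same target spiral outward.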
For the greedy packing algorithm not to give placement preference to any specific areas of the map, it is desirable to randomize the order of rectangle insertion. In a preferred embodiment, pointer/data block pairs are thus inserted in random order. Other orderings may further improve packing efficiency in certain circumstances. For example, inserting large blocks before small ones may minimize wasted space.
In a preferred embodiment, pointer data is organized into two-pixel by two-pixel (meaning two pixels along a row and two pixels along a column) units. Thus, with units in pixels, each pointer is preferably 2x2 (the notation being rows x columns). However, in alternative embodiments, the row size and the column size of pointers may vary. In an alternative embodiment, pointers may be represented by a single 24-bit color pixel, using 12th order Hilbert coding.
For data blocks, there is freedom in selecting an aspect ratio of the data block: the block area in square pixels is determined by the amount of data which will fit in the block, but this area can fit into rectangles of many different shapes. For example, a 24-byte data block (including 4 bytes of width and height information, and 20 bytes of arbitrary data) can be represented exactly as 1x24, 2x12, 3x8, 4x6, 6x4, 8x3, or 12x2. (24x1 is disqualified, because, in this embodiment, the block width should be at least 2 so that the 2-byte width can be decoded before the block dimensions are known on the client side, as described above.) The data block can also be represented, with one byte left over, within a 5-pixel by 5-pixel block (or "5x5"). We refer to the set of all factorizations listed above, in addition to the approximate factorization 5x5, as "ceiling factorizations." The specifications for a valid ceiling factorization are that its area meet or exceed the size of the data block in question, and that no row or column be entirely wasted. For example, 7x4 or 3x9 are not preferred ceiling factorizations, as they can be reduced to 6x4 and 3x8, respectively. In the simplest implementation, block dimensions may be selected based only on a ceiling factorization of the data length. In general, "squarer" blocks (such as 4x6) pack better than oblique ones (such as 2x12). The simplest data-block-sizing algorithm can thus select either 4x6 or 5x5, depending on how it trades off "squareness" against wasted bytes. More sophisticated block size selection algorithms may pick block dimensions adaptively, as part of the search for empty space near the target point. In one embodiment, steps 1 and 4 of the algorithm above are then modified as follows:
1) Sort the ceiling factorizations having the needed data length by desirability, with preference for squarer factorizations and possibly a penalty for wasted bytes.
2) Attempt to place rectangles of dimensions given by each ceiling factorization in turn at target point p.
3) If any of these insertions succeeds, the algorithm ends.
4) For each "empty corner" c in turn, attempt to place rectangles of dimensions given by each of the ceiling factorizations in turn at c. On success, algorithm ends.
Further refinements of this algorithm involve specification of a scoring function for insertions, which, as with a holistic optimization function, trades off wasted space, non-square aspect ratio, and distance from the target point.
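A sketch of enumerating such ceiling factorizations under the rule stated above (area at least the data length, width at least 2, no entirely wasted row or column); the squareness-first ranking is one plausible scoring, not the patent's.

```python
def ceiling_factorizations(n_pixels, min_width=2):
    """All (height, width) block shapes that hold n_pixels, with width >= 2
    and no wholly wasted row (shapes like 7x4 reduce to 6x4 and are skipped).
    No column is wasted either, since width is the ceiling of n/height."""
    shapes = []
    for rows in range(1, n_pixels + 1):
        cols = -(-n_pixels // rows)            # ceiling division
        if cols < min_width:
            break
        if (rows - 1) * cols < n_pixels:       # last row actually used
            shapes.append((rows, cols))
    return shapes

# Rank by squareness, then by wasted pixels: for a 24-pixel block this puts
# 5x5 first, then 4x6/6x4, ahead of oblique shapes like 2x12.
ranked = sorted(ceiling_factorizations(24),
                key=lambda s: (abs(s[0] - s[1]), s[0] * s[1] - 24))
```

For 24 pixels this yields exactly the shapes listed above: 1x24, 2x12, 3x8, 4x6, 5x5, 6x4, 8x3, and 12x2, with 24x1 excluded by the minimum width.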
Each of the three map layers — the prerendered layer, the pointer layer, and the data layer — is preferably stored as a JPEG2000 or similar spatially-accessible representation. However, the permissible conditions for data compression are different for different ones of the three layers.
Compression for the prerendered road layer need not be lossless, but it is beneficial for it to have reasonable perceptual accuracy when displayed. At 15m/pixel, we have found 0.5 bit/pixel lossy wavelet compression to be fully adequate. However, in a preferred embodiment, the pointer and data layers are compressed losslessly, as they contain data which the client needs accurate reconstruction of. Lossless compression is not normally very efficient. Typical digital imagery, for example, is not usually compressible losslessly by more than a factor of about two at best. Techniques have been developed (as described in the "Flattening" section below) to achieve much higher lossless compression rates for the data and pointer layers, while still employing standard wavelet-based JPEG2000 compression.
Alternative Packing Method
An alternative embodiment for packing data blocks within the data layer is presented in this section. In a preferred embodiment, an "allocate" function is defined to allocate a given number of bytes (corresponding to pixels). This allocate function preferably differs from the analogous conventional memory allocation function (in C, "malloc") in three ways.
1) While conventional memory allocation functions return a scalar pointer corresponding to the address of the beginning of a contiguous allocated interval in a one-dimensional address space, the pointer returned by our allocate function is a two-dimensional vector, specifying a starting position in the data image.
2) The pixels allocated by the allocate function disclosed herein may not all be contiguous, as are the allocated bytes beginning at the address returned by the function "malloc."
3) The allocate function disclosed herein is passed not only a desired number of pixels to allocate, but also a target position on the data image. The desired number of pixels are allocated as close as possible to the target position, while avoiding any previously allocated pixels.
The "allocate" function described in the "Packing Algorithm" section and the alternative "allocate" function described below share these characteristics. However, the previous "allocate" function always allocates a single rectangle of pixels, while the function described below can allocate space more flexibly. Desirable properties for the data image:
1) Low data overhead: One or more embodiments of the data image explored to date need some auxiliary data to be encoded. In the preliminary version, this data includes the block dimensions, stored as 16-bit values for width and height. Thus, the overhead was 4 pixels per allocated chunk of data.
2) Minimum wasted space: One or more embodiments of the data image explored so far may waste some pixels. For example, in one embodiment, requesting 26 pixels might have resulted in the allocation of an 8x4 pixel block. Of the 8x4=32 resulting pixels, 4 are overhead, and another 2 are wasted.
3) Good spatial localization: Repeated allocation of sizable chunks of data near the same target position will result in "crowding." It is desirable for the data pixels to be as close to the target as possible.
4) Coherence: It is desirable to keep the pixels of a single chunk as contiguous as possible, both for performance reasons and to reduce the number of incomplete data chunks given a fixed-size window into the data image.
Tradeoffs among these properties must generally be made. For example, although coherence and spatial localization would appear to be similar properties, they are often in conflict. If a long data chunk is allocated near a crowded spot, the nearest contiguous area may be far away, whereas the required number of pixels could be allocated nearby if the data chunk is instead broken up to fill cracks, resulting in better localization but worse coherence.
One or more embodiments of the present invention simplify data packing by setting the fundamental spatial unit of data allocation to be an n*m pixel block, where n and m are small but the block is no smaller than 2*2, aligned on a grid with spacing n*m. These blocks may thus be considered "superpixels".
A single allocated chunk typically has more than n*m bytes, and the chunk must therefore span multiple blocks. Thus, blocks are preferably chained. The first two pixels of a block preferably comprise a pointer (which may be Hilbert-encoded, as described above), pointing to the next block in the chain. This is in effect a two-dimensional analogue of a singly- linked list. Vectors can be specified in grid units relative to the current block, so that for example, if a block specifies the vector (+1,0), it would mean that the chunk continues in the next block to the right; if a block specifies (-2,-1), it would mean that the chunk continues two blocks to the left and one block up. A (0,0) vector (equivalent to a null pointer) may be used to indicate that the current block is the last in the chain.
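A sketch of reading back such a chain follows. The representation of blocks as flat pixel lists, and the assumption that the next-block pointer has already been Hilbert-decoded into signed grid units, are illustrative simplifications.

```python
def read_chunk(grid, start_gx, start_gy):
    """Collect the payload of a chunk stored as a chain of superpixel blocks.

    grid[gy][gx] is one n*m block whose first two entries hold the decoded
    (dx, dy) vector to the next block, in grid units relative to the current
    block; (0, 0) plays the role of a null pointer and ends the chain.
    """
    payload = []
    gx, gy = start_gx, start_gy
    while True:
        block = grid[gy][gx]
        dx, dy = block[0], block[1]     # next-block pointer
        payload.extend(block[2:])       # the remaining n*m - 2 pixels are data
        if (dx, dy) == (0, 0):          # last block in the chain
            return payload
        gx, gy = gx + dx, gy + dy
```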
Data overhead in this scheme may be high if the block size is very small. For the limiting case of a 2x2 block, two of the four pixels per block serve as pointers to the next block, making the overhead data one half of the total data for the block. There are many compensating factors, however. One is that the packing problem is greatly simplified, resulting in a more optimal solution with less wasted space and better localization.
In one embodiment, the chunk allocation algorithm works by allocating n*m-pixel blocks sequentially. For k bytes, ceil(k/(n*m-2)) blocks are allocated, since each block carries n*m-2 payload pixels. Allocation of a block can consist of locating the nearest empty block to the target point and marking it as full. After the required number of blocks are allocated, data and next-block pointers are then written to these blocks. "Nearest" may be defined using a variety of measures, but the four choices with useful properties are:
1) Euclidean (L2) norm: This will select the block with the shortest straight-line distance to the target, filling up blocks in concentric rings.
2) Manhattan (L1) norm: This distance measure is the sum of the absolute values of x offset and y offset. While a circle defines the set of points equidistant from a target point in L2, a rectangle defines this set in L1. Thus, blocks will fill up in concentric rectangles when using this measure. The L1 measure makes more sense than L2 for most applications, because windows into the data image are themselves rectangular and because the maximum ranges for two-dimensional pointers are rectangular.
3) Hilbert curve norm: This norm is defined using the actual Hilbert curve path length, with the quadrant encoded in the lower two bits as described in the previous section. Minimizing this norm thus directly minimizes pointer magnitudes. Also, unlike the previous two norms, this one is non-degenerate, meaning that the distance from the target point (rounded to the nearest block position) to any other block is unique. In this embodiment, the "nearest" non-allocated block is therefore uniquely defined.
4) Rectangular spiral norm: This distance is similar to the L1 norm, but it breaks the L1 norm's degeneracy by imposing a consistent ordering on the blocks in a rectangular path L1-equidistant from the target. This path can begin and end at an arbitrary point on the rectangle. For convenience, we can specify the low-x, low-y corner, with clockwise progression. This norm has the advantage that, like the Hilbert curve norm, it uniquely defines the "nearest" non-allocated block. Assuming that there are no collisions with pre-existing full blocks, sequential blocks are adjacent, thereby forming an expanding spiral around the target.
Other measures for picking the best (free) next block in a chain during allocation are also possible. In one embodiment, an allocator can take into account not only the distance of each block from the target point, but also the distance of each block from the previous block. The same measure can be used for measuring these two distances. Alternatively, different measures can be used, and these two distances can be added or otherwise combined with a relative weighting factor to give a combined distance. Weighing distance from the previous block heavily in deciding on the position of the next block favors coherence, while weighing absolute distance to the target point heavily favors spatial localization. Hence this scheme allows coherence and localization to be traded off in any manner desired by adjusting a single parameter.
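A sketch of these distance measures and of the single-parameter coherence/localization tradeoff just described; encode_signed is the signed Hilbert coder sketched earlier, and the linear weighting scheme is an illustrative assumption.

```python
import math

def l2_norm(dx, dy):
    return math.hypot(dx, dy)            # fills blocks in concentric rings

def l1_norm(dx, dy):
    return abs(dx) + abs(dy)             # fills blocks in concentric rectangles

def hilbert_norm(dx, dy, order=7):
    # Non-degenerate, and directly minimizes the coded pointer magnitude.
    return encode_signed(order, dx, dy)

def combined_distance(block, target, prev_block, w):
    """Blend distance-to-target with distance-to-previous-block.

    w near 1 favors coherence (contiguous chains); w near 0 favors
    spatial localization around the target point.
    """
    dt = l1_norm(block[0] - target[0], block[1] - target[1])
    dp = l1_norm(block[0] - prev_block[0], block[1] - prev_block[1])
    return (1 - w) * dt + w * dp
```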
The other important parameter is the block size "n." This discussion assumes that blocks are square, i.e. n=m. Non-square blocks, with n not equal to m, may be useful in situations where either the density of target points differs horizontally and vertically, or for better performance on computer architectures favoring coherent scan-line access. Block size defines the memory granularity, analogously to "memory alignment" in an ordinary 1D (one-dimensional) memory. Large block sizes decrease data overhead, as the next-block pointers use a fraction 2/n^2 of the allocated space; and they also improve coherence. However, large block sizes increase wasted space, since whole blocks are allocated at a time. Wasted space may in turn worsen spatial localization. The appropriate choice of block size depends on the distribution of expected chunk lengths, as well as the spatial distribution of target points. Making the best choice is complex, and should in general be done by experiment using typical data.
For map-vector data, 4-pixel by 4-pixel blocks have been found to be a good size. The data overhead is one eighth of the total data, which is substantial, but tighter packing resulting from a reduced amount of wasted space preferably compensates for the increased overhead. FIGS. 11-13 show data images (enhanced for high contrast) for one of the U.S. Virgin Islands using 4*4 blocks (FIG. 11), 6*6 blocks (FIG. 12) and 8*8 blocks (FIG. 13). Wasted space is drawn as white in FIGS. 11-13 for the sake of clarity (though in practice, for improved compression performance, wasted space is assigned value zero, or black). Clearly the 8*8 blocks both waste a great deal of space and offer poor localization, whereas the 4x4 blocks waste much less space and localize better. The 4*4 block image of FIG. 11 also compresses to a smaller file size than the other two.
Note that a data structure is needed to keep track of which blocks are full, and to find the nearest empty block based on a target point and a previous block position. The R-tree
(Antonin Guttman, R-Trees: A Dynamic Index Structure for Spatial Searching, SIGMOD Conference 1984: 47-57 which is incorporated herein by reference) provides an example of an efficient sparse data structure that can be used to solve this problem. It is also possible to simply use a bitmap, where "0" bits indicate free blocks and "1" bits indicate filled blocks. Both of these data structures can support striping, whereby only a fraction of the total image is kept in working memory at any given time. This allows very large spatial database images to be created offline. Because localization is well bounded, use of the database for subsequent spatial queries requires only that a small window be "visible" at a time.
Flattening Image Data
For most forms of either lossy or lossless compression, performance can be optimized by making the image function small in magnitude, hence occupying fewer significant bits. Therefore, in some embodiments, special coding techniques are used to "flatten" the original data. The outcome of these techniques is apparent in FIG. 5, which shows the same densely populated region of a data image before flattening (FIG. 5A) and after flattening (FIG. 5B). The data image used in FIG. 5 is a roadmap data image of the Virgin Islands. It is noted that FIG. 5B has been deliberately darkened so as to be more visible in this application. In FIG. 5B as presented, the rectangular image as a whole is a faint shade of grey. Moreover, a small amount of the pixel value variation highly evident in FIG. 5A is still visible in FIG. 5B, mostly in the bottom half of the image. The consistency of the pixel values throughout the vast majority of the pixels of FIG. 5B bears witness to the effectiveness of the "flattening" of the data of FIG. 5A. Note that before flattening, referring to FIG. 5A, the data image has full 8-bit dynamic range, and exhibits high frequencies and structured patterns that make it compress very poorly (in fact, a lossless JPEG2000-compressed version of this image is no smaller than the original raw size). After "flattening", most of the structure is gone, and a great majority of the pixels have values that are less than 8 and that can therefore be represented using just 3 bits. The corresponding JPEG2000 compressed version of the image has better than 3:1 compression. "Flattening" can consist of a number of simple data transformations, including the following (this is the complete list of transformations applied in the example of FIG. 5):
The Flattening Technique Applied to FIG. 5
In the flattening technique of FIG. 5, 16-bit unsigned values, such as the width or height of the data block, would normally be encoded using a high-order byte and a low-order byte. We may use 16 bits because the values to be encoded occasionally exceed 255 (the 8-bit limit) by some unspecified amount. However, in the majority of cases, these values do not exceed 255. For a value that fits in 8 bits, the high-order byte would be zero. Frequent zero high-order bytes followed by significant low-order bytes account for much of the 2-pixel periodicity apparent in parts of FIG. 5A. We can remap the 16 bits as shown in Table 2 below. TABLE 2
In Table 2, the left eight columns represent the first pixel of the pair, previously the high-order byte, and the rightmost eight columns represent the second pixel, previously the low-order byte. By redistributing the bits in this way, the range of accessible values (0-65535) remains unchanged, but the two bytes become much more symmetric. For example, for all 16-bit values 0-255, the two bytes each assume values < 16. Similar bit-interleaving techniques apply to 32-bit or larger integer values. These techniques are also extensible to signed quantities. For variables in which the sign changes frequently, as occurs for differential coding of a road vector, a sign bit can be assigned to position 0, and the absolute value can be encoded in alternating bytes as above. Note that to be drawn convincingly, road vector data may be represented at greater than pixel precision. Arbitrary units smaller than a pixel can instead be used, or equivalently, sub-pixel precision can be implemented using fixed point arithmetic in combination with the above techniques. In our exemplary embodiment, 4 sub-pixel bits are used, for 1/16 pixel precision.
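Since Table 2 itself is not reproduced in this text, the following sketch shows one plausible reading of the remapping: even-indexed bits go to the first byte and odd-indexed bits to the second. This layout is an assumption, but it reproduces the stated property that all 16-bit values below 256 yield two bytes each below 16.

```python
def flatten16(v):
    """Redistribute a 16-bit value across two symmetric bytes
    (assumed layout: even bits -> byte 0, odd bits -> byte 1)."""
    b0 = b1 = 0
    for i in range(8):
        b0 |= ((v >> (2 * i)) & 1) << i
        b1 |= ((v >> (2 * i + 1)) & 1) << i
    return b0, b1

def unflatten16(b0, b1):
    v = 0
    for i in range(8):
        v |= ((b0 >> i) & 1) << (2 * i)
        v |= ((b1 >> i) & 1) << (2 * i + 1)
    return v

# Round-trips exactly, and small values stay small in both bytes.
assert all(unflatten16(*flatten16(v)) == v for v in range(65536))
assert all(max(flatten16(v)) < 16 for v in range(256))
```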
When numbers are encoded as described above, it is desirable to make the numbers as small as possible. Sometimes context suggests an obvious way to do this. For example, since, in a preferred embodiment, each data block is 2 or more pixels wide, we can subtract 2 from the data width before encoding. More significantly, both pointers and any position vectors encoded in a data block are specified in pixels relative to the pointer position, rather than in absolute coordinates. This not only greatly decreases the magnitude of the numbers to encode, it also allows a portion of the data image to be decoded and rendered vectorially in a local coordinate system without regard for the absolute position of this portion.
In a preferred embodiment, for vector rendering of a sequence of points defining a curve (for example, of a road), only the first point need be specified relative to the original pointer position. Subsequent points can be encoded as "deltas", or step vectors from the previous point. After the second such point, additional subsequent points can be encoded as the second derivative, or the difference between the current and previous delta. Encoding using the second derivative is generally efficient for such structures as roads, since they tend to be discretizations of curves with continuity of the derivative. Otherwise stated, roads tend to change their direction gradually.
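A sketch of this differential coding for a polyline (names illustrative; coordinates here are assumed already relative to the pointer position):

```python
def encode_curve(points):
    """First point absolute, second point as a delta, the rest as second
    differences; smoothly curving roads yield near-zero values."""
    coded = [points[0]]
    prev_delta = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        delta = (x1 - x0, y1 - y0)
        if prev_delta is None:
            coded.append(delta)
        else:
            coded.append((delta[0] - prev_delta[0],
                          delta[1] - prev_delta[1]))
        prev_delta = delta
    return coded

# A gently curving road: second differences stay small, so the flattened
# pixel values are small and compress well.
print(encode_curve([(0, 0), (4, 1), (8, 3), (12, 6)]))
# -> [(0, 0), (4, 1), (0, 1), (0, 1)]
```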
Another "flattening" technique is described in document (1) (which is attached hereto as Appendix A) for use with textual data, which would normally be encoded as ASCII, with a single character per byte. In the application described in (1), English text is being encoded, and hence the letters are remapped based on decreasing frequency of letter occurrence in a representative sample of English. The same technique can be used in this context, although the text encoded in a map, consisting mostly of street names, has quite different statistics from ordinary English. Numerals and capital letters, for example, are far more prominent. Note that the particular methods for the encoding of pointers or data as presented above are exemplary; many other encodings are also possible. "Good" encodings generally result in images which are smooth and/or which have low dynamic range. Using the techniques above, a roadmap of King County, Washington at 15 meters (m) per pixel compresses as shown in Table 3 below.
Table 3
Surprisingly, the JPEG2000 representation of the map data (including lossy pre-rendered roadmap image, lossless pointer layer, and lossless data layer) is actually smaller than the compressed ZIP file representing the original data as tabulated text. (This file is part of the United States Census Bureau's 2002 TIGER/Line database.) Unlike the original ZIP file however, the new representation is ready to serve interactively to a client, with efficient support for continuously pannable and zoomable spatial access. The original prerendered multiscale map invention introduced in document (2) (which is attached hereto as Exhibit B) included not a single prerendered image, but a stack of such images, rendered at progressively coarser resolutions and with rescaled weights for lines (or other visible features). Although no features are omitted from any of these prerenditions, some features are de-emphasized enough to be clearly visible only in an aggregate sense. For example, the local roads of a city become a faint grey blur at the statewide level. One or more embodiments of the present invention can be extended to include pointer and data images corresponding to the coarser prerendered roadmap images, in which only a subset of the original vector objects are represented. For example, statewide pointer and data images, which are at much lower resolution than those used for prerendered images, might only include data for state and national highways, excluding all local roads. These coarser data may also be "abstracts", for example specifying only road names, not vectors. Images at different resolutions might include varying mixtures or subsets of the original data, or abstracted versions. This technique both allows all of the relevant data to fit into the smaller coarse images, and provides the client with the subset of the vector information relevant for navigation at that scale.
Although the implementation outlined above suggests an 8-bit grayscale prerendered map image at every resolution, the prerendered images may also be in color. Further, the prerendered images may be displayed by the client in color even if they are single-channel images, since the vector data can be used to draw important roads in different colors than the prerendered material. Finally, the prerendered images may omit certain features or roads present in the vector data, relying on the client to composite the image and vector material appropriately.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims.
APPENDIX
Title: SYSTEM AND METHOD FOR FOVEATED, SEAMLESS, PROGRESSIVE RENDERING IN A ZOOMING USER INTERFACE
Inventor: BLAISE HILARY AGUERA Y ARCAS
Field of the Invention
The present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for progressively rendering arbitrarily large or complex visual content in a zooming environment while maintaining good user responsiveness and high frame rates. Although it is necessary in some situations to temporarily degrade the quality of the rendition to meet these goals, the present invention largely masks this degradation by exploiting well-known properties of the human visual system.
Background of the invention
Most present-day graphical computer user interfaces (GUIs) are designed using visual components of fixed spatial scale. However, it was recognized from the birth of the field of computer graphics that visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out. The desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets. Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, then zoom in on an area of interest. Many modern computer applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc. In most cases, these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally. Although continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent. In a more generalized zooming framework, any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning. Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s¹; recent movies continue the trend². A number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present³. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability"). The prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have
¹ e.g. Stanley Kubrick's 2001: A Space Odyssey, Turner Entertainment Company, a Time Warner company (1968).
² e.g. Steven Spielberg's Minority Report, 20th Century Fox and Dreamworks Pictures (2002).
³ An early appearance is W. C. Donelson, Spatial Management of Information, Proceedings of Computer Graphics SIGGRAPH (1978), ACM Press, p. 203-9. A recent example is Zanvas.com, which launched in the summer of 2002.
undergone some development since⁴. To my knowledge, however, no major application based on a full ZUI (Zooming User Interface) has yet appeared on the mass market, due in part to a number of technical shortfalls, one of which is addressed in the present invention.
Summary of the invention
The present invention embodies a novel idea on which a newly developed zooming user interface framework (hereafter referred to by its working name, Voss) is based. Voss is more powerful, more responsive, more visually compelling and of more general utility than its predecessors due to a number of innovations in its software architecture. This patent is specifically about Voss's approach to object tiling, level-of-detail blending, and render queueing. A multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an image pyramid). In some technological contexts where continuous zooming is used, such as 3D gaming, two adjacent levels of detail which bracket the desired level of detail are blended together to render each frame, because it is not normally the case that the desired level of detail is exactly one of those represented by the discrete set. Such techniques are sometimes referred to as trilinear filtering or mipmapping. In most cases, mipmapped image pyramids are premade, and kept in short-term memory (i.e. RAM) continuously during the zooming operation; thus any required level of detail is always available. In some advanced 3D rendering scenarios, the image pyramid must itself be rendered within an
⁴ Perlin describes subsequent developments at http://mrl.nyu.edu/projects/zui/.
animation loop; however, in these cases the complexity of this first rendering pass must be carefully controlled, so that overall frame rate does not suffer. In the present context, it is desirable to be able to navigate continuously by zooming and panning through an unlimited amount of content of arbitrary visual complexity. This content may not render quickly, and moreover it may not be available immediately, but may need to be downloaded from a remote location over a low-bandwidth connection. It is thus not always possible to render levels of detail (first pass) at a frame rate comparable to the desired display frame rate (second pass). Moreover it is not in general possible to keep pre-made image pyramids in memory for all content; image pyramids must be rendered or re-rendered as needed, and this rendering may be slow compared to the desired frame rate. The present invention involves both strategies for prioritizing the (potentially slow) rendition of the parts of the image pyramid relevant to the current display, and strategies for presenting the user with a smooth, continuous perception of the rendered content based on partial information, i.e. only the currently available subset of the image pyramid. In combination, these strategies make near-optimal use of the available computing power or bandwidth, while masking, to the extent possible, any image degradation resulting from incomplete image pyramids. Spatial and temporal blending are exploited to avoid discontinuities or sudden changes in image sharpness. An objective of the present invention is to allow sampled (i.e. "pixellated") visual content to be rendered in a zooming user interface without degradation in ultimate image quality relative to conventional trilinear interpolation. A further objective of the present invention is to allow arbitrarily large or complex visual content to be viewed in a zooming user interface. A further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network. A further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates. A further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates. A further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction. A further objective of the present invention is to allow the graceful degradation of image quality by continuous blurring when detailed visual content is as yet unavailable, either because the information needed to render it is unavailable, or because rendition is still in progress. A further objective of the present invention is to gracefully increase image quality by gradual sharpening when renditions of certain parts of the visual content first become available. These and other objectives of the present invention will become apparent to those skilled in the art from a review of the specification that follows.
Prior art: multiresolution imagery and zooming user interfaces
From a technical perspective, zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome. One such limitation is on the size of a document that can be "opened" from a computer application, as traditionally the entirety of such a document must be "loaded" before viewing or editing can begin. Even when the amount of short-term memory (normally RAM) available to a particular computer is large, this limitation is felt, because all of the document information must be transferred to short-term memory from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open" command and being able to begin viewing or editing unacceptably long. Still digital images provide both an excellent example of this problem and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming the problem. Table 1 below shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
Table 1.
*Note that these figures represent realistic compressed sizes at intermediate quality, not raw image data. Specifically, we assume 1 bit/pixel for the sizes up to 40MB, and 0.25 bits/pixel for the larger images, which are generally more compressible. **Local wireless networks may be considerably faster; this figure refers to wireless wide-area networks of the type often used for wireless PDAs.
Nearly every image on the Web at present is under 100K (0.1MB), because most users are connected to the Web at DSL or lower bandwidth, and larger images would take too long to download. Even in a local setting, on a typical user's hard drive, it is unusual to encounter images larger than 500K (0.5MB). That larger (that is, more detailed) images would often be useful is attested to by the fact that illustrated books, atlases, maps, newspapers and artworks in the average home include a great many images which, if digitized at full resolution, would easily be tens of megabytes in size. Several years ago the dearth of large images was largely due to a shortage of storage space in repositories, but advances in hard drive technology, the ease of burning CDROMs, and the increasing prevalence of large networked servers have made repository space no longer the limiting factor. The main bottleneck now is bandwidth, followed by short-term memory (i.e. RAM) space. The problem is in reality much worse than suggested by the table above, because in most contexts the user is interested not only in viewing a single image, but an entire collection of images; if the images are larger than some modest size, then it becomes impractical to wait while one image downloads after another. Modern image compression standards, such as JPEG2000^5, are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition. The image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1. Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images are often called levels of detail, or LODs for short. At first glance it may seem as if the storage requirements for this series of differently-sized images might be greater than for the high-resolution image alone, but
in fact this is not the case: a low-resolution image serves as a "predictor" for the next higher resolution. This allows the entire image hierarchy to be encoded very efficiently — more efficiently, in fact, than would usually be possible with a non-hierarchical representation of the high-resolution image alone.

5 http://www.jpeg.org/JPEG2000.html

If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in the repository, then a natural consequence is that as the image is transferred across the data link to the cache, the user can obtain a low-resolution overview of the entire image very rapidly; finer and finer details will then "fill in" as the transmission progresses. This is known as incremental or progressive transmission. Properly implemented, it has the property that any image at all — no matter how large — can be viewed in its spatial entirety (though not in its full detail) almost immediately, even if the bandwidth of the connection to the repository is very modest. Although the ultimate amount of time needed to download the image in full detail remains the same, the order in which this information is sent has been changed such that the large-scale features of an image are transmitted first; this is much more helpful to the user than transmitting pixel information at full detail and in "reading order", from top to bottom and left to right. Hidden in this advance is a new concept of what it means to "open" an image which does not fit into the classical application model described in the previous section. We are now imagining that the user is able to view an image as it downloads, a concept whose usefulness arises from the fact that the broad strokes of the image are available very soon after download begins, and perhaps well before downloading is finished. It therefore makes no sense for the application to force the user to wait while downloading finishes; the application should instead display what it can of the document immediately, and continue downloading the details "in the background" without causing delays or unnecessarily interrupting its interaction with the user. This requires that the application do more than one task at once, which is termed multithreading. Note that most modern web browsers use multithreading in a slightly different capacity: to simultaneously download images on a web page, while displaying the web page's textual layout and remaining responsive to the user in the meantime. In this case we can think about the embedded images themselves as being additional levels of detail, which enhance the basic level of detail comprised of the web page's bare-bones text layout. This analogy will prove important later. Clearly hierarchical image representation and progressive transmission of the image document are an advance over linear representation and transmission. However, a further advance becomes important when an image, at its highest level of detail, has more information (i.e. more pixels) than the user's display can show at once. With current display technology, this is always the case for the bottom four kinds of images in Table 1, but smaller displays (such as PDA screens) may not be able to show even the bottom eight. This makes a zooming feature imperative for large images: it is useless to view an image larger than the display if it is not possible to zoom in to discover the additional detail. When a large image begins to download, presumably the user is viewing it in its entirety.
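To make the pyramid construction concrete, the following is a minimal sketch in Python of the hierarchy of halved resolutions just described. It assumes the Pillow imaging library is available and uses a hypothetical file name; it illustrates only the decomposition itself, not the JPEG2000 wavelet coding, which additionally stores each level as a prediction residual.

```python
# Build a granularity-2 image pyramid: each level of detail is half the
# linear size of the previous one, down to a 1x1 overview.
from PIL import Image

def build_pyramid(path):
    """Return LODs from full resolution (index 0) down to 1x1."""
    levels = [Image.open(path)]
    while levels[-1].size != (1, 1):
        w, h = levels[-1].size
        # Halve each dimension, never dropping below one pixel.
        levels.append(levels[-1].resize((max(1, w // 2), max(1, h // 2))))
    return levels

# Hypothetical input; for a 512x512 image this prints 512x512, 256x256,
# and so on down to 1x1.
for img in build_pyramid("large_image.png"):
    print("%dx%d" % img.size)
```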
The first levels of detail are often so coarse that the displayed image will appear either blocky or blurry, depending on the kind of interpolation used to spread the small amount of information available over a large display area. The image will then refine progressively, but at a certain point it will "saturate" the display with information, after which any additional detail downloaded has no visible effect. It therefore makes no sense to continue the download beyond this point at all. Suppose, however, that the user decides to zoom in to see a particular area in much more detail, making the effective projected size of the image substantially larger than the physical screen. Then, in the downloading model described in the previous section, higher levels of detail would need to be downloaded, in increasing order. The difficulty is that every level of detail contains approximately four times the information of the previous level of detail; as the user zooms in, the downloading process will inevitably lag behind. Worse, most of the information being downloaded is wasted, as it consists of high-resolution detail outside the viewing area. Clearly, what is needed is the ability to download only selected parts of certain levels of detail — that is, only the detail which is visible should be downloaded. With this alteration, an image browsing system can be made that is not only capable of viewing images of arbitrarily large size, but is also capable of navigating (i.e. zooming and panning) through such images efficiently at any level of detail. Previous models of document access are by nature serial, meaning that the entirety of an information object is transmitted in linear order. This model, by contrast, is random-access, meaning that only selected parts of the information object are requested, and these requests may be made in any order and over an extended period of time, i.e. over the course of a viewing session. The computer and the repository now engage in an extended dialogue, paralleling the user's "dialogue" with the document as viewed on the display. To make random access efficient, it is convenient (though not absolutely required) to subdivide each level of detail into a grid, such that a grid square, or tile, is the basic unit of transmission. The size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile. The resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1. The JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images. Thus far we have considered only the case of static images, but the same techniques, with application-specific modifications, can be applied to nearly any type of visual document. This includes (but is not limited to) large texts, maps or other vector graphics, spreadsheets, video, and mixed documents such as web pages. Our discussion thus far has also implicitly considered a viewing-only application, i.e. one in which only the actions or methods corresponding to opening and drawing need be defined.
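In the rectangular-tiling case, the random access just described reduces to a small computation: given a viewport and a level of detail, determine which tiles intersect the view and request only those. The sketch below makes the common assumptions of granularity 2, a nominal tile size, and LOD 0 as the single-tile "tip" of the pyramid; the function name and parameters are illustrative, not part of any prescribed interface.

```python
import math

def visible_tiles(view, image_size, lod, tile_size=256):
    """Tile indices (col, row) at `lod` that intersect the viewport.

    `view` is (x0, y0, x1, y1) in full-resolution image coordinates.
    """
    full_w, full_h = image_size
    # Number of halvings needed before the whole image fits in one tile.
    max_lod = math.ceil(math.log2(max(full_w, full_h) / tile_size))
    scale = 2 ** (max_lod - lod)        # image pixels per pixel at this LOD
    lod_w, lod_h = math.ceil(full_w / scale), math.ceil(full_h / scale)
    x0, y0, x1, y1 = (c / scale for c in view)
    cols = range(max(0, int(x0 // tile_size)),
                 min(math.ceil(x1 / tile_size), math.ceil(lod_w / tile_size)))
    rows = range(max(0, int(y0 // tile_size)),
                 min(math.ceil(y1 / tile_size), math.ceil(lod_h / tile_size)))
    return [(c, r) for r in rows for c in cols]

# A 4096x4096 image viewed at full detail (lod 4 here): the upper-left
# quarter of the image needs only an 8x8 block of its 16x16 tiles.
print(len(visible_tiles((0, 0, 2048, 2048), (4096, 4096), lod=4)))  # 64
```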
Clearly other methods may be desirable, such as the editing commands implemented by paint programs for static images, the editing commands implemented by word processors for texts, etc. Yet consider the problem of editing a text: the usual actions, such as inserting typed input, are only relevant over a certain range of spatial scales relative to the underlying document. If we have zoomed out so far that the text is no longer legible, then interactive editing is no longer possible. It can also be argued that interactive editing is no longer possible if we have zoomed so far in that a single letter fills the entire screen. Hence a zooming user interface may also restrict the action of certain methods to their relevant levels of detail. When a visual document is not represented internally as an image, but as more abstract data — such as text, spreadsheet entries, or vector graphics — it is necessary to generalize the tiling concept introduced in the previous section. For still images, the process of rendering a tile, once obtained, is trivial, since the information (once decompressed) is precisely the pixel-by-pixel contents of the tile. The speed bottleneck, moreover, is normally the transfer of compressed data to the computer (e.g. downloading). However, in some cases the speed bottleneck is in the rendition of tiles; the information used to make the rendition may already be stored locally, or may be very compact, so that downloading no longer causes delay. Hence we will refer to the production of a finished, fully drawn tile in response to a tile drawing request as tile rendition, with the understanding that this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection, or because the rendition process is itself computationally intensive, is irrelevant. A complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub-documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on. Hence documents form a tree, a structure in which each document has pointers to a collection of sub-documents, or children, each of which is contained within the spatial boundary of the parent document. We call each such document a node, borrowing from programming terminology for trees. Although drawing methods are defined for all nodes at all levels of detail, other methods corresponding to application-specific functionality may be defined only for certain nodes, and their action may be restricted to certain levels of detail. Hence some nodes may be static images which can be edited using painting-like commands, others may be editable text, and still others may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment — a "supernode" — which can be navigated by zooming and panning. There are a number of immediate consequences for a well-implemented zooming user interface, including:
- It is able to browse very large documents without downloading them in their entirety from the repository; thus even documents larger than the available short-term memory, or whose size would otherwise be prohibitive, can be viewed without limitation.
- Content is only downloaded as needed during navigation, resulting in optimally efficient use of the available bandwidth.
- Zooming and panning are spatially intuitive operations, allowing large amounts of information to be organized in an easily understood way.
- Since "screen space" is essentially unlimited, it is not necessary to minimize windows, use multiple desktops, or hide windows behind each other to work on multiple documents or views at once. Instead, documents can be arranged as desired, and the user can zoom out for an overview of all of them, or in on particular ones. This does not preclude the possibility of rearranging the positions (or even scales) of such documents to allow any combination of them to be visible at a useful scale on the screen at the same time. Neither does it necessarily preclude combining zooming with more traditional approaches.
- Because zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
- High-resolution displays no longer imply shrinking text and images to small (sometimes illegible) sizes; depending on the level of zooming, they either allow more content to be viewed at once, or they allow content to be viewed at normal size and higher fidelity.
- The vision impaired can easily navigate the same content as normally sighted people, simply by zooming in farther.

These benefits are particularly valuable in the wake of the explosion in the amount of information available to ordinary computers connected to the Web. A decade ago, the kinds of very large documents which a ZUI enables one to view were rare, and moreover such documents would have taken up so much space that very few would have fit on the repositories available to most computers (e.g., a 40MB hard disk). Today, however, we face a very different situation: servers can easily store vast documents and document hierarchies, and make this information available to any client connected to the Web. Yet the bandwidth of the connection between these potentially vast repositories and the ordinary user is far lower than the bandwidth of the connection to a local hard disk. This is precisely the scenario in which the ZUI confers its greatest advantages over conventional graphical user interfaces.

Detailed description of the invention
For a particular view of a node at a certain desired resolution, there is some set of tiles, at a certain LOD, which would need to be drawn for the rendition to include at least one sample per screen pixel. Note that views do not normally fall precisely at the resolution of one of the node's LODs, but rather at an intermediate resolution between two of them. Hence, ideally, in a zooming environment the client generates the set of visible tiles at both of these LODs — just below and just above the actual resolution — and uses some interpolation to render the pixels on the display based on this information. The most common scenario is linear interpolation, both spatially and between levels of detail; in the graphics literature, this is usually referred to as trilinear interpolation. Closely related techniques are commonly used in 3D graphics architectures for texturing. Unfortunately, downloading (or programmatically rendering) tiles is often slow, and especially during rapid navigation, not all the necessary tiles will be available at all times. The innovations in this patent therefore focus on a combination of strategies for presenting the viewer with a spatially and temporally continuous and coherent image that approximates this ideal image, in an environment where tile download or creation is happening slowly and asynchronously. In the following we use two variable names, f and g. f refers to the sampling density of a tile relative to the display, defined in #1. Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at some LOD to the linear tiling grid size at the next lower LOD. This is in general presumed to be
constant over different levels of detail for a given node, although none of the innovations presented here rely on constant g. In the JPEG2000 example considered in the previous section, g=2: conceptually, each tile "breaks up" into 2x2=4 tiles at the next higher LOD. Granularity 2 is by far the most common in similar applications, but in the present context g may take other values.

6 S.L. Tanimoto and T. Pavlidis, A hierarchical data structure for picture processing, Computer Graphics and Image Processing, Vol. 4, pp. 104-119 (1975); Lance Williams, Pyramidal Parametrics, ACM SIGGRAPH Conference Proceedings (1983).

1. Level of detail tile request queuing. We first introduce a system and method for queuing tile requests that allows the client to bring a composite image gradually "into focus", by analogy with optical instruments. Faced with the problem of an erratic, possibly low-bandwidth connection to an information repository containing hierarchically tiled nodes, a zooming user interface must address the problem of how to request tiles during navigation. In many situations, it is unrealistic to assume that all such requests will be met in a timely manner, or even that they will be met at all during the period when the information is relevant (i.e. before the user has zoomed or panned elsewhere). It is therefore desirable to prioritize tile requests intelligently. The "outermost" rule for tile request queuing is increasing level of detail relative to the display. This "relative level of detail", which is zoom-dependent, is given by the number f = (linear tile size in tile pixels) / (projected tile length on the screen, measured in screen pixels). If f=1, then tile pixels are 1:1 with screen pixels; if f=10, then the information in the tile is far more detailed than the display can show (10*10=100 tile pixels fit inside a single screen pixel); and if f=0.1 then the tile is coarse relative to the display (every tile pixel must be "stretched", or interpolated, to cover 10*10=100 display pixels). This rule ensures that, if a region of the display is undersampled (i.e. only coarsely defined) relative to the rest of the display, the client's first priority will be to fill in this "resolution hole". If more than one level of detail is missing in the hole, then requests for all levels of detail with f<1, plus the next higher level of detail (to allow LOD blending — see #5), are queued in increasing order. At first glance, one might suppose that this introduces unnecessary overhead, because only the finest of these levels of detail is strictly required to render the current view; the coarser levels of detail are redundant, in that they define a lower-resolution image on the display. However, these coarser levels cover a larger area — in general, an area considerably larger than the display. The coarsest level of detail for any node in fact includes only a single tile by construction, so a client rendering any view of a node will invariably queue this "outermost" tile first. This is an important point for viewing robustness. By robustness we mean that the client is never "at a loss" regarding what to display in response to a user's panning and zooming, even if there is a large backlog of tile requests waiting to be filled. The client simply displays the best (i.e. highest resolution) image available for every region on the display. At worst, this will be the outermost tile, which is the first tile ever requested in connection with the node.
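This queuing discipline is easy to express as a priority queue: requests are keyed first by level of detail, so coarser tiles are always fetched before finer ones, and within a level by distance from the display center, anticipating the foveated ordering introduced as innovation #2 below. A minimal sketch; the tile attributes used here are hypothetical, not a prescribed data structure.

```python
import heapq

def queue_tile_requests(tiles, screen_center):
    """Order tile requests by (LOD, squared distance to display center).

    Each tile is assumed to expose `lod` (coarsest = 0) and `center`,
    its projected position on the display in screen pixels.
    """
    queue = []
    for t in tiles:
        dx = t.center[0] - screen_center[0]
        dy = t.center[1] - screen_center[1]
        # id(t) breaks ties so tile objects themselves are never compared.
        heapq.heappush(queue, (t.lod, dx * dx + dy * dy, id(t), t))
    return queue

def next_request(queue):
    """Pop the highest-priority tile still waiting to be fetched."""
    return heapq.heappop(queue)[-1]
```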
Every spatial part of the node will therefore always be renderable based on the first tile request alone; all subsequent tile requests can be considered incremental refinements. Falling back on lower-resolution tiles creates the impression of blurring the image; hence the overall effect is that the display may appear blurry after a sizeable pan or zoom. Then, as tile requests are filled, the image sharpens. A simple calculation shows that the overhead created by requesting "redundant" lower-resolution tiles is in fact minor — in particular, it is a small price to pay for the robustness of having the node image well-defined everywhere from the start. 2. Foveated tile request queuing. Within a relative level of detail, tile requests are queued by increasing distance to the center of the screen, as shown in Figure 3. This technique is inspired by the human eye, which has a central region — the fovea — specialized for high resolution. Because zooming is usually associated with interest in the central region of the display, foveated tile request queuing usually reflects the user's implicit prioritization of visual information during inward zooms. Furthermore, because the user's eye generally spends more time looking at regions near the center of the display than the edge, residual blurriness at the display edge is less noticeable than near the center. The transient, relative increase in sharpness near the center of the display produced by zooming in using foveal tile request order also mirrors the natural consequences of zooming out — see Figure 4. The figure shows two alternate "navigation paths": in the top row, the user remains stationary while viewing a single document (or node) occupying about two thirds of the display, which we assume can be displayed at very high resolution. Initially the node contents are represented by a single, low-resolution tile; then tiles at the next LOD become available, making the node contents visible at twice the resolution with four (=2x2) tiles; 4x4=16 and 8x8=64 tile versions follow. In the second row, we follow what happens if the user zooms in on the shaded square before the image displayed in the top row is fully refined. Tiles at higher levels of detail are again queued, but in this case only those that are partially or fully visible. Refinement progresses to a point comparable to that of the top row (in terms of the number of visible tiles on the display). The third row shows what is available if the user then zooms out again, and how the missing detail is filled in. Although all levels of detail are shown, note that in practice the very fine levels would probably be omitted from the displays on the bottom row, since they represent finer details than the display can convey. Note that zooming out normally leaves the center of the display filled with more detailed tiles than the periphery. Hence this ordering of tile requests consistently prioritizes the sharpness of the central area of the display during all navigation. 3. Temporal LOD blending. Without further refinements, when a tile needed for the current display is downloaded or constructed and drawn for the first time, it will immediately obscure part of an underlying, coarser tile presumably representing the same content; the user experiences this transition as a sudden change in blurriness in some region of the display. Such sudden transitions are unsightly, and unnecessarily draw the user's attention to details of the software's implementation.
Our general approach to ZUI design is to create a seamless visual experience for the user, which does not draw attention to the existence of tiles or other aspects of the software which should remain "under the hood". Therefore, when tiles first become available, they are not displayed immediately, but blended in over a number of frames — typically over roughly one second. The blending function may be linear (i.e. the opacity of the new tile is a linear function of time since the tile became available, so that halfway through the fixed blend-in interval the new tile is 50% opaque), exponential, or follow any other interpolating function. In an exponential blend, every frame closes a constant fraction of the remaining gap to full opacity; for example, the new tile may close 20% of the gap at every frame, which results in the sequence of opacities over consecutive frames 20%, 36%, 49%, 59%, 67%, 74%, 79%, 83%, 87%, 89%, 91%, 93%, etc. Mathematically, the exponential never reaches 100%, but in practice, the opacity becomes indistinguishable from 100% after a short interval. An exponential blend has the advantage that the greatest increase in opacity occurs near the beginning of the blending-in, which makes the new information visible to the user quickly while still preserving acceptable temporal continuity. In our reference implementation, the illusion created is that regions of the display come smoothly into focus as the necessary information becomes available. 4. Continuous LOD. In a situation in which tile download or creation is lagging behind the user's navigation, adjacent regions of the display may have different levels of detail. Although the previous innovation (#3) addresses the problem of temporal discontinuity in level of detail, a separate innovation is needed to address the problem of spatial discontinuity in level of detail. If uncorrected, these spatial discontinuities are visible to the user as seams in the image, with visual content drawn more sharply to one side of the seam. We resolve this problem by allowing the opacity of each tile to be variable over the tile area; in particular, this opacity is made to go to zero at a tile edge if this edge abuts a region on the display with a lower relative level of detail. It is also important in some situations to make the opacity at each corner of the tile go to zero if the corner touches a region of lower relative level of detail. Figure 5 shows our simplest reference implementation for how each tile can be decomposed into rectangles and triangles, called tile shards, such that opacity changes continuously over each tile shard. Tile X, bounded by the square aceg, has neighboring tiles L, R, T and B on the left, right, top and bottom, each sharing an edge. It also has neighbors TL, TR, BL and BR sharing a single corner. Assume that tile X is present. Its "inner square", iiii, is then fully opaque. (Note that repeated lowercase letters indicate identical vertex opacity values.) However, the opacity of the surrounding rectangular frame is determined by whether the neighboring tiles are present (and fully opaque). Hence if tile TL is absent, then point g will be fully transparent; if L is absent, then points h will be fully transparent, etc. We term the border region of the tile (X outside iiii) the blending flaps. Figure 6 illustrates the reference method used to interpolate opacity over a shard. Part (a) shows a constant opacity rectangle.
Part (b) is a rectangle in which the opacities of two opposing edges are different; the opacity over the interior is then simply a linear interpolation based on the shortest distance of each interior point from the two edges. Part (c) shows a barycentric method for interpolating opacity over a triangle, when the opacities of all three corners abc may be different. Conceptually, every interior point p subdivides the triangle into three sub-triangles as shown, with areas A, B and C. The opacity at p is then simply a weighted sum of the opacities at the corners, where the weights are the fractional areas of the three sub-triangles (i.e. A, B and C divided by the total triangle area A+B+C). It is easily verified that this formula identically gives the opacity at a vertex when p moves to that vertex, and that if p is on a triangle edge then its opacity is a linear interpolation between the two connected vertices. Since the opacity within a shard is determined entirely by the opacities at its vertices, and neighboring shards always share vertices (i.e. there are no T-junctions), this method ensures that opacity will vary smoothly over the entire tiled surface. In combination with the temporal LOD blending of #3, this strategy causes the relative level of detail visible to the user to be a continuous function, both over the display area and in time. Both spatial seams and temporal discontinuities are thereby avoided, presenting the user with a visual experience reminiscent of an optical instrument bringing a scene continuously into focus. For navigating large documents, the speed with which the scene comes into focus is a function of the bandwidth of the connection to the repository, or the speed of tile rendition, whichever is slower. Finally, in combination with the foveated prioritization of innovation #2, the continuous level of detail is biased in such a way that the central area of the display is brought into focus first. 5. Generalized linear-mipmap-linear LOD blending. We have discussed strategies and reference implementations for ensuring spatial and temporal smoothness in apparent LOD over a node. We have not yet addressed, however, the manner in which levels of detail are blended during a continuous zooming operation. The method used is a generalization of trilinear interpolation, in which adjacent levels of detail are blended linearly over the intermediate range of scales. At each level of detail, each tile shard has an opacity as drawn, which has been spatially averaged with neighboring tile shards at the same level of detail for spatial smoothness, and temporally averaged for smoothness over time. The target opacity is 100% if the level of detail undersamples the display, i.e. f<1 (see #1). However, if it oversamples the display, then the target opacity is decreased linearly (or using any other monotonic function) such that it goes to zero if the oversampling is g-fold. Like trilinear interpolation, this causes continuous blending over a zoom operation, ensuring that the perceived level of detail never changes suddenly. However, unlike conventional trilinear interpolation — which always involves a blend of two levels of detail — the number of blended levels of detail in this scheme can be one, two, or more. A number larger than two is transient, and is caused by tiles at more than one level of detail not yet having been fully blended in temporally.
A single level is also usually transient, in that it normally occurs when a lower-than-ideal LOD is "standing in" at 100% opacity for higher LODs which have yet to be downloaded or constructed and blended in. The simplest reference implementation for rendering the set of tile shards for a node is to use the so-called "painter's algorithm": all tile shards are rendered in back-to-front order, that is, from coarsest (lowest LOD) to finest (highest LOD which oversamples the display less than g-fold). The target opacities of all but the highest LOD are 100%, though they may transiently be rendered at lower opacity if their temporal blending is incomplete. The highest LOD has variable opacity, depending on how much it oversamples the display, as discussed above. Clearly this reference implementation is not optimal, in that it may render shards which are then fully obscured by subsequently rendered shards. More optimal implementations are possible through the use of data structures and algorithms analogous to those used for hidden surface removal in 3D graphics. 6. Motion anticipation. During rapid zooming or panning, it is especially difficult for tile requests to keep up with demand. Yet during these rapid navigation patterns, the zooming or panning motion tends to be locally well-predicted by linear extrapolation (i.e. it is difficult to make sudden reversals or changes in direction). We therefore exploit this temporal motion coherence to generate tile requests slightly ahead of time, improving visual quality. This is accomplished by making tile requests using a virtual viewport which elongates, dilates or contracts in the direction of motion when panning or zooming, thus pre-empting requests for additional tiles. When navigation ceases, the virtual viewport relaxes over a brief interval of time back to the real viewport.
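A minimal sketch of the virtual viewport for the panning case follows: the request region is elongated in the direction of motion by a lookahead interval, and would relax back to the true viewport once motion stops. The parameter names and the half-second lookahead are illustrative assumptions; the zooming case (dilation or contraction) would be handled analogously by extrapolating the zoom velocity.

```python
def virtual_viewport(view, velocity, lookahead=0.5):
    """Elongate the request viewport toward the direction of panning.

    `view` is (x0, y0, x1, y1); `velocity` is (vx, vy) in the same
    units per second; `lookahead` is seconds of anticipation.
    """
    x0, y0, x1, y1 = view
    dx, dy = velocity[0] * lookahead, velocity[1] * lookahead
    # Only the leading edges move outward; the trailing edges stay put.
    return (x0 + min(dx, 0.0), y0 + min(dy, 0.0),
            x1 + max(dx, 0.0), y1 + max(dy, 0.0))

# Panning right at 800 units/s extends the right edge by 400 units:
print(virtual_viewport((0, 0, 1024, 768), (800, 0)))  # (0, 0, 1424, 768)
```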
Note that none of the above innovations are restricted to rectangular tilings; they generalize in an obvious fashion to any tiling pattern which can be defined on a grid, such as triangular or hexagonal tiling, or heterogeneous tilings consisting of mixtures of such shapes, or entirely arbitrary tilings. The only explicit change which needs to be made to accommodate such alternate tilings is to define triangulations of the tile shapes analogous to those of Figure 5, such that the opacities of the edges and the interior can all be controlled independently.
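As a concrete instance of the per-vertex opacity control just mentioned, the following sketch implements the barycentric interpolation of Figure 6(c): the opacity at an interior point is the area-weighted sum of the corner opacities. Function and argument names are illustrative.

```python
def triangle_area(p, q, r):
    """Unsigned area of the triangle pqr (half the cross product)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) -
               (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def opacity_at(p, a, b, c, oa, ob, oc):
    """Opacity at point p inside triangle abc with vertex opacities oa, ob, oc."""
    area_a = triangle_area(p, b, c)   # sub-triangle opposite vertex a
    area_b = triangle_area(p, a, c)   # opposite vertex b
    area_c = triangle_area(p, a, b)   # opposite vertex c
    total = area_a + area_b + area_c
    return (oa * area_a + ob * area_b + oc * area_c) / total

# At a vertex the formula returns that vertex's opacity exactly:
print(opacity_at((0, 0), (0, 0), (1, 0), (0, 1), 1.0, 0.0, 0.0))  # 1.0
```

Because the interpolated value along an edge depends only on the two vertices of that edge, neighboring shards that share vertices agree exactly along their common boundary, which is what eliminates visible seams.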
[Drawings not reproduced in this text: image pyramids with increasing LOD (FIGS. 1-2), and the linear interpolation of opacity over a polygon.]
Title: SYSTEM AND METHOD FOR THE EFFICIENT, DYNAMIC AND CONTINUOUS DISPLAY OF MULTIRESOLUTION VISUAL DATA
Inventor: BLAISE HILARY AGUERA Y ARCAS
Field of the Invention
The present invention relates generally to multiresolution imagery. More specifically, the invention is a system and method for efficiently blending together visual representations of content at different resolutions or levels of detail in real time. The method ensures perceptual continuity even in highly dynamic contexts, in which the data being visualized may be changing, and only partial data may be available at any given time. The invention has applications in a number of fields, including (but not limited to) zooming user interfaces (ZUIs) for computers.
Background of the invention
In many situations involving the display of complex visual data, these data are stored or computed hierarchically, as a collection of representations at different levels of detail (LODs). Many multiresolution methods and representations have been devised for different kinds of data, including (for example, and without limitation) wavelets for digital images, and progressive meshes for 3D models. Multiresolution methods are also used in mathematical and physical simulations, in situations where a possibly lengthy calculation can be performed more "coarsely" or more "finely"; this invention also applies to such simulations, and to other situations in which multiresolution visual data may be generated interactively. Further, the invention applies in situations in which visual data can be obtained "on the fly" at different levels of detail, for example, from a camera with machine-controllable pan and zoom. The present invention is a general approach to the dynamic display of such multiresolution visual data on one or more 2D displays (such as CRTs or LCD screens). In explaining the invention we will use as our main example the wavelet decomposition of a large digital image (e.g. as used in the JPEG2000 image format). This decomposition takes as its starting point the original pixel data, normally an array of samples on a regular rectangular grid. Each sample usually represents a color or luminance measured at a point in space corresponding to its grid coordinates. In some applications the grid may be very large, e.g. tens of thousands of samples (pixels) on a side, or more. This large size can present considerable difficulties for interactive display, especially when such images are to be browsed remotely, in environments where the server (where the image is stored) is connected to the client (where the image is to be viewed) by a low-bandwidth connection. If the image data are sent from the server to the client in simple raster order, then all the data must be transmitted before the client can generate an overview of the entire image. This may take a long time. Generating such an overview may also be computationally expensive, perhaps, for example, requiring downsampling a 20,000x20,000 pixel image to 500x500 pixels. Not only are such operations too slow to allow for interactivity, but they also require that the client have sufficient memory to store the full image data, which in the case just cited is 1.2 gigabytes (GB) for an 8-bit RGB color image (= 3 x 20,000^2 bytes). Nearly every image on the Web at present is under 100K (0.1MB), because most users are connected to the Web at DSL or lower bandwidth, and larger images would take too long to download. Even in a local setting, on a typical user's hard drive, it is unusual to encounter images larger than 500K (0.5MB). That larger (that is, more detailed) images would often be useful is attested to by the fact that illustrated books, atlases, maps, newspapers and artworks in the average home include a great many images which, if digitized at full resolution, would easily be tens of megabytes in size. Several years ago the dearth of large images was largely due to a shortage of non-volatile storage space (repository space), but advances in hard drive technology, the ease of burning CDROMs, and the increasing prevalence of large networked servers have made repository space no longer the limiting factor. The main bottleneck now is bandwidth, followed by short-term memory (i.e. RAM) space.
Modern image compression standards, such as JPEG2000^1, are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition. The image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1. We refer to the factor by which each resolution differs in size from the next higher — here 2 — as the granularity, which we represent by the variable g. The granularity may change at different scales, but here, for example and without limitation, we will assume that g is constant over the "image pyramid".

1 http://www.jpeg.org/JPEG2000.html

Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images or scales are often called levels of detail, or LODs for short. At first glance it may seem as if the storage requirements for this series of differently-sized images might be greater than for the high-resolution image alone, but in fact this is not the case: a low-resolution image serves as a "predictor" for the next higher resolution. This allows the entire image hierarchy to be encoded very efficiently — more efficiently, in fact, than would usually be possible with a non-hierarchical representation of the high-resolution image alone. If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in a server's repository, then a natural consequence is that if the image is transferred from a server to a client, the client can obtain a low-resolution overview of the entire image very rapidly; finer and finer details will then "fill in" as the transmission progresses. This is known as incremental or progressive transmission, and is one of the major advantages of multiresolution representations. When progressive transmission is properly implemented, any image at all — no matter how large — can be viewed by a client in its spatial entirety (though not in its full detail) almost immediately, even if the bandwidth of the connection to the server is very modest. Although the ultimate amount of time needed to download the image in full detail remains the same, the order in which this information is sent has been changed such that the large-scale features of an image are transmitted first; this is much more helpful to the client than transmitting pixel information at full detail and in "reading order", from top to bottom and left to right. To make random access efficient in a dynamic and interactive context, it is convenient (though not absolutely required) to subdivide each level of detail into a grid, such that a grid square, or tile, is the basic unit of transmission. The size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile.
Hence if we assume 64x64 pixel tiles, the 512x512 pixel image considered earlier has 8x8 tiles at its highest level of detail, 4x4 at the 256x256 level, 2x2 at the 128x128 level, and a single tile at the remaining levels of detail. The JPEG2000 image format includes the features just described for representing tiled, multiresolution and random-access images. If a detail of a large, tiled JPEG2000 image is being viewed interactively by a client on a 2D display of limited size and resolution, then some particular set of adjacent tiles, at a certain level of detail, are needed to produce an accurate rendition. In a dynamic context, however, these may not all be available. Tiles at coarser levels of detail often will be available, however, particularly if the user began with a broad overview of the image. Since tiles at coarser levels of detail span a much wider area spatially, it is likely that the entire area of interest is covered by some combination of available tiles. This implies that the image resolution available will not be constant over the display area. In a previously filed provisional patent application, I have proposed methods for "fading out" the edges of tiles where they abut a blank space at the same level of detail; this avoids the abrupt visual discontinuity in sharpness that would otherwise result when the "coverage" of a fine level of detail is incomplete. The edge regions of tiles reserved for blending are referred to as blending flaps. The simple reference implementation for displaying a finished composite image is a "painter's algorithm": all relevant tiles (that is, tiles overlapping the display area) in the coarsest level of detail are drawn first, followed by all relevant tiles in progressively finer levels of detail. At each level of detail, blending is applied at the edges of incomplete areas as described. The result, as desired, is that coarser levels of detail "show through" only in places where they are not obscured by finer levels of detail. Although this simple algorithm works, it has several drawbacks: first, it is wasteful of processor time, as tiles are drawn even when they will ultimately be partly or even completely obscured. In particular, a simple calculation shows that each display pixel will often be (re)drawn log2(f) times, where f is the magnification factor of the display relative to the lowest level of detail. Second, this technique relies on compositing in the framebuffer — meaning that, at intermediate points during the drawing operation, the regions drawn do not have their final appearance; this makes it necessary to use double-buffering or related methods and perform the compositing off-screen to avoid the appearance of flickering resolution. Third, unless an additional compositing operation is applied, this technique can only be used for an opaque rendition — it is not possible, for example, to ensure that the final rendition has 50% opacity everywhere, allowing other content to "show through". This is because the painter's algorithm relies precisely on the effect of one "layer of paint" (i.e. level of detail) fully obscuring the one underneath; it is not known in advance where a level of detail will be obscured, and where not.

The Invention
The present invention resolves these issues, while preserving all the advantages of the painter's algorithm. One of these advantages is the ability to deal with any kind of LOD tiling, including non-rectangular or irregular tilings, as well as irrational grid tilings, for which I am filing a separate provisional patent application. Tilings generally consist of a subdivision, or tessellation, of the area containing the visual content into polygons. For a tiling to be useful in a multiresolution context it is generally desirable that the areas of tiles at lower levels of detail be larger than the areas of tiles at higher levels of detail; the multiplicative factor by which their sizes differ is the granularity g, which we will assume (but without limitation) to be a constant. In the following, an irrational but rectangular tiling grid will be used to describe the improved algorithm. Generalizations to other tiling schemes should be evident to anyone skilled in the art. The improved algorithm consists of four stages. In the first stage, a composite grid is constructed in the image's reference frame from the superposition of the visible parts of all of the tile grids in all of the levels of detail to be drawn. When the irrational tiling innovation (detailed in a separate provisional patent application) is used, this results in an irregular composite grid, shown schematically in Figure 1. The grid is further augmented by grid lines corresponding to the x- and y-values which would be needed to draw the tile "blending flaps" at each level of detail (not shown in Figure 1, because the resulting grid would be too dense and visually confusing). This composite grid, which can be defined by a sorted list of x- and y-values for the grid lines, has the property that the vertices of all of the rectangles and triangles that would be needed to draw all visible tiles (including their blending flaps) lie at the intersection of an x and a y grid line. Let there be n grid lines parallel to the x-axis and m grid lines parallel to the y-axis. We then construct a two-dimensional n x m table, with entries corresponding to the squares of the grid. Each grid entry has two fields: an opacity, which is initialized to zero, and a list of references to specific tiles, which is initially empty. The second stage is to walk through the tiles, sorted by decreasing level of detail (opposite to the naïve implementation). Each tile covers an integral number of composite grid squares. For each of these squares, we check to see if its table entry has an opacity less than 100%, and if so, we add the current tile to its list and increase the opacity accordingly. The per-tile opacity used in this step is stored in the tile data structure. When this second stage is complete, the composite grid will contain entries corresponding to the correct pieces of tiles to draw in each grid square, along with the opacities with which to draw these "tile shards". Normally these opacities will sum to one. Low-resolution tiles which are entirely obscured will not be referenced anywhere in this table, while partly obscured tiles will be referenced only in tile shards where they are partly visible. The third stage of the algorithm is a traversal of the composite grid in which tile shard opacities at the composite grid vertices are adjusted by averaging with neighboring vertices at the same level of detail, followed by readjustment of the vertex opacities to preserve the summed opacity at each vertex (normally 100%).
This implements a refined version of the spatial smoothing of scale described in a separate provisional patent application. The refinement comes from the fact that the composite grid is in general denser than the 3x3 grid per tile defined in innovation #4, especially for low-resolution tiles. (At the highest LOD, by construction, the composite gridding will be at least as fine as necessary.) This allows the averaging technique to achieve greater smoothness in apparent level of detail, in effect by creating smoother blending flaps consisting of a larger number of tile shards. Finally, in the fourth stage the composite grid is again traversed, and the tile shards are actually drawn. Although this algorithm involves multiple passes over the data and a certain amount of bookkeeping, it results in far better performance than the naïve algorithm, because much less drawing must take place in the end; every tile shard rendered is visible to the user, though sometimes at low opacity. Some tiles may not be drawn at all. This contrasts with the naïve algorithm, which draws every tile intersecting the displayed area in its entirety. An additional advantage of this algorithm is that it allows partially transparent nodes to be drawn, simply by changing the total opacity target from 100% to some lower value. This is not possible with the naïve algorithm, because every level of detail except the most detailed must be drawn at full opacity in order to completely "paint over" any underlying, still lower resolution tiles. When the view is rotated in the x-y plane relative to the node, some minor changes need to be made for efficiency. The composite grid can be constructed in the usual manner; it may be larger than the grid would have been for the unrotated case, as larger coordinate ranges are visible along a diagonal. However, when walking through tiles, we need only consider tiles that are visible (by the simple intersecting-polygon criterion). Also, composite grid squares outside the viewing area need not be updated during the traversal in the second or third stages, or drawn in the fourth stage. Note that a number of other implementation details can be modified to optimize performance; the algorithm is presented here in a form that makes its operation and essential features easiest to understand. A graphics programmer skilled in the art can easily add the optimizing implementation details. For example, it is not necessary to keep a list of tiles per tile shard; instead, each level of detail can be drawn immediately as it is completed, with the correct opacity, thus requiring only the storage of a single tile identity per shard at any one time. Another exemplary optimization is that the total opacity rendering left to do, expressed in terms of (area) x (remaining opacity), can be kept track of, so that the algorithm can quit early if everything has already been drawn; then low levels of detail need not be "visited" at all if they are not needed. The algorithm can be generalized to arbitrary polygonal tiling patterns by using a constrained Delaunay triangulation instead of a grid to store vertex opacities and tile shard identifiers. This data structure efficiently creates a triangulation whose edges contain every edge in all of the original LOD grids; accessing a particular triangle or vertex is an efficient operation, which can take place in of order n·log(n) time (where n is the number of vertices or triangles added).
The resulting triangles are moreover the basic primitive used for graphics rendering on most graphics platforms.
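The second stage lends itself to a compact sketch. The version below makes the simplifying assumption that each tile already knows which composite-grid cells it covers; the tile fields used (lod, opacity, cells) are illustrative, not part of the specification. Tiles are visited from finest to coarsest, each cell accumulates opacity until its target is reached, and fully obscured coarse tiles end up referenced in no cell at all.

```python
def assign_tile_shards(tiles, cells, target=1.0):
    """Stage two, sketched: decide which tile pieces ("shards") to draw
    in each composite-grid cell, and at what opacity.

    `cells` is an iterable of cell identifiers, e.g. (col, row) pairs;
    a `target` below 1.0 yields a partially transparent node overall.
    """
    cells = list(cells)
    remaining = {cell: target for cell in cells}
    shards = {cell: [] for cell in cells}      # cell -> [(tile, opacity)]
    # Walk from the finest LOD down to the coarsest.
    for tile in sorted(tiles, key=lambda t: t.lod, reverse=True):
        for cell in tile.cells:
            take = min(tile.opacity, remaining[cell])
            if take > 0.0:
                shards[cell].append((tile, take))
                remaining[cell] -= take
    return shards
```

Note how lowering `target` yields the partially transparent nodes mentioned above, something the naïve painter's algorithm cannot do.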
construction of composite node grid from superimposed irrational LOD tilings
(a) finest LOD (b) next-finest LOD, g=sqrt(3) (c) composite grid
Key: fine (a) tile available; partially obscured coarse (b) tile; unobscured coarse (b) tile; fine tile unavailable in (a)
FIGURE 1

A SYSTEM AND METHOD FOR MULTIPLE NODE DISPLAY
RELATED APPLICATION
[0001] This application claims the benefit of provisional application serial number 60/474,313, the disclosure of which is hereby incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION
[0002] The present invention relates to zooming user interfaces (ZUI) for computers. [0003] Most present-day graphical computer user interfaces are designed using visual components of a fixed spatial scale. The visual content can be manipulated by zooming in or out or otherwise navigating through it. However, the precision with which the coordinates of various objects can be represented is limited by the number of bits, usually between 16 and 64, designated to represent such coordinates; this limited representational size imposes a correspondingly limited precision.
[0004] In the context of the zooming user interface, the user is easily able to zoom in, causing the area which previously covered only a single pixel to fill the entire display. Conversely, the user may zoom out, causing the contents of the entire display to shrink to the size of a single pixel. Since each zoom in or out may multiply or divide the x-y coordinates by numerous orders of magnitude, just a few such zooms completely exhaust the precision available with a 64-bit floating-point number, for example. Thereafter, round-off causes noticeable degradation of image quality.
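This exhaustion of precision is easy to demonstrate. The fragment below (using math.ulp, available in Python 3.9 and later) shows that after sixty twofold zooms, the gap between adjacent representable 64-bit values near a magnified coordinate is already larger than a pixel, so single-pixel offsets are rounded away entirely.

```python
import math

zoom = 2.0 ** 60            # sixty successive 2x zooms
x = 0.1 * zoom              # a universe coordinate after magnification

print(math.ulp(x))          # 16.0: adjacent doubles are 16 units apart here
print(x + 1.0 == x)         # True: a one-pixel offset vanishes in round-off
```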
[0005] It is an object of the present invention to provide a ZUI in which a larger range of zooms is possible.
[0006] It is a further object of the invention to provide a ZUI in which the precision with which coordinates are represented is related to the precision actually required at a particular zoom level of detail. It is a further object of the present invention to allow a pannable and zoomable two-dimensional space of a finite physical size, but of an arbitrarily high complexity or resolution, to be embedded into a well-defined area of a larger pannable and zoomable two-dimensional space. [0007] A further objective of the present invention is to allow zooming out after a deep zoom-in to behave like the "back" button of a web browser, letting the user retrace his or her steps through a visual navigation.
[0008] A further objective of the present invention is to allow zooming in immediately after zooming out to behave analogously to the "forward" button of a web browser, letting the user precisely undo the effects of an arbitrarily long zoom-out.
[0009] A further objective of the present invention is to allow a node, a visual object as defined more precisely below, to have a very large number of child nodes (for example, up to 10^28).
[0010] A further objective of the present invention is to allow a node to generate its own children programmatically on the fly, enabling content to be defined, created or modified dynamically during navigation.
[0011] A further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if the data are stored at a remote location and shared over a low-bandwidth network.
[0012] A further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
[0013] A further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
[0014] These and other broader objectives of the present invention will become apparent to those skilled in the art from a review of the specification that follows.
SUMMARY OF THE INVENTION
[0015] The above and other objects of the present invention are accomplished by displaying visual content as plural "nodes." Each node preferably has its own coordinate system and rendering method, but may be contained within a parent node, and may be represented in the coordinate system and rendering method of the parent node. As a user navigates the visual content by, for example, zooming in or out, a node is only "launched" when the zooming results in an appropriate level of detail. The launching of the node causes the node to be represented in its own coordinate system and/or rendering method, rather than in the coordinate system and/or rendering method of a different node.
[0016] Prior to the node being launched, the node is either represented in the coordinate system of the parent node, or not represented at all. By launching nodes only when they are required, the precision of a coordinate system is a function of the zoom level of detail of what is being displayed. This allows a variable level of precision, up to and including the maximum permissible by the memory of the computer in which the system operates.
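A hedged sketch of this launch-on-demand behavior follows. The node fields (children, extent_in_parent, logical_size, launched) and the pixel threshold are illustrative assumptions, not terms of the specification; the point is that each launched child is visited in its own coordinate system, so numeric precision tracks the current zoom level rather than a single global coordinate frame.

```python
def update_visible_nodes(node, scale, launch_threshold=64.0):
    """Launch children whose projected on-screen size crosses a threshold.

    `scale` is screen pixels per unit of `node`'s own logical coordinates.
    """
    for child in node.children:
        projected = child.extent_in_parent * scale   # on-screen pixels
        child.launched = projected >= launch_threshold
        if child.launched:
            # Recurse using pixels per unit of the *child's* coordinates:
            # precision is always relative to the local coordinate system.
            update_visible_nodes(child, projected / child.logical_size,
                                 launch_threshold)
```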
DESCRIPTION OF THE DRAWINGS
[0017] For the purposes of illustration, there are forms shown in the drawings that are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
[0018] FIG. 1 is a depiction of visual content on a display;
[0019] FIG. 2 is an image of the visual content of FIG. 1 at a different level of detail;
[0020] FIG. 3 is a representation of an embodiment of the invention;
[0021] FIG. 4 is an exemplary embodiment of the invention showing plural nodes on a display;
[0022] FIG. 5 is a tree diagram corresponding to the exemplary embodiment shown in FIG. 4.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0023] We assume a user interface metaphor in which the display is a camera, through which the user can view part of a two-dimensional surface, or 2D universe. For convenience, although it is not necessary to do so, we ascribe physical dimensions to this universe, so that it may be, for example, one meter square. The invention is equally applicable to N-dimensional representations.
[0024] The exemplary universe in turn contains 2D objects, or nodes, which have a visual representation, and may also be dynamic or interactive (e.g., video clips, applications, editable text documents, CAD drawings, or still images). For a node to be visible it must be associated with a rendering method, which is able to draw it in whole or in part on some area of the display. Each node is also endowed with a local coordinate system of finite precision. For illustrative purposes, we assume a node is rectangular and represented by a local coordinate system. [0025] These two parameters, the rendering method and coordinate system, specify how to display the node, and the positions of items in the node. Each node may have 0 or more child nodes, which are addressed by reference. The node need not, and generally does not, contain all the information of each child node, but instead only an address providing information necessary to obtain the child node. As a user navigates, for example, zooms in and out, the nodes are displayed on the screen, as shown, for example, in FIG. 1.
[0026] Generally, a "node" is the basic unit of functionality in the present invention. Most nodes manifest visually on the user's display during navigation, and some nodes may also be animated and/or respond to user input. Nodes are hierarchical, in that a node may contain child nodes. The containing node is then called a parent node. When a parent node contains a child node, the child's visual manifestation is also contained within the parent's visual manifestation. Each node has a logical coordinate system, such that the entire extent of the node is contained within an exemplary rectangle defined in this logical coordinate system; e.g. a node may define a logical coordinate system such that it is contained in the rectangle (0,0)-(100,100). [0027] Each node may have the following data defining its properties:
o the node's logical coordinate system, including its logical size (100 x 100 in the above example);
o the identities, positions and sizes of any child nodes, specified in the (parent) node's logical coordinates;
o optionally, any necessary user data;
as well as executable code defining these operations or "methods":
o initialization of the node's data based on "construction arguments";
o rendering all or a portion of the node's visual appearance (the output of this method is a rendered tile);
o optionally, responding to user input, such as keyboard or mouse events.
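By way of illustration only, the data and methods enumerated above might be organized as follows. This is a minimal C++ sketch; every type and member name in it is an assumption for illustration rather than part of the specification:

    #include <string>
    #include <vector>

    struct Rect { double x0, y0, x1, y1; };   // a rectangle in logical coordinates
    struct Tile { /* pixels of a rendered region */ };
    struct Event { /* a keyboard or mouse event */ };

    struct ChildRef {                         // a child node, held by reference only
        std::string address;                  // information needed to obtain the child
        Rect placement;                       // position and size in the parent's logical coordinates
    };

    class Node {                              // the executable code defines a "node class"
    public:
        virtual ~Node() = default;
        virtual void initialize(const std::string& constructionArgs) = 0;  // set up node data
        virtual Tile render(const Rect& region) = 0;   // output is a rendered tile
        virtual void onUserInput(const Event&) {}      // optional
        Rect logicalCoords{0, 0, 100, 100};   // e.g. the (0,0)-(100,100) rectangle above
        std::vector<ChildRef> children;       // identities, positions and sizes of children
    };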
[0028] The executable code defines a "node class", and may be shared among many "node instances". Node instances differ in their data content. Hence a node class might define the logic needed to render a JPEG image. The "construction arguments" given to the initialization code would then include the URL of the JPEG image to display. A node displaying a particular image would be an instance of the JPEG node class. Plural instances of a node may be viewable in the same visual content, similar to the way a software application may be instantiated numerous times simultaneously.
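Continuing the sketch above (and building on the Node class it defines), a JPEG-displaying node class and two instances of it might look as follows; the class name, the omitted decoding logic and the example URLs are illustrative assumptions:

    class JpegNode : public Node {
    public:
        void initialize(const std::string& constructionArgs) override {
            url = constructionArgs;            // construction argument: the JPEG's URL
        }
        Tile render(const Rect& region) override {
            // fetch the JPEG at 'url', decode it, and rasterize 'region' (omitted)
            return Tile{};
        }
    private:
        std::string url;                       // node instances differ only in their data
    };

    // Two node instances sharing one node class:
    //   JpegNode a; a.initialize("http://example.com/photo1.jpg");
    //   JpegNode b; b.initialize("http://example.com/photo2.jpg");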
[0029] Note that in a complex visual document or application, it is usually possible to divide the necessary functionality into nodes in many different ways. For example, a scripted web-page-like document containing multiple images, pull-down menus and buttons could be implemented as a single node with complex rendering and user input methods. Alternatively, it could be implemented as a parent node which only defines the overall layout of the page, with every constituent image and button a child node. This has the obvious advantage of reusing or "factoring" the functionality more effectively: the buttons may all have the same behavior, and hence all be instances of the same node class; the images may all be in the same format and so also be instances of a common node class, etc. This also simplifies rearranging the layout; the parent node can easily move or resize the child nodes.
[0030] In accordance with the present invention, visual content may be displayed in a manner that depends upon the state of navigation input by a user. For example, Fig. 1 shows a node 105 which may be the image of a portion of a city. Node 105 may contain child nodes 101-103. Node 101 may be an image of a building in the city, node 102 could be an image of a playground, and node 103 might be a sports arena. At the level of zoom shown, nodes 101-103 are relatively small, so they can be represented as small darkened areas with no detail in node 105, located at the correct locations in the coordinate system of node 105. Only the coordinate system and the rendering method of node 105 are needed.
[0031] Consider the case where the user now zooms in so that a different level of detail (LOD) such as that shown in Fig. 2 results. In the LOD of Fig. 2, nodes 101 and 102 are no longer visible on the screen, because the visual content is displayed much larger. Additionally, because the size at which sports arena node 103 is displayed is now much larger, the details of the sports arena, such as the individual seats, the field, etc., now must be displayed.
[0032] In furtherance of the foregoing, sports arena node 103 would now be displayed not as a darkened area with no detail in the coordinate system of node 105, but rather, it would be "launched" to be displayed using its own coordinate system and rendering method. When displayed using its own coordinate system and rendering method, the details such as seating, the field of play, etc. would be individually shown. Other functions discussed above, and associated with the node 103, would also begin executing at the point when node 103 is launched. The particular navigation condition that causes the launching of node 103, or any node for that matter, is a function of design choice and is not critical to the present invention. [0033] The precision with which the node 103 will be displayed is the combined precision of the coordinate system utilized by node 105, as well as that of node 103. Thus, for example, if the coordinate system of each of said nodes utilizes 8 bits, then the combined precision will be 16 bits, because the coordinate system of node 103 is only utilized to specify the position of items in node 103, but the overall location of node 103 within node 105 is specified within the coordinate system of node 105. Note that this nesting may continue repeatedly if sports arena 103 itself contains additional nodes within it. For example, one such node 201 may in fact be a particular concession stand within the sports arena. It is represented without much detail in the coordinate system and rendering method of node 103. As a user continues zooming in on sports arena 103, at some point node 201 will launch. If it is displayed using 8 bits of precision, those 8 bits will specify where within the node 201 coordinate system particular items are to be displayed. Yet, the location of node 201 within node 103 will be maintained to 8 bits of precision within the coordinate system of node 103, the location of which will in turn be maintained within the coordinate system of node 105 using 8 bits. Hence, items within node 201 will ultimately be displayed using 24 bits of precision.
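The arithmetic of nested precision can be sketched as follows: each placement maps a node's local coordinates into its parent's, so a point in the innermost node is located by composing the placements outward, and each level contributes its own bits of precision. The names below are illustrative only:

    #include <vector>

    struct Point2D { double x, y; };
    struct Placement {                 // maps local coordinates into the parent's system
        double scale, offsetX, offsetY;
    };

    // Map a point in the innermost node's coordinates (e.g. node 201)
    // outward through its ancestors (e.g. node 103, then node 105).
    // With 8 bits per level, three levels yield 24 bits of effective precision.
    Point2D toOutermost(Point2D p, const std::vector<Placement>& innerToOuter) {
        for (const Placement& m : innerToOuter) {
            p = { p.x * m.scale + m.offsetX, p.y * m.scale + m.offsetY };
        }
        return p;
    }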
[0034] By nesting nodes within nodes, the precision at which visual content may ultimately be displayed is limited only by the memory capacity of the computer. The ultimate precision with which visual content in a node is displayed after that node is launched is effectively the combined precision of all parent nodes and the precision of the node that has launched. Hence, depending upon the level of nesting, the precision may increase as needed limited only by the storage capacity of the computer, which is almost always much more than sufficient. Additionally, the increased precision is only utilized when necessary, because if the image is at an LOD that does not require launching, then in accordance with the above description, it will only be displayed with the precision of the node within which it is contained if that node has been launched. Thus, for nodes nested within other nodes, as one moves from the outermost node inward, one may traverse nodes that have launched until finally reaching a node that has not launched yet. Any such unlaunched node, and nodes further within it, will be displayed only with the precision of the last traversed node that has launched. [0035] This results in an "accordion" type precision, wherein the precision at which visual content is displayed expands and contracts as necessary and as dictated by the navigational input of the user, maximizing the efficiency of system resources by using them only when necessary for higher precision.
[0036] It is also noted that when a node launches, the display of that node changes from being based upon the coordinates and rendering method of the parent node to the coordinates and rendering method of the child node. That change is optimally made gradual through the use of blending, as described, for example, in copending US patent application no. 10/790,253. However, other methodologies of gradually changing from displaying the information in the coordinate system and rendering method of the parent node to those of the child node are possible. The system could be programmed, for example, so that the blending from parent to child occurs over a particular range. Then, as the user traverses through that range during a zoom, the changeover occurs, unless the navigation is ceased during that range, in which case the blending may continue until the node is fully displayed in the appropriate coordinate system. [0037] An additional issue solved by the present invention relates to a system for maintaining the spatial interrelationship among all nodes during display. More particularly, during dynamic navigation such as zooming and panning, many different coordinate systems are being used to display potentially different nodes. Some nodes, as explained above, are being displayed merely as an image in the coordinate system of other nodes, and some are being displayed in their own coordinate systems. Indeed, the entire visual display may be populated with nodes displayed at different positions in different coordinate systems, and the coordinate systems and precisions used for the various nodes may vary during navigation as nodes are launched. Hence, it is important to ensure that the nodes are properly located with respect to each other, because each node is only knowledgeable of its own coordinate system. The present invention provides a technique for propagating relative location information among all of the nodes and for updating that information when needed so that each node will "know" the proper position in the overall view at which it should render itself.
[0038] The foregoing may be accomplished with the addition of a field to the node structure and an additional address stack data structure. The expanded node definition includes a field which we term the "view" field, and which is used by the node to locate itself relative to the entire display. The view field represents, in the coordinates of that node, the visible area of the node — that is, the image of the display rectangle in the node's coordinates. This rectangle may only partially overlap the node's area, as when the node is partially off-screen. Clearly the view field cannot always be kept updated for every node, as we cannot necessarily traverse the entire directed graph of nodes in real time as navigation occurs. The stack structure is defined thus:

    Stack<Address> viewStack;

where this stack is a global variable of the client (the computer connected to the display). For exemplary purposes we assume that navigation begins with an overview of a universe of content, defined by a root node; then this root node is pushed onto the viewStack, and the root node's view field might be initialized to be the entire area of the root node, i.e.

    rootNode.view = rootNode.coordSystem;
    Push(viewStack, rootNode);
[0039] Schematically, the viewStack will specify the addresses of a sequence of nodes "pierced" by a point relative to the display, which we will take in our exemplary implementation to be the center of the display. This sequence must begin with the root node, but may be infinite, and therefore must be truncated. In an exemplary embodiment, the sequence is truncated when the nodes "pierced" become smaller than some minimum size, defined as minimumArea. The current view is then represented by the view fields of all of the nodes in the viewStack, each of which specifies the current view in terms of the node's local coordinate system. If a user has zoomed very deeply into a universe, then the detailed location of the display will be given most precisely by the view field of the last node in the stack. The last element's view field does not, however, specify the user's viewpoint relative to the entire universe, but only relative to its local coordinates. The view field of the root node, on the other hand, does specify where in the universe the user is looking. Nodes closer to the "fine end" of the viewStack thus specify the view position with increasing precision, but relative to progressively smaller areas in the universe. This is shown conceptually in FIG. 3, where it can be seen that of the three nodes that have been launched, node 303 provides the most accurate indication of where the user is looking, since its coordinate system is the "finest", but node 301 provides information, albeit not as fine, on a much larger area of the visual content.
[0040] The problem then reduces to the following: the views (i.e. view fields) of all visible nodes must be kept synchronized as the user navigates through the universe, panning and zooming. Failure to keep them synchronized would result in the appearance of nodes moving on the display independently of each other, rather than behaving as a cohesive and physically consistent 2D surface.
[0041] Changing the view during any navigation operation proceeds as follows. Because the last node in the viewStack has the most precise representation of the view, the first step is to alter the view field of this last node; this altered view is taken to be the correct new view, and any other visible nodes must follow along. The second step is to propagate the new view "upward" toward the root node, which entails making progressively smaller and smaller changes to the view fields of nodes earlier in the stack. If the user is deeply zoomed, then at some point in the upward propagation the alteration to the view may be so small that it ceases to be accurately representable; upward propagation stops at this node. At each stage of the upward propagation, the change is also propagated downward to other visible nodes. Hence, first, the last node's parent's view is modified; then, in the downward propagation, the last node's "siblings". The next upward propagation modifies the grandparent's view, and the second downward propagation modifies first uncles, then first cousins. The downward propagation is halted, as before, when the areas of "cousin nodes" become smaller than minimumArea, or when a node falls entirely offscreen.
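The propagation just described might be organized as in the following sketch, which for brevity handles only a pan (expressed in the last node's coordinates) and only one level of downward propagation; all structure and field names here are illustrative assumptions, not definitions from the specification:

    #include <vector>

    struct ViewRect { double x, y, w, h; };          // the "view" field, in local coordinates

    struct VNode {
        VNode* parent = nullptr;
        std::vector<VNode*> children;
        ViewRect view;
        double scaleInParent = 1.0;                  // converts local units to parent units
    };

    void propagatePan(std::vector<VNode*>& viewStack, double dx, double dy) {
        VNode* node = viewStack.back();
        node->view.x += dx; node->view.y += dy;      // step 1: alter the most precise view
        while (node->parent) {                       // step 2: propagate upward
            dx *= node->scaleInParent; dy *= node->scaleInParent;
            VNode* parent = node->parent;
            if (parent->view.x + dx == parent->view.x &&
                parent->view.y + dy == parent->view.y)
                break;                               // change no longer accurately representable
            parent->view.x += dx; parent->view.y += dy;
            for (VNode* sibling : parent->children)  // step 3: propagate downward to siblings
                if (sibling != node) {
                    sibling->view.x += dx / sibling->scaleInParent;
                    sibling->view.y += dy / sibling->scaleInParent;
                }
            node = parent;                           // continue toward the root
        }
    }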
[0042] The foregoing technique involves translating the layout of the various nodes into a tree, which conceptually is illustrated in FIGs. 4 and 5. As can be seen from FIGs. 4 and 5, there is a corresponding tree for a particular displayed set of nodes, and the tree structure may be used to propagate the view information as previously described.
[0043] A panning operation may move the last node far enough away that it no longer belongs in the viewStack. Alternatively, zooming in might enlarge a child to the extent that a lengthening of the viewStack is required, or zooming out might bring the last node's area below a minimum area requiring a truncation of the viewStack. In all of these cases the identity of the last node changes. These situations are detected during the downward propagation, which may alter the viewStack accordingly, potentially leaving it longer or shorter.
[0044] One simple case of the foregoing is that during zooming, a node gets launched so that now it needs to be placed in the viewStack. Another example is that by zooming out, a previously visible node becomes so small that it must be removed from the viewStack. [0045] An extension of the idea is to avoid truncating the viewStack immediately in response to a long outward zoom. Truncating the viewStack is only necessary if the user then pans. Although a long outward zoom will cause the view fields of deeply zoomed nodes to grow very large (and therefore numerically inaccurate), a field

    Point2D viewCenter;

[0046] can be added to the Node structure, representing the central point of the view rectangle; zooming without panning therefore does not alter the viewCenter field of any node. This construction allows zooming far outward to be followed immediately by zooming back in. Because the viewStack has been left intact, the user can then return to precisely the starting view. This behavior is analogous to the "back" and "forward" buttons of a web browser: "back" is analogous to zooming out, and "forward" is analogous to zooming back in. In a web browser, if a user uses "back" to return to a previous web page, but then follows an alternate link, it is at this point that "forward" ceases to work. Following an alternate link is thus analogous to panning after zooming out.
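One way to realize this behavior is to scale the view rectangle about its stored center, so that a pure zoom never modifies viewCenter. A sketch, with illustrative field names beyond those given in the text:

    struct Point2D { double x, y; };

    struct View {
        double halfWidth, halfHeight;   // half-extents of the visible area, in local coordinates
        Point2D viewCenter;             // the central point of the view rectangle
    };

    // A pure zoom scales the visible extent about viewCenter. Because the center
    // is untouched, the viewStack can be left intact, and zooming back in can
    // return exactly to the starting view, like a browser's "forward" button.
    void zoomAboutCenter(View& v, double factor) {
        v.halfWidth  *= factor;         // factor > 1 zooms out, factor < 1 zooms in
        v.halfHeight *= factor;
        // v.viewCenter deliberately unchanged; only panning alters it
    }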
[0047] The foregoing provides that visual content may be displayed and navigated in a variety of fashions with substantially infinite precision, limited only by the capacity of the computer system on which the application is running. The visual content displayed at any given time is then displayed as an assembly of nodes, wherein only the nodes needed for the particular view have been launched, and all other nodes are displayed without launching as part of another node or not at all. It is understood that various other embodiments will be apparent to those of ordinary skill in the art, and that the invention is not limited to the embodiments described herein.
METHODS AND APPARATUS FOR NAVIGATING AN IMAGE
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application Serial No. 60/453,897, filed March 12, 2003, entitled SYSTEM AND METHOD FOR FOVEATED, SEAMLESS, PROGRESSIVE RENDERING IN A ZOOMING USER INTERFACE, the entire disclosure of which is hereby incorporated by reference.
BACKGROUND OF THE INVENTION
[0002] The present invention relates to methods and apparatus for navigating, such as zooming and panning, over an image of an object in such a way as to provide the appearance of smooth, continuous navigational movement.
[0003] Most conventional graphical computer user interfaces (GUIs) are designed using visual components of fixed spatial scale. It has long been recognized, however, that visual components may be represented and manipulated such that they do not have a fixed spatial scale on the display; indeed, the visual components may be panned and/or zoomed in or out. The ability to zoom in and out on an image is desirable in connection with, for example, viewing maps, browsing through text layouts such as newspapers, viewing digital photographs, viewing blueprints or diagrams, and viewing other large data sets.
[0004] Many existing computer applications, such as Microsoft Word, Adobe Photoshop, Adobe Acrobat, etc., include zoomable components. In general, the zooming capability provided by these computer applications is a peripheral aspect of a user's interaction with the software and the zooming feature is only employed occasionally. These computer applications permit a user to pan over an image smoothly and continuously (e.g., utilizing scroll bars or the cursor to translate the viewed image left, right, up or down). A significant problem with such computer applications, however, is that they do not permit a user to zoom smoothly and continuously. Indeed, they provide zooming in discrete steps, such as 10%, 25%, 50%, 75%, 100%, 150%, 200%, 500%, etc. The user selects the desired zoom using the cursor and, in response, the image changes abruptly to the selected zoom level.
[0005] The undesirable qualities of discontinuous zooming also exist in Internet-based computer applications. The computer application underlying the www.mapquest.com website illustrates this point. The MapQuest website permits a user to enter one or more addresses and receive an image of a roadmap in response. FIGS. 1-4 are examples of images that one may obtain from the MapQuest website in response to a query for a regional map of Long Island, NY, U.S.A. The MapQuest website permits the user to zoom in and zoom out to discrete levels, such as 10 levels. FIG. 1 is a rendition at zoom level 5, which is approximately 100 meters/pixel. FIG. 2 is an image at a zoom level 6, which is about 35 meters/pixel. FIG. 3 is an image at a zoom level 7, which is about 20 meters/pixel. FIG. 4 is an image at a zoom level 9, which is about 10 meters/pixel.
[0006] As can be seen by comparing FIGs. 1-4, the abrupt transitions between zoom levels result in a sudden and abrupt loss of detail when zooming out and a sudden and abrupt addition of detail when zooming in. For example, no local, secondary or connecting roads may be seen in FIG. 1 (at zoom level 5), although secondary and connecting roads suddenly appear in FIG. 2, which is the very next zoom level. Such abrupt discontinuities are very displeasing when utilizing the MapQuest website. It is noted, however, that even if the MapQuest software application were modified to permit a view of, for example, local streets at zoom level 5 (FIG. 1), the results would still be unsatisfactory. Although the visual density of the map would change with the zoom level such that at some level of zoom, the result might be pleasing (e.g., at level 7, FIG. 3), as one zoomed in the roads would not thicken, making the map look overly sparse. As one zoomed out, the roads would eventually run into each other, rapidly forming a solid nest in which individual roads would be indistinguishable. [0007] The ability to provide smooth, continuous zooming on images of road maps is problematic because of the varying levels of coarseness associated with the road categories. In the United States, there are about five categories of roads (as categorized under the TIGER/Line data distributed by the U.S. Census Bureau): A1, primary highways; A2, primary roads; A3, state highways, secondary roads, and connecting roads; A4, local streets, city streets and rural roads; and A5, dirt roads. These roads may be considered the elements of an overall object (i.e., a roadmap). The coarseness of the road elements manifests because there are considerably more A4 roads than A3 roads, there are considerably more A3 roads than A2 roads, and there are considerably more A2 roads than A1 roads. In addition, the physical dimensions of the roads (e.g., their widths) vary significantly. A1 roads may be about 16 meters wide, A2 roads may be about 12 meters wide, A3 roads may be about 8 meters wide, A4 roads may be about 5 meters wide, and A5 roads may be about 2.5 meters wide.
[0008] The MapQuest computer application deals with these varying levels of coarseness by displaying only the road categories deemed appropriate at a particular zoom level. For example, a nation-wide view might only show A1 roads, while a state-wide view might show A1 and A2 roads, and a county-wide view might show A1, A2 and A3 roads. Even if MapQuest were modified to allow continuous zooming of the roadmap, this approach would lead to the sudden appearance and disappearance of road categories during zooming, which is confusing and visually displeasing.
[0009] In view of the foregoing, there are needs in the art for new methods and apparatus for navigating images of complex objects, which permit smooth and continuous zooming of the image while also preserving visual distinctions between the elements of the objects based on their size or importance.
SUMMARY OF THE INVENTION
[0010] In accordance with one or more aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: zooming into or out of an image having at least one object, wherein at least some elements of at least one object are scaled up and/or down in a way that is non-physically proportional to one or more zoom levels associated with the zooming.
[0011] The non-physically proportional scaling may be expressed by the following formula: p = c·d·z^a, where p is a linear size in pixels of one or more elements of the object at the zoom level, c is a constant, d is a linear size in physical units of the one or more elements of the object, z is the zoom level in units of physical linear size/pixel, and a is a scale power where a ≠ -1. [0012] Under non-physical scaling, the scale power a is not equal to -1 (typically -1 < a < 0) within a range of zoom levels z0 and z1, where z0 is of a lower physical linear size/pixel than z1. Preferably, at least one of z0 and z1 may vary for one or more elements of the object. It is noted that a, c and d may also vary from element to element.
[0013] At least some elements of the at least one object may also be scaled up and/or down in a way that is physically proportional to one or more zoom levels associated with the zooming. The physically proportional scaling may be expressed by the following formula: p = c·d/z, where p is a linear size in pixels of one or more elements of the object at the zoom level, c is a constant, d is a linear size of the one or more elements of the object in physical units, and z is the zoom level in units of physical linear size/pixel. [0014] It is noted that the methods and apparatus described thus far and/or described later in this document may be achieved utilizing any of the known technologies, such as standard digital circuitry, analog circuitry, any of the known processors that are operable to execute software and/or firmware programs, programmable digital devices or systems, programmable array logic devices, or any combination of the above. The invention may also be embodied in a software program for storage in a suitable storage medium and execution by a processing unit.
[0015] The elements of the object may be of varying degrees of coarseness. For example, as discussed above, the coarseness of the elements of a roadmap object manifests because there are considerably more A4 roads than A3 roads, there are considerably more A3 roads than A2 roads, and there are considerably more A2 roads than A1 roads. Degree of coarseness in road categories also manifests in such properties as average road length, frequency of intersections, and maximum curvature. The coarseness of the elements of other image objects may manifest in other ways too numerous to list in their entirety. Thus, the scaling of the elements in a given predetermined image may be physically proportional or non-physically proportional based on at least one of: (i) a degree of coarseness of such elements; and (ii) the zoom level of the given predetermined image. For example, the object may be a roadmap, the elements of the object may be roads, and the varying degrees of coarseness may be road hierarchies. Thus, the scaling of a given road in a given predetermined image may be physically proportional or non-physically proportional based on: (i) the road hierarchy of the given road; and (ii) the zoom level of the given predetermined image.
[0016] In accordance with one or more further aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of a roadmap; receiving one or more user navigation commands including zooming information at the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation.
[0017] In accordance with one or more still further aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: receiving at a client terminal a plurality of pre-rendered images of varying zoom levels of at least one object, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving one or more user navigation commands including zooming information at the client terminal; blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal .
[0018] In accordance with one or more still further aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of a roadmap to a client terminal over a communications channel; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; and blending two or more of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands such that a display of the intermediate image on the client terminal provides the appearance of smooth navigation.
[0019] In accordance with one or more still further aspects of the present invention, methods and apparatus are contemplated to perform various actions, including: transmitting a plurality of pre-rendered images of varying zoom levels of at least one object to a client terminal over a communications channel, at least some elements of the at least one object being scaled up and/or down in order to produce the plurality of pre-determined images, and the scaling being at least one of: (i) physically proportional to the zoom level; and (ii) non-physically proportional to the zoom level; receiving the plurality of pre-rendered images at the client terminal; issuing one or more user navigation commands including zooming information using the client terminal; blending two of the pre-rendered images to obtain an intermediate image of an intermediate zoom level that corresponds with the zooming information of the navigation commands; and displaying the intermediate image on the client terminal.
[0020] Other aspects, features, and advantages will become apparent to one of ordinary skill in the art when the description herein is taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] For the purposes of illustrating the invention, forms are shown in the drawings, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
[0022] FIG. 1 is an image taken from the MapQuest website, which is at a zoom level 5;
[0023] FIG. 2 is an image taken from the MapQuest website, which is at a zoom level 6;
[0024] FIG. 3 is an image taken from the MapQuest website, which is at a zoom level 7; [0025] FIG. 4 is an image taken from the MapQuest website, which is at a zoom level 9;
[0026] FIG. 5 is an image of Long Island produced at a zoom level of about 334 meters/pixel in accordance with one or more aspects of the present invention;
[0027] FIG. 6 is an image of Long Island produced at a zoom level of about 191 meters/pixel in accordance with one or more further aspects of the present invention;
[0028] FIG. 7 is an image of Long Island produced at a zoom level of about 109.2 meters/pixel in accordance with one or more further aspects of the present invention;
[0029] FIG. 8 is an image of Long Island produced at a zoom level of about 62.4 meters/pixel in accordance with one or more further aspects of the present invention;
[0030] FIG. 9 is an image of Long Island produced at a zoom level of about 35.7 meters/pixel in accordance with one or more further aspects of the present invention;
[0031] FIG. 10 is an image of Long Island produced at a zoom level of about 20.4 meters/pixel in accordance with one or more further aspects of the present invention;
[0032] FIG. 11 is an image of Long Island produced at a zoom level of about 11.7 meters/pixel in accordance with one or more further aspects of the present invention;
[0033] FIG. 12 is a flow diagram illustrating process steps that may be carried out in order to provide smooth and continuous navigation of an image in accordance with one or more aspects of the present invention;
[0034] FIG. 13 is a flow diagram illustrating further process steps that may be carried out in order to smoothly navigate an image in accordance with various aspects of the present invention;
[0035] FIG. 14 is a log-log graph of a line width in pixels versus a zoom level in meters/pixel illustrating physical and non-physical scaling in accordance with one or more further aspects of the present invention; and
[0036] FIG. 15 is a log-log graph illustrating variations in the physical and non-physical scaling of FIG. 14.
[0037] FIGS. 16A-D illustrate respective antialiased vertical lines whose endpoints are precisely centered on pixel coordinates;
[0038] FIGS. 17A-C illustrate respective antialiased lines on a slant, with endpoints not positioned to fall at exact pixel coordinates; and
[0039] FIG. 18 is the log-log graph of line width versus zoom level of FIG. 14 including horizontal lines indicating incremental line widths, and vertical lines spaced such that the line width over the interval between two adjacent vertical lines changes by no more than two pixels.
DETAILED DESCRIPTION OF THE INVENTION
[0040] Referring now to the drawings, wherein like numerals indicate like elements, there is shown in FIGS. 5-11 a series of images representing the road system of Long Island, NY, U.S.A., where each image is at a different zoom level (or resolution). Before delving into the technical details of how the present invention is implemented, these images will now be discussed in connection with desirable resultant features of using the invention, namely, at least the appearance of smooth and continuous navigation, particularly zooming, while maintaining informational integrity.
[0041] It is noted that the various aspects of the present invention that will be discussed below may be applied in contexts other than the navigation of a roadmap image. Indeed, the extent of images and implementations for which the present invention may be employed are too numerous to list in their entirety. For example, the features of the present invention may be used to navigate images of the human anatomy, complex topographies, engineering diagrams such as wiring diagrams or blueprints, gene ontologies, etc. It has been found, however, that the invention has particular applicability to navigating images in which the elements thereof are of varying levels of detail or coarseness. Therefore, for the purposes of brevity and clarity, the various aspects of the present invention will be discussed in connection with a specific example, namely, images of a roadmap. [0042] Although it is impossible to demonstrate the appearance of smooth and continuous zooming in a patent document, this feature has been demonstrated through experimentation and prototype development by executing a suitable software program on a Pentium-based computer. The image 100A of the roadmap illustrated in FIG. 5 is at a zoom level that may be characterized by units of physical length/pixel (or physical linear size/pixel). In other words, the zoom level, z, represents the actual physical linear size that a single pixel of the image 100A represents. In FIG. 5, the zoom level is about 334 meters/pixel. Those skilled in the art will appreciate that the zoom level may be expressed in other units without departing from the spirit and scope of the claimed invention. FIG. 6 is an image 100B of the same roadmap as FIG. 5, although the zoom level, z, is about 191 meters/pixel. [0043] In accordance with one or more aspects of the present invention, a user of the software program embodying one or more aspects of the invention may zoom in or out between the levels illustrated in FIGS. 5 and 6. It is significant to note that such zooming has the appearance of smooth and continuous transitions from the 334 meters/pixel level (FIG. 5) to/from the 191 meters/pixel level (FIG. 6) and any levels therebetween. Likewise, the user may zoom to other levels, such as z = 109.2 meters/pixel (FIG. 7), z = 62.4 meters/pixel (FIG. 8), z = 35.7 meters/pixel (FIG. 9), z = 20.4 meters/pixel (FIG. 10), and z = 11.7 meters/pixel (FIG. 11). Again, the transitions through these zoom levels and any levels therebetween advantageously have the appearance of smooth and continuous movements.
[0044] Another significant feature of the present invention as illustrated in FIGS. 5-11 is that little or no detail abruptly appears or disappears when zooming from one level to another level. The detail shown in FIG. 8 (at the zoom level of z = 62.4 meters/pixel) may also be found in FIG. 5 (at a zoom level of z = 334 meters/pixel). This is so even though the image object, in this case the roadmap, includes elements (i.e., roads) of varying degrees of coarseness. Indeed, the roadmap 100D of FIG. 8 includes at least A1 highways such as 102, A3 secondary roads such as 104, and A4 local roads such as 106. Yet these details, even the A4 local roads 106, may still be seen in image 100A of FIG. 5, which is substantially zoomed out in comparison with the image 100D of FIG. 8.
[0045] Still further, despite the fact that the A4 local roads 106 may be seen at the zoom level of z = 334 meters/pixel (FIG. 5), the A1, A2, A3, and A4 roads may be distinguished from one another. Even differences between A1 primary highways 102 and A2 primary roads 108 may be distinguished vis-a-vis the relative weight given to such roads in the rendered image 100A. [0046] The ability to distinguish among the road hierarchies is also advantageously maintained when the user continues to zoom in, for example, to the zoom level of z = 20.4 meters/pixel as illustrated in image 100F of FIG. 10. Although the weight of the A1 primary highway 102 significantly increases as compared with the zoom level of z = 62.4 meters/pixel in FIG. 8, it does not increase to such an extent as to obliterate other detail, such as the A4 local roads 106 or even the A5 dirt roads. Nevertheless, the weights of the roads at lower hierarchical levels, such as A4 local roads 106, significantly increase in weight as compared with their counterparts at the zoom level z = 62.4 meters/pixel in FIG. 8. [0047] Thus, even though the dynamic range of zoom levels between that illustrated in FIG. 5 and that illustrated in FIG. 11 is substantial and detail remains substantially consistent (i.e., no roads suddenly appear or disappear while smoothly zooming), the information that the user seeks to obtain at a given zooming level is not obscured by undesirable artifacts. For example, at the zoom level of z = 334 meters/pixel (FIG. 5), the user may wish to gain a general sense of what primary highways exist and in what directions they extend. This information may readily be obtained even though the A4 local roads 106 are also depicted. At the zoom level of z = 62.4 meters/pixel (FIG. 8), the user may wish to determine whether a particular A1 primary highway 102 or A2 primary road 108 services a particular city or neighborhood. Again, the user may obtain this information without interference from other much more detailed information, such as the existence and extent of A4 local roads 106 or even A5 dirt roads. Finally, at the zoom level of z = 11.7 meters/pixel, a user may be interested in finding a particular A4 local road such as 112, and may do so without interference by significantly larger roads such as the A1 primary highway 102. [0048] In order to achieve one or more of the various aspects of the present invention discussed above, it is contemplated that one or more computing devices execute one or more software programs that cause the computing devices to carry out appropriate actions. In this regard, reference is now made to FIGS. 12-13, which are flow diagrams illustrating process steps that are preferably carried out by the one or more computing devices and/or related equipment.
[0049] While it is preferred that the process flow is carried out by commercially available computing equipment (such as Pentium-based computers), any of a number of other techniques may be employed to carry out the process steps without departing from the spirit and scope of the present invention as claimed. Indeed, the hardware employed may be implemented utilizing any other known or hereinafter developed technologies, such as standard digital circuitry, analog circuitry, any of the known processors that are operable to execute software and/or firmware programs, one or more programmable digital devices or systems, such as programmable read only memories (PROMs), programmable array logic devices (PALs), any combination of the above, etc. Further, the methods of the present invention may be embodied in a software program that may be stored on any of the known or hereinafter developed media. [0050] FIG. 12 illustrates an embodiment of the invention in which a plurality of images are prepared (each at a different zoom level or resolution), action 200, and two or more of the images are blended together to achieve the appearance of smooth navigation, such as zooming (action 206). Although not required to practice the invention, it is contemplated that the approach illustrated in FIG. 12 be employed in connection with a service provider - client relationship. For example, a service provider would expend the resources to prepare a plurality of pre-rendered images (action 200) and make the images available to a user's client terminal over a communications channel, such as the Internet (action 202). Alternatively, the pre-rendered images may be an integral or related part of an application program that the user loads and executes on his or her computer.
[0051] It has been found through experimentation that, when the blending approach is used, a set of images at the following zoom levels work well when the image object is a roadmap: 30 meters/pixel, 50 meters/pixel, 75 meters/pixel, 100 meters/pixel, 200 meters/pixel, 300 meters/pixel, 500 meters/pixel, 1000 meters/pixel, and 3000 meters/pixel. It is noted, however, that any number of images may be employed at any number of resolutions without departing from the scope of the invention. Indeed, other image objects in other contexts may be best served by a larger or smaller number of images, where the specific zoom levels are different from the example above. [0052] Irrespective of how the images are obtained by the client terminal, in response to user-initiated navigation commands (action 204), such as zooming commands, the client terminal is preferably operable to blend two or more images in order to produce an intermediate resolution image that coincides with the navigation command (action 206). This blending may be accomplished by a number of methods, such as the well-known trilinear interpolation technique described by Lance Williams, Pyramidal Parametrics, Computer Graphics, Proc. SIGGRAPH 83, 17(3): 1-11 (1983), the entire disclosure of which is incorporated herein by reference. Other approaches to image interpolation are also useful in connection with the present invention, such as bicubic-linear interpolation, and still others may be developed in the future. It is noted that the present invention does not require or depend on any particular one of these blending methods. For example, as shown in FIG. 8, the user may wish to navigate to a zoom level of 62.4 meters/pixel. As this zoom level may be between two of the pre-rendered images (e.g., in this example between zoom level 50 meters/pixel and zoom level 75 meters/pixel), the desired zoom level of 62.4 meters/pixel may be achieved using the trilinear interpolation technique. Further, any zoom level between 50 meters/pixel and 75 meters/pixel may be obtained utilizing a blending method as described above, which if performed quickly enough provides the appearance of smooth and continuous navigation. The blending technique may be carried through to other zoom levels, such as the 35.7 meters/pixel level illustrated in FIG. 9. In such case, the blending technique may be performed as between the pre-rendered images of 30 meters/pixel and 50 meters/pixel of the example discussed thus far.
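For concreteness, the per-pixel blend between the two bracketing pre-rendered images might be written as follows. This is a sketch, not the trilinear method itself: it assumes both images have already been resampled to the display size, and it interpolates in log(zoom) so that the blend progresses evenly between levels; all names are illustrative:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Blend, e.g., the 50 meters/pixel and 75 meters/pixel images to
    // approximate a 62.4 meters/pixel view.
    std::vector<float> blendZoomLevels(const std::vector<float>& fine,    // e.g. 50 m/pixel image
                                       const std::vector<float>& coarse,  // e.g. 75 m/pixel image
                                       double zFine, double zCoarse, double zTarget) {
        double t = (std::log(zTarget) - std::log(zFine)) /
                   (std::log(zCoarse) - std::log(zFine));                 // 0 at zFine, 1 at zCoarse
        std::vector<float> out(fine.size());
        for (std::size_t i = 0; i < fine.size(); ++i)
            out[i] = static_cast<float>((1.0 - t) * fine[i] + t * coarse[i]);
        return out;
    }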
[0053] The above blending approach may be used when the computing power of the processing unit on which the invention is carried out is not high enough to (i) perform the rendering operation in the first instance, and/or (ii) perform image rendering "just-in-time" or "on the fly" (for example, in real time) to achieve a high image frame rate for smooth navigation. As will be discussed below, however, other embodiments of the invention contemplate use of known, or hereinafter developed, high power processing units that are capable of rendering at the client terminal for blending and/or high frame rate applications.
[0054] The process flow of FIG. 13 illustrates the detailed steps and/or actions that are preferably conducted to prepare one or more images in accordance with the present invention. At action 220, the information is obtained regarding the image object or objects using any of the known or hereinafter developed techniques. Usually, such image objects have been modeled using appropriate primitives, such as polygons, lines, points, etc. For example, when the image objects are roadmaps, models of the roads in any Universal Transverse Mercator (UTM) zone may readily be obtained. The model is usually in the form of a list of line segments (in any coordinate system) that comprise the roads in the zone. The list may be converted into an image in the spatial domain (a pixel image) using any of the known or hereinafter developed rendering processes so long as it incorporates certain techniques for determining the weight (e.g., apparent or real thickness) of a given primitive in the pixel (spatial) domain. In keeping with the roadmap example above, the rendering processes should incorporate certain techniques for determining the weight of the lines that model the roads of the roadmap in the spatial domain. These techniques will be discussed below.
[0055] At action 222 (FIG. 13), the elements of the object are classified. In the case of a roadmap object, the classification may take the form of recognizing already existing categories, namely, A1, A2, A3, A4, and A5. Indeed, these road elements have varying degrees of coarseness and, as will be discussed below, may be rendered differently based on this classification. At action 224, mathematical scaling is applied to the different road elements based on the zoom level. As will be discussed in more detail below, the mathematical scaling may also vary based on the element classification.
[0056] By way of background, there are two conventional techniques for rendering image elements such as the roads of a map: actual physical scaling, and pre-set pixel width. The actual physical scaling technique dictates that the roadmap is rendered as if viewing an actual physical image of the roads at different scales. A1 highways, for example, might be 16 meters wide, A2 roads might be 12 meters wide, A3 roads might be 8 meters wide, A4 roads might be 5 meters wide, and A5 roads might be 2.5 meters wide. Although this might be acceptable to the viewer when zoomed in on a small area of the map, as one zooms out, all roads, both major and minor, become too thin to make out. At some zoom level, say at the state level (e.g., about 200 meters/pixel), no roads would be seen at all.
[0057] The pre-set pixel width approach dictates that every road is a certain pixel width, such as one pixel in width on the display. Major roads, such as highways, may be emphasized by making them two pixels wide, etc. Unfortunately this approach makes the visual density of the map change as one zooms in and out. At some level of zoom, the result might be pleasing, e.g., at a small-size county level. As one zooms in, however, roads would not thicken, making the map look overly sparse. Further, as one zooms out, roads would run into each other, rapidly forming a solid nest in which individual roads would be indistinguishable.
[0058] In accordance with one or more aspects of the present invention, at action 224, the images are produced in such a way that at least some image elements are scaled up and/or down either (i) physically proportional to the zoom level; or (ii) non-physically proportional to the zoom level, depending on parameters that will be discussed in more detail below. [0059] It is noted that the scaling being "physically proportional to the zoom level" means that the number of pixels representing the road width increases or decreases with the zoom level as the size of an element would appear to change with its distance from the human eye. The perspective formula, giving the apparent length y of an object of physical size d, is: y = c·d/x, [0060] where c is a constant determining the angular perspective and x is the distance of the object from the viewer. [0061] In the present invention, the linear size of an object of physical linear size d' in display pixels p is given by p = d'·z^a, [0062] where z is the zoom level in units of physical linear size/pixel (e.g. meters/pixel), and a is a power law. When a = -1 and d' = d (the real physical linear size of the object), this equation is dimensionally correct and becomes equivalent to the perspective formula, with p = y and z = x/c. This expresses the equivalence between physical zooming and perspective transformation: zooming in is equivalent to moving an object closer to the viewer, and zooming out is equivalent to moving the object farther away.
[0063] To implement non-physical scaling, a may be set to a power law other than -1, and d' may be set to a physical linear size other than the actual physical linear size d. In the context of a road map, where p may represent the displayed width of a road in pixels and d' may represent an imputed width in physical units, "non-physically proportional to the zoom level" means that the road width in display pixels increases or decreases with the zoom level in a way other than being physically proportional to the zoom level, i.e. a ≠ -1. The scaling is distorted in a way that achieves certain desirable results. [0064] It is noted that "linear size" means one-dimensional size. For example, if one considers any 2-dimensional object and doubles its "linear size" then one multiplies the area by 4 = 2². In the two-dimensional case, the linear sizes of the elements of an object may involve length, width, radius, diameter, and/or any other measurement that one can read off with a ruler on the Euclidean plane. The thickness of a line, the length of a line, the diameter of a circle or disc, the length of one side of a polygon, and the distance between two points are all examples of linear sizes. In this sense the "linear size" in two dimensions is the distance between two identified points of an object on a 2D Euclidean plane. For example, the linear size can be calculated by taking the square root of (dx² + dy²), where dx = x1 - x0, dy = y1 - y0, and the two identified points are given by the Cartesian coordinates (x0, y0) and (x1, y1).
[0065] The concept of "linear size" extends naturally to more than two dimensions; for example, if one considers a volumetric object, then doubling its linear size involves increasing the volume by 8 = 2³. Similar measurements of linear size can also be defined for non-Euclidean spaces, such as the surface of a sphere. [0066] Any power law a < 0 will cause the rendered size of an element to decrease as one zooms out, and increase as one zooms in. When a < -1, the rendered size of the element will decrease faster than it would with proportional physical scaling as one zooms out. Conversely, when -1 < a < 0, the size of the rendered element decreases more slowly than it would with proportional physical scaling as one zooms out.
[0067] In accordance with at least one aspect of the invention, p(z), for a given length of a given object, is permitted to be substantially continuous so that during navigation the user does not experience a sudden jump or discontinuity in the size of an element of the image (as opposed to the conventional approaches that permit the most extreme discontinuity - a sudden appearance or disappearance of an element during navigation). In addition, it is preferred that p(z) monotonically decrease with zooming out, such that zooming out causes the elements of the object to become smaller (e.g., roads to become thinner), and such that zooming in causes the elements of the object to become larger. This gives the user a sense of physicality about the object(s) of the image. [0068] The scaling features discussed above may be more fully understood with reference to FIG. 14, which is a log-log graph of a rendered line width in pixels for an A1 highway versus the zoom level in meters/pixel. (Plotting log(z) on the x-axis and log(p) on the y-axis is convenient because the plots become straight lines due to the relationship log(x^a) = a·log(x).) The basic characteristics of the line (road) width versus zoom level plot are:
(i) that the scaling of the road widths may be physically proportional to the zoom level when zoomed in (e.g., up to about 0.5 meters/pixel);
(ii) that the scaling of the road widths may be non-physically proportional to the zoom level when zoomed out (e.g., above about 0.5 meters/pixel); and
(iii) that the scaling of the road widths may be physically proportional to the zoom level when zoomed further out (e.g., above about 50 meters/pixel or higher, depending on parameters which will be discussed in more detail below).
[0069] As for the zone in which the scaling of the road widths is physically proportional to the zoom level, the scaling formula of p = d'·z^a is employed, where a = -1. In this example, a reasonable value for the physical width of an actual A1 highway is about d' = 16 meters. Thus, the rendered width of the line representing the A1 highway monotonically decreases with physical scaling as one zooms out, at least up to a certain zoom level z0, say z0 = 0.5 meters/pixel. [0070] The zoom level z0 = 0.5 is chosen to be an inner scale below which physical scaling is applied. This avoids a non-physical appearance when the roadmap is combined with other fine-scale GIS content with real physical dimensions. In this example, z0 = 0.5 meters/pixel, or 2 pixels/meter, which when expressed as a map scale on a 15 inch display (with 1600x1200 pixel resolution) corresponds to a scale of about 1:2600. At d = 16 meters, which is a reasonable real physical width for A1 roads, the rendered road will appear to be its actual size when one is zoomed in (0.5 meters/pixel or less). At a zoom level of 0.1 meters/pixel, the rendered line is about 160 pixels wide. At a zoom level of 0.5 meters/pixel, the rendered line is 32 pixels wide.
[0071] As for the zone in which the scaling of the road widths is non-physically proportional to the zoom level, the scaling formula of p = d'·z^a is employed, where -1 < a < 0 (within a range of zoom levels z0 and z1). In this example, the non-physical scaling is performed between about z0 = 0.5 meters/pixel and z1 = 3300 meters/pixel. Again, when -1 < a < 0, the width of the rendered road decreases more slowly than it would with proportional physical scaling as one zooms out. Advantageously, this permits the A1 road to remain visible (and distinguishable from other smaller roads) as one zooms out. For example, as shown in FIG. 5, the A1 road 102 remains visible and distinguishable from other roads at the zoom level of z = 334 meters/pixel. Assuming that the physical width of the A1 road is d' = d = 16 meters, the width of the rendered line using physical scaling would have been about 0.005 pixels at a zoom level of about 3300 meters/pixel, rendering it virtually invisible. Using non-physical scaling, however, where -1 < a < 0 (in this example, a is about -0.473), the width of the rendered line is about 0.8 pixels at a zoom level of 3300 meters/pixel, rendering it clearly visible. [0072] It is noted that the value for z1 is chosen to be the most zoomed-out scale at which a given road still has "greater than physical" importance. By way of example, if the entire U.S. is rendered on a 1600x1200 pixel display, the resolution would be approximately 3300 meters/pixel or 3.3 kilometers/pixel. If one looks at the entire world, then there may be no reason for U.S. highways to assume enhanced importance relative to the view of the country alone.
[0073] Thus, at zoom levels above z1, which in the example above is about 3300 meters/pixel, the scaling of the road widths is again physically proportional to the zoom level, but preferably with a large d′ (much greater than the real width d) for continuity of p(z). In this zone, the scaling formula p = d′·z^a is employed, where a = -1. In order for the rendered road width to be continuous at z1 = 3300 meters/pixel, a new imputed physical width of the A1 highway is chosen, for example, d′ = 1.65 kilometers. z1 and the new value for d′ are preferably chosen in such a way that, at the outer scale z1, the rendered width of the line will be a reasonable number of pixels. In this case, at a zoom level in which the entire nation may be seen on the display (about 3300 meters/pixel), A1 roads may be about ½ pixel wide, which is thin but still clearly visible; this corresponds to an imputed physical road width of 1650 meters, or 1.65 kilometers.
[0074] The above suggests a specific set of equations for the rendered line width as a function of the zoom level:

p(z) = d0 · z^(-1), if z ≤ z0
p(z) = d1 · z^a, if z0 < z < z1
p(z) = d2 · z^(-1), if z ≥ z1
[0075] The above form of p(z) has six parameters: z0, z1, d0, d1, d2 and a. z0 and z1 mark the scales at which the behavior of p(z) changes. In the zoomed-in zone (z ≤ z0), zooming is physical (i.e., the exponent of z is -1), with a physical width of d0, which preferably corresponds to the real physical width d. In the zoomed-out zone (z ≥ z1), zooming is again physical, but with a physical width of d2, which in general does not correspond to d. Between z0 and z1, the rendered line width scales with a power law of a, which can be a value other than -1. Given the preference that p(z) be continuous, specifying z0, z1, d0 and d2 is sufficient to uniquely determine d1 and a, as is clearly shown in FIG. 14.

[0076] The approach discussed above with respect to A1 roads may be applied to the other road elements of the roadmap object. An example of applying these scaling techniques to the A1, A2, A3, A4, and A5 roads is illustrated in the log-log graph of FIG. 15. In this example, z0 = 0.5 meters/pixel for all roads, although it may vary from element to element depending on the context. As A2 roads are generally somewhat smaller than A1 roads, d0 = 12 meters. Further, A2 roads are "important," e.g., on the U.S. state level, so z1 = 312 meters/pixel, which is approximately the rendering resolution for a single state (about 1/10 of the country in linear scale). At this scale, it has been found that line widths of one pixel are desirable, so d2 = 312 meters is a reasonable setting.

[0077] Using the general approach outlined above for A1 and A2 roads, the parameters of the remaining elements of the roadmap object may be established. A3 roads: d0 = 8 meters, z0 = 0.5 meters/pixel, z1 = 50 meters/pixel, and d2 = 100 meters. A4 streets: d0 = 5 meters, z0 = 0.5 meters/pixel, z1 = 20 meters/pixel, and d2 = 20 meters. And A5 dirt roads: d0 = 2.5 meters, z0 = 0.5 meters/pixel, z1 = 20 meters/pixel, and d2 = 20 meters. It is noted that using these parameter settings, A5 dirt roads look more and more like streets at zoomed-out zoom levels, while their physical scale when zoomed in is half as wide.

[0078] The log-log plot of FIG. 15 summarizes the scaling behaviors for the road types. It is noted that at every scale the apparent width of A1 > A2 > A3 > A4 ≥ A5. Note also that, with the exception of dirt roads, the power laws all come out in the neighborhood of a = -0.41. The dotted lines all have a slope of -1 and represent physical scaling at different physical widths. From the top down, the corresponding physical widths of these dotted lines are: 1.65 kilometers, 312 meters, 100 meters, 20 meters, 16 meters, 12 meters, 8 meters, 5 meters, and 2.5 meters.
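The piecewise form above lends itself to a compact implementation. The following Python sketch is illustrative only (the patent prescribes no code): it takes the four independent parameters z0, z1, d0 and d2, and derives d1 and the exponent a from the continuity of p(z), reproducing the A1-highway figures worked through above.

```python
import math

def make_scaling_curve(z0, z1, d0, d2):
    """Rendered line width p(z), in pixels, for zoom z in meters/pixel.

    Physical scaling (exponent -1) outside [z0, z1]; a power law with
    exponent a in between. d1 and a follow from continuity of p(z) at
    z0 and z1, as noted in paragraph [0075]."""
    p0, p1 = d0 / z0, d2 / z1                  # widths at the two knees
    a = math.log(p1 / p0) / math.log(z1 / z0)  # -1 < a < 0 for these maps
    d1 = p0 / (z0 ** a)                        # so that d1 * z0**a == d0 / z0

    def p(z):
        if z <= z0:
            return d0 / z      # zoomed in: physical scaling, real width d0
        if z < z1:
            return d1 * z ** a # intermediate zone: non-physical power law
        return d2 / z          # zoomed out: physical scaling, imputed d2

    return p, a

# A1 highway parameters from the text: d0 = 16 m, z0 = 0.5 m/px,
# z1 = 3300 m/px, d2 = 1650 m.
p, a = make_scaling_curve(0.5, 3300.0, 16.0, 1650.0)
print(round(a, 3))        # -0.473, as in paragraph [0071]
print(round(p(0.1), 1))   # 160.0 pixels when fully zoomed in
print(p(0.5))             # 32.0 pixels at the inner scale z0
print(p(3300.0))          # 0.5 pixels at the outer scale z1
```

The three branches of p(z) correspond to the three straight-line segments of FIG. 14.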
[0079] When interpolation between a plurality of pre-rendered images is used, it is possible in many cases to ensure that the resulting interpolation is humanly indistinguishable or nearly indistinguishable from an ideal rendition of all lines or other primitive geometric elements at their correct pixel widths as determined by the physical and non-physical scaling equations. To appreciate this alternative embodiment of the current invention, some background on antialiased line drawing will be presented below.
[0080] The discussion of antialiased line drawing will be presented in keeping with the roadmap example discussed at length above, in which all primitive elements are lines, and the line width is subject to the scaling equations as described previously. With reference to FIG. 16A, a one pixel wide vertical line drawn in black on white background, such that the horizontal position of the line is aligned exactly to the pixel grid, consists simply of a 1-pixel-wide column of black pixels on a white background. In accordance with various aspects of the present invention, it is desirable to consider and accommodate the case where the line width is a non-integral number of pixels. With reference to FIG. 16B, if the endpoints of a line remain fixed, but the weight of the line is increased to be 1.5 pixels wide, then on an anti-aliased graphics display, the columns of pixels to the left and right of the central column are drawn at 25% grey. With reference to FIG. 16C, at 2 pixels wide, these flanking columns are drawn at 50% grey. With reference to FIG. 16D, at 3 pixels wide, the flanking columns are drawn at 100% black, and the result is three solid black columns as expected.
[0081] This approach to drawing lines of non-integer width on a pixellated display results in a sense (or illusion) of visual continuity as line width changes, allowing lines of different widths to be clearly distinguished even if they differ in width only by a fraction of a pixel. In general, this approach, known as antialiased line drawing, is designed to ensure that the line integral of the intensity function (or "1-intensity" function, for black lines on a white background) over a perpendicular to the line drawn is equal to the line width. This method generalizes readily to lines whose endpoints do not lie precisely in the centers of pixels, to lines which are in other orientations than vertical, and to curves.
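The coverage rule of paragraph [0081] (intensities across a perpendicular to the line sum to the line width) can be checked for the axis-aligned strips of FIGS. 16A-D with a short sketch. This is an illustration only; the helper name and layout are hypothetical, not anything prescribed by the text.

```python
def vertical_line_coverage(width, center_col, n_cols):
    """Per-column ink coverage (0..1) for an antialiased vertical line.

    Column c spans [c - 0.5, c + 0.5]; its coverage is the overlap of
    that span with the line [center - width/2, center + width/2], so
    the coverage values along a perpendicular sum to the line width."""
    left, right = center_col - width / 2.0, center_col + width / 2.0
    return [max(0.0, min(c + 0.5, right) - max(c - 0.5, left))
            for c in range(n_cols)]

# A line centered on column 2 of a 5-column strip (cf. FIGS. 16A-D):
print(vertical_line_coverage(1.0, 2, 5))  # [0, 0, 1.0, 0, 0]
print(vertical_line_coverage(1.5, 2, 5))  # flanks at 0.25 (25% grey)
print(vertical_line_coverage(2.0, 2, 5))  # flanks at 0.5  (50% grey)
print(vertical_line_coverage(3.0, 2, 5))  # [0, 1.0, 1.0, 1.0, 0]
```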
[0082] Note that drawing the antialiased vertical lines of FIGS. 16A-D could also be accomplished by alpha-blending two images, one (image A) in which the line is 1 pixel wide, and the other (image B) in which the line is 3 pixels wide. Alpha blending assigns to each pixel on the display (1-alpha)*(corresponding pixel in image A) + alpha*(corresponding pixel in image B). As alpha is varied between zero and one, the effective width of the rendered line varies smoothly between one and three pixels. This alpha-blending approach only produces good visual results in the most general case if the difference between the two rendered line widths in images A and B is one pixel or less; otherwise, lines may appear haloed at intermediate widths. This same approach can be applied to rendering points, polygons, and many other primitive graphical elements at different linear sizes.
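As a concrete illustration of the blending formula of paragraph [0082], the following sketch (using numpy for brevity, an assumption, since the text prescribes no implementation) blends the five-column strips of FIGS. 16A and 16D:

```python
import numpy as np

def blend(image_a, image_b, alpha):
    """Alpha-blend two pre-rendered images of the same scene; alpha = 0
    gives image_a (line 1 pixel wide), alpha = 1 gives image_b (line 3
    pixels wide), and intermediate alphas approximate intermediate widths."""
    return (1.0 - alpha) * image_a + alpha * image_b

a = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # FIG. 16A: 1 pixel wide (1 = ink)
b = np.array([0.0, 1.0, 1.0, 1.0, 0.0])  # FIG. 16D: 3 pixels wide
print(blend(a, b, 0.25))  # flanks at 0.25: the 1.5-pixel line of FIG. 16B
print(blend(a, b, 0.5))   # flanks at 0.5:  the 2-pixel line of FIG. 16C
```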
[0083] Turning again to FIGS. 16A-D, the 1.5 pixel-wide line (FIG. 16B) and the 2 pixel-wide line (FIG. 16C) can be constructed by alpha-blending between the 1 pixel wide line (FIG. 16A) and the 3 pixel wide line (FIG. 16D) . With reference to FIGS. 17A-C, a 1 pixel wide line (FIG. 17A) , a 2 pixel wide line (FIG. 17B) and a 3 pixel wide line (FIG. 17C) are illustrated in an arbitrary orientation. The same principle applies to the arbitrary orientation of FIGS. 17A-C as to the case where the lines are aligned exactly to the pixel grid, although the spacing of the line widths between which to alpha-blend may need to be finer than two pixels for good results.
[0084] In the context of the present map example, a set of images of different resolutions can be selected for pre-rendition with reference to the log-log plots of FIGS. 14-15. For example, reference is now made to FIG. 18, which is substantially similar to FIG. 14 except that FIG. 18 includes a set of horizontal lines and vertical lines. The horizontal lines indicate line widths between 1 and 10 pixels, in increments of one pixel. The vertical lines are spaced such that line width over the interval between two adjacent vertical lines changes by no more than two pixels. Thus, the vertical lines represent a set of zoom values suitable for pre-rendition, wherein alpha-blending between two adjacent such prerendered images will produce characteristics nearly equivalent to rendering the lines representing roads at continuously variable widths.
[0085] Interpolation between the six resolutions represented by the vertical lines shown in FIG. 18 is sufficient to render the A1 highways accurately using the scaling curve shown at about nine meters/pixel and above. Rendition below about nine meters/pixel does not require pre-rendition, as such views are very zoomed-in and thus show very few roads, making it more computationally efficient (and more efficient with respect to data storage requirements) to render them vectorially than to interpolate between pre-rendered images. At resolutions of more than about 1000 meters/pixel (such views encompass large fractions of the Earth's surface), the final pre-rendered image alone can be used, as it is a rendition using 1 pixel wide lines. Lines that are thinner than a single pixel render the same pixels more faintly. Hence, to produce an image in which the A1 lines are 0.5 pixels wide, the 1 pixel wide line image can be multiplied by an alpha of 0.5.
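The selection of pre-rendition zoom levels can be sketched as a simple walk along the scaling curve. This is an illustrative sketch (reusing the A1 curve p built in the earlier sketch), not the procedure that produced FIG. 18:

```python
def pick_prerender_zooms(p, z_min, z_max, max_delta_px=2.0, factor=1.01):
    """Choose zoom levels (meters/pixel) for pre-rendition such that the
    rendered width given by the decreasing scaling curve p changes by no
    more than max_delta_px pixels between adjacent levels."""
    zooms = [z_min]
    z, w_last = z_min, p(z_min)
    while z < z_max:
        z = min(z * factor, z_max)   # geometric walk along the zoom axis
        if w_last - p(z) >= max_delta_px or z >= z_max:
            zooms.append(z)
            w_last = p(z)
    return zooms

# Pre-render only the range where interpolation is needed for A1 roads,
# roughly 9 to 1000 meters/pixel as discussed above:
print([round(z) for z in pick_prerender_zooms(p, 9.0, 1000.0)])
```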
[0086] In practice, a somewhat larger set of resolutions is prerendered, such that over each interval between resolutions, none of the scaling curves of FIG. 15 varies by more than one pixel. Reducing the allowed variation to one pixel can result in improved rendering quality. Notably, the tiling techniques contemplated and discussed in the following co-pending application may be considered in connection with the present invention: U.S. Patent Application No. 10/790,253, entitled SYSTEM AND METHOD FOR EXACT RENDERING IN A ZOOMING USER INTERFACE, filed March 1, 2004, Attorney Docket No. 489/2, the entire disclosure of which is hereby incorporated by reference. This tiling technique may be employed for resolving an image at a particular zoom level, even if that level does not coincide with a pre-rendered image. If each image in the somewhat larger set of resolutions is pre-rendered at the appropriate resolution and tiled, then the result is a complete system for zooming and panning navigation through a roadmap of arbitrary complexity, such that all lines appear to vary in width continuously in accordance with the scaling equations disclosed herein.
[0087] Additional details concerning other techniques for blending images, which may be employed in connection with implementing the present invention, may be found in U.S. Provisional Patent Application No. 60/475,897, entitled SYSTEM AND METHOD FOR THE EFFICIENT, DYNAMIC AND CONTINUOUS DISPLAY OF MULTI RESOLUTIONAL VISUAL DATA, filed June 5, 2003, the entire disclosure of which is hereby incorporated by reference. Still further details concerning blending techniques that may be employed in connection with implementing the present invention may be found in the
aforementioned U.S. Provisional Patent Application Serial No. 60/453,897.
[0088] Advantageously, employing the above-discussed aspects of the present invention, the user enjoys the appearance of smooth and continuous navigation through the various zoom levels. Further, little or no detail abruptly appears or disappears when zooming from one level to another. This represents a significant advancement over the state of the art.
[0089] It is contemplated that the various aspects of the present invention may be applied in numerous products, such as interactive software applications over the Internet, automobile-based software applications and the like. For example, the present invention may be employed by an Internet website that provides maps and driving directions to client terminals in response to user requests. Alternatively, various aspects of the invention may be employed in a GPS navigation system in an automobile. The invention may also be incorporated into medical imaging equipment, whereby detailed information concerning, for example, a patient's circulatory system, nervous system, etc. may be rendered and navigated as discussed hereinabove. The applications of the invention are too numerous to list in their entirety, yet a skilled artisan will recognize that they are contemplated herein and fall within the scope of the invention as claimed.
[0090] The present invention may also be utilized in connection with other applications in which the rendered images provide a means for advertising and otherwise advancing commerce. Additional details concerning these aspects and uses of the present invention may be found in U.S. Provisional Patent Application No.
60/ , entitled METHODS AND APPARATUS FOR EMPLOYING MAPPING
TECHNIQUES TO ADVANCE COMMERCE, filed on even date herewith, Attorney Docket No. 489/7, the entire disclosure of which is hereby incorporated by reference. [0091] Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims.
[Drawing-sheet labels: FIGS. 1-4 (PRIOR ART); FIGS. 5-11 (map views 100A-100G); FIGS. 12-13; FIGS. 14-15 (axis: zoom level); FIGS. 16-17; FIG. 18 (axis: meters/pixel).]

METHOD FOR SPATIALLY ENCODING LARGE TEXTS, METADATA, AND OTHER COHERENTLY ACCESSED NON-IMAGE DATA
Recently, image compression standards such as JPEG2000/JPIP1 have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel. When such images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate.

The present invention relates to an extension of these selectively decompressable image compression and transmission technologies to textual or other non-image data. In the simplest instantiation, imagine a large text, e.g. the book Ulysses, by James Joyce. We can format this text by putting each chapter in its own column, with columns for sequential chapters arranged left-to-right. Columns are assumed to have a maximum width in characters, e.g. 100. Figure 1 shows the entire text of Ulysses encoded as an image in this fashion, with each textual character corresponding to a single pixel. The pixel intensity value in Figure 1 is simply the ASCII code of the corresponding character. Because greyscale pixels and ASCII characters both fit in 8 bits (values 0-255), the correspondence between a pixel value and a character is quite natural.

The full text of Ulysses in ordinary ASCII representation (i.e. as a .txt file) is 1.5MB, which may be too large to transmit in its entirety over a narrowband communication channel. The ASCII text image of Figure 1, encoded as a lossless JPEG2000, is 2.2MB. This size would be somewhat reduced if the chapters of the book were more equal in length, resulting in less empty space (encoded as zeros) in the text-image. Much more important than the overall file size, however, is the ability of an ordinary JPIP server to serve this file to a client selectively and incrementally. Any client viewing the text at a
zoom level sufficient to read the characters (this requires well over 1 pixel/character on the client-side display) can use JPIP to request only the relevant portion of the text. This operation is efficient, and adequate performance could be achieved for a reader of the text even with a very low bandwidth connection to the server, under conditions that would make it prohibitive to download the entire text.

Note that similar effects could be achieved using a client/server technology specifically designed for selective access to large texts, but the text-image approach (as we will call it) has a number of advantages over conventional implementations:
- it uses existing technology and protocols designed for very large data volume
- it easily scales up to texts many gigabytes in size, or more, without any degradation of performance
- because access is inherently two-dimensional, in many situations (for example, when text is to be viewed in columns as in the Ulysses case) this approach is much more efficient than approaches that deal with text as a one-dimensional string
- because wavelets are used in JPEG2000, the text is subject to a multiresolution representation; although the text cannot be read at other than the final (most detailed) resolution, the broader spatial support of lower-resolution wavelets naturally provides an intelligent client-side cache for text near the region of interest; moving the ROI, as during scrolling, is thus highly efficient.

Extending this approach to deal with formatted text, Unicode, or other metadata is straightforward, as all such data can be represented using ASCII text strings, possibly with embedded escape sequences. In many applications, JPEG2000 is used as a lossy compression format, meaning that the decoded image bytes are not necessarily identical to the original bytes. Clearly if the image bytes represent text, lossy compression is not acceptable. One of the design goals of JPEG2000 was, however, to support lossless compression efficiently, as this is important in certain sectors of the imaging community (e.g. medical and scientific). Lossless compression ratios for photographic images are typically only around 2:1, as compared with visually acceptable lossy images, which can usually easily be compressed by 24:1.

Image compression, both lossy and lossless, operates best on images that have good spatial continuity, meaning that the differences between the intensity values of adjacent pixels are minimized. The raw ASCII encoding is clearly not optimal from this perspective. One very simple way to improve the encoding is to reorder characters by frequency in the text or simply in the English language, from highest to lowest: code 0 remains empty space, code 1 becomes the space character, and codes 2 onward are e, t, a, o, i, n, s, r, h, l, etc. Figures 2 and 3 compare text-images with ASCII encoding and with this kind of character frequency encoding. Clearly pixel values tend to cluster near zero; at least as importantly, the difference between one letter and the next tends to be substantially decreased. When this frequency-encoded text-image is compressed losslessly as a JPEG2000, the file size is 1.6MB, barely larger than the raw ASCII text file (1.5MB) and 37% smaller than the ASCII-encoded text-image. With further optimizations of the letter encoding, the compressed file size can drop well below the ASCII text file size.

[Footnote 1: See e.g. David Taubman's implementation of Kakadu, www.kakadusoftware.com. Taubman was on the JPEG2000 ISO standards committee.]
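A minimal sketch of the text-image encoding with the frequency remapping just described (single column only; the chapter-per-column layout and the JPEG2000 step are omitted, and all names are illustrative):

```python
from collections import Counter

def frequency_remap(text):
    """Code table as described above: 0 stays empty space, 1 is the space
    character, and codes 2, 3, ... go to the remaining characters in
    order of decreasing frequency in the text."""
    ranked = [c for c, _ in Counter(text.replace(" ", "")).most_common()]
    return {" ": 1, **{c: i + 2 for i, c in enumerate(ranked)}}

def text_to_image(text, width=100):
    """One character per 8-bit pixel, one text line per pixel row,
    padded to `width` columns with code 0 (empty space)."""
    table = frequency_remap(text)
    rows = [[table[c] for c in line[:width]] + [0] * (width - len(line[:width]))
            for line in text.split("\n")]
    return rows, table

rows, table = text_to_image("Stately, plump Buck Mulligan\ncame from the stairhead")
print(rows[0][:10])   # codes cluster near zero, which compresses well
```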
The further optimizations can include, but are not limited to:
- using letter transition probabilities (Markov-1) to develop the encoding, instead of just frequencies (Markov-0)
- encoding as pixels the delta or difference between one character and the next, rather than the characters themselves (a sketch of this variant follows below).

With these added optimizations, we add to the advantages listed earlier that on the server side, text ready to be served in this fashion is actually compressed relative to the raw ASCII. The new invention is discussed here in the context of JPEG2000/JPIP as a selective image decompression technology, but nothing about the invention limits it to that particular format or protocol. For example, LizardTech's MrSID format and protocol has similar properties, and would also work.
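A minimal sketch of the delta variant mentioned in the list above; modulo-256 differences keep each value within a byte and are exactly invertible:

```python
def delta_encode(codes):
    """Pixel = difference between a character code and its predecessor,
    modulo 256; runs of similar letters become runs of near-zero pixels."""
    out, prev = [], 0
    for c in codes:
        out.append((c - prev) % 256)
        prev = c
    return out

def delta_decode(pixels):
    out, prev = [], 0
    for d in pixels:
        prev = (prev + d) % 256
        out.append(prev)
    return out

codes = [ord(c) for c in "seattle street"]
assert delta_decode(delta_encode(codes)) == codes   # lossless round trip
```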
Figure 1. Full Ulysses text-image, raw ASCII encoding (white=0, black=255).
Figure 2. Text-image of first five chapters of Ulysses (truncated). Raw ASCII encoding; white=0, black=255.
Figure 3. Text-image of first five chapters of Ulysses (truncated), encoded by frequency (simplest remapping).
METHOD FOR ENCODING AND SERVING GEOSPATIAL OR OTHER VECTOR DATA AS IMAGES
Recently, image compression standards such as JPEG2000/JPIP1 have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel. When such images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate.

The present invention relates to an extension of these selectively decompressable image compression and transmission technologies to geospatial or schematic data. It combines and extends methods introduced in previous applications (1) Method for spatially encoding large texts, metadata, and other coherently accessed non-image data, attached as Exhibit A, and (2) METHODS AND APPARATUS FOR NAVIGATING AN IMAGE, attached as Exhibit B.

In (2), the concept of continuous multiscale roadmap rendering was introduced. The basis for the invention of (2) is a pre-rendered "stack" of images of a roadmap or other vector-based diagram at different resolutions, in which categories of visual elements (e.g. classes of road, including national highway, state highway, and local road) are rendered with different visual weights at different resolutions. During client/server interaction, corresponding areas of more than one of these images are downloaded, and the client's display shows a blended combination of these areas; the blending coefficients and the choice of image resolutions to be blended depend on zoom scale. The net result is that a user on the client side can navigate through a large map (e.g. all roads in the United States), zooming and panning continuously, without experiencing any visual discontinuities, such as
categories of roads appearing or disappearing as the zoom scale is changed. Rather, at every scale, the most relevant categories are accentuated; for example, when zoomed out to view the entire country, the largest highways are strongly weighted, making them stand out clearly, while at the state level, secondary highways are also weighted strongly enough to be clear. When the user zooms in to the point where the most detailed pre-rendered image is being used, all roads are clearly visible, and in the preferred embodiment for geospatial data, all elements are shown at close to their physically correct scale.

A maximum reasonable resolution for these most detailed pre-rendered images may be about 15 meters/pixel; however, it is desirable from the user's standpoint to be able to zoom in farther. Pre-rendering at higher detail is not desirable for several reasons: first, because the file sizes on the server side become prohibitive (a single Universal Transverse Mercator zone image at 15 meters/pixel may already be in the gigapixel range); second, because a pre-rendered image is an inefficient representation for the kind of very sparse black-and-white data normally associated with high-resolution map rendering; and third, because the client may require the "real" vector data for performing computational tasks beyond static visual presentation. For example, a route guidance system may highlight a road or change its color; this can be done on the client side only if the client has access to vector data, as opposed to a pre-rendered image alone. Vector data may also include street names, addresses, and other information which the client must have the flexibility to lay out and render selectively. Pre-rendering street name labels into the map image stack is clearly undesirable, as these labels must be drawn in different places and sizes depending on the precise location and scale of the client view; different label renditions should not blend into one another as the user zooms. Pre-rendering such data would also eliminate any flexibility with regard to font.

To summarize, vector data (where we use the term generically to refer both to geometric and other information, such as place names) is both important to the client in its own right, and a more efficient representation of the information than pre-rendered imagery, when the desired rendering resolution is high. Note, however, that if a large area is to be rendered at low resolution, the vector data may become prohibitively large and complex, making the pre-rendered image the more efficient representation. Even at low resolution, however, some subset of the vector data is necessary, such as the names of major highways.

The present invention extends the methods introduced in (1) to allow spatial vector data to be encoded and transmitted selectively and incrementally to the client, possibly in conjunction with the pre-rendered imagery of (2). Using prior art, this would be accomplished using a geospatial database. The database would need to include all relevant vector data, indexed spatially. Such databases present many implementation challenges. Here, instead of using conventional databases, we use spatially addressable images, such as those supported by JPEG2000/JPIP, to encode and serve the vector data.

[Footnote 1: See e.g. David Taubman's implementation of Kakadu, www.kakadusoftware.com. Taubman was on the JPEG2000 ISO standards committee.]
In our exemplary embodiment, three images or channels are used for representing the map data, each with 8 bit depth:
- the prerendered layer is a precomputed literal rendition of the roadmap, as per (2);
- the pointer layer consists of 2*2 pixel blocks positioned at or very near the roadmap features to which they refer, typically intersections;
- the data layer consists of n*m pixel blocks centered on or positioned near the 2*2 pointers which refer to them.

Because these three channels are of equal size and in registration with each other, they can be overlaid in different colors (red, green, blue on a computer display, or cyan, magenta, yellow for print media) to produce a single color image. Such images are reproduced in Figures 2-3. Figure 1 shows the prerendered layer alone, for comparison and orientation. The region shown is King County, in Washington state, which includes Seattle and many of its suburbs. Figures 3a and 3b are closeups from suburban and urban areas of the map, respectively.
Figure 1. Prerendered roadmap of King County, WA.
Figure 2. Color version showing prerendered roads (yellow), pointers (magenta) and data (cyan).
Figure 3a. Closeup of suburban area of King County.
Figure 3b. Closeup of urban area of King County.

If the user navigates to the view of the map shown in Figure 3a, then the client will request from the server the relevant portions of all three image layers, as shown. The prerendered layer (shown in yellow) is the only one of the three displayed on the screen as is. The other two specify the vector data. The pointer image consists of 2x2 pixel blocks aligned on a 2x2 pixel grid, each of which specifies an (x,y) vector offset (with the x and y components of the vector each comprising a 16-bit integer, hence two pixels each) from its own location to the beginning (top left corner) of a corresponding data block in the data layer. The corresponding data block, in turn, begins with two 16-bit values (four pixels) specifying the data block width and height. The width is specified first, and is constrained to be at least 2, hence avoiding ambiguities in reading the width and height. The remainder of the data block can be treated as binary data which may contain any combination of vectors, text, or other information. In the examples of Figures 2-3, data blocks contain streetmap information including street names, address ranges, and vector representations.

The pointer and data layers are precomputed, just as the prerendered layer is. Precomputation for the pointer and data layers consists of encoding all of the relevant vector data into data blocks, and packing both the pointers and data blocks as efficiently as possible into their respective images. In rural or sparse suburban areas (see Figure 3a), features tend to be well-separated, resulting in large empty areas in the pointer and data images. Where pointers do occur, they fall precisely on the feature to which they refer, and their corresponding data blocks are in turn often centered precisely on the pointer. In dense urban areas, however (see Figure 3b), features are often too close together for the pointers and data blocks to all fit. It is therefore necessary to use a rectangle packing algorithm to attempt to place pointers and data blocks as close to their desired positions as possible without any overlaps. The results are evident in Figure 3b: the lakes and coasts near Seattle are filled with data blocks corresponding to features on the land. Because all urban areas are surrounded by sparser areas (suburbs, mountains, or bodies of water), it is always possible to place urban data blocks somewhere on the map; in general, even within a dense city there are enough empty spaces that this "spillover" is not overly severe. The higher the rate of spillover, the less well-localized the map vector data becomes. Spillover decreases drastically as the resolution of the data layer image is increased, and it is always possible to find a resolution at which efficiency and non-locality are appropriately balanced. In North America, 15m/pixel appears to be a good choice. It is "overkill" in rural areas, but near cities, it limits spillover as shown in Figures 2 and 3b.

Efficient rectangle packing is a computationally difficult problem; however, there are numerous approximate algorithms for solving it in the computational geometry literature, and the present invention does not stipulate any particular one of these. The algorithm actually used involves a hierarchical "rectangle tree", which makes the following operations fast:
- test whether a given rectangle intersects any other rectangle already in the tree;
- insert a non-overlapping rectangle;
- find the complete set of "empty corners" (i.e. corners abutting already-inserted rectangles that border on empty space) in a ring of radius r0 ≤ r < r1 around a target point p.

The "greedy algorithm" used to insert a new rectangle as close as possible to a target point then proceeds as follows:
1. Attempt to insert the rectangle centered on the target point. If this succeeds, the algorithm ends.
2. Otherwise, define radius r0 to be half the minimum of the length or width of the rectangle, and r1 = 2*r0.
3. Find all "empty corners" between r0 and r1, and sort by increasing radius.
4. Attempt to place the rectangle at each of these corners in sequence; on success, the algorithm ends.
5. If none of the attempted insertions succeeds, set r0 to r1, set r1 to 2*r0, and go to step 3.

This algorithm always succeeds in ultimately placing a rectangle provided that somewhere in the image an empty space of at least the right dimensions exists. It is "greedy" in the sense that it places a single rectangle at a time; it does not attempt to solve the holistic problem of packing n rectangles as efficiently as possible. (A holistic algorithm would require defining an explicit measure of packing efficiency, specifying the desired tradeoff between minimizing wasted space and minimizing distance between rectangles and their "target points". The greedy algorithm does not require explicitly specifying this tradeoff, as is clear from the algorithm description above.) Figure 4 demonstrates the output of the basic packing algorithm for three cases. In each case, the algorithm sequentially placed a number of rectangles as near as possible to a common point. This solution to the rectangle packing problem is provided by way of example only.
Figure 4. Test output of the greedy rectangle packing algorithm. On the left, predominantly small, skinny rectangles; in the center, large, square rectangles; and on the right, a mixture.
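The greedy scheme can be sketched compactly. This is an illustration only, not the embodiment's code: a brute-force overlap test stands in for the hierarchical rectangle tree, and candidate centers sampled on rings of doubling radius stand in for the "empty corner" search.

```python
import math

def overlaps(rect, placed):
    """Axis-aligned overlap test against all placed rectangles (the
    rectangle tree would make this fast; brute force suffices here)."""
    x, y, w, h = rect
    return any(x < qx + qw and qx < x + w and y < qy + qh and qy < y + h
               for qx, qy, qw, qh in placed)

def place_greedy(w, h, target, placed, bounds):
    """Insert a w x h rectangle as close as possible to `target`,
    appending it to the shared `placed` list on success."""
    bw, bh = bounds
    def try_at(cx, cy):
        x, y = cx - w / 2, cy - h / 2
        rect = (x, y, w, h)
        if 0 <= x and 0 <= y and x + w <= bw and y + h <= bh \
                and not overlaps(rect, placed):
            placed.append(rect)
            return rect
        return None
    if (r := try_at(*target)) is not None:   # step 1: centered on the target
        return r
    r0 = min(w, h) / 2                       # step 2: initial search ring
    r1 = 2 * r0
    while r0 < max(bw, bh):
        for k in range(64):                  # steps 3-4: candidates on the ring
            ang = 2 * math.pi * k / 64
            for rad in (r0, (r0 + r1) / 2, r1):
                cx = target[0] + rad * math.cos(ang)
                cy = target[1] + rad * math.sin(ang)
                if (r := try_at(cx, cy)) is not None:
                    return r
        r0, r1 = r1, 2 * r1                  # step 5: widen the ring
    return None  # no empty space of the right dimensions anywhere

placed = []
for _ in range(6):   # pack six 20x10 blocks as near one point as possible
    print(place_greedy(20, 10, (128, 128), placed, (256, 256)))
```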
For the greedy packing algorithm not to give placement preference to any specific areas of the map, it is desirable to randomize the order of rectangle insertion. In a preferred embodiment, pointer/data block pairs are thus inserted in random order. Other orderings may further improve packing efficiency in certain circumstances; for example, inserting large blocks before small ones may minimize wasted space.

Pointers are always 2x2 (our notation is rows x columns); however, for data blocks, there is freedom in selecting an aspect ratio: the required block area in square pixels is determined by the amount of data which must fit in the block, but this area can fit into rectangles of many different shapes. For example, a 24 byte data block (including 4 bytes of width and height information, and 20 bytes of arbitrary data) can be represented exactly as 1x24, 2x12, 3x8, 4x6, 6x4, 8x3, or 12x2. (24x1 is disqualified, as the block width must be at least 2 for the 2-byte width to be decoded before the block dimensions are known on the client side, as described above.) The block can also be represented, with one byte left over, as 5x5. We refer to the set of all factorizations listed above, in addition to the approximate factorization 5x5, as "ceiling factorizations". The requirements for a valid ceiling factorization are that its area be at least the required area, and that no row or column be entirely wasted; for example, 7x4 or 3x9 are invalid ceiling factorizations, as they can be reduced to 6x4 and 3x8 respectively. In the simplest implementation, block dimensions may be selected based only on a ceiling factorization of the data length; in general, "squarer" blocks (such as 4x6) pack better than oblique ones (such as 2x12). The simplest data block sizing algorithm would thus select either 4x6 or 5x5, depending on how it trades off "squareness" against wasted bytes (a sketch of such a selection follows below). More sophisticated block size selection algorithms may pick block dimensions adaptively, as part of the search for empty space near the target point. In one implementation, steps 1 and 4 of the algorithm above are then modified as follows:
- Step 1: Sort the ceiling factorizations of the required data length by desirability, with preference for squarer factorizations and possibly a penalty for wasted bytes. Attempt to place rectangles of dimensions given by each ceiling factorization in turn at the target point p. If any of these insertions succeeds, the algorithm ends.
- Step 4: For each "empty corner" c in turn, attempt to place rectangles of dimensions given by each of the ceiling factorizations in turn at c. On success, the algorithm ends.
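The enumeration and scoring of ceiling factorizations can be sketched directly from the rules above (the scoring weights are an illustrative assumption):

```python
def ceiling_factorizations(n):
    """All valid ceiling factorizations (rows, cols) of an n-byte block:
    area at least n, cols at least 2 (so the 2-byte width is readable
    before the dimensions are known), and no entirely wasted row or
    column, i.e. rows*(cols-1) < n and (rows-1)*cols < n."""
    out = []
    for rows in range(1, n + 1):
        cols = -(-n // rows)   # ceil(n / rows)
        if cols >= 2 and rows * (cols - 1) < n and (rows - 1) * cols < n:
            out.append((rows, cols))
    return out

def pick_block_shape(n, waste_penalty=1.0):
    """Choose a shape, preferring squarer blocks and penalizing waste."""
    return min(ceiling_factorizations(n),
               key=lambda rc: abs(rc[0] - rc[1]) + waste_penalty * (rc[0] * rc[1] - n))

print(ceiling_factorizations(24))  # 1x24, 2x12, ..., incl. the approximate 5x5
print(pick_block_shape(24))        # (5, 5) under these weights
```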
For each "empty corner" c in turn, attempt to place rectangles of dimensions given by each of the ceiling factorizations in turn at c. On success, algorithm ends. Further refinements of this algorithm involve specification of a scoring function for insertions, which, as with a wholistic optimization function, trade off wasted space, non- square aspect ratio and distance from the target point. Each of the three map layers — prerendered roads, pointers and data — is stored as a JPEG2000 or similar spatially-accessible representation. However, the storage requirements for the three layers differ. The prerendered road layer need not be lossless; it is only necessary for it to have reasonable perceptual accuracy when displayed. At 15m/pixel, we have found 0.5 bit/pixel lossy wavelet compression to be fully adequate. The pointer and data layers, however, must be represented losslessly, as they contain data which the client must be able to reconstruct exactly. Lossless compression is not normally very efficient; typical digital imagery, for example, is not usually compressible losslessly by more than a factor of about two at best. For most forms of either lossy or lossless compression, performance can be optimized by making the image function small in magnitude, hence occupying fewer significant bits. Therefore, in enhanced embodiments, special coding techniques are used to "flatten" the original data. The outcome of these techniques is apparent in Figure 5, which shows the same densely populated region of a data image before and after "flattening". Note that before flattening, the data image has full 8-bit dynamic range, and exhibits high frequencies and stuctured patterns that make it compress very poorly (in fact, a lossless JPEG2000 of this image is no smaller than the original raw size). After "flattening", most of the structure is gone, and a great majority of the pixels have values < 8, hence fitting in 3 bits. The corresponding JPEG2000 has better than 3:1 compression. "Flattening" can consist of a number of simple data transformations, including the following (this is the complete list of transformations applied in the example of Figure 5): 16-bit unsigned values, such as the width or height of the data block, would normally be encoded using a high-order byte and a low-order byte. We may require 16 bits because values occasionally exceed 255 (the 8-bit limit) by some unspecified amount, yet in the majority of cases values are small. For a value that fits in 8 bits, the high-order byte would be zero. Frequent zero high-order bytes followed by significant low-order bytes account for much of the 2-pixel periodicity apparent in parts of Figure 5 a. We can remap the 16 bits as follows:
The left eight columns represent the first pixel of the pair, previously the high-order byte; the rightmost eight columns represent the second pixel, previously the low-order byte. By redistributing the bits in this way, the range of accessible values (0-65535) remains unchanged, but the two bytes become much more symmetric. For example, for all 16-bit values 0-255, the two bytes each assume values < 16. Similar techniques apply to 32-bit or larger integer values. These techniques are also extensible to signed quantities. For variables in which the sign changes frequently, as occurs for differential coding of a road vector, a sign bit can be assigned to position 0, and the absolute value encoded in alternating bytes as above.

Note that to be drawn convincingly, road vector data must often be represented at greater than pixel precision. Arbitrary units smaller than a pixel can instead be used, or equivalently, subpixel precision can be implemented using fixed point in combination with the above techniques. In our exemplary embodiment, 4 subpixel bits are used, for 1/16 pixel precision.

When numbers are encoded as described above, it is desirable to make the numbers as small as possible. Sometimes context suggests an obvious way to do this; for example, since the width of any data block must be at least 2, we subtract 2 from data width before encoding. More significantly, both pointers and any position vectors encoded in a data block are specified in pixels relative to the pointer position, rather than absolute coordinates. This not only decreases the magnitude of the numbers to encode greatly; it also allows a portion of the data image to be decoded and rendered vectorially in a local coordinate system without regard for the absolute position of this portion. For vector rendering of a sequence of points defining a curve (for example, of a road), only the first point need be specified relative to the original pointer position; subsequent points can be encoded as "deltas", or step vectors from the previous point. After the second such point, subsequent points can be encoded as the second derivative, or the difference between the current and previous delta (sketched below). Encoding using the second derivative is generally efficient for such structures as roads, since they tend to be discretizations of curves with continuity of the derivative — that is, they change their direction gradually.

Another "flattening" technique is described in [1] for use with textual data, which would normally be encoded as ASCII, with a single character per byte. In the application described in [1], English text is being encoded, and hence the letters are remapped based on decreasing frequency of letter occurrence in a representative sample of English. The same technique can be used in this context, although the text encoded in a map, consisting mostly of street names, has quite different statistics from ordinary English. Numerals and capital letters, for example, are far more prominent. Note that the particular methods for encoding of pointers or data as presented above are exemplary; many other encodings are also possible. "Good" encodings generally result in images which are smooth and/or have low dynamic range.
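The relative-coordinate and second-derivative coding of road vectors can be sketched as follows (subpixel fixed point and the byte-level remapping above are omitted; names are illustrative):

```python
def encode_polyline(points, origin):
    """Differentially code a road polyline as described above: the first
    point relative to the pointer position, then a step vector ("delta"),
    then second differences, which stay near zero for gently curving roads."""
    x0, y0 = origin
    rel = [(x - x0, y - y0) for x, y in points]
    out = [rel[0]]
    prev_delta = (0, 0)
    for (ax, ay), (bx, by) in zip(rel, rel[1:]):
        dx, dy = bx - ax, by - ay
        out.append((dx - prev_delta[0], dy - prev_delta[1]))
        prev_delta = (dx, dy)
    return out

# A gently curving road near its pointer at (100, 100):
print(encode_polyline([(102, 100), (106, 101), (110, 102), (114, 104)], (100, 100)))
# [(2, 0), (4, 1), (0, 0), (0, 1)]: small values that "flatten" well
```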
Figure 5. The same binary 8-bit data (taken from a dense region of the roadmap data image of the Virgin Islands) before (above, a) and after (below, b) "flattening".
Using the techniques above, the King County roadmap at 15 m/pixel compresses as follows: [table not reproduced in this copy].
Surprisingly, the JPEG2000 representation (including lossy pre-rendered roadmap image, lossless pointer layer, and lossless data layer) is actually smaller than the compressed ZIP file representing the original data as tabulated text. (This file is part of the United States Census Bureau's 2002 TIGER/Line database.) Unlike the original ZIP, however, the new representation is ready to serve interactively to a client, with efficient support for continuously pannable and zoomable spatial access. The original prerendered multiscale map invention introduced in [2] included not a single prerendered image, but a stack of such images, rendered at progressively coarser resolutions and with rescaled weights for lines (or other visible features). Although no features are omitted from any of these prerenditions, some features are de-emphasized enough to be clearly visible only in an aggregate sense, e.g. the local roads of a city become a faint grey blur at the statewide level. The present invention can be extended to include pointer and data images corresponding to the coarser prerendered roadmap images, in which only a subset of the original vector objects are represented. For example, statewide pointer and data images, which are at much lower resolution than those of Figures 1-3, might only include data for state and national highways, excluding all local roads. These coarser data may also be "abstracts", for example specifying only road names, not vectors. Images at different resolutions might include varying mixtures or subsets of the original data, or abstracted versions. This technique both allows all of the relevant data to fit into the smaller coarse images, and provides the client with the subset of the vector information relevant for navigation at that scale. Although the implementation outlined above suggests an 8-bit greyscale prerendered map image at every resolution, the prerendered images may also be in color. Further, the prerendered images may be displayed by the client in color even if they are single-channel images, since the vector data can be used to draw important roads in different colors than the prerendered material. Finally, the prerendered images may omit certain features or roads present in the vectorial data, relying on the client to composite the image and vectorial material appropriately.
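Putting the pointer and data-block layout described above together, client-side decoding of a single feature can be sketched as follows. Byte order, the unsigned treatment of offsets, and the omission of the "flattening" transforms are simplifying assumptions of this sketch, not stipulations of the text.

```python
def decode_feature(pointer_img, data_img, py, px):
    """Follow the 2x2 pointer block whose top-left pixel is at row py,
    column px to its data block, and return the block's payload bytes."""
    def u16(img, y, x):            # two adjacent 8-bit pixels -> 16 bits
        return (img[y][x] << 8) | img[y][x + 1]

    dx = u16(pointer_img, py, px)       # first pixel pair: x offset
    dy = u16(pointer_img, py + 1, px)   # second pixel pair: y offset
    by, bx = py + dy, px + dx           # top-left corner of the data block

    w = u16(data_img, by, bx)           # width comes first; w >= 2 guarantees
                                        # it fits in the block's first row
    def pixel(i):                       # stream the block's pixels row-major
        return data_img[by + i // w][bx + i % w]

    h = (pixel(2) << 8) | pixel(3)      # height may wrap onto the next row
    return [pixel(i) for i in range(4, w * h)]
```

Reading the width from the first two pixels before anything else is exactly why the block width is constrained to be at least 2: until w is known, the row-major layout of the remaining pixels cannot be interpreted.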
END OF APPENDIX

Claims

1. A method of transmitting information indicative of an image comprising transmitting one or more nodes of information as a first image, transmitting a second image including information indicative of vectors defining characteristics to be utilized for display at predetermined locations in said first image, and transmitting a third image comprising a mapping between said first and second images such that a receiver of said first and second images can correlate said first and second images to utilize said vectors at said predetermined locations.
2. The method of claim 1 wherein said first image is a map and wherein said second image is a set of vectors defining visual data that is only displayed at predetermined levels of detail.
3. The method of claim 1 wherein said first image is a map.
4. The method of claim 1 wherein said second image includes hyperlinks.
5. The method of claim 1 wherein said first image is a map and wherein said second image includes a set of vectors and wherein plural ones of said vectors are located at locations corresponding to locations on said first image wherein said vectors are to be applied, and wherein plural ones of said vectors are located at locations on said second image which do not correspond to said locations on said first image wherein said vectors are to be applied.
6. The method of claim 5 further comprising utilizing an efficient packing algorithm to construct said second image to decrease an amount of space between a location on said second image at which one or more vectors appear, and a location on said first image where said one or more vectors are to be applied.
7. The method of claim 6 wherein said vectors include information to launch a node or sub-node.
8. A method of rendering an image comprising receiving a first, second, and third set of data from a remote computer, the first data set being representative of an image, the second being representative of vectors defining characteristics of said image at prescribed locations, and the third data set serving to prescribe said locations.
9. The method of claim 8 wherein said prescribed locations are street locations on a map.
10. The method of claim 8 wherein said vectors represent sub-nodes and include information indicative of under what conditions said sub-nodes should launch.
11. The method of claim 8 wherein said vectors include hyperlinks to at least one of the group consisting of: external web sites and embedded visual content.
12. The method of claim 8 wherein said vectors include hyperlinks to advertising materials.
13. The method of claim 8 wherein said vectors include information specifying a rendering method for portions of an image at predetermined locations in said image.
14. A method, comprising: providing a first layer of an image, said first layer including features of said image having locations within said first layer; and providing a second layer of said image, said second layer including data blocks corresponding to respective ones of said features; each said data block being in a location in said second layer substantially corresponding to a location in said first layer of the feature corresponding to each said data block, wherein a size and shape of said second layer substantially correspond to a size and shape of said first layer.
15. The method of claim 14 wherein each said data block describes at least one characteristic of the feature corresponding to each said data block.
16. The method of claim 14 further comprising: providing a third layer of said image, said third layer including pointers, each said pointer corresponding to a respective one of said features and a respective one of said data blocks.
17. The method of claim 16 wherein each said pointer indicates the location of each said pointer's corresponding data block with respect to each said pointer's location.
18. The method of claim 15 wherein said describing comprises: providing text data for at least one said feature.
19. The method of claim 15 wherein said describing comprises: providing a graphical illustration of at least one said feature.
20. The method of claim 15 wherein said describing comprises: providing geometric data indicative of at least one said feature.
21. The method of claim 20 wherein said geometric data comprises contour data.
22. The method of claim 15 wherein said describing comprises: providing color information for at least one said feature.
23. The method of claim 15 wherein said describing comprises: providing at least one link to an external web site relating to at least one said feature.
24. The method of claim 15 wherein said describing comprises: providing embedded visual content relating to at least one said feature.
25. The method of claim 15 wherein said describing comprises: providing advertising information relating to at least one said feature.
26. The method of claim 15 wherein said describing comprises: providing schematic detail of a road segment.
27. The method of claim 15 wherein said describing comprises: providing schematic detail for at least one of the group consisting of: at least one road, at least one park, a topography of a region, a hydrography of a body of water, at least one building, at least one public restroom, at least one wireless fidelity station, at least one power line, and at least one stadium.
28. An apparatus including a processing unit operating under the control of one or more software programs that are operable to cause the processing unit to execute actions, comprising: providing a first layer of an image, said first layer including features of said image having locations within said first layer; and providing a second layer of said image, said second layer including data blocks corresponding to respective ones of said features; each said data block being in a location in said second layer substantially corresponding to a location in said first layer of the feature corresponding to each said data block, wherein a size and shape of said second layer substantially correspond to a size and shape of said first layer.
29. A storage medium containing one or more software programs that are operable to cause a processing unit to execute actions, comprising: providing a first layer of an image, said first layer including features of said image having locations within said first layer; and providing a second layer of said image, said second layer including data blocks corresponding to respective ones of said features; each said data block being in a location in said second layer substantially corresponding to a location in said first layer of the feature corresponding to each said data block, wherein a size and shape of said second layer substantially correspond to a size and shape of said first layer.
30. A method, comprising: providing a first layer of an image, said first layer including features of said image having locations within said first layer; providing a second layer of said image, said second layer including data blocks corresponding to and describing respective ones of said features, each said data block being in a location in said second layer at least substantially corresponding to a location in said first layer of the feature corresponding to each said data block; and providing a third layer of said image, said third layer including pointers having locations in said third layer, each said pointer corresponding to a respective one of said features and a respective one of said data blocks, the location of each said pointer in said third layer at least substantially corresponding to the location in said first layer of the feature corresponding to each said pointer.
31. The method of claim 30 wherein said second layer and said third layer each have a size and shape corresponding to a size and a shape of said first layer.
32. The method of claim 30 further comprising: forming a map image from a combination of said first layer, said second layer, and said third layer.
33. The method of claim 32 further comprising flattening data in said map image.
34. The method of claim 30 wherein each said pointer indicates the location of each said pointer's corresponding data block with respect to each said pointer's location.
35. The method of claim 34 wherein said indicating comprises identifying an offset in two dimensions.
36. The method of claim 35 wherein each said dimension of said offset is expressed in pixels.
37. The method of claim 35 wherein said indicating comprises identifying an offset as a one-dimensional distance along a Hilbert curve.
38. The method of claim 37 wherein said offset along said one-dimensional curve is expressed in units of pixels.
39. The method of claim 37 wherein said offset along said one-dimensional curve is expressed in units corresponding to integral multiples of pixels.
40. The method of claim 30 wherein said providing said second layer of said image comprises: locating each said data block employing a packing algorithm to achieve a maximum proximity of each said data block to a target location for each said data block in said second layer, said target location in said second layer corresponding to the location in said first layer of the feature corresponding to each said data block.
41. The method of claim 40 wherein said maximum proximity is determined based on a shortest straight-line distance between each said data block's location and said target location for each said data block.
42. The method of claim 40 wherein said maximum proximity is determined based on a sum of absolute values of offsets in each of two dimensions between each said data block's location and said target location for each said data block.
43. The method of claim 40 wherein said maximum proximity is determined based on a minimum Hilbert curve length between each said data block's location and said target location for each said data block.
44. A storage medium containing one or more software programs that are operable to cause a processing unit to execute actions, comprising: providing a first layer of an image, said first layer including features of said image having locations within said first layer; providing a second layer of said image, said second layer including data blocks corresponding to and describing respective ones of said features, each said data block being in a location in said second layer at least substantially corresponding to a location in said first layer of the feature corresponding to each said data block; and providing a third layer of said image, said third layer including pointers having locations in said third layer, each said pointer corresponding to a respective one of said features and a respective one of said data blocks, the location of each said pointer in said third layer at least substantially corresponding to the location in said first layer of the feature corresponding to each said pointer.
45. The storage medium of claim 44 wherein said second layer and said third layer each have a size and shape corresponding to a size and a shape of said first layer.
46. The storage medium of claim 44 wherein the processing unit executes the further action of: forming a map image from a combination of said first layer, said second layer, and said third layer.
47. The storage medium of claim 44 wherein each said pointer indicates the location of each said pointer's corresponding data block with respect to each said pointer's location.
48. The storage medium of claim 44 wherein said indicating comprises identifying an offset in two dimensions.
49. The storage medium of claim 44 wherein said providing said second layer of said image comprises: locating each said data block employing a packing algorithm to achieve a maximum proximity of each said data block to a target location for each said data block in said second layer, said target location in said second layer corresponding to the location in said first layer of the feature corresponding to each said data block.
50. The storage medium of claim 49 wherein said maximum proximity is determined based on a shortest straight-line distance between each said data block's location and said target location for each said data block.
51. An apparatus including a processing unit operating under the control of one or more software programs that are operable to cause the processing unit to execute actions, comprising: providing a first layer of an image, said first layer including features of said image having locations within said first layer; providing a second layer of said image, said second layer including data blocks corresponding to and describing respective ones of said features, each said data block being in a location in said second layer at least substantially corresponding to a location in said first layer of the feature corresponding to each said data block; and providing a third layer of said image, said third layer including pointers having locations in said third layer, each said pointer corresponding to a respective one of said features and a respective one of said data blocks, the location of each said pointer in said third layer at least substantially corresponding to the location in said first layer of the feature corresponding to each said pointer.
52. The apparatus of claim 51 wherein said second layer and said third layer each have a size and shape corresponding to a size and a shape of said first layer.
53. The apparatus of claim 51 wherein the processing unit executes the further action of: forming a map image from a combination of said first layer, said second layer, and said third layer.
54. The apparatus of claim 51 wherein each said pointer indicates the location of each said pointer's corresponding data block with respect to each said pointer's location.
55. The apparatus of claim 54 wherein said indicating comprises identifying an offset in two dimensions.
56. The apparatus of claim 51 wherein said providing said second layer of said image comprises: locating each said data block employing a packing algorithm to achieve a maximum proximity of each said data block to a target location for each said data block in said second layer, said target location in said second layer corresponding to the location in said first layer of the feature corresponding to each said data block.
57. The apparatus of claim 56 wherein said maximum proximity is determined based on a shortest straight-line distance between each said data block's location and said target location for each said data block.
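To make the claimed structures concrete, the sketches below illustrate them in Python; all function and variable names are hypothetical, and each data block is shrunk to a single cell for brevity. This first sketch corresponds to the three-layer arrangement of claims 44-48 (and the parallel apparatus claims 51-55): a feature layer, a data-block layer, and a pointer layer in which each pointer records a two-dimensional offset from its own location to the location of its data block.

    def build_layers(width, height, features, blocks):
        # features: {feature_id: (x, y)}; blocks: {feature_id: payload}.
        # All three layers share the first layer's size and shape
        # (claims 45, 52).
        feature_layer = {xy: fid for fid, xy in features.items()}
        data_layer, pointer_layer = {}, {}
        cells = [(x, y) for y in range(height) for x in range(width)]
        for fid, (fx, fy) in features.items():
            # Naive placement: first free cell in scan order. The patent
            # instead packs each block as near its feature as possible
            # (claims 49-50); see the next sketch.
            bx, by = next(xy for xy in cells if xy not in data_layer)
            data_layer[(bx, by)] = blocks[fid]
            # The pointer stored at the feature's location is a 2-D offset
            # to the block (claims 47-48 and 54-55).
            pointer_layer[(fx, fy)] = (bx - fx, by - fy)
        return feature_layer, pointer_layer, data_layer

    def lookup(pointer_layer, data_layer, x, y):
        # Follow the pointer at (x, y) to retrieve the feature's data block.
        dx, dy = pointer_layer[(x, y)]
        return data_layer[(x + dx, y + dy)]

In this sketch, lookup(pointer_layer, data_layer, fx, fy) returns the data block for the feature located at (fx, fy), using only the pointer stored at that same position.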
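Claims 49-50 (and 56-57) recite a packing algorithm that places each data block at a free location in the second layer as close as possible, by straight-line distance, to the target location derived from its feature. A minimal greedy sketch under that reading (hypothetical names; a brute-force scan is used here, where a practical implementation would search outward from the target rather than over the whole layer):

    import math

    def pack_blocks(targets, width, height):
        # targets: {block_id: (tx, ty)}, the target cell for each block.
        # Greedily give each block the free cell at the shortest Euclidean
        # (straight-line) distance from its target (claims 50, 57).
        occupied, placements = set(), {}
        for block_id, (tx, ty) in targets.items():
            best, best_d = None, math.inf
            for x in range(width):
                for y in range(height):
                    if (x, y) in occupied:
                        continue
                    d = math.hypot(x - tx, y - ty)
                    if d < best_d:
                        best, best_d = (x, y), d
            occupied.add(best)
            placements[block_id] = best
        return placements

Greedy placement of this kind is order-dependent: blocks packed early sit closer to their targets than blocks packed late, so the insertion order is itself a tuning knob.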
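Claim 43 instead measures proximity as length along a Hilbert curve, which tracks how far apart two cells land when the two-dimensional layer is read out in a locality-preserving one-dimensional order. A sketch using the standard xy-to-Hilbert-index conversion for a 2^k x 2^k grid (hypothetical names; the patent does not prescribe this particular construction):

    def hilbert_index(n, x, y):
        # Distance of cell (x, y) along the Hilbert curve over an n-by-n
        # grid, n a power of two; standard bit-twiddling construction.
        d, s = 0, n // 2
        while s > 0:
            rx = 1 if (x & s) > 0 else 0
            ry = 1 if (y & s) > 0 else 0
            d += s * s * ((3 * rx) ^ ry)
            # Rotate/reflect so the next scale is measured in the
            # sub-square's own frame.
            if ry == 0:
                if rx == 1:
                    x, y = n - 1 - x, n - 1 - y
                x, y = y, x
            s //= 2
        return d

    def hilbert_distance(n, a, b):
        # Curve length between two cells: the difference of their indices.
        return abs(hilbert_index(n, *a) - hilbert_index(n, *b))

On a 4 x 4 grid, for example, hilbert_distance(4, (1, 1), (2, 1)) is 11 even though the two cells are adjacent in the plane, illustrating how the curve metric can diverge from the straight-line distance of claims 50 and 57.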
EP05725818A 2004-03-17 2005-03-17 Method for encoding and serving geospatial or other vector data as images Withdrawn EP1756521A2 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US10/803,010 US7133054B2 (en) 2004-03-17 2004-03-17 Methods and apparatus for navigating an image
US10/854,117 US7042455B2 (en) 2003-05-30 2004-05-26 System and method for multiple node display
US61748504P 2004-10-08 2004-10-08
US62286704P 2004-10-28 2004-10-28
PCT/US2005/008924 WO2005089434A2 (en) 2004-03-17 2005-03-17 Method for encoding and serving geospatial or other vector data as images

Publications (1)

Publication Number Publication Date
EP1756521A2 true EP1756521A2 (en) 2007-02-28

Family

ID=34994346

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05725818A Withdrawn EP1756521A2 (en) 2004-03-17 2005-03-17 Method for encoding and serving geospatial or other vector data as images

Country Status (4)

Country Link
EP (1) EP1756521A2 (en)
JP (1) JP2007529786A (en)
CA (1) CA2559678C (en)
WO (1) WO2005089434A2 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5354924B2 (en) * 2007-02-16 2013-11-27 国立大学法人 名古屋工業大学 Digital map creation system
US8203552B2 (en) 2007-08-30 2012-06-19 Harris Corporation Geospatial data system for selectively retrieving and displaying geospatial texture data in successive additive layers of resolution and related methods
GB2457646B (en) * 2007-10-30 2010-03-03 Imagination Tech Ltd Method and apparatus for compressing and decompressing data
JP5561915B2 (en) * 2008-06-06 2014-07-30 三菱電機株式会社 Map drawing apparatus and program
US9218682B2 (en) 2008-10-15 2015-12-22 Nokia Technologies Oy Method and apparatus for generating an image
US8935292B2 (en) 2008-10-15 2015-01-13 Nokia Corporation Method and apparatus for providing a media object
WO2011123944A1 (en) * 2010-04-08 2011-10-13 Forensic Technology Wai, Inc. Generation of a modified 3d image of an object comprising tool marks
US9070225B2 (en) * 2011-10-03 2015-06-30 Oracle International Corporation Interactive display elements in a visualization component
EP2804151B1 (en) * 2013-05-16 2020-01-08 Hexagon Technology Center GmbH Method for rendering data of a three-dimensional surface
KR101866363B1 (en) * 2017-11-24 2018-06-12 공간정보기술 주식회사 Three-dimensional (3D) modeling generation and provision system based on user-based conditions
KR102030594B1 (en) * 2018-01-31 2019-11-08 가이아쓰리디 주식회사 Method of providing 3d gis web service
CN108732931B (en) * 2018-05-17 2021-03-26 北京化工大学 JIT-RVM-based multi-modal intermittent process modeling method
WO2020061336A1 (en) * 2018-09-20 2020-03-26 Paper Crane, LLC Automated geospatial data analysis
KR102081451B1 (en) * 2019-02-15 2020-02-25 김성용 Content sharing method using access point related to local area network, access point apparatus using said method, content upload method by client interworking with said access point, and content reception method by said client, and client apparatus said method
EP3944198A1 (en) * 2020-07-21 2022-01-26 Transport For London Zooming and scaling
CN112066997A (en) * 2020-08-25 2020-12-11 海南太美航空股份有限公司 Method and system for exporting high-definition route map
US20220205808A1 (en) * 2020-12-25 2022-06-30 Mapsted Corp. Localization using tessellated grids
CN113721802A (en) * 2021-08-18 2021-11-30 广州南方卫星导航仪器有限公司 Vector capturing method
CN117115241B (en) * 2023-09-06 2024-03-29 北京透彻未来科技有限公司 Method for searching central focus of digital pathological image in zooming process

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0384416A (en) * 1989-08-28 1991-04-10 Matsushita Electric Ind Co Ltd On-vehicle map display device
JPH0394376A (en) * 1989-09-07 1991-04-19 Canon Inc Method for retrieving data base
JPH04160479A (en) * 1990-10-23 1992-06-03 Hokkaido Nippon Denki Software Kk Setting system for character string display position
JP2956587B2 (en) * 1996-06-10 1999-10-04 凸版印刷株式会社 How to register and supply advertising information
JPH1089976A (en) * 1996-09-13 1998-04-10 Hitachi Ltd Information display and navigation system
JP3342836B2 (en) * 1998-07-31 2002-11-11 松下電器産業株式会社 Map display device
JP2000337895A (en) * 1999-05-24 2000-12-08 Matsushita Electric Ind Co Ltd Vehicle-mounted map display device
JP4209545B2 (en) * 1999-05-24 2009-01-14 クラリオン株式会社 Landmark display method and navigation apparatus
US7375728B2 (en) * 2001-10-01 2008-05-20 University Of Minnesota Virtual mirror
US6909965B1 (en) * 2001-12-28 2005-06-21 Garmin Ltd. System and method for creating and organizing node records for a cartographic data map
JP4224985B2 (en) * 2002-05-22 2009-02-18 ソニー株式会社 Map drawing apparatus, map display system, and program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2005089434A2 *

Also Published As

Publication number Publication date
CA2559678A1 (en) 2005-09-29
JP2007529786A (en) 2007-10-25
WO2005089434A2 (en) 2005-09-29
CA2559678C (en) 2013-12-17
WO2005089434A3 (en) 2006-12-07

Similar Documents

Publication Publication Date Title
WO2005089434A2 (en) Method for encoding and serving geospatial or other vector data as images
JP4831071B2 (en) System and method for managing communication and / or storage of image data
CA2812008C (en) Methods and apparatus for navigating an image
AU2006230233B2 (en) System and method for transferring web page data
US7023456B2 (en) Method of handling context during scaling with a display
CA2533279C (en) System and method for processing map data
US7724965B2 (en) Method for encoding and serving geospatial or other vector data as images
US7554543B2 (en) System and method for exact rendering in a zooming user interface
US20030151626A1 (en) Fast rendering of pyramid lens distorted raster images
CN101501664A (en) System and method for transferring web page data
Möser et al. Context aware terrain visualization for wayfinding and navigation
JP4861978B2 (en) Method and apparatus for navigating images
JP2008535098A (en) System and method for transferring web page data
Brown et al. GRASS as an integrated GIS and visualization system for spatio-temporal modeling
Solomon The chipmap™: Visualizing large VLSI physical design datasets
Triantafyllos et al. A VRML Terrain Visualization Approach

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060907

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR LV MK YU

DAX Request for extension of the european patent (deleted)
APBK Appeal reference recorded

Free format text: ORIGINAL CODE: EPIDOSNREFNE

APBN Date of receipt of notice of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA2E

APBR Date of receipt of statement of grounds of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA3E

APAF Appeal reference modified

Free format text: ORIGINAL CODE: EPIDOSCREFNE

APBX Invitation to file observations in appeal sent

Free format text: ORIGINAL CODE: EPIDOSNOBA2E

APBZ Receipt of observations in appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNOBA4E

APBT Appeal procedure closed

Free format text: ORIGINAL CODE: EPIDOSNNOA9E

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20071002