WO2002035391A1 - Resource distribution and addressing - Google Patents


Info

Publication number
WO2002035391A1
Authority
WO
WIPO (PCT)
Prior art keywords
resource
sets
grid
announce
Application number
PCT/US2000/029290
Other languages
French (fr)
Inventor
Patrick Lincoln
Steven Dawson
David Stringer-Calvert
Original Assignee
Sri International
Application filed by Sri International
Priority to PCT/US2000/029290
Publication of WO2002035391A1
Priority to US10/242,285 (US7177867B2)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Definitions

  • the field of the invention is computer based resource distribution and addressing.
  • resource lookup and addressing is simple. For example, to locate a printer the system simply checks its configuration for a directly connected printer. In networks of computer systems, resource location becomes much more difficult.
  • a common model for resource lookup is one in which a requesting computer system asks all computer systems on the network if they hold the required resource (where "resource” encompasses data, programs, hardware, and so on).
  • resource encompasses data, programs, hardware, and so on.
  • a computer requiring a printer on a local area network (LAN) may broadcast a request to all nodes on the LAN when it requires a printer, following which systems offering printing services will reply to the originating node.
  • LAN local area network
  • this approach is not scalable to large networks — it would be inconceivable to ask every computer on the Internet for the location of a particular data file, for example.
  • a centralized storage model in which all the available resources are listed in a single index, which is stored on a single centralized storage system ("server"), with which all other systems ("clients") must communicate in order to access the information. For example, to locate a printer in a local area network (LAN) a client may contact the master server which has knowledge of all of the printing resources within the LAN.
  • LAN local area network
  • the use of such a centralized model is not, however, desirable under all circumstances. If the centralized storage system is networked to the client systems by a small number of relatively narrow communication channels, the amount of data being transferred to and from the storage system may exceed the capacity of the communication channels.
  • Another difficulty often encountered in a network environment is low network performance (a high "latency", or information transit time) as data traverses the network when traveling between the client systems and the centralized storage system.
  • Another difficulty arises from the need to provide a storage system having sufficient capacity to store all of the resource locations.
  • Yet another difficulty arises from the decreased reliability which results from storing all of the resource locations on a single system, i.e. the central system is a single point of failure.
  • the centralized model is not scalable to large networks.
  • the present invention is directed to a scalable method and architecture for efficiently locating desired resources within a network containing a plurality of server nodes, each of which hosts or otherwise provides access to a subset of a global resource set.
  • each of the server nodes is assigned membership in at least two sets, an "announce" set and a "request" set. Efficiency is obtained by taking advantage of this assignment to significantly limit the number of nodes that must be queried in order to locate any desired member or subset of the global resource set.
  • retrieval of a plurality of resources distributed across an electronic network which includes a plurality of interconnected resource nodes, each of the resources being associated with at least one corresponding resource node, may be accomplished by (1) assigning to each of the resource nodes membership in at least one of a plurality of announce sets; and (2) assigning to each of the resource nodes membership in at least one of a plurality of request sets, such that each of the request sets intersects with every one of the announce sets, thereby forming a logical grid.
  • Fig. 1 is a schematic view of a network of nodes, a portion of which are to be used to store a dataset.
  • Fig. 2 is a view of a logical 2-dimensional (3x9) grid showing server nodes assigned to announce and lookup sets.
  • Fig. 3 is a view of a logical 3x3x3 cube showing the coordinates assigned to server nodes.
  • Fig. 4 is a flow diagram of a method in accordance with the present invention.
  • Fig. 5 is a flow diagram of a method in accordance with the present invention.
  • a network comprises nodes N1-NZ, wherein server nodes N1-N27 are used to host (or otherwise access, such as by pointer) a particular global resource set.
  • node as used herein is not limited to any particular number or arrangement of computers or other devices but consists of any device or group of devices on which information can be stored and from which information can be retrieved.
  • a node may comprise a single device, a group of devices, a number of devices networked together, or a network comprising multiple sub-networks.
  • a partial list of potential devices may include hand-held devices such as cell phones, personal digital assistants, and even appliances such as toasters and washing machines.
  • distribution of information among the machines of the node may be done according to the methods disclosed herein, resulting in a hierarchical embodiment of the present invention.
  • each server node N1-N27 is assigned membership in at least one of a plurality of sets of "request" nodes, as well as membership in at least one of a plurality of sets of "announce" nodes.
  • the assignment is made in such a manner that each one of the plurality of sets of request nodes intersects each one of the plurality of announce sets. In other words, any given announce set and request set pair will share at least one server node in common.
  • the allocation of server nodes can be accomplished, for example, by assigning each server node a set of coordinates corresponding to a point in an imaginary 2-dimensional spatial coordinate system and utilizing the assigned coordinates to assign the server nodes to the sets of announce nodes and request nodes.
  • although coordinates may be arbitrarily assigned, the assignment of coordinates may be related to factors such as whether a node is in direct connection with another node. In some embodiments, coordinates may be assigned so that there is direct connectivity between non-orthogonally adjacent nodes.
  • Each server node announces or advertises the availability of its associated resources simply by informing the other members of its assigned announce set. Consequently, determining the availability and/or characteristics of any desired resource by identifying its corresponding server node (hereinafter sometimes simply "locating" the resource) is greatly facilitated.
  • because the plurality of announce sets and the plurality of request sets are defined such that any selected request set intersects every announce set, as stated above, it is guaranteed that if an announcement of resource availability was made by any of server nodes N1-N27, then at least one member node of each request set will have been informed of that announcement. Therefore, in order to locate a desired resource, it is only necessary to query the members of any single request set. (In the worst case all nodes of a chosen request set might need to be queried; however, in some embodiments, as described further below, the member nodes of the request set may be queried in an ordered manner, such that only a subset of the request set will generally need to be queried.) In Fig. 4, a flow diagram of the above process is presented.
  • a 3x9 grid formed by using a 2-dimensional spatial coordinate system to assign coordinates to server nodes N1-N27 is shown.
  • the coordinates of the server nodes correspond to the row and column in which they are located. Having assigned coordinates to the nodes, dividing the nodes into request sets and announce sets can be accomplished in a straightforward fashion by utilizing each row as an announce set and each column as a request set for the node at the intersection of the row and column.
  • the request set US1 for N1 includes all the nodes in column 1, namely N1, N10, and N19
  • the announce set RS1 for N1 includes all the nodes in row 1, namely N1-N9.
  • the request set US2 for N12 includes all the nodes in column 3, namely N3, N12, and N21
  • the announce set RS2 for N12 includes all the nodes in row 2, namely N10-N18.
  • although the grid formed in Fig. 2 does not contain an equal number of rows and columns, it is preferred that when feasible an approximately equal number of rows and columns be used, and that each row contain approximately the same number of server nodes. If it is not feasible to utilize an equal number of rows and columns, it is still preferred that the number of rows and columns, and the number of server nodes in each row and column, differ only by some small fixed constant factor. To satisfy this preference, nodes would likely have to be added and/or removed a row or column at a time. If the number of server nodes available does not allow for an equal division into request sets and announce sets, some nodes may be designated as members of multiple sets so as to fill out the empty places in the grid. Server nodes may also be members of multiple logical grids. In other words, a given server node may have multiple assigned announce and request sets, where each pair of request and announce sets is specific to a particular dataset.
  • nodes communicatively coupled to the server nodes may exist which are not themselves server nodes.
  • a node communicatively coupled to the server nodes may be a member of one or more request sets without being a member of an announce set, or may be a member of one or more announce sets without being a member of a request set.
  • instead of rows and columns, diagonals and rows or some other mechanism may be used for assigning nodes to request sets and announce sets, so long as each request set contains at least one member from each of the announce sets.
  • the representation of the server nodes in a 2-dimensional grid in Fig. 2 is merely a convenient form of representation, and does not limit the present invention to server nodes that form such a 2-dimensional logical grid.
  • Alternative embodiments may arrange nodes in a D-dimensional logical grid, in which nodes are still assigned membership in announce sets and request sets.
  • a 3x3x3 cube formed by using a 3-dimensional spatial coordinate system to assign coordinates to storage nodes N1-N27 is shown.
  • the coordinates of the server nodes correspond to the nodes' X, Y, Z positions within the cube. Having assigned coordinates to the nodes, dividing the nodes into request sets and announce sets can be accomplished in many different ways.
  • the simplest allocation would be to utilize each X,Y plane as a request set and each Z line (i.e. nodes sharing the same X and Y coordinates) as an announce set for the node at the intersection of the line and the plane.
  • the request set US3 for N1 includes all the nodes having the same Z coordinate as N1, namely N1-N9
  • the announce set RS3 for N1 includes all the nodes having the same X and Y coordinates as N1, namely N1, N10, and N19.
  • once a node has been assigned to a request set and an announce set, the node will inform the other members of its announce set of its associated resources and of any updates or changes thereto, and will respond to queries for resources not associated with the node itself by querying other nodes in its request set.
  • the method and architecture described above can be advantageously extended. For example, if there exists any ordering relation (preferably a total order, although partial orders are sufficient) on the members of the global resource set (or a subset thereof), then in an extended embodiment the assigned members of each announce set maintain (at least) range information for the ordered subset of resources announced for that set. In this manner, the ordering on the subset of the global resource set is mapped to form an ordering relation on the server nodes comprising the announce set.
  • well-known techniques (for example "binary search") exist to perform efficient lookup on an ordered set, and these could thereby be utilized within the present invention.
  • the announcement and lookup/access of resources is enhanced by incorporating "Byzantine agreement" protocols.
  • "Byzantine agreement" protocols: see, for example, M. Pease, R. Shostak and L. Lamport, "Reaching Agreement in the Presence of Faults", Journal of the ACM, volume 27, number 2, pages 228-234, April 1980.
  • "rogue" nodes may become part of any given request set. Such rogue nodes are prone to providing incorrect information when a request is made for a resource.
  • embodiments of the present invention may be extended such that requests for location of a desired resource are handled by redundantly querying more than one of the request sets.
  • server nodes may be dynamically added and removed to the grids of Figure 2 or 3.
  • in FIG. 5, an example joining and leaving scenario is presented. Suppose the server nodes comprise individual user machines, and each machine comprises a resource list showing files that a user is willing to share with other users. If a new user is added as a server node, it will be assigned a grid position and hence membership in a request set and an announce set.
  • the list of files (resource directory) on the user's machine will be transmitted to the nodes/other users in its announce set, and the contents of their resource directories (there should be only one common directory duplicated on each node in the announce set) will be provided to the user machine so that it will have a directory of the resources provided by each node in its announce set. Leaving the grid involves simply reassigning one or more server nodes such that the announce sets and request sets still intersect as required.
  • users outside the grid i.e. users other than those whose machines are acting as server nodes, may also access the system.
  • Such users may or may not receive a resource directory from the storage nodes.
  • the user may simply be provided a list of request set nodes to query in order to locate desired resources, but need not have a view of all the resources available on the announce set nodes and their corresponding request sets.
  • the user may be allowed to "piggy-back" on a particular server node so as to have visibility to that server node's resource directory.
  • the user may be provided a static "snapshot" of a resource directory but not receive subsequent updates until/unless another "snapshot" is requested.
  • protocols should be established for nodes entering and leaving so as to ensure that the resource directory on a newly added node corresponds to that of the other members of its announce set, and to prevent holes in the grid from occurring when a node leaves.
  • Multi-grid implementations are also contemplated in further embodiments of the present invention.
  • each individual grid follows the structure and principles described above.
  • each of the request sets making up a particular grid intersects each one of the announce sets making up that grid.
  • the global resource set may be distributed among associated server nodes that are allocated among different grids. Therefore, while there are many possible multi-grid embodiments, some further strategy or approach is preferably implemented in such embodiments in order to maintain the property that resource requests are guaranteed to be fulfilled if the desired resource exists somewhere within the global resource set.
  • One approach is to forward (or "re-request") any unsatisfied requests to at least one request set from a second grid, continuing if necessary to further grids until the resource is located.
  • Alternative multi-grid variations may forward a resource request to request sets from multiple grids simultaneously, such as where parallel processing of such requests can be effectively executed. Further variations may forward announce messages instead of, or in addition to, request messages.
  • the server nodes comprise all or part of an "edge" network.
  • An edge network is one that includes more than 20 nodes, at least several of those nodes being physically separated from each other by a distance of at least 1 km, and where the edge network nodes are communicatively coupled via data channels that are faster by at least an order of magnitude than the speed of connection between the edge nodes and at least one or more non-edge network nodes.
  • a typical edge network might be a group of geographically distributed Internet servers connected to each other by relatively high-speed lines (such as dedicated leased lines).
  • a "private edge network” is an edge network whose nodes are under the management and control of a single entity (such as a corporation or partnership, for example). Many edge networks have arisen or been constructed out of a desire to provide improved network infrastructure for delivering content across the Internet to multitudes of geographically diffuse end-users.
  • logical grid coordinate values may advantageously exceed 5, 10, 100, 1000, or more.
  • advantages offered by the present invention include:
  • Distributable resources include resources that can be copied and transmitted across a network, including but not limited to data files, application code, web pages and databases.
  • the present invention is likewise applicable to systems employing replication of resources, wherein it is desirable to efficiently locate all server nodes which currently hold a replicated copy of a resource, to allow the replications of the resource to either be purged or updated to reflect the new state of the updated resource.
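Redundant querying of more than one request set can be sketched as follows. This is a minimal illustration under assumptions of our own: each queried request set yields the first non-empty answer returned by one of its members, and the overall result is accepted only when a strict majority of the queried request sets agree. It is plain majority voting, a much weaker mechanism than the Byzantine agreement protocols cited above; `redundant_locate` and the `query` callback are hypothetical names.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

# Hypothetical callback: query(member, resource) -> node reported to hold the resource, or None
Query = Callable[[str, str], Optional[str]]


def redundant_locate(resource: str, request_sets: Sequence[Sequence[str]],
                     query: Query, k: int) -> Optional[str]:
    """Ask k request sets independently; accept an answer only if a strict majority agrees."""
    answers: List[Optional[str]] = []
    for request_set in request_sets[:k]:
        answer = None
        for member in request_set:
            answer = query(member, resource)
            if answer is not None:  # the first member that claims to know answers for this set
                break
        answers.append(answer)
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best if count > len(answers) // 2 else None


# Node "A" is a rogue node that misreports the location of "song.mp3".
table = {("A", "song.mp3"): "EVIL", ("C", "song.mp3"): "N9",
         ("D", "song.mp3"): "N9", ("F", "song.mp3"): "N9"}
sets = [["A", "C"], ["B", "D"], ["E", "F"]]
print(redundant_locate("song.mp3", sets, lambda m, r: table.get((m, r)), k=3))  # N9
```

With k=1 only the first request set is consulted and the rogue answer "EVIL" is returned; with k=3 the two honest request sets outvote it.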

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present invention is directed to a scalable method and architecture for efficiently locating desired resources within a network containing a plurality of server sets (Figure 1). The invention utilizes at least one announce set and one request set to query a subset of said servers and/or nodes, as shown in Figure 2, elements US1 and RS1, and thus reduces the overhead of searching desired network resources.

Description

RESOURCE DISTRIBUTION AND ADDRESSING
Field of The Invention
The field of the invention is computer based resource distribution and addressing.
Background of The Invention
The number of information systems that provide the capability to store and retrieve large amounts of data and other resources continues to grow. For such systems, the architecture and methodologies employed by the system have a significant impact on the performance and reliability of the system.
In a non-networked system, resource lookup and addressing is simple. For example, to locate a printer the system simply checks its configuration for a directly connected printer. In networks of computer systems, resource location becomes much more difficult.
A common model for resource lookup is one in which a requesting computer system asks all computer systems on the network if they hold the required resource (where "resource" encompasses data, programs, hardware, and so on). For example, a computer requiring a printer on a local area network (LAN) may broadcast a request to all nodes on the LAN when it requires a printer, following which systems offering printing services will reply to the originating node. However, this approach is not scalable to large networks — it would be inconceivable to ask every computer on the Internet for the location of a particular data file, for example.
Many information systems utilize a centralized storage model in which all the available resources are listed in a single index, which is stored on a single centralized storage system ("server"), with which all other systems ("clients") must communicate in order to access the information. For example, to locate a printer in a local area network (LAN) a client may contact the master server which has knowledge of all of the printing resources within the LAN. The use of such a centralized model is not, however, desirable under all circumstances. If the centralized storage system is networked to the client systems by a small number of relatively narrow communication channels, the amount of data being transferred to and from the storage system may exceed the capacity of the communication channels. Another difficulty often encountered in a network environment is low network performance (a high "latency", or information transit time) as data traverses the network when traveling between the client systems and the centralized storage system. Another difficulty arises from the need to provide a storage system having sufficient capacity to store all of the resource locations. Yet another difficulty arises from the decreased reliability which results from storing all of the resource locations on a single system, i.e. the central system is a single point of failure.
Such deficiencies of resource lookup in large networks, such as the World Wide Web (WWW), have led to the creation of search engines, web indexes, portals, and so forth. Web indexes and search engines operate much like the previously described central index, and rely on a process whereby resource locations are inserted (manually or automatically) into the index. However, they still suffer from the deficiencies of a central index described earlier.
In summary, the centralized model is not scalable to large networks.
Current approaches to solving this problem for networks such as the Internet involve replicating the centralized index across a plurality of servers, but this has the deficiency that the indices must be kept synchronized, which is not scalable to vast resource sets or large networks. In addition, the replication approach typically entails replication of hardware sufficient to host the entire index at each replicated location. For large indices this may imply a significant additional cost burden that further impairs scalability.
Although the inadequacies of existing resource lookup methods have been previously recognized, and various solutions have been attempted, there has been and continues to be a need for improved resource lookup systems. Of particular interest here is a solution enabling efficient location and retrieval of an item from a resource set which is vastly distributed.
Summary of the Invention
The present invention is directed to a scalable method and architecture for efficiently locating desired resources within a network containing a plurality of server nodes, each of which hosts or otherwise provides access to a subset of a global resource set. In one aspect of the invention, each of the server nodes is assigned membership in at least two sets, an "announce" set and a "request" set. Efficiency is obtained by taking advantage of this assignment to significantly limit the number of nodes that must be queried in order to locate any desired member or subset of the global resource set.
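To make the efficiency claim concrete, the following count is an illustrative calculation rather than part of the original disclosure. It assumes the row/column arrangement used for Fig. 2 in the Detailed Description: N server nodes in r rows (announce sets) and c columns (request sets), with N = rc.

```latex
% Assumed: r announce sets (rows), c request sets (columns), N = r c server nodes
\begin{aligned}
\text{nodes informed per announcement} &= c - 1 \\
\text{nodes queried per lookup (worst case)} &\le r \\
r = c = \sqrt{N} \quad&\Longrightarrow\quad \text{both costs are } O(\sqrt{N}), \text{ versus } N - 1 \text{ for a broadcast query}
\end{aligned}
```

For the 3x9 grid of Fig. 2 this gives 8 nodes informed per announcement and at most 3 queried per lookup, against 26 for a broadcast; for a 100x100 grid of 10,000 nodes it gives 99 and at most 100, against 9,999.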
In particular, retrieval of a plurality of resources distributed across an electronic network which includes a plurality of interconnected resource nodes, each of the resources being associated with at least one corresponding resource node, may be accomplished by (1) assigning to each of the resource nodes membership in at least one of a plurality of announce sets; and (2) assigning to each of the resource nodes membership in at least one of a plurality of request sets, such that each of the request sets intersects with every one of the announce sets, thereby forming a logical grid. In some instances it may be beneficial to include the following steps: (3) informing the members of each announce set of the resources corresponding to all members of the same announce set; (4) requesting a desired resource; and (5) locating the desired resource by querying the members of at least one of the request sets but not all of the request sets.
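The five steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the patent's implementation: nodes are plain string identifiers, each node's resource directory is an in-memory dict mapping a resource name to the set of nodes that announced it, and "querying" a node is simulated by reading its directory. The names `satisfies_grid_property`, `announce` and `locate` are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory model: node id -> {resource name -> set of announcing nodes}
Directories = Dict[str, Dict[str, Set[str]]]


def satisfies_grid_property(announce_sets: List[Set[str]],
                            request_sets: List[Set[str]]) -> bool:
    """Step (2): every request set must share a node with every announce set."""
    return all(req & ann for req in request_sets for ann in announce_sets)


def announce(node: str, resources: Set[str],
             announce_sets: List[Set[str]], directories: Directories) -> None:
    """Step (3): record the node's resources at every member of each announce set it belongs to."""
    for ann in announce_sets:
        if node in ann:
            for member in ann:
                for resource in resources:
                    directories[member].setdefault(resource, set()).add(node)


def locate(resource: str, request_set: Set[str],
           directories: Directories) -> Optional[Set[str]]:
    """Steps (4)-(5): query the members of ONE request set, stopping at the first hit."""
    for member in sorted(request_set):  # a fixed query order, chosen arbitrarily here
        holders = directories[member].get(resource)
        if holders:
            return holders
    return None  # no member of this request set has been told about the resource


# A 2x2 example: rows are announce sets, columns are request sets.
rows = [{"A", "B"}, {"C", "D"}]
cols = [{"A", "C"}, {"B", "D"}]
dirs: Directories = {n: {} for n in "ABCD"}
assert satisfies_grid_property(rows, cols)
announce("D", {"printer"}, rows, dirs)
print(locate("printer", cols[0], dirs))  # {'D'}: found via C, which shares D's row
```

Any request set works: `locate("printer", cols[1], dirs)` also returns `{'D'}`, because column {B, D} meets row {C, D} at D itself.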
Brief Description of The Drawings
Fig. 1 is a schematic view of a network of nodes, a portion of which are to be used to store a dataset.
Fig. 2 is a view of a logical 2-dimensional (3x9) grid showing server nodes assigned to announce and lookup sets.
Fig. 3 is a view of a logical 3x3x3 cube showing the coordinates assigned to server nodes.
Fig. 4 is a flow diagram of a method in accordance with the present invention.
Fig. 5 is a flow diagram of a method in accordance with the present invention.
Detailed Description
In Figure 1, a network comprises nodes N1-NZ, wherein server nodes N1-N27 are used to host (or otherwise access, such as by pointer) a particular global resource set. It should be noted that the term "node" as used herein is not limited to any particular number or arrangement of computers or other devices but consists of any device or group of devices on which information can be stored and from which information can be retrieved. Thus, a node may comprise a single device, a group of devices, a number of devices networked together, or a network comprising multiple sub-networks. A partial list of potential devices may include hand-held devices such as cell phones, personal digital assistants, and even appliances such as toasters and washing machines. For nodes that comprise multiple machines, distribution of information among the machines of the node may be done according to the methods disclosed herein, resulting in a hierarchical embodiment of the present invention.
In a preferred embodiment of the present invention, each server node N1-N27 is assigned membership in at least one of a plurality of sets of "request" nodes, as well as membership in at least one of a plurality of sets of "announce" nodes. The assignment is made in such a manner that each one of the plurality of sets of request nodes intersects each one of the plurality of announce sets. In other words, any given announce set and request set pair will share at least one server node in common. The allocation of server nodes can be accomplished, for example, by assigning each server node a set of coordinates corresponding to a point in an imaginary 2-dimensional spatial coordinate system and utilizing the assigned coordinates to assign the server nodes to the sets of announce nodes and request nodes. Although the coordinates may be arbitrarily assigned, the assignment of coordinates may be related to factors such as whether a node is in direct connection with another node. In some embodiments, coordinates may be assigned so that there is direct connectivity between non-orthogonally adjacent nodes. Each server node announces or advertises the availability of its associated resources simply by informing the other members of its assigned announce set. Consequently, determining the availability and/or characteristics of any desired resource by identifying its corresponding server node (hereinafter sometimes simply "locating" the resource) is greatly facilitated. Because the plurality of announce sets and the plurality of request sets are defined such that any selected request set intersects every announce set, as stated above, it is guaranteed that if an announcement of resource availability was made by any of server nodes N1-N27, then at least one member node of each request set will have been informed of that announcement. Therefore, in order to locate a desired resource, it is only necessary to query the members of any single request set. (In the worst case all nodes of a chosen request set might need to be queried; however, in some embodiments, as described further below, the member nodes of the request set may be queried in an ordered manner, such that only a subset of the request set will generally need to be queried.) In Fig. 4, a flow diagram of the above process is presented.
Note that revocation of resource availability can be announced in a similar manner to announcement of resource availability, simply by removing the previous announcement from all nodes belonging to the revoking node's announce set.
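A minimal sketch of announcement and revocation within one announce set, assuming (for illustration only) that each node keeps an in-memory directory mapping a resource name to the set of nodes that announced it; `announce` and `revoke` are hypothetical names. Revocation mirrors announcement: the revoking node's entry is removed at every member of its announce set.

```python
from typing import Dict, Set

# Hypothetical in-memory model: node id -> {resource name -> set of announcing nodes}
Directories = Dict[str, Dict[str, Set[str]]]


def announce(node: str, resource: str, announce_set: Set[str], directories: Directories) -> None:
    """Tell every member of the node's announce set that the node provides the resource."""
    for member in announce_set:
        directories[member].setdefault(resource, set()).add(node)


def revoke(node: str, resource: str, announce_set: Set[str], directories: Directories) -> None:
    """Remove the earlier announcement from every member of the same announce set."""
    for member in announce_set:
        holders = directories[member].get(resource)
        if holders is None:
            continue
        holders.discard(node)
        if not holders:  # nobody in this announce set offers the resource any more
            del directories[member][resource]


row = {"N1", "N2", "N3"}
dirs: Directories = {n: {} for n in row}
announce("N1", "report.pdf", row, dirs)
announce("N2", "report.pdf", row, dirs)
revoke("N1", "report.pdf", row, dirs)
print(dirs["N3"])  # {'report.pdf': {'N2'}}
```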
In Fig. 2 a 3x9 grid formed by using a 2-dimensional spatial coordinate system to assign coordinates to server nodes N1-N27 is shown. The coordinates of the server nodes correspond to the row and column in which they are located. Having assigned coordinates to the nodes, dividing the nodes into request sets and announce sets can be accomplished in a straightforward fashion by utilizing each row as an announce set and each column as a request set for the node at the intersection of the row and column. Thus, the request set US1 for N1 includes all the nodes in column 1, namely N1, N10, and N19, and the announce set RS1 for N1 includes all the nodes in row 1, namely N1-N9. Similarly, the request set US2 for N12 includes all the nodes in column 3, namely N3, N12, and N21, and the announce set RS2 for N12 includes all the nodes in row 2, namely N10-N18.
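The row/column assignment of Fig. 2 can be reproduced in a few lines. The sketch assumes (our assumption, consistent with the memberships listed above) that N1-N27 are numbered row by row, nine per row; the helper names `coordinates`, `announce_set` and `request_set` are hypothetical.

```python
from typing import List, Tuple

ROWS, COLS = 3, 9  # the 3x9 grid of Fig. 2
NODES = range(1, ROWS * COLS + 1)


def coordinates(n: int) -> Tuple[int, int]:
    """1-based (row, column) of server node Nn, assuming row-by-row numbering."""
    return (n - 1) // COLS + 1, (n - 1) % COLS + 1


def announce_set(n: int) -> List[int]:
    """Every node in the same row as Nn."""
    return [m for m in NODES if coordinates(m)[0] == coordinates(n)[0]]


def request_set(n: int) -> List[int]:
    """Every node in the same column as Nn."""
    return [m for m in NODES if coordinates(m)[1] == coordinates(n)[1]]


print(request_set(12), announce_set(12))  # [3, 12, 21] [10, 11, 12, 13, 14, 15, 16, 17, 18]
```

Because every column contains one node from every row, each request set computed this way shares exactly one node with each announce set.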
Although the grid formed in Fig. 2 does not contain an equal number of rows and columns, it is preferred that when feasible an approximately equal number of rows and columns be used, and that each row contain approximately the same number of server nodes. If it is not feasible to utilize an equal number of rows and columns, it is still preferred that the number of rows and columns, and the number of server nodes in each row and column, differ only by some small fixed constant factor. To satisfy this preference, nodes would likely have to be added and/or removed a row or column at a time. If the number of server nodes available does not allow for an equal division into request sets and announce sets, some nodes may be designated as members of multiple sets so as to fill out the empty places in the grid. Server nodes may also be members of multiple logical grids. In other words, a given server node may have multiple assigned announce and request sets, where each pair of request and announce sets is specific to a particular dataset.
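One way to honor this preference is sketched below; the procedure is our assumption, not the disclosed one. It chooses ceil(sqrt(n)) columns and ceil(n / columns) rows (so the two counts differ by at most one), places nodes row by row, and fills any empty places in the last row with nodes that were already placed, which thereby become members of more than one announce set and request set. The function name `build_grid` is hypothetical.

```python
import math
from typing import List


def build_grid(nodes: List[str]) -> List[List[str]]:
    """Arrange nodes in a near-square grid; rows are announce sets, columns are request sets."""
    n = len(nodes)
    if n == 0:
        return []
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)  # rows is either cols or cols - 1
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            slot = r * cols + c
            # Empty places in the last row reuse already-placed nodes, so those
            # nodes belong to two announce sets and two request sets.
            row.append(nodes[slot] if slot < n else nodes[slot - n])
        grid.append(row)
    return grid


for row in build_grid([f"N{i}" for i in range(1, 11)]):
    print(row)
# ['N1', 'N2', 'N3', 'N4']
# ['N5', 'N6', 'N7', 'N8']
# ['N9', 'N10', 'N1', 'N2']
```

Since every place in the grid is filled, every column still shares a node with every row, so the required intersection property is preserved.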
It should be noted that not all of the nodes of Fig. 1 are represented in Fig. 2. For example, it is contemplated that nodes communicatively coupled to the server nodes may exist which are not themselves server nodes. Similarly, a node communicatively coupled to the server nodes may be a member of one or more request sets without being a member of an announce set, or may be a member of one or more announce sets without being a member of a request set. Moreover, instead of rows and columns, diagonals and rows or some other mechanism may be used for assigning nodes to request sets and announce sets, so long as each request set contains at least one member from each of the announce sets.
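As an illustration of the "diagonals and rows" alternative (the specific construction is our assumption), an n x n grid can use rows as announce sets and wrapped diagonals as request sets: diagonal d contains the cells (i, (i + d) mod n), which is exactly one cell from each row, so every request set meets every announce set. The checker states the required property directly; all names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) position of a server node


def row_announce_sets(n: int) -> List[Set[Cell]]:
    return [{(r, c) for c in range(n)} for r in range(n)]


def diagonal_request_sets(n: int) -> List[Set[Cell]]:
    # Wrapped diagonal d holds (i, (i + d) mod n): one cell from each row i.
    return [{(i, (i + d) % n) for i in range(n)} for d in range(n)]


def every_request_set_meets_every_announce_set(request_sets: List[Set[Cell]],
                                               announce_sets: List[Set[Cell]]) -> bool:
    return all(req & ann for req, ann in product(request_sets, announce_sets))


print(every_request_set_meets_every_announce_set(diagonal_request_sets(4), row_announce_sets(4)))  # True
print(every_request_set_meets_every_announce_set(row_announce_sets(4), row_announce_sets(4)))  # False
```

The second call shows why the property must be checked: rows used as both announce and request sets fail, since distinct rows are disjoint.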
It should further be noted that the representation of the server nodes in a 2-dimensional grid in Fig. 2 is merely a convenient form of representation, and does not limit the present invention to server nodes that form such a 2-dimensional logical grid. Alternative embodiments may arrange nodes in a D-dimensional logical grid, in which nodes are still assigned membership in announce sets and request sets. For example, Fig. 3 shows a 3x3x3 cube formed by using a 3-dimensional spatial coordinate system to assign coordinates to server nodes N1-N27. The coordinates of the server nodes correspond to each node's X, Y, Z position within the cube. Having assigned coordinates to the nodes, dividing the nodes into request sets and announce sets can be accomplished in many different ways. The simplest allocation would be to utilize each X,Y plane (i.e. nodes sharing the same Z coordinate) as a request set and each Z line (i.e. nodes sharing the same X and Y coordinates) as an announce set for the node at the intersection of the line and the plane. Thus, the request set US3 for N1 includes all the nodes having the same Z coordinate as N1, namely N1-N9, and the announce set RS3 for N1 includes all the nodes having the same X and Y coordinates as N1, namely N1, N10, and N19.
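The plane/line allocation for the cube of Fig. 3 can be sketched the same way. The mapping of N1-N27 to coordinates is an assumption chosen to agree with the US3 and RS3 examples above: x = (i-1) mod 3, y = floor((i-1)/3) mod 3, z = floor((i-1)/9). The function name is hypothetical.

```python
def cube_sets(k: int = 3) -> tuple[dict[int, set[int]], dict[int, set[int]]]:
    """Request sets (X,Y planes) and announce sets (Z lines) for nodes N1..N(k**3)."""

    def coords(i: int) -> tuple[int, int, int]:
        j = i - 1
        return j % k, (j // k) % k, j // (k * k)  # (x, y, z)

    nodes = range(1, k ** 3 + 1)
    # Request set: all nodes in the same X,Y plane, i.e. with the same z.
    request = {i: {m for m in nodes if coords(m)[2] == coords(i)[2]} for i in nodes}
    # Announce set: all nodes on the same Z line, i.e. with the same x and y.
    announce = {i: {m for m in nodes if coords(m)[:2] == coords(i)[:2]} for i in nodes}
    return request, announce


request, announce = cube_sets()
print(sorted(request[1]))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(announce[1]))  # [1, 10, 19]
```

A plane z = c and a line (x = a, y = b) always meet at the single node (a, b, c). The intersection property therefore carries over to three dimensions.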
Once a node has been assigned to a request set and an announce set, the node will inform the other members of its announce set of its associated resources and of any updates or changes thereto, and will respond to queries for resources not associated with the node itself by querying other nodes in its request set.
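The announce and query behavior in the preceding paragraph can be simulated in a few lines. This is a minimal, single-process Python sketch under stated assumptions. `ServerNode`, `build_grid`, and the in-memory `directory` dictionary are hypothetical stand-ins for real network messages, and the grid uses the row/column assignment of Fig. 2.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServerNode:
    name: str
    resources: set[str] = field(default_factory=set)         # resources associated with this node
    directory: dict[str, str] = field(default_factory=dict)  # resource -> owner, learned from announcements
    announce_set: list["ServerNode"] = field(default_factory=list, repr=False, compare=False)
    request_set: list["ServerNode"] = field(default_factory=list, repr=False, compare=False)

    def announce(self) -> None:
        # Inform every member of the announce set (this node included) of its resources.
        for peer in self.announce_set:
            for res in self.resources:
                peer.directory[res] = self.name

    def lookup(self, res: str) -> Optional[str]:
        # Query request-set members in turn and return the owner recorded by the first hit.
        for peer in self.request_set:
            if res in peer.directory:
                return peer.directory[res]
        return None


def build_grid(rows: int, cols: int) -> list[ServerNode]:
    """Create N1..N(rows*cols) with row announce sets and column request sets."""
    nodes = [ServerNode(f"N{k}") for k in range(1, rows * cols + 1)]
    for k, node in enumerate(nodes):
        r, c = divmod(k, cols)
        node.announce_set = nodes[r * cols:(r + 1) * cols]
        node.request_set = nodes[c::cols]
    return nodes


nodes = build_grid(3, 3)
nodes[8].resources.add("report.pdf")  # N9 holds the resource
for node in nodes:
    node.announce()
print(nodes[0].lookup("report.pdf"))  # N9, found via N7 in N1's column
```

Any node can locate the resource. Its column contains a member of N9's row, and that member received N9's announcement.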
If further information is available about the resource being announced and the global resource set (or a subset thereof), the method and architecture described above can be advantageously extended. For example, if there exists any ordering relation (preferably a total order, although partial orders are sufficient) on the members of the global resource set (or a subset thereof), then in an extended embodiment the assigned members of each announce set maintain (at least) range information for the ordered subset of resources announced for that set. In this manner, the ordering on the subset of the global resource set is mapped to form an ordering relation on the server nodes comprising the announce set. Well-known techniques for efficient lookup on an ordered set (for example, binary search) could thereby be utilized within the present invention.
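The following is a hypothetical illustration of range information combined with binary search. It assumes an announce set whose four members each hold a contiguous range of resource names under lexicographic order. The member names and range boundaries below are invented for the example. Python's standard `bisect` module performs the binary search.

```python
import bisect


def responsible_member(name: str, range_starts: list[str], members: list[str]) -> str:
    """Return the announce-set member whose range holds `name`.

    range_starts must be sorted; range_starts[i] is the smallest name assigned to members[i].
    """
    # bisect_right locates the last range start <= name in O(log m) comparisons.
    i = bisect.bisect_right(range_starts, name) - 1
    # Names sorting before the first start fall to the first member.
    return members[max(i, 0)]


MEMBERS = ["N10", "N11", "N12", "N13"]
RANGE_STARTS = ["a", "g", "n", "t"]
print(responsible_member("kernel.tar", RANGE_STARTS, MEMBERS))  # N11
```

With this arrangement, a query needs to reach only the single member whose range covers the name, rather than every member of the announce set.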
In a further embodiment of the present invention, the announcement and lookup/access of resources is enhanced by incorporating "Byzantine agreement" protocols. (See for example M. Pease, R. Shostak and L. Lamport, "Reaching Agreement in the Presence of Faults," Journal of the ACM, vol. 27, no. 2, pp. 228-234, April 1980.) In a large network, "rogue" nodes may become part of any given request set. Such rogue nodes are prone to providing incorrect information when a request is made for a resource. To address this problem, embodiments of the present invention may be extended such that requests for the location of a desired resource are handled by redundantly querying more than one of the request sets. As will be recognized by those of skill in the art in light of the teachings herein, by use of Byzantine agreement protocols the present invention can tolerate one or more rogue nodes providing incorrect information and still produce the correct result.
In a further aspect of the present invention, server nodes may be dynamically added to and removed from the grids of Fig. 2 or 3. Fig. 5 presents an example joining and leaving scenario. Suppose the server nodes comprise individual user machines, and each machine comprises a resource list showing files that a user is willing to share with other users. If a new user is added as a server node, it will be assigned a grid position and hence membership in a request set and an announce set. The list of files (resource directory) on the user's machine will be transmitted to the nodes/other users in its announce set. The contents of their resource directories (there should be only one common directory duplicated on each node in the announce set) will be provided to the user machine, so that it will have a directory of the resources provided by each node in its announce set.
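The redundant querying of more than one request set, described above, can be illustrated with a simple majority vote. This is a hypothetical sketch. It is not one of the Byzantine agreement protocols of Pease, Shostak and Lamport. It assumes that:

- each queried request set returns a single answer;
- honest request sets all return the correct location;
- at most f of 2f + 1 queried request sets are corrupted by rogue nodes.

```python
from collections import Counter
from typing import Optional


def agree(answers: list[Optional[str]]) -> Optional[str]:
    """Return the location reported by a strict majority of the queried request sets, else None."""
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    location, votes = counts.most_common(1)[0]
    # Strict majority over all answers (including empty replies), so f wrong answers
    # cannot outvote f + 1 correct ones when 2f + 1 request sets are queried.
    return location if votes > len(answers) // 2 else None


print(agree(["N9", "N4", "N9"]))  # N9: one rogue answer is outvoted
```

Returning None when no strict majority exists lets the requester query additional request sets rather than trust an unconfirmed answer.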
Leaving the grid involves simply reassigning one or more server nodes such that the announce sets and request sets still intersect as required.
It is contemplated, in further embodiments of the present invention, that users outside the grid, i.e. users other than those whose machines are acting as server nodes, may also access the system. Such users may or may not receive a resource directory from the server nodes. In one embodiment the user may simply be provided a list of request set nodes to query in order to locate desired resources, but need not have a view of all the resources available on the announce set nodes and their corresponding request sets. In another embodiment the user may be allowed to "piggy-back" on a particular server node so as to have visibility to that server node's resource directory. In yet another embodiment the user may be provided a static "snapshot" of a resource directory but not receive subsequent updates until/unless another "snapshot" is requested.
It is preferred that protocols be established for nodes entering and leaving, so as to ensure that the resource directory on a newly added node corresponds to that of the other members of its announce set, and to prevent holes in the grid from occurring when a node leaves.
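The joining scenario of Fig. 5 and the hole-free leaving requirement can be sketched as follows. These `join` and `leave` functions are hypothetical, and they rest on several assumptions:

- the Fig. 2 assignment holds (announce set = row);
- all members of a row hold one shared directory;
- for `leave`, the departing node occupies a single cell and its row has at least one other member.

Filling the vacated cell with another member of the same row is one possible reassignment policy, chosen here for simplicity.

```python
from typing import Optional

Cells = list[list[Optional[str]]]  # cells[r][c] = node name, or None for an empty position
Dirs = dict[str, dict[str, str]]   # node name -> {resource: owning node}


def join(cells: Cells, r: int, c: int, node: str, resources: set[str], dirs: Dirs) -> None:
    """Place a new node in empty cell (r, c) and synchronize its announce set (row r)."""
    assert cells[r][c] is None, "a joining node must fill an empty grid position"
    peers = {n for n in cells[r] if n is not None}
    # Copy the shared directory from any existing row member (all members hold the same one).
    dirs[node] = dict(dirs[next(iter(peers))]) if peers else {}
    cells[r][c] = node
    # Announce the new node's resources to every member of its row, itself included.
    for member in peers | {node}:
        for res in resources:
            dirs[member][res] = node


def leave(cells: Cells, r: int, c: int, dirs: Dirs) -> None:
    """Remove the node at (r, c) without leaving a hole in the grid."""
    node = cells[r][c]
    # Reassign another member of the same row to also occupy the vacated cell, so that
    # row r and column c still intersect.
    cells[r][c] = next(n for n in cells[r] if n is not None and n != node)
    # Purge the departed node's announcements from the remaining row members.
    for member in {n for n in cells[r] if n is not None}:
        dirs[member] = {res: owner for res, owner in dirs[member].items() if owner != node}
    dirs.pop(node, None)


cells: Cells = [["N1", "N2"], ["N3", None]]
dirs: Dirs = {"N1": {"a": "N1"}, "N2": {"a": "N1"}, "N3": {"c": "N3"}}
join(cells, 1, 1, "N4", {"b"}, dirs)  # N4 receives {"c": "N3"}, then announces "b"
leave(cells, 0, 0, dirs)              # N2 now also fills N1's old cell
print(cells)                          # [['N2', 'N2'], ['N3', 'N4']]
```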
Multi-grid implementations are also contemplated in further embodiments of the present invention. In such embodiments, each individual grid follows the structure and principles described above. For example, each of the request sets making up a particular grid intersects each one of the announce sets making up that grid. However, the global resource set may be distributed among associated server nodes that are allocated among different grids. Therefore, while there are many possible multi-grid embodiments, some further strategy is preferably implemented in such embodiments. Its purpose is to maintain the property that resource requests are guaranteed to be fulfilled if the desired resource exists somewhere within the global resource set. One approach is to forward (or "re-request") any unsatisfied requests to at least one request set from a second grid, continuing if necessary to further grids until the resource is located. Alternative multi-grid variations may forward a resource request to request sets from multiple grids simultaneously, for example where such requests can effectively be processed in parallel. Further variations may forward announce messages instead of, or in addition to, request messages.
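The sequential forwarding ("re-request") and simultaneous multi-grid variants can be contrasted in a short sketch. Each grid is modeled, hypothetically, as a name plus a function that queries one of its request sets. Plain dictionaries stand in for those queries in the usage lines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# A grid: its name and a function that queries one of its request sets, returning the
# owning node's name, or None if that grid does not hold the resource.
Grid = tuple[str, Callable[[str], Optional[str]]]


def forward_lookup(resource: str, grids: list[Grid]) -> Optional[tuple[str, str]]:
    """Try each grid in turn, re-requesting in the next grid until the resource is found."""
    for name, query in grids:
        owner = query(resource)
        if owner is not None:
            return name, owner
    return None


def parallel_lookup(resource: str, grids: list[Grid]) -> Optional[tuple[str, str]]:
    """Send the request to every grid at once; return the first hit in grid order."""
    with ThreadPoolExecutor() as pool:
        owners = list(pool.map(lambda g: g[1](resource), grids))
    for (name, _), owner in zip(grids, owners):
        if owner is not None:
            return name, owner
    return None


grid_a = {"x.txt": "N3"}
grid_b = {"y.txt": "M7"}
grids: list[Grid] = [("A", grid_a.get), ("B", grid_b.get)]
print(forward_lookup("y.txt", grids))  # ('B', 'M7')
```

The sequential form stops at the first grid that answers. The parallel form trades extra messages for lower latency when grids can be queried concurrently.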
The enhancements and alternative embodiments discussed earlier, such as the use of ordering techniques to enable further efficiency in resource location, generally may also be applied to multi-grid embodiments, as will be apparent to those of skill in the art in light of these teachings.
In a further preferred embodiment, the server nodes comprise all or part of an "edge" network. An edge network is one that meets three conditions:
- it includes more than 20 nodes;
- at least several of those nodes are physically separated from each other by a distance of at least 1 km;
- its nodes are communicatively coupled via data channels that are at least an order of magnitude faster than the connection between the edge nodes and one or more non-edge network nodes.
For example, a typical edge network might be a group of geographically distributed Internet servers connected to each other by relatively high-speed lines (such as dedicated leased lines). A "private edge network" is an edge network whose nodes are under the management and control of a single entity (such as a corporation or partnership, for example). Many edge networks have arisen or been constructed out of a desire to provide improved network infrastructure for delivering content across the Internet to multitudes of geographically diffuse end-users (see, for example, the methods of Digital Island at http://www.digisle.net/ and Akamai at http://www.akamai.com). In current approaches, such content is typically replicated and/or cached across the multiple edge network nodes. As a result, every node in the edge network is able to provide any information that is served by the edge network as a whole. This requires a priori knowledge of the origin location of such information, either for on-demand caching or for preemptive replication. However, by incorporation and use of the present invention, a distributed directory of serviced resources can be held across the edge network. When an end-user requests access to particular resources, the edge network will then locate and provide access to the desired resource efficiently, in accordance with the methods and architecture of the present invention as described above.
Although the methods disclosed herein may be used on generalized storage networks comprising various numbers of nodes, greatest leverage is likely realized in networks comprising relatively large numbers of nodes, because of the scalability of the approach described. Thus it is contemplated that the total number of nodes may exceed 25, 100, 1000, 100000, or even 1000000. As a result, logical grid coordinate values may advantageously exceed 5, 10, 100, 1000, or more.
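The scalability claim above can be roughly illustrated under some hypothetical assumptions:

- n server nodes are arranged in a square r x c grid with r = c = sqrt(n);
- each node is associated with R resources;
- one message is counted per set member contacted.

```latex
\begin{aligned}
n &= r\,c, \qquad r = c = \sqrt{n} \\
\text{nodes informed per announcement} &= c = \sqrt{n} \\
\text{nodes queried per lookup} &\le r = \sqrt{n} \\
\text{directory entries held per node} &= c\,R = \sqrt{n}\,R \\
n = 10^{6} \;&\Rightarrow\; \sqrt{n} = 10^{3}
\end{aligned}
```

Under these assumptions, a network of one million nodes uses grid coordinates of about 1,000. Each announcement and each lookup then touches about 1,000 nodes. By comparison, a broadcast query reaches all n - 1 other nodes, and fully replicating the global directory costs nR entries per node.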
Thus, advantages offered by the present invention include:
- increasing the availability of resource location information, and avoiding the bottleneck (and single-point-of-failure vulnerability) that is characteristic of current approaches that store a global resource directory on a centralized server node;
- reducing the synchronization/update burden entailed by solutions that replicate copies of a global resource directory (or the resources themselves) over multiple server nodes; and
- greatly accelerating resource lookup as contrasted with solutions in which resource location queries potentially must be broadcast to all server nodes in the network in order to locate a desired resource.
In addition to the various embodiments described above, alternative embodiments are contemplated and include, but are not necessarily limited to, the following:
Practitioners of skill in the art will recognize that the present invention is generally applicable to the distribution and retrieval of distributable resources, as well as of resource index information. Distributable resources include resources that can be copied and transmitted across a network, including but not limited to data files, application code, web pages and databases. Furthermore, the present invention is likewise applicable to systems employing replication of resources, wherein it is desirable to efficiently locate all server nodes that currently hold a replicated copy of a resource. The replicas of the resource can then be either purged or updated to reflect the new state of the updated resource.
Thus, specific embodiments and applications of methods for information distribution and retrieval have been disclosed. It should be apparent, however, to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms "comprises" and "comprising" should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced.

Claims

What is claimed is:
1. A method of facilitating retrieval of a plurality of resources distributed across an electronic network comprising a plurality of interconnected resource nodes, each of the resources being associated with at least one corresponding resource node, said method comprising: assigning to each of the resource nodes membership in at least one of a plurality of announce sets; assigning to each of the resource nodes membership in at least one of a plurality of request sets, such that each of the request sets intersects with every one of the announce sets thereby forming a logical grid.
2. The method of claim 1, further including informing the members of each one of the announce sets of the resources corresponding to all members of the same announce set.
3. The method of claim 1, further including requesting a desired resource, and locating the desired resource by querying the members of at least one request set but not all request sets.
4. The method of claim 1 further comprising designating at least one additional node that is not a member of the grid.
5. The method of claim 4 wherein the additional node is not a member of any of the request sets.
6. The method of claim 4, wherein the additional node is not a member of any of the announce sets.
7. The method of claim 4, wherein the additional node is a user node, and is communicatively coupled to one or more resource nodes in the grid.
8. The method of claim 1, wherein non-orthogonal adjacency in the grid indicates direct connectivity.
9. The method of claim 1, wherein additional distributed resources are associated with other resource nodes, said other resource nodes being assigned to other request sets and announce sets thereby forming an additional logical grid.
10. The method of claim 9, further including requesting a desired one of the additional distributed resources, and locating the desired resource by querying the members of at least one request set in the additional grid.
11. The method of claim 1 wherein at least one of the request sets includes a second logical grid, the second logical grid comprising a plurality of announce sets and a plurality of request sets.
12. The method of claim 1 wherein at least one of the resource nodes is selected from the following group: {general purpose computer, LAN, hand-held device, appliance}.
13. The method of claim 1 wherein the grid comprises an edge network.
14. The method of claim 1 wherein the plurality of resource nodes are allocated among the plurality of announce sets according to a characteristic of the associated resources, and the characteristic of the resources is utilized in selecting an order in which to query the members of at least one of the request sets when looking for the resource.
15. The method of claim 1 wherein a leaving protocol is followed before a node is removed from the grid.
16. The method of claim 1 wherein a joining protocol is followed before a node is added to the grid.
17. The method of claim 1 further including adding additional resource nodes to the grid, wherein the grid is maintained to comprise a number of rows and a number of columns that differ by a factor no more than a pre-determined constant number.
18. The method of claim 1 wherein the grid comprises at least 5 rows and at least 5 columns.
19. The method of claim 1 wherein the grid comprises at least 100 rows and at least 100 columns.
20. The method of claim 1 wherein the grid is two-dimensional.
21. The method of claim 1 wherein the grid is of greater than two dimensions.
22. The method of claim 1 wherein the resources are at least one of the following: data files, application files, programs, documents, web pages, images, music files, video files, and encryption keys.
23. A method of facilitating retrieval of a plurality of resources distributed across an electronic network comprising a plurality of interconnected resource nodes, each of the resources being associated with at least one corresponding resource node, said method comprising: assigning to each of the resource nodes membership in at least one of a plurality of announce sets; assigning to each of the resource nodes membership in at least one of a plurality of request sets; forming the members of the announce sets of the resources corresponding to all members of the same announce set; requesting a desired resource; and locating the desired resource by querying the members of at least one of the request sets but not all of the request sets.
24. A system comprising:
a plurality of resources; a plurality of interconnected resource nodes, each resource of the plurality of resources being associated with at least one corresponding resource node; each resource node comprising an announce set of resource nodes; each resource node comprising a request set of resource nodes which intersects with the announce set of every other resource node; each resource node programmed to inform the members of its announce set of its associated resources; each resource node programmed to locate a desired non-associated resource by querying the members of its request set.
AMENDED CLAIMS
[received by the International Bureau on 8 May 2001 (08.05.01); original claims 1-24 amended (4 pages)]
1. A method of facilitating retrieval of a plurality of resources distributed across an electronic network comprising a plurality of interconnected resource nodes, each of the resources being associated with at least one corresponding resource node, said method comprising: assigning to each of the resource nodes membership in at least one of a plurality of announce sets; assigning to each of the resource nodes membership in at least one of a plurality of request sets, such that each of the request sets intersects with every one of the announce sets thereby forming a logical grid.
2. The method of claim 1, further including informing the members of each one of the announce sets of the resources corresponding to all members of the same announce set.
3. The method of claim 1, further including requesting a desired resource, and locating the desired resource by querying the members of at least one request set but not all request sets.
4. The method of claim 1 further comprising designating at least one additional node that is not a member of the grid.
5. The method of claim 4 wherein the additional node is not a member of any of the request sets.
6. The method of claim 4, wherein the additional node is not a member of any of the announce sets.
7. The method of claim 4, wherein the additional node is a user node, and is communicatively coupled to one or more resource nodes in the grid.
8. The method of claim 1, wherein non-orthogonal adjacency in the grid indicates direct connectivity.
9. The method of claim 1, wherein additional distributed resources are associated with other resource nodes, said other resource nodes being assigned to other request sets and announce sets thereby forming an additional logical grid.
10. The method of claim 9, further including requesting a desired one of the additional distributed resources, and locating the desired resource by querying the members of at least one request set in the additional grid.
11. The method of claim 1 wherein at least one of the request sets includes a second logical grid, the second logical grid comprising a plurality of announce sets and a plurality of request sets.
12. The method of claim 1 wherein at least one of the resource nodes is selected from the following group: {general purpose computer, LAN, hand-held device, appliance}.
13. The method of claim 1 wherein the grid comprises an edge network.
14. The method of claim 1 wherein the plurality of resource nodes are allocated among the plurality of announce sets according to a characteristic of the associated resources, and the characteristic of the resources is utilized in selecting an order in which to query the members of at least one of the request sets when looking for the resource.
15. The method of claim 1 wherein a leaving protocol is followed before a node is removed from the grid.
16. The method of claim 1 wherein a joining protocol is followed before a node is added to the grid.
17. The method of claim 1 further including adding additional resource nodes to the grid, wherein the grid is maintained to comprise a number of rows and a number of columns that differ by a factor no more than a pre-determined constant number.
18. The method of claim 1 wherein the grid comprises at least 5 rows and at least 5 columns.
19. The method of claim 1 wherein the grid comprises at least 100 rows and at least 100 columns.
20. The method of claim 1 wherein the grid is two-dimensional.
21. The method of claim 1 wherein the grid is of greater than two dimensions.
22. The method of claim 1 wherein the resources are at least one of the following: data files, application files, programs, documents, web pages, images, music files, video files, and encryption keys.
23. A method of facilitating retrieval of a plurality of resources distributed across an electronic network comprising a plurality of interconnected resource nodes, each of the resources being associated with at least one corresponding resource node, said method comprising: assigning to each of the resource nodes membership in at least one of a plurality of announce sets; assigning to each of the resource nodes membership in at least one of a plurality of request sets; informing the members of the announce sets of the resources corresponding to all members of the same announce set; requesting a desired resource; and locating the desired resource by querying the members of at least one of the request sets but not all of the request sets.
24. A system comprising:
a plurality of resources; a plurality of interconnected resource nodes, each resource of the plurality of resources being associated with at least one corresponding resource node; each resource node comprising an announce set of resource nodes; each resource node comprising a request set of resource nodes which intersects with the announce set of every other resource node; each resource node programmed to inform the members of its announce set of its associated resources; each resource node programmed to locate a desired non-associated resource by querying the members of its request set.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/US2000/029290 WO2002035391A1 (en) 2000-10-23 2000-10-23 Resource distribution and addressing
US10/242,285 US7177867B2 (en) 2000-10-23 2002-09-12 Method and apparatus for providing scalable resource discovery

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2000/029290 WO2002035391A1 (en) 2000-10-23 2000-10-23 Resource distribution and addressing

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US10/242,285 Continuation-In-Part US7177867B2 (en) 2000-10-23 2002-09-12 Method and apparatus for providing scalable resource discovery

Publications (1)

Publication Number Publication Date
WO2002035391A1 true WO2002035391A1 (en) 2002-05-02

Family

ID=21741921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/029290 WO2002035391A1 (en) 2000-10-23 2000-10-23 Resource distribution and addressing

Country Status (1)

Country Link
WO (1) WO2002035391A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100386986C (en) * 2006-03-10 2008-05-07 清华大学 Hybrid positioning method for data duplicate in data network system
US7461166B2 (en) 2003-02-21 2008-12-02 International Business Machines Corporation Autonomic service routing using observed resource requirement for self-optimization

Citations (3)

Publication number Priority date Publication date Assignee Title
US5574860A (en) * 1993-03-11 1996-11-12 Digital Equipment Corporation Method of neighbor discovery over a multiaccess nonbroadcast medium
US5600794A (en) * 1995-08-04 1997-02-04 Bay Networks, Inc. Method and apparatus for managing exchange of metrics in a computer network by exchanging only metrics used by a node in the network
US6108652A (en) * 1997-12-01 2000-08-22 At&T Corp. Multicast probability-base grouping of nodes in switched network for improved broadcast search

Non-Patent Citations (1)

Title
LIN C.R. AND SHIANG-WEI CHAO: "A multicast routing protocol for multihop wireless networks", PROCEEDINGS OF THE 1999 GLOBECOM, IEEE 99CH37042, 1999, pages 235 - 239, XP002938329 *

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 10242285

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP