GB2403825A - Method and apparatus for providing access to data from a heterogeneous storage environment

Publication number
GB2403825A
Authority
GB
United Kingdom
Prior art keywords
data
application system
application
database
request
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0316070A
Other versions
GB0316070D0 (en)
GB2403825B (en)
Inventor
Steven Toth
Gavin Mackrill
Alun Mackrill
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bluestar Software Ltd
Original Assignee
Bluestar Software Ltd
Application filed by Bluestar Software Ltd
Priority to GB0316070A
Publication of GB0316070D0
Publication of GB2403825A
Application granted
Publication of GB2403825B
Legal status: Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system are provided for running an application system to provide client access through one or more applications to data stored in a heterogeneous data storage environment. The method involves uploading selected data from the heterogeneous data storage environment, which comprises multiple separate database systems, into the application system. The uploaded data is saved in the application system in a homogeneous data storage environment, which has a structure that reflects the applications themselves. Requests are received from clients to access the saved data. Each request identifies one of the applications. A received request is forwarded to the identified application, which accesses the saved data in order to generate a reply to the request. This reply can now be returned from the application system to the client as a response to the request.

Description

METHOD AND APPARATUS FOR PROVIDING ACCESS TO DATA FROM A HETEROGENEOUS STORAGE ENVIRONMENT
Field of the Invention
The present invention relates to computer systems, and in particular to accessing data stored in multiple diverse systems through one or more applications.
Background of the Invention
Figure 1 is a schematic diagram depicting some of the computer systems utilised by a police force in accordance with a typical existing implementation. In particular, Figure 1 illustrates three separate systems 11A, 11B, 11C. By way of example, the first system 11A is used for crime recording, the second system 11B is used for maintaining custody records, and the third system 11C represents the police national computer service (although of course many other systems and combinations of systems may be present instead).
The first system 11A stores details of recorded crimes in server 61, and more particularly in associated disk units 51A, 51B and 51C. Police officers can then access these crime records from desktop PC 81 over local area network (LAN) 71.
The second system 11B stores custody information in mainframe 62, more particularly on associated disk unit 52. Police officers can then access the custody information from terminal 82 over fixed line connection 72. Note that both the crime recording system 11A and the custody information system 11B are local systems, in that each regional police force typically maintains its own versions of these two systems for storing data particular to their own police area.
The third system 11C provides intelligence about known criminals, illegal organizations, vehicle registrations, and so on. This intelligence is maintained on server 63, and more particularly on associated disk unit 53. In contrast to the first and second systems 11A and 11B, the police national computer service 11C is provided on a national basis to all regional police forces across the country. Accordingly, in order to access this service 11C, a desktop PC 83 is used to connect to server 63 via dial-up link 73A through wide area network (WAN) 73.
A major disadvantage of the configuration shown in Figure 1 is that the three systems 11A, 11B and 11C are in effect completely independent of one another. One consequence of this is that the same data may have to be entered multiple times (once for each system in which it is to be stored). For example, if someone is brought into custody having been charged with a particular crime, then the relevant information is entered into custody system 11B via terminal 82. However, this information does not automatically get transferred into crime recording system 11A. Rather, details about the arrest have to be re-entered through desktop PC 81 in order to store them also in crime recording system 11A. It will be appreciated that this requirement to enter the same data more than once is both time-consuming and potentially error prone.
The lack of connectivity between the different systems also makes certain forms of data retrieval difficult. Thus consider the situation where a group of people are currently being held in custody. A police officer may need to determine whether any of them is potentially associated with a particular type of vehicle (e.g. a black convertible) that has been involved in a crime. In order to do this with the system configuration of Figure 1, the names of all the detainees must first be extracted from the custody system 11B. These names must then be fed individually into the police national computer system 11C to see what vehicle (if any) they possess. This two-stage process is necessary because there is no facility to do a single combined search of the two systems 11B, 11C (i.e. there is no mechanism to run a single query to locate the names of all those people in custody who are known to have a black convertible). Once again therefore, using the configuration of Figure 1 is time consuming and error prone.
In fact, even within a single system 11 the available forms of data access can be somewhat limited. Thus these systems are typically stored as relational databases, which are logically structured as a set of two-dimensional tables. As the skilled person will be aware, there is some flexibility in the internal implementation of a given database model. In particular, the number and contents (i.e. fields) of the tables may vary according to the chosen database implementation.
For example, a crime recording system may store for each crime record the name and number of the reporting officer, along with his/her station identity and address. However, holding all this information in one long record for each recorded crime is rather inefficient in terms of storage space, since the name, station, and address will be constant for any given officer identity across all crime records reported by that officer. Consequently, rather than repeating this information in each individual crime record, such crime records might only hold the number of the reporting officer. A separate table can then be used to store a mapping from officer number to officer name and station identity. Yet another table may then be used to store a mapping from station identity to station address.
It is standard database design practice to decompose database records into various tables in this manner in order to minimise the disk space occupied by the database. This has particularly been the approach in the past, when disk capacity was rather expensive.
It is still possible to query data that is stored across multiple tables by the use of a join operation (e.g. in the above example, list all the crimes reported by officers at a particular station). However, while the use of a join operation is fine in principle, in practice the need to access multiple tables for a single query may result in a severe performance degradation. This is particularly likely if different tables are stored across different disk units (such as disk units 51A, 51B and 51C for crime recording system 11A), so that multiple disk units may have to be accessed (sometimes repeatedly) for a single query. These problems are magnified when many users are trying to access a database at the same time, which is typically the case for police computer systems.
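The decomposition into tables and the join described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical and are not taken from any actual police system.

```python
import sqlite3

# Hypothetical normalised schema: crime records hold only the officer number,
# a second table maps officer number to name and station, and a third table
# maps station identity to station address.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE station (station_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE officer (officer_no INTEGER PRIMARY KEY, name TEXT,
                          station_id INTEGER REFERENCES station(station_id));
    CREATE TABLE crime (crime_id INTEGER PRIMARY KEY, description TEXT,
                        officer_no INTEGER REFERENCES officer(officer_no));
    INSERT INTO station VALUES (1, '1 High Street'), (2, '2 Low Road');
    INSERT INTO officer VALUES (100, 'Officer A', 1), (200, 'Officer B', 2);
    INSERT INTO crime VALUES (1, 'burglary', 100), (2, 'theft', 200),
                             (3, 'fraud', 100);
""")


def crimes_for_station(station_id):
    """List all crimes reported by officers at one station (needs a join)."""
    rows = conn.execute(
        """SELECT crime.crime_id, crime.description
           FROM crime JOIN officer ON crime.officer_no = officer.officer_no
           WHERE officer.station_id = ?
           ORDER BY crime.crime_id""",
        (station_id,),
    )
    return rows.fetchall()
```

The query has to touch two tables to answer a single question, which is the access pattern that becomes expensive when the tables are held on different disk units.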
Therefore, although systems 11 might in theory support certain forms of query, the response times in practice for such queries may be so poor as to render them largely unavailable. Indeed, it is relatively common for the designers of a system 11 to specifically disable certain queries, simply on the basis that they are highly inefficient within the adopted table structure for the database system. This is because allowing users to process such inefficient queries would consume large amounts of computing resource, and so seriously degrade performance for other users.
Nevertheless, it will be appreciated that such restrictions limit the utility of the data held within the system.
There are various other problems associated with the disparate nature of systems 11A, 11B, and 11C. For example, each system needs to be made secure so that it can only be accessed by suitably authorised personnel. However, because the different systems are independent of one another, the required security information needs to be maintained separately for each individual system. For example, if a new officer joins the police force and is to be granted access to multiple systems, there is no mechanism to do this via a single operation. Rather, a separate authorization process must be performed for each individual system. Once again, such a piecemeal approach is time-consuming and error prone.
Furthermore, each system typically has its own specialised access software.
This causes a support and training problem, in that personnel need to learn a different application in order to access each system 11A, 11B, 11C. Furthermore, there can be problems with being able to access more than one system 11A, 11B, 11C from the same desktop. For example, desktop PC 81 may represent an identical machine to desktop PC 83 in terms of hardware. However, desktop PC 81 may be unable to access server 63 because it does not possess the relevant application software; similarly, desktop PC 83 may be unable to access server 61, again because it lacks the required application software. Of course, it may be feasible to install both sets of application software on a single PC, thereby allowing access to multiple systems 11A, 11B, 11C from this single PC (providing that there are no incompatibilities, and the capacity of the PC is not exceeded). However, if this approach is replicated across multiple PCs (so as to be available to multiple officers) then there may be additional software licensing costs.
Another problem that can arise is that existing systems 11A, 11B, 11C are frequently implemented on legacy systems. This can then limit the type of information that can be stored in such systems. For example, a database running on mainframe 62 may not have any facility to store image data. This then prevents upgrading the custody system 11B to hold a picture of each detainee, despite the fact that the advent of digital cameras makes image acquisition simple, and such a facility would be useful from an operational perspective.
The use of legacy systems may also restrict the type of devices used to access data in police systems. For example, portable and mobile computing devices, such as handheld machines and in-vehicle systems, are becoming increasingly prevalent. It is highly desirable for officers in the field to be equipped with this type of device in order to be able to access data from police systems during actual operations.
However, legacy computing systems may not provide the necessary support for access by mobile devices.
Of course, it may in principle be feasible to modify or enhance legacy systems to provide support for more complex data types and/or newer forms of access device.
However, implementing such additional support for each existing system 11A, 11B, 11C in turn may be a prohibitively expensive task. Similarly, one might consider replacing a legacy system entirely with a more modern system that already offers the desired features, but again the financial resources for this may not be available.
Yet a further problem with existing arrangements such as shown in Figure 1 arises from the use of different configurations in different police forces. Thus it is often beneficial to share information between such forces, not least because criminals are rarely considerate enough to limit their activities to a single police region. While the centralised police national computer system 11C does provide some such sharing capabilities, it does not duplicate the information on the regional systems 11A and 11B. In other words, much useful information is held exclusively on regional systems such as crime recording system 11A and custody system 11B. This information is then simply inaccessible to other forces without human intervention (i.e. without a first force having to specifically request a second force to manually extract and forward information that is only available on a computer system belonging to the second force).
In fact, the problems with existing systems may be even more extensive than suggested above. Thus while Figure 1 depicts a configuration with three separate systems 11A, 11B, 11C, in practice most local police forces support a significantly greater number of independent systems, such as for preparation of court cases, manpower scheduling, and so on, and this only serves to exacerbate the situation.
It will be appreciated that problems such as those described above are not limited to police computer systems. Rather, they are typical of many large organizations that have developed a range of different computer systems to support various activities, but where there is only limited (or no) integration or coherence between the different systems (this is perhaps particularly true for some large public sector organizations).
Summary of the Invention
Accordingly, one embodiment of the invention provides a method of running an application system that provides client access through one or more applications to data stored in a heterogeneous data storage environment that comprises multiple separate database systems. The method comprises uploading selected data from the heterogeneous data storage environment into the application system. The uploaded data is saved in the application system in a homogeneous data storage environment, which has a structure that reflects the applications that use the application system.
The application system receives requests from clients to access the saved data.
Each request typically identifies one of the applications, and so is forwarded to the identified application. The receiving application then accesses the data saved in the homogeneous environment on behalf of the client, and generates a reply to the request. This reply is then returned from the application system to the requesting client.
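The request flow just described (receive a request, forward it to the identified application, return the reply) can be sketched as follows. The request layout, the application names and the in-memory stand-in for the homogeneous data store are all illustrative assumptions rather than details from the specification.

```python
# Stand-in for data already saved in the homogeneous storage environment.
SAVED_DATA = {
    "custody": [{"name": "Person A"}],
    "crime": [{"id": 1, "type": "burglary"}],
}


def custody_app(params):
    """Hypothetical application that reads the saved custody data."""
    return {"detainees": SAVED_DATA["custody"]}


def crime_app(params):
    """Hypothetical application that reads the saved crime data."""
    return {"crimes": SAVED_DATA["crime"]}


# Applications hosted by the application system, keyed by the identifier
# that each client request is assumed to carry.
APPLICATIONS = {"custody": custody_app, "crime": crime_app}


def handle_request(request):
    """Forward the request to the identified application and wrap its reply."""
    app = APPLICATIONS.get(request.get("application"))
    if app is None:
        return {"status": "error", "reason": "unknown application"}
    return {"status": "ok", "reply": app(request.get("params", {}))}
```

A client only ever talks to `handle_request`; it never needs to know which original database the data came from.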
The application system therefore acts as an intermediary between the clients and the data stored in the various databases of the heterogeneous data storage environment. This provides a single point of access for the clients to multiple data sources, thereby shielding the clients from having to know the specifics of any particular data source. This greatly simplifies the clients, since these now only need to be provided with data access software for one system (this software may in fact be no more than a standard browser). Consequently, the software administration and training burden with respect to the clients is significantly reduced. Furthermore, security controls and procedures for individual clients can be implemented within the application system, rather than having to be replicated across all the different original databases of the heterogeneous data storage environment. Note that this sort of approach is especially suited to use with multiple police computing systems, such as described above, as well as other large-scale application systems that provide multiple clients with access to a heterogeneous data storage environment.
One potential downside with the interposition of the application system between the data sources and the clients is that this might increase the path length between the clients and the data, thereby degrading response time. This problem is avoided by copying (selected) data from the data sources into the application system itself. Consequently, the application system can now satisfy client requests from this locally saved data, without the overhead of having to contact the original data sources.
In addition, the data storage structure(s) in the application system are especially configured to support the particular applications through which clients access data saved in the application system. For example, record fields from the original databases that are not of interest to the clients can be omitted from the data saved in the application system. In addition, the storage configuration can be arranged so that information needed for responding to a single client request can be quickly accessed as a whole. This is instead of having (say) to retrieve different parts of the information from different disk units or even different databases, which may represent the arrangement of the data as originally stored in the heterogeneous environment.
In general terms therefore, the databases of the heterogeneous data storage environment are usually structured to minimise storage capacity, and/or to facilitate initial data input. This may involve long record lengths, for comprehensive data recordal, and significant fragmentation of the database across multiple disk units in order to reduce storage requirements. In contrast, the homogeneous data storage environment is generally structured to optimise responsiveness for a particular set of queries, as determined by the particular set of applications supported by the system.
In one preferred embodiment, the uploaded data are saved into two or more slices in the homogeneous data storage environment. Each slice corresponds to a respective one of the applications in terms of data structure and contents, and is designed to provide a rapid response to queries from that particular application. Note that consequently there may be duplication of data between different slices, but this is acceptable in view of the increased performance. Note also that a single slice may combine data from two or more databases from the heterogeneous data storage environment, thereby facilitating applications that are not feasible with many existing systems.
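As an illustration of a slice that combines data from two source databases, the sketch below pre-joins custody records with vehicle records so that the "detainees with a black convertible" question from the background section becomes a single scan. The record layouts and function names are hypothetical.

```python
def build_detainee_vehicle_slice(custody_records, vehicle_records):
    """Build one flat, query-ready slice from two separate source data sets.

    Data is deliberately duplicated into the slice (one row per detainee and
    vehicle pair) in exchange for a fast answer at query time.
    """
    vehicles_by_owner = {}
    for vehicle in vehicle_records:
        vehicles_by_owner.setdefault(vehicle["owner"], []).append(vehicle)

    slice_rows = []
    for detainee in custody_records:
        for vehicle in vehicles_by_owner.get(detainee["name"], []):
            slice_rows.append({
                "name": detainee["name"],
                "colour": vehicle["colour"],
                "body": vehicle["body"],
            })
    return slice_rows


def detainees_with_vehicle(slice_rows, colour, body):
    """Answer the combined query from the slice alone, in a single pass."""
    return sorted({row["name"] for row in slice_rows
                   if row["colour"] == colour and row["body"] == body})
```

For example, building the slice from a custody list and a vehicle list and then calling `detainees_with_vehicle(rows, "black", "convertible")` replaces the two-stage manual lookup with one query against locally saved data.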
It will be appreciated that the above considerations apply not only to queries involving data reads but also to those that include data writes. In other words, a single query may involve writing data elements that are ultimately intended for different disk units or different databases in the heterogeneous data storage environment. However, the responsible application initially saves these data elements into the homogeneous data storage environment in close conjunction with one another, thereby enabling a rapid response to queries. Although the subsequent synchronization operation to save this data back into the different databases of the heterogeneous data storage environment may take longer, this is asynchronous with the application responding to the client, and so does not impact system performance from a user perspective.
In many implementations, it is important to maintain good synchronization between the copy of the data stored into the application system and the original database systems in the heterogeneous data storage environment. In one embodiment, uploading therefore involves ascertaining the rate of update of an original database system. The application system then adjusts the upload frequency from this database in accordance with the update rate. Thus if there are a large number of updates being made to the original database system, then these must be uploaded relatively frequently into the application system in order for the two to remain in close synchronization. Conversely, if a particular database system is rarely updated, then the upload rate from that database system into the application system can be relatively low.
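The write path described above resembles a write-behind scheme: the write is acknowledged once it is saved locally, and a later step distributes the fields to their original databases. The sketch below models this with a simple queue and an explicit synchronisation function; the field-to-database mapping and all names are assumptions made for illustration.

```python
from collections import deque

local_store = {}                            # stand-in for the homogeneous environment
pending = deque()                           # writes awaiting synchronisation
source_dbs = {"custody": {}, "crime": {}}   # stand-ins for the original databases

# Hypothetical mapping from each field to the source database that owns it.
FIELD_OWNER = {"cell": "custody", "offence": "crime"}


def write(record_id, fields):
    """Save all fields together locally and acknowledge immediately."""
    local_store[record_id] = dict(fields)
    pending.append((record_id, dict(fields)))
    return "accepted"


def synchronise():
    """Later, push each queued field back to the database that owns it."""
    synced = 0
    while pending:
        record_id, fields = pending.popleft()
        for field, value in fields.items():
            target = source_dbs[FIELD_OWNER[field]]
            target.setdefault(record_id, {})[field] = value
        synced += 1
    return synced
```

In a real system `synchronise` would run in the background on its own schedule; it is called explicitly here only to keep the sketch deterministic.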
There are various ways of ascertaining the update rate of a database. In some cases this information is directly available from a management or administration facility of the database. Another possibility is to look at the number of data fields that have changed, based on an update timestamp. In one particular embodiment of the present invention, the rate of update is determined from a log file of the database.
Thus typically each update of the database produces an entry (with timestamp) in the log file of the database. This then allows the update rate to be ascertained by counting the number of log entries corresponding to database updates over a set period.
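A minimal sketch of this log-based approach is given below. It assumes a hypothetical log format in which each line starts with an ISO 8601 timestamp followed by the operation name; the target of ten updates per upload and the interval limits are arbitrary illustrative values, not figures from the specification.

```python
from datetime import datetime


def update_rate(log_lines, window_start, window_end):
    """Return updates per hour, counted from log entries inside the window."""
    count = 0
    for line in log_lines:
        stamp, _, operation = line.partition(" ")
        when = datetime.fromisoformat(stamp)
        if operation.startswith("UPDATE") and window_start <= when < window_end:
            count += 1
    hours = (window_end - window_start).total_seconds() / 3600
    return count / hours


def upload_interval(rate_per_hour, target_updates_per_upload=10,
                    min_seconds=60, max_seconds=86400):
    """Choose an upload interval in seconds: busier sources are polled more often."""
    if rate_per_hour <= 0:
        return max_seconds
    seconds = target_updates_per_upload / rate_per_hour * 3600
    return max(min_seconds, min(max_seconds, seconds))
```

A frequently updated database therefore gets a short interval (bounded below by `min_seconds`), while a rarely updated one drifts towards the daily maximum.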
In general it is desirable that each upload operation only copies over data that has changed since the previous upload operation (in order to reduce bandwidth and processing requirements). There are various ways of identifying the data that has been updated since the previous uploading. In one embodiment, the recently updated data is identified from a log file. In particular, the log file is scanned for transactions that have updated the database since the last upload operation. The changed database entries resulting from these transactions can now be uploaded to the application system. In other implementations, it may be possible to identify recently updated data entries directly, for example, by examining a timestamp included in such data entries.
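The log-scanning variant can be sketched as below. Log entries are modelled as hypothetical `(timestamp, operation, key)` tuples, and only inserts and updates are handled; deletions would need separate treatment that this sketch does not attempt.

```python
def changed_keys_since(log_entries, last_upload):
    """Scan the log for keys written after the previous upload, without repeats."""
    changed = []
    seen = set()
    for timestamp, operation, key in log_entries:
        if (timestamp > last_upload
                and operation in ("INSERT", "UPDATE")
                and key not in seen):
            seen.add(key)
            changed.append(key)
    return changed


def incremental_upload(source, local_copy, log_entries, last_upload):
    """Copy only the changed entries from the source into the local copy."""
    keys = changed_keys_since(log_entries, last_upload)
    for key in keys:
        local_copy[key] = source[key]
    return keys
```

An entry that was updated several times since the last upload is copied once, with its current value, which keeps bandwidth proportional to the number of distinct changed entries.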
In many implementations, the uploading process is not only used to copy data from multiple database sources into the application system, but also to significantly enhance this data. There are a variety of ways in which such enhancement may be achieved. For example, the data can be cleansed before uploading, to remove incorrect or spurious values. Furthermore, consistency can be imposed across the different data sets retrieved from the various database systems of the heterogeneous data storage environment, such as by storing all address information in a common format. This can then allow linkages between the different data sets to be identified and saved. This enables clients to use the application system for searching across data from multiple data sources, a facility not generally available within the heterogeneous data storage environment, where each of the database systems is largely independent of the other database systems.
One particular form of linkage is established by comparing a first data entry from a first database with a second data entry from a second database. If certain predetermined criteria are satisfied, the first and second data entries are regarded as relating to the same underlying entity. As an example, different data sets may store lists of people in different contexts, for example, as past criminals, as victims of crime, as witnesses, etc. The application system may decide that any two people who share the same name and the same date of birth are in fact one and the same individual. Appropriate entries are then made with the saved data to record such a linkage.
A further possibility is to enhance the uploaded data by supplementing it with additional information not previously available from the relevant database(s). For example, the application system may identify the grid reference associated with each address, and then save this grid reference in conjunction with the address in question.
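The name and date-of-birth criterion can be sketched as follows. The normalisation step (lower-casing and collapsing whitespace) is one assumed example of imposing a common format before comparison; the record layout and data set names are hypothetical.

```python
from collections import defaultdict


def normalise_name(name):
    """Impose a common format: lower case with single spaces."""
    return " ".join(name.lower().split())


def link_people(datasets):
    """Group records from different data sets that share name and date of birth.

    `datasets` maps a data set name to a list of records, each with 'id',
    'name' and 'dob' fields. Returns one list of (data set, id) pairs per
    linkage that spans more than one data set.
    """
    groups = defaultdict(list)
    for source, records in datasets.items():
        for record in records:
            key = (normalise_name(record["name"]), record["dob"])
            groups[key].append((source, record["id"]))
    return [members for members in groups.values()
            if len({source for source, _ in members}) > 1]
```

Exact matching on two fields is the simplest possible criterion; a production system would likely need to tolerate misspellings and missing dates, which is outside the scope of this sketch.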
The grid reference can then be used to facilitate plotting the items associated with the address on a GIS (geographical information system) and elsewhere.
In one implementation, the application system supports multiple client platforms, such as desktop browsers, mobile phones, portable computing devices, in-car systems, and so on. In responding to a client request therefore, the reply is formatted in accordance with the type of client platform from which the request was received. For example, if the reply is being sent to a relatively small device, such as a mobile phone, the amount of graphics in the reply may be minimised. In addition, the presentation of various user interface items, such as selection lists, can be customised to the particular client platform (e.g. as dropdown menus, hierarchical selection screens, etc).
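Platform-dependent formatting of the same reply data might look like the sketch below. The two platform labels and the output formats (plain text for small devices, an HTML table otherwise) are illustrative choices, not formats named in the specification.

```python
from html import escape


def format_reply(records, platform):
    """Render the same records differently depending on the client platform."""
    if platform == "mobile":
        # Small device: compact plain text, no markup or graphics.
        return "\n".join(f"{r['name']}: {r['status']}" for r in records)
    # Default (e.g. desktop browser): an HTML table.
    rows = "".join(
        f"<tr><td>{escape(r['name'])}</td><td>{escape(r['status'])}</td></tr>"
        for r in records
    )
    return f"<table>{rows}</table>"
```

Because the formatting lives in the application system, adding a new client platform means adding one more branch here rather than changing each source database.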
Note that having the application system as an intermediary between the clients and the various database systems avoids having to provide separate support for multiple different client platforms on each database system of the heterogeneous data storage environment. Rather, support for any given client platform only has to be added once, to the application system itself, and this then provides access for the client platform in question to all of the database systems in the heterogeneous data storage environment.
It is important that the application system is responsive to client requests, even if the number of simultaneous users becomes large. One way of helping to achieve this is to maintain a pool of connections to the saved data in the homogeneous data storage environment. Accessing the saved data then includes allocating one of the connections to the application identified in an incoming request, which the application can use to access the appropriate data for satisfying the request. Having connections already open in this manner avoids the overhead of an application needing to open a new connection for each request, and so helps the system to respond promptly back to the clients.
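A pool of pre-opened connections can be sketched with the standard library alone. Here sqlite3 stands in for whatever database holds the saved data, and the class and method names are hypothetical. Note that each in-memory sqlite3 connection is a separate database, so a real deployment would point the pool at a shared database instead.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Keep a fixed number of connections open and lend them out per request."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets a connection be reused across threads.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=5.0):
        """Borrow a connection; it is returned to the pool on exit."""
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()
```

An application handling a request would wrap its data access in `with pool.connection() as conn:`, so no connection is opened or closed on the request path.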
A further performance benefit comes from the use of metadata to specify various aspects of system operation, such as the particular data that is to be uploaded from the heterogeneous data storage environment, and the way in which the data is to be presented in replies to the client requests. In one embodiment, this metadata is compiled into the various system routines themselves, rather than being accessed (as a data file) at run-time, thereby helping to optimise performance.
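One way to picture metadata being compiled into a routine is to generate source code from the metadata once, ahead of time, so that the field list becomes a literal inside the generated function. The sketch below does this with Python's built-in `exec`; the metadata layout and generated function name are assumptions, and the specification does not state which code generation technique is actually used.

```python
# Hypothetical metadata: which fields of a source table are to be uploaded.
METADATA = {"table": "custody", "fields": ["name", "cell"]}


def generate_upload_routine(meta):
    """Generate a routine with the metadata baked in as literals.

    Returns the compiled function and its source text. The generated routine
    never consults the metadata at run-time.
    """
    fields = ", ".join(repr(field) for field in meta["fields"])
    source = (
        f"def upload_{meta['table']}(record):\n"
        f"    return {{k: record[k] for k in ({fields},)}}\n"
    )
    namespace = {}
    exec(source, namespace)  # the metadata here is trusted, not user input
    return namespace[f"upload_{meta['table']}"], source
```

Calling the generated `upload_custody` routine on a full source record returns only the selected fields, with no metadata lookup on the request path.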
Another important aspect of the present approach relates to high availability.
In one particular embodiment of the invention, modifying an application included in the application system involves developing a modified version of the application, and then taking an existing version of the application off-line. At this point client requests arriving into the application system for the off-line existing application are rejected.
The modified version of the application can therefore be installed safely, before being marked as on-line. The next client request that is directed to the application (now marked as on-line) will then result in instantiation of the application. Note that this approach allows applications to be modified or upgraded on an individual basis, without having to take down the entire system. It will be appreciated that this is a significant advantage, especially for an application system that supports a relatively large number of applications.
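The off-line, install, on-line sequence with instantiation on the next request can be sketched as a small registry. The class, method names and reply layout are hypothetical.

```python
class ApplicationRegistry:
    """Track per-application on-line status and instantiate lazily."""

    def __init__(self):
        self._factories = {}   # name -> callable that creates the application
        self._online = {}      # name -> bool
        self._instances = {}   # name -> live application instance

    def install(self, name, factory):
        """Install (or replace) an application; drop any old instance."""
        self._factories[name] = factory
        self._instances.pop(name, None)
        self._online.setdefault(name, False)

    def set_online(self, name, online):
        self._online[name] = online
        if not online:
            self._instances.pop(name, None)

    def dispatch(self, name, request):
        """Reject requests for off-line applications; otherwise run them."""
        if not self._online.get(name, False):
            return {"status": "rejected", "reason": "application off-line"}
        if name not in self._instances:
            # First request after going on-line triggers instantiation.
            self._instances[name] = self._factories[name]()
        return {"status": "ok", "reply": self._instances[name](request)}
```

Only the application being upgraded is ever off-line; requests for every other application in the registry continue to be served throughout.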
A particular feature of the present approach is that it supports horizontal integration between different application systems (subject of course to appropriate security constraints). Accordingly, an application on a first application system can transmit a query request to a second application system. The reply received from this second application system is then compatible with the homogeneous data storage environment of the first application system, for ready integration of the results. This high degree of commonality and interoperability between different application systems enhances the value and power of the data contained in any individual system.
Note that in some circumstances, data may be uploaded from a first application system to a second application system for input into the homogeneous data storage environment of the latter. In this case, the data from the first application system becomes, in effect, locally available on the second application system. This approach is particularly suited where the data concerned is likely to be regularly accessed in general terms by clients of the second application system.
Alternatively, the second application system may only contact the first application system for the data in response to a particular client request. In this case, the data may not be stored in the homogeneous data storage environment of the second application system (or may only be cached there for a relatively short period).
This approach is particularly suited where data is provided in response to an ad hoc query that is rather unlikely to be repeated.
Another embodiment of the invention provides an application system for providing client access through one or more applications to data stored in a heterogeneous data storage environment having multiple separate database systems.
The system provides a refresh unit operable to upload selected data from the heterogeneous storage environment. The apparatus further comprises a data store, which contains the uploaded data in a homogeneous data storage environment structured in accordance with the applications. The apparatus further comprises at least one network interface, operable to receive client requests to access the saved data and for returning a reply to the client in response to the request. The application system further comprises a controller for forwarding a received client request to an application identified in the request. The identified application accesses data from the data store in order to generate a reply to the request.
Another embodiment of the invention provides a computer program product comprising instructions on a medium. The instructions are capable of causing a machine to run an application system to provide client access through one or more applications to data stored in a heterogeneous data storage environment. This is achieved by uploading selected data from the heterogeneous storage environment (comprising multiple separate database systems) into the application system, and saving the uploaded data in the application system into a homogeneous data storage environment structured in accordance with the applications. Client requests are received to access the saved data, each request identifying one of the applications.
The requests are forwarded to the identified application, which accesses the saved data in order to generate replies to the requests. These replies can then be returned to the clients in response to various requests.
It will be appreciated that the system and computer program product embodiments of the invention will generally benefit from the same particular features as the method embodiment of the invention. It will also be appreciated that the program instructions of computer program products are generally loaded from a memory into a processor for execution. Some or all of these software components may be pre-installed onto one or more hard disk units, or loaded off some portable storage medium, such as a magnetic tape, CD-ROM, DVD, etc. Alternatively, some or all of the software components may be downloaded via a transmission medium over a network. Note that software obtained from a transmission or portable storage medium may be saved onto a hard disk for subsequent reuse by the system, or may be loaded directly for execution into system memory.
Brief Description of the Drawings
Various embodiments of the invention will now be described in detail by way of example only with reference to the following drawings, in which like reference numerals pertain to like elements, and in which:
Figure 1 is a diagram illustrating a typical set of existing computing systems used by a police force;
Figure 2 is a high-level schematic diagram of a police computing system in accordance with one embodiment of the present invention;
Figure 2A is a high-level schematic diagram of a police computing system in accordance with one embodiment of the present invention;
Figure 3 illustrates in more detail the data source side of the police computing system of Figure 2;
Figure 4 illustrates in more detail the user interface side of the police computing system of Figure 2;
Figure 4A is a diagram illustrating the relationship between applications and saved data in the system of Figures 2 and 4;
Figure 5 illustrates one possible machine implementation of the police computing system of Figure 2;
Figures 6 and 7 are flowcharts regarding the operation of the police computing system shown in Figures 3 and 4;
Figure 8 is a flowchart illustrating the automatic generation of code in accordance with one embodiment of the invention; and
Figure 9 is a flowchart illustrating the insertion of a revised application into the police computing system of Figure 2.
Detailed Description
Figure 2 illustrates in high-level schematic form a computer configuration in accordance with one embodiment of the invention. The configuration is used to provide access to multiple data sources 11, which are typically heterogeneous in structure. Although Figure 2 illustrates only two data sources 11, there may be significantly more data sources in some implementations.
The data sources 11 are linked by a network 70 to an application or middleware system 200. Note that while Figure 2 shows both data sources 11 as being linked to application system 200 by a single network, in other embodiments the application system 200 may access different data sources 11 by different forms of network or data connection.
The application system 200 makes information from data sources 11 available to various clients 80A, 80B, 80C over networks 75A, 75B. For example, clients 80A, 80B may represent desktop PCs accessing the application system over a local area network (LAN) 75A, and client 80C may represent a portable handheld device accessing the application system over a mobile telephone network using the Wireless Application Protocol (WAP).
It will be appreciated that in a typical implementation, the number of clients may be very large, and they may be linked to the application system by a single network or by many networks, depending upon the particular set(s) of client devices to be supported. It will also be appreciated that data sources 11 may still be accessed via traditional mechanisms, such as illustrated in Figure 1, if this is considered appropriate for operational reasons.
The application system 200 provides a common access layer to data sources 11. This allows any client 80 to access data from any data source 11 via a standard or shared interface, thereby providing a degree of flexibility not present in existing configurations such as shown in Figure 1. In addition, it is possible to support queries and other data operations that utilise data from multiple different data sources 11. A further advantage of the configuration of Figure 2 is that the application system can be extended to support additional data sources 11 without requiring modification to clients 80, and likewise to support access by new forms of client 80 without requiring modification of data sources 11.
A key aspect of the application system 200 relates to performance. Thus it is very important that clients 80 have rapid access to data from data sources 11, and moreover that the architecture is scalable to support potentially a very large number of clients (perhaps hundreds or even thousands of users). One might expect the presence of the application system 200 to slow down access to the data, since it represents in effect an additional system interposed between the data and the users. In other words, there is now an extra system for the data to pass through en route from source to destination (compared to the configuration of Figure 1), and prima facie this would tend to slow down the provision of data to clients 80.
In order to address this problem, the application system 200 includes its own database 220, which is used to store at least portions of data from data sources 11.
Having a local copy of data within application system 200 thereby helps to ensure that timely access is available for clients 80, since data requests can now be supplied directly from information internal to application system 200, without having to forward the request to the original data source(s) 11.
The operations of the application system 200 may be broken down into two main logical components, namely a back end 225 and a front end 226. The former is responsible for uploading and replenishing the data in database 220 from the various data sources 11, while the latter is responsible for providing access to the data in database 220 for clients 80.
Note that in the illustrated embodiment it is also possible for information to flow in the reverse direction to that described above, namely from client 80 through to data source 11. Thus client 80 may interact with the front end 226 in order to update or enter some particular piece of data. The front end then stores this information in database 220, from where back end portion 225 transfers it back into the relevant data source 11. Back end portion 225 can therefore be regarded as responsible for a two-way synchronization of data sources 11 and database 220.
Figure 3 illustrates the application system 200 in more detail, in particular the back end portion 225 responsible for interaction with data sources 11. In the example of Figure 3, the application system is being used to integrate access to the diverse existing police database systems 11 of Figure 1, in particular crime recording system 11A, custody system 11B, and police national computer system 11C. The application system 200 is attached to these various existing systems 11A, 11B and 11C by an appropriate data connection 71, 72, 73. In the embodiment of Figure 3 these data connections correspond to those used for normal user access (as shown in Figure 1), but any suitable form of data connection or access mechanism could be used to link back end portion 225 with the various data sources 11.
The operations of the back end portion 225 of the application system 200 are driven by a refresh unit 215. The refresh unit 215 includes or has access to a set of metadata 212. This metadata 212 describes all of the data structures present in the various individual computer systems 11A, 11B, 11C attached to the application system 200.
The back end 225 also includes multiple adapters 210A, 210B, and 210C (typically there is one adapter for each data source 11). Each adapter provides an appropriate protocol stack for interacting with its respective data source 11 in order to extract the desired information (as specified by metadata 212) for upload to database 220. Thus adapter 210A is used to access data from the crime recording system 11A over LAN 71, adapter 210B is used to access data from custody system 11B over fixed line 72, and adapter 210C is used to access data from the police national computer system 11C over WAN 73 and dial-up link 73A.
The nature of the adapters 210A, B, C varies depending on the particular database system used for the relevant police computer system. For example, the adapter software may emulate application software that would normally be used to access a database from a desktop PC (such as desktop PC 81 in Figure 1). Other adapters may be provided by various database manufacturers as a tool to allow application programmers to access records within the database system (e.g. such an adapter is available from Oracle Corporation in respect of its databases, such as Oracle 8). Thus depending upon the particular nature of the existing database systems 11A, 11B, 11C, the adapters 210A, B, C may be already commercially available.
Alternatively, in some other cases the adapters may have to be specially prepared based on knowledge of the relevant database.
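By way of illustration only, the role of an adapter might be sketched as follows. The `Adapter` interface, the `fetch` method and the in-memory stand-in are all hypothetical names introduced for this sketch; a real adapter would wrap whatever access mechanism the relevant database system provides.

```python
from abc import ABC, abstractmethod


class Adapter(ABC):
    """Hypothetical interface wrapping access to one data source."""

    @abstractmethod
    def fetch(self, table, fields):
        """Return rows from `table` as dicts restricted to `fields`."""


class InMemoryAdapter(Adapter):
    """Stand-in adapter backed by a dict of table name -> list of rows."""

    def __init__(self, tables):
        self._tables = tables

    def fetch(self, table, fields):
        # Only the fields requested (e.g. as listed in the metadata) are returned.
        return [{f: row[f] for f in fields} for row in self._tables[table]]
```

Because every adapter presents the same interface, the refresh unit would not need to know which access protocol sits behind a given data source.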
Considering now metadata 212 in more detail, this specifies the source and target attributes for the data upload or synchronization process. The source attributes for each database system 11 detail the tables present, the fields in a table, as well as the type and length of each field (e.g. integer, X characters, etc). This metadata can then be used by the refresh unit 215 to access the data in the diverse existing systems 11A, 11B and 11C and to upload it into the database 220. The target attributes then specify how the data is to be stored into the application system database (again in terms of tables, fields, format, etc.).
The metadata 212 can further be used to specify those particular data fields that do (or do not) need to be uploaded into the database 220. For example, existing database systems 11 generally include fields that only have internal meaning, perhaps for linking together tables. Since this data does not have any external meaning or use, it does not have to be uploaded into database 220.
In addition, many database systems 11 incorporate information that is no longer of interest or has not been properly maintained. For example, records may have been updated so as to be Y2K compliant, and a field provided to indicate the date when such compliance was achieved, and the name of the contractor involved. It will be appreciated that such data is not now of relevance for general operational purposes. Similarly, the crime recording system may include a field indicating a file number for holding the paperwork associated with a particular record, along with a field representing a cabinet where this file is stored. However, the cabinet information may not have been entered for most records, or may now be too inaccurate to be useful, perhaps due to subsequent file or cabinet rearrangements. The metadata 212 can then specify that these fields can be omitted from the records stored in database 220.
In addition to discarding certain fields from records prior to loading into database 220, it may also be desired to exclude certain records. For example, there may be a cut-off date from crime recording system 11A in terms of the age of the crime (which may vary according to the type of crime, so that more serious crimes are perhaps uploaded even if relatively old). Any such selection of records from a data source 11 is again performed in accordance with appropriate parameters maintained in metadata 212.
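The field and record selection described above could be sketched as follows. This is a minimal illustration only: the dictionary layout of the metadata, the field names, the cut-off ages and the function `select_for_upload` are assumptions made for the example and are not taken from the described embodiment.

```python
# Hypothetical metadata for one data source: the fields to keep, and a
# cut-off age in years per crime type (None means no age limit).
METADATA = {
    "keep_fields": ["crime_ref", "crime_type", "address", "reported"],
    "max_age_years": {"burglary": 5, "murder": None},
    "default_max_age_years": 3,
}


def select_for_upload(records, metadata, today):
    """Yield pruned copies of the records that pass the cut-off rules."""
    for rec in records:
        limit = metadata["max_age_years"].get(
            rec["crime_type"], metadata["default_max_age_years"])
        age_years = (today - rec["reported"]).days / 365.25
        if limit is not None and age_years > limit:
            continue  # record is too old for this type of crime
        # Drop every field not listed in the metadata.
        yield {f: rec[f] for f in metadata["keep_fields"]}
```

Here `today` and each `reported` value are expected to be `datetime.date` objects. Changing which fields or records are pulled into the application system then only requires an update to the metadata, not to the selection code.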
Note that the dropping of fields from database records can be performed either by uploading only the desired part of a record from the original data source 11, or by uploading the whole record and then discarding those fields not required within back end portion 225 (or a combination of both techniques). Likewise, the dropping of records may be performed either at data source 11 and/or at the back end portion 225 of the application system 200. The exact approach adopted may differ from one data source to another (and for dropping fields or records), depending on the particular circumstances. Relevant factors to be taken into consideration include the proportion of fields or records to be discarded, the available functionality in the computer holding the original data source, the type of communications link to the database source 11, the refresh rate for the data, and so on. In most situations, it is generally more efficient to perform the data selection or pruning at the original data source 11, thereby reducing the volume of traffic over the connection 71, 72, 73 between the data source 11 and the application system 200.
Note that any data that is not uploaded into the application system 200 is not lost, since it is retained in the original data systems 11. However, this data is not available via the application system 200. Nevertheless, should more general access to this data be desirable at some later date, the metadata 212 can be updated so that the data in question is now pulled into the application system 200.
Figure 2A illustrates a variation on the embodiment of Figure 2, in which database 220 is replicated within the application system 200. In particular, there is a first copy of the database 220A associated with the back end portion 225, and there is a second copy of the database 220B associated with the front end portion 226. In operation, data from the data sources 11 is first transported by the back end portion 225 into database 220A, and then copied by the front end portion 226 from database 220A into database 220B.
Metadata 212A is used to determine the data copying performed by the back end portion 225, as well as describing the contents of database 220A, while metadata 212B is used to determine the data copying performed by the front end portion 226, as well as describing the contents of database 220B. Note that database 220B need not be an exact replica of database 220A, but rather its contents can be matched to particular application requirements. In this context, database 220B can be regarded as a set of one or more database caches particularly appropriate to the query forms of the applications in the application system 200.
Figure 2A also illustrates that the application system 200 may have a connection to one or more other application systems 200N that typically belong to other police forces. Thus as discussed in relation to Figure 1, in existing police computer systems there is little or no facility to exchange data between different regions. However, with the architecture of Figure 2A, not only does the application system 200 provide a vertically integrated system, allowing multiple diverse clients 80 to operate with multiple diverse data sources 11, but it also provides a form of horizontal integration by virtue of the interconnectivity between different application systems 200, 200N. Consequently, clients 80 attached to one application system 200 are able to access information held by data sources associated with one or more other application systems 200N (subject to the appropriate security authorizations).
Figure 6 presents a flowchart illustrating the general operation of the refresh engine of Figure 3. The method commences with a data capture operation between the application system 200 and a data source 11 (step 610). An important aspect of the application system is the need for the information in database 220 to be kept current. Thus the underlying data sources 11 are generally being updated on a constant basis as a result of new police operations, events, and so on. This updating may be performed using legacy terminals (such as shown in Figure 1). Accordingly, the data capture of step 610 from data sources 11 into database 220 is not a one-off operation, but rather an ongoing synchronization process.
Data capture operations typically occur at least once per day for each data source 11, and are driven by the refresh unit 215 in accordance with predetermined criteria (which may vary from one data source 11 to another, depending on the nature of the information involved). Thus the timing of the uploads may be at regular intervals (e.g. every night), or contingent (at least partially) upon activity at the data source concerned. For example, the refresh unit 215 may poll a data source 11 (say every few minutes) to determine the level of activity at the data source. This can then be used to trigger an upload whenever the activity level passes a certain threshold (typically subject to the further condition of a maximum time between updates, irrespective of the level of activity). Note that an indication of the level of activity may be directly available from some database systems, such as from a management information interface, but for other systems the level of activity may have to be estimated (such as from the size or a review of the log file).
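The activity-based trigger just described might be expressed along the following lines. The function name, the units of the activity level and the default threshold values are assumptions chosen purely for illustration.

```python
def should_upload(activity_level, seconds_since_last_upload,
                  activity_threshold=100, max_interval_seconds=24 * 3600):
    """Decide whether the refresh unit should trigger an upload now."""
    if seconds_since_last_upload >= max_interval_seconds:
        # Maximum time between updates reached, irrespective of activity.
        return True
    # Otherwise upload only once activity has passed the threshold.
    return activity_level > activity_threshold
```

A polling loop would call such a function every few minutes for each data source, with thresholds that may differ from one source to another.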
For a given update it is generally most efficient if only those records that have changed since the last update are uploaded (if possible). In other words, there is no need to upload unchanged records, since they should already be present in the application system database 220. There are a variety of possible mechanisms for identifying records that have recently been updated. For example, in many systems database records are marked with a timestamp to indicate when they were last changed. In this situation, any record whose timestamp indicates that it has been updated since the last upload needs to be selected for (re)upload on this occasion.
Another strategy is to review the log file, and from this identify all recently updated records that need (re)copying across to the application system 200.
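As a simple sketch of the timestamp-based approach, assuming each record carries a hypothetical `last_changed` field holding a `datetime` value:

```python
def changed_since(records, last_upload):
    """Return the records whose timestamp is later than the last upload."""
    return [r for r in records if r["last_changed"] > last_upload]
```

Where a data source supports it, the same comparison would more usually be pushed into the query sent to that source, so that unchanged records never cross the data connection.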
Once the data has been transferred from the original data source 11 into the application system 200, this allows a copy layer to be created (step 620). This copy layer can be regarded as a temporary data structure to hold the incoming data prior to storing it into the application system repository of database 220 (step 640). The copy layer reflects in part the arrangement of existing data sources 11A, 11B, 11C in the way in which the data is held. In contrast, the data structures in the database 220 itself are generally quite different from those of data sources 11A, 11B, 11C.
Accordingly, before the data is stored from the copy layer into database 220, it undergoes various forms of transformation (step 630). There are several motivations for this transformation, including: enhancing data quality by performing validation and cleansing operations; bringing different data sets into consistency with one another in order to hide or compensate for the heterogeneous nature of the underlying data sources; deriving added value from the multiple available data sets by identifying linkages between them; adopting a standardised format to facilitate information exchange between different application systems; and improving responsiveness to clients 80 by appropriate structuring of information within database 220.
Metadata 212 is typically used to specify the various transformations to be applied to data uploaded from the different data sources 11 in accordance with business and application logic. For example, if different databases 11A, 11B, 11C store the same sort of information in different formats, then the metadata 212 generally imposes a uniform format for such data. Data is then transformed (where necessary) into this uniform format before entry into database 220. For example, some data sources 11 may store a postal (ZIP) code as part of an address string, while other data sources may store this item as a separate field. Assuming that the application system decides to adopt the latter format throughout, then records from those systems 11 that do not originally have this format must have an additional field inserted in database 220 to hold the postal code. The address string for such records is then parsed to strip out the postal code (if present), which is now entered instead into the newly created field.
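The parsing step could look something like the sketch below. It assumes, purely for illustration, US-style ZIP codes (five digits with an optional four-digit extension) appearing at the end of the address string; other postal code formats would need a different pattern.

```python
import re

# Assumption: a US-style ZIP code at the very end of the address string.
ZIP_RE = re.compile(r"[,\s]*\b(\d{5}(?:-\d{4})?)\s*$")


def split_postal_code(address):
    """Return (address_without_code, postal_code), with None if no code found."""
    match = ZIP_RE.search(address)
    if match is None:
        return address.strip(), None
    return address[:match.start()].strip(), match.group(1)
```

Records that already hold the postal code in a separate field would bypass this step.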
As another example, multiple data systems 11A, 11B, 11C may include a record field representing the number of a police officer, but this may be stored in some systems 11 as an integer, and in others as a character string (where the relevant integer and/or character lengths may possibly vary from one data source to another).
In these circumstances, the metadata typically specifies a common storage format (say an 8-character string) to be used for this field in all forms of record. Accordingly, when a record is uploaded from a data source that has a different format for this field, the appropriate transformation is performed (such as from an integer to the desired 8-character string). It will be appreciated that this commonality of format then greatly facilitates searching across data sets from multiple original sources 11A, 11B, 11C (as well as across different application systems 200N).
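A conversion of this kind might be sketched as follows. Padding with leading zeros is an assumption made for the example; the text only calls for a common 8-character string.

```python
def normalise_officer_number(value, width=8):
    """Convert an integer or string officer number to a fixed-width string."""
    text = str(value).strip()
    if len(text) > width:
        raise ValueError(f"officer number too long: {text!r}")
    # Assumption: shorter values are left-padded with zeros.
    return text.zfill(width)
```

With every source normalised in this way, an officer number stored as the integer 4521 in one system and as the string "4521" in another compare equal after upload.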
The metadata 212 may also be used to specify certain edit checks to help ensure validity of the data; e.g. a release date must be no earlier than a corresponding incarceration date. If any problems are found, the back end 225 may choose not to upload or store the data in question, or to flag it with some indicator of unreliability, as well as typically generating an error message to a log file.
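The edit check given as an example above could be written along these lines. The record layout and the field names `release_date` and `incarceration_date` are hypothetical.

```python
def check_custody_record(record):
    """Return a list of validation error messages (empty if the record passes)."""
    errors = []
    release = record.get("release_date")
    incarcerated = record.get("incarceration_date")
    # A release date must be no earlier than the incarceration date.
    if release is not None and incarcerated is not None and release < incarcerated:
        errors.append("release date earlier than incarceration date")
    return errors
```

The back end could then skip, flag or log any record for which the returned list is not empty.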
An important aspect of holding information from multiple data sources 11 in a single system is that it is now possible to exploit linkages between the different data sources. In some cases, support for such linkages may already be included in the data stored in the existing systems. However, the use of such linkages was previously often prevented or restricted by the disjointed nature of existing systems.
For example, each recorded crime may be given a reference number in the crime recording system 11A (see Figure 1), while for each suspect detained in the custody database 11B, there may be a listing of crimes for which they are being held.
The listing in the custody system 11B may use the same reference number to specify a crime as the crime recording system 11A. Accordingly, the data transformation of step 630 establishes formal linkages (e.g. pointers) between the crimes listed in the custody data and the full crime records for those same crimes as detailed in the crime recording data. This then allows applications using the application system 200 to exploit data pulled from both systems - e.g. the system may have a facility to query whether anyone is currently in custody in relation to a crime at a particular address.
In contrast, in the existing systems of Figure 1, the above operation would typically have required a two-stage process. Firstly, the crime recording system 11A would be interrogated to find a crime reference number associated with the address, and then secondly this crime reference number would be entered into the custody system 11B to see if any suspect was being held for the crime. Note however that the existing systems of Figure 1 do not have any facility to automatically exploit the commonality of the crime reference number between the two systems.
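Once the crime reference numbers have been linked, the custody-by-address query mentioned above reduces to a single operation. The following sketch uses hypothetical record layouts (`crime_ref`, `address`, `crime_refs`, `name`) and an exact string match on the address.

```python
def build_crime_index(crime_records):
    """Map each crime reference number to its full crime record."""
    return {c["crime_ref"]: c for c in crime_records}


def in_custody_for_address(address, custody_records, crime_index):
    """Return names of detainees held for any crime at the given address."""
    names = []
    for detainee in custody_records:
        for ref in detainee["crime_refs"]:
            crime = crime_index.get(ref)
            if crime is not None and crime["address"] == address:
                names.append(detainee["name"])
                break  # one matching crime is enough for this detainee
    return names
```

The index plays the part of the formal linkage: the custody data refers to crimes by reference number, and the index resolves each reference to the full crime record.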
The data transformation of step 630 can also be made to deduce new linkages between (or within) data sets. For example, many fields in a police data system identify people in one capacity or another, such as suspects, victims, detainees, etc. The data transformation procedure includes rules to determine when two individual identities should be regarded as both relating to a single person (typically this occurs if there are common first and last names, and also a common date of birth). Such information can then be formally encoded into the application system 200. Typically this is achieved by assigning a unique key to the relevant data item, which can then be utilised or referenced elsewhere in the database where the same data item is being stored.
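The identity-matching rule described above (common first name, last name and date of birth) might be sketched as follows. The field names and the sequential integer keys are hypothetical, and ignoring letter case and surrounding whitespace when comparing names is an additional assumption.

```python
from itertools import count


def assign_person_keys(identities):
    """Give identities that share names and date of birth the same person key."""
    next_key = count(1)
    keys = {}
    for ident in identities:
        # Assumption: names are compared ignoring case and surrounding spaces.
        match_on = (ident["first_name"].strip().lower(),
                    ident["last_name"].strip().lower(),
                    ident["date_of_birth"])
        if match_on not in keys:
            keys[match_on] = next(next_key)
        ident["person_key"] = keys[match_on]
    return identities
```

A suspect in one data set and a victim in another would then carry the same `person_key` whenever the rule treats them as a single person.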
A further aspect of the data transformation 630 is that, as previously mentioned, records in existing systems 11 are frequently split into tables stored on different disk units (such as 51A, 51B, and 51C), thereby minimising total disk capacity required. In contrast, the integrity of records in the database 220 is preserved as much as possible (rather than splitting different fields from the same record into different tables). It will be appreciated that from a logical perspective, this does not alter the contents of the database, but it can have a significant impact from a performance standpoint. In particular, access time to a record can be greatly reduced.
Thus instead of having to access first one disk drive, then another, and then potentially another (etc.) to reconstruct a record split across various tables, the application system database 220 generally allows a complete stored record to be retrieved in a single read operation. This can lead to a significant improvement in the responsiveness of the application system 200 compared to existing systems. This aspect of the application system is described in more detail below with reference to Figure 4A.
As illustrated in Figure 3, apart from existing data sources 11A, 11B, and 11C, the application system 200 also supports input from additional data sources 245.
These additional data sources may be accessed over any suitable connection, such as the World Wide Web, a local area network (LAN), an intranet, and so on, and may represent stored data (whether in a database, individual documents, etc) or data generated on the fly.
One typical category of additional data sources 245 is other application systems 200N belonging to other police systems. This provides a standardized mechanism for sharing data between police forces (subject to appropriate security checks, etc), thereby providing a form of horizontal integration. Note that depending on the circumstances, data access to other application systems may also be handled by the front end portion 226 rather than the back end portion 225. Thus the former is generally appropriate for responding to ad hoc data queries, while the latter may be used to systematically integrate selected data from remote systems 200N into the local database 220.
Another typical category of data from additional sources 245 represents supplementary information that is not directly available from existing data sources 11.
In some cases, this arises because certain legacy systems are unable to store particular types of information. For example, if database system 52 is unable to hold images, it may be decided to hold such images on a separate system. Then, when the records from database system 52 are uploaded to the back end 225, the corresponding images can be pulled from the additional data sources 245 (assuming that the appropriate cross referencing is available). This then allows a complete record (image plus original data) to be generated for storage in database 220.
Another example of supplementing the data from sources 11 is to determine a grid reference for each address (typically by using a third party system to perform the desired conversion by lookup, interpolation etc). This grid reference can then be stored in conjunction with the address information in database 220, and greatly facilitates on-line mapping and other such utilization of the data. For example, officers in a vehicle may be able to access a map of a certain region, onto which the locations of certain crime scenes and a suspect's house can now be superimposed.
Note that although the flowchart of Figure 6 has concentrated on data upload from the existing data systems 11 into the application system 200, in a preferred implementation there is a full two-way data synchronization. Thus the application system 200 provides (properly authorized) clients 80 not just with read access to the data in database 220, but also update access. Accordingly, the refresh process is bidirectional, in that updates made into database 220 must also be returned (downloaded) as appropriate to the original data systems 11 (typically as a change set specifying a list of deltas). It will be appreciated that this download may involve restructuring or reformatting the data in the opposite manner to that performed on upload from data systems 11 to database 220. Note that the download to data sources 11 may be performed in conjunction with an upload from data sources 11 (i.e. as a two-way synchronization operation), or may be separately scheduled (such as whenever the number of updates made to the database 220 exceeds a given threshold).
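One simple way to represent a change set of deltas for a single record is sketched below. The tuple layout `(field, old_value, new_value)` is an assumption for illustration; the text does not specify the format of the change set.

```python
def build_change_set(original, updated):
    """Return a list of (field, old_value, new_value) deltas for one record."""
    deltas = []
    for field in sorted(set(original) | set(updated)):
        old, new = original.get(field), updated.get(field)
        if old != new:
            deltas.append((field, old, new))
    return deltas
```

Only the fields that differ are returned to the original data system, after any reverse reformatting needed to match that system's own storage format.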
Turning now to the front end 226 of the application system 200, this is illustrated in Figure 4. The application system may be accessed by multiple clients 80A, 80B, 80C, 80D and 80E over one or more networks 75A, 75B. Typically network 75A represents the Internet (or some other form of intranet or extranet running the TCP/IP protocol). In this case clients 80A, 80B, 80C generally use conventional web browsers such as Internet Explorer from Microsoft Corporation in order to access the application system 200. This therefore avoids the need for clients 80A, 80B, 80C to have any specialized software installed for this purpose (in contrast to the situation illustrated in Figure 1 for accessing data sources 11 directly).
Network 75B may represent a mobile telephone network or similar, in which case clients 80D and 80E typically represent mobile handsets supporting the wireless application protocol (WAP) or some other suitable mobile data format.
The front end portion 226 also includes multiple applications 450A, 450B, 450C, which can be regarded in some respects as sophisticated search engines for performing various operations on database 220. In one embodiment of the application system 200, each application is written in the Java programming language, and runs within its own Java virtual machine (VM) (Java is a trademark of Sun Microsystems, Inc).
An application 450A, 450B, 450C may broadly correspond to one of the existing data systems 11, such as crime recording system 11A, or alternatively it may represent a new application in the environment of application system 200 that does not have any counterpart in the existing systems. These new applications may be used, for example, to exploit the ability of the application system 200 to search across or to update data from multiple existing systems 11A, 11B, 11C.
Other applications 450N may also exploit the database 220 without being formally included in the application system 200 itself (i.e. these applications do not generally access database 220 through front end portion 226). As an example, front end portion 226 may largely be designed to support operational queries from officers in the field. In contrast, an application 450N might represent some management information system, designed to review statistically the performance of different aspects of the police service - e.g. crime clear-up rates, stolen property recovery rates, and so on. It is generally much easier for such applications 450N to access data from the database 220 rather than the existing data sources 11, given the homogeneous and cleansed data of the former.
Figure 7 is a flowchart illustrating the way in which the front end portion 226 of the application system 200 handles requests for data from clients 80. These requests are received over the relevant networks, and are directed at a particular server in accordance with load balancing requirements (step 710). (Note that this load balancing is discussed in more detail below with reference to Figure 5).
Accordingly, the request now arrives at a server component 410 of front end portion 226. It will be appreciated that this server component may have multiple front end portions in order to interface with the different network formats and the different communications protocols being utilised (e.g. network 75A, network 75B). The server component 410 then passes any incoming requests down to a servlet connector 420, which establishes a session per set of one or more requests from a client (step 720). In one particular implementation, the server 410 and the servlet connector 420 are provided by the Apache program (available from www.apache.org) running in a Linux environment (see www.linux.org), but the skilled person will be aware of other possible implementations for these components.
Incoming requests from clients 80 are then passed from the servlet connector 420 through to the XML producer component 440, which in one embodiment is implemented using the Cocoon program from Apache (again available from www.apache.org). Note the Cocoon program also includes an XML translation component 430 (this is shown in Figure 4 with a dashed line, since it is only pertinent to outgoing rather than incoming communications).
For an initial request in a session, the producer 440 interacts with controller software 460 in the application system to perform an authorization check, for example to confirm that a user has entered a correct password (step 730). The controller makes such checks by interacting with a system security layer (not shown in Figure 4).
Assuming that the client is indeed authorised, the controller software returns an identifier that can then be used for future communications in this session in lieu of repeating the full authorization procedure.
Note that the controller 460 therefore provides a single point of security control for the application system 200. This obviates the need to maintain separate security information and checks in each of the existing data sources 11. Rather, such existing data sources only need to be configured to give access to the application system itself, which can then control usage at the level of individual clients.
The controller also performs certain other validation of the request, such as checking, for example, that the desired application exists on the system. This typically involves identifying the appropriate application 450A, 450B, 450C to handle the incoming request, generally on the basis of information included in the request.
For example, a user request may specify a Universal Resource Locator (URL) corresponding to a particular application, or may identify a particular desired application in an HTML form. The incoming request fails the validation by the controller 460 if the specified application does not exist within the application system 200.
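By way of illustration only, the authorisation check of step 730, the session identifier returned by the controller, and the validation of the requested application might be sketched in Java as follows. The class name ControllerSketch, the in-memory password map (a stand-in for the system security layer, which a real system would not hold as plain text) and the method names are all assumptions made for this example; they are not taken from the described embodiment.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of controller 460: one full login check, then session identifiers. */
class ControllerSketch {
    private final Map<String, String> passwords;   // user -> password; stand-in for the security layer
    private final Set<String> installedApps;       // names standing in for applications 450A, 450B, ...
    private final Map<String, String> sessions = new ConcurrentHashMap<>(); // session id -> user

    ControllerSketch(Map<String, String> passwords, Set<String> installedApps) {
        this.passwords = passwords;
        this.installedApps = installedApps;
    }

    /** Full check on the initial request; returns a session identifier on success. */
    Optional<String> authorise(String user, String password) {
        if (password == null || !password.equals(passwords.get(user))) {
            return Optional.empty();
        }
        String sessionId = UUID.randomUUID().toString();
        sessions.put(sessionId, user);
        return Optional.of(sessionId);
    }

    /** Later requests pass only if the session is known and the named application exists. */
    boolean validate(String sessionId, String appName) {
        return sessions.containsKey(sessionId) && installedApps.contains(appName);
    }
}
```

In this sketch a request naming an application that is not installed fails validation in the same way as a request carrying an unknown session identifier, which mirrors the single point of control described above.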
Assuming that the incoming request is valid, the controller 460 notifies the producer of the target application. The producer then forwards the request to the appropriate application (step 740), which queries the relevant data stored in database 220 in accordance with the client request (step 750). In some cases this may involve a further data-specific security check between the application and the controller 460. In other words, having originally checked that the client is a legitimate user of the system as a whole, the application may now check that the client is authorised to access the particular data requested (including whether such permission is restricted to read only).
In order to facilitate communications between the applications 450 and the database 220, in one implementation the controller maintains one or more pools of open connections to the database 220. Typically the different pools have different security privileges in terms of accessing data in the database. An application can then request such a connection from a pool in order to access the database (this avoids the overhead of the application having to open a new connection for each request). In response, the controller returns a connection having the appropriate privileges (i.e. from the appropriate pool) to the application, which then uses the connection to process a query with respect to the database.
When the application has finished processing the client request, it returns the connection back to the pool (via the controller), where it becomes available again for other applications to use. The controller is responsible for monitoring the usage of connections from the pool(s), and for creating new connections if the number of free connections in a pool drops below a certain threshold.
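The pooling behaviour just described can be sketched, under stated assumptions, as a small Java class representing one pool (a controller would hold one such pool per privilege level). The names PoolSketch, acquire, release and threshold are hypothetical, and the connection type is left generic so that the example does not depend on any particular database driver.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical sketch of one privilege-specific pool of open connections. */
class PoolSketch<C> {
    private final Deque<C> free = new ArrayDeque<>();
    private final Supplier<C> opener;   // opens a connection with this pool's privileges
    private final int threshold;        // assumed minimum number of free connections

    PoolSketch(Supplier<C> opener, int initialSize, int threshold) {
        this.opener = opener;
        this.threshold = threshold;
        for (int i = 0; i < initialSize; i++) {
            free.push(opener.get());
        }
    }

    /** Hands an already open connection to an application, then tops the pool up if it runs low. */
    synchronized C acquire() {
        if (free.isEmpty()) {
            free.push(opener.get());
        }
        C connection = free.pop();
        while (free.size() < threshold) {
            free.push(opener.get());
        }
        return connection;
    }

    /** Called when the application has finished with the connection. */
    synchronized void release(C connection) {
        free.push(connection);
    }

    synchronized int freeCount() {
        return free.size();
    }
}
```

The application never opens a connection itself in this sketch; it only borrows and returns one, which is the overhead saving referred to above.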
Once the application has retrieved the requested data from the database, it is returned back to the XML producer 440. The XML producer 440 constructs an appropriate response for sending back to the client using extensible markup language (XML) (step 760). More particularly, the XML producer 440 maintains a set of XML forms that represent the outline of a response. These forms are then completed with the data received from the application 450. The overall contents and presentation of the response are governed by metadata 212B.
Once the XML producer 440 has created the raw response in XML for returning to the client 80, a style sheet is generally applied. This is used to control the appearance of the data, and can be device dependent; for example, if the data is being sent to a device having a small screen, any optional graphics components may be omitted. The response is also processed by the XML translation component 430, which converts from XML into an appropriate format for the client device in question.
For example, if the client device 80 is running a standard web browser, then layer 430 translates from XML into HTML. Alternatively, the XML output from producer 440 may be translated by component 430 into WAP or any other supported format. (In one implementation, another supported output format is PDF, which is useful for FAXing, printing, etc). The appropriately formatted data is now ready for returning back to the client that originally made the relevant request via servlet component 420 and server 410 (step 770).
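As a minimal sketch of device-dependent translation, the following Java example applies one of two XSLT style sheets to the same XML response using the standard javax.xml.transform API. It is not the Cocoon configuration of the described embodiment; the class name TranslationSketch, the format keys "html" and "text", and the element names result and record are assumptions chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch of translation component 430: same XML, different output per client. */
class TranslationSketch {
    // Builds a tiny style sheet that emits perRecord once for each <record> element.
    private static String sheet(String method, String perRecord) {
        return "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"" + method + "\" indent=\"no\"/>"
             + "<xsl:template match=\"/result\">"
             + "<xsl:for-each select=\"record\">" + perRecord + "</xsl:for-each>"
             + "</xsl:template></xsl:stylesheet>";
    }

    // One style sheet per supported client format (illustrative formats only).
    static final Map<String, String> STYLESHEETS = Map.of(
        "html", sheet("html", "<p><xsl:value-of select=\"@name\"/></p>"),
        "text", sheet("text", "<xsl:value-of select=\"@name\"/><xsl:text>;</xsl:text>"));

    /** Translates the raw XML response into the format registered for the client. */
    static String translate(String xml, String format) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEETS.get(format))));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("translation failed", e);
        }
    }
}
```

The point of the design is that the application and the XML producer need not know which device made the request; only the choice of style sheet changes.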
The intended use of the application system 200 implies that performance is a key concern. In other words, if a police officer acting as a client 80 sends a request to the system it is important that a response is returned promptly, otherwise the operational effectiveness of the officer may be compromised. Moreover, timeliness of response must be maintained even if a large number of clients are all interacting with the application system 200 at approximately the same time. A further important aspect of the application system is robustness. In other words, the system must be very reliable to ensure as near continuous data availability as possible.
Figure 4A illustrates in more detail the structure of data as held in the application system 200. In this embodiment, the database 220B associated with the front end portion 226 acts to some extent as a cache for applications 450A, 450B, 450C. More particularly, the contents of database 220B are structured to reflect the specific requirements of applications 450A, 450B, 450C that are using the database 220B. In other words, database 220B does not have a generic configuration, but rather has multiple slices 227A, 227B, 227C, each corresponding to a different application. The data within a slice is tailored to the needs of the corresponding application. For example, fields that are not utilised by queries from application 450A need not be stored within the corresponding slice 227A. Moreover, the configuration of the tables and fields within a particular slice 227 can be matched to the kinds of queries generated by the corresponding application 450.
Such an approach helps to optimise the response time for application queries, since the size and number of tables to be searched in handling the query are minimised. Note that there can be some duplication in terms of slice contents, in that the same data may be stored in multiple different slices for access by different respective applications. However, as previously indicated, the general emphasis for database 220B (and system 200 overall) is to maximise performance rather than to minimise storage requirements. (Note that storage space is conserved anyway in database 220 by omitting data from a slice 227 if it is not needed by the corresponding application).
Figure 5 illustrates the machine layout of an implementation of the application system in accordance with one embodiment of the invention. The general architecture is based on multiple components that are able to operate largely independently of one another. This ensures robustness, in that the system can continue to operate even if one of the components fails. Furthermore, the architecture is highly scalable, in that additional components can be readily added to the configuration until the desired level of performance has been obtained.
The system is comprised of multiple nodes 530J, 530K, 530L, 530M, which are connected to one another by Ethernet 520. Each node represents in effect a complete application system 200, including a back end portion 225, a front end portion 226, and its own database 220, although, in one particular implementation, only one node (nominated as a master) actively uploads data from existing data systems 11. In other words, the back end portion 225 of the application system is only operational on one node (the master) at any given time, thereby avoiding each node separately accessing and transforming the same data. Access to the existing data systems 11 from nodes 530 may be made over Ethernet 520 or any other suitable network connection (not shown in Figure 5).
The database software on the nominated master node (say database 220J in node 530J) monitors updates to the database of this node 530J, and propagates such updates to the databases 220K, 220L, 220M in the remaining nodes 530K, 530L, 530M over Ethernet 520. The master node thus makes the retrieved and transformed data available to the other nodes, so that they all have a consistent view of the various data sources 11.
Note that the software on a master node is the same as on all the other nodes; it is simply that one of the nodes performs the role of master node for a period of time. Thus if the master node fails, the database software on the other nodes detects this. The remaining nodes then arbitrate amongst themselves to decide upon a new node to take on the role of the master node. The use of multiple independent nodes, each of which can act as the master, provides robustness in case of a failure of any one individual node. In one embodiment, the database software on the various nodes is the MySQL program (available from www.mysql.com), but the skilled person will be aware of other programs that could be used instead.
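The arbitration procedure itself is not specified above. Purely as an assumption for illustration, one simple possibility is for every surviving node to apply the same deterministic rule to the same set of live nodes, for example choosing the lowest node identifier, so that all nodes reach the same answer without further negotiation. The class and method names below are hypothetical, and this is not presented as the mechanism used by the database software mentioned.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

/** Hypothetical sketch: every node applies the same rule, so all agree on the new master. */
class MasterElectionSketch {
    /** Returns the lowest identifier among the live nodes, or empty if none remain. */
    static Optional<String> electMaster(Collection<String> liveNodeIds) {
        return liveNodeIds.stream().min(Comparator.naturalOrder());
    }
}
```

With live nodes 530J, 530K and 530M this rule selects 530J; if 530J then fails, the same rule applied to the remaining nodes selects 530K.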
Incoming requests to the application system 200 from various clients (not shown in Figure 5) are received over one or more networks 75. The application system 200 incorporates two load balancers 510, which typically run on separate machines (personal computers), with one designated as a master, and the other as a slave. These monitor the level of activity in the various nodes 530J, 530K, 530L, 530M. If an incoming client request relates to a new session, the request can be routed to a particular node in accordance with some appropriate load balancing algorithm (e.g. to the node that currently has the least activity). On the other hand, requests that represent a continuation of an existing session are routed to the same node as the previous requests in the session in order to ensure continuity.
In a current implementation, the nodes 530J, 530K, 530L and 530M are substantially identical with one another, and all support the same set of applications.
In other implementations however, the installed applications may vary from one node to another. The load balancer would then direct incoming client requests to a node that supported the desired application. Having certain nodes dedicated to only certain applications may help to improve efficiency, since the nodes avoid some of the overhead and potential conflicts of running multiple applications simultaneously.
Nevertheless, any given application should still be present on at least two different nodes, in order to provide redundancy.
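The routing rules described above (continuing sessions stay on their node, while new sessions go to the least active node that supports the requested application) might be sketched in Java as follows. The names BalancerSketch, Node and route, and the use of a simple request counter as the measure of activity, are assumptions made for this example only.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a load balancer 510 with session affinity. */
class BalancerSketch {
    /** A node 530 with its installed applications and a count of requests routed to it. */
    static final class Node {
        final String id;
        final Set<String> apps;
        int activity;

        Node(String id, Set<String> apps) {
            this.id = id;
            this.apps = apps;
        }
    }

    private final List<Node> nodes;
    private final Map<String, Node> sessionToNode = new HashMap<>();

    BalancerSketch(List<Node> nodes) {
        this.nodes = nodes;
    }

    /** Existing sessions keep their node; new sessions go to the least busy capable node. */
    synchronized Node route(String sessionId, String appName) {
        Node node = sessionToNode.get(sessionId);
        if (node == null) {
            node = nodes.stream()
                    .filter(n -> n.apps.contains(appName))
                    .min(Comparator.comparingInt(n -> n.activity))
                    .orElseThrow(() -> new IllegalStateException("no node supports " + appName));
            sessionToNode.put(sessionId, node);
        }
        node.activity++;
        return node;
    }
}
```

The sketch does not cover failover of a session when its node becomes unavailable, nor the master and slave arrangement of the two load balancers.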
In one particular embodiment, the metadata 212 used to control input into and output from the application system 200 is compiled into the application code itself (rather than being held as a separate data set). This process is illustrated in the flowchart of Figure 8, which commences with the initial generation of the metadata 212 (step 810). For the back end portion 225, the metadata 212A defines the data to be loaded from data sources 11 in terms of record and field selections, as well as any required cleansing, supplementing, etc. As regards the front end portion 226, the metadata 212B typically defines the contents of the various output screens, how these are presented and formatted, and so on.
The construction of the metadata itself can be based on a variety of sources.
In many cases there is a fair degree of commonality with previous implementations.
For example, different police forces generally all save the same sort of information in crime recording systems, and require broadly similar outputs of this information, albeit typically in somewhat differing formats. This then allows metadata to be reused from one implementation to another, subject to appropriate modification. The metadata can also be derived in part from table and field specifications for existing data sources 11 (i.e. the metadata for these legacy systems).
The method now proceeds to generate code automatically to implement the data synchronization and presentation defined in the metadata (step 820). For example, if the metadata 212 indicates that certain fields of a database 51 from an existing data source are to be pulled into the application system database 220, then code is generated and compiled into back end portion 225 to perform the desired retrieval. Likewise, if the metadata indicates that a particular set of data is to be presented as a scrollable list, then this is implemented into the front end code portion 226.
In practice, a relatively large proportion of the application system code can be generated automatically, given the comparatively structured nature of the task to be performed. However, usually there are a few items for which the automatic code generation is not completely successful, or for which a somewhat different result is desired. For example, it may be desired to present search results to a user in a somewhat different format or order from that resulting from the automatic code generation. Issues such as these are typically revealed by testing the system (step 830). Consequently, there follows a process of subsequent code refinement, in which the operational system is completed by human intervention (step 840). It will be appreciated that once this process has been finished, it is no longer necessary to maintain the metadata per se on a production system, rather it is effectively integrated into the application code itself.
Note that in some respects the automatic code generation of Figure 8 can be regarded as customization of generalized coding for data synchronization and presentation. This generalized coding is developed first, and then automatically particularized to a given set of metadata in order to perform the desired synchronization and presentation for the system in question.
In other implementations, the metadata may be used at run-time to control the operations of the application system. Typically this involves generalized code for synchronization and presentation accessing the relevant metadata (as a data file) for each data access to determine what data to retrieve. This is somewhat slower than the approach shown in Figure 8, given that the metadata contents must be tested to find out how to proceed. On the other hand, this alternative approach is potentially somewhat more flexible, in that it avoids hard-wiring the metadata into the actual code. Consequently, changes can be made to the metadata (e.g. as regards which data fields to upload or to present in response to a given query) without having to recompile the relevant application code.
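The difference between the two approaches can be illustrated with a short Java sketch. The first method consults a field list (standing in for the metadata) on every call, whereas the second shows what generated code might look like once the same field list has been compiled in. The class name, the method names and the field names surname and offence are hypothetical and serve only to make the contrast concrete.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical contrast between run-time metadata and metadata compiled into the code. */
class RuntimeMetadataSketch {
    /** Generalised code: the list of fields to present is read as data on each call. */
    static Map<String, String> present(Map<String, String> record, List<String> fieldsToShow) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String field : fieldsToShow) {
            if (record.containsKey(field)) {
                out.put(field, record.get(field));
            }
        }
        return out;
    }

    /** Particularised code: the same field selection is fixed at compile time. */
    static Map<String, String> presentCompiled(Map<String, String> record) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("surname", record.get("surname"));
        out.put("offence", record.get("offence"));
        return out;
    }
}
```

Changing the fields shown by the first method only requires a different list to be supplied, whereas changing the second requires the code to be regenerated and recompiled.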
Another important feature of the application system 200 is high availability.
One aspect of this is the ability in one implementation to update applications without having to take the whole system off-line. For example, let us say that some new data is to be uploaded from existing data source 11A into the application system, or that the presentation layout of query results to clients 80 is to be altered. These aspects are controlled by the metadata 212, which as just discussed may be coded into the applications 450 themselves. Accordingly, changes to the metadata now require a new application to be compiled and loaded.
In these circumstances, the procedure of Figure 9 is followed, which begins with the creation of the new or modified application (step 910). Such an operation has already been described in relation to Figure 8 (i.e. the new metadata is defined, leading to the automatic generation of code, followed by testing and appropriate code revision until the finished application is available).
Once a satisfactory new or modified application has been developed (step 910), the previous version of the application is taken off-line (step 920). This involves modifying configuration data available to the controller 460 so that incoming client requests are no longer directed to this application. (If any such requests are received, these are bounced by the controller at the validation stage with an indication that the requested application is unavailable). In addition, the existing application itself is terminated on the application system 200.
The code for the new application is now installed into the application system (step 930), typically over-writing the code for the existing application (now off-line).
In one embodiment, the applications are written in the Java programming language, and are installed into the application system 200 as JAR (Java archive) files. If there is a new style sheet (an XSLT file) associated with the new application, then this is also loaded into the application system.
The controller configuration data can now be updated to indicate the availability of the newly installed application. When the XML producer 440 next tries to access this application, it detects that the relevant code is not yet instantiated.
Accordingly, the producer launches a Java VM and loads or instantiates the new application into the VM (step 940), whereupon it is then available for handling client requests. Note that in some systems the XML producer may be configured to automatically instantiate new applications on its own initiative (as opposed to waiting for a first incoming client request for that particular application to trigger instantiation, which may somewhat delay the response to the first request).
The application loading procedure of Figure 9 therefore allows applications to be modified without taking down the entire application system 200. Rather, just the application concerned is unavailable for a certain period of time. It will be appreciated that this facility greatly enhances the robustness and reliability of the application system, especially in a system running many applications at the same time.
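A much simplified, in-process sketch of the procedure of Figure 9 is given below. It replaces JAR files and separate Java VMs with plain Java suppliers, so it illustrates only the sequence of off-line, install, on-line and lazy instantiation; the class AppRegistrySketch and its method names are assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch of replacing one application while the others keep running. */
class AppRegistrySketch {
    // Installed code, modelled as a factory for a request handler (stand-in for a JAR file).
    private final Map<String, Supplier<Function<String, String>>> installed = new ConcurrentHashMap<>();
    // Instantiated applications (stand-in for applications loaded into a VM).
    private final Map<String, Function<String, String>> running = new ConcurrentHashMap<>();
    // Controller configuration data: which applications may receive requests.
    private final Map<String, Boolean> online = new ConcurrentHashMap<>();

    /** Takes the application off-line, installs the new code, then marks it on-line again. */
    void replace(String name, Supplier<Function<String, String>> newCode) {
        online.put(name, false);       // step 920: requests are rejected from now on
        running.remove(name);          // step 920: the existing instance is terminated
        installed.put(name, newCode);  // step 930: the old code is overwritten
        online.put(name, true);        // configuration updated to show availability
    }

    /** Step 940: the application is instantiated on the first request that needs it. */
    String handle(String name, String request) {
        if (!online.getOrDefault(name, false)) {
            throw new IllegalStateException("application unavailable: " + name);
        }
        return running.computeIfAbsent(name, n -> installed.get(n).get()).apply(request);
    }
}
```

Only the named application is affected by a call to replace in this sketch; requests for any other installed application continue to be handled throughout.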
The application system 200 has been primarily described herein in the context of police computers. Nevertheless, it will be appreciated that it is also potentially applicable to other contexts addressing analogous technical concerns - e.g. a wide range of client devices that need access to multiple legacy systems across a heterogeneous data storage environment.
In conclusion therefore, a variety of particular embodiments have been described in detail herein, but it will be recognised that this is by way of exemplification only. The skilled person will be aware of many further potential modifications and adaptations that fall within the scope of the claimed invention and its equivalents.

Claims (33)

  1. A method of running an application system to provide client access through one or more applications to data stored in a heterogeneous data storage environment comprising multiple separate database systems, said method comprising: uploading selected data from the heterogeneous storage environment into the application system; saving the uploaded data in the application system into a homogeneous data storage environment structured in accordance with said one or more applications; receiving client requests to access the saved data, each request identifying one of said one or more applications; forwarding a received client request to the identified application; accessing the saved data with the identified application in order to generate a reply to said request; and returning said reply from the application system to the client in response to said request.
  2. The method of claim 1, wherein uploading comprises ascertaining the rate of update of at least one of said database systems, and modifying an upload frequency from said at least one database in accordance with said update rate.
  3. The method of claim 2, wherein said rate of update is ascertained from a log file of said at least one database system.
  4. The method of any preceding claim, wherein uploading comprises selecting data that has recently been updated for uploading, said recently updated data being identified from a log file.
  5. The method of any preceding claim, wherein uploading includes data cleansing.
  6. The method of claim 5, wherein data cleansing comprises comparing a first data entry from a first database with a second data entry from a second database, and treating said first and second data entries as a single data entry in the application system if one or more predetermined criteria are satisfied.
  7. The method of claim 5, wherein the uploaded data includes data from a first database having a link with data from a second database, and said data cleansing comprises creating a formalised indication of said link for saving with the data.
  8. The method of any preceding claim, wherein saving the uploaded data comprises supplementing the uploaded data with additional information.
  9. The method of any preceding claim, wherein said uploading and saving are performed in accordance with metadata that is precompiled into routines for uploading and saving.
  10. The method of any preceding claim, wherein the uploaded data is saved into one or more slices, each corresponding to a respective one of said one or more applications.
  11. The method of any preceding claim, wherein said application system supports multiple client platforms, and returning a reply to a request includes formatting the reply in accordance with a client platform from which the request was received.
  12. The method of any preceding claim, further comprising maintaining a pool of connections to the saved data in the homogeneous data storage environment, wherein accessing the saved data includes allocating one of the connections to the identified application.
  13. The method of any preceding claim, wherein the structure of the reply to said request is defined by metadata that are precompiled into routines for generating the reply.
  14. The method of any preceding claim, further comprising modifying one of said one or more applications within the application system by: developing a modified version of an application; taking an existing version of the application off-line, whereby client requests for the off-line existing application are rejected; installing the modified version of the application; marking the modified version of the application as on-line; and instantiating the modified version of the application in order to handle client requests.
  15. The method of any preceding claim, further comprising transmitting a request to another application system and receiving a reply therefrom, wherein said reply is compatible with said homogeneous data storage environment.
  16. An application system for providing client access through one or more applications to data stored in a heterogeneous data storage environment that comprises multiple separate database systems, said system comprising: a refresh unit operable to upload selected data from the heterogeneous data storage environment; a data store containing the uploaded data in a homogeneous data storage environment structured in accordance with said one or more applications; at least one network interface, operable to receive client requests for accessing the uploaded data in the data store, each request identifying one of said one or more applications, and to return a reply from the application system to the client in response to said request; and a controller for forwarding a received client request to the identified application, wherein the identified application accesses data from the data store in order to generate a reply to said request.
  17. The application system of claim 16, wherein the refresh unit is operable to ascertain the rate of update of at least one of said database systems, wherein upload frequency from said at least one database is modified in accordance with said update rate.
  18. The application system of claim 17, wherein said update rate is ascertained from a log file of said at least one database system.
  19. The application system of any of claims 16 to 18, wherein the refresh unit is operable to select data that has recently been updated for uploading, said recently updated data being identified from a log file.
  20. The application system of any of claims 16 to 19, wherein the refresh unit incorporates data transformation routines operable to compare a first data entry from a first database with a second data entry from a second database, and wherein said first and second data entries are treated as a single data entry in the application system if one or more predetermined criteria are satisfied.
  21. The application system of any of claims 16 to 20, wherein the uploaded data includes data from a first database having a link with data from a second database, and wherein the refresh unit saves the uploaded data into the data store with a formalised indication of said link.
  22. The application system of any of claims 16 to 21, wherein the uploaded data is supplemented with additional information prior to entry into the data store.
  23. The application system of any of claims 16 to 22, wherein said refresh unit uploads data from the heterogeneous data store environment in accordance with metadata that is precompiled into routines incorporated into the refresh unit.
  24. The application system of any of claims 16 to 23, wherein the uploaded data contained in the data store is configured into one or more slices, each corresponding to a respective one of said one or more applications.
  25. The application system of any of claims 16 to 24, wherein said application system supports multiple client platforms, and wherein a reply to a request is formatted in accordance with a client platform from which the request was received.
  26. The application system of any of claims 16 to 25, wherein said controller maintains a pool of connections to the data store, wherein data in the data store is accessed by allocating one of the connections to the appropriate application.
  27. The application system of any of claims 16 to 26, wherein the structure of the reply to a request is defined by metadata that is precompiled into routines for generating the reply.
  28. The application system of any of claims 16 to 27, wherein said controller is operable to transmit a request to another application system and to receive a reply therefrom, wherein said reply is compatible with said homogeneous data storage environment.
  29. A computer program product comprising instructions on a medium, said instructions capable of causing a machine to run an application system to provide client access through one or more applications to data stored in a heterogeneous data storage environment by performing the operations of: uploading selected data from the heterogeneous data storage environment into the application system, wherein said heterogeneous data storage environment comprises multiple separate database systems; saving the uploaded data in the application system into a homogeneous data storage environment structured in accordance with said one or more applications; receiving client requests to access the saved data, each request identifying one of said one or more applications; forwarding a received client request to the identified application; accessing the saved data with the identified application in order to generate a reply to said request; and returning said reply from the application system to the client in response to said request.
  30. A computer program comprising instructions for implementing the method of any of claims 1 to 15.
  31. A method for providing client access through one or more applications to data stored in a heterogeneous data storage environment substantially as described herein with reference to the accompanying drawings.
  32. Apparatus for providing client access through one or more applications to data stored in a heterogeneous data storage environment substantially as described herein with reference to the accompanying drawings.
  33. A computer program for providing client access through one or more applications to data stored in a heterogeneous data storage environment substantially as described herein with reference to the accompanying drawings.
GB0316070A 2003-07-09 2003-07-09 Method and apparatus for providing access to data from a heterogeneous storage environment Expired - Lifetime GB2403825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0316070A GB2403825B (en) 2003-07-09 2003-07-09 Method and apparatus for providing access to data from a heterogeneous storage environment

Publications (3)

Publication Number Publication Date
GB0316070D0 GB0316070D0 (en) 2003-08-13
GB2403825A true GB2403825A (en) 2005-01-12
GB2403825B GB2403825B (en) 2005-06-01

Family

ID=27741868

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0316070A Expired - Lifetime GB2403825B (en) 2003-07-09 2003-07-09 Method and apparatus for providing access to data from a heterogeneous storage environment

Country Status (1)

Country Link
GB (1) GB2403825B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012091948A3 (en) * 2010-12-28 2013-08-29 Citrix Systems, Inc. Systems and methods for database proxy request switching
US9589029B2 (en) 2010-12-28 2017-03-07 Citrix Systems, Inc. Systems and methods for database proxy request switching

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999019816A1 (en) * 1997-10-14 1999-04-22 Massachusetts Institute Of Technology Method and apparatus for automated, context-dependent retrieval of information
US20020143942A1 (en) * 2001-03-28 2002-10-03 Hua Li Storage area network resource management
WO2003017132A1 (en) * 2001-08-17 2003-02-27 Gunrock Knowledge Concepts Pty Ltd Knowledge management system and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012091948A3 (en) * 2010-12-28 2013-08-29 Citrix Systems, Inc. Systems and methods for database proxy request switching
US9589029B2 (en) 2010-12-28 2017-03-07 Citrix Systems, Inc. Systems and methods for database proxy request switching
US10726029B2 (en) 2010-12-28 2020-07-28 Citrix Systems, Inc. Systems and methods for database proxy request switching

Also Published As

Publication number Publication date
GB0316070D0 (en) 2003-08-13
GB2403825B (en) 2005-06-01

Similar Documents

Publication Publication Date Title
US11647097B2 (en) Providing access to managed content
US8166101B2 (en) Systems and methods for the implementation of a synchronization schemas for units of information manageable by a hardware/software interface system
US7590643B2 (en) Systems and methods for extensions and inheritance for units of information manageable by a hardware/software interface system
US7483923B2 (en) Systems and methods for providing relational and hierarchical synchronization services for units of information manageable by a hardware/software interface system
USRE42051E1 (en) Peer-to-peer automated anonymous asynchronous file sharing
US8046424B2 (en) Systems and methods for the utilization of metadata for synchronization optimization
US7634728B2 (en) System and method for providing a runtime environment for active web based document resources
US8396938B2 (en) Providing direct access to distributed managed content
JP5193056B2 (en) Method and system for maintaining up-to-date data of wireless devices
JP4583376B2 (en) System and method for realizing a synchronous processing service for a unit of information manageable by a hardware / software interface system
US20030018624A1 (en) Scalable eContent management system and method of using the same
US20050160076A1 (en) Method and apparatus for referring to database integration, and computer product
WO2008069125A1 (en) Data management device
KR20060080581A (en) Systems and methods for interfacing application programs with an item-based storage platform
WO2005024550A2 (en) System and method for implementation of a digital image schema in a hardware/software interface
JP4583375B2 (en) System for implementation of synchronization schema
CA2506337A1 (en) Systems and methods for extensions and inheritance for units of information manageable by a hardware/software interface system
GB2403825A (en) Method and apparatus for providing access to data from a heterogeneous storage environment
US9536244B1 (en) Managed content delivery via web services
ZHANG DOUBLE MIDDLEWARE-BASED MOBILE FILE SERVICE
Zhang A mobile file service based on double middleware
KR20060117872A (en) Systems and methods for providing synchronization services for units of information manageable by a hardware/software interface system

Legal Events

Date Code Title Description
PE20 Patent expired after termination of 20 years

Expiry date: 20230708