GB2500567A

GB2500567A - Operating software and associated policy rules to carry out data search in order to identify missing or incomplete data

Info

Publication number: GB2500567A
Application number: GB1201725.7A
Authority: GB
Inventors: Derek Fordham; Kieran Hedigan; Donald Neville Bradley; Lawrence Milner
Original assignee: STETFAST Ltd
Current assignee: STETFAST Ltd
Priority date: 2012-02-01
Filing date: 2012-02-01
Publication date: 2013-10-02
Also published as: GB201201725D0

Abstract

Input data is parsed to generate one or more key data items or identifiers which are matched to a policy rule. Output is generated based on each policy rule, the output varying according to the proximity of the key data or identifier to other designated content in the vicinity. Key data, vicinity data, policy rules and output are displayed to a system user particularly in order to identify missing or incomplete data and suggest changes to the user. Equivalent search terms can particularly be searched for in a thesaurus. The vicinity data will particularly establish whether it is within a specified distance or proximity to the key data.

Description

System for identifying missing and incomplete data in electronic documents and suggesting changes.

Background

This invention relates to a system (hardware, software and associated communications systems, rules and databases) for searching data, identifying where that data is incorrect or incomplete and suggesting changes to be made to the data as a result.

Any given set of data may be defective because it is incomplete or incorrect, but the person reviewing the data may not realise this, because they lack the relevant knowledge, expertise and ability to do so. This system highlights precisely where data is missing and in what way it is defective.

There are many existing calculation, spell-checking, grammar checking and similar technical tools for reviewing numbers, text and other data, but these focus on correcting data that are already apparent (eg correcting calculations in a spreadsheet or misspelt words in a text document). These do not identify what is missing or incomplete in the relevant content under review, an equally important function in many cases.

In order to know what is missing from any set of data, an individual has to rely on his or her own personal intelligence and knowledge and/or consult experts and/or research reference materials, all of which is often difficult, expensive and time consuming. In the absence of having the requisite knowledge, a person may simply not consider an important issue arising from a set of data. For example, consumers are frequently unaware of their statutory rights and do not know when those rights have been infringed (eg where they have not received statutory notices they are entitled to by law and do not realise this). The result is that individuals may lose out because they fail to consider, let alone understand, their rights. The present invention prevents that happening.

The current invention uses computer technology to highlight what is missing, incorrect or incomplete in a given set of data. The system involves the searching of data and the application of analytical tools to ascribe a meaningful context to data. This system rapidly identifies the context in which specific data is used and can thereby identify missing or incorrect data given that context, the technical effect of which is to enable data to be reviewed automatically and more effectively than would otherwise be possible.

Guidance issued by the Intellectual Property Office states that "Computer-related inventions may be patentable, but only if they involve something more than just software running on a computer in a technically ordinary way." This application describes a technically implemented innovation, namely a computer system comprising software and hardware and associated materials and communication tools, operating context-specific policies and reference databases, so as to enable incomplete or incorrect data to be identified, analysed and corrected in an entirely novel way. The effect of this is that, for example, individuals in any walk of life can analyse a set of data or documents at a speed and to a level of detail that would otherwise be impossible for them. This invention does not describe a theoretical, intellectual or economic method of doing business, but constitutes a practical technical advance in data analysis that can be used by individuals to better understand data, in particular of a type and in a context with which they are unfamiliar.

Statement of invention

Generally

As stated above, unless one is an expert on a particular subject, it is hard to know whether information one receives is incorrect or incomplete. With any information, there are often a number of 'unknown unknowns'. To overcome this problem, the present invention helps identify information which is not contained in electronic data when it should be, and then suggests further sources of information and changes to that data.

It does this by using software policies and rules to search for missing data based on the presence, but also the absence, of certain words or other data (or groups of words or data) within defined proximities to each other.

For the purposes of this invention, 'data' includes all words, letters, numbers or other symbols and includes single words, phrases, sentences, paragraphs, clauses, schedules and sections of, as well as entire copies of, electronic files and documents. 'Proximity' means the number of characters, spaces and/or words between two or more specified pieces or sets of data. References to 'policy rules' or 'rules' include data search and processing policies, rules, tools and programmes.
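The definition of proximity can be illustrated with a minimal sketch. It assumes proximity is counted in whole words (the definition also allows characters and spaces); the function names `tokenize` and `word_proximity` are hypothetical and do not appear in the application.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return [m.group().lower() for m in re.finditer(r"[A-Za-z0-9']+", text)]

def word_proximity(text, first, second):
    """Smallest number of words lying between any occurrence of the two terms.

    Returns None when either term does not occur in the text.
    """
    tokens = tokenize(text)
    pos_a = [i for i, w in enumerate(tokens) if w == first.lower()]
    pos_b = [i for i, w in enumerate(tokens) if w == second.lower()]
    gaps = [abs(a - b) - 1 for a in pos_a for b in pos_b if a != b]
    return min(gaps) if gaps else None
```

For the disclaimer 'We shall not be liable for indirect or consequential loss', one word ('or') lies between 'indirect' and 'consequential', so the sketch reports a proximity of 1.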

Specific invention

(a) High level system overview

The invention will now be described with reference to the accompanying drawing. Figure 1 is a diagram summarising an example of the system according to the invention.

Using the invention in Figure 1, a system user (either a system administrator or end user) uses a screen interface display 1, and data input device (such as a computer keyboard or mobile device) 2, to input policy rules using a policy rules manager 3.

Those policy rules are stored on a policy database server 4, which is accessible by a software data search and processing engine 5. The software engine 5 operates on fixed or mobile hardware technology, in each case including an appropriate processing chip and memory storage device 7.

The software engine 5 uses selected policy rules contained in the policy database 4 to identify Key Data and/or Vicinity Data (as defined below) in any electronic data under review 6.

Having identified relevant Key Data and/or Vicinity Data (as defined below) in the electronic data under review (or confirmed the absence thereof), the software programme 5 accesses associated Output data (defined below) referred to in the applicable policy rules 4. It may also access other associated Output data (where necessary via electronic communications systems 8) which are either contained in the user's own internal or proprietary reference databases 9 and/or external or third party resources, such as the Internet 10. All relevant Output data is then made available to system users, along with system usage information and management reports 11, using the Output display 1.

The overall effect is that the combination of hardware, software, policy rules, reference databases and inputting and output devices, enables users to quickly access relevant missing information semi-automatically in a way they would not currently do.

(b) More detailed description

The system proposed involves using an electronic device with a processing unit, to operate software and associated policy rules and databases, to carry out a matching, search or comparative process which involves the following steps:

(1) parsing a data packet or set of data to generate one or more key data items or identifiers ('Key Data') and matching each Key Data item or identifier to the corresponding designated or closest policy rule;

(2) a processing engine to generate output ('Output') based on each policy rule, such Output originating from one or more designated reference databases;

(3) such Output to vary depending on the proximity to the Key Data of other designated content in the vicinity (or the absence of such data) ('Vicinity Data'), in each case as specified in a relevant policy rule; and

(4) relevant Key Data, Vicinity Data, policy rules and Output being contained in one or more reference databases and capable of being displayed to the system user in various ways according to system settings and user preference.
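The steps above can be sketched as a small review loop. This is an illustration only, under several assumptions: Key Data and Vicinity Data are single words, the vicinity is a symmetric window counted in words, and the `PolicyRule` fields, the `review` function and the example terms in the usage note are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class PolicyRule:
    key_data: str            # word that triggers the rule
    vicinity_data: str       # word looked for near the Key Data
    window: int              # words searched on either side of the Key Data
    output_if_absent: str    # Output shown when the Vicinity Data is missing
    output_if_present: str   # Output shown when the Vicinity Data is found

def review(text, rules):
    """Match each rule's Key Data and choose Output from nearby Vicinity Data."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    findings = []
    for rule in rules:
        for i, word in enumerate(words):
            if word != rule.key_data:
                continue
            nearby = words[max(0, i - rule.window): i + rule.window + 1]
            present = rule.vicinity_data in nearby
            findings.append({
                "key_data": word,
                "position": i,
                "vicinity_found": present,
                "output": rule.output_if_present if present else rule.output_if_absent,
            })
    return findings
```

With a rule keyed on 'deposit' and looking for 'refundable' within 5 words, the text 'A deposit of 100 pounds is payable on booking.' would yield the 'absent' Output, since 'refundable' does not appear near 'deposit'.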

Specifically, the system proposed involves using an electronic device with a processing unit, to operate software and associated policy rules and databases, to carry out a matching, search or comparative process which involves the following steps:

(1) a software search engine and associated policy rules will search a data source (such as the text of an electronic document) for one or more examples of specified data ('Key Data'); and then

(2) automatically search for other data in the vicinity of the Key Data ('Vicinity Data') as determined by relevant policy rules, the presence and/or absence of Vicinity Data determining which associated commentary and data contained in associated reference databases ('Output') will be displayed to the system user; and

(3) the system highlighting or otherwise identifying, arranging or displaying the Key Data and/or any Vicinity Data and associated Output; or

(4) where Key Data and/or Vicinity Data is not found, the system suggesting other Output; and

(5) the Output and overall system being used to automatically identify missing or incomplete data and suggesting changes, additions and replacements to such data and enabling users to implement such changes, additions and replacements.
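Steps (3) and (4) can be illustrated with a short sketch: Key Data found in the text is highlighted, and alternative Output is returned when it is not found. The `[[...]]` marker style and the function name `highlight` are assumptions made for illustration; the application does not prescribe a display format.

```python
import re

def highlight(text, key_data, fallback_output):
    """Wrap each Key Data match in [[...]].

    Returns (marked_text, None) when Key Data is found, or
    (original_text, fallback_output) when it is not.
    """
    pattern = re.compile(r"\b" + re.escape(key_data) + r"\b", re.IGNORECASE)
    marked, count = pattern.subn(lambda m: "[[" + m.group() + "]]", text)
    if count == 0:
        return text, fallback_output
    return marked, None
```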

Advantages

The system as described above can adapt and learn by tracking and recording user actions, both in terms of use of the system and amendments made to the data under review as a result of operation of the system. So where, for example, a user has in the past not taken any action in response to specific Output suggestions (eg where specific Output is rarely acted on in practice) the system may restrict use of that rule or Output in future. In that way, users can choose to review data only by reference to policy rules and Output that are deemed to be 'material' as opposed to rules and Output that previous use of the system suggests are 'immaterial' or 'rarely used'.
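One way such tracking might work is sketched below. The thresholds (`min_shown`, `min_action_rate`) and the class name `RuleUsageTracker` are hypothetical; the application does not state how 'material' and 'rarely used' are measured.

```python
from collections import defaultdict

class RuleUsageTracker:
    """Record how often each rule's Output is shown and acted on."""

    def __init__(self, min_shown=5, min_action_rate=0.2):
        self.shown = defaultdict(int)
        self.acted = defaultdict(int)
        self.min_shown = min_shown              # history needed before judging a rule
        self.min_action_rate = min_action_rate  # share of showings that led to action

    def record(self, rule_id, acted_on):
        self.shown[rule_id] += 1
        if acted_on:
            self.acted[rule_id] += 1

    def is_material(self, rule_id):
        shown = self.shown[rule_id]
        if shown < self.min_shown:
            return True  # too little history to treat the rule as immaterial
        return self.acted[rule_id] / shown >= self.min_action_rate
```

A user who chooses to review data against 'material' rules only would then skip any rule for which `is_material` returns False.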

Likewise, the system can also use computer memory devices to record changes to data under review which have been made in response to Output, and use these changes as the basis for improving policy rules and associated Output in future. For example, where the system records that in response to certain Output users typically replace Key Data or Vicinity Data (eg the word 'King') with a certain replacement term (eg the term 'Monarch'), then future Output can be amended accordingly to suggest that replacement term is used in future (eg offer the user the choice of using the terms 'King' or 'Monarch').
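A minimal sketch of this kind of learning follows, assuming replacements are simply counted and offered once they recur. The class name `ReplacementLearner` and the `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict

class ReplacementLearner:
    """Count the replacements users make and suggest the recurring ones."""

    def __init__(self):
        self.history = defaultdict(Counter)

    def record(self, original, replacement):
        self.history[original.lower()][replacement] += 1

    def suggestions(self, original, min_count=2):
        counts = self.history.get(original.lower(), Counter())
        # most_common() orders replacement terms from most to least frequent
        return [term for term, n in counts.most_common() if n >= min_count]
```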

The system will search for Key Data and Vicinity Data, but will also be able to search for equivalent terms listed in a thesaurus contained in system databases. So where, for example, a Key Data or Vicinity Data term is not found, the system will search for alternative terms contained in a thesaurus.

In this way the system will be able to operate and suggest Output both where there are exact matches between the data under review and Key Data and Vicinity Data used in the system (eg the word 'king'), and also where there are similar terms used (eg the word 'monarch').
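The thesaurus fallback might be sketched as follows. The `THESAURUS` dictionary is a stand-in for the system database; the 'king'/'monarch' pairing comes from the example above, while 'sovereign' and the function name `find_term` are added purely for illustration.

```python
import re

# Stand-in for the thesaurus held in the system databases (illustrative entries).
THESAURUS = {"king": {"monarch", "sovereign"}}

def find_term(text, term, thesaurus=THESAURUS):
    """Return (matched_word, 'exact' or 'equivalent'), or None if nothing matches."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    term = term.lower()
    if term in words:
        return term, "exact"
    # Only consult the thesaurus when the exact term is not found.
    for alternative in sorted(thesaurus.get(term, ())):
        if alternative in words:
            return alternative, "equivalent"
    return None
```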

This is a customisable system which is capable of operating in multiple contexts depending on the policy rules and associated reference databases selected by the system user. It can be adapted for use in a wide variety of different user contexts, such as helping consumers, researchers and children deal with different types of data they are unfamiliar with.

The policy rules and corresponding Output can be applied sequentially by the search engine, in selected groups or topics, or all at the same time, depending on system settings and user preferences. Policy rules can operate using both parallel and serial processing.
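Serial and parallel application of rules can be contrasted in a short sketch. Here each rule is reduced to a hypothetical `(name, term)` pair and a thread pool stands in for parallel processing; the application does not specify a concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def apply_rule(rule, text):
    """Apply one (name, term) rule: report whether the term occurs in the text."""
    name, term = rule
    return name, term.lower() in text.lower()

def run_serial(rules, text):
    return [apply_rule(rule, text) for rule in rules]

def run_parallel(rules, text, workers=4):
    # Executor.map returns results in the same order as the input rules.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda rule: apply_rule(rule, text), rules))
```

Both functions return the same results in the same order, so the choice between them can be left to system settings.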

The Key Data or Vicinity Data may each consist of one or more groups of data being searched concurrently (rather than single terms), the system being able, for example, to search any content for Key Data 'A', within a specified distance from Key Data or Vicinity Data 'B', but not Vicinity Data 'C'. See example below for further details. The search and policy rules can be complex, so as to specify the context in which words appear so as to ensure that any associated Output is equally specific.
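The 'A near B but not C' search can be sketched as below. It assumes single-word terms and measures distance as the difference in word positions; the function names are hypothetical.

```python
import re

def positions(words, term):
    return [i for i, w in enumerate(words) if w == term]

def a_near_b_not_c(text, a, b, c, distance):
    """Word positions of `a` that have `b`, but not `c`, within `distance` words."""
    words = re.findall(r"[a-z0-9']+", text.lower())

    def near(index, term):
        return any(abs(index - j) <= distance for j in positions(words, term))

    return [i for i in positions(words, a) if near(i, b) and not near(i, c)]
```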

The search may be for Vicinity Data either before or after the Key Data to which the Vicinity Data relates.

The search for Vicinity Data will establish whether it is, or is not, within a specified distance or proximity to the Key Data, such proximity or distance to vary depending on individual rule settings made by a central system administrator and/or each individual user.
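Separate 'before' and 'after' distances, with an administrator default that an individual user may override, might be sketched as follows. The default values of 20 and 30 words mirror the first example later in this description; the function name `vicinity_present` and the `user_window` parameter are assumptions.

```python
import re

# Administrator default: words searched before and after the Key Data.
DEFAULT_WINDOW = {"before": 20, "after": 30}

def vicinity_present(text, key, vicinity, user_window=None):
    """True if `vicinity` occurs within the window around any occurrence of `key`."""
    window = {**DEFAULT_WINDOW, **(user_window or {})}  # user settings override defaults
    words = re.findall(r"[a-z0-9']+", text.lower())
    for i, word in enumerate(words):
        if word != key:
            continue
        before = words[max(0, i - window["before"]): i]
        after = words[i + 1: i + 1 + window["after"]]
        if vicinity in before or vicinity in after:
            return True
    return False
```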

The Output may include possible suggested changes to Key Data and/or Vicinity Data and/or direct the system user to other sources of information, which may be extensive and technical in nature and cross refer to other data or reference sources via the Internet or otherwise.

Policy rules and Output applied by system users may vary depending on the nature of the data under review. One set of rules and associated Output may be intended for use in one context (eg research), whereas another set of rules may be produced where the system is to be used in another context (eg analysing data in a foreign language) so as to ensure that relevant rules and Output data are appropriate for the data under review.

The Key Data, Vicinity Data and/or Output shall be capable of being accepted or added to the data under review by the user and/or printed or electronically stored with the data, or on a standalone basis to form a list of issues arising from the data.

The system shall be capable of defining different user interaction formats based on Key Data, Vicinity Data and/or Output, in particular the user having the option to review Key Data, Vicinity Data and/or Output in different online and printed paper formats.

The system shall enable different forms of user interaction for how Output is displayed to the user, any Output and user interactions being displayed in a variety of ways, including but not limited to pop-ups or balloons.

The applicability of rules and/or the system settings can be selected by the user based on the types of data being reviewed so that one user may not necessarily select to use the same rules as another user.

The system can be used to identify not only specific missing individual words or items, but also missing sets of data such as entire clauses in a document, or indeed entire documents or files.

The above process can be carried out by identifying relevant Key Data, Vicinity Data and/or Output term by term or by identifying all relevant Key Data, Vicinity Data and/or Output all at the same time.

The user may choose to operate the above system using only certain rules and/or all rules at the same time.

Examples

First example:

A specific embodiment of the invention will now be described by way of example by reference to the following sample data. A typical disclaimer in a document (for example, terms used by a consumer holiday travel operator) may include the following text: 'We shall not be liable for indirect or consequential loss'.

In anticipation of this issue arising in data/documents of this type, the system may include a policy rule as follows:

"Target: Consequential loss
Key Data=(indirect,loss) (up to 10 words) (consequential)
Vicinity Data: OUTvicinity.020.030(deposits)
Output: Does this include lost deposits (as excluded consequential loss)? Read the following Internet link: [www.ABTA.org.uk]
Replacement Data: 'indirect or consequential loss, which shall [not] include lost deposits'
End of rule."

Explanation: the purpose and practical effect of the above rule is explained, for the purpose of this application, by the additional explanatory data in [square brackets] below as follows:

Target: Consequential loss [i.e. this policy rule relates specifically to the data term 'consequential loss']

Key Data=(indirect,loss) (up to 10 words) (consequential) [the system software will search for Key Data consisting of the words 'indirect' or 'loss' within any 10 words of the word 'consequential']

Vicinity Data: OUTvicinity.020.030(deposits) [the system software will also search for the word 'deposits' to check that it is NOT in the vicinity (i.e. it is outside the vicinity) of 20 words before, or thirty words after, the Key Data referred to above.]

Output: Does this include lost deposits (as excluded consequential loss)? Read the following Internet link: [www.ABTA.org.uk] [Assuming the above conditions have been met, i.e. the word 'deposits' has not been found within the vicinity of the Key Data 'indirect' or 'loss' and 'consequential', the system will display this Output to the user, identifying the issue in question (i.e. the absence of a reference to lost deposits, this being an issue that a consumer might reasonably want to have highlighted to them) and a link to additional resources, in this case an internet site where additional information may be available if required.]

Replacement: 'indirect or consequential loss, which shall [not] include lost deposits' [The system will also display this (and other similar) alternative forms of wording which the user may choose to use to replace or amend the existing wording.]

End of rule.

The application of the above rule means that where, for example, an untrained consumer or individual reviews this standard disclaimer, the system will help him or her consider whether lost deposits should fall within the scope of excluded consequential losses (an issue which he or she might not otherwise consider or clarify) and if not, whether this is something he or she might wish to take up with the person issuing this disclaimer.
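To make the rule in the first example concrete, the sketch below parses its Key Data, Vicinity Data and Output lines and applies them to the sample disclaimer. It is an illustration under assumptions: the regular expressions are one reading of the rule notation, an `INvicinity` counterpart to `OUTvicinity` is assumed and does not appear in the application, and the vicinity window is measured from the word 'consequential' for simplicity.

```python
import re

RULE = """\
Target: Consequential loss
Key Data=(indirect,loss) (up to 10 words) (consequential)
Vicinity Data: OUTvicinity.020.030(deposits)
Output: Does this include lost deposits (as excluded consequential loss)?
"""

def parse_rule(rule_text):
    key = re.search(r"Key Data=\(([^)]*)\) \(up to (\d+) words\) \(([^)]*)\)", rule_text)
    vic = re.search(r"Vicinity Data: (IN|OUT)vicinity\.(\d+)\.(\d+)\(([^)]*)\)", rule_text)
    out = re.search(r"Output: (.*)", rule_text)
    return {
        "alternatives": [t.strip().lower() for t in key.group(1).split(",")],
        "max_gap": int(key.group(2)),
        "anchor": key.group(3).lower(),
        "must_be_absent": vic.group(1) == "OUT",
        "before": int(vic.group(2)),
        "after": int(vic.group(3)),
        "vicinity": vic.group(4).lower(),
        "output": out.group(1).strip(),
    }

def apply_rule(rule, text):
    """Return the rule's Output if its Key Data and Vicinity Data conditions hold."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for i, word in enumerate(words):
        if word != rule["anchor"]:
            continue
        lo, hi = max(0, i - rule["max_gap"]), i + rule["max_gap"] + 1
        if not any(alt in words[lo:hi] for alt in rule["alternatives"]):
            continue  # neither 'indirect' nor 'loss' is close enough
        window = words[max(0, i - rule["before"]): i + rule["after"] + 1]
        found = rule["vicinity"] in window
        if found != rule["must_be_absent"]:
            return rule["output"]
    return None
```

Applied to 'We shall not be liable for indirect or consequential loss', the sketch returns the Output question, because 'deposits' is not found in the vicinity. If the disclaimer already mentions deposits nearby, nothing is returned.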

Other rules can be set up to address other issues arising from different types of data in a similar way.

For example, the words 'December' and '2012' may appear in a document but without any other numbers in the vicinity, and the system may then prompt the user to add a specific date. As a result, through a combination of Key Data search software and policy rules, plus Vicinity Data searches to identify other data inside or outside the vicinity of the Key Data, plus associated Output, missing data and issues can be automatically highlighted which a system user would not otherwise have considered.
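The date example might be sketched as follows, assuming a month name is the Key Data, a four-digit number nearby is treated as the year, and a one- or two-digit number from 1 to 31 is treated as the day. The function name `incomplete_dates`, the window size and the prompt wording are hypothetical.

```python
import re

MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

def incomplete_dates(text, window=2):
    """Prompt for a day wherever a month has a year nearby but no day number."""
    tokens = re.findall(r"[A-Za-z]+|\d+", text)
    prompts = []
    for i, token in enumerate(tokens):
        if token.lower() not in MONTHS:
            continue
        nearby = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        has_year = any(t.isdigit() and len(t) == 4 for t in nearby)
        has_day = any(t.isdigit() and len(t) <= 2 and 1 <= int(t) <= 31 for t in nearby)
        if has_year and not has_day:
            prompts.append(f"'{token}' appears with a year but no day: add a specific date?")
    return prompts
```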

Second example:

The above examples relate to detailed textual search relating to particular words or numbers. However, a similar rule could also be used to identify the presence or absence of an entire body of data, such as an entire section of a document. This system is particularly beneficial for data which tend to follow a particular style or form (such as statutory notices and standard form documentation). By spotting where data either conforms with or differs from the statutory form or standard form, the system assists the person reviewing the data to avoid risks associated with that data.
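Checking a document against a standard form can be reduced, in its simplest version, to looking for expected section headings. The heading list below is invented for illustration and is not taken from any statutory form; a fuller implementation would compare whole bodies of text rather than titles.

```python
# Hypothetical headings that a standard form document is expected to contain.
REQUIRED_SECTIONS = ["Cancellation rights", "Complaints procedure", "Governing law"]

def missing_sections(document, required=REQUIRED_SECTIONS):
    """Return the required section titles that do not appear in the document."""
    lowered = document.lower()
    return [title for title in required if title.lower() not in lowered]
```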

This system is therefore capable, through its search for multiple different combinations of data and/or the absence of data within a specified vicinity, of providing exceedingly detailed and informed Output in respect of any data. Without all these different elements, which cumulatively help to pin-point what each issue is and what missing data should be included in any data under review, the system would work far less well. It would be impossible to know what key words in relation to a specific issue are missing from a set of data.

How it looks from a user standpoint: The above examples may display to the user of the system in a variety of ways, for example as follows in relation to the first example above:

Output: Does this include lost deposits (as excluded consequential loss)? Read the following Internet link: www.ABTA.org

Replace with: 'indirect or consequential loss, which shall [not] include lost deposits' [REPLACE]

We shall not be liable for indirect or consequential loss.