GB2500567A - Operating software and associated policy rules to carry out data search in order to identify missing or incomplete data


Info

Publication number
GB2500567A
Authority
GB
United Kingdom
Prior art keywords
data
output
vicinity
user
key data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1201725.7A
Other versions
GB201201725D0 (en)
Inventor
Derek Fordham
Kieran Hedigan
Donald Neville Bradley
Lawrence Milner
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
STETFAST Ltd
Original Assignee
STETFAST Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-02-01
Filing date
2012-02-01
Publication date
2013-10-02
Application filed by STETFAST Ltd
Priority to GB1201725.7A
Publication of GB201201725D0
Publication of GB2500567A
Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Abstract

Input data is parsed to generate one or more key data items or identifiers which are matched to a policy rule. Output is generated based on each policy rule, the output varying according to the proximity of the key data or identifier to other designated content in the vicinity. Key data, vicinity data, policy rules and output are displayed to a system user particularly in order to identify missing or incomplete data and suggest changes to the user. Equivalent search terms can particularly be searched for in a thesaurus. The vicinity data will particularly establish whether it is within a specified distance or proximity to the key data.

Description

System for identifying missing and incomplete data in electronic documents and suggesting changes.
Background
This invention relates to a system (hardware, software and associated communications systems, rules and databases) for searching data, identifying where that data is incorrect or incomplete and suggesting changes to be made to the data as a result.
Any given set of data may be defective because it is incomplete or incorrect, but the person reviewing the data may not realise this, because they lack the relevant knowledge or expertise. This system highlights precisely where data is missing and in what way it is defective.
There are many existing calculation, spell-checking, grammar checking and similar technical tools for reviewing numbers, text and other data, but these focus on correcting data that are already apparent (eg correcting calculations in a spreadsheet or misspelt words in a text document). These do not identify what is missing or incomplete in the relevant content under review, an equally important function in many cases.
In order to know what is missing from any set of data, an individual has to rely on his or her own personal intelligence and knowledge and/or consult experts and/or research reference materials, all of which is often difficult, expensive and time consuming. In the absence of the requisite knowledge, a person may simply not consider an important issue arising from a set of data. For example, consumers are frequently unaware of their statutory rights and do not know when those rights have been infringed (eg where they have not received statutory notices they are entitled to by law and do not realise this). The result is that individuals may lose out because they fail to consider, let alone understand, their rights. The present invention prevents that happening.
The current invention uses computer technology to highlight what is missing, incorrect or incomplete in a given set of data. The system involves the searching of data and the application of analytical tools to ascribe a meaningful context to data. This system rapidly identifies the context in which specific data is used and can thereby identify missing or incorrect data given that context, the technical effect of which is to enable data to be reviewed automatically and more effectively than
Guidance issued by the Intellectual Property Office states that "Computer-related inventions may be patentable, but only if they involve something more than just software running on a computer in a technically ordinary way." This application describes a technically implemented innovation, namely a computer system comprising software and hardware and associated materials and communication tools, operating context-specific policies and reference databases, so as to enable incomplete or incorrect data to be identified, analysed and corrected in an entirely novel way. The effect of this is that, for example, individuals in any walk of life can analyse a set of data or documents at a speed and to a level of detail that would otherwise be impossible for them. This invention does not describe a theoretical, intellectual or economic method of doing business, but constitutes a practical technical advance in data analysis that can be used by individuals to better understand data, in particular of a type and in a context with which they are unfamiliar.
Statement of invention
Generally
As stated above, unless one is an expert on a particular subject, it is hard to know whether information one receives is incorrect or incomplete. With any information, there are often a number of 'unknown unknowns'. To overcome this problem, the present invention helps identify information which is not contained in electronic data when it should be, and then suggests further sources of information and changes to that data.
It does this by using software policies and rules to search for missing data based on the presence, but also the absence, of certain words or other data (or groups of words or data) within defined proximities to each other.
For the purposes of this invention, 'data' includes all words, letters, numbers or other symbols and includes single words, phrases, sentences, paragraphs, clauses, schedules and sections of, as well as entire copies of, electronic files and documents. 'Proximity' means the number of characters, spaces and/or words between two or more specified pieces or sets of data. References to 'policy rules' or 'rules' include data search and processing policies, rules, tools and programmes.
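The definition of proximity can be illustrated with a minimal sketch that counts the words lying between two terms. The function names and the tokenising rule are assumptions made for illustration; the application does not specify how data is split into words.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (an assumed tokenising rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_proximity(text, first, second):
    """Smallest number of words between any occurrence of two distinct terms.

    Returns None when either term is absent from the text.
    """
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    if not first_positions or not second_positions:
        return None
    return min(abs(a - b) - 1 for a in first_positions for b in second_positions)
```

Adjacent words have a proximity of 0 under this convention; a character-based measure could be substituted in the same way.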
Specific invention
(a) High level system overview
The invention will now be described with reference to the accompanying drawing. Figure 1 is a diagram summarising an example of the system according to the invention.
Using the invention in Figure 1, a system user (either a system administrator or end user) uses a screen interface display 1, and data input device (such as a computer keyboard or mobile device) 2, to input policy rules using a policy rules manager 3.
Those policy rules are stored on a policy database server 4, which is accessible by a software data search and processing engine 5. The software engine 5 operates on fixed or mobile hardware technology, in each case including an appropriate processing chip and memory storage device 7.
The software engine 5 uses selected policy rules contained in the policy database 4 to identify Key Data and/or Vicinity Data (as defined below) in any electronic data under review 6.
Having identified relevant Key Data and/or Vicinity Data (as defined below) in the electronic data under review (or confirmed the absence thereof), the software programme 5 accesses associated Output data (defined below) referred to in the applicable policy rules 4. It may also access other associated Output data (where necessary via electronic communications systems 8) which are either contained in the user's own internal or proprietary reference databases 9 and/or external or third party resources, such as the Internet 10. All relevant Output data is then made available to system users, along with system usage information and management reports 11, using the Output display 1.
The overall effect is that the combination of hardware, software, policy rules, reference databases and inputting and output devices, enables users to quickly access relevant missing information semi-automatically in a way they would not currently do.
(b) More detailed description
The system proposed involves using an electronic device with a processing unit, to operate software and associated policy rules and databases, to carry out a matching, search or comparative process which involves the following steps:
(1) parsing a data packet or set of data to generate one or more key data items or identifiers ('Key Data') and matching each Key Data item or identifier to the corresponding designated or closest policy rule;
(2) a processing engine to generate output ('Output') based on each policy rule, such Output originating from one or more designated reference databases;
(3) such Output to vary depending on the proximity to the Key Data of other designated content in the vicinity (or the absence of such data) ('Vicinity Data'), in each case as specified in a relevant policy rule; and
(4) relevant Key Data, Vicinity Data, policy rules and Output being contained in one or more reference databases and capable of being displayed to the system user in various ways according to system settings and user preference.
Specifically, the system proposed involves using an electronic device with a processing unit, to operate software and associated policy rules and databases, to carry out a matching, search or comparative process which involves the following steps:
(1) a software search engine and associated policy rules will search a data source (such as the text of an electronic document) for one or more examples of specified data ('Key Data'); and then
(2) automatically search for other data in the vicinity of the Key Data ('Vicinity Data') as determined by relevant policy rules, the presence and/or absence of Vicinity Data determining which associated commentary and data contained in associated reference databases ('Output') will be displayed to the system user; and
(3) the system highlighting or otherwise identifying, arranging or displaying the Key Data and/or any Vicinity Data and associated Output; or
(4) where Key Data and/or Vicinity Data is not found, the system suggesting other Output; and
(5) the Output and overall system being used to automatically identify missing or incomplete data and suggesting changes, additions and replacements to such data and enabling users to implement such changes, additions and replacements.
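The steps above can be sketched roughly as follows. This is an illustrative outline only: the `PolicyRule` fields, the symmetric word window and the fallback message are assumptions, not details taken from the application.

```python
import re
from dataclasses import dataclass

@dataclass
class PolicyRule:
    # All field names are hypothetical.
    key_term: str                 # Key Data to search for
    vicinity_term: str            # Vicinity Data expected (or not) nearby
    window: int                   # words searched either side of the Key Data
    output_if_absent: str         # Output shown when Vicinity Data is missing
    output_if_present: str = ""   # Output shown when Vicinity Data is found

def review(text, rules, fallback="No Key Data found; consider a general checklist."):
    """Return (key_term, position, output) findings for the text under review."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    findings = []
    for rule in rules:
        positions = [i for i, t in enumerate(tokens) if t == rule.key_term]
        if not positions:
            # Key Data not found: suggest other Output.
            findings.append((rule.key_term, None, fallback))
            continue
        for pos in positions:
            # Search the vicinity of each Key Data occurrence.
            nearby = tokens[max(0, pos - rule.window): pos + rule.window + 1]
            present = rule.vicinity_term in nearby
            output = rule.output_if_present if present else rule.output_if_absent
            if output:
                findings.append((rule.key_term, pos, output))
    return findings
```

In a fuller system the Output strings would come from the reference databases rather than being stored on the rule itself.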
Advantages
The system as described above can adapt and learn by tracking and recording user actions, both in terms of use of the system and amendments made to the data under review as a result of operation of the system. So where, for example, a user has in the past not taken any action in response to specific Output suggestions (eg where specific Output is rarely acted on in practice) the system may restrict use of that rule or Output in future. In that way, users can choose to review data only by reference to policy rules and Output that are deemed to be 'material' as opposed to rules and Output that previous use of the system suggests are 'immaterial' or 'rarely used'.
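One possible way to track whether Output is acted on, and to classify rules as 'material' or not, is sketched below. The class name and both thresholds are assumptions chosen for illustration; the application does not state how materiality is decided.

```python
from collections import defaultdict

class RuleUsageTracker:
    """Records how often each rule's Output is shown and acted on."""

    def __init__(self, min_shown=5, min_action_rate=0.2):
        # Hypothetical thresholds: a rule needs some history before it can be
        # restricted, and is restricted when users rarely act on its Output.
        self.min_shown = min_shown
        self.min_action_rate = min_action_rate
        self.shown = defaultdict(int)
        self.acted = defaultdict(int)

    def record(self, rule_id, acted_on):
        self.shown[rule_id] += 1
        if acted_on:
            self.acted[rule_id] += 1

    def is_material(self, rule_id):
        shown = self.shown[rule_id]
        if shown < self.min_shown:
            return True  # too little history to restrict the rule
        return self.acted[rule_id] / shown >= self.min_action_rate
```

A review run could then skip rules for which `is_material` returns False, unless the user chooses to apply all rules.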
Likewise, the system can also use computer memory devices to record changes to data under review which have been made in response to Output, and use these changes as the basis for improving policy rules and associated Output in future. For example, where the system records that in response to certain Output users typically replace Key Data or Vicinity Data (eg the word 'King') with a certain replacement term (eg the term 'Monarch'), then future Output can be amended accordingly to suggest that replacement term is used in future (eg offer the user the choice of using the terms 'King' or 'Monarch').
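A minimal sketch of recording replacements and offering the commonly chosen ones might look like this. The class name and the `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict

class ReplacementLearner:
    """Counts the replacement terms users choose for each original term."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, original, replacement):
        self._counts[original.lower()][replacement] += 1

    def suggestions(self, original, min_count=2):
        """Return the original term followed by replacements seen at least
        `min_count` times (an assumed threshold), most frequent first."""
        counts = self._counts.get(original.lower(), Counter())
        learned = [term for term, n in counts.most_common() if n >= min_count]
        return [original] + learned
```

With three recorded replacements of 'King' by 'Monarch', `suggestions("King")` would offer both terms to the user.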
The system will search for Key Data and Vicinity Data, but will also be able to search for equivalent terms listed in a thesaurus contained in system databases. So where, for example, a Key Data or Vicinity Data term is not found, the system will search for alternative terms contained in a thesaurus.
In this way the system will be able to operate and suggest Output both where there are exact matches between the data under review and Key Data and Vicinity Data used in the system (eg the word 'king'), and also where there are similar terms used (eg the word 'monarch').
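The thesaurus fallback can be sketched as below. The thesaurus contents and the function name are assumptions; a real system would load equivalents from its reference databases.

```python
import re

# Hypothetical thesaurus: term -> set of equivalent terms.
THESAURUS = {"king": {"monarch", "sovereign"}}

def find_term(text, term, thesaurus=THESAURUS):
    """Return (matched_word, position, is_exact) for the first match.

    An exact match is preferred; otherwise the first thesaurus equivalent
    found in the text is returned. None means nothing was found.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    term = term.lower()
    if term in tokens:
        return term, tokens.index(term), True
    equivalents = thesaurus.get(term, ())
    for i, token in enumerate(tokens):
        if token in equivalents:
            return token, i, False
    return None
```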
This is a customisable system which is capable of operating in multiple contexts depending on the policy rules and associated reference databases selected by the system user. It can be adapted for use in a wide variety of different user contexts, such as helping consumers, researchers and children deal with different types of data they are unfamiliar with.
The policy rules and corresponding Output can be applied sequentially by the search engine, in selected groups or topics, or all at the same time depending on system settings and user preferences, and policy rules can operate using both parallel and serial processing.
The Key Data or Vicinity Data may each consist of one or more groups of data being searched concurrently (rather than single terms), the system being able, for example, to search any content for Key Data 'A', within a specified distance from Key Data or Vicinity Data 'B', but not Vicinity Data 'C'. See example below for further details. The search and policy rules can be complex, so as to specify the context in which words appear so as to ensure that any associated Output is equally specific.
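A combined condition of this kind ('A' near 'B' but not near 'C') could be tested along the following lines. The function name and the use of a single word distance for both B and C are simplifying assumptions.

```python
import re

def matches_a_near_b_not_c(text, a, b, c, distance):
    """Return positions of `a` that have `b`, but not `c`, within
    `distance` words on either side."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token != a:
            continue
        window = tokens[max(0, i - distance): i] + tokens[i + 1: i + 1 + distance]
        if b in window and c not in window:
            hits.append(i)
    return hits
```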
The search may be for Vicinity Data either before or after the Key Data to which the Vicinity Data relates.
The search for Vicinity Data will establish whether it is, or is not, within a specified distance or proximity to the Key Data, such proximity or distance to vary depending on individual rule settings made by a central system administrator and/or each individual user.
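Separate before and after distances, with administrator defaults that an individual user can override, might be handled as in this sketch. The settings dictionaries, the default of ten words and the function name are all assumptions.

```python
import re

# Hypothetical administrator defaults: vicinity term -> (words before, words after).
ADMIN_WINDOWS = {"deposits": (20, 30)}

def vicinity_status(text, key_term, vicinity_term, user_windows=None):
    """Report, for each Key Data occurrence, whether the Vicinity Data
    appears within the configured distance before and after it."""
    windows = {**ADMIN_WINDOWS, **(user_windows or {})}  # user settings win
    before, after = windows.get(vicinity_term, (10, 10))
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    results = []
    for i, token in enumerate(tokens):
        if token != key_term:
            continue
        results.append({
            "position": i,
            "before": vicinity_term in tokens[max(0, i - before): i],
            "after": vicinity_term in tokens[i + 1: i + 1 + after],
        })
    return results
```

Narrowing the 'before' distance for one user changes the result for that user only, leaving the administrator defaults untouched.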
The Output may include possible suggested changes to Key Data and/or Vicinity Data and/or direct the system user to other sources of information, which may be extensive and technical in nature and cross refer to other data or reference sources via the Internet or otherwise.
Policy rules and Output applied by system users may vary depending on the nature of the data under review. One set of rules and associated Output may be intended for use in one context (eg research), whereas another set of rules may be produced where the system is to be used in another context (eg analysing data in a foreign language) so as to ensure that relevant rules and Output data are appropriate for the data under review.
The Key Data, Vicinity Data and/or Output shall be capable of being accepted or added to the data under review by the user and/or printed or electronically stored with the data, or on a standalone basis to form a list of issues arising from the data.
The system shall be capable of defining different user interaction formats based on Key Data, Vicinity Data and/or Output, in particular the user having the option to review Key Data, Vicinity Data and/or Output in different online and printed paper formats.
The system shall enable different forms of user interaction for how Output are displayed to the user; any Output and user interactions being displayed in a variety of ways, including but not limited to, pop-ups or balloons.
The applicability of rules and/or the system settings can be selected by the user based on the types of data being reviewed so that one user may not necessarily select to use the same rules as another user.
The system can be used to identify not only specific missing individual words or items, but also missing sets of data such as entire clauses in a document, or indeed entire documents or files.
The above process can be carried out by identifying relevant Key Data, Vicinity Data and/or Output term by term or by identifying all relevant Key Data, Vicinity Data and/or Output all at the same time.
The user may choose to operate the above system using only certain rules and/or all rules at the same time.
Examples
First example:
A specific embodiment of the invention will now be described by way of example by reference to the following sample data. A typical disclaimer in a document (for example, terms used by a consumer holiday travel operator) may include the following text: 'We shall not be liable for indirect or consequential loss'.
In anticipation of this issue arising in data/documents of this type, the system may include a policy rule as follows:
"Target: Consequential loss
Key Data=(indirect,loss) (up to 10 words) (consequential)
Vicinity Data: OUTvicinity.020.030(deposits)
Output: Does this include lost deposits (as excluded consequential loss)? Read the following Internet link: [www.ABTA.org.uk]
Replacement Data: 'indirect or consequential loss, which shall [not] include lost deposits'
End of rule."
Explanation: the purpose and practical effect of the above rule is explained, for the purpose of this application, by the additional explanatory data in [square brackets] below as follows:
"Target: Consequential loss
[i.e. this policy rule relates specifically to the data term 'consequential loss']
Key Data=(indirect,loss) (up to 10 words) (consequential)
[the system software will search for Key Data consisting of the words 'indirect' or 'loss' within any 10 words of the word 'consequential']
Vicinity Data: OUTvicinity.020.030(deposits)
[the system software will also search for the word 'deposits' to check that it is NOT in the vicinity (i.e. it is outside the vicinity) of 20 words before, or 30 words after, the Key Data referred to above.]
Output: Does this include lost deposits (as excluded consequential loss)? Read the following Internet link: [www.ABTA.org.uk]
[Assuming the above conditions have been met (i.e. the word 'deposits' has not been found within the vicinity of the Key Data 'indirect' or 'loss' and 'consequential'), the system will display this Output to the user, identifying the issue in question (i.e. the absence of a reference to lost deposits, this being an issue that a consumer might reasonably want to have highlighted to them) and a link to additional resources, in this case an internet site where additional information may be available if required.]
Replacement: 'indirect or consequential loss, which shall [not] include lost deposits'
[The system will also display this (and other similar) alternative forms of wording which the user may choose to use to replace or amend the existing wording.]
End of rule."
The application of the above rule means that where, for example, an untrained consumer or individual reviews this standard disclaimer, the system will help him or her consider whether lost deposits should fall within the scope of excluded consequential losses (an issue which he or she might not otherwise consider or clarify) and if not, whether this is something he or she might wish to take up with the person issuing this disclaimer.
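The rule notation in this example could be parsed and applied along the following lines. This is a sketch under stated assumptions: the regular expressions are inferred from the single rule shown, the `INvicinity` counterpart is hypothetical (only `OUTvicinity` appears in the example), and the vicinity is measured from the word 'consequential' rather than from the whole Key Data span.

```python
import re

# Patterns inferred from the one example rule; "IN" is a hypothetical counterpart.
VICINITY_PATTERN = re.compile(r"(IN|OUT)vicinity\.(\d+)\.(\d+)\(([^)]+)\)")
KEY_PATTERN = re.compile(r"\(([^)]+)\)\s*\(up to (\d+) words\)\s*\(([^)]+)\)")

def parse_vicinity(spec):
    mode, before, after, term = VICINITY_PATTERN.fullmatch(spec.strip()).groups()
    return {"must_be_absent": mode == "OUT", "before": int(before),
            "after": int(after), "term": term.strip().lower()}

def parse_key_data(spec):
    alternatives, gap, anchor = KEY_PATTERN.fullmatch(spec.strip()).groups()
    return {"alternatives": [w.strip().lower() for w in alternatives.split(",")],
            "max_gap": int(gap), "anchor": anchor.strip().lower()}

def rule_fires(text, key_spec, vicinity_spec):
    """True when the Key Data is found and the vicinity condition holds."""
    key, vic = parse_key_data(key_spec), parse_vicinity(vicinity_spec)
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    for i, token in enumerate(tokens):
        if token != key["anchor"]:
            continue
        near = tokens[max(0, i - key["max_gap"]): i + key["max_gap"] + 1]
        if not any(alt in near for alt in key["alternatives"]):
            continue
        window = tokens[max(0, i - vic["before"]): i] + tokens[i + 1: i + 1 + vic["after"]]
        present = vic["term"] in window
        if present != vic["must_be_absent"]:
            return True
    return False
```

Applied to the sample disclaimer, the rule fires because 'deposits' is absent from the vicinity; it would not fire on wording that already mentions lost deposits nearby.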
Other rules can be set up to address other issues arising from different types of data in a similar way.
For example, the words 'December' and '2012' may appear in a document but without any other numbers in the vicinity, and the system may then prompt the user to add a specific date. As a result, through a combination of Key Data search software and policy rules, plus Vicinity Data searches to identify other data inside or outside the vicinity of the Key Data, plus associated Output, missing data and issues can be automatically highlighted which a system user would not otherwise have considered.
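The date example can be sketched as a check for a month name with a four-digit year nearby but no other number. The window size, the prompt wording and the function name are assumptions.

```python
import re

MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

def incomplete_dates(text, window=2):
    """Return a prompt for each month name that has a four-digit year,
    but no other number, within `window` tokens on either side."""
    tokens = re.findall(r"[A-Za-z]+|\d+", text)
    prompts = []
    for i, token in enumerate(tokens):
        if token.lower() not in MONTHS:
            continue
        nearby = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        numbers = [t for t in nearby if t.isdigit()]
        has_year = any(len(t) == 4 for t in numbers)
        if has_year and len(numbers) == 1:
            prompts.append(f"'{token}' has a year but no day nearby; consider adding a specific date.")
    return prompts
```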
Second example:
The above examples relate to detailed textual search relating to particular words or numbers, however a similar rule could also be used to identify the presence or absence of an entire body of data, such as an entire section of a document. This system is particularly beneficial for data which tend to follow a particular style or form (such as statutory notices and standard form documentation). By spotting where data either conforms with or differs from the statutory form or standard form, the system assists the person reviewing the data to avoid risks associated with that data.
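Checking a document against a standard form could be as simple as the sketch below, which reports required headings that do not appear. The heading names and the case-insensitive substring test are assumptions; a real rule set would likely use the Key Data and Vicinity Data searches described earlier.

```python
def missing_sections(document_text, required_headings):
    """Return the required headings that do not appear in the document
    (case-insensitive substring match, an assumed matching rule)."""
    lowered = document_text.lower()
    return [h for h in required_headings if h.lower() not in lowered]

# Hypothetical standard form for a statutory notice.
REQUIRED = ["Notice of Default", "Amount owed", "Your rights"]
notice = "NOTICE OF DEFAULT\nAmount owed: 250 pounds\nPlease pay within 14 days."
print(missing_sections(notice, REQUIRED))
```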
This system is therefore capable, through its search for multiple different combinations of data and/or the absence of data within a specified vicinity, of providing exceedingly detailed and informed Output in respect of any data. Without all these different elements, which cumulatively help to pin-point what each issue is and what missing data should be included in any data under review, this would work far less well. It would be impossible to know what key words in relation to a specific issue are missing from the data.
How it looks from a user standpoint:
The above examples may display to the user of the system in a variety of ways, for example as follows in relation to the first example above:
Output: Does this include lost deposits (as excluded consequential loss)? Read the following Internet link: www.ABTA.org
Replace with: 'indirect or consequential loss, which shall [not] include lost deposits' REPLACE
We shall not be liable for indirect or consequential loss.
GB1201725.7A 2012-02-01 2012-02-01 Operating software and associated policy rules to carry out data search in order to identify missing or incomplete data Withdrawn GB2500567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1201725.7A GB2500567A (en) 2012-02-01 2012-02-01 Operating software and associated policy rules to carry out data search in order to identify missing or incomplete data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1201725.7A GB2500567A (en) 2012-02-01 2012-02-01 Operating software and associated policy rules to carry out data search in order to identify missing or incomplete data

Publications (2)

Publication Number Publication Date
GB201201725D0 GB201201725D0 (en) 2012-03-14
GB2500567A GB2500567A (en) 2013-10-02

Family

ID=45876452

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1201725.7A Withdrawn GB2500567A (en) 2012-02-01 2012-02-01 Operating software and associated policy rules to carry out data search in order to identify missing or incomplete data

Country Status (1)

Country Link
GB (1) GB2500567A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070198501A1 (en) * 2006-02-09 2007-08-23 Ebay Inc. Methods and systems to generate rules to identify data items
US20070288507A1 (en) * 2006-06-07 2007-12-13 Motorola, Inc. Autonomic computing method and apparatus
US20080320550A1 (en) * 2007-06-21 2008-12-25 Motorola, Inc. Performing policy conflict detection and resolution using semantic analysis
US20100011027A1 (en) * 2008-07-11 2010-01-14 Motorola, Inc. Policy rule conflict detection and management
US20120323947A1 (en) * 2011-06-14 2012-12-20 Microsoft Corporation Enriching Database Query Responses using Data from External Data Sources

Also Published As

Publication number Publication date
GB201201725D0 (en) 2012-03-14

Similar Documents

Publication Publication Date Title
Goel et al. Robustness gym: Unifying the NLP evaluation landscape
US10169337B2 (en) Converting data into natural language form
Treude et al. Extracting development tasks to navigate software documentation
Van Hooland et al. Exploring entity recognition and disambiguation for cultural heritage collections
AU2014318392B2 (en) Systems, methods, and software for manuscript recommendations and submissions
US11354501B2 (en) Definition retrieval and display
US20160103837A1 (en) System for, and method of, ranking search results obtained by searching a body of data records
US20130198599A1 (en) System and method for analyzing a resume and displaying a summary of the resume
US20090007271A1 (en) Identifying attributes of aggregated data
US9754022B2 (en) System and method for language sensitive contextual searching
Zhang et al. Where2Change: Change request localization for app reviews
US20120179709A1 (en) Apparatus, method and program product for searching document
US20180189380A1 (en) Job search engine
JP2007011604A (en) Fault diagnostic system and program
Szymański et al. Review on wikification methods
Li et al. To Do or Not To Do: Distill crowdsourced negative caveats to augment api documentation
US10120858B2 (en) Query analyzer
Liu et al. Software Vulnerability Detection with GPT and In-Context Learning
JP2007172260A (en) Document rule preparation support apparatus, document rule preparation support method and document rule preparation support program
US20140280147A1 (en) Database ontology creation
US20150186363A1 (en) Search-Powered Language Usage Checks
KR101238927B1 (en) Electronic book contents searching service system and electronic book contents searching service method
US9558269B2 (en) Extracting and mining of quote data across multiple languages
Monaco Methods for in-sourcing authority control with MarcEdit, SQL, and regular expressions
GB2500567A (en) Operating software and associated policy rules to carry out data search in order to identify missing or incomplete data

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)