AU2008202384A1 - Ucadia Semantic Classification System

Info

Publication number: AU2008202384A1
Application number: AU2008202384A
Authority: AU
Inventors: Frank O'Collins
Original assignee: Individual
Current assignee: Individual
Priority date: 2008-05-23
Filing date: 2008-05-30
Publication date: 2009-12-10

Description

AUSTRALIA
Patents Act 1990
COMPLETE SPECIFICATION
STANDARD PATENT
Ucadia Semantic Classification System
A system for the semantic classification of common languages into a universally consistent structure of meaning using the Ucadia Classification System and Ucadia Symbols System.

Background
Field of Invention
[001] This invention relates to a wide cross section of fields of science and sociology, including Natural Language, Learning Systems, Semantics, Early Childhood Education, Remedial Education Programs, Speech Recognition, Computer Science and Artificial Intelligence research, in the most efficient semantic classification of common languages into a universally consistent structure of meaning.
Background of the Invention
[002] The ability to properly classify spoken and/or written language into a meaningful structure is an essential prerequisite for Natural Language Processing. Unless the constituent elements of a spoken and/or written string of text can be properly identified according to some rule-based system, a valid translation is not possible.
[003] Similarly, words assembled without reference to a rule-based system may in themselves transmit no effective meaning and/or conflicting meaning. The evidence of meaning of any string of written or spoken words is therefore largely dependent upon its adherence to the rules of construction of a specific system.
[004] In all cases, the formal rule system considered a standard component of all natural language systems is the rules of grammar and syntax. The complexity of these rules differs from one natural language to another. Arguably one of the most complex languages to master is English, given the wide variety of "exceptions" to rules as well as general rules.
[005] This level of complexity of grammar and syntax in turn has a natural impact on any method seeking to classify a spoken statement in a natural language such as English into its valid constituent parts.
[006] A further layer of complexity exists in attempting to classify the meaningful structure of a spoken or written statement in the presence of such inherent problems as words with multiple meanings, colloquialisms that can only be understood from the context of the spoken statement and the associated culture, and syntactic ambiguity, whereby a phrase punctuated or spoken differently can have a completely different meaning.
[007] Yet a further level of complexity exists in relation to the identification of the underlying "semantics", or meaning, of a spoken or written statement. Whereas a translation may be attempted through knowledge and methods associated with a source language and a target language, the underlying meaning of a statement requires deeper context, which in itself implies a deeper understanding of the relationships between words and concepts.
[008] The theoretical advantage of knowing the underlying meaning of a string of text is then being able to properly translate it into any target language. However, the computational problem that has existed rests in the ability to utilize a mapping of concepts and real world objects that requires less processing time but is of similar quality to a world encyclopedia, enabling the deeper relationships between concepts and objects to be quickly referenced. The lack of a computational solution to this knowledge problem for semantic understanding is why rules-based translation (grammar and syntax) remains the underpinning for virtually all inventions concerning Natural Language Processing.
[009] Excluding attempts at semantic understanding, three generalized approaches have been adopted by the majority of methods and systems associated with Natural Language Processing. The first is to seek to identify and classify the Predicate first, the second is to identify and classify the Subject first, and the third is to identify and classify a Phrase as being identical or similar to a previous phrase deciphered.
[010] The most common methods and systems for Natural Language Processing have been those that focus on the identification and classification of the Subject first. For example, one accepted method proposes a combination of dictionary sets listing common words in target languages combined with a rule set for distinguishing common grammatical structure.
[011] In recent years a number of successful patents have been granted for methods and systems that address the third method of identification and classification of Phrases using a dictionary of similar phrases previously deciphered and statistical methods to enable approximate comparisons. For example, large commercial translation enterprises use this method as an essential component of their inventions associated with Natural Language Processing.
[012] Less common are systems that begin with identification of the Predicate. In cases where patents have been successfully filed, the emphasis remains on the identification of the verb and any modifiers compared to the general intent, tense and plurality of the phrase.
[013] In all three generalized approaches used by other inventions for Natural Language Processing, the assumption exists that a sentence is classified according to the rules (grammar and syntax) of its language. This means the level of complexity of the rules of that language is assumed to necessarily play a part in processing a statement, whether or not the statement is then converted into a third-party independent state or "hybrid" between various natural languages.
[014] In all prior art cases discovered, there exists a universal assumption that the underlying semantic meaning of a statement is prohibitively complex, and that the only valid avenue is to seek to ensure that grammatically correct translation occurs. Given that most languages provide multiple words possessing similar meanings, the computational likelihood of error in grammar-to-grammar translation remains high. Furthermore, the assumption of grammar ahead of semantic accuracy continues to produce inventions that periodically produce nonsense statements and translations, in spite of being grammatically correct.
[015] In contrast, the system of the present invention is able to determine the core semantic meaning of a statement and bypass the complexity of present grammatical rule systems using the invention of Australian Standard Patent Application SPES-10700414 entitled "Ucadia Classification System" (a classification system for the identification and association of theoretical and real world objects and universal language components into a consistent non-duplicating structure).
[016] Built within this inherent classification structure is a unique knowledge structure, providing an ability to map point to point any concept with any other concept simultaneously through relationship of function, meaning and classification. This means an unprecedented depth may be set for the level of semantic understanding sought for a statement.
[017] Furthermore, the invention of Australian Standard Patent Application SPES-10705264 entitled "Ucadia Symbols System" enables the precise classification of the intent, tense and plurality of a statement into one of only a few dozen possible numeric or symbolic combinations, thereby overcoming the computational problem of many hundreds of millions of possible word combinations even in simple statements.
[018] The prior art searches have not found any evidence of a completely new semantic approach to language translation capable of bypassing rules-based (grammar and syntax) dependency and capable of valid semantic identification. Numerous examples to date claim unique approaches, but none have demonstrated a comprehensive, sophisticated classification system enabling semantic identification.
[019] One object of the present invention is to provide a unique classification methodology for the proper assembly of Ucadia Symbols (as defined by Australian Standard Patent Application SPES-10705264 entitled "Ucadia Symbols System") that relate to the Ucadia Classification System (as defined by Australian Standard Patent Application SPES-10700414 entitled "Ucadia Classification System") into optimum statements of meaning.
[020] Another object of the present invention is to utilize a unique classification method outlined in a previous provisional patent application (Australian Provisional Patent Application SPEP-10683413 entitled "Ucadia Classification System") to provide a simple, non-duplicating method to identify the underlying semantic meaning of each key element of a statement, translating it to a unique classification system which in turn can then be used to translate the semantically recognized statement to any language and in any format.
[021] Another object of the present invention is to provide a system that entails a wide margin for error, whereby grammatically correct translations are less important than semantically correct translations. By enabling such precision in identifying the underlying semantic meaning of a statement, a response shall have a higher likelihood of acceptance, even if the statement is not grammatically perfect. This margin for error enables a simpler mathematical model and set of unique methods.
Definitions and cross references
Cross references to related Provisional Patent Applications
[022] This application claims the benefit of Australian Standard Patent Application SPES-10700414 entitled "Ucadia Classification System" (a classification system for the identification and association of theoretical and real world objects and universal language components into a consistent non-duplicating structure), priority date 23/05/2008 13:34:40.
[023] This application claims the benefit of Australian Standard Patent Application SPES-10705264 entitled "Ucadia Symbols System" (a system for the symbolic representation of theoretical and real world objects and universal language components for the transmission of meaning), priority date 23/05/2008 14:32:06.
Definitions
"Adjective" A word serving as a modifier of a Noun to denote a quality of the thing named, to indicate its quantity or extent, or to specify a thing as distinct from something else.
"Adverb" A word typically serving as a modifier of a verb, an adjective, another adverb, a preposition, a phrase, a clause, or a sentence, expressing some relation of manner or quality, place, time, degree, number, cause, opposition, affirmation, or denial.
"Clause" A group of words containing a Subject and Predicate and functioning as a member of a complex or compound sentence.
"Homograph" A word that is spelled the same as another but is pronounced differently.
"Grammar" The study and set of rules governing the use of a particular natural or artificial language.
"Homonym" A word that has the same pronunciation and spelling as another word, but a different meaning.
"Idiom" An expression whose meaning cannot be deduced from the literal definitions and the arrangement of its parts, but refers instead to a figurative meaning that is known only through conventional use.
"Natural Language" A written and/or spoken language found to be in common use for general-purpose communication.
"Natural Language Processing" A subfield of artificial intelligence and linguistics involving the study and solution to the problems of automated generation of language and assumed "understanding" of natural human languages.
"Noun" Any member of a class of words that typically can be combined with determiners to serve as the subject of a verb and refer to an entity, quality, state, action, or concept.
"Parsing" The process of analyzing a sequence of language elements, usually described as tokens, in order to determine its grammatical structure with respect to a given formal system of grammar. It is also formally named syntax analysis.
"Phrase" A word or group of words forming a syntactic constituent with a single grammatical function.
"Predicate" Part of a sentence or clause that expresses what is said of the subject and that usually consists of a verb with or without objects, complements, or adverbial modifiers. With the Subject, it is considered the second constituent element of a Sentence.
"Preposition" A word that typically combines with a Noun to form a phrase which usually expresses a modification or Predication.
"Pronoun" A word that substitutes for a noun or noun phrase with or without a determiner.
"Semantics" The aspects of meaning that are expressed in a language, system, or other form of representation.
"Sentence" A word, Clause, or Phrase or a group of clauses or phrases forming a syntactic unit which expresses an assertion, a question, a command, a wish, an exclamation, or the performance of an action, that in writing usually begins with a capital letter and concludes with appropriate end punctuation, and that in speaking is distinguished by characteristic patterns of stress, pitch, and pauses.
"Syntactic Ambiguity" A standard problem of computational linguistics, being naturally occurring ambiguity found in natural language grammar whereby a phrase or sentence may have multiple possible methods of being Parsed and therefore multiple interpretations.
"Syntax" The study of the rules, or "patterned relations", that govern the way words combine to form phrases and phrases combine to form sentences.
"Verb" A word that usually denotes an action, an occurrence, or a state of being.
"Word Sense Disambiguation" A standard problem of computational linguistics, being the problem of determining in which sense a word having a number of distinct senses is used in a given sentence.
Brief Abstract of Invention
[024] The system of the present invention is based on a set of rules and methods for disassembling and analyzing any statement in any language to enable the statement to be translated into the unique classification system identified in Australian Standard Patent Application SPES-10700414 entitled "Ucadia Classification System" (a classification system for the identification and association of theoretical and real world objects and universal language components into a consistent non-duplicating structure, filed 23/05/2008 13:34:40). It follows a standard methodology, being: (1) Input preparation, (2) Input Parsing, (3) Input classification, (4) Tensor intent classification and (5) Semantic matching.
[025] Unlike existing inventions, the present invention seeks to identify the underlying semantic meaning of any given input text, thereby making its accurate translation into any other language possible with greatly reduced reliance upon grammatical and syntactical considerations.
[026] In addition, unlike existing inventions, the present invention makes the identification of the tensor structure (the perspective, intent and tense combined) the essential first priority, as opposed to identification of the subject or predicate (verb) of a statement. As there are only so many combinations of tensor systems, this enables millions of potential statements to be reduced down to a few dozen known Tensor structures.
[027] The present invention defines five (5) steps for the proper declaration of a statement of meaning using the Ucadia Symbols System. The order of declaration is:
(1) Tensors (intent, tense, perspective)
(2) Objects
(3) Associators (and, but)
(4) Relators (is, under etc)
(5) Modifiers (brown, big, small)
[028] A proper declaration of meaning requires at least one object from one of the first two categories: (1) Tensors (intent, tense, perspective) or (2) Objects. A declaration may not have more than one tensor.
[029] Tensor objects are always aligned first to the left, followed by the first logical object, beginning with Object, then Relators, Modifiers and Associators. A new Tensor object signifies a new statement of meaning.
[030] The present invention provides for three (3) times during the translation process at which existing libraries of common phrases may be compared:
(1) In the first instance, any input that is identified as having been properly parsed is checked against a library of pre-existing common string matches.
(2) In the second instance, once a statement has been properly classified it is checked against a library of pre-existing common structures.
(3) In the third and final instance, once the tensor intent has been classified the text is checked against the list of pre-existing tensor statements and structures for a match.
This three-step matching process enables faster translation processing by eliminating steps where a valid match can be made.
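
The declaration rules of paragraphs [028] and [029] can be illustrated with a minimal Python sketch. It is not the specification's implementation: the `Element` type, the category strings and the two function names are hypothetical, and the labels stand in for Ucadia symbols that this document does not list. The sketch checks only the rules stated above (at least one Tensor or Object, at most one Tensor, Tensor leftmost, a new Tensor starting a new statement) and does not enforce any ordering among the remaining categories.

```python
from dataclasses import dataclass

# Hypothetical category names, taken from the five categories in [027].
TENSOR, OBJECT, ASSOCIATOR, RELATOR, MODIFIER = (
    "tensor", "object", "associator", "relator", "modifier")


@dataclass(frozen=True)
class Element:
    kind: str   # one of the five category names above
    label: str  # illustrative placeholder, not an actual Ucadia symbol


def split_declarations(elements):
    """Start a new declaration at every Tensor, following [029]."""
    declarations, current = [], []
    for element in elements:
        if element.kind == TENSOR and current:
            declarations.append(current)
            current = []
        current.append(element)
    if current:
        declarations.append(current)
    return declarations


def is_valid_declaration(declaration):
    """Check one declaration against the rules of [028] and [029]."""
    kinds = [element.kind for element in declaration]
    tensor_count = kinds.count(TENSOR)
    if tensor_count > 1:
        return False  # [028]: no more than one Tensor
    if tensor_count == 0 and OBJECT not in kinds:
        return False  # [028]: at least one Tensor or one Object
    if tensor_count == 1 and kinds[0] != TENSOR:
        return False  # [029]: the Tensor is aligned first, to the left
    return True
```

For example, a stream holding a Tensor, an Object and a Modifier followed by a second Tensor and an Object splits into two declarations, each of which passes the check, while a lone Modifier does not.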
[031] An input for translation may be without any proper or immediate clarity as to where one thought/statement ends and another begins. At Input Parsing, a statement or group of statements is analyzed to establish its state of punctuation, so that the stream of content is divided into "chunks", being sub-sentence strings, and then into sentences and paragraphs, if not already in such form.
[032] The absence of punctuation results in a treatment being applied to the target text based on (1) declaring all sub-sentences first, then (2) reviewing groups of sub-sentences to establish hard breaks as sentences, and re-applying.
[033] Upon completion of the Input Parsing phase, the statement is classified according to the following system in two parts. The first part determines whether a sentence has already been classified (coded and interpreted), or whether the sentence/statement is new and not previously classified. If it is a new sentence, not previously classified, then the sentence is classified to the DA level. The order of classification is:
(1) Associators (and, but)
(2) Relators (is, under etc)
(3) Modifiers (brown, big, small)
(4) Tensors (intent, tense, perspective)
(5) Objects
Objects (nouns) are classified last as they tend to be the most unique elements of language, whereas (1) to (4) enable the identification of the "structure" of a phrase.
[034] Upon reaching Tensor intent classification, a first attempt is made to identify the pre-existing pattern of the statement. Under the system of the present invention, all possible sentence tensor constructions may be classified by thirty-nine (39) basic types. This means that, from the hundreds of thousands of possible unique sentence constructions, each can be identified as belonging to only one of these unique Tensor classification types. A preliminary classification is therefore done on the existence of a previous tensor sentence prior to the specific identification of objects (nouns).
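
The order of classification in paragraph [033] can be sketched in Python as a sequence of passes over a tokenized statement. This is an illustration under stated assumptions, not the specification's method: the lexicons are toy word sets, the Tensor cue words are invented for the example (the 39 Tensor types are not listed in this document), and treating every unmatched word as an Object is an assumed simplification of "Objects are classified last".

```python
# Toy lexicons in the classification order of [033]. The Associator,
# Relator and Modifier words are the examples given in the text; the
# Tensor cue words are hypothetical placeholders.
LEXICONS = [
    ("associator", {"and", "but"}),
    ("relator", {"is", "under"}),
    ("modifier", {"brown", "big", "small"}),
    ("tensor", {"will", "did"}),
]


def classify(words):
    """Label words in passes (1) to (4), then treat the rest as Objects."""
    labels = [None] * len(words)
    for category, lexicon in LEXICONS:
        for index, word in enumerate(words):
            if labels[index] is None and word.lower() in lexicon:
                labels[index] = category
    # Pass (5): assumed simplification, unmatched words become Objects.
    return [(word, label or "object") for word, label in zip(words, labels)]


def structure(classified):
    """Return the category pattern of a statement, ignoring the words."""
    return tuple(label for _, label in classified)
```

Because Objects are resolved last, two statements that differ only in their nouns and modifiers, such as "dog is under big table" and "cat is under small chair", yield the same `structure(...)` pattern, which is the kind of early pattern identification the paragraph describes.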
[035] Upon completing the Tensor intent classification phase, the statement is finally classified by its common structure and its underlying semantic intent. Because language enables the same semantic message to be said in multiple ways, a new sentence may or may not be a different version of the same thing. Once a new first-time sentence has been classified to the DA level, it is then checked as a pre-existing pattern. However, this list is constructed according to the classification system, not according to the original sentence construction, so that the possible combinations begin with 39 possible tensor classifications + common attributes + unique objects.
Brief Description of the Drawings
[036] FIG. 1 is a list of the PRIMARY ELEMENTS of the Ucadia Semantic Classification System.
[037] FIG. 2 is a list of the PURPOSE SET(S) of the Ucadia Semantic Classification System.
[038] FIG. 3 is a graphic representation of the size of PURPOSE SET(S) of the Ucadia Semantic Classification System.
[039] FIG. 4 is a list of the COMMON MEANING SET(S) of the Ucadia Semantic Classification System.
[040] FIG. 5 is a graphic representation of the size of COMMON MEANING SET(S) of the Ucadia Semantic Classification System.
[041] FIG. 6 is a list of the set of ALL SETS of the Ucadia Semantic Classification System.
[042] FIG. 7 is a list of the DIA SET of the Ucadia Semantic Classification System.
[043] FIG. 8 is a list of the LINEAR DIA SUB SET of the Ucadia Semantic Classification System.
[044] FIG. 9 is a set of VALID DAT (SET OF DA) of the Ucadia Semantic Classification System.
[045] FIG. 10 is another set of VALID DAT (SET OF DA) of the Ucadia Semantic Classification System.
[046] FIG. 11 is a list of VALID DIA of the Ucadia Semantic Classification System.
[047] FIG. 12 is a VALID DATA SET of DIA of the Ucadia Semantic Classification System.
[048] FIG. 13 is a list of the construction steps of NON-PERSPECTIVE LINEAR DIA of the Ucadia Semantic Classification System.
[049] FIG. 14 is a list of the construction steps of PERSPECTIVE LINEAR DIA of the Ucadia Semantic Classification System.
[050] FIG. 15 is the ORDER of TRANSLATION of the Ucadia Semantic Classification System.
[051] FIG. 16 is a list of the translation steps of NON-PERSPECTIVE LINEAR DIA of the Ucadia Semantic Classification System.
[052] FIG. 17 is a list of the translation steps of PERSPECTIVE LINEAR DIA of the Ucadia Semantic Classification System.
[053] FIG. 18 is a list of the Order of Input Classification of the Ucadia Semantic Classification System.

Claims

1. A Ucadia Semantic Classification System for the consistent assembly and declaration of symbols and objects as defined by Australian Standard Patent Application SPES-10700414 entitled "Ucadia Classification System" and Australian Standard Patent Application SPES-10705264 entitled "Ucadia Symbols System".

2. A Ucadia Semantic Classification System for the semantic classification of common languages into a universally consistent structure of meaning by using a classification system for the identification and association of theoretical and real world objects and universal language components into a consistent non-duplicating structure (Australian Standard Patent Application SPES-10700414 entitled "Ucadia Classification System", a classification system for the identification and association of theoretical and real world objects and universal language components into a consistent non-duplicating structure, filed 23/05/2008 13:34:40) and then applying a consistent methodology of input preparation and classification.

Dependent Claims

3. A Ucadia Semantic Classification System as claimed in claim 2, wherein all possible theoretical and real statements may be classified according to the classification system, whereby there are only thirty-nine (39) possible major combination sets of tense, perspective and intent defined by TENSORS representing a significant part of most meaningful statements.

4. A Ucadia Semantic Classification System as claimed in claim 3, wherein a standard classification methodology is used so that the common patterns of sentences can be identified early without the immediate need to identify and interpret object words (nouns) in most statements, this technique of identifying objects last enabling earlier and faster translations to the universal classification system.

5. A Ucadia Semantic Classification System as claimed in claim 4, wherein the classification system and rules of process eliminate the significant need for complex rule sets for interpretation and classification, enabling simpler, faster and more accurate translation to the universal classification system.

Omnibus Claims

6. The invention substantially as hereinbefore described with reference to figures 1 - 18 of the accompanying drawings entitled "C003_ucadiasemanticclassificationsystem drawings v1.pdf".

Priority Date
Transaction Reference Number: SPEP-10684387
Date/time: Fri May 23 15:59:16 AEST 2008

Inventor
Inventor: Mr Frank O'Collins