WO2016027170A2 - Lexical analysis tool - Google Patents
- Publication number
- WO2016027170A2 (PCT/IB2015/002222)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- keywords
- token
- keyword
- tool
- pwtab
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/425—Lexical analysis
Definitions
- This paper provides an algorithm for constructing a lexical analysis tool by means different from those of the UNIX Lex tool.
- The input is a keywords table describing the target language's keywords, keysymbols, and their semantics, instead of regular expressions.
- The output is a lexical analyzer for the specific programming language.
- The tool can also be used as a translator engine, by inputting a dictionary table, and as a pattern recognizer.
- Tokens may be thought of as the fundamental building blocks of the language.
- A token might be a keyword, a variable name, an integer, an arithmetic operator, etc.
- The task of scanning the source statement and recognizing and classifying the various tokens is known as lexical analysis.
- The part of the compiler that performs this analytic function is commonly called the scanner.
- Each statement in the program must be recognized as some language construct described by the grammar, such as a declaration or an assignment statement. This process, called parsing, is performed by a part of the compiler usually called the parser (see [4] for a simple construction).
- Regular expressions, a tool from mathematical logic, were soon introduced to specify the tokens of a given programming language. Since regular expressions and finite state automata describe exactly the same class of languages, both were used: the former to specify tokens, the latter to describe the process of identifying them.
- A lexical analyzer created by Lex works in concert with the parser.
- For example, a number may be specified by the regular definition NUM → DIGITS OPTIONAL-FRACTION OPTIONAL-EXPONENT.
- Comment symbols are not keywords; they are defined at the beginning of the source program and can be modified there.
- The program to be analyzed must be in a file named "program.cpp". It must have blanks between alphabetical tokens (as is normal practice among programmers).
- For example, "IFX" is an id, not the keyword IF followed by X.
- Non-alphabetical tokens, i.e. keysymbols, are of length ≤ 2.
- main() opens the "keywords" file, reads the keywords and keysymbols (skipping comments), and inserts them into a keyword table.
- filler() is a "blank manager": it puts blanks around non-alphabetical tokens. Blanks between alphabetical tokens already exist, according to step 4.
- lexer() fills the token table simply by fetching strings up to the next blank. Suffix punctuation is split off, checking one character backwards to make sure it is not part of two dots, etc.
- compar() compares the token table with the keyword table and gives the lexical analysis results.
- The method is sequential search:
- int num1; /* stores the number of entries in the keyword table */
- f2 = fopen("program.cpp", "r");
- f3 = fopen("interlexreslt.cpp", "w");
- /* s1 and s2 store one character each. We inch through the file f2, scanning and analyzing s1 and s2 each time, i.e. we scan two characters and compare them with the length-2 entries of the keyword table, then with the length-1 entries. If a match is found, we write it to f3 surrounded by blanks. */
- f1 = fopen("lexreslt.cpp", "w"); fclose(f1);
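The first job of main() is reading the "keywords" file into a keyword table while skipping comments. The following sketch shows one way to do it. The document does not specify the file format or table layout, so several details here are assumptions:

- one entry per line;
- comment lines start with //;
- the table sizes;
- the names load_keywords and struct keyword_table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEYWORDS 256
#define MAX_LEN 32
#define COMMENT_PREFIX "//"   /* assumed comment marker in the keywords file */

/* Keyword table: each entry is a keyword or keysymbol read from the file. */
struct keyword_table {
    char entry[MAX_KEYWORDS][MAX_LEN];
    int count;                /* number of entries in the table */
};

/* Reads the first blank-delimited word of each line, skipping blank lines
   and comment lines, and appends it to the table. Returns the entry count. */
int load_keywords(FILE *f, struct keyword_table *kt)
{
    char line[256];
    char word[MAX_LEN];

    assert(f != NULL && kt != NULL);
    kt->count = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "%31s", word) != 1)
            continue;                                   /* blank line */
        if (strncmp(word, COMMENT_PREFIX, strlen(COMMENT_PREFIX)) == 0)
            continue;                                   /* comment line */
        if (kt->count < MAX_KEYWORDS)
            strcpy(kt->entry[kt->count++], word);
    }
    return kt->count;
}
```

In the tool itself, the FILE * would come from fopen("keywords", "r"), checked for NULL before use.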
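filler() performs the blank-insertion step. It scans two characters at a time and tries the length-2 keysymbols before the length-1 ones. Here the step is sketched on an in-memory string, whereas the original works file-to-file from program.cpp to interlexreslt.cpp. The keysymbol list and the output-buffer size rule are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keysymbol table: every entry has length 1 or 2. */
static const char *const keysymbols[] = {
    "+=", "==", "<=", "++", "&&", "+", "=", "<", ";", "(", ")", "&"
};
static const size_t n_keysymbols = sizeof keysymbols / sizeof keysymbols[0];

/* Returns 1 if the first len characters of s form a keysymbol. */
static int is_keysymbol(const char *s, size_t len)
{
    for (size_t i = 0; i < n_keysymbols; i++)
        if (strlen(keysymbols[i]) == len && strncmp(keysymbols[i], s, len) == 0)
            return 1;
    return 0;
}

/* Copies src to dst, surrounding each keysymbol with blanks. At each
   position a length-2 keysymbol is tried first, then a length-1 one.
   dst must hold at least 3 * strlen(src) + 1 characters. */
void filler(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        size_t len = 0;
        if (src[1] != '\0' && is_keysymbol(src, 2))
            len = 2;
        else if (is_keysymbol(src, 1))
            len = 1;

        if (len > 0) {                 /* emit " sym " */
            *dst++ = ' ';
            memcpy(dst, src, len);
            dst += len;
            *dst++ = ' ';
            src += len;
        } else {                       /* ordinary character: copy as-is */
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Trying the longer keysymbol first keeps += from being split into + and =. Blanks already present in the input are copied through, so some tokens end up with doubled blanks. This is harmless if the next stage splits on runs of blanks.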
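lexer() can be sketched as a loop that cuts the blank-filled text at blanks and stores each piece in a token table. The suffix-punctuation rule below is only our reading of the description: a trailing '.' or ',' becomes its own token unless the character before it is also a dot. The names and sizes are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 256
#define MAX_TOK_LEN 64

/* Token table filled by lexer(). */
struct token_table {
    char tok[MAX_TOKENS][MAX_TOK_LEN];
    int count;
};

/* Appends s[0..len) as a token; in this sketch, overlong tokens or
   tokens beyond MAX_TOKENS are silently dropped. */
static void add_token(struct token_table *tt, const char *s, size_t len)
{
    if (tt->count < MAX_TOKENS && len < MAX_TOK_LEN) {
        memcpy(tt->tok[tt->count], s, len);
        tt->tok[tt->count][len] = '\0';
        tt->count++;
    }
}

static int is_blank(char c) { return c == ' ' || c == '\t' || c == '\n'; }

/* Splits line on runs of blanks. A trailing '.' or ',' is split off unless
   the character before it is also '.', so ".." stays whole. */
void lexer(const char *line, struct token_table *tt)
{
    assert(line != NULL && tt != NULL);
    tt->count = 0;
    while (*line != '\0') {
        while (is_blank(*line))
            line++;
        if (*line == '\0')
            break;
        const char *start = line;
        while (*line != '\0' && !is_blank(*line))
            line++;
        size_t len = (size_t)(line - start);
        char last = start[len - 1];
        if (len > 1 && (last == '.' || last == ',') && start[len - 2] != '.') {
            add_token(tt, start, len - 1);
            add_token(tt, start + len - 1, 1);
        } else {
            add_token(tt, start, len);
        }
    }
}
```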
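compar() performs the comparison step as a sequential search of each token against the keyword table. In this sketch:

- a token found in the table is reported as a keyword or a keysymbol, depending on whether it starts with a letter;
- an all-digit token is reported as a constant;
- anything else is reported as an id, so IFX is an id.

The class names and the functions find_keyword and classify_token are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Sequential search: returns the index of word in the keyword table, or -1. */
int find_keyword(const char *const *table, int num, const char *word)
{
    for (int i = 0; i < num; i++)
        if (strcmp(table[i], word) == 0)
            return i;
    return -1;
}

/* Classifies one non-empty token: table entries first, then an all-digit
   constant, otherwise an identifier. The class names are illustrative. */
const char *classify_token(const char *const *table, int num, const char *tok)
{
    assert(table != NULL && tok != NULL && tok[0] != '\0');
    if (find_keyword(table, num, tok) >= 0)
        return isalpha((unsigned char)tok[0]) ? "keyword" : "keysymbol";
    for (const char *p = tok; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return "id";
    return "constant";
}
```

Each lookup costs time proportional to the table size. Keeping the table sorted and searching it with the standard bsearch() would be a straightforward speedup. Replacing the table with a dictionary whose entries carry target-language words would turn the same loop into the translator engine mentioned at the start of the section.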
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Devices For Executing Special Programs (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2015/002222 WO2016027170A2 (en) | 2015-12-04 | 2015-12-04 | Lexical analysis tool |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2015/002222 WO2016027170A2 (en) | 2015-12-04 | 2015-12-04 | Lexical analysis tool |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2016027170A2 (en) | 2016-02-25 |
WO2016027170A3 (en) | 2016-05-12 |
Family ID: 55351346
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2015/002222 WO2016027170A2 (en) | 2015-12-04 | 2015-12-04 | Lexical analysis tool |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2016027170A2 (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997007452A1 (en) * | 1995-08-15 | 1997-02-27 | International Software Machines | Programmable compiler |
CN103999081A (en) * | 2011-12-12 | 2014-08-20 | International Business Machines Corporation | Generation of natural language processing model for information domain |
- 2015-12-04: PCT/IB2015/002222 (WO2016027170A2), application filing, active
Also Published As
Publication number | Publication date |
---|---|
WO2016027170A3 (en) | 2016-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Owens et al. | Regular-expression derivatives re-examined | |
Täckström et al. | Efficient inference and structured learning for semantic role labeling | |
US6529865B1 (en) | System and method to compile instructions to manipulate linguistic structures into separate functions | |
Levine | Flex & Bison: Text Processing Tools | |
US6928448B1 (en) | System and method to match linguistic structures using thesaurus information | |
Rahman et al. | Natural software revisited | |
Dean et al. | Agile parsing in TXL | |
CN106843840B (en) | Source code version evolution annotation multiplexing method based on similarity analysis | |
US7676358B2 (en) | System and method for the recognition of organic chemical names in text documents | |
Van Cranenburgh et al. | Data-oriented parsing with discontinuous constituents and function tags | |
US7779049B1 (en) | Source level optimization of regular expressions | |
Lindén et al. | Hfst—a system for creating nlp tools | |
US5949993A (en) | Method for the generation of ISA simulators and assemblers from a machine description | |
CN112699665A (en) | Triple extraction method and device of safety report text and electronic equipment | |
Van Cranenburgh et al. | Discontinuous parsing with an efficient and accurate DOP model | |
Zhong et al. | Semantic scaffolds for pseudocode-to-code generation | |
Kumar et al. | Sanskrit compound processor | |
US20080141230A1 (en) | Scope-Constrained Specification Of Features In A Programming Language | |
Koskenniemi | Finite state morphology and information retrieval | |
Iwama et al. | Constructing parser for industrial software specifications containing formal and natural language description | |
Paakki | Prolog in practical compiler writing | |
Kantorovitz | Lexical analysis tool | |
Mössenböck | Alex—a simple and efficient scanner generator | |
US20220004708A1 (en) | Methods and apparatus to improve disambiguation and interpretation in automated text analysis using structured language space and transducers applied on automatons | |
Jain et al. | Cascaded finite-state chunk parsing for Hindi language |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15834464; Country of ref document: EP; Kind code of ref document: A2 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 15834464; Country of ref document: EP; Kind code of ref document: A2 |
122 | EP: PCT application non-entry in European phase | Ref document number: 15834464; Country of ref document: EP; Kind code of ref document: A2 |
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23/10/2018) |