CN1691009A - Method for marking information using computer language - Google Patents

Publication number
CN1691009A
Authority
CN
China
Prior art keywords
sml
content
text
identification information
computerese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200410026059
Other languages
Chinese (zh)
Inventor
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Datang Telecom Co Ltd
Original Assignee
Xian Datang Telecom Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Datang Telecom Co Ltd
Priority to CN 200410026059
Publication of CN1691009A

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a method for marking information using a computer language. Marked information consists of a tag and its content; the tag takes a single form, a head tag with no tail tag. Content may span one or more lines. Content that spans multiple lines, or that consists of sub-tags and their corresponding content, forms a content block. Tags, content blocks, sub-tags, and their content can be nested to any depth, giving a tree-shaped logical structure. The method yields a clear descriptive structure with high readability, and the single head-tag form makes parsers easy to write. An SML file can be converted to an XML file, and corresponding tool software can parse, create, and edit SML files and convert data between SML and XML files, HTML files, and databases.

Description

Method for marking information using computer language
Technical field
The invention belongs to the technical field of computer information processing, and in particular relates to a method for marking information using a computer language.
Background art
There are two basic ways to identify information: 1. identification in natural language, i.e. a name; 2. identification by numeric coding, i.e. a code. For example, for the employees and posts of a company, employee names and post titles are natural-language identifiers, while employee numbers and post codes are coded identifiers. Information identified in natural language can be understood by anyone who knows that language; it can be used in everyday life and work communication, and also in the computer field. Information identified by numeric coding is hard for people to understand, and its use is confined to the computer field or to particular professional domains.
At present, one method that identifies information in natural language is the Extensible Markup Language (XML), a subset of the Standard Generalized Markup Language (SGML). XML was developed by the XML Working Group (originally the SGML Editorial Review Board), which was formed in 1996 under the auspices of the World Wide Web Consortium (W3C). XML defines rules for storing and exchanging information as text (that is, identified in natural language), making it possible to serve, receive, and process generic SGML on the Web in the way that is now possible with the Hypertext Markup Language (HTML). Because XML identifies information in natural language, it is easy to understand, which reduces maintenance and exchange costs. XML is now widely used in the Internet field.
In a computer, information identified in natural language is stored and processed as text, while information identified by numeric coding is stored and processed in binary form.
Identifying information in natural language makes it easy to understand, but for a computer to recognize and process it, the following work is also needed:
1. eliminating ambiguity;
2. adding rules for combining information;
3. adding rules for recognizing information.
Summary of the invention
The object of the invention is to provide a method for marking information using a computer language. For the same information, a description written with this method has a clearer structure and higher readability than one in the Windows INI file format, and is easier to read than one in XML. The method uses a single head-tag form, so parsing software is easy to write. Tool software, like that associated with XML, can parse and generate the text. With this method, text can be converted to other forms, for example XML text, HTML text, or data in a database, so that existing tool software can process the information further.
The technical solution that achieves this object is a method for marking information using a computer language, characterized in that: marked information consists of a tag (Tag) and content (Content); the tag takes a single form, a head tag with no tail tag, that is, the tag comes first and the content follows. Content may span one or more lines; content that spans multiple lines, or that consists of sub-tags and their corresponding content, forms a content block (Content Block). A content block may contain sub-tags (Sub Tag) and their corresponding content (Sub Content). Tags, content blocks, sub-tags, and their content can be nested in multiple layers with no limit on depth, and the logical structure is a tree.
(1) Basic structure
The SML language consists of escape characters, comments, and elements, defined as follows:
Escape character (Escape): the backslash "\" is the escape character, as in the C language;
Comment (Comment): one of the following: 1. a comment line or 2. a comment block;
Comment line (Comment Line): begins with a double slash "//"; everything after it up to the end of the line is the comment, as with C++ line comments;
Comment block (Comment Block): begins with "/*" and ends with "*/"; the part in between is the comment, as with C comments;
Element (Element): a tag followed by one of: 1. a content block, 2. content;
Tag (Tag): the locator character "@" followed by a tag name. "@" is a reserved character. The tag name is user-defined and obeys the following rules: (1) its first character is not the locator character, unless the locator character is preceded by the escape character; (2) it contains no blank characters; (3) it contains no slash "/";
Content block (Content Block): one of the following: (1) a pair of braces "{", "}" and the elements enclosed in them; (2) a pair of braces "{", "}" and the content enclosed in them;
Content (Element Content): the blanks (Blank) and non-blanks (Non-blank) between two tags, or after the last tag;
Blank (Blank): one or more consecutive blank characters, or the newline character at the end of a non-blank line;
Blank character (Blank Character): space (' ') or tab ('\t');
Non-blank line (Non Blank Line): a line with at least one non-blank character before its newline character;
Non-blank character (Non Blank Character): any character other than a blank character or the newline character;
Non-blank (Non-blank): one or more consecutive non-blank characters, or a string (String);
String (String): the characters enclosed in a pair of double quotation marks; a string must not contain a newline character;
Newline character (Line Feed): ('\n');
Blank line (Blank Line): a line that has only one or more blank characters before its newline character and no non-blank character;
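The lexical categories defined above can be sketched as a small tokenizer. This is only an illustrative sketch, not the analyzer of the invention: the regular expressions, the `tokenize` name, and the decision to treat "@" as a tag prefix only at the start of a token (so that text such as an e-mail address stays ordinary content) are assumptions.

```python
import re

# One alternative per lexical category; the exact patterns are assumptions.
TOKEN_RE = re.compile(r"""
    (?P<comment_block>/\*.*?\*/)                      # comment block
  | (?P<comment_line>//[^\n]*)                        # comment line, up to end of line
  | (?P<string>"(?:\\[^\n]|[^"\\\n])*")               # string, closed on the same line
  | (?P<lbrace>\{)
  | (?P<rbrace>\})
  | (?P<tag>@:?(?:\\.|[^\s/{}@\\])[^\s/{}]*)          # locator, optional colon, tag name
  | (?P<newline>\n)
  | (?P<blank>[ \t]+)
  | (?P<word>(?:\\.|[^\s{}"@\\])(?:\\.|[^\s{}"\\])*)  # any other non-blank run
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Yield (kind, lexeme) pairs, dropping comments and in-line blanks."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}: {text[pos]!r}")
        pos = match.end()
        kind = match.lastgroup          # name of the alternative that matched
        if kind not in ("comment_block", "comment_line", "blank"):
            yield kind, match.group()
```

Newline tokens are kept because the content rules distinguish newlines inside a paragraph from the newline that ends it.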
(2) SML tags
The first character of an SML tag name is not the locator character, unless the locator character is preceded by the escape character;
An SML tag name cannot contain blank characters;
An SML tag name cannot contain the slash "/";
Naming suggestion: if the tag name is a single word, write it in lowercase;
Naming suggestion: if the tag name consists of two or more words, join them with underscores and write each word in lowercase;
Naming suggestion: treat an abbreviation in a tag name as an ordinary word;
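As a hedged illustration of these rules, the sketch below checks the three mandatory constraints and applies the naming suggestions. The function names and the rejection of an empty name are assumptions made for the example.

```python
def is_valid_tag_name(name: str, locator: str = "@") -> bool:
    """Check the three mandatory tag-name rules."""
    if not name:                          # assumption: an empty name is rejected
        return False
    if name.startswith(locator):          # rule 1: no bare locator as first character
        return False                      # (an escaped locator starts with "\" and passes)
    if any(ch in " \t" for ch in name):   # rule 2: no blank characters
        return False
    if "/" in name:                       # rule 3: no slash
        return False
    return True


def suggested_tag_name(words):
    """Apply the naming suggestions: lowercase words joined by underscores."""
    return "_".join(word.lower() for word in words)
```

For instance, `suggested_tag_name(["XML", "File"])` gives `xml_file`, treating the abbreviation as an ordinary word.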
(3) SML strings
SML treats a string as a single complete non-blank (Non-blank);
A string may appear at any position on any line of the element content, and must end on the same line;
Element content may contain one or more strings;
SML does not interpret the content of a string; that is, all characters in the string are preserved;
If the string content contains a double quotation mark, tab, or newline, the escape character should be used;
If the string contains the locator character "@", the escape character need not be used;
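A writer that produces SML strings could follow these rules roughly as shown below. This is a sketch under assumptions: the helper name `quote_sml_string` is hypothetical, and escaping the backslash itself is an added assumption so that the escape character stays unambiguous.

```python
def quote_sml_string(value: str) -> str:
    """Wrap value in double quotes, escaping the characters the string rules single out."""
    escapes = {
        "\\": "\\\\",   # assumption: the escape character itself is escaped
        '"': '\\"',     # double quotation mark
        "\t": "\\t",    # tab
        "\n": "\\n",    # newline, so the string ends on the line where it starts
    }
    # The locator "@" is deliberately left untouched inside a string.
    return '"' + "".join(escapes.get(ch, ch) for ch in value) + '"'
```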
(4) SML content
If the line before a non-blank line is a blank line, that non-blank line is a start line;
If the line after a non-blank line is a blank line, that non-blank line is an end line;
The lines from a start line (inclusive) to the first end line after it (inclusive) are called a paragraph;
SML preserves the newline at the end of a paragraph and treats newlines inside the paragraph as blanks. That is, if a line is an end line, the newline after it is kept; otherwise the newline after it is treated as a blank;
SML treats consecutive blank characters outside strings within a line, and newline characters inside a paragraph, as a blank. A blank is equivalent to a single space and only indicates that the text before it and the text after it are not joined together;
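The paragraph rules can be illustrated with a short normalization routine. It is a simplified sketch: the name `normalize_content` is hypothetical, quoted strings are not treated specially here, and the first and last non-blank lines of the text are assumed to open and close a paragraph.

```python
def normalize_content(text: str) -> str:
    """Collapse blanks inside paragraphs and keep one newline after each paragraph."""
    paragraphs, current = [], []
    for line in text.split("\n"):
        words = line.split()              # drops runs of whitespace within the line
        if words:
            current.append(" ".join(words))
        elif current:                     # a blank line closes the open paragraph
            paragraphs.append(" ".join(current))
            current = []
    if current:                           # text that ends without a blank line
        paragraphs.append(" ".join(current))
    return "".join(paragraph + "\n" for paragraph in paragraphs)
```

With this sketch, two lines separated by a single newline are joined by one space, while lines separated by a blank line remain separate paragraphs.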
(5) The SML tree
The logical structure of SML is a tree, called the SML tree;
Each SML element corresponds to one node of the SML tree;
A leaf node of the SML tree contains a tag and content, and the content may be empty. This case is not the same as having no content: the content exists but is empty;
A non-leaf node of the SML tree contains only a tag and has no content;
(6) Accessing the SML tree
In the SML tree, the path of an element is the path of its node; the path of a node is the identifier formed by all the nodes from a reference node down to that node;
A node in the SML tree can have two kinds of path: a named path (Named Path) and an index path (Index Path);
The named path of a node is built from tag names: it is the identifier formed by the tag names of the elements of all nodes from the reference node to that node, with the slash character "/" as separator;
The index path of a node is built from node indices: it is the identifier formed by the indices of all nodes from the reference node to that node, with the slash character "/" as separator. An index consists of the locator character "@" followed by an index value;
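The two kinds of path can be sketched over a minimal tree type. The `Node` class and both lookup functions are hypothetical names, and two points the text leaves open are filled in by assumption: index values are taken as 0-based, and a path lists only the nodes below the starting node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    tag: str
    content: Optional[str] = None        # leaf content, possibly ""; None for non-leaf nodes
    children: List["Node"] = field(default_factory=list)


def find_by_named_path(start: Node, path: str) -> Optional[Node]:
    """Follow tag names separated by "/" downward from the starting node."""
    node = start
    for name in path.split("/"):
        node = next((child for child in node.children if child.tag == name), None)
        if node is None:
            return None
    return node


def find_by_index_path(start: Node, path: str) -> Optional[Node]:
    """Follow "@<index>" segments separated by "/"; indices are assumed 0-based."""
    node = start
    for segment in path.split("/"):
        if not segment.startswith("@"):
            raise ValueError(f"bad index segment: {segment!r}")
        index = int(segment[1:])
        if not 0 <= index < len(node.children):
            return None
        node = node.children[index]
    return node
```

A named path is readable but depends on tag names being unique among siblings; an index path is unambiguous even when sibling tags repeat.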
(7) Use of the colon
A colon ":" may be placed between the tag prefix character "@" and the tag name as a separator;
The separator ":" serves two purposes: 1. it makes tags easier to pick out visually; 2. under Windows it makes selecting a word by double-clicking more convenient. A double-click does not normally select the ":", whereas without the separator the double-click would also select the locator character "@". The full-width Chinese colon cannot be used, because it does not act as a separator.
This method of marking information has the following technical features. As shown in Fig. 1, the SML text is parsed and its syntax and semantics are checked; if the check passes, the content of the SML text is recognized automatically and data suitable for further processing is produced.
As shown in Fig. 2, user data is encoded according to the SML grammar to produce SML text.
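An encoder of this kind might look like the following sketch, which serializes a nested Python dict. The function name, the "@:" prefix default, the four-space indentation, and the use of dicts to stand for user data are all assumptions made for illustration.

```python
def encode_sml(data: dict, indent: int = 0, prefix: str = "@:") -> str:
    """Serialize a nested dict: dict values become content blocks, other values content."""
    pad = "    " * indent
    lines = []
    for tag, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{prefix}{tag}")
            lines.append(pad + "{")
            lines.append(encode_sml(value, indent + 1, prefix))
            lines.append(pad + "}")
        else:
            lines.append(f"{pad}{prefix}{tag} {value}")
    return "\n".join(lines)
```

The sketch does not quote or escape content; values that contain braces or start with the locator character would need the string form.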
As shown in Fig. 3, the user is given a graphical interface that displays the SML tree. The user can select an icon or text and insert it at a specified position in the SML tree, or modify or delete nodes of the tree; once the changes are complete, the SML text is generated.
As shown in Fig. 4, a text editor is provided for editing SML text. The editor has good SML support: it displays SML keywords in distinct colors and identifies SML syntax errors in the text.
As shown in Fig. 5, SML text is converted to XML text according to certain rules, so that XML tools can be used to process, query, or display it; XML text is likewise converted to SML text according to certain rules.
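The text does not spell out the conversion rules, so the following is only one plausible mapping, sketched with Python's standard `xml.etree.ElementTree`: each SML element becomes an XML element of the same name, and leaf content becomes element text. The nested-dict representation of the SML tree and the names `to_xml` and `from_xml` are assumptions.

```python
import xml.etree.ElementTree as ET


def to_xml(tag: str, value) -> ET.Element:
    """Map an SML element (tag plus nested dict or leaf content) to an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            elem.append(to_xml(child_tag, child_value))
    else:
        elem.text = str(value)
    return elem


def from_xml(elem: ET.Element):
    """Inverse mapping; XML attributes and repeated sibling names are not preserved."""
    if len(elem):
        return {child.tag: from_xml(child) for child in elem}
    return elem.text or ""
```

ElementTree does not validate element names, so SML tag names that are not legal XML names would need to be rewritten before serialization.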
As shown in Fig. 6, SML text is converted to HTML text according to certain rules, so that HTML tools can be used to process, query, or display it; HTML text is likewise converted to SML text according to certain rules.
As shown in Fig. 7, SML text is converted to data in a database according to certain rules, so that database tools can be used to process, query, or display it; data in a database is likewise converted to SML text according to certain rules.
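One possible rule for the database direction, offered purely as an assumption, is to store each leaf of the SML tree as a row keyed by its named path. The sketch below uses Python's built-in `sqlite3`; the table name `sml_leaf` and the helper names are hypothetical.

```python
import sqlite3


def flatten(tree: dict, prefix: str = ""):
    """Yield (named_path, content) for every leaf of a nested dict."""
    for tag, value in tree.items():
        path = f"{prefix}/{tag}" if prefix else tag
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, str(value)


def store(conn: sqlite3.Connection, tree: dict) -> None:
    """Write one row per leaf; an existing row with the same path is replaced."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sml_leaf (path TEXT PRIMARY KEY, content TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO sml_leaf VALUES (?, ?)", flatten(tree))
```

Because non-leaf nodes carry no value of their own, storing only the leaves loses no content, although empty content blocks would not be recorded.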
The method of the invention has the following characteristics:
1. it uses natural-language description, which people understand easily;
2. compared with other existing identification methods, the structure of SML is clearer;
3. it can describe any information;
4. tags are easy to extend, and the information structure can be extended by adding tags;
5. SML can be written in any natural language and is not restricted to a particular one (such as Chinese or English);
6. compared with other existing identification methods, SML descriptions are more consistent;
7. it supports structured description;
8. its logical structure is a tree, called the SML tree;
9. compared with other existing identification formats, SML is easier for computer software to recognize and process;
10. a non-leaf node of the tree has no value of its own; it only represents the higher-level name of all its sub-tags, or the set of all its sub-tags.
Description of drawings
Fig. 1 is a schematic diagram of the SML parser;
Fig. 2 is a schematic diagram of the SML encoder;
Fig. 3 is a schematic diagram of the SML generator;
Fig. 4 is a schematic diagram of the SML editor;
Fig. 5 is a schematic diagram of conversion between SML and XML;
Fig. 6 is a schematic diagram of conversion between SML and HTML;
Fig. 7 is a schematic diagram of conversion between SML and a database;
Fig. 8 is the SML tree corresponding to the logical structure of the information;
Fig. 9 is another form of the SML tree.
Embodiment
The invention is described in further detail below with reference to the drawings and embodiments.
This method of marking information is independent of any particular computer programming language, and information described in SML can be used on a variety of computer system platforms.
Information marked with this method consists of a tag (Tag) and content (Content). The tag takes a single form, a head tag with no tail tag; that is, the tag comes first and the content follows, which matches people's reading habits.
A content block may contain sub-tags (Sub Tag) and their corresponding content (Sub Content). Tags, content blocks, sub-tags, and their content can be nested in multiple layers with no limit on depth; the logical structure is a tree, called the SML tree (Tree).
Content may span one or more lines; content that spans multiple lines, or that consists of sub-tags and their corresponding content, forms a content block (Content Block).
Implementation of tags: to make tags easy for a computer to recognize, a tag may carry a special marker, although the method is not limited to this. The special marker may be a prefix, but is not limited to this. The prefix may be a single character, but is not limited to this. The single character may be "@", but is not limited to this character. To improve the visual effect and to make it easy to select a word with the mouse under the Windows family of systems, a ":" may be added after the "@", but the method is not limited to this.
Implementation of content blocks: to make content blocks easy for people to read, a content block may be delimited by special markers at its start and end, but is not limited to this. The start and end markers may be a pair of braces "{ }", but are not limited to this.
For example, one SML implementation is used below to describe information about an employee A of a certain company, including the employer and contact details.
@:employee_a
{
    @:work_unit certain company
    @:department technology department
    @:contact_method
    {
        @:landline_telephone
        {
            @:office ****9552
            @:machine_room ****9516
        }
        @:mobile_phone *******6993
        @:email
        {
            @:internal_mailbox [email protected]
            @:external_mailbox [email protected]
        }
    }
}
Here, items that begin with "@:", such as @:employee_a and @:work_unit, are tags. A tag is followed by its content; whatever is enclosed in braces is a content block, and the tags inside a content block are sub-tags.
The logical structure of this information is a tree; the corresponding SML tree is shown in Fig. 8.
This SML tree can also be drawn in the form of Fig. 9.
A non-leaf node of the SML tree has no value of its own; it only represents the higher-level name of all its sub-tags, or the set of all its sub-tags.
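To show how a head-tag-only syntax lends itself to a short parser, here is a recursive-descent sketch for a simplified subset of the notation used in the example. It assumes the "@:" prefix, whitespace-separated tokens, and no strings, comments, or escapes; content blocks that hold plain content rather than elements are not handled, and repeated sibling tags overwrite one another. The name `parse_sml` is hypothetical.

```python
def parse_sml(text: str) -> dict:
    """Parse a simplified SML subset into nested dicts (tag name -> content or dict)."""
    tokens = text.split()          # assumption: no strings, comments, or escapes
    pos = 0

    def is_tag(token):
        return token.startswith("@:")

    def parse_elements(closing):
        nonlocal pos
        result = {}
        while pos < len(tokens) and tokens[pos] != closing:
            tag = tokens[pos]
            if not is_tag(tag):
                raise SyntaxError(f"expected a tag, got {tag!r}")
            pos += 1
            if pos < len(tokens) and tokens[pos] == "{":
                pos += 1                               # enter the content block
                result[tag[2:]] = parse_elements("}")
                if pos >= len(tokens):
                    raise SyntaxError("missing closing brace")
                pos += 1                               # skip the closing brace
            else:
                words = []                             # content runs to the next tag or brace
                while (pos < len(tokens)
                       and tokens[pos] not in ("{", "}")
                       and not is_tag(tokens[pos])):
                    words.append(tokens[pos])
                    pos += 1
                result[tag[2:]] = " ".join(words)
        return result

    return parse_elements(None)
```

Since there are no tail tags to match, the only pairing the parser has to track is the braces of content blocks.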
Comparison of SML with the Windows INI file format:
1. SML supports structured, nested information, whereas the Windows INI file format does not: it allows only one level of nesting.
2. SML supports tree-shaped information, its logical structure being a tree, whereas the Windows INI file format does not support tree structures.
For example, the information about employee A described in the Windows INI format takes the following form:
[employee_a]
work_unit=certain company
department=technology department
contact_method=[employee_a/contact_method]
[employee_a/contact_method]
landline_telephone=[employee_a/contact_method/landline_telephone]
mobile_phone=*******6993
email=[employee_a/contact_method/email]
[employee_a/contact_method/landline_telephone]
office=****9552
machine_room=****9516
[employee_a/contact_method/email]
internal_mailbox=[email protected]
external_mailbox=[email protected]
As can be seen, for the same information the SML description has a clearer structure and higher readability than the Windows INI description.
Comparison of SML with the XML format:
1. SML uses a single head-tag form, whereas XML uses head tags and tail tags that appear in pairs; by comparison, SML is easier to read.
For example, the information about employee A described in XML takes the following form:
<employee_a>
    <work_unit>certain company</work_unit>
    <department>technology department</department>
    <contact_method>
        <landline_telephone>
            <office>****9552</office>
            <machine_room>****9516</machine_room>
        </landline_telephone>
        <mobile_phone>*******6993</mobile_phone>
        <email>
            <internal_mailbox>[email protected]</internal_mailbox>
            <external_mailbox>[email protected]</external_mailbox>
        </email>
    </contact_method>
</employee_a>
As can be seen, for the same information the SML description is easier to read than the XML description.
2. SML uses a single head-tag form, so parsing software is easy to write.
3. A non-leaf node in SML has no content, whereas a non-leaf node in XML can carry attributes. The same information can be described in XML in several ways, while an SML description is comparatively uniform; SML files designed by different people are therefore more consistent with one another than XML files. For example, the information about employee A can also be described in XML as:
<employee_a work_unit="certain company" department="technology department">
    <contact_method>
        <landline_telephone>
            <office>****9552</office>
            <machine_room>****9516</machine_room>
        </landline_telephone>
        <mobile_phone>*******6993</mobile_phone>
        <email>
            <internal_mailbox>[email protected]</internal_mailbox>
            <external_mailbox>[email protected]</external_mailbox>
        </email>
    </contact_method>
</employee_a>
The difference is that "work_unit" and "department" are described here as attributes of employee A rather than as tag/content pairs. Different people may take different views, so XML descriptions of the same information written by different people may differ.
It follows that SML is more consistent than XML.
4. A non-leaf node in SML has no content, so parsing software is easy to write.
5. In principle an XML file can be converted to an SML file, so existing XML information processing systems can be reused.
The Windows INI file format and other unstructured configuration file formats can be upgraded to an SML configuration file format. SML configuration files can be used in computer software, automatic control equipment, and communication equipment to improve the readability of configuration files.

Claims (10)

1. A method for marking information using a computer language, characterized in that marked information consists of a tag and content; the tag takes a single form, a head tag with no tail tag, that is, the tag comes first and the content follows; content may span one or more lines; a content block may contain sub-tags and their corresponding content; tags, content blocks, sub-tags, and their content can be nested in multiple layers with no limit on depth; and the logical structure is a tree.
2. The method for marking information using a computer language according to claim 1, characterized in that the SML text is parsed and its syntax and semantics are checked; if the check passes, the content of the SML text is recognized automatically and data suitable for further processing is produced.
3. The method for marking information using a computer language according to claim 1, characterized in that user data is encoded according to the SML grammar to produce SML text.
4. The method for marking information using a computer language according to claim 1, characterized in that the user is given a graphical interface that displays the SML tree; the user selects an icon or text and inserts it at a specified position in the SML tree, or modifies or deletes nodes of the SML tree; and the SML text is generated.
5. The method for marking information using a computer language according to claim 1, characterized in that a text editor is provided with which the user can edit SML text, and which displays SML keywords in distinct colors and identifies SML syntax errors in the text.
6. The method for marking information using a computer language according to claim 1, characterized in that SML text is converted to XML text according to certain rules, so that XML tools can be used to process, query, or display it; and XML text is converted to SML text according to certain rules.
7. The method for marking information using a computer language according to claim 1, characterized in that SML text is converted to HTML text according to certain rules, so that HTML tools can be used to process, query, or display it; and HTML text is converted to SML text according to certain rules.
8. The method for marking information using a computer language according to claim 1, characterized in that SML text is converted to data in a database according to certain rules, so that database tools can be used to process, query, or display it; and data in a database is converted to SML text according to certain rules.
9. The method for marking information using a computer language according to claim 1, characterized in that SML comprises: the basic structure, SML tags, SML strings, SML content, the SML tree, access to the SML tree, and the use of the colon.
10. The method for marking information using a computer language according to claim 1, characterized in that, to make tags easy for a computer to recognize, a tag may carry a special marker; the special marker may be a prefix; the prefix may be a single character; and a content block is delimited by special markers at its start and end.
CN 200410026059 2004-04-22 2004-04-22 Method for marking information using computer language Pending CN1691009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410026059 CN1691009A (en) 2004-04-22 2004-04-22 Method for marking information using computer language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410026059 CN1691009A (en) 2004-04-22 2004-04-22 Method for marking information using computer language

Publications (1)

Publication Number Publication Date
CN1691009A (en) 2005-11-02

Family

ID=35346452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410026059 Pending CN1691009A (en) 2004-04-22 2004-04-22 Method for marking information using computer language

Country Status (1)

Country Link
CN (1) CN1691009A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101094194B (en) * 2006-06-19 2010-06-23 腾讯科技(深圳)有限公司 Method for picking up web information needed by user in web page
CN101452443A (en) * 2007-12-06 2009-06-10 富士通株式会社 Recording medium for recording logical structure model creation assistance program, logical structure model creation assistance device and logical structure model creation assistance method
CN101452443B (en) * 2007-12-06 2011-11-23 富士通株式会社 Recording medium for recording logical structure model creation assistance program, logical structure model creation assistance device and logical structure model creation assistance method
CN104035777A (en) * 2014-06-24 2014-09-10 内蒙古银安科技开发有限责任公司 Address batch-generation method for public-security grassroot basic information acquiring software of handheld devices

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication