JP2008217809A

JP2008217809A - Structured document converting device

Info

Publication number: JP2008217809A
Application number: JP2008095971A
Authority: JP
Inventors: Shigeru Yoshida (吉田茂); Hironori Yahagi (矢作裕紀); Nobuko Itani (井谷宣子)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2008-04-02
Filing date: 2008-04-02
Publication date: 2008-09-18
Anticipated expiration: 2021-12-28
Also published as: JP4571991B2

Abstract

PROBLEM TO BE SOLVED: To achieve both a reduction in memory usage and an improvement in processing speed when processing a structured document, by reducing the resources required for operations on the structured document.

SOLUTION: A processing unit reads distinction information that divides the elements of a structured document into key elements, which are subject to data processing, and non-key elements, which are not. For the non-key elements it creates a new element carrying a predetermined tag name and a predetermined attribute name. It builds a tag-name string by joining the tag names of the non-key elements with delimiters and writes that string as the value of the predetermined attribute of the new element. It likewise builds a content string by joining the contents of the non-key elements with delimiters and writes it as the content of the new element. The key elements are written into the converted structured document as they are.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to technology applied to systems that handle structured documents such as XML (eXtensible Markup Language) documents. More specifically, it relates to techniques for converting the data structure of a structured document, and the character strings that make it up, so as to increase processing speed and reduce memory usage in such systems.

XML documents fall roughly into two types according to their characteristics. One is the data-type XML document, such as a slip or a schedule, which has many tags and relatively short element contents. The other is the document-type XML document, such as a magazine, a manual, or a dictionary, whose element contents are relatively long passages of text. The present invention is suited to processing the former, data-type XML documents, and in particular XML documents that are expressed in table form and handled like a database.

In recent years, systems of every kind, belonging to individuals, companies, local governments and so on, have been interconnected over the Internet so that they can communicate with one another. These systems cooperate to provide Web services and to carry out EDI (Electronic Data Interchange) and EC (Electronic Commerce), which makes it necessary to exchange a wide range of information. Under these circumstances XML has attracted attention as a common base format, both for exchanging data between such systems and for processing data within each of them, because it can express structured data flexibly and is well suited to computer processing.

XML was drawn up by the W3C (World Wide Web Consortium) as the basic specification XML 1.0 in February 1998. Its purpose was to make SGML (Standard Generalized Markup Language), standardized by ISO (International Organization for Standardization) in 1986, easier to use on the Internet. HTML (HyperText Markup Language), the language used to create Web pages, has a fixed set of tags specialized for display. For that reason it cannot meet the requirement of having computers process information on the basis of tag information. XML, by contrast, has a language structure in which users can define tags freely and thereby give meaning to character strings in a document. A document written in XML can therefore be processed by a computer on the basis of its tag information.

The terms used in the following description are defined here according to the XML standard.
- A character string enclosed in a pair of "<" and ">" is called a "tag".
- "<string>" is a "start tag", "</string>" is an "end tag", and "<string/>" is an "empty-element tag".
- The whole character string from a start tag to its end tag is an "element".
- The character string between the start tag and the end tag is the "element content" (sometimes simply "content").
- The name of the element written in the tag is the "element name" (or "tag name").
- Additional information attached to an element is an "attribute".
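The terms above map directly onto the objects a parser exposes. The following is a minimal Python sketch using the standard-library `xml.etree.ElementTree` module. The sample document is made up for illustration and does not come from the patent.

```python
import xml.etree.ElementTree as ET

# Illustrative document: <Person> is a start tag, </Person> its end tag,
# <Note/> an empty-element tag, and id="1" an attribute of Person.
doc = '<Person id="1"><Name>A-san</Name><Company>Company A</Company><Note/></Person>'

root = ET.fromstring(doc)
print(root.tag)       # element name (tag name): Person
print(root.attrib)    # attributes as a dict: {'id': '1'}
for child in root:
    # child.text is the element content; an empty element has none (None)
    print(child.tag, child.text)
```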

In a structured document, the data structure is described by embedding tags in the document. Embedding the data structure as tags makes the document flexible and extensible when data items are added, deleted, or changed. In addition, giving tags names that are meaningful to a human reader makes the structured document data readable.

The usual way to improve processing performance for XML documents, for example by speeding up processing or reducing memory usage, has been to improve the implementation of the underlying software. Performance can also be improved, however, by processing the XML document itself in advance. The present invention relates to this second approach (improving processing performance by transforming the XML document). The prior art for this approach is described below.

[A1] Prior art 1
The article "The truth behind the myth of universality comes into view: overturning XML 'common sense'" in the March 12, 2001 issue of Nikkei Computer describes cases in which processing slowed down after XML was introduced. In those cases the problem was handled by changing the data structure. In the Sumitomo Electric Systems case (pp. 64-65 of that issue), data of the same kind is written together in CSV (Comma Separated Value) format, and the combined data is embedded in a single tag of the XML data. For example, the definition information of the XML data was changed so that one month of XML data is grouped into comma-separated lists in date order.

Specifically, the daily performance data had been written in separate tags, as in
<KOUSU day="01">8.0</KOUSU><KOUSU day="02">5.5</KOUSU>…
…<KOUSU day="31">12.8</KOUSU>
The original document was rewritten to collect these into one element per month, in the form
<KOUSU day="01,02,…,31" data="8.0,5.5,…,12.8"></KOUSU>
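The rewriting described above can be sketched in Python with `xml.etree.ElementTree`. The sketch makes three assumptions:
- The daily elements sit under a wrapper element, called `month` here. This name is hypothetical, because the excerpt shows no root element.
- Each element carries a `day` attribute.
- Three sample days stand in for the original `…`.

```python
import xml.etree.ElementTree as ET

def group_daily(xml_text: str) -> str:
    """Merge per-day <KOUSU day="dd">value</KOUSU> elements into one element
    whose day and data attributes hold comma-separated lists (prior art 1)."""
    root = ET.fromstring(xml_text)
    days, values = [], []
    for e in root.findall("KOUSU"):          # document order = date order here
        days.append(e.get("day"))
        values.append((e.text or "").strip())
    merged = ET.Element("KOUSU", {"day": ",".join(days), "data": ",".join(values)})
    # short_empty_elements=False writes <KOUSU ...></KOUSU> as in the article
    return ET.tostring(merged, encoding="unicode", short_empty_elements=False)

# 'month' is a hypothetical wrapper element
sample = ('<month><KOUSU day="01">8.0</KOUSU><KOUSU day="02">5.5</KOUSU>'
          '<KOUSU day="31">12.8</KOUSU></month>')
print(group_daily(sample))
# → <KOUSU day="01,02,31" data="8.0,5.5,12.8"></KOUSU>
```

The benefit comes entirely from the repeated same-kind elements collapsing into one. This is why the improvement depends on the data having such a structure.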

With this change, referring to one month of data requires only a single query to the database server, and the XML definition information needs to be transmitted only once. The data volume is reported to have dropped to one tenth. The technique of prior art 1 thus gathers same-kind data used in data processing into one tag. It applies only to specific data that contains same-kind data, so its benefit depends on the data.

[A2] Prior art 2
The record items (fields) of an XML document can sometimes be divided into key elements, which are the target of data processing, and elements that are not (non-key elements). In that case the key elements can be kept while the non-key elements are gathered into a separate file, as shown in items [A2-1] and [A2-2] below. The key elements then refer to the non-key elements through identification information (id) held as an attribute.

With prior art 2, when data processing uses only the key elements, the load can be confined to those elements. Sometimes, however, target records are extracted, for example by a search, and the key and non-key elements must be displayed together. The non-key elements must then be read from the other file and recombined with the key elements, which takes considerable effort.

[A2-1] Specific example of an original XML document
<Roster>
<Person><Name>A-san</Name><Company>Company A</Company><Department>Department A</Department><Address>City A</Address><Phone>123</Phone></Person>
<Person><Name>B-san</Name><Company>Company B</Company><Department>Department B</Department><Address>City B</Address><Phone>456</Phone></Person>
</Roster>
[A2-2] Example of division into two files
The original XML document above is divided into two separate files: one XML document for the key elements (Name, Company) and one for the non-key elements (Department, Address, Phone). In the key-element document, a new empty-element tag with the tag name "Info" is created. Its id attribute associates the key elements with the non-key-element document. In the separate file, the non-key elements are grouped in an element with the tag name "Info" and are referenced through a ref attribute that corresponds to the id attribute.

- XML document of key elements
<Roster>
<Person><Name>A-san</Name><Company>Company A</Company><Info id="1"/></Person>
<Person><Name>B-san</Name><Company>Company B</Company><Info id="2"/></Person>
</Roster>
- XML document of non-key elements
<Roster>
<Info ref="1"><Department>Department A</Department><Address>City A</Address><Phone>123</Phone></Info>
<Info ref="2"><Department>Department B</Department><Address>City B</Address><Phone>456</Phone></Info>
</Roster>
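The following minimal Python sketch shows this split, together with the recombination step that makes prior art 2 laborious when records are displayed. The element names follow the translated example. Numbering records sequentially from 1 is an assumption.

```python
import xml.etree.ElementTree as ET

KEY_TAGS = {"Name", "Company"}  # key fields of the example above

def split_roster(xml_text):
    """Split each record into key fields plus an <Info id> link, and move the
    non-key fields into <Info ref> elements of a second document (prior art 2)."""
    src = ET.fromstring(xml_text)
    key_root, info_root = ET.Element(src.tag), ET.Element(src.tag)
    for n, person in enumerate(src, start=1):
        key_rec = ET.SubElement(key_root, person.tag)
        info = ET.SubElement(info_root, "Info", {"ref": str(n)})
        for field in person:
            (key_rec if field.tag in KEY_TAGS else info).append(field)
        ET.SubElement(key_rec, "Info", {"id": str(n)})
    return (ET.tostring(key_root, encoding="unicode"),
            ET.tostring(info_root, encoding="unicode"))

def rejoin(key_doc, info_doc):
    """Displaying whole records needs a second lookup into the other file."""
    infos = {i.get("ref"): i for i in ET.fromstring(info_doc)}
    rows = []
    for person in ET.fromstring(key_doc):
        ref = person.find("Info").get("id")
        rows.append([f.text for f in person if f.tag != "Info"]
                    + [f.text for f in infos[ref]])
    return rows

roster = ('<Roster>'
          '<Person><Name>A-san</Name><Company>Company A</Company><Department>Department A</Department>'
          '<Address>City A</Address><Phone>123</Phone></Person>'
          '<Person><Name>B-san</Name><Company>Company B</Company><Department>Department B</Department>'
          '<Address>City B</Address><Phone>456</Phone></Person>'
          '</Roster>')
key_doc, info_doc = split_roster(roster)
print(rejoin(key_doc, info_doc))
```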
[A3] Prior art 3
In prior art 3, a level of the XML data hierarchy is specified, and the data below that level is compressed with XMLZip, a compression tool dedicated to XML. For database-style XML data, a compressed file is created for each record, so the compressed XML data can be partially restored. Because an XML document can be decompressed record by record, memory limitations can be avoided. With prior art 3, however, an effective compression ratio cannot be obtained when the size (data volume) of each record is small.
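XMLZip itself is not reproduced here. As a stand-in, the following Python sketch uses the standard `zlib` module to compress each record separately. The record layout is invented for illustration. The sketch shows both properties described above:
- Any one record can be restored on its own.
- Small records compress poorly, because each one carries its own overhead and cannot share repeated strings with its neighbors.

```python
import zlib

def compress_records(records):
    """Compress every record independently (per-record, as in prior art 3)."""
    return [zlib.compress(r.encode("utf-8")) for r in records]

def restore_record(blobs, i):
    """Partial restoration: only record i is decompressed."""
    return zlib.decompress(blobs[i]).decode("utf-8")

records = [f"<Person><Name>P{i}</Name><Company>Company {i % 3}</Company>"
           f"<Phone>{100 + i}</Phone></Person>" for i in range(100)]
blobs = compress_records(records)

raw = sum(len(r.encode("utf-8")) for r in records)
per_record = sum(len(b) for b in blobs)
whole = len(zlib.compress("".join(records).encode("utf-8")))
print(raw, per_record, whole)   # raw size, per-record total, whole-document total
print(restore_record(blobs, 42))
```

The compressed blobs are `bytes`, not text. This is why such results cannot be placed inside an XML document as they are.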

For XML documents, the most typical kind of structured document, two standard interfaces (APIs: Application Programming Interfaces) are defined so that application software can handle them: DOM (Document Object Model) and SAX (Simple API for XML).
- SAX is generally fast and uses little memory during processing. It delivers its output sequentially, in document order, so it suits simple processing that only reads data.
- DOM is generally slow and uses a lot of memory during processing. Because it expands the elements of an XML document into a hierarchical tree (DOM tree), however, it makes programs easy to write even for complex processing.
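The difference between the two APIs can be seen with Python's standard-library `xml.sax` and `xml.dom.minidom` modules. This sketch counts elements both ways. The SAX handler only reacts to events as they stream past, while the DOM version first builds the complete tree. The sample document is illustrative.

```python
import xml.sax
from xml.dom import minidom

class ElementCounter(xml.sax.ContentHandler):
    """SAX: callbacks arrive in document order; no tree is built."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1

def count_with_sax(xml_text):
    handler = ElementCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count

def count_with_dom(xml_text):
    """DOM: the whole document is expanded into a node tree first,
    after which any node can be visited in any order."""
    dom = minidom.parseString(xml_text)
    return len(dom.getElementsByTagName("*"))

doc = "<Roster><Person><Name>A</Name></Person><Person><Name>B</Name></Person></Roster>"
print(count_with_sax(doc), count_with_dom(doc))   # → 5 5
```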

In general, to search, update, or delete within an XML document, the target document is first expanded into a DOM tree through the standard API (DOM), and the operation is then performed on the tree. Expanding an XML document into a DOM tree, however, requires as much as six times the original data volume in working memory. Items that are never used (items not subject to the operation) are expanded as well, so the expansion takes a great deal of time.

In other words, memory usage with the standard API (DOM) is large and processing is slow because an application handling an XML document expands every element into memory, including elements that are not subject to data processing. Processing time and memory usage therefore grow in proportion to the number of elements in the structured document.
This is why prior arts 1 and 2 above were proposed: they improve processing performance by transforming the XML document itself in advance.

However, prior art 1 gathers same-kind data used in data processing into a single tag. It applies only to specific data that contains same-kind data, so its benefit in reducing memory usage and increasing processing speed depends on the data.
Prior art 2 places the key elements that are the target of data processing and the unused elements in separate files. Displaying key and non-key elements together therefore requires reading the non-key elements from the other file and recombining them with the key elements, which takes considerable effort.

Accordingly, converting the structure of XML data in advance calls for a general-purpose data-structure conversion method that can be applied to many kinds of XML data. The conversion must also leave the converted XML data with a valid data structure, and it must ensure transparency toward application software. Transparency here means that application software can use the converted XML document as it is, without modification or with only slight modification. This property is important when a converted XML document is to be processed by existing application software.

Prior art 3, meanwhile, creates a compressed file for each record of the XML data. Such a compressed file is normally binary data, so it cannot be placed inside an XML document, which consists only of character codes, and is stored as a separate file instead. Whenever a given record of the XML document must be referenced, it therefore has to be read from the separate file and decompressed, which takes considerable effort. What is needed is a compression method that compresses an XML document efficiently and still allows the compressed result to be placed inside the XML document, that is, a method that yields the compressed result as character codes.

The present invention was devised in view of these problems. Its object is to provide a general-purpose conversion technique that gathers the non-key elements into a single element. The technique can be applied to many kinds of structured document data while preserving transparency toward applications and the validity of the converted document's data structure. It thereby reduces the resources needed for operations on a structured document and achieves both lower memory usage and higher processing speed when the document is processed.

Another object of the present invention is to provide a compression conversion technique that compresses a structured document efficiently and yields the result as character codes that can be placed inside the structured document. This likewise reduces the resources needed for operations on the structured document and achieves both lower memory usage and higher processing speed when it is processed.

To achieve the above objects, the structured document conversion device of the present invention has a processing unit that converts a structured document. The processing unit:
- reads distinction information that classifies the elements of the structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not;
- creates, for the non-key elements in the distinction information, a new element given a predetermined tag name and a predetermined attribute name;
- joins the tag names of the non-key elements with delimiters into a tag-name string, and writes that string in the new element as the value of the predetermined attribute;
- joins the contents of the non-key elements with delimiters into a content string, and writes that string as the content of the new element;
- writes the key elements in the distinction information into the converted structured document as they are (claim 1).
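The following minimal Python sketch illustrates the claim 1 conversion with `xml.etree.ElementTree`. It is not the patent's implementation, and it rests on these assumptions:
- Records are flat.
- Non-key elements hold plain text that contains neither the delimiter nor attributes nor child elements.
- The tag name `nonkey` and the attribute name `tags` are hypothetical stand-ins for the "predetermined" names.

```python
import xml.etree.ElementTree as ET

DELIM = ","            # delimiter (a comma, as suggested in the text)
NEW_TAG = "nonkey"     # hypothetical "predetermined tag name"
NAMES_ATTR = "tags"    # hypothetical "predetermined attribute name"

def convert_record(record, key_tags):
    """Keep key elements as they are; fold all non-key elements into one
    new element: tag names -> attribute value, contents -> element content."""
    out = ET.Element(record.tag, record.attrib)
    names, contents = [], []
    for child in record:
        if child.tag in key_tags:
            out.append(child)                      # key element, unchanged
        else:
            names.append(child.tag)
            contents.append(child.text or "")
    if names:
        folded = ET.SubElement(out, NEW_TAG, {NAMES_ATTR: DELIM.join(names)})
        folded.text = DELIM.join(contents)
    return out

def convert(xml_text, key_tags):
    root = ET.fromstring(xml_text)
    new_root = ET.Element(root.tag, root.attrib)
    for record in root:
        new_root.append(convert_record(record, key_tags))
    return ET.tostring(new_root, encoding="unicode")

roster = ('<Roster>'
          '<Person><Name>A-san</Name><Company>Company A</Company><Department>Department A</Department>'
          '<Address>City A</Address><Phone>123</Phone></Person>'
          '</Roster>')
print(convert(roster, {"Name", "Company"}))
# → <Roster><Person><Name>A-san</Name><Company>Company A</Company><nonkey tags="Department,Address,Phone">Department A,City A,123</nonkey></Person></Roster>
```

The key element `Name` is still reachable at `Person/Name`, which is the transparency property. The three non-key fields now cost one element per record instead of three.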

Preferably, the processing unit reads a conversion specification document that describes the distinction information and the tag name of the new element. On the basis of that document, it converts the description of the non-key elements in the structured document to be converted (claim 2).
To achieve the above objects, another structured document conversion device of the present invention also has a processing unit that converts a structured document. The processing unit:
- reads distinction information that classifies the elements of the structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not;
- creates, for the non-key elements in the distinction information, a new element given a predetermined tag name;
- builds a character string in which the symbols that represent tags in the description of the non-key elements are replaced, through entity-reference notation, with entity-reference strings unrelated to tagging, and writes that string as the content of the new element;
- writes the key elements in the distinction information into the converted structured document as they are (claim 3).
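The following minimal Python sketch illustrates the claim 3 variant. Each non-key element is serialized with its markup and stored as plain text in one new element, so the serializer writes `<` and `>` as the entity references `&lt;` and `&gt;`. The tag name `nonkey` is hypothetical. The sketch assumes no whitespace between child elements, because `ET.tostring` also emits an element's tail text.

```python
import xml.etree.ElementTree as ET

NEW_TAG = "nonkey"   # hypothetical "predetermined tag name"

def wrap_non_key(record, key_tags):
    """Keep key elements; serialize the non-key elements, markup included,
    and store that text as the content of one new element."""
    out = ET.Element(record.tag, record.attrib)
    markup = []
    for child in record:
        if child.tag in key_tags:
            out.append(child)
        else:
            markup.append(ET.tostring(child, encoding="unicode"))
    if markup:
        # Stored as plain text: on output ElementTree writes '<' as '&lt;'
        # and '>' as '&gt;', so the old tags no longer count as markup.
        ET.SubElement(out, NEW_TAG).text = "".join(markup)
    return out

rec = ET.fromstring("<Person><Name>A-san</Name><Phone>123</Phone></Person>")
converted = ET.tostring(wrap_non_key(rec, {"Name"}), encoding="unicode")
print(converted)
# → <Person><Name>A-san</Name><nonkey>&lt;Phone&gt;123&lt;/Phone&gt;</nonkey></Person>
```

Parsing the converted document turns the entity references back into `<Phone>123</Phone>` as text. That text can be parsed again to restore the original element.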

Furthermore, to achieve the above objects, a structured document conversion method as a related technique of the present invention:
- divides the elements of the structured document to be converted into key elements and non-key elements;
- creates a new element given a predetermined tag name and a predetermined attribute name;
- performs a tag-name conversion, creating a tag-name string that contains the tag names of the non-key elements and writing it in the new element as the value of the predetermined attribute;
- performs a content conversion, creating a content string that contains the contents of the non-key elements and writing it as the content of the new element;
- writes the key elements into the converted structured document as they are, without applying any conversion to them.

Another structured document conversion method as a related technique of the present invention:
- divides the elements of the structured document to be converted into key elements and non-key elements;
- creates a new element given a predetermined tag name;
- creates a character string in which the symbols related to tagging in the description of the non-key elements are replaced with character strings unrelated to tagging, and writes that string as the content of the new element;
- writes the key elements into the converted structured document as they are, without applying any conversion to them.

A further structured document conversion method as a related technique of the present invention:
- divides the elements of the structured document to be converted into key elements and non-key elements, and creates a new element given a predetermined tag name;
- applies variable-length coding to the characters or character strings that make up the non-key elements, assigning shorter codes to those that appear more often;
- packs the resulting binary data 6 bits at a time into 1-byte units of converted data;
- converts the 6-bit data in each unit into a character code conforming to ASCII (American Standard Code for Information Interchange), so that the non-key elements become a compressed string made up of those character codes;
- writes the compressed string as the content of the new element;
- writes the key elements into the converted structured document as they are, without applying any conversion to them.

A data conversion method as a related technique of the present invention applies variable-length coding to the characters or character strings to be converted, assigning shorter codes to those that appear more often. It packs the binary data obtained by the coding 6 bits at a time into 1-byte units of converted data and outputs them. The 6-bit data in each unit may further be converted into a character code conforming to ASCII. The character codes obtained for all units may then be output as the compressed conversion result of the characters or character strings.
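The following Python sketch illustrates the method under stated assumptions. It uses a per-character Huffman code as one concrete variable-length code in which frequent symbols get shorter codes. The patent text does not say it uses Huffman coding. The text also says only that each 6-bit value becomes an ASCII character code, so the 64-character table `ALPHABET` below is a hypothetical choice of XML-safe characters.

```python
import heapq
from collections import Counter

# Hypothetical 64-character table for the 6-bit values; the text only says the
# values become ASCII character codes, so this particular table is an assumption.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

def huffman_code(text):
    """Variable-length (Huffman) code: frequent characters get shorter codes."""
    freq = Counter(text)
    if len(freq) <= 1:
        return {ch: "0" for ch in freq}
    # heap entries: (weight, unique tie-breaker, {char: partial code})
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in left.items()}
        merged.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def compress(text):
    """Encode, then pack the bit stream 6 bits per output character."""
    code = huffman_code(text)
    bits = "".join(code[ch] for ch in text)
    bits += "0" * (-len(bits) % 6)              # pad the last group with zeros
    packed = "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    return packed, code

def decompress(packed, code, length):
    """Unpack 6-bit groups and decode exactly `length` characters."""
    inverse = {c: ch for ch, c in code.items()}
    bits = "".join(format(ALPHABET.index(p), "06b") for p in packed)
    out, cur = [], ""
    for b in bits:
        if len(out) == length:
            break
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

text = "Department A,City A,123"
packed, code = compress(text)
print(packed, len(packed), len(text))
```

A practical encoder would also have to store or agree on the code table and the original length. This sketch simply returns the table alongside the packed string. Because every output character is one of 64 letters, digits, `-` or `_`, the result can sit inside an XML element without escaping.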

With the structured document conversion method as a related technique of the present invention, and with the structured document conversion device of the present invention, the processing unit reads the distinction information that divides the elements of the document to be converted into key and non-key elements. It then converts the document into one in which the key elements are written as they are, while the non-key elements are gathered into one tag. The converted structured document thus has fewer elements, and the non-key elements can be handled as a unit during expansion and data processing. The reduction in element count is especially large for structured documents with many non-key elements that are not subject to data processing, or with many elements per record.

Application software uses only the key elements when it processes a structured document. Because the present invention writes the key elements as they are, their contents can still be referenced through their tag names as usual, and the converted structured document remains transparent to applications.
The conversion specification document can be written as a structured document and supplied together with a procedure for carrying out the conversion. There is then no need to write a separate style sheet for each of the many kinds of structured document, and the data-structure conversion and inverse conversion of the present invention can be applied to various structured document data without extra effort. Furthermore, a style sheet instructing the conversion or inverse conversion can be generated from the conversion specification document. A structured document conversion processor (for example, a standard XSLT processor) can then carry out the conversion or inverse conversion using that style sheet. In other words, the conversion and inverse conversion of the present invention can be performed on almost any kind of structured document system (XML document system).

Accordingly, the present invention provides a general-purpose conversion technique for various structured document data. The technique combines the non-key elements into one element while preserving transparency to the application and the validity of the converted document's data structure. This greatly reduces the resources needed to operate on structured documents, achieving both lower memory usage and higher processing speed.

In tag name conversion and content conversion, the tag name character string and the content character string are created very easily. The tag names or contents of the non-key elements are simply joined with a delimiter such as a comma, that is, with a symbol unrelated to tagging.
If the non-key elements form multiple hierarchy levels, hierarchical structure identification information can be added to the tag names in the tag name character string. This preserves the hierarchy in the converted structured document, so the inverse conversion that restores the original structured document can easily be performed according to that information.

When a non-key element has an attribute, the attribute name, with attribute name identification information added, is written in the tag name character string. It follows the tag name of the element that has the attribute, separated by a delimiter. The content character string is then built by joining the contents of the non-key elements in an order corresponding to this sequence of tag names. The attributes of the non-key elements are thus preserved in the converted structured document, and the inverse conversion that restores the original structured document can easily be performed according to the attribute name identification information.
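A minimal Python sketch of the attribute handling described above. It rests on two assumptions: "@" serves as the attribute-name identification marker, and an attribute's value occupies the same position in the content string as its marked name. The embodiment's actual notation is defined in figures not reproduced here, so both choices are illustrative, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SEP = ","         # delimiter between entries
ATTR_MARK = "@"   # hypothetical attribute-name identification marker

def flatten_non_keys(elements):
    """Build (tag name string, content string) for a list of non-key elements."""
    names, values = [], []
    for el in elements:
        names.append(el.tag)
        values.append(el.text or "")
        for attr, val in el.attrib.items():
            # the attribute name goes right after its element's tag name,
            # and its value sits at the same position in the content string
            names.append(ATTR_MARK + attr)
            values.append(val)
    return SEP.join(names), SEP.join(values)

def rebuild_non_keys(tags, contents):
    """Inverse: recreate the non-key elements and reattach marked attributes."""
    rebuilt = []
    for name, value in zip(tags.split(SEP), contents.split(SEP)):
        if name.startswith(ATTR_MARK):
            # a marked name belongs to the most recently rebuilt element
            rebuilt[-1].set(name[len(ATTR_MARK):], value)
        else:
            el = ET.Element(name)
            el.text = value
            rebuilt.append(el)
    return rebuilt
```

For `<telephone type="home">123</telephone>` followed by `<department>Dept. A</department>`, the sketch produces the tag string `telephone,@type,department` and the content string `123,home,Dept. A`.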

Tag name shortening conversion replaces the tag names of non-key elements with shortened tag names, which reduces the data volume of the converted structured document. Tag name shortening conversion information in the conversion specification document indicates whether this shortening is performed. Execution or non-execution of tag name shortening conversion and of tag name expansion conversion can therefore be switched automatically.
When the structured document to be converted is described in tabular form, the tag names and attribute names can easily be determined during the inverse conversion that restores the original structured document. Tag name conversion and attribute name conversion can therefore be omitted. The converted structured document then needs only the content character strings of the non-key elements, the description of tag names and attribute names can be dropped, and the data volume of the converted document is greatly reduced. Table format information in the conversion specification document indicates whether table format conversion is performed, so execution or non-execution of table format conversion and table format inverse conversion can be switched automatically.
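For the tabular case, a sketch could look like the following. It makes two hypothetical assumptions: the non-key columns have a fixed order taken from the conversion specification, and every record has every column. Under those assumptions only the content string is stored, and the inverse conversion recovers each tag name from its position. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

KEY_TAGS = ["name", "company"]                          # assumed key columns
NON_KEY_COLUMNS = ["department", "address", "telephone"]  # assumed fixed order
SEP = ","

def collapse_tabular(record):
    """Keep key elements; store only the joined non-key contents (no tag names)."""
    out = ET.Element(record.tag)
    for tag in KEY_TAGS:
        out.append(record.find(tag))
    info = ET.SubElement(out, "information")
    info.text = SEP.join(record.findtext(tag, "") for tag in NON_KEY_COLUMNS)
    return out

def expand_tabular(record):
    """Inverse: tag names come from the column order, not from the document."""
    out = ET.Element(record.tag)
    for tag in KEY_TAGS:
        out.append(record.find(tag))
    values = record.findtext("information", "").split(SEP)
    for tag, value in zip(NON_KEY_COLUMNS, values):
        ET.SubElement(out, tag).text = value
    return out
```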

Further, consider the structured document conversion method as a related technique of the present invention and the structured document conversion apparatus of the present invention. A processing unit reads distinction information that divides the elements of the structured document to be converted into key elements and non-key elements. It then converts that document into a structured document in which the key elements are described as they are, while the non-key elements are combined into a single tag and the tagging-related symbols in their description are replaced with character strings unrelated to tagging. This yields the same effects and advantages as the structured document conversion method described above. The conversion is very easy to perform: the symbols that represent tags in the description of the non-key elements are replaced with entity reference strings unrelated to tagging. For example, when the structured document is an XML document, the tag symbols "<" and ">" are replaced with the entity references "&lt;" and "&gt;", respectively.
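A short Python sketch of the entity-reference variant, using the standard `xml.etree.ElementTree` module. The non-key elements are serialized to markup and stored as the text of one new element. The serializer itself then writes "<" and ">" in that text as "&lt;" and "&gt;". The element name `information` and the helper names are illustrative assumptions.

```python
import copy
import xml.etree.ElementTree as ET

def wrap_non_keys(record, key_tags, new_tag="information"):
    """Serialize the non-key elements to markup and store it as plain text."""
    out = ET.Element(record.tag, record.attrib)
    fragment = []
    for child in record:
        if child.tag in key_tags:
            out.append(child)                 # key element kept as-is
        else:
            c = copy.copy(child)
            c.tail = None                     # drop inter-element whitespace
            fragment.append(ET.tostring(c, encoding="unicode"))
    # on output, ElementTree escapes "<" and ">" in text as "&lt;" and "&gt;"
    ET.SubElement(out, new_tag).text = "".join(fragment)
    return out

def unwrap_non_keys(record, new_tag="information"):
    """Inverse: parse the stored markup text back into elements."""
    out = ET.Element(record.tag, record.attrib)
    for child in record:
        if child.tag == new_tag:
            holder = ET.fromstring("<holder>" + (child.text or "") + "</holder>")
            out.extend(list(holder))
        else:
            out.append(child)
    return out
```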

Furthermore, according to the structured document conversion method as a related technique of the present invention, the elements of the structured document to be converted are divided into key elements and non-key elements. The document is converted into a structured document in which the key elements are described as they are. The characters or character strings forming the non-key elements are combined into one tag and described as a character code string (compressed character string), obtained by compressing them with the data compression method below. This gives the same effects and advantages as the structured document conversion method described above, and it also greatly reduces the data volume of the converted structured document.

To compress the characters or character strings forming a non-key element, variable-length encoding is performed first. The resulting binary data is then packed 6 bits at a time into 1-byte units of converted data, and the 6-bit value in each unit is converted into a character code according to the ASCII code. This yields compressed data (a compressed character string) expressed in character codes, so the compressed data can be placed in the structured document as an element or an attribute value.
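The packing step can be sketched as follows. The 64-character table (letters, digits, "-" and "_") is hypothetical, and so is the toy prefix code that stands in for the variable-length encoder; the text fixes neither. The last 6-bit group is zero-padded, so this sketch assumes the original bit length is kept alongside the packed string for decoding.

```python
# Hypothetical 64-symbol table: ASCII letters, digits, "-" and "_".
# It deliberately omits the XML markup characters < > & " '.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

# Toy prefix code standing in for the variable-length encoder (assumption).
TOY_CODE = {"A": "0", "B": "10", ",": "110", "1": "1110", "2": "1111"}

def vl_encode(text):
    """Variable-length encode text into a '0'/'1' bit string."""
    return "".join(TOY_CODE[ch] for ch in text)

def pack6(bits):
    """Pack bits 6 at a time; each 6-bit value becomes one ASCII character."""
    padded = bits + "0" * (-len(bits) % 6)   # zero-pad the last group
    return "".join(ALPHABET[int(padded[i:i + 6], 2)]
                   for i in range(0, len(padded), 6))

def unpack6(text, nbits):
    """Inverse of pack6; nbits is the original bit length (padding is dropped)."""
    return "".join(format(INDEX[ch], "06b") for ch in text)[:nbits]
```

Here `vl_encode("AB,12")` yields 14 bits. These pack into the three characters `W7w`, none of which is a markup symbol.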

The data compression method described above, as a related technique of the present invention, compresses a structured document efficiently. It also produces the result as character codes that can be placed within the structured document. This greatly reduces the resources needed to operate on structured documents, lowering memory usage and raising processing speed.

Here, the character codes expressing the compressed data are ASCII codes, excluding the symbols related to tagging (for example, <, >, &, ", and ' in an XML document). The compressed character strings in the converted structured document therefore contain no tagging-related symbols, which reliably prevents erroneous processing during data processing and similar operations.
Moreover, ASCII is a character code set that many character code systems include in common. Even if the converted structured document undergoes character code system conversion, the bit string of a compressed character string written in ASCII is unaffected and remains in its original state. A compressed character string in a structured document whose character code system has been converted is therefore correctly restored to the original non-key elements.

Furthermore, information indicating the character code system in use at the time of compression can be attached to the compressed character string. This makes it possible to recognize the character code system of the data restored from that string. Matching it to the structured document's current character code system keeps the character code system of the entire structured document consistent.
In addition, before a non-key element is converted into a compressed character string, the character strings forming it can be replaced with dictionary numbers using a static dictionary prepared in advance. This shortens the string to be variable-length encoded, which further raises compression efficiency and further reduces the data volume of the converted structured document.
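A hypothetical sketch of the static-dictionary step. Each dictionary entry is replaced by its index using greedy longest match, and all other characters are left as-is. The dictionary contents and the token representation (integers for dictionary numbers, one-character strings otherwise) are assumptions. The resulting tokens would then be fed to the variable-length encoder.

```python
def dict_encode(text, dictionary):
    """Replace dictionary entries by their index (greedy, longest entry first)."""
    order = sorted(range(len(dictionary)), key=lambda k: -len(dictionary[k]))
    tokens, i = [], 0
    while i < len(text):
        for k in order:
            if text.startswith(dictionary[k], i):
                tokens.append(k)              # dictionary number
                i += len(dictionary[k])
                break
        else:
            tokens.append(text[i])            # literal character
            i += 1
    return tokens

def dict_decode(tokens, dictionary):
    """Inverse: expand dictionary numbers back into their strings."""
    return "".join(dictionary[t] if isinstance(t, int) else t for t in tokens)
```

With the dictionary `["Dept. ", "City "]`, the 14-character string `"Dept. A,City A"` becomes five tokens, `[0, "A", ",", 1, "A"]`.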

Embodiments of the present invention are described below with reference to the drawings.
When DOM is adopted as the standard API and a structured document is expanded in memory as a DOM tree, the more elements the document contains, the longer the expansion generally takes, and the longer tag searches take.
A structured document usually contains key elements, which are subject to data processing on that document, and non-key elements, which are not. The elements of a structured document can therefore be divided into these two groups. When application software (an application) performs data processing on a structured document, only the key elements are processed: key elements are searched for by tag name, and the contents of the key elements found are referenced.

Therefore, the present invention (first to third embodiments) converts the structured document record by record. Within each record, the key elements are described as they are without any conversion, while the non-key elements are combined and described in a single tag. The following embodiments describe the case where the structured document is an XML document.
[1] Description of First Embodiment
For simplicity, the first embodiment of the present invention first describes how to convert an XML document in which the elements of each record form a single level. It then describes how to convert an XML document containing records whose elements form two or more levels, and records containing elements that have attributes.

[1-1] Principle of the Structured Document Conversion Method of the First Embodiment
The principle of the structured document conversion method according to the first embodiment of the present invention is described here with reference to FIGS. 1(A), 1(B), and 3(A).
The XML document to be converted, shown in FIG. 3(A), has two records (tag name "person"). One record has one element each with the tag names "name", "company", "department", "address", and "telephone". The other record has one element each with the tag names "name", "company", and "department", and two elements with the tag name "telephone". Because the two records differ in the kinds and numbers of their elements, the XML document of FIG. 3(A) is not in tabular form. FIG. 1(A) shows this document's in-memory expansion, that is, an example in which it is expanded in memory as a DOM tree.

In an XML document with the elements described above, the elements with the tag names "name" and "company" are taken as key elements, and those with the tag names "department", "address", and "telephone" as non-key elements. FIG. 1(B) shows the in-memory expansion of the converted XML document obtained by applying the first embodiment's structured document conversion method to this document. This is the in-memory form used when application software operates on the converted XML document through the standard API (DOM).

The converted XML document shown in FIG. 1(B) corresponds to the XML documents described later with reference to FIGS. 3(B) to 3(D). FIG. 1(B) shows an example in which those documents are expanded in memory as a DOM tree. In the XML document of FIG. 1(B), a new element with the tag name "information" is created. The contents of the non-key elements with the tag names "department", "address", and "telephone" are described together as the content of this new element.

That is, in one record "Dept. A,City A,123" is described as the content of the "information" element, and in the other record "Dept. B,456,789" is described as its content. The key elements with the tag names "name" and "company" are described as they were.
Converting the XML document so that the non-key elements are combined into one element greatly reduces the number of elements in the document, that is, the number of child elements of the tree expanded in memory. The non-key elements can also be handled collectively during expansion and data processing.
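The element-count reduction can be checked with Python's `xml.dom.minidom`. The two documents below are illustrative reconstructions of the FIG. 3(A)/3(B) example, not the figures themselves. They use English tag names, a hypothetical root element `list`, and placeholder names.

```python
from xml.dom import minidom

# Illustrative reconstruction of FIG. 3(A); root and person names are placeholders.
ORIGINAL = """<list>
<person><name>P1</name><company>X</company><department>Dept. A</department><address>City A</address><telephone>123</telephone></person>
<person><name>P2</name><company>Y</company><department>Dept. B</department><telephone>456</telephone><telephone>789</telephone></person>
</list>"""

# The same data after folding the non-key elements into one element per record.
CONVERTED = """<list>
<person><name>P1</name><company>X</company><information tags="department,address,telephone">Dept. A,City A,123</information></person>
<person><name>P2</name><company>Y</company><information tags="department,telephone,telephone">Dept. B,456,789</information></person>
</list>"""

def count_elements(xml_text):
    """Number of element nodes in the DOM tree, root included."""
    dom = minidom.parseString(xml_text)
    return len(dom.getElementsByTagName("*"))
```

The original document expands to 13 element nodes and the converted one to 9.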

[1-2] System of the First Embodiment and Flow of Conversion/Inverse Conversion Processing
FIG. 2 illustrates a system to which the structured document conversion method of the first embodiment of the present invention is applied, together with the flow of conversion/inverse conversion processing in that system.
Creating a separate style sheet (XSL (Extensible Stylesheet Language) sheet) for each of many different kinds of XML documents is extremely tedious and time-consuming.

To save that effort, the first embodiment writes the specification for converting an XML document's data structure (record name, key tag names, non-key tag names, and so on) as an XML document (conversion specification document) and supplies a conversion execution procedure, as described later with reference to FIGS. 9, 12 to 15, and 17. Conversion/inverse conversion of the XML document is then performed on the basis of that conversion specification document, as described later with reference to FIGS. 18 and 19.
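A conversion specification document could, for instance, take the shape of the hypothetical XML below and be read into a plain dictionary with `xml.etree.ElementTree`. The actual schema of FIG. 9 is not reproduced here, so every element and attribute name in this sketch is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversion specification document (element names are assumptions).
SPEC = """<conversionSpec>
  <record>person</record>
  <newElement>information</newElement>
  <key>name</key>
  <key>company</key>
  <nonKey short="A">department</nonKey>
  <nonKey short="B">address</nonKey>
  <nonKey short="C">telephone</nonKey>
</conversionSpec>"""

def load_spec(text):
    """Read the specification into a dict usable by a converter."""
    root = ET.fromstring(text)
    non_keys = root.findall("nonKey")
    return {
        "record": root.findtext("record"),
        "new_element": root.findtext("newElement"),
        "keys": [e.text for e in root.findall("key")],
        "non_keys": [e.text for e in non_keys],
        # optional shortened tag names, keyed by the full tag name
        "short_names": {e.text: e.get("short") for e in non_keys if e.get("short")},
    }
```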

Furthermore, as described later with reference to FIGS. 20(A) to 20(D), the first embodiment automatically generates two style sheets from the given conversion specification document. The conversion style sheet specifies the conversion execution procedure, and the inverse conversion style sheet specifies the inverse conversion execution procedure. These style sheets are used to have a structured document conversion processor (an XSLT (XSL Transformations) processor) perform the data structure conversion/inverse conversion on the XML document. Because the procedures are supplied as style sheets, a standard XSLT processor can perform the conversion/inverse conversion. The first embodiment's conversion/inverse conversion can therefore be executed on almost any kind of XML document system.
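As a sketch of generating a conversion style sheet from such a specification, the function below emits an XSLT 1.0 style sheet made of an identity template plus one record template. The template layout and the function and parameter names are assumptions and do not reproduce FIG. 10. The generated sheet should be applicable with a standard XSLT 1.0 processor. No inverse sheet is shown.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

def make_collapse_xslt(record, keys, non_keys, new_tag="information", sep=","):
    """Generate an XSLT 1.0 style sheet that folds non-key elements into new_tag."""
    key_sel = "|".join(keys)      # e.g. "name|company"
    nk_sel = "|".join(non_keys)   # e.g. "department|address|telephone"
    return f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">
  <!-- identity template: copy anything not matched more specifically -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- per record: keep key elements, fold non-key elements into one element -->
  <xsl:template match="{record}">
    <xsl:copy>
      <xsl:apply-templates select="@*|{key_sel}"/>
      <{new_tag}>
        <xsl:attribute name="tags">
          <xsl:for-each select="{nk_sel}">
            <xsl:if test="position() &gt; 1">{sep}</xsl:if>
            <xsl:value-of select="name()"/>
          </xsl:for-each>
        </xsl:attribute>
        <xsl:for-each select="{nk_sel}">
          <xsl:if test="position() &gt; 1">{sep}</xsl:if>
          <xsl:value-of select="."/>
        </xsl:for-each>
      </{new_tag}>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

if __name__ == "__main__":
    xslt = make_collapse_xslt("person", ["name", "company"],
                              ["department", "address", "telephone"])
    ET.fromstring(xslt)  # raises ParseError if the generated sheet is malformed
    print(xslt)
```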

The system shown in FIG. 2 includes a data structure conversion/inverse conversion mechanism 10, made up of an XSLT conversion unit 11, an XSLT structure conversion unit 12, and an XSLT inverse conversion unit 13. It also includes a standard API 20 and application software 30. In practice, the XSLT conversion unit 11, the XSLT structure conversion unit 12, and the XSLT inverse conversion unit 13 (the data structure conversion/inverse conversion mechanism 10) are realized by a single standard XSLT processor (structured document conversion processor).

The XSLT conversion unit 11 reads the data structure conversion specification (see, for example, FIG. 9). This specification is given as an XML document and records, among other things, the distinction information between key elements and non-key elements. From that XML document and an automatic conversion style sheet, the unit generates a structure conversion style sheet (see, for example, FIG. 10) and an inverse conversion style sheet (see, for example, FIG. 11).
The XSLT structure conversion unit 12 reads the XML document to be converted (input XML document). Based on the structure conversion style sheet generated by the XSLT conversion unit 11, it applies a data structure conversion that combines the non-key elements of each record into one element.

The standard API 20 and the application software (application) 30 are both executed by a processor and apply predetermined data processing to the converted XML document output by the XSLT structure conversion unit 12. The processor may be the XSLT processor that realizes the data structure conversion/inverse conversion mechanism 10, or a separate processor.

The XSLT inverse conversion unit 13 reads the XML document processed by the application software 30 (the extracted XML document, which is a converted XML document). Based on the inverse conversion style sheet generated by the XSLT conversion unit 11, it performs the inverse conversion that restores the extracted XML document to the original format, in which the non-key elements are returned to their original state. It then outputs the restored result as the final extraction result.

In the system configured as described above, the data structure conversion/inverse conversion mechanism (XSLT processor) 10 reads the conversion specification document, written as an XML document, together with the input XML document to be processed. It converts the input XML document according to the conversion specification (in practice, the structure conversion style sheet) and outputs an XML document to which the prescribed data structure conversion has been applied. The application software then performs data processing (for example, tag search) on the converted XML document through the standard API 20, yielding a processed XML document. When the data processing is a tag search, the search result takes the form of an extracted XML document. The mechanism 10 reads this extracted XML document and converts it back into an XML document with the original data structure according to the conversion specification (in practice, the inverse conversion style sheet). The result is the XML document that represents the final data processing result.

The data structure conversion specification XML document read by the XSLT conversion unit 11 in the first embodiment is described later with reference to FIGS. 9, 12 to 15, and 17. The structure conversion style sheet and the inverse conversion style sheet generated by the XSLT conversion unit 11 are described later with reference to FIGS. 10 and 11, respectively.
[1-3] Conversion Method for Non-Tabular XML Documents and Specific Conversion Examples in the First Embodiment
When the first embodiment's conversion method is applied to an XML document that is not in tabular form (non-tabular XML document), two strings are created: a tag name character string containing the tag names of the non-key elements, and a content character string containing their contents. These strings are described in the newly created element as its element content, tag name, or attribute values.

The tag name character string is created by joining the tag names of the non-key elements with a delimiter. The content character string is likewise created by joining their contents with a delimiter. The first embodiment uses a comma "," as the delimiter.
The tag names and contents are joined in CSV (Comma Separated Values) format. Strictly speaking, CSV joins numbers and character strings with commas and so restricts the delimiter to the comma. In the present invention, however, the delimiter need not be a comma.

If a comma is the delimiter and an element's content is a monetary amount, the delimiter may be confused with a comma used as a thousands separator, so "@" (at sign) or "_" (underscore) is preferable in that case. In addition, a string being joined may itself contain the delimiter character. That character may then be replaced with an entity reference; for example, with the comma as delimiter, a comma inside a string is replaced with the entity reference "&CMM;". Where possible, therefore, the delimiter should be a character that rarely appears in ordinary text. In this embodiment, for convenience, any method of joining numbers and character strings with a delimiter, not only with a comma, is called CSV.
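The entity-reference escaping of a delimiter inside a value might be sketched as below. The sketch treats "&CMM;" as a plain marker string; when written to XML, a serializer will store it as "&amp;CMM;". It also assumes no value already contains the literal text "&CMM;". The function names are hypothetical.

```python
SEP = ","
SEP_ENTITY = "&CMM;"  # entity-reference form of a literal comma, as in the text

def join_values(values, sep=SEP, entity=SEP_ENTITY):
    """Join values with sep, escaping any sep that occurs inside a value."""
    return sep.join(v.replace(sep, entity) for v in values)

def split_values(text, sep=SEP, entity=SEP_ENTITY):
    """Split on sep, then turn the entity form back into the delimiter.

    Assumes no original value contained the literal text of `entity`.
    """
    return [v.replace(entity, sep) for v in text.split(sep)]
```

For example, the amount "1,000" followed by "Dept. A" is joined as `1&CMM;000,Dept. A` and splits back into the two original values.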

FIGS. 3(B) to 3(F) show first to fifth specific examples of conversion results. Each is obtained by applying the first embodiment's structured document conversion method to the non-tabular XML document described above with reference to FIG. 3(A). Here again, the elements with the tag names "name" and "company" are key elements, and those with the tag names "department", "address", and "telephone" are non-key elements.
The first embodiment's conversion method basically divides the elements of the XML document to be converted into key elements, which are subject to data processing on that document, and non-key elements, which are not. A new element is created, and tag name conversion and content conversion are applied to the non-key elements. The key elements are described in the converted XML document as they are, without any conversion.

In the first specific example, shown in FIG. 3(B), a new element with the tag name "information" and the attribute name "tags" is created. Tag name conversion builds a tag name character string for the non-key elements in CSV format, which is described in the new element as the value of the attribute "tags". Content conversion builds a content character string for the non-key elements in CSV format, which is described as the content of the new element.

That is, in the first record of the converted XML document in FIG. 3(B), the "information" element has the content character string "Dept. A,City A,123" as its element content. Its attribute "tags" holds the tag name character string "department,address,telephone". In the second record, the "information" element has the content character string "Dept. B,456,789" as its element content, and its attribute "tags" holds "department,telephone,telephone".
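A minimal Python sketch of this first example and its inverse, using `xml.etree.ElementTree`. The key-tag set, the element name `information`, and the comma delimiter follow the example; the function names are hypothetical. For simplicity, the sketch appends the new element after the key elements. Restoration therefore places the non-key elements after the key elements, not in their original interleaved positions.

```python
import xml.etree.ElementTree as ET

KEY_TAGS = {"name", "company"}  # key elements of the FIG. 3 example
SEP = ","                       # CSV-style delimiter

def collapse_record(record, new_tag="information"):
    """Keep key elements as-is; fold all non-key elements into one new element."""
    out = ET.Element(record.tag, record.attrib)
    names, values = [], []
    for child in record:
        if child.tag in KEY_TAGS:
            out.append(child)                 # key element copied unchanged
        else:
            names.append(child.tag)           # tag name conversion
            values.append(child.text or "")   # content conversion
    if names:
        info = ET.SubElement(out, new_tag, {"tags": SEP.join(names)})
        info.text = SEP.join(values)
    return out

def expand_record(record, new_tag="information"):
    """Inverse: rebuild the non-key elements from the tags attribute and content."""
    out = ET.Element(record.tag, record.attrib)
    for child in record:
        if child.tag != new_tag:
            out.append(child)
            continue
        names = child.get("tags", "").split(SEP)
        values = (child.text or "").split(SEP)
        for name, value in zip(names, values):
            ET.SubElement(out, name).text = value
    return out
```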

Here, as described later with reference to FIGS. 12 and 14, the conversion specification document may associate each non-key element's tag name with a shortened tag name that is shorter than, and uniquely identifies, that tag name. During tag name conversion, a tag name shortening conversion may then replace the non-key elements' tag names with the shortened tag names according to the conversion specification document. When such a shortened XML document is restored to its original state (inverse conversion), a tag name expansion conversion replaces the shortened tag names with the non-key elements' tag names, again according to the conversion specification document.

The second specific example, shown in FIG. 3(C), is the XML document obtained by additionally applying this tag name shortening conversion to the XML document of FIG. 3(B). The conversion specification document associates the tag names "department", "address", and "telephone" with the shortened tag names "A", "B", and "C", respectively (see FIGS. 12 and 14). As a result, the tag name character string held in the attribute "tags" becomes "A,B,C" in the first record and "A,C,C" in the second record.
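Tag name shortening and expansion reduce to a two-way lookup. The mapping below mirrors the FIG. 3(C) example. Holding it in a Python dict, rather than reading it from the conversion specification document, is a simplification, and the function names are hypothetical.

```python
# Mapping as in the FIG. 3(C) example; a real converter would read it from the spec.
SHORT = {"department": "A", "address": "B", "telephone": "C"}
LONG = {short: full for full, short in SHORT.items()}

def shorten_tags(tags, sep=","):
    """Tag name shortening conversion; unmapped names pass through unchanged."""
    return sep.join(SHORT.get(t, t) for t in tags.split(sep))

def expand_tags(tags, sep=","):
    """Tag name expansion conversion (inverse of shorten_tags)."""
    return sep.join(LONG.get(t, t) for t in tags.split(sep))
```

Passing unmapped names through unchanged is only safe if no full tag name coincides with a shortened one.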

The third specific example is shown in FIG. 3(D). Here a new element is created with the tag name “information” and two attributes: a first attribute “tags” and a second attribute “contents”. Tag name conversion builds a CSV-format tag name string for the non-key elements, which is written as the value of the first attribute “tags”. Content conversion builds a CSV-format content string for the non-key elements, which is written as the value of the second attribute “contents”. In this case the new element is written as an empty-element tag.

That is, in the first record of the converted XML document shown in FIG. 3(D), the “information” element carries two values. The content string “Dept. A,A City,123” is the value of the second attribute “contents”, and the tag name string “department,address,phone” is the value of the first attribute “tags”. In the second record, the “information” element carries the content string “Dept. B,456,789” in “contents” and the tag name string “department,phone,phone” in “tags”. As in the second specific example of FIG. 3(C), the tag name shortening conversion described above may also be applied to the tag name string stored in the first attribute.
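
The following Python sketch uses the standard `xml.etree.ElementTree` module to show, for a single record, how non-key elements are grouped into the layouts of FIGS. 3(B) and 3(D). The following are all hypothetical:

* the function `convert_record`;
* its `style` parameter;
* the sample values “Sato” and “X Corp”.

The sketch joins values with a plain comma and does no quoting, so it assumes that no content contains the delimiter. It places the new element after the key elements. The FIG. 3(E) layout is left out because an ASCII comma is not a legal character in an XML element name.

```python
import xml.etree.ElementTree as ET


def convert_record(record: ET.Element, key_tags: set[str],
                   merged_tag: str = "information", style: str = "B") -> ET.Element:
    """Group the non-key children of one record into a single new element."""
    out = ET.Element(record.tag, record.attrib)
    names, contents = [], []
    for child in record:
        if child.tag in key_tags:
            out.append(child)                    # key elements pass through unchanged
        else:
            names.append(child.tag)              # tag name conversion
            contents.append(child.text or "")    # content conversion
    tag_csv, content_csv = ",".join(names), ",".join(contents)
    if style == "B":    # FIG. 3(B): tag names in "tags", contents as element text
        ET.SubElement(out, merged_tag, {"tags": tag_csv}).text = content_csv
    elif style == "D":  # FIG. 3(D): both strings as attributes, empty-element tag
        ET.SubElement(out, merged_tag, {"tags": tag_csv, "contents": content_csv})
    else:
        raise ValueError(f"unsupported style: {style!r}")
    return out


record = ET.fromstring(
    "<person><name>Sato</name><company>X Corp</company>"
    "<department>Dept. B</department><phone>456</phone><phone>789</phone></person>")
print(ET.tostring(convert_record(record, {"name", "company"}), encoding="unicode"))
# <person><name>Sato</name><company>X Corp</company><information tags="department,phone,phone">Dept. B,456,789</information></person>
```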

The fourth specific example is shown in FIG. 3(E). Tag name conversion builds a CSV-format tag name string for the non-key elements, and a new element is created whose tag name is that string itself. Content conversion then builds a CSV-format content string for the non-key elements, which is written as the content of the new element.
That is, in the first record of the converted XML document shown in FIG. 3(E), the element whose tag name is “department,address,phone” contains the content string “Dept. A,A City,123”. In the second record, the element whose tag name is “department,phone,phone” contains the content string “Dept. B,456,789”.

The fifth specific example, shown in FIG. 3(F), is the XML document obtained by also applying the tag name shortening conversion described above to the XML document of FIG. 3(E). The conversion specification document associates the tag names “department”, “address”, and “phone” with the shortened tag names “A”, “B”, and “C”, respectively (see FIGS. 12 and 14). As a result, the tag name string used as the new element's tag name becomes “A,B,C” in the first record and “A,C,C” in the second record.

FIG. 3(B) places the CSV tag name string in the new element's start tag as an attribute value. FIG. 3(E) instead uses the CSV tag name string as the new element's tag name. The FIG. 3(B) method reduces the data volume by the amount by which the end tag becomes shorter, because the end tag carries only the new element's tag name rather than the whole tag name string. In exchange, the FIG. 3(B) method needs one additional attribute to hold the CSV tag name string. The data volume of the XML documents of FIGS. 3(B) and 3(E) can be reduced further by applying the tag name shortening conversion described above, as shown in FIGS. 3(C) and 3(F), respectively.
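
The trade-off can be checked by counting characters. The sketch below uses hypothetical function names and counts characters, not encoded bytes. It uses three lengths:

* N is the length of the new tag name;
* X is the length of the tag name string;
* C is the length of the content string.

The FIG. 3(B) layout costs `13 + 2N + X + C` characters, and the FIG. 3(E) layout costs `5 + 2X + C`. The attribute form is therefore shorter only when X exceeds 2N + 8.

```python
def attribute_form(new_tag: str, tag_csv: str, content_csv: str) -> str:
    """FIG. 3(B) layout: the tag name string is stored in a "tags" attribute."""
    return f'<{new_tag} tags="{tag_csv}">{content_csv}</{new_tag}>'


def tag_name_form(tag_csv: str, content_csv: str) -> str:
    """FIG. 3(E) layout: the tag name string is used as the element name."""
    return f"<{tag_csv}>{content_csv}</{tag_csv}>"


for tags in ("department,address,phone", "A,B,C"):
    a = len(attribute_form("information", tags, "Dept. A,A City,123"))
    e = len(tag_name_form(tags, "Dept. A,A City,123"))
    print(f"{tags!r}: attribute form {a} chars, tag-name form {e} chars")
```

These English names give N = 11, so the break-even point is 30 characters. Neither the full tag name string (24 characters) nor its shortened form reaches that, so the tag-name form comes out shorter in both cases here. The balance shifts toward the attribute form as tag name strings grow longer.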

Thus, the conversion method of the first embodiment gathers several non-key elements into a single element. While the application software is processing data, the non-key elements can then be handled together as one element unrelated to that processing. The tag name string is built by joining the non-key elements' tag names in CSV format. Whether it is written as the new element's tag name or as an attribute value of the new element can be selected and specified through the conversion specification document or the like. Likewise, the content string is built by joining the non-key elements' contents in CSV format. Whether it is written as an attribute value or as the content of the new element can also be selected and specified through the conversion specification document or the like. The choice among the methods described with reference to FIGS. 3(B) to 3(F) depends on the data volume of the XML document, or on how many new elements the data processing adds. However, the essence of the present invention is handling the non-key elements as a group, so any of these methods may be adopted.

[1-4] Conversion method for tabular XML documents in the first embodiment, and specific conversion examples
When the conversion method of the first embodiment is applied to a tabular XML document, it creates a content string containing the contents of the non-key elements. This content string is written in a newly created element, either as element content or as an attribute value. Every record of a tabular XML document describes its elements in the same regular pattern. The tag name conversion required for non-tabular XML documents (or the attribute name conversion described later) can therefore be omitted.

In that case, however, the conversion specification document records additional information, as described later with reference to FIG. 9. It records which elements are key elements and which are non-key elements. It also associates the tag names of the non-key elements with a representative tag name that stands for them, namely the tag name of the new element. Where the non-key elements have attributes, their attribute names are included as well (see item [1-5]). Based on such a conversion specification document, the data structure conversion applies a tabular conversion to the target XML document. The tabular conversion skips the tag name conversion described above and performs only the content conversion. Conversely, during reverse conversion, the non-key tag names and attribute names are recovered from the representative tag name (the new element's tag name) using the conversion specification document. A tabular reverse conversion is then applied to the tabular-converted XML document (the XML document after data processing), restoring the non-key elements to their original description.

Specific conversion results for a tabular XML document are described here with reference to FIGS. 4(A) to 4(C).
The XML document to be converted, shown in FIG. 4(A), has two records with the tag name “person”. Each record contains exactly one element with each of the tag names “name”, “company”, “department”, “address”, and “phone”. Because the two records have the same kinds and numbers of elements, the XML document of FIG. 4(A) is tabular.

FIGS. 4(B) and 4(C) show the first and second specific examples of conversion results. These are obtained by applying the structured document conversion method of the first embodiment to the tabular XML document of FIG. 4(A). Here again, the elements with the tag names “name” and “company” are key elements, and those with the tag names “department”, “address”, and “phone” are non-key elements.
The conversion method of the first embodiment is applied to a tabular XML document as follows:

1. As described above, the conversion specification document associates the representative tag name (the new element's tag name) “information” with the non-key tag names “department”, “address”, and “phone”.
2. The elements of the target XML document are divided into key elements, which are subject to data processing, and non-key elements, which are not.
3. A new element is created.
4. Content conversion is applied to the non-key elements, while the key elements are written unchanged into the converted XML document.

In the first specific example, shown in FIG. 4(B), a new element with the representative tag name “information” is created. Content conversion builds a CSV-format content string for the non-key elements, which is written as the content of the new element.
That is, in the first record of the converted XML document of FIG. 4(B), the “information” element contains the content string “Dept. A,A City,123” as its element content. In the second record, it contains the content string “Dept. B,B City,456”. The XML document of FIG. 4(B) was obtained by converting the XML document of FIG. 4(A) according to the conversion specification document described later with reference to FIG. 9.

In the second specific example, shown in FIG. 4(C), a new element with the tag name “information” and the attribute “contents” is created. Content conversion builds a CSV-format content string for the non-key elements, which is written as the value of the new element's “contents” attribute. In this case the new element is written as an empty-element tag.

That is, in the first record of the converted XML document of FIG. 4(C), the “information” element carries the content string “Dept. A,A City,123” as the value of its “contents” attribute. In the second record, it carries “Dept. B,B City,456”.
When the XML document to be converted is tabular, the tag names (and attribute names, if any) can easily be determined during the reverse conversion that restores the original document. Tag name conversion and attribute name conversion can therefore be omitted (attribute name conversion is described later with reference to FIGS. 5 to 8). The converted version of a tabular XML document needs to contain only the content strings of the non-key elements, and descriptions of tag names and attribute names can be left out.
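
In a tabular document, the specification alone is enough to recover the tag names, so the converted record needs only the content string. The Python sketch below uses only the standard library. It assumes the FIG. 9 specification has been loaded into a dictionary `SPEC` that maps the representative tag name to the ordered non-key tag names. That structure, the function names, and the sample key values are hypothetical.

The sketch implements the FIG. 4(B) layout and its reverse. It also assumes that every record contains each non-key element exactly once and that no value contains a comma.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of the FIG. 9 specification:
# representative tag name -> non-key tag names, in specification order.
SPEC = {"information": ["department", "address", "phone"]}


def to_tabular(record: ET.Element, rep_tag: str = "information") -> ET.Element:
    """Tabular conversion: copy key elements, keep only the non-key contents."""
    nonkey = SPEC[rep_tag]
    out = ET.Element(record.tag, record.attrib)
    values = {}
    for child in record:
        if child.tag in nonkey:
            values[child.tag] = child.text or ""
        else:
            out.append(child)                 # key element, unchanged
    # Content order follows the specification, not the input document.
    ET.SubElement(out, rep_tag).text = ",".join(values[t] for t in nonkey)
    return out


def from_tabular(record: ET.Element, rep_tag: str = "information") -> ET.Element:
    """Tabular reverse conversion: tag names come from the specification."""
    nonkey = SPEC[rep_tag]
    out = ET.Element(record.tag, record.attrib)
    for child in record:
        if child.tag != rep_tag:
            out.append(child)
            continue
        values = (child.text or "").split(",")
        if len(values) != len(nonkey):
            raise ValueError(f"expected {len(nonkey)} values, got {len(values)}")
        for name, value in zip(nonkey, values):
            ET.SubElement(out, name).text = value
    return out


src = ("<person><name>Sato</name><company>X Corp</company>"
       "<department>Dept. A</department><address>A City</address>"
       "<phone>123</phone></person>")
converted = to_tabular(ET.fromstring(src))
print(ET.tostring(converted, encoding="unicode"))
# <person><name>Sato</name><company>X Corp</company><information>Dept. A,A City,123</information></person>
```

In this sketch the converted record stores no tag names at all, so any reordering or omission of values would go undetected. The length check in `from_tabular` is therefore the only safeguard it offers.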

[1-5] Conversion method for XML documents having a hierarchical structure and attributes in the first embodiment, and specific conversion examples
So far, the non-key elements in each record have been assumed to form a single level and to have no attributes. By extending the principle described above, the conversion method of the first embodiment also applies when the non-key elements span multiple levels (a deeper hierarchy) or have attributes.

Suppose the non-key elements span multiple levels. The conversion method of the first embodiment then adds hierarchy identification information to the tag names of those non-key elements in the tag name string produced by tag name conversion. This information is a symbol or character string (see FIGS. 6 to 8) indicating that the elements form multiple levels.
Suppose instead that a non-key element has an attribute. The conversion method of the first embodiment then adds attribute-name identification information to the attribute name, indicating that the string is an attribute name. This information is a symbol, for example “@” (see FIGS. 6 to 8). In the tag name string produced by tag name conversion, the marked attribute name is written after the tag name of the non-key element that carries it, separated by a delimiter (for example, a comma). In the content string produced by content conversion, the attribute value is written after the content of that non-key element, separated by a delimiter (for example, a comma).

As a result, each attribute value appears in the content string at the position that corresponds to its attribute name's position in the tag name string. In other words, the method builds two CSV-format strings that keep a one-to-one correspondence with each other. The tag name string holds the non-key elements' tag names and attribute names, and the content string holds their element contents and attribute values. Both strings are written into the XML document.
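
The following Python sketch shows the attribute encoding for flat non-key elements. Each attribute name, marked with “@”, follows its element's tag name, and its value sits at the matching position in the content string, so the reverse step can put each value back by position. The function names and the sample `workplace` record, including its company value, are hypothetical reconstructions modelled on the second record of FIG. 6. Nested elements are not handled here.

```python
import xml.etree.ElementTree as ET


def flatten_with_attributes(elements):
    """Return (tag name string, content string) for flat non-key elements.

    Each attribute name, prefixed with "@", follows its element's tag name;
    the attribute value sits at the same position in the content string.
    """
    tags, contents = [], []
    for elem in elements:
        tags.append(elem.tag)
        contents.append(elem.text or "")
        for name, value in elem.attrib.items():
            tags.append("@" + name)
            contents.append(value)
    return ",".join(tags), ",".join(contents)


def unflatten_with_attributes(tag_csv, content_csv):
    """Reverse step: rebuild the elements by matching positions."""
    tags, contents = tag_csv.split(","), content_csv.split(",")
    if len(tags) != len(contents):
        raise ValueError("tag and content strings must have equal length")
    elems = []
    for tag, value in zip(tags, contents):
        if tag.startswith("@"):
            elems[-1].set(tag[1:], value)       # attribute of the preceding element
        else:
            elem = ET.Element(tag)
            elem.text = value
            elems.append(elem)
    return elems


workplace = ET.fromstring(
    '<workplace><company>X Corp</company>'
    '<department duty="primary">Dept. B-1</department>'
    '<department duty="concurrent">Dept. B-2</department></workplace>')
encoded = flatten_with_attributes(workplace.findall("department"))
print(encoded)
# ('department,@duty,department,@duty', 'Dept. B-1,primary,Dept. B-2,concurrent')
```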

Consider a tabular XML document, in which every record has the same kinds and numbers of elements, whose non-key elements have attributes. For such a document, a conversion specification document is created that associates the non-key elements' tag names and attribute names with a representative tag name (the new element's tag name). The content string in the new element of the converted XML document then lists the element contents and attribute values. They appear in the same order as the corresponding tag names and attribute names in the conversion specification document.

Specific conversion results for an XML document having a hierarchical structure and attributes are described here with reference to FIGS. 5 to 8.
The XML document to be converted, shown in FIG. 5, has two records with the tag name “person”. Each record contains one element with each of the tag names “name”, “workplace”, “address”, and “contact”. The “workplace” element is hierarchical and contains elements with the tag names “company” and “department”. The “department” element has an attribute named “duty”. The first record has only one “department” element, whereas the second record has two. The “contact” element is also hierarchical and contains elements with the tag names “phone”, “Fax”, and “Email”.

FIGS. 6 to 8 show the first to third specific examples of conversion results. These are obtained by applying the structured document conversion method of the first embodiment to the XML document of FIG. 5. Here again, the elements with the tag names “name” and “company” are key elements, and all other elements are non-key elements. However, the “workplace” element is a hierarchy that contains the “company” element, so it is also handled as a key element.

In the first specific example, shown in FIG. 6, two new elements are created for each record. The first new element has the tag name “information1” and the attribute “tags”, and is created inside the “workplace” element. The second new element has the tag name “information2” and the attribute “tags”, and is created at the same level as the “name” and “workplace” elements.
In the first record, the “information1” element carries the tag name string “department,@duty” as the value of its “tags” attribute and the content string “Dept. A,primary” as its element content. The “information2” element of the first record carries the tag name string “address,0contact,1phone,1Fax,1Email” in “tags” and the content string “A City,123,321,(e-mail address)” as its element content.

Similarly, in the second record, the “information1” element carries the tag name string “department,@duty,department,@duty” in “tags” and the content string “Dept. B-1,primary,Dept. B-2,concurrent” as its element content. The “information2” element carries the tag name string “address,0contact,1phone,1Fax,1Email” in “tags” and the content string “B City,456,654,(e-mail address)” as its element content.

Here, the “@” prefixed to “duty” is attribute-name identification information, indicating that “duty” is an attribute name. The prefixes “0” and “1” are hierarchy identification information: “0” is added to “contact”, and “1” is added to “phone”, “Fax”, and “Email”. An element whose tag name carries “1” belongs to the level directly below the element whose tag name carries “0”, meaning that it is contained in that element's content.
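
The level-digit encoding of FIG. 6 can be sketched in Python as a recursive walk. The exact rule below is an assumption inferred from the single example in FIG. 6:

* A top-level leaf keeps its bare tag name.
* An element with children is written as “0” plus its tag name and contributes no entry to the content string. This explains why the tag name string in FIG. 6 has five entries but the content string only four.
* A descendant at depth d is written as d plus its tag name.

The function name, the sample name value, and the sample e-mail value are hypothetical. Using a single digit limits the sketch to depths below 10.

```python
import xml.etree.ElementTree as ET


def flatten_levels(elements):
    """Encode nested non-key elements with a depth digit before each tag name.

    Assumed rule (inferred from FIG. 6): a top-level leaf keeps its bare tag
    name; an element with children is written as "0<tag>" with no content
    entry; a descendant at depth d is written as "<d><tag>".
    """
    tags, contents = [], []

    def walk(elem, depth):
        if len(elem):                        # container: tag entry only
            tags.append(f"{depth}{elem.tag}")
            for child in elem:
                walk(child, depth + 1)
        else:                                # leaf: tag entry and content entry
            tags.append(f"{depth}{elem.tag}" if depth else elem.tag)
            contents.append(elem.text or "")

    for elem in elements:
        walk(elem, 0)
    return ",".join(tags), ",".join(contents)


record = ET.fromstring(
    "<person><name>Sato</name><address>A City</address>"
    "<contact><phone>123</phone><Fax>321</Fax>"
    "<Email>a@example.com</Email></contact></person>")
print(flatten_levels([record.find("address"), record.find("contact")]))
# ('address,0contact,1phone,1Fax,1Email', 'A City,123,321,a@example.com')
```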

The XML document of FIG. 6 was obtained by converting the XML document of FIG. 5 according to the conversion specification document described later with reference to FIG. 15. In that specification, the tabular-format information for both “information1” and “information2” is set to “nontable” (not tabular). In FIG. 6, the “address” and “contact” elements have the same kinds and numbers of elements in every record, so they could be handled as tabular. Setting the tabular-format information to “nontable” causes them to be handled as non-tabular instead.

In the second specific example, shown in FIG. 7, two new elements are again created for each record, as in FIG. 6. The first new element has the tag name “information1” and the attribute “tags”, and is created inside the “workplace” element. The second new element has the tag name “information2” and the attribute “tags”, and is created at the same level as the “name” and “workplace” elements.
As in FIG. 6, the “information1” element of the first record carries the tag name string “department,@duty” in “tags” and the content string “Dept. A,primary” as its element content. In the second specific example of FIG. 7, however, the “information2” element of the first record carries a different tag name string, “address,contact/phone,contact/Fax,contact/Email”, in “tags”. Its element content is the content string “A City,123,321,(e-mail address)”.

Similarly, in the second record, the “information1” element carries the tag name string “department,@duty,department,@duty” in “tags” and the content string “Dept. B-1,primary,Dept. B-2,concurrent” as its element content. The “information2” element carries the tag name string “address,contact/phone,contact/Fax,contact/Email” in “tags” and the content string “B City,456,654,(e-mail address)” as its element content.

Here, the string “contact/” prefixed to “phone”, “Fax”, and “Email” is hierarchy identification information. It indicates that an element whose tag name carries this prefix belongs to the level below the “contact” element, meaning that it is contained in that element's content. This notation for hierarchical position is the location-path notation of XPath.
The XML document of FIG. 7 was obtained by converting the XML document of FIG. 5 according to the conversion specification document described later with reference to FIG. 17. In that specification, the tabular-format information for both “information1” and “information2” is set to “nontable” (not tabular). As in FIG. 6, the “address” and “contact” elements could be handled as tabular, but setting the tabular-format information to “nontable” causes them to be handled as non-tabular.
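
The XPath-style encoding of FIG. 7 needs no entry for container elements, because each child's path carries its parent's name. The Python sketch below covers both directions. The function names and sample values are hypothetical. The reverse step makes a simplifying assumption, noted in its docstring: all leaves sharing a path prefix belong to one container.

```python
import xml.etree.ElementTree as ET


def flatten_paths(elements, prefix=""):
    """Encode nested non-key elements as slash-separated paths (FIG. 7 style).

    Containers get no entry of their own; each leaf is written as its path
    from the top-level non-key element, e.g. "contact/phone".
    """
    tags, contents = [], []
    for elem in elements:
        path = prefix + elem.tag
        if len(elem):                          # container: recurse into children
            sub_tags, sub_contents = flatten_paths(list(elem), path + "/")
            tags.extend(sub_tags)
            contents.extend(sub_contents)
        else:                                  # leaf: one tag entry, one content entry
            tags.append(path)
            contents.append(elem.text or "")
    return tags, contents


def rebuild_from_paths(tags, contents):
    """Reverse step: recreate containers from the path prefixes.

    Simplification: all leaves sharing a path prefix go into one container,
    so two sibling containers with the same name would be merged.
    """
    holder = ET.Element("holder")              # temporary parent for top-level nodes
    containers = {}                            # path prefix tuple -> container element
    for path, value in zip(tags, contents):
        *parents, leaf = path.split("/")
        parent = holder
        for i, name in enumerate(parents):
            key = tuple(parents[: i + 1])
            if key not in containers:
                containers[key] = ET.SubElement(parent, name)
            parent = containers[key]
        ET.SubElement(parent, leaf).text = value
    return list(holder)


record = ET.fromstring(
    "<person><name>Sato</name><address>A City</address>"
    "<contact><phone>123</phone><Fax>321</Fax>"
    "<Email>a@example.com</Email></contact></person>")
tags, contents = flatten_paths([record.find("address"), record.find("contact")])
print(",".join(tags))   # address,contact/phone,contact/Fax,contact/Email
restored = rebuild_from_paths(tags, contents)
```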

In the third specific example, shown in FIG. 8, two new elements are created for each record. The first new element has the tag name “information1” and the attribute “tags”, and is created inside the “workplace” element. The second new element has only the tag name “information2”, and is created at the same level as the “name” and “workplace” elements.
In the first record, the “information1” element carries the tag name string “department,@duty” in “tags” and the content string “Dept. A,primary” as its element content. Because the “address” and “contact” elements are handled as tabular, the “information2” element of the first record carries only the content string “A City,123,321,(e-mail address)” as its element content.

Similarly, in the second record, the “information1” element carries the tag name string “department,@duty,department,@duty” in “tags” and the content string “Dept. B-1,primary,Dept. B-2,concurrent” as its element content. The “information2” element carries the content string “B City,456,654,(e-mail address)” as its element content.

The XML document of FIG. 8 was obtained by converting the XML document of FIG. 5 according to the conversion specification document described later with reference to FIG. 15 or FIG. 17. In that specification, the tabular-format information for “information1” is set to “nontable” (not tabular), and that for “information2” is set to “table” (tabular).

In all of the XML documents of FIGS. 6 to 8, the key elements are, of course, written unchanged.

[1-6] Specific examples of the conversion specification document and style sheets in the first embodiment
[1-6-1] Conversion specification document and style sheets for tabular data
FIG. 9 shows a specific conversion specification document (an XML document) for the case where the tabular XML document of FIG. 4(A) is converted.

The conversion specification document of FIG. 9 records the root tag name “roster” and the record tag name “person”. It also records which elements are key elements and which are non-key elements. The key tag names “name” and “company” are listed as the content of the “key_tags” element. The non-key tag names “department”, “address”, and “phone” are listed as the content of the “nonkey_tags” element. The content of the “nonkey_tags” element also includes a “merged_tag” element. Its content, “information”, is the tag name of the new element that gathers the non-key elements together (the representative tag name). A conversion specification document of this kind specifies how the data structure conversion of an XML document is to be carried out.
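
The text names the `key_tags`, `nonkey_tags`, and `merged_tag` elements of FIG. 9 but not their exact markup. The Python sketch below therefore assumes a hypothetical layout:

* each tag name is a separate `tag` child;
* the root tag name sits in a `root_tag` element;
* the record tag name sits in a `record_tag` element;
* the whole specification is wrapped in a `conversion_spec` element.

Those element names, the `ConversionSpec` class, and `load_spec` are illustrations only, not the patent's format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical layout; only key_tags, nonkey_tags and merged_tag come from the text.
SAMPLE_SPEC = """<conversion_spec>
  <root_tag>roster</root_tag>
  <record_tag>person</record_tag>
  <key_tags><tag>name</tag><tag>company</tag></key_tags>
  <nonkey_tags>
    <tag>department</tag><tag>address</tag><tag>phone</tag>
    <merged_tag>information</merged_tag>
  </nonkey_tags>
</conversion_spec>"""


@dataclass
class ConversionSpec:
    root_tag: str
    record_tag: str
    key_tags: list[str]
    nonkey_tags: list[str]
    merged_tag: str          # representative tag name of the new element


def load_spec(xml_text: str) -> ConversionSpec:
    """Read the key/non-key split and the representative tag name."""
    doc = ET.fromstring(xml_text)
    nonkey = doc.find("nonkey_tags")
    return ConversionSpec(
        root_tag=doc.findtext("root_tag"),
        record_tag=doc.findtext("record_tag"),
        key_tags=[t.text for t in doc.find("key_tags").findall("tag")],
        nonkey_tags=[t.text for t in nonkey.findall("tag")],
        merged_tag=nonkey.findtext("merged_tag"),
    )


spec = load_spec(SAMPLE_SPEC)
print(spec.merged_tag, spec.nonkey_tags)
```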

The XSLT conversion unit 11 shown in FIG. 2 then reads the conversion specification document of FIG. 9. Using it together with an automatic conversion style sheet (automatic conversion XSL sheet; not shown), the unit generates two style sheets: the structure conversion style sheet (XSL sheet) shown in FIG. 10 and the inverse conversion style sheet (XSL sheet) shown in FIG. 11.

- The structure conversion style sheet of FIG. 10 is read by the XSLT structure conversion unit 12. It is used to apply the data structure conversion to the XML document to be converted (the input XML document).
- The inverse conversion style sheet of FIG. 11 is read by the XSLT inverse conversion unit 13. It is used to restore an XML document processed by the application software 30 (an extracted or converted XML document) to the original format, that is, to an XML document in which the non-key elements have been returned to their original state.
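To give a concrete idea of what a generated structure conversion style sheet for tabular data might contain, the sketch below builds the text of an XSLT 1.0 style sheet directly from spec values in Python. It does not go through an automatic conversion style sheet as the description does. The template layout is an assumption, not the content of FIG. 10, and it does not escape delimiters that occur inside element contents.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def build_table_stylesheet(root_tag, record_tag, key_tags, nonkey_tags, merged_tag):
    """Return XSLT 1.0 text for a tabular (format="table") structure conversion.

    Sketch only: the template layout is an assumption, not FIG. 10. Key elements
    are copied; non-key contents are joined with commas inside the merged element.
    """
    keys = " | ".join(key_tags)
    nonkeys = " | ".join(nonkey_tags)
    return f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">
  <xsl:template match="/{root_tag}">
    <{root_tag}>
      <xsl:for-each select="{record_tag}">
        <{record_tag}>
          <xsl:copy-of select="{keys}"/>
          <{merged_tag}>
            <xsl:for-each select="{nonkeys}">
              <xsl:if test="position() &gt; 1">,</xsl:if>
              <xsl:value-of select="."/>
            </xsl:for-each>
          </{merged_tag}>
        </{record_tag}>
      </xsl:for-each>
    </{root_tag}>
  </xsl:template>
</xsl:stylesheet>"""


if __name__ == "__main__":
    xsl = build_table_stylesheet("名簿", "個人", ["名前", "会社"], ["部署", "住所", "電話"], "情報")
    ET.fromstring(xsl)  # well-formedness check only; running XSLT needs a separate processor
    print(xsl)
```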

When the XML document to be converted is tabular data, as above, the conversion and inverse conversion style sheets associate the non-key tag names with the tag name of the new element (the representative tag name). The non-key tag names therefore do not appear in the converted XML document, which greatly reduces its data volume. In other words, the non-key tag names are basically unnecessary in the converted XML document if either of the following is available:

- both the conversion specification document and the automatic conversion style sheet, or
- the structure conversion and inverse conversion style sheets.

Even without such style sheets, a tabular XML document can be handled as non-tabular data. The original XML document can then be restored on the basis of the regularity of the element sequence.
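The point that tabular data needs no tag names in the converted document can be shown with a minimal Python sketch. The spec values are hypothetical and modeled on FIG. 9. Every record is assumed to contain every column, and contents are assumed to contain no comma, so delimiter escaping is left out.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical spec values modeled on the FIG. 9 example.
KEY_TAGS = ["名前", "会社"]
NONKEY_TAGS = ["部署", "住所", "電話"]  # fixed column order of the table
MERGED_TAG = "情報"
DELIM = ","

# Hypothetical sample record (every column present, as tabular data assumes).
SAMPLE_RECORD = ("<個人><名前>山田太郎</名前><会社>A社</会社><部署>営業部</部署>"
                 "<住所>東京都</住所><電話>03-0000-0000</電話></個人>")


def convert_record_table(record: ET.Element) -> ET.Element:
    """Tabular conversion: keep key elements, merge only the non-key contents."""
    out = ET.Element(record.tag, record.attrib)
    for tag in KEY_TAGS:
        out.append(copy.deepcopy(record.find(tag)))
    values = [record.findtext(tag, default="") for tag in NONKEY_TAGS]
    if any(DELIM in v for v in values):
        raise ValueError("delimiter escaping is not handled in this sketch")
    merged = ET.SubElement(out, MERGED_TAG)
    merged.text = DELIM.join(values)  # no tag-name attribute is written
    return out


def restore_record_table(record: ET.Element) -> ET.Element:
    """Inverse conversion: the tag names come from the spec, not from the document."""
    out = ET.Element(record.tag, record.attrib)
    for tag in KEY_TAGS:
        out.append(copy.deepcopy(record.find(tag)))
    values = (record.findtext(MERGED_TAG) or "").split(DELIM)
    for tag, value in zip(NONKEY_TAGS, values):
        ET.SubElement(out, tag).text = value
    return out


if __name__ == "__main__":
    converted = convert_record_table(ET.fromstring(SAMPLE_RECORD))
    print(ET.tostring(converted, encoding="unicode"))
```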

[1-6-2] Conversion Specification Document for Tag Name Abbreviation
FIG. 12 shows a specific conversion specification document (XML document) for performing tag name abbreviation in the first embodiment. Tag name abbreviation replaces the non-key tag names “部署”, “住所” and “電話” of the XML document to be converted with the abbreviated tag names “A”, “B” and “C”, respectively, in the converted XML document, as shown for example in FIG. 3(C). To this end, the conversion specification document of FIG. 12 describes the correspondence between the tag names “部署”, “住所”, “電話” and the abbreviated tag names “A”, “B”, “C”. Apart from this, it contains the same description as the conversion specification document of FIG. 9. Each abbreviated tag name is associated, through an “abbr” attribute, with the tag name of a non-key element listed under “nonkey_tags”.
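A minimal Python sketch of how the “abbr” correspondence might be read and applied to a comma-joined tag name string follows. The `<nonkey_tags>` fragment and the helper names are hypothetical. The sketch assumes each abbreviation is unique and does not collide with any unabbreviated tag name, since otherwise the expansion would be ambiguous.

```python
import xml.etree.ElementTree as ET

# Hypothetical FIG. 12-style fragment: the abbreviated name is an "abbr"
# attribute on each <tag> listed under <nonkey_tags>.
NONKEY_XML = """<nonkey_tags>
  <merged_tag>情報</merged_tag>
  <tag abbr="A">部署</tag><tag abbr="B">住所</tag><tag abbr="C">電話</tag>
</nonkey_tags>"""


def abbreviation_map(nonkey_xml: str) -> dict:
    """Map each non-key tag name to its abbreviated tag name."""
    nonkey = ET.fromstring(nonkey_xml)
    return {t.text: t.get("abbr") for t in nonkey.findall("tag") if t.get("abbr")}


def abbreviate(tag_string: str, abbr: dict, delim: str = ",") -> str:
    """Conversion side: replace each name in a delimiter-joined tag name string."""
    return delim.join(abbr.get(name, name) for name in tag_string.split(delim))


def expand(tag_string: str, abbr: dict, delim: str = ",") -> str:
    """Inverse side: map abbreviated names back (assumes abbreviations are unique)."""
    full = {short: name for name, short in abbr.items()}
    return delim.join(full.get(name, name) for name in tag_string.split(delim))


if __name__ == "__main__":
    mapping = abbreviation_map(NONKEY_XML)
    print(abbreviate("部署,住所,電話", mapping))
```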

[1-6-3] Conversion Specification Document for Specifying the Tabular or Non-Tabular Format
FIG. 13 shows a specific example of a conversion specification document (XML document) in the first embodiment that can specify the data format, that is, whether or not the data is tabular. It describes table format information indicating whether the XML document to be converted (its non-key elements) is written in tabular form. Apart from this, it contains the same description as the conversion specification document of FIG. 9. The table format information is added as a “format” attribute of the “merged_tag” element: for example, “table” is described as the attribute value to specify the tabular format, and “nontable” to specify the non-tabular format.

If the value of the “format” attribute in the conversion specification document is “table”, the XSLT structure conversion unit 12 shown in FIG. 2 performs conversion for the tabular format. This conversion transforms only the contents and omits the tag name conversion. The XSLT inverse conversion unit 13 shown in FIG. 2 then performs the corresponding inverse conversion. Conversely, if the value is “nontable”, the XSLT structure conversion unit 12 performs conversion for the non-tabular format, transforming both the tag names and the contents, and the XSLT inverse conversion unit 13 performs the corresponding inverse conversion.

Accordingly, in a conversion specification document written in XML, the end user can use the “format” attribute to specify whether the XML document to be converted is tabular. The attribute thus indicates whether tabular or non-tabular conversion is to be performed. The processing then switches automatically between tabular conversion/inverse conversion and non-tabular conversion/inverse conversion.

The XSLT conversion unit 11 shown in FIG. 2 refers to this “format” attribute, the table format information, as described later with reference to FIGS. 21(A) and 21(B). It uses the attribute to decide whether to create the structure conversion/inverse conversion style sheets for tabular data or those for non-tabular data.
A single XML document to be converted may contain both tabular and non-tabular parts. In that case, the table format information can be specified by the “format” attribute of each “merged_tag” element, as shown for example in FIGS. 15 and 17. Tabular conversion is then applied to the tabular parts and non-tabular conversion to the non-tabular parts, as shown for example in FIG. 8.

[1-6-4] Conversion Specification Document for Specifying Whether Abbreviation Is Performed
FIG. 14 shows a specific example of a conversion specification document (XML document) in the first embodiment that can specify whether tag name abbreviation is performed. It describes tag name abbreviation information indicating whether tag name abbreviation is to be performed during conversion. Apart from this, it contains almost the same description as the conversion specification document of FIG. 12. The tag name abbreviation information is added as a “format” attribute of the “merged_tag” element: to perform tag name abbreviation, for example, “abbr” is described as the attribute value.

Suppose the conversion specification document associates tag names with abbreviated tag names and describes “abbr” as the value of the “format” attribute. Then the XSLT structure conversion unit 12 shown in FIG. 2 performs tag name abbreviation, and the XSLT inverse conversion unit 13 shown in FIG. 2 performs tag name expansion.
Accordingly, in a conversion specification document written in XML, the end user can use the “format” attribute to specify whether tag name abbreviation is performed. This attribute switches tag name abbreviation and tag name expansion on or off automatically.
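How the three “format” values could select the kind of merged element is sketched below in Python. The function is meant to be called once per “merged_tag” group, so a document that mixes tabular and non-tabular parts (as in FIG. 8) would call it with a different value per group. The helper `merge_group` and the attribute name `tags` that holds the tag name string are hypothetical. Delimiter escaping is omitted.

```python
import xml.etree.ElementTree as ET

TAGS_ATTR = "tags"  # hypothetical name of the attribute holding the tag name string


def merge_group(record, nonkey_tags, merged_tag, fmt, abbr=None, delim=","):
    """Build the merged element for one <merged_tag> group according to its format.

    "table":    contents only; tag names are implied by the spec.
    "nontable": contents plus the original tag names.
    "abbr":     contents plus abbreviated tag names (non-tabular data).
    """
    members = [child for child in record if child.tag in nonkey_tags]
    merged = ET.Element(merged_tag)
    merged.text = delim.join(child.text or "" for child in members)
    if fmt == "table":
        return merged
    if fmt == "nontable":
        names = [child.tag for child in members]
    elif fmt == "abbr":
        names = [(abbr or {}).get(child.tag, child.tag) for child in members]
    else:
        raise ValueError(f"unknown format value: {fmt!r}")
    merged.set(TAGS_ATTR, delim.join(names))
    return merged


if __name__ == "__main__":
    rec = ET.fromstring("<個人><名前>山田太郎</名前><部署>営業部</部署><住所>東京都</住所></個人>")
    for fmt in ("table", "nontable", "abbr"):
        elem = merge_group(rec, {"部署", "住所"}, "情報", fmt, {"部署": "A", "住所": "B"})
        print(fmt, ET.tostring(elem, encoding="unicode"))
```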

[1-6-5] Conversion Specification Document for XML Documents Having a Hierarchical Structure and Attributes
FIG. 15 shows a first specific example of a conversion specification document (XML document) in the first embodiment for the case where the non-key elements in a record form a hierarchical structure and have attributes. In particular, the conversion specification document of FIG. 15 takes the XML document of FIG. 5 as the conversion target and converts it into the XML document described with reference to FIG. 6 or FIG. 8. Here, the hierarchical structure of the elements is described by a “depth” attribute. This attribute is also attached to the tag of any parent that has children.

The procedure for creating a conversion specification document such as that of FIG. 15 will be described with reference to the flowchart of FIG. 16 (steps S1 to S4). This procedure creates a conversion specification for the case where a record may have any number of levels and the non-key elements may have arbitrary attributes.
First, the tag names of the root and of the record are specified in the “structure” element (step S1). For example, when the XML document shown in FIG. 5 is to be converted, “名簿” (name list) is specified as the root tag name and “個人” (person) as the record tag name.

Next, the elements in the record are divided into two groups, key elements and non-key elements (step S2). In the example of FIGS. 5 and 16, the groups are as follows:

- Key elements: “名前” (name), “姓” (family name), “名” (given name), “勤務先” (employer) and “会社” (company).
- Non-key elements: “部署” (department), “住所” (address), “連絡先” (contact), “電話” (telephone), “Fax” and “Email”.

Then the tag name of each key element is specified in a <tag> entry within <key_tags> (step S3), and the tag name of each non-key element in a <tag> entry within <nonkey_tags> (step S4).

In step S4, the information on the non-key elements is described in the conversion specification document according to the following procedures (1) to (4).
Procedure (1): The tag name of the new element that describes the combined non-key elements is specified with <merged_tag> (see “情報１” and “情報２” in FIG. 15).
Procedure (2): The “format” attribute specifies whether the non-key elements to be combined are tabular data. Its value is “table” for tabular data and “nontable” for non-tabular data. For non-tabular data, tag name abbreviation (replacing tag names with abbreviated tag names) can also be requested through the “format” attribute, by describing “abbr” as its value.

Procedure (3): The tag names, element contents, attributes and attribute contents (attribute values) are written out in a predetermined order and joined in CSV format.
Procedure (4): For elements spanning two or more levels (elements forming a hierarchical structure), the depth is specified with the “depth” attribute (see depth="0" and depth="1" in FIG. 15).
Through the above procedure, the conversion specification document is described in XML as shown in FIG. 15.
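Steps S2 and S4 with the “depth” attribute can be sketched in Python. The sketch walks a sample record and emits the <tag> entries of a <nonkey_tags> element. It rests on three assumptions, since the exact form of FIG. 15 is not given in this text. First, a flat non-key element gets a plain <tag>. Second, a non-key parent and its descendants get depth values counted from that parent. Third, the sample record is hypothetical. Attributes of non-key elements are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical FIG. 5-like record; "名前" (with 姓/名) is treated as a key element.
RECORD = ET.fromstring(
    "<個人><名前><姓>山田</姓><名>太郎</名></名前>"
    "<住所>東京都</住所>"
    "<連絡先><電話>03-0000-0000</電話><Fax>03-0000-0001</Fax></連絡先></個人>"
)


def subtree_entries(element, depth=0):
    """List (tag name, depth) for an element and all of its descendants."""
    entries = [(element.tag, depth)]
    for child in element:
        entries.extend(subtree_entries(child, depth + 1))
    return entries


def build_nonkey_tags(record, key_tags, merged_tag, fmt="nontable"):
    """Sketch of steps S2 and S4 (assumed layout, not FIG. 15 verbatim)."""
    nonkey = ET.Element("nonkey_tags")
    ET.SubElement(nonkey, "merged_tag", format=fmt).text = merged_tag  # procedures (1), (2)
    for child in record:
        if child.tag in key_tags:  # step S2: key elements are listed elsewhere
            continue
        if len(child) == 0:  # flat element: no depth needed
            ET.SubElement(nonkey, "tag").text = child.tag
            continue
        for name, depth in subtree_entries(child):  # procedure (4)
            ET.SubElement(nonkey, "tag", depth=str(depth)).text = name
    return nonkey


if __name__ == "__main__":
    print(ET.tostring(build_nonkey_tags(RECORD, {"名前"}, "情報1"), encoding="unicode"))
```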

FIG. 17 shows a second specific example of a conversion specification document (XML document) in the first embodiment for the case where the non-key elements in a record form a hierarchical structure and have attributes. In particular, the conversion specification document of FIG. 17 takes the XML document of FIG. 5 as the conversion target and converts it into the XML document described with reference to FIG. 7 or FIG. 8. Here, the hierarchical structure of the leaf elements is described by a “path” attribute, whose value is expressed in XPath.
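The “path” form can be illustrated by computing XPath-style location paths for the leaf elements of a record. This is a minimal Python sketch. It assumes the paths are relative to the record element, as FIG. 17 itself is not reproduced here. It also adds no positional predicates, so repeated sibling tags (for example two “電話” elements) would need `[n]` steps that it does not generate. The sample record is hypothetical.

```python
import xml.etree.ElementTree as ET


def leaf_paths(element, prefix=""):
    """Return relative XPath-style paths ("a/b") of all leaf elements below element."""
    paths = []
    for child in element:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child) == 0:
            paths.append(path)
        else:
            paths.extend(leaf_paths(child, path))
    return paths


if __name__ == "__main__":
    # Hypothetical record with one nested non-key group ("連絡先").
    record = ET.fromstring(
        "<個人><住所>東京都</住所>"
        "<連絡先><電話>03-0000-0000</電話><Fax>03-0000-0001</Fax></連絡先></個人>"
    )
    for p in leaf_paths(record):
        print(p, record.findtext(p))  # ElementTree evaluates these simple paths
```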

The conversion specification document of FIG. 17 is thus the same as that of FIG. 15, except that the hierarchical structure is described with the “path” attribute, so a detailed description is omitted. A conversion specification document such as that of FIG. 17 is also created by a procedure similar to the one described with reference to FIG. 16.
As described above, the XML documents of FIGS. 6 and 7 were converted using the conversion specification documents of FIGS. 15 and 17 with “nontable” as the “format” attribute value. That is, they were converted without distinguishing whether the XML document to be converted is tabular data, with all data treated as non-tabular. For the XML document of FIG. 8, by contrast, “nontable” is set as the “format” attribute value of “情報１” and “table” as that of “情報２”. Tabular conversion is therefore applied to the non-key elements of the tabular data, and non-tabular conversion to the non-key elements of the non-tabular data.

[1-7] Specific Conversion Processing Procedure Using the Conversion Method of the First Embodiment
Next, the conversion processing procedure of the structured document conversion method according to the first embodiment of the present invention will be described with reference to FIGS. 18 to 21.
FIGS. 18 and 19 show the processing procedures when the data structure conversion and inverse conversion are executed by Java software using DOM, XSLT and the like. Java is an object-oriented programming language resembling C++, developed by Sun Microsystems, Inc. of the United States.

FIG. 18 is a flowchart (steps A1 to A16) illustrating the procedure for applying the data structure conversion to an XML document to be converted on the basis of the conversion specification document. FIG. 19 is a flowchart (steps B1 to B15) illustrating the procedure for applying the inverse conversion of the data structure to a converted XML document (processed XML document) on the basis of the conversion specification document. The procedures of FIGS. 18 and 19 process the XML document to be converted and the converted XML document on the basis of the conversion specification document. They do so without using the data structure conversion/inverse conversion mechanism 10 shown in FIG. 2.

To apply the data structure conversion to an XML document to be converted, as shown in FIG. 18, the processor first reads the conversion specification document and analyzes the conversion specification from its description (step A1). It then reads the XML document to be converted and starts the data structure conversion (step A2).
First, the root tag of the XML document to be converted is copied to the converted XML document (step A3), and the next record is extracted from the XML document to be converted (step A4). It is then determined whether all records have been processed (step A5). If not (NO route of step A5), the tag of that record is copied to the converted XML document (step A6), and the next element is extracted from the record currently being processed (step A7).

If a next element is extracted, it is determined that not all elements have been processed yet (NO route of step A8). It is then determined whether the extracted element is a key element (step A9). If it is (YES route of step A9), the extracted element is copied as it is to the converted XML document (step A10), and the process returns to step A7.

If the extracted element is not a key element (NO route of step A9), it is determined whether it is a non-key element (step A11). If it is not a non-key element either (NO route of step A11), some error processing is executed.
If it is a non-key element (YES route of step A11), a new element is created with the tag name specified in advance by the conversion specification document (step A12). If a new element corresponding to the non-key elements has already been created, this creation is skipped.

If the new element has just been created in step A12, the tag name of the non-key element is described as a tag name character string (attribute value) in the attribute of the new element. If the new element already exists, the tag name of the non-key element is appended to the end of the tag name character string in that attribute in CSV format, that is, with a delimiter in between (step A13).
Likewise, if the new element has just been created in step A12, the content of the non-key element is described as a content character string in the content of the new element. If the new element already exists, the content of the non-key element is appended to the end of the content character string of the new element in CSV format, that is, with a delimiter in between (step A14). The process then returns to step A7. In step A14, a character identical to the delimiter (here the comma “,”) may appear in the content of the non-key element. As described above, that character is then replaced with another identifying character string, for example an entity reference.

If no next element is extracted in step A7, it is determined that all elements have been processed (YES route of step A8). The end tag of the record currently being processed is output to the converted XML document (step A15), and the process returns to step A4. When all records have been processed (YES route of step A5), the end tag of the root is output to the converted XML document (step A16), and the conversion ends.
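Steps A3 to A16 can be sketched in Python with `xml.etree.ElementTree` in place of the Java/DOM implementation the description mentions. The sketch has several stated assumptions:

- The attribute name `tags` is hypothetical.
- Records are flat (non-key elements carry text but no children or attributes).
- The “other identifying character string” of step A14 is replaced by a percent-style escape. `%` is protected first so that the mapping stays reversible.

The end tags of steps A15 and A16 correspond to the closing of elements when the tree is serialized.

```python
import copy
import xml.etree.ElementTree as ET

DELIM = ","
TAGS_ATTR = "tags"  # hypothetical attribute name for the tag name string

# Hypothetical input; note the delimiter inside the "部署" content.
SAMPLE = ("<名簿><個人><名前>山田太郎</名前><会社>A社</会社><部署>営業,第一</部署>"
          "<住所>東京都</住所><電話>03-0000-0000</電話></個人></名簿>")


def escape(text: str) -> str:
    """Assumed escape (stands in for the entity-reference-like string of step A14)."""
    return text.replace("%", "%25").replace(DELIM, "%2C")


def convert(doc_root, record_tag, key_tags, nonkey_tags, merged_tag):
    """Steps A3-A16 of FIG. 18 for flat records."""
    out_root = ET.Element(doc_root.tag, doc_root.attrib)              # A3
    for record in doc_root.findall(record_tag):                       # A4, A5
        out_rec = ET.SubElement(out_root, record.tag, record.attrib)  # A6
        merged, names, contents = None, [], []
        for elem in record:                                           # A7, A8
            if elem.tag in key_tags:                                  # A9
                out_rec.append(copy.deepcopy(elem))                   # A10
            elif elem.tag in nonkey_tags:                             # A11
                if merged is None:                                    # A12
                    merged = ET.SubElement(out_rec, merged_tag)
                names.append(elem.tag)                                # A13
                contents.append(escape(elem.text or ""))              # A14
            else:
                raise ValueError(f"{elem.tag!r} is neither a key nor a non-key element")
        if merged is not None:
            merged.set(TAGS_ATTR, DELIM.join(names))
            merged.text = DELIM.join(contents)
    return out_root  # A15/A16: end tags are produced on serialization


if __name__ == "__main__":
    out = convert(ET.fromstring(SAMPLE), "個人", {"名前", "会社"}, {"部署", "住所", "電話"}, "情報")
    print(ET.tostring(out, encoding="unicode"))
```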

Conversely, to apply the inverse conversion of the data structure to a converted XML document, as shown in FIG. 19, the processor first reads the conversion specification document and analyzes the conversion specification from its description (step B1). It then reads the XML document to be inversely converted and starts the inverse conversion of the data structure (step B2).
First, the root tag of the XML document to be inversely converted is copied to the restored XML document (step B3), and the next record is extracted from the XML document to be inversely converted (step B4). It is then determined whether all records have been processed (step B5). If not (NO route of step B5), the tag of that record is copied to the restored XML document (step B6), and the next element is extracted from the record currently being processed (step B7).

If a next element is extracted, it is determined that not all elements have been processed yet (NO route of step B8). It is then determined whether the extracted element is a key element (step B9). If it is (YES route of step B9), the extracted element is copied as it is to the restored XML document (step B10), and the process returns to step B7.

If the extracted element is not a key element (NO route of step B9), it is determined whether the element is one into which non-key elements have been combined (merged) (step B11). If it is not (NO route of step B11), some error processing is executed.
If the extracted element is the new element combining the non-key elements (YES route of step B11), the tag names of the non-key elements are extracted one by one (step B12). They come from the tag name character string described as the attribute value in the tag of the new element, that is, the non-key tag names joined in CSV format.

Likewise, the contents of the non-key elements are extracted one by one from the content character string of the new element (the non-key contents joined in CSV format). The non-key elements are restored from these contents and the tag names extracted in step B12 (step B13), and the process returns to step B7. In step B13, a content extracted from the content character string may contain the identifying character string for the delimiter. That identifying character string is then converted back into the original delimiter.

If no next element is extracted in step B7, it is determined that all elements have been processed (YES route of step B8). The end tag of the record currently being processed is output to the restored XML document (step B14), and the process returns to step B4. When all records have been processed (YES route of step B5), the end tag of the root is output to the restored XML document (step B15), and the inverse conversion ends.
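The matching inverse conversion (steps B3 to B15) can be sketched the same way. The sketch assumes the same hypothetical `tags` attribute and percent-style escape, undone in a single regular-expression pass so that `%25` and `%2C` cannot interfere with each other. If key and non-key elements were interleaved in the original record, the restored order may differ, because each merged element is expanded at its own position.

```python
import copy
import re
import xml.etree.ElementTree as ET

DELIM = ","
TAGS_ATTR = "tags"  # hypothetical attribute name, matching the conversion side

# Hypothetical converted document (the "部署" content holds an escaped comma).
CONVERTED = ('<名簿><個人><名前>山田太郎</名前><会社>A社</会社>'
             '<情報 tags="部署,住所,電話">営業%2C第一,東京都,03-0000-0000</情報></個人></名簿>')


def unescape(text: str) -> str:
    """Undo the assumed escape ('%2C' -> ',', '%25' -> '%') in a single pass."""
    return re.sub(r"%(2C|25)", lambda m: DELIM if m.group(1) == "2C" else "%", text)


def restore(conv_root, record_tag, key_tags, merged_tag):
    """Steps B3-B15 of FIG. 19: expand each merged element back into non-key elements."""
    out_root = ET.Element(conv_root.tag, conv_root.attrib)            # B3
    for record in conv_root.findall(record_tag):                      # B4, B5
        out_rec = ET.SubElement(out_root, record.tag, record.attrib)  # B6
        for elem in record:                                           # B7, B8
            if elem.tag in key_tags:                                  # B9
                out_rec.append(copy.deepcopy(elem))                   # B10
            elif elem.tag == merged_tag:                              # B11
                names = elem.get(TAGS_ATTR, "").split(DELIM)          # B12
                values = (elem.text or "").split(DELIM)
                if len(names) != len(values):
                    raise ValueError("tag name and content strings differ in length")
                for name, value in zip(names, values):                # B13
                    ET.SubElement(out_rec, name).text = unescape(value)
            else:
                raise ValueError(f"unexpected element {elem.tag!r}")
    return out_root  # B14/B15: end tags are produced on serialization


if __name__ == "__main__":
    restored = restore(ET.fromstring(CONVERTED), "個人", {"名前", "会社"}, "情報")
    print(ET.tostring(restored, encoding="unicode"))
```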

FIGS. 20(A) to 20(D) show the processing procedure when the data structure conversion and inverse conversion of the first embodiment are executed by an XSLT processor alone. The procedures of FIGS. 20(A) to 20(D) use the data structure conversion/inverse conversion mechanism 10 shown in FIG. 2 to process the XML document to be converted and the converted XML document on the basis of the conversion specification document.

FIGS. 20(A) and 20(B) are flowcharts illustrating, respectively, the procedures for creating the conversion style sheet and the inverse conversion style sheet in the first embodiment (processing in the XSLT conversion unit 11).
FIG. 20(C) is a flowchart illustrating how the XSLT structure conversion unit 12 applies the data structure conversion to the XML document to be converted, on the basis of the structure conversion style sheet. FIG. 20(D) is a flowchart illustrating how the XSLT inverse conversion unit 13 applies the inverse conversion of the data structure to the converted XML document (processed XML document), on the basis of the inverse conversion style sheet.

Before the XML document to be converted is processed, the XSLT conversion unit 11 creates both style sheets:

- As shown in FIG. 20(A), it reads the conversion specification document written in XML and analyzes the conversion specification from its description (step A1). It then creates the data structure conversion style sheet from the conversion specification and the automatic conversion style sheet (step A20).
- As shown in FIG. 20(B), it likewise reads the conversion specification document and analyzes the conversion specification (step B1). It then creates the data structure inverse conversion style sheet from the conversion specification and the automatic conversion style sheet (step B20).

To apply the data structure conversion to the XML document to be converted, as shown in FIG. 20(C), the XSLT structure conversion unit 12 specifies that document and the structure conversion style sheet and starts the conversion (step A21). The XSLT structure conversion unit 12 then performs the same processing as steps A2 to A16 of FIG. 18.
Conversely, to apply the inverse conversion of the data structure to the converted XML document, as shown in FIG. 20(D), the XSLT inverse conversion unit 13 specifies the XML document to be inversely converted and the inverse conversion style sheet and starts the inverse conversion (step B21). The XSLT inverse conversion unit 13 then performs the same processing as steps B2 to B15 of FIG. 19.

Here, as shown in FIG. 2, the application software 30 performs processing such as tag searches, through the standard API (DOM) 20, on the converted XML document output by the XSLT structure conversion unit 12, in which the number of elements has been reduced. The processing of the application software 30 is therefore substantially faster.

When the application software 30 performs a tag search on the converted XML document, an XML document (extracted XML document) describing the records hit by the search is extracted and output. This extracted XML document is inversely converted by the XSLT inverse transform unit 13 as described above, yielding a search result (XML document) identical to the one the application software 30 would have obtained by searching the original XML document.
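The point that key elements stay searchable while only the hits need inverse conversion can be illustrated with a minimal Python sketch. It uses ElementTree in place of the DOM API and assumes the first embodiment's layout, in which the non-key elements are merged into an “information” element whose comma-joined tag name string sits in an attribute; the attribute name `tags`, the English tag names, and the helper `restore_record` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical converted document: key elements unchanged, non-key elements
# merged into one <information> element (the "tags" attribute name is assumed).
CONVERTED = (
    "<name_list>"
    "<person><name>Tanaka</name><company>F Corp</company>"
    '<information tags="department,address,telephone">Dept. A,A City,123</information></person>'
    "<person><name>Suzuki</name><company>G Corp</company>"
    '<information tags="department,telephone,telephone">Dept. B,456,789</information></person>'
    "</name_list>"
)


def restore_record(rec, new_tag="information", delim=","):
    """Rebuild the original children of one record from the merged element."""
    out = ET.Element(rec.tag, rec.attrib)
    for child in rec:
        if child.tag != new_tag:
            out.append(child)  # key element: copied as-is
            continue
        names = child.get("tags", "").split(delim)
        values = (child.text or "").split(delim)
        for name, value in zip(names, values):
            ET.SubElement(out, name).text = value
    return out


doc = ET.fromstring(CONVERTED)
# A tag search on a key element works on the converted document unchanged.
hits = doc.findall("person[company='G Corp']")
# Only the (few) hit records are inversely converted.
restored = [restore_record(rec) for rec in hits]
```

Because the search touches only the key elements, the per-record inverse conversion runs on the extracted records alone, which is why its overhead stays small.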

At this time, the XML document that the XSLT inverse transform unit 13 inversely converts describes only the small number of records extracted by the application software 30, so the overhead of the inverse conversion is almost negligible. Therefore, by applying the data structure conversion of this embodiment in advance, processing that the application software 30 executes many times is greatly accelerated, and the amount of working memory used is greatly reduced.

FIGS. 21A and 21B are flowcharts illustrating a modification of the procedure for creating the conversion style sheet and the inverse conversion style sheet (processing in the XSLT conversion unit 11) in the first embodiment. When table or non-table format is specified by the “format” attribute value (table format information) in the conversion specification documents shown in FIGS. 13, 15, and 17, the XSLT conversion unit 11 executes the procedures of FIGS. 21A and 21B instead of the procedures described above with reference to FIGS. 20A and 20B.

That is, before the conversion target XML document is processed, as shown in FIG. 21A, the XSLT conversion unit 11 first reads the conversion specification document written in XML and analyzes the conversion specification from its description (step A1), and then refers to the “format” attribute value to determine whether the data (the conversion target XML document) is in table format (step A22).

When the data is in table format (YES route of step A22), the XSLT conversion unit 11 uses the conversion specification and the automatic conversion style sheet to create a structure conversion style sheet in which the tag names of the non-key elements are represented by the tag name of the new element (step A20-1). When the data is in non-table format (NO route of step A22), the XSLT conversion unit 11 uses the conversion specification and the automatic conversion style sheet to create a structure conversion style sheet that writes into the converted XML document a tag name character string joining the tag names (or shortened tag names) of the non-key elements with delimiters (step A20-2).
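Steps A20-1 and A20-2 differ only in whether the tag name character string is written out. The following minimal Python sketch shows the record-level transformation such a style sheet would perform, using ElementTree instead of an XSLT processor purely for illustration. It assumes a comma delimiter, element contents that do not themselves contain the delimiter, and a hypothetical attribute name `tags` for the tag name character string; the function name `convert_record` is likewise illustrative.

```python
import xml.etree.ElementTree as ET

DELIM = ","  # assumed delimiter ("a delimiter such as a comma")


def convert_record(record, key_tags, new_tag="information",
                   names_attr="tags", table_format=False):
    """Merge the non-key children of `record` into one new element.

    key_tags: tag names of the key elements, copied through unchanged.
    names_attr: hypothetical attribute holding the tag name string.
    table_format: if True, omit the tag name string (step A20-1); the
    non-key tag names are then implied by the conversion specification.
    """
    out = ET.Element(record.tag, record.attrib)
    names, contents = [], []
    for child in record:
        if child.tag in key_tags:
            out.append(child)  # key element: described as-is
        else:
            names.append(child.tag)
            contents.append(child.text or "")
    merged = ET.SubElement(out, new_tag)  # assumed to follow the key elements
    if not table_format:
        merged.set(names_attr, DELIM.join(names))  # step A20-2
    merged.text = DELIM.join(contents)  # content character string
    return out
```

For a record with key elements “name” and “company”, the non-table output is `<information tags="department,address,telephone">Dept. A,A City,123</information>`, and the table-format output keeps only the content string.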

Likewise, as shown in FIG. 21B, the XSLT conversion unit 11 reads the conversion specification document written in XML, analyzes the conversion specification from its description (step B1), and refers to the “format” attribute value to determine whether the data (the conversion target XML document) is in table format (step B22).

When the data is in table format (YES route of step B22), the XSLT conversion unit 11 uses the conversion specification and the automatic conversion style sheet to create an inverse conversion style sheet that derives the tag names of the non-key elements from the tag name of the new element (step B20-1). When the data is in non-table format (NO route of step B22), the XSLT conversion unit 11 uses the conversion specification and the automatic conversion style sheet to create an inverse conversion style sheet that restores the tag names of the non-key elements from the tag name character string (step B20-2).
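The two inverse branches can be sketched in the same hedged way. In this minimal Python sketch, assuming the same comma delimiter and hypothetical `tags` attribute as on the conversion side, the non-table branch reads the tag names from the tag name character string, while the table branch takes them from the conversion specification (passed here as a hypothetical `table_columns` list in specification order). Restored non-key elements are placed after the key elements, which is also an assumption.

```python
import xml.etree.ElementTree as ET

DELIM = ","  # assumed delimiter, matching the conversion side


def restore_record(record, new_tag="information", names_attr="tags",
                   table_columns=None):
    """Expand the merged element back into the original non-key elements.

    table_columns: for table-format data (step B20-1), the non-key tag
    names in specification order; None means read the tag name string
    from the attribute (step B20-2).
    """
    out = ET.Element(record.tag, record.attrib)
    for child in record:
        if child.tag != new_tag:
            out.append(child)  # key element: unchanged
            continue
        if table_columns is not None:
            names = list(table_columns)
        else:
            names = child.get(names_attr, "").split(DELIM)
        values = (child.text or "").split(DELIM)
        if names == [""]:
            continue  # the record had no non-key elements
        for name, value in zip(names, values):
            ET.SubElement(out, name).text = value
    return out
```

Repeated tag names such as the two “telephone” entries of a non-table record are restored in order, because the tag name string records each occurrence.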

[1-8] Effects of the First Embodiment
As described above, according to the structured document conversion method of the first embodiment of the present invention, the elements of the XML document to be converted are divided into key elements and non-key elements, and the document is converted into an XML document in which the key elements are described as they are while the non-key elements (items not subject to data processing) are described together in a single tag. In the converted XML document, the number of elements is therefore greatly reduced, and the non-key elements can be handled as a single unit both when the document is expanded into a DOM tree and during data processing such as tag searches.

The reduction in the number of elements is especially large for XML documents with many non-key elements not subject to data processing, or with many elements per record; for example, if the number of elements is halved, the time required for expansion into a DOM tree and for tag searches can be halved. Furthermore, when the XML document to be converted is tabular data, converting it as described with reference to FIGS. 4B and 4C removes the need to write the tag names of the non-key elements into the converted XML document, so that in some cases the data amount of the converted XML document can be reduced to about one third of that of the original XML document.

When the application software (application) 30 processes the XML document, only the key elements are used. Since the first embodiment describes the key elements as they are, the contents of a key element can be referenced by its tag name as usual, and the transparency of the converted XML document is preserved.
In addition, by creating the conversion specification as an XML document that gives the conversion procedure, there is no need to write a separate style sheet for each of many kinds of XML documents, and the data structure conversion/inverse conversion of the first embodiment can be applied to diverse XML document data without extra effort. Furthermore, if a conversion/inverse conversion style sheet that instructs the conversion/inverse conversion is generated from the conversion specification document, a standard XSLT processor can carry out the conversion/inverse conversion using that style sheet; in other words, the conversion/inverse conversion of the first embodiment can be executed on almost any kind of XML system.

Therefore, the conversion method of the first embodiment provides a general-purpose conversion technique that can apply, to diverse XML document data, a data structure conversion that merges the non-key elements into one element while preserving transparency to applications and the validity of the converted document's data structure. This greatly reduces the resources needed to operate on XML documents, achieving both lower memory usage and higher processing speed when XML documents are processed.

EDI data has hundreds to a thousand items (elements) per record; with that many items, it is poorly suited to expansion into a DOM tree. In addition, because such data is handled with SAX (Simple API for XML), a standard API that merely extracts document elements and streams them in sequence, complex document operations are difficult. However, even in data with many items, the number of items subject to data processing (key elements) is not necessarily large, so converting the XML document with the method of the first embodiment is highly effective.

In tag name conversion and content conversion, as shown in FIGS. 3 to 8, the tag name character string and the content character string are created very easily by joining the tag names and contents of the non-key elements with a delimiter such as a comma (in CSV format), that is, with a symbol unrelated to tagging.
When the non-key elements form multiple hierarchical levels, adding hierarchical structure identification information to the tag names in the tag name character string, as shown in FIGS. 6 and 7, preserves that hierarchy in the converted XML document, so the inverse conversion that restores the original XML document can easily be performed according to that identification information.

When a non-key element has an attribute, as shown in FIGS. 6 to 8, the attribute name, prefixed with attribute name identification information (“@” in FIGS. 6 to 8), is written in the tag name character string after the tag name of the element that has the attribute, separated by a delimiter, and the content character string is created by joining the contents of the non-key elements in the same order as the tag names in the tag name character string. The attributes of the non-key elements are thereby preserved in the converted XML document, and the inverse conversion that restores the original XML document can easily be performed according to the attribute name identification information.
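The “@” convention can be sketched as follows. This minimal Python sketch assumes that each attribute entry directly follows its element's tag name, that the attribute value occupies the matching position in the content character string, and that no value contains the comma delimiter; the attribute `code` and the function names are hypothetical, and the exact layout in FIGS. 6 to 8 may differ.

```python
import xml.etree.ElementTree as ET


def encode_non_key(elements, delim=","):
    """Build the tag name string and the aligned content string."""
    names, values = [], []
    for el in elements:
        names.append(el.tag)
        values.append(el.text or "")
        for attr, val in el.attrib.items():
            names.append("@" + attr)  # "@" marks an attribute name
            values.append(val)
    return delim.join(names), delim.join(values)


def decode_non_key(names_str, values_str, delim=","):
    """Inverse: "@name" entries become attributes of the preceding element."""
    elements = []
    for name, value in zip(names_str.split(delim), values_str.split(delim)):
        if name.startswith("@"):
            elements[-1].set(name[1:], value)
        else:
            el = ET.Element(name)
            el.text = value
            elements.append(el)
    return elements
```

For `<department code="D1">Dept. A</department>` followed by `<telephone>123</telephone>`, the sketch yields the tag name string `department,@code,telephone` and the content string `Dept. A,D1,123`.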

Furthermore, as shown in FIGS. 3C and 3F, tag name shortening conversion, which replaces the tag names of the non-key elements with shortened tag names, reduces the data amount of the converted structured document. As shown in FIG. 14, whether to perform tag name shortening is indicated by tag name shortening conversion information (“abbr” as the “format” attribute value) in the conversion specification document, so that tag name shortening conversion and tag name expansion conversion can be switched on or off automatically.
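The shortening and expansion steps amount to a reversible mapping over the tag name character string. The minimal Python sketch below uses a hypothetical scheme (t0, t1, ... assigned in conversion-specification order); the actual short names in FIGS. 3C and 3F are not reproduced here, and a real scheme must also avoid clashing with existing tag names.

```python
def make_abbreviations(tag_names):
    """Hypothetical scheme: t0, t1, ... assigned in conversion-spec order."""
    short = {name: f"t{i}" for i, name in enumerate(tag_names)}
    return short, {v: k for k, v in short.items()}


def shorten(names_str, short, delim=","):
    # Names not in the table (e.g. "@attr" entries) are left unchanged.
    return delim.join(short.get(n, n) for n in names_str.split(delim))


def expand(names_str, long_names, delim=","):
    return delim.join(long_names.get(n, n) for n in names_str.split(delim))
```

A converter following this sketch would call `shorten` only when the “format” attribute value requests “abbr”, and the inverse converter would then call `expand` before restoring the elements.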

When the XML document to be converted is written in table format, the tag names and attribute names can easily be determined during the inverse conversion that restores the original XML document, as described above, so the tag name conversion and attribute name conversion can be omitted. The converted XML document then needs to contain only the content character strings of the non-key elements, with no description of tag names or attribute names, which greatly reduces its data amount. As shown in FIGS. 13, 15, and 17, whether to perform table format conversion is indicated by table format information (“table/nontable” as the “format” attribute value) in the conversion specification document, so that table format conversion and table format inverse conversion can be switched on or off automatically.

[2] Description of the Second Embodiment
[2-1] Principle of the Structured Document Conversion Method of the Second Embodiment
The principle of the structured document conversion method of the second embodiment of the present invention will be described with reference to FIGS. 1A, 3A, and 22.
FIG. 22 shows the in-memory expansion form of the converted XML document obtained by applying the structured document conversion method of the second embodiment to the XML document described above with reference to FIGS. 1A and 3A, taking the elements with the tag names “name” and “company” as key elements and the elements with the tag names “department”, “address”, and “telephone” as non-key elements. This is the form in which the converted XML document is expanded in memory when the application software operates on it via the standard API (DOM).

In the XML document shown in FIG. 22, a new element with the tag name “information” is created, and the non-key elements with the tag names “department”, “address”, and “telephone” are described as its content. When the non-key elements are described as the content of the new element, however, the tag symbols “<” and “>” in their description are replaced with entity references. The key elements with the tag names “name” and “company” are described as they are. In FIG. 22, only the beginning of the content of the new element “information” is shown.

By converting the XML document so that the non-key elements of each record are merged into one element in this way, the number of elements in the XML document, that is, the number of child elements in the tree expanded in memory, can be greatly reduced, and the non-key elements can be handled as a single unit during expansion and data processing.
When merging the non-key elements of each record into one element, the second embodiment creates a character string in which the symbols related to tagging in the description of the non-key elements are replaced with character strings unrelated to tagging, and describes this string as the content of a new element (see FIGS. 22 and 23), as an attribute value of a new element (see FIG. 24), as an attribute value of the parent element (see FIG. 25), or as the content of the parent element (see FIG. 26). FIG. 22, which illustrates the principle of the second embodiment, shows the DOM tree of the converted XML document when this string is described as the content of a new element.

In particular, the second embodiment replaces the symbols related to tagging in the non-key elements (the tag symbols “<” and “>”) with other character strings unrelated to tagging, using a notation called an “entity reference”.
An entity is something that stores data which can in some way become part of an XML document, such as a file or a replacement character string. An entity reference is written in the XML document as “&entity-name;”.

Normally, the correspondence between entity names and the original file names or character strings is declared in a document type definition (DTD). However, the five entities related to tagging shown in Table 1 below, <, >, &, ', and ", can be used without a DTD. For example, if the entity (the character to be replaced) “<” appears in element content, it is replaced with the string “&lt;”, an entity reference using the entity name “lt”. Likewise, “>” is replaced with “&gt;”, “&” with “&amp;”, “'” with “&apos;”, and “"” with “&quot;”.
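The five predefined entity references can be demonstrated with Python's standard `xml.sax.saxutils` helpers. In this minimal sketch, `escape` and `unescape` always handle “&”, “<”, and “>”; the two quote entities are passed in explicitly as extra mappings. The sample string is illustrative only.

```python
from xml.sax.saxutils import escape, unescape

# "&", "<" and ">" are always handled; the quote entities are added explicitly.
QUOTES = {"'": "&apos;", '"': "&quot;"}
QUOTES_BACK = {v: k for k, v in QUOTES.items()}

sample = "<a href=\"x\">Tom's & Jerry's</a>"
escaped = escape(sample, QUOTES)
# escaped is now free of tag symbols, so a parser treats it as plain text.
restored = unescape(escaped, QUOTES_BACK)
```

`escape` replaces “&” first, so existing entity references in the input are themselves escaped once more and survive the round trip intact.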

When such entity references are used to replace the symbols “<” and “>” that represent tags in element content with the entity reference strings “&lt;” and “&gt;”, respectively, the tag symbols written in the element content are no longer processed as tags by a parser (syntax analysis software). Therefore, when the non-key elements are merged into one element, a series of non-key elements whose tag symbols have been replaced with entity reference strings is enclosed in tags such as <information></information> and made the content of a new element with the tag name “information”; the series is then treated as plain element content. The conversion method can be summarized as follows.

(1) Extract the series of non-key elements.
First record: <department>Dept. A</department><address>A City</address><telephone>123</telephone>
Second record: <department>Dept. B</department><telephone>456</telephone><telephone>789</telephone>
(2) Replace the tag symbols with entity reference strings: “<” with “&lt;” and “>” with “&gt;”.
First record: &lt;department&gt;Dept. A&lt;/department&gt;&lt;address&gt;A City&lt;/address&gt;&lt;telephone&gt;123&lt;/telephone&gt;
Second record: &lt;department&gt;Dept. B&lt;/department&gt;&lt;telephone&gt;456&lt;/telephone&gt;&lt;telephone&gt;789&lt;/telephone&gt;
(3) For each record, enclose the entity-referenced series of non-key elements in <information></information> tags, merging the series into the content of a single element.
First record: <information>&lt;department&gt;Dept. A&lt;/department&gt;&lt;address&gt;A City&lt;/address&gt;&lt;telephone&gt;123&lt;/telephone&gt;</information>
Second record: <information>&lt;department&gt;Dept. B&lt;/department&gt;&lt;telephone&gt;456&lt;/telephone&gt;&lt;telephone&gt;789&lt;/telephone&gt;</information>
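Steps (1) to (3) can be traced with a minimal Python sketch operating on one record's markup. It assumes English tag names, a record element without attributes (which this sketch does not carry over), and drops whitespace between elements; the function name `merge_non_key` is hypothetical, and the XSLT-based implementation of the embodiment is not reproduced.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def merge_non_key(record_xml, key_tags, new_tag="information"):
    """Return the record's markup with its non-key elements merged."""
    record = ET.fromstring(record_xml)
    key_xml, non_key_xml = [], []
    for child in record:
        child.tail = None  # ignore whitespace between elements
        markup = ET.tostring(child, encoding="unicode")
        if child.tag in key_tags:
            key_xml.append(markup)  # key element: kept as-is
        else:
            non_key_xml.append(markup)  # step (1): extract non-key elements
    # Step (2): "<" -> "&lt;", ">" -> "&gt;" (escape() also turns "&" into "&amp;").
    escaped = escape("".join(non_key_xml))
    # Step (3): enclose the escaped series in one new element.
    return (f"<{record.tag}>{''.join(key_xml)}"
            f"<{new_tag}>{escaped}</{new_tag}></{record.tag}>")
```

Parsing the result and reading the text of “information” gives back the original non-key markup as an ordinary string, which is exactly why the parser no longer sees those elements as tags.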
[2-2] System of the Second Embodiment and Flow of the Conversion/Inverse Conversion Processing
The structured document conversion method of the second embodiment of the present invention is also applied to the system described with reference to FIG. 2.

Writing a style sheet (XSL sheet) for each of many kinds of XML documents one by one is extremely tedious. To save that effort, the second embodiment, as described later with reference to FIG. 27, also writes the specification for converting the data structure of an XML document (record name, key tag names, non-key tag names, and so on) as an XML document (conversion specification document) that gives the conversion procedure, and, as described later with reference to FIGS. 31 to 38, executes the conversion/inverse conversion of the XML document based on that conversion specification document.

Further, in the second embodiment as well, as described later with reference to FIGS. 39A to 39D, a conversion style sheet that instructs the conversion procedure and an inverse conversion style sheet that instructs the inverse conversion procedure are generated automatically from the given conversion specification document, and the structured document conversion processor (XSLT processor) uses these style sheets to execute the data structure conversion/inverse conversion of the XML document. Because the procedures are given as style sheets, a standard XSLT processor can perform the conversion/inverse conversion, so the conversion/inverse conversion of the second embodiment can be executed on almost any kind of XML document system.

When the conversion method of the second embodiment is applied to the system of FIG. 2, the data structure conversion/inverse conversion mechanism (XSLT processor) 10 likewise reads the conversion specification document written in XML and the input XML document to be processed, converts the input XML document according to the conversion specification (in practice, the structure conversion style sheet), and outputs an XML document to which the predetermined data structure conversion has been applied. The application software then processes the converted XML document (for example, by tag search) via the standard API 20, producing a processed XML document. When the processing is a tag search, the search result is obtained as an extracted XML document. This extracted XML document is read into the data structure conversion/inverse conversion mechanism 10 and inversely converted into an XML document with the original data structure according to the conversion specification (in practice, the inverse conversion style sheet), giving the final XML document that results from the data processing.

In the second embodiment, the conversion specification XML document read by the XSLT conversion unit 11 is described later with reference to FIG. 27, and the structure conversion style sheet and the inverse conversion style sheet generated by the XSLT conversion unit 11 are described later with reference to FIGS. 28 and 29, respectively.
[2-3] XML Document Conversion Method and Specific Conversion Examples in the Second Embodiment
FIGS. 23 to 26 show first to fourth specific examples of conversion results obtained by applying the structured document conversion method of the second embodiment to the tabular XML document shown in FIG. 4A. Here too, the elements with the tag names “name” and “company” are key elements, and the elements with the tag names “department”, “address”, and “telephone” are non-key elements.

In the first specific example shown in FIG. 23, the elements of the XML document to be converted are divided into key elements and non-key elements, a new element with the tag name “information” is created, a character string is created by replacing the tag symbols “<” and “>” in the description of the non-key elements with the entity reference strings “&lt;” and “&gt;”, respectively, and this string is described as the content of the new element. The key elements are described in the converted XML document as they are, without any conversion. The information for distinguishing key elements from non-key elements and the information about the new element (the tag name “information”) are specified in the conversion specification document; based on it, the data structure conversion is applied to the XML document to be converted, and the inverse conversion that returns the description of the non-key elements to its original state is applied to the converted XML document.
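In a DOM-style API the entity replacement of the first specific example need not be coded by hand: assigning the non-key markup to an element's text lets the serializer emit “&lt;” and “&gt;”, and parsing turns them back into “<” and “>”. The minimal Python sketch below uses ElementTree to show the conversion and its inverse; the function names and the temporary wrapper tag `_wrap` used to re-parse the run are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_new_element_content(record, key_tags, new_tag="information"):
    """FIG. 23 style: non-key markup becomes the text of a new element."""
    out = ET.Element(record.tag, record.attrib)
    markup = []
    for child in record:
        if child.tag in key_tags:
            out.append(child)  # key element: unchanged
        else:
            child.tail = None  # drop inter-element whitespace
            markup.append(ET.tostring(child, encoding="unicode"))
    # The serializer escapes "<", ">" and "&" in text as &lt; &gt; &amp;.
    ET.SubElement(out, new_tag).text = "".join(markup)
    return out


def from_new_element_content(record, new_tag="information"):
    """Inverse: re-parse the new element's text back into elements."""
    out = ET.Element(record.tag, record.attrib)
    for child in record:
        if child.tag != new_tag:
            out.append(child)
        else:
            # The parser has already turned &lt;/&gt; back into "<"/">".
            wrapper = ET.fromstring(f"<_wrap>{child.text or ''}</_wrap>")
            out.extend(list(wrapper))
    return out
```

Round-tripping a record through both functions reproduces the original serialization when the non-key elements follow the key elements, as in the sample records.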

In the second specific example shown in FIG. 24, the elements of the XML document to be converted are divided into key elements and non-key elements, a new element (empty element) with the tag name “information” and the attribute name “contents” is created, a character string is created by replacing the tag symbols “<” and “>” in the description of the non-key elements with the entity reference strings “&lt;” and “&gt;”, respectively, and this string is described as the attribute value corresponding to the attribute name “contents” of the new element. The key elements are described in the converted XML document as they are, without any conversion. The information for distinguishing key elements from non-key elements and the information about the new element (the tag name “information” and the attribute name “contents”) are specified in the conversion specification document, and based on it, the data structure conversion is applied to the XML document to be converted and the inverse conversion is applied to the converted XML document.

In the third specific example shown in FIG. 25, the elements of the XML document to be converted are divided into key elements and non-key elements, a new attribute name “contents” is given to the parent element of the non-key elements (tag name “person”), a character string is created by replacing the tag symbols “<” and “>” in the description of the non-key elements with the entity reference strings “&lt;” and “&gt;”, respectively, and this string is described as the attribute value corresponding to the attribute name “contents” of the parent element. The key elements are described in the converted XML document as they are, without any conversion. The information for distinguishing key elements from non-key elements and the information about the parent element (the tag name “person” and the attribute name “contents”) are specified in the conversion specification document, and based on it, the data structure conversion is applied to the XML document to be converted and the inverse conversion is applied to the converted XML document.

In the fourth specific example shown in FIG. 26, the elements of the XML document to be converted are divided into key elements and non-key elements, a character string is created by replacing the tag symbols “<” and “>” in the description of the non-key elements with the entity reference strings “&lt;” and “&gt;”, respectively, and this string is described as the content of the parent element (tag name “person”). The key elements are described in the converted XML document as they are, without any conversion. The information for distinguishing key elements from non-key elements and the information about the parent element (the tag name “person”) are specified in the conversion specification document, and based on it, the data structure conversion is applied to the XML document to be converted and the inverse conversion is applied to the converted XML document.

As described above, the conversion method of the second embodiment, like that of the first embodiment, combines a plurality of non-key elements into one element so that, while the application software is executing data processing, the non-key elements can be handled collectively as elements unrelated to that processing. Which of the methods described with reference to FIGS. 23 to 26 is used can be selected and specified by an automatic conversion style sheet or the like. The choice depends on the data volume of the XML document or on how many new elements the data processing adds; however, since the essence of the present invention is to handle the non-key elements as a group, any of the methods may be adopted.

[2-4] Specific Examples of the Conversion Specification Document and Style Sheets of the Second Embodiment
FIG. 27 shows a specific conversion specification document (an XML document) for the case where the tabular XML document shown in FIG. 4(A) is the conversion target. Although the case where the XML document to be converted is tabular data is described here, conversion and inverse conversion can also be performed with the conversion specification document shown in FIG. 27 when the XML document to be converted is non-tabular data. The conversion specification document shown in FIG. 27 implements the conversion method described with reference to FIG. 23.

The conversion specification document shown in FIG. 27 describes the root tag name “name list” and the record tag name “person”. It also describes the information for distinguishing key elements from non-key elements: the tag names “name” and “company” of the key elements are described as the contents of the element with tag name “key”, and the tag names “department”, “address”, and “phone” of the non-key elements are described as the contents of the element with tag name “nonkey”. The contents of the “nonkey” element also include an element with tag name “merged_item”, whose content is the tag name “information” of the new element into which the non-key elements are combined. A conversion specification document of this kind specifies the procedure for executing the data structure conversion of the XML document.
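As a rough illustration, the following Python sketch reads a specification document of this shape with the standard-library ElementTree API. The exact markup of FIG. 27 is not reproduced in the text, so the wrapper element `spec`, the `root` and `record` children of `structure`, and the English tag names standing in for the figure's Japanese ones are assumptions; only `structure`, `key`, `nonkey`, `item`, and `merged_item` come from the description.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ConversionSpec:
    root_tag: str
    record_tag: str
    key_tags: list[str]
    nonkey_tags: list[str]
    merged_tag: str


def parse_spec(spec_xml: str) -> ConversionSpec:
    """Read the key/non-key split and the merged element name from a spec document."""
    spec = ET.fromstring(spec_xml)
    return ConversionSpec(
        root_tag=spec.findtext("structure/root"),
        record_tag=spec.findtext("structure/record"),
        key_tags=[item.text for item in spec.findall("key/item")],
        nonkey_tags=[item.text for item in spec.findall("nonkey/item")],
        # <merged_item> names the new element that will hold the non-key elements
        merged_tag=spec.findtext("nonkey/merged_item"),
    )


# Hypothetical spec document modeled on the description of FIG. 27.
SPEC = """<spec>
  <structure><root>namelist</root><record>person</record></structure>
  <key><item>name</item><item>company</item></key>
  <nonkey>
    <merged_item>information</merged_item>
    <item>department</item><item>address</item><item>phone</item>
  </nonkey>
</spec>"""
```

Because `findall("nonkey/item")` matches only `<item>` children, the `<merged_item>` entry is kept separate from the list of non-key tag names.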

The XSLT conversion unit 11 shown in FIG. 2 then reads the conversion specification document shown in FIG. 27 and, using that document together with an automatic conversion style sheet (automatic conversion XSL sheet; not shown), generates the structure conversion style sheet (XSL sheet) shown in FIG. 28 and the inverse conversion style sheet (XSL sheet) shown in FIG. 29. The structure conversion style sheet shown in FIG. 28 is read by the XSLT structure conversion unit 12 and used to apply the data structure conversion to the XML document to be converted (the input XML document). The inverse conversion style sheet shown in FIG. 29 is read by the XSLT inverse conversion unit 13 and used to restore the XML document processed by the application software 30 (the extracted XML document, i.e. the converted XML document) to its original format, that is, to an XML document in which the non-key elements have been returned to their original state.

So far, the case where the non-key elements in each record form a single level and have no attributes has been described. The conversion method of the second embodiment can, however, also be applied by extending the above principle when the non-key elements form multiple levels (a deeper hierarchy) or have attributes. That is, for each level, the symbols related to the tags of the non-key elements are replaced with entity reference strings, and the replacement result is then either placed as the content of a new element at the same level, placed as an attribute value of a new element at the same level, or described as the content of the parent element or as the value of a new attribute of the parent element.
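The extension to nested, attributed non-key elements works because each subtree is serialized as a whole before its tag symbols are replaced. The Python sketch below shows this string-level view: `xml.sax.saxutils.escape` replaces `&`, `<`, and `>` with `&amp;`, `&lt;`, and `&gt;`, and `unescape` reverses it. It assumes that Table 1's tagging-related symbols are covered by these three characters; the `pack`/`unpack` names and the sample `address` subtree are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape


def pack(elements: list[ET.Element]) -> str:
    """Serialize non-key subtrees (any depth, with attributes) and entity-escape the markup."""
    markup = ""
    for el in elements:
        el.tail = None                      # tostring() would otherwise append trailing text
        markup += ET.tostring(el, encoding="unicode")
    return escape(markup)                   # & -> &amp;, < -> &lt;, > -> &gt;


def unpack(packed: str) -> list[ET.Element]:
    """Undo pack(): restore the symbols, then re-parse the subtrees."""
    return list(ET.fromstring(f"<wrap>{unescape(packed)}</wrap>"))
```

The packed string is what appears verbatim in the serialized document; passing it to a DOM or ElementTree serializer as text would escape it a second time.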

FIG. 30 is a flowchart (steps S1, S2, S5, and S6) illustrating, for the second embodiment, the procedure for creating a conversion specification document when the non-key elements in a record form a hierarchical structure and have attributes. The procedure shown in FIG. 30 applies when the number of levels in a record is arbitrary and the non-key elements have arbitrary attributes. The conversion specification document created by this procedure implements the conversion method described with reference to FIG. 23.

To create a conversion specification document when the non-key elements in a record form a hierarchical structure and have attributes, as shown in FIG. 30, the tag names of the root and of the record are first specified in the element “structure” (step S1). The elements in the record are then divided into two groups, key elements and non-key elements (step S2). The tag name of each key element is specified at an <item> within <key> (step S5), and the tag name of each non-key element is specified at an <item> within <nonkey> (step S6).

In step S6, the information on the non-key elements is described in the conversion specification document according to the following procedures (1) and (2).
Procedure (1): Specify, in <merged_item>, the tag name of the new element that describes the non-key elements combined into one.
Procedure (2): Describe the tag name of each non-key element after an <item> tag.
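The steps of FIG. 30 can be sketched in Python as follows. The sketch assumes the caller already knows which of the record's direct children are key elements, lists only those direct children, and uses the same hypothetical wrapper `spec` and `root`/`record` children as an illustration; none of these names are taken from the figure.

```python
import xml.etree.ElementTree as ET


def build_spec(root_tag: str, record_tag: str, record_children: list,
               key_tags: set, merged_tag: str) -> str:
    spec = ET.Element("spec")                               # hypothetical wrapper element
    structure = ET.SubElement(spec, "structure")            # S1: root and record tag names
    ET.SubElement(structure, "root").text = root_tag
    ET.SubElement(structure, "record").text = record_tag
    keys = [t for t in record_children if t in key_tags]    # S2: split the record's children
    nonkeys = [t for t in record_children if t not in key_tags]
    key_el = ET.SubElement(spec, "key")                     # S5: one <item> per key element
    for tag in keys:
        ET.SubElement(key_el, "item").text = tag
    nonkey_el = ET.SubElement(spec, "nonkey")               # S6
    ET.SubElement(nonkey_el, "merged_item").text = merged_tag   # procedure (1)
    for tag in nonkeys:                                     # procedure (2)
        ET.SubElement(nonkey_el, "item").text = tag
    return ET.tostring(spec, encoding="unicode")
```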

[2-5] Specific Conversion Processing Procedures of the Conversion Method of the Second Embodiment
Next, the conversion processing procedures of the structured document conversion method according to the second embodiment of the present invention will be described with reference to FIGS. 31 to 39.
FIGS. 31 to 38 show processing procedures in which, as in the procedures described with reference to FIGS. 18 and 19, the data structure conversion and inverse conversion are executed by Java software using DOM, XSLT, and the like. Steps in FIGS. 31 to 38 that carry the same step numbers as in FIGS. 18 and 19 execute the same or substantially the same processing as described for those figures, so their description is omitted; that is, the following description omits the processing in steps A1 to A11, A15, A16, B1 to B11, B14, and B15. The processing procedures shown in FIGS. 31 to 38 process the XML document to be converted and the converted XML document based on the conversion specification document without using the data structure conversion/inverse conversion mechanism 10 shown in FIG. 2.

[2-5-1] First Example of the Conversion/Inverse Conversion Processing Procedure
FIG. 31 is a flowchart illustrating a first example of the processing procedure for applying the data structure conversion to an XML document to be converted based on the conversion specification document, and FIG. 32 is a flowchart illustrating a first example of the processing procedure for applying the inverse data structure conversion to a converted XML document (processed XML document) based on the conversion specification document. This first example corresponds to the conversion method described with reference to FIG. 23.

In the first example of the conversion procedure shown in FIG. 31, when the element data extracted in step A7 is a non-key element (YES route of step A11), a new element with the tag name “information” specified in advance by the conversion specification document (an <information> tag) is created (step A31). If a new element for the non-key elements has already been created, this creation is skipped.
Next, the tag symbols “<” and “>” in the description of the non-key element are replaced with the entity reference strings “&lt;” and “&gt;”, respectively (step A32). In step A32, if a character identical to a tagging-related symbol (see Table 1) appears in the content of the non-key element, that character is also replaced with its entity reference string.

If the new element was created in step A31, the replacement result string from step A32 is described as the content of the new element. If a new element for the non-key elements had already been created, the replacement result string from step A32 is appended after the replacement result strings already in the content of the new element (step A33). The process then returns to step A7.
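Steps A31 to A33 can be sketched with Python's ElementTree standing in for the Java DOM code of the patent. The spec values are hard-coded English stand-ins (hypothetical), and the new element is placed where the record's first non-key element was, which the text does not specify. Instead of replacing `<` and `>` by hand, the sketch relies on the serializer, which writes `<`, `>`, and `&` in text content as `&lt;`, `&gt;`, and `&amp;`.

```python
import xml.etree.ElementTree as ET

# Hypothetical values that the FIG. 27 spec would supply.
RECORD_TAG = "person"
NONKEY_TAGS = {"department", "address", "phone"}
MERGED_TAG = "information"


def convert_first(xml_text: str) -> str:
    """Merge each record's non-key children into one element (steps A31-A33)."""
    root = ET.fromstring(xml_text)
    for record in list(root.iter(RECORD_TAG)):
        merged = None
        for child in list(record):                  # copy: the record is modified below
            if child.tag not in NONKEY_TAGS:
                continue                            # key element: kept as is
            if merged is None:                      # A31: create the new element once per record
                merged = ET.Element(MERGED_TAG)
                merged.text = ""
                record.insert(list(record).index(child), merged)
            child.tail = None                       # keep only the element's own markup
            # A32/A33: append the markup as text; on output it becomes &lt;...&gt;
            merged.text += ET.tostring(child, encoding="unicode")
            record.remove(child)
    return ET.tostring(root, encoding="unicode")
```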
In the first example of the inverse conversion procedure shown in FIG. 32, on the other hand, when the new element combining the non-key elements (the <information> tag) is extracted in step B7 (YES route of step B11), the strings “&lt;” and “&gt;” in the description of the content of the new element are restored to the original tag symbols “<” and “>” (step B31). In step B31, if the content of the new element contains other entity reference strings, those strings are also restored to the original tagging-related symbols (see Table 1). Then, in the restored XML document, the description of the element combining the non-key elements (the <information> tag) is deleted (step B32), and the process returns to step B7.
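A matching Python sketch of steps B31 and B32, again using ElementTree as a stand-in for DOM and a hypothetical merged tag name. When the converted document is parsed, the XML parser itself turns `&lt;` and `&gt;` back into `<` and `>`, so step B31 amounts to re-parsing the element's text; the restored elements are put back where the merged element stood.

```python
import xml.etree.ElementTree as ET

MERGED_TAG = "information"   # hypothetical; would come from <merged_item> in the spec


def inverse_first(xml_text: str) -> str:
    """Replace each merged element by the non-key elements it carries (steps B31-B32)."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for merged in [c for c in parent if c.tag == MERGED_TAG]:
            # B31: the parser has already restored '<', '>' and '&' in merged.text
            restored = list(ET.fromstring(f"<wrap>{merged.text or ''}</wrap>"))
            if restored:
                restored[-1].tail = merged.tail     # keep text that followed the merged element
            pos = list(parent).index(merged)
            parent.remove(merged)                   # B32: delete the merged element ...
            for offset, el in enumerate(restored):  # ... and put the originals in its place
                parent.insert(pos + offset, el)
    return ET.tostring(root, encoding="unicode")
```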

[2-5-2] Second Example of the Conversion/Inverse Conversion Processing Procedure
FIG. 33 is a flowchart illustrating a second example of the processing procedure for applying the data structure conversion to an XML document to be converted based on the conversion specification document, and FIG. 34 is a flowchart illustrating a second example of the processing procedure for applying the inverse data structure conversion to a converted XML document (processed XML document) based on the conversion specification document. This second example corresponds to the conversion method described with reference to FIG. 24.

In the second example of the conversion procedure shown in FIG. 33, when the element data extracted in step A7 is a non-key element (YES route of step A11), a new element (an <information> tag) having the tag name “information” and the attribute name “contents” is created (step A34). If a new element for the non-key elements has already been created, this creation is skipped.
Next, the tag symbols “<” and “>” in the description of the non-key element are replaced with the entity reference strings “&lt;” and “&gt;”, respectively (step A35). In step A35, if a character identical to a tagging-related symbol (see Table 1) appears in the content of the non-key element, that character is also replaced with its entity reference string.

If the new element was created in step A34, the replacement result string from step A35 is described as the “contents” attribute value of the new element. If a new element for the non-key elements had already been created, the replacement result string from step A35 is appended after the replacement result strings already in the “contents” attribute value of the new element (step A36). The process then returns to step A7.
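Steps A34 to A36 differ from the first example only in where the markup is stored. In this Python sketch (ElementTree standing in for DOM; hard-coded, hypothetical spec values; new element placed at the first non-key position as an assumption), the markup accumulates in the `contents` attribute, and the serializer escapes `<`, `>`, `&`, and `"` in attribute values.

```python
import xml.etree.ElementTree as ET

RECORD_TAG = "person"                               # hypothetical spec values
NONKEY_TAGS = {"department", "address", "phone"}
MERGED_TAG, ATTR = "information", "contents"


def convert_second(xml_text: str) -> str:
    """Collect each record's non-key markup in the 'contents' attribute of a new element (A34-A36)."""
    root = ET.fromstring(xml_text)
    for record in list(root.iter(RECORD_TAG)):
        merged = None
        for child in list(record):
            if child.tag not in NONKEY_TAGS:
                continue                            # key element: kept as is
            if merged is None:                      # A34: new element with an empty 'contents'
                merged = ET.Element(MERGED_TAG, {ATTR: ""})
                record.insert(list(record).index(child), merged)
            child.tail = None
            # A35/A36: append to the attribute value; escaped when serialized
            merged.set(ATTR, merged.get(ATTR) + ET.tostring(child, encoding="unicode"))
            record.remove(child)
    return ET.tostring(root, encoding="unicode")
```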

In the second example of the inverse conversion procedure shown in FIG. 34, on the other hand, when the new element combining the non-key elements (the <information> tag) is extracted in step B7 (YES route of step B11), the strings “&lt;” and “&gt;” in the description of the “contents” attribute value of the new element are restored to the original tag symbols “<” and “>” (step B33). In step B33, if the “contents” attribute value of the new element contains other entity reference strings, those strings are also restored to the original tagging-related symbols (see Table 1).

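Then, in the restored XML document, the description of the element combining the non-key elements (the <information> tag) is deleted, and the “contents” attribute value of that element (as restored in step B33) is inserted as element content next to the key elements (step B34). The process then returns to step B7.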
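A Python sketch of steps B33 and B34 under the same assumptions (ElementTree in place of DOM, hypothetical tag and attribute names). The parser has already unescaped the attribute value, so restoring the non-key elements means re-parsing that value and inserting the result where the merged element was, i.e. after the key elements.

```python
import xml.etree.ElementTree as ET

MERGED_TAG, ATTR = "information", "contents"        # hypothetical spec values


def inverse_second(xml_text: str) -> str:
    """Expand the 'contents' attribute of each merged element back into elements (B33-B34)."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for merged in [c for c in parent if c.tag == MERGED_TAG and ATTR in c.attrib]:
            # B33: the attribute value already holds '<' and '>' again after parsing
            restored = list(ET.fromstring(f"<wrap>{merged.get(ATTR)}</wrap>"))
            pos = list(parent).index(merged)
            parent.remove(merged)                   # B34: drop the merged element ...
            for offset, el in enumerate(restored):  # ... and insert its contents in its place
                parent.insert(pos + offset, el)
    return ET.tostring(root, encoding="unicode")
```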
[2-5-3] Third Example of the Conversion/Inverse Conversion Processing Procedure
FIG. 35 is a flowchart illustrating a third example of the processing procedure for applying the data structure conversion to an XML document to be converted based on the conversion specification document, and FIG. 36 is a flowchart illustrating a third example of the processing procedure for applying the inverse data structure conversion to a converted XML document (processed XML document) based on the conversion specification document. This third example corresponds to the conversion method described with reference to FIG. 25.

In the third example of the conversion procedure shown in FIG. 35, when the element data extracted in step A7 is a non-key element (YES route of step A11), a new attribute with the attribute name “contents” is set on the parent element (the <person> tag) (step A37). If the new attribute has already been set, this step is skipped.
Next, the tag symbols “<” and “>” in the description of the non-key element are replaced with the entity reference strings “&lt;” and “&gt;”, respectively (step A38). In step A38, if a character identical to a tagging-related symbol (see Table 1) appears in the content of the non-key element, that character is also replaced with its entity reference string.

If the new attribute was set in step A37, the replacement result string from step A38 is described as the “contents” attribute value of the parent element. If the new attribute for the non-key elements had already been set, the replacement result string from step A38 is appended after the replacement result strings already in the “contents” attribute value of the parent element (step A39). The process then returns to step A7.
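Steps A37 to A39 can be sketched as follows, with ElementTree standing in for DOM and hard-coded, hypothetical spec values. No new element is created; the record element itself receives the `contents` attribute. The sketch assumes the record does not already carry an attribute named `contents`.

```python
import xml.etree.ElementTree as ET

RECORD_TAG = "person"                               # hypothetical spec values
NONKEY_TAGS = {"department", "address", "phone"}
ATTR = "contents"


def convert_third(xml_text: str) -> str:
    """Move each record's non-key markup into a 'contents' attribute of the record (A37-A39)."""
    root = ET.fromstring(xml_text)
    for record in list(root.iter(RECORD_TAG)):
        for child in list(record):
            if child.tag not in NONKEY_TAGS:
                continue                            # key element: kept as is
            child.tail = None
            # A37: attribute created on first use; A38/A39: markup appended (escaped on output)
            record.set(ATTR, record.get(ATTR, "") + ET.tostring(child, encoding="unicode"))
            record.remove(child)
    return ET.tostring(root, encoding="unicode")
```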

In the third example of the inverse conversion procedure shown in FIG. 36, on the other hand, step B9′ is executed in place of steps B9 and B11 described above. In step B9′, it is determined whether the element extracted in step B7 is a merge parent element, that is, an element that holds the non-key elements collectively as its “contents” attribute value (here, a <person> tag having a “contents” attribute value).

If it is not a merge parent element (NO route of step B9′), the process moves to step B10 described above. If it is a merge parent element (YES route of step B9′), the strings “&lt;” and “&gt;” in the description of the “contents” attribute value of the parent element are restored to the original tag symbols “<” and “>” (step B35). In step B35, if the “contents” attribute value of the parent element contains other entity reference strings, those strings are also restored to the original tagging-related symbols (see Table 1).

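Then, in the restored XML document, the description of the attribute set on the parent element for the non-key elements is deleted, and its “contents” attribute value (as restored in step B35) is inserted as element content next to the description of the original child elements (step B36). The process then returns to step B7.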
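A Python sketch of steps B9′, B35, and B36 under the same assumptions (ElementTree in place of DOM, hypothetical names). A record counts as a merge parent when it has a `contents` attribute; the attribute is removed and its parsed value is appended after the record's remaining (key) children.

```python
import xml.etree.ElementTree as ET

RECORD_TAG, ATTR = "person", "contents"             # hypothetical spec values


def inverse_third(xml_text: str) -> str:
    """Restore non-key elements from each record's 'contents' attribute (B9', B35, B36)."""
    root = ET.fromstring(xml_text)
    for record in list(root.iter(RECORD_TAG)):
        packed = record.attrib.pop(ATTR, None)      # B9': merge parent? (B36 also drops the attribute)
        if packed is None:
            continue                                # not a merge parent: left as is
        # B35: the parser has already restored '<' and '>' in the attribute value
        restored = list(ET.fromstring(f"<wrap>{packed}</wrap>"))
        record.extend(restored)                     # B36: after the original child elements
    return ET.tostring(root, encoding="unicode")
```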
[2-5-4] Fourth Example of the Conversion/Inverse Conversion Processing Procedure
FIG. 37 is a flowchart illustrating a fourth example of the processing procedure for applying the data structure conversion to an XML document to be converted based on the conversion specification document, and FIG. 38 is a flowchart illustrating a fourth example of the processing procedure for applying the inverse data structure conversion to a converted XML document (processed XML document) based on the conversion specification document. This fourth example corresponds to the conversion method described with reference to FIG. 26.

In the fourth example of the conversion procedure shown in FIG. 37, when the element data extracted in step A7 is a non-key element (YES route of step A11), the tag symbols “<” and “>” in the description of the non-key element are replaced with the entity reference strings “&lt;” and “&gt;”, respectively (step A40). In step A40, if a character identical to a tagging-related symbol (see Table 1) appears in the content of the non-key element, that character is also replaced with its entity reference string. The replacement result string from step A40 is then described as the content of the parent element of the non-key element (the <person> tag) (step A41). The process then returns to step A7.
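Steps A40 and A41 can be sketched in Python as below (ElementTree in place of DOM; hypothetical spec values). The text does not say where in the parent's content the packed string goes; this sketch assumes it becomes the record's leading text, before the key elements, which makes the packed string easy to find again during inverse conversion.

```python
import xml.etree.ElementTree as ET

RECORD_TAG = "person"                               # hypothetical spec values
NONKEY_TAGS = {"department", "address", "phone"}


def convert_fourth(xml_text: str) -> str:
    """Store each record's non-key markup as text content of the record (A40-A41)."""
    root = ET.fromstring(xml_text)
    for record in list(root.iter(RECORD_TAG)):
        packed = ""
        for child in list(record):
            if child.tag in NONKEY_TAGS:
                child.tail = None
                packed += ET.tostring(child, encoding="unicode")   # A40: escaped on output
                record.remove(child)
        if packed:
            record.text = packed                    # A41: assumed placement before the key elements
    return ET.tostring(root, encoding="unicode")
```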

In the fourth example of the inverse conversion procedure shown in FIG. 38, on the other hand, step B9″ is executed in place of step B9′ described above. In step B9″, it is determined whether the element extracted in step B7 is a merge parent element that holds the non-key elements collectively as its element content.
If it is not a merge parent element (NO route of step B9″), the process moves to step B10 described above. If it is a merge parent element (YES route of step B9″), the strings “&lt;” and “&gt;” in the description of the element content of the parent element are restored to the original tag symbols “<” and “>” (step B37). In step B37, if the element content of the parent element contains other entity reference strings, those strings are also restored to the original tagging-related symbols (see Table 1). Then, in the restored XML document, the result restored in step B37 is inserted as element content next to the description of the original child elements (step B38), and the process returns to step B7.
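A Python sketch of steps B9″, B37, and B38 that matches the placement assumed in the conversion sketch (packed markup as the record's leading text; ElementTree in place of DOM; hypothetical names). A record with non-whitespace leading text is treated as a merge parent.

```python
import xml.etree.ElementTree as ET

RECORD_TAG = "person"                               # hypothetical spec value


def inverse_fourth(xml_text: str) -> str:
    """Restore non-key elements from each record's text content (B9'', B37, B38)."""
    root = ET.fromstring(xml_text)
    for record in list(root.iter(RECORD_TAG)):
        packed = (record.text or "").strip()
        if not packed:
            continue                                # B9'': no packed text, not a merge parent
        record.text = None
        # B37: parsing already restored '<' and '>'; B38: insert after the original children
        record.extend(list(ET.fromstring(f"<wrap>{packed}</wrap>")))
    return ET.tostring(root, encoding="unicode")
```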

[2-5-5] Fifth Example of the Conversion/Inverse Conversion Processing Procedure
FIGS. 39(A) to 39(D) show the processing procedure when the data structure conversion/inverse conversion according to the second embodiment is executed by an XSLT processor alone. That is, the processing procedure shown in FIGS. 39(A) to 39(D) processes the XML document to be converted and the converted XML document based on the conversion specification document using the data structure conversion/inverse conversion mechanism 10 shown in FIG. 2.

FIGS. 39(A) and 39(B) are flowcharts illustrating, respectively, the procedures for creating the conversion style sheet and the inverse conversion style sheet in the second embodiment (processing in the XSLT conversion unit 11).
FIG. 39(C) is a flowchart illustrating the processing procedure (the fifth example of the conversion procedure) by which the XSLT structure conversion unit 12 applies the data structure conversion to the XML document to be converted based on the structure conversion style sheet in the second embodiment, and FIG. 39(D) is a flowchart illustrating the processing procedure (the fifth example of the inverse conversion procedure) by which the XSLT inverse conversion unit 13 applies the inverse data structure conversion to the converted XML document (processed XML document) based on the inverse conversion style sheet.

Before the XML document to be converted is processed, as shown in FIG. 39(A), the XSLT conversion unit 11 reads the conversion specification document described in XML, analyzes the conversion specification from its description (step A1), and creates a data structure conversion style sheet using the conversion specification and the automatic conversion style sheet (step A20). Similarly, as shown in FIG. 39(B), the XSLT conversion unit 11 reads the conversion specification document described in XML, analyzes the conversion specification from its description (step B1), and creates a data structure inverse conversion style sheet using the conversion specification and the automatic conversion style sheet (step B20). The processing procedures described with reference to FIGS. 39(A) and 39(B) are the same as those described with reference to FIGS. 20(A) and 20(B) in the first embodiment.

When the data structure conversion is applied to the XML document to be converted, as shown in FIG. 39(C), the XSLT structure conversion unit 12 specifies the XML document to be converted and the structure conversion style sheet and starts the conversion process (step A21). The XSLT structure conversion unit 12 then executes processing similar to that from step A2 onward in FIG. 31, 33, 35, or 37, according to which of the four conversion methods (described with reference to FIGS. 23 to 26, respectively) has been selected.

Conversely, when the inverse data structure conversion is applied to the converted XML document, as shown in FIG. 39(D), the XSLT inverse conversion unit 13 specifies the XML document to be inversely converted and the inverse conversion style sheet and starts the inverse conversion process (step B21). The XSLT inverse conversion unit 13 then executes processing similar to that from step B2 onward in FIG. 32, 34, 36, or 38, according to which of the four conversion methods (described with reference to FIGS. 23 to 26, respectively) has been selected.

In this way, in the second embodiment as well, as shown in FIG. 2, the application software 30 performs processing such as tag searches, through the standard API (DOM) 20, on the converted XML document output by the XSLT structure conversion unit 12, in which the number of elements has been reduced; as in the first embodiment, the processing speed of the application software 30 is therefore greatly increased.
[2-6] Effects of the Second Embodiment
As described above, according to the structured document conversion method of the second embodiment of the present invention, the elements constituting the XML document to be converted are divided into key elements and non-key elements, and the document is converted into an XML document in which the key elements are described as they are, while the non-key elements are combined into one tag and the tag symbols in their description are replaced with character strings unrelated to tagging. The same effects and advantages as with the structured document conversion method of the first embodiment are therefore obtained. Moreover, by replacing the tag symbols “<” and “>” with the entity reference strings “&lt;” and “&gt;”, respectively, the XML document can be converted very easily.

〔３〕第３実施形態の説明
〔３−１〕第３実施形態の構造化文書変換方法の原理
図１（Ａ），図３（Ａ）および図４０を参照しながら、本発明の第３実施形態としての構造化文書変換方法の原理について説明する。
図１（Ａ）および図３（Ａ）により前述したＸＭＬ文書において、タグ名“名前”，“会社”の要素をキー要素とするとともにタグ名“部署”，“住所”，“電話”の要素を非キー要素とし、このＸＭＬ文書に対し、第３実施形態の構造化文書変換方法を適用して得られた変換後ＸＭＬ文書のメモリ展開形式を図４０に示す。なお、ここで示す展開形式は、応用ソフトウエアが標準ＡＰＩ（ＤＯＭ）を介して変換後ＸＭＬ文書を操作するときの、メモリ上への展開形式である。 [3] Description of Third Embodiment [3-1] Principle of Structured Document Conversion Method of Third Embodiment Referring to FIG. 1 (A), FIG. 3 (A), and FIG. The principle of the structured document conversion method as an embodiment will be described.
In the XML document described above with reference to FIGS. 1(A) and 3(A), the elements with tag names “name” and “company” are treated as key elements, and the elements with tag names “department”, “address”, and “telephone” are treated as non-key elements. FIG. 40 shows the in-memory representation of the converted XML document obtained by applying the structured document conversion method of the third embodiment to this XML document. The representation shown here is the form into which the converted XML document is expanded in memory when the application software operates on it via the standard API (DOM).

In the XML document shown in FIG. 40, a new element with the tag name “compressed” is created, and its content is a compressed character string obtained by compressing a character string that groups together the non-key elements with tag names “department”, “address”, and “telephone”. This compressed character string is produced by the data conversion method of the present invention, described later with reference to FIG. 41(A). The key elements with tag names “name” and “company” are described as they are.

In this way, the non-key elements, which are combined into one element per record in the converted XML document, are converted into a compressed character string by a predetermined data conversion method. This greatly reduces the number of elements contained in the XML document, that is, the number of child elements of the tree expanded in memory, and allows the non-key elements to be handled collectively during expansion and data processing.
The compressed character string may be described in the converted XML document either as the content of the new element (see FIG. 40 and FIG. 44(A)) or as an attribute value of the new element (see FIG. 44(B)). FIG. 40, which explains the principle of the conversion method of the third embodiment, shows the DOM tree of the converted XML document when the compressed character string is described as the content of the new element.
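The content-versus-attribute choice can be illustrated with a short Python sketch using `xml.etree.ElementTree`. The patent's own compressor (FIG. 41(A)) is replaced here by plain Base64 of the UTF-8 bytes, which only makes the data ASCII-safe and does not compress it. `convert_record`, `stand_in_compress`, and the sample values are hypothetical names chosen for illustration.

```python
import base64
import xml.etree.ElementTree as ET

def stand_in_compress(text: str) -> str:
    # Stand-in for the dictionary + variable-length + 6-bit packing pipeline:
    # plain Base64 of the UTF-8 bytes (ASCII-safe, but not actually compressed).
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def convert_record(record: ET.Element, key_tags: set, *, as_attribute: bool = False,
                   new_tag: str = "compressed", attr_name: str = "info") -> ET.Element:
    out = ET.Element(record.tag)
    non_key_parts = []
    for child in record:
        if child.tag in key_tags:
            out.append(child)  # key elements are kept as they are
        else:
            non_key_parts.append(ET.tostring(child, encoding="unicode"))
    packed = stand_in_compress("".join(non_key_parts))
    new_elem = ET.SubElement(out, new_tag)
    if as_attribute:
        new_elem.set(attr_name, packed)  # FIG. 44(B) style: empty element with an attribute
    else:
        new_elem.text = packed           # FIG. 40 / FIG. 44(A) style: element content
    return out

NON_KEY = ("<department>Dept. A</department><address>City A</address>"
           "<telephone>123</telephone>")
record = ET.fromstring("<person><name>Name1</name><company>Company1</company>"
                       + NON_KEY + "</person>")

as_content = convert_record(record, {"name", "company"})
as_attr = convert_record(record, {"name", "company"}, as_attribute=True)
print([c.tag for c in as_content])  # ['name', 'company', 'compressed']
print(ET.tostring(as_attr, encoding="unicode"))
```

Either form leaves three children per record; the attribute form additionally keeps the new element empty.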

[3-2] Data Conversion Method (Data Compression/Decompression Method) in Third Embodiment
As described above for prior art 3, a compressed file is usually binary data and therefore cannot be placed in an XML document, which consists only of character codes.
Therefore, if the binary compressed data is converted into character codes, as in the data conversion method of the present invention, the compressed data (compressed character string) can be described as element content or as an attribute value in the XML document.

However, care must be taken that the set of character codes used for the compressed character string does not include any character code that has a special meaning in the structured document. In the case of an XML document, these special character codes are the tagging-related symbols <, >, &, ", and ' shown in Table 1.
Furthermore, an XML document can use various character encodings (UTF-8, UTF-16, Shift_JIS, EUC, etc.). If the compressed data were simply expressed as character codes, then whenever the encoding of the XML document was converted, the compressed character string representing the compressed data would be converted automatically as well, and the compressed data might no longer be restorable to its original state.

In view of these precautions and problems, the data conversion method of the present invention uses, as the character codes that express the compressed data (compressed character string), ASCII codes excluding the character codes related to tagging. ASCII is a character code set included in common in the various character encodings. Therefore, if the compressed character string is described in ASCII codes, the bit string constituting it is preserved unchanged even when the XML document containing it is converted to another character encoding.
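A quick Python check illustrates why an ASCII-only alphabet is robust. Base64 text is used here as a stand-in for the patent's LUT-based output: it contains none of the XML tagging symbols, and its characters survive re-encoding in the encodings mentioned above. The Python codec names, in particular `euc_jp` for "EUC", are assumptions. Note that in UTF-16 the stored bytes differ (two bytes per character); what survives in every case is the character sequence.

```python
import base64

XML_SPECIAL = set("<>&\"'")

def to_ascii_text(data: bytes) -> str:
    # Base64 uses only A-Z, a-z, 0-9, '+', '/' (plus '=' padding): all plain ASCII
    return base64.b64encode(data).decode("ascii")

blob = bytes(range(256))        # arbitrary binary "compressed" data
text = to_ascii_text(blob)
print(set(text) & XML_SPECIAL)  # set(): no tagging symbols appear

for codec in ("utf-8", "utf-16", "shift_jis", "euc_jp"):
    stored = text.encode(codec)    # byte form is identical in ASCII-compatible codecs
    reread = stored.decode(codec)  # and the characters read back are identical in all four
    assert base64.b64decode(reread) == blob
```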

As described later with reference to FIG. 45, information indicating the type of character encoding at the time of compression is attached to the compressed character string. This allows the encoding of the data restored from the compressed character string to be identified; by converting that data to the current encoding of the XML document, the encoding of the entire XML document can be kept consistent.
The data conversion method (data compression/decompression method) used in the third embodiment will now be described more specifically with reference to FIG. 41(A), FIG. 41(B), and FIG. 42. FIG. 41(A) illustrates the flow of the data conversion processing (compression), FIG. 41(B) illustrates the flow of the inverse data conversion processing (decompression), and FIG. 42 shows a specific example of the character code conversion lookup table (LUT) in the third embodiment.

When an input character string (in this embodiment, the character string forming the non-key elements) is compressed and packed into character codes, as shown in FIG. 41(A), the input character string is first collated with the words (character strings) registered in the compression static word dictionary (static dictionary) 41. The word giving the longest match with an entry of the dictionary 41 is successively cut out of the input character string, and each cut-out word is replaced with its dictionary number (step S11).
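Step S11 can be sketched as a greedy longest-match tokenizer. The dictionary contents below are hypothetical (a real static dictionary 41 would be built from sample frequencies), and single printable ASCII characters are included only so that any ASCII input can be tokenized.

```python
import string

# Hypothetical static dictionary 41: a few frequent words plus all single
# printable ASCII characters, numbered in order of registration.
WORDS = ["<department>", "</department>", "<address>", "</address>",
         "<telephone>", "</telephone>", "Dept. ", "City "]
STATIC_DICT = {w: i for i, w in enumerate(dict.fromkeys(WORDS + list(string.printable)))}
MAX_LEN = max(len(w) for w in STATIC_DICT)

def tokenize(text: str) -> list:
    """Replace the longest dictionary word at each position with its number (step S11)."""
    numbers, pos = [], 0
    while pos < len(text):
        for length in range(min(MAX_LEN, len(text) - pos), 0, -1):
            number = STATIC_DICT.get(text[pos:pos + length])
            if number is not None:
                numbers.append(number)
                pos += length
                break
        else:
            raise ValueError(f"no dictionary entry for {text[pos]!r}")
    return numbers

print(tokenize("<department>Dept. A</department>"))  # [0, 6, 44, 1]
```

Trying the longest candidate first is what makes "<department>" win over the single character "<".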

The data compression technique using the static word dictionary 41 is a known technique that uses a dictionary created in advance. It is disclosed, for example, in Japanese Patent Laid-Open No. 3-247167 (dictionary registration method and data compression method), Japanese Patent Laid-Open No. 4-80813 (dictionary initialization method), and Japanese Patent Laid-Open No. 6-222903 (method and means for providing a static dictionary structure for compressing character data and expanding compressed data). The static word dictionaries 41 and 44 in the third embodiment are created in advance by examining the appearance frequencies in sample data.

Next, the code table 42, which holds variable-length codewords assigned according to appearance frequency, is consulted: the variable-length code corresponding to each fixed-length dictionary number is retrieved and substituted for that number, and the codes are then bit-packed into byte-wise data. In this packing, each byte receives 6 bits of the binary data obtained by the variable-length coding (step S12). That is, step S12 performs variable-length coding (statistical data compression), assigning shorter codes to the characters or character strings to be converted (in this embodiment, dictionary numbers) that appear more frequently, and packs the resulting binary data 6 bits at a time into 1-byte converted data for output.
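Step S12 can be sketched in two parts: building a frequency-based variable-length code table, then packing the resulting bits six at a time into bytes whose values are therefore 0 to 63. Using a Huffman code for code table 42 is an assumption (the text only says that more frequent symbols get shorter codes). Zero-padding the final 6-bit group and returning the pad length separately is likewise a hypothetical choice, since the text does not say how the tail is handled.

```python
import heapq
from collections import Counter

def build_code_table(sample: list) -> dict:
    """Huffman code (an assumption): frequent symbols get shorter bit strings."""
    freq = Counter(sample)
    if len(freq) == 1:
        return {sym: "0" for sym in freq}
    # Heap entries: (count, unique tiebreak, {symbol: code suffix})
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def pack_6bit(numbers: list, table: dict) -> tuple:
    bits = "".join(table[n] for n in numbers)
    pad = -len(bits) % 6          # zero bits appended to fill the last group
    bits += "0" * pad
    # Each output byte holds one 6-bit group, so its value is 0..63
    packed = bytes(int(bits[i:i + 6], 2) for i in range(0, len(bits), 6))
    return packed, pad

symbols = [0, 1, 0, 2, 0, 1, 3]   # dictionary numbers from step S11
table = build_code_table(symbols)
packed, pad = pack_6bit(symbols, table)
print(table)                # {0: '0', 1: '10', 2: '110', 3: '111'}
print(list(packed), pad)    # [19, 11, 32] 5
```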

Thereafter, using a character code conversion lookup table (LUT) 45 such as the one shown in FIG. 42, each 1-byte converted data item (one byte holding 6 packed bits) is converted into a character code, and the conversion result is output as the compressed character string (step S13).
The LUT 45 is used for the character code conversion at the time of 6-bit packing (BASE64 encoding) described above, and defines the correspondence between the values 0 to 63 representable in 6 bits and the character codes assigned to them. In particular, the LUT 45 shown in FIG. 42 maps the 6-bit values 0 to 63, in order, to the character codes A to Z (0x41 to 0x5A), a to z (0x61 to 0x7A), 0 to 9 (0x30 to 0x39), + (0x2B), and / (0x2F).

The set of ASCII codes in the LUT 45 does not include the tag symbols “<” and “>”. That is, the LUT 45 registers a set of ASCII codes that excludes the character codes related to tagging in an XML document. The LUT conversion in step S13 therefore requires no special escape processing, such as converting tag symbols into other character strings unrelated to tagging.
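LUT 45 as described for FIG. 42 has exactly the standard Base64 alphabet and bit order, so it can be sketched in a few lines of Python. The names `LUT_45`, `lut_encode`, and `lut_decode` are illustrative. The checks confirm that none of the XML tagging symbols occur in the table, which is why step S13 needs no escaping.

```python
import base64
import string

# LUT 45 as described for FIG. 42: value v (0..63) -> LUT_45[v]
LUT_45 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
REVERSE_LUT = {ch: v for v, ch in enumerate(LUT_45)}
XML_TAGGING_SYMBOLS = set("<>&\"'")

def lut_encode(six_bit_values: bytes) -> str:
    """Step S13: map each 6-bit value (0..63) to one printable ASCII character."""
    return "".join(LUT_45[v] for v in six_bit_values)

def lut_decode(text: str) -> bytes:
    """Step S21 (inverse): map each character back to its 6-bit value."""
    return bytes(REVERSE_LUT[ch] for ch in text)

print(lut_encode(bytes([19, 11, 32])))    # TLg
print(set(LUT_45) & XML_TAGGING_SYMBOLS)  # set(): no escaping needed
# Same alphabet and bit order as Base64: 0x00 0x10 0x83 -> 0, 1, 2, 3 -> "ABCD"
print(lut_encode(bytes([0, 1, 2, 3])) == base64.b64encode(b"\x00\x10\x83").decode())
```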

Using such a LUT 45, the 6-bit data packed into each converted data item is converted into an ASCII character code (a code corresponding to a printable ASCII character), and the character codes obtained for the converted data items are output as the compression result, that is, as the compressed character string.
Conversely, when the compressed character string produced as described above is restored to the original character string, as shown in FIG. 41(B), LUT inverse conversion is first performed, converting each character code of the compressed character string into a value from 0 to 63 (a 6-bit value) based on the LUT 45 (step S21).

Thereafter, the 6-bit packing is undone: depacking (unpacking) extracts the 6-bit data from each 1-byte converted data item, and the extracted binary data is restored to fixed-length dictionary numbers based on the code table 43 (step S22).
Then, each dictionary number restored in step S22 is collated with the dictionary numbers of the decompression static word dictionary (static dictionary) 44 to read out the corresponding word (character string), and each dictionary number is replaced with that word, thereby restoring the original character string (step S23).
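Steps S21 to S23 reverse S13, S12, and S11 in that order. The Python sketch below runs a toy compressor and decompressor end to end. The five-word dictionary, the prefix code standing in for code tables 42 and 43, and passing the pad length alongside the compressed string are hypothetical simplifications, not details given in the text.

```python
import string

LUT_45 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
REVERSE_LUT = {ch: v for v, ch in enumerate(LUT_45)}

# Hypothetical static dictionaries 41/44 (one word list, indexed by number)
WORDS = ["<telephone>", "</telephone>", "1", "2", "3"]
# Hypothetical prefix-free code tables 42 (number -> code) and 43 (code -> number)
CODE_TABLE_42 = {0: "00", 1: "01", 2: "10", 3: "110", 4: "111"}
CODE_TABLE_43 = {code: num for num, code in CODE_TABLE_42.items()}

def compress(text: str) -> tuple:
    numbers, pos = [], 0
    while pos < len(text):  # S11: longest dictionary match at each position
        num = max((i for i, w in enumerate(WORDS) if text.startswith(w, pos)),
                  key=lambda i: len(WORDS[i]))
        numbers.append(num)
        pos += len(WORDS[num])
    bits = "".join(CODE_TABLE_42[n] for n in numbers)  # S12: variable-length codes
    pad = -len(bits) % 6
    bits += "0" * pad                                   # S12: fill the last 6-bit group
    chars = [LUT_45[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]  # S13
    return "".join(chars), pad

def decompress(packed: str, pad: int) -> str:
    values = [REVERSE_LUT[ch] for ch in packed]       # S21: LUT inverse conversion
    bits = "".join(format(v, "06b") for v in values)  # S22: unpack 6 bits per value
    if pad:
        bits = bits[:-pad]
    numbers, current = [], ""
    for bit in bits:                                  # S22: prefix codes -> numbers
        current += bit
        if current in CODE_TABLE_43:
            numbers.append(CODE_TABLE_43[current])
            current = ""
    return "".join(WORDS[n] for n in numbers)         # S23: numbers -> words

packed, pad = compress("<telephone>123</telephone>")
print(packed, pad)              # Ld 0
print(decompress(packed, pad))  # <telephone>123</telephone>
```

Because the code is prefix-free, the decoder can emit a number as soon as the accumulated bits match an entry, without any separators.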

The conversion/inverse conversion processing performed, using the data compression/decompression method described above, on an XML document as a representative structured document is described below.
[3-3] System of Third Embodiment and Flow of Conversion/Inverse Conversion Processing
FIG. 43 is a diagram for explaining a system to which the structured document conversion method of the third embodiment of the present invention is applied, and the flow of the conversion/inverse conversion processing in that system.

Creating a separate style sheet (XSL sheet) for each of many different kinds of XML documents is extremely laborious. To save this effort, in the third embodiment as well, the specification for converting the data structure of an XML document (record name, key tag names, non-key tag names, etc.) is written as an XML document (conversion specification document) that gives the conversion procedure, as described later with reference to FIG. 46, and conversion/inverse conversion of the XML document is executed based on that document, as described later with reference to FIGS. 47 and 48.

The system shown in FIG. 43 includes a data structure conversion/inverse conversion mechanism (processor) 10A, a standard API 20, and application software 30. The data structure conversion/inverse conversion mechanism 10A reads a conversion specification document (XML document) that describes information for distinguishing key elements from non-key elements, together with information about the new element (the element holding the compressed character string). It then converts the input XML document using structure conversion/compression software obtained from that conversion specification document, and outputs the converted XML document.

That is, the data structure conversion/inverse conversion mechanism 10A, operating under the structure conversion/compression software, creates a new element with a predetermined tag name (“compressed” in this embodiment). It compresses the characters or character strings forming the non-key elements by the data compression method described with reference to FIG. 41(A), using the compression static word dictionary 41, the code table 42, and the LUT 45, to generate a compressed character string, and describes that string as the content or an attribute of the new element in the converted XML document. The key elements are described as they are in the converted XML document.

The converted XML document is then subjected to data processing (for example, tag search) by the application software via the standard API 20, yielding a processed XML document. When a tag search is performed as the data processing, the search result is obtained in the form of an extracted XML document. This extracted XML document is read into the data structure conversion/inverse conversion mechanism 10A, which inversely converts it using decompression/structure inverse conversion software obtained from the conversion specification document, and outputs the final extraction result.

That is, the data structure conversion/inverse conversion mechanism 10A, operating under the decompression/structure inverse conversion software, uses the code table 43, the decompression static word dictionary 44, and the LUT 45 to restore the compressed character string held in the element with the predetermined tag name (“compressed” in this embodiment) to the original character string of the non-key elements, by the data decompression method described with reference to FIG. 41(B). It then uses the restored non-key elements to reconstruct and output the XML document with the original data structure. An XML document is thereby obtained as the final data processing result.
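The structural half of the inverse conversion can be sketched with `xml.etree.ElementTree`. As in a forward sketch, plain Base64 stands in for the FIG. 41(B) decompression, and `restore_record` and the sample values are hypothetical. The function accepts the compressed string either as element content (FIG. 44(A)) or as the `info` attribute (FIG. 44(B)).

```python
import base64
import xml.etree.ElementTree as ET

def stand_in_decompress(text: str) -> str:
    # Stand-in for the S21-S23 pipeline: plain Base64 decoding
    return base64.b64decode(text).decode("utf-8")

def restore_record(converted: ET.Element, new_tag: str = "compressed",
                   attr_name: str = "info") -> ET.Element:
    out = ET.Element(converted.tag)
    for child in converted:
        if child.tag != new_tag:
            out.append(child)  # key elements were stored unchanged
            continue
        packed = child.get(attr_name) or child.text or ""  # attribute or content form
        # Parse the restored non-key markup inside a temporary wrapper element
        wrapper = ET.fromstring("<wrapper>" + stand_in_decompress(packed) + "</wrapper>")
        out.extend(wrapper)    # re-insert the original non-key elements
    return out

NON_KEY = ("<department>Dept. A</department><address>City A</address>"
           "<telephone>123</telephone>")
packed = base64.b64encode(NON_KEY.encode("utf-8")).decode("ascii")
converted = ET.fromstring("<person><name>Name1</name><company>Company1</company>"
                          f"<compressed info=\"{packed}\"/></person>")
restored = restore_record(converted)
print([c.tag for c in restored])  # ['name', 'company', 'department', 'address', 'telephone']
```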

At this time, as shown in FIG. 43, the application software 30 performs processing such as tag search, via the standard API (DOM) 20, on the converted XML document from the data structure conversion/inverse conversion mechanism 10A, in which the number of elements has been reduced. As in the first and second embodiments, the processing speed of the application software 30 is therefore greatly increased.
When the application software 30 performs a tag search on the converted XML document, an XML document describing the records hit by the search (extracted XML document) is extracted and output. This extracted XML document is inversely converted by the data structure conversion/inverse conversion mechanism 10A as described above, yielding exactly the same search result (XML document) as if the application software 30 had performed the tag search on the original XML document.

Moreover, the XML document that the data structure conversion/inverse conversion mechanism 10A inversely converts describes only the small number of records extracted by the application software 30, so the overhead of the inverse conversion is hardly a problem. Consequently, by applying the data structure conversion of this embodiment in advance, processing that the application software 30 executes many times is greatly accelerated, and the amount of working memory used is also greatly reduced.

[3-4] XML Document Conversion Method and Specific Conversion Examples in Third Embodiment
FIGS. 44(A) and 44(B) show first and second specific examples, respectively, of the conversion results obtained by applying the structured document conversion method of the third embodiment to the tabular XML document shown in FIG. 4(A). Here again, the elements with tag names “name” and “company” are key elements, and the elements with tag names “department”, “address”, and “telephone” are non-key elements. In FIGS. 44(A) and 44(B), the portions underlined with wavy lines are the portions (character strings) compressed by the data compression method described with reference to FIG. 41(A).

In the first specific example shown in FIG. 44(A), the elements constituting the XML document to be converted are divided into key elements and non-key elements, and a new element with the tag name “compressed” is created. The non-key elements are grouped together and compressed by the data compression method described with reference to FIG. 41(A) to create a compressed character string, which is described as the content of the new element. The key elements are described as they are in the converted XML document, without any conversion.

That is, in the first record of the converted XML document shown in FIG. 44(A), the element with the tag name “compressed” holds, as its content, the compressed character string obtained by compressing the series of non-key elements <department>Dept. A</department><address>City A</address><telephone>123</telephone> by the data compression method described with reference to FIG. 41(A). Likewise, in the second record, the element with the tag name “compressed” holds, as its content, the compressed character string obtained by compressing the series of non-key elements <department>Dept. B</department><address>City B</address><telephone>456</telephone> by the same method.

In the second specific example shown in FIG. 44(B), the elements constituting the XML document to be converted are divided into key elements and non-key elements, and a new element (empty element) with the tag name “compressed” and the attribute name “info” is created. The non-key elements are grouped together and compressed by the data compression method described with reference to FIG. 41(A) to create a compressed character string, which is described in the new element as the value of the attribute “info”. The key elements are described as they are in the converted XML document, without any conversion.

That is, in the first record of the converted XML document shown in FIG. 44(B), the element with the tag name “compressed” holds, as the value of the attribute “info”, the compressed character string obtained by compressing the series of non-key elements <department>Dept. A</department><address>City A</address><telephone>123</telephone> by the data compression method described with reference to FIG. 41(A). Likewise, in the second record, the element with the tag name “compressed” holds, as the value of the attribute “info”, the compressed character string obtained by compressing the series of non-key elements <department>Dept. B</department><address>City B</address><telephone>456</telephone> by the same method.

An XML document can contain only character codes, but because the compressed data (compressed character string) obtained by the compression method described above is expressed in character codes, it can be written into the XML document as is. The tag symbols “<” and “>” have a special meaning in an XML document; however, as described above, the compressed data consists only of printable ASCII characters other than the tag symbols, so it is treated entirely as text whether it is written as element content or as an attribute value.

Like those of the first and second embodiments, the conversion method of the third embodiment combines a plurality of non-key elements into one element, so that while the application software executes data processing, the non-key elements can be handled collectively as elements unrelated to that processing. Which of the methods described with reference to FIG. 44(A) and FIG. 44(B) is used can be selected and specified in the conversion specification document or the like. The choice depends on the data volume of the XML document or on how many new elements are added as a result of the data processing; given the essence of the present invention, namely handling non-key elements collectively, either method may be adopted.

As shown in FIG. 45, in the third embodiment, identification bits (two bits here) are attached to the head of the compressed character string (compressed data) described in the converted XML document, as information indicating the type of character encoding of the XML document at the time of compression.
If the character encoding of the XML document is fixed, for example, to UTF-8 and no encoding conversion ever occurs, no problem arises. However, an XML document can use not only UTF-8 but also encodings such as UTF-16, Shift_JIS, and EUC, so how the present invention handles a change of encoding is described below.

If a particular character encoding were chosen for the compressed character string, then whenever the encoding of the XML document changed from the one used at the time of compression, the encoding of the compressed character string would also be converted automatically. Its bit sequence would then usually change, and the compressed character string might no longer be restorable to its original state.
In contrast, in the present invention, as described above, the compressed character string is described in ASCII codes, which are included in common in all of these character encodings. Therefore, even if the encoding of the original XML document is converted, the bit sequence of the compressed character string does not change, and the compressed character string can be restored correctly.

When the character encoding of an XML document has been converted to some other encoding since the time of compression, the compressed character string must be restored, the type of encoding used at the time of compression must be recognized, and the restored data must be converted to the current encoding of the XML document (at the time of inverse conversion). For this reason, in the third embodiment, as shown in FIG. 45, identification bits that identify the type of encoding at the time of compression are added to the header of the compressed data.

If four encodings are to be distinguished, namely UTF-8, UTF-16, Shift_JIS, and EUC, two identification bits are provided. In this case, for example, “00” is defined to indicate UTF-8, “01” UTF-16, “10” Shift_JIS, and “11” EUC. These identification bits are attached to the series of non-key elements to be compressed and are converted into the compressed character string together with them by the data compression method described with reference to FIG. 41(A).
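A minimal sketch of the identification bits, assuming they are prepended to the bit string before 6-bit packing (the text states only that they are attached to the non-key elements and compressed with them). The re-encoding helper assumes that the restored non-key data are bytes in the recorded encoding; the Python codec names, in particular `euc_jp` for "EUC", are also assumptions.

```python
# 2-bit identifiers from the example in the text
ENCODING_IDS = {"UTF-8": "00", "UTF-16": "01", "Shift_JIS": "10", "EUC": "11"}
ID_TO_ENCODING = {bits: name for name, bits in ENCODING_IDS.items()}
# Assumed Python codec names; "EUC" is taken to mean EUC-JP here
PY_CODECS = {"UTF-8": "utf-8", "UTF-16": "utf-16", "Shift_JIS": "shift_jis", "EUC": "euc_jp"}

def add_header(payload_bits: str, encoding: str) -> str:
    """Prefix the bit string with the 2-bit identifier before 6-bit packing."""
    return ENCODING_IDS[encoding] + payload_bits

def split_header(bits: str) -> tuple:
    """Return (encoding at compression time, remaining payload bits)."""
    return ID_TO_ENCODING[bits[:2]], bits[2:]

def to_current_encoding(restored: bytes, recorded: str, current: str) -> bytes:
    """Re-encode restored non-key text from the recorded encoding to the current one."""
    return restored.decode(PY_CODECS[recorded]).encode(PY_CODECS[current])

bits = add_header("0100110", "Shift_JIS")
print(split_header(bits))              # ('Shift_JIS', '0100110')
restored = "部署".encode("shift_jis")  # non-key text as it was at compression time
print(to_current_encoding(restored, "Shift_JIS", "UTF-8") == "部署".encode("utf-8"))  # True
```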

[3-5] Specific Example of Conversion Specification Document in Third Embodiment
FIG. 46 shows a specific conversion specification document (XML document) for the case where the tabular XML document shown in FIG. 4(A) is the conversion target. Although the case of tabular data is described here, conversion/inverse conversion can also be performed with the conversion specification document shown in FIG. 46 when the XML document to be converted is non-tabular data. The conversion specification document shown in FIG. 46 implements the conversion method described with reference to FIG. 44(A).

The conversion specification document shown in FIG. 46 describes the root tag name “name list” and the record tag name “person”. It also describes the information for distinguishing key elements from non-key elements, by listing the key-element tag names “name” and “company” as the content of the element with the tag name “key_tags”, and the non-key-element tag names “department”, “address”, and “telephone” as the content of the element with the tag name “nonkey_tags”. The content of the “nonkey_tags” element further includes an element with the tag name “merged_tag”, whose content is “compressed”, the tag name of the new element into which the non-key elements are combined. A conversion specification document of this kind specifies the procedure for converting the data structure of the XML document.
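A conversion specification document of this shape can be read with `xml.etree.ElementTree`. The sketch is modelled loosely on the description of FIG. 46: `key_tags`, `nonkey_tags`, and `merged_tag` come from the text, but the root element `conversion_spec`, the elements `root_tag` and `record_tag`, the underscore tag name `name_list`, and whitespace-separated tag lists are hypothetical, since the figure itself is not reproduced here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical spec modelled on the description of FIG. 46
SPEC_XML = """<conversion_spec>
  <root_tag>name_list</root_tag>
  <record_tag>person</record_tag>
  <key_tags>name company</key_tags>
  <nonkey_tags>department address telephone
    <merged_tag>compressed</merged_tag>
  </nonkey_tags>
</conversion_spec>"""

@dataclass
class ConversionSpec:
    root_tag: str
    record_tag: str
    key_tags: list
    nonkey_tags: list
    merged_tag: str

def parse_spec(xml_text: str) -> ConversionSpec:
    spec = ET.fromstring(xml_text)
    nonkey = spec.find("nonkey_tags")
    return ConversionSpec(
        root_tag=spec.findtext("root_tag"),
        record_tag=spec.findtext("record_tag"),
        key_tags=spec.findtext("key_tags").split(),
        nonkey_tags=(nonkey.text or "").split(),  # text before the <merged_tag> child
        merged_tag=nonkey.findtext("merged_tag"),
    )

spec = parse_spec(SPEC_XML)
print(spec.key_tags, spec.nonkey_tags, spec.merged_tag)
```

Keeping the tag lists in data rather than in a per-document XSL sheet is the point made earlier about avoiding hand-written style sheets.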

[3-6] Specific Conversion Processing Procedure Using the Conversion Method of the Third Embodiment
Next, the conversion processing procedure of the structured document conversion method according to the third embodiment of the present invention will be described with reference to FIGS. 47 and 48. In FIGS. 47 and 48, steps bearing the same step numbers as in FIGS. 18 and 19 execute the same or substantially the same processing as described for FIGS. 18 and 19, so their description is omitted. That is, the following description omits the processing of the steps numbered A1 to A11, A15, A16 and B1 to B11, B14, B15.

FIG. 47 is a flowchart of the processing procedure for applying the data structure conversion to an XML document to be converted on the basis of the conversion specification document, and FIG. 48 is a flowchart of the processing procedure for applying the inverse data structure conversion to a converted XML document (processed XML document) on the basis of the conversion specification document.
In the third embodiment, the data structure conversion/inverse conversion mechanism 10A executes the structure conversion and compression software or the decompression and structure inverse conversion software described with reference to FIG. 43. Following the flowchart of FIG. 47 or FIG. 48, it reads the conversion specification document and performs the conversion or inverse conversion (data compression or decompression) while referring to the code tables 41 and 44, the static word dictionaries 42 and 43 for compression and decompression, and the LUT 45.

In the conversion procedure shown in FIG. 47, when the element data cut out in step A7 is a non-key element (YES route of step A11), it is determined whether that non-key element is the first of the group of non-key elements to be combined into one element (step A51). If it is the first (YES route of step A51), a start tag with the tag name “compressed”, designated in advance by the conversion specification document, is created (step A52), and the non-key element cut out this time is held (step A53).

If the non-key element is not the first (NO route of step A51), that is, if a new element corresponding to the non-key elements has already been created, the start tag creation of step A52 is skipped, and the non-key element cut out this time is appended after the non-key elements already cut out and held (step A53).
It is then determined whether the non-key element is the last of the group of non-key elements to be combined into one element (step A54). If it is not the last (NO route of step A54), the process returns to step A7.

If it is the last (YES route of step A54), an identification bit indicating the type of character code system is added to the non-key elements gathered in step A53, and the result is compressed by the data compression method described with reference to FIG. 41(A) to obtain a compressed character string. The compressed character string is written as the content of the new element immediately after the start tag with tag name “compressed”, after which an end tag with tag name “compressed” is created and appended (step A55). The process then returns to step A7.

The above describes processing corresponding to the conversion method of FIG. 44(A). When the conversion method of FIG. 44(B) is adopted instead, an empty-element tag with the tag name “compressed” and the attribute name “info” is created as the new element in step A52, and in step A55 the compressed character string is written as the value of the “info” attribute of the new (empty) element.
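Steps A51 to A55 can be sketched in Python as follows, working on one record element at a time and assuming, as a simplification, that the non-key elements to be combined form one contiguous run of children. The compressor is a stand-in: zlib plus Base64 replaces the FIG. 41(A) method, which is not spelled out here, and the identification bit of step A55 is omitted. It does share the property that matters for step A55: its output never contains < > & " or '. The `as_attribute` flag switches to the FIG. 44(B) form.

```python
import base64
import xml.etree.ElementTree as ET
import zlib


def stand_in_compress(text: str) -> str:
    """Stand-in for the FIG. 41(A) method (zlib + Base64, not the patent's coder).
    Base64 output never contains <, >, &, " or '."""
    return base64.b64encode(zlib.compress(text.encode("utf-8"))).decode("ascii")


def convert_record(record, nonkey_tags, merged_tag="compressed",
                   as_attribute=False, compress=stand_in_compress):
    """Keep key elements as they are; fold each run of non-key elements into one new element."""
    out = ET.Element(record.tag, record.attrib)
    out.text = record.text
    children = list(record)
    held = []          # serialized non-key elements gathered so far (step A53)
    new_elem = None
    for i, child in enumerate(children):
        if child.tag not in nonkey_tags:
            out.append(child)                              # key element: copied unchanged
            continue
        if new_elem is None:                               # first of the group (A51 -> A52)
            new_elem = ET.SubElement(out, merged_tag)
        held.append(ET.tostring(child, encoding="unicode"))   # A53
        is_last = i + 1 == len(children) or children[i + 1].tag not in nonkey_tags
        if is_last:                                        # A54 -> A55
            packed = compress("".join(held))
            if as_attribute:
                new_elem.set("info", packed)               # FIG. 44(B): empty element
            else:
                new_elem.text = packed                     # FIG. 44(A): element content
            held, new_elem = [], None
    return out


record = ET.fromstring(
    "<個人><名前>山田</名前><部署>営業</部署><電話>03</電話><会社>F</会社></個人>")
converted = convert_record(record, {"部署", "住所", "電話"})
print([child.tag for child in converted])   # ['名前', 'compressed', '会社']
```

Searching or sorting by 名前 or 会社 on the converted record never has to touch the department or telephone data, which is the point of separating key from non-key elements.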

In the inverse conversion procedure shown in FIG. 48, when the new element combining the non-key elements (the <compressed> tag) is cut out in step B7 (YES route of step B11), the compressed character string written as the content (or attribute value) of the new element is read, the original character string constituting the non-key elements is restored from it by the data decompression method described with reference to FIG. 41(B), the description of the tag that held the non-key elements is deleted, and the restored non-key elements are written into the restored XML document (step B39). The process then returns to step B7.
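The inverse step B39 can be sketched the same way. The codec below is again a zlib/Base64 stand-in for the FIG. 41(A)/(B) methods, and `restore_record` is a hypothetical helper that accepts either the content form (FIG. 44(A)) or the `info` attribute form (FIG. 44(B)).

```python
import base64
import xml.etree.ElementTree as ET
import zlib


def stand_in_compress(text: str) -> str:
    """zlib + Base64 stand-in for the FIG. 41(A) compressor (used only to build a demo input)."""
    return base64.b64encode(zlib.compress(text.encode("utf-8"))).decode("ascii")


def stand_in_decompress(packed: str) -> str:
    """Matching stand-in for the FIG. 41(B) decompressor."""
    return zlib.decompress(base64.b64decode(packed)).decode("utf-8")


def restore_record(record, merged_tag="compressed", decompress=stand_in_decompress):
    """Replace every merged element with the non-key elements it encodes (step B39)."""
    out = ET.Element(record.tag, record.attrib)
    out.text = record.text
    for child in record:
        if child.tag != merged_tag:
            out.append(child)                         # key element: passed through
            continue
        # FIG. 44(B) keeps the string in the "info" attribute, FIG. 44(A) in the content.
        packed = child.get("info") or child.text
        # Several sibling elements are not one XML document, so wrap them before parsing.
        fragment = ET.fromstring("<_wrap>" + decompress(packed) + "</_wrap>")
        out.extend(list(fragment))
    return out


packed = stand_in_compress("<部署>営業</部署><電話>03</電話>")
converted = ET.fromstring(
    f"<個人><名前>山田</名前><compressed>{packed}</compressed><会社>F</会社></個人>")
restored = restore_record(converted)
print(ET.tostring(restored, encoding="unicode"))
# <個人><名前>山田</名前><部署>営業</部署><電話>03</電話><会社>F</会社></個人>
```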

[3-7] Effects of the Third Embodiment
As described above, in the structured document conversion method according to the third embodiment of the present invention, the elements constituting the XML document to be converted are divided into key elements and non-key elements. The XML document is converted into one in which the key elements are described as they are, while the characters or character strings constituting the non-key elements are gathered into a single tag and described as a character code string (compressed character string) obtained by compressing them with the data compression method shown in FIG. 41(A). Consequently, in addition to the same effects and advantages as the first and second embodiments described above, the data volume of the converted XML document can be greatly reduced.

Furthermore, the data compression method described with reference to FIG. 41(A) provides a compression conversion technique that compresses an XML document efficiently while yielding the result as character codes that can be placed inside the XML document. The resources required for operations on the XML document are therefore greatly reduced, which lowers memory usage and increases processing speed when the XML document is processed.

Here, the character codes used to represent the compressed data are ASCII codes excluding the symbols related to tagging (in an XML document, for example, <, >, &, " and '). As a result, the compressed character strings in the converted XML document contain no tagging-related symbols, which reliably prevents erroneous processing during data processing and similar operations.
In addition, because ASCII is a character code set shared by the various character code systems, the bit string forming a compressed character string made of ASCII codes stays in its original state even if the converted XML document undergoes a character code system conversion. The compressed character strings contained in an XML document whose character code system has been converted are therefore correctly restored to the original non-key elements.
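The ASCII-only property can be illustrated with a sketch of the 6-bit-to-character step. The 64-character alphabet below (letters, digits, `-` and `_`) is an assumption; the text only requires ASCII characters other than the tagging symbols. Because every output character is ASCII, its byte encoding is the same in UTF-8, Shift_JIS and EUC-JP.

```python
# Hypothetical 64-character alphabet: printable ASCII, none of < > & " '.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789-_")
assert len(ALPHABET) == 64 and not set(ALPHABET) & set("<>&\"'")


def pack6(bits: str) -> str:
    """Cut a '0'/'1' bit string into 6-bit groups (zero-padding the last one)
    and map each group to one ASCII character."""
    bits += "0" * (-len(bits) % 6)
    return "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))


def unpack6(text: str, nbits: int) -> str:
    """Map characters back to 6-bit groups and drop the padding bits."""
    bits = "".join(format(ALPHABET.index(c), "06b") for c in text)
    return bits[:nbits]


encoded = pack6("1011001110")    # 101100 -> 44 -> "s", 111000 -> 56 -> "4"
print(encoded)                   # s4
```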

Furthermore, as shown in FIG. 45, adding to the compressed character string an identification bit indicating the type of character code system at the time of compression makes it possible to recognize the character code system of the data restored from the compressed character string. By converting that data to the current character code system of the XML document, consistency of the character code system across the whole XML document is maintained.
In addition, before the non-key elements are converted into a compressed character string, the character strings forming them are replaced with dictionary numbers using the static word dictionary for compression (42) prepared in advance. This shortens the character string to be variable-length coded, which further raises the compression efficiency and further reduces the data volume of the converted XML document.
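The identification bits and the static dictionary can be sketched as two small steps. The one-byte charset identifier, its numeric values, and the three dictionary words are all hypothetical, since the text gives neither the FIG. 45 layout nor the dictionary contents.

```python
# Hypothetical identifier values; the FIG. 45 encoding is not given in the text.
CHARSET_IDS = {"utf-8": 0, "shift_jis": 1, "euc_jp": 2}
ID_TO_CHARSET = {v: k for k, v in CHARSET_IDS.items()}


def tag_with_charset(payload: bytes, charset: str) -> bytes:
    """Prefix the payload with one byte naming its character code system."""
    return bytes([CHARSET_IDS[charset]]) + payload


def realign(blob: bytes, target: str) -> bytes:
    """Decode with the recorded code system and re-encode in the document's current one."""
    source = ID_TO_CHARSET[blob[0]]
    return blob[1:].decode(source).encode(target)


# Hypothetical static word dictionary; the real dictionary contents are not given.
STATIC_WORDS = ["株式会社", "営業部", "東京都"]


def to_dictionary_numbers(text: str, words=STATIC_WORDS) -> list:
    """Greedy longest match: dictionary words become ints, other characters stay strs."""
    order = sorted(range(len(words)), key=lambda k: -len(words[k]))
    tokens, i = [], 0
    while i < len(text):
        for k in order:
            if text.startswith(words[k], i):
                tokens.append(k)
                i += len(words[k])
                break
        else:                              # no dictionary word starts at position i
            tokens.append(text[i])
            i += 1
    return tokens


def from_dictionary_numbers(tokens, words=STATIC_WORDS) -> str:
    """Inverse of to_dictionary_numbers."""
    return "".join(words[t] if isinstance(t, int) else t for t in tokens)
```

For "東京都の株式会社" the variable-length coder would then see three tokens, `[2, "の", 0]`, instead of eight characters.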

[4] Others
The present invention is not limited to the embodiments described above and can be modified in various ways without departing from its spirit.
For example, the above embodiments describe the case where the structured document is an XML document, but the present invention is not limited to this; it can be applied in the same way to various other structured documents, with the same operation and effects as the above embodiments.

[5] Supplementary Notes
(Supplementary Note 1) A structured document conversion method, characterized by:
dividing the elements constituting a structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to the data processing;
creating a new element given a predetermined tag name and a predetermined attribute name;
performing tag name conversion, in which a tag name character string containing the tag names of the non-key elements is created and described in the new element as the attribute value corresponding to the predetermined attribute name;
performing content conversion, in which a content character string containing the contents of the non-key elements is created and described as the content of the new element; and
describing the key elements as they are in the converted structured document.
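Supplementary Notes 1, 4 and 7 together describe a reversible flattening that can be sketched in Python. The new element's tag name `nonkey`, its attribute name `tags` and the comma delimiter are hypothetical choices; the sketch assumes contents never contain the delimiter and ignores nesting (Note 5) and attributes (Note 6).

```python
import xml.etree.ElementTree as ET

DELIM = ","   # hypothetical delimiter; Notes 4 and 7 only say "a delimiter"


def merge_nonkey(record, nonkey_tags, new_tag="nonkey", attr_name="tags"):
    """Note 1: one new element whose attribute holds the joined tag names and whose
    content holds the joined contents; key elements are kept as they are."""
    out = ET.Element(record.tag, record.attrib)
    names, contents = [], []
    for child in record:
        if child.tag in nonkey_tags:
            names.append(child.tag)
            contents.append(child.text or "")
        else:
            out.append(child)
    if names:
        new = ET.SubElement(out, new_tag, {attr_name: DELIM.join(names)})
        new.text = DELIM.join(contents)
    return out


def split_nonkey(record, new_tag="nonkey", attr_name="tags"):
    """Inverse: rebuild the non-key elements from the two delimited strings."""
    out = ET.Element(record.tag, record.attrib)
    for child in record:
        if child.tag != new_tag:
            out.append(child)
            continue
        names = child.get(attr_name).split(DELIM)
        values = (child.text or "").split(DELIM)
        for name, value in zip(names, values):
            ET.SubElement(out, name).text = value
    return out


record = ET.fromstring("<個人><名前>山田</名前><部署>営業</部署><電話>03</電話></個人>")
merged = merge_nonkey(record, {"部署", "住所", "電話"})
print(ET.tostring(merged, encoding="unicode"))
# <個人><名前>山田</名前><nonkey tags="部署,電話">営業,03</nonkey></個人>
```

The tag names travel with the data in the attribute, so this form works for non-tabular documents where each record may carry a different set of non-key elements.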

(Supplementary Note 2) A structured document conversion method, characterized by:
dividing the elements constituting a structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to the data processing;
creating a new element given a predetermined tag name, a predetermined first attribute name and a predetermined second attribute name;
performing tag name conversion, in which a tag name character string containing the tag names of the non-key elements is created and described in the new element as a first attribute value corresponding to the first attribute name;
performing content conversion, in which a content character string containing the contents of the non-key elements is created and described in the new element as a second attribute value corresponding to the second attribute name; and
describing the key elements as they are in the converted structured document.

(Supplementary Note 3) A structured document conversion method, characterized by:
dividing the elements constituting a structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to the data processing;
performing tag name conversion, in which a new element is created whose predetermined tag name is a tag name character string containing the tag names of the non-key elements;
performing content conversion, in which a content character string containing the contents of the non-key elements is created and described as the content of the new element; and
describing the key elements as they are in the converted structured document.

(Supplementary Note 4) The structured document conversion method according to any one of Supplementary Notes 1 to 3, wherein the tag name character string is created by joining the tag names of the non-key elements with delimiters.
(Supplementary Note 5) The structured document conversion method according to Supplementary Note 4, wherein, when the non-key elements form a plurality of hierarchical levels, hierarchical structure identification information is added in the tag name character string to the tag names of the non-key elements forming those levels.

(Supplementary Note 6) The structured document conversion method according to Supplementary Note 4 or 5, characterized in that:
when a non-key element has an attribute, the attribute name of that attribute, with attribute name identification information added, is described in the tag name character string after the tag name of the non-key element having the attribute, separated by the delimiter; and
the content character string is created by joining the contents of the non-key elements with delimiters, and the attribute value of the attribute is described in the content character string after the content of the non-key element having the attribute, separated by the delimiter.
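Supplementary Note 6 can be illustrated by extending the delimited strings to attributes. The `@` prefix standing in for the "attribute name identification information" and the comma delimiter are assumptions; the text does not fix either.

```python
import xml.etree.ElementTree as ET

DELIM = ","        # hypothetical delimiter
ATTR_MARK = "@"    # hypothetical attribute name identification information


def build_strings(nonkey_elements):
    """After each tag name, append its attribute names marked with ATTR_MARK;
    after each content, append the matching attribute values."""
    names, values = [], []
    for elem in nonkey_elements:
        names.append(elem.tag)
        values.append(elem.text or "")
        for attr, val in elem.attrib.items():
            names.append(ATTR_MARK + attr)
            values.append(val)
    return DELIM.join(names), DELIM.join(values)


def parse_strings(name_str, value_str):
    """Inverse: a marked name attaches an attribute to the most recent element."""
    elems = []
    for name, value in zip(name_str.split(DELIM), value_str.split(DELIM)):
        if name.startswith(ATTR_MARK):
            elems[-1].set(name[len(ATTR_MARK):], value)
        else:
            elem = ET.Element(name)
            elem.text = value
            elems.append(elem)
    return elems


items = [ET.fromstring("<部署>営業</部署>"), ET.fromstring('<電話 種別="会社">03</電話>')]
names, values = build_strings(items)
print(names, values)   # 部署,電話,@種別 営業,03,会社
```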

(Supplementary Note 7) The structured document conversion method according to any one of Supplementary Notes 1 to 5, wherein the content character string is created by joining the contents of the non-key elements with delimiters.
(Supplementary Note 8) The structured document conversion method according to any one of Supplementary Notes 1 to 7, characterized by:
creating a conversion specification document that describes information for distinguishing the key elements from the non-key elements and information about the new element; and
converting the description of the non-key elements in the structured document to be converted on the basis of the conversion specification document.

(Supplementary Note 9) The structured document conversion method according to Supplementary Note 8, wherein inverse conversion that returns the description of the non-key elements to its original state is applied to the converted structured document on the basis of the conversion specification document.
(Supplementary Note 10) The structured document conversion method according to Supplementary Note 9, characterized by:
describing in the conversion specification document, in association with each other, the tag name of a non-key element and a shortened tag name that is shorter than that tag name and can identify it;
performing, during the conversion, tag name shortening conversion that replaces the tag name of the non-key element with the shortened tag name on the basis of the conversion specification document; and
performing, during the inverse conversion, tag name expansion conversion that replaces the shortened tag name with the tag name of the non-key element on the basis of the conversion specification document.

(Supplementary Note 11) The structured document conversion method according to Supplementary Note 10, characterized by:
describing in the conversion specification document tag name shortening information indicating whether the tag name shortening conversion is to be performed during the conversion; and
selecting, during the conversion or the inverse conversion, whether to execute the tag name shortening conversion and the tag name expansion conversion on the basis of the tag name shortening information in the conversion specification document.
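Supplementary Notes 10 and 11 amount to a reversible renaming switched on or off by a flag in the specification. A minimal sketch, assuming the tag names travel in a comma-delimited tag name character string as in Note 4 and that the short names (`d`, `a`, `p`, all hypothetical) collide with no other tag name:

```python
# Hypothetical mapping as the conversion specification might record it (Note 10).
SHORT_NAMES = {"部署": "d", "住所": "a", "電話": "p"}
FULL_NAMES = {short: full for full, short in SHORT_NAMES.items()}


def shorten_tags(tag_string, enabled=True, delim=","):
    """Tag name shortening conversion; `enabled` plays the role of the Note 11 flag."""
    if not enabled:
        return tag_string
    return delim.join(SHORT_NAMES.get(n, n) for n in tag_string.split(delim))


def expand_tags(tag_string, enabled=True, delim=","):
    """Tag name expansion conversion, the inverse of shorten_tags."""
    if not enabled:
        return tag_string
    return delim.join(FULL_NAMES.get(n, n) for n in tag_string.split(delim))


print(shorten_tags("部署,住所,電話"))   # d,a,p
```

In UTF-8 each of these Japanese tag names takes 6 bytes, so the saving grows with the number of records that repeat the tag name string.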

(Supplementary Note 12) The structured document conversion method according to any one of Supplementary Notes 1 to 11, characterized by:
when the structured document to be converted is described in a tabular format in which every record has the same types and number of elements, creating a conversion specification document that describes information for distinguishing the key elements from the non-key elements and describes, in association with each other, the tag names of the non-key elements and a representative tag name, serving as the predetermined tag name, that represents those tag names; and
applying to the structured document to be converted, on the basis of the conversion specification document, a tabular conversion that omits the tag name conversion and performs only the content conversion.

(Supplementary Note 13) The structured document conversion method according to Supplementary Note 12, wherein the tag names of the non-key elements are determined from the representative tag name on the basis of the conversion specification document, and a tabular inverse conversion that returns the description of the non-key elements to its original state is applied to the structured document that has undergone the tabular conversion.
(Supplementary Note 14) The structured document conversion method according to any one of Supplementary Notes 1 to 11, characterized by:
when the structured document to be converted is described in a tabular format in which every record has the same types and number of elements, creating a conversion specification document that describes information for distinguishing the key elements from the non-key elements and describes, in association with each other, the tag names and attribute names of the non-key elements and a representative tag name, serving as the predetermined tag name, that represents those tag names and attribute names; and
applying to the structured document to be converted, on the basis of the conversion specification document, a tabular conversion that omits the tag name conversion and performs only the content conversion.
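For the tabular case of Supplementary Notes 12 and 13, the tag name character string becomes unnecessary because the specification already lists the non-key tag names under a representative tag name. A sketch, with the representative name `nonkey`, its mapping and the comma delimiter all hypothetical; it assumes every record really contains each listed non-key element, as the tabular format requires.

```python
import xml.etree.ElementTree as ET

DELIM = ","   # hypothetical delimiter
# Hypothetical specification entry: representative tag name -> non-key tag names in order.
REPRESENTATIVE = {"nonkey": ["部署", "住所", "電話"]}


def to_table_form(record, rep="nonkey"):
    """Note 12: only the contents are joined; no tag name string is stored."""
    tags = REPRESENTATIVE[rep]
    out = ET.Element(record.tag, record.attrib)
    values = {}
    for child in record:
        if child.tag in tags:
            values[child.tag] = child.text or ""
        else:
            out.append(child)
    # KeyError here means the record is not really tabular.
    ET.SubElement(out, rep).text = DELIM.join(values[t] for t in tags)
    return out


def from_table_form(record, rep="nonkey"):
    """Note 13: recover the tag names from the representative tag name."""
    out = ET.Element(record.tag, record.attrib)
    for child in record:
        if child.tag == rep:
            for tag, value in zip(REPRESENTATIVE[rep], (child.text or "").split(DELIM)):
                ET.SubElement(out, tag).text = value
        else:
            out.append(child)
    return out


record = ET.fromstring(
    "<個人><名前>山田</名前><部署>営業</部署><住所>東京</住所><電話>03</電話></個人>")
table = to_table_form(record)
print(ET.tostring(table, encoding="unicode"))
# <個人><名前>山田</名前><nonkey>営業,東京,03</nonkey></個人>
```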

(Supplementary Note 15) The structured document conversion method according to Supplementary Note 14, wherein the tag names and attribute names of the non-key elements are determined from the representative tag name on the basis of the conversion specification document, and a tabular inverse conversion that returns the description of the non-key elements to its original state is applied to the structured document that has undergone the tabular conversion.
(Supplementary Note 16) The structured document conversion method according to Supplementary Note 13 or 15, characterized by:
describing in the conversion specification document tabular-format information indicating whether the structured document to be converted is described in a tabular format; and
selecting whether to execute the tabular conversion and the tabular inverse conversion on the basis of the tabular-format information in the conversion specification document.

(Supplementary Note 17) The structured document conversion method according to Supplementary Note 16, wherein the tag name conversion is executed when the tabular-format information states that the structured document to be converted is not in tabular format.
(Supplementary Note 18) The structured document conversion method according to any one of Supplementary Notes 8 to 17, wherein the conversion specification document is created as a structured document and gives the conversion execution procedure.

(Supplementary Note 19) The structured document conversion method according to any one of Supplementary Notes 8 to 18, characterized by:
generating, on the basis of the conversion specification document, a conversion stylesheet that directs the conversion; and
causing a structured document conversion processor to execute the conversion using the conversion stylesheet.
(Supplementary Note 20) The structured document conversion method according to any one of Supplementary Notes 8 to 19, characterized by:
generating, on the basis of the conversion specification document, an inverse conversion stylesheet that directs the inverse conversion; and
causing a structured document conversion processor to execute the inverse conversion using the inverse conversion stylesheet.
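Supplementary Note 19 can be illustrated by generating an XSLT 1.0 stylesheet from the specification's tag lists. This is a sketch, not the patent's generator: the output element `nonkey`, its `tags` attribute and the comma delimiter are hypothetical, it covers only the Note 1 style conversion, and it leaves out the inverse stylesheet of Note 20 (splitting delimited strings in XSLT 1.0 needs recursive templates). Python's standard library has no XSLT processor, so the block only builds the stylesheet and checks that it is well-formed.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def make_forward_xslt(record_tag, key_tags, nonkey_tags,
                      new_tag="nonkey", attr_name="tags", delim=","):
    """Identity transform plus one record template that copies the key elements
    and folds the non-key elements into a single element."""
    keys = "|".join(key_tags)          # XPath union, e.g. 名前|会社
    nonkeys = "|".join(nonkey_tags)
    return f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="{record_tag}">
    <xsl:copy>
      <xsl:apply-templates select="@*|{keys}"/>
      <xsl:element name="{new_tag}">
        <xsl:attribute name="{attr_name}">
          <xsl:for-each select="{nonkeys}">
            <xsl:if test="position() &gt; 1">{delim}</xsl:if>
            <xsl:value-of select="name()"/>
          </xsl:for-each>
        </xsl:attribute>
        <xsl:for-each select="{nonkeys}">
          <xsl:if test="position() &gt; 1">{delim}</xsl:if>
          <xsl:value-of select="."/>
        </xsl:for-each>
      </xsl:element>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""


xslt = make_forward_xslt("個人", ["名前", "会社"], ["部署", "住所", "電話"])
stylesheet = ET.fromstring(xslt)   # well-formedness check only; nothing is transformed here
```

Applying the stylesheet needs an XSLT processor; with lxml, for instance, `lxml.etree.XSLT(lxml.etree.XML(xslt))` would compile it. The record template wins over the identity template because a plain element name has a higher default priority than `node()`.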

(Supplementary Note 21) A structured document conversion method, characterized by:
dividing the elements constituting a structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to the data processing;
creating a new element given a predetermined tag name;
creating a character string in which, within the description of the non-key elements, the symbols related to tagging are replaced with character strings not related to tagging;
describing the character string as the content of the new element; and
describing the key elements as they are in the converted structured document.
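Supplementary Notes 21, 31 and 32 can be sketched with ElementTree, whose serializer already writes < and > in text content as the entity references &lt; and &gt; (and & as &amp;). The new element name `nonkey` is hypothetical. After conversion, a consumer walking the tree sees only the key elements and one opaque text node.

```python
import xml.etree.ElementTree as ET


def hide_nonkey(record, nonkey_tags, new_tag="nonkey"):
    """Note 21: key elements stay; the non-key markup becomes plain text of a new element.
    On output, ElementTree writes '<' and '>' in that text as &lt; and &gt; (Notes 31-32)."""
    out = ET.Element(record.tag, record.attrib)
    hidden = []
    for child in record:
        if child.tag in nonkey_tags:
            hidden.append(ET.tostring(child, encoding="unicode"))
        else:
            out.append(child)
    ET.SubElement(out, new_tag).text = "".join(hidden)
    return out


def reveal_nonkey(record, new_tag="nonkey"):
    """Inverse: the parser has already turned &lt;/&gt; back into '<'/'>', so re-parse the text."""
    out = ET.Element(record.tag, record.attrib)
    for child in record:
        if child.tag == new_tag:
            out.extend(list(ET.fromstring(f"<_wrap>{child.text or ''}</_wrap>")))
        else:
            out.append(child)
    return out


record = ET.fromstring("<個人><名前>山田</名前><部署>営業</部署></個人>")
text = ET.tostring(hide_nonkey(record, {"部署"}), encoding="unicode")
print(text)
# <個人><名前>山田</名前><nonkey>&lt;部署&gt;営業&lt;/部署&gt;</nonkey></個人>
```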

(Supplementary Note 22) A structured document conversion method, characterized by:
dividing the elements constituting a structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to the data processing;
creating a new element given a predetermined tag name and a predetermined attribute name;
creating a character string in which, within the description of the non-key elements, the symbols related to tagging are replaced with character strings not related to tagging;
describing the character string in the new element as the attribute value corresponding to the predetermined attribute name; and
describing the key elements as they are in the converted structured document.

(Supplementary Note 23) A structured document conversion method, characterized by:
dividing the elements constituting a structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to the data processing;
giving a new attribute name to the parent element of the non-key elements;
creating a character string in which, within the description of the non-key elements, the symbols related to tagging are replaced with character strings not related to tagging;
describing the character string in the parent element as the attribute value corresponding to the new attribute name; and
describing the key elements as they are in the converted structured document.

(Supplementary Note 24) A structured document conversion method, characterized by:
dividing the elements constituting a structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to the data processing;
creating a character string in which, within the description of the non-key elements, the symbols related to tagging are replaced with character strings not related to tagging;
describing the character string as the content of the parent element of the non-key elements; and
describing the key elements as they are in the converted structured document.

(Supplementary Note 25) The structured document conversion method according to Supplementary Note 21 or 22, characterized by:
creating a conversion specification document that describes information for distinguishing the key elements from the non-key elements and information about the new element; and
converting the description of the non-key elements in the structured document to be converted on the basis of the conversion specification document.

(Supplementary Note 26) The structured document conversion method according to Supplementary Note 23 or 24, characterized by:
creating a conversion specification document that describes information for distinguishing the key elements from the non-key elements and information about the parent element; and
converting the description of the non-key elements in the structured document to be converted on the basis of the conversion specification document.

(Supplementary Note 27) The structured document conversion method according to Supplementary Note 25 or 26, wherein inverse conversion that returns the description of the non-key elements to its original state is applied to the converted structured document on the basis of the conversion specification document.
(Supplementary Note 28) The structured document conversion method according to any one of Supplementary Notes 25 to 27, wherein the conversion specification document is created as a structured document and gives the conversion execution procedure.

(Supplementary Note 29) The structured document conversion method according to any one of Supplementary Notes 25 to 28, characterized by:
generating, on the basis of the conversion specification document, a conversion stylesheet that directs the conversion; and
causing a structured document conversion processor to execute the conversion using the conversion stylesheet.
(Supplementary Note 30) The structured document conversion method according to any one of Supplementary Notes 25 to 29, characterized by:
generating, on the basis of the conversion specification document, an inverse conversion stylesheet that directs the inverse conversion; and
causing a structured document conversion processor to execute the inverse conversion using the inverse conversion stylesheet.

(Supplementary Note 31) The structured document conversion method according to any one of Supplementary Notes 21 to 30, wherein entity reference descriptions of the symbols related to tagging are used as the character strings not related to tagging.
(Supplementary Note 32) The structured document conversion method according to Supplementary Note 31, wherein, when the structured document to be converted is an XML (eXtensible Markup Language) document, the tagging-related symbols “<” and “>” are replaced with “&lt;” and “&gt;”, respectively.

(Supplementary Note 33) A structured document conversion method comprising:
dividing the elements constituting a structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to that data processing;
creating a new element given a predetermined tag name;
converting the non-key elements into a compressed character string consisting of character codes by performing variable-length coding that assigns shorter variable-length codes to the characters or character strings of the non-key elements that appear more frequently, packing the binary data obtained by the variable-length coding 6 bits at a time into 1-byte converted data, and converting the 6-bit data packed into each converted data byte into a character code according to the ASCII (American Standard Code for Information Interchange) code;
describing the compressed character string as the content of the new element; and
describing the key elements as they are in the converted structured document.
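The coding pipeline of Supplementary Note 33 can be sketched as follows. Huffman coding is used here as one well-known variable-length code that gives shorter codes to more frequent characters; the note itself does not name a specific code. The 64-character output set (the Base64 alphabet), the zero padding of the final 6-bit group and the separately returned bit count are all assumptions of this sketch, and the dict and string merely play the role of a code table and a character code lookup table.

```python
import heapq
from collections import Counter

# Assumed 64-character output set (the Base64 alphabet): it contains no
# tagging-related characters such as "<", ">" or "&" (cf. Supplementary Note 41).
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def huffman_code(text: str) -> dict[str, str]:
    """Build a Huffman code: more frequent characters get shorter codes."""
    freq = Counter(text)
    if len(freq) <= 1:                        # empty input or one distinct symbol
        return {ch: "0" for ch in freq}
    # Heap entries: (weight, tie-breaker, {char: partial code}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in left.items()}
        merged.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def compress(text: str, code: dict[str, str]) -> tuple[str, int]:
    """Variable-length code the text, pack 6 bits per output byte and map
    each 6-bit value to an ASCII character. The bit count is returned too,
    because the last 6-bit group is zero-padded (an assumption here)."""
    bits = "".join(code[ch] for ch in text)
    padded = bits + "0" * (-len(bits) % 6)
    packed = "".join(ALPHABET[int(padded[i:i + 6], 2)]
                     for i in range(0, len(padded), 6))
    return packed, len(bits)


text = "aaaabbc"
code = huffman_code(text)
packed, nbits = compress(text, code)
print(code, packed, nbits)
```

For this seven-character input the variable-length code needs 10 bits, which become two output characters. A decoder also needs the code table, either carried alongside the data or fixed in advance.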

(Supplementary Note 34) A structured document conversion method comprising:
dividing the elements constituting a structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to that data processing;
creating a new element given a predetermined tag name and a predetermined attribute name;
converting the non-key elements into a compressed character string consisting of character codes by performing variable-length coding that assigns shorter variable-length codes to the characters or character strings of the non-key elements that appear more frequently, packing the binary data obtained by the variable-length coding 6 bits at a time into 1-byte converted data, and converting the 6-bit data packed into each converted data byte into a character code according to the ASCII (American Standard Code for Information Interchange) code;
describing the compressed character string in the new element as the attribute value corresponding to the predetermined attribute name; and
describing the key elements as they are in the converted structured document.
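Supplementary Note 34 differs from Note 33 only in where the compressed string goes: into an attribute value instead of element content. A hedged sketch with Python's `xml.etree.ElementTree` follows; the new element name `nk`, the attribute name `v` and `stand_in_compress` are hypothetical, and Base64 merely stands in for the Note 33 coder (it produces 6-bit ASCII characters but does not compress).

```python
import base64
import xml.etree.ElementTree as ET
from typing import Callable


def convert_record(record: ET.Element, key_tags: set[str],
                   compress: Callable[[str], str],
                   new_tag: str = "nk", attr_name: str = "v") -> ET.Element:
    """Keep key elements as they are; fold all non-key elements into one
    compressed attribute value on a new element."""
    out = ET.Element(record.tag, record.attrib)
    non_key_markup = []
    for child in record:
        if child.tag in key_tags:
            out.append(child)  # key element: described as it is
        else:
            non_key_markup.append(ET.tostring(child, encoding="unicode"))
    if non_key_markup:
        new_elem = ET.SubElement(out, new_tag)
        new_elem.set(attr_name, compress("".join(non_key_markup)))
    return out


def stand_in_compress(text: str) -> str:
    """Placeholder for the Note 33 coder: Base64 output is free of
    '<', '>', '&' and quotation marks, but it is not a compression."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


record = ET.fromstring(
    "<rec><id>001</id><name>Tanaka</name><tel>03-1234</tel></rec>")
converted = convert_record(record, {"id"}, stand_in_compress)
print(ET.tostring(converted, encoding="unicode"))
```

Because the output character set avoids “<”, “&” and quotation marks, the compressed string can sit in an attribute value without further escaping.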

(Supplementary Note 35) The structured document conversion method according to Supplementary Note 33 or 34, wherein, before the non-key elements are converted into the compressed character string, the character strings constituting the non-key elements are replaced with dictionary numbers using a static dictionary prepared in advance, and the character string containing the dictionary numbers is converted into the compressed character string.
(Supplementary Note 36) The structured document conversion method according to Supplementary Note 33 or 34, wherein, when the converted structured document is reverse-converted:
the compressed character string is extracted from the converted structured document;
each character code in the compressed character string is converted into 6-bit data according to the ASCII code;
the characters or character strings constituting the non-key elements are restored from the 6-bit data obtained for each character code; and
the original structured document is restored using the restored non-key elements.
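The dictionary substitution of Supplementary Note 35 might look like the following. The dictionary contents, and the choice of one Unicode private-use character per dictionary number, are assumptions for illustration; the note says only that a static dictionary prepared in advance maps character strings to dictionary numbers before compression.

```python
# Hypothetical static dictionary prepared in advance (illustrative words).
STATIC_DICT = ["Tokyo", "Osaka", "Corporation", "Department"]
# Assumption: dictionary number n is written as the private-use character
# U+E000 + n, which is taken not to occur in the input text.
BASE = 0xE000


def apply_dictionary(text: str) -> str:
    """Replace dictionary words with one-character dictionary numbers."""
    # Longest words first, so a shorter entry never splits a longer one.
    for number, word in sorted(enumerate(STATIC_DICT), key=lambda p: -len(p[1])):
        text = text.replace(word, chr(BASE + number))
    return text


def restore_dictionary(text: str) -> str:
    """Inverse substitution, applied after decompression."""
    for number, word in enumerate(STATIC_DICT):
        text = text.replace(chr(BASE + number), word)
    return text


sample = "Osaka Department, Tokyo Corporation"
reduced = apply_dictionary(sample)
print(len(sample), len(reduced))
```

The substituted string is what then gets variable-length coded, so each dictionary word is coded as a single symbol.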

(Supplementary Note 37) The structured document conversion method according to any one of Supplementary Notes 33 to 36, wherein a conversion specification document is created that describes information for distinguishing the key elements from the non-key elements and information about the new element, and the description of the non-key elements in the structured document to be converted is converted based on the conversion specification document.

(Supplementary Note 38) The structured document conversion method according to Supplementary Note 37, wherein a reverse conversion that returns the description of the non-key elements to its original state is applied to the converted structured document based on the conversion specification document.
(Supplementary Note 39) The structured document conversion method according to Supplementary Note 37 or 38, wherein the conversion specification document is created as a structured document and a conversion execution procedure is given.

(Supplementary Note 40) The structured document conversion method according to any one of Supplementary Notes 33 to 39, wherein:
information indicating the type of character code system in effect at the time of compression is attached to the compressed character string;
when the converted structured document is reverse-converted, the type of character code system at the time of compression is recognized by referring to that information; and
the compressed character string is restored so that the recognized character code system is matched to the character code system in effect at the time of the reverse conversion.
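A minimal sketch of Supplementary Note 40, assuming a one-character marker (“U”, “S” or “E”, all hypothetical) prefixed to the compressed string to record the character code system in effect at compression time. Base64 again stands in for the 6-bit coder to keep the example short; the function names are illustrative.

```python
import base64

# Hypothetical one-character markers for the character code system in effect
# when the string was compressed (Python codec names on the right).
MARKER_TO_CODEC = {"U": "utf-8", "S": "shift_jis", "E": "euc_jp"}
CODEC_TO_MARKER = {codec: mark for mark, codec in MARKER_TO_CODEC.items()}


def pack_with_marker(text: str, codec: str) -> str:
    """Encode text in `codec`, turn the bytes into 6-bit ASCII characters
    (Base64 as a stand-in coder) and prefix the code-system marker."""
    payload = base64.b64encode(text.encode(codec)).decode("ascii")
    return CODEC_TO_MARKER[codec] + payload


def unpack_to(packed: str, target_codec: str) -> bytes:
    """Recognize the compression-time code system from the marker, restore
    the text, and re-encode it in the code system used at reverse conversion."""
    source_codec = MARKER_TO_CODEC[packed[0]]
    text = base64.b64decode(packed[1:]).decode(source_codec)
    return text.encode(target_codec)


packed = pack_with_marker("東京", "shift_jis")
print(packed, unpack_to(packed, "utf-8"))
```

On reverse conversion the marker tells the decoder how to interpret the restored bytes, after which they can be re-encoded in whatever code system the receiving side uses.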

(Supplementary Note 41) The structured document conversion method according to any one of Supplementary Notes 33 to 40, wherein the ASCII code set used excludes the character codes related to tagging in structured documents.
(Supplementary Note 42) A data conversion method comprising:
performing variable-length coding that assigns shorter variable-length codes to the characters or character strings to be converted that appear more frequently; and
packing the binary data obtained by the variable-length coding 6 bits at a time into 1-byte converted data and outputting it.
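The packing step of Supplementary Note 42 on its own: each output byte carries 6 payload bits, so its value is always 0 to 63 and it can later be mapped onto a 64-character ASCII set (Supplementary Notes 41 and 43). The zero padding of the last group is an assumption; the function names are hypothetical.

```python
def pack_6bit(bits: str) -> bytes:
    """Pack a '0'/'1' string 6 bits per output byte; the top 2 bits of
    each byte stay 0 and the final group is zero-padded (assumption)."""
    padded = bits + "0" * (-len(bits) % 6)
    return bytes(int(padded[i:i + 6], 2) for i in range(0, len(padded), 6))


def unpack_6bit(data: bytes, nbits: int) -> str:
    """Inverse of pack_6bit, given the original bit count."""
    return "".join(format(b, "06b") for b in data)[:nbits]


bits = "1111010100"  # e.g. the output of a variable-length coder
data = pack_6bit(bits)
print(list(data))
```

The cost of this layout is 8 bits of storage per 6 bits of code, a 4:3 expansion of the coded bitstream, which the variable-length coding has to outweigh.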

(Supplementary Note 43) The data conversion method according to Supplementary Note 42, wherein the 6-bit data packed into each converted data byte is converted into a character code according to the ASCII (American Standard Code for Information Interchange) code, and the character code obtained for each converted data byte is output as the compression result for the characters or character strings to be converted.
(Supplementary Note 44) The data conversion method according to Supplementary Note 43, wherein, when the compression result is restored, each character code in the compression result is converted into 6-bit data according to the ASCII code, and the characters or character strings to be converted are restored from the 6-bit data obtained for each character code.
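The restoration of Supplementary Note 44 reverses those steps: look up each character's 6-bit value, concatenate, discard the padding and decode the prefix-free code. This sketch assumes the same 64-character set as a matching encoder, and that the code table and bit count are available to the decoder; `example_code` is a hypothetical three-symbol code.

```python
# Assumed 64-character set (the Base64 alphabet); the reverse lookup
# table maps each ASCII character back to its 6-bit value.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")
SIX_BIT_VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def decompress(packed: str, nbits: int, code: dict[str, str]) -> str:
    """Map ASCII characters back to 6-bit data, drop the zero padding and
    decode the prefix-free variable-length code."""
    bits = "".join(format(SIX_BIT_VALUE[ch], "06b") for ch in packed)[:nbits]
    decode_table = {word: ch for ch, word in code.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in decode_table:  # prefix-free: the first match is a symbol
            out.append(decode_table[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)


example_code = {"a": "1", "b": "01", "c": "00"}
print(decompress("9Q", 10, example_code))
```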

(Supplementary Note 45) The data conversion method according to Supplementary Note 43 or 44, wherein the ASCII code set used excludes the character codes related to tagging in structured documents.

FIG. 1 is a diagram for explaining the principle of the structured document conversion method according to the first embodiment of the present invention; (A) shows the in-memory representation of an XML document to be converted, and (B) shows the in-memory representation of the XML document obtained by applying the structured document conversion method of the first embodiment to the XML document shown in (A).
FIG. 2 is a diagram for explaining a system to which the structured document conversion method according to the first embodiment of the present invention is applied, and the flow of conversion/reverse conversion processing in that system.
FIG. 3: (A) shows a specific example of an XML document to be converted, and (B) to (F) show first to fifth specific examples of conversion results obtained by applying the structured document conversion method of the first embodiment to the XML document shown in (A).
FIG. 4: (A) shows a specific example of an XML document (table format) to be converted, and (B) and (C) show first and second specific examples of conversion results obtained by applying the structured document conversion method of the first embodiment when the XML document shown in (A) is in table format.
FIG. 5 is a diagram showing a specific example of an XML document to be converted.
FIG. 6 is a diagram showing a first specific example of a conversion result obtained by applying the structured document conversion method of the first embodiment to the XML document shown in FIG. 5.
FIG. 7 is a diagram showing a second specific example of a conversion result obtained by applying the structured document conversion method of the first embodiment to the XML document shown in FIG. 5.
FIG. 8 is a diagram showing a third specific example of a conversion result obtained by applying the structured document conversion method of the first embodiment to the XML document shown in FIG. 5.
FIG. 9 is a diagram showing a specific example of a conversion specification document in the first embodiment.
FIG. 10 is a diagram showing a specific example of a conversion style sheet created from the conversion specification document shown in FIG. 9 in the first embodiment.
FIG. 11 is a diagram showing a specific example of a reverse conversion style sheet created from the conversion specification document shown in FIG. 9 in the first embodiment.
FIG. 12 is a diagram showing a specific example of a conversion specification document for tag name shortening in the first embodiment.
FIG. 13 is a diagram showing a specific example of a conversion specification document having a function for specifying the data format (whether or not it is a table format) in the first embodiment.
FIG. 14 is a diagram showing a specific example of a conversion specification document having a function for specifying the data format (whether or not tag name shortening conversion is performed) in the first embodiment.
FIG. 15 is a diagram showing a first specific example of a conversion specification document in the first embodiment for the case where the non-key elements in a record form a hierarchical structure and have attributes.
FIG. 16 is a flowchart for explaining the procedure for creating a conversion specification document in the first embodiment when the non-key elements in a record form a hierarchical structure and have attributes.
FIG. 17 is a diagram showing a second specific example of a conversion specification document in the first embodiment for the case where the non-key elements in a record form a hierarchical structure and have attributes.
FIG. 18 is a flowchart for explaining the conversion processing procedure of the structured document conversion method according to the first embodiment of the present invention.
FIG. 19 is a flowchart for explaining the reverse conversion processing procedure of the structured document conversion method according to the first embodiment of the present invention.
FIG. 20: (A) and (B) are flowcharts for explaining the procedures for creating the conversion style sheet and the reverse conversion style sheet, respectively, in the first embodiment, and (C) and (D) are flowcharts for explaining modifications of the conversion processing procedure and the reverse conversion processing procedure, respectively, of the structured document conversion method according to the first embodiment of the present invention.
FIG. 21: (A) and (B) are flowcharts for explaining modifications of the procedures for creating the conversion style sheet and the reverse conversion style sheet, respectively, in the first embodiment.
FIG. 22 is a diagram showing, to explain the principle of the structured document conversion method according to the second embodiment of the present invention, the in-memory representation of the XML document obtained by applying the structured document conversion method of the second embodiment to the XML document shown in FIG. 1(A).
FIG. 23 is a diagram showing a first specific example of a conversion result obtained by applying the structured document conversion method of the second embodiment to the XML document shown in FIG. 4(A).
FIG. 24 is a diagram showing a second specific example of a conversion result obtained by applying the structured document conversion method of the second embodiment to the XML document shown in FIG. 4(A).
FIG. 25 is a diagram showing a third specific example of a conversion result obtained by applying the structured document conversion method of the second embodiment to the XML document shown in FIG. 4(A).
FIG. 26 is a diagram showing a fourth specific example of a conversion result obtained by applying the structured document conversion method of the second embodiment to the XML document shown in FIG. 4(A).
FIG. 27 is a diagram showing a specific example of a conversion specification document in the second embodiment.
FIG. 28 is a diagram showing a specific example of a conversion style sheet created from the conversion specification document shown in FIG. 27 in the second embodiment.
FIG. 29 is a diagram showing a specific example of a reverse conversion style sheet created from the conversion specification document shown in FIG. 27 in the second embodiment.
FIG. 30 is a flowchart for explaining the procedure for creating a conversion specification document in the second embodiment when the non-key elements in a record form a hierarchical structure and have attributes.
FIG. 31 is a flowchart for explaining a first example of the conversion processing procedure of the structured document conversion method according to the second embodiment of the present invention.
FIG. 32 is a flowchart for explaining a first example of the reverse conversion processing procedure of the structured document conversion method according to the second embodiment of the present invention.
FIG. 33 is a flowchart for explaining a second example of the conversion processing procedure of the structured document conversion method according to the second embodiment of the present invention.
FIG. 34 is a flowchart for explaining a second example of the reverse conversion processing procedure of the structured document conversion method according to the second embodiment of the present invention.
FIG. 35 is a flowchart for explaining a third example of the conversion processing procedure of the structured document conversion method according to the second embodiment of the present invention.
FIG. 36 is a flowchart for explaining a third example of the reverse conversion processing procedure of the structured document conversion method according to the second embodiment of the present invention.
FIG. 37 is a flowchart for explaining a fourth example of the conversion processing procedure of the structured document conversion method according to the second embodiment of the present invention.
FIG. 38 is a flowchart for explaining a fourth example of the reverse conversion processing procedure of the structured document conversion method according to the second embodiment of the present invention.
FIG. 39: (A) and (B) are flowcharts for explaining the procedures for creating the conversion style sheet and the reverse conversion style sheet, respectively, in the second embodiment, and (C) and (D) are flowcharts for explaining a fifth example of the conversion processing procedure and the reverse conversion processing procedure, respectively, of the structured document conversion method according to the second embodiment of the present invention.
FIG. 40 is a diagram showing, to explain the principle of the structured document conversion method according to the third embodiment of the present invention, the in-memory representation of the XML document obtained by applying the structured document conversion method of the third embodiment to the XML document shown in FIG. 1(A).
FIG. 41 is a diagram for explaining the data conversion method used in the third embodiment; (A) illustrates the flow of the data conversion processing (compression processing), and (B) illustrates the flow of the data reverse conversion processing (decompression processing).
FIG. 42 is a diagram showing a specific example of the lookup table for character code conversion in the third embodiment.
FIG. 43 is a diagram for explaining a system to which the structured document conversion method according to the third embodiment of the present invention is applied, and the flow of conversion/reverse conversion processing in that system.
FIG. 44: (A) and (B) show first and second specific examples of conversion results obtained by applying the structured document conversion method of the third embodiment to the XML document shown in FIG. 4(A).
FIG. 45 is a diagram showing a specific example of a compressed character string to which information indicating the type of character code system has been attached, in the third embodiment.
FIG. 46 is a diagram showing a specific example of a conversion specification document in the third embodiment.
FIG. 47 is a flowchart for explaining the conversion processing procedure of the structured document conversion method according to the third embodiment of the present invention.
FIG. 48 is a flowchart for explaining the reverse conversion processing procedure of the structured document conversion method according to the third embodiment of the present invention.

Explanation of Reference Signs

10 Data structure conversion / reverse conversion mechanism (structured document conversion processor)
10A Data structure conversion / reverse conversion mechanism
11 XSLT conversion unit (structured document conversion processor)
12 XSLT structure conversion unit (structured document conversion processor)
13 XSLT reverse conversion unit (structured document conversion processor)
20 Standard API
30 Application software (application)
41 Static word dictionary for compression (static dictionary)
42, 43 Code tables
44 Static word dictionary for restoration (static dictionary)
45 Lookup table (LUT) for character code conversion

Claims

1. A structured document conversion apparatus having a processing unit for converting a structured document, wherein the processing unit:
reads distinction information that classifies the elements constituting the structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to that data processing;
creates, for the non-key elements identified in the distinction information, a new element given a predetermined tag name and a predetermined attribute name;
creates a tag name string containing the tag names of the non-key elements by joining those tag names with delimiters, and describes the tag name string in the new element as the attribute value corresponding to the predetermined attribute name;
creates a content string containing the contents of the non-key elements by joining those contents with delimiters, and describes the content string as the content of the new element; and
describes the key elements identified in the distinction information as they are in the converted structured document.
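A minimal sketch of the conversion in claim 1, together with its reverse, using `xml.etree.ElementTree`. The names `nk` and `tags`, the delimiter “|”, and the restriction to flat non-key elements without attributes are assumptions; the claim fixes none of them. The reverse function also assumes that the non-key elements originally followed the key elements, since the merged form does not record their positions.

```python
import xml.etree.ElementTree as ET

DELIM = "|"  # hypothetical delimiter, assumed absent from tag names and contents


def merge_non_key(record: ET.Element, key_tags: set[str],
                  new_tag: str = "nk", attr_name: str = "tags") -> ET.Element:
    """Claim 1 style conversion of one record (flat non-key elements only)."""
    out = ET.Element(record.tag, record.attrib)
    names, contents = [], []
    for child in record:
        if child.tag in key_tags:
            out.append(child)                  # key element: kept as it is
        else:
            names.append(child.tag)            # joined into the attribute value
            contents.append(child.text or "")  # joined into the element content
    if names:
        new_elem = ET.SubElement(out, new_tag, {attr_name: DELIM.join(names)})
        new_elem.text = DELIM.join(contents)
    return out


def split_non_key(converted: ET.Element, new_tag: str = "nk",
                  attr_name: str = "tags") -> ET.Element:
    """Reverse conversion: rebuild one element per delimited name/content pair."""
    out = ET.Element(converted.tag, converted.attrib)
    for child in converted:
        if child.tag != new_tag:
            out.append(child)
            continue
        names = child.get(attr_name, "").split(DELIM)
        contents = (child.text or "").split(DELIM)
        for name, text in zip(names, contents):
            ET.SubElement(out, name).text = text
    return out


original = "<rec><id>001</id><name>Tanaka</name><tel>03-1234</tel></rec>"
converted = merge_non_key(ET.fromstring(original), {"id"})
print(ET.tostring(converted, encoding="unicode"))
```

After conversion, a parser builds a single element for all non-key fields of a record instead of one element per field, which is where the memory saving described in the abstract would come from.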

2. The structured document conversion apparatus according to claim 1, wherein the processing unit reads a conversion specification document that describes the distinction information and the tag name of the new element, and, based on the conversion specification document, converts the description of the non-key elements in the structured document to be converted.
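Claim 2 drives the same conversion from a conversion specification document. The XML vocabulary below (`conversion-spec`, `record`, `new-element`, `key`) is hypothetical, as are the fixed attribute name `tags` and the delimiter; the sketch only illustrates how the distinction information and the new element's tag name could be read from such a document and applied to every record.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversion specification document (not the patent's format).
SPEC = """<conversion-spec record="rec" new-element="nk">
  <key>id</key>
  <key>date</key>
</conversion-spec>"""


def load_spec(spec_xml: str) -> dict:
    """Read the record tag, the new element's tag name and the key tags."""
    root = ET.fromstring(spec_xml)
    return {
        "record": root.get("record"),
        "new_element": root.get("new-element"),
        "key_tags": {k.text.strip() for k in root.findall("key")},
    }


def apply_spec(doc: ET.Element, spec: dict,
               attr_name: str = "tags", delim: str = "|") -> ET.Element:
    """Merge the non-key children of every record in place."""
    for rec in list(doc.iter(spec["record"])):  # snapshot before modifying
        non_key = [c for c in rec if c.tag not in spec["key_tags"]]
        if not non_key:
            continue
        for c in non_key:
            rec.remove(c)
        nk = ET.SubElement(rec, spec["new_element"],
                           {attr_name: delim.join(c.tag for c in non_key)})
        nk.text = delim.join(c.text or "" for c in non_key)
    return doc


doc = ET.fromstring(
    "<table><rec><id>1</id><date>2008</date><name>A</name></rec>"
    "<rec><id>2</id><date>2009</date><name>B</name><tel>9</tel></rec></table>")
apply_spec(doc, load_spec(SPEC))
print(ET.tostring(doc, encoding="unicode"))
```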

3. A structured document conversion apparatus having a processing unit for converting a structured document, wherein the processing unit:
reads distinction information that classifies the elements constituting the structured document to be converted into key elements, which are subject to data processing on the structured document, and non-key elements, which are not subject to that data processing;
creates, for the non-key elements identified in the distinction information, a new element given a predetermined tag name;
creates a character string in which the symbols representing tags in the description of the non-key elements are replaced, by entity reference description, with entity reference strings unrelated to tagging, and describes that character string as the content of the new element; and
describes the key elements identified in the distinction information as they are in the converted structured document.
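Claim 3 can also be sketched with `xml.etree.ElementTree`. Assigning markup to an element's `.text` makes the serializer write “<” and “>” as `&lt;` and `&gt;`, which matches the entity-reference replacement in the claim. On re-parsing, the text comes back as the original markup and can be parsed again to restore the elements. `nk` and the wrapper tag `w` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def convert_escaped(record: ET.Element, key_tags: set[str],
                    new_tag: str = "nk") -> ET.Element:
    """Keep key elements; carry the non-key markup as escaped text."""
    out = ET.Element(record.tag, record.attrib)
    markup = []
    for child in record:
        if child.tag in key_tags:
            out.append(child)  # key element: described as it is
        else:
            markup.append(ET.tostring(child, encoding="unicode"))
    if markup:
        # The serializer escapes "<", ">" and "&" in text as entity references.
        ET.SubElement(out, new_tag).text = "".join(markup)
    return out


def restore_escaped(converted: ET.Element, new_tag: str = "nk") -> ET.Element:
    """Reverse conversion: re-parse the carried markup into elements."""
    out = ET.Element(converted.tag, converted.attrib)
    for child in converted:
        if child.tag == new_tag:
            wrapper = ET.fromstring(f"<w>{child.text or ''}</w>")
            out.extend(list(wrapper))
        else:
            out.append(child)
    return out


original = "<rec><id>001</id><name>Tanaka</name><tel>03-1234</tel></rec>"
converted = convert_escaped(ET.fromstring(original), {"id"})
serialized = ET.tostring(converted, encoding="unicode")
print(serialized)
```

A parser reading the converted document sees the whole non-key description as a single text value, so no element nodes are created for it until restoration is actually needed.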