JP4801368B2

JP4801368B2 - Image processing apparatus, image processing method, image processing program, and recording medium

Info

Publication number: JP4801368B2
Application number: JP2005117216A
Authority: JP
Inventors: 敦久斉藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2005-04-14
Filing date: 2005-04-14
Publication date: 2011-10-26
Anticipated expiration: 2025-04-14
Also published as: JP2006293917A

Description

The present invention relates to an image processing apparatus, an image processing method, an image processing program, and a recording medium, and in particular to an image processing apparatus, method, program, and recording medium that determine the text information contained as an image in given image data.

In the past, discussions of security emphasized only external attacks such as viruses. In recent years, however, information leaks from the inside, such as leaks of customer data and private information, have also drawn attention, for companies and individuals alike. Measures such as blocking the exits with a firewall are not sufficient against such leaks; countermeasures must be matched to the value of each information asset and to how it is used.

In a company, information assets are generally created, accumulated, and used in the form of documents. It is therefore very important to take the confidentiality of these internal documents into account and to control how they are handled according to that confidentiality. Against this background, various technologies for restricting the handling of corporate documents already exist.

For example, in the technique described in Patent Document 1, each document is given a list (an ACL, Access Control List) indicating what kind of access each user is permitted, and the system operates according to the ACL to keep the document confidential. Confidentiality can be ensured inside a system that operates according to the ACL, but it is not maintained for a document that a user permitted by the ACL has taken outside the system.

In the technique described in Patent Document 2, the groups that hold access rights are described as tag attributes inside an XML (eXtensible Markup Language) document, and encryption or an expiration date can be specified, so that the access rights for the XML document are retained even after the document leaves the system.

In the technique described in Patent Document 3, a document is converted into non-printable data and print data, both of which are stored in association with the original document. Non-printable data is sent in response to a viewing request from a client, and print data is sent to a printer or the like in response to a print request. Preparing in advance a document suited to each kind of requested access prevents information beyond the requested access rights from leaking.

Patent Document 1: JP H06-4530 A
Patent Document 2: JP 2001-273285 A
Patent Document 3: JP 2002-342060 A

However, the techniques described in Patent Documents 1, 2, and 3 all require the user to define or set some kind of information. With the technique of Patent Document 1, access control cannot be performed unless an ACL has been set in advance. With the technique of Patent Document 2, control is impossible unless access-control information has been added to the document. With the technique of Patent Document 3, control cannot be performed unless dedicated files corresponding to the access rights have been generated in advance.

In other words, every conventional technique works only once security information such as access rights has been assigned at the user's discretion. Moreover, each works only while the document is inside the system, or only while the information added by the system is present. It is therefore difficult, for example, to protect image data read from a paper document by a scanner or the like immediately, that is, before it has been registered in a given system.

One conceivable approach is to extract text information by OCR (Optical Character Recognition) from the image data read by a scanner or the like, compare the extracted text information with the contents of documents that are managed with security information in a document management system or the like, and apply to the image data the security information set for a document whose contents are similar to the extracted text information.

In that case, however, depending on the OCR recognition rate, the judgment of whether the text information extracted from the image data is similar to a document in the document management system may not yield an appropriate result.

The present invention has been made in view of the above, and its object is to provide an image processing apparatus, an image processing method, an image processing program, and a recording medium that can appropriately determine the text information contained as an image in image data.

To solve the above problem, the present invention is an image processing apparatus that determines the text information contained as an image in given image data, comprising: holding means that holds first text information in association with second text information extracted by optical character recognition from image data containing the first text information as an image; text information extraction means that extracts third text information from the given image data by optical character recognition; and determination means that, based on a comparison between the third text information and the second text information held by the holding means, determines that the text information contained in the given image data is the first text information associated with that second text information.
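Purely as an illustration, and not as the claimed implementation, the relationship among the three kinds of text information can be sketched in Python as follows. The class name `HeldTextStore`, its method names, the use of `difflib.SequenceMatcher` for the comparison, and the threshold of 0.9 are all assumptions introduced for this sketch; the text above specifies only that a comparison is made.

```python
from difflib import SequenceMatcher


class HeldTextStore:
    """Hypothetical holding means: pairs each first text with its OCR'd second text."""

    def __init__(self):
        self._pairs = []  # list of (first_text, second_text)

    def hold(self, first_text, second_text):
        """Hold the first text in association with the second (OCR) text."""
        self._pairs.append((first_text, second_text))

    def determine(self, third_text, threshold=0.9):
        """Return the first text whose held OCR result best matches third_text,
        or None when no held OCR result reaches the (assumed) threshold."""
        best_first, best_score = None, 0.0
        for first_text, second_text in self._pairs:
            score = SequenceMatcher(None, third_text, second_text).ratio()
            if score > best_score:
                best_first, best_score = first_text, score
        return best_first if best_score >= threshold else None


store = HeldTextStore()
# Second text: an OCR result in which "l" was misread as "1" (made-up example).
store.hold("Confidential: Q3 sales plan", "Confidentia1: Q3 sa1es p1an")
# Third text: OCR of a newly scanned image showing the same misreadings.
result = store.determine("Confidentia1: Q3 sa1es p1an")
```

Because the comparison is made against the held OCR result rather than against the original text, a scan that reproduces the same recognition errors can still resolve to the correct first text.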

Such an image processing apparatus can appropriately determine the text information contained as an image in image data.

To solve the above problem, the present invention may also take the form of an image processing method performed by the image processing apparatus, an image processing program that causes a computer to execute the image processing method, or a recording medium on which the image processing program is recorded.

According to the present invention, it is possible to provide an image processing apparatus, an image processing method, an image processing program, and a recording medium that can appropriately determine the text information contained as an image in image data.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows an example configuration of a security management system according to an embodiment of the present invention. In FIG. 1, the security management system 1 consists of a document server 20, a multi-function device 50, and a security attribute estimation server 10 connected by a network such as a LAN, which may be wired or wireless. The document server 20, the multi-function device 50, the security attribute estimation server 10, and so on are assumed to be located within a space in which the confidentiality of information is to be maintained, such as a single company or office.

The document server 20, together with one or more clients (clients 22a, 22b, and so on), forms what is called a document management system. It has a document DB 21 that manages electronic document data uploaded from the client 22a or the like (hereinafter simply "documents") in association with various attribute values. Periodically, or each time a document is uploaded from the client 22a or the like, the document server 20 sends (uploads) the document and the values of its security attributes (hereinafter "security attribute values") to the security attribute estimation server 10.

Here, a security attribute is an attribute associated with a document that affects security management, such as an attribute used to decide access control for the document. Which attributes qualify depends on which attributes the operator wants to focus on when protecting documents, but examples include: affiliation (a department in the company, that is, the scope managed by a responsible manager); document type (personnel-related, accounting-related, related to a particular project, and so on); related persons; related groups; secrecy level (top secret, department-confidential, company-confidential, group-confidential, and so on); secrecy period (the date until which the secrecy level must be maintained); validity period (the date until which the document is in effect); and retention period (the date until which a document that must be kept by law has to be retained).

Access control based on security attributes is described in detail in JP 2004-094401 A, JP 2004-094405 A, JP 2004-102635 A, and JP 2004-102907 A. As those publications make clear, access control for a document is decided by applying its security attribute values to a predetermined security policy. In this embodiment, therefore, the security attribute values correspond to security information.

The multi-function device 50 is a device in which printer, fax, copy, scanner, and similar functions are implemented in a single housing. It need not be a multi-function device, however; a device with any one of these functions will do. To prevent leakage of the information contained in confidential documents and the like, the multi-function device 50 transfers the image data read from an original during copying, scanning, fax transmission, or the like to the security attribute estimation server 10, and decides whether to permit the copy, scan, fax transmission, or the like according to the estimated security attribute values returned by the security attribute estimation server 10.

In this embodiment, the security attribute estimation server 10 corresponds to the image processing apparatus that determines the text information contained as an image in given image data. More specifically, the security attribute estimation server 10 stores, in a text information DB 11, the text information extracted from each document sent by the document server 20 together with that document's security attribute values. It compares the text information extracted from image data sent by the multi-function device 50 with the text information of each document stored in the text information DB 11, and thereby identifies stored text information that is identical or similar to that of the image data. It then estimates the security attribute values to apply to the image data from the security attribute values of the document to which the identified text information belongs. The estimated security attribute values are sent to the multi-function device 50 as the estimation result, and the multi-function device 50 uses them to decide whether to permit copying, scanning, fax transmission, or the like.
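The estimation flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the embodiment's implementation: the record layout of `TEXT_INFO_DB`, the function name `estimate_security_attributes`, the similarity measure (`difflib.SequenceMatcher`), and the threshold of 0.6 are all hypothetical, and the sample records are invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical contents of the text information DB 11 (invented sample data).
TEXT_INFO_DB = [
    {"text": "Personnel transfer list for fiscal 2005",
     "security": {"secret_level": "top secret", "type": "personnel"}},
    {"text": "Cafeteria menu for next week",
     "security": {"secret_level": "none", "type": "general"}},
]


def estimate_security_attributes(target_text, db, threshold=0.6):
    """Return the security attribute values of the most similar stored text,
    or None if nothing in the DB is similar enough (an assumed policy)."""
    best_record, best_score = None, 0.0
    for record in db:
        score = SequenceMatcher(None, target_text, record["text"]).ratio()
        if score > best_score:
            best_record, best_score = record, score
    if best_record is None or best_score < threshold:
        return None
    return best_record["security"]


# Text extracted from a scan, with a lowercase "l" read as the digit "1".
estimate = estimate_security_attributes(
    "Personne1 transfer 1ist for fiscal 2005", TEXT_INFO_DB
)
```

In this sketch the caller, standing in for the multi-function device 50, would receive either a dictionary of security attribute values or `None`; how `None` should be handled is a policy decision that the passage above does not address.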

In other words, the security information, such as access rights, that is set for a document identical or similar to the image data is applied to the image data, which prevents the information leaks that would occur if important documents could be read unconditionally.

The security attribute estimation server 10 is now described in more detail. FIG. 2 shows an example functional configuration of the security attribute estimation server according to the embodiment of the present invention. In FIG. 2, the security attribute estimation server 10 comprises the text information DB 11, text information storage means 12, and security attribute estimation means 13.

The main function of the text information storage means 12 is to accumulate in the text information DB 11 the documents and security attribute values sent from the document server 20. It comprises a data receiving unit 121, a text information extraction unit 122, an image information forming unit 123, a misrecognition information extraction unit 124, a data storage unit 125, a data transmission unit 126, and so on.

The data receiving unit 121 receives a document and its security attribute values. It outputs the document to both the text information extraction unit 122 and the image information forming unit 123, and outputs the security attribute values to the data storage unit 125.

The text information extraction unit 122 extracts the text information contained in the document as symbols and outputs it to the misrecognition information extraction unit 124 and the data storage unit 125. "Text information contained as symbols" is used in contrast to "text information contained as an image", described later, and means text information recorded in the document as information that identifies the characters themselves, such as character codes.

Text information contained as symbols can be extracted with existing software and tools. For an MS Word document, for example, text information can be obtained by opening the document in MS Word and choosing a text document as the file type when saving. An MS PowerPoint document can be opened, saved once in RTF (Rich Text Format), and then saved as text using MS Word. Text information can likewise be obtained from Ichitaro documents, PDF documents, and other non-Microsoft formats by using the corresponding software.

The text information extraction unit 122 also extracts text information from the image data that the image information forming unit 123 forms from the document, and outputs it to the misrecognition information extraction unit 124. Text is extracted from the image data using commonly available OCR (Optical Character Recognition) technology. Hereinafter, the text information that the text information extraction unit 122 extracts from the document is called "text information from the document", and the text information it extracts from the image data is called "text information from the image".

The image information forming unit 123 generates image data from the document. Specifically, it performs the same kind of image forming process as a printer or similar device. Ideally this process would correspond to every printer model on which the document might be output, but it is enough to cover reasonably typical cases, for example by printer type (laser, inkjet, and so on) or by manufacturer.

The misrecognition information extraction unit 124 detects OCR misrecognition by comparing the text information from the document with the text information from the image. That is, it treats the text information from the document as the correct recognition result and determines whether the text information from the image differs from it anywhere. When misrecognition is detected, the unit associates the compared text information from the document with the text information from the image and outputs the pair to the data storage unit 125 as misrecognition information.

The data storage unit 125 registers in the text information DB 11, in association with one another, the security attribute values output by the data receiving unit 121, the text information from the document extracted by the text information extraction unit 122, and the misrecognition information output by the misrecognition information extraction unit 124.

The data transmission unit 126 returns the result of this series of processes to the document server 20.

The main function of the security attribute estimation means 13, on the other hand, is to estimate the security attribute values for image data transferred from the multi-function device 50 on the basis of the information stored in the text information DB 11. It comprises a data receiving unit 131, a text information extraction unit 132, a similarity calculation unit 133, a data reading unit 134, a security attribute estimation unit 135, a data transmission unit 136, and so on.

The data receiving unit 131 receives image data from the multi-function device 50. The text information extraction unit 132 uses OCR technology to extract the text information contained as an image in the image data. The similarity calculation unit 133 compares the text information extracted by the text information extraction unit 132 (hereinafter the "target text") with the text information from the documents stored in the text information DB 11 and calculates the similarity between them.

The data reading unit 134 reads text information from the documents and misrecognition information out of the text information DB 11 at the request of the similarity calculation unit 133, and reads security attribute values out of the text information DB 11 at the request of the security attribute estimation unit 135. The security attribute estimation unit 135 estimates, according to the similarity calculated by the similarity calculation unit 133, the security attribute values to apply to the image data from which the target text was extracted. The data transmission unit 136 returns the estimation result, that is, the security attribute values to apply to the image data, to the multi-function device 50.

The data receiving units 121 and 131 may be implemented as a single shared unit, as may the data transmission units 126 and 136. Data exchange through these units, that is, communication between the security attribute estimation server 10 and the document server 20 or the multi-function device 50, may use SOAP (Simple Object Access Protocol), which is based on HTTP (HyperText Transfer Protocol) and XML.

FIG. 3 shows an example hardware configuration of the security attribute estimation server according to the embodiment of the present invention. The security attribute estimation server 10 in FIG. 3 has a drive device 100, an auxiliary storage device 102, a memory device 103, an arithmetic processing device 104, and an interface device 105, all connected to one another by a bus B.

The program that implements the processing of the security attribute estimation server 10 is provided on a recording medium 101 such as a CD-ROM. When the recording medium 101 holding the program is set in the drive device 100, the program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100.

The auxiliary storage device 102 stores the installed program along with the necessary files, data, and so on. When an instruction to start the program is given, the memory device 103 reads the program from the auxiliary storage device 102 and holds it. The arithmetic processing device 104 executes the functions of the security attribute estimation server 10 according to the program held in the memory device 103. The interface device 105 serves as the interface for connecting to the network.

The processing procedure of the security management system 1 is described next. FIG. 4 is a sequence diagram of the processing performed when the document server uploads a document and its security attribute values.

In step S101, the document server 20 sends a document and its security attribute values to the security attribute estimation server 10. This step is executed as needed: periodically, when a document is uploaded to the document server 20, when a document stored in the document DB 21 of the document server 20 is updated, and so on. The transmission is not necessarily limited to a single document; multiple documents and their security attribute values may be sent.

On receiving the document and its security attribute values, the data receiving unit 121 of the security attribute estimation server 10 outputs the document to both the text information extraction unit 122 and the image information forming unit 123 (S102, S105) and outputs the security attribute values to the data storage unit 125 (S112).

On receiving the document, the text information extraction unit 122 extracts the text information contained in it as symbols (the text information from the document, labeled "text information A" in the figure) (S103) and outputs the extracted text information to the data storage unit 125 (S104).

The image information forming unit 123 generates image data from the document by its image forming process (S106) and outputs the image data to the text information extraction unit 122 (S107). The text information extraction unit 122 then uses OCR technology to extract the text information contained as an image in the image data (the text information from the image, labeled "text information B" in the figure) (S108). It outputs this text information from the image, together with the text information from the document extracted in step S103, to the misrecognition information extraction unit 124 (S109).
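Steps S103 to S109 can be pictured with the following Python sketch. It is illustrative only: the function `run_upload_steps` is hypothetical, the document is assumed to be plain text already, and `fake_render` and `fake_ocr` are stand-ins invented here so that the example runs without a real renderer or OCR engine.

```python
def run_upload_steps(document_text, render, ocr):
    """Return (text information A, text information B) for one document."""
    text_a = document_text          # S103: symbolic text taken from the document
    image = render(document_text)   # S106: image forming process
    text_b = ocr(image)             # S108: OCR on the formed image
    return text_a, text_b           # S109: both go to misrecognition detection


# Stand-ins for illustration only: "rendering" just encodes the text to bytes,
# and the fake OCR always misreads a lowercase "l" as the digit "1".
def fake_render(text):
    return text.encode("utf-8")


def fake_ocr(image):
    return image.decode("utf-8").replace("l", "1")


text_a, text_b = run_upload_steps("legal hold", fake_render, fake_ocr)
```

Passing the renderer and the OCR engine in as parameters mirrors the remark that the image forming process may be varied by printer type or manufacturer: the same flow could be run once per rendering profile.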

The misrecognition information extraction unit 124 compares the text information from the document with the text information from the image to determine whether the text information from the image contains OCR misrecognition (S110).

More specifically, the misrecognition information extraction unit 124 divides both the text information from the document and the text information from the image into one or more blocks (hereinafter "key blocks") according to a predetermined rule, and compares the two character strings key block by key block to determine, per key block, whether misrecognition has occurred. If the two strings do not match, the text information from the image for that key block is judged to contain a misrecognition.

キーブロックの単位は、例えば、以下のようなものが考えられる。
（１）テキスト情報全体をそのまま一つのキーブロックとする。この場合、テキスト情報の分割は行われない。
（２）改行コードをキーブロックの区切りとする。この場合、改行コードごとにテキスト情報が分割され、キーブロックとなる。
（３）句点、読点、カンマ、ピリオド、引用符等の通常の文書で利用される記号をキーブロックの区切りとする。
（４）タブ、スペースをキーブロックの区切りとする。 As the unit of the key block, for example, the following can be considered.
(1) The entire text information is directly used as one key block. In this case, the text information is not divided.
(2) A line feed code is used as a key block delimiter. In this case, the text information is divided for each line feed code to form a key block.
(3) Symbols used in ordinary documents, such as Japanese full stops (kuten), Japanese commas (touten), commas, periods, and quotation marks, are used as key block delimiters.
(4) Tabs and spaces are used as key block delimiters.

上記（１）〜（４）のうちいずれか一つを行ってもよいし、二つ以上を組み合わせてもよい。また、（１）〜（４）に挙げたような単純な区切りではなく、形態素解析により名詞であることを判別し、その名詞をキーブロックとしてもよい。 Any one of the above (1) to (4) may be performed, or two or more may be combined. In addition, instead of the simple divisions as listed in (1) to (4), it is possible to determine a noun by morphological analysis and use the noun as a key block.
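The delimiter options above can be sketched as a small splitting routine. This is a minimal illustration, not the patent's implementation: the rule names (`newline`, `punctuation`, `whitespace`), the exact delimiter characters, and the function name `split_key_blocks` are all assumptions introduced here. Morphological analysis is not covered.

```python
import re

# Hypothetical rule names; the source only numbers the options (1) to (4).
DELIMITERS = {
    "newline": r"[\r\n]+",            # option (2)
    "punctuation": r'[。、,."「」]+',   # option (3), an illustrative subset
    "whitespace": r"[ \t\u3000]+",    # option (4), including full-width space
}

def split_key_blocks(text, rules=("newline",)):
    """Split text into key blocks using one or more delimiter rules.

    An empty `rules` tuple corresponds to option (1): the whole text
    is kept as a single key block.
    """
    if not rules:
        return [text] if text else []
    pattern = "|".join(DELIMITERS[name] for name in rules)
    # Drop empty strings produced by leading or trailing delimiters.
    return [block for block in re.split(pattern, text) if block]
```

Passing several rule names combines the delimiters, which corresponds to using two or more of the options together.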

誤認識情報抽出部１２４は、誤認識が検出されたキーブロックについては、当該キーブロックに係る文書からのテキスト情報と画像からのテキスト情報を関連付けて誤認識情報としてデータ保存部１２５に出力する。なお、複数のキーブロックに関して誤認識が検出された場合は、複数の誤認識情報がデータ保存部１２５に出力される。すなわち、誤認識情報は、キーブロック単位における正しい認識結果と誤認識結果との対応表を構成する。 The misrecognition information extraction unit 124 associates the text information from the document related to the key block with the text information from the image and outputs it to the data storage unit 125 as misrecognition information for the key block in which the misrecognition is detected. Note that if misrecognition is detected for a plurality of key blocks, a plurality of misrecognition information is output to the data storage unit 125. That is, the misrecognition information constitutes a correspondence table between correct recognition results and misrecognition results in key block units.
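The correspondence table described here can be pictured as a mapping from each misrecognized key block to its correct form. The sketch below assumes, beyond what the source states, that the document text and the OCR text split into the same number of key blocks and can be paired by position; the function name `build_misrecognition_table` is hypothetical.

```python
def build_misrecognition_table(document_blocks, image_blocks):
    """Return {misrecognized block: correct block} for mismatching pairs.

    Assumption (not stated in the source): both lists come from the same
    splitting rule and line up one-to-one by position.
    """
    table = {}
    for correct, recognized in zip(document_blocks, image_blocks):
        if correct != recognized:
            # A mismatch means OCR misread this key block.
            table[recognized] = correct
    return table


# Usage: OCR read the letter "i" as the digit "1" in the first block.
table = build_misrecognition_table(
    ["confidential", "report"],
    ["conf1dential", "report"],
)
```

Keying the table by the misrecognized string makes the later lookup of an OCR result a single dictionary access.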

続いてデータ保存部１２５は、データ受信部１２１より出力されたセキュリティ属性値と、テキスト情報抽出部１２２より抽出された文書からのテキスト情報と、誤認識情報抽出部１２４より出力された誤認識情報とを関連付けてテキスト情報ＤＢ１１に登録し（Ｓ１１３、Ｓ１１４）、その処理結果を示す情報、例えば、正常終了又は異常終了の別等をデータ送信部１２６に出力する（Ｓ１１５）。データ送信部１２６は、処理結果を示す情報を文書サーバ２０に送信し（Ｓ１１６）、一連の処理が終了する。 Subsequently, the data storage unit 125 registers, in the text information DB 11, the security attribute value output from the data reception unit 121, the text information from the document extracted by the text information extraction unit 122, and the misrecognition information output from the misrecognition information extraction unit 124, in association with one another (S113, S114). It then outputs information indicating the processing result, for example whether the processing ended normally or abnormally, to the data transmission unit 126 (S115). The data transmission unit 126 transmits the information indicating the processing result to the document server 20 (S116), and the series of processing ends.

このように、予め文書サーバ２０における文書からテキスト情報を抽出し、抽出されたテキスト情報をセキュリティ属性値と関連付けてテキスト情報ＤＢ１１に蓄積しておくことで、後述するセキュリティ属性値の推定処理のたびに文書サーバ２０からの文書等の取得、及び当該文書からのテキスト情報の抽出等を行う必要がなく、セキュリティ属性値の推定処理を高速化することができる。 In this way, text information is extracted in advance from the documents in the document server 20 and stored in the text information DB 11 in association with the security attribute values. As a result, there is no need to acquire a document from the document server 20 and extract text information from it each time the security attribute value estimation process described later is performed, so the estimation process can be sped up.

なお、ステップＳ１０６の画像データの生成処理においては、複数の機種のプリンタ装置によるそれぞれの画像形成処理を用いて一つの文書から画像特性の異なる複数の画像データを生成しておくことが好ましい。その場合、ステップＳ１０８において、画像特性の異なるそれぞれの画像データよりテキスト情報が抽出され、ステップＳ１１０において、画像特性の異なるそれぞれの画像データより抽出されたテキスト情報について誤認識の判定及び誤認識情報が生成されることが好ましい。 In the image data generation process of step S106, it is preferable to generate a plurality of image data having different image characteristics from one document, using the image forming processes of a plurality of printer models. In that case, it is preferable that text information be extracted from each of these image data in step S108, and that the misrecognition determination be performed and misrecognition information be generated for the text information extracted from each of them in step S110.

また、ステップＳ１０６において生成された画像データに対してスキャナ装置による画像処理等に基づく画像の変化をシミュレートして、複合機５０において実際に印刷及びスキャンが行われたものに近い画像データを生成し、その画像データからテキスト情報を抽出するようにしてもよい。 Further, the image data generated in step S106 is simulated for image change based on image processing by the scanner device, and image data close to what is actually printed and scanned in the multi-function device 50 is generated. Then, text information may be extracted from the image data.

更に、実際に印刷された原稿からスキャナで実際に読み取られた画像データを文書と共に文書ＤＢ２１に蓄積しておき、文書ＤＢ２１に蓄積されている画像データよりテキスト情報を抽出してもよい。 Further, the image data actually read by the scanner from the actually printed document may be stored in the document DB 21 together with the document, and the text information may be extracted from the image data stored in the document DB 21.

いずれの場合においても、印刷時やスキャナ時にユーザが行うと想定される全ての設定に基づく画像データを用意することが好ましい。 In any case, it is preferable to prepare image data based on all the settings assumed to be performed by the user at the time of printing or scanning.

続いて、テキスト情報ＤＢ１１に蓄積された情報を利用して、セキュリティ属性推定サーバ１０が、複合機５０より転送される画像データのセキュリティ属性値を推定する処理について説明する。図５は、複合機により読み取られた画像データに対するセキュリティ属性値の推定処理を説明するためのシーケンス図である。 Next, a process in which the security attribute estimation server 10 estimates the security attribute value of image data transferred from the multi-function device 50 using information stored in the text information DB 11 will be described. FIG. 5 is a sequence diagram for explaining a security attribute value estimation process for image data read by the multifunction machine.

ステップＳ１２１において、複合機５０は、スキャンされた画像データと共に当該画像データのセキュリティ属性値の推定要求をセキュリティ属性推定サーバ１０に送信する。画像データの送信は、スキャナ又はコピー機能によってスキャンが実行されるタイミングで随時行ってもよいし、スキャンされた画像データがある程度蓄積されたタイミングや定期的に複数の画像データについてまとめて行ってもよい。 In step S121, the multi-function device 50 transmits the scanned image data, together with a request to estimate the security attribute value of that image data, to the security attribute estimation server 10. The image data may be transmitted each time a scan is executed by the scanner or copy function, or may be transmitted in a batch for a plurality of image data, either once a certain amount of scanned image data has accumulated or at regular intervals.

ステップＳ１２１に続いてステップＳ１２２に進み、セキュリティ属性推定サーバ１０のデータ受信部１３１は、受信した画像データをテキスト情報抽出部１３２に出力する（Ｓ１２２）。テキスト情報抽出部１３２は、ＯＣＲ技術を利用して画像データよりテキスト情報を抽出し（Ｓ１２３）、抽出されたテキスト情報（対象テキスト）を類似度算出部１３３に出力する（Ｓ１２４）。類似度算出部１３３が、データ読み出し部１３４に文書からのテキスト情報と誤認識情報との読み出しを要求すると（Ｓ１２５）、データ読み出し部１３４は、テキスト情報ＤＢ１１に蓄積されている一部又は全ての文書からのテキスト情報及び誤認識情報を読み出し（Ｓ１２６）、類似度算出部１３３に出力する（Ｓ１２７）。 Following step S121, the process proceeds to step S122, where the data receiving unit 131 of the security attribute estimation server 10 outputs the received image data to the text information extraction unit 132 (S122). The text information extraction unit 132 extracts text information from the image data using OCR (S123) and outputs the extracted text information (the target text) to the similarity calculation unit 133 (S124). When the similarity calculation unit 133 requests the data reading unit 134 to read the text information from the documents and the misrecognition information (S125), the data reading unit 134 reads some or all of the text information from the documents and the misrecognition information stored in the text information DB 11 (S126) and outputs them to the similarity calculation unit 133 (S127).

類似度算出部１３３は、対象テキストとそれぞれの文書からのテキスト情報との類似度を、誤認識情報を利用して算出し（Ｓ１２８）、算出された類似度をセキュリティ属性推定部１３５に出力する（Ｓ１２９）。類似度算出部１３３による類似度の算出処理の詳細については後述する。 The similarity calculation unit 133 calculates the similarity between the target text and the text information from each document using the misrecognition information (S128), and outputs the calculated similarity to the security attribute estimation unit 135. (S129). Details of the similarity calculation processing by the similarity calculation unit 133 will be described later.

セキュリティ属性推定部１３５は、類似度に基づいて、対象テキストのセキュリティ属性値を推定するために参考となる文書からのテキスト情報（以下「参考テキスト」という。）を特定し、参考テキストのセキュリティ属性値の読み出しをデータ読み出し部１３４に要求する（Ｓ１３０）。なお、後述するように参考テキストは一つとは限らない。 Based on the similarity, the security attribute estimation unit 135 identifies text information (hereinafter referred to as “reference text”) from a reference document for estimating the security attribute value of the target text, and the security attribute of the reference text The data reading unit 134 is requested to read the value (S130). As will be described later, the reference text is not limited to one.

データ読み出し部１３４は、テキスト情報ＤＢ１１より参考テキストに関連付けられているセキュリティ属性値を読み出し（Ｓ１３１）、セキュリティ属性推定部１３５に出力する（Ｓ１３２）。セキュリティ属性推定部１３５は、所定の方法（以下「推定方法」という。）にしたがって、読み出されたセキュリティ属性値に基づいて対象テキストに適用するセキュリティ属性値を推定し（Ｓ１３３）、推定結果として対象テキストに適用するセキュリティ属性値をデータ送信部１３６に出力する（Ｓ１３４）。データ送信部１３６は、推定されたセキュリティ属性値を複合機５０に送信し（Ｓ１３５）、処理が終了する。 The data reading unit 134 reads the security attribute value associated with the reference text from the text information DB 11 (S131), and outputs it to the security attribute estimation unit 135 (S132). The security attribute estimation unit 135 estimates a security attribute value to be applied to the target text based on the read security attribute value according to a predetermined method (hereinafter referred to as “estimation method”) (S133), The security attribute value applied to the target text is output to the data transmission unit 136 (S134). The data transmission unit 136 transmits the estimated security attribute value to the multi-function device 50 (S135), and the process ends.

推定結果を受信した複合機５０は、推定されたセキュリティ属性値に基づいて、かかるセキュリティ属性値を有する文書に対するアクセス権情報を入手したり、自らアクセス権限を判定したり、又は、推定結果を文書管理責任者に通知し、その応答を利用してコピーやスキャン要求に対する処理を制御する。例えば、スキャンされた画像データを削除したり、コピーを中止したり、管理者にスキャンデータを送信したり、スキャンデータをログに関連付けて保存したり、管理者に警告を送信したり、操作パネルに警告を表示したりしてもよい。これらは、それぞれを単独で行ってもよいし、複数を組み合わせて行うのでもよい。 On receiving the estimation result, the multi-function device 50 uses the estimated security attribute value to obtain access right information for documents having that security attribute value, to determine the access authority itself, or to notify the person responsible for document management of the estimation result, and uses the response to control processing of the copy or scan request. For example, it may delete the scanned image data, cancel the copy, send the scan data to an administrator, save the scan data in association with a log, send a warning to an administrator, or display a warning on the operation panel. Each of these may be performed alone or in combination.

図５における類似度算出部１３３による対象テキストとそれぞれの文書からのテキスト情報との類似度の算出（Ｓ１２８）について説明する。類似度の算出は、公知の様々な技術を用いておこなってもよいが、例えば、以下のように行ってもよい
まず、対象テキストを一つ以上のキーブロックに分割する。ここでのキーブロックの単位は、上述したステップＳ１１０においてもちいられるキーブロックの単位と同じである必要がある。したがって、キーブロックの単位を規定する情報は、テキスト情報保存手段１２及びセキュリティ属性推定手段１３の双方からアクセス可能な記憶領域に保存されている必要がある。 The calculation of the similarity between the target text and the text information from each document (S128) by the similarity calculation unit 133 in FIG. 5 will be described. The similarity may be calculated using various known techniques. For example, the similarity may be calculated as follows. First, the target text is divided into one or more key blocks. The unit of the key block here needs to be the same as the unit of the key block used in step S110 described above. Therefore, the information defining the key block unit needs to be stored in a storage area accessible from both the text information storage unit 12 and the security attribute estimation unit 13.

続いて、対象テキストのキーブロックごとに誤認識情報における全ての画像からのテキスト情報のキーブロックとの比較を行う。ここで、画像特性の異なる複数の画像データより画像からのテキスト情報が抽出されている場合は、それぞれの画像からのテキスト情報との比較を行うことが好ましい。 Subsequently, each key block of the target text is compared with key blocks of text information from all images in the misrecognition information. Here, when text information from an image is extracted from a plurality of image data having different image characteristics, it is preferable to perform comparison with text information from each image.

比較の結果、両者が一致した場合は、対象テキストのキーブロックは、誤認識情報において当該画像データからのテキスト情報のキーブロックに関連付けられている文書からのテキスト情報のキーブロックに置き換えることによりＯＣＲによる誤認識を吸収する。 If the comparison finds a match, the key block of the target text is replaced with the key block of the text information from the document that the misrecognition information associates with the matching key block of the text information from the image. This replacement absorbs the OCR misrecognition.
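The replacement step can be illustrated as a table lookup per key block. This sketch assumes the misrecognition information is available as a dictionary mapping a misrecognized key block to its correct form; that representation and the name `absorb_misrecognition` are assumptions made for illustration only.

```python
def absorb_misrecognition(target_blocks, table):
    """Replace each target key block that matches a known misrecognition.

    `table` maps a misrecognized key block (text from the image) to the
    corresponding key block of the text from the document. Blocks with
    no entry in the table are kept unchanged.
    """
    return [table.get(block, block) for block in target_blocks]


# Usage: the scanned text contains a previously observed OCR error.
corrected = absorb_misrecognition(
    ["conf1dential", "report"],
    {"conf1dential": "confidential"},
)
```

Where several images with different characteristics were prepared for one document, their tables could be merged before the lookup, so that an error seen under any of those conditions is corrected.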

以上の処理の後、以下の式で対象テキストと文書からのテキスト情報との類似度を求める。 After the above processing, the similarity between the target text and the text information from the document is obtained by the following formula.

各変数の意味は以下の通りである。
S_i： i番目の保存テキストに対する類似度
BF：対象テキストから抽出されたキーブロック数
WBj：j番目のキーブロックの文字数
BA_ij： i番目の保存テキストに含まれているj番目のキーブロック数
WA_i： i番目の保存テキストの文字数
N：テキスト情報ＤＢ１１に蓄積されている保存テキストの数
なお、対象テキスト全体を一つのキーブロックとした場合は、文書の内容全文が複合機５０においてコピー等されようとしている場合は、類似度は「１」となる。

The meaning of each variable is as follows.
S _i : Similarity to the i-th stored text
BF: Number of key blocks extracted from the target text
WBj: Number of characters in the jth key block
BA _ij : Number of the jth key block contained in the i th saved text
WA _i : Number of characters in the i-th stored text
N: Number of stored texts held in the text information DB 11
Note that when the entire target text is treated as a single key block and the full text of a document is about to be copied or otherwise processed on the multi-function device 50, the similarity is 1.
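The formula itself does not appear in this text; it was presumably an image in the original publication. One form consistent with the variable definitions above, and with the remark that copying a whole document as a single key block gives a similarity of 1, would be the following. This is a reconstruction under those assumptions, not the patent's verbatim formula.

```latex
S_i = \frac{\sum_{j=1}^{BF} WB_j \cdot BA_{ij}}{WA_i}, \qquad i = 1, \dots, N
```

As a check, with a single key block (BF = 1) that is the whole of the i-th stored text, WB_1 = WA_i and BA_{i1} = 1, so S_i = 1.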

また、図５において、セキュリティ属性推定部１３５によるセキュリティ属性値の推定の際（Ｓ１３３）の推定方法は、例えば、以下のようなものでもよい。
（１）類似度が一番大きい文書からのテキスト情報に係る文書のセキュリティ属性値をそのまま対象テキストのセキュリティ属性値として推定する。
（２）類似度の上位数件の文書からのテキスト情報に係る文書のセキュリティ属性値のうち、最もセキュリティ属性値として厳しいものを対象テキストのセキュリティ属性値として推定する。
（３）類似度の上位数件の文書からのテキスト情報に係る文書のセキュリティ属性値の平均値を対象テキストのセキュリティ属性値として推定する。
（４）類似度の上位数件の文書からのテキスト情報に係る文書のセキュリティ属性値の一覧を対象テキストのセキュリティ属性値として推定する。すなわち、複数のセキュリティ属性値の候補をそのまま次工程（ここでは、複合機５０）に通知し、最終的にどのように利用するかは次工程に委ねる。 In FIG. 5, the estimation method used when the security attribute value is estimated by the security attribute estimation unit 135 (S133) may be as follows, for example.
(1) The security attribute value of the document related to the text information from the document having the highest similarity is estimated as the security attribute value of the target text as it is.
(2) Among the security attribute values of the documents corresponding to the several stored texts with the highest similarity, the strictest value is estimated as the security attribute value of the target text.
(3) The average value of the security attribute values of the documents related to the text information from the several documents with the highest similarity is estimated as the security attribute value of the target text.
(4) A list of security attribute values of documents related to text information from a few documents with the highest similarity is estimated as the security attribute value of the target text. That is, a plurality of security attribute value candidates are directly notified to the next process (in this case, the multi-function device 50), and finally how to use them is left to the next process.

（１）〜（４）については、いずれか一つの方法を用いてもよいが、対象となるセキュリティ属性に応じて選択できるようにしてもよい。すなわち、秘密レベルが、例えば、レベル1、レベル2、レベル3、と線形に定義されているような場合は、（２）、（３）の方法が適当である場合が多いと考えられる。また、セキュリティ属性が、秘密保持期限、有効期限、保存期限のような場合は、（２）の方法が適当である場合が多いと考えられる。一方、セキュリティ属性が、所属、種類、関係者、関係グループ等のような場合は、（１）又は（４）の方法が適当である場合が多いと考えられる。 As for (1) to (4), any one method may be used, but it may be selected according to the target security attribute. That is, when the secret level is defined linearly as, for example, level 1, level 2, and level 3, it is considered that the methods (2) and (3) are often appropriate. In addition, when the security attribute is a secret retention period, an expiration date, or a storage period, it is considered that the method (2) is often appropriate. On the other hand, when the security attribute is such as affiliation, type, party, relationship group, etc., it is considered that the method (1) or (4) is often appropriate.
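The four estimation methods can be sketched as a single selector function. The sketch assumes, beyond the source, that security attribute values are numbers where a larger value is stricter, and that "the top several" means a fixed count `top_n`; for attributes such as expiration dates, what counts as strictest would need its own comparison. The function and parameter names are hypothetical.

```python
def estimate_security_attribute(scored, method, top_n=3):
    """Estimate an attribute value from (similarity, attribute_value) pairs.

    Assumption: attribute values are numeric and larger means stricter.
    `method` is 1 to 4, matching the numbered options in the text.
    """
    if not scored:
        raise ValueError("no stored texts to compare against")
    # Rank stored texts from most to least similar and keep the top few.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    top_values = [value for _, value in ranked[:top_n]]
    if method == 1:
        return top_values[0]                       # most similar document
    if method == 2:
        return max(top_values)                     # strictest of the top few
    if method == 3:
        return sum(top_values) / len(top_values)   # average of the top few
    if method == 4:
        return top_values                          # leave the choice to the caller
    raise ValueError("method must be 1, 2, 3, or 4")


# Usage with four stored texts: (similarity, secrecy level).
scored = [(0.9, 1), (0.8, 3), (0.2, 2), (0.1, 5)]
```

With this data, method 1 yields the level of the best match, while method 2 yields the strictest level among the three best matches, which errs on the side of caution.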

上述したように本発明の実施の形態におけるセキュリティ属性推定サーバ１０によれば、文書ＤＢ２１に管理されている文書について「文書からのテキスト情報」と「画像からのテキスト情報」とを関連付けて管理しておく。そして、複合機５０によって読み取られた画像データと文書ＤＢ２１に管理されている文書との類否判断に際し、複合機５０によって読み取られた画像データよりＯＣＲによって抽出されたテキスト情報と、「画像からのテキスト情報」とを比較し、両者が一致又は類似する場合は、前者のテキスト情報は、「文書からのテキスト情報」であるものとして扱うことにより、ＯＣＲによる誤認識を吸収することができる。したがって、前記類否判断に際しＯＣＲの認識率への依存度を低下させることができ、画像データに画像として含まれているテキスト情報を適切に判定することができる。よって、前記類否判断の精度を向上させることができ、ひいては、画像データにとってより適当な文書のセキュリティ情報を当該画像データに適用させることができる。 As described above, the security attribute estimation server 10 in the embodiment of the present invention manages, for each document managed in the document DB 21, the "text information from the document" and the "text information from the image" in association with each other. When judging the similarity between image data read by the multi-function device 50 and a document managed in the document DB 21, the text information extracted by OCR from the read image data is compared with the "text information from the image". If the two match or are similar, the former is treated as being the "text information from the document", which absorbs OCR misrecognition. The similarity judgment therefore depends less on the OCR recognition rate, and the text information included as an image in the image data can be determined appropriately. This improves the accuracy of the similarity judgment and, in turn, allows the security information of a more appropriate document to be applied to the image data.

以上、本発明の実施例について詳述したが、本発明は係る特定の実施形態に限定されるものではなく、特許請求の範囲に記載された本発明の要旨の範囲内において、種々の変形・変更が可能である。 Although examples of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

本発明の実施の形態におけるセキュリティ管理システムの構成例を示す図である。 A diagram showing a configuration example of the security management system in the embodiment of the present invention.
本発明の実施の形態におけるセキュリティ属性推定サーバの機能構成例を示す図である。 A diagram showing a functional configuration example of the security attribute estimation server in the embodiment of the present invention.
本発明の実施の形態におけるセキュリティ属性推定サーバのハードウェア構成例を示す図である。 A diagram showing a hardware configuration example of the security attribute estimation server in the embodiment of the present invention.
文書サーバからの文書及びセキュリティ属性値のアップロード時の処理を説明するためのシーケンス図である。 A sequence diagram for explaining the processing performed when a document and a security attribute value are uploaded from the document server.
複合機により読み取られた画像データに対するセキュリティ属性値の推定処理を説明するためのシーケンス図である。 A sequence diagram for explaining the security attribute value estimation process for image data read by the multi-function device.

符号の説明Explanation of symbols

１セキュリティ管理システム
１０セキュリティ属性推定サーバ
１１テキスト情報ＤＢ
１２テキスト情報保存手段
１３セキュリティ属性推定手段
２０文書サーバ
２１文書ＤＢ
５０複合機
２２ａ、２２ｂクライアント
１００ドライブ装置
１０１記録媒体
１０２補助記憶装置
１０３メモリ装置
１０４演算処理装置
１０５インタフェース装置
１２１データ受信部
１２２テキスト情報抽出部
１２３画像情報形成部
１２４誤認識情報抽出部
１２５データ保存部
１２６データ送信部
１３１データ受信部
１３２テキスト情報抽出部
１３３類似度算出部
１３４データ読み出し部
１３５セキュリティ属性推定部
１３６データ送信部
Ｂバス
1 Security management system
10 Security attribute estimation server
11 Text information DB
12 Text information storage means
13 Security attribute estimation means
20 Document server
21 Document DB
50 Multi-function device
22a, 22b Client
100 Drive device
101 Recording medium
102 Auxiliary storage device
103 Memory device
104 Arithmetic processing unit
105 Interface device
121 Data receiving unit
122 Text information extraction unit
123 Image information forming unit
124 Misrecognition information extraction unit
125 Data storage unit
126 Data transmission unit
131 Data receiving unit
132 Text information extraction unit
133 Similarity calculation unit
134 Data reading unit
135 Security attribute estimation unit
136 Data transmission unit
B Bus

Claims

画像データに画像として含まれているテキスト情報を判定する画像処理装置であって、
第一のテキスト情報を記号として含む文書データと、該文書データのアクセス制御を判定する際に用いられる属性であるセキュリティ属性の値とを受信する受信手段と、
前記文書データより前記第一のテキスト情報を取得する文書テキスト取得手段と、
一つの前記文書データに基づいて画像特性の異なる複数の画像データを生成する文書画像生成手段と、
前記文書画像生成手段によって生成された複数の画像データのそれぞれより光学的文字認識によって第二のテキスト情報を抽出する画像テキスト取得手段と、
前記第一のテキスト情報と、複数の前記第二のテキスト情報と、前記セキュリティ属性の値とを関連付けて保持する保持手段と、
前記セキュリティ属性の値の要求に応じて、該要求とともに受信した画像データより光学的文字認識によって第三のテキスト情報を抽出するテキスト情報抽出手段と、
前記第三のテキスト情報と、前記保持手段に保持されている前記複数の第二のテキスト情報との比較に基づいて、前記画像データに含まれているテキスト情報は当該第二のテキスト情報に関連付けられている前記第一のテキスト情報であると判定する判定手段と、
前記判定手段の結果に基づいて、前記第一のテキスト情報に関連付けられているセキュリティ属性の値を前記要求元へ送信する送信手段と、
を有することを特徴とする画像処理装置。
An image processing apparatus that determines text information included as an image in image data, comprising:
receiving means for receiving document data including first text information as symbols, and a value of a security attribute, which is an attribute used when determining access control of the document data;
document text acquisition means for acquiring the first text information from the document data;
document image generation means for generating a plurality of image data having different image characteristics based on one piece of the document data;
image text acquisition means for extracting second text information by optical character recognition from each of the plurality of image data generated by the document image generation means;
holding means for holding the first text information, the plurality of second text information, and the value of the security attribute in association with one another;
text information extraction means for extracting, in response to a request for the value of the security attribute, third text information by optical character recognition from image data received together with the request;
determination means for determining, based on a comparison between the third text information and the plurality of second text information held in the holding means, that the text information included in the image data is the first text information associated with the second text information; and
transmission means for transmitting, based on a result of the determination means, the value of the security attribute associated with the first text information to the source of the request.

前記第一のテキスト情報と前記第二のテキスト情報とを比較し、両テキスト情報が一致しない場合に当該第一のテキスト情報と当該第二のテキスト情報とを関連付けて前記保持手段に登録する登録手段を有することを特徴とする請求項１記載の画像処理装置。 The image processing apparatus according to claim 1, further comprising registration means for comparing the first text information with the second text information and, when the two do not match, registering the first text information and the second text information in the holding means in association with each other.

前記登録手段は、前記第一のテキスト情報及び前記第二のテキスト情報のそれぞれを構成する所定の単位ごとに両テキスト情報を比較し、前記所定の単位ごとに前記保持手段に登録することを特徴とする請求項２記載の画像処理装置。 The registration means compares both text information for each predetermined unit constituting each of the first text information and the second text information, and registers the information in the holding means for each predetermined unit. The image processing apparatus according to claim 2 .

前記判定手段は、前記所定の単位ごとに前記第三のテキスト情報と前記第二のテキスト情報とを比較することを特徴とする請求項３項記載の画像処理装置。 The image processing apparatus according to claim 3 , wherein the determination unit compares the third text information with the second text information for each predetermined unit.

前記判定手段は、前記第三のテキスト情報との類似度が上位の前記第二のテキスト情報に関連付けられている前記第一のテキスト情報の中で最も厳しい前記セキュリティ属性の値に関連付けられている前記第一のテキスト情報を前記画像データに含まれているテキスト情報であると判定する請求項１乃至４いずれか一項記載の画像処理装置。 The image processing apparatus according to any one of claims 1 to 4, wherein the determination means determines that, among the first text information associated with the second text information having a high similarity to the third text information, the first text information associated with the strictest value of the security attribute is the text information included in the image data.

前記判定手段は、前記セキュリティ属性が、秘密保持期限、有効期限、及び保存期限のいずれか一つである場合に、前記第三のテキスト情報との類似度が上位の前記第二のテキスト情報に関連付けられている前記第一のテキスト情報の中で最も厳しい前記セキュリティ属性の値に関連付けられている前記第一のテキスト情報を前記画像データに含まれているテキスト情報であると判定する請求項５記載の画像処理装置。 The image processing apparatus according to claim 5, wherein, when the security attribute is any one of a confidentiality period, an expiration date, and a retention period, the determination means determines that, among the first text information associated with the second text information having a high similarity to the third text information, the first text information associated with the strictest value of the security attribute is the text information included in the image data.

コンピュータが画像データに画像として含まれているテキスト情報を判定する画像処理方法であって、
第一のテキスト情報を記号として含む文書データと、該文書データのアクセス制御を判定する際に用いられる属性であるセキュリティ属性の値とを受信する受信手順と、
前記文書データより前記第一のテキスト情報を取得する文書テキスト取得手順と、
一つの前記文書データに基づいて画像特性の異なる複数の画像データを生成する文書画像生成手順と、
前記文書画像生成手順によって生成された複数の画像データのそれぞれより光学的文字認識によって第二のテキスト情報を抽出する画像テキスト取得手順と、
前記第一のテキスト情報と、複数の前記第二のテキスト情報と、前記セキュリティ属性の値とを関連付けて保持する保持手順と、
前記セキュリティ属性の値の要求に応じて、該要求とともに受信した画像データより光学的文字認識によって第三のテキスト情報を抽出するテキスト情報抽出手順と、
前記第三のテキスト情報と、前記保持手順に保持されている前記複数の第二のテキスト情報との比較に基づいて、前記画像データに含まれているテキスト情報は当該第二のテキスト情報に関連付けられている前記第一のテキスト情報であると判定する判定手順と、
前記判定手順の結果に基づいて、前記第一のテキスト情報に関連付けられているセキュリティ属性の値を前記要求元へ送信する送信手順と、
を有することを特徴とする画像処理方法。
An image processing method in which a computer determines text information included as an image in image data, comprising:
a reception procedure for receiving document data including first text information as symbols, and a value of a security attribute, which is an attribute used when determining access control of the document data;
a document text acquisition procedure for acquiring the first text information from the document data;
a document image generation procedure for generating a plurality of image data having different image characteristics based on one piece of the document data;
an image text acquisition procedure for extracting second text information by optical character recognition from each of the plurality of image data generated by the document image generation procedure;
a holding procedure for holding the first text information, the plurality of second text information, and the value of the security attribute in association with one another;
a text information extraction procedure for extracting, in response to a request for the value of the security attribute, third text information by optical character recognition from image data received together with the request;
a determination procedure for determining, based on a comparison between the third text information and the plurality of second text information held by the holding procedure, that the text information included in the image data is the first text information associated with the second text information; and
a transmission procedure for transmitting, based on a result of the determination procedure, the value of the security attribute associated with the first text information to the source of the request.

画像データに画像として含まれているテキスト情報の判定をコンピュータに実行させる画像処理プログラムであって、
第一のテキスト情報を記号として含む文書データと、該文書データのアクセス制御を判定する際に用いられる属性であるセキュリティ属性の値とを受信する受信手順と、
前記文書データより前記第一のテキスト情報を取得する文書テキスト取得手順と、
一つの前記文書データに基づいて画像特性の異なる複数の画像データを生成する文書画像生成手順と、
前記文書画像生成手順によって生成された複数の画像データのそれぞれより光学的文字認識によって第二のテキスト情報を抽出する画像テキスト取得手順と、
前記第一のテキスト情報と、複数の前記第二のテキスト情報と、前記セキュリティ属性の値とを関連付けて保持する保持手順と、
前記セキュリティ属性の値の要求に応じて、該要求とともに受信した画像データより光学的文字認識によって第三のテキスト情報を抽出するテキスト情報抽出手順と、
前記第三のテキスト情報と、前記保持手順に保持されている前記複数の第二のテキスト情報との比較に基づいて、前記画像データに含まれているテキスト情報は当該第二のテキスト情報に関連付けられている前記第一のテキスト情報であると判定する判定手順と、
前記判定手順の結果に基づいて、前記第一のテキスト情報に関連付けられているセキュリティ属性の値を前記要求元へ送信する送信手順と、
を有することを特徴とする画像処理プログラム。
An image processing program that causes a computer to determine text information included as an image in image data, the program causing the computer to execute:
a reception procedure for receiving document data including first text information as symbols, and a value of a security attribute, which is an attribute used when determining access control of the document data;
a document text acquisition procedure for acquiring the first text information from the document data;
a document image generation procedure for generating a plurality of image data having different image characteristics based on one piece of the document data;
an image text acquisition procedure for extracting second text information by optical character recognition from each of the plurality of image data generated by the document image generation procedure;
a holding procedure for holding the first text information, the plurality of second text information, and the value of the security attribute in association with one another;
a text information extraction procedure for extracting, in response to a request for the value of the security attribute, third text information by optical character recognition from image data received together with the request;
a determination procedure for determining, based on a comparison between the third text information and the plurality of second text information held by the holding procedure, that the text information included in the image data is the first text information associated with the second text information; and
a transmission procedure for transmitting, based on a result of the determination procedure, the value of the security attribute associated with the first text information to the source of the request.

請求項８記載の画像処理プログラムを記録したコンピュータ読み取り可能な記録媒体。 A computer-readable recording medium on which the image processing program according to claim 8 is recorded.