WO2019012570A1 - Document classification system and method, and accounting system and method

Publication number
WO2019012570A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
accounting
image data
classification
documents
Prior art date
Application number
PCT/JP2017/025058
Other languages
French (fr)
Japanese (ja)
Inventor
将人 藤武
顕 松田
啓太郎 森
Original Assignee
ファーストアカウンティング株式会社
Application filed by ファーストアカウンティング株式会社
Priority to PCT/JP2017/025058 (WO2019012570A1)
Priority to JP2017536900A (JP6504514B1)
Publication of WO2019012570A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Definitions

  • The present invention relates to a document classification system that classifies documents by type, a document classification method using the document classification system, an accounting system that performs accounting based on the classified documents, and an accounting method using the accounting system.
  • (1) books: general ledger, journal, cash book, accounts receivable and accounts payable ledgers, fixed asset ledger, sales and purchase books
  • (2) settlement-related documents: inventory sheet, balance sheet, profit and loss statement, and other documents prepared for settlement
  • (3) other vouchers: contracts, receipts, quotations, invoices, purchase orders, contract applications, delivery notes, inspection reports
  • (4) business cards, company profiles, and other documents not necessarily related to accounting are stored.
  • An object of the present invention is to provide a document classification system that classifies documents on the basis of their image data, a document classification method using the document classification system, an accounting system that performs accounting based on the classified documents, and an accounting method using the accounting system.
  • The documents (1) to (4) above that companies store are not limited to printed paper media; they are often stored as image data optically read by a scanner or the like. The documents in (1) to (3) are national tax related books and documents, and those in (3) in particular may now be scanned and stored electronically. On the other hand, only invoices and receipts need to be entered into accounting software and accounting services, so accounting staff, tax accountants, and accountants currently have to pick the invoices and receipts out of this vast volume of scanned documents.
  • Receipts are often smaller landscape documents, while other documents such as invoices are A4 portrait documents. While other documents are printed on white paper, handwritten restaurant receipts are often colored over the entire sheet.
  • Receipts of JR and other railways and subways are colored, for example blue or orange, and are often rectangular. Revenue stamps, which are not affixed to other documents, are often affixed to receipts. Documents can thus be classified by the features of an image alone, without text information.
  • In view of the above, the present invention provides a document classification system that classifies documents by type without necessarily performing OCR processing, a document classification method using the document classification system, an accounting system that performs accounting based on the classified documents, and an accounting method using the accounting system.
  • The document classification method of the present invention classifies documents on the basis of document data and is characterized in that each document is input to an AI function, which performs the classification.
  • Here, the “AI function” refers to a function that makes a determination about each piece of data based on a large amount of data.
  • The document classification system of the present invention classifies documents on the basis of document data and is characterized by having an AI function that classifies each document.
  • In the document classification system of the present invention, the AI function classifies each document based on the result of prior learning.
  • In the document classification system of the present invention, the prior learning is supervised learning based on data to which document types have been attached, and the AI function performs classification that specifies the type of document.
  • In the document classification system of the present invention, the prior learning is unsupervised learning based on data to which no document types have been attached, and the AI function performs classification that does not specify the type of document.
  • Even if the type of document is not specified at the time of learning, documents can be classified into groups of similar documents (clusters). Because the user can determine from the classification result which cluster corresponds to, for example, invoices, the system is sufficiently usable even though it does not itself specify the type of document.
  • In the document classification system of the present invention, the AI function receives a plurality of documents and, without prior learning, classifies them on the basis of those documents without specifying their types.
  • In the document classification system of the present invention, the types of documents include receipts and invoices.
  • In the document classification system of the present invention, the AI function determines whether a revenue stamp is affixed.
  • In the document classification system of the present invention, the AI function determines the presence or absence of a seal impression and the shape of the impressed seal.
  • In the document classification system of the present invention, the document data is image data, and the AI function performs classification on the basis of at least one of the color of the image data, the shape of the image data, the color of a section of the image data having a color different from the background, and the shape of a section of the image data having a color different from the background.
  • In the document classification system of the present invention, the document data includes character data, and the AI function performs classification on the basis of the characters and content described in the character data.
  • Classification using features contained in the character data is possible. For example, a receipt often shows a date, an amount, a company name, and product names. An invoice often shows a date, an amount, a supplier name, the sender's company name, a telephone number, a fax number, product names, quantities, and so on. Contracts use wording unique to contracts, which can serve as a feature.
  • The document classification method of the present invention is characterized by using a document classification system in which the types of documents include receipts and invoices.
  • This provides a document classification method that classifies receipts and invoices.
  • The accounting method of the present invention comprises classifying documents by a document classification method using a document classification system in which the types of documents include receipts and invoices, performing OCR processing on the image data of documents whose type is receipt or invoice, and inputting the result into accounting software.
  • This provides an accounting method that performs OCR on receipts and invoices and has accounting software process the result.
  • The accounting system of the present invention comprises a document classification system in which the types of documents include receipts and invoices, an OCR processing unit that performs OCR processing on the image data of documents classified by the document classification system as receipts or invoices, and accounting software that performs accounting based on the character strings output from the OCR processing unit.
  • This provides an accounting system that performs OCR on receipts and invoices and has the accounting software process the result.
  • In the accounting system of the present invention, the accounting software outputs an error if the character string is nonconforming.
  • This provides an accounting system that outputs an error (does not process the document) if the classification of the document is unsuitable. The accounting system is robust even when document classification contains errors.
  • The accounting method of the present invention uses an accounting system that outputs an error if the character string is nonconforming, and is characterized in that supervised learning is performed on the image data for which the accounting software output an error.
  • This provides an accounting method that relearns the documents for which an error was output and thereby improves the accuracy of document classification.
  • The present invention provides a document classification system that classifies documents by type, a document classification method using the document classification system, an accounting system that performs accounting based on the classified documents, and an accounting method using the accounting system.
  • FIG. 1 is a diagram showing the configuration of a document classification system and an accounting system.
  • FIG. 2 is a diagram showing a document.
  • The document classification system 1 includes an AI function 11 and a learning unit 12, and holds learning data 12a and a learning result 12b.
  • The document classification system 1 reads the image data 13, and the AI function 11 classifies it into invoice image data 13a, receipt image data 13b, and other image data 13c.
  • The invoice image data 13a, the receipt image data 13b, and the other image data 13c are one example of classification; the image data may be classified into other types.
  • The AI function 11 classifies the image data 13 with reference to the learning result 12b.
  • The format of the learning result 12b may be determined according to the specification of the AI function 11. Typically, it is a boundary value, for each classification, of the feature quantities calculated from each piece of image data.
  • The learning unit 12 calculates, for example, feature quantities from the learning data 12a and outputs the learning result 12b.
  • In the learning data 12a, the document type is assumed to be attached to each piece of image data 13. That is, the learning unit 12 knows the type of each piece of image data and performs supervised learning. Learning other than supervised learning is described in the second and subsequent embodiments.
  • The accounting system 2 includes the document classification system 1, an OCR processing unit 21, and accounting software 22.
  • The OCR processing unit 21 reads characters such as the document name, summary, and amount from the image data 13 by OCR processing. Although any image data can be processed, OCR processing is assumed to be performed on the invoice image data 13a and the receipt image data 13b.
  • The accounting software 22 prepares financial statements and performs other accounting processes based on the characters read by the OCR processing unit 21.
  • If the characters read are not suitable for accounting (for example, if an invoice lacks the name of the issuer), an error is output.
  • Prior to operation of the document classification system 1, the learning unit 12 generates the learning result 12b.
  • Past documents of the user company (that were processed correctly) can be used as the learning data 12a.
  • The types of documents are “invoice”, “receipt”, and “other”.
  • The learning unit 12 receives the learning data 12a, to which “invoice”, “receipt”, or “other” has been attached, and outputs the learning result 12b.
  • The document is optically read by a scanner or the like to generate image data 13.
  • The image data 13 is classified by the AI function 11 into invoice image data 13a, receipt image data 13b, and other image data 13c.
  • The AI function 11 classifies the image data 13 with reference to the learning result 12b and moves the invoice image data 13a, the receipt image data 13b, and the other image data 13c to an invoice folder, a receipt folder, and an other folder, respectively.
  • The type of a document is indicated by the folder in which its file is stored.
  • The OCR processing unit 21 performs OCR processing on the unprocessed items among the invoice image data 13a and the receipt image data 13b (the files in the invoice folder and the receipt folder) and reads characters such as the document name, summary, and amount.
  • The accounting software 22 prepares financial statements and performs other accounting processes based on the characters read by the OCR processing unit 21.
  • If the characters read are not suitable for accounting (for example, if an invoice lacks the name of the issuer), an error is output.
  • Image data 13 (invoice image data 13a or receipt image data 13b) for which an error has been output is highly likely to have been misclassified by the AI function 11. The correct type is therefore attached to that image data 13 manually, and supervised learning is performed with it as learning data 12a. Unsupervised learning is also possible, but supervised learning is preferable for reducing classification errors.
  • FIG. 2 is a diagram showing documents. FIG. 2(A) shows an invoice, FIG. 2(B) shows a receipt, and FIG. 2(C) shows a contract.
  • In FIGS. 2(A) to 2(C), colored portions are shaded, and sections having a color different from the background are hatched.
  • As shown in FIG. 2(A), an invoice is often printed on A4 paper and stamped with a square seal.
  • As shown in FIG. 2(B), a receipt is often a colored, landscape sheet that is stamped with a square seal and has a revenue stamp affixed.
  • As shown in FIG. 2(C), a contract is often printed on A4 paper and stamped with two round seals.
  • The presence or absence of color (a color other than white or black) in the image data 13, the shape of the image data 13 (the values of H, W, and H/W), the color of the sections of the image data 13 having a color different from the background, and the shape of those sections (the values of h, w, and h/w) differ greatly depending on the type of document. Using only these feature quantities, documents can be classified by type without performing OCR processing.
  • When a white document is optically read by a scanner against a white background, the size of the document (the values of H and W) may not be determinable.
  • However, this is not a problem because many scanners have a document size detection function.
  • If no document size detection function is provided, the problem can be solved by, for example, placing a sheet of black paper larger than the document behind the document.
  • Documents can be classified by type using only the following feature quantities: the presence or absence of color (a color other than white or black) in the image data 13, the shape of the image data 13 (the values of H, W, and H/W), the color of the sections of the image data 13 having a color different from the background, and the shape of those sections (the values of h, w, and h/w).
  • Documents can be classified to extract invoices and receipts, and accounting can be performed using them.
  • In this embodiment, the learning unit 12 performs unsupervised learning.
  • The other points are the same as in the first embodiment, and their detailed description is omitted.
  • Even without document types, the learning unit 12 can still classify documents into groups of similar documents (clusters).
  • Classification (clustering) based on the distribution of the feature quantities of the documents in the learning data 12a is possible.
  • As described in Example 1, clear classification is possible because each type of document has distinctive features.
  • Since no document type is attached to the learning data 12a, the documents are only grouped; which group corresponds to which type (invoice, receipt, and so on) is not determined. However, the user can determine from the classification result which group corresponds to, for example, invoices.
  • Based on that determination, a document type can be associated with each classification even with unsupervised learning.
  • The OCR processing unit 21 and the accounting software 22 can then perform accounting on the image data 13 in the classifications that the user has determined to be invoices or receipts.
  • The same effect as in the first embodiment can be obtained even when the learning unit 12 performs unsupervised learning.
  • Classification can also be performed without prior learning. That is, if a number of pieces of image data 13 large enough to serve as learning data 12a are classified at once, those pieces of image data 13 can be classified (clustered) without prior learning.
  • the data of the document includes text data.
  • the other points are the same as in the first and second embodiments, and the detailed description will be omitted.
  • Document data includes text data.
  • OCR processing can be performed to generate character data.
  • character data is included therein, from which the character data can be generated.
  • Character data contains information that is useful for document classification. For example, a receipt often shows a date, an amount, a company name, and product names. An invoice often shows a date, an amount, a supplier name, the sender's company name, a telephone number, a fax number, item names, quantities, and so on. Contracts often use phrases characteristic of contracts in place of company names, such as “甲” (Party A) or “乙” (Party B).
  • The AI function 11 can also classify using the character string of the document name as a keyword (for example, a receipt contains the character string “receipt”), and by learning from a large amount of data it can classify more clearly.
  • The AI function 11 can also handle cases such as a “quotation number” appearing on an invoice as supplementary information.
  • The AI function 11 can obtain the reliability (the probability that the classification is correct) of the classification based on character data and of the classification based on image data.
  • The classification based on character data and the classification based on image data may be used together, with the final classification determined on the basis of their reliability.
  • When the classification based on character data and the classification based on image data are used in combination, the classification based on character data may be limited to keyword matching on the character string of the document name.
  • The clarity of classification can be improved using character data.
  • This embodiment is intended to classify various documents which are not limited to accounting documents.
  • the other points are the same as in the first to third embodiments, and the detailed description will be omitted.
  • Although accounting documents such as invoices and receipts were described as the image data 13 in the first and second embodiments, any documents that exist as image data in a company may be handled.
  • The image data 13 often has characteristics specific to the type of document. For example, business cards are characterized by their size, and explanatory materials often use primary colors such as red and blue.
  • classification by the document classification system 1 is possible as in the first and second embodiments.
  • documents other than accounting documents can be classified.
  • The present invention provides a document classification system that classifies documents, a document classification method that uses the document classification system, an accounting system that performs accounting based on the classified documents, and an accounting method that uses the accounting system. It can be used by many companies.
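
The flow described for the first embodiment (classify, file into folders, run OCR on invoices and receipts only, and queue documents that the accounting software rejects for manual relabeling and supervised relearning) might be sketched as follows. This is only an illustration under stated assumptions: the `Pipeline` class, the stubbed `classify` and `run_ocr` callables, and the required fields in `is_conforming` are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pipeline:
    """Hypothetical stand-in for accounting system 2."""
    classify: Callable[[str], str]            # image id -> "invoice" | "receipt" | "other"
    run_ocr: Callable[[str], Dict[str, str]]  # image id -> fields read by OCR
    folders: Dict[str, List[str]] = field(
        default_factory=lambda: {"invoice": [], "receipt": [], "other": []})
    booked: List[str] = field(default_factory=list)         # accepted by accounting
    relabel_queue: List[str] = field(default_factory=list)  # errors, to be relabeled

    @staticmethod
    def is_conforming(doc_type: str, fields: Dict[str, str]) -> bool:
        # Assumed rule: an invoice needs an issuer and an amount, a receipt an amount.
        required = {"invoice": ("issuer", "amount"), "receipt": ("amount",)}[doc_type]
        return all(fields.get(name) for name in required)

    def process(self, image_id: str) -> None:
        doc_type = self.classify(image_id)
        self.folders[doc_type].append(image_id)  # the folder indicates the type
        if doc_type == "other":
            return                               # no OCR for other documents
        fields = self.run_ocr(image_id)
        if self.is_conforming(doc_type, fields):
            self.booked.append(image_id)
        else:
            # Likely a classification error: keep it for manual labeling
            # and later supervised learning.
            self.relabel_queue.append(image_id)


# Stubbed classifier and OCR results for four made-up image files.
fake_labels = {"a.png": "invoice", "b.png": "receipt", "c.png": "invoice", "d.png": "other"}
fake_ocr = {
    "a.png": {"issuer": "Example Co.", "amount": "10800"},
    "b.png": {"amount": "540"},
    "c.png": {"amount": "2000"},  # no issuer name, so accounting reports an error
}

pipeline = Pipeline(classify=fake_labels.__getitem__, run_ocr=fake_ocr.__getitem__)
for image_id in fake_labels:
    pipeline.process(image_id)
print(pipeline.booked, pipeline.relabel_queue)
```

Keeping rejected documents in a separate queue rather than discarding them reflects the idea that an accounting error is a useful signal of a classification mistake.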

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Discrimination (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Image Analysis (AREA)

Abstract

[Problem] To provide a document classification system that classifies documents on the basis of image data of the documents, a document classification method that uses the document classification system, an accounting system that carries out accounting on the basis of the classified documents, and an accounting method that uses the accounting system. [Solution] A document classification system 1 is provided that carries out classification only on the basis of the features of document image data 13 without executing an OCR process. As the features, color presence, a shape, and the color and the shape of a section having a color different from that of the background of the image data 13 can be used. A document classification method that uses the document classification system 1 is provided. An accounting system 2 is provided that extracts a bill and a receipt by classifying the image data 13 and carries out accounting by using the bill and the receipt. An accounting method that uses the accounting system 2 is provided.

Description

[Correction under Rule 26, 14.09.2017] Document classification system and method, and accounting system and method
The present invention relates to a document classification system that classifies documents by type, a document classification method using the document classification system, an accounting system that performs accounting based on the classified documents, and an accounting method using the accounting system.
Companies store (1) books (general ledger, journal, cash book, accounts receivable and accounts payable ledgers, fixed asset ledger, sales and purchase books), (2) settlement-related documents (inventory sheets, balance sheets, profit and loss statements, and other documents prepared for settlement), (3) other vouchers (contracts, receipts, quotations, invoices, purchase orders, contract applications, delivery notes, inspection reports), and (4) business cards, company profiles, and other documents not necessarily related to accounting.
When documents are stored as image data, classifying them becomes important. With paper, the type of a document can be recognized at a glance, but with image data the image must be displayed before its type can be determined. It is therefore preferable, for example, to classify the documents and store the image data (files) in a folder for each classification category.
As a way of classifying documents, a method that uses the character information in a document is known, as shown for example in Patent Document 1. However, image data contains no character information (character information can be obtained by OCR processing, but it is not necessarily accurate), so this method cannot be used to classify image data.
Patent Document 1: JP 2007-323454 A
An object of the present invention is to provide a document classification system that classifies documents on the basis of their image data, a document classification method using the document classification system, an accounting system that performs accounting based on the classified documents, and an accounting method using the accounting system.
The documents (1) to (4) above that companies store are not limited to printed paper media; they are often stored as image data optically read by a scanner or the like. The documents in (1) to (3) are national tax related books and documents, and those in (3) in particular may now be scanned and stored electronically. On the other hand, only invoices and receipts need to be entered into accounting software and accounting services, so accounting staff, tax accountants, and accountants currently have to pick the invoices and receipts out of this vast volume of scanned documents.
For example, consider distinguishing receipts from other accounting documents. Other documents such as invoices are A4 portrait documents, whereas receipts are often smaller landscape documents. Other documents are printed on white paper, whereas handwritten restaurant receipts are often colored over the entire sheet. Receipts of JR and other railways and subways are colored, for example blue or orange, and are often rectangular. Revenue stamps, which are not affixed to other documents, are often affixed to receipts. Documents can thus be classified by the features of an image alone, without text information.
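
The page-level cues mentioned above (portrait versus landscape shape, and whether the sheet as a whole is colored) can be computed from raw pixels. The following is a minimal sketch, assuming an image is given as rows of (R, G, B) tuples; the function names, the `tolerance` threshold, and the `colored_fraction` threshold are illustrative assumptions, not values from the patent.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255
Image = List[List[Pixel]]     # rows of pixels


def is_chromatic(pixel: Pixel, tolerance: int = 30) -> bool:
    """True if the pixel is not white, black, or grey (its channels differ noticeably)."""
    return max(pixel) - min(pixel) > tolerance


def page_features(image: Image, colored_fraction: float = 0.5) -> Dict[str, object]:
    """Height H, width W, ratio H/W, and whether most of the page is colored."""
    height = len(image)
    width = len(image[0])
    chromatic = sum(is_chromatic(p) for row in image for p in row)
    return {
        "H": height,
        "W": width,
        "H/W": height / width,
        "colored": chromatic / (height * width) >= colored_fraction,
    }


# Toy pages: a white portrait sheet and an orange landscape ticket-style receipt.
white_portrait = [[(255, 255, 255)] * 21 for _ in range(30)]
orange_landscape = [[(255, 165, 0)] * 20 for _ in range(8)]

print(page_features(white_portrait))
print(page_features(orange_landscape))
```

In this toy example the portrait sheet has H/W above 1 and is not colored, while the ticket-style receipt has H/W below 1 and is colored, which is the kind of separation the paragraph above relies on.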
In view of the above, the present invention provides a document classification system that classifies documents by type without necessarily performing OCR processing, a document classification method using the document classification system, an accounting system that performs accounting based on the classified documents, and an accounting method using the accounting system.
The document classification method of the present invention is
a document classification method for classifying documents on the basis of document data,
characterized in that each document is input to an AI function, which performs the classification.
According to this feature, documents can be classified using the AI function. Here, the “AI function” refers to a function that makes a determination about each piece of data based on a large amount of data.
The document classification system of the present invention is
a document classification system that classifies documents on the basis of document data,
characterized by having an AI function that classifies each document.
According to this feature, a document classification system that classifies documents using the AI function can be provided.
The document classification system of the present invention is
characterized in that the AI function classifies each document based on the result of prior learning.
According to this feature, the AI function can be optimized by prior learning on a large number of documents.
The document classification system of the present invention is
characterized in that the prior learning is supervised learning based on data to which document types have been attached, and
the AI function performs classification that specifies the type of document.
According to this feature, learning can be performed with document types such as invoice and receipt specified, and documents can be classified with their types specified on the basis of the learning result.
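
Supervised learning of this kind can be illustrated with a very small classifier. The sketch below uses a nearest-centroid rule, chosen here only for brevity; the patent does not name a learning algorithm. The feature vectors (H/W and the colored fraction of the page), the function names, and the sample values are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, attached document type)


def train_centroids(samples: Sequence[Sample]) -> Dict[str, List[float]]:
    """'Learning result': the mean feature vector of each labelled document type."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(features: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the document type whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Made-up training data: (H/W, colored fraction) with the type attached.
training = [
    ((1.41, 0.02), "invoice"),
    ((1.40, 0.05), "invoice"),
    ((0.45, 0.90), "receipt"),
    ((0.50, 0.80), "receipt"),
]
centroids = train_centroids(training)
print(classify((1.38, 0.03), centroids))
print(classify((0.48, 0.85), centroids))
```

Because each training sample carries its type, the classifier can name the type of a new document, which is what distinguishes this case from clustering without labels.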
The document classification system of the present invention is
characterized in that the prior learning is unsupervised learning based on data to which no document types have been attached, and
the AI function performs classification that does not specify the type of document.
According to this feature, even if the type of document is not specified at the time of learning, documents can be classified into groups of similar documents (clusters). Because the user can determine from the classification result which cluster corresponds to, for example, invoices, the system is sufficiently usable even though it does not itself specify the type of document.
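
Grouping documents into clusters without attached types can be sketched with k-means, used here purely as an example of unsupervised learning; the patent does not specify the clustering method. The feature vectors, the deterministic farthest-first initialization, and the function names are assumptions made for illustration.

```python
from math import dist
from typing import List, Sequence

Point = Sequence[float]


def farthest_first(points: Sequence[Point], k: int) -> List[List[float]]:
    """Deterministic start: the first point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def k_means(points: Sequence[Point], k: int, iterations: int = 20) -> List[int]:
    """Return a cluster index for every point; the indices carry no document type."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
        # Move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = [sum(column) / len(members) for column in zip(*members)]
    return labels


# Made-up (H/W, colored fraction) features with no document types attached.
documents = [(1.41, 0.02), (0.45, 0.90), (1.40, 0.05), (0.50, 0.80), (1.42, 0.03)]
labels = k_means(documents, k=2)
print(labels)
```

The output only says which documents belong together; deciding that one cluster is "invoices" is left to the user, as the paragraph above describes.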
The document classification system of the present invention is
characterized in that the AI function receives a plurality of documents and, without prior learning, classifies them on the basis of those documents without specifying their types.
According to this feature, classification is possible without prior learning.
The document classification system of the present invention is
characterized in that the types of documents include receipts and invoices.
According to this feature, receipts and invoices, which are essential for accounting, can be classified and extracted.
In the document classification system of the present invention,
the AI function determines whether a revenue stamp is affixed.
According to this feature, whether a revenue stamp is affixed is determined based on the color and shape of a region whose color differs from the background, so that classification can be performed clearly.
In the document classification system of the present invention,
the AI function determines whether a seal impression is present and the shape of the impression.
According to this feature, the presence or absence of a round seal or square seal impression is determined based on the color and shape of a region whose color differs from the background (the color of vermilion seal ink), so that classification can be performed clearly.
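As a rough illustration of how the shape of a seal impression might be judged, the sketch below takes a binary mask of the vermilion-colored region, cropped to its bounding box, and checks whether ink reaches the corners of that box. The function name `seal_shape`, the mask representation, and the `corner_frac` threshold are assumptions made for illustration; the specification does not state how the AI function makes this determination.

```python
def seal_shape(mask, corner_frac=0.15):
    """Guess whether a seal impression is 'round' or 'square'.

    mask: list of rows of 0/1 values, cropped to the bounding box of the ink.
    Heuristic (an assumption, not from the specification): a square seal
    leaves ink in the corners of its bounding box, a round seal does not.
    """
    h, w = len(mask), len(mask[0])
    ch = max(1, int(h * corner_frac))  # corner block height
    cw = max(1, int(w * corner_frac))  # corner block width
    corners = [
        (range(0, ch), range(0, cw)),          # top-left
        (range(0, ch), range(w - cw, w)),      # top-right
        (range(h - ch, h), range(0, cw)),      # bottom-left
        (range(h - ch, h), range(w - cw, w)),  # bottom-right
    ]
    # Count how many corner blocks contain at least one inked pixel.
    inked = sum(
        any(mask[r][c] for r in rows for c in cols) for rows, cols in corners
    )
    return "square" if inked >= 3 else "round"
```

Requiring three of four inked corners rather than all four is a hypothetical tolerance for impressions that are faint on one side.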
In the document classification system of the present invention,
the document data is image data, and
the AI function performs classification based on at least one of: the color of the image data, the shape of the image data, the color of a region of the image data whose color differs from the background, and the shape of a region of the image data whose color differs from the background.
According to this feature, it is possible to determine whether a revenue stamp is affixed, whether a seal impression is present, and the shape of the impression, all of which are characteristic mainly of accounting documents.
In the document classification system of the present invention,
the document data includes text data, and
the AI function performs classification based on the characters and content of the text data.
According to this feature, classification using features contained in the text data becomes possible. For example, a receipt often states a date, an amount, a company name, and a product name. An invoice often states a date, an amount, the customer's name, the sender's company name, a telephone number, a fax number, product names, and quantities. A contract uses wording peculiar to contracts, and this wording can serve as a feature.
The document classification method of the present invention
uses a document classification system in which the types of documents include receipts and invoices.
According to this feature, a document classification method for classifying receipts and invoices is provided.
The accounting method of the present invention includes:
a step of separating documents by a document classification method that uses a document classification system in which the types of documents include receipts and invoices; and
a step of performing OCR processing on image data whose document type is receipt or invoice, and inputting the result into accounting software.
According to this feature, an accounting method is provided in which OCR is performed on receipts and invoices and the accounting software processes the result.
The accounting system of the present invention includes:
a document classification system in which the types of documents include receipts and invoices;
an OCR processing unit that performs OCR processing on the image data of documents that the document classification system has classified as receipts or invoices; and
accounting software that performs accounting processing based on the character string output by the OCR processing unit.
According to this feature, an accounting system is provided in which OCR is performed on receipts and invoices and the accounting software processes the result.
In the accounting system of the present invention,
the accounting software outputs an error when the character string is nonconforming.
According to this feature, an accounting system is provided that outputs an error (and does not process the document in question) when the classification of a document is nonconforming. The accounting system is therefore robust even when documents are misclassified.
The accounting method of the present invention
uses an accounting system that outputs an error when the character string is nonconforming, and
performs supervised learning on the image data for which the accounting software has output an error.
According to this feature, an accounting method is provided that relearns from the documents for which errors were output and thereby improves the accuracy of document classification.
According to the present invention, there are provided a document classification system that classifies documents by type, a document classification method that uses the document classification system, an accounting system that performs accounting processing based on the classified documents, and an accounting method that uses the accounting system.
FIG. 1 is a diagram showing the configuration of the document classification system and the accounting system.
FIG. 2 is a diagram showing documents.
Embodiments of the present invention are described below, beginning with Embodiment 1.
(System configuration)
FIG. 1 is a diagram showing the configuration of the document classification system and the accounting system. The document classification system 1 includes an AI function 11 and a learning unit 12, and holds learning data 12a and a learning result 12b.
The document classification system 1 reads image data 13 and, using the AI function 11, classifies it into invoice image data 13a, receipt image data 13b, and other image data 13c. These three categories are only an example; the image data may be classified into other types.
The AI function 11 classifies the image data 13 with reference to the learning result 12b. The format of the learning result 12b may be determined according to the specification of the AI function 11. Typically, it consists of per-class boundary values for the feature values calculated from each piece of image data.
The learning unit 12 computes, for example, feature values from the learning data 12a and outputs the learning result 12b. Here, the learning data 12a is image data 13 to which the document type has been assigned. That is, the learning unit 12 performs supervised learning, knowing the type into which each piece of image data should be classified. Learning other than supervised learning is described in Embodiment 2 and later.
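The relationship between the learning data 12a, the learning result 12b, and classification can be sketched with a nearest-centroid classifier. This is only one possible reading: the specification does not name a learning algorithm, and the feature vector used here (height-to-width ratio and a 0/1 color flag) as well as the function names `train_centroids` and `classify` are assumptions for illustration.

```python
from collections import defaultdict


def train_centroids(samples):
    """Supervised learning step: samples is an iterable of (features, label).

    Returns {label: centroid}, playing the role of the learning result 12b.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {
        label: [s / counts[label] for s in vec] for label, vec in sums.items()
    }


def classify(features, centroids):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))
```

With centroids as the learning result, the boundary between two classes is implicitly the set of points equidistant from their centroids, which is one way to realize "boundary values for each class".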
The accounting system 2 includes the document classification system 1, an OCR processing unit 21, and accounting software 22.
The OCR processing unit 21 reads characters such as the document name, description, and amount from the image data 13 by OCR processing. Although it can process any image data, here it performs OCR processing on the invoice image data 13a and the receipt image data 13b.
The accounting software 22 prepares financial statements and performs other accounting processing based on the characters read by the OCR processing unit 21. If the characters read are unsuitable for accounting processing (for example, an invoice that lacks the name of the issuer), it outputs an error.
(Processing procedure)
The processing procedure using the document classification system 1 and the accounting system 2 is described below.
Before the document classification system 1 is put into operation, the learning unit 12 generates the learning result 12b. Past documents of the user company that were processed correctly can be used as the learning data 12a. In this embodiment, the document types are "invoice", "receipt", and "other". The learning unit 12 takes as input the learning data 12a labeled as "invoice", "receipt", or "other", and outputs the learning result 12b.
A document is read optically with a scanner or the like to generate image data 13.
The AI function 11 classifies the image data 13 into invoice image data 13a, receipt image data 13b, and other image data 13c. Specifically, the AI function 11 classifies the image data 13 with reference to the learning result 12b and moves the invoice image data 13a, the receipt image data 13b, and the other image data 13c to an invoice folder, a receipt folder, and an "other" folder, respectively. The folder in which a file is stored thus indicates the document type.
Besides separating classification results by folder, the type can be indicated in other ways, such as including characters representing the type in the file name or attaching information representing the type to the file.
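Recording the classification result by folder could look like the following minimal sketch. The folder names in `FOLDER_BY_TYPE` and the function name `file_by_type` are hypothetical; the specification only says that files are moved to an invoice folder, a receipt folder, and an "other" folder.

```python
import shutil
from pathlib import Path

# Hypothetical folder names; the specification does not fix them.
FOLDER_BY_TYPE = {"invoice": "invoices", "receipt": "receipts", "other": "others"}


def file_by_type(image_path, doc_type, root):
    """Move image_path into the folder for doc_type under root.

    Unknown types fall back to the "other" folder. Returns the new path.
    """
    folder = Path(root) / FOLDER_BY_TYPE.get(doc_type, FOLDER_BY_TYPE["other"])
    folder.mkdir(parents=True, exist_ok=True)
    destination = folder / Path(image_path).name
    shutil.move(str(image_path), str(destination))
    return destination
```

The alternatives mentioned in the text (encoding the type in the file name, or attaching it as metadata) would replace only the body of this function, leaving the classifier untouched.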
Of the classified documents, the invoices and receipts are used for accounting processing. For the unprocessed items among the invoice image data 13a and the receipt image data 13b (the files in the invoice folder and the receipt folder), the OCR processing unit 21 reads characters such as the document name, description, and amount by OCR processing.
The accounting software 22 prepares financial statements and performs other accounting processing based on the characters read by the OCR processing unit 21. If the characters read are unsuitable for accounting processing (for example, an invoice that lacks the name of the issuer), it outputs an error.
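The error check can be illustrated with a small validation step that runs before OCR output is handed to accounting. The required field lists and the names `REQUIRED_FIELDS`, `NonconformingDocumentError`, and `validate_for_accounting` are assumptions; the specification gives only the example of an invoice without the issuer's name.

```python
class NonconformingDocumentError(ValueError):
    """Raised when OCR output lacks a field that accounting processing needs."""


# Hypothetical required fields per document type.
REQUIRED_FIELDS = {
    "invoice": ("issuer", "date", "amount"),
    "receipt": ("date", "amount"),
}


def validate_for_accounting(doc_type, fields):
    """Return fields unchanged if complete; otherwise raise an error.

    fields: dict mapping field names to the strings read by OCR.
    """
    missing = [name for name in REQUIRED_FIELDS[doc_type] if not fields.get(name)]
    if missing:
        raise NonconformingDocumentError(
            f"{doc_type} is missing: {', '.join(missing)}"
        )
    return fields
```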
Image data 13 (invoice image data 13a or receipt image data 13b) for which an error was output has likely been misclassified by the AI function 11. The correct type is therefore added to such image data 13 (manually), and supervised learning is performed with it as learning data 12a. Unsupervised learning is also possible, but supervised learning is preferable in order to reduce classification errors.
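The feedback step described here amounts to appending manually corrected samples to the learning data before retraining. A minimal sketch follows, in which the function name `add_corrected_samples` and the use of image identifiers in place of image data are assumptions for illustration.

```python
def add_corrected_samples(training_data, errored_images, corrected_labels):
    """Append (image_id, label) pairs for errored images a person has relabeled.

    training_data: list of (image_id, label) pairs, extended in place.
    errored_images: identifiers of images for which accounting output an error.
    corrected_labels: dict of manual corrections, image_id -> correct type.
    Images without a manual correction are skipped, because supervised
    relearning needs a confirmed label.
    """
    for image_id in errored_images:
        label = corrected_labels.get(image_id)
        if label is not None:
            training_data.append((image_id, label))
    return training_data
```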
(Types of documents)
Image data 13 for each type of document is illustrated below to explain why clear classification by the AI function 11 is possible.
FIG. 2 is a diagram showing documents. FIG. 2(A) shows an invoice, FIG. 2(B) shows a receipt, and FIG. 2(C) shows a contract. In FIGS. 2(A) to 2(C), colored portions are shown shaded, and regions whose color differs from the background are shown hatched.
As shown in FIG. 2(A), an invoice is in many cases printed on A4 paper and stamped with a square seal. As shown in FIG. 2(B), a receipt is in many cases a colored, landscape-oriented sheet stamped with a square seal and bearing a revenue stamp. As shown in FIG. 2(C), a contract is in many cases printed on A4 paper and stamped with two round seals.
As described above, the following differ greatly depending on the type of document: the presence or absence of color (colors other than white and black) in the image data 13; the shape of the image data 13 (the values of H, W, and H/W); the color of regions of the image data 13 whose color differs from the background; and the shape of such regions (the values of h, w, and h/w). Using only these feature values, documents can be classified by type without performing OCR processing.
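The feature values listed above (H, W, H/W, the presence of color, and h, w, h/w of a colored region) can be computed from raw pixels roughly as follows. This sketch makes several assumptions that are not in the specification: the image is a list of rows of (r, g, b) tuples, the background is white, a pixel counts as colored when its channels differ by more than a `tolerance`, and all colored pixels are treated as a single region.

```python
def is_colored(pixel, tolerance=40):
    """True if the pixel is neither white, black, nor gray."""
    return max(pixel) - min(pixel) > tolerance


def page_features(image):
    """Compute simple shape and color features for one page image.

    image: list of rows, each a list of (r, g, b) tuples.
    Returns H, W, H/W and has_color; if colored pixels exist, also the
    bounding-box size h, w and ratio h/w of the colored area.
    """
    H, W = len(image), len(image[0])
    colored = [
        (r, c) for r in range(H) for c in range(W) if is_colored(image[r][c])
    ]
    features = {"H": H, "W": W, "H/W": H / W, "has_color": bool(colored)}
    if colored:
        rows = [r for r, _ in colored]
        cols = [c for _, c in colored]
        h = max(rows) - min(rows) + 1
        w = max(cols) - min(cols) + 1
        features.update({"h": h, "w": w, "h/w": h / w})
    return features
```

A fuller version would separate the colored pixels into connected regions so that, for example, a seal and a revenue stamp on the same receipt yield separate (h, w) pairs.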
It might seem that the document size (the values of H and W) cannot be determined when a document with a white background is read optically with a scanner. However, many scanners have a document size detection function, so this is not a problem. If the scanner lacks such a function, the problem can be solved by placing a sheet larger than the document, for example black paper, behind the document.
As described in detail above, the document classification system 1 of this embodiment can classify documents by type without OCR processing, using only the following feature values: the presence or absence of color (colors other than white and black) in the image data 13; the shape of the image data 13 (the values of H, W, and H/W); the color of regions of the image data 13 whose color differs from the background; and the shape of such regions (the values of h, w, and h/w).
Furthermore, the accounting system 2 of this embodiment can classify documents, extract invoices and receipts, and perform accounting processing using them.
In this embodiment (Embodiment 2), the learning unit 12 performs unsupervised learning. In other respects it is the same as Embodiment 1, and a detailed description is omitted.
At the time of learning, document types are not specified: the learning data 12a consists only of image data 13, with no document type assigned. Even so, the learning unit 12 can classify the data into groups (clusters) of similar documents.
That is, classification (clustering) based on the distribution of features among the documents in the learning data 12a is possible. As shown in Embodiment 1, each document type has distinctive features, so clear classification is possible.
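Clustering by feature distribution can be sketched with k-means. The specification does not name a clustering algorithm, so k-means, the farthest-point initialization, and the function name `kmeans` are all assumptions chosen for illustration; the points are feature tuples such as (H/W, color flag).

```python
def kmeans(points, k, iterations=20):
    """Cluster points (tuples of floats) into k groups.

    Returns a list of cluster indices, one per point. The indices carry no
    meaning such as "invoice" or "receipt"; they only group similar documents.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic farthest-point initialization of the cluster centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers))
        )

    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        assignment = [
            min(range(k), key=lambda j: dist2(p, centers[j])) for p in points
        ]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment
```

Because the returned indices are arbitrary, a person still has to look at each cluster and decide which one holds, say, the invoices.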
Because no document type is assigned to the learning data 12a, the documents are merely grouped; which group corresponds to which type (invoice, receipt, etc.) is not determined. However, based on the classification result, the user can judge which group is, for example, invoices.
Once the user has made this judgment, the document type can be assigned to each group as the classification result, even with unsupervised learning.
Accounting processing by the OCR processing unit 21 and the accounting software 22 can be performed on the image data 13 in the groups that the user has judged to be invoices or receipts.
As described in detail above, with the document classification system 1 of this embodiment, the same effects as in Embodiment 1 can be obtained even when the learning unit 12 performs unsupervised learning.
Although unsupervised learning has been described as being performed in advance, classification can also be performed without prior learning. That is, if a number of pieces of image data 13 large enough to serve as learning data 12a are classified at the same time, they can be classified (clustered) without prior learning.
In this embodiment (Embodiment 3), the document data includes text data. In other respects it is the same as Embodiments 1 and 2, and a detailed description is omitted.
The document data includes text data. When a document is read optically with a scanner or the like to generate image data 13, OCR processing can be performed to generate text data. When a document is received as electronic data, it may already contain text data, or text data can be generated from it.
Text data contains information that is useful for classifying documents. For example, a receipt often states a date, an amount, a company name, and a product name. An invoice often states a date, an amount, the customer's name, the sender's company name, a telephone number, a fax number, product names, and quantities. A contract often uses wording characteristic of contracts, such as referring to the parties as "甲" (Party A) and "乙" (Party B) instead of by company name.
Detecting these features of each document with the AI function 11 enables clear classification.
The AI function 11 can classify using the character string of the document's name as a keyword (for example, a receipt contains the character string "領収書" (receipt)), and can classify still more clearly based on a large amount of data. The AI function 11 can also handle cases such as an invoice that carries a "見積書番号" (estimate number) as supplementary information.
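The simplest form of text-based classification, matching on the document name and characteristic terms, can be sketched as follows. The keyword table and the function name `classify_by_keywords` are illustrative assumptions; a learned model trained on many documents, as the text suggests, would replace this table.

```python
# Hypothetical keyword table; Japanese terms follow the examples in the text.
KEYWORDS = {
    "receipt": ("領収書", "receipt"),
    "invoice": ("請求書", "invoice"),
    "contract": ("契約書", "甲", "乙"),
}


def classify_by_keywords(text):
    """Return the type whose keywords occur most often, or 'other' if none occur."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(word.lower()) for word in words)
        for doc_type, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"
```

Since the table has no entry for estimates, an invoice that mentions an estimate number is still scored only on its invoice keyword, which mirrors the supplementary-information case in the text.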
The above describes classification based on text data; it can also be combined with the classification based on image data described in Embodiments 1 and 2. The AI function 11 can compute a confidence (the likelihood that the classification is correct) for the classification based on text data and for the classification based on image data. The two classifications may be used together, with the final classification determined based on these confidences.
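One way to combine the two classifications by confidence is sketched below. The rule used (keep the shared label when both agree, otherwise take the more confident result) and the function name `fuse` are assumptions; the specification says only that the final classification is determined based on the confidences.

```python
def fuse(text_result, image_result):
    """Combine a text-based and an image-based classification.

    Each result is a (label, confidence) pair with confidence in [0, 1].
    If both agree, the label is kept with the higher confidence; otherwise
    the result with the higher confidence wins (ties favor the text result).
    """
    text_label, text_conf = text_result
    image_label, image_conf = image_result
    if text_label == image_label:
        return text_label, max(text_conf, image_conf)
    if text_conf >= image_conf:
        return text_label, text_conf
    return image_label, image_conf
```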
When classification based on text data is combined with classification based on image data, the text-based classification may be limited to classification by keywords in the document's name.
As described in detail above, the document classification system 1 of this embodiment can use text data to make classification clearer.
This embodiment (Embodiment 4) classifies a variety of documents, not only accounting documents. In other respects it is the same as Embodiments 1 to 3, and a detailed description is omitted.
Embodiments 1 and 2 were described with accounting documents such as invoices and receipts as the image data 13, but companies store other kinds of documents as image data as well, for example business cards and explanatory materials received from other companies.
Even for documents other than accounting documents, the image data 13 often has features specific to the document type. For example, business cards have a characteristic size, and explanatory materials often use primary colors such as red and blue.
Documents whose image data 13 has such type-specific features can be classified by the document classification system 1 in the same way as in Embodiments 1 and 2.
As described in detail above, the document classification system 1 of this embodiment can also classify documents other than accounting documents.
The present invention provides a document classification system that classifies documents based on their image data, a document classification method that uses the document classification system, an accounting system that performs accounting processing based on the classified documents, and an accounting method that uses the accounting system. Use by many companies is conceivable.
Description of reference signs
  1    Document classification system
  11   AI function
  12   Learning unit
  12a  Learning data
  12b  Learning result
  13   Image data
  13a  Invoice image data
  13b  Receipt image data
  13c  Other image data
  2    Accounting system
  21   OCR processing unit
  22   Accounting software

Claims (16)

  1. A document classification method for classifying documents based on document data,
     characterized in that the classification is performed by an AI function.
  2. A document classification system for classifying documents based on document data,
     characterized by comprising an AI function that classifies each document.
  3. The document classification system according to claim 2, wherein the AI function classifies each document based on the result of prior learning.
  4. The document classification system according to claim 3, wherein
     the prior learning is supervised learning based on data to which document types have been assigned, and
     the AI function performs classification that specifies the type of document.
  5. The document classification system according to claim 3, wherein
     the prior learning is unsupervised learning based on data to which no document type has been assigned, and
     the AI function performs classification that does not specify the type of document.
  6. The document classification system according to claim 2, wherein the AI function receives a plurality of documents as input and, without performing prior learning, performs classification based on the plurality of documents without specifying the type of document.
  7. The document classification system according to any one of claims 2 to 6, wherein the types of documents include receipts and invoices.
  8. The document classification system according to claim 7, wherein the AI function determines whether a revenue stamp is affixed.
  9. The document classification system according to claim 7 or 8, wherein the AI function determines whether a seal impression is present and the shape of the impression.
  10. The document classification system according to any one of claims 2 to 9, wherein
     the document data is image data, and
     the AI function performs classification based on at least one of: the color of the image data, the shape of the image data, the size of the image data, the color of a region of the image data whose color differs from the background, and the shape of a region of the image data whose color differs from the background.
  11. The document classification system according to any one of claims 2 to 10, wherein
     the document data includes text data, and
     the AI function performs classification based on the characters and content of the text data.
  12. The document classification method according to claim 1, wherein the document classification system according to any one of claims 7 to 11 is used.
  13. An accounting method comprising:
     a step of separating documents by the document classification method according to claim 12; and
     a step of performing OCR processing on image data whose document type is receipt or invoice, and inputting the result into accounting software.
  14. An accounting system comprising:
     the document classification system according to any one of claims 7 to 11;
     an OCR processing unit that performs OCR processing on the image data of documents that the document classification system has classified as receipts or invoices; and
     accounting software that performs accounting processing based on the character string output by the OCR processing unit.
  15. The accounting system according to claim 14, wherein the accounting software outputs an error when the character string is nonconforming.
  16. An accounting method using the accounting system according to claim 15,
     wherein supervised learning is performed on the image data for which the accounting software has output an error.
PCT/JP2017/025058 2017-07-08 2017-07-08 Document classification system and method, and accounting system and method WO2019012570A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2017/025058 WO2019012570A1 (en) 2017-07-08 2017-07-08 Document classification system and method, and accounting system and method
JP2017536900A JP6504514B1 (en) 2017-07-08 2017-07-08 Document classification system and method and accounting system and method.

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/025058 WO2019012570A1 (en) 2017-07-08 2017-07-08 Document classification system and method, and accounting system and method

Publications (1)

Publication Number Publication Date
WO2019012570A1 true WO2019012570A1 (en) 2019-01-17

Family

ID=65001895

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/025058 WO2019012570A1 (en) 2017-07-08 2017-07-08 Document classification system and method, and accounting system and method

Country Status (2)

Country Link
JP (1) JP6504514B1 (en)
WO (1) WO2019012570A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0855157A (en) * 1994-08-09 1996-02-27 Supetsuku:Kk Taxation document issuing device
JP2007088609A (en) * 2005-09-20 2007-04-05 Fuji Xerox Co Ltd Electronic signature providing apparatus, method, and program
US9053350B1 (en) * 2009-01-21 2015-06-09 Google Inc. Efficient identification and correction of optical character recognition errors through learning in a multi-engine environment
JP2015170045A (en) * 2014-03-05 2015-09-28 グローリー株式会社 Sales management system and method
JP2016071412A (en) * 2014-09-26 2016-05-09 キヤノン株式会社 Image classification apparatus, image classification system, image classification method, and program
JP2016173822A (en) * 2015-03-17 2016-09-29 株式会社リコー Information processing apparatus, information processing system and program
JP2017069599A (en) * 2015-09-28 2017-04-06 富士ゼロックス株式会社 Image processing apparatus and program

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04127287A (en) * 1990-09-18 1992-04-28 Fujitsu Ltd Optical character reader
US8408544B2 (en) * 2011-06-08 2013-04-02 Eastman Kodak Company Sorting by controlling scanned document velocity
JP2016085538A (en) * 2014-10-23 2016-05-19 キヤノン株式会社 Information processing equipment, control method of information processing equipment, and program

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020187426A (en) * 2019-05-10 2020-11-19 京セラドキュメントソリューションズ株式会社 Image processing device and image processing method
JP7419668B2 (en) 2019-05-10 2024-01-23 京セラドキュメントソリューションズ株式会社 Image processing device and image processing method
JP2021005226A (en) * 2019-06-26 2021-01-14 京セラドキュメントソリューションズ株式会社 Document classification system and document classification program
JP7364998B2 (en) 2019-06-26 2023-10-19 京セラドキュメントソリューションズ株式会社 Document classification system and document classification program
JP6856916B1 (en) * 2020-01-08 2021-04-14 ジーニアルテクノロジー,インク. Information processing equipment, information processing methods and information processing programs
WO2021140682A1 (en) * 2020-01-08 2021-07-15 ジーニアルテクノロジー,インク. Information processing device, information processing method, and information processing program
US11315351B2 (en) 2020-01-08 2022-04-26 Kabushiki Kaisha Genial Technology Information processing device, information processing method, and information processing program
JP2021072110A (en) * 2020-04-30 2021-05-06 株式会社日本デジタル研究所 Voucher determination device, accounting processing device, voucher determination program, voucher determination system, and voucher determination method
JP2021072088A (en) * 2020-04-30 2021-05-06 株式会社日本デジタル研究所 Voucher determination device, accounting processing device, voucher determination program, voucher determination system, and voucher determination method
CN112101367A (en) * 2020-09-15 2020-12-18 杭州睿琪软件有限公司 Text recognition method, image recognition and classification method and document recognition processing method
WO2022057707A1 (en) * 2020-09-15 2022-03-24 杭州睿琪软件有限公司 Text recognition method, image recognition classification method, and document recognition processing method

Also Published As

Publication number Publication date
JPWO2019012570A1 (en) 2019-07-11
JP6504514B1 (en) 2019-04-24

Similar Documents

Publication Publication Date Title
JP6504514B1 (en) Document classification system and method and accounting system and method.
US9552516B2 (en) Document information extraction using geometric models
US8167196B2 (en) Expanded mass data sets for electronic check processing
US11455784B2 (en) System and method for classifying images of an evidence
JP5202677B2 (en) Receipt data recognition device and program thereof
US9384393B2 (en) Check data lift for error detection
US11501344B2 (en) Partial perceptual image hashing for invoice deconstruction
JP2012226402A (en) Receipt data recognition device and program therefor
US20140268250A1 (en) Systems and methods for receipt-based mobile image capture
KR101942468B1 (en) Structured data and unstructured data extraction system and method
US20180053045A1 (en) Automated Processing of Receipts and Invoices
WO2020012539A1 (en) Journalization element analysis device, accounting system, journalization element analysis method, and journalization element analysis program
TWI716761B (en) Intelligent accounting system and identification method for accounting documents
US11030450B2 (en) System and method for determining originality of computer-generated images
CN111428725A (en) Data structuring processing method and device and electronic equipment
TWM575887U (en) Intelligent accounting system
US20160148283A1 (en) System and Method for Accessing Comic Book Grading Notes via a Quick Scan Code
JP6507459B2 (en) Accounting system
JP2011227787A (en) Accounting transaction information reading device
US10417488B2 (en) Re-application of filters for processing receipts and invoices
JP7429365B1 (en) Data processing device, data processing method and program
JP6835382B1 (en) Electronic data judgment system, electronic data judgment device, electronic data judgment method, electronic data judgment program
WO2022054136A1 (en) Data processing device, data processing method, and program
EP3557480B1 (en) Deriving data from documents and from transmission protocols using machine-learning techniques
US20230410214A1 (en) System and method for supervising expense or income

Legal Events

Date Code Title Description
ENP Entry into the national phase. Ref document number: 2017536900; Country of ref document: JP; Kind code of ref document: A
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 17917573; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27/03/2020)
122 Ep: pct application non-entry in european phase. Ref document number: 17917573; Country of ref document: EP; Kind code of ref document: A1