WO2022001637A1 - Document processing method, apparatus, device, and computer-readable storage medium - Google Patents

Document processing method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2022001637A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
processed
feature
category
general
Prior art date
Application number
PCT/CN2021/099799
Other languages
English (en)
French (fr)
Inventor
詹明捷
许严
梁鼎
刘学博
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to KR1020227004409A (KR20220031097A)
Priority to JP2022506431A (JP2022543052A)
Publication of WO2022001637A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables

Definitions

  • The present disclosure relates to computer vision technology and, in particular, to a document processing method, apparatus, device, and computer-readable storage medium.
  • OCR (Optical Character Recognition)
  • Embodiments of the present disclosure provide a document classification scheme.
  • A document processing method comprising: acquiring semantic features and visual features of a document to be processed; determining a general feature of the document to be processed according to the semantic features and the visual features; and determining the category of the document to be processed according to the general feature of the document to be processed.
  • Acquiring the semantic features of the document to be processed includes: obtaining a text recognition result of the document to be processed; and obtaining the semantic features of the document to be processed based on the text recognition result.
  • Obtaining the text recognition result of the document to be processed includes: determining target text boxes in the document to be processed and the text content contained in the target text boxes; obtaining word segmentation results for the text content in each target text box; and obtaining the feature vectors corresponding to the word segmentation results.
  • Determining the general feature of the document to be processed according to the visual features and the semantic features includes: regularizing the visual features and the semantic features respectively; and performing a weighted summation of the regularized visual features and the regularized semantic features to obtain the general feature of the document to be processed.
  • The document processing method is performed using a neural network comprising a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature. The first classification sub-network is specifically used to: compare the general feature of the document to be processed with preset standard features of at least one category of documents, and determine the similarity between the general feature of the document to be processed and the standard features of the at least one category of documents; and determine the category of the document to be processed according to the at least one similarity obtained.
  • Determining the category of the document to be processed according to the at least one similarity obtained includes: obtaining the highest similarity among the at least one similarity; and, in response to the highest similarity being greater than or equal to a preset similarity threshold, determining that the category of the document to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed.
  • The method further includes training the feature extraction sub-network in the neural network, specifically including: inputting a sample document into the feature extraction sub-network to obtain the general feature of the sample document, wherein the sample document is annotated with a category; inputting the general feature into a second classification sub-network to obtain a predicted category of the sample document; and adjusting the network parameters of the feature extraction sub-network according to the difference between the predicted category of the sample document and the annotated category of the sample document.
  • The standard features of the at least one category of documents are obtained by performing feature extraction on the at least one category of documents using the trained feature extraction sub-network.
  • The method further includes: in response to the highest similarity being less than the preset similarity threshold, adding the document to be processed as a standard template, and determining the general feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
  • The method further includes: in response to a selection instruction, selecting at least one category from preset document categories as a target category. Comparing the general feature of the document to be processed with preset standard features of at least one category of documents and determining their similarity then includes: comparing the general feature of the document to be processed with the preset standard features of documents of the at least one target category, and determining the similarity between the general feature of the document to be processed and the standard features of documents of the at least one target category.
  • The method further includes: acquiring a corresponding preset standard template according to the category of the document to be processed; and performing layout recognition on the document to be processed based on the standard template to obtain a layout recognition result for the document.
  • A document processing apparatus comprising: an acquisition module for acquiring semantic features and visual features of a document to be processed; a general module for determining a general feature of the document to be processed according to the semantic features and the visual features; and a classification module for determining the category of the document to be processed according to the general feature of the document to be processed.
  • A document processing device comprising a non-volatile storage medium and a processor, the storage medium storing computer instructions executable on the processor, and the processor being configured to perform the method described in any embodiment of the present disclosure when executing the instructions.
  • A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any embodiment of the present disclosure.
  • The document processing method, apparatus, device, computer-readable medium, and computer program of one or more embodiments of the present disclosure determine the general feature of a document according to the obtained visual features and semantic features of the document, and determine the category of the document according to the general feature. The document processing method of the present disclosure can accurately classify arbitrary documents; by combining semantic features and visual features to obtain the general feature of a document, it improves the accuracy of classification results for documents of different categories with similar visual features, and also improves the robustness of document classification.
  • FIG. 1 is a flowchart of a document processing method according to an embodiment of the present disclosure
  • FIG. 2 schematically shows a partial network structure of a neural network for extracting visual features according to an embodiment of the present disclosure
  • FIG. 3 schematically shows a partial network structure of a neural network for extracting semantic features according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a text recognition process of a form shown in an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of a user selection interface shown in an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a document processing apparatus according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a document processing device according to an embodiment of the present disclosure.
  • Although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms; these terms are only used to distinguish pieces of information of the same type from each other.
  • For example, without departing from the scope of the present disclosure, the first information may also be referred to as the second information and, similarly, the second information may also be referred to as the first information.
  • Depending on the context, the word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
  • At present, OCR (Optical Character Recognition) technology is commonly used to recognize documents. Recognition with this technology requires accurately determining the category of a document and using the corresponding template, but the classification results produced by the related art are not accurate.
  • FIG. 1 shows the flow of the document processing method, which includes steps S101 to S103.
  • The document may include one or more of books, files, forms, bills, certificates, radio frequency cards, and the like, for example general text, ID cards, bank cards, vehicle licenses, driver's licenses, passports, forms, invoices, business licenses, and handwritten documents.
  • The document processing method can automatically identify the categories of such documents: for example, a bank card can be automatically identified as the bank card category, an ID card as the ID card category, or an invoice as the invoice category.
  • In practice there may be one or more documents to be processed, meaning a user can choose batch processing or single-document processing as needed. The processing of each document in a batch is similar to that of a single document and can refer to the single-document process.
  • For ease of description, this application takes a single document to be processed as an example; this is not a limitation of the technical solution of the present application.
  • In step S101, the semantic features and visual features of the document to be processed are acquired.
  • This step does not restrict the order in which the semantic features and visual features are acquired; that is, the semantic features may be acquired first and then the visual features, the visual features first and then the semantic features, or both may be acquired at the same time.
  • In this step, a neural network can be used to extract the visual features of the document to be processed. Specifically, a convolution kernel (for example, a 3*3 convolution kernel) first extracts initial features of the document; the initial features then pass through multiple (for example, 7) inverted residual blocks that successively extract intermediate features; and the intermediate features output by the last inverted residual block are convolved with another convolution kernel (for example, a 1*1 convolution kernel) to output features of a specified dimension as the visual features of the document to be processed.
  • Each inverted residual block includes an up-channel module composed of a 1*1 convolution kernel and an activation function (such as ReLU6), used to expand the number of channels of the input features; an extraction module composed of a depthwise-separable convolution layer and an activation function, used to extract the features of each channel and connect them; and a down-channel module composed of a 1*1 convolution kernel, used to restore the number of channels.
  • Each inverted residual block sums its input and the output of its down-channel module to form the block's output. The output of every inverted residual block except the last serves as the input of the next inverted residual block.
  • In one example, FIG. 2 schematically shows part of a network structure for extracting the visual features of documents to be processed. The partial structure shown in FIG. 2 contains two inverted residual blocks, namely a first inverted residual block 201 and a second inverted residual block 202.
  • The first inverted residual block 201 includes a first up-channel module 2011, a first extraction module 2012, and a first down-channel module 2013, connected in sequence. The first up-channel module 2011 may, for example, consist of a 1*1 convolution kernel (Conv1*1) and an activation function (e.g., ReLU6); the first extraction module 2012 may consist of a depthwise-separable 3*3 convolution layer (Dwise3*3) and an activation function (e.g., ReLU6); and the first down-channel module 2013 may consist of a 1*1 convolution kernel (Conv1*1).
  • The first input of the first inverted residual block 201 is the initial features of the document to be processed, which can be extracted by, for example, a 3*3 convolution kernel. The first output of block 201 is the sum of the first input and the output of the first down-channel module, and this first output is the second input of the second inverted residual block 202.
  • The second inverted residual block 202 includes a second up-channel module 2021, a second extraction module 2022, and a second down-channel module 2023, connected in sequence, with the same composition as the corresponding modules of block 201. The second output of block 202 is the sum of the second input and the output of the second down-channel module.
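  • As an illustration of the structure described above, here is a minimal PyTorch sketch of an inverted residual block and the surrounding backbone. The channel sizes, the expansion factor, and the 256-dimensional output are assumptions made for the example; the patent does not fix these values.

```python
import torch
import torch.nn as nn

class InvertedResidualBlock(nn.Module):
    """One inverted residual block: up-channel -> extraction -> down-channel,
    with the block input added to the down-channel output."""
    def __init__(self, channels: int, expansion: int = 6):
        super().__init__()
        hidden = channels * expansion
        # Up-channel module: 1*1 convolution + ReLU6 expands the channel count.
        self.up = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.ReLU6(inplace=True),
        )
        # Extraction module: depthwise-separable 3*3 convolution + ReLU6.
        self.extract = nn.Sequential(
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1,
                      groups=hidden, bias=False),
            nn.ReLU6(inplace=True),
        )
        # Down-channel module: 1*1 convolution restores the channel count.
        self.down = nn.Conv2d(hidden, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = input + down-channel output (the residual connection).
        return x + self.down(self.extract(self.up(x)))

# Initial 3*3 convolution, seven inverted residual blocks, then a 1*1
# convolution to the target dimension, as described in the text.
visual_backbone = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    *[InvertedResidualBlock(32) for _ in range(7)],
    nn.Conv2d(32, 256, kernel_size=1),
)
```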
  • In this step, the semantic features of the document to be processed may be obtained in the following manner: first, a text recognition result of the document to be processed is obtained; then, based on the text recognition result, the semantic features of the document to be processed are obtained.
  • The text recognition result may be the result of extracting the text content in the document to be processed and representing it in a specific manner.
  • In one example, OCR technology can be used to obtain the text recognition result of the document to be processed.
  • A neural network can be used to extract the semantic features from the text recognition result. Specifically, features of the text recognition result at different levels can be extracted first, then connected and further extracted, finally yielding the semantic features of the text recognition result.
  • Referring to FIG. 3, in one example, at least one third extraction module 301 is first used to obtain intermediate features of the text recognition result, where each third extraction module 301 may be a convolution kernel with a different receptive field. For example, convolution kernels with receptive fields of 1, 3, and 5 can extract features of the text recognition result at three different levels (e.g., through convolution and/or pooling operations), and the three levels of features are then connected to obtain the intermediate features. A fourth extraction module 302 (for example, a 1*1 convolution kernel) then further extracts the intermediate features (e.g., through convolution and/or pooling operations) to obtain the semantic features of the text recognition result.
  • The feature extraction process of FIG. 3 is only one example of extracting semantic features and is not a specific limitation on how the semantic features of the text recognition result are extracted; more or fewer convolution kernels and other combinations of receptive fields can be used to extract features at different levels.
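  • The FIG. 3 pattern can be sketched as follows; this is an illustrative PyTorch reading of the text, with channel counts chosen arbitrarily, not the patent's actual network.

```python
import torch
import torch.nn as nn

class SemanticExtractor(nn.Module):
    """Multi-receptive-field extraction over the text feature sequence,
    followed by a 1*1 fusion convolution (the FIG. 3 pattern)."""
    def __init__(self, in_channels: int = 64, out_channels: int = 256):
        super().__init__()
        # Third extraction modules 301: kernels with receptive fields 1, 3, 5.
        self.branches = nn.ModuleList([
            nn.Conv1d(in_channels, in_channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])
        # Fourth extraction module 302: 1*1 convolution over the concatenation.
        self.fuse = nn.Conv1d(3 * in_channels, out_channels, kernel_size=1)

    def forward(self, text_features: torch.Tensor) -> torch.Tensor:
        # text_features: (batch, channels, length), built from the feature
        # vectors of the word segmentation results.
        levels = [branch(text_features) for branch in self.branches]
        return self.fuse(torch.cat(levels, dim=1))
```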
  • The semantic features of documents to be processed can be used to distinguish documents that have similar visual features but different text content.
  • Such documents are precisely one of the cases that the related art cannot classify accurately; this embodiment solves that problem by adding semantic features.
  • In step S102, the general feature of the document to be processed is determined according to the semantic features and the visual features.
  • When extracting the visual features and the semantic features in step S101, features of the same dimension can be output, which facilitates fusing the two kinds of features. Of course, this embodiment does not limit the dimensional relationship between the visual features and the semantic features extracted in step S101.
  • Step S101 may also output visual features and semantic features of different dimensions. In that case, the dimensions of the two features can be compared, the higher-dimensional feature reduced so that the two dimensions match, and the two features then fused. Linear or non-linear dimension reduction can be used, for example.
  • In one example, the visual features and the semantic features are each regularized, and a weighted summation of the regularized visual features and the regularized semantic features yields the general feature of the document to be processed. Other fusion approaches are also possible, such as normalizing or standardizing the two features before the weighted summation, or fusing them by element-wise addition or vector concatenation.
  • By fusing the semantic features and visual features of the document to be processed, its general feature is obtained. The general feature can be used for document classification in step S103, and also for document comparison to match document pictures.
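  • A minimal sketch of the fusion in step S102 is given below, assuming that the regularization is L2 normalization and that the two features are weighted equally; both choices are assumptions, since the text leaves them open.

```python
import torch
import torch.nn.functional as F

def fuse_features(visual: torch.Tensor, semantic: torch.Tensor,
                  w_visual: float = 0.5, w_semantic: float = 0.5) -> torch.Tensor:
    # Both features are assumed to already share the same dimension;
    # otherwise the higher-dimensional one would first be reduced.
    visual = F.normalize(visual, p=2, dim=-1)      # "regularization" step
    semantic = F.normalize(semantic, p=2, dim=-1)
    # Weighted summation of the regularized features gives the general feature.
    return w_visual * visual + w_semantic * semantic
```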
  • In step S103, the category of the document to be processed is determined according to its general feature.
  • In the embodiments of the present disclosure, the general feature of a document is determined according to the obtained visual features and semantic features of the document, and the category of the document is determined according to the general feature. The document processing method of the present disclosure can accurately classify arbitrary documents; by combining semantic features and visual features to obtain the general feature of a document, it improves the accuracy of classification results for documents of different categories with similar visual features, and also improves the robustness of document classification.
  • In some embodiments, the text recognition result of the document to be processed may be obtained in the following manner:
  • First, the target text boxes in the document to be processed and the text content contained in each target text box are determined. Next, word segmentation results are obtained for the text content in each target text box. Finally, the feature vectors corresponding to the word segmentation results are obtained.
  • FIG. 4 shows the text recognition process for a document to be processed (here, a form). Through text recognition, the target text boxes in the document are determined, namely the 15 text boxes 401 to 415, together with the text content contained in each. For example, text box 401 contains the office supplies requisition form title, text box 402 contains the form-filling date (year, month, day), and text box 415 contains the general manager's comments.
  • By performing word segmentation on the text content in each text box, multiple word segmentation results are obtained; for example, the 11 results 416 to 426 are some of the segmentation results obtained from the text content of the above 15 text boxes. A segmentation result may be a character or a word. For example, results 416 (office), 417 (supplies), 418 (requisition), and 419 (form) are the four segmentation results obtained from text box 401, and results 425 (general manager) and 426 (comments) are the two segmentation results obtained from text box 415. Items 427 to 438 are 12 feature vectors, each the result of representing one segmentation result as a feature vector.
  • In the embodiments of the present disclosure, the text recognition result is obtained by determining the target text boxes in the document and their text content, then applying word segmentation and feature-vector representation to that content. Not only is the text content in the document (part or all of it) extracted, but the division into text boxes and the word segmentation also yield the smallest character/word units in the text, so the semantic features determined from them are very accurate, further improving the accuracy of document classification. Moreover, representing the text recognition result as feature vectors facilitates the extraction of semantic features, further improving the efficiency of document classification.
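  • For illustration, the segmentation step might look as follows using the jieba library for Chinese word segmentation; the patent does not name a specific segmenter, so jieba and the exact token boundaries shown are assumptions.

```python
import jieba

text_box_401 = "办公用品请购表"  # text content of target text box 401
tokens = jieba.lcut(text_box_401)
print(tokens)  # expected to resemble ['办公', '用品', '请购', '表'], i.e. results 416-419

# An embedding table would then map each token to the feature vector
# representing that segmentation result (items 427-438 in FIG. 4).
```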
  • In some embodiments, the document processing method may be performed using a neural network, which may include a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature.
  • The first classification sub-network may be specifically used to: compare the general feature of the document to be processed with preset standard features of at least one category of documents, determining at least one similarity between the general feature and the standard features of the at least one category of documents; and determine the category of the document to be processed according to the at least one similarity.
  • The general feature of the document to be processed and the standard features may have the same dimension, which facilitates their comparison. The similarity between the general feature and a standard feature can be obtained by computing the Euclidean distance between the two, or by a neural network, obtained through training, that outputs the similarity between them.
  • In the embodiments of the present disclosure, standard features of various categories of documents are preset in the neural network.
  • The category of the document to be processed is determined using the similarities between its general feature and the different standard features.
  • The similarities characterize the relationship between the document to be processed and the various standard documents, that is, whether they are similar and to what degree; this improves the accuracy of the classification result, and the computation is simple, further improving classification efficiency.
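  • A sketch of the comparison step under the Euclidean-distance option mentioned above follows; the mapping from distance to a similarity score is one possible choice, not mandated by the text.

```python
import torch

def similarities(general: torch.Tensor, standard: torch.Tensor) -> torch.Tensor:
    # general: (dim,) general feature of the document to be processed.
    # standard: (num_categories, dim) preset standard features, which share
    # the general feature's dimension.
    distances = torch.cdist(general.unsqueeze(0), standard).squeeze(0)
    # Map Euclidean distance to a similarity in (0, 1]: the smaller the
    # distance, the higher the similarity.
    return 1.0 / (1.0 + distances)
```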
  • In some embodiments, the category of the document to be processed is determined from the at least one similarity as follows:
  • First, the highest similarity among the at least one similarity is obtained.
  • Next, in response to the highest similarity being greater than or equal to a preset similarity threshold, the category of the document to which the standard feature corresponding to the highest similarity belongs is determined to be the category of the document to be processed.
  • The highest similarity is determined by comparing the individual similarities. When at least two equal highest similarities occur, the method can return to the step of computing the similarities, recompute them with higher precision, and compare the results again to obtain a single highest similarity. If, after one or more recomputations, at least two equal highest similarities still remain, the recomputation continues until only one highest similarity is left.
  • Alternatively, the similarities can first be compared with the preset similarity threshold to filter out the one or more similarities that are greater than or equal to the threshold, and the highest similarity is then taken from the filtered set. The ways of determining a unique highest similarity therefore include, but are not limited to, the two cases exemplified above; other implementations achieving the same or similar effects may also be used and are not enumerated here.
  • In this embodiment, only a similarity not below the similarity threshold is considered valid; that is, only when the similarity between the general feature of the document to be processed and a standard feature is greater than or equal to the threshold is the document considered similar to the standard document, and the further the similarity exceeds the threshold, the more similar the two are considered. When the similarity between the general feature and a standard feature is below the threshold, the document to be processed is considered not similar to the standard document.
  • In the embodiments of the present disclosure, a similarity threshold is preset in the neural network. The highest similarity is compared with the threshold, and the document to be processed is classified into the category of the corresponding standard document only when the highest similarity is greater than or equal to the threshold. This avoids classification errors when the general feature of the document has low similarity with all standard features, that is, when the document does not belong to a category corresponding to any standard document. Classification accuracy is thereby further improved, and documents outside the preset categories are not misclassified.
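  • Continuing the sketch, the highest-similarity selection and threshold check can be written as follows; the threshold value of 0.8 and the None return for "no preset category" are illustrative conventions.

```python
import torch

def classify(sims: torch.Tensor, threshold: float = 0.8):
    best = int(torch.argmax(sims))
    # Only a highest similarity at or above the preset threshold is valid.
    if sims[best] >= threshold:
        return best   # index of the matching standard feature's category
    return None       # below threshold: the document fits no preset category
```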
  • In some embodiments, the feature extraction sub-network in the neural network is trained as follows: first, a sample document is input into the feature extraction sub-network to obtain the general feature of the sample document, the sample document being annotated with a category; next, the general feature is input into a second classification sub-network to obtain the predicted category of the sample document; finally, the network parameters of the feature extraction sub-network are adjusted according to the difference between the predicted category and the annotated category of the sample document.
  • The network structure of the feature extraction sub-network enables it to extract the general features of documents input into it; training the sub-network improves the accuracy of its feature extraction.
  • The second classification sub-network is a classifier; for example, it can consist of at least one fully connected layer and a normalization layer. The number of categories it classifies is fixed and corresponds to the number of sample document categories, for example 5, 8, or 10; that is, the second classification sub-network outputs a probability for each preset category, and the category with the highest probability is the classification result.
  • For example, suppose there are 10 categories of sample documents, A, B, C, D, E, F, G, H, I, and J, so the output dimension of the second classification sub-network is 10, one per category. When the general feature of a sample document extracted by the feature extraction sub-network is input into the second classification sub-network, it outputs 10 probabilities, say 83%, 2%, 1%, 3%, 0.5%, 0.2%, 0.3%, 5%, 4%, and 1%, which are the probabilities that the sample document belongs to categories A through J respectively; the predicted category output by the second classification sub-network is therefore A.
  • The adjustment of the network parameters of the feature extraction sub-network may be stopped when the network loss value is less than a preset loss threshold and/or when the number of adjustments exceeds a preset count threshold.
  • A sample document set can be prepared in advance: first, multiple sample documents are acquired; next, the category of each sample document is annotated; finally, the sample document set is determined from the annotated sample documents. In addition, one document of each category can be selected as the standard template of that category, for later use when storing standard features.
  • The extraction capability of the feature extraction sub-network determines the accuracy of the extracted general features, which in turn determines the accuracy of the classification result; the accuracy of the predicted categories output by the second classification sub-network can therefore characterize the strength of the feature extraction sub-network's extraction capability. The second classification sub-network is thus used to characterize that capability, and the network parameters of the feature extraction sub-network are adjusted by feedback and continually optimized, improving its extraction capability and hence the accuracy of the extracted general features and of document classification.
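  • A minimal training sketch under the scheme just described: the feature extraction sub-network's general features feed a second classification sub-network, and the cross-entropy between predicted and annotated categories drives the parameter adjustment. Shapes, optimizer, and hyperparameters are assumptions; the sketch also updates the classifier head's parameters, which is usual in practice, while the text emphasizes adjusting the feature extraction sub-network.

```python
import torch
import torch.nn as nn

# Stand-in shapes: a flattened 28*28 input and 256-dimensional general features.
feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(784, 256))
second_classifier = nn.Linear(256, 10)  # one output per preset category
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    list(feature_extractor.parameters()) + list(second_classifier.parameters())
)

def train_step(sample_docs: torch.Tensor, labels: torch.Tensor) -> float:
    general = feature_extractor(sample_docs)   # general features of the samples
    logits = second_classifier(general)        # predicted category scores
    loss = criterion(logits, labels)           # difference vs. annotated categories
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                           # adjust the network parameters
    return float(loss)
```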
  • In some embodiments, the standard features of the at least one category of documents are obtained by processing the standard templates of the at least one category with the trained feature extraction sub-network.
  • The standard template of each document category can be determined first; a standard template has a clear layout, clear text box and/or text block boundaries, and complete text content. After the general feature of each category's standard template is extracted, it is stored as the standard feature of that category.
  • The standard template may also be annotated, that is, the attributes of each position, text box, and/or text block of the template are labeled, so that the template can be used for document layout recognition.
  • In the embodiments of the present disclosure, both the standard templates and the documents to be processed have their features extracted by the same feature extraction sub-network. The general features and the standard features therefore have the same origin and follow consistent rules and standards, so the similarity determined from them is more accurate, further improving the accuracy of document classification.
  • The standard features stored in the above manner are finite and cannot cover all document categories. Moreover, as introduced in some of the foregoing embodiments, a document to be processed can be classified into the category corresponding to the highest similarity only when that highest similarity is greater than or equal to the similarity threshold. For these two reasons, when a document's category is not covered by the preset standard templates, classification cannot be completed.
  • Therefore, in some embodiments, in response to the highest similarity being less than the preset similarity threshold, the document to be processed is added as a standard template, and its general feature is determined to be the standard feature of the category corresponding to the newly added standard template.
  • A highest similarity below the threshold indicates that the document to be processed does not belong to any preset document category, that is, it represents a new category. When classification fails, the unclassified document is stored in the neural network as a new category: the document is stored as a standard template, and its extracted general feature is stored as the standard feature of the new category. After the category is stored, reminder information can also be generated to prompt the user to annotate the new standard template so that it can be used for layout recognition.
  • Because the feature extraction sub-network can accurately extract the general features of documents to be processed, the first classification sub-network can automatically expand its classification dimensions or quantity. By storing a document that failed classification as a new category, the number of preset document categories is automatically expanded and the classification capability is continually improved.
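  • The automatic extension can be sketched as follows; the list-based template store and the threshold value are illustrative assumptions.

```python
import torch

standard_features: list[torch.Tensor] = []  # one stored feature per category

def classify_or_extend(general: torch.Tensor, threshold: float = 0.8) -> int:
    if standard_features:
        sims = torch.stack([1.0 / (1.0 + torch.dist(general, s))
                            for s in standard_features])
        best = int(torch.argmax(sims))
        if sims[best] >= threshold:
            return best                       # an existing category matched
    # Highest similarity below the threshold (or no templates yet): store the
    # document's general feature as the standard feature of a new category.
    standard_features.append(general.detach())
    return len(standard_features) - 1
```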
  • In some embodiments, the method further includes: in response to a selection instruction, selecting at least one category from the preset document categories as a target category. The selection instruction may be triggered by a user's selection operation, or a trigger condition may be preset and the instruction triggered automatically when the condition is met.
  • The similarity between the general feature of the document to be processed and the standard features is then determined as follows: the general feature of the document to be processed is compared with the preset standard features of documents of the at least one target category, and the similarity between the general feature and the standard features of documents of the at least one target category is determined.
  • In one example, referring to FIG. 5, which shows part of a user selection interface, the preset document categories include general text, ID card, bank card, vehicle license, driver's license, passport, general form, VAT invoice, business license, and handwritten text; the user has selected ID card, bank card, general form, VAT invoice, and handwritten text as the target categories. In the subsequent processing of documents to be recognized, the categories selected by the user serve as the reference.
  • The content shown in FIG. 5 is only one possible implementation. The user can also create a template independently to establish a new target category, which is then used as a reference when processing documents to be recognized. The target categories may include at least some of the categories shown in FIG. 5, that is, more or fewer than shown, which is not limited here.
  • The present disclosure also provides a document processing apparatus; referring to FIG. 6, which shows its structure, the apparatus includes: an acquisition module 601 for acquiring semantic features and visual features of a document to be processed; a general module 602 for determining a general feature of the document to be processed according to the semantic features and the visual features; and a classification module 603 for determining the category of the document to be processed according to the general feature.
  • In some embodiments, the acquisition module is specifically configured to: obtain a text recognition result of the document to be processed; and obtain the semantic features of the document to be processed based on the text recognition result.
  • Obtaining the text recognition result of the document to be processed includes: determining the target text boxes in the document to be processed and the text content contained in them; obtaining word segmentation results for the text content in each target text box; and obtaining the feature vectors corresponding to the segmentation results.
  • In some embodiments, the general module is specifically configured to: regularize the visual features and the semantic features respectively; and perform a weighted summation of the regularized visual features and the regularized semantic features to obtain the general feature of the document to be processed.
  • In some embodiments, the document processing apparatus includes a neural network comprising a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature. The first classification sub-network is used to: compare the general feature of the document to be processed with preset standard features of at least one category of documents, determining the similarity between the general feature and the standard features of the at least one category; and determine the category of the document to be processed according to the at least one similarity obtained.
  • When determining the category from the at least one similarity, the first classification sub-network is specifically configured to: obtain the highest similarity among the at least one similarity; and, in response to the highest similarity being greater than or equal to a preset similarity threshold, determine that the category of the document to which the corresponding standard feature belongs is the category of the document to be processed.
  • In some embodiments, the apparatus further includes a training module for training the feature extraction sub-network in the neural network, configured to: input a sample document into the feature extraction sub-network to obtain the general feature of the sample document, wherein the sample document is annotated with a category; input the general feature into the second classification sub-network to obtain the predicted category of the sample document; and adjust the network parameters of the feature extraction sub-network according to the difference between the predicted category and the annotated category.
  • In some embodiments, the standard features of the at least one category of documents are obtained by performing feature extraction on the at least one category of documents using the trained feature extraction sub-network.
  • In some embodiments, the apparatus further includes an extension module configured to: in response to the highest similarity being less than the preset similarity threshold, add the document to be processed as a standard template, and determine the general feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
  • In some embodiments, the apparatus further includes a target module configured to: in response to a selection instruction, select at least one category from the preset document categories as a target category. When comparing the general feature of the document to be processed with preset standard features of at least one category of documents and determining their similarity, the first classification sub-network is then specifically configured to: compare the general feature of the document to be processed with the preset standard features of documents of the at least one target category, and determine the similarity between the general feature and those standard features.
  • In some embodiments, the apparatus further includes an identification module configured to: obtain the corresponding preset standard template according to the category of the document to be processed; and perform layout recognition on the document to be processed based on the standard template to obtain the document's layout recognition result.
  • The present disclosure also provides a document processing device; referring to FIG. 7, which shows its structure, the device includes a non-volatile storage medium 701 and a processor 702, the storage medium 701 storing computer instructions executable on the processor 702, and the processor 702 implementing the method described in any embodiment of the present disclosure when executing the instructions.
  • The present disclosure also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the method described in any embodiment of the present disclosure.
  • Selecting at least one target category from the multiple categories as the reference reduces the computational load of the similarity determination step and of the similarity comparison step, improving classification efficiency.
  • In some embodiments, the method further includes: obtaining the corresponding preset standard template according to the category of the document to be processed; and performing layout recognition on the document to be processed based on the standard template to obtain the document's layout recognition result.
  • In the embodiments of the present disclosure, the corresponding standard template is automatically and accurately retrieved from the classification result for layout recognition, which improves both the accuracy and the efficiency of layout recognition.
  • One or more embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
  • Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
  • The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output.
  • The processes and logic flows can also be performed by, and the apparatus can also be implemented as, special-purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).
  • Computers suitable for the execution of a computer program include, for example, general- and/or special-purpose microprocessors, or any other type of central processing unit. Generally, the central processing unit receives instructions and data from a read-only memory and/or a random access memory.
  • The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • Generally, a computer also includes, or is operatively coupled to, one or more mass storage devices for storing data, such as magnetic, magneto-optical, or optical disks, to receive data from them, transfer data to them, or both. However, a computer need not have such devices.
  • Moreover, the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

A document processing method, apparatus, device, and computer-readable storage medium. The method includes: acquiring semantic features and visual features of a document to be processed (101); determining a general feature of the document to be processed according to the semantic features and the visual features (102); and determining the category of the document to be processed according to the general feature of the document to be processed (103).

Description

Document processing method, apparatus, device, and computer-readable storage medium
Cross-reference to related applications
This patent application claims priority to Chinese patent application No. 202010610080.8, filed on June 29, 2020 and entitled "Document processing method, apparatus, device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to computer vision technology and, in particular, to a document processing method, apparatus, device, and computer-readable storage medium.
Background
At present, OCR (Optical Character Recognition) technology is commonly used to recognize documents. Recognition with this technology requires accurately determining the category of a document and using the corresponding template, but the classification results produced by the related art are not accurate.
Therefore, how to classify documents accurately has become a problem that urgently needs to be solved.
Summary
Embodiments of the present disclosure provide a document classification scheme.
According to an aspect of the present disclosure, a document processing method is provided, the method comprising: acquiring semantic features and visual features of a document to be processed; determining a general feature of the document to be processed according to the semantic features and the visual features; and determining the category of the document to be processed according to the general feature of the document to be processed.
In combination with any implementation provided by the present disclosure, acquiring the semantic features of the document to be processed includes: obtaining a text recognition result of the document to be processed; and obtaining the semantic features of the document to be processed based on the text recognition result.
In combination with any implementation provided by the present disclosure, obtaining the text recognition result of the document to be processed includes: determining target text boxes in the document to be processed and the text content contained in the target text boxes; obtaining word segmentation results for the text content in each target text box; and obtaining the feature vectors corresponding to the word segmentation results.
In combination with any implementation provided by the present disclosure, determining the general feature of the document to be processed according to the visual features and the semantic features includes: regularizing the visual features and the semantic features respectively; and performing a weighted summation of the regularized visual features and the regularized semantic features to obtain the general feature of the document to be processed.
In combination with any implementation provided by the present disclosure, the document processing method is performed using a neural network comprising a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature, wherein the first classification sub-network is specifically used to: compare the general feature of the document to be processed with preset standard features of at least one category of documents, and determine the similarity between the general feature of the document to be processed and the standard features of the at least one category of documents; and determine the category of the document to be processed according to the at least one similarity obtained.
In combination with any implementation provided by the present disclosure, determining the category of the document to be processed according to the at least one similarity obtained includes: obtaining the highest similarity among the at least one similarity; and, in response to the highest similarity being greater than or equal to a preset similarity threshold, determining that the category of the document to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed.
In combination with any implementation provided by the present disclosure, the method further includes training the feature extraction sub-network in the neural network, specifically including: inputting a sample document into the feature extraction sub-network to obtain the general feature of the sample document, wherein the sample document is annotated with a category; inputting the general feature into a second classification sub-network to obtain a predicted category of the sample document; and adjusting the network parameters of the feature extraction sub-network according to the difference between the predicted category of the sample document and the annotated category of the sample document.
In combination with any implementation provided by the present disclosure, the standard features of the at least one category of documents are obtained by performing feature extraction on the at least one category of documents using the trained feature extraction sub-network.
In combination with any implementation provided by the present disclosure, the method further includes: in response to the highest similarity being less than the preset similarity threshold, adding the document to be processed as a standard template, and determining the general feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
In combination with any implementation provided by the present disclosure, the method further includes: in response to a selection instruction, selecting at least one category from preset document categories as a target category; comparing the general feature of the document to be processed with preset standard features of at least one category of documents and determining their similarity then includes: comparing the general feature of the document to be processed with the preset standard features of documents of the at least one target category, and determining the similarity between the general feature of the document to be processed and the standard features of documents of the at least one target category.
In combination with any implementation provided by the present disclosure, the method further includes: obtaining the corresponding preset standard template according to the category of the document to be processed; and performing layout recognition on the document to be processed based on the standard template to obtain the document's layout recognition result.
According to an aspect of the present disclosure, a document processing apparatus is provided, the apparatus comprising: an acquisition module for acquiring semantic features and visual features of a document to be processed; a general module for determining a general feature of the document to be processed according to the semantic features and the visual features; and a classification module for determining the category of the document to be processed according to the general feature of the document to be processed.
According to an aspect of the present disclosure, a document processing device is provided, the device comprising a non-volatile storage medium and a processor, the storage medium storing computer instructions executable on the processor, and the processor being configured to perform the method described in any embodiment of the present disclosure.
According to an aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the method described in any embodiment of the present disclosure.
According to an aspect of the present disclosure, a computer program is provided; when executed by a processor, the program implements the method described in any embodiment of the present disclosure.
The document processing method, apparatus, device, computer-readable medium, and computer program of one or more embodiments of the present disclosure determine the general feature of a document according to the obtained visual features and semantic features of the document, and determine the category of the document according to the general feature. The document processing method of the present disclosure can accurately classify arbitrary documents; by combining semantic features and visual features to obtain the general feature of a document, it improves the accuracy of classification results for documents of different categories with similar visual features, and also improves the robustness of document classification.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this specification and, together with the description, serve to explain the principles of this specification.
FIG. 1 is a flowchart of a document processing method according to an embodiment of the present disclosure;
FIG. 2 schematically shows a partial network structure of a neural network for extracting visual features according to an embodiment of the present disclosure;
FIG. 3 schematically shows a partial network structure of a neural network for extracting semantic features according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the text recognition process for a form according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a user selection interface according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a document processing apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a document processing device according to an embodiment of the present disclosure.
Detailed description
Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The singular forms "a", "said", and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various pieces of information, such information should not be limited by these terms; these terms are only used to distinguish pieces of information of the same type from each other. For example, without departing from the scope of the present disclosure, the first information may also be referred to as the second information and, similarly, the second information may also be referred to as the first information. Depending on the context, the word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
At present, OCR (Optical Character Recognition) technology is commonly used to recognize documents. Recognition with this technology requires accurately determining the category of a document and using the corresponding template, but the classification results produced by the related art are not accurate.
In view of this, at least one embodiment of the present disclosure provides a document processing method. Referring to FIG. 1, which shows the flow of the document processing method, it includes steps S101 to S103.
The document may include one or more of books, files, forms, bills, certificates, radio frequency cards, and the like, for example general text, ID cards, bank cards, vehicle licenses, driver's licenses, passports, forms, invoices, business licenses, and handwritten documents. The document processing method can automatically identify the categories of such documents: for example, it can automatically recognize a bank card as the bank card category, an ID card as the ID card category, or an invoice as the invoice category. It should be noted that, in practice, there may be one or more documents to be processed; that is, a user may choose batch processing or single-document processing as needed. In batch processing, the handling of each document is similar to that of a single document and can refer to the single-document process. For ease of description, this application takes a single document to be processed as an example, which is not a limitation of the technical solution of this application.
In step S101, the semantic features and visual features of the document to be processed are acquired.
This step does not specifically restrict the order in which the semantic features and visual features are acquired; that is, the semantic features may be acquired first and then the visual features, the visual features first and then the semantic features, or both at the same time.
In this step, a neural network can be used to extract the visual features of the document to be processed. Specifically, a convolution kernel (for example, a 3*3 convolution kernel) can first extract initial features of the document; the initial features then pass through multiple (for example, 7) inverted residual blocks that successively extract intermediate features, and the intermediate features output by the last inverted residual block are convolved with another convolution kernel (for example, a 1*1 convolution kernel) to output features of a specified dimension as the visual features of the document to be processed. Each inverted residual block includes an up-channel module composed of a 1*1 convolution kernel and an activation function (for example, ReLU6), used to expand the number of channels of the input features; an extraction module composed of a depthwise-separable convolution layer and an activation function, used to extract the features of each channel and connect them; and a down-channel module composed of a 1*1 convolution kernel, used to restore the number of channels of the input features. Each inverted residual block sums its input and the output of its down-channel module to form the block's output, and the output of every inverted residual block except the last serves as the input of the next block.
In one example, FIG. 2 schematically shows part of a network structure for extracting the visual features of documents to be processed. The partial structure shown in FIG. 2 contains two inverted residual blocks, namely a first inverted residual block 201 and a second inverted residual block 202. The first inverted residual block 201 includes a first up-channel module 2011, a first extraction module 2012, and a first down-channel module 2013, connected in sequence. The first up-channel module 2011 may, for example, consist of a 1*1 convolution kernel (Conv1*1) and an activation function (for example, ReLU6); the first extraction module 2012 may consist of a depthwise-separable 3*3 convolution layer (Dwise3*3) and an activation function (for example, ReLU6); and the first down-channel module 2013 may consist of a 1*1 convolution kernel (Conv1*1). The first input of the first inverted residual block 201 is the initial features of the document to be processed, which can be extracted by, for example, a 3*3 convolution kernel. The first output of block 201 is the sum of the first input and the output of the first down-channel module, and this first output is the second input of the second inverted residual block 202. The second inverted residual block 202 includes a second up-channel module 2021, a second extraction module 2022, and a second down-channel module 2023, connected in sequence, whose composition mirrors that of the corresponding modules of block 201. The second output of block 202 is the sum of the second input and the output of the second down-channel module.
In this step, the semantic features of the document to be processed may be acquired as follows: first, the text recognition result of the document to be processed is obtained; then, based on the text recognition result, the semantic features of the document to be processed are obtained.
The text recognition result may be the result of extracting the text content in the document to be processed and representing it in a specific manner. In one example, OCR technology can be used to obtain the text recognition result of the document to be processed.
A neural network can be used to extract the semantic features of the text recognition result. Specifically, features of the text recognition result at different levels can be extracted first, then connected and further extracted, finally yielding the semantic features of the text recognition result.
Referring to FIG. 3, in one example, at least one third extraction module 301 is first used to obtain intermediate features of the text recognition result, where each third extraction module 301 may be a convolution kernel with a different receptive field. For example, convolution kernels with receptive fields of 1, 3, and 5 can extract features of the text recognition result at three different levels (for example, through convolution and/or pooling operations), and the three levels of features are then connected to obtain the intermediate features. A fourth extraction module 302 (for example, a 1*1 convolution kernel) then performs further feature extraction on the intermediate features (for example, through convolution and/or pooling operations) to obtain the semantic features of the text recognition result.
The feature extraction process corresponding to FIG. 3 is only one example of extracting semantic features and is not a specific limitation on how the semantic features of the text recognition result are extracted; more or fewer convolution kernels and other combinations of receptive fields can be used to extract features at different levels.
The semantic features of documents to be processed can be used to distinguish documents that have similar visual features but different text content. Such documents are precisely one of the cases that the related art cannot classify accurately; this embodiment solves that problem by adding semantic features.
In step S102, the general feature of the document to be processed is determined according to the semantic features and the visual features.
When extracting the visual features and the semantic features in step S101, features of the same dimension can be output, which facilitates fusing the two kinds of features. Of course, this embodiment does not intend to limit the dimensional relationship between the visual features and semantic features extracted in step S101.
Step S101 may also output visual features and semantic features of different dimensions. In that case, the dimensions of the two features can be compared, the higher-dimensional feature reduced so that the two dimensions match, and the two features then fused. Linear or non-linear dimension reduction can be used, for example.
In one example, the visual features and the semantic features are first regularized respectively; then a weighted summation of the regularized visual features and the regularized semantic features yields the general feature of the document to be processed.
Other ways of obtaining the general feature of the document to be processed can also be used, for example normalizing or standardizing the visual and semantic features before the weighted summation, or fusing the semantic and visual features by element-wise addition or vector concatenation, and so on.
In the embodiments of the present disclosure, by fusing the semantic features and visual features of the document to be processed, its general feature can be obtained. The general feature can be used for document classification in step S103, and also for document comparison to match document pictures.
In step S103, the category of the document to be processed is determined according to its general feature.
In the embodiments of the present disclosure, the general feature of a document is determined according to the obtained visual features and semantic features of the document, and the category of the document is determined according to the general feature. The document processing method of the present disclosure can accurately classify arbitrary documents; by combining semantic features and visual features to obtain the general feature of a document, it improves the accuracy of classification results for documents of different categories with similar visual features, and also improves the robustness of document classification.
In some embodiments, the text recognition result of the document to be processed may be obtained as follows:
First, the target text boxes in the document to be processed and the text content contained in the target text boxes are determined.
Next, word segmentation results of the text content in each target text box are obtained.
Finally, feature vectors corresponding to the word segmentation results are obtained.
Referring to FIG. 4, which shows the text recognition process of a document to be processed (here, a form). Through text recognition, the target text boxes in the document to be processed, namely the 15 text boxes 401 to 415, and the text content contained in each target text box are determined. For example, text box 401 contains "Office Supplies Requisition Form", text box 402 contains "Date of filling: year month day", and text box 415 contains "General manager's comments". Word segmentation of the text content in each text box yields multiple segmentation results, for example the 11 segmentation results 416 to 426, which are part of the segmentation results obtained from the text content of the 15 text boxes above. A segmentation result may be a character or a word; for example, segmentation results 416 ("office"), 417 ("supplies"), 418 ("requisition"), and 419 ("form") are the 4 segmentation results obtained from the text content of text box 401; segmentation results 420 ("filling"), 421 ("time"), 422 ("year"), 423 ("month"), and 424 ("day") are the 5 segmentation results obtained from the text content of text box 402; segmentation results 425 ("general manager") and 426 ("comments") are the 2 segmentation results obtained from the text content of text box 415. Items 427 to 438 are 12 feature vectors, each of which is the result of representing one segmentation result as a feature vector.
In the embodiments of the present disclosure, the text recognition result is obtained by determining the target text boxes in the document and the text content within them, then applying word segmentation and feature-vector representation to the text content. This not only extracts the text content of the document (for example, some or all of it) but, through text-box division and word segmentation, also obtains the smallest character/word units in the text, so the semantic features are determined very accurately, further improving the accuracy of document classification. Moreover, the text recognition result is represented as feature vectors, which facilitates semantic feature extraction and further improves the efficiency of document classification.
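A compact sketch of this pipeline is given below. Here run_ocr and segment_words are hypothetical stand-ins for an OCR engine and a word segmenter (no such functions are named in this disclosure), and an embedding lookup is one possible feature-vector representation.

```python
from typing import Callable, Dict, List
import torch
import torch.nn as nn

def text_recognition_result(image,
                            run_ocr: Callable[..., List[str]],
                            segment_words: Callable[[str], List[str]],
                            embedding: nn.Embedding,
                            vocab: Dict[str, int]) -> torch.Tensor:
    """Text boxes -> word segmentation -> feature vectors (one per token)."""
    boxes = run_ocr(image)                                  # text per target box
    tokens = [t for box_text in boxes for t in segment_words(box_text)]
    ids = torch.tensor([vocab.get(t, 0) for t in tokens])   # 0 = unknown token
    return embedding(ids)                                   # (num_tokens, embed_dim)
```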
In some embodiments, the document processing method may be performed by a neural network, the neural network including a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature. The first classification sub-network may be specifically configured to: compare the general feature of the document to be processed with preset standard features of at least one category of documents to determine at least one similarity between the general feature of the document to be processed and the standard features of the at least one category of documents; and determine the category of the document to be processed according to the at least one similarity.
The general feature of the document to be processed and the standard features may have the same dimension, which facilitates the comparison between them. The similarity between a general feature and a standard feature may be obtained by computing the Euclidean distance between them, or by a neural network, obtained through training, that outputs their similarity.
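For instance, a Euclidean-distance-based similarity could be computed as follows; mapping a distance d to 1/(1+d) is an illustrative choice, the disclosure requiring only that smaller distances correspond to higher similarity:

```python
import torch

def similarity(general: torch.Tensor, standard: torch.Tensor) -> torch.Tensor:
    """Similarity from the Euclidean distance between two same-dimension features."""
    distance = torch.norm(general - standard, p=2, dim=-1)
    return 1.0 / (1.0 + distance)   # smaller distance -> higher similarity
```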
In the embodiments of the present disclosure, standard features of each category of documents are preset in the neural network. The category of the document to be processed is determined from the similarities between its general feature and the different standard features. The similarities characterize the relationship between the document to be processed and each category of standard documents, namely whether and to what degree they are similar, which improves the accuracy of the classification results; the computation is also simple, so classification efficiency is further improved.
In some embodiments, the category of the document to be processed is determined from the at least one similarity as follows:
First, the highest similarity among the at least one similarity is obtained.
Next, in response to the highest similarity being greater than or equal to a preset similarity threshold, the category of the document to which the standard feature corresponding to the highest similarity belongs is determined to be the category of the document to be processed.
The highest similarity is determined by comparing the individual similarities. If at least two similarities tie for the highest value, the similarity computation step may be repeated at a higher precision and the results compared again to obtain a single highest similarity. If one or more recomputations still leave at least two tied highest similarities, the computation is repeated until only one highest similarity remains.
It should be noted that, in implementation, the similarities may also first be compared with the preset similarity threshold to filter out one or more similarities greater than or equal to the threshold, and the highest similarity may then be taken from the filtered similarities. Thus, the ways of determining a unique highest similarity include, but are not limited to, the two cases above; other implementations achieving the same or a similar effect may also be used and are not enumerated here.
In this embodiment, only a similarity above the threshold is considered valid. That is, the document to be processed is considered similar to a standard document only when the similarity between its general feature and the standard feature is greater than or equal to the similarity threshold, and the further the similarity exceeds the threshold, the more similar the document to be processed and the standard document are considered to be; if the similarity between the general feature and a standard feature is below the threshold, the document to be processed is considered dissimilar to that standard document.
In the embodiments of the present disclosure, a similarity threshold is preset in the neural network. The highest similarity is compared with the similarity threshold, and the document to be processed is assigned to the category of the corresponding standard document only when the highest similarity is greater than or equal to the threshold. This avoids classification errors when the similarities between the general feature of the document to be processed and all standard features are low, that is, when the document belongs to none of the categories corresponding to the standard documents. This further improves classification accuracy and avoids misclassifying documents outside the preset categories.
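A minimal sketch of this thresholded decision, reusing the distance-based similarity above and an assumed threshold value, might be:

```python
import torch

def classify(general: torch.Tensor, standard_features: torch.Tensor,
             threshold: float = 0.8):
    """Pick the most similar standard feature; reject if below the threshold.

    standard_features: (num_categories, dim); the threshold value is an
    illustrative assumption."""
    sims = 1.0 / (1.0 + torch.norm(standard_features - general, p=2, dim=-1))
    best_sim, best_idx = sims.max(dim=0)
    if best_sim >= threshold:
        return int(best_idx)   # index of the matched category
    return None                # no preset category matched
```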
In some embodiments, the feature extraction sub-network of the neural network is trained as follows:
First, a sample document is input into the feature extraction sub-network to obtain the general feature of the sample document, where the sample document is annotated with a category;
Next, the general feature is input into a second classification sub-network to obtain a predicted category of the sample document;
Finally, the network parameters of the feature extraction sub-network are adjusted according to the difference between the predicted category of the sample document and the annotated category of the sample document.
The network structure of the feature extraction sub-network enables it to extract the general feature of any document input into it; training the feature extraction sub-network aims to improve the accuracy of the features it extracts.
The second classification sub-network is a classifier, which may, for example, consist of at least one fully connected layer and a normalization layer. The number of categories it classifies is fixed and corresponds to the number of sample document categories, for example 5, 8, or 10; in other words, the second classification sub-network outputs a probability for each preset category, and the category with the highest probability is the classification result. For example, suppose there are 10 categories of sample documents, A, B, C, D, E, F, G, H, I, and J, and the output dimension of the second classification sub-network is 10, corresponding to these 10 categories. When the general feature of a sample document extracted by the feature extraction sub-network is input into the second classification sub-network, the sub-network outputs 10 probabilities, namely 83%, 2%, 1%, 3%, 0.5%, 0.2%, 0.3%, 5%, 4%, and 1%, which are the probabilities that the sample document belongs to categories A through J respectively; the second classification sub-network therefore outputs A as the predicted category of the sample document.
The adjustment of the network parameters of the feature extraction sub-network may be stopped when the network loss value is less than a preset loss threshold, and/or when the number of adjustments exceeds a preset count threshold.
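The training procedure above could be sketched as follows, assuming a single fully connected layer (with the softmax folded into the cross-entropy loss) as the second classification sub-network and illustrative hyperparameters throughout:

```python
import torch
import torch.nn as nn

def train_feature_extractor(extractor: nn.Module, num_categories: int,
                            loader, feature_dim: int = 256,
                            loss_threshold: float = 0.05, max_steps: int = 10000):
    """Train the feature extraction sub-network through a classifier head.

    The extractor is assumed to map a document image to a (batch, feature_dim)
    general feature; all hyperparameters are illustrative assumptions."""
    classifier = nn.Linear(feature_dim, num_categories)  # second sub-network stand-in
    params = list(extractor.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    step = 0
    for images, labels in loader:               # sample documents + annotated categories
        logits = classifier(extractor(images))  # general feature -> predicted category
        loss = criterion(logits, labels)        # difference from annotated category
        optimizer.zero_grad()
        loss.backward()                         # adjust the network parameters
        optimizer.step()
        step += 1
        # stop on a small loss and/or too many adjustments, as described above
        if loss.item() < loss_threshold or step >= max_steps:
            break
```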
A sample document set may be prepared in advance. First, multiple sample documents are obtained; next, the category of each sample document is annotated; finally, the sample document set is determined from the annotated sample documents. In addition, one document of each category may be selected as the standard template of that category, for later use in storing standard features.
In the embodiments of the present disclosure, the extraction capability of the feature extraction sub-network determines the accuracy of the extracted general features, and the accuracy of the general features in turn determines the accuracy of the classification results; the accuracy of the predicted categories output by the second classification sub-network therefore characterizes the strength of the feature extraction sub-network's extraction capability. The second classification sub-network is used to characterize this extraction capability and to feed back adjustments to the network parameters of the feature extraction sub-network; continuously optimizing the parameters improves the extraction capability of the feature extraction sub-network, which in turn improves the accuracy of the extracted general features and the accuracy of document classification.
In some embodiments, the standard features of the at least one category of documents are obtained by processing standard templates of the at least one category of documents with the trained feature extraction sub-network.
After training, the feature extraction sub-network can accurately extract the general feature of any document input into it. A standard template may first be determined for each category of documents: its layout is clear, the boundaries of its text boxes and/or text blocks are distinct, and its text content is complete. After the general feature of each category's standard template is extracted, it is stored as the standard feature of that category. The standard template may also be annotated, that is, the attributes of each position, text box, and/or text block of the standard template may be labeled, so that the standard template can be used for document layout recognition.
In the embodiments of the present disclosure, the general features of both the standard templates and the document to be processed are extracted by the feature extraction sub-network, so the general features and the standard features come from the same source and follow consistent rules; the similarity determined from them is therefore highly accurate, further improving the accuracy of document classification.
The standard features stored in the above way are limited and cannot cover all document categories. Moreover, as described in some of the foregoing embodiments, the document to be processed can be assigned to the document category corresponding to the highest similarity only when that highest similarity is greater than or equal to the similarity threshold. For these two reasons, when a document's category is not covered by the preset standard templates, classification cannot be completed.
Therefore, in some embodiments, standard features are added as follows:
In response to the highest similarity being less than the preset similarity threshold, the document to be processed is added as a standard template, and its general feature is determined to be the standard feature of the category corresponding to the newly added standard template.
A highest similarity below the similarity threshold indicates that the document to be processed belongs to none of the preset document categories, that is, the document represents a new document category. When classification fails, the document that could not be classified is stored in the neural network as a new category: the document to be processed is stored as a standard template, and its extracted general feature is stored as the standard feature of that new category of documents. After the category is stored, a reminder may also be generated to prompt the user to annotate the standard template of that category so that it can be used for layout recognition.
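A sketch of this classify-or-enroll behavior, under the same illustrative similarity and threshold assumptions as above, might be:

```python
import torch

def classify_or_extend(general: torch.Tensor, standard_features: torch.Tensor,
                       threshold: float = 0.8):
    """Classify against stored standard features; on failure, enroll the
    document's general feature as the standard feature of a new category."""
    sims = 1.0 / (1.0 + torch.norm(standard_features - general, p=2, dim=-1))
    best_sim, best_idx = sims.max(dim=0)
    if best_sim >= threshold:
        return int(best_idx), standard_features           # existing category
    # new category: append the general feature as its standard feature
    extended = torch.cat([standard_features, general.unsqueeze(0)], dim=0)
    return extended.shape[0] - 1, extended
```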
In the embodiments of the present disclosure, because the feature extraction sub-network can accurately extract the general feature of the document to be processed, the first classification sub-network can automatically expand its classification dimensions or the number of its categories.
In the embodiments of the present disclosure, by storing a document that failed classification and setting it as a new category, the number of preset document categories can be expanded automatically, continuously improving classification capability.
In some embodiments, the method further includes: in response to a selection instruction, selecting at least one category from the preset document categories as a target category; the selection instruction may be triggered by a user's selection operation, or trigger conditions may be preset so that the instruction is triggered automatically when the conditions are met.
The similarity between the general feature of the document to be processed and the standard features of the at least one category of documents is then determined as follows: the general feature of the document to be processed is compared with the preset standard features of the documents of the at least one target category to determine the similarity between the general feature of the document to be processed and the standard features of the documents of the at least one target category.
In one example, referring to FIG. 5, which shows part of a user selection interface, the preset document categories include general text, identity card, bank card, vehicle license, driver's license, passport, general form, VAT invoice, business license, and handwritten text; the user has selected identity card, bank card, general form, VAT invoice, and handwritten text as the target categories. In the subsequent processing of the document to be recognized, the multiple categories selected by the user are used as references.
It should be noted that FIG. 5 shows only one possible implementation. In practice, a user may also create templates independently to establish new target categories and use the new target categories as references when processing documents to be recognized. In addition, the target categories may include at least some of the categories shown in FIG. 5, that is, more or fewer than shown, which is not limited here.
The present disclosure also provides a document processing apparatus. Referring to FIG. 6, which shows the structure of the apparatus, the apparatus includes: an acquisition module 601 configured to acquire semantic features and visual features of a document to be processed; a general module 602 configured to determine a general feature of the document to be processed according to the semantic features and the visual features; and a classification module 603 configured to determine the category of the document to be processed according to the general feature of the document to be processed.
In some embodiments, the acquisition module is specifically configured to: obtain a text recognition result of the document to be processed; and obtain the semantic features of the document to be processed based on the text recognition result.
In some embodiments, obtaining the text recognition result of the document to be processed includes: determining the target text boxes in the document to be processed and the text content contained in the target text boxes; obtaining word segmentation results of the text content in each target text box; and obtaining feature vectors corresponding to the word segmentation results.
In some embodiments, the general module is specifically configured to: regularize the visual features and the semantic features separately; and compute a weighted sum of the regularized visual features and the regularized semantic features to obtain the general feature of the document to be processed.
In some embodiments, the document processing apparatus includes a neural network, the neural network including a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature, where the first classification sub-network is specifically configured to: compare the general feature of the document to be processed with preset standard features of at least one category of documents to determine the similarity between the general feature of the document to be processed and the standard features of the at least one category of documents; and determine the category of the document to be processed according to the at least one obtained similarity.
In some embodiments, when determining the category of the document to be processed according to the at least one obtained similarity, the first classification sub-network is specifically configured to: obtain the highest similarity among the at least one similarity; and, in response to the highest similarity being greater than or equal to a preset similarity threshold, determine that the category of the document to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed.
In some embodiments, the apparatus further includes a training module for training the feature extraction sub-network of the neural network, configured to: input a sample document into the feature extraction sub-network to obtain a general feature of the sample document, where the sample document is annotated with a category; input the general feature into a second classification sub-network to obtain a predicted category of the sample document; and adjust the network parameters of the feature extraction sub-network according to the difference between the predicted category of the sample document and the annotated category of the sample document.
In some embodiments, the standard features of the at least one category of documents are obtained by performing feature extraction on the at least one category of documents with the trained feature extraction sub-network.
In some embodiments, the apparatus further includes an expansion module configured to: in response to the highest similarity being less than the preset similarity threshold, add the document to be processed as a standard template, and determine the general feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
In some embodiments, the apparatus further includes a target module configured to: in response to a selection instruction, select at least one category from the preset document categories as a target category; when comparing the general feature of the document to be processed with preset standard features of at least one category of documents to determine the similarity between them, the first classification sub-network is specifically configured to: compare the general feature of the document to be processed with preset standard features of the documents of the at least one target category to determine the similarity between the general feature of the document to be processed and the standard features of the documents of the at least one target category.
In some embodiments, the apparatus further includes a recognition module configured to: obtain the preset standard template corresponding to the category of the document to be processed; and perform layout recognition on the document to be processed based on the standard template to obtain a layout recognition result of the document.
The present disclosure also provides a document processing device. Referring to FIG. 7, which shows the structure of the device, the device includes a non-volatile storage medium 701 and a processor 702; the storage medium 701 is configured to store computer instructions executable on the processor 702, and the processor 702 is configured to implement the method of any embodiment of the present disclosure when executing the computer instructions.
The present disclosure also provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method of any embodiment of the present disclosure.
In the embodiments of the present disclosure, when documents of multiple known categories are classified with the classification method of this embodiment, at least one of those categories may be selected as a target category to serve as a reference, which reduces the computational load of the similarity determination step and of the similarity comparison step and improves classification efficiency.
In some embodiments, the method further includes: obtaining the preset standard template corresponding to the category of the document to be processed; and performing layout recognition on the document to be processed based on the standard template to obtain a layout recognition result of the document.
By automatically and accurately retrieving the corresponding standard template for layout recognition based on the classification result, both the accuracy and the efficiency of layout recognition are improved.
Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief, and the relevant parts of the description of the method embodiments may be consulted.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the figures do not necessarily require the specific or sequential order shown to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and functional operations described in this specification may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification may be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as special-purpose logic circuitry.
Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit receives instructions and data from a read-only memory and/or a random-access memory. The essential components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes, or is operatively coupled to receive data from or transfer data to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or both. However, a computer need not have such devices. Furthermore, a computer may be embedded in another device, for example a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.
Although this specification contains many specific implementation details, these should not be construed as limiting the scope of any invention or of what may be claimed, but rather as descriptions of features of specific embodiments of particular inventions. Certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, although operations are depicted in the figures in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims may be performed in a different order and still achieve desirable results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.

Claims (21)

  1. A document processing method, characterized in that the method comprises:
    acquiring semantic features and visual features of a document to be processed;
    determining a general feature of the document to be processed according to the semantic features and the visual features;
    determining a category of the document to be processed according to the general feature of the document to be processed.
  2. The document processing method according to claim 1, characterized in that acquiring the semantic features of the document to be processed comprises:
    obtaining a text recognition result of the document to be processed;
    obtaining the semantic features of the document to be processed based on the text recognition result.
  3. The document processing method according to claim 2, characterized in that obtaining the text recognition result of the document to be processed comprises:
    determining target text boxes in the document to be processed and the text content contained in the target text boxes;
    obtaining word segmentation results of the text content in each of the target text boxes;
    obtaining feature vectors corresponding to the word segmentation results.
  4. The document processing method according to claim 1, characterized in that determining the general feature of the document to be processed according to the visual features and the semantic features comprises:
    regularizing the visual features and the semantic features separately;
    computing a weighted sum of the regularized visual features and the regularized semantic features to obtain the general feature of the document to be processed.
  5. The document processing method according to any one of claims 1 to 4, characterized in that the document processing method is performed by a neural network, the neural network comprising a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature, wherein the first classification sub-network is specifically configured to:
    compare the general feature of the document to be processed with preset standard features of at least one category of documents to determine a similarity between the general feature of the document to be processed and the standard features of the at least one category of documents;
    determine the category of the document to be processed according to the at least one obtained similarity.
  6. The document processing method according to claim 5, characterized in that determining the category of the document to be processed according to the at least one obtained similarity comprises:
    obtaining the highest similarity among the at least one similarity;
    in response to the highest similarity being greater than or equal to a preset similarity threshold, determining that the category of the document to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed.
  7. The document processing method according to claim 5 or 6, characterized in that the method further comprises training the feature extraction sub-network of the neural network, specifically comprising:
    inputting a sample document into the feature extraction sub-network to obtain a general feature of the sample document, wherein the sample document is annotated with a category;
    inputting the general feature into a second classification sub-network to obtain a predicted category of the sample document;
    adjusting network parameters of the feature extraction sub-network according to a difference between the predicted category of the sample document and the annotated category of the sample document.
  8. The document processing method according to claim 7, characterized in that the standard features of the at least one category of documents are obtained by performing feature extraction on the at least one category of documents with the trained feature extraction sub-network.
  9. The document processing method according to any one of claims 6 to 8, characterized in that the method further comprises:
    in response to the highest similarity being less than the preset similarity threshold, adding the document to be processed as a standard template, and determining the general feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
  10. The document processing method according to any one of claims 5 to 9, characterized in that the method further comprises:
    in response to a selection instruction, selecting at least one category from preset document categories as a target category;
    wherein comparing the general feature of the document to be processed with preset standard features of at least one category of documents to determine the similarity between the general feature of the document to be processed and the standard features of the at least one category of documents comprises:
    comparing the general feature of the document to be processed with preset standard features of documents of the at least one target category to determine the similarity between the general feature of the document to be processed and the standard features of the documents of the at least one target category.
  11. The document processing method according to any one of claims 1 to 10, characterized in that the method further comprises:
    obtaining a preset standard template corresponding to the category of the document to be processed;
    performing layout recognition on the document to be processed based on the standard template to obtain a layout recognition result of the document.
  12. A document processing apparatus, characterized in that the apparatus comprises:
    an acquisition module configured to acquire semantic features and visual features of a document to be processed;
    a general module configured to determine a general feature of the document to be processed according to the semantic features and the visual features;
    a classification module configured to determine a category of the document to be processed according to the general feature of the document to be processed.
  13. The document processing apparatus according to claim 12, characterized in that the acquisition module is specifically configured to:
    obtain a text recognition result of the document to be processed;
    obtain the semantic features of the document to be processed based on the text recognition result.
  14. The document processing apparatus according to claim 13, characterized in that obtaining the text recognition result of the document to be processed comprises:
    determining target text boxes in the document to be processed and the text content contained in the target text boxes;
    obtaining word segmentation results of the text content in each of the target text boxes;
    obtaining feature vectors corresponding to the word segmentation results.
  15. The document processing apparatus according to claim 12, characterized in that the general module is specifically configured to:
    regularize the visual features and the semantic features separately;
    compute a weighted sum of the regularized visual features and the regularized semantic features to obtain the general feature of the document to be processed.
  16. The document processing apparatus according to any one of claims 12 to 15, characterized in that the document processing apparatus comprises a neural network, the neural network comprising a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature, wherein the first classification sub-network is specifically configured to:
    compare the general feature of the document to be processed with preset standard features of at least one category of documents to determine a similarity between the general feature of the document to be processed and the standard features of the at least one category of documents;
    determine the category of the document to be processed according to the at least one obtained similarity.
  17. The document processing apparatus according to claim 16, characterized in that, when determining the category of the document to be processed according to the at least one obtained similarity, the first classification sub-network is specifically configured to:
    obtain the highest similarity among the at least one similarity;
    in response to the highest similarity being greater than or equal to a preset similarity threshold, determine that the category of the document to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed, or
    in response to the highest similarity being less than the preset similarity threshold, add the document to be processed as a standard template, and determine the general feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
  18. The document processing apparatus according to claim 16 or 17, characterized by further comprising:
    a target module configured to, in response to a selection instruction, select at least one category from preset document categories as a target category;
    wherein, when comparing the general feature of the document to be processed with preset standard features of at least one category of documents to determine the similarity between the general feature of the document to be processed and the standard features of the at least one category of documents, the first classification sub-network is specifically configured to:
    compare the general feature of the document to be processed with preset standard features of documents of the at least one target category to determine the similarity between the general feature of the document to be processed and the standard features of the documents of the at least one target category.
  19. A document processing device, characterized in that the device comprises a non-transitory storage medium and a processor, the storage medium being configured to store computer instructions executable on the processor, and the processor being configured to implement the method according to any one of claims 1 to 11 when executing the computer instructions.
  20. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 11.
  21. A computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 11.