WO2022001637A1 - Document processing method, apparatus, device, and computer-readable storage medium - Google Patents
Document processing method, apparatus, device, and computer-readable storage medium
- Publication number
- WO2022001637A1 (PCT/CN2021/099799; CN2021099799W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- processed
- feature
- category
- general
- Prior art date
Classifications
- G06F16/353—Information retrieval of unstructured textual data; Clustering; Classification into predefined classes
- G06F18/213—Pattern recognition; Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/253—Pattern recognition; Fusion techniques of extracted features
- G06F40/289—Handling natural language data; Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30—Handling natural language data; Semantic analysis
- G06N3/045—Neural networks; Combinations of networks
- G06N3/08—Neural networks; Learning methods
- G06V30/413—Analysis of document content; Classification of content, e.g. text, photographs or tables
Definitions
- the present disclosure relates to computer vision technology, and in particular, to a document processing method, apparatus, device, and computer-readable storage medium.
- OCR: Optical Character Recognition
- Embodiments of the present disclosure provide a document classification scheme.
- a document processing method comprising: acquiring semantic features and visual features of a document to be processed; determining the general features of the document to be processed according to the semantic features and the visual features; and determining the category of the document to be processed according to the general features of the document to be processed.
- the obtaining the semantic feature of the document to be processed includes: obtaining a text recognition result of the document to be processed; and obtaining the semantic feature of the document to be processed based on the text recognition result.
- the obtaining the text recognition result of the document to be processed includes: determining target text boxes in the document to be processed and the text content contained in the target text boxes; obtaining a word segmentation processing result of the text content in each target text box; and obtaining feature vectors corresponding to the word segmentation processing results.
- the determining the general features of the document to be processed according to the visual features and the semantic features includes: performing regularization processing on the visual features and the semantic features respectively; and performing a weighted summation on the regularized visual features and the regularized semantic features to obtain the general features of the document to be processed.
- the document processing method is performed using a neural network, the neural network comprising a feature extraction sub-network for extracting the general features of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general features
- wherein the first classification sub-network is specifically used to: compare the general features of the document to be processed with preset standard features of at least one category of documents, and determine the similarity between the general features of the document to be processed and the standard features of the documents of the at least one category; the category of the document to be processed is determined according to the obtained at least one similarity.
- the determining the category of the document to be processed according to the obtained at least one similarity includes: obtaining the highest similarity among the at least one similarity; and if the highest similarity is greater than or equal to a preset similarity threshold, determining that the category of the documents to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed.
- the method further includes training the feature extraction sub-network in the neural network, specifically including: inputting a sample document into the feature extraction sub-network to obtain the general features of the sample document, wherein the sample document is annotated with a category; inputting the general features into a second classification sub-network to obtain a predicted category of the sample document; and adjusting the network parameters of the feature extraction sub-network according to the difference between the predicted category of the sample document and the annotated category of the sample document.
- the standard features of the at least one type of document are obtained by performing feature extraction on the at least one type of document by using a trained feature extraction sub-network.
- the method further includes: in response to the highest similarity being less than the preset similarity threshold, adding the document to be processed as a standard template, and determining the general features of the document to be processed as the standard features of the category corresponding to the newly added standard template.
- the method further includes: in response to a selection instruction, selecting at least one category from preset document categories as a target category; the comparing of the general features of the document to be processed with preset standard features of at least one category of documents and determining of the similarity between them then includes: comparing the general features of the document to be processed with preset standard features of documents of the at least one target category, and determining the similarity between the general features of the document to be processed and the standard features of the documents of the at least one target category.
- the method further includes: acquiring a corresponding preset standard template according to the category of the document to be processed; and performing layout recognition processing on the document to be processed based on the standard template to obtain a layout recognition result of the document.
- a document processing apparatus comprising: an acquisition module, configured to acquire semantic features and visual features of a document to be processed; a general module, configured to determine the general features of the document to be processed according to the semantic features and the visual features; and a classification module, configured to determine the category of the document to be processed according to the general features of the document to be processed.
- a document processing apparatus comprising a non-volatile storage medium and a processor, the storage medium being configured to store computer instructions executable on the processor, and the processor being configured to perform the method described in any embodiment of the present disclosure when executing the computer instructions.
- a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment of the present disclosure is implemented.
- the document processing method, apparatus, device, computer-readable medium, and computer program of one or more embodiments of the present disclosure determine the general features of the document according to the obtained visual features and semantic features of the document, and determine the category of the document according to the general features.
- the document processing method of the present disclosure can realize accurate classification of any document; by combining semantic features and visual features to obtain the general features of documents, the accuracy of the classification results for documents of different categories with similar visual features is improved, and the robustness of document classification is also improved.
- FIG. 1 is a flowchart of a document processing method according to an embodiment of the present disclosure
- FIG. 2 schematically shows a partial network structure of a neural network for extracting visual features according to an embodiment of the present disclosure
- FIG. 3 schematically shows a partial network structure of a neural network for extracting semantic features according to an embodiment of the present disclosure
- FIG. 4 is a schematic diagram of a text recognition process of a form shown in an embodiment of the present disclosure
- FIG. 5 is a schematic diagram of a user selection interface shown in an embodiment of the present disclosure.
- FIG. 6 is a schematic diagram of a document processing apparatus according to an embodiment of the present disclosure.
- FIG. 7 is a schematic structural diagram of a document processing device according to an embodiment of the present disclosure.
- Although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
- For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
- The word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
- FIG. 1 shows a flow of the document processing method, including steps S101 to S103 .
- the document may include one or more of books, documents, forms, bills, certificates and radio frequency cards, etc.
- the document processing method can automatically identify the categories of the above documents, for example, a bank card can be automatically identified as a bank card category, an ID card can be automatically identified as an ID card category, or an invoice can be automatically identified as an invoice category.
- The processing of each document to be processed in batch processing is similar to the processing of a single document to be processed, and reference can be made to the processing of a single document to be processed.
- In the following, a single document to be processed is used for illustration; this is not a limitation of the technical solution of the present application.
- step S101 the semantic features and visual features of the document to be processed are acquired.
- This step does not impose specific restrictions on the order of acquiring the semantic features and the visual features; that is, the semantic features may be acquired first and then the visual features, the visual features may be acquired first and then the semantic features, or the semantic features and the visual features may be acquired at the same time.
- a neural network can be used to extract the visual features of the document to be processed.
- First, initial features of the document to be processed are extracted with a convolution kernel (for example, a 3*3 convolution kernel).
- The initial features are then passed sequentially through multiple (for example, 7) inverse residual blocks to extract intermediate features, and the intermediate features output by the last inverse residual block are convolved with a convolution kernel (for example, a 1*1 convolution kernel) to output features of a specified dimension as the visual features of the document to be processed.
- Each inverse residual block includes an up channel module composed of a 1*1 convolution kernel and an activation function (such as ReLU6), used to expand the number of channels of the input features; an extraction module composed of a depth-separable convolution layer and an activation function, used to extract the features of each channel and connect the features of the channels; and a down channel module composed of a 1*1 convolution kernel, used to restore the number of channels of the input features.
- Each inverse residual block sums its input and the output of its down channel module as the output of the inverse residual block. The output of each inverse residual block except the last inverse residual block is used as the input of the next inverse residual block.
- FIG. 2 schematically shows a part of a network structure for extracting visual features of documents to be processed.
- the partial network structure shown in FIG. 2 includes two inverse residual blocks, namely a first inverse residual block 201 and a second inverse residual block 202 .
- the first inverse residual block 201 includes a first up channel module 2011, a first extraction module 2012, and a first down channel module 2013, which are connected in sequence.
- the first up channel module 2011 may, for example, be composed of a 1*1 convolution kernel (Conv1*1) and an activation function (e.g. ReLU6)
- the first extraction module 2012 may, for example, be composed of a depth-separable 3*3 convolution layer (Dwise3*3) and an activation function (e.g. ReLU6)
- the first down channel module 2013 may, for example, be composed of a 1*1 convolution kernel (Conv1*1).
- the first input of the first inverse residual block 201 is the initial feature of the document to be processed, which can be extracted by, for example, a 3*3 convolution kernel.
- the first output of the first inverse residual block 201 is the sum of the first input and the output of the first down channel module, and the first output is the second input of the second inverse residual block 202.
- the second inverse residual block 202 includes a second up channel module 2021, a second extraction module 2022, and a second down channel module 2023, which are connected in sequence.
- the second up channel module 2021 may, for example, be composed of a 1*1 convolution kernel (Conv1*1) and an activation function (such as ReLU6)
- the second extraction module 2022 may, for example, be composed of a depth-separable 3*3 convolution layer (Dwise3*3) and an activation function (such as ReLU6)
- the second down channel module 2023 may, for example, be composed of a 1*1 convolution kernel (Conv1*1).
- the second output of the second inverse residual block 202 is the sum of the second input and the output of the second down channel module.
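- The block structure described above corresponds to what is often called an inverted residual block (as used in MobileNetV2-style networks). The following is a minimal sketch of such a block and of the visual-feature extractor built from it; the use of PyTorch, the channel counts, the expansion ratio, the pooling at the end, and all names are illustrative assumptions rather than details specified by the patent.

```python
import torch
import torch.nn as nn

class InverseResidualBlock(nn.Module):
    """Sketch of the inverse residual block described above: up channel 1x1 conv + ReLU6,
    depth-separable 3x3 conv + ReLU6, down channel 1x1 conv, with the block input added
    to the down channel output."""

    def __init__(self, channels: int, expansion: int = 6):
        super().__init__()
        hidden = channels * expansion
        # Up channel module: 1x1 convolution expands the number of channels.
        self.up = nn.Sequential(nn.Conv2d(channels, hidden, 1, bias=False), nn.ReLU6())
        # Extraction module: depthwise (depth-separable) 3x3 convolution per channel.
        self.extract = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False), nn.ReLU6()
        )
        # Down channel module: 1x1 convolution restores the original number of channels.
        self.down = nn.Conv2d(hidden, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The block output is the sum of the block input and the down channel output.
        return x + self.down(self.extract(self.up(x)))


class VisualFeatureExtractor(nn.Module):
    """3x3 stem -> several inverse residual blocks -> 1x1 conv to the target dimension."""

    def __init__(self, out_dim: int = 256, num_blocks: int = 7):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, stride=2, padding=1)                    # initial features
        self.blocks = nn.Sequential(*[InverseResidualBlock(32) for _ in range(num_blocks)])
        self.head = nn.Conv2d(32, out_dim, 1)                                    # specified dimension
        
    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feat = self.head(self.blocks(self.stem(image)))
        return feat.mean(dim=(2, 3))                                             # pool to a feature vector
```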
- the semantic features of the document to be processed may be obtained in the following manner: first, a text recognition result of the to-be-processed document is obtained; then, based on the text recognition result, the semantic feature of the to-be-processed document is obtained.
- the text recognition result may be a result of extracting the text content in the document to be processed and expressing it in a specific manner.
- the OCR technology can be used to obtain the text recognition result of the document to be processed.
- a neural network can be used to extract the semantic features of the text recognition results. Specifically, the features of different levels of the text recognition result can be extracted first, and then the above-mentioned features of different levels are connected and extracted, and finally the semantic features of the text recognition result are obtained.
- In the network structure shown in FIG. 3, each third extraction module 301 is used to obtain intermediate features of the text recognition result, wherein the third extraction modules 301 may be convolution kernels with different receptive fields.
- For example, a convolution kernel with a receptive field of 1, a convolution kernel with a receptive field of 3, and a convolution kernel with a receptive field of 5 can be used to extract features at three different levels of the text recognition result (for example, by convolution and/or pooling operations), and the features of the three different levels are then connected to obtain the intermediate features.
- The fourth extraction module 302 (for example, a 1*1 convolution kernel) is used to further extract the intermediate features (for example, by convolution and/or pooling operations) to obtain the semantic features of the text recognition result.
- The feature extraction process corresponding to FIG. 3 above is only an example of extracting semantic features, not a specific limitation on the way of extracting the semantic features of the text recognition result; more or fewer convolution kernels and other receptive-field combinations can be used to extract features at different levels.
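- A minimal sketch of the multi-receptive-field extraction described for FIG. 3 is given below; the use of PyTorch 1-D convolutions over the token feature vectors, the embedding and channel dimensions, and the max-pooling over tokens are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SemanticFeatureExtractor(nn.Module):
    """Three parallel convolutions with receptive fields 1, 3 and 5 ("third extraction
    modules") are concatenated, then a 1x1 convolution ("fourth extraction module")
    produces the semantic feature."""

    def __init__(self, embed_dim: int = 128, hidden: int = 64, out_dim: int = 256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(embed_dim, hidden, kernel_size=k, padding=k // 2) for k in (1, 3, 5)
        ])
        self.fuse = nn.Conv1d(3 * hidden, out_dim, kernel_size=1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, embed_dim, num_tokens) -- feature vectors of the word segments.
        multi_level = torch.cat([branch(tokens) for branch in self.branches], dim=1)
        fused = self.fuse(multi_level)
        return fused.max(dim=2).values        # pool over tokens -> (batch, out_dim)
```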
- the semantic features of the documents to be processed can be used to distinguish various documents with similar visual features but different text content.
- The above-mentioned documents are precisely one of the situations that the related art cannot classify accurately; this embodiment solves this problem of the related art by adding semantic features.
- step S102 a general feature of the document to be processed is determined according to the semantic feature and the visual feature.
- step S101 when extracting visual features and extracting semantic features in step S101, visual features and semantic features with the same dimensions can be output, so as to facilitate the fusion of the two features.
- this embodiment does not intend to limit the dimensional relationship between the visual feature and the semantic feature extracted in step S101.
- step S101 when extracting visual features and extracting semantic features in step S101, visual features and semantic features of different dimensions may also be output.
- the dimensions of the two features can be compared, and then the dimension of the higher-dimensional feature of the two features can be reduced to make the dimensions of the two features the same, and then the two features can be fused.
- Either a linear or a non-linear dimension reduction method can be adopted.
- the general features of the documents to be processed can be obtained.
- the general features of the documents to be processed may be used for document classification in step S103, and may also be used for document comparison to match document pictures.
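- Following the regularization and weighted summation described earlier, step S102 can be sketched as follows; L2 normalization as the regularization, the weighting coefficient alpha, and the assumption that both features already share the same dimension are illustrative choices.

```python
import torch
import torch.nn.functional as F

def general_feature(visual: torch.Tensor, semantic: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Fuse visual and semantic features of a document into its general feature.

    Both features are assumed to already have the same dimension (otherwise the
    higher-dimensional one would first be reduced, e.g. with a linear projection).
    Each feature is regularized (L2-normalized here) and the two are combined by a
    weighted sum; alpha is an illustrative weighting coefficient.
    """
    visual = F.normalize(visual, dim=-1)      # regularization of the visual feature
    semantic = F.normalize(semantic, dim=-1)  # regularization of the semantic feature
    return alpha * visual + (1 - alpha) * semantic
```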
- step S103 the category of the to-be-processed document is determined according to the general characteristics of the to-be-processed document.
- a general feature of the document is determined according to the obtained visual features and semantic features of the document, and a category of the document is determined according to the general feature.
- the document processing method of the present disclosure can realize accurate classification of any document; by combining semantic features and visual features to obtain the general features of documents, the accuracy of the classification results for documents of different categories with similar visual features is improved, and the robustness of document classification is also improved.
- the text recognition result of the document to be processed may be obtained in the following manner:
- the target text box in the document to be processed and the text content contained in the target text box are determined.
- FIG. 4 shows a text recognition process for a document (ie, a form) to be processed.
- the target text boxes in the document to be processed are determined, that is, the 15 text boxes 401 to 415, and the text content contained in each target text box.
- the text box 401 contains the text "Office Supplies Requisition Form"
- the text box 402 contains the date of filling in the form
- the text box 415 contains the general manager's comments.
- By performing word segmentation processing on the text content in each text box, multiple word segmentation processing results are obtained.
- For example, 416 to 426 are 11 word segmentation processing results, which are part of the results obtained by performing word segmentation processing on the text content in the above 15 text boxes.
- The word segmentation processing results may include characters or words.
- The word segmentation processing results 416 (office), 417 (supplies), 418 (requisition) and 419 (form) are four word segmentation processing results obtained after the text content in the text box 401 is subjected to word segmentation processing.
- the word segmentation processing results 425 (general manager) and 426 (opinion) are two word segmentation processing results obtained after the text content in the text box 415 is subjected to word segmentation processing.
- 427 to 438 are 12 feature vectors, and each feature vector is a result obtained after a word segmentation processing result is represented by the feature vector.
- The text recognition result is obtained by determining the target text boxes in the document and the text content in the target text boxes, and then performing word segmentation processing and feature vector representation on the text content. Not only is the text content in the document (for example, part or all of it) extracted, but the smallest character/word units in the text are also obtained through the division into text boxes and the word segmentation, so the semantic features are accurate and the accuracy of document classification is further improved; in addition, the text recognition result is represented by feature vectors, which facilitates the extraction of the semantic features and further improves the efficiency of document classification.
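- A rough sketch of this text-recognition pipeline is shown below; run_ocr is a hypothetical placeholder rather than a real API, jieba is used as one possible word-segmentation library, and the embedding dictionary stands in for whatever feature-vector representation is actually used.

```python
from typing import List, Tuple
import jieba           # one possible Chinese word-segmentation library
import numpy as np

def run_ocr(image) -> List[Tuple[Tuple[int, int, int, int], str]]:
    """Hypothetical placeholder: return (box, text) pairs for the target text boxes
    found in the document image, e.g. via an OCR engine."""
    raise NotImplementedError

def text_recognition_result(image, embeddings: dict, dim: int = 128) -> np.ndarray:
    """Detect target text boxes, segment their text content into characters/words,
    and represent each word segmentation result as a feature vector."""
    vectors = []
    for _box, text in run_ocr(image):
        for token in jieba.lcut(text):                       # word segmentation results
            vec = embeddings.get(token, np.zeros(dim))       # feature vector per token
            vectors.append(vec)
    return np.stack(vectors) if vectors else np.zeros((0, dim))
```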
- the document processing method may be performed using a neural network.
- The neural network may include a feature extraction sub-network for extracting the general features of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general features.
- The first classification sub-network may be specifically used to: compare the general features of the document to be processed with preset standard features of at least one category of documents, and determine at least one similarity between the general features of the document to be processed and the standard features of the at least one category of documents; the category of the document to be processed is then determined according to the at least one similarity.
- the dimensions of the general feature and the standard feature of the document to be processed may be the same, thereby facilitating the comparison of the general feature and the standard feature.
- the similarity between the common feature and the standard feature can be obtained by calculating the Euclidean distance of the two, or obtained by a neural network capable of outputting the similarity between the two, and the neural network is obtained through training.
- standard features of various types of documents are preset in the neural network.
- the categories of the documents to be processed are determined by using the common features of the documents to be processed and the similarity of different standard features.
- The similarity is used to characterize the relationship between the document to be processed and the various standard documents, that is, whether they are similar and to what degree, which improves the accuracy of the classification result; moreover, the operation is simple, which further improves the classification efficiency.
- the category of the document to be processed is determined according to at least one similarity in the following manner:
- the highest similarity among the at least one similarity is obtained.
- the category of the document to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed.
- the highest similarity is determined by comparing the respective degrees of similarity.
- If at least two identical highest degrees of similarity are obtained, the process may return to the step of calculating the degrees of similarity, recalculate them with higher precision, and compare the recalculated results again to obtain the highest degree of similarity. If the calculation is repeated one or more times and at least two identical highest degrees of similarity still remain, the recalculation continues until only one highest degree of similarity remains.
- Alternatively, the similarities can first be compared with the preset similarity threshold to filter out one or more similarities whose values are greater than or equal to the similarity threshold, and the highest similarity is then obtained among the filtered similarities.
- It can be seen that the implementation of determining a unique highest similarity may include, but is not limited to, the two cases exemplified above; other implementations that can achieve the same or similar effects may also be used, which are not enumerated here one by one.
- A similarity higher than or equal to the similarity threshold is considered to be a valid similarity; that is to say, if the similarity between the general feature of the document to be processed and a standard feature is higher than or equal to the similarity threshold, the document to be processed is considered to be similar to the corresponding standard document, and the more the similarity exceeds the similarity threshold, the more similar the document to be processed and the standard document are considered to be. If the similarity between the general feature of the document to be processed and a standard feature is lower than the similarity threshold, the document to be processed is considered not to be similar to that standard document.
- The similarity threshold is preset in the neural network. The highest similarity is compared with the similarity threshold, and only when the highest similarity is greater than or equal to the similarity threshold is the document to be processed classified into the category corresponding to the standard document. This avoids a classification error when the general feature of the document to be processed has low similarity with all standard features, that is, when the document to be processed does not belong to a category corresponding to any standard document. The accuracy of classification is thus further improved, and the problem of misclassifying documents outside the preset categories is avoided.
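- The behaviour of the first classification sub-network described above can be sketched as follows; cosine similarity is used here for illustration (the text also mentions Euclidean distance or a trained similarity network), and the threshold value is an assumption.

```python
from typing import Dict, Optional
import numpy as np

def classify(general_feature: np.ndarray,
             standard_features: Dict[str, np.ndarray],
             threshold: float = 0.8) -> Optional[str]:
    """Compare the general feature of a document with the standard feature of each
    preset category and return the best-matching category, or None if even the
    highest similarity is below the threshold (i.e. the document belongs to none of
    the preset categories)."""
    best_category, best_similarity = None, -1.0
    for category, standard in standard_features.items():
        similarity = float(
            general_feature @ standard
            / (np.linalg.norm(general_feature) * np.linalg.norm(standard))
        )
        if similarity > best_similarity:
            best_category, best_similarity = category, similarity
    return best_category if best_similarity >= threshold else None
```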
- the feature extraction sub-network in the neural network is trained in the following manner:
- the general feature is input into the second classification sub-network to obtain the predicted category of the sample document
- the network parameters of the feature extraction sub-network are adjusted.
- the network structure of the feature extraction sub-network enables it to extract the general features of the documents input into it, and the training of the feature extraction sub-network is to improve the accuracy of its feature extraction.
- The second classification sub-network is a classifier; for example, it can be composed of at least one fully connected layer and a normalization layer. The number of categories classified by the second classification sub-network is fixed and corresponds to the number of categories of the sample documents, such as 5, 8 or 10; that is to say, the output of the second classification sub-network is a probability for each preset category, and the category with the highest probability is the classification result.
- For example, if the sample documents are annotated with 10 categories, denoted A to J, the output dimension of the second classification sub-network is 10, corresponding to those 10 categories respectively.
- When the general features of a sample document extracted by the feature extraction sub-network are input into the second classification sub-network, the second classification sub-network outputs 10 probabilities, for example 83%, 2%, 1%, 3%, 0.5%, 0.2%, 0.3%, 5%, 4%, and 1%, which are the probabilities that the sample document belongs to categories A, B, C, D, E, F, G, H, I, and J, respectively. Therefore, the predicted category output by the second classification sub-network for this sample document is A.
- The adjustment of the network parameters of the feature extraction sub-network may be stopped, for example, when the difference between the predicted category and the annotated category satisfies a preset condition, and/or when the number of adjustments exceeds a preset number-of-times threshold.
- a sample document set can be prepared in advance. First, a plurality of sample documents are acquired; next, a category of each of the sample documents is marked respectively; finally, a sample document set is determined according to the plurality of marked sample documents. In addition, one can also be selected from each type of sample document as a standard template of this type of document for use in subsequent storage of standard features.
- The extraction capability of the feature extraction sub-network determines the accuracy of the extracted general features, and the accuracy of the general features determines the accuracy of the classification result. Therefore, the accuracy of the predicted category output by the second classification sub-network can characterize the strength of the extraction capability of the feature extraction sub-network.
- The second classification sub-network is thus used to characterize the extraction capability of the feature extraction sub-network; the network parameters of the feature extraction sub-network are then adjusted through feedback and continuously optimized to improve its extraction capability, thereby improving the accuracy of the extracted general features and the accuracy of document classification.
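- The training of the feature extraction sub-network can be sketched as a standard supervised loop in which the second classification sub-network acts as a classifier head; the optimizer, the cross-entropy loss, and the joint update of both sub-networks are illustrative assumptions.

```python
import torch
import torch.nn as nn

def train_feature_extractor(feature_extractor: nn.Module,
                            second_classifier: nn.Module,
                            loader,
                            epochs: int = 10,
                            lr: float = 1e-3) -> None:
    """Train the feature extraction sub-network on category-annotated sample documents:
    extract general features, predict a category with the second classification
    sub-network, and adjust the parameters according to the difference between the
    predicted and annotated categories."""
    params = list(feature_extractor.parameters()) + list(second_classifier.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    criterion = nn.CrossEntropyLoss()            # measures the prediction/annotation difference
    for _ in range(epochs):
        for documents, labels in loader:         # sample documents and their annotated categories
            general = feature_extractor(documents)
            logits = second_classifier(general)  # scores over the preset categories
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```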
- the standard features of the at least one type of documents are obtained by processing standard templates of the at least one type of documents by using a trained feature extraction sub-network.
- the standard template of each type of document can be determined first.
- the layout of the standard template is clear, the boundaries of the text box and/or text block are clear, and the text content is complete.
- After the general features of the standard template of each type of document are extracted, they are stored as the standard features of this type of document.
- the standard template may also be marked, that is, marking the attributes of each position, text box and/or text block, etc. of the standard template, so that the standard template can be used for document recognition.
- Both the standard features of the standard templates and the general features of the documents to be processed are extracted by the feature extraction sub-network; therefore, the general features and the standard features have the same origin and follow consistent rules and standards, so the similarity determined from the two is more accurate, and the accuracy of document classification is further improved.
- The standard features stored in the above manner are limited and cannot cover all document categories. Furthermore, according to some of the foregoing embodiments, only when the highest similarity is greater than or equal to the similarity threshold can the document to be processed be classified into the document category corresponding to the highest similarity. For these two reasons, when the category of a document is not covered by the preset standard templates, the classification cannot be completed.
- the document to be processed is added as a standard template, and the general feature of the document to be processed is determined as a standard feature of the category corresponding to the newly added standard template.
- the highest similarity is less than the similarity threshold, indicating that the document to be processed does not belong to any preset document category, that is, the document to be processed is a new document category.
- the unclassified documents to be processed are stored as a new category in the neural network, that is, the to-be-processed documents are stored as standard templates, and the extracted general features are stored as standard features of the new category of documents.
- reminder information can also be generated to remind the user to mark the standard template of the category so that it can be used for layout recognition.
- the first classification sub-network can automatically expand the classification dimension or quantity.
- the number of preset document categories can be automatically expanded, and the classification capability can be continuously improved.
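- Extending the classification sketch given earlier, a document whose highest similarity falls below the threshold can be registered as a new standard template; the in-memory dictionary used as the template store and the generated category name are assumptions for illustration.

```python
from typing import Dict
import numpy as np

def classify_or_register(general_feature: np.ndarray,
                         standard_features: Dict[str, np.ndarray],
                         threshold: float = 0.8) -> str:
    """If the document matches no preset category, store its general feature as the
    standard feature of a newly added category and return that new category."""
    category = classify(general_feature, standard_features, threshold)  # classify() from the earlier sketch
    if category is None:
        category = f"new_category_{len(standard_features)}"    # placeholder category name
        standard_features[category] = general_feature           # new standard feature
        # In practice the user would be reminded to annotate the new standard template
        # so that it can also be used for layout recognition.
    return category
```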
- the method further includes: in response to a selection instruction, selecting at least one category from preset document categories as a target category; the selection instruction may be triggered by a user through a selection operation, or a trigger condition may be preset and the instruction is automatically triggered when the trigger condition is met.
- In this case, the general feature of the document to be processed is compared with the standard features of documents of the at least one preset target category, and the similarity between the general feature of the document to be processed and the standard features of the documents of the at least one target category is determined.
- For example, the preset document categories include general text, ID card, bank card, vehicle license, driving license, passport, general form, VAT invoice, business license, and handwritten text; the user selects ID card, bank card, general form, VAT invoice, and handwritten text as the target categories through a selection operation. Then, in the subsequent processing of the document to be identified, the categories selected by the user are used as the reference.
- the content shown in Figure 5 is only a possible implementation.
- In addition, the user can also create a template independently to establish a new target category, and the new target category is used as a reference in the process of identifying the document to be identified.
- the target category may include at least part of the various categories shown in FIG. 5 , that is, it may be more or less than the situation shown in FIG. 5 , which is not limited herein.
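- Restricting the comparison to the selected target categories then amounts to filtering the standard-feature store before classification; a minimal sketch, assuming the dictionary-based store and the classify function from the earlier sketches.

```python
from typing import Dict, Iterable, Optional
import numpy as np

def classify_with_targets(general_feature: np.ndarray,
                          standard_features: Dict[str, np.ndarray],
                          target_categories: Iterable[str],
                          threshold: float = 0.8) -> Optional[str]:
    """Only compare against the standard features of the selected target categories,
    reducing the number of similarity computations and comparisons."""
    targets = set(target_categories)
    selected = {c: f for c, f in standard_features.items() if c in targets}
    return classify(general_feature, selected, threshold)  # classify() from the earlier sketch
```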
- the present disclosure also provides a document processing device, please refer to FIG. 6 , which shows the structure of the device.
- the device includes: an acquisition module 601 for acquiring semantic features and visual features of a document to be processed; a general module 602, configured to determine a general feature of the document to be processed according to the semantic feature and the visual feature; a classification module 603, configured to determine a category of the document to be processed according to the general feature of the document to be processed.
- the obtaining module is specifically configured to: obtain a text recognition result of the document to be processed; and obtain a semantic feature of the document to be processed based on the text recognition result.
- acquiring the text recognition result of the document to be processed includes: determining a target text box in the document to be processed and text content contained in the target text box; acquiring each of the target texts The word segmentation processing result of the text content in the box; the feature vector corresponding to the word segmentation processing result is obtained.
- the general module is specifically configured to: respectively perform regularization processing on the visual features and the semantic features; and perform a weighted summation on the regularized visual features and the regularized semantic features to obtain the general features of the document to be processed.
- the document processing apparatus includes a neural network
- the neural network includes a feature extraction sub-network for extracting the general features of the documents to be processed and a first classification sub-network for determining the category of the documents to be processed according to the general features
- the first classification sub-network is specifically configured to: compare the general features of the document to be processed with preset standard features of at least one category of documents, and determine the similarity between the general features and the standard features of the at least one category of documents; the category of the document to be processed is determined according to the obtained at least one similarity.
- the first classification sub-network when used to determine the category of the document to be processed according to the obtained at least one similarity, it is specifically configured to: obtain the highest similarity among the at least one similarity ; In response to the highest similarity being greater than or equal to a preset similarity threshold, determine that the category of the document to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed.
- the apparatus further includes a training module for training a feature extraction sub-network in the neural network, for: inputting sample documents into the feature extraction sub-network to obtain the sample documents The general feature of the sample document, wherein the sample document is marked with a category; the general feature is input to the second classification sub-network to obtain the predicted category of the sample document; according to the predicted category of the sample document and the sample document The differences between the categories are marked, and the network parameters of the feature extraction sub-network are adjusted.
- the standard features of the at least one type of documents are obtained by performing feature extraction on the at least one type of documents using a trained feature extraction sub-network.
- the apparatus further includes an extension module, configured to: in response to the highest similarity being less than the preset similarity threshold, add the document to be processed as a standard template, and determine the general features of the document to be processed as the standard features of the category corresponding to the newly added standard template.
- the apparatus further includes a target module, configured to: in response to a selection instruction, select at least one category from preset document categories as a target category; when comparing the general features of the document to be processed with preset standard features of at least one category of documents and determining the similarity between the general features of the document to be processed and the standard features of the at least one category of documents, the first classification sub-network is specifically configured to: compare the general features of the document to be processed with preset standard features of documents of the at least one target category, and determine the similarity between the general features of the document to be processed and the standard features of the documents of the at least one target category.
- the apparatus further includes an identification module, configured to: obtain a corresponding preset standard template according to the category of the document to be processed; and perform layout identification processing on the document to be processed based on the standard template , to get the result of document layout recognition.
- the present disclosure also provides a document processing device, please refer to FIG. 7, which shows the structure of the device. The device includes a non-volatile storage medium 701 and a processor 702; the storage medium 701 is used for storing computer instructions executable on the processor 702, and the processor 702 is used for implementing the method described in any of the embodiments of the present disclosure when executing the computer instructions.
- the present disclosure also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the method described in any of the embodiments of the present disclosure.
- At least one target category among the multiple categories can be selected as a reference, thereby reducing the computational load of the step of determining the similarities and the computational load of the step of comparing the similarities, which improves the efficiency of classification.
- the method further includes: acquiring a corresponding preset standard template according to the category of the document to be processed; and performing a layout recognition process on the document to be processed based on the standard template to obtain a document layout recognition result.
- the corresponding standard template is automatically and accurately retrieved through the classification result for layout recognition, which not only improves the accuracy of layout recognition, but also improves the efficiency of layout recognition.
- one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
- Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
- Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by the data processing apparatus.
- the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
- the processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
- the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, eg, an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
- Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
- the central processing unit will receive instructions and data from read only memory and/or random access memory.
- the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
- Generally, a computer will also include, or be operably coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, to receive data from them, transmit data to them, or both.
- However, a computer does not have to have such devices.
- Moreover, the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
- Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
- the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.
Claims (21)
- 1. A document processing method, wherein the method comprises: acquiring semantic features and visual features of a document to be processed; determining a general feature of the document to be processed according to the semantic features and the visual features; and determining a category of the document to be processed according to the general feature of the document to be processed.
- 2. The document processing method according to claim 1, wherein the acquiring semantic features of the document to be processed comprises: acquiring a text recognition result of the document to be processed; and obtaining the semantic features of the document to be processed based on the text recognition result.
- 3. The document processing method according to claim 2, wherein the acquiring a text recognition result of the document to be processed comprises: determining target text boxes in the document to be processed and text content contained in the target text boxes; obtaining word segmentation processing results of the text content in each of the target text boxes; and obtaining feature vectors corresponding to the word segmentation processing results.
- 4. The document processing method according to claim 1, wherein the determining a general feature of the document to be processed according to the visual features and the semantic features comprises: performing regularization processing on the visual features and the semantic features respectively; and performing a weighted summation on the regularized visual features and the regularized semantic features to obtain the general feature of the document to be processed.
- 5. The document processing method according to any one of claims 1 to 4, wherein the document processing method is performed using a neural network, the neural network comprising a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature, and the first classification sub-network is specifically configured to: compare the general feature of the document to be processed with preset standard features of at least one category of documents, and determine a similarity between the general feature of the document to be processed and the standard features of the at least one category of documents; and determine the category of the document to be processed according to the obtained at least one similarity.
- 6. The document processing method according to claim 5, wherein the determining the category of the document to be processed according to the obtained at least one similarity comprises: obtaining the highest similarity among the at least one similarity; and in response to the highest similarity being greater than or equal to a preset similarity threshold, determining that the category of the documents to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed.
- 7. The document processing method according to claim 5 or 6, wherein the method further comprises training the feature extraction sub-network in the neural network, specifically comprising: inputting a sample document into the feature extraction sub-network to obtain a general feature of the sample document, wherein the sample document is annotated with a category; inputting the general feature into a second classification sub-network to obtain a predicted category of the sample document; and adjusting network parameters of the feature extraction sub-network according to a difference between the predicted category of the sample document and the annotated category of the sample document.
- 8. The document processing method according to claim 7, wherein the standard features of the at least one category of documents are obtained by performing feature extraction on the at least one category of documents using the trained feature extraction sub-network.
- 9. The document processing method according to any one of claims 6 to 8, wherein the method further comprises: in response to the highest similarity being less than the preset similarity threshold, adding the document to be processed as a standard template, and determining the general feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
- 10. The document processing method according to any one of claims 5 to 9, wherein the method further comprises: in response to a selection instruction, selecting at least one category from preset document categories as a target category; and the comparing the general feature of the document to be processed with preset standard features of at least one category of documents and determining the similarity between the general feature of the document to be processed and the standard features of the at least one category of documents comprises: comparing the general feature of the document to be processed with preset standard features of documents of the at least one target category, and determining the similarity between the general feature of the document to be processed and the standard features of the documents of the at least one target category.
- 11. The document processing method according to any one of claims 1 to 10, wherein the method further comprises: acquiring a corresponding preset standard template according to the category of the document to be processed; and performing layout recognition processing on the document to be processed based on the standard template to obtain a layout recognition result of the document.
- 12. A document processing apparatus, wherein the apparatus comprises: an acquisition module, configured to acquire semantic features and visual features of a document to be processed; a general module, configured to determine a general feature of the document to be processed according to the semantic features and the visual features; and a classification module, configured to determine a category of the document to be processed according to the general feature of the document to be processed.
- 13. The document processing apparatus according to claim 12, wherein the acquisition module is specifically configured to: acquire a text recognition result of the document to be processed; and obtain the semantic features of the document to be processed based on the text recognition result.
- 14. The document processing apparatus according to claim 13, wherein the acquiring a text recognition result of the document to be processed comprises: determining target text boxes in the document to be processed and text content contained in the target text boxes; obtaining word segmentation processing results of the text content in each of the target text boxes; and obtaining feature vectors corresponding to the word segmentation processing results.
- 15. The document processing apparatus according to claim 12, wherein the general module is specifically configured to: perform regularization processing on the visual features and the semantic features respectively; and perform a weighted summation on the regularized visual features and the regularized semantic features to obtain the general feature of the document to be processed.
- 16. The document processing apparatus according to any one of claims 12 to 15, wherein the document processing apparatus comprises a neural network, the neural network comprising a feature extraction sub-network for extracting the general feature of the document to be processed and a first classification sub-network for determining the category of the document to be processed according to the general feature, and the first classification sub-network is specifically configured to: compare the general feature of the document to be processed with preset standard features of at least one category of documents, and determine a similarity between the general feature of the document to be processed and the standard features of the at least one category of documents; and determine the category of the document to be processed according to the obtained at least one similarity.
- 17. The document processing apparatus according to claim 16, wherein, when determining the category of the document to be processed according to the obtained at least one similarity, the first classification sub-network is specifically configured to: obtain the highest similarity among the at least one similarity; and in response to the highest similarity being greater than or equal to a preset similarity threshold, determine that the category of the documents to which the standard feature corresponding to the highest similarity belongs is the category of the document to be processed, or, in response to the highest similarity being less than the preset similarity threshold, add the document to be processed as a standard template and determine the general feature of the document to be processed as the standard feature of the category corresponding to the newly added standard template.
- 18. The document processing apparatus according to claim 16 or 17, further comprising: a target module, configured to, in response to a selection instruction, select at least one category from preset document categories as a target category; wherein, when comparing the general feature of the document to be processed with preset standard features of at least one category of documents and determining the similarity between the general feature of the document to be processed and the standard features of the at least one category of documents, the first classification sub-network is specifically configured to: compare the general feature of the document to be processed with preset standard features of documents of the at least one target category, and determine the similarity between the general feature of the document to be processed and the standard features of the documents of the at least one target category.
- 19. A document processing device, wherein the device comprises a non-transitory storage medium and a processor, the storage medium is configured to store computer instructions executable on the processor, and the processor is configured to implement the method according to any one of claims 1 to 11 when executing the computer instructions.
- 20. A computer-readable storage medium on which a computer program is stored, wherein the method according to any one of claims 1 to 11 is implemented when the program is executed by a processor.
- 21. A computer program, wherein the method according to any one of claims 1 to 11 is implemented when the program is executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227004409A KR20220031097A (ko) | 2020-06-29 | 2021-06-11 | 문서 처리 방법, 장치, 기기 및 컴퓨터 판독 가능 저장 매체 |
JP2022506431A JP2022543052A (ja) | 2020-06-29 | 2021-06-11 | 文書処理方法、文書処理装置、文書処理機器、コンピュータ可読記憶媒体及びコンピュータプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010610080.8 | 2020-06-29 | ||
CN202010610080.8A CN111782808A (zh) | 2020-06-29 | 2020-06-29 | 文档处理方法、装置、设备及计算机可读存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022001637A1 true WO2022001637A1 (zh) | 2022-01-06 |
Family
ID=72760274
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/099799 WO2022001637A1 (zh) | 2020-06-29 | 2021-06-11 | 文档处理方法、装置、设备及计算机可读存储介质 |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP2022543052A (zh) |
KR (1) | KR20220031097A (zh) |
CN (1) | CN111782808A (zh) |
WO (1) | WO2022001637A1 (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111782808A (zh) * | 2020-06-29 | 2020-10-16 | 北京市商汤科技开发有限公司 | 文档处理方法、装置、设备及计算机可读存储介质 |
CN112612911A (zh) * | 2020-12-30 | 2021-04-06 | 华为技术有限公司 | 一种图像处理方法、***、设备及介质、程序产品 |
CN112861757B (zh) * | 2021-02-23 | 2022-11-22 | 天津汇智星源信息技术有限公司 | 基于文本语义理解的笔录智能审核方法及电子设备 |
CN113051396B (zh) * | 2021-03-08 | 2023-11-17 | 北京百度网讯科技有限公司 | 文档的分类识别方法、装置和电子设备 |
CN113297951A (zh) * | 2021-05-20 | 2021-08-24 | 北京市商汤科技开发有限公司 | 文档处理方法、装置、设备及计算机可读存储介质 |
CN113742483A (zh) * | 2021-08-27 | 2021-12-03 | 北京百度网讯科技有限公司 | 文档分类的方法、装置、电子设备和存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344815A (zh) * | 2018-12-13 | 2019-02-15 | 深源恒际科技有限公司 | 一种文档图像分类方法 |
CN110008944A (zh) * | 2019-02-20 | 2019-07-12 | 平安科技(深圳)有限公司 | 基于模板匹配的ocr识别方法及装置、存储介质 |
CN110298338A (zh) * | 2019-06-20 | 2019-10-01 | 北京易道博识科技有限公司 | 一种文档图像分类方法及装置 |
US20190325212A1 (en) * | 2018-04-20 | 2019-10-24 | EMC IP Holding Company LLC | Method, electronic device and computer program product for categorization for document |
CN111782808A (zh) * | 2020-06-29 | 2020-10-16 | 北京市商汤科技开发有限公司 | 文档处理方法、装置、设备及计算机可读存储介质 |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3851742B2 (ja) * | 1999-03-31 | 2006-11-29 | 株式会社東芝 | 帳票処理方法及び装置 |
JP6030172B2 (ja) * | 2015-03-12 | 2016-11-24 | 株式会社東芝 | 手書き文字検索装置、方法及びプログラム |
US10354009B2 (en) * | 2016-08-24 | 2019-07-16 | Microsoft Technology Licensing, Llc | Characteristic-pattern analysis of text |
US10936970B2 (en) * | 2017-08-31 | 2021-03-02 | Accenture Global Solutions Limited | Machine learning document processing |
CN108288067B (zh) * | 2017-09-12 | 2020-07-24 | 腾讯科技(深圳)有限公司 | 图像文本匹配模型的训练方法、双向搜索方法及相关装置 |
CN109033478B (zh) * | 2018-09-12 | 2022-08-19 | 重庆工业职业技术学院 | 一种用于搜索引擎的文本信息规律分析方法与*** |
CN111480166B (zh) * | 2018-12-05 | 2023-05-05 | 北京百度网讯科技有限公司 | 从视频中定位目标视频片段的方法和装置 |
CN110866116A (zh) * | 2019-10-25 | 2020-03-06 | 远光软件股份有限公司 | 政策文档的处理方法、装置、存储介质及电子设备 |
2020
- 2020-06-29 CN CN202010610080.8A patent/CN111782808A/zh active Pending
2021
- 2021-06-11 KR KR1020227004409A patent/KR20220031097A/ko not_active Application Discontinuation
- 2021-06-11 WO PCT/CN2021/099799 patent/WO2022001637A1/zh active Application Filing
- 2021-06-11 JP JP2022506431A patent/JP2022543052A/ja active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190325212A1 (en) * | 2018-04-20 | 2019-10-24 | EMC IP Holding Company LLC | Method, electronic device and computer program product for categorization for document |
CN109344815A (zh) * | 2018-12-13 | 2019-02-15 | 深源恒际科技有限公司 | 一种文档图像分类方法 |
CN110008944A (zh) * | 2019-02-20 | 2019-07-12 | 平安科技(深圳)有限公司 | 基于模板匹配的ocr识别方法及装置、存储介质 |
CN110298338A (zh) * | 2019-06-20 | 2019-10-01 | 北京易道博识科技有限公司 | 一种文档图像分类方法及装置 |
CN111782808A (zh) * | 2020-06-29 | 2020-10-16 | 北京市商汤科技开发有限公司 | 文档处理方法、装置、设备及计算机可读存储介质 |
Also Published As
Publication number | Publication date |
---|---|
JP2022543052A (ja) | 2022-10-07 |
CN111782808A (zh) | 2020-10-16 |
KR20220031097A (ko) | 2022-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022001637A1 (zh) | 文档处理方法、装置、设备及计算机可读存储介质 | |
WO2022037573A1 (zh) | 表单识别方法、装置、设备及计算机可读存储介质 | |
WO2021169111A1 (zh) | 简历筛选方法、装置、计算机设备和存储介质 | |
CN110069709B (zh) | 意图识别方法、装置、计算机可读介质及电子设备 | |
Faraki et al. | Fisher tensors for classifying human epithelial cells | |
US10963685B2 (en) | Generating variations of a known shred | |
US20160092730A1 (en) | Content-based document image classification | |
Inoue et al. | A fast and accurate video semantic-indexing system using fast MAP adaptation and GMM supervectors | |
US20170076152A1 (en) | Determining a text string based on visual features of a shred | |
US20130064444A1 (en) | Document classification using multiple views | |
CN101937513A (zh) | 信息处理设备、信息处理方法和程序 | |
CN112632226B (zh) | 基于法律知识图谱的语义搜索方法、装置和电子设备 | |
CN113221918B (zh) | 目标检测方法、目标检测模型的训练方法及装置 | |
CN110046648B (zh) | 基于至少一个业务分类模型进行业务分类的方法及装置 | |
Zheng et al. | Classification techniques in pattern recognition | |
CN108133224B (zh) | 用于评估分类任务复杂度的方法 | |
CN111353514A (zh) | 模型训练方法、图像识别方法、装置及终端设备 | |
US20230138491A1 (en) | Continuous learning for document processing and analysis | |
JP6017277B2 (ja) | 特徴ベクトルの集合で表されるコンテンツ間の類似度を算出するプログラム、装置及び方法 | |
US11886809B1 (en) | Identifying templates based on fonts | |
Salamah et al. | Towards the machine reading of arabic calligraphy: a letters dataset and corresponding corpus of text | |
Ahmed et al. | Hateful meme prediction model using multimodal deep learning | |
Vishwanath et al. | Deep reader: Information extraction from document images via relation extraction and natural language | |
CN113033170B (zh) | 表格标准化处理方法、装置、设备及存储介质 | |
US11461693B2 (en) | Training apparatus and training method for providing sample size expanding model |
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase | Ref document number: 2022506431; Country of ref document: JP; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 20227004409; Country of ref document: KR; Kind code of ref document: A |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21831758; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21831758; Country of ref document: EP; Kind code of ref document: A1 |