WO2022100376A1

WO2022100376A1 - Text paragraph structure restoration method and apparatus, and device and computer storage medium

Info

Publication number: WO2022100376A1
Application number: PCT/CN2021/124605
Authority: WO
Inventors: 高超; 徐国强
Original assignee: 深圳壹账通智能科技有限公司
Priority date: 2020-11-13
Filing date: 2021-10-19
Publication date: 2022-05-19
Also published as: CN112070076B; CN112070076A

Abstract

The present application relates to the technical field of artificial intelligence. Disclosed are a text paragraph structure restoration method and apparatus, and a device and a computer storage medium. The method comprises: carrying out recognition on a target picture, and on the basis of a recognition result of the recognition, determining all text boxes in the target picture and text box positions of all of the text boxes; sorting the text boxes according to the text box positions, and on the basis of a sorting result of the sorting, inputting text features of all of the text boxes into a preset deep learning model for training; and organizing all of the text boxes on the basis of a result of the training, so as to obtain all text paragraphs corresponding to the target picture. By means of the present application, the accuracy of text paragraph structure restoration is improved.

Description

Text paragraph structure restoration method, apparatus, device, and computer storage medium

This application claims priority to Chinese patent application No. 202011264865.0, filed with the China Patent Office on November 13, 2020 and entitled "Text paragraph structure restoration method, apparatus, device, and computer storage medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the technical field of artificial intelligence, and in particular to a text paragraph structure restoration method, apparatus, device, and computer-readable storage medium.

Background

When paper documents are digitized, the documents must be entered while their original format is preserved. The inventors realized that current detection and recognition methods based on text lines cannot directly obtain text paragraph information. Two kinds of methods currently exist. The first is top-down: layout analysis is performed on the whole page to segment paragraphs, and the text lines within each paragraph region are then detected and recognized. Such methods cannot capture local text details during layout analysis and use only image information without text content information, so their accuracy is low. The second is bottom-up: text lines are detected first and then merged into paragraphs. Such methods mainly rely on the compatibility of text box positions, using rules or heuristic algorithms to merge text boxes into paragraphs. They require a large number of hand-crafted features and can hardly make use of text content information, so their accuracy is also low.

Technical Problem

The main purpose of the present application is to provide a text paragraph structure restoration method, apparatus, device, and computer storage medium, aiming to solve the technical problem of how to improve the accuracy of text paragraph structure restoration.

Technical Solution

To achieve the above object, the present application provides a text paragraph structure restoration method, which includes:

recognizing a target picture, and determining, based on the recognition result, all text boxes in the target picture and the text box position of each text box;

sorting the text boxes according to the text box positions, and inputting, based on the sorting result, the text features of the text boxes into a preset deep learning model for training;

merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.

Optionally, the step of merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture includes:

determining, based on the training result, the text label corresponding to each text box, traversing the text labels, and detecting whether the label information of the traversed text label (the traversed label information) indicates a paragraph;

if the traversed label information indicates a paragraph, determining that the text box corresponding to the traversed text label is a text paragraph corresponding to the target picture.

Optionally, after the step of detecting whether the traversed label information indicates a paragraph, the method includes:

if not, detecting whether the traversed label information indicates paragraph content;

if the traversed label information indicates paragraph content, determining whether the label information of the text label immediately preceding the traversed text label indicates a paragraph start;

if the label information of the preceding text label indicates a paragraph start, determining the text paragraph corresponding to the target picture based on the traversed text label and the preceding text label.

Optionally, the step of determining the text paragraph corresponding to the target picture based on the traversed text label and the preceding text label includes:

detecting whether the text labels contain consecutive adjacent content labels;

if consecutive adjacent content labels exist, determining whether the traversed text label is among them;

if the traversed text label is not among them, merging the text box corresponding to the traversed text label with the text box corresponding to the preceding text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the target picture.

Optionally, after the step of determining whether the traversed text label is among the consecutive adjacent content labels, the method includes:

if the traversed text label is among them, merging all text boxes corresponding to the consecutive adjacent content labels that contain the traversed text label with the text box corresponding to the preceding text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the target picture.

Optionally, the step of inputting, based on the sorting result, the text features of the text boxes into a preset deep learning model for training includes:

extracting the text features of each text box in turn, fusing the text features into a sequence feature according to the sorting result, and inputting the sequence feature into the preset deep learning model for training.

Optionally, the step of extracting the text features of each text box in turn includes:

traversing the text boxes in turn, extracting the position features, language features, and image features of the traversed text box, and taking the position features, language features, and image features as the text features of the traversed text box.

In addition, to achieve the above object, the present application further provides a text paragraph structure restoration apparatus, which includes:

a determination module, configured to recognize a target picture and determine, based on the recognition result, all text boxes in the target picture and the text box position of each text box;

an input module, configured to sort the text boxes according to the text box positions and input, based on the sorting result, the text features of the text boxes into a preset deep learning model for training;

an obtaining module, configured to merge the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.

In addition, to achieve the above object, the present application further provides a text paragraph structure restoration device;

the text paragraph structure restoration device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein:

the computer program, when executed by the processor, implements the steps of the text paragraph structure restoration method described above.

In addition, to achieve the above object, the present application further provides a computer-readable storage medium;

a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the text paragraph structure restoration method described above.

Beneficial Effects

The present application recognizes a target picture and determines, based on the recognition result, all text boxes in the target picture and the text box position of each text box; sorts the text boxes according to the text box positions and, based on the sorting result, inputs the text features of the text boxes into a preset deep learning model for training; and merges the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture. Because the text boxes and their positions are determined from the recognition result of the target picture, the text boxes are sorted by position, their text features are input into the preset deep learning model for training, and the text boxes are merged based on the training result to obtain the text paragraphs, the low accuracy that results from manual user operation is avoided and the accuracy of text paragraph structure restoration is improved.

Description of Drawings

FIG. 1 is a schematic structural diagram of a text paragraph structure restoration device in the hardware operating environment involved in the embodiments of the present application;

FIG. 2 is a schematic flowchart of a first embodiment of the text paragraph structure restoration method of the present application;

FIG. 3 is a schematic diagram of the functional modules of the text paragraph structure restoration apparatus of the present application;

FIG. 4 is a schematic diagram of text box sorting in the text paragraph structure restoration method of the present application.

The realization of the purpose, the functional features, and the advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Embodiments of the Invention

It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.

FIG. 1 is a schematic structural diagram of a text paragraph structure restoration device in the hardware operating environment involved in the embodiments of the present application.

As shown in FIG. 1, the text paragraph structure restoration device may include a processor 1001 (for example a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.

Optionally, the text paragraph structure restoration device may further include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a Wi-Fi module, and the like. The sensors may be, for example, light sensors, motion sensors, and other sensors. Specifically, the light sensors may include an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display according to the ambient light. The text paragraph structure restoration device may also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described further here.

Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the text paragraph structure restoration device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a text paragraph structure restoration program.

In the text paragraph structure restoration device shown in FIG. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it; and the processor 1001 may be used to call the text paragraph structure restoration program stored in the memory 1005 and execute the text paragraph structure restoration method provided by the embodiments of the present application.

Referring to FIG. 2, the present application provides a text paragraph structure restoration method. In an embodiment of the method, the text paragraph structure restoration method includes the following steps:

Step S10: recognizing a target picture, and determining, based on the recognition result, all text boxes in the target picture and the text box position of each text box;

In this embodiment, when the text paragraphs in a target picture need to be restored, the text line information (that is, the text boxes) obtained by detecting and recognizing the target picture can be converted into sequence features and input into the preset deep learning model. The model performs sequence labeling to obtain the category of each text box, and the text boxes are then merged according to their categories to obtain the specific text paragraphs.

Therefore, in this embodiment the target picture is recognized first. The recognition may use a text recognition technology such as OCR (Optical Character Recognition) to determine whether the target picture contains text content. If text content exists and is distributed across different positions, the position of each text box in the target picture (for example x1, y1, x2, y2) and the text content information can be obtained from the OCR recognition result; other text recognition models may also be used. After the text boxes and their positions are obtained, it is checked whether there is exactly one text box. If so, that text box can be used directly as the text paragraph in the target picture, and its position is the position of the text paragraph. If several text boxes are detected, the text paragraphs in the target picture must be determined from the text boxes and their corresponding positions.
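The recognition result described above can be pictured as a list of boxes, each carrying corner coordinates and text. The following is a minimal sketch under that assumption; the `TextBox` type, the `single_box_paragraph` helper, and the sample values are hypothetical illustrations, and no real OCR engine is called.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TextBox:
    """One recognized text line: corner coordinates plus its text (hypothetical type)."""
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge
    text: str


def single_box_paragraph(boxes: List[TextBox]) -> Optional[TextBox]:
    """Return the only box if exactly one was detected, else None.

    With exactly one box, that box is the paragraph and its position is the
    paragraph position; with several boxes, sorting and labeling are needed.
    """
    return boxes[0] if len(boxes) == 1 else None


# Hand-written stand-in for a recognition result (assumed values).
ocr_result = [TextBox(40, 30, 560, 58, "Quarterly report")]
only = single_box_paragraph(ocr_result)
```

When `single_box_paragraph` returns `None`, the boxes would go on to the sorting and labeling steps described next.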

Step S20: sorting the text boxes according to the text box positions, and inputting, based on the sorting result, the text features of the text boxes into a preset deep learning model for training;

After the text boxes and their positions are obtained, the text boxes can be numbered according to their positions. The numbering order can be set according to the user's needs; for example, numbering may start from the top of the target picture and continue until every text box has its own number. The text boxes are then sorted by number to obtain the sorted text boxes. For example, as shown in FIG. 4, the text boxes may be numbered in order of position, from top to bottom and from left to right, such as 1 to 12.
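One possible way to number boxes from top to bottom and from left to right is sketched below. The grouping of boxes into rows with a `row_tolerance` threshold is an assumption added for illustration; the source only states the ordering, not how boxes on roughly the same line are detected.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def sort_boxes(boxes: List[Box], row_tolerance: float = 10.0) -> List[Box]:
    """Order boxes top to bottom, then left to right within a row."""
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):  # by top edge
        # Stay in the current row only if the top edge is close to that
        # row's first box (row_tolerance is an assumed parameter).
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    ordered: List[Box] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b[0]))  # by left edge
    return ordered


boxes = [(300, 102, 500, 120), (20, 100, 280, 120), (20, 40, 500, 60)]
# Numbers start at 1, as in the numbering example of FIG. 4.
numbered = list(enumerate(sort_boxes(boxes), start=1))
```

Here the two boxes whose top edges differ by 2 units land in the same row, so the left one is numbered before the right one.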

After the text boxes have been sorted and the sorting result obtained, the text features of each text box, such as position features, language features, and image features, can be acquired in turn. The position features may be the vertex coordinates, center point, width, and height of the text box. The language features may be language model features of the text in the text box, such as word vectors, sentence vectors, and n-gram scores of the text. The image features may be features extracted from the text region of the image with a convolutional neural network. The acquired text features are input as sequence features into the deep learning model, which has been set up and trained in advance, for training to obtain the training result.
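The feature fusion described above can be sketched as a per-box concatenation that preserves the sorted order. The position features follow the list in the text; the language and image vectors are passed in as plain placeholder lists, since the source does not fix a particular language model or convolutional network, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def position_features(box: Box) -> List[float]:
    """Corner coordinates, center point, width and height of one box."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2, y2, (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


def sequence_features(
    sorted_boxes: Sequence[Box],
    language_vecs: Sequence[Sequence[float]],
    image_vecs: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate the three feature groups per box, keeping the sorted order."""
    if not (len(sorted_boxes) == len(language_vecs) == len(image_vecs)):
        raise ValueError("one language vector and one image vector per box")
    return [
        position_features(box) + list(lang) + list(img)
        for box, lang, img in zip(sorted_boxes, language_vecs, image_vecs)
    ]


# Placeholder vectors; real ones would come from a language model and a CNN.
seq = sequence_features(
    [(20, 40, 500, 60), (20, 100, 280, 120)],
    language_vecs=[[0.1, 0.2], [0.3, 0.4]],
    image_vecs=[[1.0], [0.0]],
)
```

Each row of `seq` is the feature vector of one text box, and the row order is the reading order produced by the sorting step.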

The deep learning model may be trained as follows. A large number of document pictures are first collected, and an OCR system is used to detect and recognize the text, yielding three types of features for each text box in the document pictures: position features, language features, and image features. Each text box in the document pictures is also annotated manually, that is, marked with the part of the text paragraph it belongs to, which gives the manual annotation result for each text box. The three types of features of each text box are then input into the deep learning model for training to obtain the model output, namely the label of each text box in the document pictures (a label carrying text paragraph information). The labels are traversed in turn, and the label of each traversed text box is compared with its manual annotation result. If they differ, the deep learning model needs to be optimized, for example with gradient descent, until optimization is complete and the trained deep learning model, that is, the preset deep learning model, is obtained. The deep learning model may be any of various sequence models, such as a recurrent neural network model or a convolutional neural network model.
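The comparison between model output and manual annotation can be illustrated with a small check. This sketch covers only the comparison step, not the model or the gradient descent update; the function name and the sample label sequences are hypothetical.

```python
from typing import List, Sequence


def label_mismatches(predicted: Sequence[str], annotated: Sequence[str]) -> List[int]:
    """Return the positions where the model's label differs from the manual one."""
    if len(predicted) != len(annotated):
        raise ValueError("label sequences must have the same length")
    return [i for i, (p, a) in enumerate(zip(predicted, annotated)) if p != a]


# Assumed sample: labels for five sorted text boxes of one document picture.
predicted = ["B", "I", "O", "B", "I"]
annotated = ["B", "I", "I", "B", "I"]
diff = label_mismatches(predicted, annotated)
# Any difference means the model would be optimized further.
needs_more_optimization = bool(diff)
```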

Step S30: merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.

After the training result is obtained, the label of each text box, such as a BIO label, can be obtained, and the text boxes are merged according to their labels to obtain all text paragraphs corresponding to the target picture. The merging follows the different BIO labels. For example, if the label of a text box is O, the text box forms a paragraph by itself, and its position is the range of the text paragraph. If the label of a text box is B, the text paragraph starts at the text box carrying the B label and extends to the last consecutive I label that follows it in the sequence; the range of that text paragraph is then the positions of all text boxes corresponding to the B label and those consecutive I labels. The ranges of all text paragraphs in the target picture can be determined from the BIO labels in this way.
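The BIO merging rule above can be sketched as a single pass over the labels of the sorted text boxes. The function name is hypothetical, and the handling of an I label that has no preceding B label is an assumption, since the source does not say what happens in that case.

```python
from typing import List, Sequence


def group_by_bio(labels: Sequence[str]) -> List[List[int]]:
    """Group indices of sorted text boxes into paragraphs from their B/I/O labels.

    O: the box is a paragraph by itself.
    B: starts a paragraph that extends over the consecutive I labels after it.
    """
    paragraphs: List[List[int]] = []
    open_paragraph = False  # True while I labels may still extend the last group
    for index, label in enumerate(labels):
        if label == "I" and open_paragraph:
            paragraphs[-1].append(index)
        else:
            # B and O both start a new group. A stray I with no preceding B
            # also starts a new group (an assumption, not from the source).
            paragraphs.append([index])
            open_paragraph = label in ("B", "I")
    return paragraphs


groups = group_by_bio(["O", "B", "I", "I", "O", "B", "I"])
```

For this label sequence the boxes fall into four paragraphs: box 0 alone, boxes 1 to 3, box 4 alone, and boxes 5 and 6.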

That is, after the training result of the deep learning model is obtained, the text label corresponding to each text box, such as a B label, I label, or O label, can be determined from it. The text labels are then traversed, and the label information of the traversed text label, that is, the traversed label information, is examined. It is determined whether the traversed label information indicates a paragraph (as the label information of an O label does), so as to decide whether the text box corresponding to the traversed text label is a text paragraph on its own. If the traversed label information indicates a paragraph, the text box corresponding to it is determined to be a text paragraph corresponding to the target picture.

If the traversed label information does not indicate a paragraph, it is further checked whether it indicates paragraph content, as the label information of an I label does. If it does, it is determined whether the label information of the text label immediately preceding the traversed text label indicates a paragraph start, as the label information of a B label does. If so, the text paragraph of the target picture can be determined directly from the text box corresponding to the traversed text label and the text box corresponding to the preceding text label. When determining the text paragraph, it is also necessary to detect whether the text labels contain consecutive adjacent content labels, such as consecutive adjacent I labels. If consecutive adjacent content labels exist, it must further be determined whether the currently traversed text label is among them. If it is not, the text box corresponding to the traversed text label is merged directly with the text box corresponding to the preceding text label, and the merged text box is taken as the text paragraph corresponding to the target picture.

If the traversed text label is among the consecutive adjacent content labels, all adjacent text labels in the run containing the traversed text label are determined, and all text boxes corresponding to that run are merged. For example, if the traversed text label is I4 and the consecutive adjacent text labels are I1 to I5, the run contains the traversed text label I4. The text boxes corresponding to all labels in the run, that is, I1, I2, I3, I4, and I5, are merged into one text box, which is then merged with the text box corresponding to the preceding text label, that is, the B label. The resulting merged text box can be used directly as the text paragraph corresponding to the picture.
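The I1 to I5 example can be made concrete by merging box coordinates. The sketch below assumes that "merging" text boxes means taking the smallest rectangle that covers them, which the source does not state explicitly; the coordinates are made up for illustration.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def merge_boxes(boxes: Sequence[Box]) -> Box:
    """Smallest rectangle covering every box in the run (assumed merge rule)."""
    if not boxes:
        raise ValueError("need at least one box to merge")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# One B box followed by five I boxes (I1..I5), 30 units apart vertically.
b_box: Box = (40, 100, 500, 120)
i_boxes = [(20, 100 + 30 * k, 500, 120 + 30 * k) for k in range(1, 6)]
# Merge the I run first, then merge the result with the B box, as in the text.
run_box = merge_boxes(i_boxes)
paragraph_box = merge_boxes([b_box, run_box])
```

Because the covering rectangle is associative, merging the I run first and then adding the B box gives the same result as merging all six boxes at once.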

Moreover, in this embodiment, the target picture is detected to determine the text boxes and their text features, and the text features are formed into sequence features that are input into the deep learning model for training and prediction. This avoids hand-designed rules, supports a variety of complex structures, and achieves high accuracy. The scattered text line information in the recognition result can also be converted into text paragraphs, which improves the efficiency of document entry and of digitizing paper text and better promotes informatization and digitization across industries.

In this embodiment, the target picture is recognized, and all text boxes in the target picture and the text box position of each text box are determined based on the recognition result; the text boxes are sorted according to the text box positions and, based on the sorting result, the text features of the text boxes are input into a preset deep learning model for training; and the text boxes are merged based on the training result to obtain all text paragraphs corresponding to the target picture. Because the text boxes and their positions are determined from the recognition result of the target picture, the text boxes are sorted by position, their text features are input into the preset deep learning model for training, and the text boxes are merged based on the training result to obtain the text paragraphs, the low accuracy that results from manual user operation is avoided and the accuracy of text paragraph structure restoration is improved.

Further, on the basis of the first embodiment of the present application, a second embodiment of the text paragraph structure restoration method is proposed. This embodiment refines step S30 of the first embodiment: the step of merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture includes:

Step a: determining, based on the training result, the text label corresponding to each text box, traversing the text labels, and detecting whether the label information of the traversed text label (the traversed label information) indicates a paragraph;

在本实施例中，当获取到深度学习模型的训练结果后，可以根据此训练结果来确定各个文本框对应的文本标签，如B标签，I标签和O标签等。然后再遍历各个文本框对应的文本标签，并需要检测遍历的文本标签对应的标签信息，即遍历标签信息。再确定该遍历标签信息是否为段落，以便根据确定结果来判断遍历的文本标签对应的文本框是否为一个文本段落，如O标签对应的标签信息。In this embodiment, after the training result of the deep learning model is obtained, the text labels corresponding to each text box, such as the B label, the I label, and the O label, can be determined according to the training result. Then traverse the text labels corresponding to each text box, and need to detect the label information corresponding to the traversed text labels, that is, traverse the label information. Then determine whether the traversed label information is a paragraph, so as to determine whether the text box corresponding to the traversed text label is a text paragraph according to the determination result, such as the label information corresponding to the O label.

Step b: if the traversed label information is a paragraph, determine that the text box corresponding to the traversed text label is a text paragraph corresponding to the target picture.

When the traversed label information is found to be a paragraph, the text box corresponding to the traversed label information can be determined to be a text paragraph corresponding to the target picture. In this embodiment, all label information can be checked in the same way until all text paragraphs are determined.

In this embodiment, the text label corresponding to each text box is determined from the training result and the text labels are traversed; when the traversed label information of the traversed text label is a paragraph, the text box corresponding to that text label is taken as a text paragraph corresponding to the target picture, which ensures the accuracy of the obtained text paragraphs.
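Steps a and b can be sketched as follows. This is only an illustration under stated assumptions: the labels are taken to be the plain strings "B", "I", and "O", each text box is represented by its text alone, and the function name `standalone_paragraphs` is hypothetical.

```python
def standalone_paragraphs(box_texts, labels):
    """Return the boxes whose label marks a complete paragraph ("O").

    box_texts and labels are parallel lists in sorted box order.
    """
    if len(box_texts) != len(labels):
        raise ValueError("each text box needs exactly one label")
    return [text for text, label in zip(box_texts, labels) if label == "O"]
```

Boxes labelled "B" or "I" are deliberately left out here, because the embodiment handles them in the later steps that merge several boxes into one paragraph.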

Further, after the step of detecting whether the traversed label information corresponding to the traversed text label is a paragraph, the method includes:

Step c: if not, detect whether the traversed label information is paragraph content;

When the traversed label information is found not to be a paragraph, it is further necessary to check whether the traversed label information is paragraph content (for example, the label information corresponding to the I label), and to perform different operations according to the detection result.

Step d: if the traversed label information is paragraph content, determine whether the label information of the text label immediately preceding the traversed text label is paragraph start information;

When the traversed label information is found to be paragraph content, it is further necessary to determine whether the label information of the text label immediately preceding the traversed text label is paragraph start information (for example, the label information corresponding to the B label), so that the starting position of the paragraph containing the corresponding text box can be determined, and to perform different operations according to the detection result.

Step e: if the label information of the previous text label is paragraph start information, determine the text paragraph corresponding to the target picture based on the traversed text label and the previous text label.

When the label information of the previous text label is found to be paragraph content rather than paragraph start information, all runs of consecutive adjacent text labels whose label information is paragraph content (consecutive adjacent content text labels) are determined, and the target run that contains the traversed text label is selected from them. All text boxes corresponding to the target run are merged with the text box corresponding to the text label immediately preceding the target run, and the result is taken as a text paragraph in the target picture.

If the label information of the previous text label is paragraph start information, it is necessary to determine whether the label information of the text label following the traversed text label is paragraph content. If it is not, the text box corresponding to the traversed text label and the text box corresponding to the previous text label can be merged directly into one text paragraph of the target picture. If the label information of the following text label is paragraph content, the same check is repeated on each subsequent text label until label information that is not paragraph content is reached; the text boxes corresponding to all of the adjacent paragraph-content labels, including the traversed text label, are then merged and taken as one text paragraph of the target picture.

In this embodiment, when the traversed label information is determined to be paragraph content and the label information of the text label preceding the traversed text label is paragraph start information, the text paragraph is determined from the traversed text label and the previous text label, which ensures the accuracy of the obtained text paragraphs.
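Read together, steps a to e amount to a single pass over the label sequence. The sketch below is one possible reading, not the applicant's implementation: it assumes string labels "B", "I", and "O", represents each box by its text, and treats an "I" label with no preceding "B" as a standalone paragraph, a case the description does not address. The names `decode_paragraphs` and `sep` are hypothetical.

```python
def decode_paragraphs(box_texts, labels, sep=""):
    """Group sorted text boxes into paragraphs from B/I/O labels.

    "O" -> the box is a paragraph by itself
    "B" -> the box starts a new paragraph
    "I" -> the box continues the paragraph opened by the last "B"
    """
    paragraphs = []
    current = None  # texts of the paragraph being built, if any
    for text, label in zip(box_texts, labels):
        if label == "B":
            if current is not None:
                paragraphs.append(sep.join(current))
            current = [text]
        elif label == "I" and current is not None:
            current.append(text)
        else:
            # "O", or an "I" without an open paragraph (assumed fallback).
            if current is not None:
                paragraphs.append(sep.join(current))
                current = None
            paragraphs.append(text)
    if current is not None:
        paragraphs.append(sep.join(current))
    return paragraphs
```

For example, the labels `["O", "B", "I", "B", "I", "O"]` yield four paragraphs: the first and last boxes alone, and two merged pairs in between.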

Specifically, the step of determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label includes:

Step f: detect whether consecutive adjacent content labels exist among the text labels;

When determining a text paragraph, it is also necessary to detect whether consecutive adjacent content labels exist among the text labels of the text boxes, for example consecutive adjacent I labels, and to perform different operations according to the detection result. If the label information of a text label is paragraph content and the label information of a text label adjacent to it is also paragraph content, both text labels are regarded as consecutive adjacent content labels.

Step g: if consecutive adjacent content labels exist, determine whether the traversed text label is among the consecutive adjacent content labels;

When consecutive adjacent content labels are found to exist, and there are several distinct runs of such labels, it is necessary to determine whether the traversed text label is present in any of these runs, and to perform different operations according to the result. If no consecutive adjacent content labels exist, the text box corresponding to the traversed text label and the text box corresponding to the previous text label are merged directly, and the merged result is taken as a text paragraph of the target picture.

Step h: if the traversed text label is not among the consecutive adjacent content labels, merge the text box corresponding to the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and take the merged text box as the text paragraph corresponding to the target picture.

If the traversed text label is not among the consecutive adjacent content labels, the text box corresponding to the traversed text label and the text box corresponding to the previous text label can be merged directly, and the merged text box is taken as the text paragraph corresponding to the target picture.

In this embodiment, when consecutive adjacent content labels exist among the text labels but the traversed text label is not among them, the text box corresponding to the traversed text label and the text box corresponding to the previous text label are merged, and the merged text box is taken as a text paragraph, which ensures the accuracy of the obtained text paragraphs.

Further, after the step of determining whether the traversed text label is among the consecutive adjacent content labels, the method includes:

Step k: if the traversed text label is among the consecutive adjacent content labels, merge all text boxes corresponding to the consecutive adjacent content labels that contain the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and take the merged text box as the text paragraph corresponding to the picture.

When the traversed text label is found to be among the consecutive adjacent content labels, all adjacent text labels in the run containing the traversed text label are determined, and all text boxes corresponding to that run are merged. For example, if the traversed text label is I4 and the consecutive adjacent text labels are I1 to I5, the run contains the traversed text label I4. The text boxes corresponding to all labels in the run, that is, I1, I2, I3, I4, and I5, are merged, and the result is then merged with the text box corresponding to the previous text label, that is, the text box corresponding to the B label. The resulting merged text box can be taken directly as the text paragraph corresponding to the picture.

In this embodiment, when the traversed text label is determined to be among the consecutive adjacent content labels, all text boxes corresponding to the run containing the traversed text label are merged with the text box corresponding to the previous text label, and the merged text box is taken as a text paragraph, which ensures the accuracy of the obtained text paragraphs.
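The I1 to I5 example can be traced with a small sketch that, given the index of a traversed I label, finds the run of consecutive I labels around it and the B label in front of the run. The names `paragraph_span` and `merge_span` are hypothetical, labels are assumed to be the strings "B", "I", and "O", and returning `None` for a run that is not preceded by a B label is an assumption, since the description does not cover that case.

```python
def paragraph_span(labels, i):
    """Return (start, end), inclusive, for the paragraph containing labels[i].

    labels[i] must be "I". The span starts at the "B" label directly before
    the run of consecutive "I" labels and ends at the last "I" of the run.
    Returns None if labels[i] is not "I" or the run has no leading "B".
    """
    if labels[i] != "I":
        return None
    start = i
    while start > 0 and labels[start - 1] == "I":
        start -= 1  # extend the run to the left
    if start == 0 or labels[start - 1] != "B":
        return None  # assumed handling of a malformed label sequence
    end = i
    while end + 1 < len(labels) and labels[end + 1] == "I":
        end += 1  # extend the run to the right
    return (start - 1, end)


def merge_span(box_texts, span, sep=""):
    """Concatenate the texts of the boxes covered by an inclusive span."""
    start, end = span
    return sep.join(box_texts[start:end + 1])
```

With labels `["O", "B", "I", "I", "I", "I", "I", "O"]`, index 5 corresponds to I4 in the example above, and the span covers the B label and all five I labels.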

Further, the step of inputting the text features of each text box into a preset deep learning model for training based on the sorting result includes:

Step m: extract the text features of each text box in turn, fuse the text features into sequence features according to the sorting result, and input the sequence features into the preset deep learning model for training.

In this embodiment, after the text boxes have been obtained and sorted, the text features of each text box, such as position features, language features, and image features, can be extracted in turn. Once the text features have been extracted, they can be fused into sequence features according to the sorting result of the text boxes, and the sequence features are input into the preset deep learning model for training to obtain the output result, that is, the training result.

In this embodiment, the text features of each text box are extracted in turn and fused, and the fused sequence features are input into the preset deep learning model for training, which ensures that the training is carried out effectively.
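A minimal sketch of step m follows. The description does not say how the features are fused, so plain concatenation of the three feature groups per box is assumed here; the function name `fuse_sequence` and the dictionary keys are hypothetical.

```python
def fuse_sequence(features_per_box, order):
    """Build one feature vector per box, arranged in sorted box order.

    features_per_box: list of dicts with "position", "language", and
        "image" entries, each a list of numbers (assumed layout).
    order: box indices in the order produced by the sorting step.
    Fusion is assumed to be simple concatenation of the three groups.
    """
    sequence = []
    for idx in order:
        feats = features_per_box[idx]
        sequence.append(
            list(feats["position"]) + list(feats["language"]) + list(feats["image"])
        )
    return sequence
```

The returned list of vectors is the kind of ordered input a sequence labelling model would take, one time step per text box.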

Further, the step of extracting the text features of each text box in turn includes:

Step n: traverse the text boxes in turn, extract the position features, language features, and image features of the traversed text box, and take the position features, language features, and image features as the text features of the traversed text box.

In this embodiment, when the text features of all text boxes are extracted, the text boxes can be traversed in turn; the position features, language features, and image features of the traversed text box are extracted and taken as its text features. In other words, the same extraction operation is applied to every text box. The position features may be the vertex coordinates, the center point, and the width and height of the text box. The language features may be language-model features of the text in the text box, for example word vectors, sentence vectors, or n-gram scores of the text. The image features may be features obtained by applying a convolutional neural network to the text region in the image.

In this embodiment, the position features, language features, and image features of the traversed text box are extracted and taken as its text features, which ensures the validity of the obtained text features.
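The position features named above (vertex coordinates, center point, width, and height) can be computed as in the sketch below. The function name `position_features`, the output ordering, and the use of the axis-aligned extent for width and height are assumptions made for illustration; language and image features depend on external models and are not sketched here.

```python
def position_features(vertices):
    """Flatten a text box's geometry into a feature list.

    vertices: four (x, y) corner points of the text box.
    Returns the eight vertex coordinates followed by
    center x, center y, width, and height (axis-aligned extent).
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    center_x = (min(xs) + max(xs)) / 2
    center_y = (min(ys) + max(ys)) / 2
    flat = [coord for vertex in vertices for coord in vertex]
    return flat + [center_x, center_y, width, height]
```

In practice such coordinates would probably be normalized by the picture size before being passed to a model, which this sketch leaves out.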

In addition, referring to FIG. 3, an embodiment of the present application further proposes a text paragraph structure restoration apparatus, which includes:

a determination module A10, configured to recognize a target picture and to determine, based on the recognition result, all text boxes in the target picture and the text box position of each text box;

an input module A20, configured to sort the text boxes according to their text box positions and to input, based on the sorting result, the text features of each text box into a preset deep learning model for training;

an obtaining module A30, configured to merge the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.

Further, the obtaining module A30 is also configured to:

determine the text label corresponding to each text box based on the training result, traverse the text labels, and detect whether the traversed label information corresponding to the traversed text label is a paragraph;

Further, the input module A20 is also configured to:

For the steps implemented by each functional module of the text paragraph structure restoration apparatus, reference may be made to the embodiments of the text paragraph structure restoration method of the present application, which are not repeated here.

The present application further provides a text paragraph structure restoration device, which includes a memory, a processor, and a text paragraph structure restoration program stored on the memory; the processor is configured to execute the text paragraph structure restoration program to implement the following steps:

The present application also provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the embodiments of the text paragraph structure restoration method described above.

The specific implementation of the computer-readable storage medium of the present application is basically the same as the embodiments of the text paragraph structure restoration method described above, and is not repeated here.

It should be noted that, herein, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element qualified by the phrase "including a..." does not preclude the presence of additional identical elements in the process, method, article, or system that includes the element.

The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims

1. A text paragraph structure restoration method, wherein the text paragraph structure restoration method comprises the following steps:

recognizing a target picture, and determining, based on a recognition result of the recognition, all text boxes in the target picture and a text box position of each of the text boxes;

sorting the text boxes according to the text box positions, and inputting, based on a sorting result of the sorting, text features of each of the text boxes into a preset deep learning model for training;

merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.
2. The text paragraph structure restoration method according to claim 1, wherein the step of merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture comprises:

determining the text label corresponding to each of the text boxes based on the training result, traversing the text labels, and detecting whether traversed label information corresponding to the traversed text label is a paragraph;

if the traversed label information is a paragraph, determining that the text box corresponding to the traversed text label is a text paragraph corresponding to the target picture.
3. The text paragraph structure restoration method according to claim 2, wherein, after the step of detecting whether the traversed label information corresponding to the traversed text label is a paragraph, the method comprises:

if not, detecting whether the traversed label information is paragraph content;

if the traversed label information is paragraph content, determining whether the label information of the text label immediately preceding the traversed text label is paragraph start information;

if the label information of the previous text label is paragraph start information, determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label.
4. The text paragraph structure restoration method according to claim 3, wherein the step of determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label comprises:

detecting whether consecutive adjacent content labels exist among the text labels;

if consecutive adjacent content labels exist, determining whether the traversed text label is among the consecutive adjacent content labels;

if the traversed text label is not among them, merging the text box corresponding to the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the target picture.
5. The text paragraph structure restoration method according to claim 4, wherein, after the step of determining whether the traversed text label is among the consecutive adjacent content labels, the method comprises:

if the traversed text label is among them, merging all text boxes corresponding to the consecutive adjacent content labels that contain the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the picture.
6. The text paragraph structure restoration method according to any one of claims 1-5, wherein the step of inputting, based on the sorting result of the sorting, the text features of each of the text boxes into a preset deep learning model for training comprises:

extracting the text features of each of the text boxes in turn, fusing the text features into sequence features according to the sorting result of the sorting, and inputting the sequence features into the preset deep learning model for training.
7. The text paragraph structure restoration method according to claim 6, wherein the step of extracting the text features of each of the text boxes in turn comprises:

traversing the text boxes in turn, extracting position features, language features, and image features of the traversed text box, and taking the position features, language features, and image features as the text features of the traversed text box.
8. A text paragraph structure restoration apparatus, wherein the text paragraph structure restoration apparatus comprises:

a determination module, configured to recognize a target picture and to determine, based on a recognition result of the recognition, all text boxes in the target picture and a text box position of each of the text boxes;

an input module, configured to sort the text boxes according to the text box positions and to input, based on a sorting result of the sorting, text features of each of the text boxes into a preset deep learning model for training;

an obtaining module, configured to merge the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.
9. A text paragraph structure restoration device, wherein the text paragraph structure restoration device comprises a memory, a processor, and a text paragraph structure restoration program stored on the memory and executable on the processor, and the text paragraph structure restoration program, when executed by the processor, implements the following steps:

recognizing a target picture, and determining, based on a recognition result of the recognition, all text boxes in the target picture and a text box position of each of the text boxes;

sorting the text boxes according to the text box positions, and inputting, based on a sorting result of the sorting, text features of each of the text boxes into a preset deep learning model for training;

merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.
10. The text paragraph structure restoration device according to claim 9, wherein the step of merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture comprises:

determining the text label corresponding to each of the text boxes based on the training result, traversing the text labels, and detecting whether traversed label information corresponding to the traversed text label is a paragraph;

if the traversed label information is a paragraph, determining that the text box corresponding to the traversed text label is a text paragraph corresponding to the target picture.
11. The text paragraph structure restoration device according to claim 10, wherein, after the step of detecting whether the traversed label information corresponding to the traversed text label is a paragraph, the steps comprise:

if not, detecting whether the traversed label information is paragraph content;

if the traversed label information is paragraph content, determining whether the label information of the text label immediately preceding the traversed text label is paragraph start information;

if the label information of the previous text label is paragraph start information, determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label.
12. The text paragraph structure restoration device according to claim 11, wherein the step of determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label comprises:

detecting whether consecutive adjacent content labels exist among the text labels;

if consecutive adjacent content labels exist, determining whether the traversed text label is among the consecutive adjacent content labels;

if the traversed text label is not among them, merging the text box corresponding to the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the target picture.
The text paragraph structure restoration device according to claim 12, wherein, after the step of determining whether the traversed text label is among the consecutive adjacent content labels, the steps further comprise:

If the traversed text label is among them, merging all text boxes corresponding to the consecutive adjacent content labels that include the traversed text label with the text box corresponding to the preceding text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the picture.
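The label-driven merging recited in the three preceding claims can be illustrated with a minimal sketch. It is not the claimed implementation: the label names `"P"` (stand-alone paragraph), `"B"` (paragraph start) and `"I"` (paragraph content), the `TextBox` type, and the treatment of a content label that has no preceding paragraph start are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBox:
    """Axis-aligned text box; field names are hypothetical."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def merge(a: TextBox, b: TextBox) -> TextBox:
    """Union of two bounding boxes, with the texts concatenated in order."""
    return TextBox(min(a.x0, b.x0), min(a.y0, b.y0),
                   max(a.x1, b.x1), max(a.y1, b.y1),
                   a.text + b.text)


def restore_paragraphs(boxes: List[TextBox], labels: List[str]) -> List[TextBox]:
    """Traverse the sorted boxes and their labels, merging B + I... runs."""
    paragraphs: List[TextBox] = []
    current: Optional[TextBox] = None  # paragraph being assembled, if any
    for box, label in zip(boxes, labels):
        if label == "B":
            # A paragraph start closes any open paragraph and opens a new one.
            if current is not None:
                paragraphs.append(current)
            current = box
        elif label == "I" and current is not None:
            # Content following a start (or following earlier content of the
            # same run) is merged into the open paragraph.
            current = merge(current, box)
        else:
            # "P" is a complete paragraph on its own. An "I" with no open
            # paragraph is also emitted as-is (assumption, not in the claims).
            if current is not None:
                paragraphs.append(current)
                current = None
            paragraphs.append(box)
    if current is not None:
        paragraphs.append(current)
    return paragraphs
```

With one start label followed by a single content label, the function merges two boxes, as in the claim 12 case; with a longer run of content labels, every box in the run is merged with the start box, as in the claim 13 case.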
The text paragraph structure restoration device according to any one of claims 9 to 13, wherein the step of inputting the text features of each of the text boxes into a preset deep learning model for training based on the sorting result comprises:

Extracting the text features of each of the text boxes in turn, fusing the text features into a sequence feature according to the sorting result, and inputting the sequence feature into the preset deep learning model for training.
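As a rough illustration of fusing per-box features into a sequence feature, the sketch below builds one feature vector per text box and stacks the vectors in sorted order. The claim does not specify which text features are used; the normalized geometry and text length chosen here, the tuple layout `(x0, y0, x1, y1, text)`, and the function names are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1, text)
Box = Tuple[float, float, float, float, str]


def box_features(box: Box, page_w: float, page_h: float) -> List[float]:
    """Per-box feature vector: normalized position, size, and text length."""
    x0, y0, x1, y1, text = box
    return [
        x0 / page_w,           # left edge, relative to page width
        y0 / page_h,           # top edge, relative to page height
        (x1 - x0) / page_w,    # relative width
        (y1 - y0) / page_h,    # relative height
        float(len(text)),      # number of characters
    ]


def fuse_sequence(sorted_boxes: List[Box], page_w: float, page_h: float) -> List[List[float]]:
    """Stack the per-box vectors in sorted order to form the sequence feature."""
    return [box_features(b, page_w, page_h) for b in sorted_boxes]
```

The resulting list of vectors has one time step per text box, which is the shape a sequence-labeling model would typically consume; the model itself is outside the scope of this sketch.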
A computer storage medium, wherein a text paragraph structure restoration program is stored on the computer storage medium, and the text paragraph structure restoration program, when executed by a processor, implements the following steps:

Recognizing a target picture, and determining, based on the recognition result, all text boxes in the target picture and the text box position of each of the text boxes;

Sorting the text boxes according to the text box positions, and, based on the sorting result, inputting the text features of each of the text boxes into a preset deep learning model for training;

Merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.
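The sorting step can be sketched as follows. The claim only states that text boxes are sorted according to their positions; the top-to-bottom, then left-to-right reading order used here, and the tuple layout `(x0, y0, x1, y1, text)`, are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1, text)
Box = Tuple[float, float, float, float, str]


def sort_boxes(boxes: List[Box]) -> List[Box]:
    """Order boxes by top edge first, then by left edge (assumed reading order)."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))
```

A real layout with skewed or multi-column text would likely need a more tolerant ordering, for example grouping boxes into rows before sorting within each row; that refinement is not shown here.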
The computer storage medium according to claim 15, wherein the step of merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture comprises:

Determining the text label corresponding to each of the text boxes based on the training result, traversing the text labels, and detecting whether the traversed label information corresponding to the traversed text label is a paragraph;

If the traversed label information is a paragraph, determining that the text box corresponding to the traversed text label is a text paragraph corresponding to the target picture.
The computer storage medium according to claim 16, wherein, after the step of detecting whether the traversed label information corresponding to the traversed text label is a paragraph, the steps further comprise:

If not, detecting whether the traversed label information is paragraph content;

If the traversed label information is paragraph content, determining whether the label information of the text label immediately preceding the traversed text label is paragraph start information;

If the label information of the preceding text label is paragraph start information, determining the text paragraph corresponding to the target picture based on the traversed text label and the preceding text label.
The computer storage medium according to claim 17, wherein the step of determining the text paragraph corresponding to the target picture based on the traversed text label and the preceding text label comprises:

Detecting whether consecutive adjacent content labels exist among the text labels;

If consecutive adjacent content labels exist, determining whether the traversed text label is among the consecutive adjacent content labels;

If the traversed text label is not among them, merging the text box corresponding to the traversed text label with the text box corresponding to the preceding text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the target picture.
The computer storage medium according to claim 18, wherein, after the step of determining whether the traversed text label is among the consecutive adjacent content labels, the steps further comprise:

If the traversed text label is among them, merging all text boxes corresponding to the consecutive adjacent content labels that include the traversed text label with the text box corresponding to the preceding text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the picture.
The computer storage medium according to any one of claims 15 to 19, wherein the step of inputting the text features of each of the text boxes into a preset deep learning model for training based on the sorting result comprises:

Extracting the text features of each of the text boxes in turn, fusing the text features into a sequence feature according to the sorting result, and inputting the sequence feature into the preset deep learning model for training.