CN114550175A - Image processing method, image processing device, electronic equipment and computer readable storage medium

Info

Publication number: CN114550175A
Application number: CN202210147338.4A
Authority: CN
Inventors: 尹康
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2022-02-17
Filing date: 2022-02-17
Publication date: 2022-05-27

Abstract

The application relates to an image processing method, an apparatus, an electronic device, a storage medium and a computer program product. The method comprises: determining a plurality of character regions in an image to be processed; performing, through a first processing thread, preprocessing in character recognition processing on the character regions to obtain a preprocessing result; performing, through a second processing thread, inference processing in the character recognition processing on the preprocessing result to obtain an inference result; and performing, through a third processing thread, post-processing in the character recognition processing on the inference result to obtain a character recognition result of the image to be processed; wherein the second processing thread is parallel to at least one of the first processing thread and the third processing thread. The method can improve image processing efficiency.

Description

Image processing method, image processing device, electronic equipment and computer readable storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing apparatus, an electronic device, a computer-readable storage medium, and a computer program product.

Background

With the development of computer technology, computer vision technology has been widely used in fields such as retail, manufacturing, medical treatment, autonomous driving, and agriculture. Optical Character Recognition (OCR) is an important branch of computer vision technology. OCR refers to the process of analyzing and recognizing an image of text to obtain its characters and layout information, that is, recognizing the characters in an image to obtain a text recognition result. However, current optical character recognition processing is inefficient.

Disclosure of Invention

The embodiment of the application provides an image processing method, an image processing device, an electronic device, a computer readable storage medium and a computer program product, which can improve the image processing efficiency.

An image processing method comprising:

determining a plurality of character areas in an image to be processed;

preprocessing the character region in character recognition processing through a first processing thread to obtain a preprocessing result;

performing inference processing in the character recognition processing on the preprocessing result through a second processing thread to obtain an inference result;

performing post-processing in character recognition processing on the inference result through a third processing thread to obtain a character recognition result of the image to be processed;

wherein the second processing thread is in parallel with at least one of the first processing thread and the third processing thread.

An image processing apparatus comprising:

the character area determining module is used for determining a plurality of character areas in the image to be processed;

the preprocessing module is used for performing preprocessing in character recognition processing on the character areas through a first processing thread to obtain a preprocessing result;

the inference module is used for performing inference processing in the character recognition processing on the preprocessing result through a second processing thread to obtain an inference result;

the post-processing module is used for performing post-processing in character recognition processing on the inference result through a third processing thread to obtain a character recognition result of the image to be processed;

An electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

determining a plurality of character areas in an image to be processed;

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

determining a plurality of character areas in an image to be processed;

A computer program product comprising a computer program which when executed by a processor implements the steps of:

determining a plurality of character areas in an image to be processed;

performing reasoning processing in the character recognition processing on the preprocessing result through a second processing thread to obtain a reasoning result;

According to the image processing method, apparatus, electronic device, computer-readable storage medium and computer program product described above, for a plurality of character areas in the image to be processed, preprocessing in the character recognition processing is performed on the character areas through the first processing thread, inference processing is performed on the preprocessing result through the second processing thread, and post-processing is performed on the inference result through the third processing thread to obtain the character recognition result of the image to be processed, the second processing thread being parallel to at least one of the first processing thread and the third processing thread. Because the preprocessing, the inference processing and the post-processing are executed by different processing threads, and the thread executing the inference processing is parallel to at least one of the thread executing the preprocessing and the thread executing the post-processing, the stages of the character recognition processing run in parallel and image processing efficiency is improved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a diagram of an application environment of an image processing method in one embodiment;

FIG. 2 is a flow diagram of a method of image processing in one embodiment;

FIG. 3 is a flow chart of an image processing method in another embodiment;

FIG. 4 is a diagram illustrating the time consumption of character recognition processing in one embodiment;

FIG. 5 is a schematic diagram illustrating the time consumption of sequential execution in one embodiment;

FIG. 6 is a diagram illustrating the time consumption of pipeline execution in one embodiment;

FIG. 7 is a diagram illustrating an average elapsed time of a character recognition process according to an embodiment;

FIG. 8 is a diagram illustrating an average elapsed time of a character recognition process according to another embodiment;

FIG. 9 is a block diagram showing the configuration of an image processing apparatus according to an embodiment;

FIG. 10 is a diagram illustrating the internal architecture of an electronic device in one embodiment;

FIG. 11 is a diagram illustrating the internal structure of an electronic device in another embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The image processing method provided by the embodiments of the application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104, or located on the cloud or another network server. The terminal 102 may send the image to be processed to the server 104 through the network. After receiving the image, the server 104 may determine a plurality of character regions in it. For these character regions, the server 104 performs preprocessing in the character recognition processing through a first processing thread, performs inference processing on the preprocessing result through a second processing thread, and performs post-processing on the inference result through a third processing thread to obtain the character recognition result of the image to be processed, where the second processing thread is parallel to at least one of the first processing thread and the third processing thread. The server 104 may also return the character recognition result to the terminal 102. In addition, the image processing method may be implemented by the server 104 alone or by the terminal 102 alone.

The terminal 102 may be, but is not limited to, a desktop computer, notebook computer, smart phone, tablet computer, Internet of Things device, or portable wearable device. The Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, a smart bracelet, a head-mounted device, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2, an image processing method is provided. The method is described by taking its application to the server or the terminal in FIG. 1 as an example, and includes the following steps:

Step 202, determining a plurality of character areas in the image to be processed.

The image to be processed is an image that requires processing, specifically an image on which character recognition processing is to be performed. It may be captured by the terminal. The image contains text, which can be recognized by performing character recognition processing on the image. A character area is an area of the image to be processed in which characters have been detected; in a specific application, a character area may be marked by a bounding rectangle enclosing the characters. The image to be processed may contain several character areas, and character recognition is performed on these areas to recognize the text in the image.

Specifically, the server determines a plurality of character areas in the image to be processed. In a specific implementation, the server may acquire the image to be processed and perform character detection on it, thereby detecting the character areas and locating the characters in the image, which may be marked with bounding rectangles. Alternatively, character area detection may be carried out in advance, in which case the server directly acquires the plurality of character areas of the image to be processed and performs character recognition processing on them.

Step 204, performing preprocessing in the character recognition processing on the character regions through a first processing thread to obtain a preprocessing result.

A thread is the smallest unit of execution that an operating system can schedule. The first processing thread is used to perform the preprocessing in the character recognition processing. The character recognition processing may be OCR processing, which aims to automatically extract and recognize the characters in a target image or video through computer vision technology and is currently one of the most widely applied technologies in the field. The specific preprocessing operations depend on the algorithm adopted for the character recognition processing, and may include cropping the character region out of the original image according to its coordinates, size normalization, value normalization, binarization, and the like.

Specifically, the server may perform preprocessing in the character recognition processing on the character regions in the image to be processed through the first processing thread, that is, repeatedly execute the preprocessing step in the character recognition processing through the first processing thread, so as to perform preprocessing on a plurality of character regions in the image to be processed, respectively, to obtain a preprocessing result.
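As an illustration only, the preprocessing operations named above (cropping by coordinates, size normalization, value normalization) might be sketched as follows. The function names, the nearest-neighbour resize, the target height of 4 pixels and the 0..255 grayscale representation are all assumptions made for this sketch; the application does not prescribe them.

```python
from typing import List, Tuple

Image = List[List[int]]          # hypothetical grayscale image: rows of 0..255 pixel values
Box = Tuple[int, int, int, int]  # hypothetical region coordinates: (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the character region out of the original image by its coordinates."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def resize_height(region: Image, target_h: int) -> Image:
    """Size normalization: nearest-neighbour resize to a fixed height, keeping the aspect ratio."""
    src_h, src_w = len(region), len(region[0])
    target_w = max(1, round(src_w * target_h / src_h))
    return [
        [region[y * src_h // target_h][x * src_w // target_w] for x in range(target_w)]
        for y in range(target_h)
    ]


def normalize(region: Image) -> List[List[float]]:
    """Value normalization: map 0..255 pixel values to 0.0..1.0."""
    return [[p / 255.0 for p in row] for row in region]


def preprocess(image: Image, box: Box, target_h: int = 4) -> List[List[float]]:
    """Run the three preprocessing operations on one character region."""
    return normalize(resize_height(crop(image, box), target_h))
```

Because every region is resized to the same height while keeping its aspect ratio, the outputs differ only in width, which matches the later observation that character regions share a height but not a width.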

Step 206, performing inference processing in the character recognition processing on the preprocessing result through a second processing thread to obtain an inference result.

The inference processing is the recognition step proper, in which character recognition is performed on the preprocessing result, for example through a pre-trained artificial neural network model. The inference processing may be implemented by models such as a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), or an LSTM (Long Short-Term Memory) network to obtain the inference result.

Specifically, the server may obtain, through the second processing thread, the preprocessing result produced by the first processing thread, and perform inference processing in the character recognition processing on it through the second processing thread to obtain the inference result.

Step 208, performing post-processing in character recognition processing on the inference result through a third processing thread to obtain a character recognition result of the image to be processed; wherein the second processing thread is in parallel with at least one of the first processing thread and the third processing thread.

The specific post-processing operations depend on the algorithm adopted for the character recognition processing, and may include traversing the probability vectors output by the model, dictionary mapping, and the like. The character recognition result is the result obtained after character recognition processing is performed on the image to be processed, and may specifically be the text in the image. The second processing thread is parallel to at least one of the first processing thread and the third processing thread. Specifically, the second processing thread may be parallel to the first processing thread: while the first processing thread performs preprocessing, the second processing thread performs inference processing on preprocessing results that the first processing thread has already produced. The second processing thread may also be parallel to the third processing thread: while the second processing thread performs inference processing, the third processing thread performs post-processing on inference results that the second processing thread has already produced. All three threads may also run in parallel, with the first processing thread performing preprocessing, the second performing inference processing, and the third performing post-processing, so that the character recognition processing is pipelined and image processing efficiency is improved.

Specifically, the server may perform, through the third processing thread, post-processing on the inference result obtained by the second processing thread, to obtain the character recognition result of the image to be processed. The first, second and third processing threads are different threads, and the second processing thread can run in parallel with at least one of the first processing thread and the third processing thread, which improves image processing efficiency.

In the image processing method, for a plurality of character areas in the image to be processed, the character areas are preprocessed in the character recognition processing through the first processing thread, the preprocessing result is subjected to reasoning processing through the second processing thread, the reasoning result is subjected to post-processing through the third processing thread, and the character recognition result of the image to be processed is obtained, wherein the second processing thread is parallel to at least one of the first processing thread and the third processing thread. In the image processing process, the preprocessing, the reasoning processing and the post-processing in the character recognition processing are respectively executed through different processing threads, and the thread for executing the reasoning processing is parallel to at least one thread of the thread for executing the preprocessing and the thread for executing the post-processing, so that the parallel processing in the character recognition processing is realized, and the image processing efficiency is improved.
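The efficiency gain from running the three stages in parallel can be illustrated with a simple timing model. The sketch below assumes that every item takes the same time in each stage, that the three threads truly overlap, and that hand-off between stages costs nothing; these are simplifying assumptions, and the stage times used are made-up figures rather than measurements from the application.

```python
def sequential_time(n: int, t_pre: float, t_inf: float, t_post: float) -> float:
    """Total time when each item passes through all three stages before the next item starts."""
    return n * (t_pre + t_inf + t_post)


def pipelined_time(n: int, t_pre: float, t_inf: float, t_post: float) -> float:
    """Total time for an idealized three-stage pipeline with constant stage times.

    The first item needs all three stages; each later item finishes one
    bottleneck-stage interval after the previous one.
    """
    if n == 0:
        return 0.0
    return (t_pre + t_inf + t_post) + (n - 1) * max(t_pre, t_inf, t_post)


# Hypothetical figures: 10 batches, 2 ms preprocessing, 5 ms inference, 1 ms post-processing.
seq = sequential_time(10, 2.0, 5.0, 1.0)   # 80.0 ms
pipe = pipelined_time(10, 2.0, 5.0, 1.0)   # 53.0 ms
```

Under these assumptions the pipelined total approaches the cost of the slowest stage alone as the number of items grows, which is why the inference stage dominates the achievable speed-up.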

In one embodiment, as shown in FIG. 3, performing preprocessing in the character recognition processing on the character regions through the first processing thread to obtain a preprocessing result includes:

Step 302, through the first processing thread, sequentially obtaining a batch processing number of character regions from a character region sequence and performing preprocessing in the character recognition processing on them to obtain a batch preprocessing result; the character region sequence is obtained by sorting the character regions according to region size.

A plurality of character regions may be detected in the image to be processed, and their sizes differ. The character detection boxes have the same height but may differ in width: the more characters a region contains, the wider its detection box. The size of a character region may therefore refer to its width, and the character regions in the image to be processed may be sorted by width to obtain the character region sequence, either from small to large or from large to small. The batch processing number is the number of character regions handled in each batch when the character regions are processed in batches, and may be regarded as the batch size. It can be set according to actual needs to ensure stable image processing and improve processing efficiency. The batch preprocessing result is the preprocessing result obtained by preprocessing a batch processing number of character regions.

Specifically, the server may obtain the batch processing number of character regions in sequence from the sequence of character regions through the first processing thread, and perform preprocessing in the character recognition processing on the obtained character regions in batches to obtain a batch preprocessing result. In specific application, after a plurality of character areas in an image to be processed are determined, the character areas can be sequenced according to the area size of each character area to obtain a character area sequence. The server can continuously acquire character areas with batch processing quantity in sequence from the character area sequence through the first processing thread so as to acquire character areas needing batch preprocessing, and the server can carry out batch preprocessing on the acquired character areas through the first processing thread so as to acquire batch preprocessing results.
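A minimal sketch of building the character region sequence and drawing batches from it is given below. The box layout `(left, top, right, bottom)`, the ascending sort order and the helper names are assumptions for illustration; each region's original position is kept alongside it so that results can later be matched back to the image.

```python
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]   # hypothetical layout: (left, top, right, bottom)
IndexedBox = Tuple[int, Box]      # (original index in the image, box)


def build_region_sequence(boxes: List[Box]) -> List[IndexedBox]:
    """Sort regions by width (small to large), remembering each region's original index."""
    return sorted(enumerate(boxes), key=lambda item: item[1][2] - item[1][0])


def iter_batches(sequence: List[IndexedBox], batch_size: int) -> Iterator[List[IndexedBox]]:
    """Yield consecutive groups of at most batch_size regions from the sorted sequence."""
    for start in range(0, len(sequence), batch_size):
        yield sequence[start:start + batch_size]


# Example with made-up boxes whose widths are 50, 10, 30, 20 and 40.
boxes = [(0, 0, 50, 10), (0, 20, 10, 30), (0, 40, 30, 50), (0, 60, 20, 70), (0, 80, 40, 90)]
sequence = build_region_sequence(boxes)
batches = list(iter_batches(sequence, batch_size=2))
```

Sorting by width before batching places regions of similar width in the same batch, so less filling is needed when each batch is later brought to a uniform size.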

Further, performing inference processing in the character recognition processing on the preprocessing result through the second processing thread to obtain an inference result includes:

Step 304, performing inference processing in the character recognition processing on the batch preprocessing result through the second processing thread to obtain a batch inference result.

Specifically, for the batch preprocessing result obtained by performing batch preprocessing on the character region by the first processing thread, the server may perform batch inference processing on the batch preprocessing result by the second processing thread to obtain a batch inference result.

Further, performing post-processing in the character recognition processing on the inference result through the third processing thread to obtain the character recognition result of the image to be processed includes:

Step 306, performing post-processing in the character recognition processing on the batch inference result through the third processing thread to obtain the character recognition result of the image to be processed.

Specifically, for the batch inference result obtained when the second processing thread performs batch inference processing on the batch preprocessing result, the server can perform post-processing in the character recognition processing on the batch inference result through the third processing thread to obtain the character recognition result of the image to be processed.

In this embodiment, for a character region sequence obtained by sorting character regions according to the size of the region, the character regions of the batch processing number are obtained from the character region sequence, and the character recognition processing is performed in batch, so that the character regions in the image to be processed can be subjected to batch character recognition processing, the number of times of the character recognition processing can be reduced, and the image processing efficiency can be improved.

In one embodiment, sequentially obtaining a batch processing number of character regions from the character region sequence through the first processing thread and performing preprocessing in the character recognition processing on them to obtain a batch preprocessing result includes: sequentially obtaining, through the first processing thread, a batch of character regions of the batch processing number from the character region sequence; performing, through the first processing thread, preprocessing in the character recognition processing on each character region of the batch to obtain the preprocessing results corresponding to the batch of character regions; and performing, through the first processing thread, data expansion processing on those preprocessing results to obtain the batch preprocessing result.

The batch processing number is the number of character areas to be batch processed per batch processing when the character areas are batch processed, and may be considered as a batch size. The batch processing quantity can be set according to actual needs so as to ensure the stability of image processing and improve the processing efficiency. The batch character area is batch data composed of a batch processing number of character areas.

Specifically, the server obtains, through the first processing thread, the character region sequence produced by sorting the character regions according to region size, and sequentially obtains from it a batch of character regions of the batch processing number. The server then performs, through the first processing thread, preprocessing in the character recognition processing on the batch of character regions to obtain the corresponding preprocessing results: each character region in the batch is preprocessed separately, and the preprocessing results of the batch are assembled from the results of the individual regions. When the second processing thread performs inference processing, the data it processes must be of uniform size, whereas the preprocessing result of each character region has the size of that region. The preprocessing results corresponding to the batch of character regions therefore need to be expanded to a uniform size. Specifically, the server performs data expansion processing on them through the first processing thread to obtain the batch preprocessing result.

During specific implementation, the server may perform data filling on the preprocessing results corresponding to the batch character areas through the first processing thread, so as to expand the preprocessing results corresponding to the batch character areas to the same size. For example, the server may perform 0-padding on the preprocessing result corresponding to the batch character area through the first processing thread, and specifically may perform 0-padding on the preprocessing result corresponding to the batch character area according to the character area with the largest area in the batch character area, so that the size of the preprocessing result corresponding to the batch character area is the same as the size of the area of the character area with the largest area in the batch character area.
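The zero-filling described above might look like the following sketch. It assumes that all preprocessing results in a batch already share the same height and differ only in width, and it pads on the right; the padding side and the function name are assumptions, since the application only states that the results are filled with 0 up to the size of the largest region in the batch.

```python
from typing import List

Matrix = List[List[float]]  # one preprocessing result: rows of normalized pixel values


def pad_batch(results: List[Matrix]) -> List[Matrix]:
    """Right-pad every preprocessing result with zeros to the width of the widest one."""
    max_w = max(len(result[0]) for result in results)
    return [
        [row + [0.0] * (max_w - len(row)) for row in result]
        for result in results
    ]


# Two made-up results of height 2 with widths 2 and 3.
narrow = [[0.1, 0.2], [0.3, 0.4]]
wide = [[0.5, 0.6, 0.7], [0.8, 0.9, 1.0]]
batch = pad_batch([narrow, wide])
```

After padding, every element of `batch` has the same shape, so the batch can be handed to the inference stage as a single input of uniform size.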

In this embodiment, after a batch of character regions of the batch processing number is obtained from the character region sequence through the first processing thread, the regions are preprocessed and the corresponding preprocessing results undergo data expansion processing to obtain a batch preprocessing result of uniform size. This facilitates the subsequent batch inference processing of the batch preprocessing result and helps improve image processing efficiency.

In one embodiment, performing post-processing in the character recognition processing on the batch inference result through the third processing thread to obtain the character recognition result of the image to be processed includes: splitting the batch inference result through the third processing thread to obtain the region inference results corresponding to the batch of character regions; and performing, through the third processing thread, post-processing in the character recognition processing on each region inference result to obtain the character recognition result of the image to be processed.

The batch inference result is the inference result obtained when the second processing thread performs inference in batch, and comprises the inference result corresponding to each character region in the batch. A region inference result is the inference result corresponding to one character region in the batch.

Specifically, the server splits the obtained batch inference result through the third processing thread to obtain the region inference results corresponding to the batch of character regions. In a specific implementation, the inference results corresponding to the individual character regions are all the same size, so the batch inference result can be split into equal parts according to the batch processing number to obtain the region inference results. The server then performs, through the third processing thread, post-processing in the character recognition processing on each region inference result. Every character region in the batch is thus post-processed and its character recognition result obtained, yielding the character recognition result of the image to be processed.

In this embodiment, the obtained batch inference result is split by the third processing thread and post-processing in the character recognition processing is then performed on each part, so that the character regions undergo batch character recognition processing, which helps improve image processing efficiency.
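The splitting and post-processing steps can be sketched as below. The sketch assumes that the batch inference result is a flat list of per-time-step probability vectors, with the same number of time steps for every region, and that post-processing is a greedy decode that takes the most probable class at each step, skips a blank class and merges consecutive repeats. That decoding rule is a common choice swapped in for illustration; the application itself only mentions traversing the output probability vectors and dictionary mapping.

```python
from typing import Dict, List

ProbVector = List[float]  # class probabilities for one time step


def split_batch(batch_output: List[ProbVector], batch_size: int) -> List[List[ProbVector]]:
    """Split the batch inference result into equal parts, one per character region."""
    if len(batch_output) % batch_size != 0:
        raise ValueError("batch output cannot be split evenly")
    step = len(batch_output) // batch_size
    return [batch_output[i * step:(i + 1) * step] for i in range(batch_size)]


def decode(region_output: List[ProbVector], dictionary: Dict[int, str], blank: int = 0) -> str:
    """Traverse the probability vectors and map the chosen class indices to characters."""
    chars = []
    previous = blank
    for probs in region_output:
        index = max(range(len(probs)), key=probs.__getitem__)  # most probable class
        if index != blank and index != previous:
            chars.append(dictionary[index])  # dictionary mapping
        previous = index
    return "".join(chars)


# Made-up output for a batch of 2 regions with 3 time steps each and classes {blank, "a", "b"}.
dictionary = {1: "a", 2: "b"}
batch_output = [
    [0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8],    # region 0
    [0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1],  # region 1
]
texts = [decode(part, dictionary) for part in split_batch(batch_output, 2)]
```

When the last batch drawn from the character region sequence holds fewer regions than the batch processing number, `batch_size` here would be the number of regions actually in that batch.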

In one embodiment, the image processing method further comprises: and matching the character region in the character region sequence with the character recognition result through a second processing thread to obtain a character recognition result matched with the character region in the image to be processed.

When the batch character recognition processing is performed on each character area, the character areas are sorted according to the size of the area, the actual processing sequence of each character area is different from the distribution position of the character area in the image to be processed, and after the character recognition processing is completed, each character recognition result needs to be matched with each character area in the image to be processed, so that the accuracy of the character recognition result is ensured.

Specifically, the server may match the character region in the character region sequence with the character recognition result through the second processing thread, for example, the server may match each character region with the character recognition result through the second processing thread according to the index information of each character region in the character region sequence, thereby obtaining a character recognition result matched with the character region in the image to be processed, and ensuring correspondence between the character recognition result and each character region. The index information may be generated when the character regions are ordered to form a character region sequence.

In the embodiment, each character recognition result is matched with each character area in the image to be processed, so that the accuracy of the character recognition result is ensured.
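One way to carry out this matching, assuming the index information is simply each region's original position recorded when the character region sequence was formed, is sketched below. The function and variable names are hypothetical.

```python
from typing import List, Optional


def match_results(sorted_indices: List[int], texts: List[str], region_count: int) -> List[Optional[str]]:
    """Place each recognition result back at the original position of its character region.

    sorted_indices[k] is the original index of the k-th region in the sorted sequence,
    and texts[k] is the recognition result produced for that region.
    """
    matched: List[Optional[str]] = [None] * region_count
    for original_index, text in zip(sorted_indices, texts):
        matched[original_index] = text
    return matched


# Made-up example: regions were processed in the order 1, 3, 2, 4, 0.
restored = match_results([1, 3, 2, 4, 0], ["b", "d", "c", "e", "a"], region_count=5)
```

The returned list is ordered like the character regions in the image to be processed, regardless of the order in which the batches were recognized.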

In one embodiment, the second processing thread is a main thread, and the first processing thread and the third processing thread are sub-threads.

The main thread is the thread running in the foreground, of which there is generally only one. Making the second processing thread the main thread means that the inference processing, which takes longer, is executed by the main thread, safeguarding its processing efficiency. Sub-threads are threads running in the background, and their number is not fixed; the first processing thread and the third processing thread are sub-threads. Specifically, the server executes the second processing thread as the main thread and the first and third processing threads as sub-threads, so that the more time-consuming inference processing runs on the main thread, which has stronger computing power. This avoids blocking during parallel character recognition processing and safeguards image processing efficiency.

In one embodiment, after the character region is preprocessed in the character recognition processing through the first processing thread to obtain the preprocessing result, the method further includes: storing the preprocessing result into a preprocessing result queue through the first processing thread.

The preprocessing result queue is a preset storage queue and is used for storing preprocessing results. Specifically, after the server preprocesses the character area through the first processing thread, the server stores the preprocessing result into the preprocessing result queue through the first processing thread.

Further, performing inference processing in the character recognition processing on the preprocessing result through the second processing thread to obtain an inference result includes: acquiring the preprocessing result from the preprocessing result queue through the second processing thread, performing the inference processing in the character recognition processing, and storing the obtained inference result into an inference result queue.

The inference result queue is a preset storage queue used for storing inference results. Specifically, the server obtains the preprocessing result from the preprocessing result queue through the second processing thread, and performs inference processing in the character recognition processing on the obtained preprocessing result to obtain an inference result. The server then stores the obtained inference result into the inference result queue through the second processing thread.

Further, performing post-processing in the character recognition processing on the inference result through the third processing thread to obtain a character recognition result of the image to be processed includes: acquiring the inference result from the inference result queue through the third processing thread, performing the post-processing in the character recognition processing, and storing the obtained character recognition result into a recognition result queue.

The recognition result queue is a preset storage queue used for storing character recognition results. Specifically, the server acquires the inference result from the inference result queue through the third processing thread, and performs post-processing in the character recognition processing on the acquired inference result to obtain a character recognition result. The server then stores the obtained character recognition result into the recognition result queue through the third processing thread.

Further, the image processing method further includes: obtaining the character recognition result of the image to be processed from the recognition result queue through the second processing thread.

Specifically, after the third processing thread stores the obtained character recognition result in the recognition result queue, the server may further read the character recognition result of the image to be processed from the recognition result queue through the second processing thread, so that the character recognition result of the image to be processed may be output or displayed.

In this embodiment, the respective processing results of the first processing thread, the second processing thread and the third processing thread may be stored in the corresponding preset storage queues, so as to avoid the blocking problem during the character recognition parallel processing, thereby ensuring the processing efficiency of the character recognition parallel processing and the processing efficiency of the image.

The application also provides an application scenario to which the above image processing method is applied. Specifically, the image processing method is applied in this application scenario as follows:

This embodiment relates to performing OCR (Optical Character Recognition) on an image. OCR aims to automatically extract and recognize characters in a target image or video by means of computer vision technology, and is currently one of the most widely applied technologies in the field of computer vision. An OCR algorithm generally contains two major parts: a detection process, which locates the character regions, and a recognition process, which extracts the characters of each character region, that is, maps pixel values to a character string. Generally, after an image to be recognized is input, OCR detection is first performed on the whole image to obtain N character regions, which can be presented in the form of bounding rectangles; the N character regions are then recognized one by one using an OCR recognition algorithm to obtain the final result. When the input images are of similar size, OCR detection obtains all detection boxes in a single pass, so the time consumed by the detection step is approximately constant. However, since the OCR recognition step is performed on the character regions one by one, the time consumed by the recognition step is positively correlated with the number of characters in the input image. At present, OCR recognition can be implemented based on a CNN, and a complete OCR recognition process generally includes three modules: a preprocessing module, a model inference module and a post-processing module. The preprocessing module and the post-processing module can run on a CPU (Central Processing Unit), while the inference module runs on a GPU (Graphics Processing Unit). When there are many detection boxes, data is transferred between the CPU and the GPU many times, which wastes a lot of time and leaves the utilization of both the CPU and the GPU low. Moreover, for pictures with many characters, such as documents and newspaper photos, the OCR recognition takes too long, resulting in inefficient image processing.

As shown in fig. 4, in a time-consumption analysis of one OCR process, recognizing a single character region takes about 10.2 ms. Of this, preprocessing takes 0.84 ms (8%), inference processing takes 4.77 ms (47%), and post-processing takes 4.59 ms (45%). For a picture with many characters, for example a document picture with 67 character detection boxes, recognizing the boxes one by one takes about 683 ms, so the OCR processing efficiency is low. Meanwhile, in OCR different operations run on different devices, and only one device works at any given time, so the device utilization is low.

The image processing method of this embodiment performs the OCR recognition process as a batch pipeline. Through reasonable scheduling, the operations of different character regions, namely preprocessing, model inference and post-processing, are carried out simultaneously, which effectively reduces the overall time consumption and improves device utilization. Specifically, suppose that a certain OCR task has 3 character detection boxes in total, and that for a single OCR recognition the preprocessing, the model inference processing and the post-processing each take time t. As shown in fig. 5, when OCR recognition is performed sequentially, one box at a time, a single OCR recognition operation takes 3t, and the 3 character detection boxes take 9t in total. As shown in fig. 6, if pipeline operation is adopted, the OCR recognition operations of different character detection boxes are performed simultaneously: while the inference processing of character detection box 1 is performed, the preprocessing of character detection box 2 is performed; while the post-processing of character detection box 1 is performed, the inference processing of character detection box 2 and the preprocessing of character detection box 3 are performed. The devices can thus be kept fully loaded, and the total time consumption is shortened to 5t. More generally, assume that the preprocessing, the model inference processing and the post-processing take time t1, t2 and t3, respectively, that the maximum of the three is tm, and that there are N character regions in total. Then the total time consumption of sequential execution is N × (t1 + t2 + t3), and the total time consumption of pipeline execution is t1 + t2 + t3 + (N - 1) × tm. When N is greater than 3, the time consumed by OCR recognition can be effectively reduced and the processing efficiency of OCR recognition improved.
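The two timing formulas can be checked with a short calculation. The following is a minimal Python sketch; the function names `sequential_time` and `pipeline_time` are illustrative and do not come from the application, and the model ignores queueing and data-transfer overhead.

```python
def sequential_time(n, t1, t2, t3):
    """Total time when each region finishes all three stages before the next starts."""
    return n * (t1 + t2 + t3)


def pipeline_time(n, t1, t2, t3):
    """Total time when the three stages overlap across regions.

    The first region pays t1 + t2 + t3; every later region adds one
    period of the slowest stage, tm.
    """
    tm = max(t1, t2, t3)
    return t1 + t2 + t3 + (n - 1) * tm


# Three detection boxes, every stage taking t = 1 (the example of fig. 5 and fig. 6).
print(sequential_time(3, 1, 1, 1))  # 9
print(pipeline_time(3, 1, 1, 1))    # 5
```

With a single region the two values coincide, since there is nothing to overlap.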

Further, this embodiment implements the pipelined execution of OCR recognition based on a multithreading mechanism. In particular, considering that the inference processing takes the longest time, the inference processing may be performed in the main thread and the preprocessing and post-processing in sub-threads, in order to avoid queue blocking. In a specific implementation, any one of the operations may be placed on the main thread and the other operations on sub-threads. The specific preprocessing and post-processing in the character recognition processing may be preset. For example, the preprocessing may include cropping the text region out of the original image according to the coordinates of the detection box, size normalization, and value normalization; the post-processing may include traversing the probability vectors output by the model, dictionary mapping, and the like. The server establishes three queues: queue A stores the preprocessing results, queue B stores the inference results output by the model, and queue C stores the character recognition results obtained by post-processing.
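As an illustration of the post-processing step, the following Python sketch traverses the model output probability vectors and maps the selected indices to characters through a dictionary. The application does not specify the decoding rule; the sketch assumes a greedy, CTC-style decode in which index 0 is a blank symbol and immediate repeats are collapsed. The function name `decode` and the sample dictionary are hypothetical.

```python
def decode(prob_vectors, dictionary, blank=0):
    """Map per-step probability vectors to a string (greedy, CTC-style; an assumption)."""
    chars = []
    previous = blank
    for probs in prob_vectors:
        # Traverse the probability vector and keep the index with the highest score.
        best = max(range(len(probs)), key=probs.__getitem__)
        # Skip blanks and immediate repeats, then map the index to a character.
        if best != blank and best != previous:
            chars.append(dictionary[best])
        previous = best
    return "".join(chars)


# Hypothetical 4-class dictionary; index 0 is reserved for the blank symbol.
dictionary = {1: "O", 2: "C", 3: "R"}
outputs = [
    [0.1, 0.7, 0.1, 0.1],    # "O"
    [0.1, 0.7, 0.1, 0.1],    # repeated "O", collapsed
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],    # "C"
    [0.1, 0.1, 0.1, 0.7],    # "R"
]
print(decode(outputs, dictionary))  # OCR
```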

Further, a preprocessing thread PR is initialized. The input of PR is the original image and the N character detection boxes; PR traverses the character detection boxes, performs the preprocessing operation on them one by one, and stores each result into queue A. The end condition of PR may be that the preprocessing of all N character detection boxes is completed. The main thread performs N loops; in a single loop it takes one model input from queue A, that is, obtains a preprocessing result from queue A, executes the model inference processing, and stores the inference result into queue B. When queue A is empty, the main thread blocks and waits. A post-processing thread PO is also initialized. The input of PO is queue B and the number N of detection boxes; PO reads a model output from queue B, that is, acquires an inference result, performs the post-processing operation, and stores the character recognition result into queue C. The end condition of PO may be that the number of post-processing operations reaches N. In addition, after the post-processing loop is finished, the main thread may take the N outputs out of queue C and return them as the OCR recognition result. As shown in fig. 7, in one specific application, the total time consumed by the OCR recognition process is reduced from 683 ms to 384 ms, a reduction of about 44%. In terms of the average time consumed per OCR process, preprocessing takes 0.02 ms, accounting for nearly 0%; inference processing takes 4.97 ms, accounting for 99%; and post-processing takes 0.04 ms, accounting for 1%.
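The thread and queue arrangement described above can be sketched in Python with the standard `threading` and `queue` modules. This is a minimal illustration rather than the implementation of the application: `preprocess`, `infer`, `postprocess` and `recognize` are hypothetical names, and the three stage functions are placeholders for the real image operations and model.

```python
import queue
import threading


def preprocess(region):
    # Placeholder for cropping, size normalization and value normalization.
    return ("pre", region)


def infer(model_input):
    # Placeholder for the model forward pass.
    return ("inf", model_input[1])


def postprocess(model_output):
    # Placeholder for probability-vector traversal and dictionary mapping.
    return "text-" + str(model_output[1])


def recognize(regions):
    n = len(regions)
    queue_a = queue.Queue()  # preprocessing results
    queue_b = queue.Queue()  # inference results
    queue_c = queue.Queue()  # character recognition results

    def pr():
        # Preprocessing thread PR: ends once all N regions are preprocessed.
        for region in regions:
            queue_a.put(preprocess(region))

    def po():
        # Post-processing thread PO: ends after N post-processing operations.
        for _ in range(n):
            queue_c.put(postprocess(queue_b.get()))

    workers = [threading.Thread(target=pr), threading.Thread(target=po)]
    for worker in workers:
        worker.start()

    # Main thread: N loops; Queue.get() blocks while queue A is empty.
    for _ in range(n):
        queue_b.put(infer(queue_a.get()))

    for worker in workers:
        worker.join()
    # FIFO queues with one producer per stage preserve the input order.
    return [queue_c.get() for _ in range(n)]


print(recognize([1, 2, 3]))  # ['text-1', 'text-2', 'text-3']
```

In CPython, threads running pure Python code do not execute in parallel because of the global interpreter lock, so a speed-up of the kind reported here would depend on the stages releasing the lock, for example inside native image-processing or GPU inference calls.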

In addition, OCR recognition can also be implemented as a batch pipeline. OCR recognition can be implemented based on CNN models, which support batch inference, and as the batch size (BS) increases, the time consumption grows far more slowly than linearly. For example, if a single inference takes 10 ms, then for two text regions performing inference twice with BS = 1 takes 20 ms, whereas a single inference with BS = 2 may take only 15 ms. Batch processing can therefore be combined with the pipeline to further improve processing efficiency.

Due to the limitation of the CNN structure, inputs in the same batch must have the same size, so the problem of size selection needs to be solved for batch processing. The OCR recognition algorithm needs the heights of the input character regions to be consistent, for example 32 pixels. In a specific implementation, a fixed value W can be taken as the common width: regions narrower than W are expanded by 0-padding, and regions wider than W are compressed by interpolation. However, W is difficult to determine. When W is too large, a large number of regions need to be expanded, and the time added by the expansion can offset the time saved by batch processing, which affects the OCR processing efficiency. When W is too small, a large number of regions are compressed so that characters stick together, which degrades the recognition and reduces the character recognition accuracy. Alternatively, the width of the widest region in a batch can be selected as the uniform width of that batch, so that no character region is compressed. However, the widths of the character regions are randomly distributed, so each batch may contain both very long and very short character regions, and the large-scale expansion of the very short regions increases the time consumed by OCR.

Based on this, all character detection boxes in the image to be processed can be sorted by width in ascending order, and batches can be constructed according to a certain batch size, such as 4 or 8, so that the character regions in the same batch are relatively close in width and large-scale expansion is avoided. The specific preprocessing and post-processing in the character recognition processing may be preset. For example, the preprocessing may include cropping the text region out of the original image according to the coordinates of the detection box, size normalization, and value normalization; the post-processing may include traversing the probability vectors output by the model, dictionary mapping, and the like. The server establishes three queues: queue A stores the preprocessing results, queue B stores the inference results output by the model, and queue C stores the character recognition results obtained by post-processing.
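The effect of sorting by width before building batches can be illustrated by counting padded columns. In the Python sketch below, the region widths are made-up values, the helper names `make_batches` and `padding_columns` are hypothetical, and each batch is padded to the width of its own widest region.

```python
def make_batches(widths, batch_size):
    """Split a list of widths into consecutive batches of at most batch_size."""
    return [widths[i:i + batch_size] for i in range(0, len(widths), batch_size)]


def padding_columns(batches):
    """Count padded columns when each batch is padded to its own widest region."""
    return sum(max(batch) - width for batch in batches for width in batch)


# Hypothetical region widths in pixels, in detection order.
widths = [300, 40, 280, 60, 50, 310, 45, 290]

unsorted_cost = padding_columns(make_batches(widths, 4))
sorted_cost = padding_columns(make_batches(sorted(widths), 4))
print(unsorted_cost, sorted_cost)  # 1065 105
```

For these sample widths, sorting cuts the padded columns by roughly a factor of ten, because narrow and wide regions no longer share a batch.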

Further, the server sorts all the character detection boxes by width in ascending order and retains the original index information. For example, the box ranked 1 by width, that is, the narrowest one, may be the 18th box in the original order. A preprocessing thread PR is initialized. The input of PR is the original image, the N character detection boxes and the batch size BS. PR takes BS character detection boxes in sequence (or the actual number of remaining boxes when fewer than BS remain), traverses these boxes and performs the preprocessing operation on them one by one, then performs 0-padding with the largest width among these boxes as the target to form batch data, and stores the batch data into queue A. The end condition of PR may be that the preprocessing of all N character detection boxes is completed. The main thread performs N loops; in a single loop it takes one model input from queue A, that is, obtains a preprocessing result from queue A, executes the model inference processing, and stores the inference result into queue B. When queue A is empty, the main thread blocks and waits. A post-processing thread PO is also initialized. The input of PO is queue B and the number N of detection boxes; PO reads a model output from queue B, that is, acquires an inference result, splits the batch output into the outputs of the individual character regions, performs the post-processing operation on each of them, and stores the character recognition results into queue C. The end condition of PO may be that the number of post-processing operations reaches N. After the post-processing loop is finished, the main thread may take the N outputs out of queue C and reorder them according to the original indexes, so that the final character recognition results correspond one to one, in order, with the original detection boxes, and return them as the OCR recognition result.
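The batch variant, including the final reordering by original index, can be sketched as follows. This Python sketch makes several assumptions that are not in the application: each region is represented only by its width, the "model" merely counts the unpadded columns of each row, the inference and post-processing loops run once per batch because queue A holds batch data, and the name `recognize_batched` is hypothetical.

```python
import queue
import threading


def recognize_batched(widths, batch_size):
    """Batch pipeline sketch; each region is represented only by its width."""
    n = len(widths)
    # Sort region indices by width while keeping each region's original index.
    order = sorted(range(n), key=lambda i: widths[i])
    batches = [order[i:i + batch_size] for i in range(0, n, batch_size)]
    queue_a, queue_b, queue_c = queue.Queue(), queue.Queue(), queue.Queue()

    def pr():
        for batch in batches:
            target = max(widths[i] for i in batch)
            # Stand-in for preprocessing plus 0-padding to the widest region.
            rows = [[1] * widths[i] + [0] * (target - widths[i]) for i in batch]
            queue_a.put((batch, rows))

    def po():
        for _ in batches:
            batch, outputs = queue_b.get()
            # Split the batch output into per-region outputs and post-process each.
            for index, output in zip(batch, outputs):
                queue_c.put((index, "width=%d" % output))

    workers = [threading.Thread(target=pr), threading.Thread(target=po)]
    for worker in workers:
        worker.start()

    # Main thread: one inference call per batch (stand-in: count unpadded columns).
    for _ in batches:
        batch, rows = queue_a.get()
        queue_b.put((batch, [sum(row) for row in rows]))

    for worker in workers:
        worker.join()

    # Reorder the N results by original index to match the detection boxes.
    results = [None] * n
    for _ in range(n):
        index, text = queue_c.get()
        results[index] = text
    return results


print(recognize_batched([300, 40, 280, 60, 50], batch_size=2))
# ['width=300', 'width=40', 'width=280', 'width=60', 'width=50']
```

Carrying the original index alongside every batch is what allows the results to be returned in detection order even though the regions are processed in width order.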

As shown in fig. 8, in one specific application, the total time consumed by the OCR recognition process is reduced from 683 ms to 131 ms, a reduction of about 80%. In terms of the average time consumed per OCR process, preprocessing takes 0.04 ms, accounting for nearly 2%; inference processing takes 1.83 ms, accounting for 93%; and post-processing takes 0.09 ms, accounting for 5%.

According to the image processing method, the electronic equipment can simultaneously process different recognition processes of different character areas through the pipeline type reasoning processing flow, the recognition speed is improved by increasing the utilization efficiency of the equipment, and therefore the image processing efficiency is improved.

It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown in sequence as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a part of the steps in the flowcharts related to the embodiments described above may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the present application further provides an image processing apparatus for implementing the image processing method. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the image processing apparatus provided below can refer to the limitations on the image processing method in the foregoing, and details are not repeated here.

In one embodiment, as shown in fig. 9, there is provided an image processing apparatus 900 including: a character region determination module 902, a pre-processing module 904, an inference module 906, and a post-processing module 908, wherein:

a character region determining module 902, configured to determine a plurality of character regions in the image to be processed;

a preprocessing module 904, configured to perform preprocessing in the character recognition processing on the character region through a first processing thread to obtain a preprocessing result;

an inference module 906, configured to perform inference processing in the character recognition processing on the preprocessing result through a second processing thread to obtain an inference result;

a post-processing module 908, configured to perform post-processing in the character recognition processing on the inference result through a third processing thread, to obtain a character recognition result of the image to be processed; wherein the second processing thread is in parallel with at least one of the first processing thread and the third processing thread.

In an embodiment, the preprocessing module 904 is further configured to perform, through the first processing thread, preprocessing in the character recognition processing on a batch processing number of character regions sequentially obtained from a character region sequence, so as to obtain a batch preprocessing result; the character region sequence is obtained by sorting the character regions according to region size; the inference module 906 is further configured to perform inference processing in the character recognition processing on the batch preprocessing result through the second processing thread to obtain a batch inference result; the post-processing module 908 is further configured to perform post-processing in the character recognition processing on the batch inference result through the third processing thread to obtain a character recognition result of the image to be processed.

In one embodiment, the pre-processing module 904 includes a batch region acquisition module, a batch pre-processing module, and a data expansion module; wherein: the batch region obtaining module is used for obtaining batch character regions of batch processing quantity from the character region sequence in sequence through the first processing thread; the batch preprocessing module is used for respectively preprocessing the batch character areas in the character recognition processing through the first processing thread to obtain preprocessing results corresponding to the batch character areas; and the data expansion module is used for performing data expansion processing on the preprocessing result corresponding to the batch character area through the first processing thread to obtain a batch preprocessing result.

In an embodiment, the post-processing module 908 is further configured to split the batch inference result through the third processing thread to obtain region inference results corresponding to the batch character regions, and to perform post-processing in the character recognition processing on each region inference result through the third processing thread to obtain the character recognition result of the image to be processed.

In an embodiment, the image processing apparatus further includes a recognition result matching module, configured to match, by using a second processing thread, a character region in the character region sequence with the character recognition result, so as to obtain a character recognition result matched with the character region in the image to be processed.

In one embodiment, the apparatus further comprises a preprocessing result storage module, configured to store the preprocessing result into a preprocessing result queue through the first processing thread; the inference module 906 is further configured to obtain the preprocessing result from the preprocessing result queue through the second processing thread, perform inference processing in the character recognition processing, and store the obtained inference result in an inference result queue; the post-processing module 908 is further configured to obtain the inference result from the inference result queue through the third processing thread, perform post-processing in the character recognition processing, and store the obtained character recognition result in a recognition result queue; the apparatus further comprises a recognition result obtaining module, configured to obtain the character recognition result of the image to be processed from the recognition result queue through the second processing thread.

The modules in the image processing apparatus described above may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the electronic device in hardware form, or may be stored in a memory in the electronic device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, an electronic device is provided, which may be a server, and the internal structure thereof may be as shown in fig. 10. The electronic device includes a processor, a memory, an Input/Output (I/O) interface, and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the electronic device is used for storing image processing data. The input/output interface of the electronic device is used for exchanging information between the processor and an external device. The communication interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement an image processing method.

In one embodiment, an electronic device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 11. The electronic device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected by a system bus, and the communication interface, the display unit and the input device are connected by the input/output interface to the system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The input/output interface of the electronic device is used for exchanging information between the processor and an external device. The communication interface of the electronic device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an image processing method. The display unit of the electronic equipment is used for forming a visual picture and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.

It will be understood by those skilled in the art that the configurations shown in fig. 10 and 11 are only block diagrams of partial configurations relevant to the present disclosure, and do not constitute a limitation on the electronic devices to which the present disclosure may be applied, and a particular electronic device may include more or less components than those shown in the drawings, or may combine certain components, or have a different arrangement of components.

The embodiment of the application also provides a computer-readable storage medium, namely one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the image processing method.

Embodiments of the present application also provide a computer program product containing instructions which, when run on a computer, cause the computer to perform an image processing method.

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related countries and regions.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims

1. An image processing method, comprising:

determining a plurality of character areas in an image to be processed;

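preprocessing the character areas in character recognition processing through a first processing thread to obtain a preprocessing result;

performing inference processing in the character recognition processing on the preprocessing result through a second processing thread to obtain an inference result;

performing post-processing in the character recognition processing on the inference result through a third processing thread to obtain a character recognition result of the image to be processed;

wherein the second processing thread runs in parallel with at least one of the first processing thread and the third processing thread.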

2. The method according to claim 1, wherein the preprocessing of the character areas in the character recognition processing through the first processing thread to obtain a preprocessing result comprises:

preprocessing, through the first processing thread, a batch processing quantity of character areas obtained in sequence from a character area sequence in the character recognition processing, to obtain a batch preprocessing result;

wherein the character area sequence is obtained by sorting the character areas according to area size;

the performing of inference processing in the character recognition processing on the preprocessing result through the second processing thread to obtain an inference result comprises:

performing inference processing in the character recognition processing on the batch preprocessing result through the second processing thread to obtain a batch inference result;

the performing of post-processing in the character recognition processing on the inference result through the third processing thread to obtain a character recognition result of the image to be processed comprises:

performing post-processing in the character recognition processing on the batch inference result through the third processing thread to obtain the character recognition result of the image to be processed.
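
By way of illustration only, the ordering and batching recited in claim 2 might be sketched as follows. The sketch assumes that area size means area width and that each character area is an `(x, y, width, height)` tuple; the names `Region`, `build_region_sequence`, and `iter_batches` are hypothetical and do not appear in the application.

```python
from typing import Iterator, List, Tuple

# Hypothetical representation of a character area: (x, y, width, height).
Region = Tuple[int, int, int, int]
IndexedRegion = Tuple[int, Region]


def build_region_sequence(regions: List[Region]) -> List[IndexedRegion]:
    """Sort character areas by width (an assumed size measure).

    Each area keeps its original index so that recognition results
    can later be matched back to the area they came from.
    """
    return sorted(enumerate(regions), key=lambda item: item[1][2])


def iter_batches(sequence: List[IndexedRegion],
                 batch_size: int) -> Iterator[List[IndexedRegion]]:
    """Yield consecutive batches of at most batch_size areas, in sequence order."""
    for start in range(0, len(sequence), batch_size):
        yield sequence[start:start + batch_size]
```

Sorting before batching places areas of similar size in the same batch, which would reduce the amount of padding each batch needs.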

3. The method according to claim 2, wherein the preprocessing, through the first processing thread, of the batch processing quantity of character areas obtained in sequence from the character area sequence in the character recognition processing, to obtain a batch preprocessing result, comprises:

obtaining, through the first processing thread, batch character areas of the batch processing quantity in sequence from the character area sequence;

preprocessing each of the batch character areas in the character recognition processing through the first processing thread to obtain preprocessing results corresponding to the batch character areas;

and performing data expansion processing on the preprocessing results corresponding to the batch character areas through the first processing thread to obtain the batch preprocessing result.
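
As a non-limiting sketch, the data expansion processing of claim 3 could take the form of padding. The sketch assumes that data expansion means right-padding every preprocessed area to the width of the widest area in the batch, and that a preprocessed area is a list of pixel rows; `Image`, `pad_to_width`, `expand_batch`, and the fill value are hypothetical.

```python
from typing import List

# Hypothetical representation: a preprocessed area is a list of pixel rows.
Image = List[List[float]]


def pad_to_width(image: Image, width: int, fill: float = 0.0) -> Image:
    """Right-pad every row of the image with fill values up to the given width."""
    return [row + [fill] * (width - len(row)) for row in image]


def expand_batch(preprocessed: List[Image], fill: float = 0.0) -> List[Image]:
    """Pad all images in a batch to the width of the widest image.

    Assumes every image has at least one row and that all rows within
    an image have the same length.
    """
    target_width = max(len(image[0]) for image in preprocessed)
    return [pad_to_width(image, target_width, fill) for image in preprocessed]
```

After expansion, all entries in the batch share one shape, so they could be stacked into a single input for the inference step.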

4. The method according to claim 2, wherein the performing of post-processing in the character recognition processing on the batch inference result through the third processing thread to obtain the character recognition result of the image to be processed comprises:

splitting the batch inference result through the third processing thread to obtain area inference results corresponding to the batch character areas;

and performing post-processing in the character recognition processing on each of the area inference results through the third processing thread to obtain the character recognition result of the image to be processed.
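
For illustration, the splitting and per-area post-processing of claim 4 might look like the sketch below. The claims do not specify the post-processing algorithm; greedy CTC-style decoding (collapse repeated indices, drop blanks) is used here purely as an assumed stand-in, and `ALPHABET`, `BLANK`, `decode_region`, and `postprocess_batch` are hypothetical names.

```python
from typing import List

BLANK = 0                        # assumed index of the CTC blank symbol
ALPHABET = ["", "a", "b", "c"]   # hypothetical label set; index 0 is the blank


def decode_region(indices: List[int]) -> str:
    """Greedy CTC-style decoding: collapse repeated indices, then drop blanks."""
    chars = []
    previous = BLANK
    for index in indices:
        if index != BLANK and index != previous:
            chars.append(ALPHABET[index])
        previous = index
    return "".join(chars)


def postprocess_batch(batch_result: List[List[int]]) -> List[str]:
    """Split a batch inference result into per-area results and decode each one."""
    return [decode_region(region_result) for region_result in batch_result]
```

Here the batch inference result is modeled as one list of predicted label indices per character area, so splitting amounts to iterating over the outer list.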

5. The method of claim 2, further comprising:

matching, through the second processing thread, the character areas in the character area sequence with the character recognition result to obtain character recognition results matched with the character areas in the image to be processed;

wherein the second processing thread is a main thread, and the first processing thread and the third processing thread are sub-threads.

6. The method according to any one of claims 1 to 5, wherein after the preprocessing of the character areas in the character recognition processing through the first processing thread to obtain a preprocessing result, the method further comprises:

storing the preprocessing result into a preprocessing result queue through the first processing thread;

acquiring the preprocessing result from the preprocessing result queue through the second processing thread, performing inference processing in the character recognition processing, and storing the obtained inference result into an inference result queue;

the performing of post-processing in the character recognition processing on the inference result through the third processing thread to obtain a character recognition result of the image to be processed comprises:

acquiring the inference result from the inference result queue through the third processing thread, performing post-processing in the character recognition processing, and storing the obtained character recognition result into a recognition result queue;

and the method further comprises:

obtaining the character recognition result of the image to be processed from the recognition result queue through the second processing thread.
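
The queue-based hand-off between the three processing threads recited in claim 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: character areas are modeled as strings, `preprocess`, `infer`, and `postprocess` are trivial stand-ins for the real stages, a `None` sentinel marks the end of each queue, and the calling thread plays the role of the second (main) processing thread.

```python
import queue
import threading
from typing import List, Optional

SENTINEL = None  # assumed end-of-stream marker placed on each queue


def preprocess(region: str) -> str:   # stand-in for real preprocessing
    return region.strip().lower()


def infer(data: str) -> str:          # stand-in for model inference
    return data[::-1]


def postprocess(data: str) -> str:    # stand-in for real post-processing
    return data[::-1].upper()


def run_pipeline(regions: List[str]) -> List[Optional[str]]:
    pre_q: queue.Queue = queue.Queue()   # preprocessing result queue
    inf_q: queue.Queue = queue.Queue()   # inference result queue
    rec_q: queue.Queue = queue.Queue()   # recognition result queue

    def first_thread() -> None:
        for index, region in enumerate(regions):
            pre_q.put((index, preprocess(region)))
        pre_q.put(SENTINEL)

    def third_thread() -> None:
        while True:
            item = inf_q.get()
            if item is SENTINEL:
                break
            index, result = item
            rec_q.put((index, postprocess(result)))

    workers = [threading.Thread(target=first_thread),
               threading.Thread(target=third_thread)]
    for worker in workers:
        worker.start()

    # Second processing thread: the calling (main) thread runs inference
    # while the two sub-threads preprocess and post-process concurrently.
    while True:
        item = pre_q.get()
        if item is SENTINEL:
            inf_q.put(SENTINEL)
            break
        index, data = item
        inf_q.put((index, infer(data)))

    for worker in workers:
        worker.join()

    # Collect results on the main thread, restoring the original area order.
    results: List[Optional[str]] = [None] * len(regions)
    while not rec_q.empty():
        index, text = rec_q.get()
        results[index] = text
    return results
```

Carrying the area index through every queue lets the main thread match each recognition result back to its character area, regardless of the order in which results arrive.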

7. An image processing apparatus, comprising:

a determination module, configured to determine a plurality of character areas in an image to be processed;

a preprocessing module, configured to preprocess the character areas in character recognition processing through a first processing thread to obtain a preprocessing result;

an inference module, configured to perform inference processing in the character recognition processing on the preprocessing result through a second processing thread to obtain an inference result;

a post-processing module, configured to perform post-processing in the character recognition processing on the inference result through a third processing thread to obtain a character recognition result of the image to be processed;

wherein the second processing thread runs in parallel with at least one of the first processing thread and the third processing thread.

8. An electronic device, comprising a memory and a processor, the memory having a computer program stored thereon, wherein the computer program, when executed by the processor, causes the processor to perform the steps of the image processing method according to any one of claims 1 to 6.

9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.

10. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.