CN114391260A - Character recognition method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN114391260A
CN114391260A CN201980100391.5A CN201980100391A CN114391260A CN 114391260 A CN114391260 A CN 114391260A CN 201980100391 A CN201980100391 A CN 201980100391A CN 114391260 A CN114391260 A CN 114391260A
Authority
CN
China
Prior art keywords
image
character recognition
images
video
electronic device
Prior art date
Legal status
Pending
Application number
CN201980100391.5A
Other languages
Chinese (zh)
Inventor
郭子亮
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd, Shenzhen Huantai Technology Co Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of CN114391260A publication Critical patent/CN114391260A/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

A character recognition method, a device, a storage medium and an electronic device are provided. The method comprises: acquiring a plurality of video frames (101); creating a plurality of CPU processes, decoding each video frame by using each CPU process to obtain a plurality of first images (102), and storing the plurality of first images into a database (103); creating a first GPU process, sequentially acquiring the first images from the database by using the first GPU process, and sequentially determining the position information of characters in each first image to obtain the position information corresponding to each first image (104); cropping each first image according to its corresponding position information to obtain a plurality of second images (105); and performing character recognition processing on each second image to obtain a character recognition result (106).

Description

Character recognition method and device, storage medium and electronic equipment

Technical Field
The present application belongs to the field of electronic technologies, and in particular, to a method and an apparatus for character recognition, a storage medium, and an electronic device.
Background
Video images in electronic devices such as smart phones often contain a large amount of information. In addition to the image frames themselves, a video may also contain text information, which is usually used to display important information about the video content. Compared with analyzing the ever-changing image information, identifying and analyzing the text information generally makes it easier to determine what type of video the electronic device is playing.
In the related art, the cooperation between a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) is usually required to perform character recognition on a video image. For example, the central processing unit cuts out the area containing the characters from the video image, and the graphics processing unit identifies the characters in the area containing the characters.
Disclosure of Invention
The embodiment of the application provides a character recognition method, a character recognition device, a storage medium and electronic equipment, and can improve the resource utilization rate of a GPU.
In a first aspect, an embodiment of the present application provides a text recognition method, including:
acquiring a plurality of video frames;
creating a plurality of CPU processes, and decoding each video frame by using each CPU process to obtain a plurality of first images;
storing the plurality of first images in a database;
creating a first GPU process, sequentially acquiring first images from the database by using the first GPU process, and sequentially determining the position information of characters in each first image to obtain the position information corresponding to each first image;
according to the position information corresponding to each first image, performing cutting processing on each first image to obtain a plurality of second images;
and performing character recognition processing on each second image to obtain a character recognition result.
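The steps of the first aspect can be sketched in Python as follows. This is a minimal single-process illustration, not the patented implementation: the `decode`, `detect_positions`, and `recognize` functions are hypothetical stand-ins (the patent does not prescribe how decoding, position detection, or recognition are performed), and an in-process queue stands in for the database.

```python
from queue import Queue

# Hypothetical stand-ins for the decoder, position detector and recognizer.
def decode(frame):            # CPU-side: video frame -> first image
    return {"pixels": frame, "id": frame}

def detect_positions(image):  # GPU-side: first image -> text bounding boxes
    return [(0, 0, 2, 1)]     # one dummy (left, top, right, bottom) box

def recognize(second_image):  # GPU-side: cropped image -> recognized text
    return "text"

def recognize_video(frames):
    database = Queue()                    # stands in for the shared database
    for frame in frames:                  # steps 101-103: decode and store
        database.put(decode(frame))
    results = []
    while not database.empty():           # steps 104-106: detect, crop, recognize
        first_image = database.get()
        for box in detect_positions(first_image):
            second_image = {"crop": box, "source": first_image["id"]}
            results.append(recognize(second_image))
    return results

print(recognize_video([1, 2, 3]))  # ['text', 'text', 'text']
```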
In a second aspect, an embodiment of the present application provides a text recognition apparatus, including:
the acquisition module is used for acquiring a plurality of video frames;
the decoding module is used for creating a plurality of CPU processes and decoding each video frame by using each CPU process to obtain a plurality of first images;
the storage module is used for storing the plurality of first images into a database;
the determining module is used for creating a first GPU process, sequentially acquiring first images from the database by using the first GPU process, and sequentially determining the position information of characters in each first image to obtain the position information corresponding to each first image;
the cutting module is used for cutting each first image according to the position information corresponding to each first image to obtain a plurality of second images;
and the recognition module is used for performing character recognition processing on each second image to obtain a character recognition result.
In a third aspect, an embodiment of the present application provides a storage medium, on which a computer program is stored, where when the computer program is executed on a computer, the computer is caused to execute the character recognition method provided in this embodiment.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to, by calling the computer program stored in the memory, execute:
acquiring a plurality of video frames;
creating a plurality of CPU processes, and decoding each video frame by using each CPU process to obtain a plurality of first images;
storing the plurality of first images in a database;
creating a first GPU process, sequentially acquiring first images from the database by using the first GPU process, and sequentially determining the position information of characters in each first image to obtain the position information corresponding to each first image;
according to the position information corresponding to each first image, performing cutting processing on each first image to obtain a plurality of second images;
and performing character recognition processing on each second image to obtain a character recognition result.
Drawings
The technical solutions and advantages of the present application will become apparent from the following detailed description of specific embodiments of the present application when taken in conjunction with the accompanying drawings.
Fig. 1 is a first flowchart of a text recognition method according to an embodiment of the present application.
Fig. 2 is a second flowchart of a text recognition method according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a character recognition device according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
Referring to fig. 1, fig. 1 is a first flowchart illustrating a text recognition method according to an embodiment of the present disclosure. The flow of the character recognition method can comprise the following steps:
in 101, a plurality of video frames are acquired.
For example, the electronic device may capture a video and then break the video into a plurality of video frames so that the electronic device may obtain the plurality of video frames.
At 102, a plurality of CPU processes are created, and each CPU process decodes each video frame to obtain a plurality of first images.
For example, after obtaining a plurality of video frames, the electronic device may create a plurality of CPU processes, and perform decoding processing on each video frame by using each CPU process to obtain a plurality of first images.
It will be appreciated that the number of CPU processes may be less than or equal to the number of video frames. When there are fewer CPU processes than video frames, for example 5 CPU processes and 10 video frames, the electronic device can use the 5 CPU processes to decode the first 5 of the 10 video frames to obtain a plurality of first images; alternatively, the electronic device may use the 5 CPU processes to decode any 5 of the 10 video frames to obtain a plurality of first images.
In some embodiments, for a video whose frames are output in real time, such as a video stream, the electronic device creates one CPU process to decode each incoming video frame, so as to obtain a plurality of first images.
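The frame-assignment logic described above (5 processes, 10 frames) can be sketched as a plain function; `assign_frames` and its `strategy` parameter are illustrative names, not from the patent, and actual process creation is omitted so the sketch stays self-contained.

```python
import random

def assign_frames(num_frames, num_workers, strategy="first"):
    """Decide which frames a pool of CPU processes decodes when there are
    fewer processes than frames (e.g. 5 processes, 10 frames)."""
    frames = list(range(num_frames))
    if strategy == "first":
        batch = frames[:num_workers]                  # the first 5 of the 10 frames
    else:
        batch = random.sample(frames, num_workers)    # or any 5 of the 10 frames
    # worker i decodes batch[i]
    return {worker: frame for worker, frame in enumerate(batch)}

print(assign_frames(10, 5))  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
```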
In 103, a plurality of first images is stored in a database.
For example, when a CPU process finishes decoding the acquired video frame, the electronic device may store the acquired first image in the database by using the CPU process. That is, each time the CPU process finishes decoding the acquired video frame, the electronic device may store the first image obtained by the CPU process in the database, so as to store the plurality of first images in the database.
In 104, a first GPU process is created, the first GPU process is used to sequentially obtain the first images from the database, and the position information of the text is sequentially determined from each first image, so as to obtain the position information corresponding to each first image.
For example, when at least one first image exists in the database, the electronic device may create a first GPU process, and sequentially retrieve the first image from the database using the first GPU process. Then, each time the first GPU process acquires a first image, the electronic device may perform position detection processing on the first image by using the first GPU process, thereby determining position information of a character from the first image, and obtaining position information corresponding to the first image. The position detection processing on the image may be: the regions of the image where text is present are detected to determine which regions of the image have text.
It will be appreciated that, when sequentially retrieving the first images from the database using the first GPU process, the electronic device may retrieve them in first-in-first-out order. That is, the first image that was stored in the database earliest will be acquired by the first GPU process first.
In some embodiments, the electronic device may perform, by using a first GPU process, a position detection process on the acquired first image by using a pre-trained position detection model to obtain position information corresponding to the acquired first image.
It should be noted that, before the first GPU process performs position detection on an acquired first image using the pre-trained position detection model, the electronic device may first use each of the plurality of CPU processes to decode and then preprocess each of the plurality of video frames to obtain the plurality of first images, and store them in the database. Subsequently, the first GPU process may sequentially retrieve the first images from the database. The preprocessing of an image may be: converting the format, size, etc. of the image into a format, size, etc. supported by the position detection model.
At 105, each first image is subjected to cropping processing according to the position information corresponding to each first image, and a plurality of second images are obtained.
In this embodiment of the application, each time the position information corresponding to one first image is obtained, the electronic device may crop that first image according to the position information to obtain a second image. In other words, whichever first image the position information was obtained from is the first image that is cropped according to that position information. Cropping an image according to position information means cutting out the regions of the image where characters are present.
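The cropping step can be illustrated with a small sketch, assuming the position information is a single (left, top, right, bottom) box and representing the image as a 2-D list of pixel rows; real implementations would operate on actual image buffers.

```python
def crop(image, box):
    """Cut the region with text out of an image. `image` is a 2-D list of
    pixel rows; `box` is (left, top, right, bottom) position information."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

first_image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],   # 1s mark pixels where text was detected
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
second_image = crop(first_image, (1, 1, 3, 3))
print(second_image)  # [[1, 1], [1, 1]]
```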
At 106, a character recognition process is performed on each second image to obtain a character recognition result.
For example, after obtaining a plurality of second images, the electronic device may perform a text recognition process on each of the second images to obtain a text recognition result. Wherein the character recognition result comprises a character recognition result corresponding to each second image.
It can be understood that the plurality of second images are not necessarily obtained simultaneously, and therefore, each time one second image is obtained, the electronic device may perform the character recognition processing on the second image to obtain the character recognition result of the second image, so as to finally obtain the character recognition results of the plurality of second images. Subsequently, the electronic device can store the character recognition result, and perform video classification, video pushing and the like by using the character recognition result.
In this embodiment, a plurality of CPU processes decode the video frames to obtain a plurality of first images, and the first images are stored in a database, so that the first GPU process can continuously acquire first images from the database and sequentially obtain the position information corresponding to each first image, finally yielding a character recognition result. Therefore, the character recognition method provided by this embodiment can prevent the first GPU process from spending long periods waiting, and thus improves the resource utilization rate of the GPU.
In some embodiments, the plurality of video frames are video frames corresponding to videos to be classified, and after the process 106, the method may further include:
performing word segmentation processing on the character recognition result to obtain a plurality of words;
and determining the category of the video to be classified according to the plurality of word segments.
It is understood that the character recognition result obtained by the electronic device merely extracts the characters in the image; no word segmentation has been performed. For example, the text recognition result obtained by the electronic device may be: "Let children explore beautiful and magical Chinese characters by listening to stories and playing games." The electronic device may then perform word segmentation on the character recognition result, and the resulting segments may be: let, children, listen, story, play, game, explore, beautiful, magical, Chinese, characters.
For example, after obtaining a plurality of segmented words, the electronic device may determine the category of the video to be classified according to them. For instance, when words such as song, lyrics and singing appear many times among the segmented words, the electronic device may determine the category of the video to be classified as the song category. For another example, if the electronic device determines from the segmented words that they form the lyrics of a song, it may determine the category of the video to be classified as the song category; alternatively, the electronic device may further determine the type of the lyrics, such as ancient style, pop style, rock-and-roll style, and so on. Assuming the electronic device determines that the lyrics belong to the ancient style, it may classify the video as ancient style under the song category.
In some embodiments, determining the category of the video to be classified according to the plurality of word segments may include:
determining a target keyword from the multiple word segments;
and determining the category of the video to be classified according to the target keyword.
For example, after obtaining the plurality of segmented words, the electronic device may determine the target keyword from them, and then determine the category of the video to be classified according to the target keyword. For instance, the electronic device may count each distinct segment among the plurality of segmented words and determine the segment with the highest count as the target keyword. For example, assume the electronic device obtains 10 segmented words, of which "song" appears 7 times, "genre" appears 2 times and "graceful" appears once. The electronic device may then determine "song" as the target keyword, and accordingly determine the category of the video to be classified as the song category.
In some embodiments, determining the category of the video to be classified according to the target keyword may include:
determining a category corresponding to the target keyword according to a preset mapping relation between the keyword and the category;
and determining the category corresponding to the target keyword as the category of the video to be classified.
For example, the electronic device may preset a preset mapping R1 between the keyword and the category. For example, keyword K1 corresponds to category C1, keyword K2 corresponds to category C2, keyword K3 corresponds to category C3, and so on. Assuming that the target keyword is K1, the corresponding category is C1, and therefore, the category of the video to be classified is C1.
For another example, the electronic device may preset a preset mapping R2 between the keyword and the category. For example, keywords K1, K2, K3 correspond to category C1, keywords K4, K5, K6 correspond to category C2, keywords K7, K8, K9 correspond to category C3, and so on. Assuming that the target keyword is K3, the corresponding category is C1, and therefore, the category of the video to be classified is C1.
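Both preset mappings can be represented as plain dictionaries; the names R1, R2, K1..K9 and C1..C3 follow the examples above, and `classify` is an illustrative helper name, not from the patent.

```python
# Hypothetical keyword-to-category tables, following the K*/C* example above.
R1 = {"K1": "C1", "K2": "C2", "K3": "C3"}            # one keyword per category
R2 = {"K1": "C1", "K2": "C1", "K3": "C1",
      "K4": "C2", "K5": "C2", "K6": "C2",
      "K7": "C3", "K8": "C3", "K9": "C3"}            # several keywords per category

def classify(target_keyword, mapping):
    """Look up the category for a target keyword; None if no match."""
    return mapping.get(target_keyword)

print(classify("K1", R1))  # C1
print(classify("K3", R2))  # C1
```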
In some embodiments, determining the target keyword from the plurality of segmented words may include:
determining the same participle from the multiple participles;
determining the number of the same participles;
and determining the same participles corresponding to the number larger than the preset number as the target keywords.
For example, after obtaining the plurality of segmented words, the electronic device may determine the identical segments among them, count them, and determine any segment whose count is greater than a preset number as the target keyword. For example, assume the electronic device obtains 10 segmented words, of which "song" appears 7 times, "genre" appears 2 times and "graceful" appears once, and the preset number is 5. Since 7 is greater than 5, the electronic device may determine "song" as the target keyword.
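The counting-and-threshold step above can be sketched with `collections.Counter`; the function name and the default preset number are illustrative choices, not fixed by the patent.

```python
from collections import Counter

def target_keyword(segments, preset_number=5):
    """Return the segment whose count exceeds the preset number, if any."""
    if not segments:
        return None
    word, count = Counter(segments).most_common(1)[0]
    return word if count > preset_number else None

segments = ["song"] * 7 + ["genre"] * 2 + ["graceful"]  # 10 segmented words
print(target_keyword(segments))  # song
```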
In some embodiments, the text recognition method may further include:
acquiring a user portrait of a user;
judging whether the video to be classified is pushed to the user or not according to the user portrait and the category of the video to be classified;
and if so, pushing the video to be classified to the user.
For example, after determining the category of the video to be classified, the electronic device may obtain a user portrait of the user. A user portrait abstracts various concrete pieces of information about the user into labels and uses those labels to characterize the user, so that targeted services can be provided. Colloquially, a user portrait may describe which categories of articles a user often browses, which categories of videos the user often watches, which categories of items the user often purchases, and so on.
After obtaining a user representation of a user, the electronic device may determine which categories of videos the user frequently watches. Then, the electronic device may determine whether the category of the video to be classified belongs to one of the categories corresponding to the videos frequently watched by the user. If the category of the video to be classified belongs to one of the categories corresponding to the videos frequently watched by the user, the electronic equipment can push the video to be classified to the user for the user to watch.
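The push decision described above reduces to a membership test; the `user_portrait` structure and field name here are hypothetical, since the patent does not specify how a portrait is stored.

```python
def should_push(user_portrait, video_category):
    """Push the video if its category is one the user frequently watches.
    `user_portrait` is a hypothetical dict mapping labels to category sets."""
    return video_category in user_portrait.get("watched_categories", set())

portrait = {"watched_categories": {"song", "cartoon"}}
print(should_push(portrait, "song"))    # True
print(should_push(portrait, "sports"))  # False
```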
Referring to fig. 2, fig. 2 is a second flowchart illustrating a text recognition method according to an embodiment of the present disclosure. The character recognition method can comprise the following steps:
in 201, an electronic device acquires a plurality of video frames.
For example, the electronic device may obtain a video and then decompose the video into a plurality of video frames so that the electronic device may obtain the plurality of video frames.
For another example, the electronic device may enter a video recording mode, and continuously capture a captured scene with the camera to continuously output a plurality of video frames, thereby forming a video stream. The electronic device can acquire the plurality of video frames which are continuously output.
The shooting scene is the scene the user aims the camera at; that is, whatever the camera is pointed at is the shooting scene. It should be noted that the shooting scene in the embodiments of the present application is not a specific fixed scene, but whatever scene the camera is aimed at in real time. The shooting scene may include text.
At 202, the electronic device creates a plurality of CPU processes and performs a decoding process on each video frame using each CPU process, resulting in a plurality of first images.
For example, after obtaining a plurality of video frames, the electronic device may create a plurality of CPU processes, and perform decoding processing on each video frame by using each CPU process to obtain a plurality of first images.
For another example, after obtaining a plurality of video frames, the electronic device may create a plurality of CPU processes, and perform decoding processing and preprocessing on each video frame by using each CPU process to obtain a plurality of first images. The preprocessing of the image may be: the format, size, etc. of the image is converted to a corresponding format, size, etc., such as a format, size, etc. supported by the position detection model.
It will be appreciated that the number of CPU processes may be less than or equal to the number of video frames. When the number of the CPU processes is smaller than the number of the video frames, if the number of the CPU processes is 5 and the number of the video frames is 10, the electronic device can acquire the first 5 video frames of the 10 video frames by using the 5 CPU processes to perform decoding processing, so as to obtain a plurality of first images; or, the electronic device may acquire any 5 video frames of the 10 video frames by using the 5 CPU processes to perform decoding processing, so as to obtain a plurality of first images.
In some embodiments, for a video, such as a video stream, in which the electronic device outputs video frames in real time, the electronic device creates a CPU process to decode one video frame, so as to obtain a plurality of first images. It will be appreciated that the number of CPU processes may be less than or equal to the number of video frames in the video stream.
It should be noted that, when a process, such as a CPU process or a GPU process, is created and started, necessary process communication initialization work needs to be performed. For example, a shared memory is allocated to each process, and a queue, a pipe, etc. for inter-process communication are established. The shared memory may be used to implement data transfer between processes, for example, a process may obtain data from the shared memory of another process.
At 203, the electronic device stores the first image obtained by each CPU process in that process's shared memory, and stores the identification information of the CPU process in the first queue.
For example, each time a CPU process finishes decoding its acquired video frame, the electronic device may store the first image obtained by that CPU process in the shared memory and store the identification information of the CPU process in the first queue. The identification information of a CPU process may be its process ID. The first queue may be a first-in-first-out queue, i.e., data entered into the queue earlier is processed first, and data entered later is processed later.
In some embodiments, when the CPU process is used to store the identification information thereof in the first queue, the electronic device may also use the CPU process to store the image size, image format, and the like of the first image it obtains in the first queue.
In other embodiments, after a CPU process stores its first image in the shared memory and its identification information in the first queue, the electronic device may suspend the CPU process until the first GPU process sends the obtained position information back to it, at which point the electronic device puts the CPU process into a ready state.
It will be appreciated that each CPU process has a shared memory and the database may include the shared memory of each CPU process.
It should be noted that, in the embodiments of the present application, shared memory is used to store the image data, such as the first or second images, rather than placing the image data in the queue or pipeline. Image data is generally large: if it were stored directly in a queue for transmission, unnecessary copying and similar operations would result, seriously slowing down the processes involved in character recognition. By placing image data in shared memory and storing only identification information and the like in the queue, the communication speed between processes can be greatly improved, so that the time spent on inter-process communication becomes almost negligible relative to the time required by the whole character recognition process.
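This shared-memory-plus-queue scheme can be sketched with Python's `multiprocessing.shared_memory`. As an illustration only, both the producer (CPU-process role) and the consumer (GPU-process role) run here in a single process; the record fields are hypothetical names.

```python
from multiprocessing import shared_memory
from queue import Queue

image_bytes = bytes(range(16))                      # stand-in for decoded pixels

# Producer: put the bulky image in shared memory, and only a small
# identification record (block name + size) in the first queue.
shm = shared_memory.SharedMemory(create=True, size=len(image_bytes))
shm.buf[:len(image_bytes)] = image_bytes
first_queue = Queue()
first_queue.put({"id": shm.name, "size": len(image_bytes)})

# Consumer: read the record, attach to the block by name, recover the image.
record = first_queue.get()
view = shared_memory.SharedMemory(name=record["id"])
recovered = bytes(view.buf[:record["size"]])

view.close()
shm.close()
shm.unlink()                                        # free the block
print(recovered == image_bytes)  # True: no image copy went through the queue
```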
At 204, the electronic device creates a first GPU process and sequentially obtains identification information from the first queue using the first GPU process.
In 205, the electronic device sequentially obtains the first images from the shared memory of the corresponding CPU process according to the identification information by using the first GPU process, and sequentially determines the position information of the text from each first image, to obtain the position information corresponding to each first image.
For example, when at least one piece of identification information exists in the first queue, the electronic device may create a first GPU process, and sequentially acquire the identification information from the first queue using the first GPU process. The identification information stored in the first queue may be obtained by the first GPU process.
When one piece of identification information is obtained, the electronic equipment can utilize the first GPU process to obtain a first image from the shared memory of the CPU process corresponding to the identification information, and sequentially perform position detection processing on the obtained first image so as to determine the position information of the character in the obtained first image and obtain the position information corresponding to the obtained first image. For example, assuming that the obtained identification information is the process ID of the CPU process P1, the electronic device may use the first GPU process to obtain the first image from the shared memory of the CPU process P1.
In some embodiments, the sequentially performing, by the electronic device, position detection processing on the acquired first image by using a first GPU process to determine position information of a text from the acquired first image, and obtaining position information corresponding to the acquired first image may include: the electronic equipment sequentially performs position detection processing on the acquired first images by using a first GPU process and a pre-trained position detection model so as to determine position information of characters in the acquired first images and obtain position information corresponding to the acquired first images. Wherein, the position detection model can be a deep neural network model.
At 206, the electronic device sequentially sends the location information corresponding to each first image to the corresponding CPU processes using the first GPU process.
In the embodiment of the application, each time the electronic device obtains the position information corresponding to one first image by using the first GPU process, the obtained position information corresponding to the first image is sent to the corresponding CPU process by using the first GPU process through the pipeline.
For example, assuming that the identification information acquired by the first GPU process is the process ID of the CPU process P1, the electronic device may acquire a first image from the shared memory of the CPU process P1 by using the first GPU process, and perform position detection processing on the first image, so as to obtain position information corresponding to the first image. The electronic device may then send the location information to the CPU process P1 using the first GPU process.
It can be understood that, in the embodiment of the present application, the electronic device performs decoding processing in parallel by using multiple CPU processes, stores the obtained first images in the shared memory, and stores the identification information in the first queue. Therefore, the first GPU process can continuously acquire identification information from the first queue and acquire first images from the shared memory of the corresponding CPU processes according to the acquired identification information for position detection processing, which reduces the waiting time of the first GPU process and improves the resource utilization of the GPU.
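The shared-memory and first-queue hand-off described in 203 to 205 can be sketched as follows. This is a simplified single-threaded illustration, not the patent's actual implementation: the function names, the use of a dict as a stand-in for per-process shared memory, and the placeholder decode/detect strings are all assumptions for illustration.

```python
from collections import deque

# Simplified simulation of the flow in 203-205: each "CPU process" stores
# its decoded first image in its own shared-memory slot and pushes its
# process ID (identification information) into a FIFO first queue; the
# "first GPU process" then drains the queue in order and fetches each
# first image from the slot named by the dequeued ID.

shared_memory = {}       # stand-in for each CPU process's shared memory
first_queue = deque()    # first-in-first-out queue of identification info

def cpu_decode(process_id, video_frame):
    """A CPU process decodes a frame into a first image and registers it."""
    first_image = f"decoded({video_frame})"   # placeholder for real decoding
    shared_memory[process_id] = first_image
    first_queue.append(process_id)            # identification info -> queue

def gpu_detect_positions():
    """The first GPU process drains the queue in FIFO order."""
    results = {}
    while first_queue:
        process_id = first_queue.popleft()        # oldest ID first
        first_image = shared_memory[process_id]   # fetch image by ID
        results[process_id] = f"positions_of({first_image})"
    return results

cpu_decode("P1", "frame1")
cpu_decode("P2", "frame2")
positions = gpu_detect_positions()
```

In a real deployment the dict and deque would be replaced by inter-process shared memory and an inter-process queue, but the ordering guarantee is the same: the GPU process never blocks as long as at least one decoded image is registered.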
In 207, the electronic device performs cropping processing on the corresponding first image according to the received position information by using each CPU process to obtain a plurality of second images in sequence.
For example, each time a CPU process receives the position information, the electronic device may perform cropping processing on the corresponding first image according to the received position information by using the CPU process, so as to obtain a plurality of second images.
For example, assuming that the CPU process P1 receives the position information, the electronic device can use the CPU process P1 to perform cropping processing on the first image previously obtained by the CPU process P1 according to the received position information, thereby obtaining a second image. Here, cropping an image according to the position information means cutting out the region of the image in which text appears. It can be understood that the first image from which a piece of position information was obtained is the first image that is cropped according to that position information. For example, assuming that the electronic device obtains position information from the first image G1, the electronic device may perform cropping processing on the first image G1 according to that position information.
In some embodiments, after acquiring the position information, the electronic device may use a CPU process to perform post-processing on the corresponding first image according to the position information. For example, the position information may include the size and the location of each area. The electronic device may then use the CPU process to filter the corresponding first image according to the size and location of each area, so as to determine the area where the text is located in the corresponding first image, and cut out that area. The electronic device can then use the CPU process to pre-process the cut-out area so as to obtain a second image. Pre-processing the area where the text is located converts the format, size, and the like of the obtained second image into a corresponding format and size, for example, a format and size supported by the text recognition model.
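The crop-and-preprocess step in 207 can be sketched as below. The `(top, left, height, width)` box format and the fixed 32×100 target size are illustrative assumptions only; the patent does not specify the box representation or the size expected by the recognition model.

```python
import numpy as np

def crop_text_regions(first_image, boxes):
    """Cut out each detected text region from the first image.
    Each box is an assumed (top, left, height, width) tuple."""
    return [first_image[t:t + h, l:l + w] for (t, l, h, w) in boxes]

def preprocess(region, target_hw=(32, 100)):
    """Resize a cropped region to the (assumed) size the recognition
    model expects, via nearest-neighbour index mapping to avoid
    extra dependencies."""
    th, tw = target_hw
    h, w = region.shape[:2]
    rows = np.arange(th) * h // th   # source row for each target row
    cols = np.arange(tw) * w // tw   # source column for each target column
    return region[rows][:, cols]

# A fake 240x320 first image with one bright "text" area.
first_image = np.zeros((240, 320), dtype=np.uint8)
first_image[50:80, 40:140] = 255
second_images = [preprocess(r) for r in
                 crop_text_regions(first_image, [(50, 40, 30, 100)])]
```

A production pipeline would typically also normalize pixel values and convert the color format here, matching whatever the text recognition model was trained on.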
At 208, the electronic device stores the second image obtained by each CPU process in its shared memory and stores its identification information in a second queue.
For example, each time a CPU process finishes cropping the corresponding first image according to the received position information, the electronic device may store the second image obtained by that CPU process in the shared memory of that CPU process, and store the identification information of that CPU process in the second queue. The identification information of the CPU process may be the process ID of the CPU process. The second queue may be a first-in-first-out queue, that is, data entered into the queue earlier is processed earlier, and data entered later is processed later.
In some embodiments, when the CPU process is used to store the identification information thereof in the second queue, the electronic device may also use the CPU process to store the image size, image format, and the like of the second image it obtains in the second queue.
In other embodiments, after the second image obtained by a CPU process is stored in the shared memory of that CPU process and the identification information of that CPU process is stored in the second queue, the electronic device may suspend the CPU process until the second GPU process sends the obtained text recognition result to the CPU process, at which point the electronic device may return the CPU process to the ready state.
At 209, the electronic device creates a second GPU process and sequentially retrieves identification information from the second queue using the second GPU process.
At 210, the electronic device sequentially obtains, by using the second GPU process, the second image from the shared memory of the corresponding CPU process according to the identification information obtained from the second queue.
In 211, the electronic device performs a text recognition process on the second image by using a second GPU process to obtain a text recognition result.
For example, when at least one piece of identification information exists in the second queue, the electronic device may create a second GPU process, and sequentially obtain the identification information from the second queue using the second GPU process. The identification information stored in the second queue may be obtained by the second GPU process.
Each time a piece of identification information is obtained, the electronic device may use the second GPU process to obtain a second image from the shared memory of the CPU process corresponding to that identification information, and sequentially perform character recognition processing on the obtained second image to obtain a character recognition result. For example, assuming that the obtained identification information is the process ID of the CPU process P1, the electronic device may use the second GPU process to obtain the second image from the shared memory of the CPU process P1.
In some embodiments, the electronic device sequentially performing, by using the second GPU process, character recognition processing on the acquired second images to obtain character recognition results may include: the electronic device sequentially performs character recognition processing on the acquired second images by using the second GPU process and a pre-trained character recognition model, so as to obtain the character recognition results. The character recognition model may be a deep neural network model.
In other embodiments, after the electronic device performs the text recognition processing on the second image obtained by the CPU process P1 by using the second GPU process to obtain the text recognition result, the electronic device may further send the text recognition result to the CPU process P1 by using the second GPU process through a pipeline. Subsequently, the electronic device may utilize the CPU process P1 to perform post-processing on the text recognition result, such as performing transcoding processing on the text recognition result, thereby converting the text recognition result into a format desired by the user, and so on.
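The pipe-based hand-back in this step can be sketched as follows, assuming one pipe per CPU process as the embodiments describe. The uppercase "transcoding" stands in for whatever post-processing the CPU process actually applies; it is an illustrative placeholder, not the patent's format conversion.

```python
from multiprocessing import Pipe

# Sketch of 211's result hand-back: the "second GPU process" writes the
# text recognition result into its end of the pipe, and the CPU process
# (here P1) reads it from the other end and post-processes it.

gpu_end, cpu_end = Pipe()

def gpu_send_result(conn, recognition_result):
    """GPU side: push the recognition result through the pipe."""
    conn.send(recognition_result)

def cpu_post_process(conn):
    """CPU side: receive the result and transcode it (placeholder)."""
    result = conn.recv()
    return result.upper()

gpu_send_result(gpu_end, "hello lyrics")
final_result = cpu_post_process(cpu_end)
```

Running both ends in one process keeps the sketch self-contained; in the described system the two ends would live in the second GPU process and the CPU process P1 respectively.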
It will be appreciated that the plurality of second images are derived from the plurality of first images, which in turn are derived from the plurality of video frames, and those video frames belong to a video to be classified. Therefore, the electronic device can classify the video to be classified according to the character recognition results.
For example, the electronic device may perform word segmentation processing on the character recognition result to obtain a plurality of word segments. After obtaining the word segments, the electronic device may determine the category of the video to be classified according to them. For example, when words such as "song", "lyrics", and "singing" appear many times among the obtained word segments, the electronic device may determine that the category of the video to be classified is the song category. For another example, if the electronic device determines from the obtained word segments that they form the lyrics of a song, the electronic device may likewise determine that the category of the video to be classified is the song category.
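The classification idea above can be sketched as a keyword-frequency lookup. The keyword-to-category table and the threshold of 2 are assumed values for illustration; the patent only states that word segments appearing more than a preset number of times are treated as target keywords and mapped to a category via a preset mapping relation.

```python
from collections import Counter

# Assumed mapping between keywords and categories, and an assumed
# preset number (threshold) for selecting target keywords.
KEYWORD_TO_CATEGORY = {"song": "song", "lyrics": "song", "singing": "song"}
PRESET_NUMBER = 2

def classify_video(segments):
    """Count identical word segments, treat those occurring more than
    PRESET_NUMBER times as target keywords, and map them to a category."""
    counts = Counter(segments)
    targets = [word for word, n in counts.items() if n > PRESET_NUMBER]
    for word in targets:
        if word in KEYWORD_TO_CATEGORY:
            return KEYWORD_TO_CATEGORY[word]
    return "unknown"

segments = ["song", "song", "song", "tonight", "lyrics"]
category = classify_video(segments)
```

Here "song" appears three times, exceeds the preset number, and is found in the mapping, so the video would be classified into the song category.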
In some embodiments, after a certain CPU process finishes post-processing a character recognition result, if some of the video frames obtained in 201 have not yet been processed, the electronic device may continue to use that CPU process to acquire a video frame and decode it to obtain a first image, and then perform 203 to 211 to obtain the character recognition result corresponding to that first image. It can be understood that, when this flow is executed, the electronic device does not need to create new CPU or GPU processes, but continues to use the previously created ones.
In other embodiments, after a certain CPU process finishes cropping a corresponding first image, if some of the video frames obtained in 201 have not yet been processed, the electronic device may likewise continue to use that CPU process to acquire a video frame and decode it to obtain a first image, and then perform 203 to 211 to obtain the corresponding character recognition result, again without creating new CPU or GPU processes.
In some embodiments, the numbers of CPU processes, first GPU processes, and second GPU processes may be set appropriately so that the running times of the three parts are as close as possible, thereby maximizing resource utilization. For example, in this character recognition method the decoding processing is time-consuming, so the number of CPU processes can be set relatively large while the numbers of first and second GPU processes are kept relatively small. The GPU processes can then continuously obtain computing tasks from the shared memory: the first GPU process can continuously obtain first images from the shared memory for position detection processing, and the second GPU process can continuously obtain second images from the shared memory for character recognition processing, thereby maximizing the resource utilization of the GPU and improving the running speed of the whole system.
In other embodiments, because the decoding processing is time-consuming, the electronic device can also use the idle time of a CPU process to perform decoding processing, so as to improve the resource utilization of the CPU and shorten the running time of the whole flow. For example, after a first image obtained by a certain CPU process is stored in the shared memory of that CPU process and its identification information is stored in the first queue, the electronic device may use the CPU process to obtain another video frame and decode it, so as to obtain another first image; that first image is again stored in the shared memory of the CPU process, with the identification information stored in the first queue. If the first GPU process sends position information to the CPU process, or the second GPU process sends a character recognition result to the CPU process, while the CPU process is still decoding a video frame, the electronic device may have the CPU process suspend the decoding processing and start cropping the corresponding first image according to the received position information, or start post-processing the received character recognition result. If the CPU process finishes the cropping processing or the post-processing and has not received further position information or character recognition results, the electronic device can use the CPU process to continue decoding video frames.
In some embodiments, after the video frame is decoded by the CPU process P1 to obtain the first image G1, the electronic device may store the first image G1 and the identification information of the CPU process P1 in a database by using the CPU process P1. After the first GPU process obtains the first image G1 from the database and obtains the location information of the text in the first image G1, the electronic device may send the location information to the CPU process P1 according to the identification information associated with the first image G1.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a character recognition device according to an embodiment of the present application. The character recognition apparatus 300 may include: an acquisition module 301, a decoding module 302, a saving module 303, a determination module 304, a cropping module 305, and an identification module 306.
An obtaining module 301, configured to obtain a plurality of video frames;
a decoding module 302, configured to create multiple CPU processes, and perform decoding processing on each video frame by using each CPU process to obtain multiple first images;
a saving module 303, configured to save the plurality of first images into a database;
a determining module 304, configured to create a first GPU process, sequentially obtain first images from the database by using the first GPU process, and sequentially determine position information of a text from each first image to obtain position information corresponding to each first image;
a cropping module 305, configured to crop each first image according to the position information corresponding to each first image to obtain a plurality of second images;
and the recognition module 306 is configured to perform character recognition processing on each second image to obtain a character recognition result.
In some embodiments, the database includes a shared memory of each CPU process, and the saving module 303 may be configured to: storing the obtained first image into a shared memory by each CPU process, and storing identification information of the first image into a first queue;
the determining module 304 may be configured to: utilizing the first GPU process to sequentially acquire the identification information from the first queue; and sequentially acquiring a first image from the shared memory of the corresponding CPU process by utilizing the first GPU process according to the identification information.
In some embodiments, the determining module 304 may be configured to: sequentially sending the position information corresponding to each first image to a corresponding CPU process by utilizing the first GPU process;
the cropping module 305 may be configured to: utilize each CPU process to perform cropping processing on the corresponding first image according to the received position information, so as to sequentially obtain a plurality of second images.
In some embodiments, the cropping module 305 may be configured to: store, by each CPU process, the obtained second image into the shared memory, and store the identification information into a second queue;
the recognition module 306 may be configured to: create a second GPU process, and sequentially acquire identification information from the second queue by using the second GPU process; sequentially acquire second images from the shared memory of the corresponding CPU process by using the second GPU process according to the identification information acquired from the second queue; and perform character recognition processing on the second images by using the second GPU process to obtain a character recognition result.
In some embodiments, the plurality of video frames are video frames corresponding to a video to be classified, and the recognition module 306 may be configured to: perform word segmentation processing on the character recognition result to obtain a plurality of word segments; and determine the category of the video to be classified according to the plurality of word segments.
In some embodiments, the recognition module 306 may be configured to: determine a target keyword from the plurality of word segments; and determine the category of the video to be classified according to the target keyword.
In some embodiments, the recognition module 306 may be configured to: determine a category corresponding to the target keyword according to a preset mapping relationship between keywords and categories; and determine that category as the category of the video to be classified.
In some embodiments, the recognition module 306 may be configured to: determine identical word segments from the plurality of word segments; determine the number of the identical word segments; and determine the identical word segments whose number is greater than a preset number as the target keywords.
In some embodiments, the recognition module 306 may be configured to: acquire a user portrait of a user; judge, according to the user portrait and the category of the video to be classified, whether to push the video to be classified to the user; and if so, push the video to be classified to the user.
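The push decision described in the last embodiment above can be sketched as a simple category match against the user portrait. The portrait structure (a dict with a `preferred_categories` list) and the function name are hypothetical; the patent does not define the contents of the user portrait.

```python
# Hypothetical sketch: push the classified video to the user only when
# its category appears among the categories in the user portrait.

def should_push(user_portrait, video_category):
    """Return True if the video's category matches the user portrait."""
    return video_category in user_portrait.get("preferred_categories", [])

portrait = {"preferred_categories": ["song", "sports"]}
decision = should_push(portrait, "song")
```

A real system would likely weigh many portrait signals rather than a single category list, but the gating logic is the same shape.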
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed on a computer, the computer is caused to execute the flow in the character recognition method provided in this embodiment.
The embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the flow in the character recognition method provided in this embodiment by calling the computer program stored in the memory.
For example, the electronic device may be a mobile terminal such as a tablet computer or a smart phone. Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
The electronic device 400 may include components such as a memory 401, a central processor 402, a graphics processor 403, and the like. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 4 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The memory 401 may be used to store applications and data. The memory 401 stores applications containing executable code. The application programs may constitute various functional modules. The central processor 402 executes various functional applications and data processing by running an application program stored in the memory 401.
The central processor 402 is the control center of the electronic device. It connects the various parts of the whole electronic device by using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 401 and calling the data stored in the memory 401, thereby monitoring the electronic device as a whole.
The graphics processor 403 may be used to perform computation related to images and graphics.
In this embodiment, the central processing unit 402 in the electronic device loads the executable code corresponding to the process of one or more application programs into the memory 401 according to the following instructions, and the central processing unit 402 runs the application programs stored in the memory 401, so as to implement the following processes:
acquiring a plurality of video frames;
creating a plurality of CPU processes, and decoding each video frame by using each CPU process to obtain a plurality of first images;
storing the plurality of first images in a database;
creating a first GPU process, sequentially acquiring first images from the database by using the first GPU process, and sequentially determining the position information of characters in each first image to obtain the position information corresponding to each first image;
according to the position information corresponding to each first image, performing cutting processing on each first image to obtain a plurality of second images;
and performing character recognition processing on each second image to obtain a character recognition result.
Referring to fig. 5, fig. 5 is a second schematic structural diagram of an electronic device according to an embodiment of the present application.
The electronic device 400 may include components such as a memory 401, a central processor 402, a graphics processor 403, an input unit 404, an output unit 405, a display 406, and so forth.
The memory 401 may be used to store applications and data. The memory 401 stores applications containing executable code. The application programs may constitute various functional modules. The central processor 402 executes various functional applications and data processing by running the application programs stored in the memory 401.
The central processing unit 402 is a control center of the electronic device, connects various parts of the whole electronic device by using various interfaces and lines, and performs various functions and processes of the electronic device by running or executing an application program stored in the memory 401 and calling data stored in the memory 401, thereby performing overall monitoring of the electronic device, such as decoding processing, cropping processing, and the like of the first image.
The graphics processor 403 may be used for performing image and graphics related operations, such as performing a position detection process on the first image, performing a character recognition process on the second image, and so on.
The input unit 404 may be used to receive input numbers, character information, or user characteristic information, such as a fingerprint, and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The output unit 405 may be used to display information input by or provided to a user and various graphical user interfaces of the electronic device, which may be made up of graphics, text, icons, video, and any combination thereof. The output unit may include a display panel.
The display 406 may be used to display text, pictures, etc.
In this embodiment, the central processing unit 402 in the electronic device loads the executable code corresponding to the process of one or more application programs into the memory 401 according to the following instructions, and the central processing unit 402 runs the application programs stored in the memory 401, so as to implement the following processes:
acquiring a plurality of video frames;
creating a plurality of CPU processes, and decoding each video frame by using each CPU process to obtain a plurality of first images;
storing the plurality of first images in a database;
creating a first GPU process, sequentially acquiring first images from the database by using the first GPU process, and sequentially determining the position information of characters in each first image to obtain the position information corresponding to each first image;
according to the position information corresponding to each first image, performing cutting processing on each first image to obtain a plurality of second images;
and performing character recognition processing on each second image to obtain a character recognition result.
In some embodiments, the database includes a shared memory of each CPU process, and the storing of the plurality of first images into the database by the central processing unit 402 may be performed by: storing the obtained first image into a shared memory by each CPU process, and storing identification information of the first image into a first queue; when the central processor 402 executes the first image sequentially obtained from the database by using the first GPU process, the following steps may be executed: utilizing the first GPU process to sequentially acquire the identification information from the first queue; and sequentially acquiring a first image from the shared memory of the corresponding CPU process by utilizing the first GPU process according to the identification information.
In some embodiments, after the central processor 402 determines the position information of the text from each first image in sequence to obtain the position information corresponding to each first image, the following steps may be further performed: sequentially sending the position information corresponding to each first image to a corresponding CPU process by utilizing the first GPU process; when the central processing unit 402 executes the trimming processing on each first image according to the corresponding position information of each first image to obtain a plurality of second images, the following steps may be executed: and utilizing each CPU process to cut the corresponding first image according to the received position information so as to sequentially obtain a plurality of second images.
In some embodiments, after the CPU 402 performs the cropping processing on the corresponding first image according to the received position information by using each CPU process to obtain a plurality of second images in sequence, the method may further perform: storing the obtained second image into the shared memory by each CPU process, and storing the identification information into a second queue; when the central processor 402 executes the character recognition processing on each second image to obtain a character recognition result, the following steps may be executed: creating a second GPU process, and utilizing the second GPU process to sequentially acquire identification information from the second queue; sequentially acquiring second images from the shared memory of the corresponding CPU process by using the second GPU process according to the identification information acquired from the second queue; and performing character recognition processing on the second image by utilizing the second GPU process to obtain a character recognition result.
In some embodiments, the video frames are video frames corresponding to a video to be classified, and the central processor 402 performs the character recognition processing on each second image, and after obtaining a character recognition result, may further perform: performing word segmentation processing on the character recognition result to obtain a plurality of words; and determining the category of the video to be classified according to the plurality of word segments.
In some embodiments, when the central processor 402 determines the category of the video to be classified according to the plurality of word segmentations, the following steps may be performed: determining a target keyword from the multiple word segments; and determining the category of the video to be classified according to the target keyword.
In some embodiments, when the central processor 402 determines the category of the video to be classified according to the target keyword, the following steps may be performed: determining a category corresponding to the target keyword according to a preset mapping relation between the keyword and the category; and determining the category as the category of the video to be classified.
In some embodiments, when the central processor 402 performs the determining of the target keyword from the plurality of segmented words, the following steps may be performed: determining the same participle from the participles; determining the number of the same participles; and determining the same participles corresponding to the number larger than the preset number as the target keywords.
In some embodiments, the central processor 402 may further perform: acquiring a user portrait of a user; judging whether the video to be classified is pushed to the user or not according to the user portrait and the category of the video to be classified; and if so, pushing the video to be classified to the user.
In the above embodiments, the descriptions of the embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed description of the character recognition method, and are not described herein again.
The text recognition device provided in the embodiment of the present application and the text recognition method in the above embodiments belong to the same concept; any of the methods provided in the text recognition method embodiments may run on the text recognition device, and its specific implementation process is described in detail in the text recognition method embodiments and is not repeated here.
It should be noted that, for the text recognition method described in the embodiment of the present application, it can be understood by those skilled in the art that all or part of the process of implementing the text recognition method described in the embodiment of the present application can be completed by controlling the relevant hardware through a computer program, where the computer program can be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor, and during the execution, the process of the embodiment of the text recognition method can be included. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
In the text recognition device according to the embodiment of the present application, each functional module may be integrated into one processing chip, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, or the like.
The text recognition method, the text recognition device, the storage medium and the electronic device provided by the embodiments of the present application are introduced in detail, and a specific example is applied to illustrate the principle and the implementation manner of the present application, and the description of the embodiments is only used to help understanding the method and the core concept of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (20)

  1. A character recognition method comprises the following steps:
    acquiring a plurality of video frames;
    creating a plurality of CPU processes, and decoding each video frame by using each CPU process to obtain a plurality of first images;
    storing the plurality of first images in a database;
    creating a first GPU process, sequentially acquiring first images from the database by using the first GPU process, and sequentially determining the position information of characters in each first image to obtain the position information corresponding to each first image;
    according to the position information corresponding to each first image, performing cutting processing on each first image to obtain a plurality of second images;
    and performing character recognition processing on each second image to obtain a character recognition result.
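The six steps of claim 1 form a decode → detect → crop → recognize pipeline. Below is a minimal single-process sketch of that data flow, using strings as stand-in "images"; `decode_frame`, `detect_text`, `crop`, and `recognize` are hypothetical stand-ins for the CPU decoding and GPU detection/recognition stages described in the claim, not an actual implementation.

```python
from typing import List, Tuple

def decode_frame(frame: bytes) -> str:
    """CPU stage: decode a video frame into a 'first image' (here, a string)."""
    return frame.decode("utf-8")

def detect_text(image: str) -> Tuple[int, int]:
    """GPU stage 1: determine position information of the text (the bracketed span)."""
    return image.index("[") + 1, image.index("]")

def crop(image: str, pos: Tuple[int, int]) -> str:
    """Cutting stage: cut the first image down to a 'second image'."""
    start, end = pos
    return image[start:end]

def recognize(second_image: str) -> str:
    """GPU stage 2: character recognition on the cropped region."""
    return second_image.upper()

def pipeline(frames: List[bytes]) -> List[str]:
    first_images = [decode_frame(f) for f in frames]            # decode
    positions = [detect_text(img) for img in first_images]      # locate text
    second_images = [crop(i, p) for i, p in zip(first_images, positions)]
    return [recognize(s) for s in second_images]                # recognize

results = pipeline([b"frame [hello] noise", b"xx [world] yy"])
```

In the claimed method the decode and recognition stages run in separate CPU and GPU processes; this sketch only illustrates the order of the data handoffs.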
  2. The character recognition method of claim 1, wherein the database includes a shared memory for each CPU process, and wherein storing the plurality of first images in the database comprises:
    storing the obtained first image into a shared memory by each CPU process, and storing identification information of the first image into a first queue;
    the sequentially obtaining the first image from the database by using the first GPU process comprises:
    utilizing the first GPU process to sequentially acquire the identification information from the first queue;
    and sequentially acquiring a first image from the shared memory of the corresponding CPU process by utilizing the first GPU process according to the identification information.
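The claim-2 handoff — each CPU process writing first images into its own shared memory while pushing identification information into a first queue, from which the GPU process resolves the owning region — can be sketched as follows. Plain dicts and `queue.Queue` are stand-ins for real shared memory and an inter-process queue, and the `(process id, image id)` tuple format for the identification information is an assumption.

```python
import queue

shared_memory = {0: {}, 1: {}}     # one region per "CPU process"
first_queue = queue.Queue()        # first queue of identification info

def cpu_store(proc_id: int, image_id: int, first_image: str) -> None:
    """A CPU process stores a first image and enqueues its identification info."""
    shared_memory[proc_id][image_id] = first_image
    first_queue.put((proc_id, image_id))

def gpu_fetch() -> str:
    """The first GPU process dequeues identification info in order and
    reads the first image from the corresponding shared memory."""
    proc_id, image_id = first_queue.get()
    return shared_memory[proc_id][image_id]

cpu_store(0, 100, "image-A")
cpu_store(1, 200, "image-B")
fetched = [gpu_fetch(), gpu_fetch()]
```

The queue preserves arrival order, which is what lets the GPU process acquire the first images "sequentially" as claimed.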
  3. The character recognition method of claim 2, wherein after the sequentially determining the position information of the character from each first image and obtaining the position information corresponding to each first image, the method further comprises:
    sequentially sending the position information corresponding to each first image to a corresponding CPU process by utilizing the first GPU process;
    the performing cutting processing on each first image according to the position information corresponding to each first image to obtain a plurality of second images comprises:
    and utilizing each CPU process to cut the corresponding first image according to the received position information so as to sequentially obtain a plurality of second images.
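In claim 3 each CPU process crops its own first image using the position information returned by the GPU process. A minimal crop over a 2-D pixel grid (a list of rows) might look like the following; the `(top, bottom, left, right)` box format is an assumption.

```python
def crop_image(first_image, box):
    """Cut the region given by (top, bottom, left, right) out of a 2-D pixel grid."""
    top, bottom, left, right = box
    return [row[left:right] for row in first_image[top:bottom]]

first_image = [
    [0, 0, 0, 0],
    [0, 7, 8, 0],
    [0, 9, 6, 0],
    [0, 0, 0, 0],
]
# Position information covering the non-zero "text" region.
second_image = crop_image(first_image, (1, 3, 1, 3))
```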
  4. The character recognition method of claim 3, wherein after the cropping, by each CPU process, of the corresponding first image according to the received position information to sequentially obtain the plurality of second images, the method further comprises:
    storing the obtained second image into the shared memory by each CPU process, and storing the identification information into a second queue;
    the performing character recognition processing on each second image to obtain a character recognition result comprises:
    creating a second GPU process, and utilizing the second GPU process to sequentially acquire identification information from the second queue;
    sequentially acquiring second images from the shared memory of the corresponding CPU process by using the second GPU process according to the identification information acquired from the second queue;
    and performing character recognition processing on the second image by utilizing the second GPU process to obtain a character recognition result.
  5. The character recognition method of claim 1, wherein the plurality of video frames correspond to a video to be classified, and after the performing character recognition processing on each second image to obtain the character recognition result, the method further comprises:
    performing word segmentation processing on the character recognition result to obtain a plurality of word segments;
    and determining the category of the video to be classified according to the plurality of word segments.
  6. The character recognition method of claim 5, wherein the determining the category of the video to be classified according to the plurality of word segments comprises:
    determining a target keyword from the plurality of word segments;
    and determining the category of the video to be classified according to the target keyword.
  7. The character recognition method of claim 6, wherein the determining the category of the video to be classified according to the target keyword comprises:
    determining a category corresponding to the target keyword according to a preset mapping relation between the keyword and the category;
    and determining the category as the category of the video to be classified.
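Claim 7 resolves the target keyword to a category through a preset keyword-to-category mapping. A sketch follows; the mapping contents and category names are illustrative assumptions.

```python
# Preset mapping between keywords and categories (illustrative values).
keyword_to_category = {"goal": "sports", "recipe": "food"}

def classify(target_keyword: str, default: str = "unknown") -> str:
    """Determine the category corresponding to the target keyword,
    which becomes the category of the video to be classified."""
    return keyword_to_category.get(target_keyword, default)

category = classify("goal")
```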
  8. The character recognition method of claim 6, wherein the determining a target keyword from the plurality of word segments comprises:
    determining identical word segments from the plurality of word segments;
    determining the number of the identical word segments;
    and determining identical word segments whose number is greater than a preset number as the target keyword.
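Claim 8 selects target keywords by counting identical word segments and keeping those whose count exceeds a preset number. This can be sketched with a frequency counter; the threshold value here is an assumption.

```python
from collections import Counter

def target_keywords(segments, preset_number=2):
    """Keep word segments whose occurrence count exceeds the preset number."""
    counts = Counter(segments)
    return {word for word, n in counts.items() if n > preset_number}

# Word segments from a hypothetical recognition result.
segments = ["goal", "match", "goal", "goal", "team", "match"]
keywords = target_keywords(segments, preset_number=2)
```

Only "goal" appears more than twice, so it alone qualifies as a target keyword under this threshold.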
  9. The character recognition method of claim 6, wherein the method further comprises:
    acquiring a user portrait of a user;
    judging whether the video to be classified is pushed to the user or not according to the user portrait and the category of the video to be classified;
    and if so, pushing the video to be classified to the user.
  10. A character recognition apparatus, comprising:
    the acquisition module is used for acquiring a plurality of video frames;
    the decoding module is used for creating a plurality of CPU processes and decoding each video frame by using each CPU process to obtain a plurality of first images;
    the storage module is used for storing the plurality of first images into a database;
    the determining module is used for creating a first GPU process, sequentially acquiring first images from the database by using the first GPU process, and sequentially determining the position information of characters in each first image to obtain the position information corresponding to each first image;
    the cutting module is used for cutting each first image according to the position information corresponding to each first image to obtain a plurality of second images;
    and the recognition module is used for performing character recognition processing on each second image to obtain a character recognition result.
  11. A storage medium having stored therein a computer program which, when run on a computer, causes the computer to execute the character recognition method of any one of claims 1 to 9.
  12. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program, and the processor is configured to perform the following by calling the computer program stored in the memory:
    acquiring a plurality of video frames;
    creating a plurality of CPU processes, and decoding each video frame by using each CPU process to obtain a plurality of first images;
    storing the plurality of first images in a database;
    creating a first GPU process, sequentially acquiring first images from the database by using the first GPU process, and sequentially determining the position information of characters in each first image to obtain the position information corresponding to each first image;
    according to the position information corresponding to each first image, performing cutting processing on each first image to obtain a plurality of second images;
    and performing character recognition processing on each second image to obtain a character recognition result.
  13. The electronic device of claim 12, wherein the database comprises a shared memory for each CPU process, the processor to perform:
    storing the obtained first image into a shared memory by each CPU process, and storing identification information of the first image into a first queue;
    utilizing the first GPU process to sequentially acquire the identification information from the first queue;
    and sequentially acquiring a first image from the shared memory of the corresponding CPU process by utilizing the first GPU process according to the identification information.
  14. The electronic device of claim 13, wherein the processor is configured to perform:
    sequentially sending the position information corresponding to each first image to a corresponding CPU process by utilizing the first GPU process;
    and utilizing each CPU process to cut the corresponding first image according to the received position information so as to sequentially obtain a plurality of second images.
  15. The electronic device of claim 14, wherein the processor is configured to perform:
    storing the obtained second image into the shared memory by each CPU process, and storing the identification information into a second queue;
    creating a second GPU process, and utilizing the second GPU process to sequentially acquire identification information from the second queue;
    sequentially acquiring second images from the shared memory of the corresponding CPU process by using the second GPU process according to the identification information acquired from the second queue;
    and performing character recognition processing on the second image by utilizing the second GPU process to obtain a character recognition result.
  16. The electronic device of claim 12, wherein the processor is configured to perform:
    performing word segmentation processing on the character recognition result to obtain a plurality of word segments;
    and determining the category of the video to be classified according to the plurality of word segments.
  17. The electronic device of claim 16, wherein the processor is configured to perform:
    determining a target keyword from the plurality of word segments;
    and determining the category of the video to be classified according to the target keyword.
  18. The electronic device of claim 17, wherein the processor is configured to perform:
    determining a category corresponding to the target keyword according to a preset mapping relation between the keyword and the category;
    and determining the category as the category of the video to be classified.
  19. The electronic device of claim 17, wherein the processor is configured to perform:
    determining identical word segments from the plurality of word segments;
    determining the number of the identical word segments;
    and determining identical word segments whose number is greater than a preset number as the target keyword.
  20. The electronic device of claim 17, wherein the processor is configured to perform:
    acquiring a user portrait of a user;
    judging whether the video to be classified is pushed to the user or not according to the user portrait and the category of the video to be classified;
    and if so, pushing the video to be classified to the user.
CN201980100391.5A 2019-12-30 2019-12-30 Character recognition method and device, storage medium and electronic equipment Pending CN114391260A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/129963 WO2021134229A1 (en) 2019-12-30 2019-12-30 Text identification method, device, storage medium, and electronic apparatus

Publications (1)

Publication Number Publication Date
CN114391260A true CN114391260A (en) 2022-04-22

Family

ID=76687485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980100391.5A Pending CN114391260A (en) 2019-12-30 2019-12-30 Character recognition method and device, storage medium and electronic equipment

Country Status (2)

Country Link
CN (1) CN114391260A (en)
WO (1) WO2021134229A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116168396A (en) * 2022-10-27 2023-05-26 深圳市超时代软件有限公司 Character recognition device and character recognition method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870093A (en) * 2021-09-28 2021-12-31 上海商汤科技开发有限公司 Image caching method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103108186A (en) * 2013-02-21 2013-05-15 中国对外翻译出版有限公司 Method of achieving high-definition transmission of videos
CN105427236A (en) * 2015-12-18 2016-03-23 魅族科技(中国)有限公司 Method and device for image rendering
CN105898495A (en) * 2016-05-26 2016-08-24 维沃移动通信有限公司 Method for pushing mobile terminal recommended information and mobile terminal
CN106446898A (en) * 2016-09-14 2017-02-22 宇龙计算机通信科技(深圳)有限公司 Extraction method and extraction device of character information in image
CN106874443A (en) * 2017-02-09 2017-06-20 北京百家互联科技有限公司 Based on information query method and device that video text message is extracted
CN109922319A (en) * 2019-03-26 2019-06-21 重庆英卡电子有限公司 RTSP agreement multiple video strems Parallel preconditioning method based on multi-core CPU
CN110414517A (en) * 2019-04-18 2019-11-05 河北神玥软件科技股份有限公司 It is a kind of for cooperating the quick high accuracy identity card text recognition algorithms for scene of taking pictures

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197177B (en) * 2019-04-22 2024-03-19 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for extracting video captions
CN110598622B (en) * 2019-09-06 2022-05-27 广州华多网络科技有限公司 Video subtitle positioning method, electronic device, and computer storage medium

Also Published As

Publication number Publication date
WO2021134229A1 (en) 2021-07-08

Similar Documents

Publication Publication Date Title
US11630974B2 (en) Prioritized device actions triggered by device scan data
WO2021203863A1 (en) Artificial intelligence-based object detection method and apparatus, device, and storage medium
US20200412975A1 (en) Content capture with audio input feedback
CN108012156B (en) Video processing method and control platform
EP4083817A1 (en) Video tag determination method, device, terminal, and storage medium
CN112201246B (en) Intelligent control method and device based on voice, electronic equipment and storage medium
US10762678B2 (en) Representing an immersive content feed using extended reality based on relevancy
CN109783642A (en) Structured content processing method, device, equipment and the medium of multi-person conference scene
EP3617946A1 (en) Context acquisition method and device based on voice interaction
CN112040263A (en) Video processing method, video playing method, video processing device, video playing device, storage medium and equipment
CN110765294B (en) Image searching method and device, terminal equipment and storage medium
CN109086276B (en) Data translation method, device, terminal and storage medium
JP2022088304A (en) Method for processing video, device, electronic device, medium, and computer program
EP3593346B1 (en) Graphical data selection and presentation of digital content
CN109274999A (en) A kind of video playing control method, device, equipment and medium
JP2021034003A (en) Human object recognition method, apparatus, electronic device, storage medium, and program
CN109286848B (en) Terminal video information interaction method and device and storage medium
CN114391260A (en) Character recognition method and device, storage medium and electronic equipment
CN109343817A (en) The application method and device and electronic equipment of Self-Service
WO2019085625A1 (en) Emotion picture recommendation method and apparatus
CN110727629B (en) Playing method of audio electronic book, electronic equipment and computer storage medium
CN108052506B (en) Natural language processing method, device, storage medium and electronic equipment
CN110119461B (en) Query information processing method and device
CN112261321B (en) Subtitle processing method and device and electronic equipment
US11238863B2 (en) Query disambiguation using environmental audio

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination