CN116453142A - Identification method, identification device, electronic equipment and computer medium - Google Patents

Identification method, identification device, electronic equipment and computer medium

Info

Publication number
CN116453142A
Authority
CN
China
Prior art keywords
detected
coordinate information
determining
images
initial image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310417863.8A
Other languages
Chinese (zh)
Inventor
孙杰
丁拥科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongan Online P&c Insurance Co ltd
Original Assignee
Zhongan Online P&c Insurance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongan Online P&c Insurance Co ltd filed Critical Zhongan Online P&c Insurance Co ltd
Priority to CN202310417863.8A priority Critical patent/CN116453142A/en
Publication of CN116453142A publication Critical patent/CN116453142A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/15 Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G06V30/162 Quantising the image signal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19127 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an identification method, an identification device, an electronic device, and a computer medium. The method comprises: determining, based on an initial image and a sliding window sampling method, a plurality of images to be detected and first coordinate information of each image to be detected in the initial image; determining, for each image to be detected, second coordinate information of a preset object in that image; determining third coordinate information of the preset object in the initial image based on the second coordinate information and the first coordinate information of the image to be detected in the initial image, thereby obtaining a plurality of groups of third coordinate information corresponding to the plurality of images to be detected; and determining a recognition result corresponding to the preset object in the initial image based on the initial image and the plurality of groups of third coordinate information. According to the embodiments of the present disclosure, the initial image is sampled and segmented into a plurality of images to be detected by the sliding window sampling method, and the coordinate information is identified separately for each, so that the data processing load of the system is reduced and the data processing efficiency is improved.

Description

Identification method, identification device, electronic equipment and computer medium
Technical Field
The disclosure belongs to the technical field of data processing, and particularly relates to an identification method, an identification device, electronic equipment and a computer medium.
Background
With the development and popularization of artificial intelligence technology, Optical Character Recognition (OCR) has been applied and taken root in various industries, and is one of the essential capabilities in many technical systems. The mainstream OCR technology in the related art can basically be divided into two steps: character detection and character recognition. Character detection aims at detecting the range of the character region in the image, so that the image within the character region can later be conveniently cropped from the image and input into a character recognition system for character recognition.
The input of the current character detection methods is the original image to be detected, and the output is the position coordinates of the characters in the image. However, when the pixel size (resolution) of the image to be detected is too large, for example when the side length reaches tens of thousands of pixels or more, inputting the image to be detected at one time into a neural network with a built-in character detection algorithm causes huge video-memory or memory occupation during the operation of the neural network, and the risk of video-memory or memory overflow of the system is high.
Disclosure of Invention
The embodiments of the present disclosure provide an implementation scheme different from the related art, so as to solve the technical problem that, when a large-size image to be detected is input at one time into a neural network with a built-in character detection algorithm, huge video-memory or memory occupation is caused during the operation of the neural network and the risk of video-memory or memory overflow of the system is large.
In a first aspect, the present disclosure provides an identification method, comprising:
determining first coordinate information of a plurality of images to be detected and each image to be detected in an initial image based on the initial image and a sliding window sampling method;
for each of the plurality of images to be detected, determining second coordinate information of a preset object in the image to be detected;
determining third coordinate information of the preset object in the initial image based on the second coordinate information and the first coordinate information of the image to be detected in the initial image, and obtaining a plurality of groups of third coordinate information corresponding to the plurality of images to be detected;
and determining a recognition result corresponding to the preset object in the initial image based on the initial image and the plurality of sets of third coordinate information.
In a second aspect, the present disclosure provides an identification device, the device comprising:
the first determining module is used for determining a plurality of images to be detected and first coordinate information of each image to be detected in the initial image based on the initial image and a sliding window sampling method;
the second determining module is used for determining, for each of the plurality of images to be detected, second coordinate information of a preset object in the image to be detected;
the third determining module is used for determining third coordinate information of the preset object in the initial image based on the second coordinate information and the first coordinate information of the image to be detected in the initial image, and obtaining a plurality of groups of third coordinate information corresponding to the plurality of images to be detected;
the identification module is used for determining an identification result corresponding to the preset object in the initial image based on the initial image and the plurality of sets of third coordinate information.
In a third aspect, the present disclosure provides an electronic device comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any of the possible implementation manners of the first aspect via execution of the executable instructions.
In a fourth aspect, the presently disclosed embodiments provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the possible implementations of the first aspect.
According to the identification method provided by the present disclosure, the initial image can be sampled and divided into a plurality of images to be detected through the sliding window sampling method, and the first coordinate information of each image to be detected in the initial image is determined; dividing the initial image into a plurality of images to be detected greatly reduces the amount of data processed at a time. For each image to be detected, the second coordinate information of the preset object in that image is determined; the third coordinate information of the preset object in the initial image is then determined from the second coordinate information and the first coordinate information of the image to be detected in the initial image, yielding a plurality of groups of third coordinate information corresponding to the plurality of images to be detected; finally, the recognition content corresponding to the preset object in the initial image is determined based on the initial image and the plurality of groups of third coordinate information. In this way, the third coordinate information of the preset object in the initial image is calculated from the obtained second coordinate information of the preset object in the image to be detected, and the recognition result corresponding to the preset object is finally recognized from the third coordinate information. This not only ensures the accuracy of the recognition result of the preset object, but also, by dividing the initial image into a plurality of images to be detected for processing, keeps the video-memory or memory occupation during the operation of the neural network within a certain range and reduces the risk of video-memory overflow of the system.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the related art, a brief description will be given below of the drawings required for the embodiments or the related technical descriptions, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and other drawings may be obtained according to the drawings without any inventive effort for a person of ordinary skill in the art. In the drawings:
fig. 1 is a schematic flow chart of an identification method according to an embodiment of the disclosure;
FIG. 2 is a schematic view of a sliding window moving longitudinally according to an embodiment of the disclosure;
FIG. 3 is a schematic view illustrating a lateral movement of a sliding window according to an embodiment of the disclosure;
fig. 4 is a display diagram of second coordinate information corresponding to a preset object in an image to be detected according to an embodiment of the present disclosure;
fig. 5 is a display diagram of a preset object in two adjacent images to be detected according to an embodiment of the present disclosure;
fig. 6 is a display diagram of a preset object in one to-be-detected image of two adjacent to-be-detected images according to an embodiment of the present disclosure;
fig. 7 is a display diagram of a preset object in another to-be-detected image of two adjacent to-be-detected images according to an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of an identification device according to an embodiment of the disclosure;
fig. 9 is a schematic block diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, are described in detail below. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present disclosure and are not to be construed as limiting the present disclosure.
The terms first and second and the like in the description, the claims and the drawings of embodiments of the disclosure are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the disclosure described herein may be capable of implementation in sequences other than those illustrated or described herein, for example. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the development and popularization of artificial intelligence technology, Optical Character Recognition (OCR) has been applied and taken root in various industries, and is one of the essential capabilities in many technical systems. The current mainstream OCR technology for Chinese and English can basically be divided into two steps: character detection and character recognition. Character detection aims at detecting the range of the character region in an image, so that the character-region image can be cropped from the image and input into a character recognition system.
The input of the current character detection methods is the original image to be detected, and the output is the position coordinates of the characters in the image. However, when the pixel size (resolution) of the image to be detected is too large, for example when the side length reaches tens of thousands of pixels or more, inputting the image to be detected at one time into a neural network with a built-in character detection algorithm causes huge video-memory or memory occupation during the operation of the neural network, and the risk of video-memory or memory overflow of the system is high. One remedy is to scale the original image to be detected to a fixed, small size before inputting it into the neural network with the built-in character detection algorithm, so that the video-memory or memory occupation during the operation of the neural network can be controlled within a certain range. However, this has a notable drawback: the imaging quality of small characters in the original image to be detected may be degraded, which affects the subsequent character recognition, and in severe cases directly causes the character detection target to be lost.
The technical scheme disclosed by the disclosure is mainly applied to the technical fields of character recognition and the like.
The following describes the technical scheme of the present disclosure and how the technical scheme of the present disclosure solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an identification method according to an exemplary embodiment of the present disclosure, where the method may be applied to various electronic devices with text recognition functions. The method comprises S101-S104:
S101, determining a plurality of images to be detected and first coordinate information of each image to be detected in the initial image based on the initial image and a sliding window sampling method.
In some embodiments, the first coordinate information mainly refers to the coordinate information of a preset vertex of the image to be detected in the initial image. For example, the preset vertex is the vertex of the upper-left corner of the image to be detected.
In some embodiments, determining a number of images to be detected based on an initial image and a sliding window sampling method includes:
determining the height and width of a sliding window; according to the sliding window sampling method, moving the sliding window in the initial image a plurality of times according to a preset initial position, a preset moving step, and the height and width; and taking the image in the region covered by the sliding window each time as an image to be detected, so as to obtain a plurality of images to be detected.
The height and width of the sliding window are consistent with the height and width of the image to be detected. Taking a text recognition system as an example, the width and height of the image to be detected (that is, the height and width of the sliding window) are the input size of the text detection model in the system, and depend mainly on the actual maximum bearing capacity of the text detection model in the production environment.
In some embodiments, the preset initial position mainly refers to an initial position where the sliding window starts to move. For example, in an initial image, the sliding window is moved longitudinally or laterally from the position of the upper left corner of the initial image, which is the preset initial position.
In some embodiments, the movement step primarily refers to the distance that the sliding window moves longitudinally or laterally.
In a specific implementation, taking an initial image A as an example: after the height and width of the sliding window are determined, the sliding window starts to move from the upper-left corner of the initial image; each time the sliding window moves transversely or longitudinally by the moving step, the image in the region covered by the sliding window is taken as one image to be detected; the sliding window continues to move until the whole of initial image A has successively fallen into the sliding window.
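The sliding-window sampling described above can be sketched as follows. This is a minimal illustration, assuming axis-aligned windows whose last row and column are clamped to the image border as described in the sampling steps below; the function names (`window_starts`, `sample_windows`) are illustrative and not from the patent.

```python
from typing import List, Tuple

def window_starts(length: int, win: int, overlap: int) -> List[int]:
    """Start offsets of windows of size `win` with `overlap` pixels of
    overlap, the last window clamped so it ends exactly at `length`."""
    stride = win - overlap
    starts = list(range(0, max(length - win, 0) + 1, stride))
    if starts[-1] + win < length:   # last regular stride falls short: clamp
        starts.append(length - win)
    return starts

def sample_windows(h: int, w: int, win_h: int, win_w: int,
                   overlap_h: int, overlap_w: int) -> List[Tuple[int, int, int, int]]:
    """Return (x_left, y_top, x_right, y_bottom) for every sampled window."""
    boxes = []
    for y in window_starts(h, win_h, overlap_h):
        for x in window_starts(w, win_w, overlap_w):
            boxes.append((x, y, x + win_w, y + win_h))
    return boxes
```

For example, `sample_windows(11, 10, 4, 4, 1, 1)` yields 4 × 3 = 12 overlapping windows that together cover the whole 11×10 image.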
In some implementations, determining first coordinate information of each image to be detected in the initial image based on the initial image and a sliding window sampling method includes S1011-S1015:
s1011, determining the number of the first group of images to be detected acquired by longitudinally moving the sliding window.
The number of images to be detected acquired by longitudinally moving the sliding window is determined according to the following formula:

n_H = ceiling((H - overlap_H) / (sampling_H - overlap_H))

wherein H is the height of the initial image, ceiling is an upward rounding function, n_H is the number of images to be detected acquired by longitudinally moving the sliding window, overlap_H is the height of the overlapping portion between two adjacent images to be detected obtained by longitudinally moving the sliding window, and sampling_H is the height of the sliding window, as can be seen in fig. 2.
In some embodiments, the height of the overlapping portion between two adjacent images to be detected acquired by longitudinally moving the sliding window mainly refers to the overlapping pixel distance. The overlapping part is arranged to ensure that characters near the boundary of the sampled image to be detected are completely displayed in the image to be detected at least once, so that the characters are prevented from being lost due to incomplete sampling.
S1012, determining ordinate information of the first group of images to be detected.
The ordinate information of the first set of images to be detected is determined according to the following formulas:

y_top_i = (i - 1) × (sampling_H - overlap_H)
y_bottom_i = y_top_i + sampling_H

wherein y_top_i is the ordinate of the upper boundary of the i-th image to be detected, y_bottom_i is the ordinate of the lower boundary of the i-th image to be detected, and i takes a value in [1, n_H]; each ordinate is shown in fig. 2.
In some embodiments, as shown in fig. 2, when H - overlap_H is not an integer multiple of the vertical moving step Stride_H, the moving step of the n_H-th sample differs from the step Stride_H used in the previous n_H - 1 samples: the ordinate y_bottom of the n_H-th sample is taken directly as H. This ensures consistency of the sample size and the integrity of the preset object.
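As a hedged numeric sketch of the vertical counting and ordinate computation above (the concrete values H = 1000, sampling_H = 320, overlap_H = 40 are invented for illustration, not taken from the patent):

```python
import math

# Illustrative values (assumptions).
H, sampling_H, overlap_H = 1000, 320, 40

stride_H = sampling_H - overlap_H             # vertical moving step Stride_H
n_H = math.ceil((H - overlap_H) / stride_H)   # number of vertical samples

# Ordinates of each window; the last window is clamped so y_bottom == H
# when H - overlap_H is not an integer multiple of the stride.
y_tops = [(i - 1) * stride_H for i in range(1, n_H + 1)]
y_bottoms = [t + sampling_H for t in y_tops]
if y_bottoms[-1] != H:
    y_bottoms[-1] = H
    y_tops[-1] = H - sampling_H
```

Here n_H = ceiling(960 / 280) = 4, and the fourth window is clamped to the range [680, 1000].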
S1013, determining the number of the second group of images to be detected acquired by the transverse moving sliding window.
The number of images to be detected obtained by transversely moving the sliding window is determined according to the following formula:

n_W = ceiling((W - overlap_W) / (sampling_W - overlap_W))

wherein W is the width of the initial image, n_W is the number of images to be detected obtained by laterally moving the sliding window, overlap_W is the width of the overlapping portion between two adjacent images to be detected obtained by laterally moving the sliding window, and sampling_W is the width of the sliding window, as can be seen in fig. 3.
In some embodiments, the width of the overlapping portion between two adjacent images to be detected acquired by laterally moving the sliding window mainly refers to the overlapping pixel distance. The overlapping part is arranged to ensure that characters near the boundary of the sampled image to be detected are completely displayed in the image to be detected at least once, so that the characters are prevented from being lost due to incomplete sampling.
S1014, determining abscissa information of the second group of images to be detected.
The abscissa information of the second set of images to be detected is determined according to the following formulas:

x_left_j = (j - 1) × (sampling_W - overlap_W)
x_right_j = x_left_j + sampling_W

wherein x_left_j is the abscissa of the left boundary of the j-th image to be detected, x_right_j is the abscissa of the right boundary of the j-th image to be detected, and j takes a value in [1, n_W]; each abscissa is shown in fig. 3.
In some embodiments, as shown in fig. 3, when W - overlap_W is not an integer multiple of the lateral moving step Stride_W, the moving step of the n_W-th sample differs from the step Stride_W used in the previous n_W - 1 samples: the abscissa x_right of the n_W-th sample is taken directly as W. This ensures consistency of the sample size and the integrity of the preset object.
S1015, determining first coordinate information of each image to be detected in the initial image based on the ordinate information and the abscissa information.
The first coordinate information set of each image to be detected in the initial image is determined based on the ordinate information and the abscissa information according to the following formula:

R(i, j) = [(x_left_j, y_top_i), (x_right_j, y_top_i), (x_left_j, y_bottom_i), (x_right_j, y_bottom_i)], i ∈ [1, n_H], j ∈ [1, n_W]

wherein R(i, j) is the first coordinate information set of the image to be detected.
In this embodiment, from the first coordinate information set, the coordinate information of the vertex at the same position of each image to be detected in the initial image (for example, the upper-left vertex) is selected as the first coordinate information of that image to be detected in the initial image. For example, the first coordinate information Region(i, j):

Region(i, j) = [x_left_j, y_top_i], i ∈ [1, n_H], j ∈ [1, n_W].
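A minimal sketch of the coordinate sets R(i, j) and Region(i, j) described above, assuming the ordinate and abscissa lists from S1012 and S1014 have already been computed; the helper names are illustrative, not from the patent.

```python
from typing import List, Tuple

def R(i: int, j: int,
      x_lefts: List[int], x_rights: List[int],
      y_tops: List[int], y_bottoms: List[int]) -> List[Tuple[int, int]]:
    """First coordinate information set: the four vertices of window (i, j),
    with i and j 1-based as in the patent text."""
    xl, xr = x_lefts[j - 1], x_rights[j - 1]
    yt, yb = y_tops[i - 1], y_bottoms[i - 1]
    return [(xl, yt), (xr, yt), (xl, yb), (xr, yb)]

def region(i: int, j: int,
           x_lefts: List[int], y_tops: List[int]) -> Tuple[int, int]:
    """First coordinate information: the upper-left vertex of window (i, j)."""
    return (x_lefts[j - 1], y_tops[i - 1])
```

For instance, with two columns starting at x = 0 and x = 280 (width 320) and two rows starting at y = 0 and y = 280 (height 320), R(1, 2, …) gives the four corners of the top-right window.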
S102, for each image to be detected, determining second coordinate information of a preset object in the image to be detected.
In some embodiments, the preset object is an object with the function of expressing information, such as a text, a symbol or a graph.
In some embodiments, determining the second coordinate information of the preset object in the image to be detected includes steps S1021-S1022:
s1021, a first connected domain corresponding to a preset object in the image to be detected and first edge information of the first connected domain are obtained.
In some embodiments, the connected domain is defined as follows: a region G in the plane is called a connected domain if, for any simple closed curve drawn within the region, the interior of the closed curve always belongs to G.
In some embodiments, according to a position detection algorithm, a detection operation is performed on a position of a preset object in an image to be detected, so as to obtain a first connected domain of the preset object in the image to be detected and first edge information of the first connected domain.
For example, taking the preset object as text, the DBNet++ text detection algorithm is adopted: a feature map of the text in the image to be detected is obtained through a segmentation model based on adaptive multi-scale feature fusion, and at the same time the text and the background are distinguished pixel by pixel on a text threshold map obtained by processing the image to be detected through a differentiable binarization module, so as to obtain the accurate first connected domain corresponding to the text and the first edge information of that first connected domain. Each first connected domain includes at least one text element.
S1022, determining a minimum envelope rectangle corresponding to the first connected domain based on the first edge information, and determining vertex coordinate information of the minimum envelope rectangle as second coordinate information of the preset object in the image to be detected.
In some embodiments, the minimum envelope rectangle is primarily the minimum bounding rectangle that encloses the primitive and is parallel to the x, y axes.
In some embodiments, according to the first connected domain of the preset object in the image to be detected and the first edge information of the first connected domain obtained in step S1021, the minimum envelope rectangle of the preset object in the image to be detected is obtained through a predetermined function such as cv2.minAreaRect. The second coordinate information includes a plurality of first vertex coordinates: the coordinate information of the four vertices of the minimum envelope rectangle (four first vertex coordinates) together forms the second coordinate information of the preset object in the image to be detected.
As shown in fig. 4, taking the preset object as text, the minimum envelope rectangle of text 1 "AB" is the rectangle formed by the dashed lines around "AB", and the coordinates of its four vertices in the image to be detected form the second coordinate information of text 1 "AB" in the image to be detected: [(x1, y1), (x2, y1), (x1, y2), (x2, y2)].
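Since the text defines the minimum envelope rectangle as the smallest rectangle parallel to the x and y axes that encloses the region, it can be derived from the connected domain's edge points by taking coordinate extremes, as in this pure-Python sketch. (In OpenCV, `cv2.boundingRect` returns the axis-aligned box while `cv2.minAreaRect` returns a rotated rectangle; which the patent intends is not entirely clear, so this is only an illustration of the axis-aligned reading.)

```python
from typing import Iterable, List, Tuple

def envelope_rect(edge_points: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Four vertex coordinates [(x1, y1), (x2, y1), (x1, y2), (x2, y2)] of the
    axis-aligned minimum envelope rectangle of the given edge points."""
    pts = list(edge_points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x1, x2, y1, y2 = min(xs), max(xs), min(ys), max(ys)
    return [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
```

The returned vertex list follows the same ordering as the second coordinate information in the example above.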
And S103, determining third coordinate information of a preset object in the initial image based on the second coordinate information and the first coordinate information of the image to be detected in the initial image, and obtaining a plurality of groups of third coordinate information corresponding to a plurality of images to be detected.
In some embodiments, determining third coordinate information of the preset object in the initial image based on the second coordinate information and the first coordinate information of the image to be detected in the initial image includes:
summing the first vertex coordinates and the first coordinate information aiming at each first vertex coordinate to obtain second vertex coordinates, and further obtaining a plurality of second vertex coordinates corresponding to the plurality of first vertex coordinates; and taking the coordinates of the plurality of second vertexes as third coordinate information of the preset object in the initial image.
In order to ensure the accuracy of the first coordinate information and the second coordinate information, a fixed position point is taken as the origin of the coordinate system in both the initial image and the image to be detected when acquiring the first coordinate information and the second coordinate information. For example, in the initial image, the top-left corner is taken as the origin, the direction to the right of the origin as the positive x-axis, and the direction below the origin as the positive y-axis; in the image to be detected, the top-left vertex is likewise taken as the origin, with the positive x-axis to its right and the positive y-axis below it.
Therefore, in the embodiment of the disclosure, for the first coordinate information of each image to be detected, the first coordinate information of the image to be detected and each first vertex coordinate in the second coordinate information of the preset object in that image are summed to obtain a second vertex coordinate corresponding to each first vertex coordinate, and the plurality of second vertex coordinates are used as the third coordinate information of the preset object in the initial image.
For example, taking a preset object as a text, if the first coordinate information of the image A to be detected is (1, 1), the second coordinate information of text 1 in image A is [(1, 1), (2, 1), (1, 2), (2, 2)], that of text 2 is [(3, 4), (4, 4), (3, 5), (4, 5)], and that of text 3 is [(6, 7), (7, 7), (6, 8), (7, 8)], then the third coordinate information of text 1 in the initial image is [(2, 2), (3, 2), (2, 3), (3, 3)], that of text 2 is [(4, 5), (5, 5), (4, 6), (5, 6)], and that of text 3 is [(7, 8), (8, 8), (7, 9), (8, 9)].
If the first coordinate information of the image B to be detected is (4, 4), the second coordinate information of text 1 in image B is [(2, 2), (3, 2), (2, 3), (3, 3)], that of text 2 is [(4, 5), (5, 5), (4, 6), (5, 6)], and that of text 3 is [(7, 8), (8, 8), (7, 9), (8, 9)], then the third coordinate information of text 1 in the initial image is [(6, 6), (7, 6), (6, 7), (7, 7)], that of text 2 is [(8, 9), (9, 9), (8, 10), (9, 10)], and that of text 3 is [(11, 12), (12, 12), (11, 13), (12, 13)].
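The coordinate translation in these examples is a simple vector addition, and can be sketched as follows (the helper name `to_initial_coords` is hypothetical):

```python
def to_initial_coords(first_coord, second_coords):
    """Translate the second coordinate information of a preset object
    into the initial image: each first vertex coordinate is summed
    with the first coordinate information (the window's top-left
    offset in the initial image), giving the third coordinate
    information."""
    ox, oy = first_coord
    return [(x + ox, y + oy) for x, y in second_coords]
```

With the values above, an offset of (1, 1) maps [(1, 1), (2, 1), (1, 2), (2, 2)] to [(2, 2), (3, 2), (2, 3), (3, 3)].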
In some embodiments, taking a preset object as a text, as shown in fig. 5, two adjacent images to be detected (image C to be detected and image D to be detected) each contain corresponding preset objects; that is, the image C to be detected includes text 4, text 5 and text 6, and the image D to be detected includes text 7, text 8 and text 9. In the overlapping area of the two adjacent images (the area where the solid line and the broken line overlap in fig. 5), there are three cases:
First, the text 4 in the image C to be detected is identical to the text 7 in the image D to be detected. Second, the text 8 in the image D to be detected is part of the text 5 in the image C to be detected. Finally, the text 6 in the image C to be detected and the text 9 in the image D to be detected are both incomplete texts, and text 6 and text 9 can be combined into one complete text.
In some embodiments, the method further comprises: traversing, for two adjacent images to be detected, the first preset objects in the two images to be detected, and determining the pixel area intersection ratio of the first preset objects contained in the two images to be detected;
and in the case that the pixel area intersection ratio is greater than a first threshold value, when the ratio of the largest of the pixel areas of the two first preset objects contained in the two images to be detected to the pixel area of the union of the two first preset objects is greater than a third threshold value, deleting the third coordinate information corresponding to the second preset object with the smallest pixel area in the two images to be detected from the plurality of sets of third coordinate information.
In some embodiments, the pixel area intersection ratio mainly refers to the ratio of the intersection to the union of the pixel areas of two preset objects.
In some embodiments, a pixel area intersection ratio of two first preset objects in two adjacent images to be detected greater than the first threshold value indicates that the two first preset objects are interlaced; if the pixel area intersection ratio is not greater than the first threshold value, the two preset objects are not interlaced.
For example, take the image C to be detected and the image D to be detected, with the first threshold value being 0 and the third threshold value being 0.99, as shown in figs. 5 to 7. The pixel area of text 5 in the image C to be detected is determined to be 8, the pixel area of text 8 in the image D to be detected is determined to be 6, and the area of the overlapping portion of the pixels of text 5 and text 8 is determined to be 6, so the pixel area intersection ratio of text 5 and text 8 is 0.75. Since 0.75 is greater than the first threshold value 0, the ratio of the largest pixel area among text 5 and text 8 (the pixel area of text 5, which is 8) to the pixel area of the union of text 5 and text 8 (which is also 8) is calculated to be 1. Since this ratio 1 is greater than the third threshold value 0.99, the third coordinate information corresponding to text 8, whose pixel area is 6, is deleted from the sets of third coordinate information.
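The two-threshold decision rule in this example can be sketched as follows (the helper names `area_iou` and `should_delete_smaller` are hypothetical; the default thresholds are the example values 0 and 0.99 from the text):

```python
def area_iou(area_a, area_b, area_inter):
    """Pixel area intersection ratio: intersection over union.
    Returns (ratio, union_area)."""
    area_union = area_a + area_b - area_inter
    return area_inter / area_union, area_union

def should_delete_smaller(area_a, area_b, area_inter,
                          first_threshold=0.0, third_threshold=0.99):
    """True when the two objects are interlaced (intersection ratio
    above the first threshold) and the larger object nearly covers
    their union (ratio above the third threshold), i.e. the smaller
    object's third coordinate information should be deleted."""
    iou, union = area_iou(area_a, area_b, area_inter)
    if iou <= first_threshold:   # not interlaced
        return False
    return max(area_a, area_b) / union > third_threshold
```

For text 5 and text 8 (areas 8 and 6, intersection 6), the intersection ratio is 0.75 and the max-area/union ratio is 1, so the smaller object is deleted; for text 6 and text 9 (areas 12 and 11, union 16), the max-area/union ratio is 0.75 and both objects are kept as incomplete.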
In some embodiments, when the ratio of the largest of the pixel areas of the two first preset objects contained in the two images to be detected to the pixel area of the union of the two first preset objects is not greater than the third threshold, it indicates that neither of the preset objects contained in the two adjacent images to be detected is complete. That is, as with text 6 and text 9 in figs. 6-7: in this embodiment, text 6 in the image C to be detected is "ABCDEFGHJKLM", text 9 in the image D to be detected is "FGHJKLMBFDE", and when the two adjacent images to be detected are combined, the text "ABCDEFGHJKLMBFDE" in fig. 5 is one complete object.
For example, for the image C to be detected and the image D to be detected, with the third threshold value being 0.99, as shown in figs. 5-7, the pixel area of text 6 in the image C to be detected is determined to be 12, the pixel area of text 9 in the image D to be detected is determined to be 11, and the pixel area of the union of text 6 and text 9 is determined to be 16. The ratio of the largest pixel area among text 6 and text 9 (the pixel area of text 6, which is 12) to the pixel area of the union of text 6 and text 9 (which is 16) is calculated to be 0.75. Since 0.75 is not greater than the third threshold value 0.99, text 6 and text 9 are incomplete.
In this embodiment, in the case that the ratio of the largest pixel area of the two first preset objects contained in the two images to be detected to the pixel area of the union of the two first preset objects is not greater than the third threshold value, the third coordinate information of the preset object is determined as in the following steps S1031 to S1035:
s1031, determining a third preset object in the initial image based on the initial image and the first preset objects respectively contained in the two adjacent images to be detected.
For example, taking the first preset object contained in the image C to be detected as text 6 and the first preset object contained in the image D to be detected as text 9, as shown in figs. 5-7, text 6 in the image C to be detected is "ABCDEFGHJKLM" and text 9 in the image D to be detected is "FGHJKLMBFDE"; according to the first preset objects in the image C to be detected and the image D to be detected, the third preset object in the initial image is determined, i.e., "ABCDEFGHJKLMBFDE" shown in fig. 5.
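Purely as an illustration of how the two partial texts in this example combine into the third preset object: in the embodiment the merge is performed on connected domains in the initial image, but the string-level effect can be sketched as an overlap merge (the helper name `merge_overlapping_text` is hypothetical):

```python
def merge_overlapping_text(left, right):
    """Merge two partial texts from adjacent windows by their longest
    overlap: find the longest suffix of `left` that is also a prefix
    of `right`, and keep it only once."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + right  # no overlap: simple concatenation
```

Merging "ABCDEFGHJKLM" with "FGHJKLMBFDE" over their shared run "FGHJKLM" gives "ABCDEFGHJKLMBFDE", the complete object of fig. 5.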
According to the method, a reasonable overlapping sampling area is arranged between two adjacent images to be detected, and the preset objects are combined in subsequent processing, so that the boundary of local sampling does not cause omission of preset objects. Moreover, since the quality of each image to be detected is identical to that of the initial image, the detection result of the preset object is accurate and reliable.
S1032, obtaining a second connected domain and second edge information of the second connected domain corresponding to the third preset object in the initial image.
In some embodiments, the second connected domain corresponding to the third preset object and the second edge information corresponding to the second connected domain are determined; for the specific determination method, reference may be made to the description above, which is not repeated here.
S1033, based on the second edge information, determining a minimum envelope rectangle corresponding to the second connected domain, and determining vertex coordinate information of the minimum envelope rectangle corresponding to the second connected domain as fourth coordinate information of the third preset object in the initial image.
In some embodiments, in combination with the third preset object "ABCDEFGHJKLMBFDE" obtained in step S1031, the dashed line around the third preset object "ABCDEFGHJKLMBFDE" in fig. 5 is the minimum envelope rectangle corresponding to the second connected domain of the third preset object, and the vertex coordinate information of this minimum envelope rectangle is the fourth coordinate information of the third preset object in the initial image.
S1034, deleting the third coordinate information corresponding to the first preset object contained in each of the two adjacent images to be detected from the plurality of groups of third coordinate information.
And S1035, adding the fourth coordinate information as new third coordinate information to the groups of third coordinate information.
S104, determining a recognition result corresponding to the preset object in the initial image based on the initial image and the plurality of groups of third coordinate information.
In some embodiments, taking a preset object as a text as an example, the third coordinate information of all the texts in the initial image is obtained, and text region images are then cut out according to the third coordinate information in the initial image; for an inclined text region image, the image is rotated and a non-inclined result is obtained through an interpolation method. Then, the text region images are normalized to a uniform size by a scaling method (such as the cv2.resize method) and input into a text recognition algorithm, so that the text content can be obtained.
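The cropping step can be sketched for the axis-aligned case on a row-major image (the helper name `crop_region` is hypothetical; in the embodiment the crop would subsequently be resized with cv2.resize and passed to the recognition algorithm):

```python
def crop_region(image, third_coords):
    """Crop the axis-aligned text region given by the four vertex
    coordinates [(x1, y1), (x2, y1), (x1, y2), (x2, y2)] from a
    row-major image (a list of rows), bounds inclusive."""
    xs = [x for x, _ in third_coords]
    ys = [y for _, y in third_coords]
    x1, x2, y1, y2 = min(xs), max(xs), min(ys), max(ys)
    return [row[x1:x2 + 1] for row in image[y1:y2 + 1]]
```

For a 3x4 image, the region [(1, 0), (2, 0), (1, 1), (2, 1)] selects columns 1-2 of rows 0-1.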
According to the identification method, the initial image can be sampled and divided into a plurality of images to be detected through the sliding window sampling method, and the first coordinate information of each image to be detected in the initial image is determined; sampling and dividing the initial image into a plurality of images to be detected can greatly reduce the data processing load. For each image to be detected, the second coordinate information of the preset object in the image to be detected is determined; the third coordinate information of the preset object in the initial image is then determined according to the second coordinate information and the first coordinate information of the image to be detected in the initial image, and a plurality of sets of third coordinate information corresponding to the plurality of images to be detected are obtained. Finally, the recognition result corresponding to the preset object in the initial image is determined based on the initial image and the plurality of sets of third coordinate information. In this way, the accuracy of the recognition result of the preset object is ensured; moreover, since the initial image is divided into a plurality of images to be detected for processing, the display memory or memory occupied during the operation of the neural network can be controlled within a certain range, reducing the risk of system display memory or memory overflow.
Fig. 8 is a schematic structural diagram of an identification device according to an exemplary embodiment of the present disclosure, where the structure includes: a first determination module 201, a second determination module 202, a third determination module 203, and an identification module 204.
A first determining module 201, configured to determine a plurality of images to be detected and first coordinate information of each image to be detected in the initial image based on the initial image and a sliding window sampling method;
a second determining module 202, configured to determine, for each image to be detected, second coordinate information of the preset object in the image to be detected;
the third determining module 203 is configured to determine third coordinate information of a preset object in the initial image based on the second coordinate information and first coordinate information of the image to be detected in the initial image, and obtain a plurality of sets of third coordinate information corresponding to a plurality of images to be detected;
the recognition module 204 is configured to determine a recognition result corresponding to the preset object in the initial image based on the initial image and the plurality of sets of third coordinate information.
In some embodiments, the first determining module 201 is further configured to determine a height and a width of the sliding window;
and according to the sliding window sampling method, the height and the width, moving the sliding window in the initial image a plurality of times according to the preset initial position and the moving step length, and taking the image in the area covered by the sliding window each time as an image to be detected, so as to obtain a plurality of images to be detected.
In some embodiments, the first determining module 201 is further configured to determine a number of the first set of images to be detected acquired by moving the sliding window longitudinally;
determining ordinate information of a first group of images to be detected;
determining the number of the second group of images to be detected, which are acquired by the transversely moving sliding window;
determining abscissa information of a second group of images to be detected;
and determining first coordinate information of each image to be detected in the initial image based on the ordinate information and the abscissa information.
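The first coordinate information produced by this module can be sketched as the grid of window top-left offsets (a simplified sketch assuming a fixed step, windows starting at the origin, and no partial windows; the helper name `window_first_coords` is hypothetical):

```python
def window_first_coords(img_w, img_h, win_w, win_h, step_x, step_y):
    """First coordinate information (top-left vertex in the initial
    image) of each image to be detected produced by the sliding
    window, scanning left-to-right, then top-to-bottom."""
    coords = []
    for y in range(0, max(img_h - win_h, 0) + 1, step_y):
        for x in range(0, max(img_w - win_w, 0) + 1, step_x):
            coords.append((x, y))
    return coords
```

For a 4x4 initial image and a 2x2 window moved with step 2 in both directions, the four images to be detected have first coordinate information (0, 0), (2, 0), (0, 2), (2, 2).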
In some embodiments, the second determining module 202 is further configured to obtain a first connected domain corresponding to a preset object in the image to be detected and first edge information of the first connected domain;
and determining the minimum envelope rectangle corresponding to the first connected domain based on the first edge information, and determining the vertex coordinate information of the minimum envelope rectangle as the second coordinate information of the preset object in the image to be detected.
In some embodiments, the second coordinate information includes a plurality of first vertex coordinates.
In some embodiments, the third determining module 203 is further configured to sum, for each first vertex coordinate, the first vertex coordinate and the first coordinate information to obtain a second vertex coordinate, and further obtain a plurality of second vertex coordinates corresponding to the plurality of first vertex coordinates;
And taking the coordinates of the plurality of second vertexes as third coordinate information of the preset object in the initial image.
In some embodiments, the third determining module 203 is further configured to traverse, for two adjacent images to be detected, the first preset objects in the two images to be detected, and determine the pixel area intersection ratio of the first preset objects contained in the two images to be detected;
and in the case that the pixel area intersection ratio is greater than a first threshold value, when the ratio of the largest of the pixel areas of the two first preset objects contained in the two images to be detected to the pixel area of the union of the two first preset objects is greater than a third threshold value, delete the third coordinate information corresponding to the second preset object with the smallest pixel area in the two images to be detected from the plurality of sets of third coordinate information.
In some embodiments, the third determining module 203 is further configured to determine, if a ratio of a largest pixel area of the pixel areas of the two first preset objects included in the two images to be detected to a pixel area of a union of the two first preset objects is not greater than a third threshold, a third preset object in the initial image based on the initial image and the first preset objects included in each of the two adjacent images to be detected;
Acquiring a second connected domain corresponding to a third preset object and second edge information of the second connected domain in the initial image;
determining a minimum envelope rectangle corresponding to the second connected domain based on the second edge information, and determining vertex coordinate information of the minimum envelope rectangle corresponding to the second connected domain as fourth coordinate information of a third preset object in the initial image;
deleting third coordinate information corresponding to a first preset object contained in each of two adjacent images to be detected from a plurality of sets of third coordinate information;
the fourth coordinate information is added as new third coordinate information to the sets of third coordinate information.
For the execution principle and interaction process of the constituent modules in this apparatus embodiment, such as the first determining module 201, the second determining module 202, the third determining module 203, and the identifying module 204, reference may be made to the description of the method embodiments above.
According to the identification device of the disclosure, the initial image can be sampled and divided into a plurality of images to be detected through the sliding window sampling method, and the first coordinate information of each image to be detected in the initial image is determined; sampling and dividing the initial image into a plurality of images to be detected can greatly reduce the data processing load. For each image to be detected, the second coordinate information of the preset object in the image to be detected is determined; the third coordinate information of the preset object in the initial image is then determined according to the second coordinate information and the first coordinate information of the image to be detected in the initial image, and a plurality of sets of third coordinate information corresponding to the plurality of images to be detected are obtained. Finally, the recognition result corresponding to the preset object in the initial image is determined based on the initial image and the plurality of sets of third coordinate information. In this way, the accuracy of the recognition result of the preset object is ensured; moreover, since the initial image is divided into a plurality of images to be detected for processing, the display memory or memory occupied during the operation of the neural network can be controlled within a certain range, reducing the risk of system display memory or memory overflow.
It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus may perform the above method embodiments, and the foregoing and other operations and/or functions of each module in the apparatus are respectively for corresponding flows in each method in the above method embodiments, which are not described herein for brevity.
The apparatus of the embodiments of the present disclosure is described above in terms of functional modules with reference to the accompanying drawings. It should be understood that the functional modules may be implemented in hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, each step of the method embodiments in the embodiments of the present disclosure may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in software form, and the steps of the method disclosed in connection with the embodiments of the present disclosure may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a storage medium well-established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in a memory, and the processor reads information in the memory and, in combination with hardware, performs the steps in the above method embodiments.
Fig. 9 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure, which may include:
a memory 301 and a processor 302, the memory 301 being for storing a computer program and for transmitting the program code to the processor 302. In other words, the processor 302 may call and run a computer program from the memory 301 to implement the methods in the embodiments of the present disclosure.
For example, the processor 302 may be configured to perform the above-described method embodiments according to instructions in the computer program.
In some embodiments of the present disclosure, the processor 302 may include, but is not limited to:
a general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the present disclosure, the memory 301 includes, but is not limited to:
volatile memory and/or nonvolatile memory. The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable EPROM (EEPROM), or a flash Memory. The volatile memory may be random access memory (Random Access Memory, RAM) which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (Double Data Rate SDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), and Direct memory bus RAM (DR RAM).
In some embodiments of the present disclosure, the computer program may be partitioned into one or more modules that are stored in the memory 301 and executed by the processor 302 to perform the methods provided by the present disclosure. The one or more modules may be a series of computer program instruction segments capable of performing the specified functions, which are used to describe the execution of the computer program in the electronic device.
As shown in fig. 9, the electronic device may further include:
a transceiver 303, the transceiver 303 being connectable to the processor 302 or the memory 301.
The processor 302 may control the transceiver 303 to communicate with other devices, and in particular, may send information or data to other devices, or receive information or data sent by other devices. The transceiver 303 may include a transmitter and a receiver. The transceiver 303 may further include antennas, the number of which may be one or more.
It will be appreciated that the various components in the electronic device are connected by a bus system that includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
The present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present disclosure also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.
When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function, in whole or in part, according to embodiments of the present disclosure. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in various embodiments of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely a specific embodiment of the disclosure, but the protection scope of the disclosure is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the disclosure, and it should be covered in the protection scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method of identification, the method comprising:
determining first coordinate information of a plurality of images to be detected and each image to be detected in an initial image based on the initial image and a sliding window sampling method;
determining, for each image to be detected, second coordinate information of a preset object in the image to be detected;
determining third coordinate information of the preset object in the initial image based on the second coordinate information and the first coordinate information of the image to be detected in the initial image, and obtaining a plurality of groups of third coordinate information corresponding to the plurality of images to be detected;
and determining a recognition result corresponding to the preset object in the initial image based on the initial image and the plurality of sets of third coordinate information.
2. The method of claim 1, wherein the determining a number of images to be detected based on the initial image and the sliding window sampling method comprises:
determining the height and width of the sliding window;
and according to the sliding window sampling method, the height and the width, moving the sliding window in the initial image for a plurality of times according to a preset initial position and a moving step length, taking the image falling into the area in the sliding window each time as an image to be detected, and obtaining a plurality of images to be detected.
3. The method according to claim 2, wherein determining the first coordinate information of each image to be detected in the initial image based on the initial image and the sliding window sampling method comprises:
determining the number of images to be detected in a first group, which are obtained by moving the sliding window longitudinally;
determining ordinate information of the first group of images to be detected;
determining the number of images to be detected in a second group, which are obtained by moving the sliding window transversely;
determining abscissa information of the second group of images to be detected;
and determining first coordinate information of each image to be detected in the initial image based on the ordinate information and the abscissa information.
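The sliding-window sampling of claims 2 and 3 can be sketched as follows. This is an illustrative sketch, not the patented implementation; the window size, step lengths, and start position are assumed values, and the "first coordinate information" of each image to be detected is taken to be the window's top-left offset within the initial image.

```python
# Illustrative sketch (assumed parameters): slide a fixed-size window over
# the initial image, collecting each window rectangle as one "image to be
# detected" together with its offset in the initial image.

def sliding_windows(img_w, img_h, win_w, win_h, step_x, step_y, start=(0, 0)):
    """Yield (x0, y0, x1, y1) window rectangles covering the initial image."""
    windows = []
    y = start[1]
    while y + win_h <= img_h:          # longitudinal movement (rows)
        x = start[0]
        while x + win_w <= img_w:      # transverse movement (columns)
            windows.append((x, y, x + win_w, y + win_h))
            x += step_x
        y += step_y
    return windows

wins = sliding_windows(img_w=100, img_h=60, win_w=40, win_h=30,
                       step_x=20, step_y=15)
# First coordinate information of each image to be detected: the window's
# (x0, y0) top-left corner in the initial image.
first_coords = [(x0, y0) for x0, y0, _, _ in wins]
```

With these assumed values the window covers 4 transverse positions and 3 longitudinal positions, so 12 images to be detected are produced.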
4. The method according to claim 1, wherein determining the second coordinate information of the preset object in the image to be detected comprises:
acquiring a first connected domain corresponding to the preset object in the image to be detected, and first edge information of the first connected domain;
and determining a minimum envelope rectangle corresponding to the first connected domain based on the first edge information, and determining vertex coordinate information of the minimum envelope rectangle as second coordinate information of the preset object in the image to be detected.
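The minimum envelope rectangle of claim 4 can be sketched, for the axis-aligned case, as the tightest box containing the edge pixels of the connected domain. This is an assumption for illustration; a rotated minimum-area rectangle (e.g. as computed by OpenCV's `minAreaRect`) would need more machinery.

```python
# Illustrative sketch (axis-aligned assumption): from the edge pixels of a
# connected domain, compute the bounding box whose vertex coordinates serve
# as the "second coordinate information" of the preset object.

def min_envelope_rect(edge_pixels):
    """Return the four vertices (clockwise from top-left) of the
    axis-aligned bounding box of a set of (x, y) edge pixels."""
    xs = [p[0] for p in edge_pixels]
    ys = [p[1] for p in edge_pixels]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

rect = min_envelope_rect([(3, 7), (10, 2), (5, 9), (8, 4)])
```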
5. The method of any one of claims 1 to 4, wherein the second coordinate information includes a plurality of first vertex coordinates;
determining the third coordinate information of the preset object in the initial image based on the second coordinate information and the first coordinate information of the image to be detected in the initial image comprises:
summing the first vertex coordinate and the first coordinate information for each first vertex coordinate to obtain a second vertex coordinate, thereby obtaining a plurality of second vertex coordinates corresponding to the plurality of first vertex coordinates;
and taking the plurality of second vertex coordinates as the third coordinate information of the preset object in the initial image.
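The coordinate translation of claim 5 reduces to adding the window's offset to each window-local vertex. A minimal sketch, with assumed example values:

```python
# Illustrative sketch: map vertex coordinates detected inside a window
# (second coordinate information) back into the initial image by adding the
# window's top-left offset (first coordinate information).

def to_initial_coords(first_vertices, window_offset):
    """Translate window-local (x, y) vertices into initial-image coordinates."""
    ox, oy = window_offset
    return [(x + ox, y + oy) for x, y in first_vertices]

third = to_initial_coords([(3, 2), (10, 2), (10, 9), (3, 9)],
                          window_offset=(20, 15))
```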
6. The method of claim 5, wherein the method further comprises:
traversing first preset objects in every two adjacent images to be detected, and determining a pixel-area intersection-over-union ratio of the first preset objects contained in the two images to be detected;
and in the case that the pixel-area intersection-over-union ratio is greater than a first threshold, when the ratio of the larger of the pixel areas of the two first preset objects contained in the two images to be detected to the pixel area of the union of the two first preset objects is greater than a third threshold, deleting, from the plurality of sets of third coordinate information, the third coordinate information corresponding to a second preset object having the smaller pixel area in the two images to be detected.
7. The method of claim 6, wherein the method further comprises:
determining a third preset object in the initial image based on the initial image and the two first preset objects, in the case that the ratio of the larger of the pixel areas of the two first preset objects contained in the two images to be detected to the pixel area of the union of the two first preset objects is not greater than the third threshold;
acquiring a second connected domain corresponding to the third preset object and second edge information of the second connected domain in the initial image;
determining a minimum envelope rectangle corresponding to the second connected domain based on the second edge information, and determining vertex coordinate information of the minimum envelope rectangle corresponding to the second connected domain as fourth coordinate information of the third preset object in the initial image;
deleting third coordinate information corresponding to a first preset object contained in each of the two adjacent images to be detected from the plurality of sets of third coordinate information;
and adding the fourth coordinate information to the plurality of sets of third coordinate information as new third coordinate information.
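The de-duplication test of claims 6 and 7 can be sketched for axis-aligned boxes: if two detections from adjacent windows overlap strongly (IoU above a first threshold) and the larger box nearly fills their union (ratio above a third threshold), the smaller one is a redundant duplicate and is dropped; otherwise the claims merge the two detections by re-extracting the object from the initial image (not shown here). The threshold values are assumptions for illustration.

```python
# Illustrative sketch of the duplicate-suppression rule, using axis-aligned
# boxes (x0, y0, x1, y1) in initial-image coordinates.

def area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def inter(a, b):
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return max(0, x1 - x0) * max(0, y1 - y0)

def drop_smaller(a, b, iou_thresh=0.5, cover_thresh=0.9):
    """Return True if the smaller of boxes a, b should be deleted:
    IoU exceeds the first threshold AND the larger box covers more than
    cover_thresh of the union (the 'third threshold' of claim 6)."""
    i = inter(a, b)
    union = area(a) + area(b) - i
    if union == 0:
        return False
    iou = i / union
    cover = max(area(a), area(b)) / union
    return iou > iou_thresh and cover > cover_thresh

# A box almost entirely contained in a slightly larger one is dropped.
drop = drop_smaller((0, 0, 10, 10), (1, 1, 10, 10))
```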
8. An identification device, the device comprising:
a first determining module, configured to determine a plurality of images to be detected and first coordinate information of each image to be detected in an initial image, based on the initial image and a sliding window sampling method;
a second determining module, configured to determine, for each of the images to be detected, second coordinate information of a preset object in the image to be detected;
a third determining module, configured to determine third coordinate information of the preset object in the initial image based on the second coordinate information and the first coordinate information of the image to be detected in the initial image, thereby obtaining a plurality of sets of third coordinate information corresponding to the plurality of images to be detected;
and an identification module, configured to determine an identification result corresponding to the preset object in the initial image based on the initial image and the plurality of sets of third coordinate information.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-7 via execution of the executable instructions.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1-7.
CN202310417863.8A 2023-04-18 2023-04-18 Identification method, identification device, electronic equipment and computer medium Pending CN116453142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310417863.8A CN116453142A (en) 2023-04-18 2023-04-18 Identification method, identification device, electronic equipment and computer medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310417863.8A CN116453142A (en) 2023-04-18 2023-04-18 Identification method, identification device, electronic equipment and computer medium

Publications (1)

Publication Number Publication Date
CN116453142A true CN116453142A (en) 2023-07-18

Family

ID=87123340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310417863.8A Pending CN116453142A (en) 2023-04-18 2023-04-18 Identification method, identification device, electronic equipment and computer medium

Country Status (1)

Country Link
CN (1) CN116453142A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117437648A (en) * 2023-12-20 2024-01-23 国网浙江省电力有限公司金华供电公司 Processing method and system for automatic sample sealing equipment for electric power material construction
CN117437648B (en) * 2023-12-20 2024-03-05 国网浙江省电力有限公司金华供电公司 Processing method and system for automatic sample sealing equipment for electric power material construction

Similar Documents

Publication Publication Date Title
CN111488826B (en) Text recognition method and device, electronic equipment and storage medium
CN110610510B (en) Target tracking method and device, electronic equipment and storage medium
CN110781885A (en) Text detection method, device, medium and electronic equipment based on image processing
CN110502985B (en) Form identification method and device and form identification equipment
CN111259889A (en) Image text recognition method and device, computer equipment and computer storage medium
CN112560862B (en) Text recognition method and device and electronic equipment
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN111160065A (en) Remote sensing image ship detection method, device, equipment and storage medium thereof
CN112232341B (en) Text detection method, electronic device and computer readable medium
CN110349138B (en) Target object detection method and device based on example segmentation framework
CN116453142A (en) Identification method, identification device, electronic equipment and computer medium
US11748865B2 (en) Hierarchical image decomposition for defect detection
CN113537189A (en) Handwritten character recognition method, device, equipment and storage medium
CN112634235A (en) Product image boundary detection method and electronic equipment
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
CN111651971A (en) Form information transcription method, system, electronic equipment and storage medium
CN113221855B (en) Small target detection method and system based on scale sensitive loss and feature fusion
CN113537026B (en) Method, device, equipment and medium for detecting graphic elements in building plan
Kong et al. Automatic building outline extraction from ALS point cloud data using generative adversarial network
CN113496212A (en) Text recognition method and device for box-type structure and electronic equipment
CN110442719B (en) Text processing method, device, equipment and storage medium
CN110766003A (en) Detection method of fragment and link scene characters based on convolutional neural network
CN113762027B (en) Abnormal behavior identification method, device, equipment and storage medium
CN113762266B (en) Target detection method, device, electronic equipment and computer readable medium
CN113011132B (en) Vertical text recognition method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination