CN111832554A - Image detection method, device and storage medium - Google Patents


Info

Publication number
CN111832554A
CN111832554A (application CN201910300190.1A)
Authority
CN
China
Prior art keywords
character
detected
determining
image
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910300190.1A
Other languages
Chinese (zh)
Inventor
张恒瑞
郭明坚
宋翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
SF Tech Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd filed Critical SF Technology Co Ltd
Priority to CN201910300190.1A priority Critical patent/CN111832554A/en
Publication of CN111832554A publication Critical patent/CN111832554A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625License plates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application discloses an image detection method, apparatus, and storage medium. The image detection apparatus acquires a plurality of images to be detected of a detection object; performs image detection processing on each image to be detected to obtain a corresponding detection result, where the detection result includes a character string; determines an edit distance matrix corresponding to the plurality of images according to the character strings; and finally determines a target character string corresponding to the detection object according to the edit distance matrix. Because the target character string is determined from the edit distance matrix over multiple images to be detected, that is, the detection combines multiple images, the accuracy of image recognition can be improved.

Description

Image detection method, device and storage medium
Technical Field
The present application relates to the field of image recognition, and in particular, to an image detection method, an image detection apparatus, and a storage medium.
Background
Recognition of objects such as license plates, house plates, etc. is a popular application in the field of Optical Character Recognition (OCR).
Many parking lots, toll stations, and similar facilities already use OCR for license plate recognition. In these application scenarios, however, the license plate must be roughly 1 meter from the camera, so the imaging requirements are relatively strict. When the camera is far from the plate (for example, in the logistics industry, when recognizing the plate of a vehicle at a loading/unloading dock, the camera may be more than 3 meters away), or when environmental conditions degrade imaging quality, applying this method to image recognition yields relatively low accuracy.
Disclosure of Invention
The embodiment of the application provides an image detection method, an image detection device and a storage medium, which can improve the accuracy of image identification.
In one aspect, the present application provides an image detection method, including:
acquiring a plurality of images to be detected of a detection object;
respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string;
determining an editing distance matrix corresponding to the plurality of images to be detected according to the character string;
and determining a target character string corresponding to the detection object according to the editing distance matrix.
Optionally, in some embodiments, the determining, according to the character string, an edit distance matrix corresponding to the multiple images to be detected includes:
calculating the edit distance between every two of the character strings to obtain the edit distance between any two of the character strings;
and determining the editing distance matrix according to the editing distance.
Optionally, in some embodiments, the determining, according to the edit distance matrix, a target character string corresponding to the detection object includes:
determining the number of zero elements in each row of the edit distance matrix;
determining, among the zero-element numbers of the rows, the maximum zero element number;
determining whether the maximum zero element number is greater than a number threshold;
and if the maximum zero element number is greater than the number threshold, determining the character string corresponding to the zero elements in that row as the target character string.
Optionally, in some embodiments, the detection result further includes a character, a frame corresponding to the character, and a confidence of the character, and after determining whether the maximum zero element number is greater than a number threshold, the method further includes:
if the maximum zero element number is not greater than the number threshold, determining a target character corresponding to each character position according to the characters, the frames corresponding to the characters, and the confidences of the characters;
and determining the target character string according to the target character corresponding to each character position.
Optionally, in some embodiments, the determining, according to the character, a frame corresponding to the character, and the confidence level of the character, a target character corresponding to each character position includes:
determining a character position corresponding to each character according to the frame corresponding to the character;
determining characters corresponding to each character position;
determining the score of each character in each character position according to the character corresponding to each character position and the confidence coefficient of the character;
and determining the character with the highest score in each character position as the target character corresponding to each character position.
Optionally, in some embodiments, the acquiring a plurality of images to be detected of the detection object includes:
acquiring a video to be detected of a detected object;
and acquiring the plurality of images to be detected from the video to be detected.
Optionally, in some embodiments, the acquiring the plurality of images to be detected from the video to be detected includes:
determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected;
and determining the multiple images to be detected from each frame of image to be detected according to the confidence coefficient of each frame of image to be detected and the target pixel area of each frame of image to be detected.
Correspondingly, the present application further provides an image detection apparatus, specifically including:
the device comprises an acquisition unit, a detection unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of images to be detected of a detection object;
the processing unit is used for respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, and the detection result comprises a character string;
the first determining unit is used for determining an editing distance matrix corresponding to the multiple images to be detected according to the character strings;
and the second determining unit is used for determining a target character string corresponding to the detection object according to the editing distance matrix.
Optionally, in some embodiments, the first determining unit is specifically configured to:
respectively carrying out editing distance calculation on any two character strings in the character strings to obtain the editing distance between any two character strings;
and determining the editing distance matrix according to the editing distance.
Optionally, in some embodiments, the second determining unit includes:
the first determining subunit is used for determining the number of zero elements in each row of the edit distance matrix;
the second determining subunit is used for determining, among the zero-element numbers of the rows, the maximum zero element number;
a third determining subunit, configured to determine whether the maximum zero element number is greater than a number threshold;
and a fourth determining subunit, configured to determine, when the maximum zero element number is greater than the number threshold, the character string corresponding to the zero elements in that row as the target character string.
Optionally, in some embodiments, the detection result further includes a character, a frame corresponding to the character, and a confidence level of the character, and the apparatus further includes:
a third determining unit, configured to determine, when the maximum zero element number is not greater than the number threshold, a target character corresponding to each character position according to the character, a frame corresponding to the character, and a confidence of the character;
and the fourth determining unit is used for determining the target character string according to the target character corresponding to each character position.
Optionally, in some embodiments, the third determining unit is specifically configured to:
determining a character position corresponding to each character according to the frame corresponding to the character;
determining characters corresponding to each character position;
determining the score of each character in each character position according to the character corresponding to each character position and the confidence coefficient of the character;
and determining the character with the highest score in each character position as the target character corresponding to each character position.
Optionally, in some embodiments, the obtaining unit is specifically configured to:
acquiring a video to be detected of a detected object;
and acquiring the plurality of images to be detected from the video to be detected.
Optionally, in some embodiments, the obtaining unit is further specifically configured to:
determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected;
and determining the multiple images to be detected from each frame of image to be detected according to the confidence coefficient of each frame of image to be detected and the target pixel area of each frame of image to be detected.
Yet another aspect of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of the above-described aspects.
In addition, a storage medium is further provided, where multiple instructions are stored, and the instructions are suitable for being loaded by a processor to perform the steps in any one of the image detection methods provided in the embodiments of the present application.
In the embodiment of the present application, an image detection apparatus acquires a plurality of images to be detected of a detection object; performs image detection processing on each image to be detected to obtain a corresponding detection result, where the detection result includes a character string; determines an edit distance matrix corresponding to the plurality of images according to the character strings; and finally determines a target character string corresponding to the detection object according to the edit distance matrix. Because the target character string is determined from the edit distance matrix over multiple images to be detected, that is, the detection combines multiple images, the accuracy of image recognition can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of an image detection method provided in an embodiment of the present application;
FIG. 2 is a schematic flowchart of an image detection method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of another image detection method provided in the embodiments of the present application;
FIG. 4 is a schematic structural diagram of an image detection apparatus provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description that follows, specific embodiments of the present application will be described with reference to steps and symbols executed by one or more computers, unless otherwise indicated. Accordingly, these steps and operations will be referred to, several times, as being performed by a computer, the computer performing operations involving a processing unit of the computer in electronic signals representing data in a structured form. This operation transforms the data or maintains it at locations in the computer's memory system, which may be reconfigured or otherwise altered in a manner well known to those skilled in the art. The data maintains a data structure that is a physical location of the memory that has particular characteristics defined by the data format. However, while the principles of the application have been described in language specific to above, it is not intended to be limited to the specific embodiments shown, and it will be recognized by those of ordinary skill in the art that various of the steps and operations described below may be implemented in hardware.
The principles of the present application may be employed in numerous other general-purpose or special-purpose computing, communication environments or configurations. Examples of well known computing systems, environments, and configurations that may be suitable for use with the application include, but are not limited to, hand-held telephones, personal computers, servers, multiprocessor systems, microcomputer-based systems, mainframe-based computers, and distributed computing environments that include any of the above systems or devices.
The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The embodiment of the application provides an image detection method, an image detection device and a storage medium.
The image detection device can be integrated in the server, and the accuracy of image identification can be improved by the image detection device.
In some embodiments, the image detection apparatus in the present application may be applied at a loading/unloading dock of a logistics transfer station to identify the license plate numbers of vehicles at the dock. As shown in fig. 1, a schematic diagram of an application scenario of the image detection method in the embodiments of the present application, the camera in fig. 1 may be installed near the ceiling of the dock with its lens facing oncoming vehicles. The image detection apparatus obtains, through the camera, a surveillance video of the license plate region of the truck and extracts a plurality of images to be detected from the surveillance video. It then performs image detection processing on the plurality of images to obtain a detection result for each image, where the detection result includes a character string; determines an edit distance matrix corresponding to the plurality of images according to the character strings; and finally determines a target character string corresponding to the detection object according to the edit distance matrix. In this application scenario, the target character string is the license plate string.
The image detection apparatus in the present application may also be used to detect other objects, such as a doorplate, and the specific application scenario and the detection object are not limited herein.
Referring to fig. 2, fig. 2 is a schematic flowchart of an image detection method according to an embodiment of the present disclosure. The method comprises the following specific processes:
201. and acquiring a plurality of images to be detected of the detection object.
The object to be detected can be a license plate, a doorplate, or any other object whose character string needs to be recognized from an image.
In some embodiments, acquiring a plurality of images to be detected of a detection object includes: acquiring a video to be detected of a detected object; and then acquiring a plurality of images to be detected from the video to be detected.
In some embodiments, a camera may be used to capture a video to be detected of the detection object, a multi-object detection network (SSD, Single Shot MultiBox Detector) may be used to continuously detect the license plate in the video, and each detected license plate image and its corresponding confidence may be stored in a queue.
More specifically, acquiring a plurality of images to be detected from a video to be detected includes: determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected; and then determining a plurality of images to be detected from each frame of images to be detected according to the confidence coefficient of each frame of images to be detected and the target pixel area of each frame of images to be detected.
After the confidence and target pixel area of each frame to be detected are determined from the queue, each frame's confidence is multiplied by its target pixel area to serve as an evaluation score for that frame. The frames are then ranked by score, and the N highest-scoring license plate images are taken as the plurality of images to be detected of the detection object and normalized to a uniform size, where N may be any number from 6 to 10.
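The ranking step above can be sketched as follows. This is an illustrative Python sketch, not code from the patent; the `select_frames` name and the dictionary layout are assumptions.

```python
def select_frames(frames, n=8):
    """Select the n frames with the highest confidence * area score.

    frames: list of dicts with 'image', 'confidence', and 'area' keys,
    mirroring the queue of detected license plate images described above.
    """
    scored = sorted(frames, key=lambda f: f["confidence"] * f["area"],
                    reverse=True)
    return scored[:n]

# Hypothetical queue contents for illustration.
candidates = [
    {"image": "frame_0", "confidence": 0.91, "area": 4200},
    {"image": "frame_1", "confidence": 0.80, "area": 6100},
    {"image": "frame_2", "confidence": 0.60, "area": 3000},
]
best = select_frames(candidates, n=2)
```

A larger but lower-confidence detection can outrank a sharp but tiny one, which is the point of multiplying the two factors rather than using either alone.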
202. And respectively carrying out image detection processing on the plurality of images to be detected to obtain a detection result corresponding to each image to be detected.
The method and the device for recognizing the character strings can be used for respectively carrying out image detection processing on the multiple images to be detected based on a deep learning character string recognition method, and storing detailed detection results, wherein the detection results comprise the character strings of each image to be detected.
In some embodiments, the detection result further includes a character, a frame corresponding to the character, and a confidence level of the character.
203. And determining an edit distance matrix corresponding to the plurality of images to be detected according to the character strings.
Specifically, in some embodiments, the edit distance between any two character strings in the character strings is calculated respectively to obtain the edit distance between any two character strings; and then determining the edit distance matrix according to the edit distance.
That is, the edit distance (Levenshtein distance) is calculated pairwise between the character strings of the images to be detected. The plurality of images may be N images whose character strings are plateStr_i (i = 0, 1, ..., N-1), and the edit distance between string i and string j is L(i, j) = levenshteinDistance(plateStr_i, plateStr_j).
The edit distance quantifies the difference between two character strings; it is defined as the minimum number of single-character edits (substitutions, insertions, deletions) required to transform string A into string B.
After the editing distance calculation is carried out on every two character strings in the N character strings, the editing distance between every two character strings can be combined into an editing distance matrix.
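The pairwise computation above can be sketched as follows; this is an illustrative Python sketch using the standard Levenshtein dynamic program, and the function names are assumptions rather than text from the patent.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits (substitution, insertion,
    deletion) turning string a into string b, via a one-row DP table."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete from a
                                     dp[j - 1] + 1,       # insert into a
                                     prev + (ca != cb))   # substitute
    return dp[len(b)]

def edit_distance_matrix(strings):
    """N x N matrix with L(i, j) = levenshtein(strings[i], strings[j])."""
    n = len(strings)
    return [[levenshtein(strings[i], strings[j]) for j in range(n)]
            for i in range(n)]
```

The resulting matrix is symmetric with a zero main diagonal, matching the description in the following step.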
204. And determining a target character string corresponding to the detection object according to the editing distance matrix.
In some embodiments, the number of zero elements in each row of the edit distance matrix is determined first; the maximum zero element number among the per-row counts is then determined; and whether that maximum is greater than a number threshold is checked. If it is, the character string corresponding to the zero elements in that row is determined as the target character string. If it is not, a target character is determined for each character position according to the characters, the frames corresponding to the characters, and the confidences of the characters, and the target character string is then assembled from the per-position target characters. The number threshold may be N/2 rounded up; for example, if N is 8 the threshold is 4, and if N is 9 the threshold is 5.
The edit distance matrix is a symmetric square matrix whose main diagonal is 0, and the number of zero elements in a row equals the number of times that row's character string appears among the N detection results. If all detection results are identical, every row is all zeros and the edit distance matrix L is the zero matrix.
If L is the zero matrix, all detection results are the same, so the result is output directly and computation is reduced. Otherwise, the number of zero elements in each row of L is counted, the rows are sorted by that count, and the row with the most zero elements is taken; its zero-element count n is the number of identical detection results.
It is then judged whether n > ceil(N/2) holds. If so, the fused detection result is taken to be those n identical detection results, and the character string corresponding to the n zero elements is determined as the target character string.
If n > ceil(N/2) does not hold, the target character for each character position must be determined separately, and the target character string is then assembled from the per-position target characters.
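The majority test above can be sketched as follows. This is an illustrative Python sketch of the n > ceil(N/2) rule; the function name and the None fallback are assumptions.

```python
import math

def fuse_by_majority(strings, matrix):
    """Return the majority string if one string matches more than
    ceil(N/2) of the N results exactly; otherwise return None to signal
    that per-character voting is needed.

    matrix: N x N edit-distance matrix over the detected strings.
    """
    n_imgs = len(strings)
    zero_counts = [row.count(0) for row in matrix]  # exact-match tallies
    best_row = max(range(n_imgs), key=lambda i: zero_counts[i])
    if zero_counts[best_row] > math.ceil(n_imgs / 2):
        return strings[best_row]
    return None
```

Note that the diagonal zero counts each string as matching itself, consistent with the row-count interpretation given above.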
Specifically, because the N images to be detected have been normalized, and the detection result also includes each character in each image, each character's frame (i.e., bounding-box), and each character's confidence, the characters are first aligned on the basis of this information and then voted on position by position to obtain the best character (the target character) at each position. The character alignment step is as follows:
according to the bounding-box of each character in the detection result, a vertex coordinate x and a character width can be obtained, and then the horizontal coordinate of the center point of the corresponding character is calculated according to the vertex coordinate x and the character width:
Figure BDA0002027983760000081
Then one of the N detection results (each containing all characters of one image to be detected) is taken as the reference; for example, if one result contains 8 characters, those 8 characters serve as the reference. Each character in the remaining N-1 results is then compared against the reference from left to right. Specifically, a threshold is set, and if the minimum distance between a character's center point and the horizontal coordinate of some reference character's center point is smaller than the threshold, the character is assigned to that reference character.
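The center-point alignment can be sketched as follows. This is an illustrative Python sketch; the tuple layout, names, and the threshold value are assumptions, not taken from the patent.

```python
def align_to_reference(chars, ref_centers, threshold=10.0):
    """Assign characters to reference positions by nearest center point.

    chars: list of (char, x, width, confidence) tuples from one result.
    ref_centers: horizontal center coordinates of the reference characters.
    Returns {ref_index: [(char, confidence), ...]} buckets; characters
    farther than the threshold from every reference center are dropped.
    """
    buckets = {i: [] for i in range(len(ref_centers))}
    for ch, x, w, conf in chars:
        center = x + w / 2.0          # center_x = x + w / 2, as above
        nearest = min(range(len(ref_centers)),
                      key=lambda i: abs(center - ref_centers[i]))
        if abs(center - ref_centers[nearest]) < threshold:
            buckets[nearest].append((ch, conf))
    return buckets
```

Because the images were normalized to a uniform size, the same pixel threshold can plausibly be reused across all N results.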
When all characters in the N-1 detection results have been assigned to the reference result (that is, when the characters corresponding to each character position have been determined), character voting is carried out position by position from left to right according to the character positions in the reference result. Suppose a character position holds m characters (C_1, C_2, ..., C_m); for character C_i (i = 1, 2, ..., m) occurring l times, the character score is:

score(C_i) = confidence_1 + confidence_2 + ... + confidence_l
That is, the confidences of all occurrences of character C_i are summed to obtain its final score. The character with the highest score at a position is taken as the optimal character (target character) for that position, and the optimal characters of all positions are ordered from left to right to obtain the target character string (the fused detection result).
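The per-position vote can be sketched as follows (illustrative Python; names are assumptions):

```python
from collections import defaultdict

def vote_position(candidates):
    """Pick the character with the highest summed confidence.

    candidates: list of (char, confidence) pairs observed at one
    character position across the aligned detection results.
    """
    scores = defaultdict(float)
    for ch, conf in candidates:
        scores[ch] += conf            # score(C_i) = sum of its confidences
    return max(scores, key=scores.get)

def fuse_positions(position_candidates):
    """Assemble the target string from per-position candidate lists,
    already ordered left to right."""
    return "".join(vote_position(c) for c in position_candidates)
```

Summing confidences rather than counting votes lets two confident detections outweigh three marginal ones, which is the stated rationale for weighting by confidence.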
In the embodiment of the present application, an image detection apparatus acquires a plurality of images to be detected of a detection object; performs image detection processing on each image to be detected to obtain a corresponding detection result, where the detection result includes a character string; determines an edit distance matrix corresponding to the plurality of images according to the character strings; and finally determines a target character string corresponding to the detection object according to the edit distance matrix. Because the target character string is determined from the edit distance matrix over multiple images to be detected, that is, the detection combines multiple images, the accuracy of image recognition can be improved.
With this image detection method, the problem of low character recognition accuracy caused by poor imaging quality (for example, when the camera is far from the detection object or the lighting of the shooting environment is complex) can be effectively mitigated. The technique can therefore be applied in complex scenes such as logistics loading/unloading docks to automatically detect vehicle license plates and improve vehicle scheduling and management.
Referring to fig. 3, fig. 3 is another schematic flow chart of an image detection method according to an embodiment of the present application, and a specific flow of the method may be as follows:
301. the image detection equipment acquires a video to be detected of the license plate.
The object to be detected can be a license plate, a doorplate, or any other object whose character string needs to be recognized from an image.
In some embodiments, a camera may be used to capture a video to be detected of the detection object, an SSD (Single Shot MultiBox Detector) is used to continuously detect the license plate in the video, and each detected license plate image and its corresponding confidence are stored in a queue.
302. The image detection equipment acquires a plurality of images to be detected from a video to be detected.
Specifically, in some embodiments, the confidence of each frame of the image to be detected in the video to be detected and the target pixel area of each frame of the image to be detected can be determined; and then determining a plurality of images to be detected from each frame of images to be detected according to the confidence coefficient of each frame of images to be detected and the target pixel area of each frame of images to be detected.
After the confidence and target pixel area of each frame to be detected are determined from the queue, each frame's confidence is multiplied by its target pixel area to serve as an evaluation score for that frame. The frames are then ranked by score, and the N highest-scoring license plate images are taken as the plurality of images to be detected of the detection object and normalized to a uniform size, where N may be any number from 6 to 10.
303. The image detection equipment performs image detection processing on each of the multiple images to be detected to obtain a detection result corresponding to each image to be detected.
The detection result includes, for each image to be detected, the corresponding character string, the individual characters, the frame corresponding to each character, and the confidence of each character.
In this embodiment, the character string is a complete license plate number; for example, the character string of a certain image to be detected is "yue B8866 x". A character is the character at each position in that license plate number, for example "B", "8", "6", and "x". The frame corresponding to a character is the bounding-box of that character, and the confidence of a character may reflect its clarity or reliability.
Specifically, the embodiment of the application can perform image detection processing on the multiple images to be detected respectively based on a deep learning character string recognition method, and store detailed detection results.
304. The image detection equipment calculates the edit distance between every pair of the license plates, obtaining the edit distance between any two license plates.
That is, the edit distance (Levenshtein distance) is calculated between every two character strings (license plates) of the plurality of images to be detected. The plurality of images to be detected in the embodiment of the application may be N images, whose character strings are plateStr_i (i = 0, 1, …, N-1); the edit distance between string i and string j is then L_ij = levenshteinDistance(plateStr_i, plateStr_j).
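The Levenshtein distance used here can be computed with the classic dynamic-programming recurrence; a minimal sketch:

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning
    a into b, computed row by row over the DP table."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]
```
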
305. The image detection device determines an edit distance matrix from the edit distance.
After the editing distance calculation is carried out on every two character strings in the N character strings, the editing distance between every two character strings can be combined into an editing distance matrix.
The edit distance matrix is a symmetric square matrix with a main diagonal line of 0, and the number of zero elements in each row represents the number of times that the corresponding character string in the row appears in the N detection results (character strings).
306. The image detection device determines the number of zero elements in each row in the edit distance matrix.
After the edit distance matrix is determined, the number of zero elements in each row in the edit distance matrix is counted.
If all the detection results (license plates) are the same, every row is all zeros, and the edit distance matrix L is a zero matrix.
If L is a zero matrix, all detection results are identical; the result is output directly, reducing the amount of computation. Otherwise, the number of zero elements in each row of the matrix L must be counted.
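The matrix construction and per-row zero counting of steps 305-306 can be sketched as follows (the `dist` argument would be the Levenshtein distance in practice; the test below substitutes a simple Hamming-style distance):

```python
def edit_distance_matrix(strings, dist):
    """Pairwise distance matrix L over the N detected strings; L is a
    symmetric square matrix with a zero main diagonal."""
    n = len(strings)
    return [[dist(strings[i], strings[j]) for j in range(n)] for i in range(n)]

def zero_counts(matrix):
    """Zero elements per row: row i counts how many detection results
    are identical to string i (including itself)."""
    return [row.count(0) for row in matrix]
```

If every count equals N, the matrix is all zeros and the shared string can be output immediately, as described above.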
307. The image detection device determines the maximum zero element number with the maximum value from the zero element numbers of each line.
After the number of zero elements in each row of the edit distance matrix is determined, the rows of L are sorted by their zero-element counts to find the row with the most zero elements, and the number of zero elements in that row (i.e., the maximum number of zero elements) is determined. This count represents how many detection results are identical; for example, if the row with the most zero elements contains n zeros, the maximum number of zero elements is n.
308. The image detection equipment determines whether the maximum number of zero elements is greater than a number threshold; if yes, go to step 309, and if not, go to steps 310 and 311.
After the maximum number of zero elements n of the edit distance matrix L is obtained, it is determined whether it is greater than a number threshold, where the number threshold may be ceil(N/2), i.e., N/2 rounded up to the nearest integer.
309. The image detection equipment determines the character string corresponding to the zero elements in the maximum zero element number as the target license plate.
If n is greater than the number threshold, most elements of the row with the most zero elements are zero, and the character string corresponding to the zero elements in that row can be directly determined as the target license plate (the finally confirmed license plate).
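The threshold check and direct-output branch of steps 308-309 can be sketched as follows; `counts` is the per-row zero count and the function name `direct_fusion` is illustrative:

```python
import math

def direct_fusion(strings, counts):
    """Return the dominant string when the row with the most zero
    elements has more than ceil(N/2) zeros; otherwise return None,
    meaning the per-character voting of steps 310-311 is still needed."""
    n = len(strings)
    best = max(range(n), key=lambda i: counts[i])
    if counts[best] > math.ceil(n / 2):
        return strings[best]
    return None
```
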
310. The image detection equipment determines a target character corresponding to each character position according to the characters, the frames corresponding to the characters, and the confidences of the characters.
If n is not greater than the number threshold, the target license plate is determined from the individual characters in the images to be detected, specifically:
Because the N images to be detected have been normalized, and the detection result further includes each character in each image to be detected, the border of each character (i.e., bounding-box), and the confidence of each character, the characters are first aligned based on this information, and the characters at each character position are then voted on and ranked to obtain the optimal character (i.e., target character) for that position. The character alignment step is as follows:
According to the bounding-box of each character in the detection result, a vertex coordinate x and a character width w can be obtained, and the horizontal coordinate of the centre point of the corresponding character is then calculated from them:

x_center = x + w/2
Then one of the N detection results (each containing all characters of one image to be detected) is taken as the reference; for example, if the detection result of one image contains 8 characters, those 8 characters serve as the reference. Each character in the remaining N-1 detection results is then compared with the reference from left to right. Specifically, a distance threshold is set, and if the minimum distance between the horizontal coordinate of a character's centre point and that of a reference character is smaller than the threshold, the character is classified to that reference character.
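The alignment step can be sketched as follows, representing each character by its centre x-coordinate (x + w/2 from its bounding box); the default threshold value is an assumption for illustration:

```python
def align_characters(ref_centers, other_results, threshold=10.0):
    """Bucket every character from the non-reference results under the
    nearest reference character position, by horizontal centre distance.

    ref_centers: centre x-coordinates of the reference characters.
    other_results: per-image lists of (center_x, char, confidence).
    Returns one list of (char, confidence) per reference position.
    """
    buckets = [[] for _ in ref_centers]
    for result in other_results:
        for cx, ch, conf in result:
            # nearest reference position for this character
            k = min(range(len(ref_centers)), key=lambda i: abs(ref_centers[i] - cx))
            if abs(ref_centers[k] - cx) < threshold:  # only keep close matches
                buckets[k].append((ch, conf))
    return buckets
```
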
When all characters in the N-1 detection results have been classified to the reference detection result (that is, when the character corresponding to each character position has been determined), character voting is carried out for each character position from left to right according to the character positions in the reference detection result. Suppose m characters (C_1, C_2, …, C_m) exist at a certain character position; for a character C_i (i = 1, 2, …, m) that occurs l times, the character score is:

score_{C_i} = confidence_1 + confidence_2 + … + confidence_l

That is, the confidences of all occurrences of character C_i are added to obtain its final score, and the character with the highest score at that position is taken as the optimal character (target character) for that position.
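The per-position confidence voting can be sketched as:

```python
def vote_position(candidates):
    """Sum the confidences of each distinct character seen at one
    position and return the highest-scoring character."""
    scores = {}
    for ch, conf in candidates:
        scores[ch] = scores.get(ch, 0.0) + conf
    return max(scores, key=scores.get)
```

Applying `vote_position` to each position's bucket from left to right and joining the winning characters yields the fused license plate.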
311. The image detection equipment determines the target license plate according to the target character corresponding to each character position.
The optimal characters of all the character positions are ordered from left to right to obtain the target license plate (i.e., the fused detection result).
In the embodiment of the application, the image detection equipment acquires a plurality of images to be detected of a detection object; then, image detection processing is carried out on the multiple images to be detected respectively to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string; determining an edit distance matrix corresponding to a plurality of images to be detected according to the character string; and finally, determining a target license plate corresponding to the detection object according to the editing distance matrix. In the embodiment of the application, the target license plate corresponding to the detection object is determined by using the edit distance matrix corresponding to the multiple images to be detected, and the image detection is performed by combining the multiple images to be detected, so that the accuracy of image identification can be improved.
That is, in the image detection method of the embodiment of the present application, the character strings corresponding to the detection objects are first compared pairwise to obtain edit distances, from which an edit distance matrix is constructed. The zero-element properties of the matrix are then used to simplify the calculation in specific cases and obtain a fused detection result directly. If the edit distance matrix is an all-zero matrix, the detection result is output directly. If it is not, the detection result is determined from the row with the most zero elements: if the number of zero elements in that row is greater than ceil(N/2), the character strings corresponding to those zero elements are fused and the detection result is output; if not, a voting strategy based on character positions is applied to the normalized images to obtain the optimal character for each character position, and the final detection result is then obtained from the optimal character of each character position.
In order to better implement the image detection method provided by the embodiment of the present application, an embodiment of the present application further provides an image detection device, and the image detection device may be specifically integrated in a server. The meanings of the terms are the same as those in the image detection method above, and specific implementation details can refer to the description in the method embodiment.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present application, where the image detection apparatus includes: the acquisition unit 401, the processing unit 402, the first determination unit 403, and the second determination unit 404 are as follows:
an acquiring unit 401, configured to acquire a plurality of images to be detected of a detection object;
the processing unit 402 is configured to perform image detection processing on the multiple images to be detected respectively to obtain a detection result corresponding to each image to be detected, where the detection result includes a character string;
a first determining unit 403, configured to determine, according to the character string, an edit distance matrix corresponding to the multiple images to be detected;
a second determining unit 404, configured to determine, according to the edit distance matrix, a target character string corresponding to the detection object.
In some embodiments, the first determining unit 403 is specifically configured to:
respectively carrying out editing distance calculation on any two character strings in the character strings to obtain the editing distance between any two character strings;
and determining the editing distance matrix according to the editing distance.
Referring to fig. 5, in some embodiments, the second determining unit 404 includes:
a first determining subunit 4041, configured to determine the number of zero elements in each row in the edit distance matrix;
a second determining subunit 4042, configured to determine, from the number of zero elements in each row, the maximum number of zero elements with the largest numerical value;
a third determining subunit 4043, configured to determine whether the maximum number of zero elements is greater than a number threshold;
a fourth determining subunit 4044, configured to determine, when the maximum number of zero elements is greater than the number threshold, a character string corresponding to a zero element in the maximum number of zero elements as the target character string.
In some embodiments, the detection result further includes a character, a frame corresponding to the character, and a confidence of the character, and the apparatus further includes:
a third determining unit 405, configured to determine, when the maximum number of zero elements is not greater than the number threshold, a target character corresponding to each character position according to the character, a frame corresponding to the character, and a confidence of the character;
a fourth determining unit 406, configured to determine the target character string according to the target character corresponding to each character position.
In some embodiments, the third determining unit 405 is specifically configured to:
determining a character position corresponding to each character according to the frame corresponding to the character;
determining characters corresponding to each character position;
determining the score of each character in each character position according to the character corresponding to each character position and the confidence coefficient of the character;
and determining the character with the highest score in each character position as the target character corresponding to each character position.
In some embodiments, the obtaining unit 401 is specifically configured to:
acquiring a video to be detected of a detected object;
and acquiring the plurality of images to be detected from the video to be detected.
In some embodiments, the obtaining unit 401 is further specifically configured to:
determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected;
and determining the multiple images to be detected from each frame of image to be detected according to the confidence coefficient of each frame of image to be detected and the target pixel area of each frame of image to be detected.
In the embodiment of the present application, the obtaining unit 401 obtains a plurality of images to be detected of a detection object; then, the processing unit 402 performs image detection processing on the multiple images to be detected respectively to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string; the first determining unit 403 determines an edit distance matrix corresponding to a plurality of images to be detected according to the character string; finally, the second determining unit 404 determines a target character string corresponding to the detection object according to the edit distance matrix. In the embodiment of the application, the target character strings corresponding to the detection objects are determined by using the edit distance matrixes corresponding to the multiple images to be detected, and the image detection is performed by combining the multiple images to be detected, so that the accuracy of image identification can be improved.
Referring to fig. 6, embodiments of the present application provide a server 600, which may include one or more processors 601 with one or more processing cores, a memory 602 comprising one or more computer-readable storage media, a Radio Frequency (RF) circuit 603, a power supply 604, an input unit 605, and a display unit 606. Those skilled in the art will appreciate that the server architecture shown in fig. 6 is not limiting and may include more or fewer components than those shown, combine some components, or use a different arrangement of components. Wherein:
the processor 601 is a control center of the server, connects various parts of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 602 and calling data stored in the memory 602, thereby performing overall monitoring of the server. Optionally, processor 601 may include one or more processing cores; preferably, the processor 601 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 601.
The memory 602 may be used to store software programs and modules, and the processor 601 executes various functional applications and data processing by operating the software programs and modules stored in the memory 602.
The RF circuitry 603 may be used for receiving and transmitting signals during the process of transmitting and receiving information.
The server also includes a power supply 604 (e.g., a battery) for powering the various components, which is preferably logically coupled to the processor 601 via a power management system that manages charging, discharging, and power consumption.
The server may also include an input unit 605, and the input unit 605 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The server may also include a display unit 606, and the display unit 606 may be used to display information input by the user or provided to the user, as well as various graphical user interfaces of the server, which may be made up of graphics, text, icons, video, and any combination thereof. Specifically, in this embodiment, the processor 601 in the server loads the executable file corresponding to the process of one or more application programs into the memory 602 according to the following instructions, and the processor 601 runs the application programs stored in the memory 602, thereby implementing various functions as follows:
acquiring a plurality of images to be detected of a detection object;
respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string;
determining an editing distance matrix corresponding to the plurality of images to be detected according to the character string;
and determining a target character string corresponding to the detection object according to the editing distance matrix.
As can be seen from the above, in the embodiment of the present application, the image detection device obtains a plurality of images to be detected of the detection object; then, image detection processing is carried out on the multiple images to be detected respectively to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string; determining an edit distance matrix corresponding to a plurality of images to be detected according to the character string; and finally, determining a target character string corresponding to the detection object according to the editing distance matrix. In the embodiment of the application, the target character strings corresponding to the detection objects are determined by using the edit distance matrixes corresponding to the multiple images to be detected, and the image detection is performed by combining the multiple images to be detected, so that the accuracy of image identification can be improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the image detection methods provided in the present application. For example, the instructions may perform the steps of:
acquiring a plurality of images to be detected of a detection object;
respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string;
determining an editing distance matrix corresponding to the plurality of images to be detected according to the character string;
and determining a target character string corresponding to the detection object according to the editing distance matrix.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any image detection method provided in the embodiments of the present application, beneficial effects that can be achieved by any image detection method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The foregoing detailed description is directed to an image detection method, an image detection apparatus, and a storage medium provided in the embodiments of the present application, and specific examples are applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing embodiments are only used to help understand the methods and core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. An image detection method, comprising:
acquiring a plurality of images to be detected of a detection object;
respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string;
determining an editing distance matrix corresponding to the plurality of images to be detected according to the character string;
and determining a target character string corresponding to the detection object according to the editing distance matrix.
2. The method according to claim 1, wherein the determining the edit distance matrix corresponding to the plurality of images to be detected according to the character string comprises:
respectively carrying out editing distance calculation on any two character strings in the character strings to obtain the editing distance between any two character strings;
and determining the editing distance matrix according to the editing distance.
3. The method according to claim 1, wherein the determining a target character string corresponding to the detection object according to the edit distance matrix comprises:
determining the number of zero elements of each row in the editing distance matrix;
determining the maximum zero element number with the maximum numerical value from the zero element numbers of each line;
determining whether the maximum number of zero elements is greater than a number threshold;
and if the number of the zero elements is larger than the number threshold, determining the character string corresponding to the zero element in the maximum zero element number as the target character string.
4. The method of claim 3, wherein the detection result further includes a character, a frame corresponding to the character, and a confidence of the character, and after determining whether the maximum number of zero elements is greater than a number threshold, the method further includes:
if the number of the characters is not larger than the number threshold, determining a target character corresponding to each character position according to the characters, the frames corresponding to the characters and the confidence degrees of the characters;
and determining the target character string according to the target character corresponding to each character position.
5. The method according to claim 4, wherein the determining a target character corresponding to each character position according to the character, a frame corresponding to the character, and a confidence level of the character comprises:
determining a character position corresponding to each character according to the frame corresponding to the character;
determining characters corresponding to each character position;
determining the score of each character in each character position according to the character corresponding to each character position and the confidence coefficient of the character;
and determining the character with the highest score in each character position as the target character corresponding to each character position.
6. The method according to any one of claims 1 to 5, wherein the acquiring a plurality of images to be detected of the detection object comprises:
acquiring a video to be detected of a detected object;
and acquiring the plurality of images to be detected from the video to be detected.
7. The method according to claim 6, wherein the obtaining the plurality of images to be detected from the video to be detected comprises:
determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected;
and determining the multiple images to be detected from each frame of image to be detected according to the confidence coefficient of each frame of image to be detected and the target pixel area of each frame of image to be detected.
8. An image detection apparatus, characterized by comprising:
the device comprises an acquisition unit, a detection unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of images to be detected of a detection object;
the processing unit is used for respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, and the detection result comprises a character string;
the first determining unit is used for determining an editing distance matrix corresponding to the multiple images to be detected according to the character strings;
and the second determining unit is used for determining a target character string corresponding to the detection object according to the editing distance matrix.
9. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps in the image detection method according to any one of claims 1 to 7.
10. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the image detection method according to any one of claims 1 to 7.
CN201910300190.1A 2019-04-15 2019-04-15 Image detection method, device and storage medium Pending CN111832554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910300190.1A CN111832554A (en) 2019-04-15 2019-04-15 Image detection method, device and storage medium

Publications (1)

Publication Number Publication Date
CN111832554A true CN111832554A (en) 2020-10-27

Family

ID=72914502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910300190.1A Pending CN111832554A (en) 2019-04-15 2019-04-15 Image detection method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111832554A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003223610A (en) * 2002-01-28 2003-08-08 Toshiba Corp Character recognizing device and character recognizing method
JP2011065646A (en) * 2009-09-18 2011-03-31 Fujitsu Ltd Apparatus and method for recognizing character string
CN102750379A (en) * 2012-06-25 2012-10-24 华南理工大学 Fast character string matching method based on filtering type
US20130314755A1 (en) * 2012-05-23 2013-11-28 Andrew C. Blose Image capture device for extracting textual information
CN103428307A (en) * 2013-08-09 2013-12-04 中国科学院计算机网络信息中心 Method and equipment for detecting counterfeit domain names
CN103493067A (en) * 2011-12-26 2014-01-01 华为技术有限公司 Method and apparatus for recognizing a character of a video
CN103996021A (en) * 2014-05-08 2014-08-20 华东师范大学 Fusion method of multiple character identification results
CN104464736A (en) * 2014-12-15 2015-03-25 北京百度网讯科技有限公司 Error correction method and device for voice recognition text
CN105930836A (en) * 2016-04-19 2016-09-07 北京奇艺世纪科技有限公司 Identification method and device of video text
CN106203425A (en) * 2016-07-01 2016-12-07 北京旷视科技有限公司 Character identifying method and device
CN106847288A (en) * 2017-02-17 2017-06-13 上海创米科技有限公司 The error correction method and device of speech recognition text
CN107220639A (en) * 2017-04-14 2017-09-29 北京捷通华声科技股份有限公司 The correcting method and device of OCR recognition results
RU2673015C1 (en) * 2017-12-22 2018-11-21 Общество с ограниченной ответственностью "Аби Продакшн" Methods and systems of optical recognition of image series characters
CN108920580A (en) * 2018-06-25 2018-11-30 腾讯科技(深圳)有限公司 Image matching method, device, storage medium and terminal

Similar Documents

Publication Publication Date Title
CN108898086B (en) Video image processing method and device, computer readable medium and electronic equipment
CN110020592B (en) Object detection model training method, device, computer equipment and storage medium
US8792722B2 (en) Hand gesture detection
EP3806064B1 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
US20120027263A1 (en) Hand gesture detection
CN112052186B (en) Target detection method, device, equipment and storage medium
CN103617432A (en) Method and device for recognizing scenes
CN112381104A (en) Image identification method and device, computer equipment and storage medium
CN109116129B (en) Terminal detection method, detection device, system and storage medium
CN111767908B (en) Character detection method, device, detection equipment and storage medium
CN111160202A (en) AR equipment-based identity verification method, AR equipment-based identity verification device, AR equipment-based identity verification equipment and storage medium
CN108229494B (en) Network training method, processing method, device, storage medium and electronic equipment
EP4113376A1 (en) Image classification model training method and apparatus, computer device, and storage medium
CN111461105A (en) Text recognition method and device
CN113902944A (en) Model training and scene recognition method, device, equipment and medium
CN114299546A (en) Method and device for identifying pet identity, storage medium and electronic equipment
CN113158773B (en) Training method and training device for living body detection model
CN112532884B (en) Identification method and device and electronic equipment
WO2021138893A1 (en) Vehicle license plate recognition method and apparatus, electronic device, and storage medium
CN112488054A (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN113269730B (en) Image processing method, image processing device, computer equipment and storage medium
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN111832554A (en) Image detection method, device and storage medium
CN112214639B (en) Video screening method, video screening device and terminal equipment
CN117671548A (en) Abnormal sorting detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination