CN113192081B - Image recognition method, image recognition device, electronic device and computer-readable storage medium - Google Patents

Image recognition method, image recognition device, electronic device and computer-readable storage medium

Info

Publication number
CN113192081B
Authority
CN
China
Prior art keywords: icon, gray, identified, image, area
Legal status
Active
Application number
CN202110485406.3A
Other languages
Chinese (zh)
Other versions
CN113192081A (en)
Inventor
李笑寒
张维一
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority claimed from CN202110485406.3A
Publication of CN113192081A
Application granted
Publication of CN113192081B


Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/11 Region-based segmentation
    • G06T 5/40 Image enhancement or restoration using histogram techniques
    • G06T 7/136 Segmentation; edge detection involving thresholding
    • G06T 7/194 Segmentation; edge detection involving foreground-background segmentation
    • G06T 7/90 Determination of colour characteristics
    • G06T 2207/10004 Still image; photographic image (indexing scheme for image acquisition modality)
    • G06T 2207/10024 Color image (indexing scheme for image acquisition modality)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to an image recognition method, apparatus, electronic device, and computer-readable storage medium. The method comprises: acquiring an original image and a grayscale image corresponding to the original image, wherein the original image contains an icon to be identified; separating an icon foreground region from a background region in the original image, and setting the gray value of the corresponding background region in the grayscale image to a preset gray value; performing horizontal projection and vertical projection on the pixel values of the pixels in the grayscale image, and determining an icon region in the grayscale image according to the projection results; extracting the icon to be identified from a target region in the original image corresponding to the icon region; and performing image recognition on the extracted icon to obtain an icon recognition result. This scheme can increase the speed of image recognition and improve its accuracy.

Description

Image recognition method, image recognition device, electronic device and computer-readable storage medium
Technical Field
The present disclosure relates to the field of computer applications, and more particularly, to image recognition methods, apparatuses, electronic devices, and computer-readable storage media.
Background
To obtain information about a subscribed column more quickly, an icon can be extracted from a picture containing the subscribed column's icon, and the corresponding column information obtained once the extracted icon has been identified. For example, if a video screenshot contains the icon of the video creator, that icon can be extracted from the screenshot, and identifying it makes it possible to look up further information about the creator.
In the related art, a neural network algorithm is usually used to perform both the extraction and the identification. However, because neural network inference is computationally expensive, a long processing delay may occur on devices with limited performance, which degrades the user experience.
Disclosure of Invention
In view of the above, the present disclosure provides an image recognition method, an image recognition device, an electronic device, and a storage medium, so as to at least solve the technical problems in the related art. The technical solution of the present disclosure is as follows:
According to a first aspect of an embodiment of the present disclosure, an image recognition method is provided, including:
acquiring an original image and a grayscale image corresponding to the original image, wherein the original image contains an icon to be identified;
separating an icon foreground region from a background region in the original image, and setting the gray value of the corresponding background region in the grayscale image to a preset gray value;
performing horizontal projection and vertical projection on the pixel values of the pixels in the grayscale image, and determining an icon region in the grayscale image according to the projection results;
extracting the icon to be identified from a target region in the original image corresponding to the icon region; and
performing image recognition on the extracted icon to be identified to obtain an icon recognition result.
Optionally, the separating an icon foreground region from a background region, and setting the gray value of the background region to a preset gray value, includes:
counting the gray values of all pixels in the grayscale image to obtain the gray value with the highest occurrence frequency, and resetting the pixels carrying that gray value to the preset gray value.
Optionally, before the horizontal projection and the vertical projection are performed on the pixel values of the pixels in the grayscale image, the method further includes:
converting the grayscale image into a binary image based on a preset gray value threshold; and
invoking a dilation-erosion algorithm to remove text image regions from the binary image.
Optionally, the converting the grayscale image into a binary image based on a preset gray value threshold includes:
setting gray values in the grayscale image that are greater than the preset gray value threshold to the maximum gray value supported by the grayscale image, and setting gray values smaller than the preset gray value threshold to the minimum supported gray value, so as to convert the grayscale image into a binary image.
Optionally, the determining an icon region in the grayscale image according to the projection results includes:
determining a region of the grayscale image in which both the horizontal and the vertical projection results are smaller than a preset threshold as an icon region;
predicting, based on the size and distribution pattern of the sub-regions of the icon region, a missing icon region in which an icon actually exists but which was not determined to be an icon region; and
adding the predicted missing icon region to the icon region.
Optionally, the performing image recognition on the extracted icon to be identified to obtain a corresponding recognition result includes:
performing feature extraction on the extracted icon to obtain a feature vector of the icon to be identified;
calculating the similarity between the feature vector of each reference icon in an icon library and the feature vector of the icon to be identified, wherein the reference icons in the icon library are pre-labeled with identification tags; and
determining a target reference icon from the reference icons based on the calculated similarities, and taking the identification tag corresponding to the target reference icon as the recognition result of the icon to be identified.
Optionally, the icon to be identified contains texture information, and the feature vector comprises a bag-of-words vector; the performing feature extraction on the icon to obtain its feature vector includes:
extracting local feature points of the icon to be identified based on its texture information; and
matching the local feature points of the icon against a pre-trained bag of words, and generating the bag-of-words vector of the icon according to the matching result, wherein the pre-trained bag of words is generated by clustering the local feature points of the reference icons in the reference icon library.
Optionally, the method further comprises:
matching the feature points of each reference icon against the bag of words, and generating the bag-of-words vector of each reference icon according to the matching result.
Optionally, the determining a target reference icon from the reference icons based on the calculated similarities includes:
sorting the reference icons from high to low by the calculated similarity;
performing a homography check between each of the first several reference icons in the sorted sequence and the icon to be identified; and
determining the reference icon with the highest homography check score as the target reference icon.
Optionally, the feature vector further includes a color histogram component, and the performing feature extraction on the icon to be identified to obtain its feature vector includes:
counting the pixel values of the pixels of the icon to be identified to obtain a color histogram of the icon; and
generating the color histogram component of the feature vector of the icon according to the color histogram.
According to a second aspect of the embodiments of the present disclosure, there is provided an image recognition apparatus including:
an acquisition module, configured to acquire an original image and a grayscale image corresponding to the original image, wherein the original image contains an icon to be identified;
a separation module, configured to separate an icon foreground region from a background region in the original image, and set the gray value of the corresponding background region in the grayscale image to a preset gray value;
a projection module, configured to perform horizontal projection and vertical projection on the pixel values of the pixels in the grayscale image, and determine an icon region in the grayscale image according to the projection results;
an extraction module, configured to extract the icon to be identified from a target region in the original image corresponding to the icon region; and
a recognition module, configured to perform image recognition on the extracted icon to obtain an icon recognition result.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, including:
A processor;
A memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image recognition method according to the embodiment of the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the image recognition method according to the embodiments of the first aspect.
According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the image recognition method according to the embodiments of the first aspect.
In this technical solution, on the one hand, because no computationally expensive neural network is used to extract the icons to be identified, the consumption of device resources can be reduced and the icons can be extracted faster;
on the other hand, before the horizontal and vertical projections are performed on the grayscale image corresponding to the original image, the gray value of the background region of the grayscale image is set to the preset gray value, which makes the icon regions stand out more clearly in the projection results, so that the determined icon regions are more accurate and the icons to be identified can be extracted more accurately.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles and not to limit the disclosure unduly.
FIG. 1 is a schematic diagram of an original picture shown in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of an image recognition method shown in accordance with an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating one determination of icon regions by projection according to an embodiment of the present disclosure;
FIG. 4 is a schematic block diagram of an image recognition device shown in accordance with an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device shown in accordance with an embodiment of the present disclosure.
Detailed Description
In order to better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in one or more embodiments of the present disclosure. It will be apparent that the described embodiments are only some embodiments and not all embodiments. All other embodiments, which may be made by one of ordinary skill in the art based on one or more embodiments of the present disclosure without undue burden, are intended to be within the scope of the present disclosure.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of systems and methods that are consistent with some aspects of the present disclosure, as detailed in the accompanying claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited by these terms, which are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
To obtain information about a subscribed column more quickly, an icon can be extracted from a picture containing the subscribed column's icon, and the corresponding column information obtained once the extracted icon has been identified. For example, if a video screenshot contains the icon of the video creator, that icon can be extracted from the screenshot, and identifying it makes it possible to look up further information about the creator.
Referring to fig. 1, fig. 1 is a schematic diagram of an original picture according to an embodiment of the disclosure. In this example, the original picture is a screenshot of a smartphone, and its main content is a page titled "Ta's subscribed columns", which shows an image-text list, that is, a list of correspondences between "Picture A" (the specific pattern is not shown) and "Title A", and so on. Suppose user A wishes to share the four subscribed columns A, B, C and D with user B; user A would most likely send the above screenshot to user B, whose device then needs to extract and identify the information contained in the screenshot.
In the related art, a neural network algorithm is usually used to perform both the extraction and the identification. However, because neural network inference is computationally expensive, a long processing delay may occur on devices with limited performance, which degrades the user experience.
Based on the above, the present disclosure proposes a technical solution based on conventional image processing algorithms, in which an icon region is determined from the grayscale image corresponding to an original image containing an icon to be identified, and the icon is then extracted from the target region of the original image corresponding to that icon region and recognized.
In implementation, the icon foreground region and the background region of the original image can be separated, the gray value of the background region of the corresponding grayscale image set to a preset gray value, horizontal and vertical projections then performed on the grayscale image, and the icon region in the grayscale image determined according to the projection results; after the icon region is determined, the icon to be identified can be extracted from the corresponding position of the original image and recognized.
In the technical scheme, on one hand, because the neural network with larger energy consumption is not used for extracting the icons to be identified, the consumption of the equipment performance can be reduced, and the speed of extracting the icons to be identified is improved;
On the other hand, before the horizontal projection and the vertical projection are performed on the gray level image corresponding to the original image, the gray level value of the background area of the gray level image is set to be the preset gray level value, so that the characteristics of the corresponding icon area in the projection result are more obvious, the determined icon area is more accurate, and the accuracy of extracting the icon to be identified can be improved.
The following describes a technical scheme through a specific embodiment and in combination with a specific application scenario.
Referring to fig. 2, fig. 2 is a flowchart of an image recognition method according to an embodiment of the disclosure, and the method may include steps S201 to S205 as follows:
S201, acquiring an original image and a grayscale image corresponding to the original image, wherein the original image contains an icon to be identified.
In this example, the device may first acquire an original image containing an icon to be identified, together with the corresponding grayscale image. The icon to be identified may be an icon of any content, and the original image may be any complete image containing it. For example, the original image may take the form shown in fig. 1, in which case pictures A, B, C and D in fig. 1 may all serve as icons to be identified; as another example, the original image may be a photograph of a street billboard taken by the user, and the icon to be identified may be a trademark appearing in the photograph.
It can be appreciated that the specific manner of acquiring the grayscale image corresponding to the original image can be chosen according to requirements. For example, for a color original image with three RGB channels, the pixel value of any one of the three color channels can be taken as the gray value of the corresponding grayscale image; the maximum of the three channel values per pixel can be taken as the gray value; or the pixel values of the three RGB channels can be combined by a weighted average to obtain the gray value; and so on. The graying algorithm can be selected by those skilled in the art according to specific requirements.
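As an illustration, the three graying strategies above can be sketched as follows. OpenCV and NumPy are assumed here for convenience (the disclosure does not prescribe an implementation), and the input path is hypothetical.

```python
import cv2
import numpy as np

bgr = cv2.imread("screenshot.png")  # hypothetical input; OpenCV loads images as BGR

# Strategy 1: take a single color channel as the gray value (here: green).
gray_single = bgr[:, :, 1]

# Strategy 2: take the per-pixel maximum of the three channels.
gray_max = bgr.max(axis=2)

# Strategy 3: weighted average of the channels (standard luminance weights).
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # 0.299 R + 0.587 G + 0.114 B
```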
S202, separating an icon foreground region from a background region in the original image, and setting the gray value of the corresponding background region in the grayscale image to a preset gray value.
In this example, the device may separate the icon foreground region from the background region in the original image, locate the corresponding background region in the grayscale image, and set its gray value to a preset gray value. The separation itself can be achieved through algorithms such as morphology-based contour detection, or through statistics over pixel values; for example, since an icon to be detected usually has a regular geometric shape, after contour detection is performed on the original image, the icon foreground region enclosed by a specific geometric shape can be separated from the background by recognizing the shape of the contour, as sketched below.
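A minimal sketch of the contour-based separation, treating roughly rectangular contours as icon foreground; the Canny thresholds and area floor are illustrative assumptions, and `gray` is the grayscale image from the previous step.

```python
import cv2

edges = cv2.Canny(gray, 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

icon_boxes = []
for c in contours:
    # Approximate the contour; a 4-vertex polygon suggests a regular icon shape.
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4 and cv2.contourArea(c) > 100:
        icon_boxes.append(cv2.boundingRect(c))  # (x, y, w, h) foreground box
```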
In an embodiment, the gray values of the pixels in the grayscale image may first be counted to find the most frequently occurring gray value, and the pixels carrying that value may then be reset to the preset gray value. It will be appreciated that, since the background may contain several adjacent gray levels, the most frequent value may only be the peak of a narrow band; for example, if the background of an original picture is a gradient of light gray with values distributed between 220 and 225, the corresponding statistical histogram will show a peak of occurrence frequency between 220 and 225.
The preset gray value used for the reset can be chosen according to specific requirements. For example, if the grayscale image corresponding to the original image is an 8-bit grayscale image containing a light background and dark icons to be identified, the preset gray value can be taken as 255, the maximum value of an 8-bit bitmap; the background then directly becomes pure white, the contrast between foreground and background is markedly enhanced, and subsequent processing is facilitated.
Because the background region is larger than the icon foreground region and its content is more uniform, the region carrying the most frequent gray value is, with high probability, the background region that needs to be separated from the icon foreground. With this background-removal method, the statistics can be completed in a single pass over the picture, so the separation of the icon foreground region and the background region consumes few system resources.
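A minimal sketch of this single-pass statistic, assuming an 8-bit grayscale array `gray` and 255 as the preset gray value; the small tolerance for gradient backgrounds is an added assumption.

```python
import numpy as np

hist = np.bincount(gray.ravel(), minlength=256)  # one pass over the image
background_value = int(hist.argmax())            # most frequent gray value

PRESET_GRAY = 255                                # recolour the background white
gray_reset = gray.copy()
# Allow a few adjacent levels to cover slightly gradient backgrounds.
gray_reset[np.abs(gray.astype(np.int16) - background_value) <= 3] = PRESET_GRAY
```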
In an embodiment, the device may further convert the grayscale image into a binary image based on a preset gray value threshold, and then invoke a dilation-erosion algorithm to remove text image regions from the binary image. The dilation-erosion algorithm is a morphological operation that shifts the positions of black-white boundaries in a picture. In implementation, the grayscale image can be binarized based on the preset gray value threshold to obtain the corresponding binary image, after which the dilation-erosion algorithm is invoked to remove the text image regions. It will be understood, of course, that image elements such as notification-bar icons and border lines may also be removed, because they are morphologically close to text.
When the dilation-erosion algorithm is applied to remove the text image regions, the maximum stroke width of the interfering text can be used as the processing width of the algorithm. Since dilation and erosion can have an irreversible effect on the morphology of the picture, the pixel width to be processed should not be too large but should be determined as needed. Using the maximum stroke width of the interfering text as the processing width ensures that all interfering text is eroded away without excessive erosion or dilation; this limits the damage the operation does to the grayscale image, and improves the accuracy with which the position of the icon to be identified in the original picture is later obtained.
By applying this scheme, text image regions that were previously mis-identified as icon foreground (because their gray values differ from the background) can be removed, eliminating the interference they would otherwise cause during the subsequent gray-value projection, and improving the accuracy of determining the icon region by projection.
In an embodiment, when converting the grayscale image into a binary image, gray values greater than the preset gray value threshold may be set to the maximum gray value supported by the grayscale image, and gray values smaller than the threshold set to the minimum supported gray value. For example, if the grayscale image is 8-bit and the preset gray value threshold is 192, pixels with gray values greater than 192 may be set to 255 and pixels with gray values smaller than 192 set to 0. This further increases the contrast of the image, which benefits both the dilation-erosion step and the accuracy of the subsequent projection-based determination of the icon region.
It will be appreciated that the grayscale image can also be converted into a binary image by other strategies; for example, pixels below the threshold can be kept unchanged and only pixels above it adjusted, or vice versa. The binarization strategy can be selected by those skilled in the art according to specific requirements.
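Combining the two steps above, a sketch with OpenCV; the threshold of 192 is the illustrative value from the text, while the 5-pixel stroke width is an assumption.

```python
import cv2

THRESH = 192  # the example threshold from the text
_, binary = cv2.threshold(gray_reset, THRESH, 255, cv2.THRESH_BINARY)

# Use the maximum stroke width of the interfering text as the processing width;
# the value here is an assumed example.
STROKE_W = 5
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (STROKE_W, STROKE_W))
# On a white-background binary image, dilating white over the thin dark text
# strokes and then eroding back (a morphological closing) removes the text
# while roughly preserving the large dark icon blocks.
no_text = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
```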
S203, performing horizontal projection and vertical projection on the pixel values of the pixels in the grayscale image, and determining an icon region in the grayscale image according to the projection results.
In this example, the device may perform horizontal and vertical projections on the pixel values of the grayscale image and determine the icon region from the results. Referring to fig. 3, fig. 3 is a schematic view illustrating the determination of an icon region by projection according to an embodiment of the present disclosure. Compared with the original picture shown in fig. 1, in the picture shown in fig. 3 the large light-gray page background has been recolored to white via the gray-value statistics, the icons to be identified have been recolored to pure black by binarization, and the lines, text and small icons in the picture have been removed by the dilation-erosion algorithm. After this processing, the black areas corresponding to the 1st, 2nd and 4th icons (counting from top to bottom) stand out clearly against the large white background. It will be understood, of course, that black and white may be swapped, with black as the ground color and white as the foreground color; the disclosure need not be limited in this respect.
Specifically, the algorithm for determining the icon region from the projection results can be chosen according to the situation. Taking the preprocessed picture shown in fig. 3 as an example: since the positions of the icons to be identified are pure black and everywhere else is pure white, projecting the gray values of the picture in the horizontal and vertical directions lets the icon positions be located through the peaks and valleys of the projected statistics. For example, if the picture is an 8-bit grayscale image, the gray value of white is 255, and the presence of black pulls the projection result below 255; combining the intervals on the horizontal and vertical axes where the projection result is pulled down directly yields the position of each icon to be identified in the original image.
As another example, for a gray picture with lower contrast, where a transition region exists between the gray values of the icon region and the non-icon region, a gradient-threshold method may be adopted: positions where the gradient of the projection result exceeds a preset threshold are, with high probability, boundaries between icon and non-icon regions, and combining several such boundary positions determines the icon region in the grayscale image.
Along the same lines, since the preprocessed picture in fig. 3 contains only black and white, it can be regarded as a 1-bit binarized picture; generating a gradient image from it produces bright lines outlining the positions of the icons to be identified, from which the position information of the icons in the original image can likewise be obtained.
In an embodiment, a region of the grayscale image in which both the horizontal and the vertical projection results are smaller than a preset threshold may be determined as an icon region. For example, suppose the preset gray value used when resetting the background was 255, meaning the background has been entirely recolored to white, while the gray values of the icon foreground are generally below 255. If such a grayscale image is projected horizontally and vertically, the projection result over the icon foreground will fall below the 255 of the background, so a threshold slightly below or equal to 255 can be taken, and regions where both projection results fall below it are determined to be icon regions, as sketched below.
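The peak-and-valley idea can be sketched as follows, assuming the preprocessed white-background image `no_text` from the previous step; the threshold slightly below 255 is illustrative.

```python
import numpy as np

row_proj = no_text.mean(axis=1)  # horizontal projection: one value per row
col_proj = no_text.mean(axis=0)  # vertical projection: one value per column

THRESH = 250  # slightly below the pure-white 255

def runs_below(proj, thresh):
    """Return (start, end) index pairs of consecutive values below thresh."""
    below = np.concatenate(([False], proj < thresh, [False]))
    d = np.diff(below.astype(np.int8))
    return list(zip(np.flatnonzero(d == 1), np.flatnonzero(d == -1)))

rows = runs_below(row_proj, THRESH)
cols = runs_below(col_proj, THRESH)
# Each row interval crossed with each column interval is a candidate icon region.
boxes = [(r0, c0, r1, c1) for r0, r1 in rows for c0, c1 in cols]
```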
Further, the device may predict, based on the size and distribution pattern of the sub-regions of the icon region, a missing icon region in which an icon actually exists but which was not determined to be an icon region, and add the predicted missing region to the icon region. It can be understood that errors may occur when separating the icon foreground from the background: for example, if a particular icon is itself light-colored on a light background, it is likely to be mistaken for background during separation, in which case its region cannot be found directly in the projection results. Consider the position of the 3rd icon from the top, outlined by the dotted line in fig. 3; in this example that icon was not recolored to black along with the other icons. In such a case, the position of the 3rd icon, i.e. the missing icon region, can be predicted from the size and distribution pattern of the icon sub-regions already determined.
By applying this scheme, the problem that an icon mistaken for background cannot be located directly in the projection results can be alleviated to some extent, reducing the miss rate when extracting the icons to be identified.
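The disclosure leaves the prediction strategy open; one plausible sketch for a regular vertical list such as fig. 3, with all names hypothetical, is:

```python
import numpy as np

def fill_missing_rows(rows):
    """rows: sorted (start, end) row intervals of detected icons. Insert a
    predicted interval wherever the gap between neighbours is about twice
    the typical pitch, i.e. where one icon was likely missed."""
    if len(rows) < 2:
        return list(rows)
    pitch = np.median([b[0] - a[0] for a, b in zip(rows, rows[1:])])
    height = int(np.median([e - s for s, e in rows]))
    filled = list(rows)
    for (s0, _), (s1, _) in zip(rows, rows[1:]):
        if s1 - s0 > 1.5 * pitch:
            start = int(s0 + pitch)
            filled.append((start, start + height))
    return sorted(filled)
```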
S204, extracting the icon to be identified from the target area corresponding to the icon area in the original image.
In this example, after determining the icon area in the gray-scale image, the icon to be identified may be extracted from the target area corresponding to the icon area in the original image. The specific software implementation method for icon extraction is not further limited in this disclosure, and can be determined by one skilled in the art according to requirements.
S205, performing image recognition on the extracted icon to be recognized to obtain an icon recognition result.
In this example, after the icon to be identified has been extracted from the original image, image recognition may be performed on it to obtain the recognition result. Specifically, recognition may be performed by comparison against labeled reference icons; for example, Picture A may have been pre-labeled with the identification tag "Column A", or the reference icon of trademark E may have been labeled with the identification tag "Brand E". It will be appreciated that the columns, brands and so on are examples given only for ease of description, and that those skilled in the art may apply the scheme to other scenarios as needed.
Alternatively, the image recognition may be artificial-intelligence recognition using a neural network. As described above, because a low-cost conventional image processing approach is used in the extraction stage, more system resources remain available for the recognition stage, making it practical to adopt a recognition algorithm, such as one based on artificial intelligence, that consumes more resources but performs well. Those skilled in the art can weigh the trade-offs according to the business requirements and the system resource margin, and select a specific image recognition algorithm accordingly.
In an embodiment, feature extraction may first be performed on the extracted icon to obtain its feature vector, after which the similarity between the feature vector of each reference icon in an icon library and that of the icon to be identified is calculated; the reference icons in the library can be pre-labeled with identification tags. The similarity between vectors may be measured by the angle between them, or by the Euclidean or Manhattan distance; the specific similarity criterion can be determined by those skilled in the art and is not prescribed here.
After the similarities are obtained, a target reference icon is determined from the reference icons accordingly, and the identification tag of the target reference icon is taken as the recognition result of the icon to be identified. On the one hand, since feature vectors generally describe the features of icons, similar feature vectors imply similar image features, so the two images can be judged, with high probability, to share the same identification tag; determining the target reference icon by feature-vector similarity therefore improves the recognition success rate. On the other hand, compared with directly computing the similarity between the icon to be identified and each reference icon image, comparing feature vectors markedly reduces the system resources consumed in determining the target reference icon.
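As an illustration, measuring similarity by the angle between vectors amounts to cosine similarity; a minimal sketch follows, in which `library` is a hypothetical mapping from identification tags to reference feature vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    # Small epsilon guards against division by zero for degenerate vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def best_match(query_vec, library):
    """library: hypothetical dict of identification tag -> reference vector."""
    tag, _ = max(library.items(),
                 key=lambda kv: cosine_similarity(query_vec, kv[1]))
    return tag
```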
In an embodiment, the icon to be identified may contain texture information. For such icons, local feature points, such as ORB feature points, can be extracted first, and the corresponding feature vector then generated from a pre-trained bag of words. When generating the feature vector, the local feature points of the icon are first extracted based on its texture information, then matched against the pre-trained bag of words, and the bag-of-words vector of the icon is generated from the matching result; in this case the generated bag-of-words vector serves as the feature vector. The pre-trained bag of words may be generated by clustering the local feature points of the reference icons in the reference icon library with a clustering algorithm such as K-means.
For example, suppose clustering the local feature points of the reference icons with K-means produces a 6-dimensional bag of words, whose six dimensions correspond to the feature-point classes A, B, C, D, E and F. If an icon to be identified has local feature points of classes A, C, D and E, its bag-of-words vector is recorded as (1, 0, 1, 1, 1, 0); if a reference icon in the library has feature points of classes A, B, C, D and E, its bag-of-words vector is recorded as (1, 1, 1, 1, 1, 0).
With this scheme, a large number of feature points can be grouped into a small number of bag-of-words categories by clustering, which reduces the dimensionality of the feature vector without losing the semantics of the feature points, and thus speeds up the feature-comparison computation.
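A sketch of this pipeline using OpenCV's ORB and scikit-learn's K-means (both assumed; the disclosure does not name libraries). The 6-word vocabulary mirrors the toy example above and would be much larger in practice, and `reference_icons` is a hypothetical list of grayscale icon images.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create()

def orb_descriptors(img_gray):
    _, desc = orb.detectAndCompute(img_gray, None)
    return desc  # (n_points, 32) array, or None for texture-less images

# Train the vocabulary by clustering descriptors from all reference icons.
all_desc = np.vstack([d for d in map(orb_descriptors, reference_icons)
                      if d is not None])
vocab = KMeans(n_clusters=6, n_init=10).fit(all_desc.astype(np.float32))

def bag_of_words(img_gray):
    desc = orb_descriptors(img_gray)
    vec = np.zeros(vocab.n_clusters)
    if desc is None:  # texture-less icon: fall back to the histogram route below
        return vec
    words = vocab.predict(desc.astype(np.float32))
    vec[np.unique(words)] = 1  # binary presence vector, as in the example
    return vec
```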
In an embodiment, since the feature vectors of the reference icons must be available when similarities are calculated, the device may, after generating the bag of words by clustering, also match the feature points of each reference icon against the bag of words and generate the reference icons' bag-of-words vectors from the matching results. These vectors may be computed in advance before the whole scheme starts, generated on demand when a reference icon's vector is needed, or re-clustered and regenerated whenever a new reference icon is added to the library; those skilled in the art can choose the timing according to the business requirements.
In an embodiment, considering that a feature vector generated from feature points may not be absolutely reliable, a secondary verification step may be added to the above scheme. Specifically, the reference icons can be sorted from high to low by the calculated similarity, a homography check then performed between each of the first several reference icons in the sorted sequence and the icon to be identified, and the reference icon with the highest homography check score determined to be the target reference icon. A homography check generally tests whether one picture can be obtained from the other through a homography matrix transformation; the homography matrix expresses the mapping between two planes, with transformations such as stretching and skewing captured in the matrix parameters. The higher the check score, the higher the probability that the two icons contain the same content. With this scheme the determined target reference icon is closer to the icon to be identified, so the recognition result is more accurate.
The concrete flow of the homography check can be adapted as needed. In general, a homography matrix can be computed from 4 pairs of feature points and then used to verify the remaining pairs; the more pairs it verifies, the higher the homography check score can be considered. Those skilled in the art may of course design other scoring mechanisms based on the homography principle, which need not be enumerated here.
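One plausible scoring sketch, assuming ORB keypoints and descriptors (as above) are available for both icons; using OpenCV's RANSAC homography estimation and counting inliers is an assumed but common choice, not the only one the text allows.

```python
import cv2
import numpy as np

def homography_score(kp_q, desc_q, kp_r, desc_r):
    """Match ORB descriptors between query and reference icons, then count
    the feature-point pairs consistent with a single homography."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_q, desc_r)
    if len(matches) < 4:  # a homography needs at least 4 point pairs
        return 0
    src = np.float32([kp_q[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_r[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return 0 if mask is None else int(mask.sum())  # inlier count as the score
```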
It can be appreciated that generating the feature vector based on the feature points is only one possible way, and in addition, the feature vector may be generated based on data such as statistics of pixel values in the picture; the method for generating the feature vector can be determined by a person skilled in the art according to specific requirements.
In an embodiment, a feature vector can be generated from the statistics of the image's pixel values, to handle cases where feature points are hard to extract. Specifically, if the icon to be identified is a texture-less image, it is very likely that a sufficient number of feature points cannot be extracted by conventional methods, so no feature vector can be generated from feature points. In this case, a statistical summary of the icon's pixel values, such as a color histogram, can be used as a component of the feature vector and compared against reference icons whose feature vectors were generated the same way, which likewise yields a recognition result for the icon.
For example, if the peak-valley trend of the color histogram of the icon to be identified closely matches that of a certain reference icon, the two share the same or a similar pixel-value distribution, which yields a high similarity between their feature vectors; the recognition result of the icon can then be determined, with high probability, to be the identification tag of that reference icon.
With this scheme, the recognition task can still be completed when not enough feature points can be extracted from the icon, which markedly improves the compatibility of the scheme; and for icons from which feature points can be extracted, combining the two approaches can further improve the accuracy of the recognition result.
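A sketch of the histogram fallback; comparing normalized histograms by correlation is one assumed way to realize the "matching peak-valley trend" idea above.

```python
import cv2

def color_histogram(bgr_icon, bins=32):
    """3-D BGR histogram, normalized so icon size does not matter."""
    hist = cv2.calcHist([bgr_icon], [0, 1, 2], None,
                        [bins, bins, bins], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def histogram_similarity(icon_a, icon_b):
    # HISTCMP_CORREL returns 1.0 for identical distributions.
    return cv2.compareHist(color_histogram(icon_a), color_histogram(icon_b),
                           cv2.HISTCMP_CORREL)
```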
The above is all embodiments of the present disclosure directed to the image recognition method. The present disclosure also provides embodiments of a corresponding image recognition apparatus as follows:
Referring to fig. 4, fig. 4 is a block diagram illustrating a structure of an image recognition apparatus according to an embodiment of the present disclosure, the apparatus may include:
an acquisition module 401, configured to acquire an original image and a grayscale image corresponding to the original image, wherein the original image contains an icon to be identified;
a separation module 402, configured to separate an icon foreground region from a background region in the original image, and set the gray value of the corresponding background region in the grayscale image to a preset gray value;
a projection module 403, configured to perform horizontal projection and vertical projection on the pixel values of the pixels in the grayscale image, and determine an icon region in the grayscale image according to the projection results;
an extraction module 404, configured to extract the icon to be identified from a target region in the original image corresponding to the icon region; and
a recognition module 405, configured to perform image recognition on the extracted icon to obtain an icon recognition result.
In an embodiment, the separation module 402 may be configured to first count the gray values of the pixels in the grayscale image to obtain the most frequently occurring gray value, and then reset the pixels carrying that value to the preset gray value. Because the background region is larger than the icon foreground region and its content is more uniform, the region carrying the most frequent gray value is, with high probability, the background region to be separated from the icon foreground; with this background-removal method, the statistics can be completed in a single pass over the picture, so the separation consumes few system resources.
In an embodiment, the apparatus may further include a binarization module, which may be configured to convert the grayscale image into a binary image based on a preset gray value threshold, and then invoke a dilation-erosion algorithm to remove the text image regions from the binary image. As described above, the dilation-erosion algorithm is a morphological operation that shifts the positions of black-white boundaries in a picture; in implementation, the grayscale image is binarized based on the preset threshold to obtain the binary image, and the dilation-erosion algorithm is then invoked to remove the text regions. It will be understood, of course, that image elements such as notification-bar icons and border lines may also be removed, because they are morphologically close to text.
By applying this scheme, text image regions previously mis-identified as icon foreground (because their gray values differ from the background) can be removed, eliminating their interference with the subsequent gray-value projection and improving the accuracy of determining the icon region by projection.
In an embodiment, the binarization module may be further configured to set gray values greater than the preset gray value threshold to the maximum gray value supported by the grayscale image, and gray values smaller than the threshold to the minimum supported gray value, so as to convert the grayscale image into a binary image. For example, if the grayscale image is 8-bit and the threshold is 192, pixels with gray values above 192 may be set to 255 and pixels below 192 set to 0. This further increases the contrast of the image, which benefits both the dilation-erosion step and the accuracy of the subsequent projection-based determination of the icon region.
In an embodiment, the projection module 403 may be further configured to determine a region of the grayscale image in which both the horizontal and the vertical projection results are smaller than a preset threshold as an icon region. For example, if the preset gray value used when resetting the background was 255, the background has been entirely recolored to white while the gray values of the icon foreground are generally below 255; projecting such an image horizontally and vertically pulls the projection result over the icon foreground below the background's 255, so a threshold slightly below or equal to 255 can be taken, and regions where both projection results fall below it are determined to be icon regions.
The projection module 403 may be further configured to predict, based on the size and distribution pattern of the sub-regions of the icon region, a missing icon region in which an icon actually exists but which was not determined to be an icon region, and add the predicted missing region to the icon region. As noted above, errors may occur during foreground-background separation; for example, a light-colored icon on a light background is likely to be mistaken for background, in which case its region cannot be found directly in the projection results.
By applying this scheme, that problem can be alleviated to some extent, reducing the miss rate when extracting the icons to be identified.
In an embodiment, the recognition module may include three sub-modules: an extraction sub-module configured to perform feature extraction on the extracted icon to obtain its feature vector; a calculation sub-module configured to calculate the similarity between the feature vector of each reference icon in the icon library and that of the icon to be identified, the reference icons being pre-labeled with identification tags; and a determination sub-module configured to determine a target reference icon from the reference icons according to the similarities, and take the identification tag of the target reference icon as the recognition result of the icon to be identified. The similarity between vectors may be measured by the angle between them, or by the Euclidean or Manhattan distance; the specific criterion can be determined by those skilled in the art and is not prescribed here.
By applying this scheme, on the one hand, since similar feature vectors imply similar image features, the two images can be judged with high probability to share the same identification tag, so determining the target reference icon by feature-vector similarity improves the recognition success rate; on the other hand, compared with directly computing the similarity between the icon and each reference icon image, comparing feature vectors markedly reduces the system resources consumed in determining the target reference icon.
In one embodiment, the icon to be identified may include texture information therein; for such icons to be identified, the extraction submodule may be further configured to extract local feature points such as ORB feature points first, and then generate corresponding feature vectors based on pre-trained word bags; when the feature vector of the icon to be identified is generated, firstly, extracting local feature points of the icon to be identified based on texture information of the icon to be identified, then matching the local feature points of the icon to be identified with a word bag trained in advance, and generating the word bag vector of the icon to be identified according to a matching result; that is, in this case, the generated bag-of-words vector may be a feature vector; the pre-trained word bag may be a word bag generated by clustering local feature points of each reference icon in the reference icon library by using a clustering algorithm such as K-means.
With this scheme, a large number of feature points can be grouped into a limited number of bag-of-words categories by clustering, which reduces the dimensionality of the feature vector without losing the semantics of the feature points and thus speeds up feature comparison.
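The following sketch illustrates one possible bag-of-words pipeline using OpenCV's ORB detector and scikit-learn's K-means; the 64-word vocabulary size, the feature-point budget, and the helper names are arbitrary assumptions, not requirements of the scheme.

```python
# ORB descriptors from each reference icon are clustered into a vocabulary
# with K-means; a query icon's descriptors are then assigned to the nearest
# cluster and counted into a histogram that serves as its feature vector.

import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create(nfeatures=200)

def orb_descriptors(gray_img):
    """Return ORB descriptors of a grayscale image, or None if no keypoints."""
    _, desc = orb.detectAndCompute(gray_img, None)
    return desc  # (n, 32) uint8 array, or None

def train_vocabulary(reference_imgs, k=64):
    """Cluster descriptors of all reference icons into a k-word vocabulary."""
    all_desc = np.vstack([d for img in reference_imgs
                          if (d := orb_descriptors(img)) is not None])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(
        all_desc.astype(np.float32))

def bow_vector(gray_img, vocab):
    """Histogram of vocabulary-word assignments, used as the feature vector."""
    desc = orb_descriptors(gray_img)
    if desc is None:
        return np.zeros(vocab.n_clusters, dtype=np.float32)
    words = vocab.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float32)
    return hist / (hist.sum() + 1e-12)  # normalize away icon size
```

Once the vocabulary is trained, the same bow_vector helper can be applied to each reference icon, which corresponds to the generation sub-module described next.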
In an embodiment, since the feature vectors of the reference icons must be available when computing similarities, the recognition module 405 may further include a generation sub-module configured to match the feature points of each reference icon against the bag of words once the bag of words has been generated by clustering, and to generate the bag-of-words vector of each reference icon from the matching result. Those skilled in the art may choose when to generate the bag-of-words vectors of the reference icons according to their specific business requirements.
In an embodiment, considering that a feature vector built from feature points is not absolutely reliable, a secondary verification step may be added on top of the determining sub-module. Specifically, the determining sub-module may be configured to sort the reference icons by the computed similarity from high to low, perform a homography check between each of the first several reference icons in the sorted sequence and the icon to be identified, and then determine the reference icon with the highest homography check score as the target reference icon. With this scheme, the determined target reference icon is closer to the icon to be identified, so the identification result is more accurate.
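One way the secondary verification could look is sketched below: the top-k candidates by feature similarity are re-ranked by the number of RANSAC inliers of a homography fitted between matched ORB keypoints. Scoring by inlier count is an assumption; the disclosure does not pin the check score down to a particular formula.

```python
# Re-rank the most similar reference icons by homography inlier count.

import cv2
import numpy as np

def homography_inliers(query_img, ref_img, orb, matcher):
    kq, dq = orb.detectAndCompute(query_img, None)
    kr, dr = orb.detectAndCompute(ref_img, None)
    if dq is None or dr is None or len(kq) < 4 or len(kr) < 4:
        return 0
    matches = matcher.match(dq, dr)
    if len(matches) < 4:  # findHomography needs at least 4 point pairs
        return 0
    src = np.float32([kq[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kr[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return int(mask.sum()) if mask is not None else 0

def rerank_top_k(query_img, candidates):
    """candidates: list of (label, ref_img) pre-sorted by feature similarity."""
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    scored = [(homography_inliers(query_img, ref, orb, matcher), label)
              for label, ref in candidates]
    return max(scored)[1]  # label of the reference with the most inliers
```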
In an embodiment, a statistic of the image's pixel values can be used to generate the feature vector, to cover the case where feature points are hard to extract. Specifically, if the icon to be identified is a texture-free image, conventional methods are likely to extract too few feature points to generate a feature vector from them. In this case, the extraction sub-module may be further configured to count the pixel values of the icon's pixels to obtain its color histogram, and to generate a color-histogram component of the icon's feature vector from that histogram. Using a statistical regularity of the pixel values, such as the color histogram, as one component of the feature vector, and comparing it against reference icons whose feature vectors were generated the same way, still yields an identification result for the icon to be identified.
For example, if the peak-and-valley trend of the color histogram of the icon to be identified closely matches that of a certain reference icon, the two have the same or similar pixel-value distribution, which yields a high similarity between their feature vectors; the identification result of the icon to be identified can then be determined, with high probability, to be the identification tag of that reference icon.
With this scheme, the recognition task can still be completed when not enough feature points can be extracted from the icon to be identified, which markedly improves the scheme's compatibility. For icons from which feature points can be extracted, combining this with the scheme above can further improve the accuracy of the identification result.
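A sketch of the texture-free fallback follows, building the color-histogram component with OpenCV's calcHist; the choice of 16 bins per channel and the per-channel concatenation are illustrative assumptions.

```python
# Build a per-channel color histogram and use it (flattened and normalized)
# as a feature-vector component for icons with too little texture for ORB.

import cv2
import numpy as np

def color_histogram_component(bgr_icon, bins=16):
    chans = []
    for ch in range(3):  # B, G, R channels
        h = cv2.calcHist([bgr_icon], [ch], None, [bins], [0, 256])
        chans.append(h.ravel())
    vec = np.concatenate(chans)
    return vec / (vec.sum() + 1e-12)  # normalize away icon size

# Two icons with similar color distributions yield nearby vectors even when
# neither has enough texture to produce ORB keypoints.
```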
The specific implementation of each module of the apparatus in the foregoing embodiments has been described in detail in the embodiments of the corresponding method and is not repeated here.
An embodiment of the present disclosure further provides an electronic device, including:
A processor;
A memory for storing the processor-executable instructions;
Wherein the processor is configured to execute the instructions to implement the image recognition method as described in any of the embodiments above.
Embodiments of the present disclosure also provide a computer-readable storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, they enable the electronic device to perform the image recognition method described in any one of the above embodiments.
Embodiments of the present disclosure also provide a computer program product comprising a computer program which, when executed by a processor, implements the image recognition method according to any of the above embodiments.
Fig. 5 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure. Referring to fig. 5, an electronic device 500 may include one or more of the following components: a processing component 502, a memory 504, a power supply component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 518.
The processing component 502 generally controls overall operation of the electronic device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to perform all or part of the steps of the image recognition method described above. Further, the processing component 502 can include one or more modules that facilitate interactions between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the electronic device 500. Examples of such data include instructions for any application or method operating on the electronic device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 506 provides power to the various components of the electronic device 500. The power components 506 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 500.
The multimedia component 508 includes a screen that provides an output interface between the electronic device 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 508 includes a front-facing camera and/or a rear-facing camera. When the electronic device 500 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 500 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 504 or transmitted via the communication component 518. In some embodiments, the audio component 510 further comprises a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 514 includes one or more sensors for providing status assessment of various aspects of the electronic device 500. For example, the sensor assembly 514 may detect an on/off state of the electronic device 500, a relative positioning of components such as a display and keypad of the electronic device 500, a change in position of the electronic device 500 or a component of the electronic device 500, the presence or absence of a user's contact with the electronic device 500, an orientation or acceleration/deceleration of the electronic device 500, and a change in temperature of the electronic device 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 518 is configured to facilitate communication between the electronic device 500 and other devices, either wired or wireless. The electronic device 500 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 518 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 518 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an embodiment of the present disclosure, the electronic device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the image recognition methods described above.
In an embodiment of the present disclosure, a computer-readable storage medium is also provided, such as memory 504, including instructions executable by processor 520 of electronic device 500 to perform the image recognition method described above. For example, the computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
It is noted that in this disclosure relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
The foregoing has described in detail the method and apparatus provided by the embodiments of the present disclosure. Specific examples are used herein to explain the principles and implementations of the disclosure, and the above examples are intended only to help understand the method of the disclosure and its core idea. A person of ordinary skill in the art may, following the ideas of the disclosure, make changes to the specific implementations and the scope of application; accordingly, the content of this specification should not be construed as limiting the disclosure.

Claims (23)

1. An image recognition method, the method comprising:
Acquiring an original image and a gray level image corresponding to the original image, wherein the original image comprises an icon to be identified;
Separating an icon foreground region from a background region in the original image, and setting a gray value of the background region corresponding to the gray map as a preset gray value;
Converting the gray level image into a binary image based on a preset gray level value threshold, respectively carrying out transverse projection and longitudinal projection on pixel values of pixels in the gray level image converted into the binary image, and determining an icon area in the gray level image according to a projection result;
extracting the icon to be identified from a target area corresponding to the icon area in the original image;
performing image recognition on the extracted icon to be recognized to obtain an icon recognition result;
The determining the icon area in the gray map according to the projection result comprises: determining areas of the gray map whose projection results in the transverse and longitudinal directions are smaller than a preset threshold value as icon areas in the gray map.
2. The method of claim 1, wherein the separating an icon foreground region from a background region in the original image and setting a gray value of the background region corresponding to the gray map as a preset gray value comprises:
Counting the gray values of all pixels in the gray map to obtain the gray value with the highest occurrence frequency, and setting the gray value with the highest occurrence frequency as the preset gray value.
3. The method of claim 2, wherein prior to said projecting pixel values of pixels in said gray scale map laterally and longitudinally, respectively, the method further comprises:
Invoking a dilation and erosion algorithm to remove text image areas from the binary image.
4. A method according to claim 3, wherein said converting the gray scale map into a binary image based on a preset gray scale value threshold value comprises:
Setting a gray value larger than a preset gray value threshold in the gray map as a maximum gray value supported by the gray map, and setting a gray value smaller than the preset gray value threshold in the gray map as a minimum gray value supported by the gray map so as to convert the gray map into a binary image.
5. The method of claim 1, wherein
The determining the icon area in the gray scale image according to the projection result comprises the following steps:
Predicting, based on the size and distribution regularity of each sub-area of the icon area, a missing icon area in the gray map in which an icon actually exists but which was not determined to be an icon area;
and adding the predicted missing icon area to the icon area.
6. The method according to claim 1, wherein the performing image recognition on the extracted icon to be recognized to obtain a corresponding recognition result includes:
extracting features of the extracted icons to be identified to obtain feature vectors of the icons to be identified;
Respectively calculating the similarity between the feature vector of each reference icon in the icon library and the feature vector of the icon to be identified, wherein the reference icons in the icon library are marked with identification tags in advance;
and determining a target reference icon from the reference icons based on the calculated similarity, and determining an identification tag corresponding to the target reference icon as an identification result of the icon to be identified.
7. The method of claim 6, wherein the icon to be identified comprises texture information; the feature vector comprises a bag-of-words vector; and the performing feature extraction on the extracted icon to be identified to obtain a feature vector of the icon to be identified comprises:
Extracting local feature points of the icon to be identified based on the texture information of the icon to be identified;
Matching the local feature points of the icon to be identified with a pre-trained bag of words, and generating a bag-of-words vector of the icon to be identified according to the matching result; wherein the pre-trained bag of words is a bag of words generated by clustering the local feature points of each reference icon in the reference icon library.
8. The method of claim 7, wherein the method further comprises:
Matching the feature points of each reference icon with the bag of words, and generating a bag-of-words vector of each reference icon according to the matching result.
9. The method of claim 6, wherein
The determining a target reference icon from the reference icons based on the calculated similarity comprises:
sorting the reference icons from high to low based on the calculated similarity;
performing homography verification between each of the first several reference icons in the sorted sequence and the icon to be identified, respectively;
and determining the reference icon with the highest homography check score as a target reference icon.
10. The method of claim 6, wherein the feature vector further comprises a color histogram component; and the performing feature extraction on the extracted icon to be identified to obtain a feature vector of the icon to be identified comprises:
Counting pixel values of pixels of the icon to be identified to obtain a color histogram of the icon to be identified;
And generating a color histogram component of the feature vector of the icon to be identified according to the color histogram.
11. An image recognition apparatus, the apparatus comprising:
An acquisition module configured to acquire an original image and a gray map corresponding to the original image, wherein the original image contains an icon to be identified;
The separation module is configured to separate an icon foreground region from a background region in the original image, and set a gray value of the background region corresponding to the gray map as a preset gray value;
The projection module is configured to convert the gray level image into a binary image based on a preset gray level value threshold, respectively carry out transverse projection and longitudinal projection on pixel values of pixels in the gray level image converted into the binary image, and determine an icon area in the gray level image according to a projection result;
an extracting module configured to extract the icon to be identified from a target area corresponding to the icon area in the original image;
The identification module is configured to carry out image identification on the extracted icon to be identified to obtain an icon identification result;
The determining the icon area in the gray map according to the projection result comprises: determining areas of the gray map whose projection results in the transverse and longitudinal directions are smaller than a preset threshold value as icon areas in the gray map.
12. The apparatus of claim 11, wherein the separation module is further configured to:
Counting the gray values of all pixels in the gray map to obtain the gray value with the highest occurrence frequency, and setting the gray value with the highest occurrence frequency as the preset gray value.
13. The apparatus of claim 12, further comprising a binary processing module configured to:
Invoking a dilation and erosion algorithm to remove text image areas from the binary image.
14. The apparatus of claim 13, wherein the binary processing module is further configured to:
Setting a gray value larger than a preset gray value threshold in the gray map as a maximum gray value supported by the gray map, and setting a gray value smaller than the preset gray value threshold in the gray map as a minimum gray value supported by the gray map so as to convert the gray map into a binary image.
15. The apparatus of claim 11, wherein the projection module is further configured to:
Predicting, based on the size and distribution regularity of each sub-area of the icon area, a missing icon area in the gray map in which an icon actually exists but which was not determined to be an icon area;
and adding the predicted missing icon area to the icon area.
16. The apparatus of claim 11, wherein the identification module comprises:
an extraction sub-module configured to perform feature extraction on the extracted icon to be identified to obtain a feature vector of the icon to be identified;
The computing sub-module is configured to respectively compute the similarity between the feature vector of each reference icon in the icon library and the feature vector of the icon to be identified, wherein the reference icons in the icon library are all marked with identification tags in advance;
And the determining submodule is configured to determine a target reference icon from the reference icons based on the calculated similarity, and determine an identification tag corresponding to the target reference icon as an identification result of the icon to be identified.
17. The apparatus of claim 16, wherein the icon to be identified comprises texture information; the feature vector comprises a bag-of-words vector; and the extraction sub-module is further configured to:
Extracting local feature points of the icon to be identified based on the texture information of the icon to be identified;
Matching the local feature points of the icon to be identified with a pre-trained bag of words, and generating a bag-of-words vector of the icon to be identified according to the matching result; wherein the pre-trained bag of words is a bag of words generated by clustering the local feature points of each reference icon in the reference icon library.
18. The apparatus of claim 17, wherein the identification module further comprises:
A generation sub-module configured to match the feature points of each reference icon with the bag of words, and to generate a bag-of-words vector of each reference icon according to the matching result.
19. The apparatus of claim 16, wherein the determination submodule is further configured to:
sorting the reference icons from high to low based on the calculated similarity;
performing homography verification between each of the first several reference icons in the sorted sequence and the icon to be identified, respectively;
and determining the reference icon with the highest homography check score as a target reference icon.
20. The apparatus of claim 16, wherein the feature vector further comprises a color histogram component; the extraction sub-module is further configured to:
Counting pixel values of pixels of the icon to be identified to obtain a color histogram of the icon to be identified;
And generating a color histogram component of the feature vector of the icon to be identified according to the color histogram.
21. An electronic device, comprising:
A processor;
A memory for storing the processor-executable instructions;
Wherein the processor is configured to execute the instructions to implement the image recognition method of any one of claims 1 to 10.
22. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image recognition method according to any one of claims 1-10.
23. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the image recognition method according to any one of claims 1-10.