CN106529380B - Image recognition method and device


Info

Publication number: CN106529380B
Application number: CN201510587818.2A
Authority: CN (China)
Prior art keywords: region, image, image information, illegal, information
Other languages: Chinese (zh)
Other versions: CN106529380A
Inventor: 金炫
Assignee: Alibaba Group Holding Ltd
Legal status: Active (granted)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables


Abstract

The application discloses an image recognition method and device. The method comprises the following steps: extracting region feature values from acquired image information by color channel to obtain the region feature value of each region corresponding to each color channel in the image information; screening the regions in the image information according to the region feature values and a preset first threshold to determine the text regions in the image information; performing character recognition on the text regions in the image information to obtain the field information corresponding to the text regions; and matching the field information with a preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information. The method and device solve the technical problem in the prior art that the accuracy of recognizing character information in image information is low due to the influence of the background content of the image information.

Description

Image recognition method and device
Technical Field
The application relates to the field of the Internet, and in particular to an image recognition method and device.
Background
With the development of electronic commerce, illegal behaviors such as credit speculation (faking transactions to inflate reputation) and fraud continue to evolve and spread, and such dishonest behavior seriously affects the healthy development of electronic commerce. For example, on platforms such as Taobao and Tmall, the proportion of pictures used as information carriers to evade text-based rule control has increased year by year. Compared with plain text information, extracting text information embedded in pictures or even video involves greater technical difficulty, mainly for the following two reasons: 1. the diversity of picture styles; 2. the diversity of text embedding styles.
In the prior art, methods for recognizing pictures with embedded characters fall into two categories:
The first category recognizes embedded characters by comparing images against images. A number of picture samples with embedded characters are labeled, image features of the samples are extracted as image fingerprints, and the similarity between a picture and the samples is calculated from these fingerprints to recognize the picture. Recognizing pictures with embedded characters by image-to-image comparison is a relatively simple prevention and control method, but it mainly has the following problems:
1. The range of pictures that can be recognized is limited by the scope of the picture sample library. Although adding pictures to the sample library can preserve the accuracy of the algorithm, pictures with new character-embedding styles can only be handled by manual auditing, which is cumbersome and cannot guarantee timeliness.
2. Among pictures with embedded characters, many share the same background but carry different embedded characters. Such pictures are difficult to distinguish by image-to-image comparison, resulting in low recognition accuracy.
The second category recognizes embedded characters using natural-scene OCR. Recognizing character information in a picture with natural-scene OCR technology is an effective method, but it has the following problems:
1. Any picture can serve as the background of a picture with embedded characters, and pictures of natural scenes often contain inherent character content, such as house numbers and license plate numbers. Since the ways of embedding characters are various, existing natural-scene OCR methods find it difficult to distinguish the embedded characters from the character content inherent in the natural scene.
2. A picture of a natural scene has a certain directionality; usually, the main axis direction of the character content inherent in the picture matches the main axis direction of the natural scene. Embedded characters, however, can be placed in the picture in any direction, so distinguishing them by the main axis alone rarely achieves a good recognition effect.
3. In pictures with embedded characters, the font size, stroke width, and so on of the embedded characters vary widely, so a recognition engine trained on a labeled font library may fail to recognize such images.
4. Methods that optimize text information with a semantic model also struggle to meet the requirement of recognizing specific types of text pictures because of their low accuracy.
5. Different natural scenes generally place specific requirements on the recognition algorithm, and the overall accuracy is difficult to guarantee with a single natural-scene OCR algorithm plus keyword matching.
For the problem of low accuracy in recognizing character information in image information caused by the influence of the background content of the image information, no effective solution has yet been proposed.
Summary of the application
The embodiments of the application provide an image recognition method and device to solve at least the technical problem in the prior art that the accuracy of recognizing character information in image information is low due to the influence of the background content of the image information.
According to one aspect of the embodiments of the present application, an image recognition method is provided, including: extracting region feature values from acquired image information by color channel to obtain the region feature value of each region corresponding to each color channel in the image information; screening the regions in the image information according to the region feature values and a preset first threshold to determine the text regions in the image information; performing character recognition on the text regions in the image information to obtain the field information corresponding to the text regions; and matching the field information with a preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information.
According to another aspect of the embodiments of the present application, an image recognition apparatus is also provided, including: a first extraction module configured to extract region feature values from acquired image information by color channel to obtain the region feature value of each region corresponding to each color channel in the image information; a screening module configured to screen the regions in the image information according to the region feature values and a preset first threshold to determine the text regions in the image information; a recognition module configured to perform character recognition on the text regions in the image information to obtain the field information corresponding to the text regions; and a matching module configured to match the field information with a preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information.
Optionally, the apparatus further comprises: a word segmentation module configured to split the field information according to a preset word segmentation rule to obtain a word segmentation set corresponding to the field information, wherein the word segmentation set records the word segmentation entries obtained after the field information is split.
Optionally, the matching module comprises: a first sub-acquisition module configured to acquire a preset illegal keyword set; a second sub-acquisition module configured to acquire the weight value corresponding to each illegal keyword in the illegal keyword set; a sub-matching module configured to match the word segmentation entries in the word segmentation set respectively with the illegal keywords in the illegal keyword set to obtain matching results; a sub-operation module configured to perform a weighted operation according to the weight values corresponding to the illegal keywords and the matching results to obtain the field weight of the field information; and a first comparison module configured to compare the field weight with a preset second threshold and determine whether the image information is an illegal image containing an illegal field.
Optionally, when the color channel is a color channel, the first extraction module comprises: a sub-correction module configured to correct the RGB color values of the pixels in the image information to obtain the preprocessed color values corresponding to the pixels, wherein the preprocessed color values comprise red, green, blue, and yellow values; a first sub-determination module configured to determine a first pixel feature value of each pixel point in the color channel according to the preprocessed color values and the position information of the pixel point, wherein the first pixel feature value corresponds to the color channel; and a first sub-extraction module configured to extract the image information according to the first pixel feature values to obtain the region feature values corresponding to the color channel.
Optionally, when the color channel is a grayscale channel, the first extraction module further comprises: a third sub-acquisition module configured to acquire preset calculation parameters; a second sub-determination module configured to determine a second pixel feature value of each pixel point in the grayscale channel according to the RGB color values of the pixel points in the image information, the position information of the pixel points, and the calculation parameters, wherein the second pixel feature value corresponds to the grayscale channel; and a second sub-extraction module configured to extract the image information according to the second pixel feature values to obtain the region feature values corresponding to the grayscale channel.
Optionally, the screening module comprises: a sub-comparison module configured to compare the region feature values corresponding to the color channels respectively with the first threshold to obtain comparison results; and a third sub-determination module configured to determine, according to the comparison results, the text regions corresponding to the color channels in the image information.
Optionally, when the image information includes at least two text regions, the apparatus further comprises: a first acquisition module configured to acquire the relative position information of the text regions in the image information; a first determination module configured to determine the color channel information corresponding to each text region according to the region feature value of the text region; a second determination module configured to determine the association relationship between the text regions according to the relative position information and the color channel information; and a merging module configured to merge the text regions having an association relationship into a new text region.
Optionally, the recognition module comprises: a sub-processing module configured to perform projection processing on the text region along a predetermined direction to obtain the projection result of the text region for that direction, wherein the projection processing statistically processes the pixel density in each section of the text region along the predetermined direction to obtain the distribution of pixel density for that direction; a fourth sub-determination module configured to determine the character trend in the text region according to the projection results; a sub-segmentation module configured to sequentially segment the text region according to the character trend and the projection results to obtain sub-character regions, wherein each sub-character region contains one character; a first sub-recognition module configured to recognize the characters in the sub-character regions through a character recognition engine and determine the recognition result corresponding to each sub-character region; and a sub-generation module configured to generate the field information according to the character trend and the recognition results.
Optionally, the sub-recognition module comprises: a second sub-recognition module configured to recognize the sub-character region through the character recognition engine to obtain an initial recognition result corresponding to the sub-character region, wherein the initial recognition result comprises at least one alternative character and the confidence corresponding to the alternative character; and a fifth sub-determination module configured to determine, according to the confidences, the predetermined number of alternative characters with the highest confidence from the initial recognition result as the recognition result.
Optionally, the apparatus further comprises: a second acquisition module configured to acquire a preset first image feature vector, wherein the first image feature vector characterizes the image features of a specific type of image; a second extraction module configured to extract the feature vector of the illegal image to obtain a second image feature vector; and a second comparison module configured to compare the second image feature vector with the first image feature vector and determine the image type of the illegal image, wherein the image type at least includes: chat screenshots and non-chat screenshots.
Optionally, the apparatus further comprises: a third acquisition module configured to acquire the image resolution of the image information; a third comparison module configured to compare the image resolution with a preset standard image resolution; and a scaling module configured to scale the image information proportionally according to the standard image resolution when the image resolution is not equal to the standard image resolution.
In the embodiments of the application, region feature values are extracted from the acquired image information by color channel to obtain the region feature value of each region corresponding to each color channel; the regions in the image information are screened according to the region feature values and a preset first threshold to determine the text regions; character recognition is performed on the text regions to obtain the corresponding field information; and the field information is matched with a preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information. Correspondingly, the first extraction module extracts the region feature values by color channel to obtain the region feature value of each region corresponding to each color channel; the screening module screens the regions according to the region feature values and the preset first threshold to determine the text regions; the recognition module performs character recognition on the text regions to obtain the field information; and the matching module matches the field information with the preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information. This achieves the aim of recognizing character information in image information across various scenes, realizes the technical effect of improving the accuracy of recognizing character information in image information, and thereby solves the prior-art technical problem that this accuracy is low due to the influence of the background content of the image information.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a block diagram of the hardware configuration of a computer terminal for an image recognition method according to an embodiment of the present application;
FIG. 2 is a flow chart of an image recognition method according to an embodiment of the present application;
FIG. 3 is a flow chart of an alternative image recognition method according to an embodiment of the present application;
FIG. 4 is a flow chart of an alternative image recognition method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an alternative image recognition apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an alternative image recognition apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a first extraction module of an alternative image recognition apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a first extraction module of an alternative image recognition apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an alternative image recognition apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an alternative image recognition apparatus according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a recognition module of an alternative image recognition apparatus according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of an alternative image recognition apparatus according to an embodiment of the present application; and
FIG. 14 is a schematic structural diagram of an alternative image recognition apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The terms referred to in the embodiments of the present application are first explained as follows:
Color channel: each piece of image information can have one or more color channels; each color channel stores color information of the image, and the color of each pixel in the image information is produced by superposing and mixing the color information in the channels.
Region feature value: the feature value of each region corresponding to each color channel, obtained after segmenting the image information of each color channel according to the color values and pixel positions of the pixel points in the image.
example 1
There is also provided, in accordance with an embodiment of the present application, a method embodiment of an image recognition method. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that given here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking the method running on a computer terminal as an example, FIG. 1 is a hardware structure block diagram of a computer terminal for the image recognition method according to an embodiment of the present application. As shown in FIG. 1, the computer terminal 10 may include one or more processors 102 (only one is shown; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration.
The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the image recognition method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the above-mentioned image recognition method of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
Under the above operating environment, the present application provides an image recognition method as shown in FIG. 2. FIG. 2 is a flowchart of the image recognition method according to the first embodiment of the present application. The method comprises the following steps:
Step S21 is to extract a region feature value from the acquired image information according to the color channel, and obtain a region feature value of each region corresponding to the color channel in the image information.
Specifically, each piece of image information is composed of pixels in which basic colors are superimposed. Each color channel corresponding to a basic color therefore stores the information of that color element in the image information. When the image information is displayed, the colors in all the color channels are superimposed and mixed to produce the color of each pixel point in the image information.
The number of color channels of each piece of image information depends on the number of its basic colors, and the number of basic colors depends on the color mode of the image information. For example, image information whose color mode is CMYK has 4 color channels corresponding to its basic colors: a cyan channel, a magenta channel, a yellow channel, and a black channel. Image information in bitmap, grayscale, duotone, or indexed color mode has 1 color channel by default. Image information in RGB mode or Lab mode has 3 color channels.
In step S21, region feature value extraction is performed on the image information for each color channel separately, thereby obtaining the region feature values of the regions corresponding to each color channel. The extraction of region feature values means that, after the image information corresponding to each color channel is segmented by the color values and pixel positions of the pixel points in the image, the feature value of each region corresponding to each color channel is obtained by calculation.
In practical applications, the number of color channels differs with the color mode. For convenience of processing, the color modes of the image information may therefore be unified before region feature value extraction: the image information is uniformly converted into RGB mode, and the subsequent region feature value extraction is then performed on it.
The region feature value extraction algorithm may be implemented with the MSER (Maximally Stable Extremal Regions) algorithm; the specific method is not detailed here.
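As a minimal sketch, the per-channel extraction could look as follows in Python with OpenCV's MSER implementation; the channel construction shown here is illustrative rather than the patent's exact channel definitions:

```python
import cv2
import numpy as np

def region_feature_values(image_bgr):
    """Run MSER on each color channel and return candidate regions per channel."""
    channels = {
        "B": image_bgr[:, :, 0],
        "G": image_bgr[:, :, 1],
        "R": image_bgr[:, :, 2],
        "gray": cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY),
    }
    mser = cv2.MSER_create()
    regions_by_channel = {}
    for name, chan in channels.items():
        # detectRegions returns the point sets and bounding boxes of stable regions
        regions, bboxes = mser.detectRegions(chan)
        regions_by_channel[name] = list(zip(regions, bboxes))
    return regions_by_channel
```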
Step S23, screening each region in the image information according to the region feature value and a preset first threshold, and determining a text region in the image information.
Specifically, the first threshold may be determined by applying the region feature value extraction algorithm to various types of image information carrying characters. By statistically analyzing the region feature values of the regions that contain characters, a first threshold for determining text regions in image information is obtained. The first threshold is used to judge each region in the image information corresponding to each color channel and thereby determine whether the current region is a text region.
In step S23, the region feature values of the image information corresponding to the color channels are respectively compared with the first threshold, and text regions containing character information are screened out of each color channel by this comparison.
In practical applications, the character information in image information usually appears as words or phrases. When image information with character information is produced, to help users recognize the words or phrases, characters of the same size, font, and style are usually used within a word or phrase, arranged in a way that conforms to reading habits, for example from left to right and from top to bottom.
When people recognize characters in image information, they are usually sensitive to the font, size, style, and arrangement of the characters, and can judge the logical relationship between the characters appearing in the image from these properties.
People are also not hindered by the colors of the characters. For computers, however, changes in character color often create a barrier to recognition. Therefore, when screening text regions, the text regions of each color channel obtained with the first threshold can be associated uniformly according to character size, font, and style, thereby avoiding interference from character color.
In step S25, the text regions in the image information are recognized to obtain the field information corresponding to each text region.
In step S25, the characters in the text area are sequentially recognized by the character recognition engine, and field information corresponding to the text area is obtained. The character recognition engine may be a character recognition engine obtained by training each font separately and specially used for recognizing a specific font, or may be a general character recognition engine capable of recognizing all fonts. When the character recognition engine is a character recognition engine specially used for recognizing a specific font, each character recognition engine is used for recognizing characters in the text area, and therefore a preset number of recognition results with the highest confidence coefficient are obtained to serve as character recognition results. Wherein the predetermined number may be 5.
The characters recognized by the character recognition engine are arranged according to their positional relationship to obtain the field information corresponding to the text region. In addition, the individually recognized characters can be combined by permutation to obtain multiple pieces of candidate field information for the text region.
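A minimal sketch of such permutation-based assembly, assuming each sub-character region yields a list of (character, confidence) candidates ordered by position in the text region; the names and the cap on combinations are illustrative:

```python
from itertools import product

def candidate_fields(char_candidates, top_k=5, max_fields=100):
    """char_candidates: one list of (char, confidence) per sub-character region,
    already ordered by position in the text region."""
    # keep only the top_k most confident alternatives per character position
    trimmed = [sorted(c, key=lambda x: -x[1])[:top_k] for c in char_candidates]
    fields = []
    for combo in product(*trimmed):
        fields.append("".join(ch for ch, _ in combo))
        if len(fields) >= max_fields:  # cap the combinatorial blow-up
            break
    return fields
```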
Step S27, matching the field information with a preset illegal keyword set, and determining whether the image information is an illegal image containing illegal field information.
In step S27, the field information corresponding to the text region obtained by the character recognition engine is sequentially matched with the illegal keywords in the preset illegal keyword set, and when the field information matches with the illegal keywords in the illegal keyword set, the image information is determined to be an illegal image containing illegal field information.
Of course, a weight value may also be set for each illegal keyword in the illegal keyword set, and after the field information is matched against the illegal keywords, a weighted operation may be performed on the matching result. When the weighted value of the image information exceeds a preset threshold, the image information is confirmed to be an illegal image containing illegal field information.
Through the above steps S21 to S27, the region feature values of the regions corresponding to the color channels in the image information are extracted and obtained for each color channel. And screening the region characteristic values according to the first threshold value, and determining a text region in the image information. Then, the characters in the text area are identified through a character identification engine, and the identified characters are matched with preset illegal fields, so that whether the image information is an illegal image containing the illegal fields or not is determined. By the method, the purpose of identifying the character information in the image information under various scenes is achieved, so that the technical effect of improving the accuracy of identifying the character information in the image information is achieved, and the technical problem of low accuracy of identifying the character information in the image information due to the influence of background content of the image information in the prior art is solved.
As an alternative implementation, as shown in fig. 3, before matching the field information with a preset illegal keyword set and determining whether the image information is an illegal image containing illegal field information in step S27, the method may further include:
Step S26, splitting the field information according to a preset word segmentation rule to obtain a word segmentation set corresponding to the field information, wherein the word segmentation set records the word segmentation entries obtained after the field information is split.
Specifically, in step S26, word segmentation is performed on the field information obtained by recognition. Through this processing, the word groups in the field information can be split into several new word segmentation entries with new meanings, and the new entries obtained are combined into the word segmentation set corresponding to the image information.
In practical applications, word segmentation methods can generally be classified into three major categories: methods based on character string matching, methods based on understanding, and methods based on statistics; these are not detailed here.
Here, a word segmentation method can be used to process the recognized field information. Then, while keeping the original word entries in the field information, the entries are split into multiple binary words; for example, one ternary word generates two binary words, and the characters in all the binary words are permuted and combined to form new binary words. The word segmentation entries obtained by this splitting are combined into the word segmentation set corresponding to the image information.
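A minimal sketch of this binary-word expansion, assuming the scheme is: keep the original entries, add all 2-character sub-words, and permute their characters into new pairs (an illustrative reading, not the patent's exact rule):

```python
from itertools import permutations

def expand_entries(entries):
    expanded = set(entries)                 # keep the original entries
    chars = set()
    for entry in entries:
        for i in range(len(entry) - 1):     # a 3-character word yields two 2-character words
            expanded.add(entry[i:i + 2])
        chars.update(entry)
    # permute the characters into new candidate binary words
    for a, b in permutations(sorted(chars), 2):
        expanded.add(a + b)
    return expanded

print(expand_entries(["刷信誉"]))  # adds "刷信", "信誉", plus permuted pairs
```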
As an optional implementation manner, before performing word segmentation processing on the field information, the field information in the picture information that needs to be controlled may be labeled in advance, and the labeled field information is stored to generate an illegal keyword set for determining whether the picture information is an illegal picture.
As an alternative implementation, matching the field information with a preset illegal keyword set in step S27 to determine whether the image information is an illegal image containing illegal field information, may include:
Step S271, a preset illegal keyword set is acquired.
Step S273, the weight value corresponding to each illegal keyword in the illegal keyword set is acquired.
Step S275, the word segmentation entries in the word segmentation set are respectively matched with the illegal keywords in the illegal keyword set to obtain matching results.
Step S277, a weighted operation is performed according to the weight values corresponding to the illegal keywords and the matching results to obtain the field weight of the field information.
Step S279, comparing the field weight with a preset second threshold, and determining whether the image information is an illegal image containing an illegal field.
Specifically, in steps S271 to S279, the word segmentation information corresponding to the image information obtained by the segmentation processing is matched with the preset illegal keyword set, and the matching results are weighted to obtain the field weight corresponding to the field information contained in the image information. Whether the image information is an illegal image is determined by comparing the field weight with the preset second threshold for judging illegal images.
In practical applications, the meaning of a word differs between contexts, so image information cannot always be judged illegal merely because it contains a keyword regarded as illegal. Accordingly, a weight value may be set for each illegal keyword in the illegal keyword set. The word segmentation entries contained in the image information are matched in turn against the illegal keywords, and a field weight value for the image information is calculated from the matching result to determine whether the image information is an illegal image, which improves the accuracy of illegal image recognition. Specifically, the word segments obtained by the segmentation processing can be matched directly against the illegal keyword library, a one-dimensional vector generated from the matching result is input into a keyword model for scoring, and when the score is higher than a threshold, the image information is judged to be an illegal image and the judgment result is output.
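A minimal sketch of the weighted matching and scoring described in steps S271 to S279, assuming a keyword-to-weight mapping; the weights and the threshold are illustrative:

```python
def score_segments(segments, keyword_weights, second_threshold=1.0):
    # one-dimensional match vector: 1 if the keyword hits a segment, else 0
    matches = [1 if kw in segments else 0 for kw in keyword_weights]
    # weighted sum of the matches gives the field weight
    field_weight = sum(w * m for (kw, w), m in zip(keyword_weights.items(), matches))
    return field_weight, field_weight > second_threshold

segments = {"刷信誉", "代刷", "好评"}
weights = {"刷信誉": 0.9, "代刷": 0.7, "微信": 0.2}
print(score_segments(segments, weights))  # field weight 1.6 -> judged illegal
```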
As an alternative embodiment, the color channels may include a color channel and a grayscale channel, wherein the color channel includes at least a red channel, a green channel, a blue channel, and a yellow channel, and the grayscale channel includes at least a black channel and a white channel.
Specifically, each image information may have one or more color channels, each color channel stores color information in the image information, and the color of each pixel in the image information is generated by superimposing and mixing the color information in the respective channels. Therefore, in order to improve the accuracy of recognizing the text region in the image information, a specific color channel may be extracted from the image information, and the text region in each color channel may be extracted separately.
As an alternative implementation, when the color channel is a color channel, extracting region feature values from the acquired image information by color channel in step S21 to obtain the region feature value of each region corresponding to the color channel may include:
Step S211, correcting the RGB color values of the pixels in the image information to obtain a preprocessed color value corresponding to the pixels, where the preprocessed color value includes: red, green, blue and yellow values.
Step S212, determining a first pixel characteristic value of the pixel point in the color channel according to the preprocessed color value and the position information of the pixel point, where the first pixel characteristic value corresponds to the color channel.
Step S213, extracting the image information according to the first pixel feature value, to obtain a region feature value corresponding to the color channel.
Specifically, through steps S211 to S213, the region feature values of the color channels in the image information are extracted. First, pixel feature value extraction is performed on the color channels of the image information to obtain the pixel feature value of each pixel point in each color channel. Then, region feature value extraction is performed on the image information according to the pixel feature values in each color channel, finally obtaining the region feature values corresponding to each color channel.
In practical applications, the extraction steps for the color channels are as follows:
First, the original color values in the image information are corrected with a correction formula to obtain corrected color values,
where r, g, and b are the three original colors representing a pixel's color in the image information (red, green, and blue respectively), and R, G, B, and Y are the four corrected colors obtained by correcting r, g, and b (corrected red, green, blue, and yellow respectively).
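The correction formula itself does not survive in this text. A minimal sketch, assuming an opponent-color style correction of the kind commonly used to derive R, G, B, Y channels (the coefficients are an assumption, not the patent's formula):

```python
import numpy as np

def correct_colors(img):                     # img: H x W x 3 float array (r, g, b)
    # Assumed opponent-color correction; these formulas are a common choice
    # and are NOT taken from the patent.
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    R = np.clip(r - (g + b) / 2, 0, None)    # red minus the other primaries
    G = np.clip(g - (r + b) / 2, 0, None)
    B = np.clip(b - (r + g) / 2, 0, None)
    Y = np.clip((r + g) / 2 - np.abs(r - g) / 2 - b, 0, None)  # yellow opponent
    return R, G, B, Y
```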
Further, the MSER extraction algorithm is used to extract the four color channels RG, GR, BY, and YB respectively. The extraction of the image information by the MSER algorithm is described as follows:
Step 1, multi-color-channel extraction: the four color channels RG, GR, BY, and YB are extracted separately,
where RG, GR, BY, and YB are the four selected color channels and (x, y) is the position of a pixel point in the image information.
Step 2, determining an ordered pixel set: each color channel of the image information is segmented with a segmentation threshold step (pixel value) of 8, giving S_color = (0, 7, 15, ..., 255).
Step 3, determining candidate regions: after the four color channels are segmented according to the ordered pixel set, the regions corresponding to segmented pixels whose color values are larger than a preset color threshold are selected as candidate regions.
Step 4, extracting connected regions from the candidate regions: a connected region consists of the 8-neighborhood around a candidate region; each segmentation threshold can generate several regions, with the total number assumed to be n. The set of connected regions can then be expressed as:
RG(R_l) = (R_l, R_{l+1}, ..., R_{l+n}),
GR(R_l) = (R_l, R_{l+1}, ..., R_{l+n}),
BY(R_l) = (R_l, R_{l+1}, ..., R_{l+n}),
YB(R_l) = (R_l, R_{l+1}, ..., R_{l+n}),
where l characterizes the segmentation threshold and R_l characterizes the corresponding region information in the ordered pixel set.
Step 5, calculating the degree of variation of a region,
where |·| denotes the number of pixels in the region.
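A minimal reconstruction, assuming the standard MSER stability measure (the formula itself does not survive in this text, so the exact form may differ):

v(R_l) = (|R_{l+1}| - |R_{l-1}|) / |R_l|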
Step 6, extracting the MSER regions of each color channel:
When R_l satisfies the following condition, it is a region to be extracted by the MSER extraction algorithm:
v(R_l) < v(R_{l-1}) and v(R_l) < v(R_{l+1});
The region characteristic values obtained by extraction correspond to the respective color channels.
As an alternative implementation, when the color channel is a grayscale channel, extracting region feature values from the acquired image information by color channel in step S21 to obtain the region feature value of each region corresponding to the color channel may further include:
Step S214, acquiring the preset calculation parameters.
Step S215, determining a second pixel feature value of each pixel point in the grayscale channel according to the RGB color values of the pixel points in the image information, the position information of the pixel points, and the calculation parameters, wherein the second pixel feature value corresponds to the grayscale channel.
Step S216, extracting the image information according to the second pixel feature values to obtain the region feature values corresponding to the grayscale channel.
Specifically, in addition to the color channels, the grayscale channels of the image information may also be extracted. Through steps S214 to S216, the region feature values of the grayscale channels in the image information are extracted. First, pixel feature value extraction is performed on the grayscale channels to obtain the pixel feature value of each pixel point in each grayscale channel. Then, region feature value extraction is performed on the image information according to the pixel feature values in each grayscale channel, finally obtaining the region feature values corresponding to each grayscale channel.
In practical applications, the extraction steps for the grayscale channels are as follows:
First, the preset calculation parameters α, β, γ are acquired, which must satisfy the constraints 0 ≤ α, β, γ ≤ 1 and α + β + γ ≤ 1. The black and white channels are then generated as follows:
W(x,y)=255-(α×r(x,y)+β×g(x,y)+γ×b(x,y));
B(x,y)=α×r(x,y)+β×g(x,y)+γ×b(x,y);
Where W is used to indicate a white channel and B is used to indicate a black channel.
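A minimal sketch of the black/white channel generation above, assuming 8-bit RGB input; the concrete values of α, β, γ are placeholders, not the patent's presets:

```python
import numpy as np

def gray_channels(img, alpha=0.30, beta=0.59, gamma=0.11):
    # img: H x W x 3 uint8 array in (r, g, b) order; the weights satisfy
    # 0 <= alpha, beta, gamma <= 1 and alpha + beta + gamma <= 1
    r = img[..., 0].astype(float)
    g = img[..., 1].astype(float)
    b = img[..., 2].astype(float)
    B = alpha * r + beta * g + gamma * b   # black channel
    W = 255.0 - B                          # white channel
    return W.astype(np.uint8), B.astype(np.uint8)
```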
Then, the MSER extraction algorithm is applied to the W and B channels in the same way as to the color channels above, following the same steps 1 to 6: each grayscale channel is segmented with the ordered pixel set S_color = (0, 7, 15, ..., 255); candidate regions whose values exceed the preset color threshold are determined; the connected region sets W(R_l) = (R_l, R_{l+1}, ..., R_{l+n}) and B(R_l) = (R_l, R_{l+1}, ..., R_{l+n}) are extracted; the degree of variation of each region is calculated; and the regions satisfying v(R_l) < v(R_{l-1}) and v(R_l) < v(R_{l+1}) are extracted as MSER regions. The region feature values obtained correspond to the respective grayscale channels.
As an optional implementation manner, in step S23, the filtering, according to the region feature value and a preset first threshold, each region in the image information, and determining a text region in the image information may include:
Step S231, comparing the area feature values corresponding to the color channels with the first threshold respectively to obtain comparison results.
Step S233, determining the text regions corresponding to the color channels in the image information according to the comparison results.
Specifically, through steps S231 to S233, the region feature values corresponding to the color channels in the same region in the image information are respectively compared with the preset first threshold, so as to determine whether the current region is a text region containing characters. Wherein, a corresponding threshold value can be set for the region characteristic value corresponding to each color channel, respectively, to determine whether the region is a text region. The same threshold may also be set for all color channels to determine whether the region is a text region, which is not described herein.
In practical applications, for the extraction results of each color channel obtained by the MSER extraction algorithm, a trained character region classifier is used to filter out non-character regions, and the classifier outputs the regions whose scores are higher than a judgment threshold as text regions.
When training the character region classifier, a number of pieces of image information with text regions can be prepared, and the region feature values corresponding to each color channel calculated on them with the MSER extraction algorithm. The region feature values corresponding to the text regions are extracted and statistically analyzed, and the threshold for judging whether a region is a text region is finally determined.
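A minimal sketch of this filtering stage, assuming the trained classifier has been wrapped in a scoring function returning values in [0, 1]; the threshold value is illustrative:

```python
def filter_text_regions(regions, classifier_score, threshold=0.5):
    # regions: iterable of candidate MSER regions; classifier_score maps a
    # region to a score from the trained character region classifier
    return [r for r in regions if classifier_score(r) > threshold]
```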
As an optional implementation, when the image information includes at least two text regions, after the regions in the image information are screened according to the region feature values and the preset first threshold in step S23 and the text regions are determined, the method may further include:
Step S241, acquiring the relative position information of the text regions in the image information.
Step S243, determining the color channel information corresponding to each text region according to the region feature value of the text region.
Step S245, determining the association relationship between the text regions according to the relative position information and the color channel information.
Step S247, merging the text regions having an association relationship into a new text region.
Specifically, in the common case people read from left to right and from top to bottom, and characters of the same color in an image are read as connected. Therefore, through steps S241 to S247 it is possible to determine whether an association exists between multiple text regions according to their relative positions in the image information and the color channels corresponding to their region feature values, and to merge the associated text regions into a new text region. Of course, the text regions in the image information may also be associated according to the size of the characters and the stroke width of the characters.
As an optional implementation manner, after text regions having an association relationship are merged to generate a new text region, the merged new text region may be further extracted by using the MSER extraction algorithm to obtain a region feature value corresponding to the new text region, and the character region classifier is further used to filter the region feature value corresponding to the new text region to determine the text region. By the method, the text area can be further extracted, and the interference information in the image information can be removed.
As an alternative implementation, in step S25, performing character recognition on the text regions in the image information to obtain the field information corresponding to the text regions may include:
Step S251, performing projection processing on the text region along a predetermined direction to obtain the projection result of the text region for that direction, wherein the projection processing statistically processes the pixel density in each section of the text region along the predetermined direction to obtain the distribution of pixel density for that direction.
Step S252, determining the character trend in the text region according to the projection results.
Step S253, sequentially segmenting the text region according to the character trend and the projection results to obtain sub-character regions, wherein each sub-character region contains one character.
Step S254, recognizing the characters in the sub-character regions with a character recognition engine and determining the recognition result corresponding to each sub-character region.
Step S255, generating the field information according to the character trend and the recognition results.
Specifically, through steps S251 to S255, projection processing is performed on the text region: the text region is divided into several sections along a predetermined direction, and the pixels representing characters in each section are counted to obtain the pixel density of each section in the current direction. From this, the distribution length of the character pixels in the current direction can be determined. The character trend in the text region is first determined from the distribution lengths for the various directions. Once the character trend is determined, a text region containing several characters can be segmented by pixel density into sub-character regions each containing only one character, and the characters in the sub-character regions are recognized in order to obtain the field information corresponding to the text region.
The predetermined directions are determined as follows. First, in steps of 1 degree, the pixel points representing characters in the text region are projected from 0 to 360 degrees, giving a projection distance for each projection angle. The angle with the shortest projection distance is selected as the principal angle of the current text region, and the four directions obtained by rotating 0, 90, 180, and 270 degrees relative to the principal angle are taken as the predetermined directions for the projection processing of the text region. The character pixels in the text region are then counted along these predetermined directions, and the direction with the longest projection distance is determined to be the character trend of the text region.
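A minimal sketch of the projection processing, assuming a binary mask in which character pixels are 1; the 1-degree angle sweep and the density profile follow the description above:

```python
import numpy as np

def principal_angle(mask):
    ys, xs = np.nonzero(mask)                 # coordinates of character pixels
    pts = np.stack([xs, ys], axis=1).astype(float)
    best_angle, best_span = 0, np.inf
    for deg in range(0, 360):                 # 1-degree steps
        t = np.deg2rad(deg)
        proj = pts @ np.array([np.cos(t), np.sin(t)])
        span = proj.max() - proj.min()        # projection distance at this angle
        if span < best_span:
            best_angle, best_span = deg, span
    return best_angle

def density_profile(mask, axis=0):
    # pixel counts per section along the chosen direction; valleys in this
    # profile are the segmentation points between characters
    return mask.sum(axis=axis)
```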
As an optional implementation manner, after a text region containing a plurality of characters has been segmented in order according to pixel density and text direction into sub-character regions, the sub-character regions can be further divided, according to their projection distances in the horizontal and vertical directions and the width-to-height ratio of the characters, into a Chinese-character class and a numeric-character class. In general, the width-to-height ratio of a Chinese character is about 1:1, while that of a numeric character is about 1:2. Of course, the sub-character regions may be further classified according to the width-to-height ratio of English characters, which is not described in detail here.
As an alternative embodiment, in step S254, recognizing the character in the sub-character region by the character recognition engine and determining the recognition result corresponding to the sub-character region may further include:
Step S2541, recognizing the sub-character region through the character recognition engine to obtain an initial recognition result corresponding to the sub-character region, where the initial recognition result includes at least one candidate character and the confidence corresponding to that candidate character.
Step S2543, determining, according to the confidence, a predetermined number of candidate characters with the highest confidence from the initial recognition result as the recognition result.
Specifically, the character recognition engine recognizes the characters in the sub-character region to obtain an initial recognition result matching the sub-character region, containing a number of candidate characters and the confidence corresponding to each. A predetermined number of candidate characters with the highest confidence are then determined as the recognition result, where the predetermined number may be set to 5, for example.
In addition, the recognition result can be corrected based on feedback; the specific steps are as follows:
Step 1, the highest confidence in the current recognition result is compared with a preset threshold; if it is below the threshold, the sub-character region is eroded with a template of 3 pixels × 3 pixels, and the eroded sub-character region is recognized again to obtain a new recognition result.
Step 2, if the highest confidence in the new recognition result is greater than or equal to the highest confidence in the previous result, the new recognition result is taken as the recognition result of the current sub-character region and step 1 is repeated iteratively; otherwise, the previous recognition result is selected as the output result for the character and the iteration stops.
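A minimal sketch of this feedback loop is given below, for illustration only. It assumes OpenCV for the erosion and a hypothetical `engine.recognize(img)` returning a list of (candidate character, confidence) pairs; the confidence threshold of 0.6 and the iteration cap are illustrative assumptions, since the embodiment does not fix their values.

```python
import cv2
import numpy as np

def recognize_with_erosion(sub_char_img, engine, conf_threshold=0.6, max_iters=5):
    kernel = np.ones((3, 3), np.uint8)         # 3 pixel x 3 pixel erosion template
    result = engine.recognize(sub_char_img)    # assumed interface
    for _ in range(max_iters):
        if max(conf for _, conf in result) >= conf_threshold:
            break                              # confident enough, stop iterating
        sub_char_img = cv2.erode(sub_char_img, kernel)
        new_result = engine.recognize(sub_char_img)
        if max(conf for _, conf in new_result) >= max(conf for _, conf in result):
            result = new_result                # erosion helped, keep iterating
        else:
            break                              # erosion hurt, keep the previous result
    return result
```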
As an alternative implementation, as shown in fig. 4, after the image information is determined in step S27 to be an illegal image containing illegal field information, the method further includes:
Step S28, obtaining a preset first image feature vector, where the first image feature vector is used to characterize the image features of a specific type of image.
Step S29, extracting a feature vector from the illegal image to obtain a second image feature vector.
Step S30, comparing the second image feature vector with the first image feature vector to determine the image type of the illegal image, where the image type at least includes: chat screenshot and non-chat screenshot.
Specifically, after the image information is determined to be an illegal image containing illegal field information, the image type of the illegal image can be determined through steps S28 to S30 by extracting its second image feature vector and comparing it with the preset first image feature vector. In this way, the accuracy of image recognition can be further improved.
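Steps S28 to S30 can be sketched as a nearest-prototype comparison, for illustration only; the use of cosine similarity and the dictionary of preset prototype vectors are assumptions, as the embodiment does not specify the comparison measure.

```python
import numpy as np

def classify_illegal_image(second_vec, first_vecs):
    """first_vecs: dict mapping an image type to its preset first feature vector."""
    def cos_sim(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(first_vecs, key=lambda t: cos_sim(second_vec, first_vecs[t]))

# usage sketch:
# image_type = classify_illegal_image(vec, {"chat screenshot": v1, "non-chat screenshot": v2})
```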
In practical applications, illegal images can be further classified through a post-processing model. The post-processing model can be divided into an image model and a text model, as described below:
First, samples for the post-processing model are screened: some typical image information containing many characters is selected from the image information. The screened image information is then labeled to distinguish the image types of the illegal images. Further, feature extraction is performed on the illegal images of each image type, extracting feature vectors such as color histograms, block gray-level histograms, PCA (principal component analysis) features and image pixel gradients. These feature vectors are input into a neural network, which is trained to obtain a classification model for distinguishing the image types of illegal images.
After the classification model is obtained, the feature vector of an illegal image can be input into it to classify the illegal image further.
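The training of such a classification model might look as follows; this sketch uses only a color histogram feature and scikit-learn's MLPClassifier as a stand-in for the neural network, both illustrative choices rather than the claimed configuration.

```python
import cv2
import numpy as np
from sklearn.neural_network import MLPClassifier

def color_histogram(bgr_img, bins=8):
    """A normalized 3-D color histogram, flattened into a feature vector."""
    hist = cv2.calcHist([bgr_img], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    return cv2.normalize(hist, hist).flatten()

def train_type_classifier(images, labels):
    """images: list of BGR arrays; labels: image types such as 'chat screenshot'."""
    X = np.stack([color_histogram(img) for img in images])
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    model.fit(X, labels)
    return model
```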
Furthermore, frequently misjudged keywords appearing in illegal images can be collected and sorted into a keyword whitelist lexicon; during character recognition of the image information, when a word-segmentation field in the image information matches a keyword in the whitelist lexicon, the image information is judged to be a legal image.
As an alternative implementation manner, before step S21 of extracting region feature values from the acquired image information according to the color channels to obtain the region feature value of each region corresponding to a color channel, the method may further include:
In step S201, the image resolution of the image information is acquired.
Step S203, comparing the image resolution with a preset standard image resolution.
Step S205, when the image resolution is not equal to the standard image resolution, scaling the image information in equal proportion according to the standard image resolution.
Specifically, before extracting the region feature value of the image information, normalization processing may be performed on the image information through steps S201 to S205, and images with different resolutions may be scaled in an equal proportion, so as to obtain image information with a standard image resolution.
In practical applications, images uploaded to the server have different resolutions, which causes errors in character recognition. Therefore, the image information may be scaled in equal proportion before the characters in it are recognized, so as to obtain image information with the standard image resolution.
As an optional implementation manner, the specific implementation steps are as follows:
Step 1, text detection, which detects the positions of the text regions in the image information.
Specifically, the size of the input image information is first normalized: an image whose longest side exceeds 1024 pixels is scaled down in equal proportion so that the longest side equals 1024 pixels, while images whose longest side is smaller than 1024 pixels are left unprocessed. The interpolation algorithm used for the scaling may be a nearest-neighbor, bilinear or cubic interpolation algorithm, among others; in this embodiment, bilinear interpolation is preferably used for the normalization processing.
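This normalization step can be sketched as follows, for illustration only, assuming OpenCV; `cv2.INTER_LINEAR` is bilinear interpolation.

```python
import cv2

def normalize_image(img, max_side=1024):
    h, w = img.shape[:2]
    longest = max(h, w)
    if longest <= max_side:
        return img                               # longest side within limit: no processing
    scale = max_side / longest                   # scale down in equal proportion
    return cv2.resize(img, (int(w * scale), int(h * scale)),
                      interpolation=cv2.INTER_LINEAR)
```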
After the normalization processing, the text regions in the image information are extracted: the input image information is divided into 6 color channels, and the MSER extraction algorithm is applied to each channel to obtain an extraction result.
According to the extraction result for each color channel, the non-character regions in the image information are filtered out using a trained character classifier, which outputs the regions whose extraction scores are above a threshold as text regions.
Further, the text regions may be associated: the text regions in each color channel are associated according to information such as character color, relative position and stroke width, and text regions with high similarity are merged into text-line regions.
After the text-line regions are obtained, a trained text-line classifier can be used to filter out the non-character regions among them, yielding the final text-line regions.
Step 2, text recognition, which recognizes the text-line regions obtained in step 1 to obtain the corresponding field information.
First, with a step size of 1 degree, the pixel points representing characters in the text region are projected at every angle from 0 to 360 degrees, yielding a projection distance for each projection angle. The angle with the shortest projection distance is selected as the principal angle of the current text region, and the four directions obtained by rotating 0, 90, 180 and 270 degrees relative to the principal angle are taken as the predetermined directions for projection processing of the text region. The character pixels in the text region are then counted along these predetermined directions, and the direction with the longest projection distance is determined to be the text direction of the text region.
After a text region containing a plurality of characters has been segmented in order according to pixel density and text direction into sub-character regions, the sub-character regions can be further divided, according to their projection distances in the horizontal and vertical directions and the width-to-height ratio of the characters, into a Chinese-character class and a numeric-character class.
Then the characters in the text region are recognized in order by the character recognition engine to obtain the field information corresponding to the text region. The character recognition engine may be an engine trained separately for each font and dedicated to recognizing a specific font, or a general engine capable of recognizing all fonts. When font-specific engines are used, each engine recognizes the characters in the text region, and a predetermined number of recognition results with the highest confidence are taken as the character recognition result, where the predetermined number may be 5.
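Pooling several font-specific engines and keeping the five highest-confidence candidates could be sketched as follows, for illustration only; the `engine.recognize(img)` interface returning (character, confidence) pairs is hypothetical.

```python
def recognize_character(sub_char_img, engines, top_k=5):
    """Collect candidates from every font-specific engine, keep the top_k."""
    candidates = []
    for engine in engines:
        candidates.extend(engine.recognize(sub_char_img))   # assumed interface
    candidates.sort(key=lambda pair: pair[1], reverse=True)  # by confidence
    return candidates[:top_k]
```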
The characters recognized by the character recognition engine are arranged according to their positional relationship to obtain the field information corresponding to the text region. In addition, the recognized single characters can be permuted and combined to obtain multiple pieces of field information corresponding to the text region.
In addition, the recognition result can be corrected based on feedback; the specific steps are as follows:
Step 2.1, the highest confidence in the current recognition result is compared with a preset threshold; if it is below the threshold, the sub-character region is eroded with a template of 3 pixels × 3 pixels, and the eroded sub-character region is recognized again to obtain a new recognition result.
Step 2.2, if the highest confidence in the new recognition result is greater than or equal to the highest confidence in the previous result, the new recognition result is taken as the recognition result of the current sub-character region and step 2.1 is repeated iteratively; otherwise, the previous recognition result is selected as the output result for the character and the iteration stops.
Step 3, matching the field information output in step 2 against the keyword model, wherein:
First, a word-segmentation model is used to perform word segmentation on the field information, and the unary words appearing in the segmentation are discarded, yielding a number of word-segmentation entries. While the original entries in the field information are kept, each entry is also split into several binary words; for example, one ternary word generates two binary words, and the characters of all binary words are permuted and combined to form new binary words. The entries obtained by this splitting are combined into the word-segmentation set corresponding to the image information.
The word-segmentation entries in the set corresponding to the image information are then matched against the illegal keyword lexicon and a weighted operation is performed on the matching result; when the weight value is above the threshold, the image information is judged to be an illegal image and the judgment result is output.
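The splitting and weighted matching of step 3 can be sketched as follows, for illustration only; for brevity the sketch keeps only contiguous bigrams and omits the permutation of bigram characters, and the keyword weights and threshold are assumed inputs.

```python
def expand_to_bigrams(entries):
    """Keep the original entries and add the contiguous bigrams of longer ones."""
    expanded = set(entries)
    for entry in entries:
        if len(entry) >= 3:                       # e.g. a ternary word yields two binary words
            expanded.update(entry[i:i + 2] for i in range(len(entry) - 1))
    return expanded

def is_illegal(entries, keyword_weights, threshold):
    """keyword_weights: dict mapping an illegal keyword to its weight value."""
    tokens = expand_to_bigrams(entries)
    score = sum(w for kw, w in keyword_weights.items() if kw in tokens)
    return score > threshold                      # above threshold: illegal image
```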
Step 4, further classifying the illegal images through a post-processing model. The post-processing model can be divided into an image model and a text model, as described below:
First, samples for the post-processing model are screened: some typical image information containing many characters is selected, for example QQ chat logs, WeChat chat logs, Wangwang chat logs, order-detail screenshots, logistics-information screenshots and the like. The screened image information is then labeled to distinguish the image types of illegal images, for example chat screenshots and non-chat screenshots. Further, feature extraction is performed on the illegal images of each image type, extracting feature vectors such as color histograms, block gray-level histograms, PCA features and image pixel gradients. These feature vectors are input into a BP neural network, which is trained to obtain a classification model for distinguishing the image types of illegal images.
After the classification model is obtained, the feature vector of an illegal image can be input into it to classify the illegal image further.
Furthermore, frequently misjudged keywords appearing in illegal images can be collected and sorted into a keyword whitelist lexicon; during character recognition of the image information, when a word-segmentation field in the image information matches a keyword in the whitelist lexicon, the image information is judged to be a legal image.
In addition, a manual review platform can be established: the illegal images identified by the above method are finally reviewed manually, which both ensures recognition accuracy and enables risk control. By adding the manually reviewed data back into the labeled samples, the model can be retrained regularly to adapt to changes in user habits.
Before illegal images are identified, the judgment model can be trained through the following steps:
Step 1, obtaining image information in batches and labeling it 0 or 1, i.e., marking whether the image information is an illegal image, where 0 denotes a legal image and 1 denotes an illegal image.
Step 2, labeling the illegal field information in the images labeled as illegal in step 1, and storing the labeled field information to generate an illegal keyword set for judging whether image information is an illegal image.
Step 3, performing word segmentation on the illegal keywords in the illegal keyword set using a word-segmentation model and discarding the unary words appearing in the segmentation, thereby obtaining a number of segmented entries. While the original illegal keywords are kept, each entry is also split into several binary or ternary words; for example, one ternary word generates two binary words, and the characters of all binary words are permuted and combined to form new binary words. The entries obtained by this splitting and the original illegal keywords are combined into the illegal keyword set.
Step 4, establishing weak classifiers using a keyword decision-tree algorithm: each illegal keyword obtained in step 3 serves as a decision condition of the keyword decision tree, which outputs 1 when a keyword matches an illegal keyword and 0 otherwise.
Step 5, assigning each weak classifier in the keyword decision tree an initial weight of 1/N, where N is the number of weak classifiers.
Step 6, using the labeled data obtained in step 1 and the illegal keyword set obtained through word segmentation in steps 2 to 3, training the weights of the weak classifiers through a semantic model and the AdaBoost algorithm to obtain the weight value corresponding to each illegal keyword in the illegal keyword set.
Step 7, pruning the decision conditions of the keyword decision tree: the illegal keywords whose weight is greater than a preset threshold are extracted and added to the illegal keyword lexicon, so as to improve matching efficiency.
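Steps 4 to 7 can be illustrated with a generic discrete AdaBoost over keyword-hit weak classifiers. The embodiment's semantic model and its exact re-weighting scheme are not specified, so the sketch below (sample weights, error-based alpha updates, final thresholding) is one standard realization, not the claimed procedure.

```python
import numpy as np

def train_keyword_weights(samples, labels, keywords, rounds=10):
    """samples: list of token sets; labels: 1 for illegal, 0 for legal images."""
    y = np.where(np.asarray(labels) == 1, 1, -1)           # relabel to {-1, +1}
    # weak classifier outputs: +1 when the keyword hits a sample, else -1
    H = np.array([[1 if kw in s else -1 for s in samples] for kw in keywords])
    w = np.full(len(samples), 1.0 / len(samples))          # sample weights
    alphas = np.zeros(len(keywords))                       # learned keyword weights
    for _ in range(rounds):
        errors = np.array([(w * (h != y)).sum() for h in H])
        k = int(errors.argmin())                           # best weak classifier
        eps = float(np.clip(errors[k], 1e-10, 1 - 1e-10))
        alpha = 0.5 * np.log((1 - eps) / eps)
        alphas[k] += alpha
        w *= np.exp(-alpha * y * H[k])                     # re-weight the samples
        w /= w.sum()
    return dict(zip(keywords, alphas))

def prune_keywords(keyword_weights, threshold):
    """Step 7: keep only the keywords whose weight exceeds the preset threshold."""
    return [kw for kw, a in keyword_weights.items() if a > threshold]
```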
it should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method of the embodiments of the present application.
Example 2
According to an embodiment of the present application, there is also provided an image recognition apparatus for implementing the above image recognition method. As shown in fig. 5, the apparatus includes: a first extraction module 21, a screening module 23, a recognition module 25 and a matching module 27.
The first extraction module 21 is configured to extract region feature values from the acquired image information according to the color channels, obtaining the region feature value of each region corresponding to a color channel in the image information. The screening module 23 is configured to screen each region in the image information according to the region feature values and a preset first threshold, determining the text regions in the image information. The recognition module 25 is configured to perform character recognition on the text regions in the image information to obtain the field information corresponding to the text regions. The matching module 27 is configured to match the field information with a preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information.
Through the first extraction module 21, the screening module 23, the recognition module 25 and the matching module 27, the region feature value of each region corresponding to a color channel in the image information is extracted according to the color channels; the region feature values are screened against the first threshold to determine the text regions; the characters in the text regions are then recognized by the character recognition engine and matched against preset illegal fields, so as to determine whether the image information is an illegal image containing illegal fields. In this way, character information in image information can be recognized in various scenes, achieving the technical effect of improving the accuracy of recognizing character information in image information and solving the technical problem in the prior art that this accuracy is low due to the influence of the background content of the image information.
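As a composition sketch only, the four modules could be wired together as below; the method names are hypothetical placeholders for the processing described in this embodiment, not a claimed interface.

```python
class ImageRecognitionDevice:
    """Mirrors the module structure of fig. 5."""
    def __init__(self, extractor, screener, recognizer, matcher):
        self.extractor = extractor    # first extraction module 21
        self.screener = screener      # screening module 23
        self.recognizer = recognizer  # recognition module 25
        self.matcher = matcher        # matching module 27

    def run(self, image_info):
        features = self.extractor.extract(image_info)       # per color channel
        text_regions = self.screener.screen(features)       # vs the first threshold
        fields = [self.recognizer.recognize(r) for r in text_regions]
        return self.matcher.match(fields)                   # illegal image or not
```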
As an alternative embodiment, as shown in fig. 6, the apparatus may further include: a word segmentation module 26.
The word segmentation module 26 is configured to split the field information according to a preset word segmentation rule to obtain a word segmentation set corresponding to the field information, where the word segmentation set is used to record a word segmentation entry obtained after the field information is split.
Specifically, the word segmentation module 26 performs word segmentation processing on the field information obtained through recognition. Word segmentation can split the phrases in the field information into a number of new word-segmentation entries with new meanings, and the new entries obtained are combined into the word-segmentation set corresponding to the image information.
In practical applications, word segmentation methods can generally be classified into three major categories: methods based on character-string matching, methods based on understanding, and methods based on statistics, which are not described in detail here.
Here, such a word segmentation method may be used to process the field information obtained through recognition. While the original word-segmentation entries in the field information are kept, each entry is also split into several binary words; for example, one ternary word generates two binary words, and the characters of all binary words are permuted and combined to form new binary words. The entries obtained by this splitting are combined into the word-segmentation set corresponding to the image information.
As an optional implementation manner, before word segmentation is performed on the field information, the field information in image information that needs to be controlled may be labeled in advance, and the labeled field information stored to generate an illegal keyword set for determining whether image information is an illegal image.
As an alternative implementation, as shown in fig. 7, the matching module 27 may include: a first sub-acquisition module 271, a second sub-acquisition module 273, a sub-matching module 275, a sub-operation module 277, and a first comparison module 279.
The first sub-obtaining module 271 is configured to obtain the preset illegal keyword set; the second sub-obtaining module 273 is configured to obtain the weight value corresponding to each illegal keyword in the illegal keyword set; the sub-matching module 275 is configured to match the word-segmentation entries in the word-segmentation set with the illegal keywords in the illegal keyword set respectively to obtain matching results; the sub-operation module 277 is configured to perform a weighting operation according to the weight values corresponding to the illegal keywords and the matching results to obtain the field weight of the field information; and the first comparison module 279 is configured to compare the field weight with a preset second threshold to determine whether the image information is an illegal image containing illegal fields.
Specifically, the first sub-obtaining module 271, the second sub-obtaining module 273, the sub-matching module 275, the sub-operation module 277, and the first comparison module 279 are used to match the segmentation information corresponding to the image information obtained through the segmentation processing with a preset illegal keyword set, and perform weighted calculation on the matching result obtained through matching to obtain the field weight corresponding to the field information included in the image information. And determining whether the image information is an illegal image by comparing the field weight with a preset second threshold value for judging whether the image information is an illegal image.
In practical applications, the meaning of a word differs across contexts, so image information cannot always be judged illegal merely because it contains a keyword considered illegal. Accordingly, a weight value may be set for each illegal keyword in the illegal keyword set: the word-segmentation entries contained in the image information are matched in turn against the illegal keywords, and a field weight value corresponding to the image information is calculated from the matching result to determine whether the image information is an illegal image, thereby improving the accuracy of illegal image recognition. Specifically, the segments obtained through word segmentation can be matched directly against the illegal keyword lexicon, a one-dimensional vector generated from the matching result and input into the keyword model for scoring; when the score is above the threshold, the image information is judged to be an illegal image and the judgment result is output.
As an alternative embodiment, the color channels may include chromatic channels and grayscale channels, where the chromatic channels at least include: a red channel, a green channel, a blue channel and a yellow channel, and the grayscale channels at least include: a black channel and a white channel.
As shown in fig. 8, when a chromatic channel is processed, the first extraction module 21 includes: a sub-correction module 211, a first sub-determination module 212 and a first sub-extraction module 213.
Wherein, the sub-correction module 211 is configured to perform correction processing on the RGB color values of the pixels in the image information to obtain a preprocessed color value corresponding to the pixels, where the preprocessed color value includes: red, green, blue and yellow values; a first sub-determining module 212, configured to determine a first pixel characteristic value of the pixel in the color channel according to the preprocessed color value and the position information of the pixel, where the first pixel characteristic value corresponds to the color channel; the first sub-extraction module 213 is configured to extract the image information according to the first pixel feature value to obtain a region feature value corresponding to the color channel.
Specifically, the sub-correction module 211, the first sub-determination module 212 and the first sub-extraction module 213 extract the region feature values of the chromatic channels in the image information: first, pixel feature extraction is performed on the chromatic channels to obtain the pixel feature value of each pixel point in each chromatic channel; then, region feature extraction is performed on the image information according to the pixel feature values in each chromatic channel, finally obtaining the region feature value corresponding to each chromatic channel.
As an alternative implementation, as shown in fig. 9, when a grayscale channel is processed, the first extraction module 21 further includes: a third sub-obtaining module 214, a second sub-determination module 215 and a second sub-extraction module 216.
The third sub-obtaining module 214 is configured to obtain preset calculation parameters; the second sub-determination module 215 is configured to determine a second pixel feature value of each pixel point in the grayscale channel according to the RGB color values of the pixel points in the image information, the position information of the pixel points and the calculation parameters, where the second pixel feature value corresponds to the grayscale channel; and the second sub-extraction module 216 is configured to extract the image information according to the second pixel feature value to obtain the region feature value corresponding to the grayscale channel.
Specifically, in addition to the chromatic channels, the grayscale channels of the image information may also be processed. The third sub-obtaining module 214, the second sub-determination module 215 and the second sub-extraction module 216 extract the region feature values of the grayscale channels: first, pixel feature extraction is performed to obtain the pixel feature value of each pixel point in each grayscale channel; then, region feature extraction is performed on the image information according to those pixel feature values, finally obtaining the region feature value corresponding to each grayscale channel.
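The embodiment does not spell out the correction processing or the preset calculation parameters, so the following six-channel decomposition is only an assumed illustration: the yellow estimate and the luminance weights are common choices, not the claimed formulas.

```python
import numpy as np

def six_channels(rgb_img):
    """Decompose an RGB image into red, green, blue, yellow, white and black maps."""
    r, g, b = [rgb_img[..., i].astype(np.float32) for i in range(3)]
    yellow = (r + g) / 2.0                        # assumed yellow estimate
    gray = 0.299 * r + 0.587 * g + 0.114 * b      # assumed luminance parameters
    white = gray                                  # responds to bright (light) text
    black = 255.0 - gray                          # responds to dark text
    return {"red": r, "green": g, "blue": b,
            "yellow": yellow, "white": white, "black": black}
```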
As an alternative embodiment, as shown in fig. 10, the screening module 23 may include a sub-comparison module 231 and a third sub-determination module 233.
The sub-comparison module 231 is configured to compare the area characteristic values corresponding to the color channels with the first threshold respectively to obtain comparison results; the third sub-determining module 233 is configured to determine, according to the comparison result, a text region corresponding to the color channel in the image information.
Specifically, through the sub-comparison module 231 and the third sub-determination module 233, the region feature values corresponding to each color channel in the same region of the image information are compared with the preset first threshold respectively, so as to determine whether the current region is a text region containing characters. A corresponding threshold may be set separately for the region feature value of each color channel to determine whether the region is a text region, or the same threshold may be set for all color channels, which is not described in detail here.
As an alternative implementation, when at least two text regions are included in the image information, as shown in fig. 11, the apparatus may further include: a first obtaining module 241, a first determining module 243, a second determining module 245, and a merging module 247.
The first obtaining module 241 is configured to obtain relative position information of the text region in the image information; a first determining module 243, configured to determine, according to the region feature value of the text region, color channel information corresponding to the text region; a second determining module 245, configured to determine an association relationship between text regions according to the relative position information and the color channel information; a merging module 247, configured to merge the text regions having the association relationship into a new text region.
Specifically, in common situations people read from left to right and from top to bottom, and characters of the same color in an image are read in series. Therefore, through the first obtaining module 241, the first determining module 243, the second determining module 245 and the merging module 247, whether an association relationship exists among a plurality of text regions is determined according to their relative positions in the image information and the color channels corresponding to their region feature values, and the text regions having an association relationship are merged to generate a new text region. Of course, the text regions in the image information may also be associated according to the size of the characters and the width of the strokes in them.
As an alternative embodiment, as shown in fig. 12, the recognition module 25 includes: a sub-processing module 251, a fourth sub-determination module 252, a sub-segmentation module 253, a first sub-recognition module 254 and a sub-generation module 255.
The sub-processing module 251 is configured to perform projection processing on the text region according to a predetermined direction to obtain a projection result of the text region corresponding to the direction, where the projection processing is to perform statistical processing on pixel density in each section in the text region along the predetermined direction to obtain a distribution result of the pixel density corresponding to the direction; a fourth sub-determining module 252, configured to determine a text direction in the text region according to the projection result; the sub-segmentation module 253 is configured to sequentially segment the text region according to the text trend and the projection result to obtain sub-character regions, where each sub-character region includes one character; a first sub-recognition module 254, configured to recognize, by a character recognition engine, characters in a sub-character region, and determine a recognition result corresponding to the sub-character region; and a sub-generating module 255, configured to generate field information according to the character trend and the recognition result.
Specifically, the sub-processing module 251, the fourth sub-determination module 252, the sub-segmentation module 253, the first sub-recognition module 254 and the sub-generation module 255 perform projection processing on the text region: the text region is divided into several sections along a predetermined direction, and the character pixels in each section are counted to obtain the pixel density of each section in the current direction, from which the distribution length of the character pixels in that direction can be determined. The text direction in the text region is first determined from the distribution length corresponding to each direction; once it is determined, a text region containing a plurality of characters can be segmented according to pixel density into sub-character regions each containing exactly one character, and the characters in the sub-character regions are recognized in order to obtain the field information corresponding to the text region.
As an alternative embodiment, the first sub-recognition module 254 includes: a second sub-recognition module 2541 and a fifth sub-determination module 2543.
the second sub-recognition module 2541 is configured to recognize the sub-character region through the character recognition engine, and obtain an initial recognition result corresponding to the sub-character region, where the initial recognition result at least includes one candidate character and a confidence corresponding to the candidate character; a fifth sub-determining module 2543, configured to determine, according to the confidence level, a predetermined number of candidate characters with the highest confidence level from the initial recognition results as recognition results.
Specifically, the character recognition engine is used for recognizing the characters in the sub-character region to obtain an initial recognition result which is matched with the sub-character region and contains a plurality of candidate characters and a confidence coefficient corresponding to the candidate characters. And determining a preset number of candidate characters with the highest confidence as a recognition result according to the confidence, wherein the preset number can be set to be 5.
As an alternative implementation, as shown in fig. 13, the apparatus may further include: a second obtaining module 28, a second extracting module 29 and a second comparing module 30.
the second obtaining module 28 is configured to obtain a preset first image feature vector, where the first image feature vector is used to represent an image feature of a specific type of image; the second extraction module 29 is configured to perform feature vector extraction on the illegal image to obtain a second image feature vector; a second comparing module 30, configured to compare the second image feature vector with the first image feature vector, and determine an image type of the illegal image, where the image type at least includes: chat screenshots, non-chat screenshots.
Specifically, after the image information is determined to be an illegal image including illegal field information, the second obtaining module 28, the second extracting module 29 and the second comparing module 30 may further extract a second image feature vector of the illegal image, and compare the second image feature vector with a preset first image feature vector, so as to determine the image type of the illegal image. By the method, the accuracy of image recognition can be further improved.
As an alternative embodiment, as shown in fig. 14, the apparatus further includes: a third obtaining module 201, a third comparing module 203 and a scaling module 205.
The third obtaining module 201 is configured to obtain an image resolution of the image information; the third comparison module 203 is configured to compare the image resolution with a preset standard image resolution; and the scaling module 205 is configured to, when the image resolution is not equal to the standard image resolution, scale the image information according to the standard image resolution.
specifically, before extracting the region feature value of the image information, the third obtaining module 201, the third comparing module 203, and the scaling module 205 may further perform normalization processing on the image information, and perform equal-scale scaling on the images with different resolutions, so as to obtain the image information with the standard image resolution.
Example 3
The embodiment of the application can provide a computer terminal, and the computer terminal can be any one computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the method for recognizing an image of an application program:
S1, extracting region characteristic values from the acquired image information according to the color channels to obtain the region characteristic value of each region corresponding to a color channel in the image information;
S2, screening each region in the image information according to the region characteristic value and a preset first threshold value, and determining a text region in the image information;
S3, performing character recognition on the text area in the image information to obtain field information corresponding to the text area;
And S4, matching the field information with a preset illegal keyword set, and determining whether the image information is an illegal image containing illegal field information.
Optionally, fig. 1 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 1, the computer terminal a may include: one or more processors (only one of which is shown), memory, and a transmission module.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the image recognition method and apparatus in the embodiments of the present application. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, it implements the image recognition method described above. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor; these remote memories may be connected to terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: extracting a region characteristic value from the acquired image information according to the color channel to obtain a region characteristic value of each region corresponding to the color channel in the image information; screening each region in the image information according to the region characteristic value and a preset first threshold value, and determining a text region in the image information; performing character recognition on a text region in the image information to obtain field information corresponding to the text region; and matching the field information with a preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information.
optionally, the processor may further execute the program code of the following steps: and splitting the field information according to a preset word segmentation rule to obtain a word segmentation set corresponding to the field information, wherein the word segmentation set is used for recording word segmentation entries obtained after the field information is split.
Optionally, the processor may further execute the program code of the following steps: acquiring a preset illegal keyword set; acquiring the weight value corresponding to each illegal keyword in the illegal keyword set; matching the word-segmentation entries in the word-segmentation set with the illegal keywords in the illegal keyword set respectively to obtain matching results; performing a weighting operation according to the weight values corresponding to the illegal keywords and the matching results to obtain the field weight of the field information; and comparing the field weight with a preset second threshold to determine whether the image information is an illegal image containing illegal fields.
Optionally, the processor may further execute the program code of the following steps: performing correction processing on the RGB color values of the pixel points in the image information to obtain preprocessed color values corresponding to the pixel points, where the preprocessed color values include: red, green, blue and yellow values; determining a first pixel characteristic value of the pixel points in the color channel according to the preprocessed color values and the position information of the pixel points, where the first pixel characteristic value corresponds to the color channel; and extracting the image information according to the first pixel characteristic value to obtain the region characteristic value corresponding to the color channel.
Optionally, the processor may further execute the program code of the following steps: acquiring preset calculation parameters; determining a second pixel characteristic value of the pixel points in the grayscale channel according to the RGB color values of the pixel points in the image information, the position information of the pixel points and the calculation parameters, where the second pixel characteristic value corresponds to the grayscale channel; and extracting the image information according to the second pixel characteristic value to obtain the region characteristic value corresponding to the grayscale channel.
optionally, the processor may further execute the program code of the following steps: comparing the area characteristic values corresponding to the color channels with a first threshold value respectively to obtain comparison results; and determining a text region corresponding to the color channel in the image information according to the comparison result.
Optionally, the processor may further execute the program code of the following steps: acquiring relative position information of the text area in the image information; determining color channel information corresponding to the text region according to the region characteristic value of the text region; determining the incidence relation between the text regions according to the relative position information and the color channel information; and merging the text regions with the association relationship into a new text region.
optionally, the processor may further execute the program code of the following steps: performing projection processing on the text region according to a preset direction to obtain a projection result of the text region corresponding to the direction, wherein the projection processing is to perform statistical processing on the pixel density in each section in the text region along the preset direction to obtain a distribution result of the pixel density corresponding to the direction; determining the character trend in the text area according to the projection result; sequentially segmenting the text region according to the character trend and the projection result to obtain sub-character regions, wherein each sub-character region comprises a character; identifying characters in the sub-character area through a character identification engine, and determining an identification result corresponding to the sub-character area; and generating field information according to the character trend and the recognition result.
optionally, the processor may further execute the program code of the following steps: identifying the sub-character region through a character identification engine to obtain an initial identification result corresponding to the sub-character region, wherein the initial identification result at least comprises one alternative character and a confidence coefficient corresponding to the alternative character; and determining a preset number of alternative characters with the highest confidence degree from the initial recognition result as the recognition result according to the confidence degree.
Optionally, the processor may further execute the program code of the following steps: acquiring a preset first image feature vector, wherein the first image feature vector is used for representing the image features of a specific type of image; extracting a feature vector of the illegal image to obtain a second image feature vector; comparing the second image feature vector with the first image feature vector to determine the image type of the illegal image, wherein the image type at least comprises: chat screenshots, non-chat screenshots.
Optionally, the processor may further execute the program code of the following steps: acquiring the image resolution of the image information; comparing the image resolution with a preset standard image resolution; and when the image resolution is not equal to the standard image resolution, scaling the image information according to the standard image resolution in an equal proportion.
By adopting the embodiment of the application, a scheme for identifying the image is provided. Extracting a region characteristic value from the acquired image information according to the color channel to obtain a region characteristic value of each region corresponding to the color channel in the image information; screening each region in the image information according to the region characteristic value and a preset first threshold value, and determining a text region in the image information; performing character recognition on a text region in the image information to obtain field information corresponding to the text region; the field information is matched with a preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information or not, so that the aim of identifying the character information in various scenes in the image information is fulfilled, and the technical problem of low accuracy in identifying the character information in the image information due to the influence of background content of the image information in the prior art is solved.
It can be understood by those skilled in the art that the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, etc., without limiting the structure of the electronic apparatus. For example, the computer terminal 10 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 10, or have a different configuration than shown in FIG. 10.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 4
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the image recognition method provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: extracting a region characteristic value from the acquired image information according to the color channel to obtain a region characteristic value of each region corresponding to the color channel in the image information; screening each region in the image information according to the region characteristic value and a preset first threshold value, and determining a text region in the image information; performing character recognition on a text region in the image information to obtain field information corresponding to the text region; and matching the field information with a preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information.
optionally, the storage medium is configured to store program code for performing the steps of: and splitting the field information according to a preset word segmentation rule to obtain a word segmentation set corresponding to the field information, wherein the word segmentation set is used for recording word segmentation entries obtained after the field information is split.
Optionally, the storage medium is configured to store program code for performing the following steps: acquiring a preset illegal keyword set; acquiring the weight value corresponding to each illegal keyword in the illegal keyword set; matching the word-segmentation entries in the word-segmentation set with the illegal keywords in the illegal keyword set respectively to obtain matching results; performing a weighting operation according to the weight values corresponding to the illegal keywords and the matching results to obtain the field weight of the field information; and comparing the field weight with a preset second threshold to determine whether the image information is an illegal image containing illegal fields.
Optionally, the storage medium is configured to store program code for performing the following steps: performing correction processing on the RGB color values of the pixel points in the image information to obtain preprocessed color values corresponding to the pixel points, where the preprocessed color values include: red, green, blue and yellow values; determining a first pixel characteristic value of the pixel points in the color channel according to the preprocessed color values and the position information of the pixel points, where the first pixel characteristic value corresponds to the color channel; and extracting the image information according to the first pixel characteristic value to obtain the region characteristic value corresponding to the color channel.
Optionally, the storage medium is configured to store program code for performing the following steps: acquiring preset calculation parameters; determining a second pixel characteristic value of the pixel points in the grayscale channel according to the RGB color values of the pixel points in the image information, the position information of the pixel points and the calculation parameters, where the second pixel characteristic value corresponds to the grayscale channel; and extracting the image information according to the second pixel characteristic value to obtain the region characteristic value corresponding to the grayscale channel.
Optionally, the storage medium is configured to store program code for performing the steps of: comparing the area characteristic values corresponding to the color channels with a first threshold value respectively to obtain comparison results; and determining a text region corresponding to the color channel in the image information according to the comparison result.
optionally, the storage medium is configured to store program code for performing the steps of: acquiring relative position information of the text area in the image information; determining color channel information corresponding to the text region according to the region characteristic value of the text region; determining the incidence relation between the text regions according to the relative position information and the color channel information; and merging the text regions with the association relationship into a new text region.
Optionally, the storage medium is configured to store program code for performing the steps of: performing projection processing on the text region according to a preset direction to obtain a projection result of the text region corresponding to the direction, wherein the projection processing is to perform statistical processing on the pixel density in each section in the text region along the preset direction to obtain a distribution result of the pixel density corresponding to the direction; determining the character trend in the text area according to the projection result; sequentially segmenting the text region according to the character trend and the projection result to obtain sub-character regions, wherein each sub-character region comprises a character; identifying characters in the sub-character area through a character identification engine, and determining an identification result corresponding to the sub-character area; and generating field information according to the character trend and the recognition result.
Optionally, the storage medium is configured to store program code for performing the steps of: recognizing the sub-character region through the character recognition engine to obtain an initial recognition result corresponding to the sub-character region, wherein the initial recognition result comprises at least one alternative character and a confidence corresponding to each alternative character; and determining, according to the confidences, a preset number of alternative characters with the highest confidence from the initial recognition result as the recognition result.
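Selecting the preset number of highest-confidence alternatives is a plain sort; the candidate list below is a hypothetical stand-in for the engine's initial recognition result.

    def top_candidates(initial_result, preset_number=3):
        # initial_result: list of (alternative_character, confidence) pairs.
        ranked = sorted(initial_result, key=lambda c: c[1], reverse=True)
        return ranked[:preset_number]

    print(top_candidates([("0", 0.71), ("O", 0.68), ("D", 0.20)], 2))
    # [('0', 0.71), ('O', 0.68)]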
Optionally, the storage medium is configured to store program code for performing the steps of: acquiring a preset first image feature vector, wherein the first image feature vector is used for representing the image features of a specific type of image; extracting a feature vector from the illegal image to obtain a second image feature vector; and comparing the second image feature vector with the first image feature vector to determine the image type of the illegal image, wherein the image type at least comprises: chat screenshot and non-chat screenshot.
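The type determination can be read as a nearest-template comparison between feature vectors; cosine similarity is one plausible metric and is assumed here, as are the template names.

    import numpy as np

    def classify_illegal_image(second_vector, templates):
        # templates: {"chat screenshot": first_vector, "non-chat screenshot": ...}
        def cosine(u, v):
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
        # Return the type whose preset first feature vector is most similar.
        return max(templates, key=lambda t: cosine(second_vector, templates[t]))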
Optionally, the storage medium is configured to store program code for performing the steps of: acquiring the image resolution of the image information; comparing the image resolution with a preset standard image resolution; and when the image resolution is not equal to the standard image resolution, proportionally scaling the image information according to the standard image resolution.
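The normalization step is ordinary proportional resizing; the sketch below uses Pillow and treats the standard resolution as a target width, an assumption since the patent leaves the unit unspecified.

    from PIL import Image

    STANDARD_WIDTH = 1024  # assumed preset standard resolution (a width)

    def normalize_resolution(img):
        # Proportionally scale when the resolution differs from the standard.
        if img.width == STANDARD_WIDTH:
            return img
        scale = STANDARD_WIDTH / img.width
        return img.resize((STANDARD_WIDTH, round(img.height * scale)))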
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present application, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of units is merely a division by logical function, and an actual implementation may adopt another division; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, units or modules, and may be electrical or take another form.
Units described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be wholly or partly embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, or a magnetic or optical disk.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and such improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (11)

1. An image recognition method, comprising:
Extracting a region characteristic value from the acquired image information according to a color channel to obtain a region characteristic value of each region corresponding to the color channel in the image information;
Screening each region in the image information according to the region characteristic value and a preset first threshold value, and determining a text region in the image information;
Performing character recognition on the text region in the image information to obtain field information corresponding to the text region;
Matching the field information with a preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information;
Before matching the field information with a preset illegal keyword set and determining whether the image information is an illegal image containing illegal field information, the method further comprises the following steps:
Splitting the field information according to a preset word segmentation rule to obtain a word segmentation set corresponding to the field information, wherein the word segmentation set is used for recording word segmentation entries obtained after the field information is split;
Matching the field information with the preset illegal keyword set to determine whether the image information is an illegal image containing illegal field information comprises:
Acquiring a preset illegal keyword set;
Acquiring a weight value corresponding to each illegal keyword in the illegal keyword set;
Matching the word segmentation entries in the word segmentation set with the illegal keywords in the illegal keyword set respectively to obtain matching results;
Performing a weighting operation according to the weight values corresponding to the illegal keywords and the matching results to obtain a field weight of the field information;
And comparing the field weight with a preset second threshold value to determine whether the image information is the illegal image containing the illegal field.
2. The method of claim 1, wherein the color channel comprises: a chromatic channel and a grayscale channel, the chromatic channel including at least: red, green, blue and yellow channels, and the grayscale channel including: a black channel and a white channel.
3. The method according to claim 2, wherein when the color channel is the chromatic channel, extracting a region characteristic value from the acquired image information according to the color channel to obtain a region characteristic value of each region corresponding to the color channel in the image information comprises:
Correcting the RGB color values of the pixel points in the image information to obtain preprocessed color values corresponding to the pixel points, wherein the preprocessed color values include: red, green, blue and yellow values;
Determining a first pixel characteristic value of each pixel point in the chromatic channel according to the preprocessed color value and the position information of the pixel point, wherein the first pixel characteristic value corresponds to the chromatic channel;
And extracting the image information according to the first pixel characteristic value to obtain the region characteristic value corresponding to the chromatic channel.
4. The method according to claim 2, wherein when the color channel is the grayscale channel, extracting a region characteristic value from the acquired image information according to the color channel to obtain a region characteristic value of each region corresponding to the color channel in the image information comprises:
Acquiring preset calculation parameters;
Determining a second pixel characteristic value of each pixel point in the grayscale channel according to the RGB color value of the pixel point in the image information, the position information of the pixel point and the calculation parameters, wherein the second pixel characteristic value corresponds to the grayscale channel;
And extracting the image information according to the second pixel characteristic value to obtain the region characteristic value corresponding to the grayscale channel.
5. The method according to claim 1, wherein screening each region in the image information according to the region characteristic value and a preset first threshold value to determine the text region in the image information comprises:
Comparing the region characteristic values corresponding to the color channels with the first threshold value respectively to obtain comparison results;
And determining the text region corresponding to the color channel in the image information according to the comparison result.
6. The method according to claim 5, wherein when at least two text regions are included in the image information, after the text regions in the image information are determined by screening the respective regions in the image information according to the region characteristic values and the preset first threshold value, the method further comprises:
Acquiring relative position information of the text region in the image information;
Determining color channel information corresponding to the text region according to the region characteristic value of the text region;
Determining the association relation between the text regions according to the relative position information and the color channel information;
And merging the text regions having the association relation into a new text region.
7. The method according to claim 1, wherein performing character recognition on the text region in the image information to obtain field information corresponding to the text region comprises:
Performing projection processing on the text region along a preset direction to obtain a projection result of the text region corresponding to that direction, wherein the projection processing performs statistical processing on the pixel density of each interval in the text region along the preset direction to obtain a distribution of the pixel density corresponding to the direction;
Determining the character orientation in the text region according to the projection result;
Sequentially segmenting the text region according to the character orientation and the projection result to obtain sub-character regions, wherein each sub-character region contains one character;
Recognizing the characters in the sub-character regions through a character recognition engine, and determining a recognition result corresponding to each sub-character region;
And generating the field information according to the character orientation and the recognition results.
8. The method of claim 7, wherein recognizing the characters in the sub-character regions through the character recognition engine and determining the recognition result corresponding to each sub-character region comprises:
Recognizing the sub-character region through the character recognition engine to obtain an initial recognition result corresponding to the sub-character region, wherein the initial recognition result comprises at least one alternative character and a confidence corresponding to each alternative character;
And determining, according to the confidences, a preset number of alternative characters with the highest confidence from the initial recognition result as the recognition result.
9. The method of claim 1, wherein after determining that the image information is an illegal image containing illegal field information, the method further comprises:
Acquiring a preset first image feature vector, wherein the first image feature vector is used for representing the image features of a specific type of image;
Extracting a feature vector of the illegal image to obtain a second image feature vector;
Comparing the second image feature vector with the first image feature vector to determine the image type of the illegal image, wherein the image type at least comprises: chat screenshot and non-chat screenshot.
10. The method according to any one of claims 1 to 9, wherein before extracting a region characteristic value from the acquired image information according to the color channel to obtain the region characteristic value of each region corresponding to the color channel in the image information, the method further comprises:
Acquiring the image resolution of the image information;
Comparing the image resolution with a preset standard image resolution;
And when the image resolution is not equal to the standard image resolution, proportionally scaling the image information according to the standard image resolution.
11. An apparatus for recognizing an image, comprising:
The first extraction module is used for extracting a region characteristic value from the acquired image information according to a color channel to obtain the region characteristic value of each region corresponding to the color channel in the image information;
The screening module is used for screening each region in the image information according to the region characteristic value and a preset first threshold value to determine a text region in the image information;
The identification module is used for carrying out character identification on the text area in the image information to obtain field information corresponding to the text area;
The matching module is used for matching the field information with a preset illegal keyword set and determining whether the image information is an illegal image containing illegal field information;
The apparatus is further configured to split the field information according to a preset word segmentation rule before matching the field information with the preset illegal keyword set and determining whether the image information is an illegal image containing illegal field information, so as to obtain a word segmentation set corresponding to the field information, wherein the word segmentation set is used for recording the word segmentation entries obtained after splitting the field information;
The matching module is further used for acquiring the preset illegal keyword set; acquiring a weight value corresponding to each illegal keyword in the illegal keyword set; matching the word segmentation entries in the word segmentation set with the illegal keywords in the illegal keyword set respectively to obtain matching results; performing a weighting operation according to the weight values corresponding to the illegal keywords and the matching results to obtain a field weight of the field information; and comparing the field weight with a preset second threshold value to determine whether the image information is the illegal image containing the illegal field.
CN201510587818.2A 2015-09-15 2015-09-15 Image recognition method and device Active CN106529380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510587818.2A CN106529380B (en) 2015-09-15 2015-09-15 Image recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510587818.2A CN106529380B (en) 2015-09-15 2015-09-15 Image recognition method and device

Publications (2)

Publication Number Publication Date
CN106529380A CN106529380A (en) 2017-03-22
CN106529380B true CN106529380B (en) 2019-12-10

Family

ID=58348687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510587818.2A Active CN106529380B (en) 2015-09-15 2015-09-15 Image recognition method and device

Country Status (1)

Country Link
CN (1) CN106529380B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654087B * 2015-12-30 2019-03-12 Li Yu An offline handwritten character extraction method based on color templates
CN110476056B (en) * 2017-04-26 2022-02-18 深圳配天智能技术研究院有限公司 Visual detection method, detection equipment and robot
CN110019430A * 2017-12-07 2019-07-16 Dongguan Jienuo Software Technology Co Ltd A dynamic display method and terminal device for books during sequential writing
CN108304843B (en) * 2017-12-25 2022-02-22 山东浪潮云服务信息科技有限公司 Image approval method and device
CN107977658B (en) * 2017-12-27 2021-09-28 深圳Tcl新技术有限公司 Image character area identification method, television and readable storage medium
CN110008478B (en) * 2017-12-30 2023-10-31 ***通信集团贵州有限公司 Language conversion method, device, computing equipment and storage medium
CN108154144A * 2018-01-12 2018-06-12 Jiangsu Xintong Intelligent Transportation Technology Development Co Ltd An image-based ship name character locating method and system
CN110399867B (en) * 2018-04-24 2023-05-12 深信服科技股份有限公司 Text image area identification method, system and related device
CN109285603A * 2018-08-28 2019-01-29 Chang'an University A diagnosis device and image processing method
CN111414472A (en) * 2018-12-18 2020-07-14 北京奇虎科技有限公司 Image detection method and device based on optical character recognition and electronic equipment
CN109840699A * 2019-01-23 2019-06-04 Shenzhen OneConnect Smart Technology Co Ltd Online assessment method, device, computer equipment and storage medium for qualified investors
CN109934229B (en) * 2019-03-28 2021-08-03 网易有道信息技术(北京)有限公司 Image processing method, device, medium and computing equipment
CN110135428B (en) * 2019-04-11 2021-06-04 北京航空航天大学 Image segmentation processing method and device
CN110135264A (en) * 2019-04-16 2019-08-16 深圳壹账通智能科技有限公司 Data entry method, device, computer equipment and storage medium
CN110443239A * 2019-06-28 2019-11-12 Ping An Technology (Shenzhen) Co Ltd Character image recognition method and device
CN110413819B (en) * 2019-07-12 2022-03-29 深兰科技(上海)有限公司 Method and device for acquiring picture description information
CN110942061A (en) * 2019-10-24 2020-03-31 泰康保险集团股份有限公司 Character recognition method, device, equipment and computer readable medium
CN111160366A (en) * 2019-12-18 2020-05-15 山东鹰格信息工程有限公司 Color image identification method
CN111582151B (en) * 2020-05-07 2023-08-25 北京百度网讯科技有限公司 Document image orientation detection method and device
CN113780038A (en) * 2020-06-10 2021-12-10 深信服科技股份有限公司 Picture auditing method and device, computing equipment and storage medium
CN111931664B (en) * 2020-08-12 2024-01-12 腾讯科技(深圳)有限公司 Mixed-pasting bill image processing method and device, computer equipment and storage medium
CN112016546A (en) * 2020-08-14 2020-12-01 ***股份有限公司 Text region positioning method and device
CN112085012B (en) * 2020-09-04 2024-03-08 泰康保险集团股份有限公司 Project name and category identification method and device
CN112257830A (en) * 2020-10-23 2021-01-22 上海烟草集团有限责任公司 Smoke box information identification method and system
CN112257629A (en) * 2020-10-29 2021-01-22 广联达科技股份有限公司 Text information identification method and device for construction drawing
CN113191309A (en) * 2021-05-19 2021-07-30 杭州点望科技有限公司 Method and system for recognizing, scoring and correcting handwritten Chinese characters
CN113326785B (en) * 2021-06-01 2023-08-04 上海期货信息技术有限公司 File identification method and device
CN113642088B (en) * 2021-10-13 2022-01-07 北京天辰信科技有限公司 Method for feeding back construction progress information and displaying deviation of BIM (building information modeling) model in real time

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101196994A (en) * 2007-12-26 2008-06-11 腾讯科技(深圳)有限公司 Image content recognizing method and recognition system
CN104036285A (en) * 2014-05-12 2014-09-10 新浪网技术(中国)有限公司 Spam image recognition method and system
CN104834891A (en) * 2015-02-16 2015-08-12 北京建筑大学 Method and system for filtering Chinese character image type spam

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A Method for Text Localization and Recognition in Real-World Images; Lukas Neumann et al.; Computer Vision - ACCV 2010; Dec. 31, 2011; pp. 770-783 *
Effective Text Localization in Natural Scene Images with MSER, Geometry-based Grouping and AdaBoost; Xuwang Yin et al.; 21st International Conference on Pattern Recognition (ICPR 2012); Nov. 15, 2012; pp. 725-728 *
A Natural Scene Text Localization Technique for Focused Crawlers; Peng Hao et al.; Journal of Chinese Computer Systems; Dec. 31, 2014; Vol. 35, No. 9; pp. 2014-2018 *
Research on Local Invariant Feature Extraction Algorithms for Images; Pan Nengjie; China Masters' Theses Full-text Database, Information Science and Technology; Jun. 15, 2014; No. 06; pp. I138-938 *
Research on End-to-End English Text Recognition for Natural Scenes; Liao Weimin; China Masters' Theses Full-text Database, Information Science and Technology; Aug. 15, 2014; No. 08; pp. I138-1320: abstract, section 3.2 *

Also Published As

Publication number Publication date
CN106529380A (en) 2017-03-22

Similar Documents

Publication Publication Date Title
CN106529380B (en) Image recognition method and device
EP3493101B1 (en) Image recognition method, terminal, and nonvolatile storage medium
CN112381775B (en) Image tampering detection method, terminal device and storage medium
Karatzas et al. ICDAR 2011 robust reading competition-challenge 1: reading text in born-digital images (web and email)
EP1271403B1 (en) Method and device for character location in images from digital camera
KR101896357B1 (en) Method, device and program for detecting an object
Bhunia et al. Text recognition in scene image and video frame using color channel selection
CN107609549A (en) The Method for text detection of certificate image under a kind of natural scene
Yang et al. A framework for improved video text detection and recognition
CN108090511B (en) Image classification method and device, electronic equipment and readable storage medium
WO2022142611A1 (en) Character recognition method and apparatus, storage medium and computer device
US20100033745A1 (en) Image processing method and apparatus
Zhang et al. An improved scene text extraction method using conditional random field and optical character recognition
EP1870858A2 (en) Method of classifying colors of color based image code
CN106203539B (en) Method and device for identifying container number
CN109685065B (en) Layout analysis method and system for automatically classifying test paper contents
US20150371100A1 (en) Character recognition method and system using digit segmentation and recombination
US20150010233A1 (en) Method Of Improving Contrast For Text Extraction And Recognition Applications
CN109389115B (en) Text recognition method, device, storage medium and computer equipment
Xu et al. End-to-end subtitle detection and recognition for videos in East Asian languages via CNN ensemble
CN110879963A (en) Sensitive expression package detection method and device and electronic equipment
WO2021159802A1 (en) Graphical captcha recognition method, apparatus, computer device, and storage medium
CN113762326A (en) Data identification method, device and equipment and readable storage medium
CN108877030B (en) Image processing method, device, terminal and computer readable storage medium
CN117454426A (en) Method, device and system for desensitizing and collecting information of claim settlement data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant