CN109977859B - Icon identification method and related device

Icon identification method and related device

Info

Publication number: CN109977859B
Application number: CN201910228432.0A
Authority: CN (China)
Prior art keywords: image, detected, edge detection, icon, station caption
Other languages: Chinese (zh)
Other versions: CN109977859A
Inventor: 黎伟
Assignee: Tencent Technology Shenzhen Co Ltd
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635 Overlay text, e.g. embedded captions in a TV program


Abstract

The application discloses a method for identifying an icon, which comprises the following steps: randomly acquiring P frames of images to be detected from a video to be detected; performing edge detection on the images to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, and each target edge detection map is obtained by fusing P edge detection maps; determining an icon area according to the target edge detection map set; determining an icon in the video to be detected according to the P frames of images to be detected and the icon area; and matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon. The application also discloses a related device. By sampling frames randomly, the application increases the diversity of station caption background changes and achieves a better sampling effect; the method is applicable to the detection of both static and dynamic station captions, thereby improving the identification accuracy.

Description

Icon identification method and related device
Technical Field
The present application relates to the field of image processing, and in particular, to a method and a related apparatus for icon identification.
Background
With large amounts of video information permeating people's daily lives, video station caption detection has become an effective means of video source analysis. The distributor of a video can easily be determined from its station caption, and a specific program can be located through the station caption shown during the program. Such important semantic information is used to provide accurate video search. In addition, advertisement segments can be removed by detecting station captions in video programs, thereby improving the viewing experience. In the field of video security, station caption detection technology can effectively determine the source of a video.
Currently, an Optical Character Recognition (OCR) method can be used to detect and recognize station captions. When a user switches programs, a station caption containing characters appears on the video picture. There is a short delay before the station caption is fully displayed; OCR character recognition can be performed on the station caption during this period, and the type of the station caption can be judged directly from the characters.
However, as the types of video continue to increase, more and more station captions emerge, and they often carry special effects: for example, a type-A station caption may shake continuously; the subtitle of a type-B station caption may gradually roll and then disappear, or alternately appear in the upper left and lower right corners of the video; and the image and characters of a type-C station caption may rotate continuously. Station captions of these types change over time and may also be referred to as moving picture station captions. OCR-based methods identify moving picture station captions with low accuracy and are not applicable to purely graphical dynamic station captions, so their application range is small.
Disclosure of Invention
The embodiment of the application provides an icon identification method and a related device. On one hand, random sampling increases the diversity of station caption background changes and achieves a better sampling effect; on the other hand, multiple frames of video images are fused, so that a dynamic station caption becomes a relatively stable static station caption, which is then identified. The method is therefore applicable to the detection of both static and dynamic station captions, improving the identification accuracy.
In view of this, a first aspect of the present application provides a method for icon identification, including:
randomly obtaining P frames of images to be detected from a video to be detected, wherein the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
performing edge detection on an image to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
determining an icon area according to the target edge detection graph set;
determining an icon in the video to be detected according to the P frame image to be detected and the icon area;
and matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon.
A second aspect of the present application provides an icon identifying apparatus, including:
the device comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for randomly acquiring P frames of images to be detected from a video to be detected, the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
the detection module is used for carrying out edge detection on the image to be detected in the P frames of images to be detected acquired by the acquisition module to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
the determining module is used for determining an icon area according to the target edge detection map set obtained by the detection module;
the determining module is further configured to determine an icon in the video to be detected according to the P-frame image to be detected and the icon area;
and the identification module is used for matching the icon determined by the determination module with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon.
In one possible design, in a first implementation manner of the second aspect of the embodiment of the present application, the icon identifying apparatus further includes a dividing module and an extracting module;
the dividing module is used for dividing each frame of image to be detected in the P frames of images to be detected to obtain a plurality of image areas corresponding to each frame of image to be detected before the detecting module carries out edge detection on the image to be detected in the P frames of images to be detected to obtain a target edge detection image set;
the extraction module is used for extracting M image areas corresponding to each frame of image to be detected from a plurality of image areas corresponding to each frame of image to be detected obtained by division of the division module, wherein the M image areas are used for edge detection.
In one possible design, in a second implementation of the second aspect of the embodiments of the present application,
the detection module is specifically configured to perform edge detection on a target image area in the M image areas of each frame of image to be detected to obtain P edge detection maps corresponding to the target image area, where the target image area belongs to any one of the M image areas;
determining a target edge detection image corresponding to the target image area according to the P edge detection images corresponding to the target image area;
and when the target edge detection images corresponding to the M image areas are obtained, obtaining the target edge detection image set.
In one possible design, in a third implementation of the second aspect of the embodiment of the present application,
the determining module is specifically configured to perform histogram statistics on each target edge detection map in the target edge detection map set to obtain M statistical results, where the histogram statistics is used to perform statistics in a horizontal direction and a vertical direction on the target edge detection map;
respectively judging whether each statistical result in the M statistical results meets the station caption region extraction condition;
if at least one statistical result in the M statistical results meets the station caption region extraction condition, determining that a station caption area exists in the target edge detection map set;
and if no statistical result in the M statistical results meets the station caption region extraction condition, determining that no station caption area exists in the target edge detection map set.
In one possible design, in a fourth implementation of the second aspect of the embodiment of the present application,
the determining module is specifically used for determining the image score of the image to be matched in the station caption area according to the P frame image to be detected;
and if the image score of the image to be matched is greater than or equal to the station caption image threshold value, determining that the station caption image exists in the video to be detected.
In one possible design, in a fifth implementation of the second aspect of the embodiments of the present application,
the identification module is specifically configured to acquire a local feature set to be matched of the station caption image, where the local feature set to be matched includes at least one local feature to be matched;
acquiring a local feature set of each preset icon in the preset icon set, wherein the local feature set comprises at least one local feature;
determining a candidate logo image set from the preset icon set through a k nearest neighbor algorithm according to the local feature set to be matched and the local feature set of each preset icon, wherein the candidate logo image set comprises N candidate logo images, and N is an integer greater than or equal to 1;
comparing the station logo image with each candidate station logo image in the candidate station logo image set to obtain a matching point set of each candidate station logo image, wherein the matching point set comprises at least one matching point, and the matching point represents a feature point of successful matching between the candidate station logo image and the station logo image;
calculating to obtain N similarity scores according to the matching point set of each candidate station caption image and the local feature set of each candidate station caption image;
and determining a target station caption image of the video to be detected from the candidate station caption image set according to the maximum value of the similarity scores in the N similarity scores.
In one possible design, in a sixth implementation of the second aspect of the embodiments of the present application,
the identification module is specifically used for 1) obtaining one local feature to be matched in the local feature set to be matched;
2) Acquiring K candidate features closest to the local feature to be matched from the local feature set of each preset icon according to the local feature to be matched, wherein K is an integer greater than or equal to 1;
repeatedly executing the step 1) to the step 2) until the candidate feature of each local feature to be matched in the local feature set to be matched is obtained;
and acquiring the candidate station caption image set according to the candidate characteristics of each local characteristic to be matched in the local characteristic set to be matched.
In one possible design, in a seventh implementation of the second aspect of the embodiment of the present application,
the identification module is specifically configured to match each local feature to be matched in the station caption image with each local feature of each candidate station caption image to obtain a projection matrix, where the projection matrix represents a position coordinate of the station caption image after projection;
and determining a pairing point set of each candidate station caption image according to the station caption image and the projection matrix.
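As an illustration of this pairing step, the sketch below assumes that the projection matrix is a homography estimated by RANSAC over putative SIFT matches, which is a common realization of such feature pairing; the patent does not name a specific estimator, so the ratio-test threshold and RANSAC parameters here are illustrative.

```python
import cv2
import numpy as np

def matching_points(query_kp, query_des, cand_kp, cand_des):
    # Putative matches by descriptor distance, filtered with Lowe's ratio test.
    pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(query_des, cand_des, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < 4:  # a homography needs at least 4 point pairs
        return []
    src = np.float32([query_kp[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([cand_kp[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # The "projection matrix": maps station caption coordinates onto the candidate.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return []
    # The RANSAC inliers form the matching point set of this candidate image.
    return [m for m, inlier in zip(good, mask.ravel()) if inlier]
```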
In one possible design, in an eighth implementation of the second aspect of the embodiments of the present application,
the identification module is specifically configured to calculate the similarity score in the following manner:
score = A / B
where score represents the similarity score, A represents the union of the areas corresponding to the matching point set of the candidate station caption image, and B represents the union of the areas corresponding to the local feature set of the candidate station caption image.
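As a small illustration of this formula, the sketch below approximates each area union by the area of the convex hull of the corresponding points; this reading of A and B is an assumption made for illustration, not the patent's exact definition.

```python
import cv2
import numpy as np

def similarity_score(matched_points, all_feature_points):
    # A: area covered by the matching point set; B: area covered by all local
    # features of the candidate station caption image (convex-hull approximation).
    if len(matched_points) < 3 or len(all_feature_points) < 3:
        return 0.0
    a = cv2.contourArea(cv2.convexHull(np.float32(matched_points)))
    b = cv2.contourArea(cv2.convexHull(np.float32(all_feature_points)))
    return a / b if b > 0 else 0.0
```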
In one possible design, in a ninth implementation manner of the second aspect of the embodiment of the present application, the icon identifying apparatus further includes a processing module and an extracting module;
the acquisition module is further configured to acquire a to-be-processed video set before the identification module matches the icon with a preset icon set and acquires an icon identification result of the to-be-detected video, wherein the to-be-processed video set includes at least one to-be-processed video;
the detection module is further configured to detect each to-be-processed video in the to-be-processed video set acquired by the acquisition module to obtain a to-be-processed logo image set, where the to-be-processed logo image set includes at least one to-be-processed logo image, and at least one to-be-processed logo image in the to-be-processed logo image set corresponds to the same identifier;
the processing module is used for processing the station caption images to be processed in the station caption image set to be processed, which is detected by the detection module, so as to obtain the preset icon set;
the extraction module is configured to perform feature extraction on each preset icon in the preset icon set obtained through processing by the processing module to obtain a local feature set of each preset icon, where the local feature set includes at least one local feature, and the local feature includes a feature point position coordinate and feature information.
In one possible design, in a tenth implementation of the second aspect of the embodiment of the present application,
the processing module is specifically configured to, when a first processing instruction is received, remove a first to-be-processed station caption image from the to-be-processed station caption image set according to the first processing instruction, where the first processing instruction carries an identifier of the first to-be-processed station caption image;
and when a second processing instruction is received, adjusting a second station caption image to be processed in the station caption image set to be processed according to the second processing instruction to obtain a preset icon in the preset icon set, wherein the second processing instruction carries an identifier of the second station caption image to be processed.
A third aspect of the present application provides a server comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory and comprises the following steps:
randomly obtaining P frames of images to be detected from a video to be detected, wherein the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
performing edge detection on an image to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
determining an icon area according to the target edge detection graph set;
determining an icon in the video to be detected according to the P frame image to be detected and the icon area;
matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
The fourth aspect of the present application provides a terminal device, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory, and includes the steps of:
randomly obtaining P frames of images to be detected from a video to be detected, wherein the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
performing edge detection on an image to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
determining an icon area according to the target edge detection graph set;
determining an icon in the video to be detected according to the P frame image to be detected and the icon area;
matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
A fifth aspect of the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above-described aspects.
According to the technical scheme, the embodiment of the application has the following advantages:
in the embodiment of the application, a method for identifying icons is provided. First, P frames of images to be detected are randomly acquired from a video to be detected; edge detection is then performed on the images to be detected in the P frames of images to be detected to obtain a target edge detection map set, where the target edge detection map set comprises M target edge detection maps and each target edge detection map is obtained by fusing P edge detection maps; an icon area is determined according to the target edge detection map set; icons in the video to be detected are determined according to the P frames of images to be detected and the icon area; and the icons are matched with a preset icon set to obtain an icon identification result of the video to be detected, where the preset icon set comprises at least one preset icon. In this way, randomly sampled multi-frame video images are fused: on one hand, random sampling increases the diversity of station caption background changes and achieves a better sampling effect; on the other hand, fusing multiple frames of video images turns a dynamic icon into a relatively stable static icon, which is then identified. The method is therefore applicable to the detection of both static and dynamic icons, improving the identification accuracy.
Drawings
FIG. 1 is a block diagram of an embodiment of an icon recognition system;
FIG. 2 is a schematic overall flow chart of an icon identification system in an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of a method for icon identification in an embodiment of the present application;
FIG. 4 is a schematic diagram of an embodiment of M image regions of an image to be detected in an embodiment of the present application;
FIG. 5 is a schematic diagram of another embodiment of M image regions of an image to be detected in the embodiment of the present application;
FIG. 6 is a schematic diagram of another embodiment of M image regions of an image to be detected in the embodiment of the present application;
FIG. 7 is a schematic diagram of an embodiment of a method for generating a target edge detection map in an embodiment of the present application;
FIG. 8 is a schematic diagram of an embodiment of performing histogram statistics on a target edge detection graph in an embodiment of the present application;
FIG. 9 is a schematic view of a logo image according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an embodiment of a similarity alignment in an embodiment of the present application;
FIG. 11 is a diagram illustrating different display forms of the same station caption in an embodiment of the present application;
FIG. 12 is a schematic diagram of an embodiment of an icon identifying apparatus in the embodiment of the present application;
FIG. 13 is a schematic diagram of another embodiment of an icon identifying apparatus in the embodiment of the present application;
FIG. 14 is a schematic diagram of another embodiment of an icon identifying apparatus in the embodiment of the present application;
FIG. 15 is a schematic diagram of an embodiment of a server in an embodiment of the present application;
fig. 16 is a schematic diagram of an embodiment of a terminal device in the embodiment of the present application.
Detailed Description
The embodiment of the application provides an icon identification method and a related device. On one hand, random sampling increases the diversity of icon background changes and achieves a better sampling effect; on the other hand, multiple frames of video images are fused, so that a dynamic icon becomes a relatively stable static icon, which is then identified. The method is therefore applicable to the detection of both static and dynamic icons, improving the identification accuracy.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that the station caption identification method provided by the application can be used in fields such as artificial intelligence, video image analysis, and retrieval. A station caption (logo) is the mark and identification of a television station or a video website, and is usually placed at a corner of the video, such as the upper left, upper right, or lower right corner. Station caption detection and identification automatically extract station caption images from videos using image processing and recognition technology and automatically classify them as one of the known station captions. With the rapid development of television and internet technology, television stations at all levels transmit as many as dozens or hundreds of programs by satellite, microwave, and other means, and thousands of video items are delivered over networks to the terminal equipment used by users.
In order to monitor television programs and network videos in real time, the icon identification method provided herein achieves a high detection rate for both static and dynamic station captions; the detection process requires no manual operation, saving human resources. Ensuring the security of TV programs and network videos and preventing illegal cut-ins and interference is a very important task for secure broadcasting. Illegal cut-ins and illegal signal intrusion can be monitored by identifying the station caption, and a real-time alarm can then be raised for multi-channel television programs and network videos according to the station caption detection result, effectively preventing illegal cut-ins and intrusion, reducing the labor intensity of workers, and avoiding misoperation.
Specifically, an illegal video interception system based on station caption detection and identification can manually collect relevant illegal videos carrying station captions to generate an index library, then automatically detect and identify the station captions of videos to be checked on a network: a video is intercepted if its station caption hits the index library, and released otherwise. In addition, network videos can be classified according to their station captions: videos carrying the station captions to be classified are collected (only a small number of videos are needed for each type), a station caption index library is established, and a large number of unlabeled videos on the network are then automatically identified and classified by station caption.
For convenience of understanding, the present application provides an icon identification method, which is applied to an icon identification system shown in fig. 1, please refer to fig. 1, where fig. 1 is an architecture schematic diagram of the icon identification system in an embodiment of the present application, as shown in the figure, the station caption identification method provided in the present application may be used in a server or a terminal device, and will be described below with reference to fig. 2 by taking the application to the server as an example. Referring to fig. 2, fig. 2 is a schematic overall flow chart of the icon identification system in the embodiment of the present application, and as shown in the figure, the whole flow of station caption identification may be divided into two parts, one is an offline library building process, and the other is online station caption detection and identification. In the off-line library building process, the method specifically comprises the following steps:
in the step A1, a server obtains manually collected related videos, where the videos carry station captions, where the videos are videos played on a client, and it needs to be noted that the client is deployed on a terminal device, where the terminal device includes but is not limited to a tablet computer, a notebook computer, a palm computer, a mobile phone, a voice interaction device, and a Personal Computer (PC), and this is not limited herein;
in the step A2, the server detects the station caption in the video, namely extracts the corresponding station caption from the video;
in the step A3, the server may automatically clean the extracted station caption, where the purpose of cleaning is to remove some images that do not belong to the station caption, and certainly, in practical applications, the server may also manually remove images that do not belong to the station caption, and in any case, the purpose is to obtain the station caption image;
in the step A4, after the server acquires the video, the station caption images can be acquired completely in a manual mode, namely, each station caption image is manually cut;
in the step A5, after a station caption image corresponding to the video is obtained, local feature extraction is performed on the station caption image, for example, Scale-Invariant Feature Transform (SIFT) feature extraction;
in the step A6, a fast index is established for the extracted features; for example, given a SIFT feature, the other SIFT features closest to it can be quickly found through the fast index, which completes the off-line library building process (a sketch of these two steps follows).
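A minimal sketch of steps A5 and A6 under stated assumptions: OpenCV's SIFT implementation stands in for the local feature extractor, and a FLANN KD-tree index stands in for the fast index; all function and parameter names are illustrative, not the patent's prescribed implementation.

```python
import cv2
import numpy as np

def build_logo_index(logo_images):
    """Offline library building: SIFT features (step A5) + fast index (step A6)."""
    sift = cv2.SIFT_create()
    all_des, labels = [], []
    for logo_id, img in enumerate(logo_images):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, des = sift.detectAndCompute(gray, None)
        if des is not None:
            all_des.append(des)
            labels.extend([logo_id] * len(des))  # remember which logo each feature came from
    features = np.vstack(all_des).astype(np.float32)
    index = cv2.flann_Index(features, dict(algorithm=1, trees=5))  # KD-tree index
    return index, np.array(labels)
```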
In the process of detecting and identifying the online station caption, the method specifically comprises the following steps:
in the step B1, firstly, a video to be detected is obtained;
in the step B2, the station caption detection is carried out on the video to be detected by the server so as to obtain a station caption image, and at the moment, the station caption detection is automatic detection without manually extracting the station caption image;
in step B3, the server extracts local features from the detected logo image; it can be understood that this feature extraction is of the same type as that in step A5 and is not described here again;
in the step B4, the server carries out coarse matching on the local features in the video to be detected and the local features in the local feature quick index library;
in step B5, the server retrieves the most similar N station caption images from the established local feature quick index library;
in the step B6, the server performs a 1-to-1 similarity comparison with each of the N station caption images and selects the station caption image with the highest similarity score;
in step B7, if the similarity score is greater than the given threshold, the output station caption identification result is station caption A; otherwise, if the similarity score is less than the given threshold, the station caption of the video to be detected is not in the station caption library (a sketch of this online process follows these steps).
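Steps B3 to B7 might then be sketched as follows, reusing the index and label array from the offline sketch above. Here `similarity_score` stands for the 1-to-1 comparison of step B6 (for example, the score = A / B measure described above) and is assumed to be defined elsewhere; the simple voting used to pick the N candidates is also an assumption.

```python
import cv2
import numpy as np

def identify_logo(logo_img, index, labels, similarity_score, n=5, k=4, threshold=0.5):
    gray = cv2.cvtColor(logo_img, cv2.COLOR_BGR2GRAY)
    _, des = cv2.SIFT_create().detectAndCompute(gray, None)         # step B3
    if des is None:
        return None
    idx, _ = index.knnSearch(des.astype(np.float32), k, params={})  # step B4: coarse match
    votes = np.bincount(labels[idx.ravel()])
    candidates = np.argsort(votes)[::-1][:n]                        # step B5: N most similar
    scores = {int(c): similarity_score(logo_img, int(c)) for c in candidates}  # step B6
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None               # step B7
```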
With reference to fig. 3, an embodiment of the icon identification method in the embodiment of the present application includes:
101. randomly obtaining P frames of images to be detected from a video to be detected, wherein the video to be detected comprises Q frames of video images, Q is an integer larger than 1, and P is an integer larger than or equal to 1 and smaller than or equal to Q;
in this embodiment, the icon recognition device obtains a video to be detected; the video to be detected may be a network video, a television program, or the like, which is not limited here. The video to be detected comprises Q frames of video images, and P frames of images to be detected are randomly selected from the Q frames of video images; for example, if the video to be detected has 1000 frames of video images, 64 video images can be randomly selected from it as the images to be detected.
There are various methods for randomly acquiring the P frames of images to be detected; for example, frame indices can be generated with a random function, such as the rand() function of the C language or the rand() function in MATLAB. Such functions produce pseudo-random numbers: sequences computed by a deterministic algorithm that are uniformly distributed over [0,1] and have statistical characteristics similar to those of true random numbers, such as uniformity and independence. When pseudo-random numbers are computed, the sequence does not change if the initial seed is unchanged, and a computer can generate such numbers in large quantities.
In the prior art, a single frame is often extracted from the video to be detected, or frames from one continuous period are extracted. For moving picture station captions, such frame extraction may fail to capture any video image carrying the station caption, because in some videos the station caption appears only after the video has played for a while; extracting all video images in the video to be detected would greatly reduce system efficiency, and for longer videos, decoding every frame consumes a large amount of time. On the other hand, taking frames at equal intervals may, for some moving picture station captions, always sample video images at moments when the station caption has disappeared. Random frame sampling therefore increases the variation of the station caption background, and superimposing multiple frames reduces the influence of the background.
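As a concrete illustration of this random sampling, the following minimal sketch assumes OpenCV for video decoding; the function name and the default of P = 64 frames are illustrative.

```python
import random
import cv2

def sample_frames(video_path, p=64, seed=None):
    """Randomly draw P frames to be detected from a video containing Q frames."""
    cap = cv2.VideoCapture(video_path)
    q = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))          # Q > 1
    indices = sorted(random.Random(seed).sample(range(q), min(p, q)))  # 1 <= P <= Q
    frames = []
    for i in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)             # seek: avoids decoding all Q frames
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```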
102. Performing edge detection on an image to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
in this embodiment, the icon recognition device performs edge detection on each frame of the P frames of images to be detected; each frame yields an edge detection map, that is, an image obtained by performing edge detection on the image to be detected. The edge detection maps of the individual frames are then fused, that is, the P edge detection maps are fused to obtain a target edge detection map. It can be understood that multiple edge detection maps may exist for one frame of image to be detected, and the P edge detection maps at the same position are fused to obtain one target edge detection map. When the P edge detection maps at every position have been fused, the target edge detection map set is obtained.
The purpose of edge detection is to identify points with obvious brightness changes in a digital image. Edge detection can greatly reduce the amount of data, eliminate information that can be considered irrelevant, and retain the important structural attributes of the image. There are many edge detection methods, including search-based and zero-crossing-based ones. Search-based methods detect boundaries by finding the maxima and minima of the first derivative of the image; the boundary is typically located in the direction of the largest gradient. Zero-crossing-based methods find boundaries by locating the zero crossings of the second derivative of the image, typically zero crossings of the Laplacian or of a non-linear differential expression.
103. Determining an icon area according to the target edge detection graph set;
in this embodiment, the icon identification apparatus detects each target edge detection map in the target edge detection map set and determines whether an icon area exists, where the icon area may specifically be a station caption area. If a station caption area is detected, the icon recognition device continues to detect whether a station caption image exists in the video to be detected according to the P frames of images to be detected and the station caption area. Otherwise, if no station caption area is detected, the video to be detected is considered not to include a station caption image.
104. Determining an icon in a video to be detected according to the P frame image to be detected and the icon area;
in this embodiment, the icon recognition device detects an icon in the video to be detected according to the P-frame image to be detected and the icon area, where the icon may be a station caption image.
105. And matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon.
In this embodiment, the icon recognition apparatus detects whether a station caption image exists in the station caption area according to the P frames of images to be detected and the station caption area. If a station caption image exists, it can be matched with each preset icon in the preset icon set, and the preset icon with the highest matching degree is selected as the icon recognition result of the video to be detected according to the matching result. If no preset icon is matched after the station caption image has been compared with each preset icon in the preset icon set, a station caption identification failure can be returned as the icon identification result of the video to be detected.
In the embodiment of the application, a method for identifying icons is provided. First, P frames of images to be detected are randomly acquired from a video to be detected; edge detection is then performed on the images to be detected in the P frames of images to be detected to obtain a target edge detection map set, where the target edge detection map set comprises M target edge detection maps and each target edge detection map is obtained by fusing P edge detection maps; an icon area is determined according to the target edge detection map set; icons in the video to be detected are determined according to the P frames of images to be detected and the icon area; and the icons are matched with a preset icon set to obtain an icon identification result of the video to be detected, where the preset icon set comprises at least one preset icon. In this way, randomly sampled multi-frame video images are fused: on one hand, random sampling increases the diversity of station caption background changes and achieves a better sampling effect; on the other hand, fusing multiple frames of video images turns a dynamic icon into a relatively stable static icon, which is then identified. The method is therefore applicable to the detection of both static and dynamic icons, improving the identification accuracy.
Optionally, on the basis of the embodiment corresponding to fig. 3, in a first optional embodiment of the icon identification method provided in the embodiment of the present application, before performing edge detection on each frame of to-be-detected image in a P frame of to-be-detected image to obtain a target edge detection atlas, the method may further include:
dividing each frame of image to be detected in the P frames of images to be detected to obtain a plurality of image areas corresponding to each frame of image to be detected;
and extracting M image areas corresponding to each frame of image to be detected from a plurality of image areas corresponding to each frame of image to be detected, wherein the M image areas are used for edge detection.
In this embodiment, a manner of dividing the image to be detected will be described. Before performing edge detection on the whole image to be detected, the icon recognition device may divide it. For convenience of description, please refer to FIG. 4, a schematic diagram of M image regions of an image to be detected in this embodiment of the present application. As shown in the figure, assume that each frame of the P frames of images to be detected is divided into 4 × 4 equal parts; each part is an image region, so the image to be detected includes 16 image regions. It is understood that, in practical applications, other division ratios can be designed, such as an average division into 5 × 5 equal parts or 4 × 5 equal parts; this is only an illustration and should not be construed as limiting the application.
Next, the icon recognition device extracts M image areas corresponding to each frame of the image to be detected from the image areas corresponding to each frame of the image to be detected. Referring again to FIG. 4, assume the image to be detected is divided into 16 image areas; several image areas must then be selected from the 16 for subsequent processing. Considering that a logo image tends to appear in the corners of a video picture, there are the upper left corner (image area No. 1 in FIG. 4), the upper right corner (image area No. 2 in FIG. 4), the lower left corner (image area No. 3 in FIG. 4), and the lower right corner (image area No. 4 in FIG. 4). Therefore, subsequent operations can be performed on the image areas of the four corners only, i.e., M is 4. It is understood that, in practical applications, other values of M may be taken, and the selected M image regions may lie at other positions of the image to be detected. For example, referring to FIG. 5, which is a schematic view of another embodiment of M image regions of an image to be detected, assume the image regions along the top edge of the image to be detected are selected, that is, image regions No. 1 to No. 4 shown in FIG. 5, where M is set to 4.
For another example, referring to fig. 6, which is a schematic diagram of another embodiment of M image regions of an image to be detected in the embodiment of the present application, assume only the upper-right image region of the image to be detected is selected, that is, image region No. 1 shown in FIG. 6, where M is set to 1.
After extracting M image regions corresponding to each frame of image to be detected, the icon identifying device may perform edge detection on each of the M image regions. Assuming that 64 frames of images to be detected exist, 4 image regions are extracted from each frame of image to be detected for edge detection, and then edge detection is required to be performed on 256 image regions in total.
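A minimal sketch of the division and corner extraction described above, matching the 4 × 4 grid of FIG. 4; the grid size and the choice of the four corner areas (M = 4) are parameters, not fixed by the patent.

```python
def divide_image(image, rows=4, cols=4):
    """Divide one frame of image to be detected into rows x cols image areas."""
    h, w = image.shape[:2]
    return {(r, c): image[r * h // rows:(r + 1) * h // rows,
                          c * w // cols:(c + 1) * w // cols]
            for r in range(rows) for c in range(cols)}

def extract_corner_areas(image, rows=4, cols=4):
    """Extract the M = 4 corner areas where a station caption usually appears."""
    areas = divide_image(image, rows, cols)
    return [areas[(0, 0)], areas[(0, cols - 1)],                 # upper left, upper right
            areas[(rows - 1, 0)], areas[(rows - 1, cols - 1)]]   # lower left, lower right
```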
Secondly, in the embodiment of the application, a dividing mode of the image to be detected is provided, before edge detection is performed on each frame of image to be detected in the P frames of image to be detected to obtain a target edge detection image set, each frame of image to be detected in the P frames of image to be detected can be divided to obtain a plurality of image areas corresponding to each frame of image to be detected, and then M image areas corresponding to each frame of image to be detected are extracted from the plurality of image areas corresponding to each frame of image to be detected. Through the mode, the image to be detected is reasonably divided to form a plurality of operable areas, so that subsequent operation is facilitated, and the flexibility and operability of the scheme are improved.
Optionally, on the basis of the first embodiment corresponding to fig. 3, in a second optional embodiment of the icon identification method provided in the embodiment of the present application, performing edge detection on each frame of to-be-detected image in the P frames of to-be-detected images to obtain a target edge detection map set, where the method may include:
performing edge detection on a target image area in M image areas of each frame of image to be detected to obtain P edge detection images corresponding to the target image area, wherein the target image area belongs to any one of the M image areas;
determining a target edge detection image corresponding to the target image region according to the P edge detection images corresponding to the target image region;
and when the target edge detection graphs corresponding to the M image areas are obtained, obtaining a target edge detection graph set.
In this embodiment, how to generate a target edge detection map set is described, for convenience of description, please refer to fig. 7, where fig. 7 is a schematic view of an embodiment of generating a target edge detection map in this embodiment of the present application, and as shown in the figure, one of P frames of images to be detected is described as an example, it is understood that the processing manners of other images to be detected are similar, and thus, details are not repeated here. Assume that M is 4, that is, the M image areas are image area No. 1, image area No. 2, image area No. 3, and image area No. 4 shown in fig. 7. Taking any one of the M image regions as an example for explanation, the image region is a target image region, and it can be understood that the processing manner of other image regions in the M image regions is similar to that of the target image region, and therefore details are not repeated here. Assuming that the target image area is the image area No. 1 shown in fig. 7, at this time, edge detection is performed on the image area No. 1 to obtain an edge detection map, and since P frames of to-be-detected images are shared and edge detection is performed on the image area No. 1 in each frame of to-be-detected image, P edge detection maps, that is, P edge detection maps corresponding to the image area No. 1 shown in fig. 7, can be obtained. And superposing the P edge detection maps corresponding to the image area No. 1 to obtain a target edge detection map corresponding to the image area No. 1, namely the target edge detection map a shown in fig. 7. And performing similar processing on the No. 2 image area, the No. 3 image area and the No. 4 image area to obtain a target edge detection image B corresponding to the No. 2 image area, a target edge detection image C corresponding to the No. 3 image area and a target edge detection image D corresponding to the No. 4 image area. And when the target edge detection graphs corresponding to the M image areas are obtained, obtaining a target edge detection graph set. That is, when the target edge detection map a, the target edge detection map B, the target edge detection map C, and the target edge detection map D are acquired, it is considered that a target edge detection map set is obtained.
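The patent does not fix the rule for superimposing the P edge detection maps into one target edge detection map; one plausible choice, assumed here for illustration, is to average the binary maps and re-threshold, so that edges persisting across most frames (the station caption) survive while moving background edges fade.

```python
import numpy as np

def fuse_edge_maps(edge_maps):
    """Fuse P edge detection maps of one image area into a target edge detection map."""
    stack = np.stack([m.astype(np.float32) / 255.0 for m in edge_maps])
    mean = stack.mean(axis=0)                    # pixel present in k of P maps -> k / P
    return (mean >= 0.5).astype(np.uint8) * 255  # keep edges present in most frames
```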
The task of edge detection is to find a set of pixels with step or roof changes. An edge is the boundary between different regions, a set of pixels whose surrounding pixels change significantly, and it has two attributes: amplitude and direction. A contour is generally considered a description of the complete boundary of an object, with edge points connected one to another to form the contour; an edge may be a segment of an edge, while a contour is generally complete. The edge detection methods adopted by the application include but are not limited to the Canny operator, Roberts operator, Sobel operator, Prewitt operator, Kirsch operator, and Robinson operator. For example, an edge detection map can be obtained using the Canny operator; specifically, the Canny-based edge detection process is as follows (a code sketch follows the listed steps):
firstly, converting a color image into a gray image;
secondly, performing Gaussian blur on the image;
thirdly, calculating image gradient, calculating the edge amplitude and angle of the image according to the gradient, and calculating the gradient amplitude direction by using a differential edge detection operator;
fourthly, performing non-maximum suppression, namely edge thinning;
fifthly, performing double-threshold edge linking;
and sixthly, outputting the result as a binary image.
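In practice the six steps are encapsulated by common library implementations; the sketch below uses OpenCV's Canny with an explicit grayscale conversion and Gaussian blur, and the blur kernel and thresholds are illustrative values.

```python
import cv2

def canny_edge_map(region_bgr):
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)  # step 1: color -> gray
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)        # step 2: Gaussian blur
    # Steps 3-6 (gradient, non-maximum suppression, double-threshold
    # linking, binary output) are performed inside cv2.Canny.
    return cv2.Canny(blurred, 50, 150)
```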
In the embodiment of the present application, a method for obtaining a target edge detection map set is provided, where edge detection is performed on a target image region in M image regions of each frame of image to be detected, to obtain P edge detection maps corresponding to the target image region, then the target edge detection map corresponding to the target image region is determined according to the P edge detection maps corresponding to the target image region, and when the target edge detection maps corresponding to the M image regions are obtained, the target edge detection map set can be obtained. By the method, the edge detection can be performed on the image area of the part extracted from the image to be detected, and the edge detection is not required to be performed on the whole image to be detected, so that the calculated amount is reduced, and the detection efficiency is improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in a third optional embodiment of the method for identifying an icon provided in the embodiment of the present application, determining an icon area according to the target edge detection atlas may include:
performing histogram statistics on each target edge detection map in the target edge detection map set to obtain M statistical results, wherein the histogram statistics is used for performing statistics in the horizontal direction and the vertical direction on the target edge detection map;
respectively judging whether each statistical result in the M statistical results meets the station caption region extraction condition;
if at least one statistical result in the M statistical results meets the station caption region extraction condition, determining that the target edge detection image set exists in the station caption region;
and if no statistical result in the M statistical results meets the station caption region extraction condition, determining that the station caption region does not exist in the target edge detection image set.
In this embodiment, a method for detecting whether a station caption region exists will be described. If M target edge detection maps exist in the target edge detection map set, histogram statistics is performed on each target edge detection map, and based on a given statistical threshold, portions whose statistical results are smaller than the statistical threshold are removed along the left and right edges and the upper and lower edges, so as to obtain a more compact station caption region. In practical application, the above operation needs to be performed on each target edge detection map; if at least one statistical result among those corresponding to the M target edge detection maps meets the station caption region extraction condition, the station caption region used for subsequent processing is determined.
Specifically, for convenience of introduction, please refer to fig. 8, which is a schematic diagram of an embodiment of performing histogram statistics on a target edge detection map. Taking one target edge detection map as an example, assume it is an image of 5 × 5 pixels, where black represents 0 and gray represents 1. The numbers of gray pixels counted in the horizontal direction are 1, 4, 3, 5, and 2 in sequence, and the numbers counted in the vertical direction are 2, 4, 2, 5, and 2 in sequence. With a statistical threshold of 3, pixels with counts smaller than 3 are rejected along the left and right edges, that is, the two columns with value 2 at the left and right edges do not satisfy the condition, and regions with counts smaller than 3 are rejected along the upper and lower edges, that is, the rows with values 1 and 2 are rejected, finally yielding the middle 3 × 3 station caption region. The statistical result is then judged to satisfy the station caption region extraction condition, that is, a station caption image exists in the target edge detection map. If all statistical results are smaller than the statistical threshold, the target edge detection map is considered to contain no station caption image.
It will be appreciated that the third column in fig. 8, although its count is 2, is not eliminated, because trimming proceeds inward from the edges and stops upon reaching the second and fourth columns.
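A sketch of this histogram-based trimming: it peels rows and columns off each side while their edge-pixel count is below the statistical threshold and stops at the first satisfying one, so the inner third column of the 5 × 5 example above is kept.

```python
import numpy as np

def trim_logo_region(target_edge_map, threshold=3):
    """Trim a target edge detection map to a compact station caption region,
    or return None if no statistical result meets the extraction condition."""
    binary = (target_edge_map > 0).astype(np.int32)
    row_counts, col_counts = binary.sum(axis=1), binary.sum(axis=0)
    top, bottom = 0, len(row_counts)
    while top < bottom and row_counts[top] < threshold:
        top += 1
    while bottom > top and row_counts[bottom - 1] < threshold:
        bottom -= 1
    left, right = 0, len(col_counts)
    while left < right and col_counts[left] < threshold:
        left += 1
    while right > left and col_counts[right - 1] < threshold:
        right -= 1
    if top >= bottom or left >= right:
        return None
    return top, bottom, left, right  # rows [top, bottom), columns [left, right)
```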
Secondly, in the embodiment of the application, a method for detecting whether the station caption area exists is provided. Before detecting whether a station caption image exists in a video to be detected, histogram statistics can be carried out on each target edge detection image in a target edge detection image set to obtain M statistical results, whether each statistical result in the M statistical results meets a station caption region extraction condition is judged respectively, if at least one statistical result in the M statistical results meets the station caption region extraction condition, the fact that the station caption region exists in the target edge detection image set is determined, and if no statistical result in the M statistical results meets the station caption region extraction condition, the fact that the station caption region does not exist in the target edge detection image set is determined. By the method, under the condition of giving the statistical threshold, histogram statistics is carried out on the target edge detection image respectively along the horizontal direction and the vertical direction, and the part of the histogram smaller than the statistical threshold is removed, so that a compact station caption area can be obtained, and the feasibility of the scheme is improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in a fourth optional embodiment of the icon identification method provided in the embodiment of the present application, determining the logo image in the video to be detected according to the image to be detected in the P frame and the logo area may include:
determining the image score of an image to be matched in the station logo area according to the P frames of images to be detected;
and if the image score of the image to be matched is greater than or equal to the station caption image threshold value, determining that the station caption image exists in the video to be detected.
In this embodiment, a method for detecting whether a station caption image exists in a video to be detected is introduced: an average is taken over the P frames of images to be detected based on the detected station caption region. Specifically, assume that a station caption region currently exists and that it is region A of the image to be detected; the region A of each frame is then superimposed over the P frames of images to be detected. Suppose there are 64 frames of images to be detected, and the detection result of the station caption region in each frame is 1 or 0. If the detection result is 1 for 50 frames and 0 for the remaining 14 frames, the image score of the image to be matched is 50/64, approximately 0.78. Whether this image score is greater than or equal to the station caption image threshold is then judged; with a threshold of 0.5, the image score of the image to be matched exceeds the threshold, so it is determined that a station caption image exists in the video to be detected. Fig. 9 shows a detected station caption image; please refer to fig. 9, which is a schematic diagram of the station caption image according to the embodiment of the present application, in which the station caption image may include a pattern or text.
Conversely, if the image score of the image to be matched is smaller than the station caption image threshold, it is determined that no station caption image exists in the video to be detected. For a video to be detected, assuming that all M image regions satisfy the station caption region extraction condition, the video has M station caption regions; therefore, the number of detected station caption images is at most M (that is, each station caption region contains a station caption image) and at least 0 (that is, no station caption image exists in the video to be detected).
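As a concrete illustration of the scoring rule above, here is a minimal sketch under the stated 64-frame example; the helper name and the use of NumPy are assumptions.

```python
import numpy as np

def logo_image_score(per_frame_hits):
    # per_frame_hits[i] is 1 if the station caption region was detected
    # in frame i of the P sampled frames, else 0; the score is the mean.
    return float(np.mean(per_frame_hits))

hits = [1] * 50 + [0] * 14          # 64 sampled frames, 50 detections
score = logo_image_score(hits)      # 50 / 64 ~= 0.78
has_logo = score >= 0.5             # station caption image threshold
```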
Secondly, in the embodiment of the application, a method for detecting whether a station caption image exists in a video to be detected is provided, firstly, the image score of the image to be matched in the station caption area is determined according to the P frame image to be detected, and if the image score of the image to be matched is larger than or equal to a station caption image threshold value, the station caption image exists in the video to be detected. By the mode, for a video to be detected, the number of the detected station caption images is at least 0, namely, no station caption image exists in the video to be detected, the number of the detected station caption images is at most one station caption image in each station caption area, and the image to be matched, of which the image score is greater than or equal to the threshold value of the station caption image, is determined as the station caption image, so that the detection error rate can be effectively reduced, and a stable station caption image can be obtained.
Optionally, on the basis of the embodiment corresponding to fig. 3, in a fifth optional embodiment of the method for identifying an icon provided in the embodiment of the present application, matching the logo image with the preset icon set to obtain an icon identification result of the video to be detected may include:
acquiring a local feature set to be matched of the station caption image, wherein the local feature set to be matched comprises at least one local feature to be matched;
acquiring a local feature set of each preset icon in a preset icon set, wherein the local feature set comprises at least one local feature;
determining a candidate station caption image set from the preset icon set through a k nearest neighbor algorithm according to the local feature set to be matched and the local feature set of each preset icon, wherein the candidate station caption image set comprises N candidate station caption images, and N is an integer greater than or equal to 1;
comparing the station logo image with each candidate station logo image in the candidate station logo image set to obtain a matching point set of each candidate station logo image, wherein the matching point set comprises at least one matching point, and the matching point represents a characteristic point of successful matching between the candidate station logo image and the station logo image;
calculating to obtain N similarity scores according to the matching point set of each candidate station caption image and the local feature set of each candidate station caption image;
and determining a target station caption image of the video to be detected from the candidate station caption image set according to the maximum value of the similarity scores in the N similarity scores.
In this embodiment, a manner of obtaining a station caption identification result will be described. First, a local feature set to be matched is extracted from the station caption image, where the set includes at least one local feature to be matched. It may be understood that a local feature to be matched may specifically be a Scale-Invariant Feature Transform (SIFT) feature, a Speeded Up Robust Features (SURF) feature, an Oriented FAST and Rotated BRIEF (ORB) feature, or the like; the SIFT feature is taken as an example for description here, but this should not be construed as a limitation to the present application.
Similarly, a local feature set of each preset icon in the preset icon set needs to be obtained, where the local feature set includes at least one local feature. It is understood that the local feature may specifically be a SIFT feature, a SURF feature, an ORB feature, or the like; the SIFT feature is taken as an example here, which should not be construed as a limitation to the present application.
Based on the above description, assume that the preset icon set includes S preset icons and that the number of SIFT feature points extracted from the i-th preset icon is n_i (i = 1, 2, …, S); the preset icon set then contains a total of

S_A = n_1 + n_2 + … + n_S

SIFT feature points. Each SIFT feature point stores two types of information: the position coordinate (x_i, y_i) of the feature point, and the feature information F_i, where F_i is a 128-dimensional floating-point vector. For the S_A local features of the preset icon set and the local feature set to be matched, a fast index of local features may be established using a k-nearest-neighbor fast indexing method; such methods include, but are not limited to, the Fast Library for Approximate Nearest Neighbors (FLANN) and the Facebook AI Similarity Search library (FAISS).
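A minimal sketch of building such a fast index with FLANN follows, assuming OpenCV Python bindings that include SIFT (cv2.SIFT_create); all function and variable names are illustrative.

```python
import cv2
import numpy as np

def build_flann_index(preset_icons):
    # preset_icons: list of S grayscale images, one per preset icon.
    sift = cv2.SIFT_create()
    FLANN_INDEX_KDTREE = 1
    flann = cv2.FlannBasedMatcher(dict(algorithm=FLANN_INDEX_KDTREE, trees=5),
                                  dict(checks=50))
    icon_ids, library = [], []
    for i, icon in enumerate(preset_icons):
        kps, descs = sift.detectAndCompute(icon, None)  # n_i keypoints, each
        if descs is None:                               # with position (x, y)
            continue                                    # and a 128-d vector F_i
        library.append(descs)
        icon_ids.extend([i] * len(descs))               # library row -> icon
    flann.add([np.vstack(library)])      # S_A = n_1 + ... + n_S rows in total
    flann.train()
    return flann, icon_ids
```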
That is, a candidate station caption image set is determined from the preset icon set through the k-nearest neighbor algorithm, where the candidate set includes N candidate station caption images and N is an integer greater than or equal to 1. Next, the station caption image is compared with each candidate station caption image in the candidate set; specifically, a 1-to-1 similarity comparison is performed to obtain a matching point set of each candidate station caption image, where the matching point set includes at least one matching point, and a matching point represents a feature point successfully matched between the candidate station caption image and the station caption image. N similarity scores are then calculated according to the matching point set and the local feature set of each candidate station caption image. Finally, the candidate station caption image corresponding to the maximum of the N similarity scores is taken as the target station caption image of the video to be detected.
Secondly, in the embodiment of the application, a method for obtaining a station logo recognition result is provided, which includes the steps of firstly obtaining a local feature set to be matched of a station logo image, then obtaining a local feature set of each preset icon in the preset icon set, determining a candidate station logo image set from the preset icon set through a k nearest neighbor algorithm, then comparing the station logo image with each candidate station logo image in the candidate station logo image set to obtain a matching point set of each candidate station logo image, further calculating N similarity scores according to the matching point set of each candidate station logo image and the local feature set of each candidate station logo image, and finally determining a target station logo image of a video to be detected from the candidate station logo image set according to the N similarity scores. By adopting the mode, the station caption identification result can be obtained by adopting rough matching and fine matching, so that the calculation amount of matching is reduced, and meanwhile, the detection accuracy can be effectively improved, thereby improving the reliability of the scheme.
Optionally, on the basis of the fifth embodiment corresponding to fig. 3, in a sixth optional embodiment of the icon identification method provided in the embodiment of the present application, according to the local feature set to be matched and the local feature set of each preset icon, determining a candidate logo image set from the preset icon set by using a k-nearest neighbor algorithm may include:
1) Acquiring a local feature to be matched in a local feature set to be matched;
2) Acquiring K candidate features closest to the local features to be matched from the local feature set of each preset icon according to the local features to be matched, wherein K is an integer greater than or equal to 1;
repeatedly executing the step 1) to the step 2) until the candidate feature of each local feature to be matched in the local feature set to be matched is obtained;
and acquiring a candidate station caption image set according to the candidate characteristics of each local characteristic to be matched in the local characteristic set to be matched.
In this embodiment, how to determine the candidate station caption image set from the preset icon set through the k-nearest neighbor algorithm will be described. For convenience of description, one local feature to be matched is taken as an example; it can be understood that the processing of the other local features to be matched in the set is similar, and details are not repeated here.
Specifically, one local feature to be matched in the local feature set to be matched is obtained; the local feature to be matched may be a SIFT feature. For each local feature to be matched in the set, K-nearest-neighbor retrieval may be performed within a given radius R over the local feature set of the preset icons (that is, the local feature fast index library); the preset icons are then ranked from most to fewest by the number of k-nearest-neighbor hits they receive, and the N preset icons with the most hits are taken as the candidate station caption image set selected by coarse matching.
The number K of neighbors can be any positive integer, and R is an empirical value; the following example illustrates how the candidate station caption image set is obtained with the k-nearest-neighbor algorithm. Assume K = 5 and that there are 50 preset icons in the preset icon set, namely A1, A2, …, A50, with 10 local features extracted from each preset icon, so the preset icon set has 50 × 10 = 500 local features in total. Assume the extracted station caption image also has 10 local features to be matched. For its first local feature to be matched, the nearest 5 candidate features are searched among the 500 local features; if these 5 candidate features belong to A1, A2, A3, A4, and A5 respectively, then A1, A2, A3, A4, and A5 are each hit once. Next, for the second local feature to be matched, the nearest 5 candidate features are again searched among the 500 local features; if these 5 candidate features belong to A1, A36, A8, A10, and A25 respectively, then A1 has now been hit twice, and so on. Finally, the N (for example, N = 3) preset icons with the largest numbers of hits are selected as the candidate station caption image set.
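The voting logic of this example can be sketched as follows, reusing the FLANN matcher and the icon_ids mapping from the index-building sketch above; the radius-R constraint is omitted for brevity, and all names are assumptions.

```python
from collections import Counter

def coarse_match(query_descs, flann, icon_ids, k=5, top_n=3):
    # Every local feature of the detected station caption votes for the
    # preset icons owning its k nearest library features; the top_n
    # most-hit icons form the candidate station caption image set.
    hit_counter = Counter()
    for neighbours in flann.knnMatch(query_descs, k=k):
        for m in neighbours:            # m.trainIdx: row in the stacked library
            hit_counter[icon_ids[m.trainIdx]] += 1
    return [icon for icon, _ in hit_counter.most_common(top_n)]
```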
Thirdly, in the embodiment of the present application, a method for determining a candidate station caption image set is provided, that is, one to-be-matched local feature in the to-be-matched local feature set is obtained, then K candidate features closest to the to-be-matched local feature are obtained from the local feature set of each preset icon according to the to-be-matched local feature, the above steps are repeatedly performed until the candidate feature of each to-be-matched local feature in the to-be-matched local feature set is obtained, and finally, the candidate station caption image set is obtained according to the candidate feature of each to-be-matched local feature in the to-be-matched local feature set. By the method, the coarse matching result is obtained by using the k-nearest neighbor algorithm, and the candidate station caption image set is obtained, so that the most possible candidate station caption images can be quickly screened out, all preset icons in the preset icon set are prevented from being traversed, and the station caption detection efficiency is improved.
Optionally, on the basis of the fifth embodiment corresponding to fig. 3, in a seventh optional embodiment of the icon identification method provided in the embodiment of the present application, the comparing the station caption image with each candidate station caption image in the candidate station caption image set to obtain a matching point set of each candidate station caption image may include:
matching each local feature to be matched in the station caption image with each local feature of each candidate station caption image to obtain a projection matrix, wherein the projection matrix represents the position coordinates of the station caption image after projection;
and determining a matching point set of each candidate station caption image according to the station caption image and the projection matrix.
In this embodiment, how to obtain the matching point set of each candidate station caption image is described. The N candidate station caption images in the candidate set are compared 1-to-1 with the currently detected station caption image, and the candidate with the highest similarity score that also exceeds a given score threshold is selected as the final identification result. For convenience of introduction, please refer to fig. 10, which is a schematic view of an embodiment of similarity comparison in the present embodiment. As shown in the figure, taking one candidate station caption image as an example: first, for each local feature to be matched (for example, a SIFT feature) in the currently detected station caption image, the local feature with the nearest Euclidean distance is searched in the candidate station caption image to serve as a matching point. According to the position coordinates of all matched points, random sample consensus (RANSAC) is then used to calculate the corresponding projection matrix H. Using H, the position coordinates of the local features to be matched in the detected station caption image are projected into the coordinate system of the candidate station caption image. The Euclidean distance between each projected point and its originally matched point is then calculated; matching points whose distance is greater than a given threshold are eliminated, and the remaining correctly matched points form the matching point set of the candidate station caption image.
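A minimal sketch of this RANSAC projection-and-filtering step follows, using OpenCV's findHomography and perspectiveTransform; the reprojection threshold value and the function name are assumptions.

```python
import cv2
import numpy as np

def filter_matches(src_pts, dst_pts, reproj_threshold=5.0):
    # src_pts / dst_pts: float32 arrays of shape (n, 1, 2) holding the
    # matched coordinates in the detected logo and the candidate image.
    if len(src_pts) < 4:                # a homography needs >= 4 pairs
        return []
    H, _ = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, reproj_threshold)
    if H is None:
        return []
    projected = cv2.perspectiveTransform(src_pts, H)   # logo -> candidate coords
    errors = np.linalg.norm(projected - dst_pts, axis=2).ravel()
    # Keep only pairs whose reprojection error stays within the threshold.
    return [i for i, e in enumerate(errors) if e <= reproj_threshold]
```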
In the embodiment of the present application, a method for determining the matching point set of each candidate station caption image is provided: each local feature to be matched in the station caption image may be matched with each local feature of each candidate station caption image to obtain a projection matrix, and the matching point set of each candidate station caption image is then determined according to the station caption image and the projection matrix. By this method, a correctly matched point set can be obtained from the relatively small candidate station caption image set, so that the similarity score between the station caption image and each candidate station caption image can be calculated, improving the accuracy of station caption matching.
Optionally, on the basis of the seventh embodiment corresponding to fig. 3, in an eighth optional embodiment of the icon identification method provided in the embodiment of the present application, the calculating to obtain N similarity scores according to the matching point set of each candidate station caption image and the local feature set of each candidate station caption image may include:
the similarity score is calculated as follows:
score = A / B

wherein score represents the similarity score, A represents the union area corresponding to the matching point set of the candidate station caption image, and B represents the union area corresponding to the local feature set of the candidate station caption image.
In this embodiment, a specific way of calculating the similarity score of a candidate station caption image will be described. After the matching point set of each candidate station caption image is obtained, the calculation is described for one candidate station caption image as an example; it can be understood that the similarity scores of the other candidate station caption images are calculated similarly, so details are not repeated here.
Specifically, since the local features of the candidate station caption image have been obtained, each local feature (that is, each feature point) can be taken to represent a neighborhood of radius r, where r is an empirical value that may, for example, take a value of 9 pixels. The similarity score can be calculated using the following formula:
score = A / B

where A represents the union area corresponding to the matching point set of the candidate station caption image, and B represents the union area corresponding to the local feature set of the candidate station caption image.
Where neighborhoods overlap, the overlapping portion is counted only once, that is, the union of the areas is taken. For example, suppose the feature points of 10 local features are all matched: if those 10 feature points are crowded together, only a small area is covered, whereas if they are evenly distributed over the candidate station caption image, the covered area is clearly larger, so the latter yields a higher similarity score. The candidate station caption image with the highest similarity score that is also greater than a given threshold is the finally identified station caption.
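The union-of-neighborhoods computation can be sketched by rasterising a disk of radius r around each feature point into a boolean mask, so overlapping portions are automatically counted once; the names and the mask-based approach are illustrative.

```python
import numpy as np

def similarity_score(matched_pts, all_pts, shape, r=9):
    # matched_pts: (x, y) coordinates of the matching point set (A);
    # all_pts: coordinates of every local feature of the candidate (B);
    # shape: (height, width) of the candidate station caption image.
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]

    def union_area(points):
        mask = np.zeros(shape, dtype=bool)
        for x, y in points:
            mask |= (xx - x) ** 2 + (yy - y) ** 2 <= r * r
        return mask.sum()            # overlapping disks are counted once

    denom = union_area(all_pts)
    return union_area(matched_pts) / denom if denom else 0.0
```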
Further, in the embodiment of the application, a specific way for calculating the similarity score is provided, and a reasonable and reliable implementation way is provided for implementation of the scheme through the way, so that the feasibility and the operability of the scheme are improved.
Optionally, on the basis of the embodiment corresponding to fig. 3 or any one of the first to eighth optional embodiments corresponding to fig. 3, in a ninth optional embodiment of the method for identifying an icon provided in the embodiment of the present application, before matching the station caption image with the preset icon set to obtain an icon identification result of the video to be detected, the method may further include:
acquiring a video set to be processed, wherein the video set to be processed comprises at least one video to be processed;
detecting each video to be processed in the video set to be processed to obtain a station logo image set to be processed, wherein the station logo image set to be processed comprises at least one station logo image to be processed, and at least one station logo image to be processed in the station logo image set to be processed corresponds to the same identifier;
processing station logo images to be processed in the station logo image set to be processed to obtain a preset icon set;
and performing feature extraction on each preset icon in the preset icon set to obtain a local feature set of each preset icon, wherein the local feature set comprises at least one local feature, and the local feature comprises feature point position coordinates and feature information.
In this embodiment, a method for establishing a preset icon set will be described. First, a to-be-processed video set is obtained, where the set includes at least one to-be-processed video; a to-be-processed video may be a manually selected video or a video randomly chosen from a background database. Next, each to-be-processed video is detected to obtain a to-be-processed station caption image set, where one to-be-processed video may contain at least one to-be-processed station caption image; it can be understood that, in practical application, at least one to-be-processed station caption image corresponds to the same identifier. For convenience of understanding, please refer to fig. 11, which is a schematic diagram illustrating different display forms of the same station caption in the embodiment of the present application. As shown in the figure, the same station caption may have different display forms; for example, a station caption named "small sun video" may have the representation form in fig. 11 (a) as well as the representation form in fig. 11 (b), so either representation form maps to the identifier corresponding to "small sun video". Collecting as many to-be-processed videos as possible, so that different display forms are covered, increases the coverage rate of the icon recognition system.
After the to-be-processed station caption image set is obtained, each to-be-processed station caption image in the set is further processed; processing manners include, but are not limited to, cropping, scaling, and cleaning of the to-be-processed station caption images. In the offline library-building process, falsely detected non-station-caption images need to be removed manually after station caption detection; in the online station caption identification process, such false detections are eliminated in the subsequent feature matching step and therefore do not affect the final identification result. Station caption cleaning removes non-station-caption images that may have been falsely detected, since the preset icons in the preset icon set must be correct station caption images. For static station captions, the station caption image may also be obtained by manual cropping; for dynamic station captions, the station caption is obtained automatically.
The preset icon set is obtained after the processing is finished. Further, based on the preset icon set, the local features of each preset icon may be extracted to obtain a local feature set of each preset icon, where the local feature set includes at least one local feature, and a local feature includes feature point position coordinates and feature information; the feature point position coordinates are specifically the abscissa and ordinate of a pixel point, and the feature information may be a 128-dimensional floating-point vector.
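A sketch of the per-icon record that such a library might store follows; the field names are illustrative, and SIFT via OpenCV is assumed as in the earlier sketches.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

def icon_record(icon_gray, identifier):
    # One record of the offline library: the (x_i, y_i) position of every
    # feature point plus its 128-dimensional feature information F_i.
    kps, descs = sift.detectAndCompute(icon_gray, None)
    return {
        "identifier": identifier,  # shared by all display forms of one logo
        "positions": np.float32([kp.pt for kp in kps]),  # (n_i, 2) coordinates
        "descriptors": descs,                            # (n_i, 128) floats
    }
```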
Furthermore, in the embodiment of the present application, a method for establishing a preset icon set is provided, that is, first, a to-be-processed video set needs to be obtained, then, each to-be-processed video in the to-be-processed video set is detected to obtain a to-be-processed logo image set, the to-be-processed logo image set includes at least one to-be-processed logo image, at least one to-be-processed logo image in the to-be-processed logo image set corresponds to the same identifier, then, the to-be-processed logo images in the to-be-processed logo image set are processed to obtain a preset icon set, and finally, each preset icon in the preset icon set is subjected to feature extraction to obtain a local feature set of each preset icon. Through the method, the preset icon set can be established in an off-line state, namely, the station caption index library is established, so that subsequent matching operation is facilitated, at least one station caption image to be processed corresponds to the same identifier, namely, the preset icon set can store the preset icons of the same station caption in different display forms, the reliability and flexibility of the scheme are improved, and the matching success rate is improved.
Optionally, on the basis of the ninth embodiment corresponding to fig. 3, in a tenth optional embodiment of the method for identifying an icon provided in the embodiment of the present application, processing the station caption image to be processed in the station caption image set to be processed to obtain a preset icon set may include:
when a first processing instruction is received, removing a first station logo image to be processed from a station logo image set to be processed according to the first processing instruction, wherein the first processing instruction carries an identifier of the first station logo image to be processed;
and when a second processing instruction is received, adjusting a second station logo image to be processed in the station logo image set to be processed according to the second processing instruction to obtain a preset icon in the preset icon set, wherein the second processing instruction carries an identifier of the second station logo image to be processed.
In this embodiment, how to process the to-be-processed logo images in the to-be-processed logo image set is described. The method mainly comprises two modes, wherein one mode is to remove the to-be-processed logo images which do not accord with the conditions, and the other mode is to cut the to-be-processed logo images which accord with the conditions.
Specifically, the first to-be-processed station caption image in the to-be-processed station caption image set is taken as an example for introduction, and it is assumed that the first to-be-processed station caption image does not meet the station caption condition, for example, the first to-be-processed station caption image is not a real station caption image, or the first to-be-processed station caption image has more noise, or the first to-be-processed station caption image has serious deformation. In this case, the user triggers a first processing instruction, the first processing instruction carries an identifier of the first station caption image to be processed, and the icon recognition device removes the first station caption image to be processed from the station caption image set to be processed according to the first processing instruction.
Taking the second to-be-processed station caption image in the to-be-processed station caption image set as an example, it is assumed that the second to-be-processed station caption image meets the station caption condition, but the size of the second to-be-processed station caption image is larger or smaller. In this case, the user triggers a second processing instruction, the second processing instruction carries an identifier of the second logo image to be processed, and the icon recognition device adjusts the second logo image to be processed according to the second processing instruction, for example, cuts or enlarges the second logo image to be processed, so as to obtain the preset icon.
Still further, in the embodiment of the present application, a method for processing a logo image to be processed is provided. And when a first processing instruction is received, removing the first station caption image to be processed from the station caption image set to be processed according to the first processing instruction. And when a second processing instruction is received, adjusting a second station caption image to be processed in the station caption image set to be processed according to the second processing instruction to obtain a preset icon in the preset icon set. Through the mode, in the process of establishing the preset icon set, the to-be-processed station caption images can be cut according to actual conditions, the to-be-processed station caption images which do not meet requirements can also be removed, therefore, more regular preset icons can be obtained, subsequent matching is facilitated, meanwhile, the to-be-processed station caption images corresponding to the same identification can also be added, and therefore the diversity of the scheme is improved.
It should be understood that the present application turns a dynamic station caption into a relatively stable static station caption by a multi-frame fusion method and extracts local features from the resulting static station caption, so the method can not only handle dynamic station captions but also make the identification of static station captions more accurate. Based on the above manner, a series of experiments were performed: videos bearing 18 station captions, such as 360 Fast Video, Baidu Haokan Video, and Douyin, were collected from the Internet; 5 videos of each type were collected for station caption extraction to build the station caption index database, and 728 videos with station captions plus 1000 videos without station captions were collected as the test set. The test results are shown in table 1 below.
TABLE 1

Station caption   Total   Covered   Coverage rate   Total hits   Correct hits   Accuracy
No. 1              41       40          98%             40            40           100%
No. 2              44       44         100%             45            44            98%
No. 3              44       38          86%             38            38           100%
No. 4              36       36         100%             36            36           100%
No. 5              31       31         100%             32            31            97%
No. 6              54       51          94%             51            51           100%
No. 7              21       20          95%             20            20           100%
No. 8              54       50          93%             50            50           100%
No. 9              32       28          88%             28            28           100%
No. 10             14       13          93%             13            13           100%
No. 11             57       56          98%             56            56           100%
No. 12             42       41          98%             41            41           100%
No. 13             47       43          91%             44            43            98%
No. 14             52       52         100%             52            52           100%
No. 15             55       51          93%             52            51            98%
No. 16             32       27          84%             27            27           100%
No. 17             18       16          89%             16            16           100%
No. 18             54       52          96%             52            52           100%
Total             728      689          95%            693           689            99%
It can be seen that even the dynamic station caption of station caption No. 5 achieves good coverage and accuracy; if only the first frame of a video were used for station caption extraction and identification, the coverage of dynamic station captions would be almost 0. In a hardware test environment with an Intel Core™ i7-4790 central processing unit (CPU) @ 3.6 GHz, the average time consumed per video is about 1.5 seconds. If the whole station caption library were traversed instead, the average time per video would be about 3 seconds and would grow linearly with the number of images in the library; with coarse matching followed by 1-to-1 similarity comparison against only the N most similar station captions, the added time is almost negligible as the library grows.
Referring to fig. 12, fig. 12 is a schematic view of an embodiment of an icon recognition apparatus 20 according to the present invention, which includes:
an obtaining module 201, configured to randomly obtain P frames of to-be-detected images from a to-be-detected video, where the to-be-detected video includes Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
a detection module 202, configured to perform edge detection on an image to be detected in the P frames of images to be detected acquired by the acquisition module 201 to obtain a target edge detection map set, where the target edge detection map set includes M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
a determining module 203, configured to determine an icon area according to the target edge detection map set obtained by the detecting module 202;
the determining module 203 is further configured to determine an icon in the video to be detected according to the image to be detected of the P frame and the icon area;
the identifying module 204 is configured to match the icon determined by the determining module 203 with a preset icon set, and obtain an icon identifying result of the video to be detected, where the preset icon set includes at least one preset icon.
In this embodiment, an obtaining module 201 randomly obtains P frames of images to be detected from a video to be detected, where the video to be detected includes Q frames of video images, Q is an integer greater than 1, P is an integer greater than or equal to 1 and less than or equal to Q, a detecting module 202 performs edge detection on the images to be detected in the P frames of images to be detected, which are obtained by the obtaining module 201, to obtain a target edge detection map set, where the target edge detection map set includes M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, M is an integer greater than or equal to 1, a determining module 203 determines an icon region according to the target edge detection map set obtained by the detecting module 202, the determining module 203 determines an icon in the video to be detected according to the P frames of images to be detected and the icon region, and an identifying module 204 matches the icon determined by the determining module 203 with a preset icon set to obtain an icon identification result of the video to be detected, where the preset icon set includes at least one preset icon.
In the embodiment of the application, an icon recognition device is provided. The icon recognition device first randomly obtains P frames of images to be detected from a video to be detected, and then performs edge detection on each of the P frames to obtain a target edge detection map set, where the set includes M target edge detection maps and each target edge detection map is obtained by fusing P edge detection maps. If a station caption region is determined to exist according to the target edge detection map set, whether a station caption image exists in the video to be detected is detected according to the P frames of images to be detected and the station caption region. If a station caption image exists in the video to be detected, the station caption image is matched with a preset icon set to obtain an icon identification result of the video to be detected, where the preset icon set includes at least one preset icon. In this manner, randomly sampled multi-frame video images are fused: on the one hand, random sampling increases the diversity of station caption background changes and achieves a better sampling effect; on the other hand, fusing multiple frames turns a dynamic station caption into a relatively stable static station caption, which is then identified. The method is therefore suitable for detecting both static and dynamic station captions, and the identification accuracy is improved.
Optionally, on the basis of the embodiment corresponding to fig. 12, please refer to fig. 13, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application, the icon identifying apparatus 20 further includes a dividing module 205 and an extracting module 206;
the dividing module 205 is configured to divide each frame of the image to be detected in the P frames of images to be detected, before the detecting module 202 performs edge detection on the images to be detected to obtain the target edge detection map set, so as to obtain a plurality of image regions corresponding to each frame of image to be detected;
the extracting module 206 is configured to extract M image regions corresponding to each frame of image to be detected from the plurality of image regions corresponding to each frame of image to be detected obtained by dividing by the dividing module 205, where the M image regions are used for edge detection.
Secondly, in the embodiment of the application, a dividing mode of the image to be detected is provided, before edge detection is performed on each frame of the image to be detected in the P frame to obtain a target edge detection image set, each frame of the image to be detected in the P frame can be divided to obtain a plurality of image areas corresponding to each frame of the image to be detected, and then M image areas corresponding to each frame of the image to be detected are extracted from the plurality of image areas corresponding to each frame of the image to be detected. Through the mode, the image to be detected is reasonably divided, a plurality of operable areas are formed, subsequent operation is facilitated, and therefore the flexibility and operability of the scheme are improved.
Optionally, on the basis of the embodiment corresponding to fig. 13, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application,
the detection module 202 is specifically configured to perform edge detection on a target image area in the M image areas of each frame of image to be detected to obtain P edge detection maps corresponding to the target image area, where the target image area belongs to any one of the M image areas;
determining a target edge detection image corresponding to the target image area according to the P edge detection images corresponding to the target image area;
and when the target edge detection images corresponding to the M image areas are acquired, acquiring the target edge detection image set.
In the embodiment of the present application, a method for obtaining a target edge detection map set is provided, where edge detection is performed on a target image region in M image regions of each frame of image to be detected, to obtain P edge detection maps corresponding to the target image region, then the target edge detection map corresponding to the target image region is determined according to the P edge detection maps corresponding to the target image region, and when the target edge detection maps corresponding to the M image regions are obtained, the target edge detection map set can be obtained. By the method, the edge detection can be performed on the image area of the part extracted from the image to be detected, and the edge detection is not required to be performed on the whole image to be detected, so that the calculated amount is reduced, and the detection efficiency is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application,
the determining module 203 is specifically configured to perform histogram statistics on each target edge detection map in the target edge detection map set to obtain M statistical results, where the histogram statistics is used to perform statistics in a horizontal direction and a vertical direction on the target edge detection map;
respectively judging whether each statistical result in the M statistical results meets the station caption region extraction condition;
if at least one statistical result in the M statistical results meets the station caption region extraction condition, determining that the station caption region exists in the target edge detection map set;
and if no statistical result in the M statistical results meets the station caption region extraction condition, determining that the station caption region does not exist in the target edge detection map set.
Secondly, in the embodiment of the present application, a method for detecting whether a station caption area exists is provided. Before detecting whether a station caption image exists in a video to be detected, histogram statistics can be carried out on each target edge detection image in a target edge detection image set to obtain M statistical results, whether each statistical result in the M statistical results meets a station caption region extraction condition is judged respectively, if at least one statistical result in the M statistical results meets the station caption region extraction condition, the fact that the station caption region exists in the target edge detection image set is determined, and if no statistical result in the M statistical results meets the station caption region extraction condition, the fact that the station caption region does not exist in the target edge detection image set is determined. By the method, under the condition of giving the statistical threshold, histogram statistics is carried out on the target edge detection image respectively along the horizontal direction and the vertical direction, and the part of the histogram smaller than the statistical threshold is removed, so that a compact station caption area can be obtained, and the feasibility of the scheme is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application,
the determining module 203 is specifically configured to determine an image score of an image to be matched in the station caption area according to the P frame image to be detected;
and if the image score of the image to be matched is greater than or equal to the station caption image threshold value, determining that the station caption image exists in the video to be detected.
Secondly, in the embodiment of the application, a method for detecting whether a station caption image exists in a video to be detected is provided, firstly, the image score of an image to be matched in a station caption area is determined according to a P frame image to be detected, and if the image score of the image to be matched is larger than or equal to a station caption image threshold value, the station caption image exists in the video to be detected. By the mode, for a video to be detected, the number of the detected station caption images is at least 0, namely, no station caption image exists in the video to be detected, the number of the detected station caption images is at most one station caption image in each station caption area, and the image to be matched, of which the image score is greater than or equal to the threshold value of the station caption image, is determined as the station caption image, so that the detection error rate can be effectively reduced, and a stable station caption image can be obtained.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application,
the identification module 204 is specifically configured to obtain a local feature set to be matched of the station caption image, where the local feature set to be matched includes at least one local feature to be matched;
acquiring a local feature set of each preset icon in the preset icon set, wherein the local feature set comprises at least one local feature;
determining a candidate logo image set from the preset icon set through a k nearest neighbor algorithm according to the local feature set to be matched and the local feature set of each preset icon, wherein the candidate logo image set comprises N candidate logo images, and N is an integer greater than or equal to 1;
comparing the station caption image with each candidate station caption image in the candidate station caption image set to obtain a matching point set of each candidate station caption image, wherein the matching point set comprises at least one matching point which represents a characteristic point of successful matching between the candidate station caption image and the station caption image;
calculating to obtain N similarity scores according to the matching point set of each candidate station caption image and the local feature set of each candidate station caption image;
and determining a target station caption image of the video to be detected from the candidate station caption image set according to the maximum value of the similarity scores in the N similarity scores.
Secondly, in the embodiment of the application, a way of obtaining a station logo recognition result is provided, which includes obtaining a local feature set to be matched of a station logo image, obtaining a local feature set of each preset icon in the preset icon set, determining a candidate station logo image set from the preset icon set through a k-nearest neighbor algorithm, comparing the station logo image with each candidate station logo image in the candidate station logo image set to obtain a matching point set of each candidate station logo image, calculating to obtain N similarity scores according to the matching point set of each candidate station logo image and the local feature set of each candidate station logo image, and determining a target station logo image of a video to be detected from the candidate station logo image set according to the N similarity scores. By adopting the mode, the station caption identification result can be obtained by adopting rough matching and fine matching, so that the calculation amount of matching is reduced, and meanwhile, the detection accuracy can be effectively improved, thereby improving the reliability of the scheme.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application,
the identification module 204 is specifically configured to 1) obtain one local feature to be matched in the local feature set to be matched;
2) Acquiring K candidate features closest to the local feature to be matched from the local feature set of each preset icon according to the local feature to be matched, wherein K is an integer greater than or equal to 1;
repeatedly executing the step 1) to the step 2) until the candidate feature of each local feature to be matched in the local feature set to be matched is obtained;
and acquiring the candidate station caption image set according to the candidate characteristics of each local characteristic to be matched in the local characteristic set to be matched.
Thirdly, in the embodiment of the present application, a method for determining a candidate station caption image set is provided, that is, one to-be-matched local feature in the to-be-matched local feature set is obtained, then K candidate features closest to the to-be-matched local feature are obtained from the local feature set of each preset icon according to the to-be-matched local feature, the above steps are repeatedly performed until the candidate feature of each to-be-matched local feature in the to-be-matched local feature set is obtained, and finally, the candidate station caption image set is obtained according to the candidate feature of each to-be-matched local feature in the to-be-matched local feature set. By the method, the coarse matching result is obtained by using the k-nearest neighbor algorithm, and the candidate station caption image set is obtained, so that the most possible candidate station caption images can be quickly screened out, all preset icons in the preset icon set are prevented from being traversed, and the station caption detection efficiency is improved.
Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application,
the identification module 204 is specifically configured to match each local feature to be matched in the station caption image with each local feature of each candidate station caption image to obtain a projection matrix, where the projection matrix represents a position coordinate of the station caption image after projection;
and determining a matching point set of each candidate station caption image according to the station caption image and the projection matrix.
In the embodiment of the present application, a method for determining the matching point set of each candidate station caption image is provided: each local feature to be matched in the station caption image may be matched with each local feature of each candidate station caption image to obtain a projection matrix, and the matching point set of each candidate station caption image is then determined according to the station caption image and the projection matrix. By this method, a correctly matched point set can be obtained from the relatively small candidate station caption image set, so that the similarity score between the station caption image and each candidate station caption image can be calculated, improving the accuracy of station caption matching.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application,
the identifying module 204 is specifically configured to calculate the similarity score in the following manner:
score = A / B

wherein score represents the similarity score, A represents the union area corresponding to the matching point set of the candidate station caption image, and B represents the union area corresponding to the local feature set of the candidate station caption image.
Further, in the embodiment of the application, a specific way for calculating the similarity score is provided, and a reasonable and reliable implementation way is provided for implementation of the scheme through the way, so that the feasibility and the operability of the scheme are improved.
Optionally, on the basis of the embodiment corresponding to fig. 12, please refer to fig. 14, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application, the icon identifying apparatus 20 further includes a processing module 207 and an extracting module 206;
the obtaining module 201 is further configured to obtain a to-be-processed video set before the identifying module 204 matches the icon with a preset icon set and obtains an icon identification result of the to-be-detected video, where the to-be-processed video set includes at least one to-be-processed video;
the detection module 202 is further configured to detect each to-be-processed video in the to-be-processed video set acquired by the acquisition module 201, so as to obtain a to-be-processed logo image set, where the to-be-processed logo image set includes at least one to-be-processed logo image, and at least one to-be-processed logo image in the to-be-processed logo image set corresponds to the same identifier;
the processing module 207 is configured to process the station caption image to be processed in the station caption image set to be processed, which is detected by the detecting module 202, to obtain the preset icon set;
the extracting module 206 is configured to perform feature extraction on each preset icon in the preset icon set obtained by processing by the processing module 207 to obtain a local feature set of each preset icon, where the local feature set includes at least one local feature, and the local feature includes a feature point position coordinate and feature information.
Furthermore, in the embodiment of the present application, a method for establishing a preset icon set is provided, that is, first, a to-be-processed video set needs to be obtained, then, each to-be-processed video in the to-be-processed video set is detected to obtain a to-be-processed logo image set, the to-be-processed logo image set includes at least one to-be-processed logo image, at least one to-be-processed logo image in the to-be-processed logo image set corresponds to the same identifier, then, the to-be-processed logo images in the to-be-processed logo image set are processed to obtain a preset icon set, and finally, each preset icon in the preset icon set is subjected to feature extraction to obtain a local feature set of each preset icon. Through the method, the preset icon set can be established in an off-line state, namely, the station caption index library is established, so that subsequent matching operation is facilitated, at least one station caption image to be processed corresponds to the same identifier, namely, the preset icon set can store the preset icons of the same station caption in different display forms, the reliability and flexibility of the scheme are improved, and the matching success rate is improved.
Optionally, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the icon identifying apparatus 20 provided in the embodiment of the present application,
the processing module 207 is specifically configured to, when a first processing instruction is received, remove a first to-be-processed station caption image from the to-be-processed station caption image set according to the first processing instruction, where the first processing instruction carries an identifier of the first to-be-processed station caption image;
and when a second processing instruction is received, adjusting a second station caption image to be processed in the station caption image set to be processed according to the second processing instruction to obtain a preset icon in the preset icon set, wherein the second processing instruction carries an identifier of the second station caption image to be processed.
Still further, in the embodiment of the present application, a method for processing a logo image to be processed is provided. And when a first processing instruction is received, removing the first station caption image to be processed from the station caption image set to be processed according to the first processing instruction. And when a second processing instruction is received, adjusting a second station caption image to be processed in the station caption image set to be processed according to the second processing instruction to obtain a preset icon in the preset icon set. Through the mode, in the process of establishing the preset icon set, the to-be-processed station caption images can be cut according to actual conditions, the to-be-processed station caption images which are not in line with requirements can be removed, therefore, more regular preset icons can be obtained, subsequent matching is facilitated, meanwhile, the to-be-processed station caption images corresponding to the same identification can be added, and therefore the diversity of the scheme is improved.
Fig. 15 is a schematic diagram of a server 300 according to an embodiment of the present application. The server 300 may vary considerably in configuration and performance, and may include one or more central processing units (CPUs) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing applications 342 or data 344. The memory 332 and the storage media 330 may be transient storage or persistent storage. The program stored on a storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processing unit 322 may be configured to communicate with the storage medium 330 to execute the series of instruction operations in the storage medium 330 on the server 300.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 15.
In the embodiment of the present application, the CPU 322 included in the server further has the following functions:
randomly obtaining P frames of images to be detected from a video to be detected, wherein the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
performing edge detection on an image to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
determining an icon area according to the target edge detection map set;
determining an icon in the video to be detected according to the P frames of images to be detected and the icon area;
and matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon.
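The first two of these steps can be illustrated with a short sketch. This is a minimal sketch assuming OpenCV and NumPy, with Canny as the edge detector and per-pixel averaging as the fusion rule; the embodiment itself does not fix a particular edge operator or fusion method.

```python
# Sketch of steps 1-2: random P-frame sampling and edge-map fusion.
import random
import cv2
import numpy as np

def sample_and_fuse(video_path, p=20):
    cap = cv2.VideoCapture(video_path)
    q = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))               # Q frames in the video
    edge_maps = []
    for idx in sorted(random.sample(range(q), min(p, q))):   # random P frames
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edge_maps.append(cv2.Canny(gray, 100, 200))          # one edge map per frame
    cap.release()
    # Fusing the P edge maps: edges that persist across randomly chosen
    # frames (the logo) survive the average, while moving content washes out.
    return np.mean(np.stack(edge_maps), axis=0).astype(np.uint8)
```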
As shown in fig. 16, for convenience of description, only the portions related to the embodiments of the present application are shown; for undisclosed technical details, please refer to the method portion of the embodiments of the present application. The terminal device may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, an in-vehicle computer, and the like; the following takes a mobile phone as an example:
fig. 16 is a block diagram illustrating a partial structure of the mobile phone related to the terminal device provided in an embodiment of the present application. Referring to fig. 16, the mobile phone includes: radio frequency (RF) circuitry 410, a memory 420, an input unit 430, a display unit 440, a sensor 450, audio circuitry 460, a wireless fidelity (WiFi) module 470, a processor 480, and a power supply 490. Those skilled in the art will appreciate that the mobile phone structure shown in fig. 16 is not limiting; it may include more or fewer components than those shown, combine some components, or arrange the components differently.
The following describes each component of the mobile phone in detail with reference to fig. 16:
The RF circuit 410 may be used for receiving and transmitting signals during a message transmission or a call; in particular, downlink information from a base station is received and passed to the processor 480 for processing, and uplink data is transmitted to the base station. In general, the RF circuitry 410 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 410 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), email, short message service (SMS), etc.
The memory 420 may be used to store software programs and modules, and the processor 480 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 420. The memory 420 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone. Further, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 430 may include a touch panel 431 and other input devices 432. The touch panel 431, also called a touch screen, can collect touch operations of the user on or near it (for example, operations performed on or near the touch panel 431 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 431 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 480, and receives and executes commands sent by the processor 480. In addition, the touch panel 431 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 430 may include other input devices 432 in addition to the touch panel 431. In particular, the other input devices 432 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 440 may be used to display information input by the user or information provided to the user and various menus of the cellular phone. The display unit 440 may include a display panel 441, and optionally, the display panel 441 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 431 can cover the display panel 441, and when the touch panel 431 detects a touch operation on or near the touch panel 431, the touch operation is transmitted to the processor 480 to determine the type of the touch event, and then the processor 480 provides a corresponding visual output on the display panel 441 according to the type of the touch event. Although the touch panel 431 and the display panel 441 are shown in fig. 16 as two separate components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 431 and the display panel 441 may be integrated to implement the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 450, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 441 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 441 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the posture of the mobile phone (such as switching between landscape and portrait orientation, related games, and magnetometer posture calibration), vibration-recognition-related functions (such as a pedometer and tapping), and the like; other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor may also be configured on the mobile phone and are not further described here.
The audio circuit 460, a speaker 461, and a microphone 462 may provide an audio interface between the user and the mobile phone. The audio circuit 460 may transmit the electrical signal converted from received audio data to the speaker 461, which converts it into a sound signal for output; on the other hand, the microphone 462 converts a collected sound signal into an electrical signal, which is received by the audio circuit 460 and converted into audio data; the audio data is then processed by the processor 480 and transmitted via the RF circuit 410 to, for example, another mobile phone, or output to the memory 420 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 470, the mobile phone can help the user receive and send emails, browse web pages, access streaming media, and the like, providing wireless broadband internet access. Although fig. 16 shows the WiFi module 470, it is understood that it is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 480 is a control center of the mobile phone, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 420 and calling data stored in the memory 420. Optionally, processor 480 may include one or more processing units; optionally, the processor 480 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 480.
The mobile phone further includes a power supply 490 (e.g., a battery) for supplying power to the components, and optionally, the power supply may be logically connected to the processor 480 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In this embodiment, the processor 480 included in the terminal device further has the following functions:
randomly obtaining P frames of images to be detected from a video to be detected, wherein the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
performing edge detection on an image to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
determining an icon area according to the target edge detection map set;
determining an icon in the video to be detected according to the P frames of images to be detected and the icon area;
and matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (14)

1. A method of icon recognition, comprising:
randomly acquiring P frames of images to be detected from a video to be detected, wherein the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
performing edge detection on an image to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
determining an icon area according to the target edge detection map set, wherein the icon area is a station caption area;
determining an image score of an image to be matched in the station caption area according to the P frames of images to be detected, and if the image score of the image to be matched is greater than or equal to a station caption image threshold, determining that a station caption image exists in the video to be detected;
and matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon.
2. The method according to claim 1, wherein before performing edge detection on the image to be detected in the P-frame image to be detected to obtain the target edge detection map set, the method further comprises:
dividing each frame of image to be detected in the P frames of images to be detected to obtain a plurality of image areas corresponding to each frame of image to be detected;
and extracting M image areas corresponding to each frame of image to be detected from a plurality of image areas corresponding to each frame of image to be detected, wherein the M image areas are used for edge detection.
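A minimal sketch of the division and extraction in claim 2 follows, assuming the M extracted regions are the four frame corners, where broadcaster logos typically sit; the claim itself does not prescribe which regions are kept.

```python
# Sketch for claim 2: divide each frame and keep M regions for edge detection.
def extract_candidate_regions(frame, frac=0.25):
    """Return M = 4 corner regions; each spans frac x frac of the frame."""
    h, w = frame.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return [
        frame[:dh, :dw],          # top-left
        frame[:dh, w - dw:],      # top-right
        frame[h - dh:, :dw],      # bottom-left
        frame[h - dh:, w - dw:],  # bottom-right
    ]
```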
3. The method according to claim 2, wherein the performing edge detection on the image to be detected in the P-frame image to be detected to obtain a target edge detection map set comprises:
performing edge detection on a target image area in the M image areas of each frame of image to be detected to obtain P edge detection images corresponding to the target image area, wherein the target image area belongs to any one of the M image areas;
determining a target edge detection image corresponding to the target image area according to the P edge detection images corresponding to the target image area;
and when the target edge detection images corresponding to the M image areas are acquired, acquiring the target edge detection image set.
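A minimal sketch of the fusion in claim 3, assuming per-pixel averaging of the P edge maps of each region; the claim only requires that each target edge detection map be derived from the P per-frame edge maps of its region.

```python
# Sketch for claim 3: fuse P edge maps per region into M target edge maps.
import numpy as np

def fuse_target_edge_maps(per_frame_edge_maps):
    """per_frame_edge_maps: list over P frames; each entry is a list of M
    edge maps (one per image region, same region order in every frame)."""
    m = len(per_frame_edge_maps[0])
    return [
        np.mean(np.stack([frame_maps[i] for frame_maps in per_frame_edge_maps]),
                axis=0).astype(np.uint8)
        for i in range(m)
    ]
```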
4. The method of claim 1, wherein the determining an icon area according to the target edge detection map set comprises:
performing histogram statistics on each target edge detection map in the target edge detection map set to obtain M statistical results, wherein the histogram statistics are computed in the horizontal direction and the vertical direction of the target edge detection map;
respectively judging whether each statistical result in the M statistical results meets a station caption region extraction condition;
if at least one statistical result in the M statistical results meets a statistical threshold, determining that a station caption area exists in the target edge detection map set;
and if no statistical result in the M statistical results meets the statistical threshold, determining that no station caption area exists in the target edge detection map set.
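A minimal sketch of the histogram statistics in claim 4; the concrete extraction condition (a fixed fraction of edge pixels per row and column) is an assumption, since the claim only requires comparing the statistics against a threshold.

```python
# Sketch for claim 4: horizontal/vertical histograms over a fused edge map.
import numpy as np

def station_caption_region(target_edge_map, frac=0.2):
    h, w = target_edge_map.shape
    row_hist = (target_edge_map > 0).sum(axis=1)   # statistics per row
    col_hist = (target_edge_map > 0).sum(axis=0)   # statistics per column
    rows = np.where(row_hist > frac * w)[0]
    cols = np.where(col_hist > frac * h)[0]
    if rows.size == 0 or cols.size == 0:
        return None                                 # no station caption area
    return rows[0], rows[-1], cols[0], cols[-1]     # logo bounding box
```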
5. The method according to claim 1, wherein the matching the icon with a preset icon set to obtain the icon recognition result of the video to be detected comprises:
acquiring a local feature set to be matched of a station caption image, wherein the local feature set to be matched comprises at least one local feature to be matched;
acquiring a local feature set of each preset icon in the preset icon set, wherein the local feature set comprises at least one local feature;
determining a candidate station caption image set from the preset icon set through a k-nearest neighbor algorithm according to the local feature set to be matched and the local feature set of each preset icon, wherein the candidate station caption image set comprises N candidate station caption images, and N is an integer greater than or equal to 1;
comparing the station caption image with each candidate station caption image in the candidate station caption image set to obtain a matching point set of each candidate station caption image, wherein the matching point set comprises at least one matching point, and a matching point represents a feature point successfully matched between the candidate station caption image and the station caption image;
calculating to obtain N similarity scores according to the matching point set of each candidate station caption image and the local feature set of each candidate station caption image;
and determining a target station caption image of the video to be detected from the candidate station caption image set according to the maximum value of the similarity scores in the N similarity scores.
6. The method according to claim 5, wherein the determining a candidate station caption image set from the preset icon set through a k-nearest neighbor algorithm according to the local feature set to be matched and the local feature set of each preset icon comprises:
1) Acquiring a local feature to be matched in the local feature set to be matched;
2) Acquiring K candidate features closest to the local feature to be matched from the local feature set of each preset icon according to the local feature to be matched, wherein K is an integer greater than or equal to 1;
repeatedly executing step 1) to step 2) until the candidate features of each local feature to be matched in the local feature set to be matched are obtained;
and acquiring the candidate station caption image set according to the candidate characteristic of each local characteristic to be matched in the local characteristic set to be matched.
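A minimal sketch of the k-nearest-neighbor retrieval in claim 6, assuming ORB descriptors matched with OpenCV's brute-force Hamming matcher; the ratio test used to filter votes is an addition for robustness, not part of the claim.

```python
# Sketch for claim 6: retrieve candidate station caption images by k-NN.
import cv2

def candidate_station_captions(query_descriptors, preset_icon_index, k=2, n=5):
    """preset_icon_index: dict mapping logo id -> ORB descriptor array."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    votes = {}
    for logo_id, descriptors in preset_icon_index.items():
        for pair in matcher.knnMatch(query_descriptors, descriptors, k=k):
            # Keep a vote only when the best neighbour is clearly the best.
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                votes[logo_id] = votes.get(logo_id, 0) + 1
    # The N most-voted preset icons form the candidate set.
    return sorted(votes, key=votes.get, reverse=True)[:n]
```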
7. The method according to claim 5, wherein the comparing the station caption image with each candidate station caption image in the candidate station caption image set to obtain a matching point set of each candidate station caption image comprises:
matching each local feature to be matched in the station caption image with each local feature of each candidate station caption image to obtain a projection matrix, wherein the projection matrix represents the position coordinate of the station caption image after projection;
and determining a matching point set of each candidate station caption image according to the station caption image and the projection matrix.
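A minimal sketch of claim 7, assuming the projection matrix is a homography estimated with RANSAC; the RANSAC inliers then serve as the matching point set.

```python
# Sketch for claim 7: projection matrix and matching point set.
import cv2
import numpy as np

def matching_point_set(logo_points, candidate_points):
    """Both arguments: Nx2 arrays of matched feature coordinates, N >= 4."""
    H, mask = cv2.findHomography(np.float32(logo_points),
                                 np.float32(candidate_points),
                                 cv2.RANSAC, 5.0)
    if H is None:
        return None, np.empty((0, 2), np.float32)
    inliers = np.float32(candidate_points)[mask.ravel() == 1]
    return H, inliers   # H maps logo-image coordinates to their projection
```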
8. The method according to claim 7, wherein the calculating N similarity scores according to the matching point set of each candidate station caption image and the local feature set of each candidate station caption image comprises:
the similarity score is calculated as follows:

score = A / B

wherein score represents the similarity score, A represents the union of areas corresponding to the matching point set of the candidate station caption image, and B represents the union of areas corresponding to the local feature set of the candidate station caption image.
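A minimal sketch of the score in claim 8, using convex hulls to stand in for the "area union sets", which the claim does not define precisely.

```python
# Sketch for claim 8: score = A / B over convex-hull areas (an assumption).
import cv2
import numpy as np

def similarity_score(matching_points, feature_points):
    if len(matching_points) < 3 or len(feature_points) < 3:
        return 0.0                     # a hull needs at least three points
    area_a = cv2.contourArea(cv2.convexHull(np.float32(matching_points)))
    area_b = cv2.contourArea(cv2.convexHull(np.float32(feature_points)))
    return area_a / area_b if area_b > 0 else 0.0
```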
9. The method according to any one of claims 1 to 8, wherein before the icon is matched with a preset icon set and an icon recognition result of the video to be detected is obtained, the method further comprises:
acquiring a video set to be processed, wherein the video set to be processed comprises at least one video to be processed;
detecting each to-be-processed video in the to-be-processed video set to obtain a to-be-processed station caption image set, wherein the to-be-processed station caption image set comprises at least one to-be-processed station caption image, and at least one to-be-processed station caption image in the set corresponds to the same identifier;
processing the to-be-processed station caption images in the to-be-processed station caption image set to obtain the preset icon set;
and extracting features of each preset icon in the preset icon set to obtain a local feature set of each preset icon, wherein the local feature set comprises at least one local feature, and the local feature comprises feature point position coordinates and feature information.
10. The method according to claim 9, wherein the processing the station caption image to be processed in the station caption image set to be processed to obtain the preset icon set includes:
when a first processing instruction is received, removing a first station logo image to be processed from the station logo image set to be processed according to the first processing instruction, wherein the first processing instruction carries an identifier of the first station logo image to be processed;
and when a second processing instruction is received, adjusting a second station caption image to be processed in the station caption image set to be processed according to the second processing instruction to obtain a preset icon in the preset icon set, wherein the second processing instruction carries an identifier of the second station caption image to be processed.
11. An icon recognition apparatus, comprising:
the device comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for randomly acquiring P frames of images to be detected from a video to be detected, the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
the detection module is used for carrying out edge detection on the image to be detected in the P frame image to be detected acquired by the acquisition module to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
the determining module is used for determining an icon area according to the target edge detection map set obtained by the detection module, where the icon area is a station caption area;
the determining module is further configured to determine an image score of an image to be matched in the station caption area according to the P frames of images to be detected, and if the image score of the image to be matched is greater than or equal to a station caption image threshold, determine that a station caption image exists in the video to be detected;
and the identification module is used for matching the icon determined by the determination module with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon.
12. A server, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory and comprises the following steps:
randomly obtaining P frames of images to be detected from a video to be detected, wherein the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
performing edge detection on an image to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
determining an icon area according to the target edge detection map set, wherein the icon area is a station caption area;
determining an image score of an image to be matched in the station caption area according to the P frames of images to be detected, and if the image score of the image to be matched is greater than or equal to a station caption image threshold, determining that a station caption image exists in the video to be detected;
matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
13. A terminal device, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory and comprises the following steps:
randomly obtaining P frames of images to be detected from a video to be detected, wherein the video to be detected comprises Q frames of video images, Q is an integer greater than 1, and P is an integer greater than or equal to 1 and less than or equal to Q;
performing edge detection on an image to be detected in the P frames of images to be detected to obtain a target edge detection map set, wherein the target edge detection map set comprises M target edge detection maps, each target edge detection map is obtained by fusing P edge detection maps, and M is an integer greater than or equal to 1;
determining an icon area according to the target edge detection map set, wherein the icon area is a station caption area;
determining an image score of an image to be matched in the station caption area according to the P frames of images to be detected, and if the image score of the image to be matched is greater than or equal to a station caption image threshold, determining that a station caption image exists in the video to be detected;
matching the icon with a preset icon set to obtain an icon identification result of the video to be detected, wherein the preset icon set comprises at least one preset icon;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
14. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 10.
CN201910228432.0A 2019-03-25 2019-03-25 Icon identification method and related device Active CN109977859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910228432.0A CN109977859B (en) 2019-03-25 2019-03-25 Icon identification method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910228432.0A CN109977859B (en) 2019-03-25 2019-03-25 Icon identification method and related device

Publications (2)

Publication Number Publication Date
CN109977859A CN109977859A (en) 2019-07-05
CN109977859B (en) 2022-11-18

Family

ID=67080461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910228432.0A Active CN109977859B (en) 2019-03-25 2019-03-25 Icon identification method and related device

Country Status (1)

Country Link
CN (1) CN109977859B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533026A (en) * 2019-07-18 2019-12-03 西安电子科技大学 The competing image digitization of electricity based on computer vision and icon information acquisition methods
CN110851349B (en) * 2019-10-10 2023-12-26 岳阳礼一科技股份有限公司 Page abnormity display detection method, terminal equipment and storage medium
CN110913205B (en) * 2019-11-27 2022-07-29 腾讯科技(深圳)有限公司 Video special effect verification method and device
CN112906728B (en) * 2019-12-04 2023-08-25 杭州海康威视数字技术股份有限公司 Feature comparison method, device and equipment
CN111444915A (en) * 2020-03-26 2020-07-24 山东云缦智能科技有限公司 Television station logo detection method based on edge detection
CN111680685B (en) * 2020-04-14 2023-06-06 上海高仙自动化科技发展有限公司 Positioning method and device based on image, electronic equipment and storage medium
CN111523608B (en) * 2020-04-30 2023-04-18 上海顺久电子科技有限公司 Image processing method and device
CN111739045A (en) * 2020-06-19 2020-10-02 京东方科技集团股份有限公司 Key frame detection method and device and online detection system
CN111950424B (en) * 2020-08-06 2023-04-07 腾讯科技(深圳)有限公司 Video data processing method and device, computer and readable storage medium
CN112215862B (en) * 2020-10-12 2024-01-26 虎博网络技术(上海)有限公司 Static identification detection method, device, terminal equipment and readable storage medium
CN112926420B (en) * 2021-02-09 2022-11-08 海信视像科技股份有限公司 Display device and menu character recognition method
CN113421278B (en) * 2021-06-22 2023-08-15 咪咕互动娱乐有限公司 Range detection method, device, equipment and storage medium based on edge detection
CN115171217B (en) * 2022-07-27 2023-03-03 北京拙河科技有限公司 Action recognition method and system under dynamic background

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011203790A (en) * 2010-03-24 2011-10-13 Kddi Corp Image verification device
CN102446272A (en) * 2011-09-05 2012-05-09 Tcl集团股份有限公司 Method and device for segmenting and recognizing station caption as well as television comprising device
CN104066009B (en) * 2013-10-31 2015-10-14 腾讯科技(深圳)有限公司 program identification method, device, terminal, server and system
CN103714314B (en) * 2013-12-06 2017-04-19 安徽大学 Television video station caption identification method combining edge and color information
CN104537376B (en) * 2014-11-25 2018-04-27 深圳创维数字技术有限公司 One kind identification platform calibration method and relevant device, system
CN105389827A (en) * 2015-12-24 2016-03-09 Tcl集团股份有限公司 Method and device for acquiring television station logo region
CN106446850B (en) * 2016-09-30 2020-05-19 中国传媒大学 Station caption identification method and device
CN106507188A (en) * 2016-11-25 2017-03-15 南京中密信息科技有限公司 A kind of video TV station symbol recognition device and method of work based on convolutional neural networks
CN107122737B (en) * 2017-04-26 2020-07-31 聊城大学 Automatic detection and identification method for road traffic signs

Also Published As

Publication number Publication date
CN109977859A (en) 2019-07-05

Similar Documents

Publication Publication Date Title
CN109977859B (en) Icon identification method and related device
US10169639B2 (en) Method for fingerprint template update and terminal device
CN111368934B (en) Image recognition model training method, image recognition method and related device
CN106778440B (en) Two-dimensional code identification method and device
CN106126015B (en) A kind of application program launching control method and terminal
EP3716132A1 (en) Code scanning method, code scanning device and mobile terminal
CN108334539B (en) Object recommendation method, mobile terminal and computer-readable storage medium
CN108875451B (en) Method, device, storage medium and program product for positioning image
US11178450B2 (en) Image processing method and apparatus in video live streaming process, and storage medium
CN107622193B (en) fingerprint unlocking method and related product
WO2019011098A1 (en) Unlocking control method and relevant product
CN108845742B (en) Image picture acquisition method and device and computer readable storage medium
CN110347858B (en) Picture generation method and related device
CN111672109A (en) Game map generation method, game testing method and related device
CN112488914A (en) Image splicing method, device, terminal and computer readable storage medium
CN112703534A (en) Image processing method and related product
CN112188058A (en) Video shooting method, mobile terminal and computer storage medium
CN108062370B (en) Application program searching method and mobile terminal
CN107832714B (en) Living body identification method and device and storage equipment
US20190019027A1 (en) Method and mobile terminal for processing image and storage medium
CN107507143B (en) Image restoration method and terminal
CN112437231A (en) Image shooting method and device, electronic equipment and storage medium
CN110336917B (en) Picture display method and device, storage medium and terminal
CN107622235B (en) Fingerprint unlocking method and related product
CN107734049B (en) Network resource downloading method and device and mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant