CN116958180A - Image processing method, device, apparatus, storage medium, and program

Info

Publication number: CN116958180A
Application number: CN202210416363.8A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Prior art keywords: detection, super, frame, areas, area
Inventors: 梁晓云, 高永强, 杨萍
Applicant and current assignee: Beijing ByteDance Network Technology Co Ltd

Classifications

    • G06T7/13 Edge detection
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/20221 Image fusion; Image merging
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure provide an image processing method, an apparatus, a device, a storage medium, and a program. The method includes: acquiring an interface image corresponding to a first user interface, wherein the interface image includes at least one display object; sliding at least one detection frame over the interface image to determine N detection areas in the interface image and respectively determine detection results of the N detection areas, wherein the detection result of each detection area includes: the coordinate information of the detection area, a first probability that the detection area is the area occupied by a super-frame object and/or a second probability that the detection area is the area occupied by the super-frame part of the super-frame object; and determining, according to the detection results of the N detection areas, a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame part of the super-frame object in the interface image. Through this process, the detection efficiency for super-frame objects can be improved, and labor cost and time cost are saved.

Description

Image processing method, device, apparatus, storage medium, and program
Technical Field
The embodiments of the disclosure relate to the technical field of artificial intelligence, and in particular to an image processing method, a device, an apparatus, a storage medium, and a program.
Background
With the development of technology, human-computer interaction is becoming more and more common. In the process of a user interacting with an application program in an electronic device, the operation interface that the user directly faces is generally called a user interface (UI).
The user interface includes various interface elements, such as: views, windows, dialog boxes, menus, buttons, tabs, and the like. Text may be displayed on some of these interface elements. In some cases, a text super-frame (i.e., text that exceeds the boundary of its interface element) may occur. This may affect the aesthetics of the interface on the one hand, and may also obscure other interface elements on the other hand.
At present, whether text super-frames exist in a user interface is mainly checked by the naked eye, so the detection efficiency is low and a great deal of labor cost and time cost is required.
Disclosure of Invention
The embodiments of the disclosure provide an image processing method, a device, an apparatus, a storage medium, and a program, which are used to improve the detection efficiency for super-frame objects, thereby reducing labor cost and time cost.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring an interface image corresponding to a first user interface, wherein the interface image comprises at least one display object;
sliding at least one detection frame over the interface image to determine N detection areas in the interface image, and respectively determining detection results of the N detection areas; the detection result of each detection area includes: the coordinate information of the detection area, a first probability that the detection area is the area occupied by a super-frame object and/or a second probability that the detection area is the area occupied by the super-frame part of the super-frame object; where N is an integer greater than 1;
according to the detection results of the N detection areas, determining a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame part of the super-frame object in the interface image; the super-frame object is a display object of which at least part of the area exceeds a preset display boundary in the interface image, and the super-frame part is a part of the super-frame object exceeding the preset display boundary.
In a second aspect, an embodiment of the present disclosure provides an image processing apparatus including:
an acquisition module, configured to acquire an interface image corresponding to a first user interface, wherein the interface image includes at least one display object;
a detection module, configured to slide at least one detection frame over the interface image to determine N detection areas in the interface image and respectively determine detection results of the N detection areas; the detection result of each detection area includes: the coordinate information of the detection area, a first probability that the detection area is the area occupied by a super-frame object and/or a second probability that the detection area is the area occupied by the super-frame part of the super-frame object; where N is an integer greater than 1;
a determining module, configured to determine, according to the detection results of the N detection areas, a first target area occupied by the super-frame object and/or a second target area occupied by a super-frame part of the super-frame object in the interface image; the super-frame object is a display object of which at least a partial area exceeds a preset display boundary in the interface image, and the super-frame part is the part of the super-frame object exceeding the preset display boundary.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor and a memory;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory, causing the processor to perform the image processing method as described in the first aspect above.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the image processing method as described in the first aspect above.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the image processing method as described in the first aspect above.
The embodiments of the disclosure provide an image processing method, an apparatus, a device, a storage medium, and a program. The method includes: acquiring an interface image corresponding to a first user interface, wherein the interface image includes at least one display object; sliding at least one detection frame over the interface image to determine N detection areas in the interface image and respectively determine detection results of the N detection areas, wherein the detection result of each detection area includes: the coordinate information of the detection area, a first probability that the detection area is the area occupied by a super-frame object and/or a second probability that the detection area is the area occupied by the super-frame part of the super-frame object; and determining, according to the detection results of the N detection areas, a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame part of the super-frame object in the interface image. In this process, detection processing is performed on the interface image corresponding to the first user interface, so that automatic detection of super-frame objects in the first user interface is realized, the detection efficiency is improved, and labor cost and time cost are saved.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, a brief description will be given below of the drawings that are needed in the embodiments or the description of the prior art, it being obvious that the drawings in the following description are some embodiments of the present disclosure, and that other drawings may be obtained from these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is a schematic illustration of a user interface provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a process for detecting a super frame object in a user interface according to an embodiment of the present disclosure;
fig. 3 is a schematic flow chart of an image processing method according to an embodiment of the disclosure;
fig. 4 is a schematic diagram of detecting an interface image by using a detection frame according to an embodiment of the disclosure;
FIG. 5 is a schematic diagram of a system architecture for detecting a super frame object in a user interface according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a preset model according to an embodiment of the disclosure;
FIG. 7 is a schematic structural diagram of another preset model according to an embodiment of the present disclosure;
fig. 8 is a flowchart of another image processing method according to an embodiment of the disclosure;
FIG. 9 is a schematic diagram of an image processing procedure provided in an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
The technical solutions provided by the embodiments of the present disclosure can be used to detect super-frame objects in a user interface (UI). The user interface refers to the operation interface that a user directly faces in the process of interacting with an application program in an electronic device. One or more interface elements (Interface Element) may be included in the user interface. An interface element refers to an element included in the user interface that meets user interaction requirements, including but not limited to: views, windows, dialog boxes, menus, buttons, tabs, and the like.
In some scenarios, an application may be published online in multiple countries around the world. When the application is published in a different country, the text in the user interface needs to be translated into the language of the corresponding country. Because the length of the same text varies across languages, the translated text may exceed the boundary of its interface element, i.e., a text super-frame may occur. This may affect the aesthetics of the interface on the one hand, and may also obscure other interface elements on the other hand.
For ease of understanding, the following is illustrated in connection with fig. 1. Fig. 1 is a schematic diagram of a user interface provided in an embodiment of the present disclosure. Taking a game interface as an example, as shown in fig. 1, the user interface includes 8 interface elements, 101 to 108 respectively. The interface elements 101, 105, 106, 107 are labels for static display of text. The interface elements 102, 103, 104, 108 are buttons that can be clicked by the user.
With continued reference to FIG. 1, text may be displayed/carried on the interface elements of the user interface. For example, the text displayed on interface element 101 is "game props", the text displayed on interface element 102 is "prop A", the text displayed on interface element 105 is "prop A has the characteristics of A1 and A2", and so on.
In some cases, the text displayed on an interface element may exceed the boundary of the interface element. For example, in FIG. 1, the text displayed on interface element 106 is "prop B has the characteristics of B1, is used in the case of B2", which extends beyond the boundary of interface element 106. As another example, in FIG. 1, the text displayed on interface element 108 is "click to learn more about props", which extends beyond the boundary of interface element 108.
In the embodiments of the disclosure, the phenomenon of text exceeding the boundary of its interface element is called the text super-frame phenomenon, and text for which this phenomenon exists is referred to as super-frame text. For example, in FIG. 1, the text "prop B has the characteristics of B1, is used in the case of B2" and the text "click to learn more about props" may both be referred to as super-frame text.
It should be understood that, in addition to text, other content may be displayed/carried on the interface elements of the user interface, such as: images, icons, other interface elements, etc. In the embodiments of the present disclosure, the content displayed/carried on an interface element is collectively referred to as a display object. Any display object may exceed a preset display boundary (e.g., the boundary of the interface element that displays/carries it). In the embodiments of the present disclosure, the phenomenon of a display object exceeding a preset display boundary is referred to as an "object super-frame". An object super-frame may include one or more of the following: a text super-frame, an image super-frame, an icon super-frame, an interface element super-frame, and the like.
Further, a display object for which the phenomenon of "exceeding a preset display boundary" exists is referred to as a "super-frame object". In other words, a super-frame object is a display object of which at least a partial area exceeds a preset display boundary. The part of the super-frame object beyond the preset display boundary is referred to as its "super-frame portion". For example, in FIG. 1, the trailing words of the super-frame object "prop B has the characteristics of B1, is used in the case of B2" extend beyond the boundary of interface element 106, so that part of the text may be referred to as the super-frame portion. As another example, the trailing word "props" in the super-frame object "click to learn more about props" extends beyond the boundary of interface element 108, so that part of the text may likewise be referred to as the super-frame portion.
At present, when detecting the super-frame object in the user interface, a manual visual inspection mode is needed, so that the detection efficiency is low, and a great deal of labor cost and time cost are needed.
In the embodiment of the disclosure, an image processing technology may be adopted, and the electronic device processes the image corresponding to the user interface. Fig. 2 is a schematic diagram of a process for detecting a super frame object in a user interface according to an embodiment of the disclosure. As shown in fig. 2, a user interface to be detected is captured or photographed to obtain an interface image, the interface image is input into an electronic device, the electronic device processes the interface image, and the area occupied by the super-frame object (for example, the dotted line box 201 and the dotted line box 203 in fig. 2) and/or the area occupied by the super-frame portion of the super-frame object (for example, the dotted line box 202 and the dotted line box 204 in fig. 2) are marked in the interface image. Therefore, the automatic detection of the super-frame object in the user interface is realized, the detection efficiency is improved, and the labor cost and the time cost are saved.
The technical solutions provided by the present disclosure are described in detail below with reference to several specific embodiments. The following specific embodiments may be combined with each other and may not be described in detail in some embodiments for the same or similar concepts or processes.
Fig. 3 is a flowchart of an image processing method according to an embodiment of the disclosure. The method of the present embodiment may be performed by the electronic device of fig. 2. As shown in fig. 3, the method of the present embodiment includes:
s301: and acquiring an interface image corresponding to the first user interface, wherein the interface image comprises at least one display object.
The first user interface is a user interface to be detected, that is, whether a super-frame object exists in the first user interface needs to be detected in the embodiment of the disclosure. The interface image includes one or more display objects, and each display object may be displayed (also referred to as accommodating, carrying, or presenting) in a certain interface element.
In an embodiment of the present disclosure, the display object may be one or more of the following: text, images, icons, interface elements, etc. It should be noted that, when the display object is text, the display object may refer to a line of text (i.e., text line) or a segment of text (i.e., text segment). One or more characters may be included in a line of text. For example, in FIG. 1, the text displayed by each interface element is taken as a display object.
The interface image corresponding to the first user interface may be an image obtained by capturing a picture of the first user interface or capturing a picture. The content presented in the interface image is the same as the content presented in the first user interface.
S302: sliding at least one detection frame over the interface image to determine N detection areas in the interface image, and respectively determining detection results of the N detection areas; the detection result of each detection area includes: the coordinate information of the detection area, a first probability that the detection area is the area occupied by a super-frame object and/or a second probability that the detection area is the area occupied by the super-frame part of the super-frame object; where N is an integer greater than 1.
For ease of understanding, the following is illustrated in connection with fig. 4. Fig. 4 is a schematic diagram of detection processing of an interface image by using a detection frame according to an embodiment of the disclosure. As shown in fig. 4, it is assumed that the interface image includes 10 pixels in the width direction and 10 pixels in the height direction, that is, the size of the interface image is 10×10. Assume that the detection frame has a width of 5 and a height of 3, i.e., the size of the detection frame is 5*3. The detection frame slides on the interface image, and a detection area is determined on the interface image when the detection frame slides to a position in the sliding process. In this way, a plurality of detection areas can be generated. For example, in fig. 4, a black rectangular frame is a detection frame, and an area surrounded by the detection frame (i.e., a shadow filling area) is a detection area during the movement of the detection frame.
Illustratively, the pixel (i, j) in the interface image is taken as the center of the detection frame, and the area covered/enclosed by the detection frame on the interface image forms a detection area. Wherein i is 1, 2, … and 10 in sequence, and j is 1, 2, … and 10 in sequence. Thus, 100 detection regions were obtained in total.
For each detection area, detection can be performed to determine the category of the detection area. In the present disclosure, there are two categories of detection areas: one is "the area occupied by a super-frame object" and the other is "the area occupied by the super-frame portion of a super-frame object". Thus, the detection result of each detection area may include the coordinate information of the detection area and the first probability that the detection area is the "area occupied by a super-frame object". Alternatively, the detection result of each detection area may include the coordinate information of the detection area and the second probability that the detection area is the "area occupied by the super-frame portion of a super-frame object". Alternatively, the detection result of each detection area may include the coordinate information of the detection area, the first probability, and the second probability.
The coordinate information of the detection area may be represented in various ways, which is not limited in this embodiment. For example, it may be represented by the center point coordinates (x, y), the width w, and the height h of the detection area; by the upper-left corner vertex coordinates (x1, y1) and the lower-right corner vertex coordinates (x2, y2) of the detection area; or by the offset (Δx, Δy) of the center point coordinates of the detection area relative to a certain preset coordinate.
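For ease of understanding, the following Python sketch illustrates the sliding of a detection frame described above, using the numbers of the fig. 4 example (a 10×10 interface image and a 5×3 detection frame centered on every pixel, giving 100 detection areas) and the center/width/height coordinate representation. The function and field names are illustrative assumptions; in the embodiment the probabilities are produced by the detection network rather than set to placeholders.

    def enumerate_detection_areas(img_w, img_h, frame_w, frame_h):
        """Center the detection frame on every pixel and record one detection area per position."""
        areas = []
        for j in range(img_h):          # row index of the detection frame center
            for i in range(img_w):      # column index of the detection frame center
                areas.append({
                    # coordinate information: center point, width and height of the detection area
                    "x": i, "y": j, "w": frame_w, "h": frame_h,
                    # first probability: area occupied by a super-frame object (placeholder)
                    "p_object": 0.0,
                    # second probability: area occupied by a super-frame portion (placeholder)
                    "p_portion": 0.0,
                })
        return areas

    def to_corners(area):
        """Convert the center/width/height representation to upper-left and lower-right corners."""
        return (area["x"] - area["w"] / 2, area["y"] - area["h"] / 2,
                area["x"] + area["w"] / 2, area["y"] + area["h"] / 2)

    areas = enumerate_detection_areas(10, 10, 5, 3)
    print(len(areas))            # 100 detection areas, matching the example of fig. 4
    print(to_corners(areas[0]))  # the same area expressed with corner coordinates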
S303: according to the detection results of the N detection areas, determining a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame part of the super-frame object in the interface image; the super-frame object is a display object of which at least part of the area exceeds a preset display boundary in the interface image, and the super-frame part is a part of the super-frame object exceeding the preset display boundary.
That is, according to the detection results of the N detection areas, a first target area occupied by the super-frame object is determined in the interface image. Or determining a second target area occupied by the super-frame part of the super-frame object in the interface image according to the detection results of the N detection areas. Or determining a first target area occupied by the super-frame object and a second target area occupied by the super-frame part of the super-frame object in the interface image according to the detection results of the N detection areas.
The following description takes the case where the first target area and the second target area are both determined as an example.
In this embodiment, the area occupied by the super-frame object is referred to as a first target area, for example, a dashed box 201 and a dashed box 203 in fig. 2. The area occupied by the superframe portion of the superframe object is referred to as a second target area, such as a dashed box 202 and a dashed box 204. It should be appreciated that in practical applications, there may be multiple super-frame objects in the first user interface. In this case, the number of the first target areas determined in S303 may be plural, and the number of the second target areas may be plural.
It should be understood that, since the detection results of the N detection areas have been determined, the first target area occupied by the super-frame object and the second target area occupied by the super-frame portion of the super-frame object may be determined among the N detection areas based on the detection results of the N detection areas.
For example, the N detection regions may be screened according to the first probability and the second probability corresponding to the N detection regions, for example, the detection region with the lower probability is excluded, so as to obtain the first target region and the second target region.
In the example shown in fig. 4, a detection frame is taken as an example for illustration. In some possible implementations, the number of detection frames may be plural, and the sizes of different detection frames may be different. Thus, the detection results of a plurality of detection areas with different sizes can be detected by using a plurality of detection frames. Therefore, the super-frame objects with different sizes can be detected in the interface image, and the accuracy of the detection result is improved.
When there are a plurality of different sizes of detection frames, among the N detection areas detected in S302, there are different sizes of detection areas. In this way, in S303, when the first target region and the second target region are determined from the N detection regions, the N detection regions may be screened based on the coordinate information of each detection region, in addition to the first probability and the second probability corresponding to each detection region.
When the N detection areas are screened according to the coordinate information of each detection area, the screening may be performed according to the overlapping relationship between different detection areas. For example, if detection area 1 completely covers detection area 2, detection area 2 may be deleted. Alternatively, if the intersection over union (IoU) between detection area 3 and detection area 4 is greater than or equal to a preset threshold, one of detection area 3 and detection area 4 may be deleted.
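The two overlap tests mentioned above can be written compactly; the following is a minimal Python sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (the function names and the example threshold are illustrative, not part of the embodiment).

    def iou(a, b):
        """Intersection over union (IoU) of two axis-aligned boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def fully_covers(a, b):
        """True if box a completely covers box b (the case where b would be deleted)."""
        return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]

    # Example: two detection areas whose IoU exceeds a preset threshold of 0.5 -> delete one of them.
    print(iou((0, 0, 10, 10), (2, 0, 12, 10)))         # 0.666...
    print(fully_covers((0, 0, 10, 10), (2, 2, 8, 8)))  # True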
Optionally, the N detection regions may be screened according to the first probability and the second probability corresponding to each detection region; and then screening the rest detection areas according to the coordinate information of each detection area. Optionally, the N detection areas may be screened according to the coordinate information of each detection area; and then screening the rest detection areas according to the first probability and the second probability corresponding to each detection area. Optionally, the N detection regions may be screened according to the first probability and the second probability corresponding to each detection region and the coordinate information of each detection region.
In some possible implementations, the first target area and the second target area may be determined in the N detection areas in the following manner:
(1) Determining a plurality of first candidate areas and a plurality of second candidate areas in the N detection areas according to the first probability and the second probability corresponding to the N detection areas; the first probability corresponding to the first candidate region is greater than a first threshold, and the second probability corresponding to the second candidate region is greater than a second threshold.
That is, for each detection area, if the first probability corresponding to the detection area is greater than the first threshold, the detection area is determined as a first candidate area; if the second probability corresponding to the detection area is greater than the second threshold, the detection area is determined as a second candidate area. The first target area will then be determined from among the first candidate areas, and the second target area from among the second candidate areas.
(2) And determining the first target area in the first candidate areas and the second target area in the second candidate areas according to the coordinate information of the first candidate areas and the coordinate information of the second candidate areas.
For example, the first candidate regions having the overlapping relationship may be excluded from the plurality of first candidate regions according to the coordinate information of the plurality of first candidate regions, and the first target region may be determined from the remaining first candidate regions. And according to the coordinate information of the second candidate areas, the second candidate areas with overlapping relation are eliminated from the second candidate areas, and further, the second target area is determined in the remaining second candidate areas.
In this implementation manner, the first target area and the second target area are determined after the N detection areas are filtered twice. The first filtering (i.e., step (1)) is based on the first probability and the second probability, and some detection regions with lower probability can be filtered out. The second filtering (i.e., step (2)) can filter out some detection areas with overlapping relationships based on the coordinate information. Therefore, the first target area and the second target area are finally determined, and the accuracy of the detection result is ensured.
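A minimal Python sketch of this two-stage screening is given below. The thresholds, field names, and the greedy keep-the-higher-probability strategy are illustrative assumptions rather than the exact procedure of the embodiment.

    def iou(a, b):
        # same overlap measure as in the earlier sketch, repeated so this block runs on its own
        ix1, iy1, ix2, iy2 = max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    def screen(detections, prob_key, prob_threshold=0.5, iou_threshold=0.5):
        # step (1): probability screening -> candidate areas
        candidates = [d for d in detections if d[prob_key] > prob_threshold]
        # step (2): coordinate screening: of any two heavily overlapping candidates,
        # keep the one with the higher probability
        candidates.sort(key=lambda d: d[prob_key], reverse=True)
        kept = []
        for c in candidates:
            if all(iou(c["box"], k["box"]) < iou_threshold for k in kept):
                kept.append(c)
        return kept

    detections = [
        {"box": (10, 10, 60, 25), "p_object": 0.9, "p_portion": 0.1},
        {"box": (12, 10, 62, 25), "p_object": 0.8, "p_portion": 0.1},   # overlaps the first -> dropped
        {"box": (50, 10, 62, 25), "p_object": 0.2, "p_portion": 0.85},  # super-frame portion candidate
    ]
    first_targets = screen(detections, "p_object")     # candidate first target areas
    second_targets = screen(detections, "p_portion")   # candidate second target areas
    print(len(first_targets), len(second_targets))     # 1 1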
The image processing method provided in this embodiment includes: acquiring an interface image corresponding to a first user interface, wherein the interface image includes at least one display object; sliding at least one detection frame over the interface image to determine N detection areas in the interface image and respectively determine detection results of the N detection areas, wherein the detection result of each detection area includes: the coordinate information of the detection area, a first probability that the detection area is the area occupied by a super-frame object and/or a second probability that the detection area is the area occupied by the super-frame part of the super-frame object; and determining, according to the detection results of the N detection areas, a first target area occupied by the super-frame object and/or a second target area occupied by a super-frame part of the super-frame object in the interface image. In this process, detection processing is performed on the interface image corresponding to the first user interface, so that automatic detection of super-frame objects in the first user interface is realized, the detection efficiency is improved, and labor cost and time cost are saved.
In the embodiment shown in fig. 3, S302 and S303 may also be implemented by a preset model. That is, the preset model slides at least one detection frame over the interface image to determine N detection areas in the interface image and respectively determine the detection results of the N detection areas; further, according to the detection results of the N detection areas, the preset model determines, in the interface image, a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame portion of the super-frame object.
The preset model can be obtained by training with a machine learning method. Illustratively, the preset model is obtained by training on a plurality of sets of training samples, each set of training samples including: a sample image, the area occupied by a super-frame object in the sample image, and the area occupied by the super-frame portion of the super-frame object.
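As a concrete illustration only, one set of training samples could be represented as follows in Python; the field names and box format are assumptions, since the embodiment merely requires the three components listed above.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in image coordinates

    @dataclass
    class TrainingSample:
        image_path: str                                          # the sample image (an interface screenshot)
        object_boxes: List[Box] = field(default_factory=list)    # areas occupied by super-frame objects
        portion_boxes: List[Box] = field(default_factory=list)   # areas occupied by their super-frame portions

    sample = TrainingSample(
        image_path="ui_screenshot_001.png",
        object_boxes=[(120.0, 40.0, 360.0, 70.0)],
        portion_boxes=[(300.0, 40.0, 360.0, 70.0)],
    )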
Fig. 5 is a schematic diagram of a system architecture for detecting a super-frame object in a user interface according to an embodiment of the disclosure. As shown in fig. 5, the system architecture may include a training device and an execution device. The training device can train on a plurality of groups of training samples in a database to obtain the preset model. The preset model may then be deployed to the execution device.
When it is required to detect whether the super-frame object exists in the first user interface, an interface image corresponding to the first user interface can be input into the execution device. The execution device detects the interface image through a preset model to obtain detection results of the N detection areas. And the execution equipment determines a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame part of the super-frame object in the interface image according to the detection results of the N detection areas through the preset model.
In some embodiments, the execution device and the training device may be the same device. In other embodiments, the execution device and the training device may be different electronic devices.
In the present disclosure, a preset model is used to detect a super-frame object in a user interface, and the preset model may also be referred to as a super-frame object detector.
The structure of the preset model and the processing procedure of the preset model for the interface image are described in detail below with reference to fig. 6 and 7.
Fig. 6 is a schematic structural diagram of a preset model according to an embodiment of the disclosure. As shown in fig. 6, the preset model may include: a feature extraction network, feature fusion networks, and super-frame detection networks. There may be a plurality of feature fusion networks and a plurality of super-frame detection networks.
The process of processing the interface image by the preset model shown in fig. 6 is as follows:
(1) And carrying out K kinds of feature extraction of different scales on the interface image to obtain feature images corresponding to the K kinds of scales respectively. And K is an integer greater than 1.
For example, referring to fig. 6, assuming that K=3, feature extraction at 3 different scales is performed on the interface image through the feature extraction network to obtain a feature map corresponding to the 1st scale, a feature map corresponding to the 2nd scale, and a feature map corresponding to the 3rd scale. The 1st scale is greater than the 2nd scale, and the 2nd scale is greater than the 3rd scale. For example, the 1st scale is 52×52, the 2nd scale is 26×26, and the 3rd scale is 13×13.
Further, at least one detection frame is used to detect the feature map of each scale. Taking the i-th scale as an example, at least one detection frame is slid over the feature map corresponding to the i-th scale to determine N_i detection areas in the feature map corresponding to the i-th scale, and the detection results of the N_i detection areas are respectively determined, where N_i is an integer greater than 1. See in particular steps (2) and (3) below.
(2) For the K-th scale, at least one detection frame is slid over the feature map corresponding to the K-th scale to determine N_K detection areas in the feature map corresponding to the K-th scale, and the detection results of the N_K detection areas are respectively determined.
For example, referring to fig. 6, let K=3. The feature map corresponding to the 3rd scale is detected by super-frame detection network 3 to obtain the detection results of N_3 detection areas.
(3) For the i-th scale, the feature map corresponding to the i-th scale and the feature map corresponding to the (i+1)-th scale are fused to obtain a fused feature map corresponding to the i-th scale. At least one detection frame is then slid over the fused feature map corresponding to the i-th scale to determine N_i detection areas in the fused feature map corresponding to the i-th scale, and the detection results of the N_i detection areas are respectively determined.
In step (3), i takes the values K-1, K-2, …, 2, 1 in sequence.
For example, referring to fig. 6, the feature map corresponding to the 2nd scale and the feature map corresponding to the 3rd scale are fused through feature fusion network 2 to obtain a fused feature map corresponding to the 2nd scale. Further, the fused feature map corresponding to the 2nd scale is detected through super-frame detection network 2 to obtain the detection results of N_2 detection areas.
With continued reference to fig. 6, the feature map corresponding to the 1st scale and the feature map corresponding to the 2nd scale (it should be noted that either the original feature map corresponding to the 2nd scale or the fused feature map corresponding to the 2nd scale may be used here) are fused through feature fusion network 1 to obtain a fused feature map corresponding to the 1st scale. Further, the fused feature map corresponding to the 1st scale is detected through super-frame detection network 1 to obtain the detection results of N_1 detection areas.
In this embodiment, the feature map corresponding to the i-th scale and the feature map corresponding to the (i+1)-th scale are fused to obtain a fused feature map corresponding to the i-th scale, so that the feature map corresponding to the i-th scale is more accurate.
Optionally, fusing the feature map corresponding to the i-th scale with the feature map corresponding to the (i+1)-th scale to obtain the fused feature map corresponding to the i-th scale may be performed as follows: the feature map corresponding to the (i+1)-th scale is up-sampled to obtain a sampled feature map whose size is the same as that of the feature map corresponding to the i-th scale; the sampled feature map and the feature map corresponding to the i-th scale are then fused to obtain the fused feature map corresponding to the i-th scale.
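The up-sampling and fusion step can be illustrated with a short PyTorch sketch. The channel counts follow the fig. 7 example given below (a 26×26×512 map fused with an up-sampled 13×13 map reduced to 256 channels); the specific layers chosen here are assumptions for illustration only.

    import torch
    import torch.nn as nn

    fmap_i = torch.randn(1, 512, 26, 26)     # feature map corresponding to the i-th scale
    fmap_i1 = torch.randn(1, 1024, 13, 13)   # feature map corresponding to the (i+1)-th scale

    reduce_and_upsample = nn.Sequential(
        nn.Conv2d(1024, 256, kernel_size=1),          # shrink the channel count before up-sampling
        nn.Upsample(scale_factor=2, mode="nearest"),  # 13x13 -> 26x26, the size of the i-th scale map
    )

    sampled = reduce_and_upsample(fmap_i1)        # (1, 256, 26, 26) sampled feature map
    fused = torch.cat([fmap_i, sampled], dim=1)   # (1, 768, 26, 26) fused feature map of the i-th scale
    print(fused.shape)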
Thus, based on the preset model shown in fig. 6, the detection results of N = N_3 + N_2 + N_1 detection areas are output in total.
In the above process, for the i-th scale, the number N_i of detection areas output by super-frame detection network i is associated with the i-th scale. The following is an illustration in connection with the 3 scales in fig. 6.
Assuming that the 3rd scale is 13×13, when super-frame detection network 3 performs detection with 1 detection frame, each point in the feature map corresponding to the 3rd scale is taken as the center point of the detection frame, so super-frame detection network 3 obtains the detection results of N_3 = 13×13 detection areas. If M detection frames are adopted, super-frame detection network 3 obtains the detection results of N_3 = 13×13×M detection areas.
Assuming that the 2nd scale is 26×26, when super-frame detection network 2 performs detection with 1 detection frame, each point in the fused feature map corresponding to the 2nd scale is taken as the center point of the detection frame, so super-frame detection network 2 obtains the detection results of N_2 = 26×26 detection areas. If M detection frames are adopted, super-frame detection network 2 obtains the detection results of N_2 = 26×26×M detection areas.
Assuming that the 1st scale is 52×52, when super-frame detection network 1 performs detection with 1 detection frame, each point in the fused feature map corresponding to the 1st scale is taken as the center point of the detection frame, so super-frame detection network 1 obtains the detection results of N_1 = 52×52 detection areas. If M detection frames are adopted, super-frame detection network 1 obtains the detection results of N_1 = 52×52×M detection areas.
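As a quick check of the numbers above (not code from the embodiment), the total number of detection areas for the three scales, taking for example M = 3 detection frames per position (the value used later in the fig. 7 example), is:

    M = 3
    scales = [52, 26, 13]
    N_per_scale = [s * s * M for s in scales]   # [8112, 2028, 507]
    print(N_per_scale, sum(N_per_scale))        # N = N_1 + N_2 + N_3 = 10647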
Fig. 7 is a schematic structural diagram of another preset model according to an embodiment of the present disclosure. An example is illustrated below in connection with fig. 7.
Referring to fig. 7, assuming that the interface image includes 416 pixels in the width direction and 416 pixels in the height direction and 3 channels per pixel point in the channel direction, the interface image may be denoted as (416, 416,3).
Inputting the interface image into a feature extraction network, wherein the feature extraction network comprises: convolution unit 1 and residual units 1 to 5. See fig. 7:
the interface image (416, 416,3) is input to the convolution unit 1, and the convolution unit 1 performs channel expansion processing on the interface image to obtain image features (416, 416, 32). For example, the convolution unit 1 may be Conv2D 32×3×3.
The residual unit 1 performs downsampling and channel expansion processing on the image features (416, 416, 32) to obtain image features (208, 208, 64). For example, residual unit 1 is Residual Block 1×64.
The residual unit 2 performs a downsampling process and a channel expansion process on the image features (208, 208, 64) to obtain image features (104, 104, 128). For example, residual unit 2 may be Residual Block 2×128.
The residual unit 3 performs downsampling and channel expansion processing on the image features (104, 104, 128) to obtain image features (52, 52, 256). For example, residual unit 3 may be Residual Block 8×256.
The residual unit 4 performs downsampling and channel expansion processing on the image features (52, 52, 256) to obtain image features (26, 26, 512). For example, residual unit 4 may be Residual Block 8×512.
The residual unit 5 performs downsampling processing and channel expansion processing on the image features (26, 26, 512) to obtain image features (13, 13, 1024). For example, residual unit 5 may be Residual Block 4×1024.
In this embodiment, taking K=3 as an example, features of 3 scales are used. Let the 1st scale be 52×52, the 2nd scale be 26×26, and the 3rd scale be 13×13. That is, after the feature extraction network, a feature map (52, 52, 256) corresponding to the 1st scale, a feature map (26, 26, 512) corresponding to the 2nd scale, and a feature map (13, 13, 1024) corresponding to the 3rd scale are extracted.
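To make the tensor shapes above concrete, the following PyTorch sketch replaces each residual unit described above with a single strided convolution. It is a simplified stand-in rather than the network of the embodiment; only the output shapes (52×52×256, 26×26×512, 13×13×1024) are taken from the description.

    import torch
    import torch.nn as nn

    def down(cin, cout):
        """One stride-2 convolution standing in for a residual unit (downsampling + channel expansion)."""
        return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.LeakyReLU(0.1))

    conv1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.LeakyReLU(0.1))   # (416, 416, 32)
    res1, res2, res3 = down(32, 64), down(64, 128), down(128, 256)
    res4, res5 = down(256, 512), down(512, 1024)

    x = torch.randn(1, 3, 416, 416)        # the (416, 416, 3) interface image, NCHW layout
    x = res2(res1(conv1(x)))               # (1, 128, 104, 104)
    scale1 = res3(x)                       # (1, 256, 52, 52)  -> feature map of the 1st scale
    scale2 = res4(scale1)                  # (1, 512, 26, 26)  -> feature map of the 2nd scale
    scale3 = res5(scale2)                  # (1, 1024, 13, 13) -> feature map of the 3rd scale
    print(scale1.shape, scale2.shape, scale3.shape)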
Further, the feature map (13, 13, 1024) corresponding to the 3rd scale is input to super-frame detection network 3. Super-frame detection network 3 includes convolution unit 2 and detection unit 1. Through the processing of convolution unit 2 and detection unit 1, super-frame detection network 3 outputs the detection result (13, 13, 21). For example, convolution unit 2 may be Conv2D Block 5L 1024, and detection unit 1 may be Conv2D 3×3 + Conv2D 1×1.
The feature map (26, 26, 512) corresponding to the 2nd scale and the feature map (13, 13, 1024) corresponding to the 3rd scale are input into feature fusion network 2. Feature fusion network 2 includes up-sampling unit 1 and fusion unit 1. Up-sampling unit 1 performs up-sampling on the feature map (13, 13, 1024) corresponding to the 3rd scale to obtain a sampled feature map (26, 26, 256) at the 2nd scale. Fusion unit 1 fuses the feature map (26, 26, 512) corresponding to the 2nd scale with the sampled feature map (26, 26, 256) output by up-sampling unit 1 to obtain a fused feature map (26, 26, 768) corresponding to the 2nd scale. For example, up-sampling unit 1 may be Conv2D + UpSampling2D, and fusion unit 1 may be a Concat.
The fused feature map (26, 26, 768) corresponding to the 2nd scale is input into super-frame detection network 2. Super-frame detection network 2 includes convolution unit 3 and detection unit 2. Convolution unit 3 reduces the number of channels of the fused feature map (26, 26, 768) corresponding to the 2nd scale to obtain a feature map (26, 26, 256) at the 2nd scale. Detection unit 2 performs super-frame detection on the feature map (26, 26, 256) output by convolution unit 3 to obtain the detection result (26, 26, 21). For example, convolution unit 3 may be Conv2D Block 5L 256, and detection unit 2 may be Conv2D 3×3 + Conv2D 1×1.
The feature map (52, 52, 256) corresponding to the 1st scale and the feature map (26, 26, 256) at the 2nd scale output by convolution unit 3 are input into feature fusion network 1. Feature fusion network 1 includes up-sampling unit 2 and fusion unit 2. Up-sampling unit 2 performs up-sampling on the feature map (26, 26, 256) output by convolution unit 3 to obtain a sampled feature map (52, 52, 128) at the 1st scale. Fusion unit 2 fuses the feature map (52, 52, 256) corresponding to the 1st scale with the sampled feature map (52, 52, 128) output by up-sampling unit 2 to obtain a fused feature map (52, 52, 384) corresponding to the 1st scale. For example, up-sampling unit 2 may be Conv2D + UpSampling2D, and fusion unit 2 may be a Concat.
The fused feature map (52, 52, 384) corresponding to the 1st scale is input into super-frame detection network 1. Super-frame detection network 1 includes convolution unit 4 and detection unit 3. Convolution unit 4 reduces the number of channels of the fused feature map (52, 52, 384) corresponding to the 1st scale to obtain a feature map (52, 52, 128) at the 1st scale. Detection unit 3 performs super-frame detection on the feature map (52, 52, 128) output by convolution unit 4 to obtain the detection result (52, 52, 21). For example, convolution unit 4 may be Conv2D Block 5L 128, and detection unit 3 may be Conv2D 3×3 + Conv2D 1×1.
Through the above processing, the preset model outputs detection results at 3 scales, which are respectively: (52, 52, 21), (26, 26, 21), and (13, 13, 21). In each of these detection results, 21 = 3 × (4 + 2 + 1), where 3 represents the number of detection frames, 4 represents the coordinate information of the detection area (e.g., the center point coordinates (x, y), the width w, and the height h), 2 represents the first probability that the detection area is the "area occupied by a super-frame object" and the second probability that the detection area is the "area occupied by the super-frame portion of a super-frame object", and 1 represents whether the detection area is a super-frame object.
That is, based on the preset model shown in fig. 7, 3 detection frames are adopted at each scale, the detection results of 52×52×3 + 26×26×3 + 13×13×3 detection areas are output, and the detection result of each detection area has (4 + 2 + 1) = 7 dimensions.
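One way to unpack such an (S, S, 21) output grid is sketched below in Python, following the 21 = 3 × (4 + 2 + 1) layout described above; the channel ordering and the use of raw values without activation functions are assumptions made for illustration.

    import numpy as np

    S, M = 13, 3
    output = np.random.rand(S, S, M * 7)        # e.g. the (13, 13, 21) detection result

    grid = output.reshape(S, S, M, 7)
    coords = grid[..., 0:4]      # center x, center y, width, height of each detection area
    p_object = grid[..., 4]      # first probability: area occupied by a super-frame object
    p_portion = grid[..., 5]     # second probability: area occupied by a super-frame portion
    confidence = grid[..., 6]    # whether the detection area is a super-frame object at all

    # One detection result per grid cell and per detection frame: 13 x 13 x 3 = 507 areas
    print(coords.shape, p_object.shape, confidence.shape)   # (13, 13, 3, 4) (13, 13, 3) (13, 13, 3)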
In this embodiment, the feature extraction processing of multiple scales is performed on the interface image, and then the super-frame detection is performed based on the image features of each scale, so that the super-frame objects under different scales can be detected, and the accuracy of the detection result is improved.
Fig. 8 is a flowchart illustrating another image processing method according to an embodiment of the disclosure. As shown in fig. 8, the method of the present embodiment includes:
s801: and acquiring an interface image corresponding to the first user interface, wherein the interface image comprises at least one display object.
S802: sliding at least one detection frame over the interface image to determine N detection areas in the interface image, and respectively determining detection results of the N detection areas; the detection result of each detection area includes: the coordinate information of the detection area, a first probability that the detection area is the area occupied by a super-frame object, and a second probability that the detection area is the area occupied by the super-frame part of the super-frame object; where N is an integer greater than 1.
It should be understood that the specific implementation of S801 and S802 in this embodiment is similar to the previous embodiment, and will not be repeated here.
S803: and identifying the display objects in the interface image to obtain the coordinate information of the S display objects.
In this embodiment, according to the type of the super-frame object to be detected, a display object recognition algorithm corresponding to the type may be adopted to perform recognition processing on the display objects in the interface image, so as to obtain coordinate information of S display objects.
For example, assuming that super-frame text needs to be detected, text recognition is performed on the interface image to obtain the coordinate information of each text object in the interface image. For example, text recognition may be performed on the interface image by using an optical character recognition (OCR) algorithm to obtain the coordinate information of each text line in the interface image.
For example, assuming that the super-frame interface element needs to be detected, the interface image may be subjected to interface element identification, so as to obtain coordinate information of each interface element in the interface image.
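For the text case, the display object recognition step could, for example, be implemented with the pytesseract wrapper around the Tesseract OCR engine, as sketched below. The embodiment only requires an OCR algorithm that returns per-text-line coordinate information, so this particular tool and the grouping logic are assumptions.

    import pytesseract
    from PIL import Image

    def recognize_text_lines(image_path):
        """Return (x1, y1, x2, y2) coordinate information for each recognized text line."""
        data = pytesseract.image_to_data(Image.open(image_path),
                                         output_type=pytesseract.Output.DICT)
        lines = {}
        for i, text in enumerate(data["text"]):
            if not text.strip():
                continue
            # group recognized words into text lines
            key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
            x1, y1 = data["left"][i], data["top"][i]
            x2, y2 = x1 + data["width"][i], y1 + data["height"][i]
            if key in lines:
                ox1, oy1, ox2, oy2 = lines[key]
                lines[key] = (min(ox1, x1), min(oy1, y1), max(ox2, x2), max(oy2, y2))
            else:
                lines[key] = (x1, y1, x2, y2)
        return list(lines.values())   # coordinate information of the S text-line display objects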
Note that, in this embodiment, the execution order of S802 and S803 is not limited, and the execution order of the two may be interchanged, or the two may be executed simultaneously.
S804: determining a plurality of first candidate areas and a plurality of second candidate areas in the N detection areas according to the first probability and the second probability corresponding to the N detection areas; the first probability corresponding to the first candidate region is greater than a first threshold, and the second probability corresponding to the second candidate region is greater than a second threshold.
That is, for each detection area, if the first probability corresponding to the detection area is greater than the first threshold, the detection area is determined as a first candidate area; if the second probability corresponding to the detection area is greater than the second threshold, the detection area is determined as a second candidate area. The first target area will then be determined from among the first candidate areas, and the second target area from among the second candidate areas.
S805: and determining a first target area occupied by the super-frame object in the plurality of first candidate areas, and determining a second target area occupied by the super-frame part of the super-frame object in the plurality of second candidate areas according to the coordinate information of the plurality of first candidate areas, the coordinate information of the plurality of second candidate areas and the coordinate information of the S display objects.
For example, the screening and filtering may be performed on the plurality of first candidate regions to obtain one or more first target regions.
(A1) If the intersection over union between two first candidate areas among the plurality of first candidate areas is greater than or equal to a preset threshold, the two first candidate areas completely or largely overlap, and one of them may be deleted. For example, the first candidate area with the lower first probability of the two may be deleted.
(A2) If there is no overlapping region between the first candidate region and the S display objects identified in S803, the first candidate region is deleted.
(A3) If there is no overlapping area between the first candidate area and all the second candidate areas for a certain first candidate area, the first candidate area is deleted.
(A4) If there is an overlap region for a certain first candidate region with a certain second candidate region, but the overlap region is not located at the edge of the first candidate region, the first candidate region is deleted.
For example, the screening and filtering may be performed on the plurality of second candidate regions to obtain one or more second target regions.
(B1) If the intersection over union between two second candidate areas among the plurality of second candidate areas is greater than or equal to a preset threshold, the two second candidate areas completely or largely overlap, and one of them may be deleted. For example, the second candidate area with the lower second probability of the two may be deleted.
(B2) If there is no overlapping region between the second candidate region and the S display objects identified in S803, the second candidate region is deleted.
(B3) And deleting the second candidate region if no overlapping region exists between the second candidate region and all the first candidate regions for a certain second candidate region.
(B4) If there is an overlap region with a certain first candidate region for a certain second candidate region, but the overlap region is not located at the edge of the first candidate region, the second candidate region is deleted.
The screening order of the principles A1, A2, A3, and A4 is not limited, and may be any order. The screening order of the above-mentioned principles B1, B2, B3 and B4 is not limited, and may be any order.
Thus, the determined first target area satisfies the following conditions:
(1) When the number of the first target areas is larger than 1, the intersection ratio between any two first target areas is smaller than a preset threshold value.
(2) The first target area and at least one of the S display objects have a first overlapping area.
(3) The first target area and the second target area have a second overlapping area, and the second overlapping area is positioned at the edge of the first target area.
Similarly, the determined second target area satisfies the following conditions:
(1) When the number of the second target areas is larger than 1, the intersection ratio between any two second target areas is smaller than a preset threshold value.
(2) The second target area has a first overlapping area with at least one of the S display objects.
(3) A second overlapping area exists between the second target area and the first target area, and the second overlapping area is positioned at the edge of the first target area.
Based on the embodiment shown in fig. 8, the following exemplifies a detection process for super-frame text, that is, text that exceeds its display boundary. Fig. 9 is a schematic diagram of an image processing procedure according to an embodiment of the disclosure. As shown in fig. 9, assume that super-frame text in a first user interface needs to be detected. The detection process is as follows:
(1) Capture a screenshot of the first user interface, or photograph it, to obtain an interface image.
(2) Process the interface image with the preset model (also referred to as the super-frame text detector) to obtain the detection results of the N detection areas.
It should be understood that the processing of the interface image is described in detail in the embodiments shown in figs. 3, 6 and 7, and is not repeated here.
(3) Recognize text lines in the interface image with an OCR algorithm to obtain coordinate information of S text lines.
(4) Perform post-processing according to the detection results of the N detection areas and the coordinate information of the S text lines to obtain a first target area where the super-frame text is located and a second target area where its super-frame part is located.
In the post-processing, the following principles may be applied:
Principle 1: the first target area and the second target area should each have an overlapping area with the area of one of the S text lines;
Principle 2: the second target area should have an overlapping area with a first target area, the overlapping area should be located at the edge of that first target area, and its area should be the same as or close to the area of the second target area.
It should be understood that the implementation of the post-processing in step (4) is described in detail in the embodiment shown in fig. 8, and is not repeated here.
In this embodiment, the detection result of the preset model (the super-frame text detector) is combined with the text-line detection result of the OCR algorithm, and the OCR result is used to correct the detection result of the preset model, which further improves the accuracy of detection. A minimal sketch of this combined flow is given below.
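The sketch below only arranges steps (2) to (4) of the example into one function; the detector, the OCR step and the post-processing are passed in as callables, and none of the names are prescribed by the disclosure.

```python
# Hypothetical wrapper for the flow of fig. 9; detector, ocr and postprocess
# stand in for the preset model, the OCR algorithm and the screening of fig. 8.

def detect_superframe_text(interface_image, detector, ocr, postprocess):
    detections = detector(interface_image)      # step (2): results for N detection areas
    text_lines = ocr(interface_image)           # step (3): coordinates of S text lines
    return postprocess(detections, text_lines)  # step (4): first and second target areas
```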
In the above embodiments, the process of detecting a super-frame object in a user interface using the preset model is described in detail. The preset model is trained in advance, and the embodiments of the disclosure do not limit the training process. It can be trained using a plurality of groups of training samples, each group of training samples comprising: a sample image, the area occupied by the super-frame object in the sample image, and the area occupied by the super-frame part of the super-frame object in the sample image.
In the embodiments of the disclosure, considering that detection of super-frame objects is a form of anomaly detection, few real training samples are available for training the preset model. An algorithm may therefore be employed to automatically generate sample images containing super-frame objects. Taking super-frame text as an example, a sample image may be generated as follows:
(1) Acquire an original image corresponding to a sample user interface.
(2) Recognize the original image with an OCR algorithm to obtain coordinate information of each text line.
(3) Detect the original image with an interface-element detection model to obtain coordinate information of the interface elements in the original image.
(4) Randomly select an interface element bearing a text line in the original image, and randomly select a font from a font library.
(5) Erase the text line in the selected interface element.
(6) Write text into the interface element with the selected font according to the coordinate information of the interface element, making the text exceed the boundary of the interface element, thereby obtaining a sample image.
Through the above procedure, a sample image containing super-frame text is generated. Since the text is written automatically by the algorithm, the area occupied by the super-frame text in the sample image and the area occupied by its super-frame part are known by construction. The sample image, together with these two areas, can then be used for model training to obtain the trained preset model. A code sketch of this generation procedure is given below.
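A minimal sketch of steps (4) to (6) using Pillow is shown below. The white background fill, the pixel offsets, the font-size range and the assumption that the overflow happens across the element's right boundary are all simplifications made for the example; Pillow 8.0 or later is assumed for textbbox.

```python
# Illustrative generation of one training sample: erase the original text line,
# rewrite text so it exceeds the element boundary, and derive the labels.
import random
from PIL import ImageDraw, ImageFont

def make_overflow_sample(original, element_box, text_line_box, font_path, text):
    """Return (sample_image, object_box, overflow_box); boxes are (x1, y1, x2, y2)."""
    img = original.copy()
    draw = ImageDraw.Draw(img)

    # Step (5): erase the existing text line inside the chosen interface element.
    draw.rectangle(text_line_box, fill=(255, 255, 255))

    # Step (6): write new text that runs past the element's right boundary.
    font = ImageFont.truetype(font_path, size=random.randint(14, 28))
    ex1, ey1, ex2, ey2 = element_box
    origin = (ex1 + 4, ey1 + 4)
    draw.text(origin, text, font=font, fill=(0, 0, 0))

    # Labels are known by construction: the drawn text is the super-frame object,
    # and the part beyond the element's right edge is the super-frame part.
    left, top, right, bottom = draw.textbbox(origin, text, font=font)
    object_box = (left, top, right, bottom)
    overflow_box = (ex2, top, right, bottom) if right > ex2 else None
    return img, object_box, overflow_box
```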
In this embodiment, the training samples are generated automatically by an algorithm, which reduces the difficulty of collecting training samples, reduces the labor and time cost of manually annotating them, and improves the training efficiency of the preset model.
Fig. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure. The image processing apparatus provided in this embodiment may be in the form of software and/or hardware. As shown in fig. 10, the image processing apparatus 1000 provided in this embodiment may include: an acquisition module 1001, a detection module 1002 and a determination module 1003. Wherein,
an obtaining module 1001, configured to obtain an interface image corresponding to a first user interface, where the interface image includes at least one display object;
the detection module 1002 is configured to slide in the interface image by using at least one detection frame, so as to determine N detection areas in the interface image, and determine detection results of the N detection areas respectively; the detection result of each detection area includes: the coordinate information of the detection area, the first probability that the detection area is the area occupied by the super-frame object and/or the second probability that the detection area is the area occupied by the super-frame part of the super-frame object; the N is an integer greater than 1;
A determining module 1003, configured to determine, according to detection results of the N detection areas, a first target area occupied by a super-frame object and/or a second target area occupied by a super-frame portion of the super-frame object in the interface image; the super-frame object is a display object of which at least part of the area exceeds a preset display boundary in the interface image, and the super-frame part is a part of the super-frame object exceeding the preset display boundary.
In some possible implementations, the detection module 1002 is specifically configured to:
extracting K features with different scales from the interface image to obtain feature images corresponding to the K scales respectively; the K is an integer greater than 1;
sliding the feature map corresponding to the ith scale by adopting at least one detection frame, so as to determine N_i detection areas in the feature map corresponding to the ith scale, and respectively determine the detection results of the N_i detection areas; N_i is an integer greater than 1;
wherein i sequentially takes the values K, K-1, K-2, …, 1, and N = N_K + N_{K-1} + … + N_1.
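For illustration, sliding a single detection frame over the feature map of one scale can be sketched as follows; the scoring head that outputs the two probabilities is abstracted as an injected score_fn, and the frame size and stride are parameters of the sketch rather than values fixed by the disclosure. The feature map is assumed to be an H x W x C NumPy array.

```python
# Sketch of sliding one detection frame over the feature map of a single scale;
# N_i is then the total number of windows produced across all frames for that scale.

def slide_detection_frame(feature_map, frame_h, frame_w, stride, score_fn):
    """Return (box, first_probability, second_probability) for every window."""
    H, W = feature_map.shape[:2]
    results = []
    for y in range(0, H - frame_h + 1, stride):
        for x in range(0, W - frame_w + 1, stride):
            window = feature_map[y:y + frame_h, x:x + frame_w]
            p_object, p_overflow = score_fn(window)  # probabilities for this area
            results.append(((x, y, x + frame_w, y + frame_h), p_object, p_overflow))
    return results
```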
In some possible implementations, when i < K, the detection module 1002 is specifically configured to:
carrying out fusion treatment on the feature images corresponding to the ith scale and the feature images corresponding to the (i+1) th scale to obtain a feature image corresponding to the ith scale after fusion;
sliding the feature map corresponding to the fused ith scale by adopting at least one detection frame, so as to determine N_i detection areas in the feature map corresponding to the fused ith scale, and respectively determine the detection results of the N_i detection areas.
In some possible implementations, the ith scale is greater than the (i+1) th scale; the detection module 1002 is specifically configured to:
carrying out up-sampling treatment on the feature map corresponding to the i+1th scale to obtain a sampling feature map, wherein the size of the sampling feature map is the same as that of the feature map corresponding to the i scale;
and carrying out fusion processing on the sampling feature map and the feature map corresponding to the ith scale to obtain the feature map corresponding to the ith scale after fusion.
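As a rough illustration only, this up-sampling and fusion can be sketched with NumPy as follows; nearest-neighbour up-sampling, an integer size ratio between adjacent scales and element-wise addition as the fusion operation are all assumptions of the sketch, not requirements of the disclosure.

```python
# Sketch of fusing the (i+1)-th scale feature map into the i-th scale feature map.
# Assumes feat_i's height and width are integer multiples of feat_next's, and
# that both maps have the same channel dimension.
import numpy as np

def fuse_scales(feat_i: np.ndarray, feat_next: np.ndarray) -> np.ndarray:
    """Up-sample feat_next to feat_i's spatial size, then fuse by addition."""
    fh, fw = feat_i.shape[:2]
    sh, sw = feat_next.shape[:2]
    # Nearest-neighbour up-sampling so the sampled map matches feat_i's size.
    up = feat_next.repeat(fh // sh, axis=0).repeat(fw // sw, axis=1)
    return feat_i + up  # fused feature map corresponding to the ith scale
```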
In some possible implementations, the detection result of each detection region includes: the coordinate information of the detection area, the first probability that the detection area is the area occupied by the super-frame object, and the second probability that the detection area is the area occupied by the super-frame part of the super-frame object; the determining module 1003 is specifically configured to:
determining a plurality of first candidate areas and a plurality of second candidate areas in the N detection areas according to the first probability and the second probability corresponding to the N detection areas; the first probability corresponding to the first candidate region is larger than a first threshold value, and the second probability corresponding to the second candidate region is larger than a second threshold value;
And determining the first target area in the first candidate areas and the second target area in the second candidate areas according to the coordinate information of the first candidate areas and the coordinate information of the second candidate areas.
In some possible implementations, the determining module 1003 is specifically configured to:
identifying display objects in the interface image to obtain coordinate information of S display objects; s is an integer greater than or equal to 1;
and determining the first target area in the plurality of first candidate areas, and determining the second target area in the plurality of second candidate areas according to the coordinate information of the plurality of first candidate areas, the coordinate information of the plurality of second candidate areas and the coordinate information of the S display objects.
In some possible implementations, the first target area satisfies two conditions:
a first overlapping area exists between the first target area and at least one display object in the S display objects;
and a second overlapping area exists between the first target area and the second target area, and the second overlapping area is positioned at the edge of the first target area.
In some possible implementations, the second target area satisfies two conditions:
a first overlapping area exists between the second target area and at least one display object in the S display objects;
the second target area and the first target area have a second overlapping area, and the second overlapping area is positioned at the edge of the first target area.
In some possible implementations, the detection module 1002 is specifically configured to: sliding the interface image by adopting at least one detection frame through a preset model to determine N detection areas in the interface image, and respectively determining detection results of the N detection areas;
the determining module 1003 is specifically configured to: determining a first target area occupied by the super-frame object and/or a second target area occupied by a super-frame part of the super-frame object in the interface image according to detection results of the N detection areas through the preset model;
the preset model is obtained by training a plurality of groups of training samples, and each group of training samples comprises: the device comprises a sample image, an area occupied by a super-frame object in the sample image and an area occupied by a super-frame part of the super-frame object in the sample image.
In some possible implementations, the display object is any one of the following: text, icons, images, interface elements.
The image processing apparatus provided in this embodiment may be used to execute the image processing method provided in any of the above method embodiments, and its implementation principle and technical effects are similar, and are not described herein.
In order to achieve the above embodiments, the embodiments of the present disclosure further provide an electronic device.
Referring to fig. 11, there is shown a schematic structural diagram of an electronic device 1100 suitable for use in implementing embodiments of the present disclosure, the electronic device 1100 may be a terminal device or a server. The terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (Personal Digital Assistant, PDA for short), a tablet (Portable Android Device, PAD for short), a portable multimedia player (Portable Media Player, PMP for short), an in-vehicle terminal (e.g., an in-vehicle navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 11 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 11, the electronic device 1100 may include a processing means (e.g., a central processor, a graphics processor, etc.) 1101 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage means 1108 into a random access Memory (Random Access Memory, RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are also stored. The processing device 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
In general, the following devices may be connected to the I/O interface 1105: input devices 1106 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 1107 including, for example, a liquid crystal display (Liquid Crystal Display, abbreviated as LCD), a speaker, a vibrator, and the like; storage 1108, including for example, magnetic tape, hard disk, etc.; and a communication device 1109. The communication means 1109 may allow the electronic device 1100 to communicate wirelessly or by wire with other devices to exchange data. While fig. 11 illustrates an electronic device 1100 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communications device 1109, or from storage device 1108, or from ROM 1102. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 1101.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (Local Area Network, LAN for short) or a wide area network (Wide Area Network, WAN for short), or it may be connected to an external computer (e.g., connected via the internet using an internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to a first aspect, according to one or more embodiments of the present disclosure, there is provided an image processing method including:
acquiring an interface image corresponding to a first user interface, wherein the interface image comprises at least one display object;
sliding the interface image by adopting at least one detection frame to determine N detection areas in the interface image, and respectively determining detection results of the N detection areas; the detection result of each detection area includes: the coordinate information of the detection area, the first probability that the detection area is the area occupied by the super-frame object and/or the second probability that the detection area is the area occupied by the super-frame part of the super-frame object; the N is an integer greater than 1;
according to the detection results of the N detection areas, determining a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame part of the super-frame object in the interface image; the super-frame object is a display object of which at least part of the area exceeds a preset display boundary in the interface image, and the super-frame part is a part of the super-frame object exceeding the preset display boundary.
According to one or more embodiments of the present disclosure, sliding in the interface image with at least one detection frame to determine N detection areas in the interface image, and determining detection results of the N detection areas, respectively, includes:
Extracting K features with different scales from the interface image to obtain feature images corresponding to the K scales respectively; the K is an integer greater than 1;
sliding the feature map corresponding to the ith scale by adopting at least one detection frame, so as to determine N_i detection areas in the feature map corresponding to the ith scale, and respectively determine the detection results of the N_i detection areas; N_i is an integer greater than 1;
wherein i sequentially takes the values K, K-1, K-2, …, 1, and N = N_K + N_{K-1} + … + N_1.
According to one or more embodiments of the present disclosure, when i < K, sliding the feature map corresponding to the ith scale by adopting at least one detection frame to determine N_i detection areas in the feature map corresponding to the ith scale, and respectively determining the detection results of the N_i detection areas, includes:
carrying out fusion treatment on the feature images corresponding to the ith scale and the feature images corresponding to the (i+1) th scale to obtain a feature image corresponding to the ith scale after fusion;
sliding the feature map corresponding to the fused ith scale by adopting at least one detection frame, so as to determine N_i detection areas in the feature map corresponding to the fused ith scale, and respectively determine the detection results of the N_i detection areas.
According to one or more embodiments of the present disclosure, the ith scale is greater than the (i+1) th scale; carrying out fusion processing on the feature map corresponding to the ith scale and the feature map corresponding to the (i+1) th scale to obtain a feature map corresponding to the ith scale after fusion, wherein the fusion processing comprises the following steps:
carrying out up-sampling treatment on the feature map corresponding to the i+1th scale to obtain a sampling feature map, wherein the size of the sampling feature map is the same as that of the feature map corresponding to the i scale;
and carrying out fusion processing on the sampling feature map and the feature map corresponding to the ith scale to obtain the feature map corresponding to the ith scale after fusion.
According to one or more embodiments of the present disclosure, the detection result of each detection region includes: the coordinate information of the detection area, the first probability that the detection area is the area occupied by the super-frame object, and the second probability that the detection area is the area occupied by the super-frame part of the super-frame object;
according to the detection results of the N detection areas, determining a first target area occupied by the super-frame object and a second target area occupied by the super-frame part of the super-frame object in the interface image, wherein the method comprises the following steps:
determining a plurality of first candidate areas and a plurality of second candidate areas in the N detection areas according to the first probability and the second probability corresponding to the N detection areas; the first probability corresponding to the first candidate region is larger than a first threshold value, and the second probability corresponding to the second candidate region is larger than a second threshold value;
And determining the first target area in the first candidate areas and the second target area in the second candidate areas according to the coordinate information of the first candidate areas and the coordinate information of the second candidate areas.
According to one or more embodiments of the present disclosure, determining the first target region among the plurality of first candidate regions and the second target region among the plurality of second candidate regions according to the coordinate information of the plurality of first candidate regions and the coordinate information of the plurality of second candidate regions includes:
identifying display objects in the interface image to obtain coordinate information of S display objects; s is an integer greater than or equal to 1;
and determining the first target area in the plurality of first candidate areas, and determining the second target area in the plurality of second candidate areas according to the coordinate information of the plurality of first candidate areas, the coordinate information of the plurality of second candidate areas and the coordinate information of the S display objects.
According to one or more embodiments of the present disclosure, the first target area satisfies two conditions:
A first overlapping area exists between the first target area and at least one display object in the S display objects;
and a second overlapping area exists between the first target area and the second target area, and the second overlapping area is positioned at the edge of the first target area.
According to one or more embodiments of the present disclosure, the second target area satisfies two conditions:
a first overlapping area exists between the second target area and at least one display object in the S display objects;
the second target area and the first target area have a second overlapping area, and the second overlapping area is positioned at the edge of the first target area.
According to one or more embodiments of the present disclosure, sliding in the interface image with at least one detection frame to determine N detection areas in the interface image, and determining detection results of the N detection areas, respectively, includes:
sliding the interface image by adopting at least one detection frame through a preset model to determine N detection areas in the interface image, and respectively determining detection results of the N detection areas;
According to the detection results of the N detection areas, determining a first target area occupied by the super-frame object in the interface image, and/or determining a second target area occupied by the super-frame part of the super-frame object, wherein the method comprises the following steps:
determining a first target area occupied by the super-frame object and/or a second target area occupied by a super-frame part of the super-frame object in the interface image according to detection results of the N detection areas through the preset model;
the preset model is obtained by training a plurality of groups of training samples, and each group of training samples comprises: the device comprises a sample image, an area occupied by a super-frame object in the sample image and an area occupied by a super-frame part of the super-frame object in the sample image.
According to one or more embodiments of the present disclosure, the display object is any one of the following: text, icons, images, interface elements.
In a second aspect, according to one or more embodiments of the present disclosure, there is provided an image processing apparatus including:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring an interface image corresponding to a first user interface, and the interface image comprises at least one display object;
The detection module is used for sliding in the interface image by adopting at least one detection frame so as to determine N detection areas in the interface image and respectively determine detection results of the N detection areas; the detection result of each detection area includes: the coordinate information of the detection area, the first probability that the detection area is the area occupied by the super-frame object and/or the second probability that the detection area is the area occupied by the super-frame part of the super-frame object; the N is an integer greater than 1;
the determining module is used for determining a first target area occupied by the super-frame object and/or a second target area occupied by a super-frame part of the super-frame object in the interface image according to the detection results of the N detection areas; the super-frame object is a display object of which at least part of the area exceeds a preset display boundary in the interface image, and the super-frame part is a part of the super-frame object exceeding the preset display boundary.
According to one or more embodiments of the present disclosure, the detection module is specifically configured to:
extracting K features with different scales from the interface image to obtain feature images corresponding to the K scales respectively; the K is an integer greater than 1;
sliding the feature map corresponding to the ith scale by adopting at least one detection frame, so as to determine N_i detection areas in the feature map corresponding to the ith scale, and respectively determine the detection results of the N_i detection areas; N_i is an integer greater than 1;
wherein i sequentially takes the values K, K-1, K-2, …, 1, and N = N_K + N_{K-1} + … + N_1.
According to one or more embodiments of the present disclosure, when i < K, the detection module is specifically configured to:
carrying out fusion treatment on the feature images corresponding to the ith scale and the feature images corresponding to the (i+1) th scale to obtain a feature image corresponding to the ith scale after fusion;
sliding the feature map corresponding to the fused ith scale by adopting at least one detection frame, so as to determine N_i detection areas in the feature map corresponding to the fused ith scale, and respectively determine the detection results of the N_i detection areas.
According to one or more embodiments of the present disclosure, the ith scale is greater than the (i+1) th scale; the detection module is specifically used for:
carrying out up-sampling treatment on the feature map corresponding to the i+1th scale to obtain a sampling feature map, wherein the size of the sampling feature map is the same as that of the feature map corresponding to the i scale;
and carrying out fusion processing on the sampling feature map and the feature map corresponding to the ith scale to obtain the feature map corresponding to the ith scale after fusion.
According to one or more embodiments of the present disclosure, the detection result of each detection region includes: the coordinate information of the detection area, the first probability that the detection area is the area occupied by the super-frame object, and the second probability that the detection area is the area occupied by the super-frame part of the super-frame object; the determining module is specifically configured to:
determining a plurality of first candidate areas and a plurality of second candidate areas in the N detection areas according to the first probability and the second probability corresponding to the N detection areas; the first probability corresponding to the first candidate region is larger than a first threshold value, and the second probability corresponding to the second candidate region is larger than a second threshold value;
and determining the first target area in the first candidate areas and the second target area in the second candidate areas according to the coordinate information of the first candidate areas and the coordinate information of the second candidate areas.
According to one or more embodiments of the present disclosure, the determining module is specifically configured to:
identifying display objects in the interface image to obtain coordinate information of S display objects; s is an integer greater than or equal to 1;
And determining the first target area in the plurality of first candidate areas, and determining the second target area in the plurality of second candidate areas according to the coordinate information of the plurality of first candidate areas, the coordinate information of the plurality of second candidate areas and the coordinate information of the S display objects.
According to one or more embodiments of the present disclosure, the first target area satisfies two conditions:
a first overlapping area exists between the first target area and at least one display object in the S display objects;
and a second overlapping area exists between the first target area and the second target area, and the second overlapping area is positioned at the edge of the first target area.
According to one or more embodiments of the present disclosure, the second target area satisfies two conditions:
a first overlapping area exists between the second target area and at least one display object in the S display objects;
the second target area and the first target area have a second overlapping area, and the second overlapping area is positioned at the edge of the first target area.
According to one or more embodiments of the present disclosure, the detection module is specifically configured to: sliding the interface image by adopting at least one detection frame through a preset model to determine N detection areas in the interface image, and respectively determining detection results of the N detection areas;
The determining module is specifically configured to: determining a first target area occupied by the super-frame object and/or a second target area occupied by a super-frame part of the super-frame object in the interface image according to detection results of the N detection areas through the preset model;
the preset model is obtained by training a plurality of groups of training samples, and each group of training samples comprises: the device comprises a sample image, an area occupied by a super-frame object in the sample image and an area occupied by a super-frame part of the super-frame object in the sample image.
According to one or more embodiments of the present disclosure, the display object is any one of the following: text, icons, images, interface elements.
In a third aspect, according to one or more embodiments of the present disclosure, there is provided an electronic device comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored in the memory, causing the at least one processor to perform the image processing method as described above in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, according to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the image processing method as described above in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, according to one or more embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the image processing method according to the first aspect and the various possible designs of the first aspect.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by substituting the above features with technical features having similar functions disclosed in the present disclosure (but not limited thereto).
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (14)

1. An image processing method, comprising:
acquiring an interface image corresponding to a first user interface, wherein the interface image comprises at least one display object;
sliding the interface image by adopting at least one detection frame to determine N detection areas in the interface image, and respectively determining detection results of the N detection areas; the detection result of each detection area includes: the coordinate information of the detection area, the first probability that the detection area is the area occupied by the super-frame object and/or the second probability that the detection area is the area occupied by the super-frame part of the super-frame object; the N is an integer greater than 1;
according to the detection results of the N detection areas, determining a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame part of the super-frame object in the interface image; the super-frame object is a display object of which at least part of the area exceeds a preset display boundary in the interface image, and the super-frame part is a part of the super-frame object exceeding the preset display boundary.
2. The method of claim 1, wherein sliding the interface image with at least one detection frame to determine N detection areas in the interface image and to determine detection results of the N detection areas, respectively, comprises:
extracting K features with different scales from the interface image to obtain feature images corresponding to the K scales respectively; the K is an integer greater than 1;
sliding the feature map corresponding to the ith scale by adopting at least one detection frame, so as to determine N_i detection areas in the feature map corresponding to the ith scale, and respectively determine the detection results of the N_i detection areas; N_i is an integer greater than 1;
wherein i sequentially takes the values K, K-1, K-2, …, 1, and N = N_K + N_{K-1} + … + N_1.
3. The method according to claim 2, wherein, when i < K, sliding the feature map corresponding to the ith scale by adopting at least one detection frame to determine N_i detection areas in the feature map corresponding to the ith scale, and respectively determining the detection results of the N_i detection areas, comprises:
carrying out fusion treatment on the feature images corresponding to the ith scale and the feature images corresponding to the (i+1) th scale to obtain a feature image corresponding to the ith scale after fusion;
sliding the feature map corresponding to the fused ith scale by adopting at least one detection frame, so as to determine N_i detection areas in the feature map corresponding to the fused ith scale, and respectively determine the detection results of the N_i detection areas.
4. A method according to claim 3, wherein the i-th scale is greater than the i+1-th scale; carrying out fusion processing on the feature map corresponding to the ith scale and the feature map corresponding to the (i+1) th scale to obtain a feature map corresponding to the ith scale after fusion, wherein the fusion processing comprises the following steps:
carrying out up-sampling treatment on the feature map corresponding to the i+1th scale to obtain a sampling feature map, wherein the size of the sampling feature map is the same as that of the feature map corresponding to the i scale;
and carrying out fusion processing on the sampling feature map and the feature map corresponding to the ith scale to obtain the feature map corresponding to the ith scale after fusion.
5. The method according to any one of claims 1 to 4, wherein the detection result of each detection region includes: the coordinate information of the detection area, the first probability that the detection area is the area occupied by the super-frame object, and the second probability that the detection area is the area occupied by the super-frame part of the super-frame object;
According to the detection results of the N detection areas, determining a first target area occupied by the super-frame object and a second target area occupied by the super-frame part of the super-frame object in the interface image, wherein the method comprises the following steps:
determining a plurality of first candidate areas and a plurality of second candidate areas in the N detection areas according to the first probability and the second probability corresponding to the N detection areas; the first probability corresponding to the first candidate region is larger than a first threshold value, and the second probability corresponding to the second candidate region is larger than a second threshold value;
and determining the first target area in the first candidate areas and the second target area in the second candidate areas according to the coordinate information of the first candidate areas and the coordinate information of the second candidate areas.
6. The method of claim 5, wherein determining the first target region among the plurality of first candidate regions and the second target region among the plurality of second candidate regions based on the coordinate information of the plurality of first candidate regions and the coordinate information of the plurality of second candidate regions comprises:
Identifying display objects in the interface image to obtain coordinate information of S display objects; s is an integer greater than or equal to 1;
and determining the first target area in the plurality of first candidate areas, and determining the second target area in the plurality of second candidate areas according to the coordinate information of the plurality of first candidate areas, the coordinate information of the plurality of second candidate areas and the coordinate information of the S display objects.
7. The method of claim 6, wherein the first target region satisfies two conditions:
a first overlapping area exists between the first target area and at least one display object in the S display objects;
and a second overlapping area exists between the first target area and the second target area, and the second overlapping area is positioned at the edge of the first target area.
8. The method of claim 6, wherein the second target region satisfies two conditions:
a first overlapping area exists between the second target area and at least one display object in the S display objects;
the second target area and the first target area have a second overlapping area, and the second overlapping area is positioned at the edge of the first target area.
9. The method according to any one of claims 1 to 8, wherein sliding the interface image with at least one detection frame to determine N detection areas in the interface image, and determining detection results of the N detection areas, respectively, includes:
sliding the interface image by adopting at least one detection frame through a preset model to determine N detection areas in the interface image, and respectively determining detection results of the N detection areas;
according to the detection results of the N detection areas, determining a first target area occupied by the super-frame object in the interface image, and/or determining a second target area occupied by the super-frame part of the super-frame object, wherein the method comprises the following steps:
determining a first target area occupied by the super-frame object and/or a second target area occupied by a super-frame part of the super-frame object in the interface image according to detection results of the N detection areas through the preset model;
the preset model is obtained by training a plurality of groups of training samples, and each group of training samples comprises: the device comprises a sample image, an area occupied by a super-frame object in the sample image and an area occupied by a super-frame part of the super-frame object in the sample image.
10. The method according to any one of claims 1 to 9, wherein the display object is any one of the following: text, icons, images, interface elements.
11. An image processing apparatus, comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring an interface image corresponding to a first user interface, and the interface image comprises at least one display object;
the detection module is used for sliding in the interface image by adopting at least one detection frame so as to determine N detection areas in the interface image and respectively determine detection results of the N detection areas; the detection result of each detection area includes: the coordinate information of the detection area, the first probability that the detection area is the area occupied by the super-frame object and/or the second probability that the detection area is the area occupied by the super-frame part of the super-frame object; the N is an integer greater than 1;
the determining module is used for determining a first target area occupied by the super-frame object and/or a second target area occupied by a super-frame part of the super-frame object in the interface image according to the detection results of the N detection areas; the super-frame object is a display object of which at least part of the area exceeds a preset display boundary in the interface image, and the super-frame part is a part of the super-frame object exceeding the preset display boundary.
12. An electronic device, comprising: a processor and a memory;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory, causing the processor to perform the image processing method according to any one of claims 1 to 10.
13. A computer-readable storage medium, in which computer-executable instructions are stored which, when executed by a processor, implement the image processing method of any one of claims 1 to 10.
14. A computer program product comprising a computer program which, when executed by a processor, implements the image processing method according to any one of claims 1 to 10.