CN117112446B - Editor debugging method and device, electronic equipment and medium

Editor debugging method and device, electronic equipment and medium

Info

Publication number
CN117112446B
CN117112446B (application CN202311331392.5A; published as CN117112446A)
Authority
CN
China
Prior art keywords
editor
feature map
debugging
code
probability
Prior art date
Legal status
Active
Application number
CN202311331392.5A
Other languages
Chinese (zh)
Other versions
CN117112446A (en)
Inventor
苟亚明
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311331392.5A
Publication of CN117112446A
Application granted
Publication of CN117112446B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/362 Software debugging
    • G06F 11/3624 Software debugging by performing operations on the source code, e.g. via a compiler
    • G06F 11/366 Software debugging using diagnostics
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The disclosure provides an editor debugging method, an editor debugging device, an electronic device and a medium. The editor debugging method includes: acquiring a plurality of screenshots of an editor picture; performing multi-level downsampling on pixels in a screenshot to obtain multi-level downsampled pixel blocks, convolving and cascading the downsampled pixel blocks of each level to obtain a cascaded feature representation, and invoking a probability map and threshold prediction model to predict on the cascaded feature representation to obtain a predicted probability map and a first threshold; determining edge information of the code region frame based on a comparison of the predicted probability map with the first threshold, and then invoking a multi-head attention mechanism to determine, for each pixel in the screenshot, a second probability that the pixel is an error location; and determining the error location in the editor picture based on the second probability. Embodiments of the disclosure improve the universality of editor debugging, reduce development load and improve the degree of debugging automation. They can be applied to various scenarios such as automatic editor debugging and intelligent code debugging products.

Description

Editor debugging method and device, electronic equipment and medium
Technical Field
The disclosure relates to the technical field of computers, and in particular relates to an editor debugging method, an editor debugging device, electronic equipment and a medium.
Background
In software development, developers need to write, debug and run programs using various editors. Errors such as crashes, wild pointers and freezes may occur while a program is executed. To locate such errors, a developer may need to develop a dedicated editor plug-in. However, as the business evolves, the editor is upgraded and the plug-in must be upgraded and adapted along with it, which places a heavy load on the developer. Moreover, plug-ins of different editors are not interchangeable, so a developer has to develop a separate plug-in for each editor, further increasing the load. There is therefore a need to improve the universality of editor debugging and to reduce the development cost associated with debugging.
Disclosure of Invention
The embodiments of the disclosure provide an editor debugging method, an editor debugging device, an electronic device and a medium, which improve the universality of editor debugging, reduce development load and improve the degree of debugging automation.
According to an aspect of the present disclosure, there is provided an editor debugging method including:
acquiring a plurality of screen shots of an editor picture, and carrying out region division on the screen shots to obtain a plurality of detection frames, wherein each detection frame corresponds to one frame body of a plurality of frame bodies in the editor picture, and the plurality of frame bodies comprise code region frame bodies;
performing multi-level downsampling processing on pixels in the screen capture to obtain multi-level downsampled pixel blocks, convolving the downsampled pixel blocks at each level and cascading them to obtain a cascaded feature representation, and invoking a probability map and threshold prediction model to predict on the cascaded feature representation to obtain a predicted probability map and a first threshold, wherein the probability map comprises, for each pixel in the screen capture, a first probability that the pixel is a frame boundary pixel;
performing frame edge processing in the screen capture based on the predicted probability map and the comparison of the first threshold value, determining edge information of the code region frame, and calling a multi-head attention mechanism to determine a second probability that each pixel in the screen capture is an error position for the screen capture with the edge information determined;
based on the second probability, an error location in the editor picture is determined.
According to an aspect of the present disclosure, there is provided an editor debugging apparatus including:
the screen capturing unit is used for acquiring a plurality of screens of the editor picture, carrying out region division on the screens to obtain a plurality of detection frames, wherein each detection frame corresponds to one frame body of a plurality of frame bodies in the editor picture, and the plurality of frame bodies comprise code region frame bodies;
The prediction unit is used for carrying out multi-stage downsampling on pixels in the screen capture to obtain multi-stage downsampled pixel blocks, carrying out convolution and cascading on the downsampled pixel blocks of each stage to obtain cascading characteristic representations, and calling a probability map and a threshold prediction model to predict the cascading characteristic representations to obtain a predicted probability map and a first threshold, wherein the probability map comprises a first probability of a frame boundary pixel for each pixel in the screen capture;
a first determining unit configured to perform frame edge processing in the screen shot based on a comparison of the predicted probability map and the first threshold, determine edge information of the code region frame, and invoke a multi-head attention mechanism to determine a second probability that each of the pixels in the screen shot is an error position for the screen shot for which the edge information is determined;
and the second determining unit is used for determining the error position in the editor picture based on the second probability.
Optionally, the screen capturing unit is specifically configured to:
calling an instance segmentation model to divide the region boundaries of the screen shots to obtain a plurality of detection frame boundaries and frame functions corresponding to each detection frame;
And demarcating on the screen shot according to the boundary of the detection frame to obtain a plurality of detection frames, and rendering the detection frames by using the frame body function.
Optionally, the prediction unit is specifically configured to:
taking the downsampled pixel block at the last stage as an equalized pixel block at the last stage;
summing the downsampled pixel blocks of each stage and the pixel blocks which are upsampled by the equalized pixel blocks of the next stage according to elements to obtain the equalized pixel blocks of each stage;
and convolving the equalized pixel blocks of each stage and cascading to obtain the cascading characteristic representation.
Optionally, the prediction unit is specifically further configured to:
convolving the equalized pixel blocks of each stage, and then up-sampling to a pixel block with a preset size;
and cascading the pixel blocks with the preset size to obtain the feature representation after cascading.
Optionally, the prediction unit is specifically further configured to:
determining the corresponding characteristic types of each level;
determining convolution kernels corresponding to each level based on the feature types;
and convolving the downsampled pixel blocks of each stage by using the convolution kernels corresponding to each stage, and cascading convolution results to obtain the cascading feature representation.
Optionally, the first determining unit is specifically configured to:
Determining one of the pixels in the probability map as the frame boundary pixel if the first probability of the pixel is greater than the first threshold;
determining a content type in a closed loop if a plurality of the frame boundary pixels form the closed loop;
and if the content type is code, determining the edge information of the code region frame body as the edge position of the closed loop.
Optionally, the multi-head attention mechanism comprises a first multi-head attention model and a second multi-head attention model;
the first determining unit is specifically configured to:
determining a first context position code of the screen shot of the current period for which the edge information is determined;
determining a second context position code of the screen shot, for which the edge information is determined, of a period next to the current period;
invoking the second multi-head attention model to perform second attention transformation on the second context position code to obtain a first intermediate output;
and calling the first multi-head attention model to perform first attention transformation on the first context position code and the first intermediate output, so as to obtain the second probability.
Optionally, the first determining unit is specifically further configured to:
masking, in the second context position code, the codes at positions after the next period to obtain a masked second context position code;
and calling the second multi-head attention model to perform second attention transformation on the masked second context position code to obtain a first intermediate output.
Optionally, the first determining unit is specifically further configured to:
determining a feature map of the screen shot of the current period for which the edge information is determined, wherein the feature map has a first dimension, a second dimension and a third dimension, the first dimension indicates the feature type of a pixel, the second dimension indicates the line number of the pixel in the screen shot, and the third dimension indicates the column number of the pixel in the screen shot;
carrying out convolution on the feature map, adjusting and normalizing the feature map structure, and obtaining an adjusted feature map;
invoking a transformation model to perform feature transformation on the adjusted feature map to obtain a transformed feature map;
the first context location code is determined based on the transformed feature map.
Optionally, the first determining unit is specifically further configured to:
performing first convolution on the feature map to obtain a feature map after the first convolution;
Performing first feature map structure adjustment on the first convolved feature map to obtain a first structure-adjusted feature map;
normalizing the feature map after the first structure adjustment to obtain a first normalized feature map;
performing element-wise summation of a second structure-adjusted feature map, obtained after performing second feature map structure adjustment on the first normalized feature map, and a third structure-adjusted feature map, obtained after performing third feature map structure adjustment on the feature map, to obtain a first sum feature map;
and performing second convolution on the first sum feature map to obtain the adjusted feature map.
Optionally, the first determining unit is specifically further configured to:
performing third convolution on the transformed feature map to obtain a second convolved feature map, wherein the second dimension and the third dimension of the second convolved feature map are 1;
and summing, element-wise, the feature map and the second convolved feature map to obtain the first context position code.
Optionally, before convolving, adjusting and normalizing the feature map to obtain an adjusted feature map, the first determining unit is specifically further configured to:
Performing fourth convolution in the width direction and fifth convolution in the height direction on the feature map to obtain a feature map after third convolution;
and calling a residual processing module to perform residual processing on the third convolved feature map.
Optionally, the residual processing module includes a convolution input channel and a convolution output channel connected in series, and a short-circuit path for shorting the convolution input channel and the convolution output channel connected in series;
the first determining unit is specifically further configured to: and introducing the third convolved feature map into the short-circuit path, wherein when the residual processing module is trained, the inverse gradient during training is sequentially propagated through the convolution output channel and the convolution input channel.
Optionally, the first determining unit is specifically further configured to:
inputting the first context position code and the normalized output into the first multi-head attention model to obtain a first attention output;
and inputting the first attention output into a linear regression model through a feed-forward channel to obtain the second probability.
Optionally, the first determining unit is specifically further configured to:
acquiring header information of the editor picture based on the first contextual location encoding and the first intermediate output using the first multi-headed attention model;
Acquiring the type of the editor based on the header information;
determining a debugging rule based on the type of the editor;
acquiring a work log in the editor picture based on the first context location code and the first intermediate output using the first multi-headed attention model;
determining, with the first multi-headed attention model, the second probability based on the debug rules and the work log.
Optionally, after determining the error location in the editor's picture based on the second probability, the editor debugging device further comprises: a third determination unit configured to:
determining an operation stack corresponding to the error position;
adding a debugging breakpoint in the running stack;
running the debug breakpoint to determine a function body location that caused the error.
Optionally, after running the debug breakpoint, the third determining unit is further configured to:
if the running of the debugging breakpoint fails, identifying a fragment associated with the error position;
and storing the identified fragments to increase the fault tolerance of the editor debugging.
According to an aspect of the present disclosure, there is provided an electronic device including a memory storing a computer program and a processor implementing an editor debugging method as described above when executing the computer program.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements an editor debugging method as described above.
According to an aspect of the present disclosure, there is provided a computer program product comprising a computer program that is read and executed by a processor of a computer device, causing the computer device to perform the editor debugging method as described above.
In the embodiments of the present disclosure, whatever the editor, screen capture is employed uniformly, and rough detection boxes that substantially coincide with the frames in the screenshot are generated on it. On this basis, the pixels in the screenshot are downsampled over multiple levels and the results are convolved and cascaded; the resulting cascaded feature representation is input into the probability map and threshold prediction model to obtain a probability map and a first threshold, from which it can be accurately determined whether each pixel in the screenshot is a boundary point, so that the edge of the code region frame is determined accurately (the previous step only produced rough, imprecise detection boxes). The edges of the code region frame are determined because the actual code sits inside this frame, and only there can the code error location be detected. Then, for the screenshot whose edge information has been determined, a second probability that each pixel in the screenshot is an error location is determined based on the multi-head attention mechanism, and the error location is determined based on the second probability, thereby achieving the purpose of debugging. The above process is fully automated and does not require a developer to write a different plug-in for each editor. When the editor changes, the method of the embodiments of the disclosure, by continuously capturing the screen and executing automatically on the screenshots, can debug any editor uniformly, which improves the universality of editor debugging. Likewise, when the editor is upgraded, only the screenshots change; because the method is executed automatically on continuously captured screenshots, there is no need to rewrite a plug-in for every editor upgrade, which reduces development load and improves the degree of debugging automation.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the disclosure. The objectives and other advantages of the disclosure will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosed embodiments and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain, without limitation, the disclosed embodiments.
FIG. 1 is an architectural diagram of a system to which an editor debugging method applies in accordance with an embodiment of the present disclosure;
FIGS. 2A-2C are interface diagrams of scenarios in which an editor debugging method is applied to code writing and debugging, according to embodiments of the present disclosure;
FIG. 3 is a general flow diagram of an editor debugging method in accordance with one embodiment of the disclosure;
FIG. 4 is a schematic diagram of an editor picture;
FIGS. 5A-5B are schematic diagrams of specific implementations of step 310 of FIG. 3 for generating a detection box using an instance segmentation model;
FIGS. 6A-6B are schematic diagrams showing an embodiment of the step 310 of FIG. 3 for generating a detection frame using a predetermined segmentation template;
FIG. 7 is a schematic diagram of a first implementation of step 320 in FIG. 3;
FIG. 8 is a particular flow chart of step 320 of FIG. 3 utilizing pixel-wise summation to obtain a cascaded characteristic representation;
FIG. 9 is a schematic diagram of a specific implementation of steps 810-830 in FIG. 8;
FIG. 10 is a specific flow chart of step 320 of FIG. 3 using feature types to obtain a concatenated feature representation;
FIG. 11 is a schematic diagram of a second implementation of step 320 of FIG. 3 based on FIG. 7;
FIG. 12A is a schematic diagram of a specific implementation of step 330 of FIG. 3 in which a closed loop is determined based on a comparison of a first probability for each pixel in the probability map with a first threshold;
FIG. 12B is a schematic diagram of a specific implementation of step 330 of FIG. 3 with respect to determining a content type in a closed loop;
FIG. 12C is a schematic diagram showing an implementation of step 330 in FIG. 3 for determining edge information of a code region frame based on a closed loop of content type code;
FIG. 13 is a specific flowchart of step 330 of FIG. 3 for determining a second probability that each pixel is an error location;
FIG. 14 is a schematic diagram of an implementation of the determination of the second probability of FIG. 13 using a first multi-headed attention model and a second multi-headed attention model;
FIG. 15A is a schematic diagram of a second probability that each pixel in the screen shot is an error location;
FIG. 15B is a schematic illustration of the determined error location of the screen shot;
FIG. 16 is a first particular flow chart of step 1310 of FIG. 13 for determining a first context location code;
FIG. 17 is a schematic diagram of an implementation of step 1310 of FIG. 13 for determining a first context location code;
FIG. 18 is a second particular flowchart of step 1310 of FIG. 13 for determining a first context location code based on FIG. 16;
FIGS. 19A-19B are schematic diagrams illustrating embodiments of step 1810 of FIG. 18;
FIG. 19C is a schematic diagram of step 1820 of FIG. 18 with respect to a residual processing model;
FIG. 20 is a first particular flow chart of step 1340 of FIG. 13 for determining a second probability;
FIG. 21 is a specific flowchart of step 2020 in FIG. 20;
FIG. 22 is a schematic diagram showing a specific implementation of steps 2110-2120 of FIG. 21;
FIG. 23 is a schematic diagram of a specific implementation of determining a second probability based on a feature map of a current period and a period next to the current period;
FIG. 24 is a second particular flow chart of step 1340 of FIG. 13 for determining a second probability;
FIG. 25 is a specific flow chart following step 340 with respect to determining the location of the error-causing function body;
FIG. 26 is a first overall implementation detail of an editor debugging method of an embodiment of the disclosure;
FIG. 27 is a second overall implementation detail of an editor debugging method of an embodiment of the disclosure;
FIG. 28 is an application example diagram of an editor debugging method of an embodiment of the disclosure;
FIG. 29 is a block diagram of an editor debugging apparatus in accordance with an embodiment of the disclosure;
FIG. 30 is a terminal block diagram of performing the editor debugging method shown in FIG. 3 in accordance with an embodiment of the disclosure;
FIG. 31 is a server configuration diagram for performing the editor debugging method shown in FIG. 3 according to an embodiment of the disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present disclosure.
Before proceeding to further detailed description of the disclosed embodiments, the terms and terms involved in the disclosed embodiments are described, which are applicable to the following explanation:
optical character recognition (Optical Character Recognition, OCR): is a computer vision technique for identifying and extracting text from an image or scanned document. OCR technology can convert unstructured image data into structured text data, thereby enabling a computer to understand and process such data.
An editor: also known as a code editor, refers to software or an application used to write code. There are a variety of editors, such as PyCharm, Android Studio, and Xcode. The editors used by developers on different ends differ. For example, backend development generally uses PyCharm, Android development uses Android Studio, and iOS development uses Xcode.
Editor debugging method: an editor processing technique, referring to a debugging method employed to identify the location of errors in the code in an editor. Typically, different editors have to be provided with corresponding editor plug-ins in order to enable editor debugging, but the use of editor plug-ins increases the development burden on the developer and has poor debugging generality. Furthermore, an editor debugging method is very important in any scenario where an editor is used to write code, so it is necessary to keep exploring new technologies to ensure that editor debugging can be performed more efficiently.
System architecture and scenario description applied to embodiments of the present disclosure
Fig. 1 is a system architecture diagram to which an editor debugging method according to an embodiment of the present disclosure is applied. It includes an object terminal 140, the internet 130, a gateway 120, a server 110, etc.
The object terminal 140 is a device for displaying the writing interface and the debugging interface of an editor. It may be a desktop computer, a notebook computer, a tablet computer, a PDA (personal digital assistant), a mobile phone, a vehicle-mounted terminal and the like. It can be a single device or a set of a plurality of devices. For example, a plurality of devices connected via a LAN and sharing a display device for cooperative work may together form the object terminal 140. The object terminal 140 may also communicate with the internet 130 in a wired or wireless manner to exchange data.
Server 110 refers to a computer system that can perform debugging based on an editor picture. Compared with the general object terminal 140, the server 110 is required to have higher stability, security, performance, and the like. The server 110 may be one high-performance computer in a network platform, a cluster of multiple high-performance computers, a portion of one high-performance computer (e.g., a virtual machine), a combination of portions of multiple high-performance computers (e.g., virtual machines), etc.
Gateway 120 is also known as an intersubnetwork connector, protocol converter. The gateway realizes network interconnection on a transmission layer and is a computer system or device which acts as a conversion function. The gateway is a translator between two systems using different communication protocols, data formats or languages, and even architectures that are quite different. At the same time, the gateway may also provide filtering and security functions. The message transmitted from the object terminal 140 to the server 110 is transmitted to the corresponding server 110 through the gateway 120. A message sent by the server 110 to the subject terminal 140 is also sent to the corresponding subject terminal 140 through the gateway 120.
The embodiment of the disclosure can be applied to various scenes, such as the scenes of code writing and debugging shown in fig. 2A-2C.
As shown in fig. 2A, when an object writes code in the editor, lines 234 to 243 of the code are shown (lines 1 to 233 are not shown). After the object completes the code writing, if lines 1 to 243 of the code need to be debugged, the run button on the editor can be clicked, so that lines 1 to 243 are run. During the running of lines 1 through 243, the program may terminate because the code is in error somewhere. The program termination interface of the editor is shown in fig. 2B.
In fig. 2B, the object can learn the program termination reason "EXC_BAD_ACCESS" from the console at the lower right of the editor. After the reason for program termination is known, stack information is acquired from the stack information on the left of the editor, a debug breakpoint is added at the line of code corresponding to the stack information, and the run button is then clicked so that the lines of code before the debug breakpoint run. If the lines before the debug breakpoint run successfully, they are proved to be free of errors, the next piece of stack information is acquired, and the next debug breakpoint is determined. For example, the stack information acquired is "7 -[TMFXLogger appenderOpenWithTimespan]", and the corresponding line of code is line 238. A debug breakpoint is added at line 238 (the small black dot at line 238 shown in fig. 2B) and the run button is then clicked, causing lines 1 to 238 to run. During the running of lines 1 to 238, the program termination reason "EXC_BAD_ACCESS" appears again, indicating that line 238 is in error. The debug results interface of the editor is shown in fig. 2C.
In FIG. 2C, the object may learn from the left side of the editor that the error stack is "7 -[TMFXLogger appenderOpenWithTimespan]". The object can learn from the code area of the editor that the error location is in line 238, and line 238 is specifically a function body, so the erroneous function body is "application-open(self.LogPath.UTF8String, self.configuration.namePrefix.UTF8String, self.configuration.publickey.UTF8String)".
General description of embodiments of the disclosure
According to one embodiment of the present disclosure, an editor debugging method is provided.
The editor debugging method refers to the process of finding the code error location from the editor picture. On the one hand, editors of different types typically have different editor pictures. On the other hand, the editor picture of the same type of editor may also differ because of version update iterations and the like. The embodiments of the disclosure debug on the basis of the editor picture and can therefore adapt to editors of different types and different versions, which greatly improves the universality of debugging. In addition, since the embodiments of the disclosure distinguish editors of different types and versions based on the editor picture, there is no need to develop a corresponding plug-in for each editor for debugging, which reduces the development burden.
The method for debugging the editor in the embodiment of the disclosure may be performed in the server 110, or may be partially performed in the server 110, or partially performed in the object terminal 140.
As shown in fig. 3, according to one embodiment of the present disclosure, an editor debugging method includes:
step 310, acquiring a plurality of screen shots of an editor picture, and dividing areas on the screen shots to obtain a plurality of detection frames, wherein each detection frame corresponds to one of a plurality of frames in the editor picture, and the plurality of frames comprise code area frames;
step 320, performing multi-level downsampling processing on pixels in the screen capture to obtain multi-level downsampled pixel blocks, convolving the multi-level downsampled pixel blocks and cascading them to obtain a cascaded feature representation, and invoking a probability map and threshold prediction model to predict on the cascaded feature representation to obtain a predicted probability map and a first threshold, wherein the probability map comprises, for each pixel in the screen capture, a first probability that the pixel is a frame boundary pixel;
step 330, based on the comparison of the predicted probability map and the first threshold, performing frame edge processing in the screen capture, determining edge information of the code region frame, and calling a multi-head attention mechanism to determine a second probability that each pixel in the screen capture is an error position for the screen capture with the edge information determined;
step 340, determining an error position in the editor picture based on the second probability.
The following is a detailed description of steps 310-340.
Detailed description of step 310
Step 310 may be divided into a first half and a second half. The first half of step 310 includes: acquiring a plurality of screenshots of the editor picture. The second half of step 310 includes: carrying out region division on the screenshot to obtain a plurality of detection frames. Each detection frame corresponds to one of a plurality of frames in the editor picture, and the plurality of frames include a code region frame.
The editor picture refers to the interface displayed in real time when an object uses an editor for code writing and/or code debugging. The editor picture includes a plurality of frames, including a header information area frame, a code area frame, a work log area frame, and a menu bar area frame. As shown in fig. 4, the editor picture includes a frame 410, a frame 420, a frame 430, and a frame 440. The frame 410 corresponds to the header information area frame of the editor picture. The header information in the header information area frame includes the project name, the running environment, the compiling process, the project branch, and the like. For example, in FIG. 4, the project name is TMFDemo, the running environment is iPhone 14 Pro, and the project branch is develop. The frame 420 corresponds to the code area frame of the editor picture. The code area frame contains code information. For example, in fig. 4, the code information contains lines 234 to 243 of the code. The frame 430 corresponds to the work log area frame of the editor picture. The work log area frame contains the work log. For example, in fig. 4, the work log is "program termination reason: EXC_BAD_ACCESS". The frame 440 corresponds to the menu bar area frame of the editor picture. The menu bar in the menu bar area frame includes files, edit, view, find, navigation, product, the file resource area, third-party libraries, and the like. For example, in FIG. 4, the files include Thread 1, Thread 2, Thread 3, Thread 4, Thread 5, and the like.
It should be noted that, although the editor picture has a plurality of frames, in the embodiment of the disclosure, debugging is mainly implemented by using code area frames in the plurality of frames, and the remaining frames are used for providing auxiliary effects.
Since the frame 420 of the editor picture is limited in scope, it is not possible to show the entire code. As shown in fig. 4, the frame 420 in one editor picture shows the 234 to 243 lines of codes, but cannot show the 1 to 233 lines of codes. Since code runs are typically performed in a front-to-back order, it is necessary to obtain complete code. To this end, in the first half of step 310, the editor's picture is screenshot based on a predetermined rule, resulting in a plurality of screenshots.
The predetermined rule refers to a rule set in advance for taking screenshots of the editor picture. In one embodiment, a screenshot of the editor picture is taken every predetermined period. The predetermined period may be 1 second, 2 seconds, etc. In another embodiment, a screenshot of the editor picture is taken in response to detecting a newly added line in the code region. For example, a screenshot of the editor picture is taken in response to detecting that line 244 has been newly added to the code region.
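As an illustration of the periodic rule only, the following sketch captures the editor picture once per fixed period. It assumes the third-party mss library for screen grabbing; the 1-second period, the capture count and the file naming are illustrative placeholders rather than values prescribed by the disclosure.

```python
# Minimal sketch: periodic screen capture of the editor picture.
# Assumes the third-party "mss" package; period, count and file names
# are illustrative placeholders.
import time

import mss
import mss.tools


def capture_editor_screenshots(num_captures: int = 10, period_s: float = 1.0):
    """Grab a screenshot of the primary monitor every `period_s` seconds."""
    paths = []
    with mss.mss() as screen:
        for i in range(num_captures):
            shot = screen.grab(screen.monitors[1])  # full primary monitor
            path = f"editor_capture_{i:04d}.png"
            mss.tools.to_png(shot.rgb, shot.size, output=path)
            paths.append(path)
            time.sleep(period_s)
    return paths
```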
It will be appreciated that there is typically a distinction between code region frames among multiple screenshots, with the other frames being substantially identical. In order to be able to accurately identify the code region frames in the screenshot, in the latter half of step 310, a plurality of detection frames are generated on the screenshot, each detection frame corresponding to one of the plurality of frames in the editor's picture.
A specific way of generating the detection frames is to process the screenshot with a neural network model, annotating each detection frame and the corresponding frame body function on the screenshot. These annotations may be bounding boxes or pixel-level segmentations, represented by a detection box. Since the types of editors can be enumerated, the neural network model can be an instance segmentation model such as Mask R-CNN, YOLACT, and the like.
In one embodiment, the second half of step 310 includes:
calling an instance segmentation model to divide regional boundaries of the screen shots to obtain a plurality of detection frame boundaries and frame functions corresponding to each detection frame;
and demarcating on the screen shot according to the boundary of the detection frames to obtain a plurality of detection frames, and rendering the detection frames by using the frame body function.
The instance segmentation model refers to a neural network model that performs both object detection and semantic segmentation. Specifically, in the screenshot, the object detection function of instance segmentation is used to detect objects and obtain detection frame boundaries, and the semantic segmentation function is then used to label each detection frame and obtain its frame body function.
Corresponding operations may be performed based on the detection of the box boundaries and the box functions of the output of the instance segmentation model. In this embodiment, the detection frames are mainly delimited on the screen shot according to the detection frame boundaries, so as to obtain a plurality of detection frames, and the detection frames are rendered by using the frame body function. The rendering here may be highlighting a specific area, extracting relevant information, etc.
As shown in fig. 5A, after inputting the screenshot into the instance segmentation model, 4 bounding boxes are obtained, namely bounding box 510a, bounding box 520a, bounding box 530a, and bounding box 540a. The frame function corresponding to the detection frame boundary 510a is header information. The frame function corresponding to the detection frame boundary 520a is code information. The frame function corresponding to the detection frame boundary 530a is log information. The frame function corresponding to the detection frame boundary 540a is menu bar information.
As shown in fig. 5B, a detection frame 510B is obtained by drawing a detection frame boundary 510a on the screen shot, and the detection frame 510B is rendered with the frame function "header information". The detection frame 520b is obtained by drawing a border on the screen shot according to the detection frame border 520a, and the detection frame 520b is rendered with the frame body function of code information. The detection frame 530b is obtained by drawing a border on the screen shot according to the detection frame border 530a, and the detection frame 530b is rendered with the frame body function of "log information". The detection frame 540b is obtained by drawing a border on the screen shot according to the detection frame border 540a, and the detection frame 540b is rendered with the frame body function of "menu bar information".
Referring to fig. 4, a detection frame 510b corresponds to the frame 410, a detection frame 520b corresponds to the frame 420, a detection frame 530b corresponds to the frame 430, and a detection frame 540b corresponds to the frame 440.
This embodiment has the advantage that the generation of detection frame boundaries and the identification of frame body functions are realized with an instance segmentation model, which improves the accuracy of detection frame generation. The parameters of the model can be flexibly adjusted, which improves the flexibility and adaptability of generating detection frames for screenshots. In addition, the use of the instance segmentation model reduces loss function computation, provides a basis for the subsequent determination of the error position, and reduces the operation time spent on feature extraction.
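As a hedged sketch of this embodiment, region division could be done with an off-the-shelf instance segmentation network such as torchvision's Mask R-CNN, assumed here to have been fine-tuned on screenshots labeled with the four frame body classes; the class list, score threshold and checkpoint handling are assumptions for illustration, not specifics of the disclosure.

```python
# Hedged sketch: region division of a screenshot with an instance segmentation
# model (torchvision Mask R-CNN, assumed fine-tuned on labeled editor screenshots).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Assumed label mapping for the four frame bodies (background is class 0).
FRAME_CLASSES = {1: "header information", 2: "code information",
                 3: "log information", 4: "menu bar information"}

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=5)
model.eval()  # in practice, load fine-tuned weights before inference


def detect_frames(screenshot_path: str, score_threshold: float = 0.7):
    """Return (detection box, frame body function) pairs for one screenshot."""
    image = to_tensor(Image.open(screenshot_path).convert("RGB"))
    with torch.no_grad():
        prediction = model([image])[0]
    frames = []
    for box, label, score in zip(prediction["boxes"], prediction["labels"],
                                 prediction["scores"]):
        if score >= score_threshold:
            frames.append((box.tolist(), FRAME_CLASSES.get(int(label), "unknown")))
    return frames
```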
In another embodiment, the second half of step 310 includes: and obtaining a detection frame template corresponding to the screen capture, demarcating the screen capture by using a detection frame boundary on the detection frame template to obtain a plurality of detection frames, and rendering the detection frames by using a frame body function corresponding to the preset detection frame boundary.
In this embodiment, a common detection frame template can be derived statistically from the several frames commonly found in screenshots, so that detection frame generation on the screenshot is realized with the detection frame template. The detection frame template carries detection frame boundaries and frame body functions. As shown in fig. 6A, the detection frame template has a detection frame boundary 610a, a detection frame boundary 620a, a detection frame boundary 630a, and a detection frame boundary 640a. The corresponding frame body functions are, in order, header information, code information, log information and menu bar information.
As shown in fig. 6B, a detection box 610b is obtained as the detection box boundary 610a is drawn on the screenshot, and the detection box 610b is rendered with the box function "header information". The boundary of the detection box 620a is drawn on the screenshot and the detection box 620b is rendered with the box function "code information". The boundary of the detection box 630a is drawn on the screenshot and the detection box 630b is rendered with the box function "log information". The detection box 640b is rendered with the box function "menu bar information" as the detection box boundary 640a is drawn on the screenshot.
Referring to fig. 4, a detection frame 610b corresponds to the frame 410, a detection frame 620b corresponds to the frame 420, a detection frame 630b corresponds to the frame 430, and a detection frame 640b corresponds to the frame 440.
The method has the advantages that the detection frame generation can be realized on the screen capture based on the detection frame template, and the generation efficiency is improved.
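A minimal sketch of the template-based embodiment is given below; the normalized (left, top, right, bottom) coordinates of the template are purely illustrative assumptions and would in practice come from statistics over common editor layouts.

```python
# Hedged sketch: region division with a predetermined detection frame template.
# The normalized box coordinates are illustrative assumptions only.
from PIL import Image

FRAME_TEMPLATE = {
    "header information":   (0.00, 0.00, 1.00, 0.08),
    "menu bar information": (0.00, 0.08, 0.18, 1.00),
    "code information":     (0.18, 0.08, 1.00, 0.78),
    "log information":      (0.18, 0.78, 1.00, 1.00),
}


def divide_by_template(screenshot_path: str):
    """Crop each frame body out of the screenshot using the fixed template."""
    image = Image.open(screenshot_path)
    width, height = image.size
    regions = {}
    for function, (left, top, right, bottom) in FRAME_TEMPLATE.items():
        box = (int(left * width), int(top * height),
               int(right * width), int(bottom * height))
        regions[function] = image.crop(box)  # cropped detection frame for this function
    return regions
```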
Detailed description of step 320
In step 320, multi-level downsampling processing is performed on pixels in the screen capture to obtain multi-level downsampled pixel blocks; the multi-level downsampled pixel blocks are convolved and cascaded to obtain a cascaded feature representation; and the probability map and threshold prediction model is invoked to predict on the cascaded feature representation to obtain a predicted probability map and a first threshold, wherein the probability map comprises, for each pixel in the screen capture, a first probability that the pixel is a frame boundary pixel.
Downsampling refers to the operation of compressing/shrinking the pixels in a screenshot. Downsampling may be implemented with a convolutional layer with a step size (stride) of 2: the image produced by the convolution is made smaller in order to extract features. Downsampling is a lossy process. A pooling layer is not learnable, so replacing pooling with a learnable convolution layer with a stride of 2 gives better results at the cost of some extra computation. Downsampling may also be implemented with a pooling layer with a step size (stride) of 2: pooling downsampling reduces the dimensionality of the features. Examples are max-pooling and average-pooling; max-pooling is currently commonly used because it is simple to compute and preserves texture features better.
Multi-level downsampling refers to compressing/shrinking the pixels in a screenshot over multiple levels. For example, a 32×32 screenshot downsampled over 5 levels with a step size of 2 yields the pixel blocks after 5 levels of downsampling, which are 16×16, 8×8, 4×4, 2×2 and 1×1 in order.
As shown in fig. 7, the pixels in the screenshot are downsampled over 5 levels with a step size of 2, giving the pixel blocks after 5 levels of downsampling. The size of the pixel block after the 1st-level downsampling is 1/2 of the screenshot. The size of the pixel block after the 2nd-level downsampling is 1/4 of the screenshot. The size of the pixel block after the 3rd-level downsampling is 1/8 of the screenshot. The size of the pixel block after the 4th-level downsampling is 1/16 of the screenshot. The size of the pixel block after the 5th-level downsampling is 1/32 of the screenshot.
The 2nd-level, 3rd-level, 4th-level and 5th-level downsampled pixel blocks are then convolved respectively, and the convolution results are cascaded to obtain the cascaded feature representation. Finally, the cascaded feature representation is input into the probability map and threshold prediction model to obtain the predicted probability map and the first threshold. The probability map and the first threshold are described in detail later and are not expanded on here.
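The five-level, stride-2 convolutional downsampling described above can be sketched as follows; the channel width of 64 is an illustrative assumption rather than a value from the disclosure.

```python
# Hedged sketch: 5-level downsampling of a screenshot with learnable stride-2
# convolutions (channel width is an illustrative assumption).
import torch
import torch.nn as nn


class DownsamplingPyramid(nn.Module):
    def __init__(self, in_channels: int = 3, width: int = 64):
        super().__init__()
        layers = []
        channels = in_channels
        for _ in range(5):
            layers.append(nn.Sequential(
                nn.Conv2d(channels, width, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(width),
                nn.ReLU(inplace=True)))
            channels = width
        self.levels = nn.ModuleList(layers)

    def forward(self, screenshot: torch.Tensor):
        # Returns the pixel blocks after levels 1..5 (1/2, 1/4, ..., 1/32 size).
        blocks = []
        x = screenshot
        for level in self.levels:
            x = level(x)
            blocks.append(x)
        return blocks
```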
In one embodiment, referring to fig. 8, convolving each stage of downsampled pixel blocks and concatenating to obtain a concatenated feature representation, comprising:
step 810, taking the pixel block after the last level downsampling as the pixel block after the last level equalization;
step 820, summing, element-wise, the downsampled pixel block at each level with the upsampled version of the equalized pixel block at the next level, to obtain the equalized pixel block at each level;
step 830, convolving the equalized pixel blocks at each level and cascading them to obtain the cascaded feature representation.
In particular, upsampling is the opposite of downsampling and refers to the operation of enlarging/expanding the pixels in a screenshot. Upsampling may be achieved by interpolation: bilinear interpolation is generally used, but nearest-neighbor interpolation, trilinear interpolation, etc. are also possible. Upsampling may also be implemented with transposed convolution, also called deconvolution (TransposeConv): by filling the intervals between pixel values of the screenshot with zeros and then performing a standard convolution, the output can be made larger than the input.
The convolution is implemented by a convolution layer/kernel. The convolution layer/kernel consists of several convolution units, the parameters of each of which are optimized by a back-propagation algorithm. The purpose of convolution operations is to extract different features of the input, and a first layer of convolution may only extract some low-level features, such as edges, lines, and corners, from which a network of more layers can iteratively extract more complex features.
As shown in fig. 9, the 5th level is the last level, and the pixel block after the 5th-level downsampling is used directly as the pixel block after the 5th-level equalization. The pixel block after the 4th-level downsampling and the upsampled pixel block after the 5th-level equalization are summed element-wise (the element-wise sum is shown in fig. 9) to obtain the pixel block after the 4th-level equalization. The pixel block after the 3rd-level downsampling and the upsampled pixel block after the 4th-level equalization are summed element-wise to obtain the pixel block after the 3rd-level equalization. The pixel block after the 2nd-level downsampling and the upsampled pixel block after the 3rd-level equalization are summed element-wise to obtain the pixel block after the 2nd-level equalization. The pixel block after the 1st-level downsampling does not participate in this processing. Finally, the 2nd-level, 3rd-level, 4th-level and 5th-level equalized pixel blocks are convolved respectively and then cascaded to obtain the cascaded feature representation.
The benefit of the embodiment of steps 810-830 is that the equalized pixel block at each level contains not only the pixel information at the present level, but also the pixel information at the next level, which improves not only the carrying capacity of the pixel information, but also the accuracy of the feature representation after cascading.
In one embodiment, step 830 includes:
convolving the equalized pixel blocks at each level and then upsampling them to pixel blocks of a predetermined size;
and cascading the pixel blocks of the predetermined size to obtain the cascaded feature representation.
As shown in fig. 9, the size of the 2nd-level equalized pixel block is 1/4 of the screenshot, the size of the 3rd-level equalized pixel block is 1/8 of the screenshot, the size of the 4th-level equalized pixel block is 1/16 of the screenshot, and the size of the 5th-level equalized pixel block is 1/32 of the screenshot. Assume that the predetermined size is 1/4 of the screenshot. The 2nd-level equalized pixel block then needs no upsampling after convolution. The 3rd-level equalized pixel block is convolved and then upsampled by a factor of 2. The 4th-level equalized pixel block is convolved and then upsampled by a factor of 4. The 5th-level equalized pixel block is convolved and then upsampled by a factor of 8. Finally, the pixel blocks of all levels, each now 1/4 of the screenshot in size, are cascaded to obtain the cascaded feature representation.
The advantage of this embodiment is that the size of the pixel blocks is made the same by upsampling, which reduces the cascading difficulty while ensuring the information carrying capacity.
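Putting steps 810-830 and the fixed-size cascading together, one possible sketch (building on the DownsamplingPyramid above, with an assumed channel width and a 1/4-screenshot target size) is:

```python
# Hedged sketch of steps 810-830: equalized pixel blocks are built by element-wise
# summation with the upsampled next-level block; each equalized block is convolved,
# upsampled to the predetermined (1/4-screenshot) size and cascaded (concatenated).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CascadedFeature(nn.Module):
    def __init__(self, width: int = 64, out_width: int = 64):
        super().__init__()
        # one 3x3 convolution per participating level (levels 2 to 5)
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, out_width, kernel_size=3, padding=1) for _ in range(4)])

    def forward(self, blocks):
        # blocks: level-1..level-5 downsampled pixel blocks (1/2 ... 1/32 size)
        b2, b3, b4, b5 = blocks[1:]            # the level-1 block does not participate
        eq5 = b5                               # last level: equalized = downsampled
        eq4 = b4 + F.interpolate(eq5, size=b4.shape[-2:], mode="nearest")
        eq3 = b3 + F.interpolate(eq4, size=b3.shape[-2:], mode="nearest")
        eq2 = b2 + F.interpolate(eq3, size=b2.shape[-2:], mode="nearest")
        target = b2.shape[-2:]                 # predetermined size: 1/4 of the screenshot
        feats = [F.interpolate(conv(eq), size=target, mode="nearest")
                 for conv, eq in zip(self.convs, (eq2, eq3, eq4, eq5))]
        return torch.cat(feats, dim=1)         # cascaded feature representation
```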
In the above embodiment, the same convolution kernel may be used when convolving the equalized pixel block at each level. In one embodiment, different convolution kernels should instead be used, taking into account the difference in the pixel information carried by the equalized pixel blocks at different levels, so that deeper potential features can be extracted.
In a specific implementation of this embodiment, referring to fig. 10, the downsampled pixel blocks of each stage are convolved and concatenated to obtain a concatenated feature representation, including:
step 1010, determining the feature types corresponding to each level;
step 1020, determining convolution kernels corresponding to each level based on the feature types;
step 1030, convolving the downsampled pixel block at each stage with the convolution kernels corresponding to each stage, and cascading the convolution results to obtain the cascading characteristic representation.
In this embodiment, the feature type refers to the type of pixel feature contained in the pixel block after each level of downsampling. Pixel features include edge features, texture features, shape features, and the like. Different feature types call for different convolution kernels. Feature types include the state of the editor, the work area distribution, the menu bar, the code, the header information, etc. Different convolution kernels differ mainly in size, shape, and parameters, and can capture different pixel features in the downsampled pixel block. When the downsampled pixel block at each level is convolved, the convolution kernel corresponding to that level is applied to extract the corresponding pixel features. Finally, the convolution results are cascaded to obtain the cascaded feature representation.
The advantage of the embodiment of steps 1010-1030 is that the convolution kernels required for different levels are distinguished, so that different levels of differential convolution are realized, and the pixel characteristics contained in the convolution results of different levels are emphasized, so that the accuracy of the feature representation after cascading is improved.
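A possible way to realize steps 1010-1030 is to register one convolution per level with a level-specific kernel; the mapping from level to kernel size below is an assumption chosen only to illustrate the idea of differentiated convolution.

```python
# Hedged sketch of steps 1010-1030: convolve each level's downsampled pixel block
# with a level-specific kernel, then cascade. The kernel sizes per level are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

LEVEL_KERNELS = {2: 1, 3: 3, 4: 5, 5: 7}  # assumed level -> kernel size mapping


class LevelSpecificCascade(nn.Module):
    def __init__(self, width: int = 64, out_width: int = 64):
        super().__init__()
        self.convs = nn.ModuleDict({
            str(level): nn.Conv2d(width, out_width, kernel_size=k, padding=k // 2)
            for level, k in LEVEL_KERNELS.items()})

    def forward(self, blocks):
        b2, b3, b4, b5 = blocks[1:]
        target = b2.shape[-2:]  # bring every result to a common size before cascading
        feats = [F.interpolate(self.convs[str(level)](block), size=target, mode="nearest")
                 for level, block in zip((2, 3, 4, 5), (b2, b3, b4, b5))]
        return torch.cat(feats, dim=1)
```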
After the cascade feature representation is obtained, inputting the cascade feature representation into a probability map and a threshold prediction model to obtain a predicted probability map and a first threshold. The probability map contains, for each pixel in the screen capture, a first probability that the pixel is a frame boundary pixel.
The probability map and threshold prediction model refers to a model capable of performing probability map prediction and threshold prediction on the cascaded feature representation. It can be a deep learning model commonly used for text detection, such as a DBNet (Differentiable Binarization Network) model. The DBNet model is designed for scene text detection tasks and can efficiently detect text regions in images, especially irregular text in natural scenes. Embodiments of the present disclosure use DBNet to predict a probability map and a first threshold for the cascaded feature representation. The generated probability map predicts, for region segmentation, a first probability that each pixel is a frame boundary pixel. The first threshold is a threshold on the probability that a pixel is a frame boundary pixel and assists in frame boundary segmentation.
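The probability map and threshold prediction can be illustrated with the differentiable binarization idea of DBNet: two heads predict a probability map P and a threshold map T from the cascaded feature, and an approximate binary map B = 1 / (1 + exp(-k(P - T))) is computed from them. The sketch below follows the DBNet paper's formulation (per-pixel threshold map, k = 50); the head architecture and channel widths are assumptions, and the disclosure itself only specifies a probability map and a first threshold.

```python
# Hedged sketch of a DBNet-style probability map and threshold prediction head.
# The differentiable binarization B = sigmoid(k * (P - T)) follows the DBNet paper;
# channel widths and head layout are illustrative assumptions.
import torch
import torch.nn as nn


class ProbThresholdHead(nn.Module):
    def __init__(self, in_channels: int = 256, k: float = 50.0):
        super().__init__()
        self.k = k

        def make_head():
            return nn.Sequential(
                nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 1, kernel_size=1),
                nn.Sigmoid())

        self.prob_head = make_head()    # predicts the probability map P
        self.thresh_head = make_head()  # predicts the threshold map T

    def forward(self, cascaded_feature: torch.Tensor):
        prob_map = self.prob_head(cascaded_feature)
        thresh_map = self.thresh_head(cascaded_feature)
        # approximate binary map used to decide which pixels are frame boundary pixels
        binary_map = torch.sigmoid(self.k * (prob_map - thresh_map))
        return prob_map, thresh_map, binary_map
```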
The specific implementation of step 320 is described below in conjunction with fig. 11.
In one embodiment, step 320 includes:
performing multi-stage downsampling on pixels in the screen capturing to obtain a pixel block after the multi-stage downsampling;
taking the pixel block after the downsampling of the last stage as the pixel block after the equalization of the last stage;
summing the pixel blocks after downsampling of each stage and the pixel blocks after upsampling of the pixel blocks after equalization of the next stage according to elements to obtain pixel blocks after equalization of each stage;
convoluting the pixel blocks after each level of equalization, and then up-sampling to a pixel block with a preset size;
cascading pixel blocks with preset sizes to obtain a cascading characteristic representation;
and invoking the probability map and threshold prediction model to predict the cascaded feature representation, obtaining a predicted probability map and a first threshold.
In this embodiment, as shown in fig. 11, the pixels in the screen capture are downsampled over 5 levels with a stride of 2, resulting in 5 levels of downsampled pixel blocks whose sizes are, in order, 1/2, 1/4, 1/8, 1/16, and 1/32 of the screen capture. The 5th level is the last level, and the pixel block after 5th-level downsampling is used directly as the pixel block after 5th-level equalization. The pixel block after 4th-level downsampling and the upsampled pixel block after 5th-level equalization are summed element-wise (element-wise summation is shown in fig. 11) to obtain the pixel block after 4th-level equalization. The pixel block after 3rd-level downsampling and the upsampled pixel block after 4th-level equalization are summed element-wise to obtain the pixel block after 3rd-level equalization. The pixel block after 2nd-level downsampling and the upsampled pixel block after 3rd-level equalization are summed element-wise to obtain the pixel block after 2nd-level equalization. The pixel block after 1st-level downsampling does not participate in this processing. The preset size is set to 1/4 of the screen capture. The pixel block after 2nd-level equalization needs no upsampling after convolution. The pixel block after 3rd-level equalization is convolved and then upsampled by a factor of 2. The pixel block after 4th-level equalization is convolved and then upsampled by a factor of 4. The pixel block after 5th-level equalization is convolved and then upsampled by a factor of 8. Finally, the pixel blocks of all levels, each at 1/4 of the screen capture size, are concatenated to obtain the cascaded feature representation. The cascaded feature representation is then input into the probability map and threshold prediction model to obtain a predicted probability map and a first threshold (the probability map is shown in fig. 12A).
The embodiment has the advantages that on one hand, the pixel block after equalization of each level not only contains the pixel information of the level, but also contains the pixel information of the next level, so that the carrying capacity of the pixel information is improved, and the accuracy of the feature representation after cascading is also improved. On the other hand, the up-sampling is utilized to enable the pixel blocks to be the same in size, so that the cascading difficulty is reduced, and meanwhile, the information carrying capacity is ensured.
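As a non-authoritative illustration, the sketch below mirrors this 5-level equalization in PyTorch; the channel count, interpolation mode, and the smoothing convolutions are assumptions, and the 1st-level block is accepted only to mirror fig. 11 and is not used.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EqualizeAndConcat(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # One smoothing convolution per equalized level (levels 2-5); assumed design.
        self.smooth = nn.ModuleList(nn.Conv2d(channels, channels, 3, padding=1) for _ in range(4))

    def forward(self, p1, p2, p3, p4, p5):
        # p1..p5 are the pixel blocks after 1st- to 5th-level downsampling; p1 does not participate.
        e5 = p5                                                           # last level: equalized = downsampled
        e4 = p4 + F.interpolate(e5, size=p4.shape[-2:], mode="nearest")   # element-wise sum
        e3 = p3 + F.interpolate(e4, size=p3.shape[-2:], mode="nearest")
        e2 = p2 + F.interpolate(e3, size=p2.shape[-2:], mode="nearest")
        target = p2.shape[-2:]                                            # preset size: 1/4 of the screen capture
        outs = [F.interpolate(conv(e), size=target, mode="nearest")
                for conv, e in zip(self.smooth, (e2, e3, e4, e5))]
        return torch.cat(outs, dim=1)                                     # cascaded feature representation

The cascaded representation would then feed the probability map and first-threshold heads of a DBNet-style model.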
Detailed description of step 330
In step 330, based on a comparison of the predicted probability map and the first threshold, frame edge processing is performed in the screen capture, edge information for the code region frame is determined, and for the screen capture for which the edge information is determined, a multi-headed attention mechanism is invoked to determine a second probability that each pixel in the screen capture is an error location.
In one embodiment, the first half of step 330 includes:
if the first probability of a pixel in the probability map is greater than a first threshold, determining the pixel as a frame boundary pixel;
if a plurality of frame boundary pixels form a closed loop, determining a content type in the closed loop;
if the content type is code, the edge information of the code region frame is determined as the edge position of the closed loop.
In this embodiment, as shown in fig. 12A, the probability map contains the first probabilities of 16×10 pixels in total. Assuming that the first threshold is 0.5, the pixels whose first probability is greater than 0.5 are those with first probabilities of 0.6, 0.7, 0.8, and 0.9, and these pixels are determined to be frame boundary pixels (shown bolded in fig. 12A). Based on the frame boundary pixels on the probability map, 4 closed loops can be obtained, specifically closed loop 1210, closed loop 1220, closed loop 1230, and closed loop 1240.
After determining the closed loop, the type of content in the closed loop is determined. The content type is used to indicate the type of content in the closed loop. In one embodiment, a corresponding target frame of the closed loop among a plurality of frames of the editor picture is determined, and a content type in the closed loop is determined based on the content of the target frame. For example, if a certain closed loop corresponds to a header information area frame in an editor picture, the content in the closed loop is header information, and thus the content type is header. For another example, if a certain closed loop corresponds to a code region frame in an editor picture, the content in the closed loop is code information, and thus the content type is code. In addition to the above determination, a classification model may also be utilized to determine the type of content in the closed loop. The embodiments of the present disclosure are not particularly limited.
As shown in fig. 12B, after determining the content type in the closed loop, the content type of the closed loop 1210 is obtained as a header. The content type of the closed loop 1220 is a code. The content type of the closed loop 1230 is a log. The content type of the closed loop 1240 is a stack.
As shown in fig. 12C, after determining the content type of the closed loop 1220 as a code, the edge information of the code region frame is determined as the edge position of the closed loop 1220.
The embodiment has the advantages that the frame boundary pixels are determined first, then the closed loop is determined, and then the edge information of the code region frame body is determined by utilizing the closed loop with the content type being the code, so that the accuracy is improved.
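A minimal sketch of this first half of step 330 is given below, assuming OpenCV is available for contour extraction and that a hypothetical classify_loop callback (a frame lookup or a classification model, as described above) supplies the content type of each closed loop.

import cv2
import numpy as np

def code_frame_edges(prob_map: np.ndarray, first_threshold: float, classify_loop):
    # Pixels whose first probability exceeds the first threshold are frame boundary pixels.
    boundary = (prob_map > first_threshold).astype(np.uint8)
    # Closed loops formed by the frame boundary pixels are recovered as contours.
    loops, _ = cv2.findContours(boundary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for loop in loops:
        # Keep the closed loop whose content type is code; its edge positions are
        # taken as the edge information of the code region frame.
        if classify_loop(loop) == "code":
            return loop
    return None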
After the edge information is determined in the first half of step 330, in the second half of step 330, for the screen shots for which edge information is determined, a second probability that each pixel in the screen shots is an error location is determined based on a multi-headed attention mechanism.
The multi-head attention mechanism works as follows: when solving a given problem in a specific scenario, different weights are applied to the different pieces of information that must be considered, with higher weights given to information that helps the problem greatly and lower weights given to information that helps little, so that the model solving the problem makes better use of the information. In the embodiments of the present disclosure specifically, a screen capture includes a plurality of pixels, each of which contributes differently to the problem of identifying the error location. Pixels that contribute much to identifying the error location are given more weight by the multi-head attention mechanism; pixels that contribute little are given less weight.
In one embodiment, the multi-head attention mechanism includes a first multi-head attention model and a second multi-head attention model. Referring to fig. 13, the second half of step 330 includes:
step 1310, determining a first context position code of the screen shot of the current period for which the edge information has been determined;
step 1320, determining a second context position code of the screen shot of the next period of the current period for which the edge information has been determined;
step 1330, calling a second multi-head attention model to perform a second attention transformation on the second context position code to obtain a first intermediate output;
step 1340, call the first multi-headed attention model to perform a first attention transformation on the first context position code and the first intermediate output, resulting in a second probability.
In steps 1310-1320, the current period and the period next to the current period each refer to one of the predetermined periods set in advance. They are adjacent periods, with the current period preceding the next period. For example, if the predetermined period is 1 second, the screen shots of the current period for which the edge information has been determined and the screen shots of the next period for which the edge information has been determined can each be obtained at 1-second intervals. The screen shots with determined edge information from two adjacent periods have a dependency relationship to some extent. To extract this dependency, embodiments of the present disclosure represent the screen shot with determined edge information of the current period by a first context position code, and the screen shot with determined edge information of the next period by a second context position code. The determination of the second probability of the error location is then based on the dependency relationship between the first context position code and the second context position code.
In step 1330, as shown in FIG. 14, a second context position code is input to a second multi-headed attention model, resulting in a first intermediate output. The second context position code includes a second position code of a plurality of pixels. For a second position code that contributes a large contribution to identifying the error position, a greater weight is given by the second multi-headed attention model. For a second position code with a small contribution to identifying the erroneous position, less weight is given by the second multi-headed attention model.
In step 1340, as shown in fig. 14, the first context position code and the first intermediate output are both input to the first multi-headed attention model, resulting in a second probability. The first context position code includes a first position code of a plurality of pixels. The first intermediate output contains a second position encoding of the plurality of pixels processed by the second multi-headed attention model. For a first position code and a processed second position code, which have a large contribution to the recognition of the error position, a greater weight is given by the first multi-headed attention model. For a first position code and a processed second position code with a small contribution to identifying the error position, a smaller weight is given by the first multi-headed attention model.
As shown in fig. 15A, the screen shot includes 16×10 pixels, and the second probability that each pixel is an error location is 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, and so on. Assuming that the pixel with the highest second probability is the error position, the pixel with the second probability of 0.8 is the error position. As shown in fig. 15B, from the pixel whose second probability is 0.8, the error position of the code found in the screen shot is a function body (the portion shown bolded in fig. 15B).
The advantage of embodiments of steps 1310-1340 is that the determination of the second probability is achieved with the first multi-headed attention model and the second multi-headed attention model together, improving the accuracy of the determination.
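As a sketch only, steps 1310-1340 could be wired together as below, assuming both context position codes are sequences of per-pixel embeddings of shape (batch, number of pixels, dimension); the embedding dimension, head count, and sigmoid output head are illustrative assumptions, not the patented implementation itself.

import torch
import torch.nn as nn

class TwoStageAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.second_mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.first_mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_prob = nn.Linear(dim, 1)

    def forward(self, first_code, second_code):
        # Second attention transformation on the next-period context position code.
        intermediate, _ = self.second_mha(second_code, second_code, second_code)
        # First attention transformation: the current-period code attends to the intermediate output.
        out, _ = self.first_mha(first_code, intermediate, intermediate)
        # Second probability that each pixel is the error location.
        return torch.sigmoid(self.to_prob(out)).squeeze(-1)

The masking of later positions and the residual connection of fig. 22, described later, refine this basic wiring.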
In one embodiment, referring to fig. 16, step 1310 includes:
step 1610, determining a feature map of the screen shot of the current period, wherein the feature map is determined by the edge information and has a first dimension, a second dimension and a third dimension;
step 1620, convoluting the feature map, and adjusting and normalizing the feature map structure to obtain an adjusted feature map;
step 1630, invoking a transformation model to perform feature transformation on the adjusted feature map to obtain a transformed feature map;
step 1640, based on the transformed feature map, a first context position encoding is determined.
In step 1610, the screen shots of the current period with the edge information determined are represented in a multi-dimensional manner, resulting in a feature map. The first dimension of the feature map indicates the feature type of the pixel. The second dimension of the feature map indicates the line number of the pixel on the screen capture. The third dimension of the feature map indicates the column number of the pixel on the screen capture. For example, feature representation with a dimension of c×h×w is performed on the screen shot of the current period, where the edge information is determined, and a feature map with a dimension of c×h×w is obtained. Where C refers to a first dimension representing the number of channels, also referred to as depth. Each channel represents a feature type such as edge, texture, color, etc. H refers to a second dimension, also referred to as height. The line number of pixels on the screen capture can also be expressed as the number of pixels or cells of the feature map in the vertical direction. W refers to a third dimension, also referred to as width. The column number of a pixel on the screen capture may also be expressed as the number of pixels or cells of the feature map in the horizontal direction.
In step 1620, in order to further mine the inherent correlation between the feature information of 3 dimensions, the feature map is convolved, and the feature map structure is adjusted and normalized, so as to obtain an adjusted feature map, considering that the feature map contains the feature information of 3 dimensions. The feature map after adjustment not only contains the feature information of the feature map before adjustment, but also contains the context relation among the feature information of 3 dimensions, so that the feature representation capability is improved.
Specifically, the convolution is implemented by a convolution layer/kernel; convolution is explained in detail above and is not repeated here. Feature map structure adjustment refers to changing the arrangement positions of features to reshape the feature map without changing the total number of features. For example, a feature map with dimensions 1×H×W may be obtained after feature map structure adjustment. In addition, the convolutional layers here are used together with pooling layers (which downsample by ratios of 1/2 to 1/32) and other types of layers (e.g., fully-connected layers, recurrent layers) to build a complete neural network.
Normalization means making the convolved result satisfy a normal distribution again. In the embodiments of the present disclosure, the feature map generally conforms to a normal distribution with mean u and variance h, which accelerates convergence of the model. However, after the feature map is convolved and its structure is adjusted, it may no longer satisfy this normal distribution, so normalization is required; the resulting adjusted feature map can then be fed into the linear rectification function (ReLU) without causing vanishing gradients.
In one embodiment, step 1620 includes:
performing first convolution on the feature map to obtain a feature map after the first convolution;
Performing first feature map structure adjustment on the first convolved feature map to obtain a first structure-adjusted feature map;
normalizing the feature map after the first structure adjustment to obtain a first normalized feature map;
performing a second feature map structure adjustment on the first normalized feature map to obtain a second structure-adjusted feature map, performing a third feature map structure adjustment on the feature map to obtain a third structure-adjusted feature map, and summing the two element-wise to obtain a first sum feature map;
and performing a second convolution on the first sum feature map to obtain the adjusted feature map.
The advantage of this embodiment is that feature adjustments in different directions of the feature map are achieved with a combination of convolution, feature map structure adjustment, and normalization, such that the resulting adjusted feature map further contains contextual dependencies between features.
In step 1630, the transformation model refers to a model that performs feature dimension transformation on the adjusted feature map. The transformation model may employ a neural network model based on the Transformer architecture. The main role of the Transformer model is to handle sequence-to-sequence conversion tasks such as machine translation, text summarization, question-answering systems, and so on. Compared with traditional recurrent neural networks and long short-term memory networks, the Transformer model offers higher computational parallelism and longer dependency distances, and therefore higher performance. The embodiments of the present disclosure use the Transformer model to perform feature dimension transformation on the adjusted feature map, so that the dependency relationships between features of the adjusted feature map can be mined and the dimension transformation performance improved.
In step 1640, a first context position code is determined based on the transformed feature map. In one embodiment, the transformed feature map is subjected to a second convolution to obtain the first context-position code.
In another embodiment, step 1640 includes:
performing third convolution on the transformed feature map to obtain a second convolved feature map, wherein the second dimension and the third dimension of the second convolved feature map are 1;
and summing the characteristic diagram and the characteristic diagram after the second convolution according to elements to obtain the first context position code.
Specifically, assuming that the dimensions of the second convolved feature map are C×1×1, this means that both the height and the width of the second convolved feature map are compressed to 1. At this point, each channel in the second convolved feature map contains only one value and is no longer a two-dimensional matrix.
The advantage of this embodiment is that the first context position code is determined based on both the feature map and the transformed feature map, so that the first context position code carries not only the position code of each pixel but also the positional relationships among multiple pixels, greatly improving the accuracy of the first context position code. In addition, compressing the feature map to C×1×1 reduces the number of parameters and the computational complexity while retaining the global information of each channel.
The advantage of this embodiment of steps 1610-1640 is that the information carrying capacity of the first context location coding is improved, which contributes to an improved accuracy of determining the second probability.
The detailed description of the specific implementation of steps 1610-1640 is provided below in connection with fig. 17.
In fig. 17, the feature map is globally context-modeled to obtain the first context position code. Global context modeling is divided into two processes: local context modeling and transformation modeling. In the local context modeling process, the screen shot of the current period for which the edge information has been determined is represented as a feature map with dimensions C×H×W. A 1×1 convolution (the first convolution) is performed on the C×H×W feature map to obtain a first convolved feature map with dimensions C×H×W. A first feature map structure adjustment is performed on the first convolved feature map to obtain a first structure-adjusted feature map with dimensions 1×HW. The first structure-adjusted feature map is normalized to obtain a first normalized feature map with dimensions 1×HW. A second feature map structure adjustment of the first normalized feature map yields a second structure-adjusted feature map with dimensions HW×1×1, a third feature map structure adjustment of the C×H×W feature map yields a third structure-adjusted feature map with dimensions C×H×W, and the two are summed element-wise to obtain the first sum feature map. In the transformation modeling process, a 1×1 convolution (the second convolution) is performed on the first sum feature map to obtain an adjusted feature map with dimensions C/r×1×1. The adjusted feature map with dimensions C/r×1×1 is input into the transformation model to obtain a transformed feature map with dimensions C/r×1×1. A 1×1 convolution (the third convolution) is performed on the transformed feature map to obtain a second convolved feature map with dimensions C×1×1. The C×H×W feature map and the C×1×1 second convolved feature map are summed element-wise to obtain the first context position code.
Note that C, H, W, r in the embodiments of the present disclosure are all constant.
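For orientation only, the sketch below follows a GCNet-style global context block, which the dimensions above (a 1×HW attention map, a C×1×1 context vector, a C/r bottleneck) resemble; it is an assumption-laden approximation, and in particular it combines the C×HW features with the HW×1 weights by a matrix product where the text above describes an element-wise sum.

import torch
import torch.nn as nn

class GlobalContextEncoding(nn.Module):
    def __init__(self, channels=256, reduction=4):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)            # first convolution -> 1 x H x W
        self.transform = nn.Sequential(                               # transformation modeling
            nn.Conv2d(channels, channels // reduction, 1),            # second convolution -> C/r x 1 x 1
            nn.LayerNorm([channels // reduction, 1, 1]),              # normalization
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),            # third convolution -> C x 1 x 1
        )

    def forward(self, feature_map):                                   # C x H x W per sample
        b, c, h, w = feature_map.shape
        weights = torch.softmax(self.attn(feature_map).view(b, 1, h * w), dim=-1)  # 1 x HW
        flat = feature_map.view(b, c, h * w)                          # C x HW
        context = torch.bmm(flat, weights.transpose(1, 2)).view(b, c, 1, 1)        # C x 1 x 1
        # Broadcast element-wise sum with the original feature map gives the first context position code.
        return feature_map + self.transform(context)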
In the embodiment of steps 1610-1640, after determining the feature map in step 1610, the feature map is directly convolved, feature map structure adjusted and normalized in step 1620 to obtain an adjusted feature map. In one embodiment, in order to further improve the efficiency and accuracy of processing the feature map in step 1620, feature processing in the width direction and the height direction is required for the feature map.
In particular implementations of this embodiment, referring to fig. 18, step 1310 includes:
step 1610, determining a feature map (first feature map) of the screen capturing of the current period, wherein the feature map has a first dimension, a second dimension and a third dimension;
step 1810, performing a fourth convolution on the feature map (first feature map) in the width direction and a fifth convolution on the height direction to obtain a third convolved feature map;
step 1820, calling a residual processing module to perform residual processing on the third convolved feature map;
step 1620, convolving the feature map (the first feature map after residual processing), adjusting and normalizing the feature map structure to obtain an adjusted feature map (the first adjusted feature map);
Step 1630, inputting the adjusted feature map (first adjusted feature map) into a transformation model to obtain a transformed feature map (first transformed feature map);
step 1640, determining a first context position code based on the transformed feature map (first transformed feature map).
Steps 1610-1640 are described in detail above and are not repeated here.
In step 1810, the fourth convolution may be performed before the fifth convolution, or the fifth convolution may be performed first and the fourth convolution afterwards. As shown in fig. 19A, the fourth convolution in the width direction is performed on the feature map, and the fifth convolution in the height direction is performed on the resulting width convolution result, giving the third convolved feature map. In one example, as shown in fig. 19B, assume that both the fourth convolution and the fifth convolution are 3×3 convolutions and that the feature map has size 64×128. First, a 3×3 convolution with W=64 is performed on the feature map in the width direction to obtain the width convolution result; then a 3×3 convolution with H=128 is performed on the width convolution result in the height direction to obtain the third convolved feature map. In another example, again with a 64×128 feature map, a 3×3 convolution with H=128 is performed on the feature map in the height direction to obtain a height convolution result, and then a 3×3 convolution with W=64 is performed on the height convolution result in the width direction to obtain the third convolved feature map.
In step 1820, the residual processing module refers to a module that performs residual processing on the third convolved feature map. As shown in fig. 19C, the residual processing module includes a convolution input channel and a convolution output channel connected in series, and a short-circuit path that bypasses the serially connected convolution input channel and convolution output channel.
The advantage of the embodiment of steps 1810-1820 is that, before the adjusted feature map is obtained from the feature map of the screen shot of the current period for which the edge information has been determined, the convolutions in the width direction and in the height direction let the feature map retain rich dependency relationships between features in both directions, thereby improving the accuracy of the first context position code.
In one embodiment, invoking the residual processing module to perform residual processing on the third convolved feature map includes: passing the third convolved feature map through the short-circuit path, where, when the residual processing module is trained, the backward gradient during training is propagated in turn through the convolution output channel and the convolution input channel.
Specifically, when the residual processing module is actually used, the third convolved feature map passes through the residual processing module along the short-circuit path rather than through the convolution input channel and the convolution output channel. When the residual processing module is trained, the backward gradient is propagated in turn through the convolution output channel and the convolution input channel, which avoids the vanishing-gradient problem during training.
The advantage of this embodiment is that the third convolved feature map is not affected in actual use, while during training the tendency of back-propagated gradients to vanish is mitigated, the stability of the model during training is improved, and model convergence can be accelerated.
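A minimal sketch of steps 1810-1820 follows; it assumes strip-shaped kernels for the width-direction and height-direction convolutions and a conventional residual module whose shortcut simply adds the identity to the convolution branch, since the exact kernel shapes and the training-time gradient routing described above are not pinned down here.

import torch.nn as nn

class WidthHeightConvResidual(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # Fourth convolution (width direction) then fifth convolution (height direction).
        self.width_conv = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.height_conv = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        # Convolution input channel and convolution output channel of the residual module.
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        third = self.height_conv(self.width_conv(x))   # third convolved feature map
        return third + self.branch(third)              # short-circuit path summed with the branch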
After determining the first context location code in step 1310 described above, a second context location code of the screen shot of the edge information is determined for the next cycle of the current cycle in step 1320.
In one embodiment, step 1320 includes:
determining a feature map of the screen shot of the next period of the current period for which the edge information has been determined, obtaining a second feature map having a first dimension, a second dimension, and a third dimension;
carrying out convolution on the second feature map, and adjusting and normalizing the feature map structure to obtain a second adjusted feature map;
inputting the second adjusted feature map into a transformation model to obtain a second transformed feature map;
a second context position code is determined based on the second transformed feature map.
This embodiment is similar to the embodiment of steps 1610-1640. While this embodiment is directed to a screen shot of the next cycle of the current cycle in which the edge information is determined, the embodiment shown in fig. 18 is directed to a screen shot of the current cycle in which the edge information is determined. For a detailed explanation of this embodiment, please refer to the embodiments of steps 1610-1640, which are not described herein. This embodiment improves the information carrying capacity of the second context position code, contributing to an improved accuracy of determining the second probability.
In one embodiment, step 1320 includes:
determining a feature map of the screen shot of the next period of the current period for which the edge information has been determined, obtaining a second feature map having a first dimension, a second dimension, and a third dimension;
performing fourth convolution in the width direction and fifth convolution in the height direction on the second feature map to obtain a fourth convolved feature map;
inputting the fourth convolved feature map into a residual processing module for residual processing;
convolving the feature map (the second feature map after residual processing), and adjusting and normalizing the feature map structure to obtain a second adjusted feature map;
inputting the second adjusted feature map into a transformation model to obtain a second transformed feature map;
a second context position code is determined based on the second transformed feature map.
This embodiment is similar to the process of the embodiment shown in fig. 18. While this embodiment is directed to a screen shot of the next cycle of the current cycle in which the edge information is determined, the embodiment shown in fig. 18 is directed to a screen shot of the current cycle in which the edge information is determined. For a detailed explanation of this embodiment, please refer to the embodiment shown in fig. 18, and the detailed description thereof is omitted.
The advantage of this embodiment is that, before the second adjusted feature map is obtained, the respective convolutions in the width direction and in the height direction let the feature map retain rich dependency relationships between features in both directions, improving the accuracy of the second context position code.
After determining the second context position code in step 1320, in step 1330, a first intermediate output is obtained using the second multi-headed attention model based on the second context position code.
In one embodiment, referring to fig. 20, step 1330 includes:
step 2010, masking the codes of the positions of the second context position codes after the next period to obtain masked second context position codes;
step 2020, inputting the masked second context position code into a second multi-head attention model to obtain a first intermediate output.
The second multi-head attention model in steps 2010-2020 is in fact a masked multi-head attention model. The masked multi-head attention model is a special form of the multi-head self-attention (Multi-Head Attention) mechanism, mainly used for sequence generation in natural language processing tasks; its main purpose is to prevent the decoder from looking at future information in advance when generating an output sequence. As in the ordinary multi-head attention mechanism, attention weights are computed, but a mask is applied: the mask ensures that, when the output of the current position is generated, only the current position and the positions before it are accessible, while future positions are not. This prevents the model from forming erroneous dependencies during training.
An advantage of embodiments of steps 2010-2020 is that based on masking the encoding of the position after the next cycle, the probability of generating a wrong dependency is reduced while generating the dependency with the second multi-headed attention model, improving the accuracy of the first intermediate output.
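A minimal sketch of steps 2010-2020 is given below, assuming the second context position code is a (batch, sequence length, dimension) tensor and using a standard causal mask; the dimensions are illustrative.

import torch
import torch.nn as nn

def masked_intermediate(second_code, mha):
    # Positions after the next period are masked out, so attention only sees
    # the current position and the positions before it.
    seq_len = second_code.size(1)
    causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    out, _ = mha(second_code, second_code, second_code, attn_mask=causal_mask)
    return out  # first intermediate output

second_mha = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
first_intermediate = masked_intermediate(torch.randn(1, 160, 256), second_mha)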
After determining the first intermediate output at step 1330, a second probability is obtained based on the first context position code and the first intermediate output using the first multi-headed attention model at step 1340.
In one embodiment, referring to fig. 21, step 1340 includes:
step 2110, inputting the first intermediate output and the second context position code into a residual connection module, and normalizing the output of the residual connection module to obtain a normalized output;
step 2120, obtaining a second probability using the first multi-headed attention model based on the first context position encoding and the normalized output.
In this embodiment, the residual connection module refers to a module that superposes the input with a nonlinear variation of the input. Here the input corresponds to the second context position code, and the nonlinear variation of the input corresponds to the first intermediate output.
The advantage of this embodiment of steps 2110-2120 is that the model complexity can be reduced to reduce the over-fit and improve the accuracy of the second probability by using the residual connection module.
In an embodiment, step 2120 includes:
inputting the first context position coding and normalization output into a first multi-head attention model to obtain a first attention output;
the first attention output is input into a linear regression model through a feed-through channel to obtain a second probability.
The first multi-headed attention model is described in detail above and will not be described in detail herein. A feed-forward channel refers to a channel in which a signal propagates unidirectionally from an input layer to an output layer without feedback. The linear regression model refers to a mathematical regression model that determines the correlation between variables. The present embodiment uses a feed forward channel to unidirectionally propagate the first attention output to the linear regression model. A correlation between the first attention output and the second probability is then determined using a linear regression model, resulting in a second probability.
The advantage of this embodiment is that the second probability is determined jointly using the first multi-headed attention model, the feed forward channel and the linear regression model, improving the accuracy of the second probability.
In one example, as shown in fig. 22, the second context position code is input into the second multi-head attention model to obtain the first intermediate output. The first intermediate output and the second context position code are input together into the residual connection module, and the output of the residual connection module is normalized to obtain the normalized output. The first context position code and the normalized output are input together into the first multi-head attention model to obtain the first attention output. The first attention output is fed through the feed-forward channel into the linear regression model to obtain the second probability.
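The block in fig. 22 could be sketched as follows; this refines the earlier two-stage attention sketch with the residual connection, normalization, feed-forward channel, and linear head, and again all dimensions, the query/key/value roles, and the sigmoid output are assumptions.

import torch
import torch.nn as nn

class ErrorLocationDecoder(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.second_mha = nn.MultiheadAttention(dim, heads, batch_first=True)   # masked in practice
        self.norm = nn.LayerNorm(dim)
        self.first_mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.feed_forward = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.regression = nn.Linear(dim, 1)

    def forward(self, first_code, second_code, mask=None):
        intermediate, _ = self.second_mha(second_code, second_code, second_code, attn_mask=mask)
        normalized = self.norm(second_code + intermediate)         # residual connection + normalization
        attended, _ = self.first_mha(normalized, first_code, first_code)
        hidden = self.feed_forward(attended)                       # feed-forward channel
        return torch.sigmoid(self.regression(hidden)).squeeze(-1)  # second probability per pixel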
A detailed implementation of determining the second probability based on the feature map of the current cycle and the feature map of the next cycle of the current cycle is described in detail below in conjunction with fig. 23.
In fig. 23, it is assumed that the feature map of the current period (also referred to as the first feature map) has size 64×128. The feature map of the current period is subjected to a 3×3 convolution with W=64 in the width direction and then a 3×3 convolution with H=128 in the height direction, giving the third convolved feature map. The third convolved feature map is input into the residual processing module, and the output of the residual processing module is fed into global context modeling (described in detail above) to obtain the second convolved feature map. The third convolved feature map and the second convolved feature map are summed pixel-wise and passed through a 3×3 convolution to obtain the first context position code. Similarly, it is assumed that the feature map of the next period of the current period (also referred to as the second feature map) also has size 64×128. It is subjected to a 3×3 convolution with W=64 in the width direction and then a 3×3 convolution with H=128 in the height direction, giving the fourth convolved feature map. The fourth convolved feature map is input into the residual processing module, and the output of the residual processing module is fed into global context modeling to obtain the fifth convolved feature map. The fourth convolved feature map and the fifth convolved feature map are summed pixel-wise and passed through a 3×3 convolution to obtain the second context position code. The second context position code is input into the second multi-head attention model to obtain the first intermediate output. The first intermediate output and the second context position code are input together into the residual connection module, and the output of the residual connection module is normalized to obtain the normalized output. The first context position code and the normalized output are input together into the first multi-head attention model to obtain the first attention output. The first attention output is fed through the feed-forward channel into the linear regression model to obtain the second probability.
In one embodiment, referring to fig. 24, step 1340 includes:
step 2410, using a first multi-head attention model, obtaining header information of an editor picture based on a first context position code and a first intermediate output;
step 2420, based on the header information, obtaining the type of the editor;
step 2430, determining a debugging rule based on the type of the editor;
step 2440, using the first multi-headed attention model, obtaining a work log in the editor's picture based on the first contextual location encoding and the first intermediate output;
step 2450, using the first multi-headed attention model, determines a second probability based on the debug rules and the work log.
In step 2410, considering that different editors require different debugging rules, the type of the editor is determined so that the debugging rules can be derived from it. There are many editors on the market, and different editors generally differ in their header information. For example, the header information includes the project name, running environment, compilation process, project branch, and so on; different editors mainly differ in project name, running environment, and project branch. For example, in fig. 4, the header information includes the project name TMFDemo, the running environment iPhone 14 Pro, and the project branch development. Thus, the header information of the editor picture is obtained based on the first context position code and the first intermediate output using the first multi-head attention model.
In step 2420, different header contents correspond to different editor types. The type of the editor can be obtained by looking it up in an information-to-type relation table, or the header information can be classified with a classification model to obtain the type of the editor.
In one example, assuming the header information of a certain editor includes the project name TMFDemo, the running environment iPhone 14 Pro, and the project branch development, the type of the editor is Xcode. Besides Xcode, editor types include Android Studio, PyCharm, and so on.
In step 2430, a debugging rule refers to a preset rule/program capable of locating the error position. The debugging rule is determined based on the type of the editor, either by looking up a type-to-rule relation table, or by determining an index from the type of the editor and finding the debugging rule corresponding to that index.
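Purely as an illustration, such a lookup could take the following form; the table entries and rule identifiers are hypothetical placeholders rather than the rule sets of this disclosure.

# Hypothetical header-to-type and type-to-rule tables (illustrative only).
HEADER_TO_EDITOR = {
    ("TMFDemo", "iPhone 14 Pro"): "Xcode",
}
EDITOR_TO_DEBUG_RULE = {
    "Xcode": "rule_xcode",
    "Android Studio": "rule_android_studio",
    "PyCharm": "rule_pycharm",
}

def debug_rule_for(header):
    # header is a dict such as {"project": "TMFDemo", "environment": "iPhone 14 Pro"}.
    editor = HEADER_TO_EDITOR.get((header["project"], header["environment"]), "unknown")
    return EDITOR_TO_DEBUG_RULE.get(editor, "rule_generic")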
In step 2440, the work log refers to the log that the editor outputs at the workbench/console after running the code. The work log typically contains the cause of the abnormal code termination, so the cause of the code error can be determined from the work log, which together with the debugging rules determines the error location.
After obtaining the debugging rules in step 2430 and the work log in step 2440, in step 2450 the second probability is determined based on the debugging rules and the work log using the first multi-head attention model.
The benefit of the embodiment of steps 2410-2450 is that the use of debug rules and a work log together to determine the second probability that each pixel is an error location greatly improves the accuracy of the determination and the adaptability to the type of editor.
Detailed description of step 340
In step 340, the error location in the editor picture is determined based on the second probability. As shown in fig. 15A, the second probability that each pixel in the screen shot is the error location lies in the range [0,1]; for example, the second probabilities include 0.1, 0.2, 0.3, 0.4, and 0.8. As shown in fig. 15B, the bolded code in the code region frame of the editor picture is the error location.
In one embodiment, the second probabilities are sorted from high to low, and the error location of the editor picture is determined based on the highest-ranked second probability. For example, if the second probabilities include {0.1, 0.2, 0.3, 0.8, 0.7}, the sorted result is {0.8, 0.7, 0.3, 0.2, 0.1}, and the pixel corresponding to the second probability 0.8 is the error location.
In another embodiment, the second probability is compared to a predetermined threshold, and an error location in the editor's picture is determined based on the second probability being greater than the predetermined threshold. For example, assume that the predetermined threshold is 0.6 and the second probability includes {0.1,0.2,0.3,0.8,0.7}. The second probability greater than 0.6 has 0.8 and 0.7, so that a pixel with a second probability of 0.8 is an error location and a pixel with a second probability of 0.7 is an error location.
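The two selection strategies just described admit a straightforward sketch, assuming the second probabilities are held in a 2-D NumPy array indexed by pixel row and column.

import numpy as np

def error_positions(second_prob, predetermined_threshold=None):
    if predetermined_threshold is None:
        # Sort from high to low and take the highest-ranked second probability.
        return [np.unravel_index(np.argmax(second_prob), second_prob.shape)]
    # Otherwise keep every pixel whose second probability exceeds the predetermined threshold.
    rows, cols = np.where(second_prob > predetermined_threshold)
    return list(zip(rows, cols))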
After step 340, a detailed description of determining the location of the error-causing function body
In one embodiment, as shown in fig. 25, after step 340, the editor debugging method further includes:
step 2510, determining a running stack corresponding to the error position;
step 2520, adding debug break points in the run stack;
step 2530, running a debug breakpoint to determine the location of the function body that caused the error.
In step 2510, the run stack refers to the memory area, allocated and reclaimed by the system or the programmer, that holds parameters, local variables, return values, and other data while the program runs. A run stack typically corresponds to multiple lines of code, a given line of code corresponds to one run stack, and the error location indicates which line of code is in error. Thus, based on the error location, the run stack can be uniquely determined.
In step 2520, a debug breakpoint is added to the run stack, which controls the point at which code execution stops. The debug breakpoint is then run in step 2530, so that the lines of code preceding the breakpoint are executed, and the position of the function body causing the error is determined from the execution result.
The benefit of the embodiment of steps 2510-2530 is that, after the error location is determined, the position of the function body causing the error can be determined automatically by adding a debug breakpoint to the run stack. This not only reduces the manual work of analyzing the specific implementation and behavior of each function body in the stack, but also makes it possible to locate the specific function body position from work logs such as warnings and error messages.
In one embodiment, after running the debug breakpoint in step 2530, the editor debugging method further comprises:
if the running debugging breakpoint fails, identifying a fragment associated with the error position;
and storing the identified fragments, and increasing the fault tolerance of the editor debugging.
If running the debug breakpoint fails, the code before the debug breakpoint is in error, and the fragment associated with the error position can be identified. This fragment very likely contains the position of the function body that actually caused the error. The identified fragment is stored, and the object is notified by means of printing/output. Fault tolerance refers to the degree to which errors are tolerated: the greater the fault tolerance, the more code errors are accommodated and the higher the pass rate of running debug breakpoints; conversely, the smaller the fault tolerance, the higher the failure rate of running debug breakpoints. By increasing the fault tolerance of the editor debugging, the pass rate of running debug breakpoints is increased so that the position of the function body causing the error can be determined.
The advantage of this embodiment is that, by increasing the fault tolerance, the flexibility and accuracy of the editor debugging are improved.
Implementation details of the editor debugging method of the embodiments of the disclosure (one)
As shown in fig. 26, the implementation of the editor debugging method is divided into a detection phase, a post-detection processing phase, and an identification phase.
In the detection stage, the overall structure is broadly similar to an Optical Character Recognition (OCR) pipeline. The various elements in the editor picture are detected first, and their semantic information is then recognized to form preliminary functional area information. In general, a timer can be started for the input editor picture so that screen shots are taken at prescribed times. Each captured screen shot is then fed into an instance-segmentation-based model, and, according to the model output, corresponding operations such as highlighting specific areas and extracting relevant information are performed. The detection categories include header information, menu bar, running simulator/platform, and console work log. Finally, each screen shot is annotated with each detection frame and the corresponding frame function.
In the post-detection processing stage, the input is a screen shot with detection frames and frame functions generated in the detection stage. The screen shot is corrected: grayscale conversion, binarization, denoising, tilt correction, scaling, and text region detection. The model structure is built on DBNet, and the DBNet model is typically trained on a prepared data set. During training, the model has to learn how to extract text regions from images, which involves the operation of a loss function and is often time-consuming. The post-detection processing stage obtains a predicted probability map and a first threshold by inputting the screen shot with detection frames and frame functions into the probability map and threshold prediction model. The edge information of the code region frame in the screen shot is then determined based on the comparison of the predicted probability map with the first threshold, so that detection frame routing is realized.
In the recognition stage, for the positions of the header information, source code area, console, work log, and function stack detection frames, the detected areas are cropped from the original image, and their semantic information is recognized by a neural network with a global context mechanism based on the Transformer architecture. The semantic information contains the type of the editor, the work log, and other information. Preliminary debugging information is finally determined, which may be the error location or the position of the function body causing the error.
It should be noted that, in the embodiments of the present disclosure, screen shots with detection frames carrying frame functions are detected using an instance segmentation model; feature maps are then generated through the probability map and threshold prediction model to obtain the probability map and the first threshold; binarization is carried out via the DB (differentiable binarization) algorithm; a masked multi-head attention mechanism is used to ensure that the output of the current position is generated only from permitted positions, and the dependencies are determined; structured debugging information is built for mainstream editors, and the error is located down to the function body based on information such as errors and logs. While realizing editor debugging, the embodiments of the present disclosure can adapt to different types of editors and to editors of the same type but different versions, improving the universality of editor debugging and reducing the development burden.
Implementation details of the editor debugging method of the embodiments of the disclosure (II)
As shown in fig. 27, implementation details of the editor debugging method include:
For the input editor picture, the header information of the editor is acquired. Based on the header information of the editor, the type of the editor is obtained. Based on the type of the editor, a debugging rule is determined.
Before debugging with the debugging rules, the running environment, simulator, and code branches are obtained, the position correspondence is unified, and interfering data are removed. The work log in the editor picture is acquired.
In the process of debugging with the debugging rules, the source code file and line number are searched and matched, and it is judged whether a debug breakpoint has been added.
If no debug breakpoint has been added, the fragment associated with the error position is identified, the fault tolerance of the editor debugging is increased, and a debug breakpoint and a stack print are inserted, until the specific function body is located.
If a debug breakpoint has been added, the debug breakpoint is run. If running the debug breakpoint fails, the error location can be determined. If the error location is a function body position, it is converted directly into structured information; if not, more effective information is sought before converting to structured information. Finally, the process ends.
Application examples of editor debugging method of embodiments of the present disclosure
As shown in fig. 28, application example details of the editor debugging method include:
(1) The header information is obtained: the current project name is TMFDemo, the running environment is iPhone 14 Pro, and the code branch is development; the editor is determined to be Xcode.
(2) Invalid data, such as threads 2-5, is removed.
(3) The work log is obtained through the console's bt command; the current program termination is caused by EXC_BAD_ACCESS.
(4) Files are searched and matched; the search content is the console output.
(5) Stack information is acquired, and debugging breakpoints are added.
(6) The specific function body of the crash is located, the result is converted into structured information, and the parameters are parsed.
Apparatus and device descriptions of embodiments of the present disclosure
It will be appreciated that, although the steps in the flowcharts above are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated in this embodiment, the order of the steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts above may include several sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
In the various embodiments of the present application, when related processing is required according to data related to characteristics of an object, such as attribute information or attribute information sets, permission or consent of the object is obtained first, and collection, use, processing, etc. of the data complies with related laws and regulations and standards. In addition, when the embodiment of the application needs to acquire the object attribute information, the independent permission or independent consent of the object is acquired through a popup window or a jump to a confirmation page or the like, and after the independent permission or independent consent of the object is explicitly acquired, the necessary object related data for enabling the embodiment of the application to normally operate is acquired.
Referring to fig. 29, fig. 29 is a schematic structural diagram of an editor debugging device 2900 provided in an embodiment of the present disclosure, the editor debugging device 2900 including:
the screen capturing unit 2910 is configured to obtain a plurality of screen capturing of the editor picture, and perform region division on the screen capturing to obtain a plurality of detection frames, where each detection frame corresponds to one of a plurality of frames in the editor picture, and the plurality of frames include code region frames;
the prediction unit 2920 is configured to perform multi-stage downsampling processing on pixels in the screen capture to obtain multi-stage downsampled pixel blocks, convolve and concatenate the multi-stage downsampled pixel blocks to obtain a cascaded feature representation, and call a probability map and threshold prediction model to predict the cascaded feature representation to obtain a predicted probability map and a first threshold, where the probability map includes, for each pixel in the screen capture, a first probability that the pixel is a frame boundary pixel;
A first determining unit 2930, configured to perform frame edge processing in the screenshot based on the comparison between the predicted probability map and the first threshold, determine edge information of the code region frame, and invoke a multi-head attention mechanism to determine, for the screenshot in which the edge information is determined, a second probability that each pixel in the screenshot is an error position;
a second determining unit 2940 is configured to determine an error position in the editor picture based on the second probability.
Optionally, the screen capturing unit 2910 is specifically configured to:
calling an instance segmentation model to divide regional boundaries of the screen shots to obtain a plurality of detection frame boundaries and frame functions corresponding to each detection frame;
and demarcating on the screen shot according to the boundary of the detection frames to obtain a plurality of detection frames, and rendering the detection frames by using the frame body function.
Optionally, the prediction unit 2920 is specifically configured to:
taking the pixel block after the downsampling of the last stage as the pixel block after the equalization of the last stage;
summing the pixel blocks after downsampling of each stage and the pixel blocks after upsampling of the pixel blocks after equalization of the next stage according to elements to obtain pixel blocks after equalization of each stage;
and convolving each level of equalized pixel blocks and cascading to obtain cascading characteristic representation.
Optionally, the prediction unit 2920 is specifically further configured to:
Convoluting the pixel blocks after each level of equalization, and then up-sampling to a pixel block with a preset size;
and cascading pixel blocks with preset sizes to obtain the feature representation after cascading.
Optionally, the prediction unit 2920 is specifically further configured to:
determining the corresponding characteristic types of each level;
determining convolution kernels corresponding to each level based on the feature types;
and convolving the downsampled pixel blocks at each stage by using convolution kernels corresponding to each stage, and cascading convolution results to obtain the feature representation after cascading.
Alternatively, the first determining unit 2930 is specifically configured to:
if the first probability of a pixel in the probability map is greater than a first threshold, determining the pixel as a frame boundary pixel;
if a plurality of frame boundary pixels form a closed loop, determining a content type in the closed loop;
if the content type is code, the edge information of the code region frame is determined as the edge position of the closed loop.
Optionally, the multi-head attention mechanism includes a first multi-head attention model and a second multi-head attention model;
the first determining unit 2930 is specifically configured to:
determining a first context position code of the screen capture of the current period for which the edge information is determined;
determining a second context position code of the screen capture of the period following the current period for which the edge information is determined;
invoking the second multi-head attention model to perform a second attention transformation on the second context position code to obtain a first intermediate output;
and invoking the first multi-head attention model to perform a first attention transformation on the first context position code and the first intermediate output to obtain the second probability.
Optionally, the first determining unit 2930 is specifically further configured to:
masking the positions of the second context position code that fall after the next period to obtain a masked second context position code;
and calling the second multi-head attention model to perform the second attention transformation on the masked second context position code to obtain the first intermediate output.
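The two attention transformations, including the masking of positions that fall after the next period, can be sketched with torch.nn.MultiheadAttention as below; the embedding dimension, head count and sigmoid read-out are illustrative assumptions, not the particular models of this disclosure.

    import torch
    import torch.nn as nn

    class TwoStageAttention(nn.Module):
        """Second attention over the masked second context code, then first attention with the first code."""

        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.second_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.first_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.head = nn.Linear(dim, 1)

        def forward(self, first_ctx, second_ctx):
            # mask positions after the next period so the second attention cannot look ahead
            length = second_ctx.size(1)
            mask = torch.triu(torch.ones(length, length, dtype=torch.bool,
                                         device=second_ctx.device), diagonal=1)
            intermediate, _ = self.second_attn(second_ctx, second_ctx, second_ctx, attn_mask=mask)
            # first attention: query is the first context position code, key/value are the intermediate output
            out, _ = self.first_attn(first_ctx, intermediate, intermediate)
            return torch.sigmoid(self.head(out))  # second probability for each position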
Optionally, the first determining unit 2930 is specifically further configured to:
determining a feature map of the screen capture of the current period, wherein the feature map has a first dimension, a second dimension and a third dimension, the first dimension indicating the feature type of a pixel, the second dimension indicating the row number of the pixel in the screen capture, and the third dimension indicating the column number of the pixel in the screen capture;
performing convolution, feature map structure adjustment and normalization on the feature map to obtain an adjusted feature map;
invoking a transformation model to perform feature transformation on the adjusted feature map to obtain a transformed feature map;
and determining the first context position code based on the transformed feature map.
Optionally, the first determining unit 2930 is specifically further configured to:
performing a first convolution on the feature map to obtain a first-convolved feature map;
performing a first feature map structure adjustment on the first-convolved feature map to obtain a first structure-adjusted feature map;
normalizing the first structure-adjusted feature map to obtain a first normalized feature map;
summing, element-wise, a second structure-adjusted feature map, obtained by performing a second feature map structure adjustment on the first normalized feature map, and a third structure-adjusted feature map, obtained by performing a third feature map structure adjustment on the feature map, to obtain a first summed feature map;
and performing a second convolution on the first summed feature map to obtain the adjusted feature map.
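The following sketch shows one way to read this sequence of convolution, structure adjustment (here, permutes to and from a channels-last layout), normalization and element-wise summation; the specific adjustments and kernel sizes are assumptions made only for illustration.

    import torch
    import torch.nn as nn

    class AdjustFeatureMap(nn.Module):
        """Convolution, structure adjustment, normalization and element-wise summation of a feature map."""

        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)   # first convolution
            self.norm = nn.LayerNorm(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)   # second convolution

        def forward(self, feat):                      # feat: (N, C, H, W)
            x = self.conv1(feat)
            x = x.permute(0, 2, 3, 1)                 # first structure adjustment: channels-last layout
            x = self.norm(x)                          # normalization -> first normalized feature map
            x = x.permute(0, 3, 1, 2)                 # second structure adjustment: back to (N, C, H, W)
            summed = x + feat                         # element-wise sum -> first summed feature map
            # (the third structure adjustment of the original feature map is taken as the identity here)
            return self.conv2(summed)                 # second convolution -> adjusted feature map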
Optionally, the first determining unit 2930 is specifically further configured to:
performing a third convolution on the transformed feature map to obtain a second-convolved feature map, wherein the second dimension and the third dimension of the second-convolved feature map are both 1;
and summing the transformed feature map and the second-convolved feature map element-wise to obtain the first context position code.
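A short sketch of this step: a convolution collapses the second and third dimensions of the transformed feature map to 1, and a broadcast element-wise sum yields the first context position code. Using a kernel that spans the whole map below is only one possible way to obtain 1x1 spatial dimensions.

    import torch
    import torch.nn as nn

    def first_context_position_code(transformed: torch.Tensor) -> torch.Tensor:
        """transformed: (N, C, H, W) feature map produced by the transformation model."""
        n, c, h, w = transformed.shape
        # third convolution: a kernel spanning the whole map collapses the row/column dimensions to 1
        conv3 = nn.Conv2d(c, c, kernel_size=(h, w))   # untrained here; a learned layer in practice
        pooled = conv3(transformed)                   # shape (N, C, 1, 1)
        return transformed + pooled                   # broadcast element-wise sum over rows and columns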
Optionally, before performing convolution, feature map structure adjustment and normalization on the feature map to obtain the adjusted feature map, the first determining unit 2930 is specifically further configured to:
performing a fourth convolution in the width direction and a fifth convolution in the height direction on the feature map to obtain a third-convolved feature map;
and invoking a residual processing module to perform residual processing on the third-convolved feature map.
Optionally, the residual processing module includes a convolution input channel and a convolution output channel connected in series, and a short-circuit path that shorts the series-connected convolution input channel and convolution output channel;
the first determining unit 2930 is specifically further configured to: introduce the third-convolved feature map into the short-circuit path, wherein, when the residual processing module is trained, the backward gradient during training is propagated sequentially through the convolution output channel and the convolution input channel.
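A sketch combining the width/height-direction convolutions with a residual processing module follows; treating the convolution input/output channels as two stacked 3x3 convolutions and the short-circuit path as a standard residual shortcut is an interpretation for illustration only. With such a block, the gradient during training naturally flows back through the output convolution and then the input convolution.

    import torch.nn as nn

    class AxisConvResidual(nn.Module):
        """Width/height-direction convolutions followed by a residual processing module."""

        def __init__(self, channels=64, k=3):
            super().__init__()
            self.conv_w = nn.Conv2d(channels, channels, kernel_size=(1, k), padding=(0, k // 2))  # width direction
            self.conv_h = nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(k // 2, 0))  # height direction
            self.conv_in = nn.Conv2d(channels, channels, kernel_size=3, padding=1)   # convolution input channel
            self.conv_out = nn.Conv2d(channels, channels, kernel_size=3, padding=1)  # convolution output channel

        def forward(self, feat):
            x = self.conv_h(self.conv_w(feat))        # fourth (width) and fifth (height) convolutions
            # the short-circuit path carries x unchanged around the series-connected channels
            return x + self.conv_out(self.conv_in(x))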
Optionally, the first determining unit 2930 is specifically further configured to:
inputting the first intermediate output and the second context position code into a residual connection module, and normalizing the output of the residual connection module to obtain a normalized output;
and obtaining the second probability based on the first context position code and the normalized output using the first multi-head attention model.
Optionally, the first determining unit 2930 is specifically further configured to:
inputting the first context position code and the normalized output into the first multi-head attention model to obtain a first attention output;
and feeding the first attention output through a feed-forward channel into a linear regression model to obtain the second probability.
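The residual connection, normalization, first attention and feed-forward read-out can be sketched as below; this refines the earlier attention sketch, and the hidden sizes, the ReLU feed-forward and the sigmoid regression head are illustrative assumptions rather than the particular models of this disclosure.

    import torch
    import torch.nn as nn

    class ProbabilityHead(nn.Module):
        """Residual connection and normalization, first attention, feed-forward channel, linear regression."""

        def __init__(self, dim=256, heads=8, hidden=1024):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.first_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            self.regress = nn.Linear(dim, 1)

        def forward(self, first_ctx, intermediate, second_ctx):
            normed = self.norm(intermediate + second_ctx)              # residual connection + normalization
            attn_out, _ = self.first_attn(first_ctx, normed, normed)   # first attention output
            return torch.sigmoid(self.regress(self.ffn(attn_out)))     # second probability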
Optionally, the first determining unit 2930 is specifically further configured to:
acquiring header information of the editor picture based on the first context position code and the first intermediate output using the first multi-head attention model;
acquiring the type of the editor based on the header information;
determining a debugging rule based on the type of the editor;
acquiring a work log in the editor picture based on the first context position code and the first intermediate output using the first multi-head attention model;
and determining the second probability based on the debugging rule and the work log using the first multi-head attention model.
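Purely to illustrate how an editor type can select debugging rules that are then matched against a work log, the attention model is replaced below by a plain rule-matching heuristic; the editor names, rule patterns and scoring are all made-up examples, not rules defined by this disclosure.

    import re

    # hypothetical rule sets keyed by editor type; real rules would come from the debugging configuration
    DEBUG_RULES = {
        "vscode": [r"Uncaught TypeError", r"SyntaxError"],
        "unity": [r"NullReferenceException", r"IndexOutOfRangeException"],
    }

    def rough_second_probability(editor_type: str, work_log: str, base: float = 0.1) -> float:
        """Crude stand-in for the model: more rule matches in the work log means a higher probability."""
        rules = DEBUG_RULES.get(editor_type, [])
        hits = sum(bool(re.search(rule, line)) for line in work_log.splitlines() for rule in rules)
        return min(1.0, base + 0.2 * hits)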
Optionally, after determining the error position in the editor picture based on the second probability, the editor debugging device further includes a third determining unit (not shown) configured to:
determining a running stack corresponding to the error position;
adding a debugging breakpoint in the running stack;
and running the debugging breakpoint to determine the location of the function body that caused the error.
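A hedged sketch of the breakpoint step using Python's standard bdb module is shown below; the method itself is editor-agnostic, and the target callable, file name and line number here only illustrate locating the function body in which a breakpoint at the error position fires.

    import bdb

    class BreakpointRunner(bdb.Bdb):
        """Runs code until a breakpoint is hit and records the enclosing function body."""

        def __init__(self):
            super().__init__()
            self.hit = None

        def user_line(self, frame):
            if self.break_here(frame):
                # record the function body in which the breakpoint at the error position fires
                self.hit = (frame.f_code.co_filename, frame.f_code.co_name, frame.f_lineno)
                self.set_quit()
            else:
                self.set_continue()          # keep running until a breakpoint is reached

    def locate_error_function(target, filename: str, lineno: int):
        """Add a debugging breakpoint at the error position, run it, and return the hit location."""
        runner = BreakpointRunner()
        runner.set_break(filename, lineno)   # breakpoint at the error position
        runner.run("target()", globals(), {"target": target})
        return runner.hit                    # None if running the breakpoint failed to hit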
Optionally, after running the debugging breakpoint, the third determining unit (not shown) is further configured to:
if running the debugging breakpoint fails, identifying a fragment associated with the error position;
and storing the identified fragment to increase the fault tolerance of the editor debugging.
Referring to fig. 30, fig. 30 is a block diagram of a portion of a terminal implementing an editor debugging method of an embodiment of the present disclosure. The terminal includes: radio frequency (RF) circuitry 3010, a memory 3015, an input unit 3030, a display unit 3040, a sensor 3050, audio circuitry 3060, a wireless fidelity (WiFi) module 3070, a processor 3080, and a power supply 3090. It will be appreciated by those skilled in the art that the terminal structure shown in fig. 30 does not limit the terminal to a cell phone or a computer; the terminal may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The RF circuit 3010 may be used for receiving and transmitting signals during messaging or a call. In particular, after receiving downlink information from a base station, the RF circuit passes it to the processor 3080 for processing; it also sends uplink data to the base station.
The memory 3015 may be used to store software programs and modules, and the processor 3080 performs various functional applications and data processing of the terminal by executing the software programs and modules stored in the memory 3015.
The input unit 3030 may be used to receive input numeric or character information and to generate key signal inputs related to the setting and function control of the terminal. In particular, the input unit 3030 may include a touch panel 3031 and other input devices 3032.
The display unit 3040 may be used to display information input by the user or information provided to the user, as well as various menus of the terminal. The display unit 3040 may include a display panel 3041.
The audio circuitry 3060, speaker 3061, and microphone 3062 may provide an audio interface.
In this embodiment, the processor 3080 included in the terminal may perform the editor debugging method of the previous embodiment.
Terminals of embodiments of the present disclosure include, but are not limited to, cell phones, computers, intelligent voice interaction devices, intelligent home appliances, vehicle-mounted terminals, aircraft, and the like. The embodiments of the present disclosure may be applied to a variety of scenarios including, but not limited to, editor auto-debugging, intelligent code debugging products, and the like.
Fig. 31 is a block diagram of a portion of a server 3100 implementing an editor debugging method of an embodiment of the disclosure. The server 3100 may vary considerably in configuration or performance and may include one or more central processing units (CPU) 3122 (e.g., one or more processors), a memory 3132, and one or more storage media 3130 (e.g., one or more mass storage devices) storing applications 3142 or data 3144. The memory 3132 and the storage medium 3130 may be transient or persistent storage. The program stored in the storage medium 3130 may include one or more modules (not shown), each of which may include a series of instruction operations on the server 3100. Further, the central processor 3122 may communicate with the storage medium 3130 and execute, on the server 3100, the series of instruction operations stored in the storage medium 3130.
The server 3100 may also include one or more power supplies 3126, one or more wired or wireless network interfaces 3150, one or more input/output interfaces 3158, and/or one or more operating systems 3141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
A processor in the server 3100 may be used to perform the editor debugging method of an embodiment of the disclosure.
The embodiments of the present disclosure also provide a computer readable storage medium storing program code for executing the editor debugging method of the foregoing embodiments.
The disclosed embodiments also provide a computer program product comprising a computer program. The processor of the computer device reads the computer program and executes it, causing the computer device to execute the editor debugging method described above.
The terms "first," "second," "third," "fourth," and the like in the description of the present disclosure and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the disclosure described herein may, for example, be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, and may include other steps or elements inherent to such a process, method, article, or apparatus.
It should be understood that in this disclosure, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. For example, "at least one of a, b or c" may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be singular or plural.
It should be understood that, in the description of the embodiments of the present disclosure, "a plurality" (or "multiple") means two or more; "greater than," "less than," "exceeding," and the like are understood to exclude the stated number, while "above," "below," "within," and the like are understood to include the stated number.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present disclosure. The aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
It should also be appreciated that the various implementations provided by the embodiments of the present disclosure may be arbitrarily combined to achieve different technical effects.
The above is a specific description of the embodiments of the present disclosure, but the present disclosure is not limited to the above embodiments, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present disclosure, and are included in the scope of the present disclosure as defined in the claims.

Claims (17)

1. An editor debugging method, comprising:
acquiring a plurality of screen shots of an editor picture, and carrying out region division on the screen shots to obtain a plurality of detection frames, wherein each detection frame corresponds to one frame body of a plurality of frame bodies in the editor picture, and the plurality of frame bodies comprise code region frame bodies;
performing multi-level downsampling processing on pixels in the screen capture to obtain multi-level downsampled pixel blocks, convolving and concatenating the downsampled pixel blocks of each stage to obtain a concatenated feature representation, and calling a probability map and threshold prediction model to predict the concatenated feature representation to obtain a predicted probability map and a first threshold, wherein the probability map comprises, for each pixel in the screen capture, a first probability that the pixel is a frame boundary pixel;
Performing frame edge processing in the screen capture based on a comparison of the predicted probability map and the first threshold value, and determining edge information of the code region frame;
for the screen shots for which the edge information is determined, determining a first context position code of the screen shot of a current period for which the edge information is determined, and determining a second context position code of the screen shot of a period following the current period for which the edge information is determined;
invoking a second multi-head attention model to perform second attention transformation on the second context position code to obtain a first intermediate output;
acquiring header information of the editor picture based on the first context position code and the first intermediate output using a first multi-head attention model;
acquiring the type of the editor based on the header information;
determining a debugging rule based on the type of the editor;
acquiring a work log in the editor picture based on the first context position code and the first intermediate output using the first multi-head attention model;
determining, using the first multi-head attention model, a second probability that each of the pixels in the screen shot is an error location based on the debugging rule and the work log;
Based on the second probability, an error location in the editor picture is determined.
2. The editor debugging method according to claim 1, wherein the performing region division on the screen shots to obtain a plurality of detection frames comprises:
calling an instance segmentation model to divide the region boundaries of the screen shots to obtain a plurality of detection frame boundaries and a frame body function corresponding to each detection frame;
and demarcating the screen shot according to the detection frame boundaries to obtain the plurality of detection frames, and rendering each detection frame using its frame body function.
3. The editor debugging method according to claim 1, wherein the convolving and concatenating the downsampled pixel blocks of each stage to obtain a concatenated feature representation comprises:
taking the downsampled pixel block at the last stage as an equalized pixel block at the last stage;
summing, element-wise, the downsampled pixel block of each stage with the upsampled equalized pixel block of the next stage to obtain the equalized pixel block of each stage;
and convolving and concatenating the equalized pixel blocks of each stage to obtain the concatenated feature representation.
4. The editor debugging method according to claim 3, wherein the convolving and concatenating the equalized pixel blocks of each stage to obtain the concatenated feature representation comprises:
convolving the equalized pixel block of each stage and then upsampling it to a pixel block of a preset size;
and concatenating the preset-size pixel blocks to obtain the concatenated feature representation.
5. The editor debugging method according to claim 3, wherein the convolving and concatenating the downsampled pixel blocks of each stage to obtain the concatenated feature representation comprises:
determining the feature type corresponding to each stage;
determining the convolution kernel corresponding to each stage based on the feature type;
and convolving the downsampled pixel block of each stage with the corresponding convolution kernel and concatenating the convolution results to obtain the concatenated feature representation.
6. The editor debugging method according to claim 1, wherein the performing frame edge processing in the screen capture based on the comparison of the predicted probability map and the first threshold to determine edge information of the code region frame body comprises:
determining a pixel in the probability map as a frame boundary pixel if the first probability of the pixel is greater than the first threshold;
if a plurality of the frame boundary pixels form a closed loop, determining a content type in the closed loop;
and if the content type is code, determining the edge information of the code region frame body as the edge position of the closed loop.
7. The editor debugging method according to claim 1, wherein the invoking a second multi-head attention model to perform a second attention transformation on the second context position code to obtain a first intermediate output comprises:
masking the positions of the second context position code that fall after the next period to obtain a masked second context position code;
and calling the second multi-head attention model to perform the second attention transformation on the masked second context position code to obtain the first intermediate output.
8. The editor debugging method according to claim 1, wherein the determining a first context position code of the screen shot of a current period for which the edge information is determined comprises:
determining a feature map of the screen shot of the current period for which the edge information is determined, wherein the feature map has a first dimension, a second dimension and a third dimension, the first dimension indicating a feature type of a pixel, the second dimension indicating the row number of the pixel in the screen shot, and the third dimension indicating the column number of the pixel in the screen shot;
performing convolution, feature map structure adjustment and normalization on the feature map to obtain an adjusted feature map;
invoking a transformation model to perform feature transformation on the adjusted feature map to obtain a transformed feature map;
and determining the first context position code based on the transformed feature map.
9. The editor debugging method according to claim 8, wherein the performing convolution, feature map structure adjustment and normalization on the feature map to obtain an adjusted feature map comprises:
performing a first convolution on the feature map to obtain a first-convolved feature map;
performing a first feature map structure adjustment on the first-convolved feature map to obtain a first structure-adjusted feature map;
normalizing the first structure-adjusted feature map to obtain a first normalized feature map;
summing, element-wise, a second structure-adjusted feature map, obtained by performing a second feature map structure adjustment on the first normalized feature map, and a third structure-adjusted feature map, obtained by performing a third feature map structure adjustment on the feature map, to obtain a first summed feature map;
and performing a second convolution on the first summed feature map to obtain the adjusted feature map.
10. The editor debugging method according to claim 8, wherein the determining the first context position code based on the transformed feature map comprises:
performing a third convolution on the transformed feature map to obtain a second-convolved feature map, wherein the second dimension and the third dimension of the second-convolved feature map are both 1;
and summing the transformed feature map and the second-convolved feature map element-wise to obtain the first context position code.
11. The editor debugging method according to claim 8, wherein before the performing convolution, feature map structure adjustment and normalization on the feature map to obtain an adjusted feature map, the editor debugging method further comprises:
performing a fourth convolution in the width direction and a fifth convolution in the height direction on the feature map to obtain a third-convolved feature map;
and calling a residual processing module to perform residual processing on the third-convolved feature map.
12. The editor debugging method according to claim 11, wherein the residual processing module comprises a convolution input channel and a convolution output channel connected in series, and a short-circuit path that shorts the series-connected convolution input channel and convolution output channel;
and the calling a residual processing module to perform residual processing on the third-convolved feature map comprises: introducing the third-convolved feature map into the short-circuit path, wherein, when the residual processing module is trained, the backward gradient during training is propagated sequentially through the convolution output channel and the convolution input channel.
13. The editor debugging method according to claim 1, wherein after the determining an error location in the editor picture based on the second probability, the editor debugging method further comprises:
determining a running stack corresponding to the error location;
adding a debugging breakpoint in the running stack;
and running the debugging breakpoint to determine a function body location that caused the error.
14. The editor debugging method according to claim 13, wherein after the running the debugging breakpoint, the editor debugging method further comprises:
if the running of the debugging breakpoint fails, identifying a fragment associated with the error location;
and storing the identified fragment to increase the fault tolerance of the editor debugging.
15. An editor debugging device, comprising:
the screen capturing unit is configured to acquire a plurality of screen captures of the editor picture and perform region division on the screen captures to obtain a plurality of detection frames, wherein each detection frame corresponds to one frame body of a plurality of frame bodies in the editor picture, and the plurality of frame bodies comprise a code region frame body;
the prediction unit is configured to perform multi-stage downsampling on pixels in the screen capture to obtain multi-stage downsampled pixel blocks, convolve and concatenate the downsampled pixel blocks of each stage to obtain a concatenated feature representation, and call a probability map and threshold prediction model to predict the concatenated feature representation to obtain a predicted probability map and a first threshold, wherein the probability map comprises, for each pixel in the screen capture, a first probability that the pixel is a frame boundary pixel;
the first determining unit is configured to: perform frame edge processing in the screen capture based on a comparison of the predicted probability map and the first threshold to determine edge information of the code region frame body; determine, for the screen capture for which the edge information is determined, a first context position code of the screen capture of a current period and a second context position code of the screen capture of a period following the current period; invoke a second multi-head attention model to perform a second attention transformation on the second context position code to obtain a first intermediate output; acquire header information of the editor picture based on the first context position code and the first intermediate output using a first multi-head attention model; acquire the type of the editor based on the header information; determine a debugging rule based on the type of the editor; acquire a work log in the editor picture based on the first context position code and the first intermediate output using the first multi-head attention model; and determine, using the first multi-head attention model, a second probability that each pixel in the screen capture is an error position based on the debugging rule and the work log;
and the second determining unit is configured to determine the error position in the editor picture based on the second probability.
16. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the editor debugging method of any of claims 1 to 14 when the computer program is executed.
17. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the editor debugging method of any one of claims 1 to 14.
CN202311331392.5A 2023-10-16 2023-10-16 Editor debugging method and device, electronic equipment and medium Active CN117112446B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311331392.5A CN117112446B (en) 2023-10-16 2023-10-16 Editor debugging method and device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311331392.5A CN117112446B (en) 2023-10-16 2023-10-16 Editor debugging method and device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN117112446A CN117112446A (en) 2023-11-24
CN117112446B (en) 2024-02-02

Family

ID=88796722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311331392.5A Active CN117112446B (en) 2023-10-16 2023-10-16 Editor debugging method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN117112446B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111273911A (en) * 2020-01-14 2020-06-12 杭州电子科技大学 Software technology debt identification method based on bidirectional LSTM and attention mechanism
CN111899224A (en) * 2020-06-30 2020-11-06 烟台市计量所 Nuclear power pipeline defect detection system based on deep learning attention mechanism
CN114821240A (en) * 2022-05-11 2022-07-29 网易(杭州)网络有限公司 Abnormal image detection method and device, electronic equipment and storage medium
CN116368355A (en) * 2021-09-05 2023-06-30 汉熵通信有限公司 Internet of things system
CN116611449A (en) * 2023-06-01 2023-08-18 中国工商银行股份有限公司 Abnormality log analysis method, device, equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6132068B2 (en) * 2014-03-05 2017-05-24 株式会社島津製作所 Information display processing device and control program for information display processing device
CN112215223B (en) * 2020-10-16 2024-03-19 清华大学 Multidirectional scene character recognition method and system based on multi-element attention mechanism


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Remote sensing image denoising combining a residual encoder-decoder network and edge enhancement; Wu Congzhong; Chen Xi; Zhan Shu; Journal of Remote Sensing (01); full text *

Also Published As

Publication number Publication date
CN117112446A (en) 2023-11-24

Similar Documents

Publication Publication Date Title
US9983984B2 (en) Automated modularization of graphical user interface test cases
CN104391797A (en) GUI (graphical user interface) widget identification method and device
CN112631947B (en) Test control method and device for application program, electronic equipment and storage medium
CN112308069A (en) Click test method, device, equipment and storage medium for software interface
CN110688111A (en) Configuration method, device, server and storage medium of business process
CN112527676A (en) Model automation test method, device and storage medium
CN112988557A (en) Search box positioning method, data acquisition device and medium
CN111435367A (en) Knowledge graph construction method, system, equipment and storage medium
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN117112446B (en) Editor debugging method and device, electronic equipment and medium
Pajankar Python 3 Image Processing: Learn Image Processing with Python 3, NumPy, Matplotlib, and Scikit-image
CN113657408B (en) Method and device for determining image characteristics, electronic equipment and storage medium
CN113139617B (en) Power transmission line autonomous positioning method and device and terminal equipment
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN114818627A (en) Form information extraction method, device, equipment and medium
CN112286785B (en) Abnormality detection method and device for user interface
CN110515653B (en) Document generation method and device, electronic equipment and computer readable storage medium
CN113722203A (en) Program testing method and device, electronic device and computer readable storage medium
CN116402090B (en) Processing method, device and equipment of neural network calculation graph
CN117033239B (en) Control matching method and device, computer equipment and storage medium
CN110688511A (en) Fine-grained image retrieval method and device, computer equipment and storage medium
CN110647519B (en) Method and device for predicting missing attribute value in test sample
CN116994002B (en) Image feature extraction method, device, equipment and storage medium
CN115238805B (en) Training method of abnormal data recognition model and related equipment
CN117437652A (en) Test script generation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant