CN114327703A - Method, device, equipment and medium for translating terminal screen display content

Method, device, equipment and medium for translating terminal screen display content

Info

Publication number
CN114327703A
Authority
CN
China
Prior art keywords
text
information
terminal
translated
translation
Prior art date
Legal status
Pending
Application number
CN202111307299.1A
Other languages
Chinese (zh)
Inventor
黄辉煌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111307299.1A
Publication of CN114327703A
Legal status: Pending


Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a method, an apparatus, a device and a storage medium for translating content displayed on a terminal screen; related embodiments can be applied to scenarios such as mobile terminals, computers and vehicle-mounted terminals. The method comprises the following steps: acquiring the current screen content of the terminal and performing screen-capture processing to obtain an image frame to be translated that contains text content information; translating the text content information in the image frame to be translated to obtain a text translation result; and rendering the text translation result according to text attribute information and overlaying it on the current screen content page of the terminal. In this way the image frame to be translated in the terminal screen content can be translated accurately; at the same time, when the text translation result is presented, the text color information, text font size information, background information and foreground information can be rendered and overlaid onto the terminal screen content so as to match the visual style of the screen content, ensuring a more comfortable user experience.

Description

Method, device, equipment and medium for translating terminal screen display content
Technical Field
The present invention relates to video processing technologies, and in particular, to a method and an apparatus for translating content displayed on a terminal screen, an electronic device, and a storage medium.
Background
With the development of machine translation, neural network machine translation (NMT) has become the commonly used new generation of translation technology. An NMT system is built on an encoder-decoder framework. However, although such a system can accurately translate the foreign-language text shown on a terminal display screen, when the translated text is displayed directly on the terminal screen its presentation does not match the display effect of the original foreign-language page; blurring and ghosting often result, degrading the user experience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, an electronic device and a storage medium for translating content displayed on a terminal screen. They can not only accurately translate the text content information in an image frame to be translated within the terminal screen content, but also, when presenting the result, render the text color information, text font size information, background information and foreground information before overlaying them on the current screen content page of the terminal so that they match the visual style of the screen content. This avoids the ghosting and blurred characters caused by directly overlaying translated text and ensures a more comfortable user experience.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a method for translating contents displayed on a screen of a terminal, which comprises the following steps:
receiving a translation trigger instruction for translating the current screen display content of the terminal;
in response to the translation processing instruction for the current screen content of the terminal, acquiring the current screen content of the terminal and performing screen-capture processing to obtain an image frame to be translated that contains the text content information in the current screen content; the image frame to be translated comprises text content information and text attribute information corresponding to the text content information;
translating the text content information in the image frame to be translated to obtain a text translation result;
acquiring text attribute information corresponding to the text content information of the image frame to be translated, wherein the text attribute information comprises text color information and text font size information;
rendering the text translation result according to the text attribute information and then overlaying the rendered text translation result on the current screen content page of the terminal, so that the rendered text translation result is presented in the terminal in real time.
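As an illustration only, the following Python sketch wires these steps together under stated assumptions: mss for screen capture, pytesseract for OCR, Pillow for rendering, and a placeholder translate() standing in for the translation model; none of these library choices come from the patent itself.

```python
# Hedged sketch of the claimed flow: capture -> locate text -> translate -> render overlay.
import mss
import pytesseract
from PIL import Image, ImageDraw, ImageFont

def translate(text: str) -> str:
    # Placeholder for the encoder-decoder translation model described later.
    return text

def translate_screen(monitor_index: int = 1) -> Image.Image:
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[monitor_index])        # screen-capture processing
    frame = Image.frombytes("RGB", shot.size, shot.rgb)     # image frame to be translated
    data = pytesseract.image_to_data(frame, output_type=pytesseract.Output.DICT)
    draw = ImageDraw.Draw(frame)
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue
        x, y, w, h = (data[k][i] for k in ("left", "top", "width", "height"))
        color = frame.getpixel((x + 1, y + 1))              # crude text color sample
        draw.rectangle([x, y, x + w, y + h], fill="white")  # cover the original text
        draw.text((x, y), translate(word), fill=color,
                  font=ImageFont.load_default())            # render the translation result
    return frame
```

A real implementation would take the text color and font size from the attribute-extraction steps described below instead of the crude per-pixel sample used here.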
The embodiment of the invention also provides an apparatus for translating content displayed on a terminal screen, which comprises:
an information transmission module, used for receiving a translation trigger instruction for translating the current screen content of the terminal;
an information processing module, used for responding to the translation processing instruction for the current screen content of the terminal, acquiring the current screen content of the terminal, and performing screen-capture processing to obtain an image frame to be translated that contains the text content information in the current screen content; the image frame to be translated comprises text content information and text attribute information corresponding to the text content information;
the information processing module is used for translating the text content information in the image frame to be translated to obtain a text translation result;
the information processing module is used for acquiring text attribute information corresponding to the text content information of the image frame to be translated, wherein the text attribute information comprises text color information and text font size information;
and the information transmission module is used for rendering the text translation result according to the text attribute information and then overlaying the rendered text translation result on the current screen content page of the terminal, so that the rendered text translation result is presented in the terminal in real time.
In the above scheme,
the information processing module is used for responding to the translation processing instruction aiming at the current screen display content of the terminal and triggering a corresponding translation model;
the information processing module is used for determining at least one word-level hidden variable corresponding to the text content information through an encoder of the translation model;
the information processing module is used for generating, through a decoder of the translation model and according to the at least one word-level hidden variable, translated words corresponding to the word-level hidden variables together with the selection probability of each translated word;
and the information processing module is used for selecting, according to the selection probabilities, at least one translated word to form a text translation result corresponding to the text content information, as sketched below.
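A schematic greedy-decoding sketch of these steps follows; the encoder/decoder callables, shapes and token ids are illustrative assumptions rather than the patent's actual model.

```python
# The encoder maps the text to word-level hidden variables; the decoder proposes
# candidate translated words with selection probabilities; the most probable
# word is selected at each step (greedy decoding).
import torch

def greedy_translate(encoder, decoder, src_ids, bos_id, eos_id, max_len=64):
    hidden = encoder(src_ids)                              # word-level hidden variables
    out_ids = [bos_id]
    for _ in range(max_len):
        logits = decoder(torch.tensor([out_ids]), hidden)  # candidate translated words
        probs = torch.softmax(logits[0, -1], dim=-1)       # selection probabilities
        next_id = int(probs.argmax())                      # select the most probable word
        out_ids.append(next_id)
        if next_id == eos_id:
            break
    return out_ids
```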
In the above scheme,
the information processing module is used for converting the format of the image frame to be translated to obtain the image frame to be translated in a bitmap format;
the information processing module is used for performing color space conversion processing on the image frame to be translated in the bitmap format, and identifying, according to the color space conversion result, the background color in the image frame to be translated to obtain the color values of the pixel points corresponding to the background color;
the information processing module is used for extracting color values of pixel points in a region where text content information in the image frame to be translated is located;
the information processing module is used for comparing the color values of the pixel points corresponding to the background color with the color values of the pixel points in the region where the text content information is located to obtain the text color information;
and the information processing module is used for segmenting the image frame to be translated in the bitmap format to obtain the text font size information.
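A sketch of the color-comparison idea follows (NumPy assumed; the distance threshold and method are illustrative, not the patent's exact algorithm): take the dominant color of the frame as the background, then pick the dominant color inside the text region that differs sufficiently from it as the text color.

```python
import numpy as np

def dominant_color(pixels: np.ndarray) -> np.ndarray:
    """Most frequent color value among the given pixels."""
    values, counts = np.unique(pixels.reshape(-1, 3), axis=0, return_counts=True)
    return values[counts.argmax()]

def text_color(frame: np.ndarray, box: tuple) -> np.ndarray:
    x, y, w, h = box
    background = dominant_color(frame)                # background color of the frame
    region = frame[y:y + h, x:x + w]                  # region containing the text
    values, counts = np.unique(region.reshape(-1, 3), axis=0, return_counts=True)
    # Compare against the background: the most frequent region color that lies far
    # enough from the background color is taken as the text color (60 is assumed).
    for idx in counts.argsort()[::-1]:
        if np.linalg.norm(values[idx].astype(int) - background.astype(int)) > 60:
            return values[idx]
    return values[counts.argmax()]                    # fallback: dominant region color
```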
In the above scheme,
the information processing module is used for segmenting the image frame to be translated according to the position of the text content information to obtain at least two text recognition segments;
the information processing module is used for collecting statistics on the font size of each single character in the at least two text recognition segments to obtain the text font size information of each text recognition segment;
the information processing module is used for comparing the text font size information of different text recognition segments, and merging different text recognition segments when their horizontal start positions are within the horizontal position threshold and their heights are within the height threshold;
and the information processing module is used for collecting statistics on the text font size information of the merged text recognition segments, and screening out the corresponding text font size information in the image frame to be translated according to the statistical result, as sketched below.
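The merging rule can be sketched as follows; the threshold values, Segment fields and helper names are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Segment:
    left: int                                           # horizontal start position
    height: int                                         # segment height
    char_sizes: list = field(default_factory=list)      # per-character font sizes

X_THRESHOLD = 8   # horizontal position threshold (assumed, in pixels)
H_THRESHOLD = 4   # height threshold (assumed, in pixels)

def merge_segments(segments: list) -> list:
    merged = []
    for seg in segments:
        for m in merged:
            if (abs(m.left - seg.left) <= X_THRESHOLD
                    and abs(m.height - seg.height) <= H_THRESHOLD):
                m.char_sizes.extend(seg.char_sizes)     # merge the two segments
                break
        else:
            merged.append(seg)
    return merged

def dominant_font_size(seg: Segment) -> int:
    # Screen out the representative font size by frequency statistics.
    return Counter(seg.char_sizes).most_common(1)[0][0]
```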
In the above scheme,
the information processing module is used for determining background information and foreground information corresponding to the text content information according to the position of the text content information in the image frame to be translated;
the information processing module is used for matching the text translation result against the text color information and the text font size information in the text attribute information to obtain a first text translation result, wherein the first text translation result matches the picture style of the current screen content;
and the information processing module is used for rendering the first text translation result based on the background information and the foreground information corresponding to the text content information to obtain the rendered text translation result.
In the above scheme,
the information processing module is used for converting the image frame to be translated from the red-green-blue (RGB) mode into the hue-saturation-value (HSV) mode;
the information processing module is used for cropping the text content information in the image frame to be translated in the HSV mode according to the position of the text content information in the image frame;
the information processing module is used for determining the background information corresponding to the text content information according to the cropping result;
and the information processing module is used for determining the edge position of the text content information according to the cropping result and the position of the text content information in the image frame to be translated, and determining the foreground information according to the edge position of the text content information.
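The conversion-and-crop step can be sketched with OpenCV (an assumed library choice; the 2-pixel edge band is likewise an assumption):

```python
import cv2
import numpy as np

def split_background_foreground(frame_rgb: np.ndarray, box: tuple):
    hsv = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2HSV)   # RGB mode -> HSV mode
    x, y, w, h = box
    crop = hsv[y:y + h, x:x + w]                       # crop the text region
    # Background information: the cropped patch around the text position.
    background = crop
    # Foreground information: pixels along the edge of the cropped text region.
    edge = np.zeros(crop.shape[:2], dtype=bool)
    edge[:2, :] = edge[-2:, :] = True                  # top and bottom edge bands
    edge[:, :2] = edge[:, -2:] = True                  # left and right edge bands
    foreground = crop[edge]
    return background, foreground
```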
In the above scheme,
the information processing module is used for configuring a character area boundary detection box and a maximum search radius based on the background information corresponding to the text content information;
the information processing module is used for carrying out first marking processing on pixel points exceeding the character area boundary detection frame according to the character area boundary detection frame and the maximum search radius to obtain a first marking result;
the information processing module is used for carrying out second marking processing on the pixel points inside the character area boundary detection frame to obtain a second marking result;
the information processing module is used for traversing pixel points corresponding to all text content information in the two marking results and carrying out pixel reverse filling processing to obtain a background processing result of the first text translation result;
and the information processing module is used for responding to a background processing result of the first text translation result, performing foreground information fusion processing, and obtaining a fused first text translation result.
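The marking and back-filling described above can be approximated with OpenCV inpainting (an assumption; the patent's exact search procedure is not specified beyond the detection box and the maximum search radius):

```python
import cv2
import numpy as np

def erase_text(frame: np.ndarray, box: tuple, max_search_radius: int = 3) -> np.ndarray:
    x, y, w, h = box
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    mask[y:y + h, x:x + w] = 255     # second mark: pixels inside the detection box
    # Pixels outside the box (mask == 0) carry the first mark and are kept as-is;
    # the marked pixels are re-filled from surrounding background pixels within
    # the maximum search radius.
    return cv2.inpaint(frame, mask, max_search_radius, cv2.INPAINT_TELEA)
```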
In the above scheme,
the information processing module is used for carrying out Gaussian blur processing on the image of the area where the foreground information is located;
the information processing module is used for converting the result of the Gaussian blur processing from an RGB mode to an HSV mode to obtain the foreground information of the HSV mode;
and the information processing module is used for sequentially performing morphological dilation processing and morphological erosion processing on the image of the region where the foreground information is located, to obtain the fused first text translation result, as sketched below.
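A sketch of this fusion sequence follows (blur and kernel sizes are illustrative assumptions):

```python
import cv2
import numpy as np

def fuse_foreground(region_rgb: np.ndarray) -> np.ndarray:
    blurred = cv2.GaussianBlur(region_rgb, (5, 5), 0)   # Gaussian blur processing
    hsv = cv2.cvtColor(blurred, cv2.COLOR_RGB2HSV)      # RGB mode -> HSV mode
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    dilated = cv2.dilate(hsv, kernel)                   # morphological dilation
    return cv2.erode(dilated, kernel)                   # then morphological erosion
```

Dilation followed by erosion with the same kernel is a morphological closing, which fills small holes in the foreground region.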
In the above scheme,
the information processing module is used for acquiring, through the bullet screen editing component, a replacement operation for a target bullet screen when the terminal screen content is a game video and the text content information is a bullet screen;
the information processing module is used for responding to the replacement operation and acquiring a second text translation result corresponding to the target bullet screen;
the information processing module is used for rendering the second text translation result according to the text attribute information through the bullet screen editing component;
and the information processing module is used for replacing the target bullet screen in the current screen display content page of the terminal by using the rendered second text translation result.
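Replacing the target bullet screen can be sketched as follows; the BulletScreen type is a hypothetical stand-in for the patent's bullet screen editing component.

```python
from dataclasses import dataclass

@dataclass
class BulletScreen:
    text: str
    color: str    # text color information
    size: int     # text font size information

def replace_bullet(screen: list, target: BulletScreen, translation: str) -> None:
    # Render the second text translation result with the target's text attributes,
    # then swap it in place of the target bullet screen.
    rendered = BulletScreen(text=translation, color=target.color, size=target.size)
    screen[screen.index(target)] = rendered
```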
In the above scheme,
and the information processing module is used for controlling and adjusting the display position of the new bullet screen by the control layer component corresponding to the current screen display content of the terminal so as to realize the display of the new bullet screen.
In the above scheme,
the information processing module is used for determining an empirical threshold matched with the image frame to be translated when the terminal screen content is a cloud game video;
the information processing module is used for determining the color-spill pixels in the terminal screen content based on the empirical threshold matched with the image frame to be translated and the image processing boundary range matched with the color key parameters;
and the information processing module is used for adjusting the gray values of the color-spill pixels in the terminal screen content.
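One common chroma-key spill heuristic is sketched below; the green-dominance rule and threshold are assumptions, since the patent does not fix the detection rule.

```python
import numpy as np

def suppress_spill(frame_rgb: np.ndarray, threshold: int = 30) -> np.ndarray:
    frame = frame_rgb.astype(np.int16)
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    spill = (g - np.maximum(r, b)) > threshold       # color-spill pixels
    gray = frame.mean(axis=-1, keepdims=True)        # per-pixel gray value
    out = np.where(spill[..., None], gray, frame)    # adjust gray value of spill pixels
    return out.astype(np.uint8)
```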
In the above scheme,
the information processing module is used for presenting a translation function item in the view interface, the translation function item being used for translating the text information in the terminal screen content;
and the information processing module is used for acquiring and presenting, in response to the trigger operation on the translation function item, the translated image frame and the corresponding background image, wherein the translated image frame contains the first text translation result.
In the above scheme,
the information processing module is used for presenting the translation function item in the form of a floating ball in the view interface, wherein the floating-ball translation function item can adjust its display position in the view interface in response to the trigger operation; alternatively,
the information processing module is used for presenting a transparent translation function item in the view interface, wherein the transparent translation function item occupies a single pixel and can adjust its display position in the view interface in response to the trigger operation.
An embodiment of the present invention further provides an electronic device, where the electronic device includes:
a memory for storing executable instructions;
and a processor, used for implementing the foregoing method for translating content displayed on a terminal screen when running the executable instructions stored in the memory.
The embodiment of the invention also provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the foregoing method for translating content displayed on a terminal screen.
The embodiment of the invention has the following beneficial effects:
A translation trigger instruction for translating the current screen content of the terminal is received; in response to the translation processing instruction for the current screen content, the current screen content of the terminal is acquired and screen-capture processing is performed to obtain an image frame to be translated that contains the text content information; the image frame to be translated comprises text content information and text attribute information corresponding to the text content information; the text content information in the image frame to be translated is translated to obtain a text translation result; text attribute information corresponding to the text content information is acquired, comprising text color information and text font size information; and the text translation result is rendered according to the text attribute information and then overlaid on the current screen content page of the terminal, so that it is presented in the terminal in real time. In this way, the image frame to be translated in the terminal screen content can be translated accurately; at the same time, when the result is presented, the text color information, text font size information, background information and foreground information are rendered and overlaid on the current screen content page so as to match the visual style of the terminal screen content, which avoids the ghosting and blurred characters caused by directly overlaying translated text and ensures a more comfortable user experience.
Drawings
Fig. 1 is a schematic view of a usage scenario of a method for translating content displayed on a terminal screen according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 3A is an optional flowchart of a method for translating content displayed on a screen of a terminal according to an embodiment of the present invention;
fig. 3B is a schematic diagram of capturing an image frame to be translated in the method for translating the content displayed on the screen of the terminal according to the embodiment of the present invention;
fig. 4 is a schematic view of an optional processing interface of the method for translating the contents displayed on the screen of the terminal according to the embodiment of the present invention;
FIG. 5 is a diagram illustrating the structure of a layer component in an embodiment of the present invention;
fig. 6 is a schematic view of an optional processing interface of a method for translating content displayed on a screen of a terminal according to an embodiment of the present invention;
fig. 7 is a schematic view of an alternative processing interface of a method for translating contents displayed on a screen of a terminal according to an embodiment of the present invention;
fig. 8 is a schematic view of an alternative processing interface of a method for translating contents displayed on a screen of a terminal according to an embodiment of the present invention;
fig. 9 is a schematic diagram illustrating processing of a target bullet screen by the method for translating content displayed on a terminal screen according to the embodiment of the present invention;
fig. 10 is a schematic diagram illustrating that a new barrage is formed by the method for translating the content displayed on the terminal screen according to the embodiment of the present invention;
fig. 11 is a schematic view of a display effect of forming a new barrage by the method for translating the display content of the terminal screen according to the embodiment of the present invention;
fig. 12 is a schematic flowchart of an alternative method for translating content displayed on a screen of a terminal according to an embodiment of the present invention;
fig. 13 is an optional flowchart of a method for translating content displayed on a screen of a terminal according to an embodiment of the present invention;
fig. 14 is a schematic flowchart of an alternative method for translating content displayed on a terminal screen according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
1) Neural Network (NN): an artificial neural network (ANN), neural network for short, is a mathematical or computational model in machine learning and cognitive science that imitates the structure and function of biological neural networks (the central nervous system of animals, especially the brain) and is used to estimate or approximate functions.
2) Speech Recognition (SR): also known as Automatic Speech Recognition (ASR), Computer Speech Recognition (CSR) or Speech-To-Text (STT), whose goal is to automatically convert human speech content into the corresponding text by computer.
3) Machine Translation (MT): in computational linguistics, the study of translating text or speech from one natural language into another by computer programs. Neural Machine Translation (NMT) performs machine translation using neural network techniques.
4) Speech Translation: also known as automatic speech translation, a technology that translates speech in one natural language into text or speech in another natural language by computer, generally comprising the two stages of speech recognition and machine translation.
5) Optical Character Recognition (OCR): a technology that converts the characters of bills, newspapers, books, manuscripts and other printed matter into image information by optical input methods such as scanning, and then uses character recognition techniques to convert the image information into text a computer can use.
6) "In response to": indicates the condition or state on which a performed operation depends. When the condition or state is satisfied, the one or more operations performed may be in real time or may have a set delay; unless otherwise specified, there is no restriction on the order in which multiple operations are performed.
7) Virtual scene: a scene displayed (or provided) by an application program when it runs on the terminal. The virtual scene may be a simulation of the real world, a semi-simulated, semi-fictional three-dimensional environment, or a purely fictional three-dimensional environment.
The virtual scene may be any one of a two-dimensional virtual scene, a 2.5-dimensional virtual scene and a three-dimensional virtual scene; the following embodiments are illustrated with a three-dimensional virtual scene by way of example, but are not limited thereto. Optionally, the virtual scene is also used for virtual scene engagement between at least two virtual objects. Optionally, the virtual scene is also used for a virtual firearm fight between at least two virtual objects. The virtual scene may be, but is not limited to, a gunfight game, a parkour game, a racing game (RCG), a multiplayer online battle arena game (MOBA) or a sports game (SPG). The trained data processing model can be deployed in the game servers corresponding to various game scenarios, and used to generate a real-time advance route through the virtual scene, present it in the game interface, execute corresponding actions in the corresponding game, simulate the operation of virtual users, and complete different types of games in the virtual scene together with the users who actually participate in the game.
8) Virtual object: the representation of any person or object in the virtual scene that can interact, or a movable object in the virtual scene. The movable object may be a virtual character, a virtual animal, an animated character, etc., such as a character, animal, plant, oil drum, wall or stone displayed in the virtual scene. The virtual object may be an avatar in the virtual scene that represents the user. A virtual scene may include multiple virtual objects, each having its own shape and volume in the virtual scene and occupying part of the space in the virtual scene.
Fig. 1 is a schematic view of a usage scenario of the method for translating terminal screen content according to the embodiment of the present invention. Referring to fig. 1, a client with a video playing function is deployed on the terminals (including terminal 10-1 and terminal 10-2). The terminals are connected to the server 200 through the network 300, which may be a wide area network, a local area network, or a combination of the two, with data transmission over wireless links.
As an example, the terminal (terminal 10-1 and/or terminal 10-2) is used to obtain the video and the translation result from the server 200 and display the text translation result in the played video, so that the user can understand the content of the video.
The terminal (including the terminal 10-1 and the terminal 10-2) is further configured to, after the editing operation for the target text translation result is obtained by the text translation result editing component and a new text translation result is formed based on the content of the target text translation result, store the formed new text translation result in a corresponding storage medium.
The terminal (terminal 10-1 and/or terminal 10-2) is further configured to obtain a new text translation result stored in the server 200 through the network 300, and display all text translation results including the new text translation result returned by the server 200 to the user in the playing process of the current on-screen content of the terminal.
As will be described in detail below, the electronic device according to the embodiment of the present invention may be implemented in various forms, such as a terminal with a video playing function, such as a smart phone, a tablet computer, and a desktop computer, and may also be a server with a video displaying function. Fig. 2 is a schematic diagram of a composition structure of an electronic device according to an embodiment of the present invention, and it is understood that fig. 2 only shows an exemplary structure of the electronic device, and not a whole structure, and a part of the structure or the whole structure shown in fig. 2 may be implemented as needed.
The electronic equipment provided by the embodiment of the invention comprises: at least one processor 201, memory 202, user interface 203, and at least one network interface 204. The various components in the electronic device 20 are coupled together by a bus system 205. It will be appreciated that the bus system 205 is used to enable communications among the components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 205 in fig. 2.
The user interface 203 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.
It will be appreciated that the memory 202 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The memory 202 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operating on a terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
In some embodiments, the apparatus for translating content displayed on a terminal screen provided by the embodiment of the present invention may be implemented by a combination of hardware and software. As an example, the apparatus provided by the embodiment of the present invention may be a processor in the form of a hardware decoding processor programmed to execute the method for translating content displayed on a terminal screen provided by the embodiment of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
As an example that the apparatus for translating the content on the screen of the terminal provided by the embodiment of the present invention is implemented by combining software and hardware, the apparatus for translating the content on the screen of the terminal provided by the embodiment of the present invention may be directly embodied as a software module executed by the processor 201, where the software module may be located in a storage medium located in the memory 202, and the processor 201 reads executable instructions included in the software module in the memory 202 and completes the method for translating the content on the screen of the terminal provided by the embodiment of the present invention in combination with necessary hardware (for example, including the processor 201 and other components connected to the bus 205).
By way of example, the Processor 201 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor or the like.
As an example of the hardware implementation of the Device for translating the terminal screen content provided by the embodiment of the present invention, the Device provided by the embodiment of the present invention may be implemented directly by using the processor 201 in the form of a hardware decoding processor, for example, by using one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components to implement the method for translating the terminal screen content provided by the embodiment of the present invention.
The memory 202 in the embodiment of the present invention is used to store various types of data to support the operation of the electronic device 20. Examples of such data include any executable instructions for operating on the electronic device 20; the program implementing the method for translating terminal on-screen content according to the embodiment of the present invention may be contained in these executable instructions.
In other embodiments, the apparatus for translating terminal on-screen content provided by the embodiment of the present invention may be implemented in software. Fig. 2 shows an apparatus 2020 for translating terminal on-screen content stored in the memory 202, which may be software in the form of programs, plug-ins and the like, and which comprises a series of modules: an information transmission module 2081, an information processing module 2082 and a data layer module 2083. When the software modules in the apparatus 2020 are read into RAM by the processor 201 and executed, the method for translating terminal on-screen content according to the embodiment of the present invention is implemented. The functions of the software modules in the apparatus 2020 are described below, wherein:
the information transmission module 2081 is used for receiving a translation triggering instruction for translating the current screen display content of the terminal;
the information processing module 2082 is configured to respond to the translation processing instruction for the current screen display content of the terminal, obtain the current screen display content of the terminal, and perform screenshot processing to obtain an image frame to be translated, where the current screen display content of the terminal includes text content information; the image frame to be translated comprises text content information and text attribute information corresponding to the text content information;
the information processing module 2082 is configured to translate text content information in the image frame to be translated to obtain a text translation result;
the information processing module 2082 is configured to render the text translation result according to the text attribute information and then cover the rendered text translation result to a current screen display content page of the terminal, so as to realize real-time presentation of the rendered text translation result in the terminal.
According to the electronic device shown in fig. 2, in one aspect of the present application, the present application also provides a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the different embodiments and the combination of the embodiments provided in the various optional implementation modes of the method for translating the terminal screen display content.
In the following description, the method for translating the content displayed on the screen of the terminal according to the embodiment of the present invention is described with reference to the exemplary application and implementation of the terminal according to the embodiment of the present invention, and it can be understood from the foregoing that the method for translating the content displayed on the screen of the terminal according to the embodiment of the present invention can be implemented by various types of devices with video processing functions, such as a special device for video playing, a computer, a server, and the like.
Referring to fig. 3A, fig. 3A is an optional flowchart of the method for translating terminal on-screen display content according to the embodiment of the present invention, and it can be understood that the steps shown in fig. 3A may be executed by various electronic devices running a video processing function, for example, a terminal, a server, or a server cluster of a type such as a computer, a smart phone, and the like with video playing and processing functions. The following is a description of the steps shown in fig. 3A.
Step 301: and receiving a translation trigger instruction for translating the current screen display content of the terminal.
In some embodiments of the present invention, the terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart household appliance, a vehicle-mounted terminal, and the like. Taking as an example the case where the current screen content of the terminal is the text information in the display interface of a game program: a real-time screenshot is captured while the user is playing, the game screenshot is translated in real time by the translation process provided by the game server to generate a text translation result, and the result is finally rendered and overlaid on the original screen interface, so that the user can understand the meaning of the text information displayed in the game video.
In some embodiments, when the method for translating terminal screen content provided by the present invention is used, a translation trigger instruction for translating the current screen content of the terminal may be received through a translation function item presented in the view interface. The translation function item may always be in an enabled state: whenever the terminal screen is on, it monitors the terminal screen content in real time and, in response to a translation processing instruction for the current screen content, performs screen-capture processing to obtain the image frame to be translated that contains the text content information in the current screen content.
In some embodiments, when the user of the terminal has not authorized information collection by the presented translation function item, the translation function item may be in a disabled state. A translation trigger instruction for translating the current screen content is received by triggering the presented translation function item; then, in response to the translation processing instruction, the current screen content of the terminal is acquired and screen-capture processing is performed to obtain the image frame to be translated that contains the text content information. After the rendered text translation result has been presented in the terminal in real time, the translation function item is returned to the disabled state, so as to protect the security of the terminal user's information.
Referring to fig. 3B, fig. 3B is a schematic diagram of capturing an image frame to be translated in the method for translating terminal screen content according to the embodiment of the present invention. A translation function item 3001 may be presented in the view interface and is used to translate the text information in the terminal screen content; in response to a trigger operation on the translation function item 3001, the translated image frame and the corresponding background image are acquired and presented, the translated image frame containing the corresponding text translation result.
As shown in fig. 3B, the translation function item 3001 is presented as a floating ball in the view interface; the floating-ball translation function item can adjust its display position in the view interface in response to the trigger operation by receiving a control instruction. In some embodiments, the translation function item 3001 may instead be presented in a transparent form occupying a single pixel, which can likewise adjust its display position in response to the trigger operation; setting the translation function item 3001 to one pixel prevents it from blocking the text in the game video presented in the view interface, enabling accurate translation of the text information in the image frame to be translated. As an embodiment, the transparent translation function item may default to the shape of a floating ball, and the user may also adjust the shape of the translation function item 3001 according to usage requirements, which is not limited in the embodiment of the present invention.
In some embodiments of the present invention, the method for translating terminal screen content provided by the present invention may also process foreign-language bullet-screen (danmaku) information in a long video, so as to translate and display it. Specifically, when the video client running on the terminal obtains the video data sent by the corresponding server, it synchronously obtains the text translation results sent by the server and stores them in the to-be-displayed text translation result queue corresponding to the obtained video. The text translation results obtained by the client may have been submitted by different users watching the video through their respective clients, or through corresponding web playback interfaces. When the current screen content of the terminal is played through the client, the to-be-displayed text translation result queue is called, a second text translation result corresponding to a target bullet screen is obtained, and the second text translation result is rendered according to the text attribute information by the bullet screen editing component; the rendered second text translation result then replaces the target bullet screen in the current screen content page of the terminal, so that the user can understand the foreign-language bullet-screen information in time.
In some embodiments of the present invention, displaying the text translation result during playing of the video by a video client (e.g., a video player) running on the terminal may be implemented as follows: and playing the video through a picture layer of the client, and displaying the text translation results in the text translation result queue to be displayed through a text translation result layer suspended above the picture layer.
In some embodiments of the present invention, when the text content information to be translated is bullet screen information of a game video, in the process of playing the video by a video client operated by a terminal, a text translation result in a text translation result queue to be displayed enters from one side of a playing area of a content currently displayed on a screen of the terminal and moves to the other side, and the display is stopped when the text translation result moves to the other side of the video playing area, wherein in the present invention, no specific limitation is made on the moving mode of the text translation result.
In some embodiments of the present invention, the displaying the text translation result in the current on-screen content of the terminal includes:
and controlling and adjusting the display position of the text translation result by a control layer component corresponding to the current screen display content of the terminal so as to display the text translation result. The control layer component can adjust the position, the stay time and the display position of the text translation result in the text translation result queue to be displayed entering the video playing area.
Step 302: and responding to the translation processing instruction aiming at the current screen display content of the terminal, acquiring the current screen display content of the terminal, and performing screen capture processing to obtain an image frame to be translated, wherein the current screen display content of the terminal comprises text content information.
The image frame to be translated shown in step 302 includes text content information and text attribute information corresponding to the text content information, and in the following embodiments, a processing procedure of the text content information and a process of performing rendering processing based on the text attribute information will be described.
In some embodiments of the present invention, when a video client running through a terminal starts playing of a current on-screen content of the terminal, the obtaining of the selection operation for the displayed text translation result is triggered.
In some embodiments of the present invention, when the client for the current screen content of the terminal or the game client is started, the acquisition of the selection operation on the translation result of the displayed text content information is triggered at the same time, so that the acquisition state of the selection operation on the displayed text translation result is maintained while the current screen content of the terminal is being played.
In some embodiments of the present invention, the obtaining of the selection operation for the displayed text translation result includes:
when a corresponding text translation result is displayed in the current screen display content of the terminal, the display layer component corresponding to the current screen display content of the terminal acquires an image frame of a video in a playing state, and performs screenshot processing to form different bitmaps, wherein pixels of the bitmaps are all distributed with specific positions and color values. The color information of each pixel is represented by an RGB combination or a gray value.
The bitmap can be divided into 1, 4, 8, 16, 24, and 32 bit images, etc., according to bit depth. The larger the number of information bits used per pixel, the more colors are available, the more realistic the color representation, and the larger the corresponding amount of data. For example, a pixel bitmap with a bit depth of 1 has only two possible values (black and white), and is therefore also referred to as a binary bitmap. An image with a bit depth of 8 has 2^8 (i.e., 256) possible values. A gray mode image with a bit depth of 8 has 256 possible gray values. Since the terminal screen content may be a game video or a long video, the bitmap may be obtained through different approaches, for example: when the electronic equipment is touch electronic equipment with a video playing function, the selection operation can be realized by touch control on a display interface of the electronic equipment; when the electronic device is an electronic device with a video playing function connected to an external operating device, the selecting operation may be performed by the external operating device, and the external operating device includes but is not limited to: mouse, keyboard, action bars.
Step 303: and translating the text content information in the image frame to be translated to obtain a text translation result.
Translating the text content information in the image frame to be translated to obtain a text translation result can be implemented as follows:
in response to the translation processing instruction for the current screen content of the terminal, a corresponding translation model is triggered; at least one word-level hidden variable corresponding to the text content information is determined by the encoder of the translation model; translated words corresponding to the word-level hidden variables and the selection probability of each translated word are generated by the decoder of the translation model according to the at least one word-level hidden variable; and at least one translated word is selected according to the selection probabilities to form a text translation result corresponding to the text content information. For game scenarios outside mainland China (for example, games on US or Japanese servers), English or Japanese is used for the virtual objects and virtual scenes in the game, so users from mainland China often cannot understand their meaning in time; the foreign-language meaning of the virtual objects and virtual scenes can therefore be obtained in time through the translation model.
In some embodiments of the present invention, because the text content information in the current screen content of a foreign-language terminal may consist of long sentences (for example, a long Japanese sentence displayed while a Japanese-server game program is running), a segmentation operation on the target text translation result may also be obtained through the text translation result editing component to form at least one segmentation point; the text content forming the new text translation result is screened from the target text translation result according to the at least one segmentation point; and the screened text content is combined to form the new text translation result, so as to adapt to the translation of longer sentences.
With continuing reference to fig. 4, fig. 4 is an optional structural schematic diagram of the translation model in the embodiment of the present invention, where the Encoder consists of N = 6 identical layers, each containing two sub-layers: the first sub-layer is a multi-head attention layer, followed by a simple fully connected layer. Each sub-layer adds a residual connection and normalization.
The Decoder also consists of N = 6 identical layers, but each decoder layer differs from an encoder layer: it comprises three sub-layers, namely a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer; the first two sub-layers are both based on multi-head attention. Specifically, the Nx on the left represents one encoder layer, which contains two sub-layers: the first is a multi-head attention layer and the second is a feed-forward layer. The input and output of each sub-layer are connected, the output of the current sub-layer serving as an input of the next. Each sub-layer is followed by a normalization operation, which speeds up model convergence. The Nx on the right represents one decoder layer, which contains three sub-layers. The first is a multi-head attention sub-layer controlled by a mask matrix, used to model the target-side sentence vectors generated so far; during training, the mask matrix ensures that each multi-head attention computation only attends to the first t-1 words. The second sub-layer is a multi-head attention sub-layer implementing the attention mechanism between the encoder and the decoder, i.e. it searches the source text for relevant semantic information; this layer is computed with dot products. The third sub-layer is a feed-forward sub-layer, computed in the same way as in the encoder. The sub-layers of the decoder are likewise connected, the output of each serving as input to the next, and each is followed by a normalization operation to speed up model convergence.
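A minimal PyTorch sketch of one such encoder layer is given below (hyper-parameters follow the d_model = 512, 8-head setup mentioned in the text; the feed-forward width is an assumption):

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)     # multi-head self-attention sub-layer
        x = self.norm1(x + attn_out)         # residual connection + normalization
        x = self.norm2(x + self.ff(x))       # feed-forward sub-layer, same treatment
        return x

# N = 6 identical layers, as described above.
encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])
```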
With continuing reference to fig. 5, fig. 5 is a schematic diagram of an optional translation process of the translation model in the embodiment of the present invention, in which the encoder part and the decoder part each comprise 6 layers. The input to the first encoder layer combines the word embedding and the positional embedding. After passing through the 6 encoder layers, the output is fed to each layer of the decoder part. The input sentence to be translated is a passage of Japanese game dialogue (lines spoken by characters including Lü Bu, Zhao Yun and Miyamoto Musashi). After processing by the translation model, the output translation result is: "Demon Lü Bu the Peerless: 'From this moment, the battlefield is ruled by one man alone! Who dares to fight me!' Soaring Dragon of the Blue Sky Zhao Yun: 'The oath of courage matters more than life or death! I will fear nothing and soar across the sky!' 'What cannot defeat me will only make me stronger!' Sword Saint Miyamoto Musashi: 'There is none like me under heaven! Let me tell you a secret: I am invincible!'"
With continuing reference to FIG. 6, FIG. 6 is an alternative structural diagram of an encoder in a translation model in an embodiment of the present invention, where the input consists of queries (Q) and keys (K) of dimension d and values (V) of dimension d; the dot product of the query with all keys is computed, and a softmax function is applied to obtain the weights on the values.
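This is the scaled dot-product attention of the Transformer literature; written out (the scaling by the key dimension is standard in that formulation, though not spelled out above):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$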
With continued reference to FIG. 6, FIG. 6 is a vector diagram of the encoder in the translation model of the embodiment of the present invention, wherein Q, K and V are obtained by multiplying the vector x input to the encoder by W^Q, W^K, W^V. The dimensions of W^Q, W^K, W^V in the paper are (512, 64); assume the input has dimension (m, 512), where m represents the number of words. The dimension of Q, K and V obtained after multiplying the input vector by W^Q, W^K, W^V is then (m, 64).
With continued reference to fig. 7, fig. 7 is a schematic diagram of vector splicing of an encoder in a translation model according to an embodiment of the present invention, where Z0 to Z7 are the corresponding 8 parallel heads (each of dimension (m, 64)); concatenating the 8 heads yields dimension (m, 512). After the final multiplication with W^O, an output matrix with dimension (m, 512) is obtained, consistent with the dimension entering the next encoder.
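A shape-level sketch of figs. 6 and 7 (assumptions of this rewrite: NumPy, with random weights standing in for trained parameters):

import numpy as np

m, d_model, n_heads, d_head = 5, 512, 8, 64  # m = number of words
x = np.random.rand(m, d_model)               # input to the encoder layer

heads = []
for _ in range(n_heads):
    W_Q, W_K, W_V = (np.random.rand(d_model, d_head) for _ in range(3))
    Q, K, V = x @ W_Q, x @ W_K, x @ W_V      # each of dimension (m, 64)
    scores = Q @ K.T / np.sqrt(d_head)       # dot products of queries and keys
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    heads.append(weights @ V)                # Z_i, dimension (m, 64)

Z = np.concatenate(heads, axis=-1)           # concat of Z0..Z7 -> (m, 512)
W_O = np.random.rand(d_model, d_model)
out = Z @ W_O                                # (m, 512), matching the next encoder's input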
With continued reference to fig. 8, fig. 8 is a schematic diagram of an encoding process of an encoder in the translation model according to the embodiment of the present invention, in which x1 passes through self-attention to reach the state z1; the tensor output by self-attention then passes through a residual network and Layer Norm, and afterwards through a fully connected feed-forward network, which performs the same operations: residual processing and normalization. The tensor finally output enters the next encoder; this is iterated 6 times, and the result of the iterative processing enters the decoder.
With continuing reference to fig. 9, fig. 9 is a schematic diagram of a decoding process of a decoder in the translation model according to an embodiment of the present invention, wherein the input and output of the decoder and the decoding process are as follows:
output: the probability distribution of the output word corresponding to position i;
inputting: output of encoder & output of corresponding i-1 position decoder. So the middle atttion is not self-atttion, its K, V comes from encoder and Q comes from the output of the decoder at the last position.
The method for translating the contents displayed on the screen of the terminal provided by the embodiment of the application is realized based on Artificial Intelligence (AI). AI is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject relating to a wide range of fields, covering both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. The artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
In the embodiment of the present application, the artificial intelligence software technologies mainly involved include the above-mentioned speech processing technology and machine learning, among other directions. For example, the present invention may relate to Speech Recognition Technology (ASR) within Speech Technology, which includes speech signal preprocessing, speech signal frequency-domain analysis, speech signal feature extraction, speech signal feature matching/recognition, speech training, and the like.
For example, Machine Learning (ML) may be involved, which is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and so on. It specializes in studying how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning generally includes techniques such as Deep Learning, which includes artificial neural networks such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Deep Neural Networks (DNN).
Step 304: and acquiring text attribute information corresponding to the text content information of the image frame to be translated, wherein the text attribute information comprises text color information and text word size information.
In some embodiments of the present invention, fig. 10 is an optional flowchart of a method for translating content displayed on a screen of a terminal according to an embodiment of the present invention. It can be understood that the steps shown in fig. 10 can be executed by various electronic devices running video processing functions, for example, a terminal with video playing and processing functions such as a computer or a smartphone, a server, or a server cluster. The following describes the steps shown in fig. 10.
Step 1001: and converting the format of the image frame to be translated to obtain the image frame to be translated in a bitmap format.
Step 1002: and performing color space conversion processing on the image frame to be translated in the bitmap format, and according to the color space conversion result, identifying the background color in the image frame to be translated to obtain the color values of the pixel points corresponding to the background color.
The color space of the image frame to be translated in the bitmap format is converted from RGB to the HSV color space as follows: (r, g, b) are the red, green and blue color values of a color, each a real number between 0 and 1, and (h, s, v) are the hue, saturation and lightness, respectively. Writing max = max(r, g, b) and min = min(r, g, b), the transformation (Equation 1) is:

$$h = \begin{cases} 0^{\circ} & \text{if } \max = \min \\ 60^{\circ} \times \dfrac{g-b}{\max-\min} & \text{if } \max = r \text{ and } g \ge b \\ 60^{\circ} \times \dfrac{g-b}{\max-\min} + 360^{\circ} & \text{if } \max = r \text{ and } g < b \\ 60^{\circ} \times \dfrac{b-r}{\max-\min} + 120^{\circ} & \text{if } \max = g \\ 60^{\circ} \times \dfrac{r-g}{\max-\min} + 240^{\circ} & \text{if } \max = b \end{cases}$$

$$s = \begin{cases} 0 & \text{if } \max = 0 \\ \dfrac{\max - \min}{\max} & \text{otherwise} \end{cases} \qquad v = \max$$
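A direct transcription of Equation 1, as a sketch in plain Python (assuming r, g and b are real numbers in [0, 1]):

def rgb_to_hsv(r, g, b):
    mx, mn = max(r, g, b), min(r, g, b)
    if mx == mn:
        h = 0.0
    elif mx == r:
        # The % 360 folds in the "+360 degrees" case for g < b.
        h = (60.0 * (g - b) / (mx - mn)) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / (mx - mn) + 120.0
    else:  # mx == b
        h = 60.0 * (r - g) / (mx - mn) + 240.0
    s = 0.0 if mx == 0 else (mx - mn) / mx
    v = mx
    return h, s, v

print(rgb_to_hsv(1.0, 0.0, 0.0))  # pure red -> (0.0, 1.0, 1.0)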
For the vicinity of the area where the characters are located, in order to avoid the background interfering with the display of the characters, the character background is by default not designed with overly large color differences, and the main tones in the character background are relatively close. Therefore, when identifying the main background color, the HSV color values at the (0, 0) and (width, height) positions of the text vicinity region are taken directly.
Further, the main background color can be compared with the HSV color values of the other pixels in the text region map to obtain color-difference values. The difference is calculated as follows, where d denotes the color-difference calculation and col1, col2 denote the (h, s, v) values of the HSV color space of two pixels; the transformation process refers to Equation 2.
$$x_i = s_i v_i \cos(h_i), \quad y_i = s_i v_i \sin(h_i), \quad z_i = v_i \qquad (i = 1, 2)$$

$$d(\mathrm{col}_1, \mathrm{col}_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
By this method, all the pixel points are sampled and traversed, and the pixel whose color value differs most from the established main background color is found; this determines the color of the character font in the image frame to be translated.
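A sketch of this background sampling and font-color search (assumptions of this rewrite: Pillow is used for pixel access, and hsv_distance is a cone-mapped Euclidean distance standing in for the color difference d of Equation 2):

import math
from PIL import Image

def norm_hsv(p):
    # PIL's HSV mode stores each channel in 0..255; rescale to degrees / [0, 1].
    return (p[0] * 360.0 / 255.0, p[1] / 255.0, p[2] / 255.0)

def hsv_distance(c1, c2):
    # Assumed form of d: map (h, s, v) onto a cone and take Euclidean distance.
    (h1, s1, v1), (h2, s2, v2) = c1, c2
    x1, y1 = s1 * v1 * math.cos(math.radians(h1)), s1 * v1 * math.sin(math.radians(h1))
    x2, y2 = s2 * v2 * math.cos(math.radians(h2)), s2 * v2 * math.sin(math.radians(h2))
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (v1 - v2) ** 2)

def find_font_color(region: Image.Image):
    hsv = region.convert("HSV")
    w, h = hsv.size
    bg = norm_hsv(hsv.getpixel((0, 0)))   # sampled main background color
    best_d, font_color = -1.0, bg
    for y in range(h):                    # sample and traverse all pixel points
        for x in range(w):
            px = norm_hsv(hsv.getpixel((x, y)))
            d = hsv_distance(px, bg)
            if d > best_d:                # farthest from the background color
                best_d, font_color = d, px
    return font_color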
Step 1003: extracting color values of pixel points in a region where text content information in the image frame to be translated is located;
step 1004: comparing the color value of the pixel point corresponding to the background color with the color value of the pixel point in the area where the text content information is located to obtain the text color information;
step 1005: and segmenting the image frame to be translated in the bitmap format to obtain the text font information.
In some embodiments of the present invention, in order to obtain the font size information, fig. 11 is an optional flowchart of the method for translating the content displayed on the screen of the terminal provided in the embodiments of the present invention, which specifically includes:
step 1101: and segmenting the image frame to be translated according to the position of the text content information to obtain at least two text recognition segments.
For example, the multiple lines of text in a foreign-language list have a consistent font size and a consistent number of characters. Such text is typically recognized by the OCR engine as multiple text regions that are translated separately, with the results returned per region. If the font size were then calculated per region for list-type or grid-type text, the regions could be assigned different font sizes because the translated texts have inconsistent lengths, so multiple text regions with a consistent font size in the original would be translated into multiple text regions with inconsistent font sizes. Cutting the text into at least two text recognition segments allows the font size to be recognized accurately.
Step 1102: counting the text word size corresponding to the text of the single character related in the at least two text recognition segments to obtain the text word size information in each text recognition segment;
step 1103: and comparing the text font size information in different text recognition segments, and summarizing the different text recognition segments when it is determined that the difference in the horizontal starting positions of the different text recognition segments is less than or equal to a horizontal position threshold and the difference in their heights is less than or equal to a height threshold. As shown in fig. 11, the character areas first calculate the character sizes in a unified manner, where the conditions for summarizing character segments A1, A2, etc. are as follows:
1. the horizontal starting positions of regions A1 and A2 are equal or similar;
2. the heights of regions A1 and A2 are equal or similar;
For each summarized region class, the maximum font size that can accommodate the translation result text is calculated per region according to the region width. The minimum of the maximum-font-size results over the class-A regions is then taken, ensuring that every class-A character segment can completely display its result at that font size (a sketch follows step 1104 below).
Step 1104: and counting the text word size information of the text recognition segment obtained by summarizing, and screening the corresponding text word size information in the image frame to be translated according to the counting result.
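A sketch of steps 1102 to 1104 (assumptions of this rewrite: each text recognition segment is a dict carrying its horizontal start x and its height; the two thresholds and the helper max_font_size, which returns the largest font size whose rendered translation still fits a segment's width, are hypothetical):

X_THRESHOLD = 8   # horizontal starting-position threshold, in pixels (assumed)
H_THRESHOLD = 4   # height threshold, in pixels (assumed)

def summarize(segments):
    # Group segments whose horizontal starts and heights are equal or similar.
    groups = []
    for seg in segments:
        for group in groups:
            ref = group[0]
            if (abs(seg["x"] - ref["x"]) <= X_THRESHOLD
                    and abs(seg["height"] - ref["height"]) <= H_THRESHOLD):
                group.append(seg)
                break
        else:
            groups.append([seg])
    return groups

def unified_font_size(group, max_font_size):
    # Minimum of the per-segment maxima, so every segment in the group can
    # display its translation completely at one shared font size.
    return min(max_font_size(seg) for seg in group)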
Step 305: rendering the text translation result according to the text attribute information and then covering the rendered text translation result to a current screen display content page of the terminal so as to realize real-time presentation of the rendered text translation result in the terminal.
In some embodiments of the present invention, fig. 12 is an optional flowchart of the method for translating the content displayed on the screen of the terminal according to an embodiment of the present invention. After the corresponding text font size information in the image frame to be translated is obtained through the processing shown in fig. 11, the steps shown in fig. 12 render the text translation result according to the text color information and the text font size information included in the text attribute information and then overlay it on the current content page displayed on the screen of the terminal, specifically including the following steps:
step 1201: and determining background information and foreground information corresponding to the text content information according to the position of the text content information in the image frame to be translated.
In some embodiments of the present invention, in order to obtain foreground information, the image frame to be translated may be converted from a red-green-blue (RGB) mode to a hue-saturation-value (HSV) mode; in the image frame to be translated in the HSV mode, the text content information in the image frame to be translated is cut according to the position of the text content information in the image frame to be translated; the background information corresponding to the text content information is determined according to the cutting result; and the edge position of the text content information is determined according to the result of the clipping and the position of the text content information in the image frame to be translated, the foreground information being determined according to the edge position of the text content information.
Step 1202: and respectively matching the text translation results according to text color information and text word size information in the text attribute information to obtain a first text translation result, wherein the first text translation result is matched with the picture style of the current screen display content.
Through the matching process of step 1202, the first text translation result is made consistent with the text color of the foreign-language text in the current screen display content of the terminal and matched to its font size, so that the color and font size of the translated text do not disturb the user. For example, when the text content information is a 4-character phrase meaning "double kill, triple kill" and the first text translation result obtained through the processing of step 1202 occupies a matching character count, the first text translation result can be set to the same text font size as the text content information.
In some embodiments, the text content information is a Japanese sentence of 28 characters whose first text translation result, obtained through the processing of step 1202, contains 32 characters. The first text translation result is therefore set to a size-5 font to match the size-small-4 text content information, so as to avoid ghosting caused by the larger character count of the first text translation result.
Step 1203: rendering the first text translation result based on the background information and the foreground information corresponding to the text content information to obtain the rendered text translation result.
In some embodiments of the present invention, fig. 13 is an optional flowchart of a method for translating content displayed on a screen of a terminal according to an embodiment of the present invention, and specifically includes the following steps:
step 1301: and configuring a character area boundary detection box and a maximum search radius based on the background information corresponding to the text content information.
Fig. 14 is a schematic diagram of a display state of a virtual character in a game video. The picture displayed in a mobile game screen is not a static resource stored locally by the App, but a dynamic effect generated according to the user's operations, so a third-party developer has no way to directly obtain the original picture. Meanwhile, in the process of erasing the original text and restoring the original background, since there are many text pixels, most of the pixels in the text region must participate in restoring the background, and the hardware cost required for this process is too high.
Step 1302: and according to the character area boundary detection frame and the maximum search radius, carrying out first marking processing on pixel points exceeding the character area boundary detection frame to obtain a first marking result.
The maximum search radius can be adjusted according to the type of the terminal screen display content or the area of the display interface of the terminal device; the radius ranges from 1 to a preset value (less than or equal to 15). When the method for translating the terminal screen display content provided by the application is triggered, the maximum search radius corresponding to the current video processing environment can be selected automatically.
Step 1303: and carrying out second marking processing on the pixel points inside the character area boundary detection frame to obtain a second marking result.
Step 1304: and traversing pixel points corresponding to all text content information in the two marking results, and performing pixel reverse filling processing to obtain a background processing result of the first text translation result.
The purpose of the first traversal of the bitmap is to trade space for time: a hash table stores the mapping between pixel coordinate values and whether the pixel belongs to a character. Pixel points outside the character area boundary detection frame can be identified directly as non-character areas, so they can be marked without any color-similarity comparison, taking "non-character position" as the value;
for the pixel points inside the detection frame region, the following judgment can be made based on a basic assumption: the font color and the main background color are highly distinguishable. Therefore, to mark the text positions, the RGB color value of each pixel point is converted into an HSV color value, the color similarity between this value and the HSV value of the main background color is compared, and when the difference exceeds the threshold, the pixel point is considered a text position.
Step 1305: and responding to the background processing result of the first text translation result, and performing foreground information fusion processing to obtain the fused first text translation result.
All pixel points on the bitmap are marked, using some memory space. The aim is to reuse the result of this traversal during the subsequent second traversal of the bitmap, so that, through the marks, a background color can be back-filled quickly enough for the character pixel points.
Specifically, as shown in FIG. 13, during the second traversal, only the text-region pixels marked during the first traversal are processed, e.g., (x1, y1). Then, taking pixel (x1, y1) as the center, the surrounding pixels are traversed with radius from 1 up to a preset value (where the preset value is less than or equal to 15); if a non-character pixel (x2, y2) is found, (x1, y1) is back-filled with the RGB value of (x2, y2). This process is repeated until the character pixel points in all areas have been traversed. At this point, the text pixels have been back-filled with the most similar background color.
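A sketch of this two-pass traversal (assumptions of this rewrite: bitmap is a mutable 2D list of RGB tuples, box is the character area boundary detection frame, and is_text_color encapsulates the HSV similarity comparison described above):

MAX_RADIUS = 15  # preset upper bound on the search radius

def reverse_fill(bitmap, box, is_text_color):
    h, w = len(bitmap), len(bitmap[0])
    x0, y0, x1, y1 = box
    # First pass: hash table mapping pixel coordinates -> "is a text pixel".
    # Pixels outside the detection frame are marked non-text directly,
    # without any color-similarity comparison.
    text_mask = {}
    for y in range(h):
        for x in range(w):
            inside = x0 <= x <= x1 and y0 <= y <= y1
            text_mask[(x, y)] = inside and is_text_color(bitmap[y][x])
    # Second pass: for every marked text pixel, search outward with radius
    # 1..MAX_RADIUS and back-fill it with a nearby non-text pixel's RGB value.
    for (x, y), is_text in text_mask.items():
        if not is_text:
            continue
        filled = False
        for r in range(1, MAX_RADIUS + 1):
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < w and 0 <= ny < h
                            and not text_mask[(nx, ny)]):
                        bitmap[y][x] = bitmap[ny][nx]
                        filled = True
                        break
                if filled:
                    break
            if filled:
                break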
After the background in the video image frame is fused, the foreground of the video image frame still needs to be processed to obtain a first text translation result with foreground information erased. Specifically, the foreign-language characters on the on-screen mobile-game interface are essentially erased and covered with background colors, but many noise points remain at the boundaries of the text, making the interface unattractive. Therefore, partial morphological image processing is applied to the bitmap, and the color of the text area is reprocessed using erosion and dilation techniques. Since the stroke weights of the characters in game video, especially mobile-game video, vary, a simple erosion-dilation process may generate larger noise areas. Therefore, a "block-wise" erosion scheme is adopted.
The image dilation is calculated per Equation 3, in which the value at (x, y) is replaced with the maximum value over the neighborhood positions (x + x', y + y'):

$$\mathrm{dst}(x, y) = \max_{(x', y')\,:\, \mathrm{element}(x', y') \ne 0} \mathrm{src}(x + x', y + y')$$
The technical scheme of the application processes the colors of multiple pixel points rather than the color of a single pixel point. In Equation 3, (x, y) is a pixel and src(x, y) is its pixel value. The scheme is further optimized: a conceptual 3 x 3 pixel block around the new pixel point is taken, and the color values of its 9 pixel points are normalized to obtain a new pixel value. The normalization (Equation 4) takes, for each color channel c, the mean over the block B:

$$\overline{\mathrm{src}}(B, c) = \frac{1}{9} \sum_{(x, y) \in B} \mathrm{src}(x, y, c)$$
The RGB three channels or HSV three channels of the pixels in a pixel block are summed, the summation being over the 9 pixels for a given channel (such as the R channel); the final normalization result takes the mean of the color values of each channel in the pixel block as the color value of the pixel block. With this value, the dilation operation above is performed.
In some embodiments of the present invention, when Gaussian blur processing is performed on the image of the area where the foreground information is located, the result of the Gaussian blur processing may be converted from RGB mode to HSV mode to obtain foreground information in HSV mode; morphological dilation processing and morphological erosion processing are then performed in sequence on the image of the region where the foreground information is located, to obtain the fused first text translation result. Morphological dilation can eliminate holes. The structuring element is a 3 x 3 rectangle; it slides from left to right and from top to bottom, operating in turn with the image pixels in the window, and when at least one value of the operation result is 1, the pixel at that position is assigned the value 1, otherwise 0.
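A sketch of this post-processing chain (assumptions of this rewrite: OpenCV, operating on the BGR region around the erased text; dilating and then eroding the HSV image follows the order described above):

import cv2

def smooth_text_region(img):
    # Gaussian blur of the foreground region, then convert to HSV mode.
    blurred = cv2.GaussianBlur(img, (5, 5), 0)
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    # 3 x 3 rectangular structuring element, as described above.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    dilated = cv2.dilate(hsv, kernel, iterations=1)   # dilation fills holes
    eroded = cv2.erode(dilated, kernel, iterations=1) # erosion trims noise
    return cv2.cvtColor(eroded, cv2.COLOR_HSV2BGR)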
Finally, through the processing of steps 301 to 305, the text translation result matches the picture style of the terminal screen display content, with the same color and font, so that the user obtains the same experience as when using the current screen display content of the terminal in its foreign-language state.
In some embodiments of the present invention, since the image frame to be translated may contain bullet-screen information in a game video, when the terminal screen display content is a game video and the text content information is a bullet screen, the content of the target bullet screen may be replaced with the first text translation result by triggering a translation function item, forming a new bullet screen. Specifically, when the original bullet-screen information in the game video is a Japanese line spoken by the Sword Saint Miyamoto Musashi, after the processing of the translation model the first text translation result is "Sword Saint Miyamoto Musashi: 'Peerless under heaven! Let me tell you a secret: I am invincible!'". The presented first text translation result has the same text color, font size and display position as the original bullet-screen information, and does not block the playback of the game video.
In some embodiments of the present invention, since a foreign game may not comply with the corresponding law (for example, some countries grade games according to the player's age), when the obtained translation result does not meet the legal requirements, the game video can continue to play, but the continuing video and the translation result of the text content displayed in the current on-screen content of the terminal need to be blurred to meet the corresponding legal requirements. In some embodiments of the present invention, the blurring of the video and the text translation result displayed in the current screen display content of the terminal may be implemented by Gaussian blurring of the picture layer and the text translation result layer. The Gaussian blur process is as follows: the device's available memory, decoding capability, frame rate of the current screen display content of the terminal, and the like are converted into 4 grades A, B, C, D; the Gaussian blur degree takes the blur radius as a parameter, and the higher the blur radius, the higher the performance consumption; Gaussian blur processing is performed with a blur strength of 25 plus 5 per grade, realizing blur processing at different levels from 25f to 50f.
In some embodiments of the present invention, since the user habits are different, the playing state of the video and/or the corresponding text translation result may be adjusted according to the corresponding user instruction when the video and the text translation result displayed in the current on-screen content of the terminal are blurred.
Of course, since the types of terminals running video clients are various and their hardware configurations differ, different processing strategies can be executed on devices with different configuration parameters. When the device configuration meets the configuration condition and the text content information in the image frame to be translated in the video is translated, the processor of the terminal device has sufficient capacity for parallel processing tasks, so the method for translating the content displayed on the screen of the terminal provided by the application can be executed on the terminal device without stalling video playback and/or the display of the text translation result, giving the user a better viewing experience; the content displayed on the screen of the terminal can thus continue to play while the text translation result is adjusted along with the corresponding background information and foreground information. For a large game process with higher resolution, when the capacity of the processor of the terminal device for parallel processing tasks is insufficient, the method for translating the content displayed on the screen of the terminal provided by the application can be executed on the game server, and the terminal device receives the text translation result, rendered based on the background information and the foreground information, transmitted by the game server, so that stalling of the text translation result display can be avoided. When the game server obtains the generated text translation result, the text translation result can be checked; when the check passes, the text translation result rendered based on the background information and the foreground information is presented in the different clients (or web playing interfaces) watching the same video; when the check fails, the text translation result rendered based on the background information and the foreground information is blurred, or prompt information is sent to prompt that non-compliant information exists in the image frame to be translated of the user's terminal screen display content.
In some embodiments of the present invention, when the terminal screen content is a cloud game video, an empirical threshold matching the image frame to be translated is determined; the spilled pixels in the terminal screen display content are determined based on the empirical threshold matching the image frame to be translated and the image processing boundary range matching the color key parameters; and the gray values of the spilled pixels in the terminal screen display content are adjusted. By adjusting an empirical threshold P (P >= R), the pixels in the original image whose color-key distance lies in the closed interval [R, P] are screened out and defined as spilled pixels: some pixels spill color due to the reflection of the screen, affecting perception, and need to be corrected. The value of a spilled pixel may be set to a gray value; as an example, the correction may be performed according to (R + G + B)/3. In some embodiments of the present invention, since users perceive the RGB colors differently in different usage scenarios, the weights for the spilled pixels may be adjusted flexibly according to different users' habits, for example: Grey = 0.411 x R + 0.547 x G + 0.155 x B, so that users obtain a more comfortable experience.
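A sketch of this spill correction (assumptions of this rewrite: NumPy; spill_mask is a boolean mask of the pixels whose color-key distance falls within [R, P]; the default weights are the user-adjusted example from the text, and equal weights of 1/3 reproduce (R + G + B)/3):

import numpy as np

def correct_spill(img, spill_mask, weights=(0.411, 0.547, 0.155)):
    wr, wg, wb = weights
    r, g, b = (img[..., i].astype(np.float32) for i in range(3))
    grey = wr * r + wg * g + wb * b               # per-pixel gray value
    out = img.copy()
    grey3 = np.stack([grey, grey, grey], axis=-1).astype(img.dtype)
    out[spill_mask] = grey3[spill_mask]           # adjust only the spilled pixels
    return out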
In summary, the embodiments of the present invention have the following technical effects:
A target video is acquired and an image frame to be translated is captured from it; a translation trigger instruction for translating the current screen display content of the terminal is received; a translation process matched with the target video is triggered, and the text information in the image frame to be translated is translated through this process to obtain a first text translation result. In response to the translation processing instruction for the current screen display content of the terminal, the current screen display content of the terminal is acquired and screen-capture processing is performed to obtain the image frame to be translated, including the text content information, in the current screen display content of the terminal; the image frame to be translated comprises text content information and text attribute information corresponding to the text content information. The text content information in the image frame to be translated is translated to obtain a text translation result. The text attribute information corresponding to the text content information of the image frame to be translated is acquired, the text attribute information comprising text color information and text font size information. The text translation result is rendered according to the text attribute information and then overlaid on the current screen display content page of the terminal, so as to present the rendered text translation result in the terminal in real time. The text color information and text font size information corresponding to the text information are acquired; the background information and foreground information corresponding to the text information are determined according to the position of the text information in the image frame to be translated; the text translation result is processed according to the text color information and the text font size information respectively to obtain a second text translation result matched with the picture style of the target video; and the second text translation result is adjusted based on the background information and the foreground information corresponding to the text information, so that the second text translation result is fused into the target video. Thus, not only can the image frame to be translated in the screen display content of the terminal be translated accurately, but also, when the text translation result is presented, the text color information, text font size information, background information and foreground information can be overlaid on the current screen display content page of the terminal after rendering and matched with the picture style of the terminal screen display content, avoiding ghosting and blurred characters caused by translating the text content, blending well into the target video, and ensuring that the user obtains a more comfortable experience.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (15)

1. A method for translating content displayed on a screen of a terminal, characterized by comprising the following steps:
receiving a translation trigger instruction for translating the current screen display content of the terminal;
responding to a translation processing instruction aiming at the current screen display content of the terminal, acquiring the current screen display content of the terminal, and performing screen capture processing to obtain an image frame to be translated, wherein the current screen display content of the terminal comprises text content information; the image frame to be translated comprises text content information and text attribute information corresponding to the text content information;
translating the text content information in the image frame to be translated to obtain a text translation result;
acquiring text attribute information corresponding to the text content information of the image frame to be translated, wherein the text attribute information comprises text color information and text word size information;
rendering the text translation result according to the text attribute information and then covering the rendered text translation result to a current screen display content page of the terminal so as to realize real-time presentation of the rendered text translation result in the terminal.
2. The method according to claim 1, wherein the translating the text content information in the image frame to be translated to obtain a text translation result comprises:
responding to the translation processing instruction aiming at the current screen display content of the terminal, and triggering a corresponding translation model;
determining at least one word-level hidden variable corresponding to text content information through an encoder of the translation model;
generating, by a decoder of the translation model, a translated term corresponding to the word-level hidden variable and a selected probability of the translated term according to the at least one word-level hidden variable;
and selecting at least one translation word to form a text translation result corresponding to the text content information according to the selection probability of the translation result.
3. The method according to claim 1, wherein the obtaining text attribute information corresponding to the text content information of the image frame to be translated comprises:
converting the format of the image frame to be translated to obtain the image frame to be translated in a bitmap format;
performing color space conversion processing on the image frame to be translated in the bitmap format, and according to a result of the color space conversion processing, performing identification processing on a background color in the image frame to be translated to obtain a color value of a pixel point corresponding to the background color;
extracting color values of pixel points in a region where text content information in the image frame to be translated is located;
comparing the color value of the pixel point corresponding to the background color with the color value of the pixel point in the area where the text content information is located to obtain the text color information;
and segmenting the image frame to be translated in the bitmap format to obtain the text font information.
4. The method of claim 3, further comprising:
according to the position of the text content information, the image frame to be translated is segmented to obtain at least two text recognition segments,
counting the text word size corresponding to the text of the single character related in the at least two text recognition segments to obtain the text word size information in each text recognition segment;
comparing text word size information in different text recognition segments, and summarizing the different text recognition segments when it is determined that the difference in the horizontal starting positions of the different text recognition segments is less than or equal to a horizontal position threshold and the difference in the heights of the different text recognition segments is less than or equal to a height threshold;
and counting the text word size information of the text recognition segment obtained by summarizing, and screening the corresponding text word size information in the image frame to be translated according to the counting result.
5. The method according to claim 1, wherein the rendering the text translation result according to the text attribute information and then overlaying the rendered text translation result to a current on-screen content page of the terminal, so as to realize real-time rendering of the rendered text translation result in the terminal, comprising:
determining background information and foreground information corresponding to the text content information according to the position of the text content information in the image frame to be translated;
respectively matching the text translation results according to text color information and text word size information in the text attribute information to obtain a first text translation result, wherein the first text translation result is matched with the picture style of the current screen display content;
rendering the first text translation result based on the background information and the foreground information corresponding to the text content information to obtain the rendered text translation result.
6. The method according to claim 5, wherein the determining the background information and the foreground information corresponding to the text content information according to the position of the text content information in the image frame to be translated comprises:
converting the image frame to be translated from a red, green and blue (RGB) mode into a Hue Saturation Value (HSV) mode;
in the image frame to be translated in the HSV mode, cutting text content information in the image frame to be translated according to the position of the text content information in the image frame to be translated;
determining background information corresponding to the text content information according to the cutting processing result;
and determining the edge position of the text content information according to the result of the clipping processing and the position of the text content information in the image frame to be translated, and determining the foreground information according to the edge position of the text content information.
7. The method according to claim 5, wherein the rendering the first text translation result based on the background information and the foreground information corresponding to the text content information to obtain the rendered text translation result comprises:
configuring a character area boundary detection box and a maximum search radius based on the background information corresponding to the text content information;
according to the character region boundary detection frame and the maximum search radius, carrying out first marking processing on pixel points exceeding the character region boundary detection frame to obtain a first marking result;
performing second marking processing on pixel points inside the character region boundary detection frame to obtain a second marking result;
traversing pixel points corresponding to all text content information in the two marking results, and performing pixel reverse filling processing to obtain a background processing result of the first text translation result;
and responding to a background processing result of the first text translation result, performing foreground information fusion processing to obtain the text translation result subjected to rendering processing.
8. The method according to claim 7, wherein performing foreground information fusion processing in response to a background processing result of the first text translation result to obtain the text translation result subjected to rendering processing comprises:
performing Gaussian blur processing on the image of the region where the foreground information is located;
converting the result of the Gaussian blur processing from an RGB mode to an HSV mode to obtain the foreground information of the HSV mode;
and sequentially performing morphological expansion processing and morphological corrosion processing on the image of the region where the foreground information is located to obtain the rendered text translation result.
9. The method of claim 1, further comprising:
when the content displayed by the terminal screen is a game video and the text content information is a bullet screen,
acquiring replacement operation aiming at a target bullet screen through the bullet screen editing component;
responding to the replacement operation, and acquiring a second text translation result corresponding to the target bullet screen;
rendering the second text translation result according to the text attribute information through the bullet screen editing component;
and replacing the target bullet screen in the current screen display content page of the terminal by using the rendered second text translation result.
10. The method of claim 1, further comprising:
presenting a translation function item in a view interface, wherein the translation function item is used for realizing translation of character information in the terminal screen display content;
and responding to a trigger operation aiming at the translation function item, and acquiring and presenting the image frame to be translated and the corresponding background image which are subjected to translation processing, wherein the image frame to be translated and the corresponding background image comprise a first text translation result.
11. The method of claim 10, wherein presenting translation function items in a view interface comprises:
presenting a hover-ball-shaped translation function item in the view interface, wherein the hover-ball-shaped translation function item can respond to the trigger operation and adjust the display position in the view interface; or,
and presenting a translation function item in a transparent form in the view interface, wherein the number of pixels of the translation function item in the transparent form is 1, and the translation function item in the transparent form can adjust the display position in the view interface in response to the trigger operation.
12. An apparatus for performing translation processing on contents displayed on a screen of a terminal, the apparatus comprising:
the information transmission module is used for receiving a translation triggering instruction for translating the current screen display content of the terminal;
the information processing module is used for responding to the translation processing instruction aiming at the current screen display content of the terminal, acquiring the current screen display content of the terminal, and performing screen capture processing to obtain an image frame to be translated, including text content information, of the current screen display content of the terminal; the image frame to be translated comprises text content information and text attribute information corresponding to the text content information;
the information processing module is used for translating the text content information in the image frame to be translated to obtain a text translation result;
the information processing module is used for acquiring text attribute information corresponding to the text content information of the image frame to be translated, wherein the text attribute information comprises text color information and text font information;
and the information transmission module is used for rendering the text translation result according to the text attribute information and then covering the rendered text translation result to the current screen display content page of the terminal so as to realize the real-time rendering of the rendered text translation result in the terminal.
13. A computer program product comprising a computer program or instructions, characterized in that said computer program or instructions, when executed by a processor, implement the method for translation processing of on-screen content of a terminal according to any of claims 1 to 11.
14. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor, configured to execute the executable instructions stored in the memory, and implement the method for translating the content displayed on the screen of the terminal according to any one of claims 1 to 11.
15. A computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement the method for translating the contents displayed on the screen of the terminal according to any one of claims 1 to 11.
CN202111307299.1A 2021-11-05 2021-11-05 Method, device, equipment and medium for translating terminal screen display content Pending CN114327703A (en)


