CN113050860A

CN113050860A - Control identification method and related device

Info

Publication number: CN113050860A
Application number: CN202110459815.6A
Authority: CN
Inventors: 任明星
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-04-27
Filing date: 2021-04-27
Publication date: 2021-06-29
Anticipated expiration: 2041-04-27
Also published as: CN113050860B

Abstract

The embodiment of the application discloses a control identification method and a related device, which at least relate to a computer vision technology in artificial intelligence, and comprise the following steps: and acquiring a first display interface corresponding to the cloud product according to the terminal identifier in the triggering operation. The method comprises the steps of obtaining a first control template and at least one resolution template obtained by adjusting the resolution of the first control template when the first control template is originally triggered when the first control template enters a target display interface, and obtaining richer control identification information determined based on the templates with different resolutions when the control area in the first display interface is identified by using the obtained templates due to different image resolutions of the obtained templates, so that a better control identification effect can be achieved for the first display interfaces with different resolutions, the success rate of simulating touch control of the first control is improved, the retry times of obtaining a second display interface pointed by the first control is reduced, and the efficiency of returning the target display interface to a terminal device is improved.

Description

Control identification method and related device

Technical Field

The present application relates to the field of data processing, and in particular, to a control identification method and a related apparatus.

Background

With the development of cloud technology, various cloud products such as cloud games and the like have come into play, and the cloud products can be deployed into terminal equipment as applications. The cloud product moves most of data processing and operation to the cloud server to be completed, the terminal device is mainly used for receiving user operation, receiving and displaying processing results of the cloud server, and aiming at the cloud product, the terminal device is more like a display medium of a cloud product and does not need to execute complex data processing, so that the user can use various products which are not enough to operate by the original terminal device.

In order to improve the use efficiency of cloud products, some core use interfaces need to be directly and quickly displayed to a user using terminal equipment in some scenes, so that the user can quickly know or use the cloud products. For example, in the cloud game scenario, it is sometimes necessary to directly provide the user with an interface for starting the game, and the user does not need to click to the interface for starting the game step by step from the main interface for starting the game.

In order to achieve the purpose, in the related technology, a cloud server is adopted to simulate a mode that a user triggers a virtual control, and a display interface of a cloud product to be opened by a terminal device is jumped to a core use interface which needs to be displayed to the user, so that the user can be saved from triggering the virtual control on the interface. However, in the related art, the position of the virtual control on different display interfaces is not accurately identified, which easily causes the operation position of the simulation operation not to be in the control area of the virtual control which should be triggered, and increases the retry number and the response time.

Disclosure of Invention

In order to solve the technical problem, the application provides a control identification method and a related device, which are used for improving the accuracy of virtual control position identification on a display interface.

The embodiment of the application discloses the following technical scheme:

in one aspect, the present application provides a control identification method, including:

acquiring a trigger operation sent by a terminal device aiming at a cloud product, wherein the trigger operation comprises a terminal identifier of the terminal device;

acquiring a first display interface of the cloud product corresponding to the trigger operation according to the terminal identification;

acquiring a first control template and at least one resolution template related to the first display interface, wherein the resolution template is obtained by adjusting the resolution of the first control template, and the image resolution of the resolution template is different from that of the first control template;

respectively carrying out control identification on the first display interface according to the first control template and the resolution template, and determining a control area of a first control corresponding to the first control template in the first display interface;

and acquiring a second display interface pointed by the first control by simulating a trigger operation in the control area, wherein the second display interface is used for determining a target display interface returned to the terminal equipment.

In another aspect, the present application provides a control recognition apparatus, including: an acquisition unit and an identification unit;

the acquisition unit is used for acquiring a trigger operation sent by a terminal device aiming at a cloud product, wherein the trigger operation comprises a terminal identifier of the terminal device; acquiring a first display interface of the cloud product corresponding to the trigger operation according to the terminal identification; acquiring a first control template and at least one resolution template related to the first display interface, wherein the resolution template is obtained by adjusting the resolution of the first control template, and the image resolution of the resolution template is different from that of the first control template;

the identification unit is used for respectively carrying out control identification on the first display interface according to the first control template and the resolution template, and determining a control area of a first control corresponding to the first control template in the first display interface;

the obtaining unit is further configured to obtain a second display interface pointed by the first control by simulating a trigger operation in the control area, where the second display interface is used to determine a target display interface returned to the terminal device.

In another aspect, the present application provides a computer device comprising a processor and a memory:

the memory is used for storing program codes and transmitting the program codes to the processor;

the processor is configured to perform the method of the above aspect according to instructions in the program code.

In another aspect, the present application provides a computer-readable storage medium for storing a computer program for executing the method of the above aspect.

In another aspect, embodiments of the present application provide a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of the above aspect.

According to the technical scheme, in order to quickly enter the target display interface of the cloud product, the triggering operation carrying the terminal identification of the cloud product is obtained from the terminal equipment, and the first display interface of the triggering operation corresponding to the cloud product is obtained according to the terminal identification. The method comprises the steps that a first control which needs to be triggered by a user when the first control originally enters a target display interface is included in the first display interface, in order to simplify the operation of the user and achieve automatic triggering, a control area of the first control in the first display interface is identified through a first control template corresponding to the first control, in order to improve identification precision, resolution adjustment is conducted on the first control template, at least one resolution template is obtained, and due to the fact that the image resolution of the resolution template is different from that of the first control template, control identification is conducted on a first display image through the first control template and the resolution template, and richer control identification information determined based on templates with different resolutions can be obtained. Because the multi-resolution control template is adopted for control identification, a better control identification effect can be achieved for the first display interfaces with different resolutions, the success rate of simulating the touch control of the first control is improved, the retry times of acquiring the second display interface pointed by the first control are reduced, and the efficiency of returning the target display interface to the terminal equipment is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic view of an application scenario of a control identification method according to an embodiment of the present application;

fig. 2 is a schematic diagram of a server architecture of a cloud game according to an embodiment of the present disclosure;

fig. 3 is a flowchart of a control identification method according to an embodiment of the present application;

fig. 4 is a schematic diagram of displaying the same display interface by terminal devices with different resolutions according to the embodiment of the present application;

fig. 5 is a schematic diagram of an operation and maintenance module according to an embodiment of the present disclosure;

fig. 6 is an execution sequence diagram corresponding to fig. 5 according to an embodiment of the present application;

fig. 7 is a flowchart of a target display interface triggering process according to an embodiment of the present disclosure;

FIG. 8 is a schematic illustration of an upsampling and downsampling;

FIG. 9 is a schematic illustration of a down-sampling;

FIG. 10 is a schematic diagram illustrating resolution adjustment performed on a first control template according to an embodiment of the present application;

fig. 11 is a schematic diagram of a first display interface provided in an embodiment of the present application;

fig. 12 is a schematic diagram of feature points corresponding to the first display interface of fig. 11 according to an embodiment of the present disclosure;

FIG. 13 is a schematic view of an identification control;

FIG. 14 is a schematic view of an identification control;

FIG. 15 is a schematic view of an identification control;

FIG. 16 is a schematic diagram of an identification feature point;

fig. 17 is a schematic diagram of an identification feature point provided in an embodiment of the present application;

FIG. 18 is a schematic diagram of a parametric configuration of a SURF algorithm model;

FIG. 19 is a schematic illustration of a display interface for initiating a game according to an embodiment of the present application;

FIG. 20 is a schematic illustration of a game lobby display interface provided in an embodiment of the application;

fig. 21 is a schematic view of a display interface corresponding to a human-machine battle scene according to an embodiment of the present application;

fig. 22 is a schematic diagram of a control identification apparatus according to an embodiment of the present application;

fig. 23 is a schematic structural diagram of a server according to an embodiment of the present application;

fig. 24 is a schematic structural diagram of a terminal device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described below with reference to the accompanying drawings.

In the related technology, for a cloud product, a server at the cloud end identifies the position of a virtual control according to a display interface image and a corresponding template image which should be displayed on a terminal device, so that a user is simulated to trigger the virtual control, and the cloud product directly enters a core use interface. Taking a cloud game as an example, the cloud server replaces a user to trigger virtual controls such as game login and game starting, so that the user can directly enter a game interface, and the user can be saved from triggering corresponding virtual controls on terminal equipment. However, only one template image generally corresponds to one image resolution, and the resolutions of the display interface of the terminal device are various, and when the difference between the image resolution of the display interface image and the image resolution of the template image is large, the accuracy of identifying the position of the virtual control is affected.

Based on this, the embodiment of the application provides a control identification method and a related device, which are used for improving the accuracy of identifying the position of a virtual control.

The control identification method provided by the embodiment of the application is realized based on Artificial Intelligence (AI), which is a theory, method, technology and application system for simulating, extending and expanding human Intelligence by using a digital computer or a machine controlled by the digital computer, sensing the environment, acquiring knowledge and obtaining the best result by using the knowledge. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

In the embodiment of the present application, the artificial intelligence techniques mainly involved include the above-mentioned computer vision techniques, machine learning/deep learning, and the like.

The control identification method provided by the application can be applied to control identification equipment with data processing capacity, such as terminal equipment and a server. The terminal device may be, but is not limited to, a smart phone, a desktop computer, a notebook computer, a tablet computer, a smart speaker, a smart watch, and the like; the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.

The control recognition device may be capable of computer vision techniques. Computer Vision technology (CV) Computer Vision is a science for researching how to make a machine "see", and further refers to that a camera and a Computer are used to replace human eyes to perform machine Vision such as identification, tracking and measurement on a target, and further image processing is performed, so that the Computer processing becomes an image more suitable for human eyes to observe or transmitted to an instrument to detect. As a scientific discipline, computer vision research-related theories and techniques attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, and the like.

The control recognition device may be machine learning capable. Machine learning is a multi-field cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks.

In the control identification method provided by the embodiment of the application, the adopted artificial intelligence model mainly relates to the application of a computer vision technology, and the control area of the first control corresponding to the first control template in the first display interface is identified through technologies such as image identification and the like.

In addition, the control identification method provided by the embodiment of the application can be realized based on a cloud computing technology. Cloud computing (cloud computing) refers to a delivery and use mode of an IT infrastructure, and refers to obtaining required resources in an on-demand and easily-extensible manner through a network; the generalized cloud computing refers to a delivery and use mode of a service, and refers to obtaining a required service in an on-demand and easily-extensible manner through a network. Such services may be IT and software, internet related, or other services. Cloud Computing is a product of development and fusion of traditional computers and Network Technologies, such as Grid Computing (Grid Computing), Distributed Computing (Distributed Computing), Parallel Computing (Parallel Computing), Utility Computing (Utility Computing), Network Storage (Network Storage Technologies), Virtualization (Virtualization), Load balancing (Load Balance), and the like.

With the development of diversification of internet, real-time data stream and connecting equipment and the promotion of demands of search service, social network, mobile commerce, open collaboration and the like, cloud computing is rapidly developed. Different from the prior parallel distributed computing, the generation of cloud computing can promote the revolutionary change of the whole internet mode and the enterprise management mode in concept.

The control identification method provided by the embodiment of the application can also be applied to Cloud games, and the Cloud games (Cloud games) can also be called game on demand (gaming), and are an online game technology based on a Cloud computing technology. Cloud game technology enables light-end devices (thin clients) with relatively limited graphics processing and data computing capabilities to run high-quality games. In a cloud game scene, a game is not operated in a player game terminal but in a cloud server, and the cloud server renders the game scene into a video and audio stream which is transmitted to the player game terminal through a network. The player game terminal does not need to have strong graphic operation and data processing capacity, and only needs to have basic streaming media playing capacity and capacity of acquiring player input instructions and sending the instructions to the cloud server.

A control identification method as disclosed herein, wherein a first control template may be saved on a blockchain.

In order to facilitate understanding of the technical solution of the present application, the following describes a control identification method provided in the embodiment of the present application with a server as a control identification device in combination with an actual application scenario.

Referring to fig. 1, the figure is a schematic view of an application scenario of a control identification method provided in an embodiment of the present application. In the application scenario shown in fig. 1, including the terminal device 100 and the server 200, the server 200 may simulate a triggering operation of a user on a cloud product in the terminal device 100. The cloud product is a product based on a cloud computing technology, such as a cloud game, cloud education, and a cloud conference, and the cloud product is exemplified as a cloud game Application (APP).

After a user performs a trigger operation on a cloud game APP through a terminal device 100, the terminal device 100 sends a trigger operation carrying a terminal identifier 100 to a server 200, after the server 200 obtains the trigger operation, a first display interface (a game starting display interface shown in fig. 1) corresponding to the trigger operation of the cloud game APP is obtained according to the terminal identifier 100 in the trigger operation, and the image resolution of the display interface is 576 x 1280 and includes a first control (a game starting control shown in fig. 1) that needs to be triggered by the user when the user originally enters a target display interface (a game lobby display interface shown in fig. 1).

The server 200 obtains a first control template corresponding to a game starting control related to the game starting display interface, wherein the image resolution of the first control template is 720 x 1280, and since the image resolutions of the first control template and the game starting display interface are different, the accuracy of identifying the game starting control is affected. In order to improve the recognition accuracy, the resolution of the first control template is adjusted, for example, the first control template is subjected to magnification sampling, so that the resolution template shown in fig. 1 is obtained, and the image resolution of the resolution template is 720 × 864.

Because the image resolution of the resolution template is different from that of the first control template, the game starting controls in the game starting display interface are respectively identified through the first control template and the resolution template, and compared with the method for identifying the game starting controls based on the template with single image resolution, more abundant control identification information can be obtained, better control identification effect can be obtained, the identified game starting control area is more accurate, and therefore the success rate of the server 200 in simulating the user to trigger the game starting controls is higher.

The server 200 simulates a user to trigger a game starting control, namely, a recognized control area is triggered, so that the cloud game APP skips from a game starting display interface to a game hall display interface, namely, a second display interface, at the moment, the second display interface is a target display interface, the server 200 returns the game hall display interface to the terminal device 200, and therefore the user can directly enter the game hall without triggering the game starting control to quickly start a game.

Therefore, as the multi-resolution control template is adopted for control identification, a better control identification effect can be achieved for the first display interfaces with different resolutions, the success rate of simulating the touch control of the first control is improved, the retry times of acquiring the second display interface pointed by the first control are reduced, and the efficiency of returning the target display interface to the terminal equipment is improved.

With reference to the drawings, a server is used as a control identification device, and a control identification method provided by the embodiment of the present application is described below.

Before introducing the control identification method, the server architecture is explained. The cloud product is a product based on a cloud computing technology, and is not operated in a terminal device of a user, but operated in a server (hereinafter referred to as a server) in the cloud, and a server architecture is described by taking a cloud game as an example. Referring to fig. 2, the figure is a schematic diagram of a server architecture of a cloud game according to an embodiment of the present application.

The cloud game server comprises a plurality of boards or containers, wherein cloud games are installed on the boards or containers (also called cloud game hosts), the functions of the boards and the containers are similar to those of terminal equipment, data related to the cloud games can be processed, but the boards and the containers do not have physical display screens, the related data are processed on the boards or the containers, corresponding display is provided for users through the terminal equipment, and triggering operation of the users is received. The board card or the container has a plug flow process, the sound and the picture of the cloud game are sent to APP or HTML5 (a language description mode for constructing Web content) on the terminal equipment of the user through the plug flow server, and the user transmits the triggering operation of the user back to the board card or the container of the cloud game through the terminal equipment in the process of trying to play the cloud game. The plug flow server may be the same as or different from the server, and this is not specifically limited in this application.

In order to enable the cloud game to enter an interface capable of trying to play, an operation and maintenance module integrating functions of image identification, game control and the like is additionally arranged in a server. For example, in the cloud game starting and game process, the operation and maintenance module performs screenshot and image recognition on the cloud game, performs triggering operations such as clicking, sliding a screen and actively informing a terminal device of a user of finishing the game according to an image recognition result, so that the cloud game enters a certain scene, for example, a game scene of man-machine battle, and allocates the game scene corresponding to the man-machine battle to the user for trial play, so that the user does not need to trigger a login game control in the terminal device, trigger the man-machine battle control after entering a game hall, and the like, the waiting time of the user is reduced, and the operation of the user is simplified. Each board card or container may correspond to one operation and maintenance module, or a plurality of board cards or containers may share one operation and maintenance module, which is not specifically limited in this application.

The operation and maintenance module in the server acquires the control position through image recognition. Referring to fig. 3, the figure is a flowchart of a control identification method provided in an embodiment of the present application. As shown in fig. 3, the control identification method includes the following steps:

s301: acquiring triggering operation sent by the terminal equipment aiming at the cloud product.

In practical application, a user triggers a cloud product through a terminal device, wherein the triggering operation can be single click, double click, sliding and the like, and the method is not particularly limited in this application. The terminal device sends the trigger operation to the server so that the server can know that the user starts to use the cloud product, and the time when the user uses the cloud product can be just when the cloud product is started or after the cloud product is started.

In the triggering operation acquired by the server, the triggering operation includes a terminal identifier of the terminal device, the terminal identifier can identify the terminal device, so that the server can return the corresponding operation result to the corresponding terminal device, and the terminal identifier can also identify a board card or a container corresponding to the terminal device. For example, after a user starts a cloud game, a board or a container running the cloud game is identified by a terminal identifier, and in the process of running the cloud game, subsequent trigger operation sent by the terminal device is sent to the board or the container corresponding to the terminal identifier according to the terminal identifier, so that the running progress of the current cloud game can be determined, and the running of the cloud game is smooth.

S302: and acquiring a first display interface of the cloud product corresponding to the trigger operation according to the terminal identification.

And the server determines a first display interface of the triggering operation corresponding to the cloud product operated by the user in a board card or a container corresponding to the terminal identification according to the terminal identification in the triggering operation. For example, when a user opens a cloud game APP in a terminal device, a server runs the cloud game in a corresponding board or container according to a terminal identifier in a trigger operation sent by the terminal device, and at this time, a first display interface corresponding to the opening of the cloud game APP may be a game starting display interface.

The first display interface comprises a first control which needs to be triggered by a user, in order to simplify the operation of the user and realize automatic triggering, the server can perform operations such as screenshot on the first display interface, and the position of the first control in the first display interface is obtained by performing image recognition on the first display interface, so that the user is replaced to trigger the first control, and automatic triggering is realized.

S303: a first control template and at least one resolution template associated with a first display interface are obtained.

In the related art, since the control template is generally obtained by sampling 720P, one display interface generally corresponds to only one control template, that is, the control template corresponds to a fixed image resolution, and the terminal devices of the user are various and correspond to various resolutions. Continuing to take the cloud game as an example, in order to adapt to the page layout of the terminal devices with different resolutions, when the plug-flow server pushes the screen of the cloud game, adaptive zooming-in or zooming-out is performed, but the scaling and the range of zooming-in are determined by a game developer, and there is no unified specification at present. Referring to fig. 4, the display interfaces of the same cloud game have different display resolutions on the terminal devices with different resolutions.

Therefore, for the display interfaces with different resolutions, the control template with the fixed resolution is adopted for image recognition, the obtained recognition effect is possibly quite different, and the stability of the recognition accuracy of the position of the first control is difficult to ensure. In order to improve the identification precision of the position of the first control, resolution adjustment is carried out on the first control template to obtain at least one resolution template, and due to the fact that the image resolution of the resolution template is different from that of the first control template, control identification is carried out on the first display image through the first control template and the resolution template, richer control identification information determined based on the templates with different resolutions can be obtained, and a better control identification effect can be achieved for first display interfaces with different resolutions.

It should be noted that, resolution adjustment is performed on the first control template, and obtaining at least one resolution template may be performed at any time before S303, for example, resolution adjustment is performed after the first control template is obtained, and the obtained resolution template and the first control template are stored in the server together. In another example, resolution adjustment is performed on the first control template after the first display interface is obtained.

As a possible implementation manner, after the resolution adjustment is performed on the first control template, at least one first resolution template and at least one second resolution template may be obtained. The image resolution of the first resolution template is smaller than that of the first control template, the image resolution of the second resolution template is larger than that of the first control template, so that when the first display interface is subjected to image recognition according to the template image, the first resolution template with lower resolution and the second resolution template with higher resolution are available, and the templates with different resolutions can adapt to the first display interfaces with different resolutions as much as possible, namely the image resolution of one template in the first control template, the first resolution template and the second resolution template is closer to that of the first display interface, the difference between the image resolutions of the template and the first display interface is reduced, so that the matching can be carried out according to the template closer to that of the first display interface in the image recognition process, more feature points are obtained, so that a better control identification effect can be achieved for the first display interfaces with different resolutions, and the compatibility is better.

S304: and respectively carrying out control identification on the first display interface according to the first control template and the resolution template, and determining a control area of a first control corresponding to the first control template in the first display interface.

And respectively carrying out control identification on the first display interface according to the first control template and the resolution template acquired in the step S303, and determining a control area of the first control in the first display interface.

It will be appreciated that there may be multiple controls in the first display interface, with different controls corresponding to different templates, e.g., a first control corresponding to a first control template.

S305: and acquiring a second display interface pointed by the first control by simulating a trigger operation in the control area.

Because the multi-resolution control template is adopted for control identification, a better control identification effect can be achieved for the first display interfaces with different resolutions, namely, the identified control area of the first control is more accurate, and the triggering operation of the first control by a user is simulated in the control area, so that the first display interface is jumped to the second display interface according to the first control, for example, the display interface is jumped from a login game display interface to a game hall display interface.

Through S301-S305, the server can replace a user to automatically trigger a control in the display interface, so that the user can directly enter the target display interface after triggering operation is performed on the cloud product through the terminal device, the target display interface is the display interface returned to the terminal device by the server after the triggering operation is simulated, for example, the target display interface can be a game hall display interface or a display interface corresponding to a man-machine battle scene, and the user can quickly start a game.

It should be noted that the second display interface may be a target display interface, and may also be a target display interface obtained after multiple simulation triggering operations by the server, which is not specifically limited in this application.

In S303, the first control template and the at least one resolution template are obtained, and the manner of obtaining the first control template is described first, and then the manner of obtaining the at least one resolution template is described.

After the user triggers the cloud product, the server simulates the triggering operation of the user and returns the target display interface to the terminal equipment, so that the user omits the triggering operation before entering the target display interface, the server can enter the target display interface by simulating the triggering operation of the user once, and can enter the target display interface by simulating the triggering operation of the user for multiple times.

Therefore, after the server captures the screen of the display interface of the cloud product, the running process corresponding to the display interface of the current screen capture needs to be determined so as to obtain the corresponding first control template. Therefore, a target display interface triggering process associated with the cloud product can be formulated for the cloud product, so that the server can determine a first control template related to the first display interface based on the target display interface triggering process, and further acquire the first control module.

Referring to fig. 5, the figure is a schematic diagram of an operation and maintenance module according to an embodiment of the present application. The target display interface triggering process is built in an operation and maintenance module in the server, the target display interface triggering process is read through a logic control module, and a first control template is determined and obtained. The image recognition module performs image recognition according to the first control template and the resolution template and a first display interface obtained from a cloud product running in a board card or a container to obtain the position of the first control in the first display interface, and informs the position to the logic control module, the logic control module indicates the control module to simulate the triggering operation of a user on the cloud product until the cloud product enters a target display interface, and the target display interface is returned to APP or H5 in the terminal device through the plug flow server shown in FIG. 2.

It should be noted that, for the same cloud product, one or more display interface triggering processes may be provided, which is not specifically limited in this application. The target display interface trigger flow can be preset or can be selected by a user. The following description will be made separately.

If the target display interface triggering process is preset, the acquired triggering operation also carries a process identifier, so that the target display interface triggering process is determined from the multiple display interface triggering processes according to the process identifier. For example, in the cloud product testing process, a corresponding target display interface trigger flow may be preset according to a testing requirement (e.g., whether a login flow is abnormal). For example, 3 display interface flows are preset, and the corresponding flow identifications are 1, 2, and 3. If the flow identifier obtained from the triggering operation is 2, the display interface triggering flow with the flow identifier of 2 may be obtained as the target display interface triggering flow.

And if the display interface triggering process is selected by the user, determining a target display interface triggering process from the multiple display interface triggering processes according to a preset process identification of the cloud product. For example, a user can preset a target display interface which the user directly wants to enter after opening a cloud product according to personal habits, an interface trigger flow corresponding to the process from opening the cloud product to entering the target display interface is determined as a target display interface flow corresponding to a unique flow identifier, and when the server obtains the flow identifier from the trigger operation, the user can be helped to directly enter the target display interface which the user wants to enter, so that the user can set the target display interface trigger flow in a personalized manner according to the personal habits, and the use feeling of the user is improved.

As can be seen from the foregoing, the server may simulate a trigger operation of a user to enter the target display interface, and may also simulate a trigger operation of multiple users to enter the target display interface, that is, the target display interface trigger process may include a control simulation operation once, and may also include multiple control simulation operations, which are described below separately.

If the target display interface triggering process comprises one-time control simulation operation, namely after the server simulates a user to trigger a first control for the first time, the current control simulation operation is the last control simulation operation of the target display interface triggering process, a second display interface pointed by the first control is the target display interface, and the second display interface is returned to the terminal equipment.

If the target display interface triggering process comprises multiple times of control simulation operations, namely after a first simulation user triggers a first control, the server does not perform the last control simulation operation of the target display interface triggering process, after jumping from the first display interface to a second display interface, jumping of other display interfaces is required to be continued, and then the target display interface can be accessed, namely, the second display interface is not the target display interface, and a second control in the second display interface is required to be triggered, so that the target display interface can be accessed later. Therefore, the position of the second control needs to be identified, a second control template related to the second display interface can be determined based on the target display interface triggering process, then the second control template is used as the first control template, the second display interface is used as the first display interface, and S303 and S304 are executed until the control simulation operation is the last control simulation operation of the target display interface triggering process, at this moment, the second display interface is the target display interface, and the second display interface is returned to the terminal device.

As a possible implementation manner, the display interface trigger flow may be an Extensible Markup Language (XML) file, where both a first control template to be identified and a trigger flow sequence are described in an XML Language, and are used to describe how to use OpenCV (Open Source Computer Vision Library, an image recognition software development kit) to perform template image recognition, and control a cloud product to enter a target display interface by sending a click or an input command. OpenCV generally writes calls in C + + language (a computer programming language) and Python language (a computer programming language). Compared with json (JavaScript Object Notation) language, the logical hierarchy of the XML language is clearer, the XML language is easy to modify, and the attribute name configuration is more standard. Wherein the content of the first and second substances,

the server generally adopts a Linux system (a computer operating system), and the server can be developed by adopting a C + + language, so that the server is high in execution speed, occupies less server system resources, and cannot occupy the server system resources with games.

Referring to fig. 6, an execution sequence diagram corresponding to fig. 5 is provided according to an embodiment of the present application. The method comprises the steps of describing a target display interface triggering flow through an XML file, wherein the target display interface triggering flow comprises a script logic structure, a control template and other input parameters, developing a logic control module by adopting a C + + language, reading the XML file, carrying out OpenCV image matching operation according to the input parameters, and then sending an instruction to a control module according to an image recognition result so that the control module simulates the triggering operation of a user on a cloud product.

The script logic structure included in the target display interface triggering process may be a function node shown in table 1 to adapt to various scenes in the cloud product.

TABLE 1

In the following, the cloud game is taken as an example to describe the script logic, the script logic includes the above parameter information, and is convenient to modify, and the corresponding program is as follows:

<？xml version＝"1.0"encoding＝"UTF-8"？>

-<root>

< | A! - -detect Start Game button- - >)

-<match method＝"surf"ROI＝"393,447,652,215"delay＝"2000"retry＝"5000"

hessian＝"500"template＝"tmp_main_start_game.png"name＝"match_main">

-<true>

< | A! -clicking the Start of Game button >

< | A! Detecting a game hall's sports button >

-<match method＝"surf"ROI＝"401,482,461,174"delay＝"2000"retry＝"5"

hessian＝"500"template＝"tmp_jingji_1280x720.png"name＝"match_jingji">

-<true>

< | A! Click into a sports mode >

< | A! The game has entered the lobby and the cloud gaming device can be assigned to the player who will see the game lobby's competitive mode after the player is connected to the cloud game at that time >

</true>

</match>

</true>

< | A! -no start game button detected, see if login is required >

< | A! Sometimes the game remembers the login status, so it is not required every time >

-<falseCase method＝"surf"ROI＝"451,484,614,204"hessian＝"500"

template＝"tmp_qq_login_btn.png">

< | A! Click and pull login SDK >

</falseCase>

< | A! -if there is an update button, click update >

-<falseCase method＝"surf"ROI＝"500,300,-1,-1"hessian＝"900"

template＝"tmp_gengxin_1280x720.png">

</falseCase>

</match>

</root>

Referring to fig. 7, which is a flowchart of a target display interface triggering process operation provided in the embodiment of the present application, each match node is a step, and the match nodes may be nested to form a complete link to describe a target display interface workflow, that is, a step of preloading a target display interface.

When each step is executed to obtain the result of the failure of matching, some abnormal scenes may be encountered, and a trueCase or false ecase node is needed to deal with the abnormal scenes. Each matching step may have its own exception handling or may have common exception handling, such as exception 3 shown in FIG. 7, for example, detecting a game bulletin popup.

In the above description of the manner of acquiring the first control template, the manner of acquiring at least one resolution template is described below. Resolution adjustment is performed on the acquired first control template to obtain at least one resolution template, and a resolution adjustment mode is taken as an example of scaling sampling.

The purpose of the zoom sampling is to perform a zoom-out process and a zoom-in process on the first control template. In the related art, if the first control template is to be reduced or enlarged, gaussian blurring and smoothing processing needs to be performed first to simulate the observation effect of human eyes on the object perspective, wherein the closer the object is, the clearer the object is, and the farther the object is, the more blurred the object is. Referring to fig. 8, a schematic of up-sampling and down-sampling is shown. If the first control template is scaled down, downsampling is performed, and otherwise, if the first control template is scaled up, upsampling is performed. Referring to fig. 9, a schematic diagram of downsampling is shown.

However, in the image recognition process of the cloud product, the first control template associated with the cloud product is only changed in resolution, and there is no change in the degree of image blur as far as and near in real life, so that the situation of image blur does not need to be considered, and only the problem of image scaling needs to be considered. Thus, images of different scales can be obtained by varying the size of the gaussian filter window, e.g. box filter initial size starting from 9 x 9 pixels.

The box filtering is also called as block filtering, is a linear filtering technology, and the realization of the box filtering refers to the principle idea of integral images, and in the fast integral image solving, the sum value operation of certain matrix pixels is converted into the sum difference value operation of the corresponding edge points of the matrix. The key step for realizing the box type filtering is to initialize an array S, each value of the array S is a pixel sum value in a storage pixel neighborhood, and when the pixel sum in a certain rectangular block is solved, the calculation can be completed only by indexing the sum value stored in the position of a corresponding area.

Referring to fig. 10, this is a schematic diagram of performing resolution adjustment on a first control template according to an embodiment of the present application. In fig. 10, the second login manner is a first control, and the first resolution template is obtained by performing reduction processing on the first control template, where the image resolution of the first resolution template is smaller than the image resolution of the first control template. And amplifying the first control template to obtain a second resolution template, wherein the image resolution of the second resolution template is greater than that of the first control template. Therefore, at least one first resolution template and one second resolution template are obtained by carrying out zoom sampling on the first control template, but in the zoom sampling process, the image definition of the first control template is not changed because the condition of image blurring is not required to be considered, namely the first resolution template, the second resolution template and the first control template have the same image definition.

Therefore, for the image recognition of cloud products, due to the fact that the problem of long and short distances does not exist, compared with the related art, the scaling sampling mode provided by the embodiment of the application does not need to change the image definition, and therefore the matching speed can be improved.

In the process of image recognition of the first control, the feature points related to the first control in the first display interface are extracted, then feature point matching is carried out, and finally the position of the first control is determined according to the feature point matching result.

The feature points are also called interest points and key points, and are some points which are highlighted in the image and have representative meanings, and image recognition, image registration, 3D reconstruction and the like can be performed through the points. The feature points contain important information: distance (distance), which is a quantized value representing the quality of the image recognition result, and the smaller the distance, the more similar the template and the image to be matched.

As can be seen from the foregoing, in the related art, only one first control template is generally used for image recognition, and if the difference between the image resolution of the first control template and the image resolution of the first display interface is large, the accuracy of recognizing the position of the first control is affected.

The following description takes the first control as a second login method in the cloud game login interface as an example, where the image resolution of the first control template is 1280 × 720. Referring to fig. 13, a diagram of an identification control is shown. If the image resolution of the first display interface is the same as that of the first control template, 65 effective feature points can be obtained. If the image resolution of the first display interface is different from the image resolution of the first control template, for example, the image resolution of the first display interface is 864 × 720, referring to fig. 14, only 12 effective feature points can be obtained.

The problem brought by too few feature points is that: the judgment standard is not easy to make, even the number of the effective characteristic points acquired in other scenes is only a few, and if some similar areas exist in the display interface at the moment, misjudgment is easily caused. Referring to fig. 15, a diagram of an identification control is shown. In some low-resolution scenes, errors occur in the acquired feature points, and if the number of feature points of the display interface is small, misjudgment can be caused more easily.

Based on this, as one possible implementation manner, S304 may be: and respectively matching the feature points of the first control with the first display interface according to the first control template, the first resolution template and the second resolution template so as to obtain a first feature point identification result corresponding to the first control template, a second feature point identification result corresponding to the first resolution template and a third feature point identification result corresponding to the second resolution template. And determining a control area of a first control corresponding to the first control template in the first display interface according to the first feature point identification result, the second feature point identification result and the third feature point identification result.

The first control template, the first resolution template and the second resolution template have consistent definition, image resolutions of the three templates are different, the three templates are in control matching with the first display interface one by one through the three different resolution templates, the difference between the image resolution of one template and the image resolution of the first display interface is smaller in the first control template, the first resolution template and the second resolution template, and more feature points can be obtained by utilizing feature point identification results obtained by the template with the image resolution closer to the first display interface and the first display interface, so that the matching speed can be improved on the premise of ensuring the number of extracted feature points.

The embodiment of the present application does not specifically limit the manner of determining the control area of the first control according to the three feature point recognition results, and two manners are taken as examples and are explained below.

The first method is as follows: and integrating the feature points respectively identified in the first feature point identification result, the second feature point identification result and the third feature point identification result, and determining a control area of a first control corresponding to the first control template in the first display interface. Therefore, richer control identification information can be obtained by integrating the feature point identification result, and the determined control area of the first control is more accurate.

The second method comprises the following steps: determining the number of the feature points respectively recognized in the first feature point recognition result, the second feature point recognition result and the third feature point recognition result, taking the feature point recognition result of which the number of the feature points exceeds a threshold value in the first feature point recognition result, the second feature point recognition result and the third feature point recognition result as a target recognition result, and determining a control area of a first control corresponding to the first control template in the first display interface according to the target recognition result. Therefore, according to the preset threshold value, the feature point identification result meeting the actual requirement can be screened out from the multiple results, the control area of the first control is determined according to the better feature point identification result, and the influence of the poorer feature point identification result on the identification accuracy of the control area is avoided.

As a possible implementation manner, feature point recognition may be performed on the first display interface through the feature point sampling model, so that the control area of the first control is determined according to the recognized feature points.

Specifically, according to a first control template and a resolution template, feature point identification is carried out on the first display interface through a feature point sampling model, and according to feature points identified by the feature point sampling model, a control area of a first control corresponding to the first control template in the first display interface is determined.

The input of the feature point sampling model is resolution templates of a plurality of different image resolutions, and the first display interface of the cloud product does not have the problem of distance, so that the number of layers of the feature point sampling layer in the feature point sampling model can be set to be smaller than the default number of layers of the feature point sampling layer of the feature point sampling model, and the image recognition speed is improved.

The following description will take a feature point sampling model as an accelerated-Up Robust Features (SURF) algorithm model as an example.

On the basis of keeping the excellent performance characteristics of a Scale-invariant feature transform (SIFT) operator, an operator in the SURF algorithm model simultaneously solves the defects of high SIFT computation complexity and long time consumption, improves the aspects of interest point extraction and feature vector description, and improves the computation speed.

Since the first control template is generally manufactured at 1280 × 720 of image resolution, and in order to increase the recognition speed, the SURF algorithm model performs image recognition (hereinafter referred to as SURF (2+3)) through parameter configuration of 2 sets of pyramids (Octave) and 3 layers of pictures (Layer).

As can be seen from the foregoing, if the difference between the image resolution of the first display interface and the image resolution of the first control template is relatively large, the number of the identified feature points is reduced, so as to facilitate elimination of the erroneous determination. In the related art, the number of identified feature points is increased by increasing the number of towers in the SURF algorithm model, thereby reducing the probability of erroneous determination.

Referring to fig. 16, a schematic diagram of identifying feature points is shown. The SURF algorithm model used in fig. 16 employs 4 sets of pyramids and 3-layer pictures, and the number of obtained feature points is 14. The default number of feature point sampling layers SURF (4+3) of the SURF algorithm model at this time.

See table 2 for the comparison between the feature points identified by SURF (2+3) and SURF (4+3) and the matching speed.

TABLE 2

Characteristic point sampling model	Number of feature points obtained	Match time (millisecond)
			SURF(2+3)	12	47
SURF(4+3)	14	78

As can be seen from table 2, for the first display interface with an image resolution of 864 × 720, although the number of pyramids is increased, the number of feature points obtained is not significantly increased, but the matching speed is slowed by 50%.

Based on this, the input of the SURF algorithm model is changed from the single first control template into the combination of the first control template and the resolution template, and the input of the SURF algorithm model is a plurality of multi-resolution templates without the process of blurring the first control template, so that the number of layers of the feature point sampling layers in the SURF algorithm model can be set to be smaller than the default number of layers of the feature point sampling model, and the image recognition speed is improved. The following description will be given taking an example in which the number of feature point sampling layers in the SURF algorithm model is set to SURF (1+ 3).

Referring to fig. 17, the figure is a schematic diagram of an identification feature point provided in the embodiment of the present application. In fig. 17, SURF (1+3) identifies a first control in a first display interface with an image resolution of 864 × 720, wherein input of SURF (1+3) includes a first control template and at least one resolution template, and by inputting a plurality of multi-resolution templates, the SURF algorithm model not only identifies an increased number of feature points, but also increases the image identification speed.

See table 3, which is a comparison between the feature points identified by SURF (2+3), SURF (4+3) and SURF (1+3) and the matching speed.

TABLE 3

Characteristic point sampling model	Number of feature points obtained	Match time (millisecond)
			SURF(2+3)	12	47
SURF(4+3)	14	78
			SURF(1+3)	30	48

In the related art, a matcher used in the image recognition process is a Fast Library for Approximate Nearest Neighbor (FLANN) matcher, where the FLANN matcher includes a set of algorithms that are optimized for Fast Nearest neighbor search and high-dimensional features in a large data set. For large datasets, it runs faster than a brute force matcher (BFMatcher).

The BFMatche violence matcher selects a key point in the first image, then performs (descriptor) distance test with each key point of the second image in sequence, and finally returns the key point with the nearest distance. The violence matching is better than the FLANN matcher at scene speeds with fewer feature points.

In the related art, the process of image recognition by using the FLANN matcher is as follows: and extracting the characteristic points of the first display interface through the SURF algorithm model, and then performing characteristic point matching by using a FLANN matcher. Through research, the FLANN matcher has a fast matching speed when the number of towers in the SURF algorithm model is large, wherein the SURF (4+3) has the best effect, as shown in fig. 18, the corresponding procedure is as follows:

CV_WRAP static Ptr<SURF>create(double hessianThreshold＝100,

Int n0ctaves＝4,int n0ctaveLayers＝3,

bool extended＝false,bool upright＝false)；

therefore, if the number of the pyramids in the SURF algorithm model is reduced, the matching speed of the FLANN matcher is not as fast as that of the bfmather, because the FLANN matcher has a large data operation amount only under the condition that the number of the pyramids is large, and the matching efficiency of the FLANN matcher is better than that of the bfmather. And under the condition that the number of the pyramids of the FLANN matcher is small, the data operation amount is small, and the matching efficiency of the FLANN matcher is inferior to that of the BFMathcer. That is, the matching speed of the FLANN matcher is related to the number of identified feature points.

Based on the method, different matchers can be adopted according to the number of the identified feature points, so that the matching speed is improved. Specifically, when identifying the control area of the first control, the number of the feature points of the identified feature points may be determined, and if the number of the feature points reaches a number threshold, the quality of the feature points is determined by using a first type matcher; and if the number of the feature points does not reach the number threshold, determining the quality of the feature points by using a second type matcher, and after determining the quality of the feature points according to the first type matcher or the second type matcher, determining a control area of a first control corresponding to the first control template in the first display interface according to the feature points and the corresponding quality of the feature points.

The feature point quality is a distance between feature points, for example, a distance between a relative position of a feature point of a control area of the first control in the first display interface and a relative position corresponding to the feature point in the first control template.

For example, taking a cloud game as an example, a distance of 0.15 or less may be regarded as a better feature point, a distance of more than 0.15 and less than 0.20 may be regarded as a referenceable feature point, and the rest are worse feature points. In order to identify the start game control in fig. 11, the feature points found by image recognition are shown in fig. 12, where circles are better feature points (distance is less than or equal to 0.15), and rectangles are poorer feature points (distance is greater than 0.15) that can be used as a reference.

Taking the first type of matcher as an FLANN matcher and the second type of matcher as a BFMatcher as an example, if the first control template with an image resolution of 1280 × 720 is matched with the first display interface with an image resolution of 860 × 724, the obtained result may be referred to table 4, which is a comparison between feature points identified by SURF (2+3) and SURF (1+3) in different sampling manners and matching speeds.

TABLE 4

Therefore, when the number of the obtained feature points is large, the parameter configuration in the SURF algorithm model can be reduced to 1 group of pyramids and 3-layer pictures, so that the feature point extraction speed is increased, and the image identification speed can be increased to about 30% by using the BFMatcher.

Next, a control identification method provided in the embodiment of the present application will be described with reference to fig. 19 to 21, taking a cloud game as an example.

In practical application, after a user opens the cloud game APP in the smart phone, the cloud game skips the game starting display interface shown in fig. 19 and the game hall display interface shown in fig. 20, and directly enters the display interface corresponding to the man-machine battle scene shown in fig. 21, and the server replaces manual trigger operation, so that user operation is simplified, the user can quickly enter the game, and the time for the user to wait for the game is reduced.

Actually, the display interface shown in fig. 21 is a target display interface, and the preset target display interface triggering process is performed until the display interface shown in fig. 21 is entered after the display interface shown in fig. 19 jumps to the display interface shown in fig. 20.

After a user opens a cloud game APP in a smart phone, a server acquires a trigger operation sent by the smart phone for the cloud game, wherein the trigger operation carries a terminal identifier and a process identifier for determining a target display interface trigger process in a plurality of display interface trigger processes.

The server obtains a first display interface, namely the display interface shown in fig. 19, of the cloud game corresponding to the user trigger operation according to the terminal identifier. And determining a first control template, namely a control template corresponding to the starting game control according to the process identification and the target display interface triggering process.

The server acquires a first control template, a first resolution template and a second resolution template, wherein the image resolution of the first resolution template is smaller than that of the first control template, the image resolution of the second resolution template is larger than that of the first control template, and the first control template, the first resolution template and the second resolution template have consistent image definition.

The server identifies the feature points of the first control through the SURF algorithm model and the first display interface respectively according to the first control template, the first resolution template and the second resolution template to obtain a first feature point identification result corresponding to the first control template, a second feature point identification result corresponding to the first resolution template and a third feature point identification result corresponding to the second resolution template, wherein the parameters of the used SURF algorithm model are configured to be SURF (1+ 3).

And determining the number of the characteristic points identified according to the SURF algorithm model. If the number of the characteristic points reaches the number threshold, determining the quality of the characteristic points by adopting an FLANN matcher; and if the number of the characteristic points does not reach the number threshold, determining the quality of the characteristic points by adopting BFMatcher. And determining the control area of the first control corresponding to the first control template in the first display interface by adopting the first mode or the second mode according to the characteristic points and the quality of the corresponding characteristic points.

After determining the control area corresponding to the game starting control shown in fig. 19, the server simulates a trigger operation in the control area, and acquires a second display interface pointed by the first control, that is, the display interface shown in fig. 20.

Because the target display interface triggering process includes multiple control simulation operations, and the current simulation triggering operation of the server is not the last control simulation operation of the multiple control simulation operations, a second control template related to the second display interface, that is, a template corresponding to the competitive countermeasure control as shown in fig. 20, is determined based on the target display interface triggering process. And taking the second control template as the first control template and the second display interface as the first display interface, executing the operation of obtaining the first control template, the first resolution template and the second resolution template again, and identifying a control area corresponding to the competitive countermeasure control according to the three templates and the SURF algorithm model.

After determining the control area corresponding to the competitive countermeasure control shown in fig. 20, the server simulates a trigger operation in the control area to obtain a second display interface pointed by the first control, that is, the display interface shown in fig. 21, at this time, the current simulation trigger operation of the server is the last control simulation operation of the multiple control simulation operations, so the display interface shown in fig. 21 is a target display interface, and the server returns the target display interface to the smartphone. Therefore, after the user opens the cloud game APP in the smart phone, the user can directly enter the display interface shown in fig. 21.

Aiming at the control identification method provided by the embodiment, the embodiment of the application also provides a control identification device.

Referring to fig. 22, this figure is a schematic diagram of a control identification apparatus according to an embodiment of the present application. As shown in fig. 22, the control identifying apparatus 2200 includes: an acquisition unit 2201 and a recognition unit 2202;

the acquiring unit 2201 is configured to acquire a trigger operation sent by a terminal device for a cloud product, where the trigger operation includes a terminal identifier of the terminal device; acquiring a first display interface of the cloud product corresponding to the trigger operation according to the terminal identification; acquiring a first control template and at least one resolution template related to the first display interface, wherein the resolution template is obtained by adjusting the resolution of the first control template, and the image resolution of the resolution template is different from that of the first control template;

the identification unit 2202 is configured to perform control identification on the first display interface according to the first control template and the resolution template, and determine a control area of a first control corresponding to the first control template in the first display interface;

the obtaining unit 2201 is further configured to obtain, by simulating a trigger operation in the control area, a second display interface pointed by the first control, where the second display interface is used to determine a target display interface returned to the terminal device.

As a possible implementation manner, the resolution adjustment is scaling sampling, and the apparatus 2200 further includes a resolution adjustment unit configured to:

and carrying out zoom sampling on the first control template to at least obtain a first resolution template and a second resolution template, wherein the first control template, the first resolution template and the second resolution template have consistent image definition, the image resolution of the first resolution template is smaller than that of the first control template, and the image resolution of the second resolution template is larger than that of the first control template.

As a possible implementation manner, the identifying unit 2202 is configured to:

respectively matching feature points of the first control with the first display interface according to the first control template, the first resolution template and the second resolution template to obtain a first feature point identification result corresponding to the first control template, a second feature point identification result corresponding to the first resolution template and a third feature point identification result corresponding to the second resolution template;

and determining a control area of a first control corresponding to the first control template in the first display interface according to the first feature point identification result, the second feature point identification result and the third feature point identification result.

synthesizing the feature points respectively identified in the first feature point identification result, the second feature point identification result and the third feature point identification result, and determining a control area of a first control corresponding to the first control template in the first display interface; alternatively, the first and second electrodes may be,

determining the number of the feature points respectively identified in the first feature point identification result, the second feature point identification result and the third feature point identification result; and taking the feature point identification result of which the number of the feature points exceeds a threshold value in the first feature point identification result, the second feature point identification result and the third feature point identification result as a target identification result, and determining a control area of a first control corresponding to the first control template in the first display interface according to the target identification result.

As a possible implementation manner, the apparatus 2200 further includes a determining unit configured to:

determining the first control template related to the first display interface based on a target display interface trigger process associated with the cloud product.

As a possible implementation manner, the determining unit is further configured to:

determining the target display interface triggering process from a plurality of display interface triggering processes according to the process identification carried in the triggering operation; alternatively, the first and second electrodes may be,

and determining the target display interface triggering process from a plurality of display interface triggering processes according to the preset process identification of the cloud product.

As a possible implementation manner, the apparatus 2200 further includes a returning unit configured to:

if the target display interface triggering process comprises one-time control simulation operation, taking the second display interface as the target display interface, and returning the second display interface to the terminal equipment;

if the target display interface triggering process comprises multiple control simulation operations, determining a second control template related to the second display interface based on the target display interface triggering process; and taking the second control template as the first control template, taking the second display interface as the first display interface, and executing the operation of acquiring the first control template and at least one resolution template related to the first display interface.

according to the first control template and the resolution template, feature point identification is carried out on the first display interface through a feature point sampling model, and the number of feature point sampling layers in the feature point sampling model is set to be smaller than the default number of feature point sampling layers of the feature point sampling model;

and determining a control area of a first control corresponding to the first control template in the first display interface according to the characteristic points identified by the characteristic point sampling model.

determining the number of the identified characteristic points;

if the number of the characteristic points reaches a number threshold, determining the quality of the characteristic points by adopting a first type matcher;

if the number of the characteristic points does not reach the number threshold, determining the quality of the characteristic points by adopting a second type matcher;

and determining a control area of a first control corresponding to the first control template in the first display interface according to the characteristic points and the quality of the corresponding characteristic points.

According to the control identification device provided by the embodiment of the application, in order to quickly enter a target display interface of a cloud product, a trigger operation carrying a terminal identifier of the cloud product is acquired from a terminal device, and a first display interface of the cloud product corresponding to the trigger operation is acquired according to the terminal identifier. The method comprises the steps that a first control which needs to be triggered by a user when the first control originally enters a target display interface is included in the first display interface, in order to simplify the operation of the user and achieve automatic triggering, a control area of the first control in the first display interface is identified through a first control template corresponding to the first control, in order to improve identification precision, resolution adjustment is conducted on the first control template, at least one resolution template is obtained, and due to the fact that the image resolution of the resolution template is different from that of the first control template, control identification is conducted on a first display image through the first control template and the resolution template, and richer control identification information determined based on templates with different resolutions can be obtained. Because the multi-resolution control template is adopted for control identification, a better control identification effect can be achieved for the first display interfaces with different resolutions, the success rate of simulating the touch control of the first control is improved, the retry times of acquiring the second display interface pointed by the first control are reduced, and the efficiency of returning the target display interface to the terminal equipment is improved.

The aforementioned control identification device may be a computer device, which may be a server or a terminal device, and the computer device provided in the embodiments of the present application will be described below from the perspective of hardware implementation. Fig. 23 is a schematic structural diagram of a server, and fig. 24 is a schematic structural diagram of a terminal device.

Referring to fig. 23, fig. 23 is a schematic diagram of a server 1400 provided by an embodiment of the present application, which may have a relatively large difference due to different configurations or performances, and may include one or more Central Processing Units (CPUs) 1422 (e.g., one or more processors) and a memory 1432, one or more storage media 1430 (e.g., one or more mass storage devices) for storing applications 1442 or data 1444. Memory 1432 and storage media 1430, among other things, may be transient or persistent storage. The program stored on storage medium 1430 may include one or more modules (not shown), each of which may include a sequence of instructions operating on a server. Still further, a central processor 1422 may be disposed in communication with storage medium 1430 for executing a series of instruction operations on storage medium 1430 on server 1400.

The server 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input-output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.

The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 23.

The CPU 1422 is configured to perform the following steps:

Optionally, the CPU 1422 may further execute the method steps of any specific implementation manner of the control identification method in the embodiment of the present application.

Referring to fig. 24, fig. 24 is a schematic structural diagram of a terminal device according to an embodiment of the present application. Fig. 24 is a block diagram illustrating a partial structure of a smartphone related to a terminal device provided in an embodiment of the present application, where the smartphone includes: a Radio Frequency (RF) circuit 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (WiFi) module 1570, a processor 1580, and a power supply 1590. Those skilled in the art will appreciate that the smartphone configuration shown in fig. 24 is not limiting and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.

The following describes each component of the smartphone in detail with reference to fig. 24:

the RF circuit 1510 may be configured to receive and transmit signals during information transmission and reception or during a call, and in particular, receive downlink information of a base station and then process the received downlink information to the processor 1580; in addition, the data for designing uplink is transmitted to the base station. In general, RF circuit 1510 includes, but is not limited to, an antenna, at least one Amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, RF circuit 1510 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.

The memory 1520 may be used to store software programs and modules, and the processor 1580 implements various functional applications and data processing of the smart phone by operating the software programs and modules stored in the memory 1520. The memory 1520 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the smartphone, and the like. Further, the memory 1520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.

The input unit 1530 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the smartphone. Specifically, the input unit 1530 may include a touch panel 1531 and other input devices 1532. The touch panel 1531, also referred to as a touch screen, can collect touch operations of a user (e.g., operations of the user on or near the touch panel 1531 using any suitable object or accessory such as a finger or a stylus) and drive corresponding connection devices according to a preset program. Alternatively, the touch panel 1531 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, and sends the touch point coordinates to the processor 1580, and can receive and execute commands sent by the processor 1580. In addition, the touch panel 1531 may be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The input unit 1530 may include other input devices 1532 in addition to the touch panel 1531. In particular, other input devices 1532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

The display unit 1540 may be used to display information input by the user or information provided to the user and various menus of the smartphone. The Display unit 1540 may include a Display panel 1541, and optionally, the Display panel 1541 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 1531 may cover the display panel 1541, and when the touch panel 1531 detects a touch operation on or near the touch panel 1531, the touch operation is transmitted to the processor 1580 to determine the type of the touch event, and then the processor 1580 provides a corresponding visual output on the display panel 1541 according to the type of the touch event. Although in fig. 24, the touch panel 1531 and the display panel 1541 are two separate components to implement the input and output functions of the smartphone, in some embodiments, the touch panel 1531 and the display panel 1541 may be integrated to implement the input and output functions of the smartphone.

The smartphone may also include at least one sensor 1550, such as light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 1541 according to the brightness of ambient light and a proximity sensor that may turn off the display panel 1541 and/or backlight when the smartphone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications (such as horizontal and vertical screen switching, related games, magnetometer attitude calibration) for recognizing the attitude of the smartphone, and related functions (such as pedometer and tapping) for vibration recognition; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the smart phone, further description is omitted here.

Audio circuit 1560, speaker 1561, microphone 1562 may provide an audio interface between a user and a smartphone. The audio circuit 1560 may transmit the electrical signal converted from the received audio data to the speaker 1561, and convert the electrical signal into an audio signal by the speaker 1561 and output the audio signal; on the other hand, the microphone 1562 converts collected sound signals into electrical signals, which are received by the audio circuit 1560 and converted into audio data, which are processed by the output processor 1580 and then passed through the RF circuit 1510 for transmission to, for example, another smart phone, or output to the memory 1520 for further processing.

WiFi belongs to short-distance wireless transmission technology, and the smart phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through a WiFi module 1570, and provides wireless broadband internet access for the user. Although fig. 24 shows WiFi module 1570, it is understood that it does not belong to the essential constitution of the smartphone and may be omitted entirely as needed within the scope not changing the essence of the invention.

The processor 1580 is a control center of the smartphone, connects various parts of the entire smartphone by using various interfaces and lines, and performs various functions of the smartphone and processes data by operating or executing software programs and/or modules stored in the memory 1520 and calling data stored in the memory 1520, thereby integrally monitoring the smartphone. Optionally, the processor 1580 may include one or more processing units; preferably, the processor 1580 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It is to be appreciated that the modem processor may not be integrated into the processor 1580.

The smartphone also includes a power supply 1590 (e.g., a battery) for powering the various components, which may preferably be logically connected to the processor 1580 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system.

Although not shown, the smart phone may further include a camera, a bluetooth module, and the like, which are not described herein.

In an embodiment of the application, the smartphone includes a memory 1520 that can store program code and transmit the program code to the processor.

The processor 1580 included in the smartphone may execute the control identification method provided in the foregoing embodiment according to an instruction in the program code.

The embodiment of the present application further provides a computer-readable storage medium for storing a computer program, where the computer program is used to execute the control identification method provided in the foregoing embodiment.

Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the control identification method provided in the various alternative implementations of the above aspects.

Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium may be at least one of the following media: various media that can store program codes, such as read-only memory (ROM), RAM, magnetic disk, or optical disk.

It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A control identification method, the method comprising:

2. The method of claim 1, wherein the resolution adjustment is to scale samples, the method further comprising:

3. The method of claim 2, wherein the performing control recognition on the first display interface according to the first control template and the resolution template, and determining a control area of a first control corresponding to the first control template in the first display interface comprises:

4. The method of claim 3, wherein the determining the control region of the first control corresponding to the first control template in the first display interface according to the first feature point recognition result, the second feature point recognition result, and the third feature point recognition result comprises:

5. The method of claim 1, further comprising:

6. The method of claim 5, further comprising:

7. The method of claim 5, wherein if the target display interface trigger flow comprises a control simulation operation, the method further comprises:

taking the second display interface as the target display interface, and returning the second display interface to the terminal equipment;

if the target display interface triggering process comprises multiple control simulation operations, the method further comprises the following steps:

determining a second control template related to the second display interface based on the target display interface trigger flow; and taking the second control template as the first control template, taking the second display interface as the first display interface, and executing the operation of acquiring the first control template and at least one resolution template related to the first display interface.

8. The method according to any one of claims 1 to 7, wherein the performing control identification on the first display interface according to the first control template and the resolution template, and determining a control area of a first control corresponding to the first control template in the first display interface comprises:

9. The method according to claim 8, wherein the determining a control region of a first control corresponding to the first control template in the first display interface according to the feature points identified by the feature point sampling model comprises:

determining the number of the identified characteristic points;

10. An apparatus for identifying controls, the apparatus comprising: an acquisition unit and an identification unit;

11. The apparatus of claim 10, wherein the resolution adjustment is to scale samples, the apparatus further comprising a resolution adjustment unit to:

12. The apparatus according to claim 11, wherein the identifying unit is configured to:

13. The apparatus according to claim 12, wherein the identifying unit is configured to:

14. A computer device, the device comprising a processor and a memory:

the processor is configured to perform the method of any of claims 1-9 according to instructions in the program code.

15. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for performing the method of any of claims 1-9.