CN114296581A - Display device and control triggering method - Google Patents


Info

Publication number
CN114296581A
CN114296581A
Authority
CN
China
Prior art keywords: character, control, length, information, text
Legal status: Pending
Application number
CN202110842951.3A
Other languages
Chinese (zh)
Inventor
高伟
姜俊厚
贾亚洲
岳国华
祝欣培
初德进
李保成
李佳琳
于硕
吴汉勇
Current Assignee
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Application filed by Hisense Visual Technology Co Ltd
Priority to CN202110842951.3A
Priority to PCT/CN2021/119212 (WO2022100283A1)
Publication of CN114296581A
Priority to US18/169,313 (US20230197082A1)

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a display device and a control triggering method. For a control triggering instruction input by a user, the display device determines the control keyword in the instruction, then determines the position information of the control keyword in the user interface, and finally triggers the control at that position, thereby triggering the target control the user wants to use. Because the method locates the control keyword itself in the user interface, rather than the whole text obtained by character recognition, the position of the target control can be determined accurately and the target control triggered reliably. The method therefore achieves high accuracy when triggering controls and provides a good user experience.

Description

Display device and control triggering method
Technical Field
The application relates to the technical field of display devices, and in particular to a display device and a control triggering method.
Background
A display device is a terminal device capable of outputting a specific display picture, such as a smart television, a mobile terminal, a smart advertising screen, or a projector. With the rapid development of display devices, their functions are becoming richer and their performance more powerful. They can realize bidirectional human-computer interaction and integrate functions such as audio and video, entertainment, and data, so as to meet the diversified and personalized needs of users. Intelligent voice interaction has become one of the main functions of display devices.
For a display device with an intelligent voice interaction function, the user can input control instructions by voice to invoke functions of the display device. For example, a user may control the display device to trigger a corresponding target control by saying "I want to watch something". The display device captures the current picture of the display and performs character recognition, obtaining a plurality of texts. The user's instruction is matched against these texts to obtain the position information of the text corresponding to the instruction. The display device then triggers the control at that position to carry out the user's instruction.
However, each text obtained by character recognition may contain several control names, and the obtained position information is the position of the whole text, not the position of the target control. The control at that position may therefore be some other control rather than the target control. As a result, the display device may trigger the wrong control and fail to carry out the user's instruction, leading to a poor user experience.
Disclosure of Invention
The application provides a display device and a control triggering method, solving the problem in the related art that a display device may trigger the wrong control, resulting in poor user experience.
In a first aspect, the present application provides a display device comprising a display, a sound collector, and a controller. The display is configured to display a user interface; the sound collector is configured to receive voice instructions input by a user; and the controller is configured to perform the following steps:
in response to a control triggering instruction input by a user, determining the control keyword in the control triggering instruction; determining position information of the control keyword in the user interface; and triggering the control at the position information.
In some implementations, the controller is further configured to: in the step of determining the control keyword in the control triggering instruction,
converting the control triggering instruction into a control trigger text; and performing word segmentation processing on the control trigger text to obtain the control keyword.
In some implementations, the controller is further configured to: in performing the step of determining the position information of the control keyword in the user interface,
performing screenshot processing on the user interface to obtain a screenshot image; performing character recognition processing on the screenshot image to obtain image recognition information; and acquiring the position information of the control keyword in the user interface based on the image recognition information.
In some implementations, the controller is further configured to: in the step of acquiring the position information of the control keyword in the user interface based on the image recognition information,
determining a plurality of pieces of text recognition information contained in the image recognition information; determining, among them, the control keyword recognition information corresponding to the control keyword; and acquiring the position information of the control keyword in the user interface according to the control keyword recognition information.
In some implementations, the control keyword recognition information includes the control keyword recognition text and the recognition text position information.
In some implementations, the controller is further configured to: in the step of acquiring the position information of the control keyword in the user interface according to the control keyword recognition information,
detecting the control keyword recognition information; when it is detected that the control keyword recognition information further comprises character information, determining the position information of each character of the control keyword according to the character information; when it is detected that the control keyword recognition information does not comprise character information, determining the character information according to the control keyword recognition information, the character information comprising the position information of each character of the control keyword recognition text; and determining the position information of the control keyword in the user interface according to the position information of each character of the control keyword.
In some implementations, the controller is further configured to: in performing the step of determining character information based on the control keyword recognition information,
determining the number of characters and the number of empty characters between the first character and each character in the control keyword recognition text, and acquiring the character length and the empty character length; calculating the first character length from the first character to each character according to the character length and the number of characters; calculating the first empty character length from the first character to each character according to the empty character length and the number of empty characters; calculating the length from the first character to each character according to the first character length and the first empty character length; and determining the position information of each character according to the length from the first character to each character and the recognition text position information.
In some implementations, the controller is further configured to: in performing the step of obtaining the length of the empty character,
determining the length of the control keyword recognition text according to the recognition text position information; determining the total number of characters and the total number of empty characters in the control keyword recognition text; calculating the total character length in the control keyword recognition text according to the total number of characters and the character length; calculating the difference between the length of the control keyword recognition text and the total character length; and calculating the ratio of the difference to the total number of empty characters to obtain the empty character length.
In some implementations, the controller is further configured to: in performing the step of calculating the length from the first character to each character based on the first character length and the first empty character length,
acquiring the character interval length; calculating the sum of the number of characters and the number of empty characters to obtain a first number; calculating the total character interval length from the first character to each character according to the character interval length and the first number; and calculating the sum of the first character length, the first empty character length, and the total character interval length to obtain the length from the first character to each character.
In a second aspect, the present application provides a control triggering method, applied to a display device, including:
in response to a control triggering instruction input by a user, determining the control keyword in the control triggering instruction; determining position information of the control keyword in the user interface; and triggering the control at the position information.
According to the technical solution above, the display device and the control triggering method can determine the control keyword in a control triggering instruction input by a user, then determine the position information of the control keyword in the user interface, and finally trigger the control at that position, thereby triggering the target control the user wants to use. Because the position of the control keyword itself is determined, rather than the position of the whole text obtained by character recognition, the position of the target control can be determined accurately and the target control triggered reliably. The method therefore achieves high accuracy when triggering controls and provides a good user experience.
Drawings
In order to explain the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly described below. It is obvious that, for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 illustrates a usage scenario of a display device according to some embodiments;
FIG. 2 illustrates a hardware configuration block diagram of the control apparatus 100 according to some embodiments;
FIG. 3 illustrates a hardware configuration block diagram of the display apparatus 200 according to some embodiments;
FIG. 4 illustrates a software configuration diagram in the display device 200 according to some embodiments;
FIG. 5 shows a schematic diagram of a user interface in some embodiments;
FIG. 6 is a schematic diagram illustrating the display of a voice interaction mode confirmation message in the display in some embodiments;
FIG. 7 shows a schematic diagram of a user interface in some embodiments;
FIG. 8 illustrates a schematic diagram of an application list in some embodiments;
FIG. 9 illustrates an interaction flow diagram for components of a display device in some embodiments;
FIG. 10 is a diagram illustrating text recognition information in some embodiments;
FIG. 11 is a schematic diagram illustrating the display of a prompt in a display in some embodiments;
FIG. 12 is a flow diagram illustrating one embodiment of a control-triggering method.
Detailed Description
To make the purpose and embodiments of the present application clearer, the exemplary embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only a part of the embodiments of the present application, not all of them.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
Fig. 1 is a schematic diagram of a usage scenario of a display device according to an embodiment. As shown in fig. 1, the display apparatus 200 is also in data communication with a server 400, and a user can operate the display apparatus 200 through the smart device 300 or the control device 100.
In some embodiments, the control apparatus 100 may be a remote controller. Communication between the remote controller and the display device includes at least one of infrared protocol communication, Bluetooth protocol communication, or other short-distance communication methods, and the remote controller controls the display device 200 wirelessly or by wire. The user may control the display apparatus 200 by inputting user instructions through keys on the remote controller, voice input, control panel input, and the like.
In some embodiments, the smart device 300 may include any of a mobile terminal, a tablet, a computer, a laptop, an AR/VR device, and the like.
In some embodiments, the smart device 300 may also be used to control the display device 200, for example using an application running on the smart device.
In some embodiments, the smart device 300 and the display device may also be used for communication of data.
In some embodiments, the display device 200 may also be controlled in ways other than through the control apparatus 100 or the smart device 300. For example, a module configured inside the display device 200 may directly receive the user's voice instruction, or a voice control apparatus provided outside the display device 200 may receive it.
In some embodiments, the display device 200 is also in data communication with a server 400. The display device 200 may be communicatively connected through a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 400 may provide various contents and interactions to the display apparatus 200. The server 400 may be one cluster or a plurality of clusters, and may include one or more types of servers.
In some embodiments, software steps executed by one step execution agent may be migrated on demand to another step execution agent in data communication therewith for execution. Illustratively, software steps performed by the server may be migrated to be performed on a display device in data communication therewith, and vice versa, as desired.
Fig. 2 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction from a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200.
In some embodiments, the communication interface 130 is used for external communication, and includes at least one of a WIFI chip, a bluetooth module, NFC, or an alternative module.
In some embodiments, the user input/output interface 140 includes at least one of a microphone, a touchpad, a sensor, a key, or an alternative module.
Fig. 3 shows a hardware configuration block diagram of the display apparatus 200 according to an exemplary embodiment.
In some embodiments, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, a user interface.
In some embodiments the controller comprises a central processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, a first interface to an nth interface for input/output.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display, receives image signals output by the controller, and displays video content, image content, menu manipulation interfaces, and user manipulation UI interfaces.
In some embodiments, the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
In some embodiments, the tuner demodulator 210 receives broadcast television signals by wired or wireless means and demodulates audio/video signals, as well as EPG data signals, from a plurality of wireless or wired broadcast television signals.
In some embodiments, communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi module, a bluetooth module, a wired ethernet module, and other network communication protocol chips or near field communication protocol chips, and an infrared receiver. The display apparatus 200 may establish transmission and reception of control signals and data signals with the control device 100 or the server 400 through the communicator 220.
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for collecting ambient light intensity; alternatively, the detector 230 includes an image collector, such as a camera, which may be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, and the like. The interface may be a composite input/output interface formed by the plurality of interfaces.
In some embodiments, the controller 250 and the tuner demodulator 210 may be located in separate devices; that is, the tuner demodulator 210 may also be located in a device external to the main device containing the controller 250, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any one of selectable objects, such as a hyperlink, an icon, or other actionable control. The operations related to the selected object are: displaying an operation connected to a hyperlink page, document, image, or the like, or performing an operation of a program corresponding to the icon.
In some embodiments, the controller comprises at least one of a central processing unit (CPU), a video processor, an audio processor, a graphics processing unit (GPU), random access memory (RAM), read-only memory (ROM), first to n-th interfaces for input/output, a communication bus (Bus), and the like.
The CPU processor is used to execute the operating system and application instructions stored in the memory, and to execute applications, data, and content according to the interaction instructions received from the outside, so as to finally display and play various audio and video content. The CPU processor may include a plurality of processors, e.g., a main processor and one or more sub-processors.
In some embodiments, the graphics processor is used for generating various graphics objects, such as icons, operation menus, and graphics displayed for user input instructions. The graphics processor comprises an arithmetic unit, which performs operations upon receiving the interactive instructions input by the user and displays the various objects according to their display attributes, and a renderer for rendering the objects produced by the arithmetic unit; the rendered objects are then displayed on the display.
In some embodiments, the video processor is configured to receive an external video signal and, according to the standard codec protocol of the input signal, perform at least one of decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis, so as to obtain a signal that can be displayed or played directly on the display device 200.
In some embodiments, the video processor includes at least one of a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, and a display formatting module. The demultiplexing module demultiplexes the input audio/video data stream. The video decoding module processes the demultiplexed video signal, including decoding and scaling. The image synthesis module superimposes and mixes the GUI signal, input by the user or generated by the graphics generator, with the scaled video image to generate an image signal for display. The frame rate conversion module converts the frame rate of the input video. The display formatting module converts the received frame-rate-converted video output signal into a signal conforming to the display format, such as an output RGB data signal.
In some embodiments, the audio processor is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform at least one of noise reduction, digital-to-analog conversion, and amplification processing to obtain a sound signal that can be played in the speaker.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and information exchange between a camera application or operating system and a user that enables conversion between an internal form of information and a user-acceptable form. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include at least one of an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc. visual interface elements.
In some embodiments, user interface 280 is an interface that may be used to receive control inputs (e.g., physical buttons on the body of the display device, or the like).
In some embodiments, the system of the display device may include a kernel, a command parser (shell), a file system, and applications. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel is started, kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, the scheduler, signals, and inter-process communication (IPC) are operated and maintained. After the kernel is started, the shell and user applications are loaded. An application is compiled into machine code upon startup, forming a process.
Referring to fig. 4, in some embodiments, the system is divided into four layers, which are, from top to bottom, an application (Applications) layer (abbreviated as "application layer"), an application framework (Application Framework) layer (abbreviated as "framework layer"), an Android runtime and system library layer (abbreviated as "system runtime library layer"), and a kernel layer.
In some embodiments, at least one application runs in the application layer. These applications may be a window (Window) program of the operating system, a system setting program, a clock program, or the like, or applications developed by third-party developers. The application packages in the application layer are not limited to the above examples.
The framework layer provides an application programming interface (API) and a programming framework for the applications of the application layer. The application framework layer includes some predefined functions and acts as a processing center that decides the actions of the applications in the application layer. Through the API interface, an application can access the resources in the system and obtain the services of the system during execution.
As shown in fig. 4, in the embodiment of the present application, the application framework layer includes managers (Managers), a content provider (Content Provider), and the like, where the managers include at least one of the following modules: an activity manager (Activity Manager) for interacting with all activities running in the system; a location manager (Location Manager) for providing system services or applications with access to the system location service; a package manager (Package Manager) for retrieving various information about the application packages currently installed on the device; a notification manager (Notification Manager) for controlling the display and clearing of notification messages; and a window manager (Window Manager) for managing the icons, windows, toolbars, wallpapers, and desktop components on the user interface.
In some embodiments, the activity manager is used to manage the lifecycle of the applications and the usual navigation fallback functions, such as controlling the exit, opening, and back operations of applications. The window manager is used to manage all window programs, for example obtaining the display screen size, judging whether there is a status bar, locking the screen, capturing the screen, and controlling changes of the display window (for example, shrinking the display window, or displaying shake or distortion effects).
In some embodiments, the system runtime library layer provides support for the upper framework layer: when the framework layer is used, the Android operating system runs the C/C++ libraries contained in the system runtime library layer to implement the functions required by the framework layer.
In some embodiments, the kernel layer is the layer between hardware and software. As shown in fig. 4, the kernel layer includes at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, WiFi driver, USB driver, HDMI driver, sensor drivers (such as fingerprint sensor, temperature sensor, and pressure sensor), and power driver.
As noted above, each text obtained through character recognition may contain several control names, and the obtained position information is the position of the whole text rather than of the target control; the control at that position may be some other control. The display device may therefore trigger the wrong control and fail to carry out the user's instruction, leading to a poor user experience.
The application provides a display device including a display, a sound collector, and a controller.
The display is used for displaying a user interface, i.e., the picture content currently presented on the display. The user interface may be a specific target image, such as various media assets acquired from a network signal source, including video, pictures, and other content; it may also be a UI interface of the display device. The sound collector may be a microphone for receiving voice instructions input by the user, such as a control triggering instruction.
In some embodiments, after the user powers on the display device, the controller may control the display to present a user interface. FIG. 5 is a schematic diagram of the user interface in some embodiments, where the user interface includes a media asset interface for playing media assets as well as some specific controls, such as "movie". The user may click the media asset interface to display the media assets full screen, or click a control so that the interface corresponding to that control is displayed.
In some embodiments, the display device has a voice interaction function, and the user can control the display device by inputting voice. The display device may be provided with a voice interaction mode. In the voice interaction mode, a user may have voice interaction with the display device.
In some embodiments, the user may send a voice interaction mode instruction to the display device by operating a designated key of the remote controller, the correspondence between the voice interaction mode instruction and the remote controller key having been bound in advance. For example, a voice interaction mode key is provided on the remote controller; when the user presses this key, the remote controller sends a voice interaction mode instruction to the controller, and the controller controls the display device to enter the voice interaction mode. When the user presses the key again, the controller may control the display device to exit the voice interaction mode.
In some embodiments, the user may directly control the display device to enter the voice interaction mode by voice input through a sound collector of the display device, such as a microphone. An intelligent voice system may be provided in the display device; it recognizes the user's voice and extracts the instruction content input by the user. The user can speak a preset wake-up word into the microphone to activate the intelligent voice system, after which the controller can respond to the user's instructions. For example, the user may say "A classmate" to activate the intelligent voice system, at which point the display device enters the voice interaction mode.
In some embodiments, the user may also send a voice interaction mode instruction to the display device through a preset gesture. The display device may detect the user's behavior through an image collector, such as a camera. When the user makes the preset gesture, the user is considered to have sent the voice interaction mode instruction. For example, it can be set that when the user is detected drawing a "V" shape, it is judged that the user has input a voice interaction mode instruction to the display device. The user may likewise send the instruction through a preset action; for example, it can be set that when the user is detected lifting the left foot and the right hand at the same time, it is judged that the user has input a voice interaction mode instruction to the display device.
In some embodiments, the user may also send a voice interaction mode instruction when controlling the display device with a smart device, for example a mobile phone. In practical applications, a control can be provided in the mobile phone through which the user chooses whether to enter the voice interaction mode, thereby sending a voice interaction mode instruction to the controller, which then controls the display device to enter the voice interaction mode.
A voice interaction mode option may also be provided in the UI interface of the display device; when the user clicks this option, the display device is controlled to enter or exit the voice interaction mode.
In some embodiments, to prevent the user from triggering the voice interaction mode by mistake, when the controller receives a voice interaction mode instruction it may control the display to show voice interaction mode confirmation information, so that the user confirms a second time whether the display device should enter the voice interaction mode. FIG. 6 illustrates the display of such a confirmation message in some embodiments.
In some embodiments, after the display device enters the voice control mode, the user may also send instructions to the display device in text form, for example through a mobile phone or remote controller, so that instructions can still be received if the microphone fails.
When the display device is in the voice interaction mode, it can interact with the user by voice. The user can input various instructions through the microphone to perform various operations on the display device. In particular, the user can send a control triggering instruction to the display device to trigger the target control the user needs.
In some embodiments, after the display device enters the voice interaction mode, the user may view the user interface and send a control trigger instruction. FIG. 7 illustrates a schematic diagram of a user interface in some embodiments.
The control triggering instruction may contain the full name of a control. For example, the user may say "I want to watch a movie". As shown in fig. 7, "movie" is the name of a control; when the display device receives this instruction, it may trigger the "movie" control and display the corresponding interface to the user. As another example, the user may say "open my applications". When the display device receives this instruction, it may trigger the "my applications" control and display the corresponding interface, which contains all applications installed in the display device. As shown in fig. 8, this interface may include an application list containing all installed applications.
The control triggering instruction may also contain only part of a control's name. For example, suppose the user interface has a control named "strongest animation season". The user may say "I want to watch the animation season"; when the display device receives this instruction, it can determine that the control the user wants to trigger is "strongest animation season" and trigger that control.
In some embodiments, after the user sends the control triggering instruction, the display device may trigger a target control corresponding to the control triggering instruction. FIG. 9 illustrates an interaction flow diagram for components of a display device in some embodiments.
In response to a control triggering instruction input by the user, the controller may first determine the control keyword contained in the instruction and associated with a control. The control keyword may be the complete name of a control or part of its name; control keywords correspond one-to-one with control names.
In some embodiments, after the display device receives the control triggering instruction input by the user, the controller may send the received voice data to a speech recognition service, which converts the voice data into text, yielding the control trigger text. For the recognition of the user's control triggering instruction, reference may be made to the related art; it is not described in detail in the embodiments of the present application.
In some embodiments, the display device may also include a third-party speech recognition interface. After receiving the control triggering instruction, the controller can send the voice data to the third-party speech recognition interface, so that a third-party speech recognition service recognizes the user's control triggering instruction as the control trigger text.
After the control trigger text is obtained, the controller can analyze the control trigger text, so that control keywords in the control trigger text are obtained.
In some embodiments, the controller may identify the control trigger text to obtain some keywords in the control trigger text.
The controller first performs word segmentation on the control trigger text, producing a segmentation result consisting of several words; the open-source segmentation tool JIEBA may be used. For example, for the control triggering instruction "I want to watch a movie", word segmentation yields three words: "I", "want to watch", and "movie". For the specific segmentation method, reference may be made to the related art; it is not described in detail in this application.
After segmenting the control trigger text, the controller analyzes the segmentation result and extracts its keywords, specifically the nouns in it. For example, for the segmentation result "I", "want to watch", "movie", the word "want to watch" can be resolved into the control action "click", and the noun "movie", which is a control name in the user interface, is extracted as the control keyword.
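The following minimal Python sketch illustrates this segmentation-and-extraction step. The use of JIEBA's part-of-speech interface and the noun filter are illustrative assumptions; the patent only states that word segmentation is performed and nouns are extracted.

```python
# Illustrative sketch of the segmentation step, assuming JIEBA's
# part-of-speech interface; the noun filter ('n...' flags) is an
# assumption, not the patent's stated implementation.
import jieba.posseg as pseg

def extract_nouns(trigger_text):
    # pseg.cut yields (word, flag) pairs; flags starting with 'n' mark nouns
    return [pair.word for pair in pseg.cut(trigger_text) if pair.flag.startswith("n")]

# "I want to watch a movie" -> segmented roughly as ["I", "want to watch", "movie"],
# of which only the noun "movie" survives the filter.
print(extract_nouns("我想看电影"))  # expected: ['电影'] ("movie")
```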
In some embodiments, the control triggering instruction input by the user may contain only part of a control name; for example, the instruction may be "I want to watch the animation season". After segmenting the control trigger text, the controller can extract the noun "animation season".
The controller may traverse all control names in the UI layout information corresponding to the user interface and compare them with the extracted noun, i.e., match the control names against the noun. The words shared by the noun and a control name can thereby be determined, and those words are taken as the control keyword.
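A sketch of this matching step might look as follows; the function name and the simple substring test are assumptions for illustration.

```python
# Hedged sketch: match an extracted noun against the control names from the
# UI layout information; a noun contained in a control name is taken as the
# control keyword.
def match_control_keyword(nouns, control_names):
    for noun in nouns:
        for name in control_names:
            if noun == name or noun in name:
                return noun  # the shared word is the control keyword
    return None

keyword = match_control_keyword(
    ["animation season"],
    ["movie", "my applications", "strongest animation season"],
)
# keyword == "animation season" (matches the control "strongest animation season")
```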
In some embodiments, upon determining a control keyword in the control triggering instruction, the controller may determine location information of the control keyword in the user interface.
The controller may first perform screenshot processing on the user interface to obtain a screenshot image of the user interface. Specifically, the controller may capture a screen of a currently displayed picture in the display through a screen capture program, so as to obtain a screenshot image of the user interface.
After the screenshot image of the user interface is acquired, the controller can perform character recognition processing on it to obtain image recognition information. The position information of the control keyword in the user interface can then be determined from the image recognition information.
Specifically, the controller may recognize the screenshot image using OCR (Optical Character Recognition) to obtain the text information contained in it; for example, recognizing the screenshot image with an OCR character model yields the corresponding image recognition information. OCR determines the shape of characters by detecting dark and light patterns and then translates the shapes into computer text through a character recognition method; this is prior art. The specific OCR method may be chosen according to the hardware configuration of the actual display device, for example a text recognition method based on artificial intelligence, neural networks, or genetic algorithms; this embodiment is not limited in this respect.
In some embodiments, OCR converts the screenshot image from picture form to text form, yielding the text information contained in the screenshot image. OCR recognition of the screenshot image produces the specific texts, of which there may be several, and at the same time identifies the position of each text in the screenshot image, i.e., its position in the user interface. The image recognition information therefore comprises a plurality of pieces of text recognition information, each of which may include a specific recognized text and the position information of that text.
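As a concrete illustration only (the patent names no particular OCR engine), the sketch below uses Tesseract via pytesseract to obtain per-text bounding boxes; note that Tesseract reports top-left coordinates, whereas the examples in this application use the lower-left vertex.

```python
# Illustrative only: the patent does not prescribe an OCR engine; Tesseract
# (via pytesseract) stands in here. Each recognized text is paired with its
# bounding box, matching the text recognition information described above.
from PIL import ImageGrab
import pytesseract
from pytesseract import Output

screenshot = ImageGrab.grab()  # screenshot image of the user interface
data = pytesseract.image_to_data(screenshot, output_type=Output.DICT)

image_recognition_info = []
for i, text in enumerate(data["text"]):
    if text.strip():  # skip empty detections
        position = (data["left"][i], data["top"][i],      # corner coordinates
                    data["width"][i], data["height"][i])  # W0, H0
        image_recognition_info.append((text, position))
```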
It should be noted that OCR does not guarantee that a specific control name is recognized as a unit; it only recognizes pieces of text, so one text may contain several control names. For example, a recognized text may be "choose*UP master*awards*the strongest animation season". The recognized text contains specific literal characters, uniformly referred to as characters in the embodiments of this application, while "*" denotes a space, i.e., an empty character. The position information of the whole text in the user interface is recognized at the same time.
The text may be regarded as a rectangular area, and its position information may be represented as (x0, y0, W0, H0), where (x0, y0) are the coordinates of a certain vertex or of the center point of the rectangular area, W0 is the length of the text, and H0 is its height. In the embodiments of the present application, (x0, y0) is exemplarily taken to be the lower-left vertex of the rectangular area. FIG. 10 illustrates text recognition information in some embodiments.
In some embodiments, upon acquiring the image recognition information, the controller may determine a plurality of text recognition information included in the image recognition information, each text recognition information including a piece of recognition text and position information of the recognition text in the user interface.
The controller can match the control keyword against all recognized texts to determine the recognized text containing the control keyword, which serves as the control keyword recognition text. Once this text is determined, the corresponding text recognition information can be taken as the control keyword recognition information, which comprises the control keyword recognition text and the position information of that text.
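A minimal sketch of this selection step, under the data layout assumed in the OCR sketch above:

```python
# Select the control keyword recognition information: the recognized text
# that contains the control keyword, together with its position information.
def find_keyword_recognition_info(keyword, image_recognition_info):
    for text, position in image_recognition_info:
        if keyword in text:
            return text, position  # control keyword recognition information
    return None  # keyword not found in any recognized text
```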
In some embodiments, the controller may obtain the position information of the control keyword in the user interface from the control keyword recognition information. Specifically, the control keyword recognition information may include character information, i.e., the position information of each character of the control keyword recognition text. The controller may determine the position of each character of the control keyword within the control keyword recognition text from the character information, and then determine the position information of the control keyword in the user interface from the positions of its characters.
In some embodiments, different user interfaces displayed on the display yield different control keyword recognition information, which may or may not include character information. Therefore, before the position information of each character of the control keyword is obtained, the controller can first examine the control keyword recognition information and judge whether it includes character information.
When the control keyword recognition information is detected to include character information, the position information of each character of the control keyword can be determined directly from the character information.
When it is detected that the control keyword recognition information does not include character information, the controller may derive the character information from the control keyword recognition information.
In some embodiments, the controller may first obtain the sizes of the characters and of the empty character.
The size of a character may be determined from the position information of the control keyword recognition text; assume this position information is (x0, y0, W0, H0). All characters have the same height, which equals the height of the control keyword recognition text, so the character height is H0.
The length and height of a character may stand in a fixed ratio, e.g., the character length is α × H0, where α is the ratio of length to height. In the embodiments of the present application, the length and height of each character are exemplarily set equal, so each character has length and height H0.
The empty character has the same height as a character, namely the text height H0, but its length may differ from the character length, so the empty character length needs to be determined.
In some embodiments, the empty character length can be set to a fixed value S_NC by the user; for example, the empty character length may be set equal to the character length, i.e., S_NC = H0.
In some embodiments, a fixed ratio θ between the empty character length and the character length may be set, e.g., θ = 0.8, so that S_NC = θ × H0.
In some embodiments, the empty character length S_NC may be calculated by a preset empty character formula.
Specifically, the length of the control keyword recognition text is W0. The controller may determine the total number of characters N_C and the total number of empty characters N_NC in the control keyword recognition text, and calculate the total character length from the total number of characters and the character length.
The difference between the text length W0 and the total character length gives the total empty character length, and the ratio of this difference to the total number of empty characters gives the empty character length.
Specifically, the empty character formula is:
S_NC = (W0 - N_C × H0) / N_NC
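A direct transcription of this formula into Python, assuming square characters (character length = height = H0) as in the embodiments above:

```python
# Empty character length: S_NC = (W0 - N_C * H0) / N_NC,
# assuming every full character occupies an H0-by-H0 square.
def empty_char_length(W0, H0, n_chars, n_empty):
    total_char_length = n_chars * H0           # total length of the characters
    return (W0 - total_char_length) / n_empty  # remaining length per empty char
```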
In some embodiments, after the character length and the empty character length have been determined, the position information of each character of the control keyword recognition text can be obtained.
Each character can likewise be regarded as a rectangular area, and the position information of the i-th character can be written as (xi, yi, Wi, Hi).
Specifically, the position information of the control keyword recognition text is (x0, y0, W0, H0). Since the length and height of a character both equal the text height, Wi = Hi = H0 and yi = y0. That is, the position information of each character is (xi, y0, H0, H0); only the X coordinate of each character's lower-left vertex remains to be determined.
In some embodiments, to determine the X coordinate of each character's lower-left vertex, the controller may first determine the number of characters and the number of empty characters between the first character and the given character in the control keyword recognition text. The number of characters from the first character through a given character is that character's serial number, i.e., the character is the i-th character of the control keyword recognition text.
The total length of the characters from the first character to the given character can be calculated from the number of characters; this is the first character length. Similarly, the total length of the empty characters from the first character to the given character can be calculated from the number of empty characters; this is the first empty character length.
The length from the first character to the given character can then be determined from the first character length and the first empty character length, and from this length the X coordinate of the character's lower-left vertex is obtained.
Specifically, the X coordinate xi is calculated as:
xi = x0 + (i - 1) × H0 + n × S_NC
wherein:
xi is the X coordinate of the lower-left vertex of the i-th character of the control keyword recognition text, and i is the serial number of the character;
n is the number of empty characters between the first character and the i-th character in the control keyword recognition text;
S_NC is the empty character length.
According to the formula, the X coordinate of the vertex of the lower left corner of each character can be calculated, and the position information (xi, y0, H0, H0) of each character is further determined.
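The per-character position computation can be sketched as follows (i is the 1-based serial number of the character among full characters, n the number of empty characters preceding it):

```python
# Position of the i-th character of the recognition text, per the formula
# above: it is preceded by (i - 1) characters of length H0 and n empty
# characters of length S_NC; each character is an H0-by-H0 square at y0.
def char_position(x0, y0, H0, S_NC, i, n):
    xi = x0 + (i - 1) * H0 + n * S_NC
    return (xi, y0, H0, H0)
```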
In some embodiments, considering that there may be a character interval between adjacent characters, between a character and an adjacent empty character, and between adjacent empty characters, the controller may obtain the X coordinate of each character's lower-left vertex taking the character interval into account.
In the embodiment of the present application, all character intervals are set to be the same, with length γ.
To obtain the X coordinate of a character's lower-left vertex, the controller may calculate the sum of the number of characters and the number of empty characters between the first character and the given character; this is the first number. The total character interval length from the first character to the given character is then calculated from the first number and the character interval length.
The length from the first character to the given character can be further determined by adding the total character interval length, and from this length the X coordinate of the character's lower-left vertex is obtained.
Specifically, the X coordinate xi is calculated as:
xi = x0 + (i - 1) × H0 + n × S_NC + (i - 1 + n) × γ
which can also be expressed as:
xi = x0 + (i - 1) × (H0 + γ) + n × (S_NC + γ)
according to the formula, the X coordinate of the vertex of the lower left corner of each character can be calculated, and the position information (xi, y0, H0, H0) of each character is further determined.
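The interval-aware variant, as a sketch:

```python
# Variant with a uniform character interval gamma: the (i - 1 + n) glyphs
# before the i-th character each contribute one interval of length gamma.
def char_x_with_interval(x0, H0, S_NC, gamma, i, n):
    return x0 + (i - 1) * H0 + n * S_NC + (i - 1 + n) * gamma
    # equivalently: x0 + (i - 1) * (H0 + gamma) + n * (S_NC + gamma)
```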
In some embodiments, after the position information of every character in the control keyword recognition text has been determined, the characters belonging to the control keyword can be selected from the control keyword recognition text and their position information determined.
In some embodiments, the controller may determine the position information of the control keyword in its entirety in the user interface from the position information of each character of the control keyword.
Specifically, assume that the position information of the first character of the control keyword is (x1, y0, H0, H0), and the position information of the last character is (xz, y0, H0, H0).
The X coordinate of the vertex of the lower left corner of the control keyword is the X coordinate X1 of the first character, and the Y coordinate is Y0. The control key has a length of xz-x1+ H0 and a height of H0.
Therefore, the position information of the control keyword is (x1, y0, xz-x1+ H0, H0).
In some embodiments, the center coordinates of the control keyword may also be calculated as [x1 + (xz - x1 + H0)/2, y0 + H0/2], i.e., [(xz + x1 + H0)/2, y0 + H0/2]. The center coordinates may be used as the position information of the control keyword.
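As a small illustration, here is a Python sketch of this bounding box and center calculation (the function name and data layout are assumptions; the per-character boxes follow the (x, y0, H0, H0) convention above, so the last character is taken to be a regular character of length H0):

```python
def keyword_box(char_boxes):
    """Overall position info and center of the control keyword, given the
    per-character position info (x, y0, H0, H0) of its characters in order."""
    x1 = char_boxes[0][0]            # X of the first character
    xz, y0, h0, _ = char_boxes[-1]   # X of the last character, shared y/height
    width = xz - x1 + h0             # right edge of the last character minus x1
    box = (x1, y0, width, h0)
    center = (x1 + width / 2, y0 + h0 / 2)
    return box, center

box, center = keyword_box([(100, 50, 40, 40), (142, 50, 40, 40), (184, 50, 40, 40)])
# box == (100, 50, 124, 40), center == (162.0, 70.0)
```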
In some embodiments, considering that some special characters, such as numbers and punctuation marks, may exist in the control keyword recognition text in addition to characters and empty characters, the size of these special characters may also need to be determined.
The size of each special character may be different. Considering that the control keyword recognition text is a single line of text, the height of all special characters can be set to be the same as the height of the control keyword recognition text, that is, H0. The length of each special character then needs to be determined. The special characters may include numeric, English and punctuation types.
In some embodiments, for numeric special characters, the length of each digit may be considered the same, and there is a fixed ratio a between the numeric special characters and the characters; that is, all numeric special characters have a length of a × H0.
In some embodiments, for English special characters, the length of each English letter may be different; for example, the English letters "i" and "m" clearly differ in length. In this case, a fixed length may be set for each English letter: the respective lengths of the 26 uppercase and 26 lowercase letters can be set by a technician, for example by a developer of the algorithm related to the control triggering function, which is not limited in the embodiments of the present application. For instance, in a recognition text containing the English letters "U" and "P", each of these letters has its own specific length.
In some embodiments, for punctuation special characters, the controller can detect whether the punctuation mark is located at the last position of the control keyword recognition text.
It should be noted that if the last position of the control keyword recognition text is a punctuation mark, the recognized length of that punctuation mark is smaller than the character length, so the length of each such punctuation mark needs to be determined. For example, in the recognition text "season of strongest animation!", the punctuation mark "!" is located at the last position of the recognition text and is therefore shorter than the character length; in this case, the specific length of the punctuation mark "!" is used.
For conventional punctuation marks, such as commas, quotation marks and exclamation marks, a fixed length may be set, i.e. each punctuation mark has its specific length.
In some embodiments, for punctuation special characters that are not at the last position of the control keyword recognition text, the controller can calculate the sum of the number of characters and the number of special characters in the control keyword recognition text.
If the sum of the number of characters and the number of special characters is less than a preset control threshold, the length of the punctuation special characters in the recognition text is considered smaller than the character length. For example, the control threshold may be set to 6. For a recognition text meaning "education, summer vacation", which contains four characters and one punctuation mark ",", the sum is 5, which is less than the control threshold. The control corresponding to this recognition text can therefore be considered a small control, in which the length of the punctuation special character is smaller than the character length. In this case, the length of the punctuation special character can be set to a fixed value.
If the sum of the number of characters and the number of special characters is not less than the preset control threshold, the length of the punctuation special characters in the recognition text is considered equal to the character length. For example, with the control threshold of 6, a recognition text meaning "education, summer vacation, study" contains six characters and two punctuation marks ",", so the sum is 8, which is not less than the control threshold. The length of the punctuation special character "," is therefore equal to the character length, that is, H0.
In some embodiments, if the special characters are characters of other languages, such as Japanese or Russian, the length of each such special character may be considered fixed, with a fixed ratio b to the character length; that is, the length of this type of special character is b × H0.
In some embodiments, a detected special character may not have a preset fixed length. For example, an unconventional punctuation mark, whose length is not fixed, may have its length considered equal to the character length H0.
In some embodiments, for a Chinese interface, characters of all other languages are special characters.
For an interface in another language, characters of any language other than that of the interface can all be considered special characters. For example, for an English interface, Chinese characters are considered special characters, and the lengths of all Chinese characters can be set to a fixed value.
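These length rules can be summarized in a single function. The Python sketch below is illustrative only: the ratios a and b, the per-letter table, the fixed punctuation lengths and the threshold of 6 are assumed sample values, and the function presumes the character has already been classified as special.

```python
# All numeric values below (ratios a and b, the letter table, punctuation
# lengths, control threshold) are assumed sample values, not values given
# in this application.
LETTER_LEN = {"i": 12.0, "m": 38.0, "U": 30.0, "P": 26.0}  # set by the developer

def special_char_length(ch, index, text, item_count, h0,
                        a=0.5, b=1.0, fixed_punct_len=15.0,
                        last_punct_len=10.0, control_threshold=6):
    """Length of one special character `ch`, assumed already classified as
    special. `index` is its 0-based position in the recognition text and
    `item_count` the number of characters plus special characters."""
    if ch.isdigit():                        # numeric type: fixed ratio a
        return a * h0
    if ch.isascii() and ch.isalpha():       # English type: per-letter table
        return LETTER_LEN.get(ch, h0)
    if ch in ",.!?;:'\"":                   # conventional punctuation
        if index == len(text) - 1:          # shorter at the last position
            return last_punct_len
        if item_count < control_threshold:  # small control: fixed length
            return fixed_punct_len
        return h0                           # otherwise: full character length
    if not ch.isascii():                    # other languages: fixed ratio b
        return b * h0
    return h0                               # unconventional marks: h0

# Example: the comma in a 5-item recognition text (4 characters + 1 comma)
print(special_char_length(",", 2, "教育,暑假", 5, h0=40.0))  # -> 15.0
```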
In some embodiments, when determining the X coordinate of the vertex of the lower left corner of each character, the controller may calculate not only the total character length and the total empty character length from the first character to each character, but also the total special character length from the first character to each character.
When calculating the total special character length, the controller may determine the numbers of numeric, punctuation and English special characters from the first character to each character, and then calculate the total length of the numeric special characters, the total length of the punctuation special characters and the total length of the English special characters.
The length from the first character to each character can be calculated according to all the lengths, and the X coordinate of the vertex of the lower left corner of each character is determined:
xi=x0+L
where L is the length from the first character to the ith character.
After the X coordinate of the vertex of the lower left corner of each character is determined, the position information of each character can be obtained, and the position information of the control keyword in the user interface can then be determined.
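Putting the pieces together, here is a minimal sketch of this generalized accumulation (names and data layout are assumptions; `special_len` could wrap the special-character sketch above):

```python
def x_coord(items, i, x0, h0, empty_len, gap, special_len):
    """xi = x0 + L, where L sums the lengths of all items before the i-th
    item (1-based) plus one character interval per gap. `items` holds
    (tag, ch) pairs with tag in {"char", "empty", "special"}."""
    L = 0.0
    for tag, ch in items[: i - 1]:
        if tag == "char":
            L += h0
        elif tag == "empty":
            L += empty_len
        else:
            L += special_len(ch)   # e.g. the special-character sketch above
        L += gap                   # interval to the following item
    return x0 + L

items = [("char", "教"), ("char", "育"), ("special", ","), ("char", "暑")]
print(x_coord(items, 4, x0=100, h0=40, empty_len=20, gap=0,
              special_len=lambda c: 15.0))  # 100 + 40 + 40 + 15 = 195.0
```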
In some embodiments, after the position information of the control keyword in the user interface is obtained, the control at the position information may be triggered; the control at the position information is the target control referred to in the user's speech.
Meanwhile, the controller can control the display to display the interface corresponding to the target control.
In some embodiments, the user-entered control-triggering instruction may not include a control keyword in the current user interface. For example, it may be that the user himself speaks an instruction without a control keyword, or that the control keyword does not exist in the user interface currently displayed by the display.
After performing word segmentation processing on the control triggering text, the controller can analyze the word segmentation result and extract the keywords in it. If the extracted keywords contain no control keyword, the controller does not trigger any control.
Alternatively, after the controller determines the control keyword, if the control keyword matches none of the recognition texts of the screenshot image, that is, no recognition text containing the control keyword exists, the controller does not trigger any control.
At this time, the display may be controlled to display preset prompt information, where the prompt information is used to prompt the user that no relevant control has been found.
In some embodiments, the prompt information can use a preset prompt template. For example, the prompt may be set to "Sorry, no relevant control was found, please try again." The prompt template may also contain related asset names, such as: "Sorry, the control 'a' was not found, please continue searching for other controls." FIG. 11 is a schematic diagram of prompt information displayed on the display in some embodiments.
In some embodiments, the controller may further convert the prompt message into a voice response, and play the voice response to notify the user.
An embodiment of the present application further provides a control triggering method, which is applied to a display device. As shown in FIG. 12, the method includes:
step 1201, responding to a control triggering instruction input by a user, and determining a control keyword in the control triggering instruction;
step 1202, determining position information of the control keyword in a user interface;
step 1203, triggering a control at the position information.
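For orientation only, the following Python sketch strings steps 1201-1203 together; every name in it (OcrBox, trigger_control, and so on) is a hypothetical stand-in for the device's actual speech, OCR and input services, and is not taken from this application.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class OcrBox:  # one recognition text with its position info
    text: str
    x: float
    y: float
    w: float
    h: float

def trigger_control(command_text: str, ocr_boxes: List[OcrBox],
                    known_keywords: List[str]) -> Optional[Tuple[float, float]]:
    """Step 1201: pick the control keyword contained in the command;
    step 1202: locate it among the OCR results of the screenshot;
    step 1203: return the point to click (None -> show the prompt)."""
    keyword = next((k for k in known_keywords if k in command_text), None)
    if keyword is None:
        return None
    box = next((b for b in ocr_boxes if keyword in b.text), None)
    if box is None:
        return None
    # Click the center of the matched recognition text's bounding box.
    return (box.x + box.w / 2, box.y + box.h / 2)

boxes = [OcrBox("Movies", 120.0, 60.0, 90.0, 30.0)]
print(trigger_control("open Movies", boxes, ["Movies"]))  # (165.0, 75.0)
```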
The same and similar parts in the embodiments in this specification may be referred to one another, and are not described herein again.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of software products, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method in the embodiments or some parts of the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A display device, comprising:
a display configured to display a user interface;
the voice collector is configured to receive a voice instruction input by a user;
a controller configured to:
responding to a control triggering instruction input by a user, and determining a control keyword in the control triggering instruction;
determining position information of the control keyword in a user interface;
and triggering a control at the position information.
2. The display device of claim 1, wherein the controller is further configured to:
in the step of determining the control keyword in the control triggering instruction,
converting the control triggering instruction into a control triggering text;
and performing word segmentation processing on the control trigger text to obtain control keywords.
3. The display device of claim 1, wherein the controller is further configured to:
in performing the step of determining the position information of the control keyword in the user interface,
screenshot processing is carried out on the user interface to obtain a screenshot image;
performing character recognition processing on the screenshot image to obtain image recognition information;
and acquiring the position information of the control key words in the user interface based on the image identification information.
4. The display device of claim 3, wherein the controller is further configured to:
in the step of acquiring the position information of the control keyword in the user interface based on the image recognition information,
determining a plurality of text recognition information contained in the image recognition information;
determining control keyword identification information corresponding to the control keywords in the plurality of text identification information, and acquiring position information of the control keywords in a user interface according to the control keyword identification information.
5. The display device according to claim 4, wherein the control keyword identification information includes control keyword identification text and identification text position information;
the controller is further configured to:
in the step of acquiring the position information of the control keyword in the user interface according to the control keyword identification information,
detecting the control keyword identification information;
when the control keyword identification information is detected to further comprise character information, determining the position information of each character of the control keyword according to the character information; when detecting that the control keyword identification information does not comprise character information, determining the character information according to the control keyword identification information; the character information comprises position information of each character of the control keyword recognition text;
and determining the position information of the control keyword in the user interface according to the position information of each character of the control keyword.
6. The display device of claim 5, wherein the controller is further configured to:
in performing the step of determining character information based on the control keyword recognition information,
determining the number of characters and the number of empty characters between a first character and each character in the control keyword recognition text, and acquiring the length of the characters and the length of the empty characters;
calculating a first character length from the first character to each character according to the character length and the character number; calculating a first empty character length from the first character to each character according to the empty character length and the empty character number;
calculating the length from the first character to each character according to the first character length and the first empty character length;
and determining the position information of each character according to the length from the first character to each character and the identification text position information.
7. The display device of claim 6, wherein the controller is further configured to:
in performing the step of obtaining the length of the empty character,
determining the length of the control keyword recognition text according to the recognition text position information;
determining the total number of characters and the total number of null characters in the control keyword recognition text, and calculating the total length of the characters in the control keyword recognition text according to the total number of the characters and the length of the characters;
calculating the difference value between the length of the control keyword recognition text and the total length of the characters;
and calculating the ratio of the difference to the total number of the null characters to obtain the length of the null characters.
8. The display device of claim 6, wherein the controller is further configured to:
in performing the step of calculating the length of the first character to each character based on the first character length and the first dummy character length,
acquiring the length of a character interval;
calculating the sum of the number of the characters and the number of the empty characters to obtain a first number;
calculating the total length of the character interval from the first character to each character according to the character interval length and the first number;
and calculating the sum of the first character length, the first empty character length and the total length of the character interval to obtain the length from the first character to each character.
9. A control triggering method is applied to display equipment and is characterized by comprising the following steps:
responding to a control triggering instruction input by a user, and determining a control keyword in the control triggering instruction;
determining position information of the control keyword in a user interface;
and triggering a control at the position information.
10. The control triggering method of claim 9, wherein determining the position information of the control keyword in the user interface comprises:
screenshot processing is carried out on the user interface to obtain a screenshot image;
performing character recognition processing on the screenshot image to obtain image recognition information;
determining a plurality of text recognition information contained in the image recognition information;
determining control keyword identification information corresponding to the control keywords in the plurality of text identification information, and acquiring position information of the control keywords in a user interface according to the control keyword identification information.
CN202110842951.3A 2020-11-13 2021-07-26 Display device and control triggering method Pending CN114296581A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202110842951.3A CN114296581A (en) 2021-07-26 2021-07-26 Display device and control triggering method
PCT/CN2021/119212 WO2022100283A1 (en) 2020-11-13 2021-09-18 Display device, control triggering method and scrolling text detection method
US18/169,313 US20230197082A1 (en) 2020-11-13 2023-02-15 Display apparatus and a voice contral method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110842951.3A CN114296581A (en) 2021-07-26 2021-07-26 Display device and control triggering method

Publications (1)

Publication Number Publication Date
CN114296581A 2022-04-08

Family

ID=80964239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110842951.3A Pending CN114296581A (en) 2020-11-13 2021-07-26 Display device and control triggering method

Country Status (1)

Country Link
CN (1) CN114296581A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106201177A (en) * 2016-06-24 2016-12-07 维沃移动通信有限公司 A kind of operation execution method and mobile terminal
EP3142107A1 (en) * 2015-09-14 2017-03-15 Samsung Electronics Co., Ltd. Voice recognition apparatus and controlling method thereof
CN111599358A (en) * 2020-04-09 2020-08-28 华为技术有限公司 Voice interaction method and electronic equipment
CN112511882A (en) * 2020-11-13 2021-03-16 海信视像科技股份有限公司 Display device and voice call-up method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination