CN111885400A

CN111885400A - Media data display method, server and display equipment

Info

Publication number: CN111885400A
Application number: CN202010756075.8A
Authority: CN
Inventors: 杜永花; 崔保磊; 李含珍
Original assignee: Qingdao Hisense Media Network Technology Co Ltd
Current assignee: Qingdao Hisense Media Network Technology Co Ltd; Juhaokan Technology Co Ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2020-11-03

Abstract

The application discloses a media data display method for a server, comprising the following steps: receiving current screen state information sent by a display device; receiving voice text information sent by the display device; performing semantic understanding on the voice text information to obtain media data keyword information; acquiring matched media data based on the media data keyword information and the current screen state information; and sending the matched media data to the display device. Furthermore, the semantic understanding of the voice text information may yield both media data keyword information and user-specified screen state information. The application also discloses a server and a display device. The display method displays media data content that corresponds to the current screen state of the display device, which improves the user experience.

Description

Media data display method, server and display equipment

Technical Field

The present application relates to the technical field of display devices, and in particular, to a media data display method, a server, and a display device.

Background

With the rapid development of the economy and society, demand for searching for and watching videos on smart display devices (such as smart televisions) keeps growing. As voice technology matures, searching for and watching videos through a voice remote control is also becoming increasingly mature.

However, in the prior art, the following problem remains when a user operates on videos by voice:

Voice media asset search cannot select media asset content suited to the screen according to the state of the television; that is, it cannot display the corresponding media asset content according to whether the television is currently in the vertical screen (portrait) state or the horizontal screen (landscape) state.

Disclosure of Invention

The technical problem to be solved by the application is to provide a media data display method that can display corresponding media data content based on the current screen state of a display device, so that the user experience can be effectively improved. To solve the same technical problem, the application further provides a server and a display device.

In order to solve the foregoing technical problem, a first aspect of the present application provides a media data display method for a server, where the media data display method includes:

receiving current screen state information and voice information sent by display equipment;

acquiring matched media data based on the voice information and the current screen state information, wherein, for the same voice information, the matched media data differ between screen states;

and sending the matched media data to the display equipment.

In addition, to solve the above technical problem, a second aspect of the present application provides a media data presentation method for a display device, where the media data presentation method includes:

acquiring current screen state information of the display equipment, and identifying and acquiring voice information based on voice operation of a user;

sending the current screen state information and the voice information to a server, so that the server obtains matched media data based on the voice information and the current screen state information, wherein, for the same voice information, the matched media data differ between screen states;

and receiving the matched media data sent by the server and displaying the media data in the current screen state.
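The device-side steps of the second aspect can be illustrated with a minimal sketch. The application does not specify a wire format or how the screen state is detected, so the JSON payload, the field names, the `MediaSearchRequest` type, and the width/height comparison used here are all assumptions made for illustration only.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MediaSearchRequest:
    """Hypothetical payload sent from the display device to the server."""
    screen_state: str  # "portrait" (vertical screen) or "landscape" (horizontal screen)
    voice_text: str    # text recognized from the user's voice operation


def detect_screen_state(width_px: int, height_px: int) -> str:
    # Assumption: the screen state is inferred from the current display
    # dimensions; a real device might read a rotation sensor instead.
    return "portrait" if height_px > width_px else "landscape"


def build_request(width_px: int, height_px: int, voice_text: str) -> str:
    """Bundle the current screen state and the voice information into one message."""
    request = MediaSearchRequest(
        screen_state=detect_screen_state(width_px, height_px),
        voice_text=voice_text,
    )
    return json.dumps(asdict(request), ensure_ascii=False)
```

Sending both fields in a single message is one possible design; the method only requires that the server receive the screen state information and the voice information before it searches.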

Further, to solve the above technical problem, a third aspect of the present application provides a server, including:

the receiving module is used for receiving current screen state information and voice information sent by the display equipment;

a media data obtaining module, configured to obtain matched media data based on the voice information and the current screen state information, wherein, for the same voice information, the matched media data differ between screen states;

and the sending module is used for sending the matched media data to the display equipment.

Finally, to solve the above technical problem, a fourth aspect of the present application provides a display device comprising:

a communicator for communicating with a server;

a display for displaying an image and a user interface, and a selector in the user interface for indicating that an item is selected;

a controller configured to:

The technical effects of the embodiments of the present application are described below:

In one embodiment, the media data presentation method provided by the present application includes the following steps:

Receiving current screen state information and voice information sent by the display device. In this step, the display device may be a smart television or any other display device used for video playback. The screen state may be a horizontal screen state or a vertical screen state, so the screen state information indicates which of these two states the screen is in.

Receiving voice information sent by the display device. In this step, the specific process may be as follows: the user inputs voice through a voice entry on a remote controller or a mobile phone; a corresponding voice program on the display device (for example, a "voice assistant" product) performs voice recognition; and the recognized voice information is then sent to the corresponding server.

Acquiring matched media data based on the voice information and the current screen state information, wherein, for the same voice information, the matched media data differ between screen states. Further, semantic understanding is performed on the voice information to obtain media data keyword information. In this step, the server receives the voice information sent by the display device and performs word segmentation on the voice text to obtain the media data keyword information. For example, if the voice recognition text acquired by the server is "I want to watch action movies", the text is segmented into the words "I", "want to watch", and "action movies"; combined with the "vertical screen" parameter, that is, the screen state information indicating the vertical screen state, it is understood as the media asset search condition "search" + "action movie type" + "vertical screen".

Acquiring matched media data based on the media data keyword information and the current screen state information. In this step, continuing the example above, the server obtains media data based on the media asset search condition "search" + "action movie type" + "vertical screen", that is, action movie media data for the vertical screen state.

Sending the matched media data to the display device. In this step, the display device receives the corresponding media data and displays the action movie media data in the vertical screen state; this media data is suitable for display in the vertical screen state.

In summary, the media data display method provided by the application can display the corresponding media data content based on the current screen state of the display device, so that the user experience can be effectively improved.
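The server-side steps described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the in-memory `CATALOG`, the `MediaItem` type, and the `GENRE_KEYWORDS` phrase table (a simple stand-in for real word segmentation and semantic understanding) are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MediaItem:
    title: str
    genre: str
    orientation: str  # screen state the item is suited to: "portrait" or "landscape"


# Hypothetical catalogue standing in for the server's media asset library.
CATALOG = [
    MediaItem("Action short A", "action", "portrait"),
    MediaItem("Action feature B", "action", "landscape"),
    MediaItem("Comedy short C", "comedy", "portrait"),
]

# Hypothetical phrase table; real semantic understanding would segment the text.
GENRE_KEYWORDS = {"action movies": "action", "comedies": "comedy"}


def extract_genre(voice_text: str) -> Optional[str]:
    """Return the media data keyword (here, a genre) found in the voice text."""
    text = voice_text.lower()
    for phrase, genre in GENRE_KEYWORDS.items():
        if phrase in text:
            return genre
    return None


def search_media(voice_text: str, screen_state: str,
                 catalog: List[MediaItem] = CATALOG) -> List[MediaItem]:
    """Apply the search condition: keyword + current screen state."""
    genre = extract_genre(voice_text)
    if genre is None:
        return []
    return [m for m in catalog
            if m.genre == genre and m.orientation == screen_state]
```

With this sketch, the same voice text "I want to watch action movies" returns different items for "portrait" and "landscape", which is the behaviour the method requires.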

Drawings

In order to more clearly illustrate the embodiments of the present application or the implementation manner in the related art, a brief description will be given below of the drawings required for the description of the embodiments or the related art, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art according to the drawings.

Fig. 1 is a schematic diagram of an operational scenario between a display device and a control apparatus according to some embodiments;

Fig. 2 is a block diagram of a hardware configuration of a display device 200 according to some embodiments;

Fig. 3 is a block diagram of a hardware configuration of the control device 100 according to some embodiments;

Fig. 4 is a schematic diagram of a software configuration in a display device 200 according to some embodiments;

Fig. 5 is a display diagram of an icon control interface of an application in the display device 200 according to some embodiments;

Fig. 6 is a logic flow diagram of a media data presentation method in an embodiment of the present application;

Fig. 7 is a logic flow diagram of a media data presentation method in another embodiment of the present application;

Fig. 8 is a system configuration diagram for a media data presentation method according to an embodiment of the present application;

Fig. 9 is a signaling timing diagram for a media data presentation method according to an embodiment of the present application;

Fig. 10 is a functional block diagram of a server in an embodiment of the present application.

Detailed Description

To make the objects, embodiments and advantages of the present application clearer, the exemplary embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is to be understood that the described exemplary embodiments are only some, and not all, of the embodiments of the present application.

All other embodiments, which can be derived by a person skilled in the art from the exemplary embodiments described herein without inventive step, are intended to be within the scope of the claims appended hereto. In addition, while the disclosure herein has been presented in terms of one or more exemplary examples, it should be appreciated that each aspect of the disclosure may also separately constitute a complete embodiment.

It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.

The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used to distinguish between similar objects or entities and, unless otherwise indicated, are not necessarily intended to imply a particular order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.

Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.

The term "module," as used herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.

The term "remote control" as used in this application refers to a component of an electronic device (such as the display device disclosed in this application) that can typically control the electronic device wirelessly over a relatively short distance. It typically connects to the electronic device using infrared and/or Radio Frequency (RF) signals and/or Bluetooth, and may also include functional modules such as WiFi, wireless USB, and motion sensors. For example, a hand-held touch remote controller replaces most of the physical built-in hard keys of a common remote control device with a user interface on a touch screen.

The term "gesture" as used in this application refers to a user's behavior through a change in hand shape or an action such as hand motion to convey a desired idea, action, purpose, or result.

Fig. 1 is a schematic diagram illustrating an operation scenario between a display device and a control apparatus according to an embodiment. As shown in fig. 1, a user may operate the display device 200 through the mobile terminal 300 and the control apparatus 100.

In some embodiments, the control apparatus 100 may be a remote controller. Communication between the remote controller and the display device includes infrared protocol communication, Bluetooth protocol communication, and other short-distance communication methods, and the display device 200 is controlled wirelessly or by wired means. The user may input user commands through keys on the remote controller, voice input, control panel input, and the like to control the display device 200. For example, the user can input corresponding control commands through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, power on/off key, etc. on the remote controller to control the display device 200.

In some embodiments, mobile terminals, tablets, computers, laptops, and other smart devices may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device. The application, through configuration, may provide the user with various controls in an intuitive User Interface (UI) on a screen associated with the smart device.

In some embodiments, a software application may be installed on both the mobile terminal 300 and the display device 200 to implement connection and communication through a network communication protocol, for the purpose of one-to-one control operation and data communication. For example, the mobile terminal 300 and the display device 200 can establish a control instruction protocol, synchronize a remote control keyboard to the mobile terminal 300, and control the display device 200 by operating a user interface on the mobile terminal 300. The audio and video content displayed on the mobile terminal 300 can also be transmitted to the display device 200 to realize a synchronous display function.

As also shown in fig. 1, the display device 200 also performs data communication with the server 400 through various communication means. The display device 200 may be communicatively connected through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display device 200. Illustratively, the display device 200 sends and receives information to receive software program updates, access a remotely stored digital media library, and perform Electronic Program Guide (EPG) interactions. The server 400 may be one cluster or a plurality of clusters, and may include one or more types of servers. Other web service contents, such as video on demand and advertisement services, are also provided through the server 400.

The display device 200 may be a liquid crystal display, an OLED display, or a projection display device. The particular display device type, size, resolution, etc. are not limiting, and those skilled in the art will appreciate that the display device 200 may be modified in performance and configuration as desired.

In addition to the broadcast receiving television function, the display device 200 may provide smart network television functions with computer support, including, but not limited to, network TV, smart TV, Internet Protocol TV (IPTV), and the like.

A hardware configuration block diagram of a display device 200 according to an exemplary embodiment is exemplarily shown in fig. 2.

In some embodiments, the display device 200 includes at least one of the controller 250, the tuner demodulator 210, the communicator 220, the detector 230, the input/output interface 255, the display 275, the audio output interface 285, the memory 260, the power supply 290, the user interface 265, and the external device interface 240.

In some embodiments, the display 275 receives image signals output by the first processor and displays video content, images, and components of the menu manipulation interface.

In some embodiments, the detector 230 may further include an image collector, such as a camera, which may be used to collect external environment scenes and to collect attributes of the user or gestures used to interact with the user, so that display parameters can be changed adaptively and user gestures can be recognized to implement interaction with the user.

In some embodiments, the detector 230 may also include a temperature sensor or the like, for example to sense the ambient temperature.

In some embodiments, the display device 200 may adaptively adjust the display color temperature of an image. For example, the display device 200 may be adjusted to display a cool tone in a high-temperature environment, or to display a warm tone in a low-temperature environment.
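As an illustration of this adaptive adjustment, the following minimal sketch maps a sensed ambient temperature to a display tone. The threshold values and the function name `choose_color_tone` are hypothetical; the application does not specify when an environment counts as high-temperature or low-temperature.

```python
def choose_color_tone(ambient_celsius: float,
                      low_threshold: float = 18.0,
                      high_threshold: float = 28.0) -> str:
    """Pick a display tone from the ambient temperature.

    The thresholds are assumed values for illustration only.
    """
    if ambient_celsius >= high_threshold:
        return "cool"     # high-temperature environment: cool tone
    if ambient_celsius <= low_threshold:
        return "warm"     # low-temperature environment: warm tone
    return "neutral"      # otherwise leave the tone unchanged
```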

In some embodiments, the detector 230 may also include a sound collector, such as a microphone, which may be used to receive the user's voice, for example a voice signal containing a control instruction from the user for controlling the display device 200, or to collect ambient sound for recognizing the type of ambient scene so that the display device 200 can adapt to the ambient noise.

In some embodiments, as shown in fig. 2, the input/output interface 255 is configured to allow data transfer between the controller 250 and other external devices or other controllers 250, for example receiving video signal data and audio signal data, or command instruction data, from an external device.

In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of a High-Definition Multimedia Interface (HDMI), an analog or digital high-definition component input interface, a composite video input interface, a USB input interface, an RGB port, and the like. A plurality of these interfaces may form a composite input/output interface.

In some embodiments, as shown in fig. 2, the tuner demodulator 210 is configured to receive broadcast television signals in a wired or wireless manner, perform modulation and demodulation processing such as amplification, mixing, and resonance, and demodulate audio and video signals from a plurality of wireless or wired broadcast television signals. The audio and video signals may include the television audio and video signals carried on the television channel frequency selected by the user, as well as EPG data signals.

In some embodiments, the frequency points demodulated by the tuner demodulator 210 are controlled by the controller 250. The controller 250 can send out control signals according to the user's selection, so that the tuner demodulator 210 responds to the television signal frequency selected by the user and modulates and demodulates the television signal carried on that frequency.

As shown in fig. 2, the controller 250 includes at least one of a Random Access Memory 251 (RAM), a Read-Only Memory 252 (ROM), a video processor 270, an audio processor 280, other processors 253 (e.g., a Graphics Processing Unit (GPU)), a Central Processing Unit 254 (CPU), a communication interface, and a communication bus 256 that connects the respective components.

In some embodiments, RAM 251 is used to store temporary data for the operating system or other programs that are running.

In some embodiments, ROM 252 is used to store instructions for various system boots.

In some embodiments, the ROM 252 is used to store a Basic Input Output System (BIOS). The BIOS completes the power-on self-test of the system, initializes each functional module in the system, provides the drivers for basic input/output of the system, and boots the operating system.

In some embodiments, the video processor 270 is configured to receive an external video signal and perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to the standard codec protocol of the input signal, so as to obtain a signal that can be displayed or played directly on the display device 200.

In some embodiments, the graphics processor 253 and the video processor 270 may be integrated or separately configured. When they are integrated, they may jointly process the graphics signals output to the display; when they are separately configured, they may perform different functions, for example in a GPU + FRC (Frame Rate Conversion) architecture.

In some embodiments, the audio processor 280 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processes to obtain an audio signal that can be played in a speaker.

In some embodiments, video processor 270 may comprise one or more chips. The audio processor may also comprise one or more chips.

In some embodiments, the video processor 270 and the audio processor 280 may be separate chips or may be integrated together with the controller in one or more chips.

In some embodiments, the audio output receives, under the control of the controller 250, the sound signals output by the audio processor 280. The audio output may be, for example, the speaker 286 carried by the display device 200 itself, or an external sound output terminal that outputs to a sound-generating device of an external device, such as an external sound interface or an earphone interface. It may also include a near field communication module in the communication interface, for example a Bluetooth module for outputting sound through a Bluetooth speaker.

The power supply 290 supplies the display device 200 with power input from an external power source, under the control of the controller 250. The power supply 290 may include a built-in power supply circuit installed inside the display device 200, or may be a power supply installed outside the display device 200, in which case the display device 200 provides a power interface for the external power supply.

The user interface 265 is used to receive an input signal from the user and transmit the received user input signal to the controller 250. The user input signal may be a remote controller signal received through an infrared receiver, and various user control signals may also be received through the network communication module.

In some embodiments, the user inputs a user command through the control apparatus 100 or the mobile terminal 300; the user input interface receives the user input, and the display device 200 responds to it through the controller 250.

In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display 275, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.

In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.

Fig. 3 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 3, the control apparatus 100 includes a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply source.

The control device 100 is configured to control the display device 200. It may receive an operation instruction input by the user and convert it into an instruction that the display device 200 can recognize and respond to, serving as an interaction intermediary between the user and the display device 200. For example, when the user operates the channel up/down keys on the control device 100, the display device 200 responds to the channel up/down operation.

In some embodiments, the control device 100 may be a smart device. Such as: the control apparatus 100 may install various applications that control the display apparatus 200 according to user demands.

In some embodiments, as shown in fig. 1, a mobile terminal 300 or other intelligent electronic device may function similarly to the control device 100 after installing an application that manipulates the display device 200. For example, by installing an application, the user may use the function keys or virtual buttons of a graphical user interface provided on the mobile terminal 300 or other intelligent electronic device to implement the functions of the physical keys of the control device 100.

The controller 110 includes a processor 112 and RAM 113 and ROM 114, a communication interface 130, and a communication bus. The controller is used to control the operation of the control device 100, as well as the communication cooperation between the internal components and the external and internal data processing functions.

The communication interface 130 enables communication of control signals and data signals with the display apparatus 200 under the control of the controller 110. Such as: the received user input signal is transmitted to the display apparatus 200. The communication interface 130 may include at least one of a WiFi chip 131, a bluetooth module 132, an NFC module 133, and other near field communication modules.

The control device 100 further includes a user input/output interface 140, in which the input interface includes at least one of a microphone 141, a touch pad 142, a sensor 143, keys 144, and other input interfaces. For example, the user can input user instructions through actions such as voice, touch, gesture, and pressing; the input interface converts the received analog signal into a digital signal, converts the digital signal into a corresponding instruction signal, and sends the instruction signal to the display device 200.

The output interface includes an interface that transmits the received user instruction to the display device 200. In some embodiments, the interface may be an infrared interface or a radio frequency interface. For example, when the infrared signal interface is used, the user input instruction needs to be converted into an infrared control signal according to an infrared control protocol and sent to the display device 200 through the infrared sending module. As another example, when the radio frequency signal interface is used, the user input instruction needs to be converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then transmitted to the display device 200 through the radio frequency transmitting terminal.

In some embodiments, the control device 100 includes at least one of the communication interface 130 and the input/output interface 140. When the control device 100 is provided with the communication interface 130, for example WiFi, Bluetooth, or NFC modules, the user input instruction may be encoded according to the WiFi protocol, the Bluetooth protocol, or the NFC protocol and then transmitted to the display device 200.

The memory 190 is used to store, under the control of the controller, various operation programs, data and applications for driving and controlling the control apparatus 100. The memory 190 may also store various control signal commands input by a user.

The power supply 180 is used to provide operational power support to the various elements of the control device 100 under the control of the controller. It may include a battery and associated control circuitry.

In some embodiments, the system may include a kernel, a command parser (shell), a file system, and application programs. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel starts, activates kernel space, abstracts hardware, initializes hardware parameters, and runs and maintains virtual memory, the scheduler, signals, and inter-process communication (IPC). After the kernel has started, the shell and user application programs are loaded. An application program is compiled into machine code after being started, forming a process.

Referring to fig. 4, in some embodiments, the system is divided into four layers, which are an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer from top to bottom.

In some embodiments, at least one application program runs in the application program layer, and the application programs can be Window (Window) programs carried by an operating system, system setting programs, clock programs, camera applications and the like; or may be an application developed by a third party developer such as a hi program, a karaoke program, a magic mirror program, or the like. In specific implementation, the application packages in the application layer are not limited to the above examples, and may actually include other application packages, which is not limited in this embodiment of the present application.

The framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions. The application framework layer acts as a processing center that decides to let the applications in the application layer act. The application program can access the resources in the system and obtain the services of the system in execution through the API interface.

As shown in fig. 4, in the embodiment of the present application, the application framework layer includes managers (Managers), a content provider (Content Provider), and the like, where the managers include at least one of the following modules: an activity manager (ActivityManager) for interacting with all activities running in the system; a location manager (Location Manager) for providing system services or applications with access to the system location service; a package manager (Package Manager) for retrieving various information related to the application packages currently installed on the device; a notification manager (Notification Manager) for controlling the display and clearing of notification messages; and a window manager (Window Manager) for managing the icons, windows, toolbars, wallpapers, and desktop components on a user interface.

In some embodiments, the system runtime library layer provides support for the layer above it, namely the framework layer. When the framework layer is used, the Android operating system runs the C/C++ libraries included in the system runtime library layer to implement the functions required by the framework layer.

In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer includes at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor drivers (such as fingerprint sensor, temperature sensor, touch sensor, pressure sensor, etc.), and so on.

In some embodiments, the kernel layer further comprises a power driver module for power management.

In some embodiments, software programs and/or modules corresponding to the software architecture of fig. 4 are stored in the first memory or the second memory shown in fig. 2 or 3.

In some embodiments, as shown in fig. 5, the application layer containing at least one application may display corresponding icon controls on the display, such as a live television application icon control, a video on demand application icon control, a media center application icon control, an application center icon control, a game application icon control, and the like.

In some embodiments, the live television application may provide live television via different signal sources. For example, a live television application may provide television signals using input from cable television, radio broadcasts, satellite services, or other types of live television services. And, the live television application may display video of the live television signal on the display device 200.

In some embodiments, a video on demand application may provide video from different storage sources. Unlike the live television application, video on demand provides a video display from a storage source. For example, the video on demand may come from the server side of cloud storage or from local hard disk storage containing stored video programs.

In some embodiments, the media center application may provide various applications for multimedia content playback. For example, the media center may provide services other than live television or video on demand, through which a user can access various images or audio via the media center application.

In some embodiments, an application center may provide storage for various applications. An application may be a game or some other application associated with a computer system or other device that can run on the smart television. The application center may obtain these applications from different sources and store them in local storage, after which they can run on the display device 200.

Referring to fig. 6, fig. 6 is a logic flow diagram of a media data presentation method according to an embodiment of the present application.

Step S101, receiving current screen state information and voice information sent by the display device;

In this step, it should be noted that the display device may be a smart television or any other display device used for video playback. The screen state may be a horizontal screen state or a vertical screen state, so the screen state information indicates whether the screen is currently in the horizontal screen state or the vertical screen state.

In this step, the specific process may be: the user inputs voice through a voice entry on a remote controller or a mobile phone, the corresponding voice program on the display device performs voice recognition (for example, using the product "voice assistant"), and the recognized voice information is then sent to the corresponding server.

Step S102, acquiring matched media data based on the voice information and the current screen state information, wherein the matched media data are different under different screen states corresponding to the same voice information;

Further, in this step, the server receives the voice information sent by the display device and performs word segmentation and understanding on it, thereby obtaining the media data keyword information.

For example, the voice recognition text acquired by the server is "I want to see action piece". The text is segmented into the words "I", "want to see", and "action piece", and, combined with the "vertical screen" parameter (that is, the screen state information indicating the vertical screen), it is understood as the media asset search condition "search" + "action piece type" + "vertical screen".

In this step, as exemplified above, the server obtains media data based on the media asset search condition "search" + "action piece type" + "vertical screen", that is, action piece media data for the vertical screen state.

Step S103, sending the matched media data to the display device.

In this step, the display device receives the corresponding media data and then displays the action piece media data in the vertical screen state; this media data is suitable for display in the vertical screen state.
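Steps S101 to S103 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration only: the in-memory `MEDIA_LIBRARY`, the `KNOWN_GENRES` vocabulary, the `Request` type, and the substring lookup that stands in for real word segmentation are not part of the application.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory media libraries, one per screen state.
MEDIA_LIBRARY = {
    "vertical": [
        {"title": "Vertical Action A", "genre": "action piece"},
        {"title": "Vertical Cartoon B", "genre": "cartoon"},
    ],
    "horizontal": [
        {"title": "Horizontal Action C", "genre": "action piece"},
    ],
}

# Hypothetical genre vocabulary used to pick a keyword out of the text.
KNOWN_GENRES = ("action piece", "cartoon")


@dataclass
class Request:
    screen_state: str  # "vertical" or "horizontal", received in S101
    voice_text: str    # recognized voice text, received in S101


def extract_keyword(voice_text: str) -> Optional[str]:
    """Stand-in for word segmentation: return the first known genre found."""
    text = voice_text.lower()
    for genre in KNOWN_GENRES:
        if genre in text:
            return genre
    return None


def handle_request(req: Request) -> list:
    """S102: match media data on keyword plus screen state.

    The returned list is what S103 would send back to the display device.
    """
    keyword = extract_keyword(req.voice_text)
    if keyword is None:
        return []
    library = MEDIA_LIBRARY.get(req.screen_state, [])
    return [item for item in library if item["genre"] == keyword]
```

Calling `handle_request` twice with the same text but different screen states returns different lists, which is the behavior step S102 describes.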

In some embodiments, further designs may be made. For example, in step S102, the obtaining matched media data based on the voice information and the current screen state information includes:

performing semantic understanding on the voice information according to the current screen state information to obtain media data keyword information, wherein the obtained media data keyword information is different in different screen states corresponding to the same voice information;

and acquiring matched media data based on the media data keyword information and the current screen state information.

This is illustrated by way of example:

For example, in the vertical screen state, the user says "I want to see a fire fighter". Because the screen is in the vertical screen state, semantic understanding is performed on the voice information based on that state, the obtained media data keyword information is "fire fighter + cartoon", and the cartoon media data is displayed to the user.
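A minimal sketch of screen-state-dependent semantic understanding might look as follows. The lookup table and the `understand` function are hypothetical; only the vertical screen row reflects the example above, and the horizontal screen row is an assumption added for contrast.

```python
# Hypothetical disambiguation table: (title, screen state) -> content type.
# Only the vertical screen row comes from the example in the text; the
# horizontal screen row is an assumption added to show the contrast.
CONTENT_TYPE_BY_STATE = {
    ("fire fighter", "vertical"): "cartoon",
    ("fire fighter", "horizontal"): "film and television series",
}


def understand(voice_text: str, screen_state: str) -> list:
    """Return keyword information for the utterance, refined by screen state."""
    text = voice_text.lower()
    for (title, state), content_type in CONTENT_TYPE_BY_STATE.items():
        if title in text and state == screen_state:
            return [title, content_type]
    return []
```

The same utterance therefore yields different keyword information depending on the screen state passed in.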

Further designs may be made in some embodiments. For example, in step S102, the obtaining matched media data based on the voice information and the current screen state information includes:

performing semantic understanding on the voice information to obtain media data keyword information;

This is illustrated by way of example:

For example, in the horizontal screen state, the user says "I want to see a movie breaker". Because the screen is in the horizontal screen state, semantic understanding is performed on the voice information, the obtained media data keyword information is "movie and television series of the movie and television series", and the film and television series media data is shown to the user.

In some embodiments, further designs may be made. For example, in the above steps, semantic understanding is performed on the voice information to obtain media data keyword information; this step may include:

performing semantic understanding on the voice information to obtain media data keyword information and user-specified screen state information;

That is, the user also specifies screen state information when inputting the voice. For example, in the aforementioned "I want to see action piece" example, when the user inputs voice through the voice entry of the remote controller, the user also speaks "vertical screen state" in addition to "I want to see action piece". The input speech of the user may thus be "I want to see action piece, vertical screen state".

In the process, the design of the screen state information specified by the user is further added, so that the user experience can be further improved.

In the above steps, based on the media data keyword information and the current screen state information, obtaining matched media data; the method comprises the following steps:

and acquiring matched media data based on the media data keyword information and the screen state information specified by the user.

In this process, the current screen state information may or may not be consistent with the screen state information specified by the user. When the two are inconsistent, the server searches according to the screen state information specified by the user and the keyword information. Therefore, there is a step of determining whether the user has specified screen state information: if not, the corresponding media data is searched according to the current screen state information; if so, the corresponding media data is searched according to the screen state information specified by the user.
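The judging step described above reduces to a small decision function. The sketch below is illustrative only; the function name `resolve_screen_state` and the string values are assumptions.

```python
from typing import Optional


def resolve_screen_state(current_state: str,
                         specified_state: Optional[str] = None) -> str:
    """Pick the screen state the server should search with.

    A user-specified state takes precedence; otherwise the current
    screen state reported by the display device is used.
    """
    if specified_state is not None:
        return specified_state
    return current_state
```

For instance, a television currently in the horizontal screen state whose user says "vertical screen state" would be searched with `"vertical"`.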

In some embodiments, further designs may be made. For example, the server includes a semantic server and a media asset server; that is, in the above embodiments, the server is a cluster concept and may be a single server or a collection of multiple servers. In this embodiment, the server includes a semantic server and a media asset server, and on this basis,

the semantic server receives the current screen state information and the voice information; that is, the semantic server performs semantic understanding, namely semantic word segmentation. As described above, the speech recognition text acquired by the semantic server is "I want to see action piece", and the text is segmented into "I", "want to see", and "action piece".

The semantic understanding of the voice information to obtain media data keyword information includes:

the semantic server carries out semantic understanding on the voice information to obtain media data keyword information;

that is, the semantic server performs semantic understanding, namely semantic word segmentation. As described above, the speech recognition text acquired by the semantic server is "I want to see action piece", and the text is segmented into "I", "want to see", and "action piece". In this example, "action piece" is the media data keyword information.

The obtaining of matched media data based on the media data keyword information and the current screen state information includes:

the media resource server receives the media data keyword information and the current screen state information sent by the semantic server, and acquires the matched media data; that is, in the above example, the semantic server sends the keyword information "action piece" and the current screen state information or the screen state information specified by the user to the media asset server, and then the media asset server performs the "action piece" search in the corresponding "vertical screen media asset library" or "horizontal screen media asset library" according to the keyword information "action piece" and obtains the corresponding "action piece" media data.

The sending the matched media data to the display device includes:

the media resource server sends the matched media data to the semantic server; in this step, the media resource server sends the retrieved media data to the semantic server, and the semantic server then sends it to the corresponding display device.

And the semantic server sends the matched media data to the display equipment.
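The division of work between the semantic server and the media asset server can be sketched as two cooperating classes. This is a minimal illustration under stated assumptions: the class names, the per-screen-state dictionary of libraries, and the substring-based keyword extraction are all hypothetical.

```python
from typing import Optional


class MediaAssetServer:
    """Holds one media asset library per screen state (hypothetical layout)."""

    def __init__(self, libraries: dict):
        self.libraries = libraries

    def search(self, keyword: str, screen_state: str) -> list:
        # Search only the library that matches the requested screen state.
        library = self.libraries.get(screen_state, [])
        return [item for item in library if item["genre"] == keyword]


class SemanticServer:
    """Receives screen state and voice text, extracts a keyword, relays results."""

    def __init__(self, media_server: MediaAssetServer, known_keywords: tuple):
        self.media_server = media_server
        self.known_keywords = known_keywords

    def extract_keyword(self, voice_text: str) -> Optional[str]:
        # Stand-in for semantic word segmentation.
        text = voice_text.lower()
        for keyword in self.known_keywords:
            if keyword in text:
                return keyword
        return None

    def handle(self, voice_text: str, screen_state: str) -> list:
        keyword = self.extract_keyword(voice_text)
        if keyword is None:
            return []
        # The media asset server returns matches to the semantic server,
        # which forwards them to the display device (here, the return value).
        return self.media_server.search(keyword, screen_state)


media = MediaAssetServer({
    "vertical": [{"title": "V1", "genre": "action piece"}],
    "horizontal": [{"title": "H1", "genre": "action piece"}],
})
semantic = SemanticServer(media, ("action piece",))
result = semantic.handle("I want to see action piece", "vertical")
```

Routing the reply back through the semantic server, rather than directly to the display device, mirrors the order of sending described in the text.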

In some embodiments, additional designs may also be made.

In addition, the present application provides another embodiment, please refer to fig. 7, and fig. 7 is a logic flow diagram of a media data presentation method according to another embodiment of the present application.

As shown in fig. 7, in this embodiment, the media data presentation method for a display device includes the following steps:

Step S201, acquiring current screen state information of the display device, and recognizing and acquiring voice information based on a voice operation of the user;

Step S202, sending the current screen state information and the voice information to a server so that the server can obtain matched media data based on the voice information and the current screen state information, wherein the matched media data in different screen states are different corresponding to the same voice information;

In this step, further, the server receives the voice information sent by the display device and performs word segmentation and understanding on the voice text, thereby obtaining the media data keyword information. For example, the voice recognition text acquired by the server is "I want to see action piece". The text is segmented into the words "I", "want to see", and "action piece", and, combined with the "vertical screen" parameter (that is, the screen state information indicating the vertical screen), it is understood as the media asset search condition "search" + "action piece type" + "vertical screen".

Step S203, receiving the matched media data sent by the server and displaying the matched media data in the current screen state.

In this step, as exemplified above, the server obtains media data based on the media asset search condition "search" + "action piece type" + "vertical screen", that is, action piece media data for the vertical screen state. The display device receives the corresponding media data and then displays the action piece media data in the vertical screen state; this media data is suitable for display in the vertical screen state.

In some embodiments, further designs may be made. For example, in step S202, the "so that the server obtains the matched media data based on the voice information and the current screen state information" includes:

Further designs may be made in some embodiments. For example, in step S202, the "so that the server obtains the matched media data based on the voice information and the current screen state information" includes:

so that the server can carry out semantic understanding on the voice information to obtain media data keyword information;

and the server obtains matched media data based on the media data keyword information and the current screen state information.

In some embodiments, further designs may be made. For example, in the above step, the obtaining, by the server, media data keyword information based on semantic understanding of the voice information includes:

so that the server obtains media data keyword information and user-specified screen state information based on semantic understanding of the voice information;

The step of enabling the server to obtain matched media data based on the media data keyword information and the current screen state information includes:

the server obtains matched media data based on the media data keyword information and the screen state information specified by the user;

The receiving the matched media data sent by the server and displaying the matched media data in a current screen state comprises the following steps:

and receiving the matched media data sent by the server, and displaying the matched media data in a screen state appointed by a user.

In this process, if the screen state specified by the user is not consistent with the current screen state, the current screen state needs to be changed to the screen state specified by the user, and then the media data content is displayed.
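On the display device side, the rule "rotate first if the specified state differs, then display" can be sketched as below. The `DisplayDevice` class, its `log` list, and the method names are hypothetical and only illustrate the ordering of the two actions.

```python
from typing import Optional


class DisplayDevice:
    """Toy model of a rotatable display (hypothetical API)."""

    def __init__(self, screen_state: str):
        self.screen_state = screen_state
        self.log = []  # records actions in the order they happen

    def rotate_to(self, state: str) -> None:
        self.log.append(f"rotate:{self.screen_state}->{state}")
        self.screen_state = state

    def show(self, media: list, specified_state: Optional[str] = None) -> None:
        # Use the user-specified state if there is one, else stay as we are.
        target = specified_state or self.screen_state
        if target != self.screen_state:
            # States differ: change the screen state before displaying.
            self.rotate_to(target)
        self.log.append(f"display:{len(media)} items in {self.screen_state}")


device = DisplayDevice("horizontal")
device.show(["item 1", "item 2"], specified_state="vertical")
```

After the call, `device.log` records a rotation followed by a display, whereas a device already in the requested state would record only the display.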

In some embodiments, further improvements may be made. For example, in the above step, the current screen state information and the voice information are sent to a server, so that the server obtains media data keyword information based on semantic understanding of the voice information; and the server obtains matched media data based on the media data keyword information and the current screen state information, and the method comprises the following steps:

sending the current screen state information and the voice information to a semantic server so that the semantic server can obtain media data keyword information based on semantic understanding of the voice information; the media resource server obtains matched media data based on the media data keyword information and the current screen state information and sends the matched media data to the semantic server;

and receiving the matched media data sent by the semantic server and displaying the matched media data in the current screen state.

In the above step, the semantic server receives the current screen state information and the voice information; that is, the semantic server is used for performing semantic understanding, that is, performing semantic word segmentation, as described above, the speech recognition text acquired by the semantic server is "i want to see action piece", and the text is segmented into "i", "want to see" and "action piece". The media resource server receives the media data keyword information and the current screen state information sent by the semantic server, and acquires the matched media data; that is, in the above example, the semantic server sends the keyword information "action piece" and the current screen state information or the screen state information specified by the user to the media asset server, and then the media asset server performs the "action piece" search in the corresponding "vertical screen media asset library" or "horizontal screen media asset library" according to the keyword information "action piece" and obtains the corresponding "action piece" media data.

In conjunction with any of the above-described embodiments, the system architecture diagram of the present application may be introduced, and specifically, please refer to fig. 8, where fig. 8 is a system scheme architecture diagram of a media data presentation method according to an embodiment of the present application.

In this embodiment, as shown in fig. 8, the system architecture includes a complete machine platform, a voice terminal, a semantic cloud platform, and a media asset management platform, which are respectively introduced as follows:

The complete machine platform is the display device, such as a smart television. It comprises two modules: a sensor position sensing module and a motor rotation module. The sensor position sensing module senses the current state of the screen, such as the vertical screen state or the horizontal screen state. The motor rotation module changes the screen state, for example by rotating the screen from the horizontal screen state to the vertical screen state.

The voice terminal, as introduced above, is a voice app on the display device, such as the product "voice assistant". It comprises three modules: a voice recognition module, a screen state storage module, and a media asset list display module. The voice recognition module recognizes the input voice of the user; the screen state storage module stores whether the screen is in the vertical screen state or the horizontal screen state; the media asset list display module displays a media asset list, using a horizontal screen layout in the horizontal screen state and a vertical screen layout in the vertical screen state.

The semantic cloud platform is the semantic server described above and comprises two modules: a semantic understanding module, which performs semantic understanding as in the examples of the above embodiments, and a search condition generation module, which generates search conditions for horizontal screen media assets and for vertical screen media assets.

The media asset management platform, namely the media asset server in the above, includes a media asset search module, which is used for performing corresponding media asset search in the horizontal screen media asset database and the vertical screen media asset database according to the search condition.

With reference to fig. 8 and the above description, the operation of the system is performed as follows:

1. The voice terminal is responsible for acquiring the screen state and reporting the user's voice recognition text and the screen state to the semantic background.

2. The semantic cloud platform understands, based on the user's text and the screen state, the user's intention to search for media assets in a specified horizontal screen state, vertical screen state, or any screen state, and generates a search condition with which it queries the media asset management platform.

3. The media asset management platform retrieves the media asset data supporting the current screen state based on the search condition and returns it to the semantic cloud platform.

4. The semantic cloud platform sends the retrieved media asset data to the voice terminal.

5. The media asset data is displayed on the voice terminal interface. If the media asset selected by the user is inconsistent with the screen state, the complete machine is notified to rotate before the playing interface is started; if it is consistent, the playing interface is started directly.

The further detailed process is described as follows:

1. The voice terminal monitors the screen state of the television and stores the screen state information (horizontal screen or vertical screen).

2. After the user's speech is recognized as text, the screen state is additionally reported to the semantic cloud platform.

3. The semantic background understands the user's intention and, if a screen state is specified, searches for media asset data corresponding to that screen state.

The parameter upload by the voice terminal and the processing by the semantic background proceed as follows:

1) The voice terminal uploads the voice recognition content and the screen state to the semantic background.

2) The semantic background acquires the recognized content and performs word segmentation.

3) Based on the word segmentation result, the semantic background matches the words against word banks of screen states such as horizontal screen and vertical screen, extracts the screen state, and extracts the media asset search conditions through semantic analysis.

4) The semantic background sends the screen state and the media asset search conditions to the media asset management platform.

5) The media asset management platform queries the media asset library corresponding to the screen state, retrieves the media asset data, and returns it to the semantic background.

6) The semantic background issues the media asset data to the terminal, and a data list is displayed.

4. If the user does not specify the screen state, the media asset data is searched for the screen state of the user.

1) The semantic background performs word segmentation and semantic analysis on the recognized text; if no screen state is matched, it extracts only the media asset search conditions.

2) The semantic background sends the media asset search conditions to the media asset management platform.

3) The media asset management platform queries the media asset libraries of the horizontal screen and the vertical screen based on the search conditions, retrieves the media asset data, and returns it to the semantic background.

4) The semantic background issues the media asset data to the terminal, and a data list is displayed.

5. After the semantic cloud platform acquires the media asset data, it issues the data to the voice terminal UI for display.

6. The user selects a video to play. If the video is consistent with the screen state, playback starts directly; if it is inconsistent, the television is rotated first and playback then starts.
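The word bank matching and library selection in the detailed process above can be sketched as follows. The word bank contents, the stop-word set, and both function names are assumptions for illustration; the input is taken to be an already segmented word list. When no screen state is matched, the sketch queries both libraries, following sub-step 3) of item 4.

```python
# Hypothetical word bank mapping screen-state phrases to screen states.
SCREEN_WORD_BANK = {
    "vertical screen": "vertical",
    "horizontal screen": "horizontal",
}

# Hypothetical words that carry no search meaning after segmentation.
STOP_WORDS = {"i", "want to see"}


def build_search_condition(tokens: list) -> dict:
    """Split segmented words into a screen state and search keywords."""
    screen_state = None
    keywords = []
    for token in tokens:
        word = token.lower()
        if word in SCREEN_WORD_BANK:
            screen_state = SCREEN_WORD_BANK[word]
        elif word not in STOP_WORDS:
            keywords.append(word)
    return {"keywords": keywords, "screen_state": screen_state}


def libraries_to_query(condition: dict) -> list:
    """Choose which media asset libraries the management platform queries."""
    if condition["screen_state"] is not None:
        return [condition["screen_state"]]
    # No screen state matched: query both libraries.
    return ["horizontal", "vertical"]
```

A word list such as `["I", "want to see", "action piece", "vertical screen"]` produces a condition restricted to the vertical library, while the same list without the last word leaves the screen state unset.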

Referring to fig. 9, fig. 9 is a signaling timing diagram of a media data presentation method according to an embodiment of the present application, which introduces a signaling timing sequence in the above system architecture scheme:

1. With the television in the vertical screen state, the user records speech using the voice remote controller, and the voice terminal obtains the recognition text "I want to see action piece" through the voice recognition engine.

2. The voice terminal determines that the current screen state of the television is the vertical screen state and sends the screen state and the recognition text to the semantic cloud platform.

3. The semantic cloud platform acquires the recognition text "I want to see action piece", segments it into "I", "want to see", and "action piece", and, combined with the "vertical screen" parameter, understands it as the media asset search condition "search" + "action piece type" + "vertical screen".

4. The semantic cloud platform sends the media asset search condition to the media asset management platform, which queries the media asset library for media asset data matching "action piece type" + "vertical screen".

5. The media asset management platform sends the retrieved media asset data to the semantic cloud platform, which forwards it to the voice terminal.

6. The voice terminal UI, in the vertical screen state, displays the vertical screen action piece media assets.
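The order of messages in this signaling sequence can be written down as a simple trace. The sketch below is an assumption-laden illustration: the tuple layout, the payload fields, and the function name `signaling_trace` are invented here, and the placeholder string "media asset data" stands in for the actual result list.

```python
def signaling_trace(recognized_text: str, screen_state: str, keyword: str) -> list:
    """Return the ordered (sender, receiver, payload) messages of the flow."""
    # Search condition built by the semantic cloud platform (hypothetical fields).
    condition = {"intent": "search", "type": keyword, "screen": screen_state}
    return [
        ("voice terminal", "semantic cloud platform",
         {"text": recognized_text, "screen": screen_state}),
        ("semantic cloud platform", "media asset management platform", condition),
        ("media asset management platform", "semantic cloud platform",
         "media asset data"),
        ("semantic cloud platform", "voice terminal", "media asset data"),
    ]


trace = signaling_trace("I want to see action piece", "vertical", "action piece")
```

Each entry corresponds to one arrow between two parties, in the order the steps above list them.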

In addition, the present application also provides an embodiment of an apparatus corresponding to the method of the server side, please refer to fig. 10, where fig. 10 is a functional block diagram of a server in an embodiment of the present application.

In one embodiment, the present application provides a server, including:

the receiving module 301 is configured to receive current screen state information sent by the display device and receive voice information sent by the display device;

a media data obtaining module 302, configured to obtain matched media data based on the voice information and the current screen state information, where the matched media data in different screen states are different corresponding to the same voice information;

a sending module 303, configured to send the matched media data to the display device.

Further, the server comprises a semantic server and a media resource server; the receiving module may include a first receiving module and a second receiving module;

the semantic server comprises the first receiving module, the second receiving module and the semantic understanding module;

the media resource server comprises the media data acquisition module;

the sending module comprises a first sending module and a second sending module, and the media asset server comprises a first sending module used for sending the matched media data to the semantic server; the semantic server comprises a second sending module, which is used for sending the received matched media data to the display equipment.

The working process and technical effects in the above embodiments are the same as those in the above method embodiments, and therefore are not described herein again.

Furthermore, the present application also provides an apparatus embodiment corresponding to the method of the display device side, where the display device includes:

a communicator for communicating with a server;

a controller configured to:

Further, the controller is configured to:

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A media data display method is used for a server, and is characterized in that the media data display method comprises the following steps:

and sending the matched media data to the display equipment.

2. The method as claimed in claim 1, wherein said obtaining matched media data based on said voice information and said current screen status information comprises:

3. The method as claimed in claim 1, wherein said obtaining matched media data based on said voice information and said current screen status information comprises:

4. A method of media data presentation according to claim 3,

the step of performing semantic understanding on the voice information to obtain media data keyword information comprises the following steps:

the obtaining matched media data based on the media data keyword information and the current screen state information includes:

5. The media data presentation method of claim 3, wherein the server comprises a semantic server and a media asset server;

the semantic server receives the current screen state information and the voice information;

the media resource server receives the media data keyword information and the current screen state information sent by the semantic server, and acquires the matched media data;

the "transmitting the matched media data to the display device" includes:

the media resource server sends the matched media data to the semantic server;

and the semantic server sends the matched media data to the display equipment.

6. A media data display method is used for display equipment, and is characterized in that the media data display method comprises the following steps:

7. The method as claimed in claim 6, wherein said "for said server to obtain matching media data based on said voice information and said current screen status information" comprises:

8. The method as claimed in claim 6, wherein said "for said server to obtain matching media data based on said voice information and said current screen status information" comprises:

9. The method as claimed in claim 8, wherein said "for the server to semantically understand the voice message, and obtaining the media data keyword information" includes:

so that the server can carry out semantic understanding on the voice information to obtain media data keyword information and user-specified screen state information;

the step of receiving the matched media data sent by the server and displaying the matched media data in the current screen state comprises the following steps:

10. The method of claim 8, wherein the media data is presented,

the "transmitting the current screen state information and the voice information to a server" includes:

sending the current screen state information and the voice information to a semantic server;

the step of "so that the server obtains the media data keyword information based on the semantic understanding of the voice text information" includes:

so that the semantic server obtains media data keyword information based on semantic understanding of the voice text information;

the media resource server obtains matched media data based on the media data keyword information and the current screen state information and sends the matched media data to the semantic server;

11. A server, comprising:

12. A display device, comprising:

a communicator for communicating with a server;

a controller configured to: