CN112988099A - Video display method and device - Google Patents

Video display method and device

Info

Publication number
CN112988099A
Authority
CN
China
Prior art keywords
sentence
picture
audio
text
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110382921.9A
Other languages
Chinese (zh)
Inventor
胡其斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhangmen Science and Technology Co Ltd
Original Assignee
Shanghai Zhangmen Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhangmen Science and Technology Co Ltd filed Critical Shanghai Zhangmen Science and Technology Co Ltd
Priority to CN202110382921.9A priority Critical patent/CN112988099A/en
Publication of CN112988099A publication Critical patent/CN112988099A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/14 - Digital output to display device; cooperation and interconnection of the display device with other functional units
    • G06F 3/1407 - General aspects irrespective of display type, e.g. determination of decimal point position, display with fixed or driving decimal point, suppression of non-significant zeros
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/70 - Information retrieval of video data; database structures therefor; file system structures therefor
    • G06F 16/74 - Browsing; visualisation therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; sound output
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/205 - Parsing
    • G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a video display method and device, relating to the artificial-intelligence fields of computer vision and deep learning. A specific implementation comprises: acquiring a text, determining keywords of the text, and acquiring pictures corresponding to the keywords, wherein a keyword exists in each sentence of the text; for each sentence, in the order in which the sentences appear in the text, acquiring and playing the audio corresponding to that sentence and rendering, from the acquired pictures, the picture corresponding to that sentence's keyword; and ending the video display flow in response to determining that the pictures corresponding to the keywords of all sentences have been rendered, or in response to receiving a stop-display instruction. The method and device can convert text into video in real time, turning the static, page-like display of an article into dynamic video playback, so that text can be presented in multimedia form in real time and the user's visual experience is enriched.

Description

Video display method and device
Technical Field
The application relates to the field of computer technology, in particular to artificial-intelligence technology covering computer vision and deep learning, and specifically to a video display method and device.
Background
Video generally refers to the family of techniques for capturing, recording, processing, storing, transmitting, and reproducing a series of still images as electrical signals. Video technology was originally developed for television systems, but it has since evolved into a variety of formats that make it convenient for consumers to record video.
Advances in network technology have also made it possible to stream recorded video segments over the Internet, where they can be received and played by computers.
Disclosure of Invention
A video display method, a video display device, an electronic apparatus, and a storage medium are provided.
According to a first aspect, there is provided a video display method, comprising: acquiring a text, determining keywords of the text, and acquiring pictures corresponding to the keywords, wherein a keyword exists in each sentence of the text; for each sentence, in the order in which the sentences appear in the text, acquiring and playing the audio corresponding to that sentence, and rendering, from the acquired pictures, the picture corresponding to that sentence's keyword; and ending the video display flow in response to determining that the pictures corresponding to the keywords of all sentences have been rendered, or in response to receiving a stop-display instruction.
According to a second aspect, there is provided a video display device, comprising: an acquisition unit configured to acquire a text, determine keywords of the text, and acquire pictures corresponding to the keywords, wherein a keyword exists in each sentence of the text; a rendering unit configured to, for each sentence and in the order in which the sentences appear in the text, acquire and play the audio corresponding to that sentence and render, from the acquired pictures, the picture corresponding to that sentence's keyword; and an ending unit configured to end the video display flow in response to determining that the pictures corresponding to the keywords of all sentences have been rendered, or in response to receiving a stop-display instruction.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any of the embodiments of the method of displaying video.
According to a fourth aspect, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to any one of the embodiments of the display method of the video.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any embodiment of the method of displaying video.
According to the scheme of the application, text can be converted into video in real time, turning the static, page-like display of an article into dynamic video playback, so that text can be presented in multimedia form in real time and the user's visual experience is enriched.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of displaying video according to the present application;
FIG. 3 is a flow chart of yet another embodiment of a method of displaying video according to the present application;
FIG. 4 is a flow chart of yet another embodiment of a method of displaying video according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of a video display apparatus according to the present application;
fig. 6 is a block diagram of an electronic device for implementing a video display method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the video display method or video display device of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and perform other processing on data such as sentences of the received text, and feed back a processing result (e.g., audio corresponding to the sentence) to the terminal device.
It should be noted that the video display method provided in the embodiment of the present application may be executed by the terminal devices 101, 102, and 103, and accordingly, the video display device may be disposed in the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of displaying video in accordance with the present application is shown. The video display method comprises the following steps:
step 201, obtaining a text, determining keywords of the text, and obtaining a picture corresponding to the keywords, where the keywords exist in each sentence of the text.
In this embodiment, the executing body on which the video display method runs (for example, a terminal device shown in Fig. 1) may acquire a text and determine its keywords, and may then acquire the pictures corresponding to those keywords. The executing body may extract the keywords of the text locally on the device, or may send the text to another electronic device (such as a server) and receive the keywords returned by it. In practice, the executing body may likewise obtain pictures stored locally on the device, or may send the keywords to another electronic device and receive the pictures returned by it.
A corresponding keyword may exist in each sentence, and each sentence may contain at least one keyword. The device, or the other electronic device, may hold a picture set in which each picture has corresponding keywords. For example, pictures showing basketball being played may carry the keyword "playing basketball", or alternatively "sports".
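As a concrete illustration, the step-201 flow of splitting the text into sentences, picking a keyword per sentence, and looking up a matching picture might be sketched as follows. The sentence splitter and the keyword extractor here are naive stand-ins (the patent does not specify its extraction method), and the picture set with its tags and file names is invented for illustration:

```python
# Sketch of step 201: text -> sentences -> keyword per sentence -> picture.
# The extractor below simply returns the first picture-set tag contained in
# the sentence; a real system would use a trained keyword-extraction model.
import re

# Hypothetical picture set: tag -> picture file identifier.
PICTURE_SET = {
    "basketball": "img/basketball_01.jpg",
    "sports": "img/sports_07.jpg",
    "kitten": "img/kitten_03.jpg",
}

def split_sentences(text):
    """Split text on sentence-ending punctuation (Chinese or English)."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p for p in parts if p]

def keyword_of(sentence):
    """Placeholder extractor: return the first picture-set tag found."""
    for tag in PICTURE_SET:
        if tag in sentence.lower():
            return tag
    return None

def pictures_for(text):
    """Map each sentence to its (keyword, picture) pair, as in step 201."""
    return [(kw, PICTURE_SET.get(kw))
            for s in split_sentences(text)
            if (kw := keyword_of(s)) is not None]
```

Sentences with no matching keyword are simply skipped in this sketch; the patent leaves that case open.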
Step 202, according to the sequence of each sentence in the text, for the sentence in each sentence, acquiring and playing the audio corresponding to the sentence, and rendering the picture corresponding to the keyword of the sentence in the acquired picture.
In this embodiment, the executing body may, for each sentence (for example, every sentence) and in the order in which the sentences appear in the text (i.e., the reading order), acquire the audio corresponding to that sentence and play it. In addition, the executing body may render the picture corresponding to that sentence's keyword, so that picture display and audio playback proceed together and video playback is thereby realized. The rendered picture is one of the acquired pictures.
Step 203, in response to determining that the rendering of the picture corresponding to the keyword of each sentence is completed or receiving a display stop instruction, ending the video display flow.
In this embodiment, the executing body may end the video display flow when it determines that the pictures corresponding to the keywords of all sentences have been rendered, or when a stop-display instruction is received. The stop-display instruction instructs the device to stop displaying the video, that is, to stop the display process of the pictures corresponding to the keywords.
The method provided by this embodiment of the application can convert text into video in real time, turning the static, page-like display of an article into dynamic video playback, so that text can be presented in multimedia form in real time and the user's visual experience is enriched.
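The overall playback loop of steps 202 and 203 can be sketched as below. The `audio_of`, `picture_of`, and `stop_requested` callables are stand-ins for the real audio acquisition, picture lookup, and stop-instruction check; the returned log merely records the play/render actions so the control flow is visible:

```python
# Sketch of steps 202-203: walk the sentences in text order, play the
# audio and render the picture for each, and end early on a stop
# instruction. All side effects are recorded in a log for inspection.
def display_video(sentences, audio_of, picture_of, stop_requested=lambda: False):
    """Return the action log; the flow ends when every picture has been
    rendered or when a stop-display instruction arrives (step 203)."""
    log = []
    for sentence in sentences:          # order of sentences in the text
        if stop_requested():            # stop-display instruction received
            break
        log.append(("play", audio_of(sentence)))     # step 202: play audio
        log.append(("render", picture_of(sentence))) # step 202: render picture
    return log
```

In a real implementation the two actions per sentence would overlap in time (the picture stays on screen while its audio plays) rather than being sequential log entries.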
In some optional implementations of this embodiment, rendering the picture corresponding to the sentence's keyword in step 202 may include: rendering the picture corresponding to the sentence's keyword from the acquired pictures, and rendering the sentence as a subtitle on a layer above the picture.
In these optional implementations, the executing body may render not only the picture corresponding to the sentence's keyword, but also the sentence itself, as the subtitle accompanying the rendered picture (that is, the subtitle of the video), on a layer above the picture.
These implementations display, for each picture, the subtitle corresponding to it, so that the picture is explained by the subtitle, which helps viewers understand the video content.
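A minimal sketch of this layered rendering, modeling a frame as a bottom-to-top list of layers (the layer names are illustrative, not an actual rendering API):

```python
# Sketch of the subtitle variant of step 202: the sentence is composited
# as a subtitle layer above the picture layer.
def compose_frame(picture, sentence):
    """Return frame layers bottom-to-top: picture first, subtitle on top."""
    return [("picture", picture), ("subtitle", sentence)]
```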
In some optional implementations of this embodiment, each sentence of the text has at least one keyword, and acquiring the pictures corresponding to the keywords in step 201 may include: sending a picture request including the keywords to a server, whereupon the server searches the picture set for the single picture, or the subset of pictures, whose labels best match the keywords, and returns it; and receiving the single picture or picture subset returned by the server as the pictures corresponding to the keywords.
In these optional implementations, the executing body may send a picture request including the keywords to the server, and the server may determine the pictures corresponding to the keywords. The server may determine the best-matching picture or pictures, or one or more best-matching picture subsets; when more than one picture or picture subset is returned, the executing body may select from among the results.
These implementations let the server carry out the picture search, a task that consumes considerable computing resources, thereby improving the efficiency of picture acquisition.
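The server-side "highest matching degree" lookup might look like the sketch below. The patent does not specify the matching measure, so character-level Jaccard similarity between label and keyword is an assumption made purely for illustration:

```python
# Sketch of the server-side search: given a keyword, return the pictures
# under the label that matches it best. Jaccard similarity over the
# characters of label and keyword is an illustrative stand-in for the
# patent's unspecified "matching degree".
def best_match(keyword, picture_set):
    """picture_set: dict label -> list of pictures. Returns the picture
    subset under the best-matching label."""
    def score(label):
        a, b = set(label), set(keyword)
        return len(a & b) / max(len(a | b), 1)
    best_label = max(picture_set, key=score)
    return picture_set[best_label]
```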
Optionally, generating the picture set may include: collecting a plurality of pictures covering at least two of the following categories: scene category, species category, and action category; adding labels to the pictures; and generating the picture set from the labeled pictures.
Specifically, the executing body or another electronic device may collect a plurality of pictures spanning at least two categories. Labels may then be added to these pictures, so that the picture set can be generated from the labeled pictures.
In practice, the labels added to the pictures may indicate the category and/or the specific content of a picture, for example "sports" for the scene category, "playing basketball" for the action category, and "senior citizen" or "kitten" for the species category. "Species" here refers to the kind of object depicted.
These optional implementations generate labels for the pictures, making it easier to find the picture corresponding to a keyword.
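The picture-set generation described above can be sketched as a simple indexing step. The `(file, category, label)` tuples and their values are invented for illustration:

```python
# Sketch of the optional picture-set generation: collect pictures of at
# least two categories (scene / species / action) and index them by label.
def build_picture_set(pictures):
    """pictures: iterable of (file, category, label) tuples. Returns the
    labeled set as a dict label -> list of (file, category)."""
    picture_set = {}
    for file, category, label in pictures:
        picture_set.setdefault(label, []).append((file, category))
    return picture_set
```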
In some optional implementations of this embodiment, the method is applied to a terminal on which a video application is installed, and acquiring the text comprises: in response to the video application being started, acquiring a URL address input into the video application; determining the page indicated by the URL address; and parsing the text out of the page.
In these optional implementations, the terminal acting as the executing body may, when the video application is started, acquire the URL address input by the user (for example, pasted or typed in). The executing body may then determine the page the address points to and parse the text out of that page, thereby extracting the text.
These implementations extract the text contained in a page, thereby providing one way of acquiring text.
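The "parse the text from the page" step might be implemented with a tag-stripping HTML parser, as sketched below. Fetching the page itself (e.g., with `urllib.request.urlopen` on the user's URL) is omitted so the example stays self-contained, and ignoring `script`/`style` content is an assumption about what counts as the page's text:

```python
# Sketch of text extraction from a page's HTML: collect text nodes while
# skipping script/style blocks, then join them into one text string.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring script/style content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def parse_text(html):
    """Return the visible text of an HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```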
In some optional implementations of this embodiment, acquiring and playing the audio corresponding to the sentence in step 202 may include: inputting the sentence into a trained audio synthesis model, obtaining the audio output by the model, and taking that audio as the audio corresponding to the sentence.
In these optional implementations, the audio synthesis model performs the conversion from text to speech, i.e., to audio. The executing body may input the sentence into the audio synthesis model and obtain the audio the model outputs.
In these implementations, when the executing body is a terminal device, it can perform audio synthesis locally by means of the audio synthesis model.
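The synthesis call can be sketched as below. Since the patent's trained model is not available, a deterministic stub stands in for it (one placeholder "sample" per character); only the control flow of passing a sentence through a model is meant to be illustrated:

```python
# Sketch of local audio synthesis: the sentence is fed to a trained
# text-to-speech model. `model` is a stand-in for that trained model;
# when absent, a stub returns a placeholder "waveform" so the flow runs.
def synthesize(sentence, model=None):
    """Return the audio corresponding to the sentence."""
    if model is not None:
        return model(sentence)             # trained-model path
    # Stub: one fake sample per character, a placeholder for real audio.
    return [ord(c) % 256 for c in sentence]
```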
With continued reference to fig. 3, fig. 3 is a flow chart of yet another embodiment of the video display method according to the present application.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method of displaying a video is shown. The process 400 includes the following steps:
step 401, obtaining a text, determining keywords of the text, and obtaining a picture corresponding to the keywords, where the keywords exist in each sentence of the text.
In this embodiment, the executing body on which the video display method runs (for example, a terminal device shown in Fig. 1) may acquire a text and determine its keywords, and may then acquire the pictures corresponding to those keywords. The executing body may extract the keywords of the text locally on the device, or may send the text to another electronic device (such as a server) and receive the keywords returned by it.
Step 402, performing an audio acquisition step: determining, in the above order, at least one sentence that has not yet been acquired from among the sentences, and acquiring the audio corresponding to that at least one sentence.
In this embodiment, the executing body may perform the audio acquisition step: specifically, it determines, in the above order, at least one sentence that has not yet been acquired, and acquires the audio corresponding to it. For example, the next sentence may be read in sequence and its audio obtained.
Step 403, extracting, from the acquired pictures, the picture corresponding to the keyword of the at least one sentence, rendering that picture, and playing the acquired audio.
In this embodiment, the executing body may extract, from the acquired pictures, the picture corresponding to the keyword of the at least one sentence and render it. The executing body not only performs picture rendering but also plays the audio, thereby realizing playback in multimedia form.
And step 404, in response to determining that the rendering of the picture corresponding to the keyword of each sentence is finished or receiving a display stop instruction, ending the video display flow.
In this embodiment, the executing body may end the video display flow when it determines that the pictures corresponding to the keywords of all sentences have been rendered, or when a stop-display instruction is received.
This embodiment can acquire at least one sentence at a time and convert it into a portion of the video, thereby realizing precise, incremental conversion to video.
In some optional implementations of this embodiment, acquiring and playing, for each sentence in order, the audio corresponding to the sentence and rendering the picture corresponding to the sentence's keyword may further include: performing the audio acquisition step again to obtain the audio generated by that execution; and, for the sentence corresponding to that audio, extracting the picture corresponding to the sentence's keyword from the acquired pictures, rendering the picture, and playing the audio.
In these optional implementations, the executing body may perform the audio acquisition step again, obtaining the audio generated by that execution. The executing body may then extract the picture corresponding to the keyword of the sentence associated with that audio (for example, of the at least one not-yet-acquired sentence determined in the above order), render the picture, and play the audio.
These optional implementations realize continuous video playback by repeatedly completing the cycle of audio generation, playback, picture extraction, and rendering.
Optionally, performing the audio acquisition step again may include: in response to determining that the audio has finished playing and that sentences remain that have not yet been acquired, performing the audio acquisition step again.
Specifically, the executing body may perform the audio acquisition step again when the audio has finished playing and an unprocessed sentence remains (that is, there is a next sentence). For example, the executing body may first determine whether the most recently generated audio has finished playing and, if so, determine whether an unprocessed sentence remains; if it does, the executing body may perform the audio acquisition step again.
These optional implementations convert the text into video sentence by sentence, thereby avoiding the excessive consumption of runtime resources that generating the whole video at once would cause.
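The sentence-by-sentence flow of steps 402-403 can be sketched as a generator that only acquires the next batch of audio when asked, i.e., after the previous audio has finished playing. The batch size and the `audio_of` callable are illustrative stand-ins:

```python
# Sketch of the incremental flow-400 variant: audio is acquired one batch
# of sentences at a time; the caller pulls the next batch only after the
# current audio finishes, so the whole video is never generated up front.
def incremental_display(sentences, audio_of, batch=1):
    """Yield (audio_list, sentence_batch) pairs, one audio acquisition
    step (steps 402-403) at a time."""
    i = 0
    while i < len(sentences):            # not-yet-acquired sentences remain
        chunk = sentences[i:i + batch]   # at least one un-acquired sentence
        yield [audio_of(s) for s in chunk], chunk
        i += batch
```

Because it is a generator, simply not advancing it models "the previous audio has not finished playing yet".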
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of a display device for video, which corresponds to the embodiment of the method shown in fig. 2, and which may include the same or corresponding features or effects as the embodiment of the method shown in fig. 2, in addition to the features described below. The device can be applied to various electronic equipment.
As shown in fig. 5, the video display device 500 of this embodiment includes an acquisition unit 501, a rendering unit 502, and an ending unit 503. The acquisition unit 501 is configured to acquire a text, determine keywords of the text, and acquire pictures corresponding to the keywords, wherein a keyword exists in each sentence of the text. The rendering unit 502 is configured to, for each sentence and in the order in which the sentences appear in the text, acquire and play the audio corresponding to that sentence and render, from the acquired pictures, the picture corresponding to that sentence's keyword. The ending unit 503 is configured to end the video display flow in response to determining that the pictures corresponding to the keywords of all sentences have been rendered, or in response to receiving a stop-display instruction.
In this embodiment, specific processing of the obtaining unit 501, the rendering unit 502, and the ending unit 503 of the video display device 500 and technical effects thereof can refer to related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementations of this embodiment, the rendering unit is further configured to acquire and play, for each sentence in order, the audio corresponding to the sentence, and to render the picture corresponding to the sentence's keyword, in the following manner: performing an audio acquisition step of determining, in the above order, at least one sentence that has not yet been acquired and acquiring the audio corresponding to it; and extracting, from the acquired pictures, the picture corresponding to the keyword of the at least one sentence, rendering that picture, and playing the acquired audio.
In some optional implementations of this embodiment, the rendering unit is further configured to perform the audio acquisition step again to obtain the audio generated by that execution and, for the sentence corresponding to that audio, to extract the picture corresponding to the sentence's keyword from the acquired pictures, render the picture, and play the audio.
In some optional implementations of this embodiment, the rendering unit is further configured to perform the audio acquisition step again as follows: in response to determining that the audio has finished playing and that sentences remain that have not yet been acquired, performing the audio acquisition step again.
In some optional implementations of this embodiment, the rendering unit is further configured to render the picture corresponding to the sentence's keyword as follows: rendering the picture corresponding to the sentence's keyword from the acquired pictures, and rendering the sentence as a subtitle on a layer above the picture.
In some optional implementations of this embodiment, each sentence of the text has at least one keyword, and the acquisition unit is further configured to acquire the pictures corresponding to the keywords as follows: sending a picture request including the keywords to a server, whereupon the server searches the picture set for the single picture, or the subset of pictures, whose labels best match the keywords, and returns it; and receiving the single picture or picture subset returned by the server as the pictures corresponding to the keywords.
In some optional implementations of this embodiment, generating the picture set includes: collecting a plurality of pictures covering at least two of the following categories: scene category, species category, and action category; adding labels to the pictures; and generating the picture set from the labeled pictures.
In some optional implementations of this embodiment, the device is applied to a terminal on which a video application is installed, and the acquisition unit is further configured to acquire the text as follows: in response to the video application being started, acquiring a URL address input into the video application; determining the page indicated by the URL address; and parsing the text out of the page.
In some optional implementations of this embodiment, the rendering unit is further configured to acquire and play the audio corresponding to the sentence as follows: inputting the sentence into a trained audio synthesis model, obtaining the audio output by the model, and taking that audio as the audio corresponding to the sentence.
There is also provided, in accordance with an embodiment of the present application, an electronic device, a readable storage medium, and a computer program product.
As shown in fig. 6, the embodiment of the present application is a block diagram of an electronic device of a video display method. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the video display method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the video display method provided by the present application.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the video display method in the embodiment of the present application (for example, the acquisition unit 501, the rendering unit 502, and the ending unit 503 shown in fig. 5). The processor 601 executes the various functional applications and data processing of the server, i.e., implements the video display method in the above-described method embodiment, by running the non-transitory software programs, instructions, and modules stored in the memory 602.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to use of the video display electronic device, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, which may be connected to the video display electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the video display method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means; fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the video display electronic device, and may be, for example, a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, track ball, joystick, or other input device. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, audio, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in the cloud computing service system that remedies the defects of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a rendering unit, and an end unit. The names of these units do not in some cases form a limitation on the units themselves, and for example, the acquiring unit may also be described as "a unit that determines a keyword of a text and acquires a picture corresponding to the keyword".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may be present separately and not assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a text, determine keywords of the text, and acquire pictures corresponding to the keywords, wherein a keyword exists in each sentence of the text; according to the sequence of the sentences in the text, for each sentence, acquire and play the audio corresponding to the sentence, and render, among the acquired pictures, the picture corresponding to the keyword of the sentence; and in response to determining that the pictures corresponding to the keywords of all sentences have been rendered, or that a stop-display instruction has been received, end the video display flow.
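The overall flow restated above (keyword extraction, picture acquisition, then the per-sentence play-and-render loop) can be sketched as follows. The keyword extraction rule and the picture/audio sources below are simplified stand-ins passed in as callables; they are assumptions for the sketch, not the disclosed implementations.

```python
# Sketch of the overall video display flow: one keyword per sentence,
# one picture per keyword fetched up front, then audio playback and
# picture rendering sentence by sentence, ending when all pictures are
# rendered or a stop-display instruction arrives.

def display_video(sentences, keyword_of, picture_for, audio_for,
                  play, render, stop_requested=lambda: False):
    """Runs the per-sentence loop; returns when every sentence's
    picture has been rendered or a stop is requested."""
    keywords = [keyword_of(s) for s in sentences]       # one per sentence
    pictures = {k: picture_for(k) for k in keywords}    # acquire pictures
    for sentence, keyword in zip(sentences, keywords):  # text order
        if stop_requested():                            # stop instruction
            break
        play(audio_for(sentence))                       # play sentence audio
        render(pictures[keyword])                       # render its picture

# Usage with toy stand-ins that record what would be played/rendered.
events = []
display_video(
    sentences=["A pony runs.", "Waves crash."],
    keyword_of=lambda s: s.split()[1].rstrip("."),   # toy keyword rule
    picture_for=lambda k: f"{k}.jpg",                # toy picture source
    audio_for=lambda s: s.encode(),                  # toy audio source
    play=lambda a: events.append(("audio", a)),
    render=lambda p: events.append(("picture", p)),
)
```

A real implementation would overlap the audio acquisition of the next sentence with the playback of the current one, as claims 2 through 4 below describe.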
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (11)

1. A method of displaying video, the method comprising:
acquiring a text, determining keywords of the text, and acquiring a picture corresponding to the keywords, wherein a keyword exists in each sentence of the text;
according to the sequence of each sentence in the text, for the sentence in each sentence, acquiring and playing the audio corresponding to the sentence, and rendering the picture corresponding to the keyword of the sentence in the acquired picture;
and in response to the fact that the picture corresponding to the keyword of each sentence is rendered or a display stopping instruction is received, ending the video display flow.
2. The method according to claim 1, wherein the obtaining and playing, for the sentence in each sentence according to the sequence of the sentence in the text, an audio corresponding to the sentence, and rendering a picture corresponding to a keyword of the sentence in the obtained pictures, comprises:
performing an audio acquisition step: determining, according to the sequence, at least one sentence whose audio has not yet been acquired, and acquiring the audio corresponding to the at least one sentence;
and extracting the picture corresponding to the keyword of the at least one sentence from the obtained pictures, rendering the picture, and playing the obtained audio.
3. The method according to claim 2, wherein the obtaining and playing, for the sentences in each sentence, the audio corresponding to the sentence according to the sequence of the sentences in the text, and rendering the picture corresponding to the keyword of the sentence in the obtained picture further comprises:
executing the audio acquisition step again to obtain the audio generated by this execution of the audio acquisition step;
and for the sentence corresponding to the audio, extracting the picture corresponding to the keyword of the sentence from the obtained pictures, rendering the picture, and playing the audio.
4. The method of claim 3, wherein the executing the audio acquisition step again comprises:
and in response to determining that the audio has finished playing and that there are sentences for which audio has not been acquired, executing the audio acquisition step again.
5. The method of claim 1, wherein the rendering the picture corresponding to the keyword of the sentence in the obtained picture comprises:
rendering the picture corresponding to the keyword of the sentence in the obtained picture, and rendering the sentence as a subtitle to the upper layer of the picture.
6. The method of claim 1, wherein the number of keywords per sentence in the text is at least one;
the acquiring of the picture corresponding to the keyword includes:
sending a picture request including the keyword to a server, wherein the server searches a picture set for a single picture or a picture subset whose label has the highest degree of matching with the keyword, and returns the search result;
and receiving a single picture or a picture subset returned by the server as a picture corresponding to the keyword.
7. The method of claim 6, wherein the generating of the set of pictures comprises:
collecting a plurality of pictures, wherein the plurality of pictures comprises pictures of at least two of the following categories: a scene category, a species category, and an action category;
and adding labels to the pictures, and generating the picture set by the pictures added with the labels.
8. The method of claim 1, wherein the method is applied to a terminal, which is installed with a video application; the acquiring the text comprises:
in response to the video application being started, acquiring a URL address input at the video application;
and determining a page indicated by the URL address, and analyzing a text from the page.
9. The method of claim 1, wherein the obtaining and playing the audio corresponding to the sentence comprises:
and inputting the sentence into the trained audio synthesis model to obtain the audio output from the audio synthesis model, and taking the audio as the audio corresponding to the sentence.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
11. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
CN202110382921.9A 2021-04-09 2021-04-09 Video display method and device Pending CN112988099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110382921.9A CN112988099A (en) 2021-04-09 2021-04-09 Video display method and device


Publications (1)

Publication Number Publication Date
CN112988099A true CN112988099A (en) 2021-06-18

Family

ID=76339624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110382921.9A Pending CN112988099A (en) 2021-04-09 2021-04-09 Video display method and device

Country Status (1)

Country Link
CN (1) CN112988099A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610802A (en) * 2022-03-18 2022-06-10 平安国际智慧城市科技股份有限公司 Word carousel method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070185857A1 (en) * 2006-01-23 2007-08-09 International Business Machines Corporation System and method for extracting salient keywords for videos
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
CN107832382A (en) * 2017-10-30 2018-03-23 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on word generation video
CN107943839A (en) * 2017-10-30 2018-04-20 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on picture and word generation video
CN109344291A (en) * 2018-09-03 2019-02-15 腾讯科技(武汉)有限公司 A kind of video generation method and device
CN112015949A (en) * 2020-08-26 2020-12-01 腾讯科技(上海)有限公司 Video generation method and device, storage medium and electronic equipment



Similar Documents

Publication Publication Date Title
JP7123122B2 (en) Navigating Video Scenes Using Cognitive Insights
CN110020411B (en) Image-text content generation method and equipment
CN112115299A (en) Video searching method and device, recommendation method, electronic device and storage medium
EP3902280A1 (en) Short video generation method and platform, electronic device, and storage medium
US20200128286A1 (en) Live streaming social interaction to trigger product search
US11423907B2 (en) Virtual object image display method and apparatus, electronic device and storage medium
JP7223056B2 (en) Image screening method, device, electronic device and storage medium
CN111225236B (en) Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN111309200B (en) Method, device, equipment and storage medium for determining extended reading content
JP7263660B2 (en) Video processing method, device, electronic device and storage medium
CN112988100A (en) Video playing method and device
JP7140913B2 (en) Video distribution statute of limitations determination method and device
CN112287168A (en) Method and apparatus for generating video
CN111984825A (en) Method and apparatus for searching video
US20230291772A1 (en) Filtering video content items
CN112182297A (en) Training information fusion model, and method and device for generating collection video
CN111770388B (en) Content processing method, device, equipment and storage medium
CN112988099A (en) Video display method and device
CN111524123A (en) Method and apparatus for processing image
CN113542802B (en) Video transition method and device
JP7284786B2 (en) Methods, apparatus, electronics, computer readable storage media and computer programs for labeling data
CN113542888B (en) Video processing method and device, electronic equipment and storage medium
CN113840177B (en) Live interaction method and device, storage medium and electronic equipment
CN112579868B (en) Multi-mode image recognition searching method, device, equipment and storage medium
CN113096643A (en) Video processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210618