CN112714254B - Video speech display method and device, electronic equipment and storage medium - Google Patents

Video speech display method and device, electronic equipment and storage medium

Info

Publication number
CN112714254B
Authority
CN
China
Prior art keywords
speech
lines
video
display
displayed
Prior art date
Legal status
Active
Application number
CN202011600007.9A
Other languages
Chinese (zh)
Other versions
CN112714254A (en)
Inventor
陈彦涛
麦文军
Current Assignee
Guangzhou Pacific Computer Information Consulting Co ltd
Original Assignee
Guangzhou Pacific Computer Information Consulting Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Pacific Computer Information Consulting Co ltd filed Critical Guangzhou Pacific Computer Information Consulting Co ltd
Priority to CN202011600007.9A priority Critical patent/CN112714254B/en
Publication of CN112714254A publication Critical patent/CN112714254A/en
Application granted granted Critical
Publication of CN112714254B publication Critical patent/CN112714254B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/2222 Prompting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The invention discloses a video speech display method and device, electronic equipment and a storage medium. The method comprises the following steps: when entering a video shooting interface, acquiring a preset control file; generating lines to be displayed based on the control file; acquiring a display mode of the lines to be displayed; determining target display lines among the lines to be displayed according to the display mode; acquiring face information of a user and calculating the display size of the target display lines according to the face information; and displaying the target display lines in the video shooting interface according to the display size. This solves the technical problems of existing short-video shooting, in which the user cannot manage the lines, easily forgets them while shooting, and struggles to control the rhythm after misreading a line, thereby reducing the user's shooting cost.

Description

Video speech display method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for displaying video lines, electronic equipment and a storage medium.
Background
With the boom in short videos, users in many industries, including automotive media, have started producing large numbers of videos. Many video creators write a script and prepare their lines in advance, yet even so they easily forget the lines during shooting, or fail to control the pace at which they read them. The technical solutions most vendors currently offer are:
1) A shooting function with no reading assistance at all. The user can only memorise the lines, which makes forgetting them likely and greatly increases the cost of capturing the video.
2) A simple text prompt. The lines cannot be modified during shooting, the user is not helped to locate the line that should be read next, and the user is not helped to control the reading rhythm. Moreover, the size of the displayed lines either cannot be changed at all, or can only be changed manually, which is very inconvenient.
As can be seen from the above, most existing shooting functions either simply call the system camera to capture footage directly, or merely display the text to be read. The user cannot manage the lines during shooting, so the lines are easily forgotten, the rhythm is hard to control once a line is misread, and the user's shooting cost rises.
Disclosure of Invention
The invention provides a video speech display method and device, electronic equipment and a storage medium, which solve the technical problems of existing short-video shooting in which the user cannot manage the lines, easily forgets them while shooting, and struggles to control the rhythm after misreading a line, thereby reducing the user's shooting cost.
The invention provides a video speech display method, which comprises the following steps:
when entering a video shooting interface, acquiring a preset control file;
generating a speech to be displayed based on the control file;
acquiring a display mode of the lines to be displayed;
determining target display lines in the lines to be displayed according to the display mode;
acquiring face information of a user, and calculating the display size of the target display lines according to the face information;
and displaying the target display lines in the video shooting interface according to the display size.
Optionally, the step of generating a to-be-displayed speech based on the control file includes:
when the line-of-speech modification operation triggered by the user is not detected in the video shooting process, generating a line-of-speech to be displayed by adopting the control file;
when a speech modification operation triggered by a user is detected in the video shooting process, modifying the control file according to the speech modification operation to generate a modified control file;
and generating the lines to be displayed by adopting the modified control file.
Optionally, the method further comprises:
determining a modification starting time point according to the modification control file;
intercepting a first target video clip in the shot video according to the modification starting time point;
generating a second target video clip when video shooting is continued based on the modified starting time point;
and generating a target video by adopting the first target video segment and the second target video segment.
Optionally, the step of determining a target display speech line in the speech lines to be displayed according to the display mode includes:
when the display mode is displaying according to a shooting time axis, recording shooting time through a preset timer;
when the shooting time meets timing callback time, acquiring a starting time point of the lines to be displayed in the control file;
and matching the timing callback time with the starting time point, and determining a target display speech in the speech to be displayed.
Optionally, the step of determining a target display speech line in the speech lines to be displayed according to the display mode includes:
when the display mode is that the corresponding lines are displayed according to the user speech rate, extracting the audio track of the current shot video;
identifying textual information for the audio track;
matching the character information in the lines to be displayed;
and determining a target display speech in the speech to be displayed according to the matching result.
Optionally, the step of obtaining the face information of the user and calculating the display size of the target display lines according to the face information includes:
acquiring face information of a user;
extracting a binocular distance from the face information;
calculating the relative distance between the face of the user and the video shooting interface by adopting the distance between the two eyes and preset distance reference data;
and calculating the display size of the target display lines by adopting the relative distance and preset character size reference data.
The present invention also provides a video speech display device, comprising:
the device comprises a preset control file acquisition module, a video shooting module and a video processing module, wherein the preset control file acquisition module is used for acquiring a preset control file when entering a video shooting interface;
the to-be-displayed line generation module is used for generating lines to be displayed based on the control file;
the display mode acquisition module is used for acquiring the display mode of the lines to be displayed;
the target display lines determining module is used for determining target display lines in the lines to be displayed according to the display mode;
the display size calculation module is used for acquiring the face information of the user and calculating the display size of the target display lines according to the face information;
and the target display speech-line display module is used for displaying the target display speech-lines in the video shooting interface according to the display size.
Optionally, the display size calculating module includes:
the face information acquisition submodule is used for acquiring face information of a user;
a binocular distance extraction submodule for extracting binocular distance from the face information;
the relative distance calculation submodule is used for calculating the relative distance between the face of the user and the video shooting interface by adopting the binocular distance and preset distance reference data;
and the display size calculation submodule is used for calculating the display size of the target display lines by adopting the relative distance and preset character size reference data.
The invention also provides an electronic device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the video speech presentation method according to any one of the above instructions in the program code.
The present invention also provides a computer-readable storage medium for storing a program code for executing the video speech presentation method according to any one of the above.
According to the technical scheme, the invention has the following advantages: the display lines of the video can be customised through the control file, making line setup more flexible; the size of the lines can be adjusted automatically by recognising the user's face information, so the user never fails to finish reading a line simply because it is too small to see at a distance; in addition, the target display lines can be located by selecting a display mode, which prevents the user from being distracted by other lines, keeps the lines from being misread, and keeps the shooting rhythm undisturbed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flowchart illustrating steps of a method for displaying a video speech in accordance with an embodiment of the present invention;
fig. 2 is a flowchart illustrating steps of a method for displaying video lines according to another embodiment of the present invention;
fig. 3 is a flowchart of a target video synthesis method for modifying a speech-line according to an embodiment of the present invention;
fig. 4 is a flowchart of determining a target display line according to an embodiment of the present invention;
fig. 5 is a flowchart illustrating a process of calculating a display size of a target display speech according to an embodiment of the present invention;
fig. 6 is a flowchart of a method for displaying video lines according to an embodiment of the present invention;
fig. 7 is a block diagram of a video speech-line display device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a video speech display method and device, electronic equipment and a storage medium, which solve the technical problems of existing short-video shooting in which the user cannot manage the lines, easily forgets them while shooting, and struggles to control the rhythm after misreading a line, thereby reducing the user's shooting cost.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for displaying video lines according to an embodiment of the present invention.
The method for displaying the video lines provided by the invention specifically comprises the following steps:
step 101, when entering a video shooting interface, acquiring a preset control file;
in the embodiment of the invention, the lines displayed during shooting are centrally managed by the control file.
In a specific implementation, before video shooting the video shooting software may initialise the control file in advance and load it into memory. The system holds the control file throughout shooting; when the user exits the shooting interface after shooting ends, the control file is released from memory and written to a local file so it can be used in the next shooting session.
Step 102, generating lines to be displayed based on the control file;
when shooting starts, the control file is obtained and the lines to be displayed for the video can be extracted from it, so that the corresponding lines can later be displayed in real time in the video shooting interface according to the progress of the shot, guiding the user to speak the lines required by the video.
Step 103, acquiring a display mode of the lines to be displayed;
in the embodiment of the invention, in order to meet the requirements of different users on the speech display, a plurality of different speech display modes can be preset. When the video starts to be shot, the corresponding lines display mode can be triggered according to the selection operation of the display mode.
Step 104, determining target display lines in the lines to be displayed according to the display mode;
in a specific implementation, the target display lines shown at a given moment differ according to the display mode. For example, the lines may be displayed a full sentence at a time, or character by character with a single sentence as the maximum display length. In practical applications it is therefore necessary, based on the currently selected display mode, to determine the target display lines that match the shooting progress of the current video.
Step 105, acquiring face information of a user, and calculating the display size of the target display lines according to the face information;
in practical application, in order to ensure normal reading of the speech-lines by the user, the size of the speech-lines needs to meet the requirement that the user can see clearly under different distances. The size of the lines needs to be changed following the change in the distance between the user and the video capture interface.
In a specific implementation, the display size of the target display lines can be calculated by acquiring the face information of the user and applying the mapping between the face information and the font size.
Step 106, displaying the target display lines in the video shooting interface according to the display size.
In the embodiment of the invention, once the target display lines for the current moment and their display size are obtained, the target lines can be displayed in the video shooting interface to remind the user which lines need to be spoken.
The invention allows the display lines of the video to be customised through the control file, making line setup more flexible; the size of the lines can be adjusted automatically by recognising the user's face information, so the user never fails to finish reading a line simply because it is too small to see at a distance; in addition, the target display lines can be located by selecting a display mode, which prevents the user from being distracted by other lines, keeps the lines from being misread, and keeps the shooting rhythm undisturbed.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for displaying video lines according to another embodiment of the present invention; the method may specifically comprise the steps of:
step 201, when entering a video shooting interface, acquiring a preset control file;
in the embodiment of the invention, the lines displayed during shooting are centrally managed by the control file.
In a specific implementation, before video shooting the video shooting software may initialise the control file in advance and load it into memory. The system holds the control file throughout shooting; when the user exits the shooting interface after shooting ends, the control file is released from memory and written to a local file so it can be used in the next shooting session.
In an example, a structure corresponding to a control file provided by the embodiment of the present invention may be as shown in table 1 below.
[Table 1: structure of the control file — reproduced as an image in the original publication]
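Since Table 1 is only available as an image, the following is a minimal sketch of how such a control file could be modelled in code; the zmStart and duration fields are named later in this description, while the content field, the type names and the JSON format are assumptions for illustration only.

```swift
import Foundation

// Hypothetical model of one line entry in the control file.
// "zmStart" and "duration" are named later in the description;
// "content" and the wrapper types are assumed for illustration.
struct LineEntry: Codable {
    let content: String        // the text of the line to be read
    let zmStart: TimeInterval  // start time point of the line on the shooting timeline (seconds)
    let duration: TimeInterval // how long the line stays on screen (seconds)
}

struct ControlFile: Codable {
    var lines: [LineEntry]
}

// Loading the control file into memory when the shooting interface is entered,
// and writing it back to a local file when the user exits (sketch).
func loadControlFile(at url: URL) throws -> ControlFile {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode(ControlFile.self, from: data)
}

func saveControlFile(_ file: ControlFile, to url: URL) throws {
    let data = try JSONEncoder().encode(file)
    try data.write(to: url, options: .atomic)
}
```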
Step 202, when the line-of-speech modification operation triggered by the user is not detected in the video shooting process, generating a line-of-speech to be displayed by adopting the control file;
step 203, when a speech modification operation triggered by a user is detected in the video shooting process, modifying the control file according to the speech modification operation to generate a modified control file;
in the embodiment of the invention, the user can modify the content and the appearance time point of the lines before and during shooting.
In one example, when a user-triggered line modification operation is detected during video capture, a modified control file may be generated in accordance with that operation. The line modification operation may be adding lines, modifying the time points of lines, and so on.
In a specific implementation, when the user triggers a line modification operation, the user can on the one hand directly input the line to be modified, so that the modified control file is generated from the modified line. On the other hand, the user can also type search keywords; the system monitors the keywords entered in real time and requests a backend interface, and the backend looks up and returns information related to the entered words to the front end for the user to choose from. For example, when a user enters the name of a particular vehicle, the configuration and related information for that vehicle can be displayed automatically, and the user can modify the control file with the information fed back by the system to generate the modified control file.
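As a rough illustration of this keyword lookup (the patent does not specify the backend interface, so the URL, query parameter and response fields below are assumptions), a search request might look like:

```swift
import Foundation

// Hypothetical lookup of vehicle configuration info for a search keyword.
// The endpoint, query parameter and response shape are illustrative only.
struct VehicleInfo: Codable {
    let name: String
    let configuration: String
}

func fetchRelatedInfo(for keyword: String,
                      completion: @escaping ([VehicleInfo]) -> Void) {
    var components = URLComponents(string: "https://example.com/api/vehicle-search")!
    components.queryItems = [URLQueryItem(name: "keyword", value: keyword)]
    let task = URLSession.shared.dataTask(with: components.url!) { data, _, _ in
        guard let data = data,
              let results = try? JSONDecoder().decode([VehicleInfo].self, from: data) else {
            completion([])
            return
        }
        // The results are shown to the user, who picks entries to merge into the control file.
        completion(results)
    }
    task.resume()
}
```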
Step 204, generating lines to be displayed by adopting the modified control file;
in the embodiment of the invention, after the modification control file is obtained, the video-related speech to be displayed can be extracted from the modification control file, so that the corresponding speech can be displayed in real time in a video shooting interface according to the shooting condition of the video in the following process, and a user is guided to speak the speech required by video shooting.
Further, in order to keep video shooting continuous and reduce the operations the user has to perform, when a line modification operation triggered by the user is detected during video shooting, the embodiment of the invention may further comprise the following steps:
determining a modification starting time point according to the modification control file;
intercepting a first target video clip in the shot video according to the modification starting time point;
generating a second target video clip when video shooting is continued based on the modification starting time point;
and generating a target video by adopting the first target video segment and the second target video segment.
In a specific implementation, if the user modifies a line during short-video shooting, the previously shot video may be cut at the modification start time point of the currently modified line on the short-video timeline (for example, the clip can be obtained by setting the start and duration of the AVAssetExportSession timeRange attribute), and the first target video segment before the cut point is taken out. When the user continues shooting, the newly shot second target video segment can be spliced with the first target video segment (for example, video composition may be performed with an AVMutableVideoCompositionInstruction), the two being joined automatically to obtain the target video. In this way the user can modify lines at any time during shooting, without having to discard the entire video and start over because of a wrong line.
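Assuming an AVFoundation implementation, as the APIs referenced above suggest, the splice could look roughly like the sketch below; the asset handles, the output URL and the function name are illustrative, not part of the patent text.

```swift
import AVFoundation

// Sketch: stitch the segment recorded before the line edit with the segment recorded after it.
// `modificationStart` is the modification starting time point taken from the modified control file.
func composeTargetVideo(firstAsset: AVAsset,
                        modificationStart: CMTime,
                        secondAsset: AVAsset,
                        outputURL: URL) throws {
    let composition = AVMutableComposition()

    // First target video clip: everything shot before the modification start time point.
    let firstRange = CMTimeRange(start: .zero, duration: modificationStart)
    try composition.insertTimeRange(firstRange, of: firstAsset, at: .zero)

    // Second target video clip: the footage shot after the lines were modified.
    let secondRange = CMTimeRange(start: .zero, duration: secondAsset.duration)
    try composition.insertTimeRange(secondRange, of: secondAsset, at: modificationStart)

    // Export the spliced target video.
    guard let export = AVAssetExportSession(asset: composition,
                                            presetName: AVAssetExportPresetHighestQuality) else {
        return
    }
    export.outputURL = outputURL
    export.outputFileType = .mp4
    export.exportAsynchronously {
        // Inspect export.status / export.error here.
    }
}
```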
In an example, as shown in fig. 3, fig. 3 is a flowchart of a target video composition method when a speech is modified according to an embodiment of the present invention.
As shown in fig. 3, when a line modification operation is triggered at the start or in the middle of shooting a segment, a video time point is matched in the shot video according to the modification start time point of the line, and the footage before that point is cut out at this time point to generate a first target video segment. Shooting continues after the lines are modified to generate a second target video segment, and the first and second target video segments are composited to obtain the target video.
In another example, when modifying a line's time points, if the existing lines scroll faster or slower than the user speaks (a pre-reading function lets the user read through the existing lines first), the user can modify the start time point and duration of each line to match his or her actual speech rate. Whether the user modifies the content or the time points, the control file is ultimately modified accordingly: the start point corresponds to the zmStart object in the control file and the duration corresponds to its duration object. The modified control file then governs the scrolling of the lines in the mode that displays lines automatically along the shooting time axis.
Step 205, acquiring a display mode of the lines to be displayed;
in the embodiment of the invention, in order to meet the requirements of different users on the speech display, a plurality of different speech display modes can be preset. When the video starts to be shot, the corresponding lines display mode can be triggered according to the selection operation of the display mode.
Step 206, determining target display lines in the lines to be displayed according to the display mode;
in a specific implementation, the target display lines shown at a given moment differ according to the display mode. For example, the lines may be displayed a full sentence at a time, or character by character with a single sentence as the maximum display length. In practical applications it is therefore necessary, based on the currently selected display mode, to determine the target display lines that match the shooting progress of the current video.
In one example, step 206 may include the following sub-steps:
s11, when the display mode is according to the shooting time axis, recording the shooting time by a preset timer;
s12, when the shooting time meets timing callback time, acquiring the starting time point of the to-be-displayed line in the control file;
and S13, matching the timing callback time with the starting time point, and determining a target display line in the lines to be displayed.
In a specific implementation, a novice presenter may need more help from the script, and the mode of displaying lines along the shooting time axis lets the user read the lines one by one as set in advance. When the user taps to start shooting, a timer object can be created and begins timing; during timing, the lines in the control file can be matched against the timer callback time (the callback time is compared with each line's start time point zmStart in the control file) to determine which line should be displayed, and highlighted by a colour change, in the next second. When the user taps pause or shooting ends, the timer stops and the lines stop scrolling.
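A minimal sketch of this timer-driven mode is shown below, reusing the hypothetical LineEntry model sketched earlier; the class and callback names are assumptions made for illustration.

```swift
import Foundation

// Sketch of the "display by shooting time axis" mode: a timer fires every second,
// and the elapsed shooting time is matched against each line's zmStart to pick
// the target display line.
final class TimelinePrompter {
    private var timer: Timer?
    private var elapsed: TimeInterval = 0
    private let lines: [LineEntry]
    var onTargetLine: ((LineEntry) -> Void)?   // e.g. highlight / recolour the line in the UI

    init(lines: [LineEntry]) { self.lines = lines }

    func startShooting() {
        timer = Timer.scheduledTimer(withTimeInterval: 1.0, repeats: true) { [weak self] _ in
            guard let self = self else { return }
            self.elapsed += 1.0
            // Match the timer callback time with each line's start time point:
            // the target line is the last one whose zmStart has already been reached.
            if let target = self.lines.last(where: { $0.zmStart <= self.elapsed }) {
                self.onTargetLine?(target)
            }
        }
    }

    // Lines stop scrolling when shooting is paused or finished.
    func pauseShooting() {
        timer?.invalidate()
        timer = nil
    }
}
```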
In another example, step 206 may include the following sub-steps:
s21, when the display mode is that the corresponding speech is displayed according to the user speech speed, extracting the audio track of the current shooting video;
s22, recognizing the character information of the audio track;
s23, matching the character information in the lines to be displayed;
and S24, determining target display lines in the lines to be displayed according to the matching result.
In a specific implementation, the user may choose to display the corresponding lines automatically according to speech rate. The video stream output in real time during shooting is parsed into audio and video (for example, the stream can be obtained through the AVCaptureVideoDataOutput delegate callback captureOutput, and the audio can be separated via insertTimeRange with the output type specified as AVMediaTypeAudio), the separate audio track is extracted, and the text in the audio track is recognised and compared with the lines to be displayed. When a match is found, the colour of the corresponding text is changed to show it has been read, and the next line becomes the target display line. When the user reads the last character of a line, the display scrolls to the next line, which is determined to be the new target display line. Because the next line is only shown after the current one has been read, the user controls the pace: when the user speaks quickly the lines scroll faster, and when the user speaks slowly they scroll more slowly, so the user can pace the scrolling of the lines to suit themselves. The specific flow is shown in fig. 4.
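The patent only says the audio track's text is "intelligently identified", so the sketch below is one possible reading of that step: it assumes Apple's Speech framework (SFSpeechRecognizer) for recognition and reduces the capture wiring to appending sample buffers; the class name, locale and matching rule are illustrative.

```swift
import Speech
import AVFoundation

// Sketch of the "display by speech rate" mode: recognised text is matched against
// the lines to be displayed, and the target line advances as each line is read.
final class SpeechRatePrompter {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN"))
    private let request = SFSpeechAudioBufferRecognitionRequest()
    private var task: SFSpeechRecognitionTask?
    private let lines: [String]            // lines to be displayed, in order
    private var currentIndex = 0
    var onTargetLine: ((String) -> Void)?  // called when the next line becomes the target

    init(lines: [String]) {
        self.lines = lines
        request.shouldReportPartialResults = true
    }

    func start() {
        task = recognizer?.recognitionTask(with: request) { [weak self] result, _ in
            guard let self = self,
                  let text = result?.bestTranscription.formattedString else { return }
            self.match(recognisedText: text)
        }
    }

    // Feed audio sample buffers from the capture delegate (e.g. AVCaptureAudioDataOutput).
    func append(_ sampleBuffer: CMSampleBuffer) {
        request.appendAudioSampleBuffer(sampleBuffer)
    }

    // When the recognised text contains the current line, scroll to the next one.
    private func match(recognisedText: String) {
        guard currentIndex < lines.count else { return }
        if recognisedText.contains(lines[currentIndex]) {
            currentIndex += 1
            if currentIndex < lines.count {
                onTargetLine?(lines[currentIndex])
            }
        }
    }
}
```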
Step 207, obtaining face information of a user, and calculating the display size of the target display lines according to the face information;
in practical application, in order to ensure normal reading of the speech-lines by the user, the size of the speech-lines needs to meet the requirement that the user can see clearly under different distances. The size of the lines needs to be changed following the change in the distance between the user and the video capture interface.
In a specific implementation, the display size of the target display lines can be calculated by acquiring the face information of the user and applying the mapping between the face information and the font size.
In one example, step 207 may include the following sub-steps:
s31, acquiring the face information of the user;
s32, extracting a binocular distance from the face information;
s33, calculating the relative distance between the user face and the video shooting interface by adopting the distance between the two eyes and preset distance reference data;
and S34, calculating the display size of the target display lines by using the relative distance and preset character size reference data.
In a specific implementation, as shown in fig. 5, after the user taps to start shooting and triggers the option to adjust the displayed line size automatically, the system can extract the user's face information (for example, with the haarcascade_frontalface_alt2 cascade in OpenCV), extract the user's binocular distance from it, and compare that distance with preset distance reference data (for example, 180 pixels at 0.5 m; the farther away the user is, the smaller the measured binocular distance) to calculate the relative distance between the user's face and the video shooting interface. The font size of the target display lines is then calculated from preset character size reference data (for example, a 20-point font at 0.5 m): the greater the distance, the larger the font. This determines the display size of the target display lines, so that as the distance between the user's face and the video shooting interface changes, the display size of the target display lines changes with it. For example, if the user's binocular distance is 90 pixels, the relative distance is 180/90 × 0.5 = 1 m, and the font size is 1/0.5 × 20 = 40 points.
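The arithmetic in this worked example can be written down directly; the sketch below uses the example reference values from the text (180 px and a 20-point font at 0.5 m), with the type and function names assumed for illustration.

```swift
import CoreGraphics

// Sketch of the distance-to-font-size mapping from the worked example above.
struct FontSizeCalculator {
    let referenceEyeDistancePx: CGFloat = 180   // binocular distance at the reference distance
    let referenceDistanceM: CGFloat = 0.5       // reference distance in metres
    let referenceFontSize: CGFloat = 20         // point size at the reference distance

    // Relative distance grows as the measured binocular distance shrinks.
    func relativeDistance(eyeDistancePx: CGFloat) -> CGFloat {
        return referenceEyeDistancePx / eyeDistancePx * referenceDistanceM
    }

    // Font size scales linearly with relative distance.
    func fontSize(forEyeDistancePx eyeDistancePx: CGFloat) -> CGFloat {
        let distance = relativeDistance(eyeDistancePx: eyeDistancePx)
        return distance / referenceDistanceM * referenceFontSize
    }
}

// Example from the text: 90 px between the eyes -> 1 m away -> 40-point font.
// let size = FontSizeCalculator().fontSize(forEyeDistancePx: 90)   // 40
```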
In another example, as shown in fig. 5, the user may also adjust the display size of the target display lines manually, for example by setting the line text size relative to the screen so that 2, 4, 6 or 8 characters fit across it. The width of the current shooting device is obtained first, then divided by the number of characters to be displayed to get the width of each character, and that width is converted to the corresponding font size. In this way the user can set the character size freely, and the lines remain legible even when the shooting distance is long.
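A minimal sketch of this manual option follows; the function name is an assumption, and converting character width directly to point size is a simplification of the conversion the text describes.

```swift
import UIKit

// Sketch of the manual sizing option: the line text is sized so that a chosen
// number of characters (2, 4, 6 or 8) fills the screen width.
func manualFontSize(charactersPerLine: Int) -> CGFloat {
    let screenWidth = UIScreen.main.bounds.width              // width of the current device
    let characterWidth = screenWidth / CGFloat(charactersPerLine)
    return characterWidth                                      // treat character width as the font size
}
```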
Step 208, displaying the target display lines in the video shooting interface according to the display size.
In the embodiment of the invention, once the target display lines for the current moment and their display size are obtained, the target lines can be displayed in the video shooting interface to remind the user which lines need to be spoken.
The invention allows the display lines of the video to be customised through the control file, making line setup more flexible; the size of the lines can be adjusted automatically by recognising the user's face information, so the user never fails to finish reading a line simply because it is too small to see at a distance; in addition, the target display lines can be located by selecting a display mode, which prevents the user from being distracted by other lines, keeps the lines from being misread, and keeps the shooting rhythm undisturbed.
For ease of understanding, the following description of embodiments of the present invention by way of specific examples is provided:
referring to fig. 6, fig. 6 is a flowchart of a video speech-line display method according to an embodiment of the present invention.
As shown in fig. 6, when the video to be shot comprises several shots (storyboard segments), the shooting interface for each shot is entered in turn and the control file is initialised; the lines to be displayed are then obtained from the control file.
When a line modification operation is triggered, the lines can be modified. The modification may change either the content or the display time of a line: content changes can be made by matching against an intelligent phrase library to determine the revised wording, after which the control file is updated accordingly, and display-time changes likewise update the control file. Once the modification is complete, shooting can continue.
During shooting, the user may select a line display mode, which may specifically include: displaying the lines along the shooting timeline, or displaying the lines according to the user's speech rate.
In the shooting-timeline mode, a timer callback is matched against the start time of each line, so the target display line at each moment is determined. In the speech-rate mode, the user's audio track is extracted, the text in it is recognised and matched against the lines to be displayed, and the target display line is determined from the match.
During shooting, the user's face information can also be extracted in real time, the binocular distance extracted from it, and the relative distance between the user and the video shooting interface calculated, so that the display size of the target display lines is adjusted according to that distance.
Referring to fig. 7, fig. 7 is a block diagram of a video speech-line display device according to an embodiment of the present invention.
An embodiment of the present invention further provides a video speech display device, including:
a preset control file obtaining module 701, configured to obtain a preset control file when entering a video shooting interface;
a to-be-displayed speech generation module 702, configured to generate a to-be-displayed speech based on the control file;
a display mode obtaining module 703, configured to obtain a display mode of the to-be-displayed speech;
a target display speech determining module 704, configured to determine a target display speech in the to-be-displayed speech according to the display mode;
a display size calculation module 705, configured to obtain face information of a user, and calculate a display size of the target display lines according to the face information;
and a target display speech-line display module 706, configured to display the target display speech-line in the video shooting interface according to the display size.
In this embodiment of the present invention, the to-be-displayed speech generating module 702 includes:
the first to-be-displayed speech generation sub-module is used for generating speech to be displayed by adopting the control file when no speech modification operation triggered by a user is detected in the video shooting process;
the modification control file generation sub-module is used for modifying the control file according to the speech modification operation when detecting the speech modification operation triggered by the user in the video shooting process to generate a modification control file;
and the second to-be-displayed speech generation submodule is used for generating the to-be-displayed speech by adopting the modified control file.
In the embodiment of the present invention, the method further includes:
a modification starting time point determining submodule for determining a modification starting time point according to the modification control file;
the first target video clip intercepting submodule is used for intercepting a first target video clip in the shot video according to the modification starting time point;
a second target video segment generation sub-module for generating a second target video segment when video shooting is continued based on the modification start time point;
and the target video generation sub-module is used for generating a target video by adopting the first target video clip and the second target video clip.
In this embodiment of the present invention, the target display speech determining module 704 includes:
the shooting time recording submodule is used for recording the shooting time through a preset timer when the display mode is displaying according to a shooting time axis;
the starting time point acquisition submodule is used for acquiring the starting time point of the to-be-displayed speech in the control file when the shooting time meets timing callback time;
and the target display speech determining submodule is used for matching the timing callback time with the starting time point and determining a target display speech in the speech to be displayed.
In this embodiment of the present invention, the target display speech determining module 704 includes:
the audio track extraction sub-module is used for extracting the audio track of the current shooting video when the display mode is that the corresponding speech-sounds are displayed according to the user speech speed;
the character information identification submodule is used for identifying the character information of the audio track;
the character information matching sub-module is used for matching the character information in the lines to be displayed;
and the target display line determining submodule is used for determining a target display line in the lines to be displayed according to the matching result.
In this embodiment of the present invention, the display size calculating module 705 includes:
the face information acquisition submodule is used for acquiring face information of a user;
the binocular distance extraction submodule is used for extracting binocular distance from the face information;
the relative distance calculation submodule is used for calculating the relative distance between the face of the user and the video shooting interface by adopting the binocular distance and preset distance reference data;
and the display size calculation submodule is used for calculating the display size of the target display lines by adopting the relative distance and preset character size reference data.
An embodiment of the present invention further provides an electronic device, where the device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the video speech-line display method according to the embodiment of the invention according to the instructions in the program codes.
The embodiment of the invention also provides a computer-readable storage medium, which is used for storing a program code, and the program code is used for executing the video speech-line display method in the embodiment of the invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for displaying video lines, comprising:
when entering a video shooting interface, acquiring a preset control file;
generating lines to be displayed based on the control file;
acquiring a display mode of the lines to be displayed;
determining target display lines in the lines to be displayed according to the display mode;
acquiring face information of a user, and calculating the display size of the target display lines according to the face information;
displaying the target display lines in the video shooting interface according to the display size;
wherein, the step of generating the lines to be displayed based on the control file comprises the following steps:
when the line-of-speech modification operation triggered by the user is not detected in the video shooting process, generating a line-of-speech to be displayed by adopting the control file; the speech modification operation comprises the steps of modifying speech and modifying the occurrence time point of the speech;
when a speech modification operation triggered by a user is detected in the video shooting process, modifying the control file according to the speech modification operation to generate a modified control file;
generating a line to be displayed by adopting the modified control file;
wherein, the step of determining a target display speech in the speech to be displayed according to the display mode comprises:
when the display mode is that the corresponding lines are displayed according to the user speech rate, extracting the audio track of the current shot video;
identifying textual information for the audio track;
matching the character information in the lines to be displayed;
and determining a target display speech in the speech to be displayed according to the matching result.
2. The method of claim 1, further comprising:
determining a modification starting time point according to the modification control file;
intercepting a first target video clip in the shot video according to the modification starting time point;
generating a second target video clip when video shooting is continued based on the modified starting time point;
and generating a target video by adopting the first target video segment and the second target video segment.
3. The method according to claim 1, wherein the step of determining a target display line among the lines to be displayed according to the display mode comprises:
when the display mode is displaying according to a shooting time axis, recording shooting time through a preset timer;
when the shooting time meets timing callback time, acquiring a starting time point of the to-be-displayed speech;
and matching the timing callback time with the starting time point, and determining a target display speech in the speech to be displayed.
4. The method according to any one of claims 1 to 3, wherein the step of acquiring face information of the user and calculating the display size of the target display lines according to the face information comprises:
acquiring face information of a user;
extracting a binocular distance from the face information;
calculating the relative distance between the face of the user and the video shooting interface by adopting the distance between the two eyes and preset distance reference data;
and calculating the display size of the target display lines by adopting the relative distance and preset character size reference data.
5. A video speech display apparatus, comprising:
the device comprises a preset control file acquisition module, a video shooting module and a video processing module, wherein the preset control file acquisition module is used for acquiring a preset control file when entering a video shooting interface;
the to-be-displayed speech generation module is used for generating the to-be-displayed speech based on the control file;
the display mode acquisition module is used for acquiring the display mode of the lines to be displayed;
the target display lines determining module is used for determining target display lines in the lines to be displayed according to the display mode;
the display size calculation module is used for acquiring the face information of the user and calculating the display size of the target display lines according to the face information;
the target display line display module is used for displaying the target display line in the video shooting interface according to the display size;
the to-be-displayed line generation module comprises:
the first to-be-displayed speech generation sub-module is used for generating speech to be displayed by adopting the control file when no speech modification operation triggered by a user is detected in the video shooting process; the speech modification operation comprises the steps of modifying speech and modifying the occurrence time point of the speech;
the modification control file generation sub-module is used for modifying the control file according to the speech modification operation when detecting the speech modification operation triggered by the user in the video shooting process to generate a modification control file;
the second to-be-displayed speech generation submodule is used for generating the to-be-displayed speech by adopting the modification control file;
wherein, the target display speech determining module comprises:
the audio track extraction sub-module is used for extracting the audio track of the current shooting video when the display mode is that the corresponding speech-sounds are displayed according to the user speech speed;
the character information identification submodule is used for identifying the character information of the audio track;
the character information matching sub-module is used for matching the character information in the lines to be displayed;
and the target display line determining submodule is used for determining a target display line in the lines to be displayed according to the matching result.
6. The apparatus of claim 5, wherein the display size calculation module comprises:
the face information acquisition submodule is used for acquiring face information of a user;
the binocular distance extraction submodule is used for extracting binocular distance from the face information;
the relative distance calculation submodule is used for calculating the relative distance between the face of the user and the video shooting interface by adopting the binocular distance and preset distance reference data;
and the display size calculation submodule is used for calculating the display size of the target display lines by adopting the relative distance and preset character size reference data.
7. An electronic device, wherein the device comprises a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the video speech presentation method according to any one of claims 1-4 according to instructions in the program code.
8. A computer-readable storage medium for storing a program code for executing the video-speech-line display method according to any one of claims 1 to 4.
CN202011600007.9A 2020-12-29 2020-12-29 Video speech display method and device, electronic equipment and storage medium Active CN112714254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011600007.9A CN112714254B (en) 2020-12-29 2020-12-29 Video speech display method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011600007.9A CN112714254B (en) 2020-12-29 2020-12-29 Video speech display method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112714254A CN112714254A (en) 2021-04-27
CN112714254B true CN112714254B (en) 2022-06-14

Family

ID=75546757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011600007.9A Active CN112714254B (en) 2020-12-29 2020-12-29 Video speech display method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112714254B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800302A (en) * 2011-05-25 2012-11-28 联想移动通信科技有限公司 Method for adjusting resolution of display screen by terminal equipment, and terminal equipment
CN105653032A (en) * 2015-12-29 2016-06-08 小米科技有限责任公司 Display adjustment method and apparatus
CN206657346U (en) * 2017-04-18 2017-11-21 苏州科技大学 A kind of display screen font size automatic adjustment system
CN110784762A (en) * 2019-08-21 2020-02-11 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and storage medium
CN111372119A (en) * 2020-04-17 2020-07-03 维沃移动通信有限公司 Multimedia data recording method and device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7591558B2 (en) * 2006-05-31 2009-09-22 Sony Ericsson Mobile Communications Ab Display based on eye information

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800302A (en) * 2011-05-25 2012-11-28 联想移动通信科技有限公司 Method for adjusting resolution of display screen by terminal equipment, and terminal equipment
CN105653032A (en) * 2015-12-29 2016-06-08 小米科技有限责任公司 Display adjustment method and apparatus
CN206657346U (en) * 2017-04-18 2017-11-21 苏州科技大学 A kind of display screen font size automatic adjustment system
CN110784762A (en) * 2019-08-21 2020-02-11 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and storage medium
CN111372119A (en) * 2020-04-17 2020-07-03 维沃移动通信有限公司 Multimedia data recording method and device and electronic equipment

Also Published As

Publication number Publication date
CN112714254A (en) 2021-04-27

Similar Documents

Publication Publication Date Title
CN108900902B (en) Method, device, terminal equipment and storage medium for determining video background music
US9799375B2 (en) Method and device for adjusting playback progress of video file
CN108900771B (en) Video processing method and device, terminal equipment and storage medium
CN107369462B (en) Electronic book voice playing method and device and terminal equipment
US11869508B2 (en) Systems and methods for capturing, processing, and rendering one or more context-aware moment-associating elements
EP4068793A1 (en) Video editing method, video editing apparatus, terminal, and readable storage medium
US11657822B2 (en) Systems and methods for processing and presenting conversations
CN113194255A (en) Shooting method and device and electronic equipment
CN111079494B (en) Learning content pushing method and electronic equipment
US12041313B2 (en) Data processing method and apparatus, device, and medium
CN112437353A (en) Video processing method, video processing apparatus, electronic device, and readable storage medium
CN111586490A (en) Multimedia interaction method, device, equipment and storage medium
CN113709545A (en) Video processing method and device, computer equipment and storage medium
CN111026786A (en) Dictation list generation method and family education equipment
CN111026901A (en) Learning content searching method and learning equipment
CN113992973A (en) Video abstract generation method and device, electronic equipment and storage medium
CN112714254B (en) Video speech display method and device, electronic equipment and storage medium
CN112019936B (en) Method, device, storage medium and computer equipment for controlling video playing
CN111081088A (en) Dictation word receiving and recording method and electronic equipment
KR101783872B1 (en) Video Search System and Method thereof
CN114863448A (en) Answer statistical method, device, equipment and storage medium
CN111027317A (en) Control method for dictation and reading progress and electronic equipment
CN111090738A (en) Double-screen-based photographing question searching method and electronic equipment
CN114494951B (en) Video processing method, device, electronic equipment and storage medium
CN111079498A (en) Learning function switching method based on mouth shape recognition and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant