WO2024131585A1 - Video special effect display method and apparatus, electronic device, and storage medium - Google Patents

Video special effect display method and apparatus, electronic device, and storage medium

Info

Publication number
WO2024131585A1
Authority
WO - WIPO (PCT)
Prior art keywords
target, video, text, map, moving
Application number
PCT/CN2023/137977
Other languages
English (en), French (fr)
Inventor
吴燊, 廖昀昊
Original Assignee
北京字跳网络技术有限公司
Application filed by 北京字跳网络技术有限公司


  • the embodiments of the present disclosure relate to the field of Internet technology, and in particular to a method, device, electronic device and storage medium for displaying video special effects.
  • a user contribution function interface is provided for users to shoot and edit videos within the function interface, including adding video special effects to the captured target video.
  • after a video special effect is added, corresponding types of special effect maps, such as fireworks effects and lighting effects, are generated in the target video, so that the target video has better visual expressiveness.
  • the embodiments of the present disclosure provide a video special effect display method, device, electronic device and storage medium to overcome the problems of poor special effect visual effects and low interactivity.
  • an embodiment of the present disclosure provides a video special effects display method, comprising:
  • a target video is obtained, and a user voice in the target video is extracted; based on the user voice in the target video, at least one target text corresponding to the content of the user voice is generated; a target map corresponding to the target text is displayed word by word in the target video, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
  • an embodiment of the present disclosure provides a video special effects display device, including:
  • a voice module used to obtain a target video and extract the user's voice in the target video
  • a processing module configured to generate at least one target text corresponding to the content of the user voice according to the user voice in the target video;
  • a display module is used to display a target map corresponding to the target text in the target video word by word, wherein the target map is centered on the target area in the target video and moves outward along a target trajectory.
  • an electronic device including:
  • a processor and a memory communicatively connected to the processor
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory to implement the video special effects display method described in the first aspect and various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium, in which computer-executable instructions are stored.
  • when a processor executes the computer-executable instructions, the video special effects display method described in the first aspect and various possible designs of the first aspect is implemented.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the video special effects display method as described in the first aspect and various possible designs of the first aspect.
  • FIG. 1 is a diagram showing an application scenario of a video special effects display method provided by an embodiment of the present disclosure;
  • FIG. 2 is a flow chart of a method for displaying video special effects according to an embodiment of the present disclosure;
  • FIG. 3 is a schematic diagram of a specific implementation process of step S102 in the embodiment shown in FIG. 2;
  • FIG. 4 is a schematic diagram of a target map moving in a target video provided by an embodiment of the present disclosure;
  • FIG. 5 is a schematic diagram of a specific implementation process of step S103 in the embodiment shown in FIG. 2;
  • FIG. 6 is a schematic diagram of another target map moving in a target video provided by an embodiment of the present disclosure;
  • FIG. 7 is a second flow chart of a method for displaying video special effects provided by an embodiment of the present disclosure;
  • FIG. 8 is a schematic diagram of a specific implementation process of step S203 in the embodiment shown in FIG. 7;
  • FIG. 9 is a schematic diagram of a target area provided by an embodiment of the present disclosure;
  • FIG. 10 is a schematic diagram of a specific implementation process of step S2032 in the embodiment shown in FIG. 8;
  • FIG. 11 is a schematic diagram of the distribution of moving starting points provided by an embodiment of the present disclosure;
  • FIG. 12 is a schematic diagram of a specific implementation process of step S204 in the embodiment shown in FIG. 7;
  • FIG. 13 is a schematic diagram of a target deflection angle provided by an embodiment of the present disclosure;
  • FIG. 14 is a schematic diagram of a specific implementation process of step S205 in the embodiment shown in FIG. 7;
  • FIG. 15 is a structural block diagram of a video special effects display device provided by an embodiment of the present disclosure;
  • FIG. 16 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure;
  • FIG. 17 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present disclosure.
  • user information, including but not limited to user device information, user personal information, etc.
  • data, including but not limited to data used for analysis, stored data, displayed data, etc.
  • FIG. 1 is an application scenario diagram of the video special effects display method provided by the embodiment of the present disclosure.
  • the video special effects display method provided by the embodiment of the present disclosure can be applied to application scenarios such as video editing and video live broadcasting.
  • the method provided by the embodiment of the present disclosure can be applied to a terminal device, such as a smart phone, in which a video application is running.
  • the target application provides a video special effects control or function button, shown as "Special Effect Control #1" in the figure.
  • after the user triggers the control, the terminal device starts the camera page to shoot a video, performs real-time special effects processing according to the content of the captured video image, and generates the corresponding target special effects in real time in the video image.
  • an output video with video special effects is generated.
  • the output video is saved on the server. Users can save, forward or share this output video with video special effects to achieve the purpose of video generation and publishing.
  • the video special effects in the prior art are usually generated based on the image information in the target video and cannot be associated with the user's voice, resulting in problems such as poor special effects visual effects and low interactivity.
  • the disclosed embodiment provides a video special effects display method to solve the above-mentioned problem.
  • FIG. 2 is a flow chart of a video special effect display method provided by an embodiment of the present disclosure.
  • the method of this embodiment can be applied in a terminal device, and the video special effect display method includes:
  • Step S101: Acquire a target video and extract the user voice in the target video.
  • Step S102: Generate at least one target text corresponding to the content of the user voice according to the user voice in the target video.
  • the user triggers the special effect control in the target application by operating the terminal device, and then starts the camera page to shoot a video, thereby obtaining the target video.
  • the target video can be a portrait video shot by the user, that is, a video containing the facial image of the target user, wherein the target user is the user who makes a user voice. More specifically, for example, the content of the target video is a "New Year's greeting video": in short, the target user says a sentence of New Year's greetings while the terminal device is aimed at the target user to shoot a video, thereby obtaining the target video in which the target user appears. This case is shown in FIG. 1.
  • the target video is a non-portrait video, that is, the camera unit of the terminal device is not aimed at the target user to shoot, and the target user who makes a user voice does not appear in the target video, but only the user voice made by the target user is recorded.
  • the target video shot by the terminal device contains the user voice made by the target user; the audio channel data in the target video is then extracted to obtain the user voice. Further, after the user voice is obtained, it is recognized, and at least one target text corresponding to the content of the user voice can be obtained.
  • the target video may be a part of a complete video captured by the terminal device.
  • after the terminal device starts the camera page, it continues to capture video (for example, for a total of 30 seconds). During this process, the terminal device obtains a video clip of a preset duration (for example, 1 second), and this video clip can serve as the target video.
  • the terminal device processes the video clip (the target video) of the preset duration to obtain the corresponding target text and displays it with special effects.
  • in other words, in addition to the user voice in the target video, the user voice corresponding to one or more video clips adjacent to the target video can also be referred to, to jointly generate the target text. This process is not repeated here.
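  • as a rough sketch of this clip-based processing (not the patented implementation), the loop below assumes a hypothetical capture object exposing a read_clip() method and a process_clip() callback standing in for the recognition-and-rendering pipeline described next; neither name comes from this disclosure:

    # Hedged sketch: pull preset-duration clips from an ongoing capture and
    # hand each clip (the target video) to the recognition/rendering pipeline.
    CLIP_SECONDS = 1.0   # preset clip duration (example value from the text)

    def run_capture(capture, process_clip, total_seconds=30.0):
        elapsed = 0.0
        while elapsed < total_seconds:
            clip = capture.read_clip(CLIP_SECONDS)  # hypothetical capture API
            process_clip(clip)   # extract voice, generate text, render maps
            elapsed += CLIP_SECONDS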
  • step S102 includes:
  • Step S1021: Perform voice recognition on the user voice to obtain a corresponding voice text, which includes at least one alternative text.
  • Step S1022: Detect the alternative characters in the voice text, and when an alternative character is a preset first keyword, determine the alternative character as a target character.
  • the user voice is subjected to voice recognition to obtain the voice text corresponding to the voice content of the user voice.
  • voice recognition is prior art known to those skilled in the art and will not be repeated here.
  • the voice text includes one or more alternative characters, which are then detected. If an alternative character is a preset first keyword, it is extracted as a target character; if it is not a first keyword, it is ignored and not processed. Specifically, for example, after voice recognition is performed on the user voice, the voice text obtained is "I wish you all a happy New Year".
  • each Chinese character is an alternative character, that is, there are 8 alternative characters in total.
  • each alternative character is further detected against the first keyword, wherein the first keyword includes, for example, "Happy New Year", so the 4 alternative characters "Happy New Year" among the 8 alternative characters are used as the target characters.
  • the voice text generated by the user's voice is further screened and the keyword text therein is extracted as the target text, thereby improving the information display efficiency of the text special effects, reducing the display of useless and low-information text, reducing the display density of text maps, and improving the display effect of video special effects.
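  • as an illustrative sketch only (the preset keyword list and the example voice text are assumptions, not part of this disclosure), the screening of steps S1021-S1022 could look like:

    # Hedged sketch of keyword screening: keep only the characters of the
    # recognized voice text that fall inside a preset first keyword.
    FIRST_KEYWORDS = ["新年快乐"]   # hypothetical preset first keyword(s)

    def extract_target_characters(voice_text: str) -> list:
        targets = []
        for keyword in FIRST_KEYWORDS:
            start = voice_text.find(keyword)
            if start != -1:
                # each matched character becomes one target character,
                # kept in order for the word-by-word display that follows
                targets.extend(voice_text[start:start + len(keyword)])
        return targets

    # e.g. a voice text of "祝大家新年快乐" yields ['新', '年', '快', '乐']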
  • Step S103: Display a target map corresponding to the target text word by word in the target video, wherein the target map is centered on the target area in the target video and moves outward along the target trajectory.
  • after the terminal device obtains the target video through the camera unit, it plays the target video in real time and simultaneously converts the target text obtained in the previous step into the corresponding target map and renders it into the target video image.
  • in this way, the special effects of the target video are generated.
  • the target map is rendered into the target video in a word-by-word display manner. This process can be achieved by inputting each target text into a processing queue, and rendering it into a target map in sequence according to the processing queue for display. The specific implementation process will not be repeated.
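  • though the specific implementation is not repeated here, a minimal illustrative sketch of this queue-driven, word-by-word display (the frame interval and the spawn_map() callback are assumptions) might look like:

    # Hedged sketch: queue target characters and pop one per fixed frame
    # interval, turning each popped character into an on-screen target map.
    from collections import deque

    class WordByWordRenderer:
        def __init__(self, interval_frames=15):
            self.queue = deque()            # pending target characters
            self.interval = interval_frames
            self.frame_count = 0

        def push_text(self, characters):
            self.queue.extend(characters)

        def on_frame(self, spawn_map):
            # called once per rendered video frame
            self.frame_count += 1
            if self.queue and self.frame_count % self.interval == 0:
                spawn_map(self.queue.popleft())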
  • the target map is synchronously controlled to move from the target area of the target video to the outside of the target area, forming a visual motion effect of the target map.
  • FIG. 4 is a schematic diagram of a target map moving in a target video provided by an embodiment of the present disclosure.
  • the target user who makes a user voice does not appear in the target video, that is, the target video does not include the user's facial image of the target user.
  • the target area is, for example, a circular area with the center of the target video as the origin.
  • the target map corresponding to the target text appears from the target area and moves around the outside of the target area, and gradually approaches the edge of the target video.
  • the target map corresponding to the target word "new" moves to the upper left position in the target video; the target map corresponding to the target word "year" moves to the lower left position in the target video; the target map corresponding to the target word "fast" moves to the upper right position in the target video; the target map corresponding to the target word "happy" moves to the lower right position in the target video.
  • the target map moves along the target path during the movement process.
  • the target path can be generated before the target map moves. For example, according to the generation position of each target word, the corresponding target path is generated, and then the target map is controlled to move along the pre-generated target path.
  • the target path can be a straight path, that is, a path moving along a straight line to the outside of the target area; it can also be a curved path, such as a path that surrounds the target area and makes a curved motion to the outside of the target area.
  • the target path can be randomly generated or determined according to a preset function.
  • the target path is not determined before the target map moves, and is generated after the target map starts to move.
  • the target map can be generated in the target area in the target video and move based on the corresponding moving direction.
  • the appearance position and moving direction of the target map can be randomly generated or determined according to a preset function, which is not limited here.
  • the target video is a video containing a user's facial image, that is, a target user who speaks a user voice appears in the target video.
  • the target area is the mouth area in the user's facial image, and the target area is determined based on feature recognition of the user's facial image in the target video.
  • the specific process is prior art and will not be described in detail. That is, the target map corresponding to the target text moves from the mouth area of the target user in the target video to the surrounding areas.
  • the specific implementation of step S103 includes:
  • Step S1031: Determine the user's facial orientation based on the user's facial image in the target video.
  • Step S1032: While playing the target video, display the target map, and control the target map to move outward along the facial orientation starting from the mouth area.
  • the user's facial image can be one or more video frames in the target video.
  • by processing the user's facial image, the normal vector of the spatial plane corresponding to the face of the target user, in the three-dimensional camera space corresponding to the target video, can be obtained; this normal vector is the facial orientation.
  • the specific implementation method of obtaining the facial orientation of a character in a video based on the video is prior art known to those skilled in the art and will not be repeated here.
  • the target map is controlled to move along the facial orientation with any point in the mouth area as the starting point of movement to achieve the movement of the target map.
  • FIG. 6 is a schematic diagram of another target map moving in the target video provided by an embodiment of the present disclosure.
  • as shown in FIG. 6, the mouth area is determined to be Z, and the facial orientation is determined to be V.
  • the moving starting point of each target map is randomly generated in the mouth area Z, and an angle offset is randomly added on the basis of the facial orientation V to obtain the moving direction of each target map.
  • based on the moving starting point and the moving direction, each target map is controlled to move.
  • for example, the target map corresponding to one target character has a corresponding moving starting point Z_2 and a moving direction V_2, thereby achieving the outward movement of the target map.
  • the moving starting point and moving direction of the target map are determined according to the mouth area and facial orientation of the target character in the target video, so that the movement of the target map matches the facial shape of the character in the target video, forming a realistic visual effect of "text jumping out of the user's mouth", and improving the visual expressiveness of the special effects.
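  • a hedged 2D sketch of this start-and-direction assignment follows (the circular mouth model, the offset bound, and the face-tracking inputs are all assumptions for illustration, not the patented implementation):

    # Hedged sketch: spawn each map at a random point inside mouth region Z
    # and move it along facial orientation V plus a small random angle offset.
    import math
    import random

    def spawn_motion(mouth_center, mouth_radius, facial_angle_rad,
                     max_offset_deg=15.0):
        # random starting point inside the circular mouth region
        r = mouth_radius * math.sqrt(random.random())   # uniform over the disk
        theta = random.uniform(0.0, 2.0 * math.pi)
        start = (mouth_center[0] + r * math.cos(theta),
                 mouth_center[1] + r * math.sin(theta))
        # facial orientation V perturbed by a random angular offset
        offset = math.radians(random.uniform(-max_offset_deg, max_offset_deg))
        return start, facial_angle_rad + offset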
  • a target video is obtained, and at least one target text corresponding to the content of the user voice is generated according to the user voice in the target video; the target video is played, and a target map corresponding to the target text is displayed word by word in the target video, wherein the target map is centered on the target area in the target video and moves toward the edge of the target video.
  • a target map corresponding to the target text is generated and displayed, thereby realizing a visualization special effect of the user voice.
  • the target map is controlled, in a word-by-word dynamic display mode, to move outward with the target area in the target video as the center, thereby improving the visual display effect of the visualization special effect and increasing the interactivity between the video special effect and the target video.
  • FIG. 7 is a second flow chart of a method for displaying video special effects provided by an embodiment of the present disclosure. Based on the embodiment shown in FIG. 2, this embodiment further refines step S102, and the method for displaying video special effects includes:
  • Step S201: Acquire and play the target video.
  • Step S202: Generate at least one target text corresponding to the content of the user voice according to the user voice in the target video.
  • Step S203: Obtain the moving starting point of the target map within the target area.
  • a moving starting point corresponding to the target map is randomly generated.
  • the target area is an annular area, and the specific implementation of step S203 includes:
  • Step S2031: Obtain the inner diameter length and the outer diameter length corresponding to the target area.
  • Step S2032: Based on the inner diameter length and the outer diameter length, randomly obtain a target radius, the length of which is between the inner diameter length and the outer diameter length.
  • Step S2033: Generate the moving starting point according to the target radius and a pre-generated target angle.
  • FIG. 9 is a schematic diagram of a target area provided by an embodiment of the present disclosure.
  • the target area is an annular area, which is surrounded by an inner ring C1 with a smaller radius and an outer ring C2 with a larger radius.
  • the area between the inner ring C1 and the outer ring C2 is the target area (annular area).
  • the target video is a video containing a user's facial image, and the target area can be determined based on the user's mouth area in the user's facial image.
  • the center point corresponding to the user's mouth contour is determined, and then based on the center point, the inner ring and the outer ring are determined using the preset first radius and second radius, thereby determining the target area.
  • the first radius is, for example, the inner diameter length
  • the second radius is, for example, the outer diameter length.
  • first, a target radius is randomly determined within the length interval formed by the inner diameter length and the outer diameter length of the target area.
  • for example, if the inner diameter length is 10 (preset unit, the same below), a target radius, for example 12, is randomly determined between the inner diameter length and the outer diameter length.
  • then, a target angle is randomly generated within a preset angle range, for example, 0 to 2π.
  • based on the target radius and the target angle, a point can be uniquely determined in the target area; this point is the moving starting point.
  • the moving starting point is determined by a target radius randomly determined within the annular area and a target angle randomly generated, wherein, since the target radius is generated based on an annular area with a preset inner diameter length and an outer diameter length, the value range of the target radius is controlled, and the target radius is within a reasonable range. Due to the limitation of the inner ring radius, the moving starting point will not be too close to the center point of the circular area (i.e., the mouth area), thereby reducing the overlap of the target map during the movement process and improving the visual effect of the text special effect.
  • step S2032 includes:
  • Step S2032A: Square the inner diameter length and the outer diameter length to obtain the corresponding square inner diameter length and square outer diameter length, respectively.
  • Step S2032B: Obtain a square value interval based on the square inner diameter length and the square outer diameter length, and randomly obtain a square radius value within the square value interval.
  • Step S2032C: Calculate the square root of the square radius value to obtain the target radius.
  • the inner diameter length is 1 and the outer diameter length is 10
  • the inner diameter length and the outer diameter length are squared, and the corresponding square inner diameter length is 1 and the square outer diameter length is 100.
  • afterwards, the arithmetic square root of the square radius value is calculated to obtain the target radius. For example, if the square radius value is 81, the corresponding target radius is 9.
  • FIG. 11 is a schematic diagram of the distribution of moving starting points provided by an embodiment of the present disclosure.
  • the distribution of the moving starting points is proportional to the square of the radius, that is, the larger the radius, the greater the probability of a moving starting point appearing there.
  • in this way, the randomly obtained target radius is no longer linearly distributed: the points are sparser at positions with smaller radii and denser at positions with larger radii, making the distribution of the moving starting points over the area of the target region more even and therefore more reasonable.
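  • a minimal sketch of steps S2032A-S2032C follows (the 2D polar coordinates centered on the mouth region are an assumption for illustration): sampling the squared radius uniformly and taking its square root spreads the starting points uniformly over the annulus area instead of bunching them near the inner ring:

    # Hedged sketch: uniform sampling over an annular target area.
    import math
    import random

    def sample_annulus_point(inner_radius, outer_radius):
        # S2032A/S2032B: square both lengths and draw uniformly in between
        square_radius = random.uniform(inner_radius ** 2, outer_radius ** 2)
        # S2032C: the square root is a radius between the two lengths
        r = math.sqrt(square_radius)
        # S2033: pre-generated target angle in [0, 2*pi)
        theta = random.uniform(0.0, 2.0 * math.pi)
        return r * math.cos(theta), r * math.sin(theta)

    # consistent with the text: square radius 81 -> target radius 9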
  • Step S204: Obtain the moving direction corresponding to the moving starting point of the target map.
  • step S204 includes:
  • Step S2041: Determine a corresponding target deflection angle according to a first distance between the moving starting point and the edge of the target area.
  • the target deflection angle represents the angle at which the moving trajectory of the target map deflects toward the edge of the target video.
  • the target deflection angle is proportional to the first distance.
  • Step S2042: Obtain a target space angle, where the target space angle is the three-dimensional space angle corresponding to the normal vector at the moving starting point in the three-dimensional space plane of the camera where the target area is located.
  • Step S2043: Determine the moving direction according to the vector sum of the target space angle and the target deflection angle.
  • for each moving starting point, a corresponding deflection angle, i.e., the target deflection angle, is determined.
  • the target deflection angle can be set based on the first distance between the moving starting point and the edge of the target area, wherein the target deflection angle represents the angle at which the moving trajectory of the target map deflects toward the edge of the target video, and the target deflection angle corresponds to a preset value interval, such as [0,15], i.e., the target deflection angle takes values between 0 and 15 degrees.
  • the target area is an annular area, and the larger the first distance from the moving starting point to the outer edge of the target area, the smaller the target deflection angle; conversely, the smaller the first distance from the moving starting point to the outer edge of the target area, the larger the target deflection angle.
  • FIG. 13 is a schematic diagram of a target deflection angle provided by an embodiment of the present disclosure. As shown in FIG. 13, the distance between the moving starting point P1 and the outer edge of the target area is L1.
  • in other possible implementations, the corresponding target deflection angle can also be set by obtaining a second distance between the moving starting point and the inner edge of the target area: the larger the second distance between the moving starting point and the inner edge of the target area, the larger the target deflection angle; conversely, the smaller the second distance, the smaller the target deflection angle.
  • the target space angle corresponding to the moving starting point is obtained.
  • the target space angle represents the facial orientation or mouth orientation of the user in the target video; when the target user in the target video is facing the camera, the target space angle is, for example, 0 degrees; when the facial orientation of the target user in the target video does not change relative to the shooting direction of the camera unit of the terminal device, the target space angle is unchanged, that is, the target space angles corresponding to each moving starting point in the target area are consistent.
  • the target space angle is the three-dimensional space angle corresponding to the normal vector at the moving starting point in the three-dimensional space plane of the camera where the target area is located.
  • the target space angle can be obtained by parsing the user's facial image in the target video.
  • the specific implementation method can be found in the relevant introduction of the step of obtaining the user's facial orientation in the embodiment shown in FIG. 2, which will not be repeated here.
  • the vector sum of the target space angle and the target deflection angle is calculated to obtain the moving direction corresponding to the target map.
  • the movement of the target map is controlled based on the moving direction obtained by the vector sum of the target space angle and the target deflection angle. This can not only make the moving direction of the target map consistent with the orientation of the user's mouth and face, thereby improving authenticity and matching, but also reduce the probability of overlapping the moving trajectories of each target map, avoiding mutual occlusion between maps, thereby improving the visual performance of special effects and the display clarity of text information.
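  • reduced to a 2D angle sum for illustration (the proportional mapping and the [0, 15] degree bound follow the text; the normalization by a maximum distance is an assumption), the direction computation of steps S2041-S2043 could be sketched as:

    # Hedged sketch: moving direction = target space angle + deflection angle,
    # with the deflection proportional to the first distance.
    def moving_direction_deg(space_angle_deg, first_distance, max_distance,
                             max_deflection_deg=15.0):
        # S2041: deflection proportional to the first distance, in [0, 15]
        deflection = max_deflection_deg * min(first_distance / max_distance, 1.0)
        # S2042/S2043: combine with the target space angle (the vector sum,
        # reduced here to scalar angle addition in this 2D sketch)
        return space_angle_deg + deflection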
  • Step S205: After the target map is displayed at the moving starting point, control the target map to move based on the moving direction.
  • step S205 includes:
  • Step S2051: Obtain control parameters, where the control parameters are used to characterize the target condition for stopping display of the target map.
  • Step S2052: Control the target map to move toward the edge of the target video based on the control parameters until the target condition is reached.
  • the appearance position and the moving direction of the target map can be determined according to the moving starting point and the moving direction obtained above, but the end position of the target map is still uncertain. Therefore, the control parameters of the target map can be further obtained to determine the end position of the target map, that is, the target condition for stopping displaying the target map.
  • the control parameters include a moving duration and/or a moving distance; the moving duration represents the duration of the target map moving toward the edge of the target video; the moving distance represents the distance of the target map moving toward the edge of the target video. Specifically, for example, when the target map has moved in the moving direction for 3 seconds (moving duration), and/or when the distance the target map has moved in the moving direction reaches 100 unit distances (moving distance), the target map stops being displayed, the target map disappears from the target video, and the display process of the target map for the target text ends.
  • the moving duration and moving distance in the control parameters can be fixed preset values, or they can be random values obtained within the corresponding value ranges based on a preset moving duration value range and moving distance value range, i.e., a random moving duration and a random moving distance.
  • in this way, the moving speed of the target map during the movement, i.e., the ratio of the moving distance to the moving duration, also varies, so that the target maps corresponding to different target texts can present different movement distances and speeds, achieving a randomized motion effect and improving the visual expressiveness of the special effects.
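  • the stop condition of steps S2051-S2052 can be sketched as follows (the numeric ranges are hypothetical examples of the preset value ranges mentioned above):

    # Hedged sketch: a map stops displaying once its elapsed time exceeds the
    # moving duration or its travelled distance exceeds the moving distance.
    import random

    class MapMotion:
        def __init__(self):
            self.max_duration = random.uniform(2.0, 4.0)     # seconds (assumed range)
            self.max_distance = random.uniform(80.0, 120.0)  # unit distances (assumed)
            self.elapsed = 0.0
            self.travelled = 0.0

        def step(self, dt, speed):
            # advance one frame; returns False when the map should disappear
            self.elapsed += dt
            self.travelled += speed * dt
            return (self.elapsed < self.max_duration
                    and self.travelled < self.max_distance)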
  • Step S206: In the process of controlling the movement of the target map, set the text attributes corresponding to the target map based on the moving distance of the target map.
  • while the movement of the target map is being controlled, the text attributes corresponding to the target map can be updated simultaneously, so that the text form of the target map representing the target text changes; for example, as the moving distance of the target map increases, the transparency increases, the color gradually changes, etc., thereby further improving the visual expressiveness of the text special effect.
  • the text attributes include at least one of the following: font transparency, font color, font size.
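  • as an illustrative mapping only (the linear fade and the specific attribute choices are assumptions), the distance-driven attribute update could look like:

    # Hedged sketch: derive text attributes from normalized moving distance.
    def text_attributes(travelled, max_distance, base_size=32.0):
        t = min(travelled / max_distance, 1.0)   # progress in [0, 1]
        return {
            "font_alpha": 1.0 - t,                     # fade out while moving
            "font_size": base_size * (1.0 - 0.5 * t),  # shrink toward the edge
        }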
  • in one possible implementation, after step S201, the method further includes:
  • Step S207: Display prompt information corresponding to a second keyword in the target video.
  • Step S208: When the target text is a preset second keyword, display a target environment special effect corresponding to the second keyword in the target video, wherein the target environment special effect includes a music special effect and/or a map special effect corresponding to the second keyword.
  • a prompt message can be displayed in the camera interface to instruct the user to read out the second keyword corresponding to the prompt message, so as to guide the user to correctly use the video special effect.
  • the terminal device extracts the user voice from the obtained target video and recognizes it.
  • the obtained target text is then compared with the second keyword. If the target text is consistent with the second keyword, it means that the user has read out the second keyword indicated by the prompt information, and the target environment special effect corresponding to the second keyword is then displayed to further improve the visual expressiveness.
  • for example, the prompt information includes the text "Please read out loud [Happy New Year]".
  • when the target text extracted from the target video includes the second keyword "Happy New Year", the music corresponding to the second keyword "Happy New Year" is played, and sticker special effects, such as fireworks special effects, are displayed in the target video. This enables interaction with users, improves user participation, and improves the visual expressiveness of the special effects.
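  • a hedged sketch of this prompt-and-trigger flow follows (the callback names and the keyword constant are illustrative stand-ins, not APIs from this disclosure):

    # Hedged sketch: prompt the user, then fire environment effects when the
    # recognized target text contains the preset second keyword.
    SECOND_KEYWORD = "Happy New Year"   # assumed preset second keyword

    def prompt_user(show_prompt):
        show_prompt("Please read out loud [" + SECOND_KEYWORD + "]")  # step S207

    def on_recognized(target_text, play_music, show_sticker):
        if SECOND_KEYWORD in target_text:                             # step S208
            play_music(SECOND_KEYWORD)      # music special effect
            show_sticker("fireworks")       # sticker/map special effect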
  • step S201 in this embodiment is consistent with step S101 in the above embodiment; for details, please refer to the discussion of step S101, which will not be repeated here.
  • the video special effects display method, device, electronic device and storage medium provided in this embodiment obtain a target video and extract the user voice in the target video; generate at least one target text corresponding to the content of the user voice according to the user voice in the target video; and display a target map corresponding to the target text in the target video word by word, wherein the target map moves outward along a target trajectory with the target area in the target video as the center. Since the user voice in the target video is converted into the corresponding target text, a target map corresponding to the target text is generated for display, thereby realizing the visualization special effects of the user voice.
  • the target map is controlled to move outward along the target trajectory with the target area in the target video as the center in a word-by-word dynamic display manner, thereby improving the visual display effect of the visualization special effects and increasing the interactivity between the video special effects and the target video.
  • FIG. 15 is a structural block diagram of a video special effects display device provided by an embodiment of the present disclosure. For ease of explanation, only the parts related to the embodiment of the present disclosure are shown. Referring to FIG. 15, the video special effects display device 3 includes:
  • the voice module 31 is used to obtain the target video and extract the user voice in the target video;
  • a processing module 32 configured to generate at least one target text corresponding to the content of the user voice according to the user voice in the target video;
  • the display module 33 is used to display the target map corresponding to the target text in the target video word by word, wherein the target map is centered on the target area in the target video and moves outward along the target track.
  • the target video is a video containing a user's facial image; the target area is the mouth area in the user's facial image; when the display module 33 displays the target map corresponding to the target text word by word in the target video, it is specifically used to: determine the user's facial orientation based on the user's facial image in the target video; and control the target map to move along the facial orientation with the mouth area as the starting point.
  • when the processing module 32 generates at least one target text corresponding to the content of the user voice based on the user voice in the target video, it is specifically used to: perform voice recognition on the user voice to obtain a corresponding voice text, wherein the voice text includes at least one alternative text; detect the alternative text in the voice text, and when the alternative text is a preset first keyword, determine the alternative text as the target text.
  • the display module 33 is also used for: when the target text is a preset second keyword, displaying a target environment special effect corresponding to the second keyword in the target video, wherein the target environment special effect includes a music special effect and/or a sticker special effect corresponding to the second keyword; before displaying the target environment special effect corresponding to the second keyword in the target video, the display module 33 is also used for: displaying prompt information corresponding to the second keyword in the target video.
  • when the display module 33 displays the target map corresponding to the target text word by word in the target video, it is specifically used to: obtain the moving starting point and the corresponding moving direction of the target map in the target area; after displaying the target map at the moving starting point, control the movement of the target map based on the moving direction.
  • when the display module 33 obtains the moving starting point and the corresponding moving direction of the target map in the target area, it is specifically used to: randomly generate a moving starting point corresponding to the target map in the target area; obtain the moving direction according to the distance between the moving starting point and the edge of the target area, wherein the moving direction represents the angle between the moving path of the target map and the plane where the target area is located in the preset camera three-dimensional space.
  • the target area is an annular area; when the display module 33 randomly generates a moving starting point corresponding to the target map within the target area, it is specifically used to: obtain the inner diameter length and outer diameter length corresponding to the target area; based on the inner diameter length and the outer diameter length, randomly obtain the target radius, and the length of the target radius is between the inner diameter length and the outer diameter length; generate a moving starting point according to the target radius and the pre-generated target angle.
  • when the display module 33 randomly obtains the target radius based on the inner diameter length and the outer diameter length, it is specifically used to: square the inner diameter length and the outer diameter length to obtain the corresponding square inner diameter length and square outer diameter length, respectively; obtain a square value interval based on the square inner diameter length and the square outer diameter length, and randomly obtain a square radius value within the square value interval; obtain the target radius based on the square root operation result of the square radius value.
  • when the display module 33 obtains the moving direction according to the distance between the moving starting point and the edge of the target area, it is specifically used to: determine a corresponding target deflection angle according to a first distance between the moving starting point and the edge of the target area, where the target deflection angle represents the angle at which the moving trajectory of the target map deflects toward the edge of the target video, and the target deflection angle is proportional to the first distance; obtain a target spatial angle, where the target spatial angle is the three-dimensional spatial angle corresponding to the normal vector at the moving starting point in the three-dimensional spatial plane of the camera where the target area is located; and determine the moving direction according to the vector sum of the target spatial angle and the target deflection angle.
  • the display module 33 is further used to: in the process of controlling the movement of the target map, set the text attributes corresponding to the target map based on the moving distance of the target map; wherein the text attributes include at least one of the following: font transparency, font color, font size.
  • when the display module 33 controls the target map to move toward the edge of the target video, it is specifically used to: obtain control parameters, where the control parameters are used to characterize the target condition for stopping displaying the target map; based on the control parameters, control the target map to move toward the edge of the target video; wherein the control parameters include a moving duration and/or a moving distance; the moving duration characterizes the duration of the target map moving toward the edge of the target video; and the moving distance characterizes the distance of the target map moving toward the edge of the target video.
  • the voice module 31, the processing module 32 and the display module 33 are connected.
  • the video special effect display device 3 provided in this embodiment can implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be repeated in this embodiment.
  • FIG. 16 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 16, the electronic device 4 includes:
  • a processor 41 and a memory 42 communicatively connected to the processor 41;
  • the memory 42 stores computer-executable instructions;
  • the processor 41 executes the computer-executable instructions stored in the memory 42 to implement the video special effects display method in the embodiments shown in Figures 2 to 14.
  • the processor 41 and the memory 42 are connected via a bus 43.
  • an embodiment of the present disclosure provides a computer-readable storage medium, in which computer-executable instructions are stored.
  • when the computer-executable instructions are executed by a processor, they are used to implement the video special effects display method provided by any one of the embodiments corresponding to Figures 2 to 14 of the present application.
  • the electronic device 900 may be a terminal device or a server.
  • the terminal device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (Portable Media Players, PMPs), and vehicle-mounted terminals (such as vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 17 is only an example and should not bring any limitation to the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 to a random access memory (RAM) 903.
  • Various programs and data required for the operation of the electronic device 900 are also stored in the RAM 903.
  • the processing device 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904.
  • the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 907 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 908 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 909.
  • the communication device 909 may allow the electronic device 900 to communicate with other devices wirelessly or by wire to exchange data.
  • although FIG. 17 shows an electronic device 900 having various devices, it should be understood that it is not required to implement or provide all of the devices shown; more or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from the network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902.
  • when the computer program is executed by the processing device 901, the above-mentioned functions defined in the method of the embodiments of the present disclosure are executed.
  • the computer-readable medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which may send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being installed in the electronic device.
  • the computer-readable medium carries one or more programs.
  • when the one or more programs are executed by the electronic device, the electronic device executes the method shown in the above embodiments.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet Service Provider).
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function.
  • in some alternative implementations, the functions marked in the blocks may also occur in an order different from that marked in the accompanying drawings; for example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, may be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or hardware.
  • the name of a unit does not limit the unit itself in some cases.
  • the first acquisition unit may also be described as a "unit for acquiring at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a method for displaying video special effects comprising:
  • the method comprises: obtaining a target video and extracting a user voice in the target video; generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice; and displaying a target map corresponding to the target text in the target video word by word, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
  • the target video is a video containing a user's facial image; the target area is the mouth area in the user's facial image; and the target map corresponding to the target text is displayed word by word in the target video, including: determining the user's facial orientation based on the user's facial image in the target video; and controlling the target map to move along the facial orientation with the mouth area as the starting point.
  • generating at least one target text corresponding to the content of the user voice in the target video includes: performing voice recognition on the user voice to obtain a corresponding voice text, wherein the voice text includes at least one alternative text; detecting the alternative text in the voice text, and when the alternative text is a preset first keyword, determining the alternative text as the target text.
  • the method further includes: when the target text is a preset second keyword, displaying a target environment special effect corresponding to the second keyword in the target video, wherein the target environment special effect includes a music special effect and/or a sticker special effect corresponding to the second keyword; before displaying the target environment special effect corresponding to the second keyword in the target video, the method further includes: displaying prompt information corresponding to the second keyword in the target video.
  • displaying the target map corresponding to the target text in the target video word by word includes: obtaining a moving starting point and a corresponding moving direction of the target map in the target area; after displaying the target map at the moving starting point, controlling the movement of the target map based on the moving direction.
  • obtaining the moving starting point and the corresponding moving direction of the target map within the target area includes: randomly generating a moving starting point corresponding to the target map within the target area; obtaining the moving direction based on the distance between the moving starting point and the edge of the target area, wherein the moving direction represents the angle between the moving path of the target map and the plane where the target area is located in a preset camera three-dimensional space.
  • the target area is an annular area; randomly generating, within the target area, a moving starting point corresponding to the target map includes: obtaining the inner diameter length and the outer diameter length corresponding to the target area; randomly obtaining a target radius based on the inner diameter length and the outer diameter length, wherein the length of the target radius is between the inner diameter length and the outer diameter length; and generating the moving starting point according to the target radius and a pre-generated target angle.
  • the target radius is randomly obtained based on the inner diameter length and the outer diameter length, including: squaring the inner diameter length and the outer diameter length to obtain the corresponding square inner diameter length and square outer diameter length, respectively; obtaining a square value interval based on the square inner diameter length and the square outer diameter length, and randomly obtaining a square radius value within the square value interval; and obtaining the target radius based on the result of the square root operation of the square radius value.
  • the moving direction is obtained according to the distance between the moving starting point and the edge of the target area, including: determining a corresponding target deflection angle according to a first distance between the moving starting point and the edge of the target area, the target deflection angle characterizing the angle at which the moving trajectory of the target map is deflected toward the edge of the target video, and the target deflection angle is proportional to the first distance; obtaining a target spatial angle, the target spatial angle being a three-dimensional spatial angle corresponding to the normal vector at the moving starting point in the three-dimensional spatial plane of the camera where the target area is located; and determining the moving direction according to the vector sum of the target spatial angle and the target deflection angle.
  • the method further includes: in the process of controlling the movement of the target map, setting text attributes corresponding to the target map based on the moving distance of the target map; wherein the text attributes include at least one of the following: font transparency, font color, and font size.
  • controlling the target map to move outward along the target trajectory includes: acquiring a control parameter, wherein the control parameter is used to characterize a target condition for stopping displaying the target map; controlling the movement of the target map based on the control parameter; wherein the control parameter includes a movement duration and/or a movement distance; the movement duration characterizes a duration of movement of the target map; and the movement distance characterizes a distance of movement of the target map.
  • a video special effects display device comprising:
  • a voice module used to obtain a target video and extract the user's voice in the target video
  • a processing module configured to generate at least one target text corresponding to the content of the user voice according to the user voice in the target video;
  • a display module is used to display the target map corresponding to the target text word by word in the target video, wherein the target map is centered on the target area in the target video and moves outward along a target trajectory.
  • the target video is a video containing a user's facial image; the target area is the mouth area in the user's facial image; when the display module displays the target map corresponding to the target text word by word in the target video, it is specifically used to: determine the user's facial orientation based on the user's facial image in the target video; and control the target map to move along the facial orientation with the mouth area as the starting point.
  • when generating at least one target text corresponding to the content of the user voice based on the user voice in the target video, the processing module is specifically used to: perform voice recognition on the user voice to obtain a corresponding voice text, wherein the voice text includes at least one candidate character; detect the candidate characters in the voice text, and when a candidate character belongs to a preset first keyword, determine the candidate character as the target text.
  • the display module is also used to: when the target text is a preset second keyword, display the target environment special effects corresponding to the second keyword in the target video, wherein the target environment special effects include music special effects and/or map special effects corresponding to the second keyword; before displaying the target environment special effects corresponding to the second keyword in the target video, the display module is also used to: display prompt information corresponding to the second keyword in the target video.
  • when displaying the target map corresponding to the target text word by word in the target video, the display module is specifically used to: obtain the moving starting point and the corresponding moving direction of the target map in the target area; after displaying the target map at the moving starting point, control the movement of the target map based on the moving direction.
  • when obtaining the moving starting point and the corresponding moving direction of the target map in the target area, the display module is specifically used to: randomly generate a moving starting point corresponding to the target map in the target area; obtain the moving direction according to the distance between the moving starting point and the edge of the target area, wherein the moving direction represents the angle between the moving path of the target map and the plane where the target area is located in the preset camera three-dimensional space.
  • the target area is an annular area; when the display module randomly generates a moving starting point corresponding to the target map within the target area, it is specifically used to: obtain the inner diameter length and outer diameter length corresponding to the target area; based on the inner diameter length and the outer diameter length, randomly obtain the target radius, and the length of the target radius is between the inner diameter length and the outer diameter length; generate a moving starting point according to the target radius and a pre-generated target angle.
  • when randomly obtaining the target radius based on the inner diameter length and the outer diameter length, the display module is specifically used to: square the inner diameter length and the outer diameter length to obtain the corresponding squared inner diameter length and squared outer diameter length, respectively; obtain a squared value interval based on the squared inner diameter length and the squared outer diameter length, and randomly obtain a squared radius value within the squared value interval; obtain the target radius from the result of the square root operation on the squared radius value.
  • when obtaining the moving direction based on the distance between the moving starting point and the edge of the target area, the display module is specifically used to: determine the corresponding target deflection angle based on the first distance between the moving starting point and the edge of the target area, where the target deflection angle represents the angle at which the moving trajectory of the target map is deflected toward the edge of the target video, and the target deflection angle is proportional to the first distance; obtain the target spatial angle, which is the three-dimensional spatial angle corresponding to the normal vector at the moving starting point in the three-dimensional spatial plane of the camera where the target area is located; determine the moving direction based on the vector sum of the target spatial angle and the target deflection angle.
  • the display module is further used to: in the process of controlling the movement of the target map, set the text attributes corresponding to the target map based on the moving distance of the target map; wherein the text attributes include at least one of the following: font transparency, font color, font size.
  • when controlling the target map to move toward the edge of the target video, the display module is specifically used to: obtain control parameters, the control parameters being used to characterize the target conditions for stopping displaying the target map; based on the control parameters, control the target map to move toward the edge of the target video; wherein the control parameters include a movement duration and/or a movement distance; the movement duration characterizes the duration of the target map moving toward the edge of the target video; the movement distance characterizes the distance of the target map moving toward the edge of the target video.
  • an electronic device comprising: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory to implement the video special effects display method described in the first aspect and various possible designs of the first aspect.
  • a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video special effects display method described in the first aspect and various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the video special effects display method described in the first aspect and various possible designs of the first aspect.

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present disclosure provide a video special effect display method and apparatus, an electronic device and a storage medium. A target video is obtained, and a user voice in the target video is extracted; at least one target text corresponding to the content of the user voice is generated according to the user voice in the target video; and a target map corresponding to the target text is displayed word by word in the target video, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory. Since the user voice in the target video is converted into the corresponding target text and the target map corresponding to the target text is generated for display, a visualization special effect of the user voice is achieved; meanwhile, by means of word-by-word dynamic display, the target map is controlled to move, centered on the target area in the target video, toward the edge of the target video, which improves the visual presentation of the visualization special effect and increases the interactivity between the video special effect and the target video.

Description

Video special effect display method and apparatus, electronic device, and storage medium
This application claims priority to the Chinese invention patent application No. 202211668451.3, entitled "Video special effect display method and apparatus, electronic device and storage medium" and filed on December 23, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present disclosure relate to the field of Internet technology, and in particular to a video special effect display method and apparatus, an electronic device and a storage medium.
Background
At present, various video applications (Applications, APPs) provide users with a contribution interface in which users can shoot and edit videos, including adding video special effects to a shot target video.
In some related solutions, based on the specific special effect prop selected by the user, a special effect map of the corresponding type, such as a fireworks effect or a lighting effect, is generated in the target video, so that the target video has better visual expressiveness.
However, the video special effects in the prior art cannot be associated with the user voice, and thus suffer from poor visual effect and low interactivity.
Summary
The embodiments of the present disclosure provide a video special effect display method and apparatus, an electronic device and a storage medium, to overcome the problems of poor special effect visual quality and low interactivity.
In a first aspect, an embodiment of the present disclosure provides a video special effect display method, including:
obtaining a target video, and extracting a user voice in the target video; generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice; and displaying, word by word in the target video, a target map corresponding to the target text, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
In a second aspect, an embodiment of the present disclosure provides a video special effect display apparatus, including:
a voice module, configured to obtain a target video and extract a user voice in the target video;
a processing module, configured to generate, according to the user voice in the target video, at least one target text corresponding to the content of the user voice;
a display module, configured to display, word by word in the target video, a target map corresponding to the target text, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a processor, and a memory communicatively connected to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the video special effect display method described in the first aspect and various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video special effect display method described in the first aspect and various possible designs of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product including a computer program which, when executed by a processor, implements the video special effect display method described in the first aspect and various possible designs of the first aspect.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure or the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are some embodiments of the present disclosure, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the video special effect display method provided by an embodiment of the present disclosure;
FIG. 2 is a first flowchart of the video special effect display method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a specific implementation flow of step S102 in the embodiment shown in FIG. 2;
FIG. 4 is a schematic diagram of a target map moving in a target video provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a specific implementation flow of step S103 in the embodiment shown in FIG. 2;
FIG. 6 is a schematic diagram of another target map moving in a target video provided by an embodiment of the present disclosure;
FIG. 7 is a second flowchart of the video special effect display method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a specific implementation flow of step S203 in the embodiment shown in FIG. 7;
FIG. 9 is a schematic diagram of a target area provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a specific implementation flow of step S2032 in the embodiment shown in FIG. 8;
FIG. 11 is a schematic diagram of a distribution of moving starting points provided by an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a specific implementation flow of step S204 in the embodiment shown in FIG. 7;
FIG. 13 is a schematic diagram of a target deflection angle provided by an embodiment of the present disclosure;
FIG. 14 is a schematic diagram of a specific implementation flow of step S205 in the embodiment shown in FIG. 7;
FIG. 15 is a structural block diagram of the video special effect display apparatus provided by an embodiment of the present disclosure;
FIG. 16 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 17 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data and displayed data) involved in this application are all information and data authorized by the user or fully authorized by all parties; the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and a corresponding operation entry is provided for the user to choose to authorize or refuse.
The application scenario of the embodiments of the present disclosure is explained below.
FIG. 1 is a diagram of an application scenario of the video special effect display method provided by an embodiment of the present disclosure. The method can be applied in scenarios such as video editing and live video streaming. Specifically, as shown in FIG. 1, the method can be applied to a terminal device, for example a smartphone, on which a video application runs. After a video special effect control or function button in the video application (shown in the figure as "effect control #1") is triggered in the application interface, the terminal device starts a camera page for video shooting, performs real-time special effect processing according to the content of the shot video images, and generates the corresponding target special effect in the video images in real time, finally producing an output video with video special effects. The output video is then saved on the server side, and the user can save, forward or share this output video with video special effects, achieving the purpose of video generation and publishing.
In the prior art, based on the type of special effect prop or control triggered by the user in the application, a special effect map of the corresponding type, such as a fireworks effect or a lighting effect, can be generated in the target video, so that the target video has better visual expressiveness. However, the video special effects in the prior art are usually generated based on image information in the target video and cannot be associated with the user voice, which leads to problems such as poor visual effect and low interactivity.
The embodiments of the present disclosure provide a video special effect display method to solve the above problems.
Referring to FIG. 2, FIG. 2 is a first flowchart of the video special effect display method provided by an embodiment of the present disclosure. The method of this embodiment can be applied in a terminal device, and the video special effect display method includes:
Step S101: obtaining a target video, and extracting a user voice in the target video.
Step S102: generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice.
Exemplarily, referring to the application scenario shown in FIG. 1, after the user operates the terminal device and triggers the special effect control in the target application, the camera page is started for video shooting, whereby the target video is obtained. In one possible case, the target video may be a portrait video shot by the user, i.e., a video containing the facial image of a target user, where the target user is the user who utters the user voice. More specifically, for example, the content of the target video is a "New Year greeting video"; in short, the target user speaks New Year greeting sentences while the terminal device shoots the target user, whereby the target video is obtained and the target user appears in the target video, as shown in FIG. 1. In another possible case, the target video is a non-portrait video, i.e., the camera unit of the terminal device is not aimed at the target user, so the target user who utters the user voice does not appear in the target video and only the user voice is recorded. In both of the above cases, the target video shot by the terminal device contains the user voice uttered by the target user; the user voice can then be obtained by extracting the audio channel data of the target video. Further, after the user voice is obtained, voice recognition is performed on it to obtain the at least one target text corresponding to the content of the user voice.
It should be noted that the target video may be part of a complete video shot by the terminal device. For example, in the application scenario shown in FIG. 1, after starting the camera page, the terminal device shoots continuously (for example for 30 seconds in total); in this process, a video clip of a preset duration (for example 1 second) obtained by the terminal device may serve as the target video, and the terminal device processes this video clip (the target video) to obtain the corresponding target text and display the special effect. Of course, it can be understood that, to achieve better voice recognition and semantic accuracy, in the process of generating the corresponding target text based on the user voice in the target video, the user voice corresponding to one or more video clips adjacent to the target video may be taken into account together with the user voice in the target video to jointly generate the target text; this process is not described in detail here.
In a possible implementation, as shown in FIG. 3, the specific implementation of step S102 includes:
Step S1021: performing voice recognition on the user voice to obtain a corresponding voice text, the voice text including at least one candidate character.
Step S1022: detecting the candidate characters in the voice text, and when a candidate character belongs to a preset first keyword, determining the candidate character as the target text.
Exemplarily, after the user voice in the target video is extracted, voice recognition is performed on the user voice to obtain the voice text corresponding to the voice content; the specific implementation of voice recognition is prior art known to those skilled in the art and is not described here. The voice text includes one or more candidate characters, which are then detected: a candidate character belonging to the preset first keyword is extracted as the target text, while a candidate character not belonging to the first keyword is ignored. Specifically, for example, after voice recognition of the user voice, the obtained voice text is "我祝大家新年快乐" ("I wish everyone a happy New Year"), where each Chinese character is a candidate character, i.e., 8 candidate characters in total. The candidate characters are then further checked against the first keyword, which includes, for example, "新年快乐" ("Happy New Year"); therefore, the 4 candidate characters "新年快乐" among the 8 candidate characters are taken as the target text. In the subsequent display of the target text, only the four characters "新年快乐" are displayed in the target video, while the candidate characters "我祝大家" ("I wish everyone") are ignored and not displayed.
In this step of the embodiment, the voice text generated from the user voice is further filtered, and the keyword characters therein are extracted as the target text, which improves the information display efficiency of the text special effect, reduces the display of useless and low-information text, lowers the display density of the text maps, and improves the presentation of the video special effect.
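As a concrete illustration of steps S1021 and S1022, the following minimal Python sketch checks a recognized voice text against a preset first keyword and emits one target text per character. The keyword list, function name and sample transcript are illustrative assumptions, not part of the disclosed method; any speech recognizer returning a plain transcript could feed it.

FIRST_KEYWORDS = ["新年快乐"]  # hypothetical preset first keyword(s)

def extract_target_texts(voice_text: str) -> list[str]:
    # Keep only candidate characters that belong to a preset first keyword;
    # all other candidate characters are ignored and not displayed.
    targets = []
    for keyword in FIRST_KEYWORDS:
        if keyword in voice_text:
            targets.extend(keyword)  # one target text per character
    return targets

print(extract_target_texts("我祝大家新年快乐"))  # -> ['新', '年', '快', '乐']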
Step S103: displaying, word by word in the target video, a target map corresponding to the target text, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
Exemplarily, after obtaining the target video through the camera unit, the terminal device plays the target video in real time and synchronously converts the target text obtained in the previous steps into the corresponding target map rendered into the target video, generating the special effect of the target video. The target map is rendered into the target video in a word-by-word manner; this process can be implemented by putting each target text into a processing queue and rendering them into target maps for display in queue order, and the specific implementation is not described in detail. Meanwhile, for each target map that appears, the target map is synchronously controlled to move from the target area of the target video toward the outside of the target area, forming a visual motion effect. FIG. 4 is a schematic diagram of a target map moving in a target video provided by an embodiment of the present disclosure. As shown in FIG. 4, in a possible implementation, the target user who utters the user voice does not appear in the target video, i.e., the target video does not include the facial image of the target user; in this case, the target area is, for example, a circular area centered on the center of the target video, and the target map corresponding to the target text appears within this target area and moves outward in all directions, gradually approaching the edge of the target video. Specifically, as shown in FIG. 4, the target map corresponding to the target text "新" moves toward the upper-left of the target video; the target map corresponding to "年" moves toward the lower-left; the target map corresponding to "快" moves toward the upper-right; and the target map corresponding to "乐" moves toward the lower-right. The target map moves along a target path during its movement. In one possible implementation, the target path may be generated before the target map moves; for example, a corresponding target path is generated according to the generation position of each target text, and the target map is then controlled to move along the pre-generated target path. Further, the target path may be a straight path, i.e., a path moving outward from the target area along a straight line, or a curved path, for example a path circling the target area while curving outward; the target path may be generated randomly or determined by a preset function. In another possible implementation, the target path is not determined before the target map moves, but is generated as the target map moves. Further, the target map may be generated within the target area in the target video and moved based on a corresponding moving direction; the appearance position and moving direction of the target map may be generated randomly or determined by a preset function, which is not limited here.
In another possible implementation, the target video is a video containing a user facial image, i.e., the target user who utters the user voice appears in the target video. In this case, the target area is the mouth area in the user facial image, and the target area is determined by performing feature recognition on the user facial image in the target video; the specific process is prior art and is not repeated here. That is, the target map corresponding to the target text moves outward from the mouth area of the target user in the target video. Specifically, as shown in FIG. 5, the specific implementation of step S103 includes:
Step S1031: determining the facial orientation of the user according to the user facial image in the target video.
Step S1032: while playing the target video, displaying the target map, and controlling the target map to move outward along the facial orientation with the mouth area as the starting point.
Exemplarily, the user facial image may be one or more video frames of the target video. By performing feature recognition and spatial mapping on the video frames of the target video, the normal vector of the spatial plane corresponding to the face of the target user in the camera three-dimensional space of the target video, i.e., the facial orientation, can be obtained. The specific implementation of obtaining the facial orientation of a person in a video is prior art known to those skilled in the art and is not repeated here. Then, while the target video is being played, the target map is exemplarily controlled to move along the facial orientation with any point in the mouth area as the moving starting point, achieving the movement of the target map. Optionally, for each target map, a random offset angle is applied while it moves along the facial orientation, so that the movement trajectories of the target maps do not coincide, improving the clarity of the text display and thus the visual effect of the special effect. FIG. 6 is a schematic diagram of another target map moving in a target video provided by an embodiment of the present disclosure. As shown in FIG. 6, the mouth area Z and the facial orientation V are determined from the user facial image in the target video. Then, for each target map, a moving starting point is randomly generated within the mouth area Z, and a random angle offset is added on top of the facial orientation V to obtain the moving direction of each target map; the target maps are then controlled to move based on their moving starting points and moving directions. As shown in the figure, the target map corresponding to the target text "新" has a moving starting point Z_1 and a moving direction V_1, where V_1 = V + rand, and rand is a random angle value within a preset range. Similarly, the target map corresponding to "年" has a moving starting point Z_2 and a moving direction V_2. The outward movement of the target maps is thereby achieved.
In this step of the embodiment, by combining the content of the target video, the moving starting point and moving direction of the target map are determined according to the mouth area and facial orientation of the target person in the target video, so that the movement of the target map matches the facial form of the person in the target video, forming a realistic visual effect of "text jumping out of the user's mouth" and improving the visual expressiveness of the special effect.
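The V_i = V + rand step can be sketched as follows, under the assumption that the facial orientation is available as a unit 3-vector in the camera space; the 15-degree jitter bound and the helper names are illustrative choices, not values fixed by the disclosure.

import math
import random

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def jittered_direction(face_dir, max_jitter_deg=15.0):
    # Tilt the facial-orientation vector by a small random angle so the
    # per-character target maps fan out instead of stacking on one ray.
    face_dir = _normalize(face_dir)
    theta = math.radians(random.uniform(0.0, max_jitter_deg))
    # Random unit vector, reduced to its component perpendicular to face_dir.
    r = _normalize(tuple(random.gauss(0.0, 1.0) for _ in range(3)))
    dot = sum(a * b for a, b in zip(r, face_dir))
    perp = _normalize(tuple(a - dot * b for a, b in zip(r, face_dir)))
    return tuple(math.cos(theta) * f + math.sin(theta) * p
                 for f, p in zip(face_dir, perp))

Each call yields a direction within max_jitter_deg of the facial orientation, which is enough to keep simultaneously displayed maps from sharing one trajectory.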
In this embodiment, the target video is obtained, and at least one target text corresponding to the content of the user voice is generated according to the user voice in the target video; the target video is played, and the target map corresponding to the target text is displayed word by word in the target video, the target map being centered on the target area in the target video and moving toward the edge of the target video. Since the user voice in the target video is converted into the corresponding target text and the target map corresponding to the target text is generated for display, a visualization special effect of the user voice is achieved; meanwhile, by means of word-by-word dynamic display, the target map is controlled to move outward centered on the target area in the target video, improving the visual presentation of the visualization special effect and increasing the interactivity between the video special effect and the target video.
Referring to FIG. 7, FIG. 7 is a second flowchart of the video special effect display method provided by an embodiment of the present disclosure. On the basis of the embodiment shown in FIG. 2, this embodiment further refines step S103, and the video special effect display method includes:
Step S201: obtaining and playing a target video.
Step S202: generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice.
Step S203: obtaining the moving starting point of the target map within the target area.
Exemplarily, the moving starting point corresponding to the target map is randomly generated within the target area. In a possible implementation, as shown in FIG. 8, the target area is a ring area, and the specific implementation of step S203 includes:
Step S2031: obtaining the inner diameter length and the outer diameter length corresponding to the target area.
Step S2032: randomly obtaining a target radius based on the inner diameter length and the outer diameter length, the length of the target radius lying between the inner diameter length and the outer diameter length.
Step S2033: generating the moving starting point according to the target radius and a pre-generated target angle.
Exemplarily, FIG. 9 is a schematic diagram of a target area provided by an embodiment of the present disclosure. As shown in FIG. 9, the target area is a ring area enclosed by an inner circle C1 with a smaller radius and an outer circle C2 with a larger radius; the area between the inner circle C1 and the outer circle C2 is the target area (ring area). In this embodiment, the target video is a video containing a user facial image, and the target area can be determined based on the mouth area of the user in the user facial image. Specifically, for example, the center point corresponding to the contour of the user's mouth is determined by performing image recognition on the user facial image in the target video, and then, based on this center point, the inner circle and the outer circle are determined using a preset first radius and second radius, thereby determining the target area, where the first radius is, for example, the inner diameter length and the second radius is, for example, the outer diameter length.
After the target area is determined, a target radius is randomly determined within the length interval formed by the inner diameter length and the outer diameter length of the target area. For example, if the inner diameter length is 10 (in preset units, likewise below) and the outer diameter length is 20, the radius length interval is P1 = (10, 20); a target radius, for example 12, is then randomly determined within this interval P1. Further, an angle is randomly generated within a preset angle range (for example 0 to 2π) as the target angle; then, according to the target radius and the target angle, a point, i.e., the moving starting point, can be uniquely determined within the target area.
In this step of the embodiment, the moving starting point is determined from a target radius randomly determined within the ring area and a randomly generated target angle. Since the target radius is produced from a ring area with preset inner and outer diameter lengths, the value range of the target radius is controlled so that it lies within a reasonable range; due to the restriction of the inner circle radius, the moving starting point will not be too close to the center point of the ring area (i.e., the mouth area), which reduces the overlap of target maps during movement and improves the visual effect of the text special effect.
Further, in a possible implementation, as shown in FIG. 10, the specific implementation of step S2032 includes:
Step S2032A: squaring the inner diameter length and the outer diameter length to obtain the corresponding squared inner diameter length and squared outer diameter length, respectively;
Step S2032B: obtaining a squared value interval based on the squared inner diameter length and the squared outer diameter length, and randomly obtaining a squared radius value within the squared value interval;
Step S2032C: computing the square root of the squared radius value to obtain the target radius.
Exemplarily, if the inner diameter length is 1 and the outer diameter length is 10, squaring them yields a squared inner diameter length of 1 and a squared outer diameter length of 100. A value, i.e., the squared radius value, is then randomly obtained within the squared value interval P2 = (1, 100). The arithmetic square root of the squared radius value is then computed to obtain the target radius; for example, if the squared radius value is 81, the corresponding target radius is 9.
When points are taken randomly within a circular area in the "radius + angle" manner, if the values in the radius value space are distributed linearly, points will be denser at small radii and sparser at large radii, resulting in an unreasonable point distribution. In this embodiment, in the process of randomly obtaining the target radius, a squared value interval is obtained by squaring, a squared radius value is randomly taken from the squared value interval, and the square root of the squared radius value is then taken to obtain a target radius falling within the ring area. FIG. 11 is a schematic diagram of a distribution of moving starting points provided by an embodiment of the present disclosure. As shown in FIG. 11, when moving starting points are obtained randomly many times, taking the moving direction R1 as an example, along the direction R1 in the target area formed by the inner circle C1 and the outer circle C2, the cumulative number of moving starting points within a given radius grows with the square of that radius, i.e., the larger the radius, the higher the probability that a moving starting point appears there; the distribution follows the same rule along the moving directions R2 and R3, which is not repeated here. In this way, the randomly obtained target radii are no longer linearly distributed: points are sparser at small radii and denser at large radii, making the distribution of moving starting points within the target area uniform over the area of the ring and therefore more reasonable.
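The sampling described in steps S2032A to S2033 can be condensed into the following Python sketch; the mouth-center coordinates and unit choices are illustrative assumptions.

import math
import random

def sample_start_point(inner_r, outer_r, cx=0.0, cy=0.0):
    # Drawing r uniformly would over-sample small radii; drawing r**2
    # uniformly and taking the square root makes the density uniform
    # over the area of the ring between inner_r and outer_r.
    squared_radius = random.uniform(inner_r ** 2, outer_r ** 2)
    target_radius = math.sqrt(squared_radius)
    target_angle = random.uniform(0.0, 2.0 * math.pi)  # pre-generated angle
    return (cx + target_radius * math.cos(target_angle),
            cy + target_radius * math.sin(target_angle))

With inner_r = 1 and outer_r = 10, a squared radius value of 81 maps back to a target radius of 9, matching the worked example above.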
Step S204: obtaining the moving direction corresponding to the moving starting point of the target map.
Exemplarily, the moving direction corresponding to the moving starting point refers to the direction in which the target map starts to move from the moving starting point; this moving direction can be represented by a three-dimensional space vector. As shown in FIG. 12, the specific implementation of step S204 includes:
Step S2041: determining a corresponding target deflection angle according to a first distance between the moving starting point and the edge of the target area, the target deflection angle characterizing the angle by which the movement trajectory of the target map deflects toward the edge of the target video, the target deflection angle being proportional to the first distance.
Step S2042: obtaining a target spatial angle, the target spatial angle being the three-dimensional spatial angle corresponding to the normal vector at the moving starting point within the camera three-dimensional spatial plane where the target area is located.
Step S2043: determining the moving direction according to the vector sum of the target spatial angle and the target deflection angle.
Exemplarily, after the moving starting point is obtained, in order to make the movement paths of target maps starting from different moving starting points differ and reduce occlusion between maps, a corresponding deflection angle, i.e., the target deflection angle, can be set based on the first distance between the moving starting point and the edge of the target area, where the target deflection angle characterizes the angle by which the movement trajectory of the target map deflects toward the edge of the target video. The target deflection angle has a preset value interval, for example [0, 15], i.e., the target deflection angle takes a value between 0 and 15 degrees. More specifically, the target area is a ring area: the larger the first distance between the moving starting point and the outer edge of the target area, the smaller the target deflection angle; conversely, the smaller the first distance between the moving starting point and the outer edge of the target area, the larger the target deflection angle. FIG. 13 is a schematic diagram of a target deflection angle provided by an embodiment of the present disclosure. As shown in FIG. 13, the distance between the moving starting point P1 and the outer edge of the target area is L1, and according to the preset mapping relationship, the target deflection angle corresponding to P1 is phi_1 = 10, i.e., 10 degrees; the distance between the moving starting point P2 and the outer edge of the target area is L2, where L2 is larger than L1, so according to the preset mapping relationship, the target deflection angle corresponding to P2 is phi_2 = 3, i.e., 3 degrees. Of course, it can be understood that the target deflection angle may also be set by obtaining a second distance between the moving starting point and the inner edge of the target area: the larger the second distance from the inner edge, the larger the target deflection angle; the smaller the second distance from the inner edge, the smaller the target deflection angle. The specific implementation is similar to that shown above and is not repeated here.
Then, the target spatial angle corresponding to the moving starting point is obtained. Exemplarily, the target spatial angle characterizes the facial orientation or mouth orientation of the user in the target video; when the target user in the target video faces the camera directly, the target spatial angle is, for example, 0 degrees; as long as the facial orientation of the target user relative to the shooting direction of the camera unit of the terminal device does not change, the target spatial angle remains unchanged, i.e., the target spatial angles corresponding to the moving starting points in the target area are consistent. More specifically, the target spatial angle is the three-dimensional spatial angle corresponding to the normal vector at the moving starting point within the camera three-dimensional spatial plane where the target area is located; the target spatial angle can be obtained by parsing the user facial image in the target video, and the specific implementation can be found in the description of the step of obtaining the facial orientation of the user in the embodiment shown in FIG. 2, which is not repeated here.
Then, the vector sum of the target spatial angle and the target deflection angle is computed to obtain the moving direction corresponding to the target map, and the target map is then controlled to move based on this moving direction. This both keeps the moving direction of the target map consistent with the orientation of the user's mouth and face, improving realism and matching, and reduces the probability that the movement trajectories of the target maps coincide, avoiding mutual occlusion between maps, thereby improving the visual presentation of the special effect and the display clarity of the text information.
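The mapping from the worked example can be sketched as follows; the linear interpolation over the [0, 15] degree interval and the helper names are assumptions — the disclosure fixes only the value range and the monotonic relation to the outer-edge distance.

import math

MAX_DEFLECTION_DEG = 15.0  # preset value interval [0, 15] degrees

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def deflection_deg(dist_to_outer_edge, ring_width):
    # Closer to the outer edge -> larger deflection toward the video edge.
    t = max(0.0, min(1.0, dist_to_outer_edge / ring_width))
    return MAX_DEFLECTION_DEG * (1.0 - t)

def moving_direction(space_dir, outward_dir, dist_to_outer_edge, ring_width):
    # Tilt the normal-vector direction (target spatial angle) toward the
    # frame edge by the target deflection angle; outward_dir is a unit
    # vector in the ring's plane from its center through the starting
    # point, hence orthogonal to the plane's normal space_dir.
    phi = math.radians(deflection_deg(dist_to_outer_edge, ring_width))
    return _normalize(tuple(math.cos(phi) * s + math.sin(phi) * o
                            for s, o in zip(_normalize(space_dir),
                                            _normalize(outward_dir))))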
Step S205: after displaying the target map at the moving starting point, controlling the movement of the target map based on the moving direction.
Exemplarily, as shown in FIG. 14, the specific implementation of step S205 includes:
Step S2051: obtaining a control parameter, the control parameter characterizing a target condition for stopping the display of the target map.
Step S2052: controlling, based on the control parameter, the target map to move toward the edge of the target video until the target condition is reached.
Exemplarily, after the moving starting point and the corresponding moving direction are obtained, the appearance position and moving direction of the target map can be determined from them, but the end position of the target map is still undetermined. Therefore, the control parameter of the target map can be further obtained to determine the end position of the target map, i.e., the target condition for stopping the display of the target map.
Exemplarily, the control parameter includes a movement duration and/or a movement distance; the movement duration characterizes the duration for which the target map moves toward the edge of the target video, and the movement distance characterizes the distance over which the target map moves toward the edge of the target video. Specifically, for example, when the duration of the target map moving in the moving direction reaches 3 seconds (movement duration), and/or when the distance of the target map in the moving direction reaches 100 unit distances (movement distance), the display of the target map is stopped, the target map disappears from the target video, and the display process of the target map for the target text ends.
Further, the movement duration and movement distance in the control parameter may be fixed preset values, or may be random values obtained within preset movement duration and movement distance value ranges, i.e., a random movement duration and a random movement distance. When the movement duration and movement distance vary, the movement speed of the target map (i.e., the ratio of the movement distance to the movement duration) most likely varies accordingly; therefore, by obtaining a random movement duration and a random movement distance, the target maps corresponding to different target texts can exhibit different movement distances and speeds, achieving a randomized motion effect and improving the visual expressiveness of the special effect.
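A minimal sketch of the stopping logic under these control parameters follows; the value ranges are illustrative assumptions, as the disclosure only gives 3 seconds and 100 units as examples.

import random

def make_control_params(duration_range=(2.0, 4.0), distance_range=(80.0, 120.0)):
    # Random duration/distance; varying both also varies the implied speed.
    return {"duration": random.uniform(*duration_range),
            "distance": random.uniform(*distance_range)}

def should_stop(elapsed_s, travelled, params):
    # Stop displaying the target map once either target condition is met.
    return elapsed_s >= params["duration"] or travelled >= params["distance"]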
Step S206: in the process of controlling the movement of the target map, setting the text attributes corresponding to the target map based on the movement distance of the target map.
Further, in the process of controlling the target map to move toward the edge of the target video, the text attributes corresponding to the target map can be updated as it moves, so that the text form of the target map representing the target text changes; for example, as the movement distance of the target map increases, the transparency gradually increases and the color gradually changes, thereby further improving the visual expressiveness of the text special effect. Exemplarily, the text attributes include at least one of the following: font transparency, font color, and font size.
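By way of example, the following sketch maps the travelled distance to the three attributes named above; the specific fade, color and scaling curves are assumptions, since the disclosure only states that the attributes follow the movement distance.

def text_attributes(travelled, max_distance, base_size=48):
    # Normalize progress to [0, 1] along the map's total travel distance.
    t = min(1.0, max(0.0, travelled / max_distance))
    return {
        "font_transparency": t,                    # fades out with distance
        "font_size": base_size * (1.0 + 0.5 * t),  # grows slightly en route
        "font_color": (255, int(255 * (1 - t)), int(255 * (1 - t))),  # white -> red
    }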
Optionally, after step S201, the method further includes:
Step S207: displaying prompt information corresponding to a second keyword in the target video.
Step S208: when the target text is the preset second keyword, displaying a target environment special effect corresponding to the second keyword in the target video, the target environment special effect including a music special effect and/or a map special effect corresponding to the second keyword.
Exemplarily, on the other hand, in the process of playing the target video, to further improve the interaction with the user, prompt information can be displayed in the camera interface to instruct the user to read out the second keyword corresponding to the prompt information, thereby guiding the user to use the video special effect correctly. Meanwhile, the terminal device extracts and recognizes the user voice from the obtained target video; after obtaining the target text, it compares the target text with the second keyword. If the target text matches the second keyword, it means that the user has read out the second keyword indicated by the prompt information, after which the target environment special effect corresponding to the second keyword is played to further improve the visual presentation. For example, the prompt information includes the text "请大声念出【新年快乐】" ("Please say [Happy New Year] out loud"), in which the second keyword is "新年快乐". If the target text extracted from the target video includes the second keyword "新年快乐", the music corresponding to the second keyword is played and a map special effect, such as a fireworks effect, is displayed in the target video. Interaction with the user is thereby achieved, improving the user's sense of participation and the visual expressiveness of the special effect.
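A sketch of this trigger, assuming placeholder hooks play_music and render_map_effect on a video object; these are illustrative names, not an actual API from the disclosure.

SECOND_KEYWORD = "新年快乐"  # hypothetical preset second keyword

def maybe_play_environment_effects(target_text: str, video) -> None:
    # If the recognized target text contains the preset second keyword,
    # trigger its music special effect and/or map special effect.
    if SECOND_KEYWORD in target_text:
        video.play_music("new_year_theme")    # music special effect
        video.render_map_effect("fireworks")  # map special effect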
In this embodiment, step S201 is consistent with step S101 in the above embodiment; for a detailed discussion, refer to the discussion of step S101, which is not repeated here.
The video special effect display method, apparatus, electronic device and storage medium provided by this embodiment obtain a target video and extract the user voice in the target video; generate, according to the user voice in the target video, at least one target text corresponding to the content of the user voice; and display, word by word in the target video, a target map corresponding to the target text, the target map being centered on a target area in the target video and moving outward along a target trajectory. Since the user voice in the target video is converted into the corresponding target text and the target map corresponding to the target text is generated for display, a visualization special effect of the user voice is achieved; meanwhile, by means of word-by-word dynamic display, the target map is controlled to move outward along the target trajectory centered on the target area in the target video, improving the visual presentation of the visualization special effect and increasing the interactivity between the video special effect and the target video.
Corresponding to the video special effect display method of the above embodiments, FIG. 15 is a structural block diagram of the video special effect display apparatus provided by an embodiment of the present disclosure. For ease of description, only the parts related to the embodiments of the present disclosure are shown. Referring to FIG. 15, the video special effect display apparatus 3 includes:
a voice module 31, configured to obtain a target video and extract the user voice in the target video;
a processing module 32, configured to generate, according to the user voice in the target video, at least one target text corresponding to the content of the user voice;
a display module 33, configured to display, word by word in the target video, a target map corresponding to the target text, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
In an embodiment of the present disclosure, the target video is a video containing a user facial image; the target area is the mouth area in the user facial image; when displaying the target map corresponding to the target text word by word in the target video, the display module 33 is specifically configured to: determine the facial orientation of the user according to the user facial image in the target video; and control the target map to move along the facial orientation with the mouth area as the starting point.
In an embodiment of the present disclosure, when generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice, the processing module 32 is specifically configured to: perform voice recognition on the user voice to obtain a corresponding voice text, the voice text including at least one candidate character; and detect the candidate characters in the voice text, and when a candidate character belongs to a preset first keyword, determine the candidate character as the target text.
In an embodiment of the present disclosure, the display module 33 is further configured to: when the target text is a preset second keyword, display a target environment special effect corresponding to the second keyword in the target video, the target environment special effect including a music special effect and/or a map special effect corresponding to the second keyword; and before the target environment special effect corresponding to the second keyword is displayed in the target video, the display module 33 is further configured to display prompt information corresponding to the second keyword in the target video.
In an embodiment of the present disclosure, when displaying the target map corresponding to the target text word by word in the target video, the display module 33 is specifically configured to: obtain the moving starting point and the corresponding moving direction of the target map within the target area; and after displaying the target map at the moving starting point, control the movement of the target map based on the moving direction.
In an embodiment of the present disclosure, when obtaining the moving starting point and the corresponding moving direction of the target map within the target area, the display module 33 is specifically configured to: randomly generate the moving starting point corresponding to the target map within the target area; and obtain the moving direction according to the distance between the moving starting point and the edge of the target area, wherein the moving direction characterizes the angle between the movement path of the target map and the plane where the target area is located within a preset camera three-dimensional space.
In an embodiment of the present disclosure, the target area is a ring area; when randomly generating the moving starting point corresponding to the target map within the target area, the display module 33 is specifically configured to: obtain the inner diameter length and the outer diameter length corresponding to the target area; randomly obtain a target radius based on the inner diameter length and the outer diameter length, the length of the target radius lying between the inner diameter length and the outer diameter length; and generate the moving starting point according to the target radius and a pre-generated target angle.
In an embodiment of the present disclosure, when randomly obtaining the target radius based on the inner diameter length and the outer diameter length, the display module 33 is specifically configured to: square the inner diameter length and the outer diameter length to obtain the corresponding squared inner diameter length and squared outer diameter length, respectively; obtain a squared value interval based on the squared inner diameter length and the squared outer diameter length, and randomly obtain a squared radius value within the squared value interval; and obtain the target radius from the result of the square root operation on the squared radius value.
In an embodiment of the present disclosure, when obtaining the moving direction according to the distance between the moving starting point and the edge of the target area, the display module 33 is specifically configured to: determine a corresponding target deflection angle according to a first distance between the moving starting point and the edge of the target area, the target deflection angle characterizing the angle by which the movement trajectory of the target map deflects toward the edge of the target video, the target deflection angle being proportional to the first distance; obtain a target spatial angle, the target spatial angle being the three-dimensional spatial angle corresponding to the normal vector at the moving starting point within the camera three-dimensional spatial plane where the target area is located; and determine the moving direction according to the vector sum of the target spatial angle and the target deflection angle.
In an embodiment of the present disclosure, the display module 33 is further configured to: in the process of controlling the movement of the target map, set the text attributes corresponding to the target map based on the movement distance of the target map, wherein the text attributes include at least one of the following: font transparency, font color, and font size.
In an embodiment of the present disclosure, when controlling the target map to move in the target video, the display module 33 is specifically configured to: obtain a control parameter, the control parameter characterizing a target condition for stopping the display of the target map; and control, based on the control parameter, the target map to move toward the edge of the target video, wherein the control parameter includes a movement duration and/or a movement distance, the movement duration characterizing the duration for which the target map moves toward the edge of the target video, and the movement distance characterizing the distance over which the target map moves toward the edge of the target video.
The voice module 31, the processing module 32 and the display module 33 are connected. The video special effect display apparatus 3 provided by this embodiment can execute the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.
FIG. 16 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 16, the electronic device 4 includes:
a processor 41, and a memory 42 communicatively connected to the processor 41;
the memory 42 stores computer-executable instructions;
the processor 41 executes the computer-executable instructions stored in the memory 42 to implement the video special effect display method in the embodiments shown in FIG. 2 to FIG. 14.
Optionally, the processor 41 and the memory 42 are connected by a bus 43.
The related descriptions can be understood with reference to the descriptions and effects corresponding to the steps in the embodiments corresponding to FIG. 2 to FIG. 14, and are not elaborated here.
An embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video special effect display method provided by any of the embodiments corresponding to FIG. 2 to FIG. 14 of this application.
Referring to FIG. 17, it shows a schematic structural diagram of an electronic device 900 suitable for implementing the embodiments of the present disclosure. The electronic device 900 may be a terminal device or a server, where the terminal device may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs) and in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 17 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 17, the electronic device 900 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 901, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage apparatus 908 into a random access memory (RAM) 903. Various programs and data required for the operation of the electronic device 900 are also stored in the RAM 903. The processing apparatus 901, the ROM 902 and the RAM 903 are connected to one another by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Generally, the following apparatuses can be connected to the I/O interface 905: input apparatuses 906 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; output apparatuses 907 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; storage apparatuses 908 including, for example, a magnetic tape and a hard disk; and a communication apparatus 909. The communication apparatus 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 17 shows the electronic device 900 with various apparatuses, it should be understood that it is not required to implement or possess all of the shown apparatuses; more or fewer apparatuses may alternatively be implemented or possessed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowcharts can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network through the communication apparatus 909, or installed from the storage apparatus 908, or installed from the ROM 902. When the computer program is executed by the processing apparatus 901, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium can be transmitted using any appropriate medium, including but not limited to: an electrical wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be contained in the above electronic device, or may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to execute the methods shown in the above embodiments.
Computer program code for executing the operations of the present disclosure can be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a part of code that contains one or more executable instructions for implementing specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that executes the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure can be implemented in software or in hardware, and the name of a unit does not in some cases constitute a limitation on the unit itself; for example, a first obtaining unit may also be described as "a unit for obtaining at least two Internet Protocol addresses".
The functions described herein above can be executed at least partially by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in combination with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a first aspect, according to one or more embodiments of the present disclosure, a video special effect display method is provided, including:
obtaining a target video, and extracting a user voice in the target video; generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice; and displaying, word by word in the target video, a target map corresponding to the target text, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
According to one or more embodiments of the present disclosure, the target video is a video containing a user facial image; the target area is the mouth area in the user facial image; and displaying, word by word in the target video, the target map corresponding to the target text includes: determining the facial orientation of the user according to the user facial image in the target video; and controlling the target map to move along the facial orientation with the mouth area as the starting point.
According to one or more embodiments of the present disclosure, generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice includes: performing voice recognition on the user voice to obtain a corresponding voice text, the voice text including at least one candidate character; and detecting the candidate characters in the voice text, and when a candidate character belongs to a preset first keyword, determining the candidate character as the target text.
According to one or more embodiments of the present disclosure, the method further includes: when the target text is a preset second keyword, displaying a target environment special effect corresponding to the second keyword in the target video, the target environment special effect including a music special effect and/or a map special effect corresponding to the second keyword; and before the target environment special effect corresponding to the second keyword is displayed in the target video, the method further includes: displaying prompt information corresponding to the second keyword in the target video.
According to one or more embodiments of the present disclosure, displaying, word by word in the target video, the target map corresponding to the target text includes: obtaining the moving starting point and the corresponding moving direction of the target map within the target area; and after displaying the target map at the moving starting point, controlling the movement of the target map based on the moving direction.
According to one or more embodiments of the present disclosure, obtaining the moving starting point and the corresponding moving direction of the target map within the target area includes: randomly generating, within the target area, the moving starting point corresponding to the target map; and obtaining the moving direction according to the distance between the moving starting point and the edge of the target area, wherein the moving direction characterizes the angle between the movement path of the target map and the plane where the target area is located within a preset camera three-dimensional space.
According to one or more embodiments of the present disclosure, the target area is a ring area; and randomly generating, within the target area, the moving starting point corresponding to the target map includes: obtaining the inner diameter length and the outer diameter length corresponding to the target area; randomly obtaining a target radius based on the inner diameter length and the outer diameter length, the length of the target radius lying between the inner diameter length and the outer diameter length; and generating the moving starting point according to the target radius and a pre-generated target angle.
According to one or more embodiments of the present disclosure, randomly obtaining the target radius based on the inner diameter length and the outer diameter length includes: squaring the inner diameter length and the outer diameter length to obtain the corresponding squared inner diameter length and squared outer diameter length, respectively; obtaining a squared value interval based on the squared inner diameter length and the squared outer diameter length, and randomly obtaining a squared radius value within the squared value interval; and obtaining the target radius from the result of the square root operation on the squared radius value.
According to one or more embodiments of the present disclosure, obtaining the moving direction according to the distance between the moving starting point and the edge of the target area includes: determining a corresponding target deflection angle according to a first distance between the moving starting point and the edge of the target area, the target deflection angle characterizing the angle by which the movement trajectory of the target map deflects toward the edge of the target video, the target deflection angle being proportional to the first distance; obtaining a target spatial angle, the target spatial angle being the three-dimensional spatial angle corresponding to the normal vector at the moving starting point within the camera three-dimensional spatial plane where the target area is located; and determining the moving direction according to the vector sum of the target spatial angle and the target deflection angle.
According to one or more embodiments of the present disclosure, the method further includes: in the process of controlling the movement of the target map, setting the text attributes corresponding to the target map based on the movement distance of the target map, wherein the text attributes include at least one of the following: font transparency, font color, and font size.
According to one or more embodiments of the present disclosure, controlling the target map to move outward along the target trajectory includes: obtaining a control parameter, the control parameter characterizing a target condition for stopping the display of the target map; and controlling the movement of the target map based on the control parameter, wherein the control parameter includes a movement duration and/or a movement distance, the movement duration characterizing the duration of the movement of the target map, and the movement distance characterizing the distance of the movement of the target map.
In a second aspect, according to one or more embodiments of the present disclosure, a video special effect display apparatus is provided, including:
a voice module, configured to obtain a target video and extract a user voice in the target video;
a processing module, configured to generate, according to the user voice in the target video, at least one target text corresponding to the content of the user voice;
a display module, configured to display, word by word in the target video, a target map corresponding to the target text, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
In an embodiment of the present disclosure, the target video is a video containing a user facial image; the target area is the mouth area in the user facial image; when displaying the target map corresponding to the target text word by word in the target video, the display module is specifically configured to: determine the facial orientation of the user according to the user facial image in the target video; and control the target map to move along the facial orientation with the mouth area as the starting point.
In an embodiment of the present disclosure, when generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice, the processing module is specifically configured to: perform voice recognition on the user voice to obtain a corresponding voice text, the voice text including at least one candidate character; and detect the candidate characters in the voice text, and when a candidate character belongs to a preset first keyword, determine the candidate character as the target text.
In an embodiment of the present disclosure, the display module is further configured to: when the target text is a preset second keyword, display a target environment special effect corresponding to the second keyword in the target video, the target environment special effect including a music special effect and/or a map special effect corresponding to the second keyword; and before the target environment special effect corresponding to the second keyword is displayed in the target video, the display module is further configured to display prompt information corresponding to the second keyword in the target video.
In an embodiment of the present disclosure, when displaying the target map corresponding to the target text word by word in the target video, the display module is specifically configured to: obtain the moving starting point and the corresponding moving direction of the target map within the target area; and after displaying the target map at the moving starting point, control the movement of the target map based on the moving direction.
In an embodiment of the present disclosure, when obtaining the moving starting point and the corresponding moving direction of the target map within the target area, the display module is specifically configured to: randomly generate the moving starting point corresponding to the target map within the target area; and obtain the moving direction according to the distance between the moving starting point and the edge of the target area, wherein the moving direction characterizes the angle between the movement path of the target map and the plane where the target area is located within a preset camera three-dimensional space.
In an embodiment of the present disclosure, the target area is a ring area; when randomly generating the moving starting point corresponding to the target map within the target area, the display module is specifically configured to: obtain the inner diameter length and the outer diameter length corresponding to the target area; randomly obtain a target radius based on the inner diameter length and the outer diameter length, the length of the target radius lying between the inner diameter length and the outer diameter length; and generate the moving starting point according to the target radius and a pre-generated target angle.
In an embodiment of the present disclosure, when randomly obtaining the target radius based on the inner diameter length and the outer diameter length, the display module is specifically configured to: square the inner diameter length and the outer diameter length to obtain the corresponding squared inner diameter length and squared outer diameter length, respectively; obtain a squared value interval based on the squared inner diameter length and the squared outer diameter length, and randomly obtain a squared radius value within the squared value interval; and obtain the target radius from the result of the square root operation on the squared radius value.
In an embodiment of the present disclosure, when obtaining the moving direction according to the distance between the moving starting point and the edge of the target area, the display module is specifically configured to: determine a corresponding target deflection angle according to a first distance between the moving starting point and the edge of the target area, the target deflection angle characterizing the angle by which the movement trajectory of the target map deflects toward the edge of the target video, the target deflection angle being proportional to the first distance; obtain a target spatial angle, the target spatial angle being the three-dimensional spatial angle corresponding to the normal vector at the moving starting point within the camera three-dimensional spatial plane where the target area is located; and determine the moving direction according to the vector sum of the target spatial angle and the target deflection angle.
In an embodiment of the present disclosure, the display module is further configured to: in the process of controlling the movement of the target map, set the text attributes corresponding to the target map based on the movement distance of the target map, wherein the text attributes include at least one of the following: font transparency, font color, and font size.
In an embodiment of the present disclosure, when controlling the target map to move in the target video, the display module is specifically configured to: obtain a control parameter, the control parameter characterizing a target condition for stopping the display of the target map; and control, based on the control parameter, the target map to move toward the edge of the target video, wherein the control parameter includes a movement duration and/or a movement distance, the movement duration characterizing the duration for which the target map moves toward the edge of the target video, and the movement distance characterizing the distance over which the target map moves toward the edge of the target video.
In a third aspect, according to one or more embodiments of the present disclosure, an electronic device is provided, including: a processor, and a memory communicatively connected to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the video special effect display method described in the first aspect and various possible designs of the first aspect.
In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium is provided, storing computer-executable instructions which, when executed by a processor, implement the video special effect display method described in the first aspect and various possible designs of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product including a computer program which, when executed by a processor, implements the video special effect display method described in the first aspect and various possible designs of the first aspect.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of the disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example technical solutions formed by substituting the above features with technical features with similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be executed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be interpreted as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. On the contrary, the specific features and actions described above are merely example forms of implementing the claims.

Claims (20)

  1. A video special effect display method, comprising:
    obtaining a target video, and extracting a user voice in the target video;
    generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice;
    displaying, word by word in the target video, a target map corresponding to the target text, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
  2. The method according to claim 1, wherein the target video is a video containing a user facial image, and the target area is a mouth area in the user facial image;
    the displaying, word by word in the target video, of the target map corresponding to the target text comprises:
    determining the facial orientation of the user according to the user facial image in the target video;
    controlling the target map to move along the facial orientation with the mouth area as the starting point.
  3. The method according to claim 1, wherein the generating, according to the user voice in the target video, of at least one target text corresponding to the content of the user voice comprises:
    performing voice recognition on the user voice to obtain a corresponding voice text, the voice text comprising at least one candidate character;
    detecting the candidate characters in the voice text, and when a candidate character belongs to a preset first keyword, determining the candidate character as the target text.
  4. The method according to claim 1, wherein the method further comprises:
    when the target text is a preset second keyword, displaying a target environment special effect corresponding to the second keyword in the target video, wherein the target environment special effect comprises a music special effect and/or a map special effect corresponding to the second keyword;
    before the target environment special effect corresponding to the second keyword is displayed in the target video, the method further comprises:
    displaying prompt information corresponding to the second keyword in the target video.
  5. The method according to claim 1, wherein the displaying, word by word in the target video, of the target map corresponding to the target text comprises:
    obtaining the moving starting point and the corresponding moving direction of the target map within the target area;
    after displaying the target map at the moving starting point, controlling the movement of the target map based on the moving direction.
  6. The method according to claim 5, wherein the obtaining of the moving starting point and the corresponding moving direction of the target map within the target area comprises:
    randomly generating, within the target area, the moving starting point corresponding to the target map;
    obtaining the moving direction according to the distance between the moving starting point and the edge of the target area, wherein the moving direction characterizes the angle between the movement path of the target map and the plane where the target area is located within a preset camera three-dimensional space.
  7. The method according to claim 6, wherein the target area is a ring area, and randomly generating, within the target area, the moving starting point corresponding to the target map comprises:
    obtaining the inner diameter length and the outer diameter length corresponding to the target area;
    randomly obtaining a target radius based on the inner diameter length and the outer diameter length, the length of the target radius lying between the inner diameter length and the outer diameter length;
    generating the moving starting point according to the target radius and a pre-generated target angle.
  8. The method according to claim 7, wherein the randomly obtaining of the target radius based on the inner diameter length and the outer diameter length comprises:
    squaring the inner diameter length and the outer diameter length to obtain the corresponding squared inner diameter length and squared outer diameter length, respectively;
    obtaining a squared value interval based on the squared inner diameter length and the squared outer diameter length, and randomly obtaining a squared radius value within the squared value interval;
    obtaining the target radius from the result of the square root operation on the squared radius value.
  9. The method according to claim 6, wherein the obtaining of the moving direction according to the distance between the moving starting point and the edge of the target area comprises:
    determining a corresponding target deflection angle according to a first distance between the moving starting point and the edge of the target area, the target deflection angle characterizing the angle by which the movement trajectory of the target map deflects toward the edge of the target video, the target deflection angle being proportional to the first distance;
    obtaining a target spatial angle, the target spatial angle being the three-dimensional spatial angle corresponding to the normal vector at the moving starting point within the camera three-dimensional spatial plane where the target area is located;
    determining the moving direction according to the vector sum of the target spatial angle and the target deflection angle.
  10. The method according to claim 6, wherein the method further comprises:
    in the process of controlling the movement of the target map, setting the text attributes corresponding to the target map based on the movement distance of the target map;
    wherein the text attributes comprise at least one of the following:
    font transparency, font color, and font size.
  11. The method according to claim 5, wherein the controlling of the movement of the target map comprises:
    obtaining a control parameter, the control parameter characterizing a target condition for stopping the display of the target map;
    controlling the movement of the target map based on the control parameter;
    wherein the control parameter comprises a movement duration and/or a movement distance;
    the movement duration characterizes the duration of the movement of the target map;
    the movement distance characterizes the distance of the movement of the target map.
  12. A video special effect display apparatus, comprising:
    a voice module, configured to obtain a target video and extract a user voice in the target video;
    a processing module, configured to generate, according to the user voice in the target video, at least one target text corresponding to the content of the user voice;
    a display module, configured to display, word by word in the target video, a target map corresponding to the target text, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
  13. An electronic device, comprising: a processor, and a memory communicatively connected to the processor;
    the memory stores computer-executable instructions;
    the processor executes the computer-executable instructions stored in the memory to cause the electronic device to perform actions, the actions comprising:
    obtaining a target video, and extracting a user voice in the target video;
    generating, according to the user voice in the target video, at least one target text corresponding to the content of the user voice;
    displaying, word by word in the target video, a target map corresponding to the target text, wherein the target map is centered on a target area in the target video and moves outward along a target trajectory.
  14. The electronic device according to claim 13, wherein the target video is a video containing a user facial image, and the target area is a mouth area in the user facial image;
    the displaying, word by word in the target video, of the target map corresponding to the target text comprises:
    determining the facial orientation of the user according to the user facial image in the target video;
    controlling the target map to move along the facial orientation with the mouth area as the starting point.
  15. The electronic device according to claim 13, wherein the generating, according to the user voice in the target video, of at least one target text corresponding to the content of the user voice comprises:
    performing voice recognition on the user voice to obtain a corresponding voice text, the voice text comprising at least one candidate character;
    detecting the candidate characters in the voice text, and when a candidate character belongs to a preset first keyword, determining the candidate character as the target text.
  16. The electronic device according to claim 13, wherein the actions further comprise:
    when the target text is a preset second keyword, displaying a target environment special effect corresponding to the second keyword in the target video, wherein the target environment special effect comprises a music special effect and/or a map special effect corresponding to the second keyword;
    before the target environment special effect corresponding to the second keyword is displayed in the target video, the actions further comprise:
    displaying prompt information corresponding to the second keyword in the target video.
  17. The electronic device according to claim 13, wherein the displaying, word by word in the target video, of the target map corresponding to the target text comprises:
    obtaining the moving starting point and the corresponding moving direction of the target map within the target area;
    after displaying the target map at the moving starting point, controlling the movement of the target map based on the moving direction.
  18. The electronic device according to claim 17, wherein the obtaining of the moving starting point and the corresponding moving direction of the target map within the target area comprises:
    randomly generating, within the target area, the moving starting point corresponding to the target map;
    obtaining the moving direction according to the distance between the moving starting point and the edge of the target area, wherein the moving direction characterizes the angle between the movement path of the target map and the plane where the target area is located within a preset camera three-dimensional space.
  19. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the video special effect display method according to any one of claims 1 to 11.
  20. A computer program product, comprising a computer program which, when executed by a processor, implements the video special effect display method according to any one of claims 1 to 11.
PCT/CN2023/137977 2022-12-23 2023-12-11 Video special effect display method and apparatus, electronic device and storage medium WO2024131585A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211668451.3 2022-12-23
CN202211668451.3A CN115967781A (zh) Video special effect display method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2024131585A1 true WO2024131585A1 (zh) 2024-06-27

Family

ID=87357463

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/137977 2022-12-23 2023-12-11 Video special effect display method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN115967781A (zh)
WO (1) WO2024131585A1 (zh)

Also Published As

Publication number Publication date
CN115967781A (zh) 2023-04-14
