CN112883223A - Audio display method and device, electronic equipment and computer storage medium - Google Patents

Audio display method and device, electronic equipment and computer storage medium

Info

Publication number
CN112883223A
CN112883223A (application number CN201911199903.6A)
Authority
CN
China
Prior art keywords
audio, note, beat data, data, beat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911199903.6A
Other languages
Chinese (zh)
Inventor
胡凌峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201911199903.6A
Publication of CN112883223A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor, of audio data
    • G06F 16/64: Browsing; Visualisation therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present application provide an audio display method and apparatus, an electronic device, and a computer storage medium. The audio display method comprises the following steps: acquiring beat data corresponding to human voice audio from target audio containing the human voice audio; and, during the playback of the target audio, performing a first visual display corresponding to the rhythm of the beat data according to the beat data. The scheme provided by this embodiment needs no additional accompaniment audio: the first visual display can be performed from the beat data corresponding to the human voice audio alone, and through the first visual display the user can accurately feel the rhythm of the human voice audio, which improves the user experience.

Description

Audio display method and device, electronic equipment and computer storage medium
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to an audio display method and device, electronic equipment and a computer storage medium.
Background
In conventional music playback, a corresponding spectrogram or the like is often shown while the music plays, in order to better present the musical effect to the user. When the music contains both full accompaniment and vocals, a plurality of sounds can be distinguished according to bass, alto, mid-range, and treble, a spectrogram can be formed from the distinguished sounds, and the rhythm of the music can be conveyed by displaying a picture corresponding to the spectrogram.
However, some existing music contains no accompaniment; for example, some songs are sung a cappella and contain only the human voice. The human voice is not easily separated into bass, mid-range, and treble, so when such music is displayed through a spectrogram, the rhythm of the music cannot be accurately expressed, which degrades the user experience.
Disclosure of Invention
In view of the above, embodiments of the present application provide an audio display method, an audio display apparatus, an electronic device and a computer storage medium, so as to overcome the defect in the prior art that the rhythm of audio lacking accompaniment cannot be accurately displayed.
The embodiment of the application provides an audio display method, which comprises the following steps: acquiring beat data corresponding to human voice audio from target audio containing the human voice audio; and in the playing process of the target audio, performing first visual display corresponding to the rhythm of the beat data according to the beat data.
The embodiment of the application provides an audio frequency display device, it includes: the acquisition module is used for acquiring beat data corresponding to the human voice audio from target audio containing the human voice audio; and the display module is used for performing first visual display corresponding to the rhythm of the beat data according to the beat data in the playing process of the target audio.
An embodiment of the present application provides an electronic device, including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the audio presentation method.
An embodiment of the present application provides a computer storage medium, on which a computer program is stored, which when executed by a processor implements the audio presentation method as described above.
According to the scheme provided by this embodiment, beat data corresponding to the human voice audio is acquired from target audio containing the human voice audio, and then, during the playback of the target audio, a first visual display corresponding to the rhythm of the beat data is performed according to the beat data. No additional accompaniment audio is needed: the first visual display can be performed from the beat data corresponding to the human voice audio alone, and through the first visual display the user can accurately feel the rhythm of the human voice audio, which improves the user experience. In particular, even when the target audio is an a cappella (unaccompanied) song, the scheme provided by this embodiment can still perform a first visual display corresponding to the rhythm of the song according to the beat data, improving the user experience.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1a is a schematic diagram of an audio display method according to the first embodiment of the present application;
FIG. 1b is a schematic diagram of a usage scenario in the first embodiment;
FIG. 1c is a schematic diagram of another usage scenario in the first embodiment;
FIG. 1d is a schematic diagram of a visual presentation in the usage scenarios of FIGS. 1b and 1c;
FIG. 2a is a schematic diagram of an audio display method according to the second embodiment of the present application;
FIG. 2b is a schematic diagram of a usage scenario;
FIG. 2c is a schematic diagram of a visual display effect in the usage scenario of FIG. 2b;
FIG. 3a is a schematic diagram of an interface during song playback;
FIG. 3b is a schematic diagram of a visual display effect during song playback;
FIG. 3c is a schematic diagram of another visual display effect during song playback;
FIG. 3d is a schematic diagram of a further visual display effect during song playback;
FIG. 3e is a diagram illustrating the relationship between notes and the movement distance thresholds of the corresponding singing notes;
FIG. 4 is a schematic structural diagram of an audio display apparatus according to the third embodiment of the present application;
FIG. 5 is a hardware structure diagram of an electronic device that executes the audio presentation method according to the present application.
Detailed Description
It is not necessary for any particular embodiment of the present application to achieve all of the above advantages at the same time.
To help those skilled in the art better understand the technical solutions in the embodiments of the present application, these solutions are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Example one
FIG. 1a is a schematic diagram of an audio display method according to the first embodiment of the present application; as shown in FIG. 1a, the method comprises the following steps:
s102, obtaining beat data corresponding to the human voice audio from target audio containing the human voice audio.
In this embodiment, the target audio may be any audio containing a human voice, obtained in any manner, and the human voice contained in the target audio may be original vocal audio, cover-song vocal audio, vocal audio of a self-composed song, or the like; this embodiment does not limit it.
The beat data corresponding to the human voice audio represents the beat of the human voice audio. A beat is a periodically recurring pulse with a pattern of strong and weak accents, and serves as the unit for measuring rhythm.
In this embodiment, the beat data corresponding to the human voice audio may be obtained as follows: if the target audio is published audio known to the public, the beat data of the target audio is obtained through any resource channel and used as the beat data corresponding to the human voice audio; if the target audio is audio recorded by the user, the beat data corresponding to the human voice audio may be obtained from settings made by the user, or by performing rhythm analysis on the target audio with any suitable algorithm.
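By way of a non-limiting illustration, the sketch below shows one plausible shape for beat data, together with a crude energy-peak analysis standing in for "any suitable algorithm". The field names (timeMs, strength), the 50 ms frame size, and the peak condition are assumptions of this sketch rather than part of the described scheme. TypeScript is used for all sketches in this description.

```typescript
// A minimal sketch of the beat-data structure assumed in this description.
// Field names (timeMs, strength) are illustrative, not defined by the scheme.
interface BeatPoint {
  timeMs: number;   // offset of the beat point from the start of the audio, in ms
  strength: number; // relative strength of the beat, normalized to [0, 1]
}

// Illustrative stand-in for "rhythm analysis with any suitable algorithm":
// picks local energy peaks over fixed 50 ms frames as candidate beat points.
function estimateBeats(samples: Float32Array, sampleRate: number): BeatPoint[] {
  const frameSize = Math.floor(sampleRate * 0.05); // 50 ms frames (assumed)
  const energies: number[] = [];
  for (let i = 0; i + frameSize <= samples.length; i += frameSize) {
    let e = 0;
    for (let j = i; j < i + frameSize; j++) e += samples[j] * samples[j];
    energies.push(e);
  }
  const mean = energies.reduce((a, b) => a + b, 0) / energies.length;
  const beats: BeatPoint[] = [];
  energies.forEach((e, k) => {
    const prev = k > 0 ? energies[k - 1] : 0;
    if (e > 1.5 * mean && e > prev) { // crude onset condition (assumed)
      beats.push({
        timeMs: (k * frameSize * 1000) / sampleRate,
        strength: Math.min(e / (3 * mean), 1),
      });
    }
  });
  return beats;
}
```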
S104, during the playback of the target audio, performing a first visual display corresponding to the rhythm of the beat data according to the beat data.
Since the beat data represents the beat of the human voice audio, and the beat measures the rhythm, a first visual display corresponding to the rhythm of the beat data can be performed according to the beat data while the target audio plays: the content shown to the user is controlled, according to the beat data, to change as the target audio plays, so that the displayed content of the first visual display corresponds to the rhythm of the human voice audio, which improves the user experience.
The first visual display may take any visual form, as long as it corresponds to the rhythm of the beat data. For example, animated content that changes with the beat data may be shown directly on a screen, or the first visual display may be performed with lighting that changes with the rhythm of the beat data.
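As a minimal sketch of such a display driven by playback (reusing the BeatPoint interface from the sketch above; the Player interface and the spawnBar callback are assumed stand-ins for the hosting APP's player and renderer):

```typescript
// Sketch of a playback-driven loop for the first visual display: one display
// object is spawned per beat point as its time is reached.
interface Player { currentTimeMs(): number; }

function runFirstVisualDisplay(
  player: Player,
  beats: BeatPoint[], // from the sketch above, sorted by timeMs
  spawnBar: (strength: number) => void,
): void {
  let next = 0; // index of the next beat point awaiting display
  const tick = () => {
    const now = player.currentTimeMs();
    while (next < beats.length && beats[next].timeMs <= now) {
      spawnBar(beats[next].strength); // e.g. create a vertical bar at the avatar
      next++;
    }
    requestAnimationFrame(tick); // re-check on every rendered frame
  };
  requestAnimationFrame(tick);
}
```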
For example, when user X wants to play song A, he may request song A from the cloud through his terminal A, as shown in FIG. 1b. After receiving user X's request, the cloud returns the corresponding data, comprising the audio data of song A and the beat data of song A, to terminal A. After receiving the returned data, terminal A plays song A through the corresponding APP, and during playback terminal A simultaneously displays the corresponding special effect according to the beat data of song A; the playing interface is shown in FIG. 1b, where the displayed special effect is shown schematically inside the dashed box. One possible effect of displaying the special effect is shown in FIG. 1d.
For another example, as shown in FIG. 1c, when user X triggers the recording of song A through terminal A, in step S1 the cloud sends the beat data corresponding to song A to terminal A, so that the APP capable of recording songs obtains that beat data. Then, in step S2, user X sings song A and records the human voice audio into terminal A, which may display a recording-in-progress page through interface 1. After user X finishes recording, the terminal may display a recording-completed page through interface 2, and may execute step S3 to send the recorded target audio (containing the human voice audio and the beat data) to the cloud, which stores it.
When the target audio is played through terminal B, step S4 may be executed first: the cloud sends the target audio (containing the human voice audio and the beat data) to terminal B, and the APP in terminal B capable of playing songs may perform a visual display of the corresponding rhythm according to the beat data in the target audio, as shown in the dashed box in interface 3 of terminal B. Terminal A and terminal B in FIG. 1c may be the same terminal or different terminals.
An example of the first visual display shown in the dashed box of interface 3 of terminal B is given in FIG. 1d. The first visual display is rendered as vertical bar-shaped graphics that change with the beat data, read from top to bottom; as the change process shown in FIG. 1d illustrates, the vertical bars gradually grow and spread toward both sides, with the user avatar as the center. It should be understood that the vertical bar-shaped graphics are described only as an example, and the present application is not limited thereto.
According to the scheme provided by this embodiment, beat data corresponding to the human voice audio is acquired from target audio containing the human voice audio, and then, during the playback of the target audio, a first visual display corresponding to the rhythm of the beat data is performed according to the beat data. No additional accompaniment audio is needed: the first visual display can be performed from the beat data corresponding to the human voice audio alone, and through the first visual display the user can accurately feel the rhythm of the human voice audio, which improves the user experience. In particular, even when the target audio is an a cappella song, the scheme provided by this embodiment can still perform a first visual display corresponding to the rhythm of the song according to the beat data, improving the user experience.
The audio presentation method of the present embodiment may be performed by any suitable electronic device with data processing capabilities, including but not limited to: mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.
Example two
FIG. 2a is a schematic diagram of an audio display method according to the second embodiment of the present application; as shown in FIG. 2a, the method comprises the following steps:
s202, obtaining beat data corresponding to the human voice audio and note time data corresponding to the note audio from target audio comprising the human voice audio and the note audio.
The manner of obtaining the beat data corresponding to the human voice audio has been described in the above embodiment and is not repeated here.
In this embodiment, the target audio may contain note audio in addition to the human voice audio. The note audio may be audio generated from playing operations performed by the user while the human voice audio is being produced. For example, although a sung song may have no complete accompaniment, a singer who wants a better performance can, while singing, play some notes through a playing function provided by the APP, making the song more enjoyable; note audio can then be generated from those notes played by the singer. Once the note audio is determined, it can be added to the target audio. For this reason, some target audio contains both human voice audio and note audio.
In the prior art, even if the target audio contains note audio, the notes may be sparse; for example, each line of lyrics may correspond to only one or two notes, so the notes corresponding to the entire target audio are few, and the human voice is not easily separated into a plurality of sounds by bass, alto, mid-range, and treble. Therefore, even when the target audio contains note audio, a spectrogram of the target audio cannot be determined, and the user cannot effectively feel the effect and rhythm of the audio.
In this embodiment, these problems are effectively solved by performing corresponding processing and visual display according to the note time data corresponding to the note audio. The note time data corresponding to the note audio may be determined from the output times of the notes within the target audio; for example, if the third note is output when the target audio plays to the 5th second, the 5th second may serve as the note time data corresponding to the third note. Note that one item of note time data may correspond to one note or to several notes; this embodiment does not limit it.
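A non-limiting sketch of how such note time data might be represented follows; the field names and the example values are illustrative assumptions only.

```typescript
// Assumed shape of note time data: each entry pairs one or more notes with the
// playback time at which they are output (e.g. the third note at the 5th second).
interface NoteEvent {
  timeMs: number;      // output time of the note(s) within the target audio
  notes: string[];     // one or more notes, e.g. ["C4"] or ["C4", "E4"]
  instrument?: string; // optional, e.g. "piano" or "guitar"
  volume?: number;     // optional relative volume in [0, 1]
}

// Example matching the text above: the third note is output at the 5th second.
const noteTimeData: NoteEvent[] = [
  { timeMs: 1200, notes: ["C4"], instrument: "piano", volume: 0.4 },
  { timeMs: 3100, notes: ["E4"], instrument: "piano", volume: 0.7 },
  { timeMs: 5000, notes: ["G4"], instrument: "piano", volume: 0.9 },
];
```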
S204, in the playing process of the target audio, performing first visual display corresponding to the rhythm of the beat data according to the beat data, and performing second visual display corresponding to the time of the note time data according to the note time data.
For the specific content of the first visual display, reference may be made to the above embodiments, which are not repeated in this embodiment.
Performing a second visual display corresponding to the time of the note time data according to the note time data means controlling the content shown to the user to change at the times given by the note time data, so that the user can intuitively identify the notes in the target audio from the second visual display. Because notes are generally output in step with the beat of the human voice audio in order to form pleasing music, the user can also intuitively feel the rhythm of the human voice audio from the second visual content.
The second visual display may likewise take any visual form, as long as it corresponds to the times of the note time data. For example, animated content corresponding to a note may be shown on the screen at the time given by the note time data, or the second visual display may be performed by switching lighting between bright and dark states at that time. Through the combination of the first visual display and the second visual display, the user can visually feel both the rhythm of the human voice audio and the effect of the note audio, which further improves the user experience.
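A minimal sketch combining both displays in one playback loop is given below, reusing the Player, BeatPoint, and NoteEvent definitions from the earlier sketches; spawnNote is an assumed callback that starts a second object moving along its path.

```typescript
// Sketch of driving both visual displays from a single playback loop.
function runCombinedDisplay(
  player: Player,
  beats: BeatPoint[],
  notes: NoteEvent[],
  spawnBar: (strength: number) => void,
  spawnNote: (event: NoteEvent) => void,
): void {
  let b = 0; // next pending beat point
  let n = 0; // next pending note event
  const tick = () => {
    const now = player.currentTimeMs();
    while (b < beats.length && beats[b].timeMs <= now) spawnBar(beats[b++].strength);
    while (n < notes.length && notes[n].timeMs <= now) spawnNote(notes[n++]);
    requestAnimationFrame(tick);
  };
  requestAnimationFrame(tick);
}
```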
For example, as shown in FIG. 2b: for steps S1-S4 in FIG. 2b, refer to steps S1-S4 in FIG. 1c; for interface 1 of terminal A in FIG. 2b, refer to interface 1 of terminal A in FIG. 1c; for interface 3 of terminal A in FIG. 2b, refer to interface 2 of terminal A in FIG. 1c; and for interface 4 of terminal B in FIG. 2b, refer to interface 3 of terminal B in FIG. 1c. Their descriptions are omitted here. Terminal A and terminal B in FIG. 2b may be the same terminal or different terminals.
FIG. 2b differs from FIG. 1c above in that:
A) During recording, as shown in interface 2 displayed by terminal A, user X can tap icon 1 in the dashed box on the left side of the interface to play a note at the 5th second after recording starts. When step S3 is executed and terminal A sends the recorded target audio (containing the human voice audio, the beat data, the note audio, and the note time data) to the cloud, the note audio generated by the user's playing operation and the note time data corresponding to that note audio are added to the sent target audio.
B) During playback, when step S4 is executed and the recorded target audio (containing the human voice audio, the beat data, the note audio, and the note time data) is sent from the cloud to terminal B, the note audio generated by the user's playing operation and its corresponding note time data are included in the sent target audio. During display, as shown in interface 5 of terminal B, a second visual display corresponding to the note time data is added to the content of the playing interface from the 5th second after playback starts.
A detailed illustration of the first visual display and the second visual display shown in the dashed box of the playing interface is given in FIG. 2c; the middle part of FIG. 2c shows the overlaid effect of the two displays, which changes with the playback progress in top-to-bottom order. The vertical bars represent the first visual display and the solid block represents the second visual display; the "playing progress" of interface 5 in FIG. 2b and the "5th second" label under the solid block in FIG. 2c merely indicate that the solid block is presented at the 5th second after playback starts, and are not actually displayed content.
Hereinafter, each of the playback interfaces involved in the above-described process will be described as an example.
As shown in FIG. 3a, FIG. 3a shows an interface while a song is playing: the lyric content is displayed at the top of the interface; below the lyric content is a rhythm bar, in which the first visual display and/or the second visual display can be performed; and below the rhythm bar, the playback progress of the song is shown.
While the song plays, a third object, namely the user avatar, is displayed in the middle of the rhythm bar, and vertical bar-shaped graphics (the first objects) are displayed on the left and right sides of the avatar, each vertical bar corresponding to one beat point of the beat data. As the playback progress of the song advances, a new vertical bar corresponding to a beat point is generated at the position of the user avatar and moves along a straight line in the direction away from the avatar, while at the same time the avatar zooms, as shown in FIG. 3b.
It should be understood that the vertical bar-shaped graphic is described only as an example; the present application is not limited thereto, and in other embodiments the first object may be another graphic, such as a bubble graphic or a note graphic, as long as the first object changes with the beat data. This embodiment does not limit it.
For example, when the first object is a note graphic, a new note graphic corresponding to the beat point is generated at the position of the user avatar as playback progresses, and moves along the staff lines in the figure in the direction away from the avatar, while at the same time the avatar zooms, as shown in FIG. 3c.
The specific graphic used for the first object and its movement mode may be set by those skilled in the art as needed, and are not described again in this embodiment.
When playback reaches the time corresponding to the note time data, as shown in FIG. 3d, a singing note (the second visual display) corresponding to the note is controlled to move in the rhythm bar along a straight line away from the user avatar, starting from the avatar, and to disappear when it reaches a movement distance threshold (for example, 1 cm). The movement distance thresholds of the singing notes corresponding to different notes differ, so the positions at which they disappear may also differ; specifically, as shown in FIG. 3e, the singing notes corresponding to Note 1 and Note 2 disappear at different positions.
Here the first visual display comprises the zooming of the user avatar and the movement of the vertical bars, and the second visual display comprises the movement of the singing notes; combining these visual displays lets the user feel a strong sense of rhythm through the viewing interface, which improves the user experience.
In addition, because the music score and the lyrics are generally created separately when a song is composed, according to the scheme provided by this embodiment, if the target audio is audio generated during the creation of the music score, the creation process of the music score can be shown more clearly through the first visual display and the second visual display.
Of course, the display modes in fig. 3a, 3b, 3c, 3d and 3e are only examples and are not intended to limit the present application.
In one alternative, the display area of the first visual display partially or completely overlaps the display area of the second visual display, so that the two displays are superimposed for a stronger visual effect, as shown in the middle part of FIG. 2c.
Specifically, when superimposing, if the second visual display is rendered above the first, the transparency of the display content of the second visual display can be adjusted so that the user can still observe the content of the first visual display through it; if the stacking order is reversed, the transparency of the display content of the first visual display can be adjusted instead.
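By way of non-limiting illustration, a sketch of this transparency adjustment on an HTML canvas follows; the alpha value 0.6 is an assumed tuning value, not prescribed by the scheme.

```typescript
// Sketch of the transparency adjustment: the second visual display is drawn
// semi-transparently above the first, so the first remains visible through it.
function drawOverlaid(
  ctx: CanvasRenderingContext2D,
  drawFirst: (c: CanvasRenderingContext2D) => void,
  drawSecond: (c: CanvasRenderingContext2D) => void,
): void {
  drawFirst(ctx);        // vertical bars at full opacity
  ctx.save();
  ctx.globalAlpha = 0.6; // second display rendered semi-transparently on top
  drawSecond(ctx);
  ctx.restore();
}
```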
In one alternative, performing the first visual display corresponding to the rhythm of the beat data according to the beat data comprises: moving a first object corresponding to a beat point of the beat data along a first preset path. The displayed first object thus corresponds directly to a beat point of the beat data, and its movement along the first preset path creates a visual effect in which the first object changes as playback progresses, so the user can intuitively feel the change of beat points by observing the appearance and movement of the first object, which improves the user experience.
For example, if the first preset path is a straight path centered on the user avatar, the first object may start at the avatar and move horizontally along the straight path in the direction away from the avatar, being hidden when it reaches the interface boundary (the left and right sides of the dashed box), as shown in FIG. 1c. Of course, the first preset path may also be a wave-shaped path, a parabolic path, or the like. Alternatively, the first preset path may include at least two sub-paths, in which case at least two first objects corresponding to the same beat point may be moved along the respective sub-paths: the straight path centered on the user avatar described above may be divided into a straight sub-path on the left of the avatar and a straight sub-path on its right. The sub-paths may also differ from each other, for example a straight sub-path on the left of the avatar and a wavy sub-path on its right.
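The sketch below illustrates, under assumed names, the straight first preset path split into left and right sub-paths centered on the avatar: each beat point spawns two first objects that move symmetrically and are hidden once they reach the interface boundary.

```typescript
// Minimal state for an object moving along a horizontal preset path.
interface MovingObject { x: number; y: number; vx: number; }

// Spawn two first objects for one beat point, one per sub-path.
function spawnBarsOnSubPaths(
  avatarX: number, avatarY: number, speedPxPerFrame: number,
): MovingObject[] {
  return [
    { x: avatarX, y: avatarY, vx: -speedPxPerFrame }, // left sub-path
    { x: avatarX, y: avatarY, vx: +speedPxPerFrame }, // right sub-path
  ];
}

// Advance all objects by one frame and hide those beyond the boundary.
function stepAndCull(
  objects: MovingObject[], leftBound: number, rightBound: number,
): MovingObject[] {
  return objects
    .map(o => ({ ...o, x: o.x + o.vx }))
    .filter(o => o.x > leftBound && o.x < rightBound);
}
```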
In one alternative, performing the second visual display corresponding to the time of the note time data according to the note time data comprises: when the target audio plays to the time corresponding to the note time data, moving a second object along a second preset path. The generation time of a note is thus shown clearly through the second object corresponding to it, and moving the second object along the second preset path creates a visual effect in which it changes as playback progresses, so the user intuitively perceives the corresponding note by observing the appearance and movement of the second object, which improves the user experience.
The moving manner of the second object may be similar to the moving manner of the first object, and is not described in detail in this embodiment.
In a further alternative, the first preset path and the second preset path partially or completely overlap, so that a moved first object and a moved second object may coincide, in which case the first visual effect and the second visual effect can be superimposed during display to achieve a stronger visual effect. In practice the second preset path may completely overlap the first preset path, or the entire second preset path may overlap a part of the first preset path; this embodiment does not limit it. An interface illustration of the superimposed visual effects is shown in FIG. 2c.
In a further alternative, the method further comprises: determining, from the note audio, the note corresponding to the current playback time, and determining a current movement distance threshold according to that note; and hiding the second object when the distance it has moved along the second preset path matches the current movement distance threshold. An example of this process is shown in FIG. 3b: assuming the current movement distance threshold is 1 cm, as shown in the right-hand interface of FIG. 3b, the second object moves symmetrically along straight lines away from the user avatar, centered on the avatar, and when the distance the second object has moved along the second preset path reaches 1 cm, the second object is hidden.
Because different notes have different sound characteristics (for example, different notes differ in volume and duration), determining the movement distance threshold from the note and controlling the hiding of the second object according to that threshold lets the user perceive the sound characteristics of the corresponding note directly from how far the second object moves, further improving the user experience. Besides the movement distance threshold, the movement speed of the second object may also be determined according to the note.
As another example, all the movement distance thresholds may be determined from the second preset path. Assume the second preset path is a straight path from the left side of the interface to the right: the thresholds corresponding to the two trisection points (from left to right) and the rightmost endpoint of the path are 1, 2, and 3, respectively. If a note's volume is high, its movement distance threshold is determined to be 3, and the second object corresponding to it moves all the way to the rightmost end of the second preset path before being hidden; if another note's volume is low, its movement distance threshold is determined to be 1, and the second object corresponding to it is hidden once it reaches the left trisection point of the second preset path.
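A non-limiting sketch of such volume-dependent thresholds, following the trisection example above, is given below; the volume cut-offs are assumptions, and MovingObject is reused from the earlier sketch.

```typescript
// Louder notes travel further along the second preset path before hiding.
function movementThreshold(noteVolume: number, pathLengthPx: number): number {
  if (noteVolume > 0.66) return pathLengthPx;           // threshold "3": full path
  if (noteVolume > 0.33) return (2 * pathLengthPx) / 3; // threshold "2"
  return pathLengthPx / 3;                              // threshold "1"
}

// Advance a second object one frame; return null (hidden) once the distance
// it has moved matches the current movement distance threshold.
function stepNoteObject(
  o: MovingObject, startX: number, thresholdPx: number,
): MovingObject | null {
  const moved = { ...o, x: o.x + o.vx };
  return Math.abs(moved.x - startX) >= thresholdPx ? null : moved;
}
```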
In a further alternative, before the target audio plays to the time corresponding to the note time data and the second object is moved along the second preset path, the method further comprises: determining, from the note audio, the note corresponding to the current playback time, and determining the color of the corresponding second object according to that note, so that the second object is displayed in that color. The user then perceives the sound characteristics of the corresponding note directly from the color of the second object, further improving the user experience.
For example, if a note is played by a piano, the color of the corresponding second object may be blue; if it is played by a guitar, the color of the corresponding second object may be orange; and so on.
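For illustration only, a mapping along these lines might look as follows; the specific instrument-to-color assignments are assumptions, not fixed by the scheme.

```typescript
// Illustrative note-to-color mapping, reusing NoteEvent from the earlier sketch.
function colorForNote(event: NoteEvent): string {
  switch (event.instrument) {
    case "piano":  return "#3b82f6"; // blue for piano notes
    case "guitar": return "#f97316"; // orange for guitar notes
    default:       return "#9ca3af"; // neutral fallback for unknown instruments
  }
}
```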
In a further alternative, a third object may be displayed at the start position of the first preset path according to the beat data; that is, the display position of the third object coincides with the start position of the first preset path. Displaying the third object at the start of the first preset path produces a visual effect of linkage between the third object and the first objects, giving the user a stronger sense of rhythm and improving the user experience.
For example, while the first visual display is performed, the third object may be controlled according to the beat data to produce a vibration effect and/or to switch colors, while the first object moves out from the position of the third object (the start of the first preset path), creating a visual effect in which the first object is linked with the third object.
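A minimal sketch of one way to realize this linkage: on each beat point the third object (the avatar) is briefly scaled up and decays back to normal size. The 150 ms decay and the 20% scale pulse are assumed tuning values.

```typescript
// Scale factor for the third object as a function of time since the last beat.
function avatarScaleAt(nowMs: number, lastBeatMs: number): number {
  const elapsed = nowMs - lastBeatMs;
  const pulse = Math.max(0, 1 - elapsed / 150); // fades out over ~150 ms
  return 1 + 0.2 * pulse;                       // up to 20% larger on the beat
}
```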
In a further alternative, on top of the first visual display being performed, additional content that interacts with the user may be added so that the user obtains an immersive experience.
In a specific alternative, after step S204, the method may further include:
s206, in the process of the first visual display, obtaining input operation of a user, and determining a matching result of the input operation and the beat data.
In the first visual display process, the user can know the beat corresponding to the target audio according to the first visual effect. Based on this, the user can perform an input operation in response to the learned tempo.
The user's input operations may include: trigger operations on a fixed position in the screen performed according to the first visual display, press operations on preset keys, and the like. The input operations may further include voice input by the user, and so on.
After the user's input operation is acquired, it can be matched against the beat data and a matching result determined; the matching result can then be displayed through step S208. Determining and presenting matching results increases the user's degree of involvement, giving the user an immersive experience. Of course, those skilled in the art may set corresponding matching rules according to actual requirements; this embodiment does not limit them.
S208, displaying the matching result.
In one specific usage scenario, when the first visual display is performed, the presented content may include first objects in one-to-one correspondence with the beat points in the beat data, each moving along the first preset path. When the user's input operation is acquired, the degree to which the input operation matches a beat point in the beat data is determined by the distance between the corresponding first object and a preset fixed position: the closer the distance, the higher the matching degree.
When the matching result is displayed, a matching prompt color may be shown above the first object or at another position; the matching degree may be divided into several ranges, with matching degrees in different ranges corresponding to different prompt colors. The displayed prompt color thus lets the user intuitively see how well the input operation matches the beat data.
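A non-limiting sketch of such distance-based matching follows, reusing MovingObject from the earlier sketch; the distance ranges and prompt colors are assumptions for illustration.

```typescript
// Score a user tap by how close the nearest first object is to a fixed target
// position, mapping distance ranges to prompt colors.
function matchTap(objects: MovingObject[], targetX: number): string {
  const nearest = Math.min(...objects.map(o => Math.abs(o.x - targetX)));
  if (nearest < 10) return "gold";   // near-perfect hit on the beat point
  if (nearest < 30) return "green";  // good match
  if (nearest < 60) return "silver"; // fair match
  return "gray";                     // miss (also returned when no objects exist)
}
```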
In addition, since the matching result can measure the user's sense of rhythm, beyond displaying it, the matching results of multiple users can be statistically analyzed, so that the users can be ranked by their sense of rhythm, the user with the best sense of rhythm can be selected, and so on.
In this embodiment, beat data corresponding to the human voice audio and note time data corresponding to the note audio are obtained from target audio containing both the human voice audio and the note audio, and during playback a first visual display corresponding to the rhythm of the beat data is performed according to the beat data while a second visual display corresponding to the time of the note time data is performed according to the note time data. The first visual display can thus be performed from the beat data corresponding to the human voice audio without any additional accompaniment audio, and the second visual display can be performed from the note time data; through the first visual display the user accurately perceives the rhythm of the human voice audio, and the second visual display reinforces the rhythm the user perceives, which improves the user experience.
The audio presentation method of the present embodiment may be performed by any suitable electronic device with data processing capabilities, including but not limited to: mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.
Example three
FIG. 4 is a schematic structural diagram of an audio display apparatus according to the third embodiment of the present application; as shown in FIG. 4, the apparatus includes: an obtaining module 402 and a rhythm display module 404.
An obtaining module 402, configured to obtain beat data corresponding to a human voice audio from a target audio including the human voice audio.
A rhythm display module 404, configured to perform, according to the beat data, a first visual display corresponding to a rhythm of the beat data in a playing process of the target audio.
Optionally, in any embodiment of the present application, the target audio further includes a note audio, and the obtaining module 402 is further configured to obtain note time data corresponding to the note audio from the target audio including the note audio; the device further comprises: and the musical note display module is used for performing second visual display corresponding to the time of the musical note time data according to the musical note time data in the playing process of the target audio.
Optionally, in any embodiment of the present application, the display area of the first visual display partially overlaps or completely overlaps the display area of the second visual display.
Optionally, in any embodiment of the present application, the rhythm display module 404 is specifically configured to move the first object corresponding to a beat point of the beat data along a first preset path.
Optionally, in any embodiment of the present application, the first object includes: vertical bar patterns, bubble patterns, or note patterns.
Optionally, in any embodiment of the present application, the note displaying module is specifically configured to move the second object along a second preset path if the target audio is played to the time corresponding to the note time data.
Optionally, in any embodiment of the present application, the first preset path and the second preset path partially overlap or completely overlap.
Optionally, in any embodiment of the present application, the apparatus further includes: the distance threshold value determining module is used for determining a note corresponding to the current playing time from the note audio and determining a current moving distance threshold value according to the note; correspondingly, the note displaying module is further configured to hide the second object if the moving distance of the second object moving along the second preset path matches the current moving distance threshold.
Optionally, in any embodiment of the present application, the apparatus further includes: and the color determining module is used for determining the musical notes corresponding to the current playing time from the musical note audio, and determining the color of the corresponding second object according to the musical notes so as to display the second object according to the color.
Optionally, in any embodiment of the present application, the rhythm display module 404 is further configured to: and displaying a third object at the initial position of the first preset path according to the beat data.
Optionally, in any embodiment of the present application, the apparatus further includes: the matching module is used for acquiring input operation of a user in the process of performing the first visual display and determining a matching result of the input operation and the beat data; and the matching display module is used for displaying the matching result.
According to the scheme provided by this embodiment, beat data corresponding to the human voice audio is acquired from target audio containing the human voice audio, and then, during the playback of the target audio, a first visual display corresponding to the rhythm of the beat data is performed according to the beat data. No additional accompaniment audio is needed: the first visual display can be performed from the beat data corresponding to the human voice audio alone, and through the first visual display the user can accurately perceive the rhythm of the human voice audio, namely the rhythm represented by the beat data, which improves the user experience. In particular, even when the target audio is an a cappella song, the scheme provided by this embodiment can still perform a first visual display corresponding to the rhythm of the song according to the beat data, improving the user experience.
Example four
FIG. 5 is a hardware structure diagram of an electronic device that executes the audio presentation method according to the present application. As shown in FIG. 5, the device comprises:
a processor 502, a communication interface 504, a memory 506, and a communication bus 508.
Wherein:
the processor 502, communication interface 504, and memory 506 communicate with one another via a communication bus 508.
A communication interface 504 for communicating with other terminal devices or servers.
The processor 502 is configured to execute the program 510, and may specifically perform the relevant steps in the above-described audio presentation method embodiment.
In particular, program 510 may include program code that includes computer operating instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present application; it may also be a field-programmable gate array (FPGA), a graphics processing unit (GPU), an embedded neural-network processing unit (NPU), or the like. The terminal device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
And a memory 506 for storing a program 510. The memory 506 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The electronic device can execute the methods provided by the embodiments of the present application and possesses the corresponding functional modules and beneficial effects of executing those methods. For technical details not described in detail in this embodiment, refer to the methods provided in the above embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: characterized by mobile communication capability, with the primary goal of providing voice and data communication. Such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as iPads.
(3) Portable entertainment devices: these can display and play multimedia content. They include audio and video players (e.g., iPods), handheld game consoles, e-book readers, smart toys, and portable in-car navigation devices.
(4) Cloud servers: similar in architecture to a general-purpose computer, but with higher requirements on processing capability, stability, reliability, security, scalability, manageability, and the like, because they must provide highly reliable services.
(5) And other electronic devices with data interaction functions.
Thus, particular embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized with hardware entity modules. For example, a programmable logic device (PLD), such as a field-programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of hand-making integrated circuit chips, this programming is nowadays mostly realized with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), of which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be clear to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained merely by slightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by that (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logic-programmed so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for realizing various functions may also be regarded as structures within the hardware component, or even as both software modules for implementing the method and structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory on a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
The embodiments in this specification are described in a progressive manner; identical or similar parts among the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described only briefly because it is substantially similar to the method embodiment; for relevant details, reference may be made to the corresponding parts of the description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (14)

1. An audio display method, comprising:
acquiring beat data corresponding to human voice audio from target audio containing the human voice audio;
and in the playing process of the target audio, performing a first visual display corresponding to the rhythm of the beat data according to the beat data.
2. The method of claim 1, wherein the target audio further comprises note audio, the method further comprising:
acquiring note time data corresponding to the note audio;
and in the playing process of the target audio, performing a second visual display corresponding to the time of the note time data according to the note time data.
3. The method of claim 2, wherein the display area of the first visual display partially overlaps or fully overlaps the display area of the second visual display.
4. The method of claim 2, wherein the performing a first visual display corresponding to the rhythm of the beat data according to the beat data comprises:
moving a first object corresponding to a beat point of the beat data along a first preset path.
5. The method of claim 4, wherein the first object comprises: vertical bar patterns, bubble patterns, or note patterns.
6. The method of claim 4, wherein the performing a second visual display corresponding to the time of the note time data according to the note time data comprises:
and if the target audio is played to the time corresponding to the note time data, moving a second object along a second preset path.
7. The method of claim 6, wherein the first preset path partially overlaps or fully overlaps the second preset path.
8. The method of claim 6, further comprising:
determining a note corresponding to the current playing time from the note audio, and determining a current moving distance threshold according to the note;
and if the moving distance of the second object moving along the second preset path is matched with the current moving distance threshold, hiding the second object.
9. The method of claim 6, wherein, before moving the second object along the second preset path when the target audio is played to the time corresponding to the note time data, the method further comprises:
and determining a note corresponding to the current playing time from the note audio, and determining the color of the corresponding second object according to the note, so that the second object is displayed in that color.
10. The method of claim 4, further comprising:
and displaying a third object at the initial position of the first preset path according to the beat data.
11. The method of claim 1, wherein, after the first visual display corresponding to the rhythm of the beat data is performed according to the beat data, the method further comprises:
acquiring an input operation of a user in the process of performing the first visual display, and determining a matching result between the input operation and the beat data;
and displaying the matching result.
12. An audio display device, comprising:
the acquisition module is used for acquiring beat data corresponding to the human voice audio from target audio containing the human voice audio;
and the rhythm display module is used for performing, in the playing process of the target audio, a first visual display corresponding to the rhythm of the beat data according to the beat data.
13. An electronic device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the audio display method according to any one of claims 1-11.
14. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the audio display method according to any one of claims 1-11.
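
For orientation, the core of method claims 1, 4, 8 and 11 can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: it assumes the human voice audio is already available as a separate track, uses librosa's generic beat tracker as a stand-in for the unspecified beat-extraction step, and the path speed, distance threshold and tap tolerance below are invented values.

    import librosa

    def get_beat_data(vocal_path):
        # Claim 1: acquire beat data corresponding to the human voice audio.
        y, sr = librosa.load(vocal_path)
        _tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
        return librosa.frames_to_time(beat_frames, sr=sr)  # beat points in seconds

    class FirstObject:
        # Claims 4-5: a vertical bar, bubble or note pattern spawned at a
        # beat point and moved along a straight "first preset path".
        def __init__(self, spawn_time, speed=120.0):  # speed in px/s, illustrative
            self.spawn_time = spawn_time
            self.speed = speed

        def distance(self, now):
            # Distance travelled along the preset path since the beat point.
            return max(0.0, now - self.spawn_time) * self.speed

        def visible(self, now, distance_threshold=300.0):
            # Claim 8 (stated for the second object; applied here for brevity):
            # hide the object once its travel distance matches the current
            # threshold, which the patent derives from the note being played.
            return self.distance(now) < distance_threshold

    def match_input(tap_time, beat_times, tolerance=0.15):
        # Claim 11: match a user's input operation against the beat data and
        # return a result for display; the 150 ms window is an assumption.
        nearest = min(beat_times, key=lambda b: abs(b - tap_time))
        return "hit" if abs(nearest - tap_time) <= tolerance else "miss"

During playback, a render loop would spawn one FirstObject per beat point, query distance() and visible() each frame to animate and hide it, and pass each tap through match_input() to display the matching result.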
CN201911199903.6A 2019-11-29 2019-11-29 Audio display method and device, electronic equipment and computer storage medium Pending CN112883223A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911199903.6A CN112883223A (en) 2019-11-29 2019-11-29 Audio display method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911199903.6A CN112883223A (en) 2019-11-29 2019-11-29 Audio display method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN112883223A (en) 2021-06-01

Family

ID=76039593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911199903.6A Pending CN112883223A (en) 2019-11-29 2019-11-29 Audio display method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN112883223A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9230526B1 (en) * 2013-07-01 2016-01-05 Infinite Music, LLC Computer keyboard instrument and improved system for learning music
CN104751833A (en) * 2015-03-31 2015-07-01 苏州乐聚一堂电子科技有限公司 Intelligent music beat interaction equipment
CN109120983A (en) * 2018-09-28 2019-01-01 腾讯音乐娱乐科技(深圳)有限公司 A kind of audio-frequency processing method and device
CN109920449A (en) * 2019-03-18 2019-06-21 广州市百果园网络科技有限公司 Beat analysis method, audio-frequency processing method and device, equipment, medium
CN110415669A (en) * 2019-07-19 2019-11-05 北京字节跳动网络技术有限公司 A kind of implementation method of metronome, device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113885829A (en) * 2021-10-25 2022-01-04 北京字跳网络技术有限公司 Sound effect display method and terminal equipment
CN113885829B (en) * 2021-10-25 2023-10-31 北京字跳网络技术有限公司 Sound effect display method and terminal equipment
CN115220625A (en) * 2022-07-19 2022-10-21 广州酷狗计算机科技有限公司 Audio playing method and device, electronic equipment and computer readable storage medium

Similar Documents

Publication Title
US10262642B2 (en) Augmented reality music composition
EP2760014B1 (en) Interactive score curve for adjusting audio parameters of a user's recording.
JP2008517314A (en) Apparatus and method for visually generating a music list
US9076264B1 (en) Sound sequencing system and method
KR20170019242A (en) Method and apparatus for providing user interface in an electronic device
CN112883223A (en) Audio display method and device, electronic equipment and computer storage medium
US20110294577A1 (en) Method, apparatus, and recording medium for performance game
JP2012083563A (en) Voice synthesizer and program
WO2024078293A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN105051820B (en) Information processing equipment and information processing method
JP5932905B2 (en) Program and game system
JP2011078508A (en) Program and information storage medium
KR100827232B1 (en) Apparatus and method for managing music file
US9508329B2 (en) Method for producing audio file and terminal device
CN113343022A (en) Song teaching method, device, terminal and storage medium
JP4720974B2 (en) Audio generator and computer program therefor
CN113535289A (en) Method and device for page presentation, mobile terminal interaction and audio editing
Barbancho et al. Human–computer interaction and music
JP6149917B2 (en) Speech synthesis apparatus and speech synthesis method
JP2014089475A (en) Voice synthesizer and program
US9741326B2 (en) Electronic device supporting music playing function and method for controlling the electronic device
TW201042537A (en) Method for positioning playback of audio data and electronic system utilizing the same
US8912420B2 (en) Enhancing music
JP5597825B2 (en) Program, information storage medium, game system, and input instruction device
WO2024066790A1 (en) Audio processing method and apparatus, and electronic device

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination