CN112885369B - Audio processing method and audio processing device - Google Patents

Audio processing method and audio processing device

Info

Publication number
CN112885369B
CN112885369B (application CN202110106764.9A)
Authority
CN
China
Prior art keywords
audio data
sub
piece
time axis
graph
Prior art date
Legal status
Active
Application number
CN202110106764.9A
Other languages
Chinese (zh)
Other versions
CN112885369A (en)
Inventor
张文文
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202110106764.9A priority Critical patent/CN112885369B/en
Publication of CN112885369A publication Critical patent/CN112885369A/en
Application granted granted Critical
Publication of CN112885369B publication Critical patent/CN112885369B/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10: Transforming into visible information
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/03: characterised by the type of extracted parameters
    • G10L 25/48: specially adapted for particular use
    • G10L 25/51: specially adapted for particular use for comparison or discrimination
    • G10L 25/90: Pitch determination of speech signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an audio processing method, an audio processing device, an electronic device, and a readable storage medium, and belongs to the field of computers. The method includes: parsing first audio data to acquire at least one piece of sub-audio data; generating a graphic corresponding to the at least one piece of sub-audio data according to attribute information of the at least one piece of sub-audio data; arranging and displaying the graphic, on a playing time axis corresponding to the first audio data, according to the playing time period of the at least one piece of sub-audio data in the first audio data; receiving a first input of a user and, in response to the first input, editing the graphic corresponding to the at least one piece of sub-audio data; and outputting second audio data according to the edited graphic. By displaying audio data graphically, the scheme lets a user understand the audio data intuitively and clearly, edit the audio data by editing the corresponding graphic, and output the edited audio data, which greatly lowers the threshold of audio processing, adapts to a wide range of uses, and is easy to popularize.

Description

Audio processing method and audio processing device
Technical Field
The application belongs to the field of computers, and particularly relates to an audio processing method, an audio processing device, electronic equipment and a readable storage medium.
Background
With the rapid development of technology, the information contained in audio is increasingly complex, and users often need to process audio themselves to obtain the information they require. In the related art, audio is generally processed using professional audio software.
In the process of realizing the present application, the inventor found at least the following problems in the related art: professional audio software takes time to learn, makes it difficult to start editing audio immediately, has a high entry threshold, and therefore has a narrow application range.
Summary of the application
An object of the embodiments of the present application is to provide an audio processing method, an audio processing apparatus, an electronic device, and a readable storage medium, which can solve the problem in the related art that, because professional audio software is required, audio processing takes long to learn, has a high threshold, and has a narrow application range.
In order to solve the technical problems, the application is realized as follows:
In a first aspect, an embodiment of the present application provides an audio processing method, including:
analyzing the first audio data to obtain at least one piece of sub-audio data;
generating a graphic corresponding to the at least one piece of sub-audio data according to attribute information of the at least one piece of sub-audio data;
arranging and displaying the graphic, on a playing time axis corresponding to the first audio data, according to the playing time period of the at least one piece of sub-audio data in the first audio data;
receiving a first input of a user and, in response to the first input, editing the graphic corresponding to the at least one piece of sub-audio data;
and outputting second audio data according to the edited graphic.
In a second aspect, an embodiment of the present application provides an audio processing apparatus, including:
an acquisition module, configured to parse the first audio data and acquire at least one piece of sub-audio data;
a generating module, configured to generate a graphic corresponding to the at least one piece of sub-audio data according to attribute information of the at least one piece of sub-audio data;
a display module, configured to arrange and display the graphic, on a playing time axis corresponding to the first audio data, according to the playing time period of the at least one piece of sub-audio data in the first audio data;
an editing module, configured to receive a first input of a user and, in response to the first input, edit the graphic corresponding to the at least one piece of sub-audio data;
and an output module, configured to output second audio data according to the edited graphic.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions implementing the steps of the audio processing method according to the first aspect when executed by the processor.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the audio processing method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the audio processing method according to the first aspect.
The embodiments of the present application provide an audio processing method, an audio processing apparatus, an electronic device, and a readable storage medium. By displaying audio data graphically, the scheme lets a user understand the audio data intuitively and clearly, edit the audio data by editing the corresponding graphic, and then output the edited audio data, which greatly lowers the threshold of audio processing, adapts to a wide range of uses, and is easy to popularize.
Drawings
FIG. 1 is a flow chart of steps of an audio processing method according to an embodiment of the present application;
FIG. 2 is one of the schematic diagrams of a cube graphic provided by an embodiment of the present application;
FIG. 3 is a schematic illustration of a sector pattern provided in accordance with an embodiment of the present application;
FIG. 4 is a flowchart illustrating steps of an audio display method according to an embodiment of the present application;
FIG. 5 is a second schematic view of a cube graphic provided in accordance with an embodiment of the present application;
FIG. 6 is a second schematic view of a sector pattern provided in accordance with an embodiment of the present application;
FIG. 7 is a third schematic view of a cube graphic provided in accordance with an embodiment of the present application;
FIG. 8 is a third schematic view of a sector pattern provided in accordance with an embodiment of the present application;
FIG. 9 is a fourth schematic illustration of a sector pattern provided in accordance with an embodiment of the present application;
FIG. 10 is one of the schematic diagrams of a graphical presentation provided by an embodiment of the present application;
FIG. 11 is a second schematic diagram of a graphical presentation provided by an embodiment of the present application;
FIG. 12 is a third schematic diagram of a graphical presentation provided by an embodiment of the present application;
FIG. 13 is a fourth schematic diagram of a graphical presentation provided by an embodiment of the present application;
FIG. 14 is a fifth schematic diagram of a graphical presentation provided by an embodiment of the present application;
FIG. 15 is a sixth schematic diagram of a graphical presentation provided by an embodiment of the present application;
FIG. 16 is a seventh schematic diagram of a graphical presentation provided by an embodiment of the present application;
FIG. 17 is an eighth schematic diagram of a graphical presentation provided by an embodiment of the present application;
FIG. 18 is a ninth schematic diagram of a graphical presentation provided by an embodiment of the present application;
FIG. 19 is a block diagram of an audio processing apparatus according to an embodiment of the present application;
FIG. 20 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 21 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first", "second", and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application may be practiced otherwise than as specifically illustrated or described herein. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally means that the associated objects are in an "or" relationship.
When processing an audio file, professional audio editing software usually has to be introduced, and audio processing is performed by professionals who are skilled in that software, so the entry threshold is high and it is inconvenient to get started. Meanwhile, the audio editing software must run in a certain application environment, places high demands on that environment, and therefore has a narrow application range. The audio processing method, audio processing apparatus, electronic device, and readable storage medium provided by the application display an audio file visually on a visual interface and process the audio file through operations on that interface to obtain the processed audio file; the threshold is low, the operation is easy, the application range is wide, and the scheme is easy to popularize.
The audio processing method provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 shows a step flowchart of an audio processing method according to an embodiment of the present application, where the method includes:
step 101, analyzing the first audio data to obtain at least one piece of sub-audio data.
In the embodiment of the application, the first audio data is an audio file to be processed, that is, an audio file on which audio processing is to be performed. The first audio data is, in essence, data converted from sound, so it can be parsed according to sound characteristics. Sound characteristics include pitch, timbre, loudness, and the like. The first audio data may be parsed by different pitches and different timbres to obtain at least one piece of sub-audio data.
Specifically, a pitch waveform and a timbre waveform corresponding to the first audio data are extracted and compared with a preset pitch waveform and a preset timbre waveform, respectively, to obtain the overlapping portions between them. When an overlapping portion exceeds a preset threshold, the overlapping portion is determined to be one piece of acquired sub-audio data. The first audio data is traversed in this way to obtain at least one piece of sub-audio data.
The amount of information obtained by parsing the first audio data with different pitches and different timbres is huge, but not all of it is needed by the user. Therefore, when parsing the first audio data and acquiring sub-audio data, the content to be parsed and acquired can be set according to actual requirements, which ensures the utilization rate of the resulting sub-audio data and further makes audio processing convenient. Specifically, the preset pitch waveform and the preset timbre waveform can be selected with different accuracies according to actual requirements, or the preset threshold can be adjusted according to actual requirements.
When the first audio data is parsed, it is parsed sequentially according to its time information, that is, from the start time of the first audio data to its end time. For example, if the start time of one piece of first audio data is 0 hours 0 minutes 0 seconds and the end time is 3 minutes 20 seconds, the parsing proceeds sequentially from 0 hours 0 minutes 0 seconds to 3 minutes 20 seconds. Meanwhile, during parsing, the time information and volume information corresponding to each piece of sub-audio data are recorded. For a pitch waveform, the more times the waveform vibrates over the same period of time, the higher the pitch; for timbre waveforms, different waveform smoothness corresponds to different timbres. The parsing of the first audio data may also be implemented according to related technologies in the art, which the embodiments of the present application neither repeat nor specifically limit here.
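Since the patent defers the parsing itself to related art, the following is only a minimal illustrative sketch of the traversal described above, not the patent's actual implementation. It compares each frame of the first audio data against preset spectral templates (standing in for the preset pitch and timbre waveforms) and collects runs whose similarity exceeds the preset threshold as sub-audio segments. All names and the similarity measure are assumptions.

```python
import numpy as np

def parse_sub_audio(samples, sr, templates, threshold=0.8, frame_len=2048):
    """Traverse `samples` frame by frame; return [(label, start_s, end_s), ...].

    `templates` maps a label to a unit-norm magnitude spectrum of length
    frame_len // 2 + 1 (a stand-in for the preset pitch/timbre waveforms).
    """
    segments, current = [], None
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        spectrum /= np.linalg.norm(spectrum) + 1e-9
        # Pick the preset template that overlaps this frame the most.
        label, score = max(((name, float(spectrum @ tpl))
                            for name, tpl in templates.items()),
                           key=lambda kv: kv[1])
        t0, t1 = start / sr, (start + frame_len) / sr
        if score >= threshold:
            if current and current[0] == label:
                current = (current[0], current[1], t1)   # extend the current run
            else:
                if current:
                    segments.append(current)
                current = (label, t0, t1)                # open a new run
        elif current:
            segments.append(current)
            current = None
    if current:
        segments.append(current)
    return segments
```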
Step 102, generating a graphic corresponding to the at least one piece of sub-audio data according to the attribute information of the at least one piece of sub-audio data.
In the embodiment of the application, in order to improve the convenience of audio processing, after at least one piece of sub-audio data is acquired, the at least one piece of sub-audio data is displayed in a graphical mode. The graphic mode is a mode of generating an image according to at least one piece of sub-audio data for displaying, and can intuitively display other information corresponding to the at least one piece of sub-audio data, so that a user can be helped to quickly realize audio processing.
Generally, each piece of sub-audio data has its own attribute information, which may include profile information, playing duration, and the like. The profile information is used to distinguish each piece of sub-audio data, and the playing duration describes how long one piece of sub-audio data plays. When at least one piece of sub-audio data is displayed graphically, a graphic corresponding to the sub-audio data can be generated according to the attribute information of the at least one piece of sub-audio data. After the graphic is generated, it is arranged and displayed, on the playing time axis corresponding to the first audio data, according to the playing time period of the at least one piece of sub-audio data in the first audio data, thereby achieving a unified graphical presentation of multiple pieces of sub-audio data.
First, a graphic corresponding to the at least one piece of sub-audio data is generated according to its attribute information. That is, the graphic corresponding to a piece of sub-audio data can represent its attribute information; once the graphic corresponding to one piece of sub-audio data is obtained, the attribute information of that sub-audio data can be read from the graphic.
Step 103, arranging and displaying the graphic, on the playing time axis corresponding to the first audio data, according to the playing time period of the at least one piece of sub-audio data in the first audio data.
In the embodiment of the application, the first audio data corresponds to a playing time axis, which represents the time information of the first audio data. On the playing time axis, the user can intuitively obtain the time information of the first audio data, including but not limited to its start time and end time. The generated graphics are arranged and displayed, on the playing time axis corresponding to the first audio data, according to their playing time periods in the first audio data. The user can then obtain the time information of each piece of sub-audio data through the playing time axis, including but not limited to the start time and end time of each piece of sub-audio data.
It should be noted that the playing time axis is determined according to the time information of the first audio data and represents the period from the start time to the end time of the first audio data; the maximum time displayed on the playing time axis is not less than the end time of the first audio data, which ensures that the time information of the first audio data can be displayed completely on the playing time axis.
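Continuing the sketch, steps 102 and 103 can be pictured as turning each parsed segment into a graphic record whose fields mirror the attribute information, sorted by start time so the records can be laid out along the playing time axis. The field names are illustrative assumptions, not the patent's.

```python
from dataclasses import dataclass

@dataclass
class SubAudioGraphic:
    profile: str     # profile information shown on the graphic
    start_s: float   # start time on the playing time axis, in seconds
    end_s: float     # end time on the playing time axis, in seconds

    @property
    def duration_s(self) -> float:
        # Playing duration; rendered as the length of the first side.
        return self.end_s - self.start_s

def layout_on_time_axis(segments):
    """segments: [(label, start_s, end_s)], e.g. from the parsing sketch."""
    graphics = [SubAudioGraphic(label, s, e) for label, s, e in segments]
    return sorted(graphics, key=lambda g: g.start_s)
```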
Optionally, there is at least one type of playing time axis, and each playing time axis corresponds to at least one type of three-dimensional graphic. Therefore, a three-dimensional graphic corresponding to the at least one piece of sub-audio data and the playing time axis can be generated according to the attribute information of the at least one piece of sub-audio data, with different dimensions of the three-dimensional graphic corresponding to different attribute information. The three-dimensional graphic can be superimposed on the playing time axis for display according to the playing time period of the at least one piece of sub-audio data in the first audio data.
Step 104, receiving a first input of a user and, in response to the first input, editing the graphic corresponding to the at least one piece of sub-audio data.
In the embodiment of the application, the first input may be a click, long press, slide, voice, or gesture input by the user on the graphic corresponding to the sub-audio data. Specifically, the first input acts on the graphic corresponding to the at least one piece of sub-audio data, and editing that graphic in response to the first input includes, but is not limited to: deleting the graphic corresponding to the at least one piece of sub-audio data, moving it, modifying it, and adding a graphic corresponding to a piece of sub-audio data. When the graphic corresponding to at least one piece of sub-audio data is edited, the corresponding sub-audio data is edited as well; that is, editing the graphic in effect edits the sub-audio data. These are only examples; the specific editing mode can be determined according to actual requirements and is not limited here.
It should be noted that the first input may be implemented by a cursor, by touch, or by a physical button. The implementation of the first input depends on the application scenario: for example, when the scenario supports a touch screen, the first input from the user can be realized by touch; when the scenario supports an external device, it can be realized with a cursor.
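As a hedged sketch of the editing in step 104, reusing the SubAudioGraphic records from the previous sketch: deleting or moving a graphic is in effect deleting or moving the corresponding sub-audio data; modify and add follow the same pattern.

```python
def delete_graphic(graphics, profile):
    """Deleting a graphic removes the corresponding sub-audio data."""
    return [g for g in graphics if g.profile != profile]

def move_graphic(graphics, profile, new_start_s):
    """Moving a graphic shifts the sub-audio data's playing time period."""
    for g in graphics:
        if g.profile == profile:
            dur = g.duration_s
            g.start_s, g.end_s = new_start_s, new_start_s + dur
    return graphics
```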
Step 105, outputting the second audio data according to the edited graphic.
In the embodiment of the application, the user can perform the related operations multiple times as first inputs; the number of received first inputs is not limited, until the user determines that the adjustment is finished, and the second audio data is then output. That is, the second audio data is determined from the edited graphics. It should be noted that the second audio data differs from the first audio data: the second audio data is the audio file obtained after editing according to the first inputs, and the information contained in the two may or may not overlap; the amount of information contained in the second audio data may be greater or less than that contained in the first audio data.
The output second audio data is stored in the storage space corresponding to a specified directory, which may be specified by the system or by the user. Meanwhile, the stored entry can be named according to the user's requirements.
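A sketch of step 105 under the same assumptions: the edited graphics are rendered back into one waveform by placing each segment's samples at its (possibly moved) start time, and the result is written to the specified directory under a user-chosen name. `segment_samples` is an assumed lookup from profile to samples, and the .npy file stands in for a real audio encoder.

```python
import os
import numpy as np

def render_second_audio(graphics, segment_samples, sr, out_dir, name):
    """Mix the (possibly edited) segments into one waveform and store it."""
    total_s = max(g.end_s for g in graphics)
    mix = np.zeros(int(total_s * sr) + 1, dtype=np.float32)
    for g in graphics:
        offset = int(g.start_s * sr)
        clip = segment_samples[g.profile][: len(mix) - offset]
        mix[offset:offset + len(clip)] += clip   # overlapping segments are summed
    os.makedirs(out_dir, exist_ok=True)          # system- or user-specified directory
    path = os.path.join(out_dir, f"{name}.npy")  # placeholder for a real encoder
    np.save(path, mix)
    return path
```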
In summary, the audio processing method provided by the embodiment of the application displays audio data graphically and indicates it intuitively and clearly, realizes editing of the audio data by modifying the corresponding graphic, and outputs the edited audio data, which greatly lowers the application threshold of audio processing, adapts to a wide range of uses, and is easy to popularize.
Optionally, when each playing time axis corresponds to at least one type of three-dimensional graphic, a three-dimensional graphic corresponding to the at least one piece of sub-audio data and the playing time axis is generated according to the attribute information of the at least one piece of sub-audio data, with different dimensions of the three-dimensional graphic corresponding to different attribute information. The three-dimensional graphic is superimposed on the playing time axis for display according to the playing time period of the at least one piece of sub-audio data in the first audio data.
It should be noted that, in the embodiment of the present application, there is at least one type of playing time axis, and each playing time axis corresponds to at least one type of three-dimensional graphic. The playing time axis may start from the start time of the first audio data or from the zero point, and advance along any axis direction. The playing time axis may be linear or circumferential. The attribute information includes the profile information and playing duration of the sub-audio data; the profile information describes one piece of sub-audio data so that the user can distinguish it from other sub-audio data, and the playing duration describes the portion that one piece of sub-audio data occupies within the playing time period of the first audio data.
Optionally, the playing time axis includes at least one of a linear time axis and a circumferential time axis. In the case that the playing time axis is a linear time axis, the three-dimensional graphic is a cube graphic; in the case that the playing time axis is a circumferential time axis, the three-dimensional graphic is a sector graphic.
When the playing time axis is a linear time axis and the three-dimensional graphic is a cube graphic, the first side of the cube graphic displays the profile information of the sub-audio data, and the length of the first side represents the playing duration of the at least one piece of sub-audio data. Among the cube graphics displayed on the linear time axis, the extending direction of the first side of each cube graphic is consistent with the extending direction of the linear time axis, and the orientations of the first sides of the cube graphics are the same.
For example, referring to fig. 2, one of the schematic diagrams of a cube graphic is shown. Starting from the zero point, the sub-audio data obtained by parsing the first audio data are dog-call sub-audio data, bird-call sub-audio data, and piano sub-audio data, and the three pieces of sub-audio data are displayed in a superimposed manner on the playing time axis. The playing duration of the dog-call sub-audio data is (A-0), that of the bird-call sub-audio data is (B-A), and that of the piano sub-audio data is (C-B).
When the playing time axis is a circumferential time axis and the three-dimensional graphic is a sector graphic, the arcuate surface of the sector graphic displays the profile information of the at least one piece of sub-audio data, and the angle subtended by the arcuate surface represents its playing duration. Among the sector graphics displayed on the circumferential time axis, the circle center corresponding to the arc edge of each sector graphic coincides with the circle center of the circumferential time axis. For example, referring to fig. 3, one of the schematic diagrams of a sector graphic is shown, where the circumferential time axis starts at the hollow circle and ends at the solid circle, the hollow circle representing the zero point. The sub-audio data obtained by parsing the first audio data are dog-call sub-audio data, bird-call sub-audio data, and piano sub-audio data; the playing duration of the dog-call sub-audio data is ∠EOF, that of the bird-call sub-audio data is ∠FOH, and that of the piano sub-audio data is ∠GOI.
It should be noted that, on the circumferential time axis, the playing duration may be represented by the angle subtended by the arcuate surface, or determined by the arc opposite that angle. Taking fig. 3 as an example, the circumferential time axis extends from the hollow circle to the solid circle and is marked with different times, so the playing duration ∠EOF can be represented by (F-E), the playing duration ∠FOH by (H-F), and the playing duration ∠GOI by (I-G). When the duration is determined by the arc corresponding to the arcuate surface, the circumferential time axis is divided along the circumference corresponding to the radius of its largest sector.
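The angle mapping on the circumferential time axis can be sketched directly: a segment's playing duration maps to the angle subtended by its arcuate surface, so that the full playing time of the first audio data spans 360 degrees. This reuses the graphic records assumed in the earlier sketches.

```python
def sector_angles(graphics, total_duration_s):
    """Map each graphic to (start_deg, sweep_deg) on the circumferential axis."""
    return {g.profile: (360.0 * g.start_s / total_duration_s,
                        360.0 * g.duration_s / total_duration_s)
            for g in graphics}
```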
Optionally, referring to fig. 4, a flowchart of steps of an audio presentation method provided by an embodiment of the present application is shown, where the method includes:
step 201, an audio type corresponding to at least one piece of sub-audio data is obtained.
Step 202, on a playing time axis corresponding to the first audio data, the graphics of the sub audio data of the same audio type are placed on the same layer for displaying.
In the embodiment of the application, the at least one piece of sub-audio data can be organized by audio type, which facilitates its graphical presentation; the audio type corresponding to the at least one piece of sub-audio data can therefore be acquired after the sub-audio data is obtained. On the playing time axis corresponding to the first audio data, the graphics of sub-audio data of the same audio type are displayed on the same layer, so that the sub-audio data can be shown intuitively and clearly.
Further, the audio type corresponding to each piece of sub-audio data is preset. For example, if the sub-audio data are bird-call sub-audio data and dog-call sub-audio data, the audio type corresponding to both is animal sounds. Taking fig. 2 as an example, when the audio types of the bird-call sub-audio data and the dog-call sub-audio data shown in fig. 2 are the same, fig. 2 can also be displayed graphically in the manner shown in fig. 5. As shown in fig. 5, perpendicular to the playing time axis is an audio type axis, which distinguishes sub-audio data belonging to different audio types; the dog-call sub-audio data and the bird-call sub-audio data are on the same layer, and the piano sub-audio data is on another layer. After the sub-audio data are classified by audio type and displayed in superposition, information such as the playing duration of each piece of sub-audio data is unaffected.
Taking fig. 3 as an example, when the audio types of the bird-call sub-audio data and the dog-call sub-audio data shown in fig. 3 are the same, fig. 3 may also be displayed graphically in the manner of fig. 6. As shown in fig. 6, the audio type axis is perpendicular to the plane of the circumferential time axis and distinguishes sub-audio data belonging to different audio types; the dog-call sub-audio data and the bird-call sub-audio data are on the same layer, and the piano sub-audio data is on another layer. After the sub-audio data are classified by audio type and displayed in superposition, information such as the playing duration of each piece of sub-audio data is unaffected.
It should be noted that the audio types of the bird-call and dog-call sub-audio data are only exemplary; in practical applications, audio types may be named according to actual requirements and marked with numbers, letters, or characters, as long as different audio types can be distinguished. Meanwhile, different layers may be displayed as shown in figs. 5 and 6, displayed after the audio type axis is touched, or displayed in other ways, such as text information. The embodiments of the present application are not specifically limited here.
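A sketch of steps 201 and 202, combined with the overlap rule discussed further below: graphics of the same audio type share a layer unless their playing time periods overlap, in which case the later one is stacked onto the next layer. `audio_type_of` stands in for the preset profile-to-audio-type mapping.

```python
def assign_layers(graphics, audio_type_of):
    """Return {(audio_type, layer_index): [graphics]} for layered display."""
    layers = {}
    for g in sorted(graphics, key=lambda g: g.start_s):
        a_type, idx = audio_type_of[g.profile], 0
        while any(g.start_s < o.end_s and o.start_s < g.end_s
                  for o in layers.get((a_type, idx), [])):
            idx += 1              # overlapping same-type graphics are stacked
        layers.setdefault((a_type, idx), []).append(g)
    return layers
```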
Optionally, in an embodiment of the present application, the attribute information may further include at least one of the pitch, volume, and timbre of the sub-audio data. When the attribute information includes timbre, it reflects that the acquired sub-audio data is determined by timbre.
Referring to fig. 7, in the case that the three-dimensional graphic is a cube graphic, the length of the second side N of the first side M of the cube graphic represents the pitch of the at least one piece of sub-audio data, the shading in the first side M represents its timbre, and the length of the third side V, perpendicular to the first side M, represents its volume.
Referring to fig. 8, in the case that the three-dimensional graphic is a sector graphic, the height N' of the arcuate surface M' of the sector graphic represents the pitch of the at least one piece of sub-audio data, the shading in the arcuate surface M' represents its timbre, and the extension distance V' from the sector to the circle center represents its volume.
When there are multiple pieces of sub-audio data, their volumes differ, so the perpendicular lengths from the circle center to the arcuate surfaces differ, and the user can clearly distinguish the volumes corresponding to different sub-audio data. For example, fig. 9 shows two pieces of sub-audio data: the shadings of their arcuate surfaces differ, the lengths from the center to the arcuate surfaces differ, and the heights of the arcuate surfaces differ; that is, the two pieces of sub-audio data differ in timbre, volume, and pitch.
When the graphical display uses a circumferential time axis and sector graphics, the at least one piece of sub-audio data acquired from the first audio data forms a cylindrical graphic, which is essentially the stacked sector graphics corresponding to the sub-audio data. Meanwhile, in practical applications, timbre can be identified not only by shading but also by color.
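The dimension mapping of figs. 7 and 8 can be sketched as follows, with illustrative scale factors: the playing duration sets the first side M, pitch sets the second side N, volume sets the third side V, and timbre selects a shading (or, as noted above, a color). The shading table and units are assumptions.

```python
TIMBRE_SHADING = {"piano": "///", "guitar": "...", "unknown": "xxx"}  # assumed table

def cube_dimensions(duration_s, pitch_hz, volume_db, timbre,
                    px_per_s=40.0, px_per_hz=0.1, px_per_db=2.0):
    """Map attribute information onto the dimensions of one cube graphic."""
    return {
        "first_side_px": duration_s * px_per_s,        # playing duration (side M)
        "second_side_px": pitch_hz * px_per_hz,        # pitch (side N)
        "third_side_px": volume_db * px_per_db,        # volume (side V)
        "shading": TIMBRE_SHADING.get(timbre, "xxx"),  # timbre (ground tint of M)
    }
```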
Illustratively, the first audio data is parsed to obtain bird-call sub-audio data, dog-call sub-audio data, lyrics sub-audio data, guitar sub-audio data, harp sub-audio data, suona sub-audio data, piano sub-audio data, and unknown sub-audio data, and fig. 10 is obtained according to the above steps, showing one graphical presentation. As can be seen, the guitar sub-audio data, the harp sub-audio data, and the suona sub-audio data are of the same audio type.
It should be noted that, when the audio type corresponding to sub-audio data is preset, the playing duration corresponding to the sub-audio data may also be taken into account so that the playing durations of sub-audio data of the same audio type do not overlap. The playing duration is not merely a length of time; it also carries the start time and end time of one piece of sub-audio data, so that the graphic corresponding to one piece of sub-audio data can be aligned accurately with the playing time axis. Alternatively, if the playing times of two pieces of sub-audio data of the same audio type overlap, the two cannot be placed on the same layer and are stacked instead. Optionally, when the pieces of sub-audio data are presented graphically on the playing time axis, a time progress scale is displayed on the time axis.
Further, fig. 11 is a plan view of fig. 10. As shown in fig. 11, the position indicated by the black arrow is the start of the circumferential time axis as well as its end, and time increases in the direction of the arrow. The circumferential time axis represents the playing time period of the first audio data, and for each piece of sub-audio data its start time and end time in the first audio data can be found on the circumferential time axis, giving its duration within the first audio data. As shown in fig. 11, the dotted line shows one turn of the circumferential time axis; the time axis is finite, with the maximum time T2 of the audio file at the tail end of the axis and the minimum time T1 at the head end, that is, the hollow circle on the dotted line is the head end of the time axis and the solid circle is its tail end. The head end joins the tail end, so the display occupies little space while showing rich content, improving the utilization of the interface display.
With continued reference to fig. 11, V1 to V2 characterize the volume of each piece of sub-audio data: the hollow circle on the line segment is the position of the minimum volume of the audio file, and the solid circle is the position of the maximum volume. That is, volume is represented by the distance from the circle center; the volume corresponding to each piece of sub-audio data falls within the range V1 to V2, where V2 is determined by the maximum volume occurring in the audio file and V1 is taken as the zero point in practice. In practical applications, the head end of the time axis is displayed first, and the user can rotate the cylinder around its central axis so that the different content corresponding to different positions of the circumferential time axis is displayed around the time axis.
It should be noted that the first audio data includes at least one piece of sub-audio data, and one piece of sub-audio data is one sector graphic, so the first audio data is displayed graphically as a cylindrical graphic. However, the segments of each piece of sub-audio data may differ: the volume, timbre, or pitch of each segment can vary, so the surface of the cylindrical graphic actually obtained is uneven. The shape may be as shown in fig. 9, which the embodiments of the present application do not repeat here.
Alternatively, the content shown in fig. 10 may also be displayed with a linear time axis and cube graphics, as shown schematically in fig. 12. As can be seen, each piece of sub-audio data has a different volume and different relief along the volume axis, and at the same time the graphics are superimposed by audio type, clearly showing the sub-audio data included in the first audio data.
In the embodiment of the application, the profile information is used not only to distinguish each piece of sub-audio data but also to display the attribute information of one piece of sub-audio data; that is, when the user touches one piece of sub-audio data, information such as its volume, pitch, and playing duration is displayed. The user can intuitively learn the various information contained in the sub-audio data, enriching the user's knowledge of music theory.
The user may perform a triggering operation on a graphic by cursor or by touch, and the triggering operation may perform functions including deleting, moving, adjusting volume, and the like. The triggering operation performed by the user is the first input; it acts directly on the corresponding graphic and performs functions such as deleting, moving, and adjusting volume on that graphic. The first input acts on the graphic of one piece of sub-audio data.
Taking the graphic shown in fig. 12 as an example, when the user performs a first input that deletes "unknown", the cube graphic of the "unknown" sub-audio data is deleted, as shown in fig. 13. On the basis of fig. 13, when the user performs a first input that moves "dog call" to a target position, the "dog call" cube graphic is moved to the target position as the user intends, as shown in fig. 14. When the user performs a first input that adjusts the volume of "urheen" to a target volume, the "urheen" cube graphic is lengthened or shortened to the target volume along the direction of the volume axis arrow, as shown in fig. 15. When multiple first inputs are given, they may be applied sequentially in the order of input. Similarly, first inputs may also be performed on fig. 10, which the embodiments of the present application do not repeat here.
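The figs. 13 to 15 sequence can be sketched as an ordered list of first inputs applied one after another, reusing the edit helpers from the earlier sketch; the target position and the volume handling are assumptions.

```python
def apply_edits(graphics, edits):
    """edits: ordered [(op, profile, arg)], applied in the order received."""
    for op, profile, arg in edits:
        if op == "delete":
            graphics = delete_graphic(graphics, profile)
        elif op == "move":
            graphics = move_graphic(graphics, profile, arg)
        # a "volume" op would rescale the third-side length analogously
    return graphics

# e.g. the sequence behind figs. 13 and 14: delete "unknown", move "dog call".
# graphics = apply_edits(graphics, [("delete", "unknown", None),
#                                   ("move", "dog call", 12.0)])
```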
Optionally, before step 104, the method further comprises: receiving a fifth input of third audio data by the user and, in response to the fifth input, generating a graphic corresponding to the third audio data, the third audio data being used for insertion into the first audio data or for replacing part or all of a piece of sub-audio data.
In an embodiment of the application, the first input includes introducing new sub-audio data, beyond the current sub-audio data, into the first audio data; therefore, before such a first input is performed, the incoming sub-audio data needs to be processed to meet the user's needs. The third audio data is audio data used for insertion into the first audio data or for replacing part or all of a piece of sub-audio data. Accordingly, a fifth input of the third audio data by the user is received, and in response to the fifth input, a graphic corresponding to the third audio data is generated, of the same type as the graphics of the first audio data. Illustratively, on the basis of fig. 15, new sub-audio data whose profile information is "my monologue" is added at the target position, and the schematic diagram after the addition is shown in fig. 16.
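A sketch of the fifth input, assuming the third audio data has already been parsed with the same pipeline: a graphic of the same type is created for it and placed at the target position.

```python
def insert_third_audio(graphics, profile, duration_s, at_s):
    """Insert a graphic for parsed third audio data at the target position."""
    graphics.append(SubAudioGraphic(profile, at_s, at_s + duration_s))
    return sorted(graphics, key=lambda g: g.start_s)

# e.g. the "my monologue" insertion of fig. 16, with assumed duration/position:
# graphics = insert_third_audio(graphics, "my monologue", 5.0, 20.0)
```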
Optionally, after graphically presenting the at least one piece of sub-audio data, the method further comprises: receiving a second input of the user on the graphic of any piece of sub-audio data, and playing that sub-audio data in response to the second input.
In the embodiment of the application, after the at least one piece of sub-audio data included in the first audio data is displayed graphically, a second input by the user on the graphic of any piece of sub-audio data can be received, and the sub-audio data is played in response to the second input, which helps the user learn the at least one piece of sub-audio data included in the first audio data and improves the user's familiarity with the first audio data.
It should be noted that, in response to the second input, the graphical presentation shown in fig. 16 may further include a time progress scale when the sub-audio data is played. As shown in fig. 17, the time progress scale indicates point K, that is, the sub-audio data is played from point K.
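A sketch of the second input: tapping a graphic plays its sub-audio data, optionally from the point K indicated by the time progress scale. `play` stands in for whatever audio backend is available.

```python
def on_second_input(graphic, segment_samples, sr, from_s=None, play=print):
    """Play one piece of sub-audio data, optionally from point K (`from_s`)."""
    clip = segment_samples[graphic.profile]
    if from_s is not None:             # start from the progress-scale point
        clip = clip[int((from_s - graphic.start_s) * sr):]
    play(clip)                         # hand off to the audio backend
```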
Optionally, after graphically presenting the at least one piece of sub-audio data, the method further comprises: receiving a third input of the user and, in response to the third input, marking the sound positions corresponding to the search keyword in the graphic, wherein the third input includes the search keyword.
When facing a category containing language information, such as lyrics, embodiments of the present application may also locate the target object by language retrieval. The retrieved portions are presented in a manner that distinguishes them from other content, for example by shading. Taking fig. 18 as an example, the shaded portion on "lyrics" may present the result of the retrieval. At the same time, the timeline bar also displays this shading and points to it automatically, making it convenient for the user to replace or delete that portion. By marking the sound positions corresponding to the search keyword in the graphic, precise information indication is provided, helping the user edit more conveniently.
The third input is performed on the premise that the first audio data contains language information. Meanwhile, the language retrieval can be implemented with reference to related technologies in the art, which the embodiments of the present application do not repeat here.
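Since the patent defers the retrieval itself to related art, the marking can be sketched over an assumed timestamped transcript of the lyrics sub-audio data: every span containing the keyword becomes a shaded region on the graphic and the timeline bar.

```python
def mark_keyword(transcript, keyword):
    """transcript: [(word, start_s, end_s)]; return the time spans to shade."""
    return [(s, e) for word, s, e in transcript if keyword in word]

# e.g. shade every occurrence of "rain" on the lyrics layer:
spans = mark_keyword([("rain", 10.2, 10.6), ("sun", 11.0, 11.3)], "rain")
```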
In summary, the audio processing method provided by the embodiment of the application displays audio data graphically and indicates it intuitively and clearly, realizes editing of the audio data by modifying the corresponding graphic, greatly lowers the application threshold of audio processing, adapts to a wide range of uses, and is easy to popularize. Meanwhile, the graphical presentation provided can be personalized according to requirements, enriching the displayed content.
It should be noted that the execution body of the audio processing method provided in the embodiment of the present application may be an audio processing apparatus, or a control module in the audio processing apparatus for executing the audio processing method. In the embodiment of the present application, the audio processing method provided herein is described taking an audio processing apparatus executing the method as an example.
Referring to fig. 19, which shows a block diagram of an audio processing apparatus 300 according to an embodiment of the present application, the audio processing apparatus 300 includes:
the obtaining module 301 is configured to parse the first audio data and obtain at least one sub-audio data.
The generating module 302 is configured to generate a graphic corresponding to at least one sub-audio data according to the attribute information of the at least one sub-audio data.
And the display module 303 is configured to arrange and display the graphics, on the playing time axis corresponding to the first audio data, according to the playing time period of the at least one piece of sub-audio data in the first audio data.
And the editing module 304 is configured to receive a first input from a user, and edit a graphic corresponding to the at least one sub-audio data in response to the first input.
And an output module 305, configured to output the second audio data according to the edited graphics.
In the audio processing apparatus provided by the embodiment of the application, the acquisition module parses the first audio data to acquire at least one piece of sub-audio data; the generating module generates a graphic corresponding to the at least one piece of sub-audio data according to its attribute information; the display module arranges and displays the graphic, on the playing time axis corresponding to the first audio data, according to the playing time period of the at least one piece of sub-audio data; the editing module receives a first input of a user and edits the graphic corresponding to the at least one piece of sub-audio data in response to the first input; and the output module outputs the second audio data according to the edited graphic.
Optionally, the generating module 302 is further configured to:
generate a three-dimensional graphic corresponding to the at least one piece of sub-audio data and the playing time axis according to the attribute information of the at least one piece of sub-audio data, wherein different dimensions of the three-dimensional graphic correspond to different attribute information;
the display module 303 is further configured to:
superimpose the three-dimensional graphic on the playing time axis for display according to the playing time period of the at least one piece of sub-audio data in the first audio data.
Optionally, the playing time axis includes a linear time axis and a circumferential time axis, and each playing time axis corresponds to at least one three-dimensional graphic; the attribute information includes: the profile information and playing duration of the sub-audio data;
in the case that the playing time axis is a linear time axis, the three-dimensional graphic is a cube graphic, the first side of the cube graphic displays the profile information of the at least one piece of sub-audio data, and the length of the first side represents its playing duration;
wherein, among the cube graphics displayed on the linear time axis, the extending direction of the first side of each cube graphic is consistent with the extending direction of the linear time axis, and the orientations of the first sides of the cube graphics are the same;
in the case that the playing time axis is a circumferential time axis, the three-dimensional graphic is a sector graphic, the arcuate surface of the sector graphic displays the profile information of the at least one piece of sub-audio data, and the angle subtended by the arcuate surface represents its playing duration;
among the sector graphics displayed on the circumferential time axis, the circle center corresponding to the arc edge of each sector graphic coincides with the circle center of the circumferential time axis.
Optionally, the apparatus 300 further comprises:
an acquisition module, configured to acquire the audio type corresponding to the at least one piece of sub-audio data.
The display module 303 is further configured to:
display, on the playing time axis corresponding to the first audio data, the graphics of sub-audio data of the same audio type on the same layer.
Optionally, the apparatus 300 further comprises:
and a marking module, configured to receive a third input of the user and, in response to the third input, mark the sound position corresponding to the search keyword in the graphic, wherein the third input includes the search keyword.
According to the audio processing device provided by the embodiment of the application, the audio data is graphically displayed, so that a user can intuitively and clearly know the audio data, edit the audio data by editing the graph corresponding to the audio data and output the edited audio data, the threshold of audio processing is greatly reduced, the adaptability is wide, and the popularization is convenient. Meanwhile, the provided graphic mode can be displayed in a personalized way according to requirements, and display contents are enriched.
The audio processing device in the embodiment of the application can be a device, and also can be a component, an integrated circuit or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), etc., and the non-mobile electronic device may be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (personal computer, PC), a Television (TV), a teller machine, a self-service machine, etc., and the embodiments of the present application are not limited in particular.
The audio processing device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The audio processing device provided in the embodiment of the present application can implement each process implemented by the audio processing device in the method embodiment of fig. 1 to 18, and in order to avoid repetition, a description is omitted here.
Optionally, as shown in fig. 20, the embodiment of the present application further provides an electronic device 400, including a processor 401, a memory 402, and a program or an instruction stored in the memory 402 and capable of running on the processor 401, where the program or the instruction is executed by the processor 401 to implement each process of the above-mentioned embodiment of the audio processing method, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
It should be noted that, the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 21 is a schematic hardware structure of an electronic device 500 implementing an embodiment of the present application.
The electronic device 500 includes, but is not limited to: radio frequency unit 501, network module 502, audio output unit 503, input unit 504, sensor 505, display unit 506, user input unit 507, interface unit 508, memory 509, and processor 510.
Those skilled in the art will appreciate that the electronic device 500 may further include a power source (e.g., a battery) for powering the various components; the power source may be logically coupled to the processor 510 via a power management system, which then performs functions such as managing charging, discharging, and power consumption. The electronic device structure shown in fig. 21 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine some components, or arrange components differently, which will not be described in detail here.
The processor 510 is configured to parse the first audio data, and obtain at least one sub-audio data.
And a display unit 506, configured to generate a graphic corresponding to at least one sub-audio data according to the attribute information of the at least one sub-audio data.
The display unit 506 is further configured to arrange and display the graphics, on the playing time axis corresponding to the first audio data, according to the playing time period of the at least one piece of sub-audio data in the first audio data.
And a user input unit 507 for receiving a first input of a user, and editing a graphic corresponding to at least one sub-audio data in response to the first input.
And the audio output unit 503 is configured to output the second audio data according to the edited graphic.
According to the embodiment of the application, the audio data is graphically displayed, so that a user can intuitively and clearly know the audio data, edit the audio data by editing the graph corresponding to the audio data, and output the edited audio data, the threshold of audio processing is greatly reduced, the adaptability is wide, and the popularization is convenient.
Optionally, the display unit 506 is further configured to generate a three-dimensional graphic corresponding to the at least one piece of sub-audio data and the playing time axis, where different dimensions of the three-dimensional graphic correspond to different attribute information.
Optionally, the display unit 506 is further configured to superimpose the three-dimensional graphic on the playing time axis according to the playing time period of the at least one sub-audio data in the first audio data for display.
Optionally, the processor 510 is further configured to acquire an audio type corresponding to the at least one piece of sub-audio data.
Optionally, the display unit 506 is further configured to display, on the play time axis corresponding to the first audio data, the graphs of sub-audio data of the same audio type on the same layer.
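A minimal sketch of this layering follows, assuming each graph record carries the audio type of its sub-audio data; the dictionary keys and type names are illustrative.

from collections import defaultdict

def layer_by_type(graphs: list[dict]) -> dict[str, list[dict]]:
    # Graphs of sub-audio data sharing an audio type go on one layer, so the
    # layers can be stacked on the shared play time axis.
    layers: dict[str, list[dict]] = defaultdict(list)
    for g in graphs:
        layers[g["audio_type"]].append(g)
    return dict(layers)

layers = layer_by_type([
    {"label": "lead vocal", "audio_type": "vocals"},
    {"label": "backing vocal", "audio_type": "vocals"},
    {"label": "drums", "audio_type": "percussion"},
])
# {"vocals": [two graphs], "percussion": [one graph]}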
Optionally, the display unit 506 is further configured to receive a third input from the user and, in response to the third input, mark the sound position corresponding to a search keyword in the graph, where the third input includes the search keyword.
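A sketch of the keyword marking follows, assuming word-level timestamps are already available for the sub-audio data (for example, from speech recognition); the transcript format is an assumption for illustration.

def mark_keyword(transcript: list[tuple[str, float]], keyword: str) -> list[float]:
    # Return the sound positions (seconds on the play time axis) at which the
    # search keyword from the third input occurs, so markers can be drawn.
    return [t for word, t in transcript if word.lower() == keyword.lower()]

positions = mark_keyword([("hello", 1.2), ("world", 1.8), ("hello", 7.5)], "hello")
# positions == [1.2, 7.5]; the display unit would mark the graph at these times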
The embodiments of the present application also provide personalized display according to user requirements through the provided graph modes, enriching the displayed content. Superimposing multiple pieces of sub-audio data by audio type clearly presents the sub-audio data included in the first audio data; meanwhile, the first surface of the cube graph or the arc surface of the sector graph displays profile information, so the user can intuitively learn the various kinds of information carried by the sub-audio data, enriching the user's knowledge of music theory. Playing a piece of sub-audio data helps the user learn the at least one piece of sub-audio data included in the first audio data, improving the user's familiarity with the first audio data. Marking the sound positions corresponding to the search keyword in the graph provides an accurate information indication and helps the user edit more conveniently.
An embodiment of the present application further provides a readable storage medium on which a program or instruction is stored. When executed by a processor, the program or instruction implements each process of the audio processing method embodiment described above and achieves the same technical effects; to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
An embodiment of the present application further provides a chip, including a processor and a communication interface coupled to the processor. The processor is configured to run a program or instruction to implement each process of the audio processing method embodiment described above and achieve the same technical effects; to avoid repetition, details are not repeated here.
It should be understood that the chip referred to in the embodiments of the present application may also be called a system-on-chip, a chip system, a system-on-a-chip, or the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed substantially simultaneously or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is preferred. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above embodiments, which are merely illustrative and not restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (6)

1. A method of audio processing, the method comprising:
analyzing first audio data to obtain at least one piece of sub-audio data;
generating a graph corresponding to the at least one piece of sub-audio data according to attribute information of the at least one piece of sub-audio data;
displaying the graph in an arranged manner on a play time axis corresponding to the first audio data, according to a play time period of the at least one piece of sub-audio data in the first audio data;
receiving a first input from a user, and in response to the first input, editing the graph corresponding to the at least one piece of sub-audio data;
outputting second audio data according to the edited graph;
wherein the generating a graph corresponding to the at least one piece of sub-audio data according to the attribute information of the at least one piece of sub-audio data comprises:
generating a three-dimensional graph corresponding to the at least one piece of sub-audio data and the play time axis according to the attribute information of the at least one piece of sub-audio data, wherein different dimensions of the three-dimensional graph correspond to different attribute information;
the displaying the graph in an arranged manner on the play time axis corresponding to the first audio data, according to the play time period of the at least one piece of sub-audio data in the first audio data, comprises:
superimposing the three-dimensional graph on the play time axis for display according to the play time period of the at least one piece of sub-audio data in the first audio data;
after the generating a graph corresponding to the at least one piece of sub-audio data according to the attribute information of the at least one piece of sub-audio data, the method further comprises:
acquiring an audio type corresponding to the at least one piece of sub-audio data;
the displaying the graph in an arranged manner on the play time axis corresponding to the first audio data, according to the play time period of the at least one piece of sub-audio data in the first audio data, comprises:
displaying, on the play time axis corresponding to the first audio data, the graphs of sub-audio data of the same audio type on the same layer.
2. The method of claim 1, wherein the play time axis comprises a linear time axis and a circumferential time axis, each play time axis corresponding to at least one three-dimensional graph, and the attribute information comprises profile information and a play time length of the sub-audio data;
when the play time axis is a linear time axis, the three-dimensional graph is a cube graph, a first surface of the cube graph displays the profile information of the at least one piece of sub-audio data, and the length of a first edge of the first surface represents the play time length of the at least one piece of sub-audio data;
wherein, among the cube graphs displayed on the linear time axis, the extending direction of the first edge of each cube graph is consistent with the extending direction of the linear time axis, and the first surfaces of the cube graphs face the same direction;
when the play time axis is a circumferential time axis, the three-dimensional graph is a sector graph, an arc surface of the sector graph displays the profile information of the at least one piece of sub-audio data, and the angle subtended by the arc surface represents the play time length of the at least one piece of sub-audio data;
wherein, among the sector graphs displayed on the circumferential time axis, the center of the circle corresponding to the arc edge of each sector graph coincides with the center of the circumferential time axis.
3. The method of any one of claims 1-2, wherein after the at least one piece of sub-audio data is graphically displayed, the method further comprises:
receiving a third input from a user, and in response to the third input, marking the sound position corresponding to a search keyword in the graph, wherein the third input comprises the search keyword.
4. An audio processing apparatus, the apparatus comprising:
an acquisition module, configured to analyze first audio data and acquire at least one piece of sub-audio data;
a generating module, configured to generate a graph corresponding to the at least one piece of sub-audio data according to attribute information of the at least one piece of sub-audio data;
a display module, configured to display the graph in an arranged manner on a play time axis corresponding to the first audio data, according to a play time period of the at least one piece of sub-audio data in the first audio data;
an editing module, configured to receive a first input from a user and, in response to the first input, edit the graph corresponding to the at least one piece of sub-audio data;
an output module, configured to output second audio data according to the edited graph;
wherein the generating module is further configured to:
generate a three-dimensional graph corresponding to the at least one piece of sub-audio data and the play time axis according to the attribute information of the at least one piece of sub-audio data, wherein different dimensions of the three-dimensional graph correspond to different attribute information;
the display module is further configured to:
superimpose the three-dimensional graph on the play time axis for display according to the play time period of the at least one piece of sub-audio data in the first audio data;
the acquisition module is further configured to acquire an audio type corresponding to the at least one piece of sub-audio data;
the display module is further configured to:
display, on the play time axis corresponding to the first audio data, the graphs of sub-audio data of the same audio type on the same layer.
5. The apparatus of claim 4, wherein the play time axis comprises a linear time axis and a circumferential time axis, each play time axis corresponding to at least one three-dimensional graph, and the attribute information comprises profile information and a play time length of the sub-audio data;
when the play time axis is a linear time axis, the three-dimensional graph is a cube graph, a first surface of the cube graph displays the profile information of the at least one piece of sub-audio data, and the length of a first edge of the first surface represents the play time length of the at least one piece of sub-audio data;
wherein, among the cube graphs displayed on the linear time axis, the extending direction of the first edge of each cube graph is consistent with the extending direction of the linear time axis, and the first surfaces of the cube graphs face the same direction;
when the play time axis is a circumferential time axis, the three-dimensional graph is a sector graph, an arc surface of the sector graph displays the profile information of the at least one piece of sub-audio data, and the angle subtended by the arc surface represents the play time length of the at least one piece of sub-audio data;
wherein, among the sector graphs displayed on the circumferential time axis, the center of the circle corresponding to the arc edge of each sector graph coincides with the center of the circumferential time axis.
6. The apparatus of any one of claims 4-5, further comprising:
a marking module, configured to receive a third input from a user and, in response to the third input, mark the sound position corresponding to a search keyword in the graph, wherein the third input comprises the search keyword.
CN202110106764.9A 2021-01-26 2021-01-26 Audio processing method and audio processing device Active CN112885369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110106764.9A CN112885369B (en) 2021-01-26 2021-01-26 Audio processing method and audio processing device

Publications (2)

Publication Number Publication Date
CN112885369A CN112885369A (en) 2021-06-01
CN112885369B true CN112885369B (en) 2024-05-24

Family

ID=76053413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110106764.9A Active CN112885369B (en) 2021-01-26 2021-01-26 Audio processing method and audio processing device

Country Status (1)

Country Link
CN (1) CN112885369B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002032097A (en) * 2000-07-19 2002-01-31 Nippon Columbia Co Ltd Audio data editing device
CN102253790A (en) * 2010-05-21 2011-11-23 泰金宝电通股份有限公司 Electronic device and data selection method thereof
JP2013148864A (en) * 2011-12-21 2013-08-01 Yamaha Corp Music data editing device
CN104053064A (en) * 2013-03-14 2014-09-17 霍尼韦尔国际公司 System and method of audio information display on video playback timeline
CN105868307A (en) * 2016-03-26 2016-08-17 深圳市金立通信设备有限公司 An audio frequency information display method and a terminal
CN107659725A (en) * 2017-09-26 2018-02-02 维沃移动通信有限公司 A kind of audio-frequency processing method and mobile terminal
CN110050255A (en) * 2016-12-09 2019-07-23 索尼互动娱乐股份有限公司 Image processing system and method
CN111445929A (en) * 2020-03-12 2020-07-24 维沃移动通信有限公司 Voice information processing method and electronic equipment
CN111526242A (en) * 2020-04-30 2020-08-11 维沃移动通信有限公司 Audio processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN112885369A (en) 2021-06-01

Similar Documents

Publication Publication Date Title
US11456017B2 (en) Looping audio-visual file generation based on audio and video analysis
CN113365134B (en) Audio sharing method, device, equipment and medium
US20120269364A1 (en) Composite audio waveforms
CN107832434A (en) Method and apparatus based on interactive voice generation multimedia play list
CN107256215A (en) Mobile computing device is loaded using media file
CN112188266A (en) Video generation method and device and electronic equipment
US11960536B2 (en) Methods and systems for organizing music tracks
WO2022001579A1 (en) Audio processing method and apparatus, device, and storage medium
CN112269898A (en) Background music obtaining method and device, electronic equipment and readable storage medium
US20220147558A1 (en) Methods and systems for automatically matching audio content with visual input
US9646585B2 (en) Information processing apparatus, information processing method, and program
CN113778419B (en) Method and device for generating multimedia data, readable medium and electronic equipment
CN114023301A (en) Audio editing method, electronic device and storage medium
CN112885369B (en) Audio processing method and audio processing device
CN107564553B (en) Control method and system of audio play list and audio play system
JP7254842B2 (en) A method, system, and computer-readable recording medium for creating notes for audio files through interaction between an app and a website
JP7166373B2 (en) METHOD, SYSTEM, AND COMPUTER-READABLE RECORDING MEDIUM FOR MANAGING TEXT TRANSFORMATION RECORD AND MEMO TO VOICE FILE
CN112309424B (en) Display method and electronic equipment
US20240202239A1 (en) Transcript aggregaton for non-linear editors
JP6847429B2 (en) Music editing system, music editing service provision method, and music editing program
US12013893B2 (en) Information processing apparatus information processing method to search a music piece for reproduction
CN112216275B (en) Voice information processing method and device and electronic equipment
KR102446300B1 (en) Method, system, and computer readable record medium to improve speech recognition rate for speech-to-text recording
CN115700870A (en) Audio data processing method and device
CN118001740A (en) Virtual model processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant