CN113360129A

CN113360129A - Sound playing method and device, electronic equipment and readable storage medium

Info

Publication number: CN113360129A
Application number: CN202110677717.XA
Authority: CN
Inventors: 谢芳
Original assignee: Guangzhou Xiaopeng Motors Technology Co Ltd
Current assignee: Guangzhou Xiaopeng Motors Technology Co Ltd
Priority date: 2021-06-18
Filing date: 2021-06-18
Publication date: 2021-09-07

Abstract

The application provides a sound playing method, a sound playing device, an electronic device and a readable storage medium. Before the electronic device plays a target file, each audio clip of the audio in the target file is processed according to the sampling value shared by most sampling points in that clip and is then played. As a result, once the media playing volume of the electronic device is set, the electronic device outputs a consistent volume when playing different target files, the user's listening experience is consistent, and the user does not need to manually adjust the media playing volume. The operation is fast, the process is simple, a better music experience is brought to the user, and the existing music playing logic is not affected.

Description

Sound playing method and device, electronic equipment and readable storage medium

Technical Field

The present application relates to the field of multimedia control technologies, and in particular, to a sound playing method and apparatus, an electronic device, and a readable storage medium.

Background

With the rapid development of the technology, the sound playing function becomes an indispensable function of the electronic device. The user can play music, record a sound, etc. using the electronic device.

In general, different audio sources have different volumes: some are loud and some are quiet. Here, audio is also referred to as a playback source, sound source, music source, etc. For electronic equipment with calibrated volume, the output volume differs greatly when sound files with different volumes are played. Take music A and music B as an example, where the volume of music A is large and the volume of music B is small. Assuming that the electronic device has 30 volume levels and the user sets the volume to level 15 and plays music A and music B in sequence, music A may sound much louder than music B, so the user feels that music A is too loud or that music B is too quiet. To make music A and music B sound equally loud, the user has to adjust the volume level of the electronic device when playing music A or music B. For example, if music A is too loud when the media volume of the electronic device is at level 15, the user adjusts the volume to level 14, which is 3 decibels (dB) lower than level 15.

In the sound playing process, the user is required to adjust the media volume of the electronic equipment according to the volume of the music, and the process is complicated.

Disclosure of Invention

The embodiment of the application discloses a sound playing method and device, electronic equipment and a readable storage medium, wherein sampling values of most sampling points in a target file are unified to the target sampling value, so that the electronic equipment outputs consistent volume when playing different target files, the media volume of the electronic equipment does not need to be adjusted manually, and the process is simple and convenient.

In a first aspect, an embodiment of the present application provides a sound playing method, including:

receiving a playing instruction for requesting to play a target file containing audio;

responding to the playing instruction, and determining sampling values of all sampling points of the audio clip corresponding to the sliding window in the audio;

determining a reference sampling value from sampling values of sampling points of the audio clip, wherein the reference sampling value is the sampling value with the largest occurrence frequency in the sliding window;

and processing the audio clip according to the reference sampling value and playing.

In a second aspect, an embodiment of the present application provides a sound playing apparatus, including:

the receiving module is used for receiving a playing instruction for requesting to play a target file containing audio;

the processing module is used for responding to the playing instruction and determining sampling values of all sampling points of the audio clip corresponding to the sliding window in the audio;

the determining module is used for determining a reference sampling value from sampling values of all sampling points of the audio clip, wherein the reference sampling value is the sampling value with the largest occurrence frequency in the sliding window;

the processing module is further used for processing the audio segment according to the reference sampling value;

and the playing module is used for playing the audio clip processed by the processing module.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a computer program stored on the memory and executable on the processor, the processor causing the electronic device to implement the method as described above when executing the computer program.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer instructions, which when executed by a processor, are used to implement the method as described above.

In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program, which when executed by a processor implements the method as described above.

According to the sound playing method and device, the electronic device and the readable storage medium, after receiving a playing instruction requesting playing of a target file containing audio, the electronic device responds to the playing instruction, determines the sampling values of all sampling points in the audio clip corresponding to a sliding window in the audio, determines the sampling value with the largest occurrence frequency among them, takes that sampling value as the reference sampling value, processes all the sampling values in the audio clip according to the reference sampling value, and plays the processed audio clip. With this scheme, before the electronic device plays the target file, each audio clip of the audio in the target file is processed according to the sampling value shared by most sampling points in that clip and is then played. As a result, once the media playing volume of the electronic device is set, the electronic device outputs a consistent volume when playing different target files, the user's listening experience is consistent, and the user does not need to manually adjust the media playing volume. The operation is fast, the process is simple, a better music experience is brought to the user, and the existing music playing logic is not affected.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.

Fig. 1 is a schematic diagram of a process for peak normalization;

Fig. 2 is a frame diagram of a playing system to which the sound playing method provided in the embodiment of the present application is applied;

Fig. 3 is a flowchart of a sound playing method provided by an embodiment of the present application;

Fig. 4 is a histogram in a sound playing method provided by an embodiment of the present application;

Fig. 5 is a flowchart of processing an audio segment according to a reference sample value in a sound playing method provided by an embodiment of the present application;

Fig. 6 is another flowchart of a sound playing method provided in an embodiment of the present application;

Fig. 7 is a comparison of a method according to an embodiment of the present application and a normalization scheme that relies on maximum sample values;

Fig. 8 is a schematic diagram of a large volume sliding window in a sound processing method according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a sound playing apparatus according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the examples and figures of the present application are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

With the rapid development of smart cars, vehicle-mounted multimedia systems are also more and more favored by users. Music playing is an important function of the vehicle-mounted multimedia system, and the frequency of music playing by a user using the vehicle-mounted multimedia system is higher and higher. The vehicle-mounted multimedia system can present the user with an immersive listening experience, and benefits from a highly configured sound system of a vehicle and an advanced sound effect algorithm. For example, some vehicle terminals are equipped with 13-channel speakers and advanced sound algorithms to make the music experience more immersive.

However, many car multimedia systems and conventional playing systems on the market currently have the following problems: different playing sources have different volumes, some playing sources have large volume, and some playing sources have small volume. The playback source is music, for example. For a vehicle with calibrated volume, when the vehicle-mounted multimedia system of the vehicle plays different music due to different playing sources, the volume played is greatly different. In addition, the vehicle-mounted loudspeaker is also a high-power device, and the small difference of the playing sources can be amplified by a rear-end power amplifier, so that great difference is brought.

In short, assume that the volume of music A is greater than that of music B, that the calibrated media playing volume of the vehicle has 30 levels, and that the media playing volume of the multimedia system is set to a certain level, for example level 15. When the vehicle-mounted multimedia system plays music A and music B in sequence, music A sounds much louder than music B from the user's perspective, which brings a poor experience to the user.

If the user wants music A and music B to sound equally loud, the media playing volume needs to be adjusted whenever music A or music B is played. For example, when music A is played, the media playback volume is adjusted from level 15 to level 14. Then, when music B is played, the media playback volume is adjusted from level 14 back to level 15. The adjustment process is cumbersome, and the media playback volume is likely to need adjusting every time a different piece of music is played.

Moreover, even if the media playing volume is adjusted, the multimedia system cannot necessarily give the user the same listening experience when playing music A and music B. For example, suppose music B is 2 decibels (dB) louder than music A, the difference between two adjacent media playback volume levels is 3 dB, and the media playback volume is set to level 15 while music A is played. When music B is then played and the media playback volume is adjusted from level 15 to level 14, the volume of music B is reduced by 3 dB, so music B now sounds 1 dB quieter than music A and the two still do not sound the same.

In order to solve the problem that the output volumes of different music played by a multimedia system are inconsistent, the traditional solution is peak value standardization.

Fig. 1 is a schematic diagram of a process of peak normalization. Referring to fig. 1, the vehicle-mounted multimedia system decodes the music to form streaming media data, and samples the streaming media data to obtain a maximum sampling value, where the maximum sampling value is a sampling peak value. And then, determining a unified coefficient according to the maximum sampling value, and carrying out peak value unification on all sampling points according to the unified coefficient. And in the peak value unifying process, multiplying the sampling value of each sampling point of the streaming media data by a unifying coefficient to obtain a new sampling value. The peak unification process is equivalent to performing equal-ratio amplification or attenuation on the sampling values of each sampling point of the streaming media data. And then, carrying out volume adjustment, sound effect adjustment and the like on the streaming media data with the unified peak value, and carrying out digital-to-analog conversion on the adjusted digital signal and playing.
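The peak unification step described above can be sketched in C as follows. This is a minimal illustration, not the implementation of any particular multimedia system: the function name peak_normalize, the target_peak parameter, and the use of the absolute value when searching for the peak are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: scale every sample by one unified coefficient so
 * that the sampling peak lands on target_peak. Returns the coefficient.
 * Assumption: the peak is taken as the largest absolute sample value. */
static double peak_normalize(int samples[], size_t n, int target_peak)
{
    assert(samples != NULL && n > 0 && target_peak > 0);

    int peak = 0;
    for (size_t i = 0; i < n; i++) {
        int magnitude = abs(samples[i]);
        if (magnitude > peak)
            peak = magnitude;
    }
    if (peak == 0)
        return 1.0; /* pure silence: nothing to scale */

    /* Equal-ratio amplification or attenuation of every sampling point. */
    double coeff = (double)target_peak / (double)peak;
    for (size_t i = 0; i < n; i++)
        samples[i] = (int)(samples[i] * coeff);
    return coeff;
}
```

With the samples {100, -200, 50, 4000} and a target peak of 8000, the coefficient is 2, so the three quiet samples only reach 200, -400 and 100: a single loud sample fixes the gain for the whole stream.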

However, the effect of the peak normalization scheme is heavily dependent on the volume of the sampled peak, so that even after normalization, the volume of the non-peak sample point is still small, and the playing effect is poor.

Based on this, embodiments of the present application provide a sound playing method and apparatus, an electronic device, and a readable storage medium, in which sampling values of most sampling points in a target file are unified to a target sampling value, so that the electronic device outputs a consistent volume when playing different target files, and does not need to manually adjust the media volume of the electronic device, and the process is simple and convenient.

The sound playing method provided by the embodiment of the application is applied to electronic equipment with a sound playing function, including but not limited to a mobile phone, a tablet computer, a sports camera, a smart watch, a smart bracelet, smart glasses, vehicle-mounted terminal equipment and the like. The electronic device may be a portable device, such as an electronic device running iOS, Android, Windows, or another operating system. It should be understood that in other embodiments of the present application, the electronic device may not be a portable electronic device, but may be a desktop computer having a touch-sensitive surface (e.g., a touch panel).

The electronic equipment has different levels of media playing volume when leaving the factory. For example, there are 30 levels of media playback volume, and the difference between adjacent levels is the same, increases by multiples, etc. The user can flexibly adjust the media playing volume by voice instruction, touch instruction, hardware key and the like. After the user sets the media playing volume of the electronic device to a certain level, during a call the other party's listening experience differs depending on whether the user speaks quietly or loudly into the microphone. When different target files are played, for example target file 1 containing audio A and target file 2 containing audio B, where the volume of audio A is greater than that of audio B, the electronic device makes its output volume consistent by executing the method in the embodiment of the application, so that the same listening experience is brought to the user and the user does not need to adjust the media playing volume to adapt to different target files.

The sound playing method provided by the embodiment of the application can be used for various Applications (APPs) capable of playing sound. The server realizes the sound playing method provided by the embodiment of the application through software. The electronic device may upgrade the APP in an Over-the-Air Technology (OTA) manner, so as to have the function of playing the sound as described in the embodiments of the present application. The upgrading process is simple.

Fig. 2 is a frame diagram of a playing system to which the sound playing method provided in the embodiment of the present application is applied. Referring to fig. 2, the playing framework includes: a System-on-a-Chip (SoC), a Digital Signal Processor (DSP), a Digital-to-Analog Converter (DAC), a Power Amplifier (PA), and a speaker. The SoC and the DSP can communicate through I2C, I2S and the like, the DSP is connected with the DAC, the DAC is connected with the PA, and the PA is connected with the speaker. The I2S interface is, for example, MI2S.

When the target file starts to be played, the audio contained in the target file is decoded to form streaming media data, and the streaming media data is sent to the SoC. Software running in the SoC controls pause, play, etc. of the player. The electronic device may determine one or more audio segments from the audio using the sliding window concept. For each sliding window, the software in the SoC further analyzes the streaming media data corresponding to the sliding window to determine a reference sample value of the audio segment. The SoC then processes each sampling point of the audio clip according to the reference sampling value, and performs volume adjustment and the like on the processed audio clip. The volume adjustment is, for example, amplifying the volume of the audio clip according to the media playing volume level, smoothing, and the like. In addition, the SoC can adjust the sound effect of the audio clip.

Since the SoC has limited volume adjustment and sound effect adjustment functions, the SoC outputs the adjusted audio clip to the DSP through the I2S interface. The DSP is also called as a sound effect DSP, and the sound effect DSP further performs volume adjustment and sound effect adjustment on the audio clip. For example, an Equalizer (EQ), speaker effect processing, and the like are performed on the audio piece.

And the DAC converts the streaming media data processed by the DSP into an analog signal, and the analog signal is amplified by the PA and then output through a loudspeaker, so that a user can hear the sound.

Fig. 3 is a flowchart of a sound playing method according to an embodiment of the present application. The execution main body of the implementation is an electronic device, and the embodiment includes:

301. Receiving a playing instruction for requesting to play the target file containing the audio.

In the embodiment of the present application, the target file is a file containing audio, for example, the target file is an audio file or a video file containing audio. The audio file is, for example, music, a recording, etc., and the format of the audio includes, but is not limited to, WAV, CDA, Musical Instrument Digital Interface (MIDI), MP3, etc.

The user can input a playing instruction for requesting to play the target file by voice, touch and the like. Taking the target file as music A as an example, the user says to the electronic device by voice "I want to play music A", and the electronic device plays local music A or searches for music A online and plays it. Alternatively, the user clicks an icon of music A on the display screen of the electronic device to cause the electronic device to play music A.

302. Responding to the playing instruction, and determining the sampling value of each sampling point in the audio clip corresponding to the sliding window in the audio.

In the embodiment of the present application, the length of the sliding window is less than or equal to the length of the audio. After the electronic device recognizes the playing instruction, it can process the audio clips corresponding to all or some of the sliding windows.

During processing, the electronic equipment determines the audio segment corresponding to the sliding window from the audio, and samples the streaming media data corresponding to the audio segment to determine the sampling value of each sampling point of the audio segment. For example, if the length of the sliding window is 10 milliseconds (ms), i.e., 0.01 seconds, the sampling rate is 48000 hertz (Hz), and two-channel sampling is used, the number of sampling points is 48000 × 2 × 0.01 = 960. The electronic device determines a sample value for each of the 960 sample points.
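The arithmetic behind the number of sampling points per window can be written as a small C helper. The function name samples_per_window and its parameter names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Number of sampling points in one sliding window.
 * window_ms:      window length in milliseconds
 * sample_rate_hz: samples per second for each channel
 * channels:       number of interleaved channels */
static long samples_per_window(long window_ms, long sample_rate_hz, long channels)
{
    assert(window_ms > 0 && sample_rate_hz > 0 && channels > 0);
    /* Multiply first, divide last, so integer division does not lose precision. */
    return sample_rate_hz * channels * window_ms / 1000;
}
```

For a 10 ms window, 48000 Hz and two channels, samples_per_window(10, 48000, 2) gives 960, matching the figure in the text.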

303. Determining a reference sampling value from the sampling values of the sampling points of the audio segment, wherein the reference sampling value is the sampling value with the largest occurrence frequency in the sliding window.

Illustratively, each sample point has its own sample value, and the sample values at different sample points may be the same or different. That is, some sampling values have a relatively large number of corresponding sampling points, and some sampling values have a relatively small number of corresponding sampling points. And the electronic equipment counts the number of sampling points corresponding to each sampling value, so that the sampling value with the largest occurrence frequency is determined.

Continuing with the example above, when the sample values of the 960 sample points are all different, there are 960 distinct sample values. When some of the 960 sample points share the same sample value, the number of distinct sample values is less than 960. In that case, the electronic device determines the number of sampling points corresponding to each sampling value, and takes the sampling value with the largest number of sampling points as the reference sampling value.

When the number of distinct sample values is equal to 960, the electronic device sorts the 960 sample values in ascending order and divides the sorted sequence into a plurality of intervals, each interval covering a different range of sample values. The interval containing the most sampling points is then determined from the plurality of intervals, and the average of the sampling values of the sampling points in that interval is taken as the reference sampling value.
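The two cases above can be sketched in C as follows. This is only one possible reading of step 303: the function name reference_sample, the num_bins parameter, the choice of equal-width intervals, and the tie-breaking rule (the smallest value wins) are assumptions, since the text does not specify how the intervals are chosen or how ties are resolved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: reference sample value of one sliding window. */
static int reference_sample(const int samples[], size_t n, int num_bins)
{
    assert(samples != NULL && n > 0 && num_bins > 0);

    int *sorted = malloc(n * sizeof *sorted);
    assert(sorted != NULL);
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_int);

    /* Case 1: the longest run of equal values is the most frequent value. */
    size_t best_len = 0, run_start = 0;
    int best_val = sorted[0];
    for (size_t i = 1; i <= n; i++) {
        if (i == n || sorted[i] != sorted[run_start]) {
            if (i - run_start > best_len) {
                best_len = i - run_start;
                best_val = sorted[run_start];
            }
            run_start = i;
        }
    }
    if (best_len > 1 || n == 1) {
        free(sorted);
        return best_val;
    }

    /* Case 2: all values distinct. Split [min, max] into equal-width
     * intervals (an assumption) and average the fullest interval. */
    long long lo = sorted[0], hi = sorted[n - 1];
    long long width = (hi - lo) / num_bins + 1;
    size_t best_count = 0, i = 0;
    long long best_sum = 0;
    while (i < n) {
        long long bin = (sorted[i] - lo) / width;
        size_t count = 0;
        long long sum = 0;
        while (i < n && (sorted[i] - lo) / width == bin) {
            sum += sorted[i];
            count++;
            i++;
        }
        if (count > best_count) {
            best_count = count;
            best_sum = sum;
        }
    }
    free(sorted);
    return (int)(best_sum / (long long)best_count);
}
```

For {5, 3, 5, 7, 5, 3} the value 5 occurs three times and is returned directly. For the all-distinct input {10, 11, 12, 100, 200} with four intervals, the first interval holds three points and the result is their average, 11.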

304. Processing the audio clip according to the reference sampling value and playing it.

In the process of playing the target file, when the audio clip is played, the sampling value of each sampling point in the audio clip is processed according to the reference sampling value and then played.

After recognizing the playing instruction, the electronic device may perform the processing of steps 302 and 303 on the audio clip corresponding to each sliding window, and then store each processed audio clip. Thereafter, when step 304 is executed, the processed audio clip is played.

In addition, after recognizing the play instruction, the electronic device may also predict the content to be played within a certain time period after a future time point. For example, if the current time is 6:35, the electronic device predicts the 100 ms of content that will be played starting at 6:36 and performs steps 302 and 303 on that 100 ms of content. For example, the audio segments corresponding to 10 sliding windows are determined, and the reference sample value of each audio segment is determined. Then, starting at 6:36, the electronic device plays the content processed in steps 302 and 303. In this mode, the electronic device processes the sound while playing it; the adjustment granularity is fine, the precision is high, and the time cost is low.

According to the sound playing method provided by the embodiment of the application, after receiving a playing instruction for requesting playing of a target file containing audio, the electronic equipment responds to the playing instruction, determines the sampling values of all sampling points in the audio clip corresponding to a sliding window in the audio, determines the sampling value with the largest occurrence frequency among them, takes that sampling value as the reference sampling value, processes all the sampling values in the audio clip according to the reference sampling value, and plays the processed audio clip. With this scheme, before the electronic equipment plays the target file, each audio clip of the audio in the target file is processed according to the sampling value shared by most sampling points in that clip and is then played. As a result, once the media playing volume of the electronic equipment is set, the electronic equipment outputs a consistent volume when playing different target files, the user's listening experience is consistent, and the user does not need to manually adjust the media playing volume. The operation is fast, the process is simple, a better music experience is brought to the user, and the existing music playing logic is not affected.

Optionally, in the above embodiment, when the electronic device determines the reference sample value from the sample values of the sampling points in the audio segment, a histogram is determined according to the sample values of the sampling points in the audio segment, where the histogram is used to indicate the occurrence times of different sample values. The electronic device then determines the reference sample value from the histogram.

For example, after the electronic device samples the audio segment to obtain the sampling value of each sampling point, a histogram is determined according to the sampling value of each sampling point, the abscissa of the histogram represents the sampling value, and the ordinate represents the occurrence number of the sampling value. For example, please refer to fig. 4.

Fig. 4 is a histogram in the sound playing method provided in the embodiment of the present application. Referring to fig. 4, the abscissa represents different sampling values, each corresponding to one occurrence number. The thick black solid line indicates that the sampling point corresponding to the sampling value has the largest number. That is, the sample values of most sample points are as indicated by the thick black solid line in the figure. Therefore, the sample value with the largest number of occurrences, i.e., the reference sample value, can be determined from the histogram.

If an audio segment contains a large-volume part and a small-volume part, and the small-volume part accounts for a large proportion of the segment, then a reference sampling value determined by averaging is skewed by the large-volume part and no longer represents the majority of the sampling points.

By adopting the scheme, the reference sampling value is determined through the histogram instead of determining the reference sampling value through averaging and other modes, most sampling points can be considered, the influence on the volume with smaller proportion in the audio segment is avoided, and the accuracy is high.
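The difference between the histogram approach and averaging can be illustrated with the following C sketch for 16-bit samples. The names histogram_mode and mean_value are hypothetical, the histogram uses one bin per possible 16-bit value (an assumption about bin width), and ties are resolved in favor of the smallest value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_BINS 65536 /* one bin per possible 16-bit sample value */

/* Most frequent sample value in the window (smallest value on ties).
 * The static counter array keeps the stack small but makes this
 * function non-reentrant. */
static int16_t histogram_mode(const int16_t samples[], size_t n)
{
    assert(samples != NULL && n > 0);
    static uint32_t counts[HIST_BINS];
    for (size_t b = 0; b < HIST_BINS; b++)
        counts[b] = 0;
    for (size_t i = 0; i < n; i++)
        counts[(int32_t)samples[i] + 32768]++; /* shift -32768..32767 to 0..65535 */
    size_t best = 0;
    for (size_t b = 1; b < HIST_BINS; b++)
        if (counts[b] > counts[best])
            best = b;
    return (int16_t)((int32_t)best - 32768);
}

/* Plain arithmetic mean, shown only for comparison. */
static int16_t mean_value(const int16_t samples[], size_t n)
{
    assert(samples != NULL && n > 0);
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return (int16_t)(sum / (long long)n);
}
```

For a window with six samples at 1000 and two at 20000, histogram_mode returns 1000, the value of most sampling points, while mean_value returns 5750, which matches neither group.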

Fig. 5 is a flowchart of processing an audio segment according to a reference sample value in a sound playing method provided in an embodiment of the present application. The embodiment comprises the following steps:

501. Determining a target sample value for the audio segment.

Illustratively, the theoretical maximum sample value of an audio segment (hereinafter referred to as the theoretical sample value) is fixed and is usually denoted overload. The theoretical sample value is usually determined by the number of sampling bits. For example, each sample of 16-bit music is represented by two bytes, i.e. 16 bits, and the data range is -32768 to 32767. Thus, overload is 32767.

The electronic device may determine a target sample value from the theoretical sample value. The target sample value (hereinafter p_std) is, for example, 0.8 × overload. Therefore, for 16-bit music, p_std = 0.8 × 32767 ≈ 26213.

502. Determining a unified coefficient according to the target sampling value and the reference sampling value.

Assuming that the sample value of most sampling points is sample_most, the reference sample value is equal to sample_most. The electronic device can obtain a unified coefficient g from the reference sample value and the target sample value, where g = p_std / sample_most.

503. Adjusting the sampling value of each sampling point in the audio clip according to the unified coefficient.

For example, the electronic device multiplies the sampling value of each sampling point in the sliding window by a uniform coefficient, so that most of the sampling values in the sliding window can be uniform to a target sampling value, and the purpose of uniform volume output is achieved.

Taking a sliding window of 10 ms, a sampling rate of 48000 hertz (Hz) and two-channel sampling as an example, the number N of sampling points is 48000 × 2 × 0.01 = 960, and the sampling values of the 960 sampling points are denoted sample_0, sample_1, sample_2, ..., sample_959.

Denote the sampling value of the i-th sampling point (0 ≤ i ≤ 959) among the 960 sampling points as sample_i. After the electronic device determines the unified coefficient g, it normalizes all the sampling points to obtain new sampling values. Denoting the new sampling value as sample_new_i:

sample_new_i = sample_i × g.

504. Playing the audio clip after the sampling value is adjusted.

By adopting the scheme, a unified coefficient is determined according to the target sampling value and the reference sampling value, and the sampling value of each sampling point in the sliding window is multiplied by the unified coefficient, so that the sampling values of most sampling points in the sliding window are unified to the target sampling value, and the purpose of outputting unified volume is realized.
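Steps 501 to 503 can be sketched together in C for 16-bit samples. The function name unify_window is hypothetical. Two further assumptions are made that the text does not state: the reference sample value passed in is positive, and scaled values are clamped to the 16-bit range so that they cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OVERLOAD 32767 /* theoretical sample value for 16-bit audio */

/* Hypothetical helper: scale one sliding window in place so that the
 * reference value sample_most lands on p_std = 0.8 * overload.
 * Returns the unified coefficient g. */
static double unify_window(int16_t samples[], size_t n, int sample_most)
{
    assert(samples != NULL && n > 0);
    assert(sample_most > 0); /* assumption: a positive reference value */

    double p_std = 0.8 * OVERLOAD;            /* step 501: target sample value */
    double g = p_std / (double)sample_most;   /* step 502: g = p_std / sample_most */

    for (size_t i = 0; i < n; i++) {          /* step 503: sample_new_i = sample_i * g */
        double scaled = samples[i] * g;
        /* Assumption: clamp to the 16-bit range; overflow handling is
         * not described in the text. */
        if (scaled > 32767.0)
            scaled = 32767.0;
        if (scaled < -32768.0)
            scaled = -32768.0;
        samples[i] = (int16_t)scaled;
    }
    return g;
}
```

With a reference value of 13000, g is about 2.016, so samples at 13000 become 26213, the target sample value for 16-bit audio, while a sample at 30000 would exceed the range and is clamped to 32767 in this sketch.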

In the above embodiment, before determining the reference sample value from the sample values of the sampling points in the audio segment, the electronic device further determines the maximum sample value from the sample values of the sampling points in the audio segment, and if the maximum sample value is smaller than a preset sample value, the reference sample value is further determined, and the uniform coefficient is determined according to the reference sample value and the target sample value. And when the maximum sampling value is greater than or equal to the preset sampling value, directly playing the audio clip without processing the sampling value of each sampling point in the audio clip.

For example, after determining the sample values of the sample points in the audio segment, the electronic device can determine the maximum sample value from the sample values. For example, the maximum sample value is determined by traversing the sample values, and the traversal function is as follows:

p_max = MAX(sample_i), i = 0, 1, 2, ....

In the above traversal function, p_max represents the maximum sample value.

In one embodiment, the maximum sample value of an audio segment, also referred to as the sample peak, may be obtained through the following interface:

static int get_peak(int samples[], int offset, int N)

Here, samples points to the array of sampling values of the sampling points of the audio clip, offset is the data position from which the peak is to be obtained, that is, the position of the first sampling point of the audio clip in the audio, and N is the number of sampling points of the audio clip.
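
A possible body for this interface is sketched below. It is an illustration only and rests on several assumptions: samples is taken to index the whole audio, so that the audio clip occupies samples[offset] to samples[offset + N - 1]; N is at least 1; and the peak is the plain maximum of the signed values, as in the traversal function above (an implementation might instead compare magnitudes).

```c
/* Returns the largest sampling value among the N sampling points that
 * start at position `offset`. Assumes N >= 1 and that the range
 * samples[offset .. offset + N - 1] is valid. */
static int get_peak(int samples[], int offset, int N)
{
    int p_max = samples[offset];

    for (int i = 1; i < N; i++) {
        if (samples[offset + i] > p_max) {
            p_max = samples[offset + i];
        }
    }
    return p_max;
}
```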

The electronic device is further capable of determining a preset sample value from the theoretical maximum sample value of the audio segment; the preset sample value is, for example, larger than the target sample value mentioned above. One or more sample values of the audio segment may be greater than the preset sample value. For example, the maximum sample value and the minimum sample value of the audio segment are both greater than the preset sample value; or some sample values of the audio segment are greater than the preset sample value while the minimum sample value is smaller than it. In either case, the electronic device directly plays the audio clip.

Taking overload as the theoretical maximum sampling value as an example, the preset sampling value is, for example, 0.9 × overload. When all sampling values in the audio clip are less than or equal to the preset sampling value, the method of the application is executed: the reference sampling value is determined according to the sampling values of the sampling points, and the sampling values of the sampling points in the audio clip are adjusted according to the reference sampling value and then played. When one or more sampling values larger than the preset sampling value exist in the audio clip, the electronic device does not process the sampling values of the audio clip but plays the audio clip directly.

By adopting the scheme, when a large volume appears in the sliding window, the sound in the sliding window is not subjected to unified processing, so that the large volume output is preserved and the volume output has a larger dynamic range. That is, the sound playing method provided by the application can standardize the output volume while remaining compatible with the dynamic range of the sound. In playing scenes that need a high dynamic range, such as movie theaters, the method provides a better listening experience.
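
The check for whether any sampling value in the window exceeds the preset sampling value can be sketched as an early-exit scan, shown below. The function name has_sample_above is hypothetical, and the caller is assumed to pass the preset value already computed (for example 0.9 × overload, rounded to an integer).

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true as soon as one sampling value greater than `preset` is
 * found, so a loud window is detected without scanning all of it. */
static bool has_sample_above(const int *samples, size_t n, int preset)
{
    for (size_t i = 0; i < n; i++) {
        if (samples[i] > preset) {
            return true;
        }
    }
    return false;
}
```

When the function returns true, the window would be played without adjustment; otherwise the reference sampling value is determined and the window is scaled.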

Optionally, in the above embodiment, the length of the audio segment may be equal to or less than the length of the audio contained in the target file. When the length of the audio segment is less than the length of the audio, the audio contains one or more sliding windows. For each sliding window, the electronic device determines the first sampling point of the audio segment from the sampling points corresponding to the sound file, and then determines the audio segment from the sound file according to the length of the sliding window and the first sampling point.

Illustratively, assuming that the audio is 3 minutes (min) long, the sampling rate is 48000 hertz (Hz) and two-channel sampling is employed, the number of sampling points is 48000 × 2 × 3 × 60 = 17,280,000. Assuming a sliding window of 10 ms, 3 minutes of audio contains 180 s ÷ 0.01 s = 18,000 audio segments of 10 ms each. The electronic device may perform the sound playing method described above on one or more of the 18,000 audio segments. For each audio segment, the electronic device can determine the audio segment according to the length of the sliding window and the position of the first sampling point of the audio segment.

By adopting the scheme, the electronic equipment determines the audio clip from the audio according to the position of the first sampling point and the length of the sliding window, so that the audio playing processing is carried out in a targeted manner, and the electronic equipment is high in flexibility and high in speed.
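
Locating an audio segment from the position of its first sampling point and the length of the sliding window can be sketched as follows. The type segment_t and the functions window_count and window_at are hypothetical names, and the sketch assumes consecutive, non-overlapping windows, with a shorter final window when the audio length is not a multiple of the window length.

```c
#include <stddef.h>

/* One audio segment: index of its first sampling point and its length. */
typedef struct {
    size_t first;
    size_t count;
} segment_t;

/* Number of windows needed to cover `total` sampling points. */
static size_t window_count(size_t total, size_t win)
{
    return (win == 0) ? 0 : (total + win - 1) / win;
}

/* The k-th window (k starts at 0); count is 0 if k is out of range. */
static segment_t window_at(size_t total, size_t win, size_t k)
{
    segment_t s = {0, 0};
    size_t first = k * win;

    if (win == 0 || first >= total) {
        return s;
    }
    s.first = first;
    s.count = (total - first < win) ? (total - first) : win;
    return s;
}
```

For the 3-minute example, window_count(48000 * 2 * 180, 960) gives the number of 10 ms windows, and window_at returns the range of sampling points for any one of them.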

Optionally, in the above embodiment, before the electronic device processes the audio segment according to the reference sample value and plays the audio segment, the media playing volume of the electronic device is further adjusted.

In the embodiment of the application, the media playing volume is the volume of the electronic device, and the user can adjust it by voice, by touch or by pressing a physical key. For example, pressing a physical key of a mobile phone adjusts the media playing volume of the mobile phone; for another example, when the electronic device is a smart speaker, the user speaks a wake-up word to the smart speaker and issues a voice instruction such as "turn the sound up a little" to adjust the media playing volume of the smart speaker.

In addition, many Applications (APPs) also have a volume adjustment function, such as a video playing APP and a music playing APP, and a user can adjust the volume of the APP in a voice mode and a touch mode.

When the media playing volume is set unreasonably, the sound playing method cannot achieve a good effect. For example, if music A is relatively loud and the media playing volume is also high, then even if music A is adjusted by the sound playing method described in the embodiment of the present application, the played volume of music A is still high because the media playing volume is high.

Therefore, before the electronic device plays the music A, the media playing volume of the electronic device is adjusted, so that the media playing volume of the electronic device is in a reasonable position. On the basis, the audio is subjected to sound playing processing, so that the finally played sound effect is optimal.

Fig. 6 is another flowchart of a sound playing method provided in an embodiment of the present application. The embodiment comprises the following steps:

601. Setting the media playing volume.

Taking the electronic device as an in-vehicle terminal as an example, the media playing volume of the in-vehicle terminal is usually 30 levels. The user can set the media playing volume according to the preference, the current environment and the like. For example, if the current environment is noisy, the media playback volume is set to be relatively high, such as 20 levels. For another example, if the user likes music with a smaller volume, the media playback volume is set to be lower.

602. Receiving a playing instruction for requesting to play the target file containing the audio.

603. In response to the playing instruction, obtaining the maximum sampling value of the audio clip in the sliding window.

604. Comparing the maximum sampling value with a preset sampling value: when the maximum sampling value is greater than or equal to the preset sampling value, step 605 is executed; when the maximum sampling value is smaller than the preset sampling value, step 606 is executed. The preset sampling value is, for example, 0.9 × overload or 0.85 × overload, where overload is the theoretical maximum sampling value.

It should be noted that, although this step is described by comparing the maximum sampling value of the audio clip with the preset sampling value, the embodiment of the present application is not limited thereto. In other feasible implementation manners, the audio clip may be played directly as long as a sampling value greater than the preset sampling value exists among the sampling values of the audio clip.

605. Playing the audio clip.

In the process of playing the target file by the electronic equipment, the sampling values of the sampling points of the audio clip are not processed, but are directly played.

606. Acquiring the reference sampling value of the audio clip in the sliding window.

Specifically, refer to the description of steps 302 and 303 in fig. 3, which is not described herein again.

607. Determining a uniform coefficient according to the reference sampling value and the target sampling value.

The target sampling value is, for example, 0.8 × the theoretical maximum sampling value.

608. Processing the sampling values of all sampling points of the audio clip according to the uniform coefficient.

For example, each sampling value is multiplied by the uniform coefficient to obtain a new sampling value.

609. Playing the audio clip after the sampling values are adjusted.

In the process of playing the target file, when the audio clip is played, the electronic device plays the audio clip obtained by processing the sampling values of the sampling points.
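
Steps 603 to 609 for a single sliding window can be sketched as the function below. This is a sketch under stated assumptions rather than the implementation of the application. The names process_window and clamp_sample are hypothetical. The reference sampling value ref is assumed to have been obtained separately in step 606. The uniform coefficient is assumed to be the target sampling value divided by the reference sampling value. Clamping scaled values to ±overload is an added safeguard that the text does not describe.

```c
#include <stdbool.h>
#include <stddef.h>

/* Clamp v to [-limit, limit]. Clamping is an assumption; the text does
 * not say how values beyond the theoretical maximum are handled. */
static int clamp_sample(double v, int limit)
{
    if (v > limit) return limit;
    if (v < -limit) return -limit;
    return (int)v;  /* truncates toward zero */
}

/*
 * Steps 603-609 for one sliding window.
 *   ref      reference sampling value from step 606 (computed elsewhere)
 *   preset   preset sampling value, e.g. 0.9 x overload
 *   target   target sampling value, e.g. 0.8 x overload
 *   overload theoretical maximum sampling value
 * Returns true if the window was scaled in place, false if it should be
 * played unchanged.
 */
static bool process_window(int *samples, size_t n, int ref,
                           int preset, int target, int overload)
{
    if (n == 0) return false;

    /* Step 603: maximum sampling value in the window. */
    int p_max = samples[0];
    for (size_t i = 1; i < n; i++) {
        if (samples[i] > p_max) p_max = samples[i];
    }

    /* Steps 604-605: a loud window is played as is. */
    if (p_max >= preset) return false;

    /* Assumption: a non-positive reference yields no usable coefficient. */
    if (ref <= 0) return false;

    /* Step 607: assumed form of the uniform coefficient. */
    double g = (double)target / ref;

    /* Step 608: scale every sampling point. */
    for (size_t i = 0; i < n; i++) {
        samples[i] = clamp_sample(samples[i] * g, overload);
    }
    return true;  /* step 609: the caller plays the adjusted window */
}
```

The return value lets the caller distinguish the two branches of step 604 while using the same buffer for playback in both cases.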

Fig. 7 compares the method described in an embodiment of the present application with a normalization scheme that relies on the maximum sample value. Referring to fig. 7, the left waveform is the waveform of the target file, the middle waveform is the waveform obtained by the prior-art normalization scheme that depends on the maximum sample value, and the right waveform is the waveform processed by the sound playing method according to the embodiment of the present application.

The left waveform includes waveforms of a smaller volume portion and a larger volume portion. For example, the target file is a piece of music, the beginning is the sound of singing, and the following large volume part is the drum sound, etc.

After the traditional peak value standardization scheme is adopted, the sound volume of the non-drumming part is still small, and the drumming sound volume is large.

When the sound playing method is adopted to process the audio of the target file, at least two audio segments can be obtained from the audio by using the sliding window: one audio segment is the drumbeat part and the other is the non-drumbeat part. The sliding window is shown as a rectangular box in the figure.

Obviously, the reference sample values of the two audio segments are different, and therefore their uniform coefficients are also different. For the non-drumbeat part, the sampling value of each sampling point is multiplied by the uniform coefficient of the non-drumbeat part so as to increase its volume. For the drumbeat part, the sampling value of each sampling point is multiplied by another uniform coefficient so as to increase its volume, but the amplification is smaller than that of the non-drumbeat part.

In addition, when large volume occurs in the sliding window, for example, the maximum sampling value in the sliding window is greater than a preset sampling value, the sound in the sliding window may not be processed uniformly. Therefore, large volume output is reserved, and the volume output has a larger dynamic range. For example, please refer to fig. 8.

Fig. 8 is a schematic diagram of a large volume sliding window in the sound processing method according to the embodiment of the present application. Please refer to fig. 8: the sound volume of the drumbeat part is relatively large, and the maximum sampling value exceeds a preset sampling value, wherein the preset sampling value is, for example, 0.9 multiplied by the theoretical maximum sampling value overload. At this time, the sampling values of the sampling points in the sliding window may not be scaled, so that the maximum volume output is retained.

The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.

Fig. 9 is a schematic structural diagram of a sound playing apparatus according to an embodiment of the present application. The sound playing apparatus 900 includes: a receiving module 91, a processing module 92, a determining module 93 and a playing module 94.

A receiving module 91, configured to receive a play instruction requesting to play a target file containing audio;

the processing module 92 is configured to determine, in response to the play instruction, sample values of sampling points of an audio segment corresponding to the sliding window in the audio;

a determining module 93, configured to determine a reference sample value from sample values of each sample point of the audio segment, where the reference sample value is a sample value with the largest occurrence number in the sliding window;

the processing module 92 is further configured to process the audio segment according to the reference sample value;

a playing module 94, configured to play the audio clip processed by the processing module 92.

In a possible implementation manner, the determining module 93 is configured to determine a histogram according to sampling values of sampling points in the audio segment, where the histogram is used to indicate the number of times that different sampling values occur; and determining the reference sampling value according to the histogram.
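
The histogram-based choice of the reference sampling value can be sketched as follows. The function name reference_sample is hypothetical, and the sketch makes several assumptions: sampling values are 16-bit signed PCM held in int; one histogram bin is used per distinct value; ties are resolved in favour of the smallest value; and 0 is returned for an empty window or on allocation failure.

```c
#include <stddef.h>
#include <stdlib.h>

#define SAMPLE_MIN (-32768)  /* assumption: 16-bit signed PCM */
#define SAMPLE_MAX 32767

/* Returns the sampling value that occurs most often in
 * samples[0 .. n-1]. Out-of-range values are counted in the nearest
 * edge bin. */
static int reference_sample(const int *samples, size_t n)
{
    size_t bins = (size_t)(SAMPLE_MAX - SAMPLE_MIN + 1);
    unsigned *hist = calloc(bins, sizeof *hist);

    if (hist == NULL || n == 0) {
        free(hist);
        return 0;
    }

    /* Build the histogram: hist[v - SAMPLE_MIN] counts value v. */
    for (size_t i = 0; i < n; i++) {
        int v = samples[i];
        if (v < SAMPLE_MIN) v = SAMPLE_MIN;
        if (v > SAMPLE_MAX) v = SAMPLE_MAX;
        hist[v - SAMPLE_MIN]++;
    }

    /* Pick the bin with the largest count. */
    size_t best = 0;
    for (size_t b = 1; b < bins; b++) {
        if (hist[b] > hist[best]) best = b;
    }

    free(hist);
    return (int)best + SAMPLE_MIN;
}
```

A practical implementation might use wider bins or count magnitudes so that nearby values are grouped together; those choices are outside what the text specifies.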

In a possible implementation, the processing module 92 is configured to determine a target sample value of the audio segment when processing the audio segment according to the reference sample value; determining a unified coefficient according to the target sampling value and the reference sampling value; adjusting sampling values of all sampling points in the audio clip according to the unified coefficient;

the playing module 94 is configured to play the audio clip after adjusting the sampling value.

In a possible implementation manner, before the determining module 93 determines the reference sample value from the sample values of the sampling points in the audio segment, the determining module is further configured to determine a maximum sample value from the sample values of the sampling points in the audio segment, and determine that the maximum sample value is smaller than a preset sample value.

In a possible implementation manner, the playing module 94 is further configured to play the audio segment when the determining module 93 determines that the maximum sampling value is greater than or equal to the preset sampling value.

In a feasible implementation manner, before the determining module 93 determines the sampling value of each sampling point in the audio clip corresponding to the sliding window in the sound file, the determining module is further configured to determine a first sampling point of the audio clip from the sampling points corresponding to the sound file; and determining the audio clip from the sound file according to the length of the sliding window and the first sampling point.

In a possible implementation manner, the processing module 92 is further configured to adjust a media playing volume of the electronic device before processing the audio segment according to the reference sample value.

The sound playing device provided by the embodiment of the application can execute the actions of the electronic equipment in the above embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.

Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic apparatus 1000 includes:

a processor 1001 and a memory 1002;

the memory 1002 stores computer instructions;

the processor 1001 executes the computer instructions stored by the memory 1002, so that the processor 1001 executes the sound playing method as described above.

For a specific implementation process of the processor 1001, reference may be made to the above method embodiments, which have similar implementation principles and technical effects, and details of this embodiment are not described herein again.

Optionally, the electronic device 1000 further includes a communication unit 1003. The processor 1001, the memory 1002, and the communication unit 1003 may be connected by a bus 1004.

An embodiment of the present application further provides a computer-readable storage medium, in which computer instructions are stored, and when executed by a processor, the computer instructions are used to implement the sound playing method described above.

Embodiments of the present application further provide a computer program product, which contains a computer program, and when the computer program is executed by a processor, the sound playing method as described above is implemented.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A sound playing method is applied to an electronic device, and the method comprises the following steps:

2. The method of claim 1, wherein determining a reference sample value from the sample values of the sample points in the audio segment comprises:

determining a histogram according to sampling values of all sampling points in the audio segment, wherein the histogram is used for indicating the occurrence times of different sampling values;

and determining the reference sampling value according to the histogram.

3. The method of claim 1, wherein processing and playing the audio clip according to the reference sample value comprises:

determining a target sample value of the audio segment;

determining a unified coefficient according to the target sampling value and the reference sampling value;

adjusting sampling values of all sampling points in the audio clip according to the unified coefficient;

and playing the audio clip after the sampling value is adjusted.

4. The method according to any of claims 1-3, wherein before determining the reference sample value from the sample values of the sample points in the audio segment, the method further comprises:

determining a maximum sampling value from sampling values of all sampling points in the audio clip;

and determining that the maximum sampling value is smaller than a preset sampling value.

5. The method of claim 4, further comprising:

and when the maximum sampling value is greater than or equal to the preset sampling value, playing the audio clip.

6. The method according to any one of claims 1-3, wherein before determining the sample value of each sample point in the audio segment corresponding to the sliding window in the sound file, the method further comprises:

determining the first sampling point of the audio clip from the sampling points corresponding to the sound file;

and determining the audio clip from the sound file according to the length of the sliding window and the first sampling point.

7. The method of any of claims 1-3, wherein prior to processing the audio clip according to the reference sample values and playing, further comprising:

and adjusting the media playing volume of the electronic equipment.

8. A sound playing apparatus, comprising:

9. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, causes the electronic device to carry out the method of any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.