WO2022161291A1 - Procédé et appareil de recherche audio, dispositif informatique et support de stockage - Google Patents

Procédé et appareil de recherche audio, dispositif informatique et support de stockage

Info

Publication number
WO2022161291A1
WO2022161291A1 (PCT/CN2022/073291; CN2022073291W)
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
hash
hash feature
feature
audio
Prior art date
Application number
PCT/CN2022/073291
Other languages
English (en)
Chinese (zh)
Inventor
吕镇光
Original Assignee
百果园技术(新加坡)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司 filed Critical 百果园技术(新加坡)有限公司
Publication of WO2022161291A1 publication Critical patent/WO2022161291A1/fr

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/61 - Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/686 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings

Definitions

  • the embodiments of the present application relate to the technical field of audio processing, for example, to an audio search method, apparatus, computer device, and storage medium.
  • Users produce multimedia data in many ways, such as making short videos, humming songs, and making recordings.
  • Multimedia data on the Internet grows rapidly, and audio data grows rapidly along with it.
  • the audio data is compared to determine whether the audio data is the same or similar.
  • the audio data is usually sorted by a queuing system, and then the audio data is compared in order.
  • the baseline method is usually used, that is, the audio data has no specific reference standard when sorting, and the audio data is compared one by one.
  • Although the accuracy rate is high, this occupies a lot of resources and consumes much time, resulting in low overall efficiency.
  • the embodiments of the present application propose an audio search method, apparatus, computer equipment, and storage medium, so as to solve the problem of how to improve the efficiency of comparison while maintaining the accuracy of comparison audio data.
  • an embodiment of the present application provides an audio search method, including:
  • the first hash feature is calculated for the first audio data
  • the second hash feature is calculated for a plurality of the second audio data
  • the first hash feature is compared with a plurality of the second hash features in the order to find the second audio data that is the same as or similar to the first audio data.
  • the embodiment of the present application also provides an audio search method, including:
  • the first hash feature is compared with a plurality of the second hash features in the order to determine whether there is second audio data in the plurality of second audio data that is the same as the first audio data or similar;
  • the first audio data is determined to be illegal in response to second audio data being the same as or similar to the first audio data in the plurality of second audio data.
  • an embodiment of the present application also provides an audio search device, including:
  • an audio data determination module configured to determine the first audio data and a plurality of second audio data
  • a hash feature calculation module configured to calculate a first hash feature for the first audio data and a second hash feature for a plurality of the second audio data respectively;
  • an order determination module configured to determine the order in which the plurality of second audio data are arranged according to the density of the plurality of second hash features
  • a hash feature comparison module configured to compare the first hash feature with a plurality of the second hash features in the order to find the second audio data that is the same as or similar to the first audio data.
  • an embodiment of the present application also provides an audio search device, including:
  • an audio data receiving module configured to receive the first audio data uploaded by the client, and calculate a first hash feature for the first audio data
  • the blacklist search module is configured to search for a currently configured blacklist, where a plurality of second audio data are recorded in the blacklist, and a second hash feature has been configured for the plurality of second audio data;
  • an order determination module configured to determine the order in which the plurality of second audio data are arranged according to the density of the plurality of second hash features
  • a hash feature comparison module configured to compare the first hash feature with a plurality of the second hash features in the order to determine whether there is second audio data, among the plurality of second audio data, that is the same as or similar to the first audio data;
  • the illegal audio determination module is configured to determine that the first audio data is illegal in response to the presence of second audio data in the plurality of second audio data that is identical to or similar to the first audio data.
  • an embodiment of the present application further provides a computer device, the computer device comprising:
  • at least one processor, and a memory configured to store at least one program;
  • when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the audio search method according to the first aspect or the second aspect.
  • Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the audio search method according to the first aspect or the second aspect is implemented.
  • FIG. 1 is a flowchart of an audio search method provided in Embodiment 1 of the present application.
  • FIG. 2 is an example diagram of calculating the density of the second hash feature according to Embodiment 1 of the present application;
  • FIG. 3A is an example diagram of a short audio search provided in Embodiment 1 of the present application.
  • FIG. 3B is an example diagram of a long audio search provided in Embodiment 1 of the present application.
  • FIG. 4 is a flowchart of an audio search method provided in Embodiment 2 of the present application.
  • FIG. 5 is a schematic structural diagram of an audio search apparatus according to Embodiment 3 of the present application.
  • FIG. 6 is a schematic structural diagram of an audio search apparatus according to Embodiment 4 of the present application.
  • FIG. 7 is a schematic structural diagram of a computer device according to Embodiment 5 of the present application.
  • FIG. 1 is a flowchart of an audio search method provided in Embodiment 1 of the application. This embodiment is applicable to sorting and comparing audio data according to the density of the hash feature of the audio data.
  • the method can be performed by an audio search device.
  • the audio search apparatus can be implemented by software and/or hardware, and can be configured in computer equipment, such as servers, workstations, personal computers, etc., including the following steps:
  • Step 101 Determine first audio data and a plurality of second audio data.
  • the first audio data and the plurality of second audio data are audio data
  • The audio data can be in the form of songs released by singers, audio data separated from video data such as short videos, movies, and TV dramas, and so on.
  • the format of the audio data may include MP3, WMA, and AAC, which is not limited in this embodiment.
  • The plurality of second audio data are audio data collected in advance in various ways, for example, audio data uploaded by users, audio data purchased from copyright owners, audio data recorded by technicians, audio data crawled from the network by a crawler client, and so on.
  • The plurality of second audio data can form an audio library that provides search services to the outside. The first audio data is the audio data to be searched; that is, the audio library is searched for second audio data that is the same as or similar to the first audio data.
  • the same or similar in this embodiment may refer to the first audio data and the second audio data being the same or similar in whole or in part.
  • Step 102 Calculate a first hash feature for the first audio data and calculate a second hash feature for a plurality of second audio data, respectively.
  • For the first audio data, a hash feature (hash, also known as a fingerprint) can be calculated for it to be used as the feature of the first audio data; this hash feature is recorded as the first hash feature.
  • Similarly, for each second audio data, a hash feature can be calculated for it to be used as the feature of the second audio data; this hash feature is recorded as the second hash feature.
  • The methods of calculating the first hash feature and the second hash feature are the same, that is, the first hash feature is calculated for the first audio data and the second hash features are calculated for the multiple second audio data based on the same method.
  • step 102 may include the following steps:
  • Step 1021 Convert the first audio data into a first spectrogram.
  • The first audio data may be converted into a spectrogram by means of the discrete Fourier transform (DFT), the short-time Fourier transform (STFT), and so on.
  • the horizontal axis of the spectrogram is time and the vertical axis is frequency, so that the first audio data is converted from a time-domain signal to a frequency-domain signal.
  • the spectrogram is denoted as the first spectrogram.
  • In practice, the first audio data may be divided into a plurality of first data blocks (a data block is also known as a window), and the plurality of first data blocks are respectively converted into frequency-domain signals, so that time information is preserved to a certain extent.
  • the parameters of the first audio data are two-channel, 16-bit precision, and 44100 Hz sampling.
  • The data size of 1 s is 44100 × 2 bytes × 2 channels ≈ 176 kB. If 4 kB is selected as the size of a data block, a Fourier transform is performed on about 44 data blocks per second, and such a segmentation density can meet the requirements.
  • Step 1022 Search for a first key point on multiple spectral bands of the first spectrogram according to the energy.
  • The frequency span over which the first audio data has large amplitude may be very wide, possibly ranging from low C (32.70 Hz) to high C (4186.01 Hz).
  • the first spectrogram may be divided into a plurality of spectral bands (also called sub-bands).
  • Key points (frequency peaks) are selected from each subband. For example, the following subbands may be selected: 30 Hz-40 Hz, 40 Hz-80 Hz and 80 Hz-120 Hz as bass subbands (bass guitars and other instruments have their fundamental frequencies in the bass subbands), and 120 Hz-180 Hz and 180 Hz-300 Hz as the midrange and treble subbands respectively (the fundamental frequencies of vocals and most other instruments appear in these two subbands).
  • a key point can be selected according to the energy, which is recorded as the first key point for the convenience of distinction.
  • For example, the point with the highest amplitude (that is, the highest energy) in each subband can be selected as the first key point.
  • Step 1023 Generate a first hash feature of the first audio data based on the first key point.
  • the first key point of each data block constitutes the signature of this frame of audio data, and the signatures of different data blocks constitute the first hash feature of the entire first audio data.
  • the first hash feature of the first audio data may be cached in the memory, waiting to be compared with the second hash feature of the second audio data.
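  • The following is a minimal Python sketch of steps 1021 to 1023 (framing, conversion to the frequency domain, and per-subband key-point selection). It is not taken from the patent: the block size, the subband boundaries and the highest-energy-peak rule simply follow the example values given above, and all names are illustrative.

import numpy as np

SUBBANDS = [(30, 40), (40, 80), (80, 120), (120, 180), (180, 300)]  # Hz, example subbands

def hash_feature(samples: np.ndarray, sample_rate: int = 44100,
                 block_size: int = 4096) -> list:
    """Return one signature (tuple of peak frequency bins) per data block."""
    signatures = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        spectrum = np.abs(np.fft.rfft(block))                  # frequency-domain signal
        freqs = np.fft.rfftfreq(block_size, d=1.0 / sample_rate)
        signature = []
        for lo, hi in SUBBANDS:
            idx = np.where((freqs >= lo) & (freqs < hi))[0]
            if idx.size:
                # key point: the highest-energy bin within the subband
                signature.append(int(idx[np.argmax(spectrum[idx])]))
        signatures.append(tuple(signature))
    return signatures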
  • Step 1024 Convert the second audio data into a second spectrogram.
  • the second audio data can be converted into a spectrogram by means of Fourier transform, short-time Fourier transform, etc.
  • The horizontal axis of the spectrogram is time and the vertical axis is frequency, so that the second audio data is converted from a time-domain signal to a frequency-domain signal; for the convenience of distinction, this spectrogram is denoted as the second spectrogram.
  • In practice, the second audio data may also be divided into a plurality of second data blocks (a data block is also known as a window), and the data blocks are converted into frequency-domain signals separately, which preserves time information to a certain extent.
  • Step 1025 Search for a second key point on multiple spectral bands of the second spectrogram according to the energy.
  • The frequency span over which the second audio data has large amplitude may be very wide, possibly ranging from low C (32.70 Hz) to high C (4186.01 Hz).
  • the second spectrogram may be divided into a plurality of spectral bands (also called sub-bands).
  • Key points (frequency peaks) are selected from each subband. For example, the following subbands may be selected: 30 Hz-40 Hz, 40 Hz-80 Hz and 80 Hz-120 Hz as bass subbands (bass guitars and other instruments have their fundamental frequencies in the bass subbands), and 120 Hz-180 Hz and 180 Hz-300 Hz as the midrange and treble subbands respectively (the fundamental frequencies of vocals and most other instruments appear in these two subbands).
  • a key point can be selected according to the energy, which is recorded as the second key point for the convenience of distinction.
  • For example, the point with the highest amplitude (that is, the highest energy) in each subband can be selected as the second key point.
  • Step 1026 Generate a second hash feature of the second audio data based on the second key point.
  • the second key point of each data block constitutes the signature of this frame of audio data, and the signatures of different data blocks constitute the second hash feature of the entire second audio data.
  • the second hash feature of the second audio data can be stored as a key for retrieving the hash table.
  • The second hash feature is usually used as the key of the hash table, and the value pointed to by the key includes the time at which the second hash feature appears in the second audio data and the ID of the second audio data.
  • An example of such a hash table is as follows:

    Second hash feature (Hash Tag)    Time in Seconds    Second audio data (Song)
    30 51 99 121 195                  53.52              Song A
    33 56 92 151 185                  12.32              Song B
    39 26 89 141 251                  15.34              Song C
    32 67 100 128 270                 78.43              Song D
    30 51 99 121 195                  10.89              Song E
    34 57 95 111 200                  54.52              Song A
    34 41 93 161 202                  11.89              Song E
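  • As a minimal sketch of the table above (an assumption about the data layout, not the patented implementation), the second hash feature can serve as a dictionary key whose value lists the time of appearance and the song ID:

from collections import defaultdict

# signature -> [(time_in_seconds, song_id), ...]
hash_table = defaultdict(list)

entries = [
    ((30, 51, 99, 121, 195), 53.52, "Song A"),
    ((33, 56, 92, 151, 185), 12.32, "Song B"),
    ((39, 26, 89, 141, 251), 15.34, "Song C"),
    ((32, 67, 100, 128, 270), 78.43, "Song D"),
    ((30, 51, 99, 121, 195), 10.89, "Song E"),
    ((34, 57, 95, 111, 200), 54.52, "Song A"),
    ((34, 41, 93, 161, 202), 11.89, "Song E"),
]
for signature, seconds, song in entries:
    hash_table[signature].append((seconds, song))

# Retrieval: the feature (30, 51, 99, 121, 195) points to both Song A (53.52 s) and Song E (10.89 s).
print(hash_table[(30, 51, 99, 121, 195)])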
  • the above method for calculating the first hash feature and the second hash feature is only an example.
  • other methods for calculating the first hash feature and the second hash feature may be set according to the actual situation. This embodiment of the present application does not limit this.
  • those skilled in the art can also adopt other methods for calculating the first hash feature and the second hash feature according to actual needs. This is also not restricted.
  • Step 103 Determine the order of arrangement among the plurality of second audio data according to the density of the plurality of second hash features.
  • When the hash features are dense, the comparison accuracy of the hash features is higher; when the hash features are sparse, the comparison accuracy is lower, and different or dissimilar audio data are easily mistaken for the same or similar audio data.
  • In this embodiment, the density (Density) of the second hash features can be counted for each second audio data to represent how dense its second hash features are, and in the queuing system (Queuing System) the density of the second hash features is used as the sorting basis: the plurality of second audio data are sorted according to the density of their second hash features, so as to determine the order among the plurality of second audio data.
  • step 103 includes the following steps:
  • Step 1031 Count the number of overlapping second hash features in multiple local regions.
  • The second audio data can be divided into a plurality of local regions of the same size, and for each local region the number of second hash features overlapping that region is counted separately; taking the local region as the unit area, this count can be regarded as the local density.
  • A second spectrogram of the second audio data may be obtained, where the second spectrogram is obtained by converting the second audio data from time-domain information to frequency-domain information, and the second hash features may be marked on the second spectrogram.
  • Multiple windows of the same size are added to the second spectrogram to represent the ranges of the multiple local regions, so that the number of second hash features is counted within each window and used as the number of second hash features in the corresponding local region.
  • The overlap count in a local region (that is, the local density) can be expressed as follows:

    density(t, t + k) = i

    where i is the number of overlapping second hash features within the window, that is, within the interval from t to t + k, and k is the width of the window.
  • a preset window may be searched, and a window may be added to the second spectrogram at preset time intervals, thereby dividing the second spectrogram into multiple local regions.
  • In one case, the width of the window is equal to the preset time length, that is, there is no overlap between two adjacent windows, which reduces the amount of calculation for the second hash features.
  • In another case, the width of the window is smaller than the preset time length, that is, two adjacent windows partially overlap, which can improve the accuracy of the second hash feature statistics.
  • Step 1032 Generate the density of the second hash feature in the second audio data to which it belongs based on the number of overlaps in the multiple local regions.
  • The numbers of overlapping second hash features in the multiple local regions may be used as a reference to generate the density of the second hash feature in the second audio data to which it belongs.
  • For example, the overlap counts in the plurality of local regions may be compared, and the overlap count in the local region with the largest overlap count is determined as the density of the second hash feature in the second audio data to which it belongs.
  • That is, density = max(i_1, i_2, ..., i_n), where i_1 to i_n are the overlap counts in the n local regions and max is the function that takes the maximum value.
  • As shown in FIG. 2, a window 201, a window 202, a window 203, a window 204, a window 205, a window 206, and a window 207 are added to the second spectrogram of certain second audio data. The number of overlapping second hash features in the window 203 is the highest; therefore, the overlap count in the window 203 can be selected as the density of the second hash features in the second audio data.
  • the above method for calculating the density of the second hash feature is only an example.
  • Other methods for calculating the density of the second hash feature may be set according to actual conditions, for example, sorting the overlap counts from large to small, taking the overlap counts of the top j (j is a positive integer) local regions, and calculating their average value as the density of the second hash feature in the second audio data; this is not restricted in this embodiment of the present application.
  • those skilled in the art may also adopt other methods for calculating the density of the second hash feature according to actual needs, which are not limited in this embodiment of the present application.
  • Step 1033 Sort the plurality of second audio data in descending order according to the density to obtain the order of the plurality of second audio data.
  • The plurality of second audio data may be sorted in descending order according to the density, so as to determine the order of each second audio data; that is, the higher the density of the second hash features, the earlier the second audio data is placed in the order, and conversely, the lower the density of the second hash features, the later the second audio data is placed.
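  • A minimal sketch of steps 1031 to 1033 follows, assuming each second hash feature is represented only by the time (in seconds) at which it appears; the window width and step are illustrative parameters, not values fixed by the patent.

def density(hash_times, duration, window=5.0, step=5.0):
    """Maximum number of hash features falling into any window [t, t + window)."""
    best, t = 0, 0.0
    while t < duration:
        count = sum(1 for h in hash_times if t <= h < t + window)  # local overlap count
        best = max(best, count)                                    # density = max(local counts)
        t += step
    return best

def order_by_density(second_audio):
    """second_audio: song_id -> (hash feature times, duration in seconds).
    Returns song IDs sorted in descending order of hash-feature density."""
    return sorted(second_audio,
                  key=lambda song: density(*second_audio[song]),
                  reverse=True)

# Usage: Song A's hash features are denser, so it is placed first.
second_audio = {"Song A": ([1.0, 2.5, 3.0, 3.2, 9.8], 120.0),
                "Song B": ([5.0, 50.0, 100.0], 1800.0)}
print(order_by_density(second_audio))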
  • Step 104 Compare the first hash feature with a plurality of second hash features in order to find second audio data that is the same as or similar to the first audio data.
  • The second hash features of the second audio data may be sequentially compared with the first hash feature of the first audio data according to the order in which the second audio data are arranged, so as to determine whether the first audio data and each second audio data are the same or similar.
  • If the difference between the second hash feature of certain second audio data and the first hash feature of the first audio data is large, it can be considered that the similarity between that second audio data and the first audio data is low, the first hash feature does not match the second hash feature, and the search continues with the next second audio data.
  • If second audio data that is the same as or similar to the first audio data is found, the search can be stopped.
  • A target position may be determined, where the target position represents the number of second audio data to be compared; the target position is generally much smaller than the total number of second audio data.
  • If the first hash feature matches a second hash feature, it is determined that the first audio data and the second audio data to which that second hash feature belongs are the same or similar.
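  • A minimal sketch of step 104 under these assumptions is shown below; matches stands for whatever similarity test is used between two hash features and is not defined by the source.

def search(first_hash, ordered_second, matches, target_position):
    """ordered_second: list of (song_id, second_hash) already sorted by density."""
    for rank, (song_id, second_hash) in enumerate(ordered_second):
        if rank >= target_position:           # stop criterion: only compare the first m entries
            return None                       # no same or similar second audio data found
        if matches(first_hash, second_hash):  # first hash feature matches a second hash feature
            return song_id                    # same or similar second audio data found
    return None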
  • In this embodiment, the first audio data and a plurality of second audio data are determined, a first hash feature is calculated for the first audio data, and second hash features are calculated for the plurality of second audio data respectively; the order in which the plurality of second audio data are arranged is determined according to the density of the plurality of second hash features, and the first hash feature is compared with the plurality of second hash features in that order to find second audio data that is the same as or similar to the first audio data.
  • Denser hash features can improve the accuracy of comparison. Adjusting the sorting of the audio data according to the density of the hash features increases the probability of finding the same or similar audio data among the first comparisons, so that the accuracy of searching for audio data is improved while the number of comparisons is reduced.
  • In the baseline method, the first audio data is compared with the second audio data one by one, and encountering the matching second audio data early is a matter of chance.
  • the process of searching for the second audio data matching the first audio data consumes a lot of time, and the time complexity is O(N).
  • the queue system A arranges the second audio data according to the absolute number (Absolute Matches) of the second hash feature.
  • the second audio data is placed in a queue, where the second audio data at the front of the queue are most likely to be the best match and those at the back of the queue are less likely to be the correct match.
  • Queue system A can provide a stop criterion: if the first m second audio data in the queue have been compared and no second audio data matching the first audio data is found, the search can be stopped, and a search result indicating that there is no second audio data matching the first audio data is generated.
  • Here m is a positive integer, and m ≪ N (m is much smaller than N, the number of second audio data).
  • The time complexity of queue system A is O(m), and O(m) ≪ O(N).
  • Although queue system A saves time, it is only effective when the plurality of second audio data have the same duration; when the durations of the plurality of second audio data deviate greatly, the accuracy decreases.
  • the duration of the second audio data A is 2 minutes
  • the duration of the second audio data B is 30 minutes
  • The second audio data B may rank at the front of the queue merely because its duration is so long that the number of its second hash features is greater than the number of second hash features of the second audio data A, pushing the second audio data A to the back of the queue.
  • Queue system B normalizes by the duration of the second audio data (Normalised by Duration), that is, the number of second hash features is divided by the duration, and the second audio data are queued according to the result.
  • This embodiment provides a queue system C, which normalizes according to the density of the second hash features and sorts according to that density, so that a trade-off is made between the absolute number of second hash features and over-normalization by duration.
  • In scenario one (a short audio search, as shown in FIG. 3A), the second audio data are song A (Song A) and song B (Song B), the duration of song A is less than the duration of song B, and the second audio data that actually matches the first audio data is assumed to be song A.
  • the second hash feature is marked on the second spectrogram of song A and the second spectrogram of song B, respectively, and the following data are counted on them:
  • the absolute number of second hash features in song A (727) is less than the absolute number of second hash features in song B (913), so song A ranks after song B.
  • Song A's normalized duration (0.198) is greater than Song B's normalized duration (0.033), so Song A ranks ahead of Song B.
  • the density of the second hash feature in song A (0.266) is greater than the density of the second hash feature in song B (0.067), so song A ranks ahead of song B.
  • In scenario two (a long audio search, as shown in FIG. 3B), the second audio data are song A (Song A) and song B (Song B), the duration of song A is shorter than the duration of song B, and the second audio data that actually matches the first audio data is assumed to be song B.
  • the second hash feature is marked on the second spectrogram of song A and the second spectrogram of song B respectively, and the following data are counted on them:
  • the absolute number of second hash features in song A (347) is less than the absolute number of second hash features in song B (2481), so song A ranks after song B.
  • Song A's normalized duration (0.094) is greater than Song B's normalized duration (0.090), so Song A ranks ahead of Song B.
  • the density of the second hash feature in song A (0.127) is less than the density of the second hash feature in song B (0.182), so song A ranks after song B.
  • In scenario two, the region of song B matching the query has a higher density, the duration of song B is longer, and its absolute number of second hash features is greater than that of song A, while queue system B overcompensates for the duration. As a result, queue system B is valid for scenario one (short audio search) but not for scenario two (long audio search), whereas queue system C is valid for both scenario one (short audio search) and scenario two (long audio search).
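  • The following worked illustration (not from the patent; it only reuses the statistics quoted above) shows how the three queue systems order song A and song B in the two scenarios:

def queue(stats, criterion):
    """Rank songs in descending order of the chosen criterion."""
    return sorted(stats, key=lambda s: stats[s][criterion], reverse=True)

scenario_one = {  # short audio search; the true match is Song A
    "Song A": {"absolute": 727, "normalised": 0.198, "density": 0.266},
    "Song B": {"absolute": 913, "normalised": 0.033, "density": 0.067},
}
scenario_two = {  # long audio search; the true match is Song B
    "Song A": {"absolute": 347, "normalised": 0.094, "density": 0.127},
    "Song B": {"absolute": 2481, "normalised": 0.090, "density": 0.182},
}

for name, stats in (("scenario one", scenario_one), ("scenario two", scenario_two)):
    print(name,
          "A:", queue(stats, "absolute"),    # queue system A: absolute matches
          "B:", queue(stats, "normalised"),  # queue system B: normalised by duration
          "C:", queue(stats, "density"))     # queue system C: density

  • Only queue system C places the true match first in both scenarios: song A in scenario one and song B in scenario two.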
  • FIG. 4 is a flowchart of an audio search method provided in Embodiment 2 of the present application. This embodiment is applicable to the case where audio data is sorted and compared according to the density of the hash feature of the audio data, so as to perform content review.
  • the method may be performed by an audio search apparatus, which may be implemented in software and/or hardware, and may be configured in computer equipment, such as a server, workstation, personal computer, etc., including the following steps:
  • Step 401 Receive first audio data uploaded by a client, and calculate a first hash feature for the first audio data.
  • the computer device acts as a multimedia platform.
  • On one hand, it provides users with audio-based services, such as live programs, short videos, voice conversations, and video conversations; on the other hand, it receives audio-carrying files uploaded by users, such as live broadcast data, short videos, and session information.
  • Content review is performed on the audio-carrying files to intercept those containing pornographic, vulgar, violent, or other prohibited content, so that audio-carrying files that meet the content review standards are released.
  • a streaming real-time system can be set up in the multimedia platform.
  • The user uploads the audio-carrying file to the streaming real-time system in real time through the client, and the streaming real-time system transmits the audio-carrying file in real time to the computer equipment used for content moderation.
  • Alternatively, a database, such as a distributed database, can be set up in the multimedia platform.
  • The user uploads the audio-carrying file to the database through the client, and the computer equipment used for content review reads the audio-carrying file from the database.
  • For a file that carries audio, the first audio data may be separated from the file for content auditing, and a hash feature may be calculated for the first audio data as the first hash feature.
  • the first audio data can be converted into a first spectrogram, a first key point can be searched on a plurality of spectral bands of the first spectrogram according to the energy, and based on the first key point A first hash feature of the first audio data is generated.
  • Step 402 Look up the currently configured blacklist.
  • some audio data containing sensitive content such as pornography, vulgarity, violence, etc. may be recorded in the blacklist as second audio data.
  • the second audio data can be continuously expanded.
  • a hash feature may be calculated for the second audio data as the second hash feature.
  • the second audio data can be converted into a second spectrogram, a second key point is searched on a plurality of spectral bands of the second spectrogram according to the energy, and based on the second key point A second hash feature of the second audio data is generated.
  • a plurality of second audio data are recorded in the blacklist, and each second audio data has been configured with a second hash feature, and the second hash feature may be loaded during content review.
  • Step 403 Determine the order of arrangement among the plurality of second audio data according to the density of the plurality of second hash features.
  • the magnitude of the first audio data uploaded by the client every day can reach tens of millions or even hundreds of millions.
  • The magnitude of the first audio data belonging to the blacklist is only about several thousand, which makes the matching rate against the blacklist low; for example, several thousand blacklisted items out of roughly one hundred million uploads corresponds to a matching rate of about 0.005%.
  • the multimedia platform needs a queue system with low time consumption and high precision to capture the first audio data belonging to the blacklist as much as possible.
  • The baseline method compares the first audio data with all the second audio data in the blacklist. Although the accuracy rate is high, the time complexity is O(N) and the time consumption is high, which is unnecessary: because 99.995% of the first audio data does not match any second audio data, this is an inefficient search method.
  • Queue system A (Queue System A) arranges the second audio data according to the absolute number of second hash features (Absolute Matches).
  • Queue system B (Queue System B) normalizes by the duration of the second audio data (Normalised by Duration) to arrange the second audio data.
  • This embodiment proposes a queue system C, which allows pruning and uses the density of the second hash features to select the second audio data in the pruned queue more accurately while maintaining efficiency.
  • step 403 includes the following steps:
  • Step 4031 Count the number of overlapping second hash features in multiple local regions.
  • A second spectrogram of the second audio data can be obtained; multiple windows are added on the second spectrogram; the number of second hash features is counted within each window and used as the number of second hash features in the corresponding local region.
  • the width of the window is less than or equal to the preset time length.
  • Step 4032 Generate the density of the second hash feature in the second audio data to which it belongs based on the number of overlaps in the multiple local regions.
  • The overlap counts in the multiple local regions can be compared; the overlap count in the local region with the largest overlap count is determined as the density of the second hash feature in the second audio data to which it belongs.
  • Step 4033 Sort the plurality of second audio data in descending order according to the density to obtain the order of the plurality of second audio data.
  • Step 404 Compare the first hash feature with a plurality of second hash features in order to determine whether there is second audio data in the plurality of second audio data that is identical or similar to the first audio data.
  • the target position may be determined; the first hash feature is compared with the second hash feature located before the target position in order.
  • If the first hash feature matches a second hash feature, it is determined that the first audio data is the same as or similar to the second audio data to which that second hash feature belongs.
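  • A minimal end-to-end sketch of steps 401 to 405 is given below. It reuses the illustrative helpers sketched earlier (hash_feature, order_by_density, search) and an assumed matches test; none of this is the patented implementation.

def review(first_samples, blacklist, matches, target_position=50):
    """blacklist: song_id -> (second hash feature, hash feature times, duration)."""
    first_hash = hash_feature(first_samples)                                  # step 401
    ordered_ids = order_by_density(
        {sid: (times, dur) for sid, (_, times, dur) in blacklist.items()})   # step 403
    ordered_second = [(sid, blacklist[sid][0]) for sid in ordered_ids]
    hit = search(first_hash, ordered_second, matches, target_position)       # step 404
    return "illegal" if hit is not None else "legal"                         # step 405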
  • the baseline method, queue system A, queue system B, and queue system C are tested.
  • A test set consisting of 130 blacklisted second audio data and 1000 first audio data is used, of which 800 first audio data do not belong to the blacklist and 200 first audio data belong to the blacklist.
  • Queuing system B can improve accuracy relative to queuing system A, but at the expense of lowering the push rate.
  • Queue system C can provide high push rate and precision at the same time, and the time consumption is very small.
  • Step 405 If there is second audio data, among the plurality of second audio data, that is the same as or similar to the first audio data, determine that the first audio data is illegal.
  • If the first audio data is not the same as or similar to any second audio data in the blacklist, it can be determined that the first audio data is legal and passes this content review; other content reviews can then be performed according to business requirements, or the first audio data can be released to the public.
  • If the first audio data is the same as or similar to certain second audio data in the blacklist, it can be determined that the first audio data is illegal; it cannot pass the content review and cannot be released to the public, and corresponding prompt information is generated and sent to the client.
  • In addition, the user who logs in to the client can be banned or frozen.
  • In this embodiment, the manner in which the second hash features are calculated for the second audio data, the manner in which the second audio data are sorted based on the density of the second hash features, and the manner in which the first hash feature is compared with the second hash features are basically similar to those in the first embodiment, so the description here is relatively brief; for the relevant parts, reference can be made to the description of the first embodiment, and details are not repeated in this embodiment.
  • In this embodiment, the first audio data uploaded by the client is received, and a first hash feature is calculated for the first audio data; the currently configured blacklist is looked up, a plurality of second audio data are recorded in the blacklist, and second hash features have been configured for the plurality of second audio data; the order of arrangement among the plurality of second audio data is determined according to the density of the plurality of second hash features; and the first hash feature is compared with the plurality of second hash features in that order to determine whether there is second audio data, among the plurality of second audio data, that is the same as or similar to the first audio data.
  • If there is second audio data that is the same as or similar to the first audio data among the plurality of second audio data, the first audio data is determined to be illegal. Denser hash features can improve the accuracy of comparison, and adjusting the sorting of the audio data according to the density of the hash features increases the probability of finding the same or similar audio data during the priority comparisons, so that the number of comparisons is reduced, the push rate of searching for audio data is improved, and the efficiency of content review is improved.
  • FIG. 5 is a structural block diagram of an audio search apparatus provided in Embodiment 3 of the present application, including the following modules:
  • the audio data determination module 501 is configured to determine the first audio data and a plurality of second audio data
  • the hash feature calculation module 502 is configured to calculate a first hash feature for the first audio data and a second hash feature for a plurality of the second audio data respectively;
  • the order determination module 503 is configured to determine the order of arrangement among the plurality of the second audio data according to the density of the plurality of the second hash features;
  • the hash feature comparison module 504 is configured to compare the first hash feature with a plurality of the second hash features in the order to find the second audio data that is the same as or similar to the first audio data.
  • the audio data determination module 501 includes:
  • a first spectrogram conversion module configured to convert the first audio data into a first spectrogram
  • a first key point search module configured to search for a first key point on a plurality of frequency spectrum bands of the first spectrogram according to energy
  • a first hash feature generation module configured to generate a first hash feature of the first audio data based on the first key point
  • a second spectrogram conversion module configured to convert each second audio data into a second spectrogram
  • a second key point searching module configured to search for a second key point on a plurality of spectral bands of the second spectrogram according to energy
  • a second hash feature generation module configured to generate a second hash feature of each of the second audio data based on the second key point.
  • the ranking determining module 503 includes:
  • a local quantity statistics module set to count the overlapping quantity of each second hash feature in multiple local areas
  • a local density generation module configured to generate the density of each second hash feature in the second audio data to which it belongs based on the number of overlaps in a plurality of the local regions
  • the audio sequence determination module is configured to sort the plurality of second audio data in descending order according to the density to obtain the sequence of the plurality of second audio data.
  • the local quantity statistics module includes:
  • a spectrogram acquisition module configured to acquire a second spectrogram of the second audio data to which each second hash feature belongs
  • a window adding module configured to add multiple windows on the second spectrogram
  • the window number statistics module is configured to count the number of each second hash feature in a plurality of the windows respectively, as the number of each second hash feature in a plurality of local areas.
  • the window adding module includes:
  • Window search module set to search for preset windows
  • a time adding module configured to add the window on the second spectrogram every preset time interval.
  • the width of the window is less than or equal to the length of the preset time.
  • the local density generation module includes:
  • a quantity comparison module configured to compare the overlapping quantities in a plurality of the local regions
  • a quantity value module configured to, if the overlap count in a certain local region is the largest, determine the overlap count in the local region with the largest overlap count as the density of each second hash feature in the second audio data to which it belongs.
  • the hash feature comparison module 504 includes:
  • the target position determination module is set to determine the target position
  • a partial feature comparison module configured to compare the first hash feature with the second hash feature located before the target position in the order
  • a search and determination module configured to determine that the first audio data and the second audio data to which the second hash feature belongs are identical or similar if the first hash feature matches the second hash feature.
  • the audio search apparatus provided by the embodiment of the present application can execute the audio search method provided by any embodiment of the present application, and has functional modules corresponding to the execution method.
  • FIG. 6 is a structural block diagram of an audio search apparatus provided in Embodiment 4 of the present application, including the following modules:
  • the audio data receiving module 601 is configured to receive the first audio data uploaded by the client, and calculate the first hash feature for the first audio data;
  • the blacklist search module 602 is configured to search for a currently configured blacklist, where a plurality of second audio data are recorded in the blacklist, and a second hash feature has been configured for the plurality of second audio data;
  • an order determination module 603, configured to determine the order of arrangement among a plurality of the second audio data according to the density of the plurality of second hash features
  • the hash feature comparison module 604 is configured to compare the first hash feature with a plurality of the second hash features in the order to determine whether there is second audio data, among the plurality of second audio data, that is the same as or similar to the first audio data;
  • the illegal audio determination module 605 is configured to determine that the first audio data is illegal if there is second audio data in the plurality of second audio data that is the same as or similar to the first audio data.
  • the audio data receiving module 601 includes:
  • a first spectrogram conversion module configured to convert the first audio data into a first spectrogram
  • a first key point search module configured to search for a first key point on a plurality of frequency spectrum bands of the first spectrogram according to energy
  • a first hash feature generation module configured to generate a first hash feature of the first audio data based on the first key point.
  • a second spectrogram conversion module configured to convert each second audio data into a second spectrogram
  • a second key point searching module configured to search for a second key point on a plurality of spectral bands of the second spectrogram according to energy
  • a second hash feature generation module configured to generate a second hash feature of each of the second audio data based on the second key point.
  • the ranking determining module 603 includes:
  • a local quantity statistics module set to count the overlapped quantity of each second hash feature in multiple local areas
  • a local density generation module configured to generate the density of each second hash feature in the second audio data to which it belongs based on the number of overlaps in a plurality of the local regions
  • the audio sequence determination module is configured to sort the plurality of second audio data in descending order according to the density to obtain the sequence of the plurality of second audio data.
  • the local quantity statistics module includes:
  • a spectrogram acquisition module configured to acquire a second spectrogram of the second audio data to which each second hash feature belongs
  • a window adding module configured to add multiple windows on the second spectrogram
  • the window number statistics module is configured to count the number of each second hash feature in a plurality of the windows respectively, as the number of each second hash feature in a plurality of local areas.
  • the window adding module includes:
  • Window search module set to search for preset windows
  • a time adding module configured to add the window on the second spectrogram every preset time interval.
  • the width of the window is less than or equal to the length of the preset time.
  • the local density generation module includes:
  • a quantity comparison module configured to compare the overlapping quantities in a plurality of the local regions
  • a quantity value module configured to, if the overlap count in a certain local region is the largest, determine the overlap count in the local region with the largest overlap count as the density of each second hash feature in the second audio data to which it belongs.
  • the hash feature comparison module 604 includes:
  • the target position determination module is set to determine the target position
  • a partial feature comparison module configured to compare the first hash feature with the second hash feature located before the target position in the order
  • a search and determination module configured to determine that the first audio data and the second audio data to which the second hash feature belongs are identical or similar if the first hash feature matches the second hash feature.
  • the audio search apparatus provided by the embodiment of the present application can execute the audio search method provided by any embodiment of the present application, and has functional modules corresponding to the execution method.
  • the fifth embodiment of the present application provides a computer device, in which the audio search apparatus provided by any one of the embodiments of the present application can be integrated.
  • FIG. 7 is a schematic structural diagram of a computer device according to Embodiment 5 of the present application.
  • the computer device includes at least one processor 701 and a memory 702, and the memory 702 is configured to store at least one program.
  • When the at least one program is executed by the at least one processor 701, the at least one processor 701 implements the audio search method described in any embodiment of the present application.
  • Embodiment 6 of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, each process of the above audio search method is implemented, which is not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application relate to an audio search method and apparatus, a computer device, and a storage medium. The method comprises: determining first audio data and a plurality of pieces of second audio data; calculating a first hash feature for the first audio data and calculating second hash features for the plurality of pieces of second audio data, respectively; determining an arrangement order of the plurality of pieces of second audio data according to densities of the plurality of second hash features; and comparing the first hash feature with the plurality of second hash features in that order to search for second audio data identical or similar to the first audio data.
PCT/CN2022/073291 2021-01-28 2022-01-21 Procédé et appareil de recherche audio, dispositif informatique et support de stockage WO2022161291A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110119351.4 2021-01-28
CN202110119351.4A CN112784098A (zh) 2021-01-28 2021-01-28 一种音频搜索方法、装置、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2022161291A1 true WO2022161291A1 (fr) 2022-08-04

Family

ID=75759439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/073291 WO2022161291A1 (fr) 2021-01-28 2022-01-21 Procédé et appareil de recherche audio, dispositif informatique et support de stockage

Country Status (2)

Country Link
CN (1) CN112784098A (fr)
WO (1) WO2022161291A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784098A (zh) * 2021-01-28 2021-05-11 百果园技术(新加坡)有限公司 一种音频搜索方法、装置、计算机设备和存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915403A (zh) * 2015-06-01 2015-09-16 腾讯科技(北京)有限公司 一种信息处理方法及服务器
CN109189978A (zh) * 2018-08-27 2019-01-11 广州酷狗计算机科技有限公司 基于语音消息进行音频搜索的方法、装置及存储介质
CN110019921A (zh) * 2017-11-16 2019-07-16 阿里巴巴集团控股有限公司 音频与属性的关联方法及装置、音频搜索方法及装置
CN112784098A (zh) * 2021-01-28 2021-05-11 百果园技术(新加坡)有限公司 一种音频搜索方法、装置、计算机设备和存储介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100555287C (zh) * 2007-09-06 2009-10-28 腾讯科技(深圳)有限公司 互联网音乐文件排序方法、***和搜索方法及搜索引擎
US8463719B2 (en) * 2009-03-11 2013-06-11 Google Inc. Audio classification for information retrieval using sparse features
CN103971689B (zh) * 2013-02-04 2016-01-27 腾讯科技(深圳)有限公司 一种音频识别方法及装置
CN103440313B (zh) * 2013-08-27 2018-10-16 复旦大学 基于音频指纹特征的音乐检索***
CN107526846B (zh) * 2017-09-27 2021-09-24 百度在线网络技术(北京)有限公司 频道排序模型的生成、排序方法、装置、服务器和介质
CN111274360A (zh) * 2020-01-20 2020-06-12 深圳五洲无线股份有限公司 智能语音问答的答案提取方法、录入方法及智能设备
CN111462775B (zh) * 2020-03-30 2023-11-03 腾讯科技(深圳)有限公司 音频相似度确定方法、装置、服务器及介质
CN111597379B (zh) * 2020-07-22 2020-11-03 深圳市声扬科技有限公司 音频搜索方法、装置、计算机设备和计算机可读存储介质
CN112256911A (zh) * 2020-10-21 2021-01-22 腾讯音乐娱乐科技(深圳)有限公司 一种音频匹配方法、装置和设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915403A (zh) * 2015-06-01 2015-09-16 腾讯科技(北京)有限公司 一种信息处理方法及服务器
CN110019921A (zh) * 2017-11-16 2019-07-16 阿里巴巴集团控股有限公司 音频与属性的关联方法及装置、音频搜索方法及装置
CN109189978A (zh) * 2018-08-27 2019-01-11 广州酷狗计算机科技有限公司 基于语音消息进行音频搜索的方法、装置及存储介质
CN112784098A (zh) * 2021-01-28 2021-05-11 百果园技术(新加坡)有限公司 一种音频搜索方法、装置、计算机设备和存储介质

Also Published As

Publication number Publication date
CN112784098A (zh) 2021-05-11

Similar Documents

Publication Publication Date Title
US10497378B2 (en) Systems and methods for recognizing sound and music signals in high noise and distortion
Wang The Shazam music recognition service
US20200257722A1 (en) Method and apparatus for retrieving audio file, server, and computer-readable storage medium
US9798513B1 (en) Audio content fingerprinting based on two-dimensional constant Q-factor transform representation and robust audio identification for time-aligned applications
US9092518B2 (en) Automatic identification of repeated material in audio signals
Cano et al. Robust sound modeling for song detection in broadcast audio
JP5907511B2 (ja) オーディオメディア認識のためのシステム及び方法
US20160132600A1 (en) Methods and Systems for Performing Content Recognition for a Surge of Incoming Recognition Queries
CN106802960B (zh) 一种基于音频指纹的分片音频检索方法
US8706276B2 (en) Systems, methods, and media for identifying matching audio
CN1759396A (zh) 改进的数据检索方法和***
MXPA05010665A (es) Sistema y metodo para acelerar busquedas de base de datos para multiples corrientes de datos sincronizados.
CN108447501B (zh) 一种云存储环境下基于音频字的盗版视频检测方法与***
CN108197319A (zh) 一种基于时频局部能量的特征点的音频检索方法和***
WO2016189307A1 (fr) Procédé d'identification d'audio
WO2022161291A1 (fr) Procédé et appareil de recherche audio, dispositif informatique et support de stockage
George et al. Scalable and robust audio fingerprinting method tolerable to time-stretching
WO2022194277A1 (fr) Procédé et appareil de traitement d'empreinte audio, et dispositif informatique et support de stockage
Kekre et al. A review of audio fingerprinting and comparison of algorithms
Bisio et al. Opportunistic estimation of television audience through smartphones
Tzanetakis Audio-based gender identification using bootstrapping
Senevirathna et al. Radio Broadcast Monitoring to Ensure Copyright Ownership
Jie et al. Improved algorithms of music information retrieval based on audio fingerprint
Medina et al. Audio fingerprint parameterization for multimedia advertising identification
Qian et al. A novel algorithm for audio information retrieval based on audio fingerprint

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22745166

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22745166

Country of ref document: EP

Kind code of ref document: A1