CN117854518A - Method and device for realizing voice encoding and decoding and transcoding by WebGPU - Google Patents

Publication number: CN117854518A
Authority: CN (China)
Application number: CN202410248002.6A
Other languages: Chinese (zh)
Inventor: 曾胜群
Current assignee: LOLAAGE TECHNOLOGIES Inc
Original assignee: LOLAAGE TECHNOLOGIES Inc
Legal status: Pending
Prior art keywords: voice, file, transcoding, decoding, graphics card
Events: application filed by LOLAAGE TECHNOLOGIES Inc; priority to CN202410248002.6A; publication of CN117854518A
Landscapes: Telephonic Communication Services (AREA)

Abstract

The invention relates to the technical field of voice processing and discloses a method for implementing voice encoding, decoding and transcoding using WebGPU. The method comprises: performing encoding initialization, decoding initialization and transcoding initialization, respectively, on a preset primary processing component according to the graphics card computing power obtained through the browser, to obtain a voice processing component; performing anomaly monitoring on the voice processing component to obtain a fault-tolerance status, and performing a snapshot restart recovery operation on the voice processing component according to the fault-tolerance status; performing voice encoding, voice decoding and voice transcoding on a pre-acquired voice file using the voice processing component to obtain a voice processing file; acquiring the real-time graphics card load through the browser, and generating the graphics card occupancy rate from the graphics card computing power and the real-time graphics card load; and dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate. The invention also provides a device for implementing voice encoding, decoding and transcoding using WebGPU. The invention can improve the efficiency of voice processing in the browser.

Description

Method and device for realizing voice encoding and decoding and transcoding by WebGPU
Technical Field
The invention relates to the technical field of voice processing, and in particular to a method and a device for implementing voice encoding, decoding and transcoding using WebGPU.
Background
With the rapid development of Internet technology in recent years, developers in the field of voice communication have introduced advanced schemes based on various voice codec standards, such as Speex, AMR, G.711, AAC and Opus. Among these, AAC and Opus are favored because they significantly reduce transmission bandwidth and storage space while maintaining high voice quality.
However, voice encoding, decoding and transcoding in Web browsers still face adaptation challenges. Most applications still rely on the browser to perform software decoding, encoding and transcoding, which often leaves the Web browser short of performance. This shows up as stuttering during voice stream playback, browser crashes and similar problems, which degrade the experience of online voice services and make voice processing in the browser inefficient.
Disclosure of Invention
The invention provides a method and a device for implementing voice encoding, decoding and transcoding using WebGPU, with the main aim of solving the problem of low efficiency when a browser performs voice processing.
To achieve the above object, the present invention provides a method for implementing voice encoding, decoding and transcoding using WebGPU, including:
performing encoding initialization, decoding initialization and transcoding initialization, respectively, on a preset primary processing component according to the graphics card computing power obtained through the browser, to obtain a voice processing component;
performing anomaly monitoring on the voice processing component to obtain a fault-tolerance status, and performing a snapshot restart recovery operation on the voice processing component according to the fault-tolerance status;
performing voice encoding, voice decoding and voice transcoding on a pre-acquired voice file using the voice processing component to obtain a voice processing file;
acquiring the real-time graphics card load through the browser, and generating the graphics card occupancy rate from the graphics card computing power and the real-time graphics card load;
and dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate.
Optionally, obtaining the graphics card computing power through the browser includes:
acquiring a graphics card computing power interface through the browser, and acquiring a graphics card adapter through that interface;
sending a device request to the browser according to the graphics card adapter to obtain the graphics card device name;
and querying the browser for computing power according to the graphics card device name to obtain the graphics card computing power.
Optionally, performing the encoding initialization, decoding initialization and transcoding initialization operations, respectively, on the preset primary processing component to obtain a voice processing component includes:
performing multi-level encoding unit initialization on a preset primary processing component to obtain a voice encoding component;
performing multi-level decoding unit initialization on the primary processing component to obtain a voice decoding component;
performing multi-level transcoding unit initialization on the primary processing component to obtain a voice transcoding component;
and integrating the voice encoding component, the voice decoding component and the voice transcoding component into a voice processing component.
Optionally, performing a snapshot restart recovery operation on the voice processing component according to the fault-tolerance status includes:
determining whether the fault-tolerance status is an abnormal state;
if not, returning to the step of performing anomaly monitoring on the voice processing component to obtain the fault-tolerance status;
if so, saving a working snapshot of the voice processing component to obtain a real-time working snapshot;
restarting the voice processing component, and restoring the restarted voice processing component from the real-time working snapshot.
Optionally, performing voice encoding, voice decoding and voice transcoding on a pre-acquired voice file using the voice processing component to obtain a voice processing file includes:
grouping the pre-acquired voice files by processing demand to obtain a file to be encoded, a file to be decoded and a file to be transcoded;
extracting an encoding type from the file to be encoded;
performing adaptive encoding and encoding packaging on the file to be encoded using the voice processing component and the encoding type to obtain a voice encoded file;
identifying the identifier header of the file to be decoded to obtain a decoding type;
performing adaptive decoding and decoding packaging on the file to be decoded using the voice processing component and the decoding type to obtain a voice decoded file;
extracting a transcoding type from the file to be transcoded, and identifying the identifier header in the file to be transcoded to obtain an initial type;
performing adaptive transcoding and transcoding packaging on the file to be transcoded using the voice processing component, the transcoding type and the initial type to obtain a voice transcoded file;
and integrating the voice encoded file, the voice decoded file and the voice transcoded file into a voice processing file.
Optionally, performing adaptive encoding and encoding packaging on the file to be encoded using the voice processing component and the encoding type to obtain a voice encoded file includes:
selecting the voice encoding component from the voice processing component, and taking the encoding unit corresponding to the encoding type in the voice encoding component as the target encoding unit;
extracting the voice data to be encoded from the file to be encoded;
encoding the voice data to be encoded using the target encoding unit to obtain an encoded voice file;
adding an encoding type identifier header to the encoded voice file according to the encoding type to obtain a standard encoded file;
and performing struct assembly on the standard encoded file to obtain the voice encoded file.
Optionally, performing adaptive decoding and decoding packaging on the file to be decoded using the voice processing component and the decoding type to obtain a voice decoded file includes:
selecting the voice decoding component from the voice processing component, and taking the decoding unit corresponding to the decoding type in the voice decoding component as the target decoding unit;
extracting the voice data to be decoded from the file to be decoded;
decoding the voice data to be decoded using the target decoding unit to obtain a decoded voice file and decoded voice frame features;
and performing struct assembly on the decoded voice file and the decoded voice frame features to obtain the voice decoded file.
Optionally, performing adaptive transcoding and transcoding packaging on the file to be transcoded using the voice processing component, the transcoding type and the initial type to obtain a voice transcoded file includes:
selecting the voice transcoding component from the voice processing component;
taking the transcoding unit corresponding to the transcoding type in the voice transcoding component as the target transcoding unit;
extracting the voice data to be transcoded from the file to be transcoded;
transcoding the voice data to be transcoded using the target transcoding unit and the initial type to obtain a transcoded voice file;
and performing struct assembly on the transcoded voice file according to the transcoding type to obtain the voice transcoded file.
Optionally, dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate includes:
determining whether the graphics card occupancy rate is greater than a preset primary load threshold;
if not, returning to the step of acquiring the real-time graphics card load through the browser;
if so, determining whether the graphics card occupancy rate is greater than a preset secondary load threshold;
if not, acquiring the real-time processing tasks of the voice processing component and computing them in parallel on the graphics card to obtain parallel processing tasks;
if so, acquiring the real-time processing tasks of the voice processing component and assigning them priorities to obtain priority processing tasks.
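The two-threshold decision above can be sketched as follows. This is a minimal TypeScript illustration, not the patent's implementation: the threshold values 0.6 and 0.85, the ProcessingTask shape and the rule of ranking real-time tasks first are all assumptions, since the text does not specify them.

```typescript
// Possible outcomes of the two-threshold check described above.
type Adjustment = "keep-monitoring" | "parallelize" | "prioritize";

// Hypothetical task shape; the text does not define one.
interface ProcessingTask {
  id: string;
  realTime: boolean; // e.g. a live voice stream rather than a batch file
  queuedAt: number; // enqueue timestamp in milliseconds
}

// Assumed default thresholds, as fractions of full graphics card capacity.
function decideAdjustment(
  occupancy: number,
  primaryThreshold = 0.6,
  secondaryThreshold = 0.85,
): Adjustment {
  if (occupancy <= primaryThreshold) return "keep-monitoring"; // go back to load sampling
  if (occupancy <= secondaryThreshold) return "parallelize"; // spread work across GPU compute
  return "prioritize"; // heavy load: order tasks by priority
}

// Assumed priority rule: real-time tasks first, then oldest first.
function prioritize(tasks: ProcessingTask[]): ProcessingTask[] {
  return tasks
    .slice()
    .sort((a, b) => Number(b.realTime) - Number(a.realTime) || a.queuedAt - b.queuedAt);
}
```

A caller would sample the occupancy rate periodically and branch on the returned value; the "greater than" tests in the text correspond to the <= checks falling through.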
To solve the above problems, the present invention further provides a device for implementing voice encoding, decoding and transcoding using WebGPU, the device comprising:
an initialization module, configured to obtain the graphics card computing power through the browser and to perform encoding initialization, decoding initialization and transcoding initialization, respectively, on a preset primary processing component to obtain a voice processing component;
an anomaly monitoring module, configured to perform anomaly monitoring on the voice processing component to obtain a fault-tolerance status, and to perform a snapshot restart recovery operation on the voice processing component according to the fault-tolerance status;
a voice processing module, configured to perform voice encoding, voice decoding and voice transcoding on a pre-acquired voice file using the voice processing component to obtain a voice processing file;
a load calculation module, configured to acquire the real-time graphics card load through the browser and to generate the graphics card occupancy rate from the graphics card computing power and the real-time graphics card load;
and a dynamic adjustment module, configured to dynamically adjust the tasks of the voice processing component according to the graphics card occupancy rate.
In the invention, obtaining the graphics card computing power through the browser determines the maximum computing capacity of the graphics card device, which makes it easier to determine the graphics card occupancy rate in real time later. Performing encoding initialization, decoding initialization and transcoding initialization on the preset primary processing component yields a voice processing component built from multi-level encoding, decoding and transcoding units, which meets voice processing requirements in different application scenarios. Monitoring the voice processing component for anomalies to obtain a fault-tolerance status, and performing a snapshot restart recovery operation according to that status, ensures the fault tolerance of the voice processing component during operation and improves its working efficiency. Performing voice encoding, voice decoding and voice transcoding on the pre-acquired voice file with the voice processing component allows the file to be processed efficiently by the several types of encoding, decoding and transcoding units in the component, further improving processing efficiency.
Acquiring the real-time graphics card load through the browser and generating the graphics card occupancy rate from the graphics card computing power and the real-time load enables real-time monitoring of the performance load of the graphics card device, so that its working state is known and the voice processing component can then be adjusted dynamically. Dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate improves the component's working efficiency and hence the efficiency of voice encoding, decoding and transcoding. Therefore, the method and device for implementing voice encoding, decoding and transcoding using WebGPU can solve the problem of low efficiency when the browser performs voice processing.
Drawings
FIG. 1 is a flowchart of a method for implementing voice encoding, decoding and transcoding using WebGPU according to an embodiment of the present invention;
FIG. 2 is a flowchart of initializing a voice processing component according to an embodiment of the present invention;
FIG. 3 is a flowchart of generating a voice encoded file according to an embodiment of the present invention;
FIG. 4 is a functional block diagram of a device for implementing voice encoding, decoding and transcoding using WebGPU according to an embodiment of the present invention.
The realization of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
An embodiment of the present application provides a method for implementing voice encoding, decoding and transcoding using WebGPU. The execution subject of the method includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to perform the method provided by the embodiments of the present application. In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server or a cloud server cluster. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
FIG. 1 is a flowchart of a method for implementing voice encoding, decoding and transcoding using WebGPU according to an embodiment of the present invention. In this embodiment, the method for implementing voice encoding, decoding and transcoding using WebGPU includes:
S1, obtaining the graphics card computing power through the browser, and performing encoding initialization, decoding initialization and transcoding initialization, respectively, on a preset primary processing component to obtain a voice processing component.
In the embodiment of the invention, the browser is a web browser, and the graphics card computing power is the computing capacity of the graphics card of the host device running the browser; it can be measured, for example, in floating-point operations per second (FLOPS).
In the embodiment of the present invention, obtaining the graphics card computing power through the browser includes:
acquiring a graphics card computing power interface through the browser, and acquiring a graphics card adapter through that interface;
sending a device request to the browser according to the graphics card adapter to obtain the graphics card device name;
and querying the browser for computing power according to the graphics card device name to obtain the graphics card computing power.
In detail, the graphics card computing power interface may be WebGPU, a low-level interface that gives Web browsers access to modern graphics and compute capabilities. The graphics card adapter may be obtained with the interface's navigator.gpu.requestAdapter() method, where the graphics card adapter is the hardware device that connects to the display and processes graphics rendering tasks.
Specifically, a device request may be sent to the browser with the adapter's requestDevice method to obtain the graphics card device name, which is the name of the graphics card used by the browser, and the computing power may be queried through the browser by using a device.
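A minimal TypeScript sketch of the adapter and device steps, assuming a browser that exposes navigator.gpu. WebGPU reports adapter information (vendor, architecture, device, description) and limits, but it has no standard query for FLOPS, so the estimatedGflops value below comes from a hypothetical lookup table rather than from the API. The small structural interfaces stand in for the official WebGPU type definitions so that the sketch compiles on its own; navigator.gpu should be structurally compatible with GpuLike.

```typescript
// Structural subsets of the WebGPU types used here, so the sketch needs no extra typings.
interface AdapterInfoLike {
  vendor: string;
  architecture: string;
  device: string;
  description: string;
}

interface AdapterLike {
  info?: AdapterInfoLike; // GPUAdapter.info in current browsers
  limits: { maxComputeWorkgroupSizeX: number; maxStorageBufferBindingSize: number };
  requestDevice(): Promise<unknown>;
}

interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

interface GpuProfile {
  deviceName: string;
  device: unknown; // the GPUDevice later used for compute work
  maxComputeWorkgroupSizeX: number;
  maxStorageBufferBindingSize: number;
  estimatedGflops: number; // 0 when unknown
}

// Hypothetical table: WebGPU does not report FLOPS, so this figure must come from elsewhere.
const HYPOTHETICAL_GFLOPS: Partial<Record<string, number>> = {
  "example-gpu": 5000,
};

async function queryGpuProfile(gpu: GpuLike): Promise<GpuProfile> {
  const adapter = await gpu.requestAdapter(); // navigator.gpu.requestAdapter()
  if (adapter === null) throw new Error("no WebGPU adapter available");
  const device = await adapter.requestDevice(); // adapter.requestDevice()
  // Browsers may leave these strings empty for privacy reasons.
  const deviceName = adapter.info?.description || adapter.info?.device || "unknown";
  return {
    deviceName,
    device,
    maxComputeWorkgroupSizeX: adapter.limits.maxComputeWorkgroupSizeX,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
    estimatedGflops: HYPOTHETICAL_GFLOPS[deviceName] ?? 0,
  };
}
```

In a page this would be called as queryGpuProfile(navigator.gpu) once, at initialization, and the profile kept for the later occupancy calculation.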
In detail, referring to FIG. 2, performing encoding initialization, decoding initialization and transcoding initialization, respectively, on the preset primary processing component to obtain the voice processing component includes:
S21, performing multi-level encoding unit initialization on a preset primary processing component to obtain a voice encoding component;
S22, performing multi-level decoding unit initialization on the primary processing component to obtain a voice decoding component;
S23, performing multi-level transcoding unit initialization on the primary processing component to obtain a voice transcoding component;
S24, integrating the voice encoding component, the voice decoding component and the voice transcoding component into a voice processing component.
Specifically, multi-level encoding unit initialization means initializing a Speex encoding unit, an AMR encoding unit, an AAC encoding unit, a G.711 encoding unit and an Opus encoding unit. The Speex encoding unit may be initialized with the encoder_spex_unit function, the AMR encoding unit with the encoder_acr_unit function, the AAC encoding unit with the encoder_aac_unit function, the G.711 encoding unit with the encoder_g711_unit function, and the Opus encoding unit with the encoder_opus_unit function.
In detail, multi-level decoding unit initialization means initializing a Speex decoding unit, an AMR decoding unit, an AAC decoding unit, a G.711 decoding unit and an Opus decoding unit. The Speex decoding unit may be initialized with the decoder_spex_unit function, the AMR decoding unit with the decoder_acr_unit function, the AAC decoding unit with the decoder_aac_unit function, the G.711 decoding unit with the decoder_g711_unit function, and the Opus decoding unit with the decoder_opus_unit function.
In detail, multi-level transcoding unit initialization means initializing a Speex transcoding unit, an AMR transcoding unit, an AAC transcoding unit, a G.711 transcoding unit and an Opus transcoding unit. The Speex transcoding unit may be initialized with the franscoder_spex_unit function, the AMR transcoding unit with the franscoder_amur_unit function, the AAC transcoding unit with the franscoder_aac_unit function, the G.711 transcoding unit with the franscoder_g711_unit function, and the Opus transcoding unit with the franscoder_opus_unit function.
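Steps S21 to S24 amount to building one unit per codec for each role and bundling the three sets together. The TypeScript sketch below shows only that structure: the ProcessingUnit type is a hypothetical placeholder, and the sketch does not reproduce the encoder_*_unit, decoder_*_unit or franscoder_*_unit functions named above, whose internals the text does not describe.

```typescript
// The five codecs named in the text.
type Codec = "speex" | "amr" | "aac" | "g711" | "opus";
type Role = "encoder" | "decoder" | "transcoder";

// Hypothetical placeholder for an initialized codec unit.
interface ProcessingUnit {
  codec: Codec;
  role: Role;
  ready: boolean;
}

interface VoiceProcessingComponent {
  encoders: Map<Codec, ProcessingUnit>; // S21: voice encoding component
  decoders: Map<Codec, ProcessingUnit>; // S22: voice decoding component
  transcoders: Map<Codec, ProcessingUnit>; // S23: voice transcoding component
}

const CODECS: readonly Codec[] = ["speex", "amr", "aac", "g711", "opus"];

// Multi-level initialization: one unit per codec for the given role.
function initUnits(role: Role): Map<Codec, ProcessingUnit> {
  const units = new Map<Codec, ProcessingUnit>();
  for (const codec of CODECS) {
    // A real unit would load its codec implementation here (assumption).
    units.set(codec, { codec, role, ready: true });
  }
  return units;
}

// S24: integrate the three components into one voice processing component.
function initVoiceProcessingComponent(): VoiceProcessingComponent {
  return {
    encoders: initUnits("encoder"),
    decoders: initUnits("decoder"),
    transcoders: initUnits("transcoder"),
  };
}
```

Keying each component by codec makes the later "select the unit corresponding to the type" steps a single map lookup.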
In the embodiment of the invention, obtaining the graphics card computing power through the browser determines the maximum computing capacity of the graphics card device, so that the graphics card occupancy rate can easily be determined in real time. Performing encoding initialization, decoding initialization and transcoding initialization on the preset primary processing component yields a voice processing component whose multi-level encoding, decoding and transcoding units carry out voice processing, meeting voice processing requirements in different application scenarios.
S2, performing anomaly monitoring on the voice processing component to obtain a fault-tolerance status, and performing a snapshot restart recovery operation on the voice processing component according to the fault-tolerance status.
In the embodiment of the invention, anomaly monitoring means monitoring the voice processing component for errors while it is working. A fault-tolerance mechanism may monitor the component through means such as instrumentation points. The fault-tolerance status is either an abnormal state or a normal state, and the abnormal state covers exceptions such as GPU memory overflow, division-by-zero errors, array out-of-bounds exceptions and null pointer exceptions.
In the embodiment of the present invention, performing a snapshot restart recovery operation on the voice processing component according to the fault-tolerance status includes:
determining whether the fault-tolerance status is an abnormal state;
if not, returning to the step of performing anomaly monitoring on the voice processing component to obtain the fault-tolerance status;
if so, saving a working snapshot of the voice processing component to obtain a real-time working snapshot;
restarting the voice processing component, and restoring the restarted voice processing component from the real-time working snapshot.
Specifically, the browser's localStorage may be used to store the working snapshot of the voice processing component to obtain the real-time working snapshot, where the real-time working snapshot consists of the current values and states of the component's data structures and variables while it is working.
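The monitor, snapshot and restart loop can be sketched in TypeScript as below, assuming that the component's working state is JSON-serializable. The state fields and the storage key are hypothetical. In a browser, window.localStorage satisfies the StorageLike interface. How faults are detected is left to the caller, which passes the caught Error, or null when the status is normal.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical working state; it must be JSON-serializable to fit in localStorage.
interface ComponentState {
  processedFrames: number;
  currentCodec: string;
  pendingFiles: string[];
}

const SNAPSHOT_KEY = "voice-processing-snapshot"; // hypothetical key

class VoiceProcessingComponent {
  state: ComponentState;
  constructor(state?: ComponentState) {
    this.state = state ?? { processedFrames: 0, currentCodec: "opus", pendingFiles: [] };
  }
}

// Save the real-time working snapshot.
function saveSnapshot(storage: StorageLike, component: VoiceProcessingComponent): void {
  storage.setItem(SNAPSHOT_KEY, JSON.stringify(component.state));
}

// Restart: create a fresh component, then restore it from the saved snapshot.
function restartFromSnapshot(storage: StorageLike): VoiceProcessingComponent {
  const restarted = new VoiceProcessingComponent();
  const raw = storage.getItem(SNAPSHOT_KEY);
  if (raw !== null) restarted.state = JSON.parse(raw) as ComponentState;
  return restarted;
}

// fault === null means a normal status: keep the component and continue monitoring.
function handleFaultStatus(
  storage: StorageLike,
  component: VoiceProcessingComponent,
  fault: Error | null,
): VoiceProcessingComponent {
  if (fault === null) return component;
  saveSnapshot(storage, component);
  return restartFromSnapshot(storage);
}
```

A real component would also need to release GPU resources before restarting. This sketch only models the state round trip.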
In the embodiment of the invention, monitoring the voice processing component for anomalies to obtain the fault-tolerance status, and performing a snapshot restart recovery operation according to that status, ensures the fault tolerance of the voice processing component during operation and improves its working efficiency.
S3, performing voice encoding, voice decoding and voice transcoding on the pre-acquired voice file using the voice processing component to obtain the voice processing file.
In the embodiment of the invention, the voice file is a voice-type file that needs to be encoded, decoded or transcoded, and the voice processing file is the file obtained after the voice file has been encoded, decoded or transcoded.
In the embodiment of the present invention, performing voice encoding, voice decoding and voice transcoding on a pre-acquired voice file using the voice processing component to obtain a voice processing file includes:
grouping the pre-acquired voice files by processing demand to obtain a file to be encoded, a file to be decoded and a file to be transcoded;
extracting an encoding type from the file to be encoded;
performing adaptive encoding and encoding packaging on the file to be encoded using the voice processing component and the encoding type to obtain a voice encoded file;
identifying the identifier header of the file to be decoded to obtain a decoding type;
performing adaptive decoding and decoding packaging on the file to be decoded using the voice processing component and the decoding type to obtain a voice decoded file;
extracting a transcoding type from the file to be transcoded, and identifying the identifier header in the file to be transcoded to obtain an initial type;
performing adaptive transcoding and transcoding packaging on the file to be transcoded using the voice processing component, the transcoding type and the initial type to obtain a voice transcoded file;
and integrating the voice encoded file, the voice decoded file and the voice transcoded file into a voice processing file.
In detail, demand grouping means grouping the files in the voice file according to their processing demands, which are encoding, decoding and transcoding. The file to be encoded, the file to be decoded and the file to be transcoded are the files in the voice file that need an encoding, decoding or transcoding operation, respectively, and the encoding type is the type of voice encoding to be applied to the voice data in the file to be encoded.
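Demand grouping is a partition of the input files by their required operation. A minimal TypeScript sketch, assuming that each file already carries a hypothetical demand field (the text does not say where the demand comes from) and, where relevant, a target type:

```typescript
type Demand = "encode" | "decode" | "transcode";

// Hypothetical input record; field names are illustrative only.
interface VoiceFile {
  name: string;
  demand: Demand;
  targetType?: string; // encoding or transcoding type, when relevant
  data: Uint8Array;
}

interface DemandGroups {
  toEncode: VoiceFile[];
  toDecode: VoiceFile[];
  toTranscode: VoiceFile[];
}

// Partition the files by processing demand, preserving input order within each group.
function groupByDemand(files: VoiceFile[]): DemandGroups {
  const groups: DemandGroups = { toEncode: [], toDecode: [], toTranscode: [] };
  for (const file of files) {
    switch (file.demand) {
      case "encode":
        groups.toEncode.push(file);
        break;
      case "decode":
        groups.toDecode.push(file);
        break;
      case "transcode":
        groups.toTranscode.push(file);
        break;
    }
  }
  return groups;
}
```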
Specifically, referring to FIG. 3, performing adaptive encoding and encoding packaging on the file to be encoded using the voice processing component and the encoding type to obtain a voice encoded file includes:
S31, selecting the voice encoding component from the voice processing component, and taking the encoding unit corresponding to the encoding type in the voice encoding component as the target encoding unit;
S32, extracting the voice data to be encoded from the file to be encoded;
S33, encoding the voice data to be encoded using the target encoding unit to obtain an encoded voice file;
S34, adding an encoding type identifier header to the encoded voice file according to the encoding type to obtain a standard encoded file;
S35, performing struct assembly on the standard encoded file to obtain the voice encoded file.
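Steps S31 to S35 can be illustrated with G.711, one of the five codecs listed above, because its μ-law variant is simple enough to show in full. The TypeScript sketch below implements standard G.711 μ-law companding with the usual bias of 0x84 and clip level of 32635. The one-byte type tag standing in for the encoding type identifier header, the "g711-mulaw" key and the EncodedVoiceFile struct are hypothetical, since the text does not define their layout.

```typescript
const BIAS = 0x84; // 132, added before the segment search
const CLIP = 32635; // largest magnitude that stays in range after biasing

// Encode one 16-bit linear PCM sample to an 8-bit G.711 mu-law code.
function linearToMulaw(pcmSample: number): number {
  let sample = pcmSample;
  const sign = (sample >> 8) & 0x80; // 0x80 for negative samples
  if (sign !== 0) sample = -sample;
  if (sample > CLIP) sample = CLIP;
  sample += BIAS;
  // Exponent = (index of highest set bit among bits 14..8) - 7, or 0 if none is set.
  let exponent = 7;
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (sample >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff; // mu-law codes are stored inverted
}

interface EncodingUnit {
  encode(pcm: Int16Array): Uint8Array;
}

// Hypothetical registry and one-byte type tags (stand-ins for the identifier header).
const ENCODING_UNITS = new Map<string, EncodingUnit>([
  ["g711-mulaw", { encode: (pcm) => Uint8Array.from(pcm, (s) => linearToMulaw(s)) }],
]);
const TYPE_TAGS = new Map<string, number>([["g711-mulaw", 0x07]]);

interface EncodedVoiceFile {
  codingType: string;
  payload: Uint8Array; // tag byte followed by encoded samples
}

// S32 is assumed done by the caller: pcm is the extracted voice data.
function encodeVoiceFile(pcm: Int16Array, codingType: string): EncodedVoiceFile {
  const unit = ENCODING_UNITS.get(codingType); // S31: pick the target encoding unit
  const tag = TYPE_TAGS.get(codingType);
  if (unit === undefined || tag === undefined) throw new Error(`no unit for ${codingType}`);
  const body = unit.encode(pcm); // S33: encode the voice data
  const payload = new Uint8Array(body.length + 1);
  payload[0] = tag; // S34: prepend the type identifier
  payload.set(body, 1);
  return { codingType, payload }; // S35: struct assembly
}
```

Each 16-bit sample becomes one byte, which is why G.711 at 8 kHz runs at 64 kbit/s.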
In detail, the voice data to be encoded is the voice data that needs to be encoded, and identifier header identification means recognizing the encoding type identifier header of a voice file, for example "0xF0" or "0xFF".
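Identifier header recognition can be sketched by checking well-known magic numbers. The TypeScript function below is an illustration, not the patent's rule set. It recognizes an ADTS AAC sync word (which begins with the byte 0xFF), the AMR storage-format magic strings from RFC 4867, Opus and Speex identification headers inside the first Ogg page, and G.711 format tags (6 = A-law, 7 = μ-law) in a canonical WAV header whose fmt chunk comes first.

```typescript
type DetectedType = "aac" | "amr" | "opus" | "speex" | "g711" | "unknown";

// True when the ASCII `text` appears in `bytes` starting at `offset`.
function hasAscii(bytes: Uint8Array, offset: number, text: string): boolean {
  if (offset + text.length > bytes.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

function identifyHeader(bytes: Uint8Array): DetectedType {
  // ADTS AAC: 12-bit sync word 0xFFF followed by layer bits 00 (rules out MP3 frames).
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xf6) === 0xf0) return "aac";
  // AMR storage format magic numbers (narrowband and wideband).
  if (hasAscii(bytes, 0, "#!AMR\n") || hasAscii(bytes, 0, "#!AMR-WB\n")) return "amr";
  // Ogg page: 27-byte fixed header, then a segment table of bytes[26] entries, then the packet.
  if (hasAscii(bytes, 0, "OggS") && bytes.length > 26) {
    const packetStart = 27 + bytes[26];
    if (hasAscii(bytes, packetStart, "OpusHead")) return "opus";
    if (hasAscii(bytes, packetStart, "Speex   ")) return "speex";
  }
  // Canonical WAV: format tag is the little-endian uint16 at offset 20.
  if (hasAscii(bytes, 0, "RIFF") && hasAscii(bytes, 8, "WAVE") && hasAscii(bytes, 12, "fmt ") && bytes.length >= 22) {
    const formatTag = bytes[20] | (bytes[21] << 8);
    if (formatTag === 6 || formatTag === 7) return "g711";
  }
  return "unknown";
}
```

Raw G.711 streams carry no header at all, so a real system would also need the type from metadata or from the caller.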
Specifically, performing adaptive decoding and decoding packaging on the file to be decoded using the voice processing component and the decoding type to obtain a voice decoded file includes:
selecting the voice decoding component from the voice processing component, and taking the decoding unit corresponding to the decoding type in the voice decoding component as the target decoding unit;
extracting the voice data to be decoded from the file to be decoded;
decoding the voice data to be decoded using the target decoding unit to obtain a decoded voice file and decoded voice frame features;
and performing struct assembly on the decoded voice file and the decoded voice frame features to obtain the voice decoded file.
Specifically, the decoded voice frame features include the number of channels, the sampling rate and the number of bits per sample. The transcoding type is the voice encoding type that the file to be transcoded should have after transcoding, and the initial type is the current voice encoding type of the file to be transcoded.
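Struct assembly for a decoded file can be sketched as attaching the frame features to the PCM samples. In this TypeScript sketch the DecodedVoiceFile layout is hypothetical. The derived fields use the standard relations byte rate = sampling rate × channels × bits per sample / 8 and duration = samples per channel / sampling rate, with samples assumed to be interleaved.

```typescript
// Decoded voice frame features named in the text.
interface FrameFeatures {
  channels: number;
  sampleRate: number; // Hz
  bitsPerSample: number;
}

// Hypothetical struct layout for the voice decoded file.
interface DecodedVoiceFile {
  pcm: Int16Array; // interleaved samples
  features: FrameFeatures;
  byteRate: number; // bytes per second of raw PCM
  durationSeconds: number;
}

function assembleDecodedFile(pcm: Int16Array, features: FrameFeatures): DecodedVoiceFile {
  const { channels, sampleRate, bitsPerSample } = features;
  if (channels <= 0 || sampleRate <= 0 || bitsPerSample <= 0) {
    throw new RangeError("frame features must be positive");
  }
  return {
    pcm,
    features,
    byteRate: (sampleRate * channels * bitsPerSample) / 8,
    durationSeconds: pcm.length / channels / sampleRate,
  };
}
```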
Specifically, performing adaptive transcoding and transcoding packaging on the file to be transcoded using the voice processing component, the transcoding type and the initial type to obtain a voice transcoded file includes:
selecting the voice transcoding component from the voice processing component;
taking the transcoding unit corresponding to the transcoding type in the voice transcoding component as the target transcoding unit;
extracting the voice data to be transcoded from the file to be transcoded;
transcoding the voice data to be transcoded using the target transcoding unit and the initial type to obtain a transcoded voice file;
and performing struct assembly on the transcoded voice file according to the transcoding type to obtain the voice transcoded file.
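One common way to realize a transcoding unit is to decode from the initial type to linear PCM and then re-encode to the transcoding type. The TypeScript sketch below assumes that pivot-through-PCM design, because the text does not say how its transcoding units work internally. To keep the example short it uses two simple real formats, signed 16-bit little-endian PCM and unsigned 8-bit PCM. The registry keys and the TranscodedVoiceFile struct are hypothetical.

```typescript
// A codec unit that converts between its byte format and 16-bit linear PCM.
interface CodecUnit {
  decode(bytes: Uint8Array): Int16Array;
  encode(pcm: Int16Array): Uint8Array;
}

// Signed 16-bit little-endian PCM.
const pcm16le: CodecUnit = {
  decode(bytes) {
    const out = new Int16Array(bytes.length >> 1);
    for (let i = 0; i < out.length; i++) {
      out[i] = bytes[2 * i] | (bytes[2 * i + 1] << 8); // Int16Array wraps to signed
    }
    return out;
  },
  encode(pcm) {
    const out = new Uint8Array(pcm.length * 2);
    for (let i = 0; i < pcm.length; i++) {
      out[2 * i] = pcm[i] & 0xff;
      out[2 * i + 1] = (pcm[i] >> 8) & 0xff;
    }
    return out;
  },
};

// Unsigned 8-bit PCM, as stored in 8-bit WAV files.
const pcm8: CodecUnit = {
  decode: (bytes) => Int16Array.from(bytes, (b) => (b - 128) << 8),
  encode: (pcm) => Uint8Array.from(pcm, (s) => (s >> 8) + 128),
};

const TRANSCODING_UNITS = new Map<string, CodecUnit>([
  ["pcm16le", pcm16le],
  ["pcm8", pcm8],
]);

interface TranscodedVoiceFile {
  codingType: string; // the transcoding type
  payload: Uint8Array;
}

function transcodeVoiceFile(data: Uint8Array, initialType: string, transcodingType: string): TranscodedVoiceFile {
  const source = TRANSCODING_UNITS.get(initialType);
  const target = TRANSCODING_UNITS.get(transcodingType);
  if (source === undefined || target === undefined) {
    throw new Error(`unsupported transcoding ${initialType} -> ${transcodingType}`);
  }
  const pcm = source.decode(data); // initial type -> linear PCM
  return { codingType: transcodingType, payload: target.encode(pcm) }; // PCM -> target, then struct assembly
}
```

Pivoting through PCM means N codecs need N units rather than N × (N - 1) pairwise converters, at the cost of a full decode on every transcode.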
In the embodiment of the invention, performing voice encoding, voice decoding and voice transcoding on the pre-acquired voice file with the voice processing component allows the voice file to be processed efficiently by the several types of encoding, decoding and transcoding units in the component, which improves voice processing efficiency.
S4, acquiring a real-time graphics card load by using the browser, and generating the graphics card occupancy rate according to the graphics card computing power and the real-time graphics card load.
In the embodiment of the invention, the real-time graphics card load refers to the computing performance currently occupied on the graphics card device used by the voice processing component, and the graphics card occupancy rate refers to the real-time utilization of the graphics card's performance.
In the embodiment of the present invention, acquiring the real-time graphics card load by using the browser includes: creating a graphics context object through the browser; acquiring the graphics card adapter corresponding to the graphics context object; acquiring a standard graphics card computing-power interface in real time through the graphics card adapter; and querying the browser for the real-time load through the standard computing-power interface to obtain the real-time graphics card load.
Specifically, the adapter.requestDevice() function may be used to create the graphics context object, the navigator.gpu.requestAdapter() function may be used to obtain the graphics card adapter corresponding to the graphics context object, WebGLRenderingContext may be used to obtain the standard graphics card computing-power interface, and its getParameter method may be used to perform the real-time load query.
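The adapter and device acquisition named above can be sketched as follows. To keep the sketch runnable outside a browser it is written against small local interfaces that mirror only the WebGPU members it uses (navigator.gpu.requestAdapter(), adapter.requestDevice(), adapter.info, device.limits). The computeBudget value is a hypothetical capacity proxy built from two real WebGPU limits; as far as the published specifications go, neither WebGPU nor WebGL's getParameter exposes a standard GPU-utilization counter, so how the real-time load itself is measured remains an open assumption here.

```typescript
// Local mirrors of the small part of the WebGPU API used here
// (the real declarations live in @webgpu/types).
interface GpuLimitsLike {
  maxComputeInvocationsPerWorkgroup: number;
  maxComputeWorkgroupsPerDimension: number;
}
interface GpuAdapterInfoLike {
  vendor: string;
  architecture: string;
  description: string;
}
interface GpuDeviceLike {
  limits: GpuLimitsLike;
}
interface GpuAdapterLike {
  info: GpuAdapterInfoLike;
  requestDevice(): Promise<GpuDeviceLike>;
}
interface GpuLike {
  requestAdapter(): Promise<GpuAdapterLike | null>;
}

interface GpuCapability {
  name: string;          // best-effort device name from the adapter info
  device: GpuDeviceLike; // device that compute work would be submitted to
  computeBudget: number; // hypothetical static capacity proxy, not measured throughput
}

async function acquireGpu(gpu: GpuLike | undefined): Promise<GpuCapability | null> {
  if (!gpu) return null;                        // browser without WebGPU
  const adapter = await gpu.requestAdapter();   // navigator.gpu.requestAdapter()
  if (!adapter) return null;                    // no suitable adapter available
  const device = await adapter.requestDevice(); // adapter.requestDevice()
  const { vendor, architecture, description } = adapter.info;
  const name = description || [vendor, architecture].filter(Boolean).join(" ") || "unknown";
  // Assumption: workgroup limits serve as a capacity proxy; WebGPU reports limits, not FLOPS.
  const computeBudget =
    device.limits.maxComputeInvocationsPerWorkgroup * device.limits.maxComputeWorkgroupsPerDimension;
  return { name, device, computeBudget };
}
```

In a page, acquireGpu((navigator as any).gpu) would issue the real calls; the cast only avoids a dependency on @webgpu/types.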
Specifically, generating the graphics card occupancy rate according to the graphics card computing power and the real-time graphics card load means dividing the real-time graphics card load by the graphics card computing power.
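As a worked illustration of this division (the symbols and numbers are hypothetical, not from the source), let C be the graphics card computing power and L(t) the real-time load at time t, both expressed in the same unit:

```latex
\rho(t) = \frac{L(t)}{C}, \qquad \text{e.g. } C = 100,\; L(t) = 85 \;\Longrightarrow\; \rho(t) = 0.85 = 85\%
```

The ratio is only meaningful when L(t) and C use the same unit. With the example thresholds of 80% and 90% given for step S5, an occupancy of 85% would lie between the primary and secondary load thresholds.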
In detail, acquiring the real-time graphics card load through the browser and generating the occupancy rate from the computing power and that load allows the performance load of the graphics card device to be monitored in real time, so that the working state of the graphics card is known and the voice processing component can subsequently be adjusted dynamically.
S5, performing dynamic task adjustment on the voice processing component according to the graphics card occupancy rate.
In the embodiment of the invention, dynamic task adjustment refers to adjusting the parallel computing mode or the task processing priority within the voice processing component.
In the embodiment of the present invention, performing dynamic task adjustment on the voice processing component according to the graphics card occupancy rate includes:
determining whether the graphics card occupancy rate is greater than a preset primary load threshold;
if not, returning to the step of acquiring the real-time graphics card load by using the browser;
if so, determining whether the graphics card occupancy rate is greater than a preset secondary load threshold;
if not, acquiring the real-time processing tasks of the voice processing component and performing graphics card parallel computation on them to obtain parallel processing tasks;
if so, acquiring the real-time processing tasks of the voice processing component and setting their priorities to obtain priority processing tasks.
Specifically, the primary load threshold may be 80% and the secondary load threshold may be 90%. Graphics card parallel computation means using the graphics card available to the browser to split a real-time processing task into multiple parallel work subtasks and processing all of those subtasks in parallel on the graphics card. Priority setting means assigning each task a priority according to its urgency or importance.
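The two-threshold decision can be sketched in TypeScript as below, following the branch order in the text literally: at or below the primary threshold the load is simply sampled again, between the thresholds tasks are split for parallel processing, and above the secondary threshold tasks are reordered by urgency. The VoiceTask shape, the 1024-frame chunk size and the function names are hypothetical, and the actual GPU dispatch of the subtasks is not shown.

```typescript
// Example thresholds from the text (80% and 90%), expressed as fractions.
const PRIMARY_LOAD_THRESHOLD = 0.8;
const SECONDARY_LOAD_THRESHOLD = 0.9;

type Adjustment = "keep-monitoring" | "parallelize" | "prioritize";

// Hypothetical shape of a real-time processing task.
interface VoiceTask {
  id: string;
  frames: number;  // amount of audio the task covers
  urgency: number; // higher means more urgent
}

// Two-threshold decision from step S5.
function chooseAdjustment(occupancy: number): Adjustment {
  if (occupancy <= PRIMARY_LOAD_THRESHOLD) return "keep-monitoring"; // go back to sampling the load
  if (occupancy <= SECONDARY_LOAD_THRESHOLD) return "parallelize";
  return "prioritize";
}

// Split one task into subtasks of at most chunkFrames frames for parallel dispatch.
function splitTask(task: VoiceTask, chunkFrames: number): VoiceTask[] {
  const parts: VoiceTask[] = [];
  for (let start = 0, i = 0; start < task.frames; start += chunkFrames, i++) {
    parts.push({
      id: task.id + "#" + i,
      frames: Math.min(chunkFrames, task.frames - start),
      urgency: task.urgency,
    });
  }
  return parts;
}

function adjustTasks(occupancy: number, tasks: VoiceTask[], chunkFrames = 1024): VoiceTask[] {
  const action = chooseAdjustment(occupancy);
  if (action === "parallelize") {
    const parts: VoiceTask[] = [];
    for (const t of tasks) parts.push(...splitTask(t, chunkFrames));
    return parts;
  }
  if (action === "prioritize") {
    // Copy, then sort by urgency (highest first) without mutating the caller's array.
    return tasks.slice().sort((a, b) => b.urgency - a.urgency);
  }
  return tasks; // below the primary threshold: leave tasks unchanged
}
```

Using "not greater than" at each threshold matches the text's "greater than" tests, so an occupancy of exactly 80% still takes the keep-monitoring branch.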
In the embodiment of the invention, dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate improves the working efficiency of the component and thus the efficiency of voice encoding, decoding and transcoding.
According to the invention, acquiring the graphics card computing power through the browser determines the maximum computing power of the graphics card device, which makes it possible to determine the graphics card occupancy rate in real time later. Performing encoding initialization, decoding initialization and transcoding initialization on the preset primary processing component yields a voice processing component built from multi-level encoding, decoding and transcoding units, which meets the voice processing requirements of different application scenarios. Monitoring the voice processing component for anomalies to obtain fault-tolerant anomalies, and performing a snapshot restart recovery on the component according to those anomalies, ensures the component's fault tolerance during operation and improves its working efficiency. Performing voice encoding, voice decoding and voice transcoding on the pre-acquired voice file with the voice processing component yields the voice processing file; because the component contains multiple encoding, decoding and transcoding units of different types, voice files can be processed efficiently, which further improves processing efficiency.
Acquiring the real-time graphics card load through the browser and generating the graphics card occupancy rate from the computing power and that load enables real-time monitoring of the graphics card device's performance load, so that its working state is known and the voice processing component can subsequently be adjusted. Performing dynamic task adjustment on the component according to the occupancy rate then improves the component's working efficiency and the efficiency of voice encoding, decoding and transcoding. Therefore, the method for implementing voice encoding, decoding and transcoding with WebGPU can solve the problem of low efficiency when a browser performs voice processing.
Fig. 4 is a functional block diagram of an apparatus for implementing voice encoding, decoding and transcoding by using WebGPU according to an embodiment of the present invention.
The apparatus 100 of the present invention for implementing voice encoding, decoding and transcoding with WebGPU may be installed in an electronic device. Depending on the functions implemented, the apparatus 100 may include an initialization module 101, an anomaly monitoring module 102, a voice processing module 103, a load calculation module 104 and a dynamic adjustment module 105. A module of the invention, which may also be referred to as a unit, is a series of computer program segments that are stored in the memory of the electronic device, can be executed by its processor, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the initialization module 101 is configured to obtain the graphics card computing power through a browser and to perform encoding initialization, decoding initialization and transcoding initialization on a preset primary processing component to obtain a voice processing component;
the anomaly monitoring module 102 is configured to monitor the voice processing component for anomalies to obtain fault-tolerant anomalies, and to perform a snapshot restart recovery operation on the voice processing component according to the fault-tolerant anomalies;
the voice processing module 103 is configured to perform voice encoding, voice decoding and voice transcoding operations on a pre-acquired voice file by using the voice processing component to obtain a voice processing file;
the load calculation module 104 is configured to acquire a real-time graphics card load by using the browser and to generate the graphics card occupancy rate according to the graphics card computing power and the real-time graphics card load;
the dynamic adjustment module 105 is configured to perform dynamic task adjustment on the voice processing component according to the graphics card occupancy rate.
In detail, each module of the apparatus 100 in this embodiment adopts the same technical means as the method for implementing voice encoding, decoding and transcoding with WebGPU described with reference to Figs. 1 to 3 and produces the same technical effects, which are not repeated here.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a logical functional division, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. An integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The embodiments of the present application may acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application of using a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain optimal results.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. The various units or means recited in the apparatus embodiments may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A method for implementing voice encoding, decoding and transcoding with WebGPU, the method comprising:
performing encoding initialization, decoding initialization and transcoding initialization, respectively, on a preset primary processing component according to the graphics card computing power obtained through a browser, to obtain a voice processing component;
performing anomaly monitoring on the voice processing component to obtain a fault-tolerant anomaly, and performing a snapshot restart recovery operation on the voice processing component according to the fault-tolerant anomaly;
performing voice encoding, voice decoding and voice transcoding operations on a pre-acquired voice file by using the voice processing component to obtain a voice processing file;
acquiring a real-time graphics card load by using the browser, and generating a graphics card occupancy rate according to the graphics card computing power and the real-time graphics card load;
and performing dynamic task adjustment on the voice processing component according to the graphics card occupancy rate.
2. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein obtaining the graphics card computing power through the browser comprises:
acquiring a graphics card computing-power interface through the browser, and acquiring a graphics card adapter through the graphics card computing-power interface;
making a device request to the browser according to the graphics card adapter to obtain a graphics card device name;
and querying the browser for computing power according to the graphics card device name to obtain the graphics card computing power.
3. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein performing encoding initialization, decoding initialization and transcoding initialization, respectively, on the preset primary processing component to obtain the voice processing component comprises:
performing a multi-level encoding unit initialization operation on the preset primary processing component to obtain a voice encoding component;
performing a multi-level decoding unit initialization operation on the primary processing component to obtain a voice decoding component;
performing a multi-level transcoding unit initialization operation on the primary processing component to obtain a voice transcoding component;
and integrating the voice encoding component, the voice decoding component and the voice transcoding component into the voice processing component.
4. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein performing the snapshot restart recovery operation on the voice processing component according to the fault-tolerant anomaly comprises:
determining whether the fault-tolerant anomaly indicates an abnormal state;
if not, returning to the step of performing anomaly monitoring on the voice processing component to obtain the fault-tolerant anomaly;
if so, saving a working snapshot of the voice processing component to obtain a real-time working snapshot;
restarting the voice processing component, and performing a snapshot recovery operation on the restarted voice processing component by using the real-time working snapshot.
5. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein performing voice encoding, voice decoding and voice transcoding operations on the pre-acquired voice file by using the voice processing component to obtain the voice processing file comprises:
grouping the pre-acquired voice files by requirement to obtain a file to be encoded, a file to be decoded and a file to be transcoded;
extracting an encoding type from the file to be encoded;
performing adaptive encoding and encoding packaging operations on the file to be encoded by using the voice processing component and the encoding type to obtain a voice encoded file;
identifying the identification header of the file to be decoded to obtain a decoding type;
performing adaptive decoding and decoding packaging operations on the file to be decoded by using the voice processing component and the decoding type to obtain a voice decoded file;
extracting a transcoding type from the file to be transcoded, and identifying the intermediate identification header of the file to be transcoded to obtain an initial type;
performing adaptive transcoding and transcoding packaging operations on the file to be transcoded by using the voice processing component, the transcoding type and the initial type to obtain a voice transcoded file;
and integrating the voice encoded file, the voice decoded file and the voice transcoded file into the voice processing file.
6. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 5, wherein performing adaptive encoding and encoding packaging operations on the file to be encoded by using the voice processing component and the encoding type to obtain the voice encoded file comprises:
selecting the voice encoding component from the voice processing component, and taking the encoding unit in the voice encoding component that corresponds to the encoding type as a target encoding unit;
extracting the voice data to be encoded from the file to be encoded;
encoding the voice data to be encoded by using the target encoding unit to obtain an encoded voice file;
adding an encoding-type identification header to the encoded voice file according to the encoding type to obtain a standard encoded file;
and assembling the standard encoded file into a struct to obtain the voice encoded file.
7. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 5, wherein performing adaptive decoding and decoding packaging operations on the file to be decoded by using the voice processing component and the decoding type to obtain the voice decoded file comprises:
selecting the voice decoding component from the voice processing component, and taking the decoding unit in the voice decoding component that corresponds to the decoding type as a target decoding unit;
extracting the voice data to be decoded from the file to be decoded;
decoding the voice data to be decoded by using the target decoding unit to obtain a decoded voice file and decoded voice frame features;
and assembling the decoded voice file and the decoded voice frame features into a struct to obtain the voice decoded file.
8. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 5, wherein performing adaptive transcoding and transcoding packaging operations on the file to be transcoded by using the voice processing component, the transcoding type and the initial type to obtain the voice transcoded file comprises:
selecting the voice transcoding component from the voice processing component;
taking the transcoding unit in the voice transcoding component that corresponds to the transcoding type as a target transcoding unit;
extracting the voice data to be transcoded from the file to be transcoded;
transcoding the voice data to be transcoded by using the target transcoding unit and the initial type to obtain a transcoded voice file;
and assembling the transcoded voice file into a struct according to the transcoding type to obtain the voice transcoded file.
9. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein performing dynamic task adjustment on the voice processing component according to the graphics card occupancy rate comprises:
determining whether the graphics card occupancy rate is greater than a preset primary load threshold;
if not, returning to the step of acquiring the real-time graphics card load by using the browser;
if so, determining whether the graphics card occupancy rate is greater than a preset secondary load threshold;
if not, acquiring the real-time processing tasks of the voice processing component and performing graphics card parallel computation on them to obtain parallel processing tasks;
if so, acquiring the real-time processing tasks of the voice processing component and setting their priorities to obtain priority processing tasks.
10. An apparatus for implementing voice encoding, decoding and transcoding with WebGPU, the apparatus comprising:
an initialization module, configured to obtain the graphics card computing power through a browser and to perform encoding initialization, decoding initialization and transcoding initialization, respectively, on a preset primary processing component to obtain a voice processing component;
an anomaly monitoring module, configured to perform anomaly monitoring on the voice processing component to obtain a fault-tolerant anomaly, and to perform a snapshot restart recovery operation on the voice processing component according to the fault-tolerant anomaly;
a voice processing module, configured to perform voice encoding, voice decoding and voice transcoding operations on a pre-acquired voice file by using the voice processing component to obtain a voice processing file;
a load calculation module, configured to acquire a real-time graphics card load by using the browser and to generate a graphics card occupancy rate according to the graphics card computing power and the real-time graphics card load;
and a dynamic adjustment module, configured to perform dynamic task adjustment on the voice processing component according to the graphics card occupancy rate.
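The snapshot restart recovery recited in claim 4 can be sketched as follows. The component class, its state fields and the anomaly shape are hypothetical illustrations rather than the patent's implementation; in a browser, the anomaly signal might come from real WebGPU sources such as the GPUDevice.lost promise or the device's "uncapturederror" event, but that wiring is not shown.

```typescript
// Hypothetical working state of a voice processing component.
interface ComponentState {
  pendingTaskIds: string[];
  processedFrames: number;
}

// Hypothetical result of one anomaly check.
interface FaultTolerantAnomaly {
  abnormal: boolean;
  reason?: string;
}

class VoiceProcessingComponent {
  state: ComponentState = { pendingTaskIds: [], processedFrames: 0 };
  restarts = 0;

  // Working snapshot: an independent copy of the current state.
  snapshot(): ComponentState {
    return { pendingTaskIds: this.state.pendingTaskIds.slice(), processedFrames: this.state.processedFrames };
  }

  // Restart: drop in-memory state. A real component would also recreate its GPU resources.
  restart(): void {
    this.state = { pendingTaskIds: [], processedFrames: 0 };
    this.restarts++;
  }

  // Snapshot recovery: reload a copy of the saved state.
  restore(snap: ComponentState): void {
    this.state = { pendingTaskIds: snap.pendingTaskIds.slice(), processedFrames: snap.processedFrames };
  }
}

// One pass of claim 4. Returns true when a snapshot restart recovery was performed,
// false when the caller should simply continue monitoring.
function handleAnomaly(component: VoiceProcessingComponent, anomaly: FaultTolerantAnomaly): boolean {
  if (!anomaly.abnormal) return false;
  const snap = component.snapshot(); // save the real-time working snapshot
  component.restart();               // restart the component
  component.restore(snap);           // recover it from the snapshot
  return true;
}
```

Copying the task list on both snapshot and restore keeps the saved snapshot independent of later mutations of the live state.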
CN202410248002.6A 2024-03-05 2024-03-05 Method and device for realizing voice encoding and decoding and transcoding by WebGPU Pending CN117854518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410248002.6A CN117854518A (en) 2024-03-05 2024-03-05 Method and device for realizing voice encoding and decoding and transcoding by WebGPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410248002.6A CN117854518A (en) 2024-03-05 2024-03-05 Method and device for realizing voice encoding and decoding and transcoding by WebGPU

Publications (1)

Publication Number Publication Date
CN117854518A (en) 2024-04-09

Family

ID=90534842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410248002.6A Pending CN117854518A (en) 2024-03-05 2024-03-05 Method and device for realizing voice encoding and decoding and transcoding by WebGPU

Country Status (1)

Country Link
CN (1) CN117854518A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1791119A (en) * 2005-12-12 2006-06-21 中兴通讯股份有限公司 Tracking method for mobile communication system signaling message
CN101827242A (en) * 2010-05-10 2010-09-08 南京邮电大学 Method for realizing video phone system based on IPTV set-top box
CN102279752A (en) * 2011-08-31 2011-12-14 北京华电万通科技有限公司 Device and method for rendering ultra-large scene in real time based on Web three-dimension (3D)
CN108733423A (en) * 2018-05-16 2018-11-02 福建天晴数码有限公司 A kind of method and terminal of determining mobile device model
CN109819057A (en) * 2019-04-08 2019-05-28 科大讯飞股份有限公司 A kind of load-balancing method and system
CN111432262A (en) * 2020-02-24 2020-07-17 杭州海康威视数字技术股份有限公司 Page video rendering method and device
CN112988364A (en) * 2021-05-20 2021-06-18 西安芯瞳半导体技术有限公司 Dynamic task scheduling method, device and storage medium
CN113485841A (en) * 2021-07-28 2021-10-08 腾讯科技(深圳)有限公司 Data processing method and device based on edge calculation and readable storage medium
CN115881142A (en) * 2022-11-29 2023-03-31 北京百瑞互联技术股份有限公司 Training method and device for bone conduction speech coding model and storage medium
CN116974872A (en) * 2023-08-03 2023-10-31 北京知道创宇信息技术股份有限公司 GPU card performance testing method and device, electronic equipment and readable storage medium
CN117130749A (en) * 2023-08-30 2023-11-28 合肥善达信息科技有限公司 Method for improving hardware decoding capability of Web player based on WebGPU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination