CN117854518A - Method and device for realizing voice encoding and decoding and transcoding by WebGPU - Google Patents
- Publication number
- CN117854518A (application number CN202410248002.6A)
- Authority
- CN
- China
- Prior art keywords
- voice
- file
- transcoding
- decoding
- display card
- Prior art date
- Legal status
- Pending
Landscapes
- Telephonic Communication Services (AREA)
Abstract
The invention relates to the technical field of voice processing and discloses a method for implementing voice encoding, decoding and transcoding using WebGPU. The method comprises: according to the graphics card computing power obtained through a browser, performing encoding initialization, decoding initialization and transcoding initialization on a preset primary processing component, respectively, to obtain a voice processing component; performing anomaly monitoring on the voice processing component to obtain a fault-tolerance status, and performing a snapshot-based restart and recovery of the voice processing component according to that status; performing voice encoding, voice decoding and voice transcoding on a pre-acquired voice file using the voice processing component to obtain a voice processing file; acquiring the real-time graphics card load through the browser, and generating the graphics card occupancy rate from the graphics card computing power and the real-time graphics card load; and dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate. The invention also provides a device for implementing voice encoding, decoding and transcoding using WebGPU. The invention can improve the efficiency of voice processing in the browser.
Description
Technical Field
The invention relates to the technical field of voice processing, and in particular to a method and a device for implementing voice encoding, decoding and transcoding using WebGPU.
Background
With the rapid development of Internet technology in recent years, developers in the field of voice communication have adopted a variety of voice codec standards, such as Speex, AMR, G.711, AAC and Opus. Among them, AAC and Opus are favored because they significantly reduce transmission bandwidth and storage space while maintaining high voice quality.
However, adapting voice codec and transcoding applications to Web browsers remains challenging. Most applications still rely on the browser to perform software decoding, encoding and transcoding, which often leads to insufficient performance in the Web browser. The symptoms include stuttering playback of voice streams and browser crashes; these limit the user experience of online voice services and make voice processing in the browser inefficient.
Disclosure of Invention
The invention provides a method and a device for implementing voice encoding, decoding and transcoding using WebGPU, with the main aim of solving the problem of low efficiency when a browser performs voice processing.
To achieve the above object, the present invention provides a method for implementing voice encoding, decoding and transcoding using WebGPU, including:
according to the graphics card computing power obtained through the browser, performing encoding initialization, decoding initialization and transcoding initialization on a preset primary processing component, respectively, to obtain a voice processing component;
performing anomaly monitoring on the voice processing component to obtain a fault-tolerance status, and performing a snapshot-based restart and recovery of the voice processing component according to the fault-tolerance status;
performing voice encoding, voice decoding and voice transcoding on a pre-acquired voice file using the voice processing component to obtain a voice processing file;
acquiring a real-time graphics card load through the browser, and generating a graphics card occupancy rate from the graphics card computing power and the real-time graphics card load;
and dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate.
Optionally, obtaining the graphics card computing power through the browser includes:
acquiring a graphics card computing power interface through the browser, and acquiring a graphics card adapter through that interface;
issuing a device request to the browser through the graphics card adapter to obtain the graphics card device name;
and querying the browser for computing power according to the graphics card device name to obtain the graphics card computing power.
Optionally, performing encoding initialization, decoding initialization and transcoding initialization on the preset primary processing component, respectively, to obtain the voice processing component includes:
performing multi-level encoding unit initialization on the preset primary processing component to obtain a voice encoding component;
performing multi-level decoding unit initialization on the primary processing component to obtain a voice decoding component;
performing multi-level transcoding unit initialization on the primary processing component to obtain a voice transcoding component;
and integrating the voice encoding component, the voice decoding component and the voice transcoding component into the voice processing component.
Optionally, performing the snapshot-based restart and recovery of the voice processing component according to the fault-tolerance status includes:
determining whether the fault-tolerance status is an abnormal state;
if not, returning to the step of performing anomaly monitoring on the voice processing component to obtain the fault-tolerance status;
if so, saving a working snapshot of the voice processing component to obtain a real-time working snapshot;
and restarting the voice processing component, and restoring the restarted voice processing component from the real-time working snapshot.
Optionally, performing voice encoding, voice decoding and voice transcoding on the pre-acquired voice file using the voice processing component to obtain the voice processing file includes:
grouping the pre-acquired voice files by processing demand to obtain files to be encoded, files to be decoded and files to be transcoded;
extracting an encoding type from the file to be encoded;
performing adaptive encoding and encoding packaging on the file to be encoded using the voice processing component and the encoding type to obtain a voice encoded file;
identifying the identification header of the file to be decoded to obtain a decoding type;
performing adaptive decoding and decoding packaging on the file to be decoded using the voice processing component and the decoding type to obtain a voice decoded file;
extracting a transcoding type from the file to be transcoded, and identifying the identification header in the file to be transcoded to obtain an initial type;
performing adaptive transcoding and transcoding packaging on the file to be transcoded using the voice processing component, the transcoding type and the initial type to obtain a voice transcoded file;
and integrating the voice encoded file, the voice decoded file and the voice transcoded file into the voice processing file.
Optionally, performing adaptive encoding and encoding packaging on the file to be encoded using the voice processing component and the encoding type to obtain the voice encoded file includes:
selecting the voice encoding component from the voice processing component, and taking the encoding unit corresponding to the encoding type in the voice encoding component as a target encoding unit;
extracting the voice data to be encoded from the file to be encoded;
encoding the voice data to be encoded using the target encoding unit to obtain an encoded voice file;
adding an encoding type identification header to the encoded voice file according to the encoding type to obtain a standard encoded file;
and performing struct assembly on the standard encoded file to obtain the voice encoded file.
Optionally, performing adaptive decoding and decoding packaging on the file to be decoded using the voice processing component and the decoding type to obtain the voice decoded file includes:
selecting the voice decoding component from the voice processing component, and taking the decoding unit corresponding to the decoding type in the voice decoding component as a target decoding unit;
extracting the voice data to be decoded from the file to be decoded;
decoding the voice data to be decoded using the target decoding unit to obtain a decoded voice file and decoded voice frame features;
and performing struct assembly on the decoded voice file and the decoded voice frame features to obtain the voice decoded file.
Optionally, performing adaptive transcoding and transcoding packaging on the file to be transcoded using the voice processing component, the transcoding type and the initial type to obtain the voice transcoded file includes:
selecting the voice transcoding component from the voice processing component;
taking the transcoding unit corresponding to the transcoding type in the voice transcoding component as a target transcoding unit;
extracting the voice data to be transcoded from the file to be transcoded;
transcoding the voice data to be transcoded using the target transcoding unit and the initial type to obtain a transcoded voice file;
and performing struct assembly on the transcoded voice file according to the transcoding type to obtain the voice transcoded file.
Optionally, dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate includes:
determining whether the graphics card occupancy rate is greater than a preset first-level load threshold;
if not, returning to the step of acquiring the real-time graphics card load through the browser;
if so, determining whether the graphics card occupancy rate is greater than a preset second-level load threshold;
if not, acquiring the real-time processing tasks of the voice processing component and distributing them for parallel computation on the graphics card to obtain parallel processing tasks;
if so, acquiring the real-time processing tasks of the voice processing component and assigning them priorities to obtain prioritized processing tasks.
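The two-threshold decision above can be sketched in TypeScript as a pure function. This is a minimal sketch, not the patented implementation: the threshold values, the `VoiceTask` record and the priority policy in `prioritize` (decoding first, then transcoding, then encoding, first-in-first-out within a kind) are hypothetical choices made for illustration.

```typescript
// Outcomes of one adjustment pass, one per branch of the decision above.
type Adjustment = "keep-monitoring" | "parallelize" | "prioritize";

// Preset thresholds as fractions of graphics card capacity (values are hypothetical).
interface LoadThresholds {
  primary: number;   // first-level load threshold
  secondary: number; // second-level load threshold, expected to be >= primary
}

function decideAdjustment(occupancy: number, t: LoadThresholds): Adjustment {
  if (t.secondary < t.primary) {
    throw new RangeError("secondary threshold must not be below the primary threshold");
  }
  if (occupancy <= t.primary) return "keep-monitoring"; // not above first level: sample the load again
  if (occupancy <= t.secondary) return "parallelize";   // above first level only: spread work in parallel
  return "prioritize";                                   // above second level: order work by priority
}

// Hypothetical task record used by the priority branch.
interface VoiceTask {
  id: string;
  kind: "encode" | "decode" | "transcode";
  enqueuedAt: number; // milliseconds
}

// Hypothetical priority policy: decoding (playback-facing) first, then transcoding,
// then encoding; ties are broken first-in-first-out.
function prioritize(tasks: VoiceTask[]): VoiceTask[] {
  const rank: Record<VoiceTask["kind"], number> = { decode: 0, transcode: 1, encode: 2 };
  return [...tasks].sort((a, b) => rank[a.kind] - rank[b.kind] || a.enqueuedAt - b.enqueuedAt);
}
```

Keeping the decision pure makes it testable independently of how the load is sampled.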
To solve the above problem, the present invention further provides a device for implementing voice encoding, decoding and transcoding using WebGPU, the device comprising:
an initialization module, configured to acquire the graphics card computing power through the browser and to perform encoding initialization, decoding initialization and transcoding initialization on a preset primary processing component, respectively, to obtain a voice processing component;
an anomaly monitoring module, configured to perform anomaly monitoring on the voice processing component to obtain a fault-tolerance status, and to perform a snapshot-based restart and recovery of the voice processing component according to the fault-tolerance status;
a voice processing module, configured to perform voice encoding, voice decoding and voice transcoding on a pre-acquired voice file using the voice processing component to obtain a voice processing file;
a load calculation module, configured to acquire a real-time graphics card load through the browser and to generate a graphics card occupancy rate from the graphics card computing power and the real-time graphics card load;
and a dynamic adjustment module, configured to dynamically adjust the tasks of the voice processing component according to the graphics card occupancy rate.
According to the invention, obtaining the graphics card computing power through the browser determines the maximum computing power of the graphics card device, which makes it possible to determine the graphics card occupancy rate in real time later. Performing encoding, decoding and transcoding initialization on the preset primary processing component yields a voice processing component built from multi-level encoding, decoding and transcoding units, which meets the voice processing requirements of different application scenarios. Monitoring the voice processing component for anomalies to obtain a fault-tolerance status, and performing a snapshot-based restart and recovery according to that status, ensures fault tolerance while the component is working and improves its working efficiency. Using the voice processing component to encode, decode and transcode the pre-acquired voice files allows them to be processed efficiently by the multiple encoding, decoding and transcoding units of different types in the component, which further improves processing efficiency.
Acquiring the real-time graphics card load through the browser and generating the graphics card occupancy rate from the computing power and the real-time load enables real-time monitoring of the performance load and working state of the graphics card device, which supports the subsequent dynamic adjustment of the voice processing component. Dynamically adjusting the tasks of the voice processing component according to the occupancy rate improves the component's working efficiency and thus the efficiency of voice encoding, decoding and transcoding. Therefore, the method and the device for implementing voice encoding, decoding and transcoding using WebGPU can solve the problem of low efficiency when the browser performs voice processing.
Drawings
FIG. 1 is a flowchart of a method for implementing voice encoding, decoding and transcoding using WebGPU according to an embodiment of the present invention;
FIG. 2 is a flowchart of initializing the voice processing component according to an embodiment of the present invention;
FIG. 3 is a flowchart of generating a voice encoded file according to an embodiment of the present invention;
FIG. 4 is a functional block diagram of a device for implementing voice encoding, decoding and transcoding using WebGPU according to an embodiment of the present invention.
The objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
An embodiment of the present application provides a method for implementing voice encoding, decoding and transcoding using WebGPU. The execution subject of the method includes, but is not limited to, at least one of a server, a terminal and other electronic devices that can be configured to execute the method provided by the embodiments of the present application. In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server or a cloud server cluster. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data and artificial intelligence platforms.
FIG. 1 shows a flowchart of a method for implementing voice encoding, decoding and transcoding using WebGPU according to an embodiment of the present invention. In this embodiment, the method includes:
S1, acquiring the graphics card computing power through the browser, and performing encoding initialization, decoding initialization and transcoding initialization on a preset primary processing component, respectively, to obtain a voice processing component.
In the embodiment of the invention, the browser is a Web browser, and the graphics card computing power is the computing capability of the graphics card of the host device on which the browser runs; it may be measured, for example, in floating-point operations per second.
In the embodiment of the present invention, obtaining the graphics card computing power through the browser includes:
acquiring a graphics card computing power interface through the browser, and acquiring a graphics card adapter through that interface;
issuing a device request to the browser through the graphics card adapter to obtain the graphics card device name;
and querying the browser for computing power according to the graphics card device name to obtain the graphics card computing power.
In detail, the graphics card computing power interface may be WebGPU, a low-level Web API that gives Web browsers access to modern GPU graphics and compute functionality. The graphics card adapter may be obtained with the navigator.gpu.requestAdapter() method of this interface, where the graphics card adapter represents the hardware device that drives the display and processes graphics rendering tasks.
Specifically, a device request may be issued to the browser with the requestDevice method of the graphics card adapter to obtain the graphics card device name, i.e. the name of the graphics card used by the browser, and the computing power may be queried from the browser by using a device.
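The adapter, device and capability query chain can be sketched in TypeScript against minimal structural types modeled on WebGPU's GPU, GPUAdapter, GPUAdapterInfo and GPUSupportedLimits, so that it also runs outside a browser with a mock. Two assumptions should be stated plainly. First, the standard WebGPU API does not report a FLOPS figure, so `capacityScore` is a hypothetical relative proxy built from two adapter compute limits, not a performance measurement. Second, the hardware description is read from the adapter info (exposed as `adapter.info` in current browsers), because a GPUDevice carries only a page-assigned label.

```typescript
// Minimal structural types modeled on WebGPU's GPU, GPUAdapter, GPUAdapterInfo and
// GPUSupportedLimits, so the sketch can run (with a mock) outside a browser.
interface AdapterInfoLike {
  vendor: string;
  architecture: string;
  device: string;
  description: string;
}
interface LimitsLike {
  maxComputeInvocationsPerWorkgroup: number;
  maxComputeWorkgroupsPerDimension: number;
}
interface DeviceLike {
  label: string;
}
interface AdapterLike {
  info?: AdapterInfoLike; // exposed as adapter.info in current browsers
  limits: LimitsLike;
  requestDevice(): Promise<DeviceLike>;
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

interface GpuCapability {
  deviceName: string;
  capacityScore: number;
  device: DeviceLike;
}

// Hypothetical relative capacity score: WebGPU reports no FLOPS figure, so this simply
// multiplies two hardware compute limits. It is a proxy, not a measurement.
function capacityScore(limits: LimitsLike): number {
  return limits.maxComputeInvocationsPerWorkgroup * limits.maxComputeWorkgroupsPerDimension;
}

async function queryGpuCapability(gpu: GpuLike | undefined): Promise<GpuCapability | null> {
  if (!gpu) return null; // WebGPU not available in this browser
  const adapter = await gpu.requestAdapter(); // obtain the graphics card adapter
  if (!adapter) return null; // no suitable adapter
  const device = await adapter.requestDevice(); // device request
  const info = adapter.info; // the hardware description lives on the adapter, not the device
  const deviceName = info ? info.description || info.device || info.vendor || "unknown" : "unknown";
  // Adapter limits reflect the hardware; a device requested without requiredLimits
  // only receives the default limits.
  return { deviceName, capacityScore: capacityScore(adapter.limits), device };
}
```

In a page, the call would be `queryGpuCapability(navigator.gpu)`; outside a browser, a mock object with the same shape can be passed instead.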
In detail, referring to FIG. 2, performing encoding initialization, decoding initialization and transcoding initialization on the preset primary processing component, respectively, to obtain the voice processing component includes:
S21, performing multi-level encoding unit initialization on the preset primary processing component to obtain a voice encoding component;
S22, performing multi-level decoding unit initialization on the primary processing component to obtain a voice decoding component;
S23, performing multi-level transcoding unit initialization on the primary processing component to obtain a voice transcoding component;
S24, integrating the voice encoding component, the voice decoding component and the voice transcoding component into the voice processing component.
Specifically, multi-level encoding unit initialization refers to initializing a Speex encoding unit, an AMR encoding unit, an AAC encoding unit, a G.711 encoding unit and an Opus encoding unit: the Speex encoding unit may be initialized with the encoder_spex_unit function, the AMR encoding unit with the encoder_acr_unit function, the AAC encoding unit with the encoder_aac_unit function, the G.711 encoding unit with the encoder_g711_unit function, and the Opus encoding unit with the encoder_opus_unit function.
In detail, multi-level decoding unit initialization refers to initializing a Speex decoding unit, an AMR decoding unit, an AAC decoding unit, a G.711 decoding unit and an Opus decoding unit: the Speex decoding unit may be initialized with the decoder_spex_unit function, the AMR decoding unit with the decoder_acr_unit function, the AAC decoding unit with the decoder_aac_unit function, the G.711 decoding unit with the decoder_g711_unit function, and the Opus decoding unit with the decoder_opus_unit function.
In detail, multi-level transcoding unit initialization refers to initializing a Speex transcoding unit, an AMR transcoding unit, an AAC transcoding unit, a G.711 transcoding unit and an Opus transcoding unit: the Speex transcoding unit may be initialized with the franscoder_spex_unit function, the AMR transcoding unit with the franscoder_amur_unit function, the AAC transcoding unit with the franscoder_aac_unit function, the G.711 transcoding unit with the franscoder_g711_unit function, and the Opus transcoding unit with the franscoder_opus_unit function.
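The three initialization stages and their integration (S21 to S24) can be sketched as building one unit per codec for each role and collecting the three maps into a single component. The unit shape, the `stub` factory and the codec keys are hypothetical stand-ins for the encoder_*_unit, decoder_*_unit and franscoder_*_unit functions named above; the units behind encoder_acr_unit and franscoder_amur_unit are assumed here to be AMR, the remaining codec in the Background list.

```typescript
type CodecType = "speex" | "amr" | "aac" | "g711" | "opus";
const CODECS: readonly CodecType[] = ["speex", "amr", "aac", "g711", "opus"];

type Role = "encoder" | "decoder" | "transcoder";

// Hypothetical unit shape; a real unit would wrap a codec kernel or GPU pipeline.
interface CodecUnit {
  codec: CodecType;
  role: Role;
  ready: boolean;
}

// Hypothetical factory standing in for encoder_*_unit, decoder_*_unit and franscoder_*_unit.
type UnitFactory = (codec: CodecType) => CodecUnit;

interface VoiceProcessingComponent {
  encoders: Map<CodecType, CodecUnit>;    // S21: voice encoding component
  decoders: Map<CodecType, CodecUnit>;    // S22: voice decoding component
  transcoders: Map<CodecType, CodecUnit>; // S23: voice transcoding component
}

// Multi-level initialization: one unit per supported codec.
function initStage(factory: UnitFactory): Map<CodecType, CodecUnit> {
  const units = new Map<CodecType, CodecUnit>();
  for (const codec of CODECS) units.set(codec, factory(codec));
  return units;
}

// S24: integrate the three stages into one voice processing component.
function initVoiceProcessing(make: Record<Role, UnitFactory>): VoiceProcessingComponent {
  return {
    encoders: initStage(make.encoder),
    decoders: initStage(make.decoder),
    transcoders: initStage(make.transcoder),
  };
}

// Hypothetical stub factory, useful for wiring tests before real codecs exist.
const stub = (role: Role): UnitFactory => (codec) => ({ codec, role, ready: true });

const component = initVoiceProcessing({
  encoder: stub("encoder"),
  decoder: stub("decoder"),
  transcoder: stub("transcoder"),
});
```

Keying every stage by the same codec type lets the later steps select a target unit with a single map lookup.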
In the embodiment of the invention, obtaining the graphics card computing power through the browser determines the maximum computing power of the graphics card device, which makes it convenient to determine the graphics card occupancy rate in real time. Performing encoding, decoding and transcoding initialization on the preset primary processing component yields a voice processing component that implements voice processing with multi-level encoding, decoding and transcoding units, thereby meeting the voice processing requirements of different application scenarios.
S2, performing anomaly monitoring on the voice processing component to obtain a fault-tolerance status, and performing a snapshot-based restart and recovery of the voice processing component according to the fault-tolerance status.
In the embodiment of the invention, anomaly monitoring means monitoring the voice processing component for runtime anomalies while it is working; a fault-tolerance mechanism may monitor the component by means such as instrumentation points. The fault-tolerance status is either an abnormal state or a normal state, and the abnormal state covers anomalies such as GPU memory overflow, division-by-zero errors, array index out-of-bounds exceptions and null pointer exceptions.
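One way to turn errors caught at an instrumentation point into the normal/abnormal status described above is sketched below. The category names and the mapping are assumptions: JavaScript does not throw on out-of-bounds array reads, so `RangeError` is assumed to come from explicit bounds checks, and a GPU out-of-memory condition is assumed to be rethrown by a wrapper as an error whose `name` is "GPUOutOfMemoryError" (the name of the WebGPU error class that `device.popErrorScope()` can report).

```typescript
type FaultKind =
  | "gpu-out-of-memory"
  | "division-by-zero"
  | "index-out-of-range"
  | "null-reference"
  | "unknown";

// Fault-tolerance status: normal, or abnormal with a category.
type FaultStatus =
  | { state: "normal" }
  | { state: "abnormal"; kind: FaultKind; message: string };

function classifyFault(err: unknown): FaultStatus {
  const message = err instanceof Error ? err.message : String(err);
  const name = err instanceof Error ? err.name : "";
  let kind: FaultKind = "unknown";
  if (name === "GPUOutOfMemoryError") kind = "gpu-out-of-memory"; // assumed wrapper convention
  else if (err instanceof RangeError && /division by zero/i.test(message)) kind = "division-by-zero"; // e.g. BigInt 1n / 0n
  else if (err instanceof RangeError) kind = "index-out-of-range"; // assumed explicit bounds checks
  else if (err instanceof TypeError) kind = "null-reference"; // e.g. reading a property of null
  return { state: "abnormal", kind, message };
}

// Instrumentation point: run one processing step and report its fault-tolerance status.
function monitored<T>(step: () => T): { result?: T; status: FaultStatus } {
  try {
    return { result: step(), status: { state: "normal" } };
  } catch (err) {
    return { status: classifyFault(err) };
  }
}
```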
In the embodiment of the present invention, performing the snapshot-based restart and recovery of the voice processing component according to the fault-tolerance status includes:
determining whether the fault-tolerance status is an abnormal state;
if not, returning to the step of performing anomaly monitoring on the voice processing component to obtain the fault-tolerance status;
if so, saving a working snapshot of the voice processing component to obtain a real-time working snapshot;
and restarting the voice processing component, and restoring the restarted voice processing component from the real-time working snapshot.
Specifically, the browser's LocalStorage may be used to store the working snapshot of the voice processing component to obtain the real-time working snapshot, where the real-time working snapshot records the current values and states of the component's data structures and variables while it is working.
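The save, restart and restore sequence can be sketched as below. The `KeyValueStore` interface covers the `getItem`/`setItem` subset of the Web Storage API, so `window.localStorage` can be passed in a page, while `MemoryStore` is an in-memory stand-in for running outside a browser. The snapshot fields, the storage key and the `RestartableComponent` interface are hypothetical. Because LocalStorage stores only strings, the snapshot is serialized with JSON, so typed arrays or GPU buffers would need converting before they could be saved.

```typescript
// The getItem/setItem subset of the Web Storage API, so window.localStorage fits.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for running outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Hypothetical snapshot contents: the values and state needed to resume work.
interface WorkSnapshot {
  pendingTaskIds: string[];
  offsets: Record<string, number>; // bytes already processed per task
  takenAt: number;
}

// Hypothetical hooks the voice processing component would expose.
interface RestartableComponent {
  snapshot(): WorkSnapshot;
  restart(): void;
  restore(snapshot: WorkSnapshot): void;
}

const SNAPSHOT_KEY = "voice-processing-snapshot"; // hypothetical storage key

// Returns true when a restart and recovery was performed.
function recoverIfAbnormal(abnormal: boolean, component: RestartableComponent, store: KeyValueStore): boolean {
  if (!abnormal) return false; // normal state: keep monitoring
  store.setItem(SNAPSHOT_KEY, JSON.stringify(component.snapshot())); // save the real-time working snapshot
  component.restart();
  const saved = store.getItem(SNAPSHOT_KEY);
  if (saved !== null) component.restore(JSON.parse(saved) as WorkSnapshot); // snapshot recovery
  return true;
}
```

Persisting before restarting means the snapshot survives even if the restart itself reloads the page; LocalStorage is synchronous and usually limited to a few megabytes per origin, so large buffers may need another store.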
In the embodiment of the invention, monitoring the voice processing component for anomalies to obtain the fault-tolerance status, and performing a snapshot-based restart and recovery according to that status, ensures the fault tolerance of the component while it is working and improves its working efficiency.
S3, performing voice encoding, voice decoding and voice transcoding on the pre-acquired voice file using the voice processing component to obtain the voice processing file.
In the embodiment of the invention, the voice file is an audio file that requires encoding, decoding, transcoding or a similar operation, and the voice processing file is the file obtained after that operation has been performed on the voice file.
In the embodiment of the present invention, performing voice encoding, voice decoding and voice transcoding on the pre-acquired voice file using the voice processing component to obtain the voice processing file includes:
grouping the pre-acquired voice files by processing demand to obtain files to be encoded, files to be decoded and files to be transcoded;
extracting an encoding type from the file to be encoded;
performing adaptive encoding and encoding packaging on the file to be encoded using the voice processing component and the encoding type to obtain a voice encoded file;
identifying the identification header of the file to be decoded to obtain a decoding type;
performing adaptive decoding and decoding packaging on the file to be decoded using the voice processing component and the decoding type to obtain a voice decoded file;
extracting a transcoding type from the file to be transcoded, and identifying the identification header in the file to be transcoded to obtain an initial type;
performing adaptive transcoding and transcoding packaging on the file to be transcoded using the voice processing component, the transcoding type and the initial type to obtain a voice transcoded file;
and integrating the voice encoded file, the voice decoded file and the voice transcoded file into the voice processing file.
In detail, demand grouping means grouping the voice files according to the processing demand of each file, where the processing demand is encoding, decoding or transcoding. The files to be encoded, decoded and transcoded are the voice files that require an encoding, decoding or transcoding operation, respectively, and the encoding type indicates which voice encoding is to be applied to the voice data in the file to be encoded.
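Demand grouping can be sketched as a single pass over the incoming jobs. The `VoiceJob` shape is hypothetical, and requiring a `targetType` for encode and transcode jobs is an assumption that mirrors the encoding type and transcoding type the method extracts from those files.

```typescript
type Demand = "encode" | "decode" | "transcode";

// Hypothetical description of one incoming voice file.
interface VoiceJob {
  name: string;
  demand: Demand;
  data: Uint8Array;
  targetType?: string; // encoding type or transcoding type, when the demand needs one
}

interface GroupedJobs {
  toEncode: VoiceJob[];
  toDecode: VoiceJob[];
  toTranscode: VoiceJob[];
}

function groupByDemand(jobs: VoiceJob[]): GroupedJobs {
  const groups: GroupedJobs = { toEncode: [], toDecode: [], toTranscode: [] };
  for (const job of jobs) {
    // Assumption: encode and transcode jobs must state which type to produce.
    if (job.demand !== "decode" && !job.targetType) {
      throw new Error(`${job.name}: ${job.demand} needs a target type`);
    }
    if (job.demand === "encode") groups.toEncode.push(job);
    else if (job.demand === "decode") groups.toDecode.push(job);
    else groups.toTranscode.push(job);
  }
  return groups;
}
```

Validating the target type while grouping means a malformed job fails before any encoding unit is selected.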
Specifically, referring to FIG. 3, performing adaptive encoding and encoding packaging on the file to be encoded using the voice processing component and the encoding type to obtain the voice encoded file includes:
S31, selecting the voice encoding component from the voice processing component, and taking the encoding unit corresponding to the encoding type in the voice encoding component as a target encoding unit;
S32, extracting the voice data to be encoded from the file to be encoded;
S33, encoding the voice data to be encoded using the target encoding unit to obtain an encoded voice file;
S34, adding an encoding type identification header to the encoded voice file according to the encoding type to obtain a standard encoded file;
S35, performing struct assembly on the standard encoded file to obtain the voice encoded file.
In detail, the voice data to be encoded is the voice data in the file to be encoded that needs encoding, and identification header identification means recognizing the encoding type identification header of a voice file, for example "0xF0" or "0xFF".
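Steps S31 to S35 can be sketched as follows, using G.711 μ-law as the example encoding unit because its encoder is short enough to show in full (the widely used bias-and-clip formulation for 16-bit samples). The one-byte type tags are hypothetical: the text gives "0xF0" and "0xFF" only as examples of identification headers without assigning them to codecs. The struct assembled in S35 is likewise an assumed shape.

```typescript
type CodecType = "speex" | "amr" | "aac" | "g711" | "opus";

// Hypothetical one-byte encoding type identification headers.
const TYPE_TAG: Record<CodecType, number> = { speex: 0xf1, amr: 0xf2, aac: 0xf3, g711: 0xf4, opus: 0xf5 };

type EncodeFn = (pcm: Int16Array) => Uint8Array;

// G.711 mu-law encoding of one 16-bit sample (bias-and-clip formulation).
function linearToMulaw(sample: number): number {
  const BIAS = 0x84;
  const CLIP = 32635;
  const sign = (sample >> 8) & 0x80; // 0x80 for negative samples
  let magnitude = sign ? -sample : sample;
  if (magnitude > CLIP) magnitude = CLIP;
  magnitude += BIAS;
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) exponent--;
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff; // mu-law bytes are stored inverted
}

// Struct assembled in S35 (hypothetical shape).
interface VoiceEncodedFile {
  codec: CodecType;
  bytes: Uint8Array; // identification header followed by the encoded payload
  payloadLength: number;
}

function encodeVoiceFile(pcm: Int16Array, codec: CodecType, encoders: Map<CodecType, EncodeFn>): VoiceEncodedFile {
  const unit = encoders.get(codec); // S31: target encoding unit for the encoding type
  if (!unit) throw new Error(`no encoding unit for ${codec}`);
  const payload = unit(pcm); // S33: encode (S32, extracting pcm, is left to the caller)
  const bytes = new Uint8Array(1 + payload.length);
  bytes[0] = TYPE_TAG[codec]; // S34: add the identification header
  bytes.set(payload, 1);
  return { codec, bytes, payloadLength: payload.length }; // S35: struct assembly
}

const encoders = new Map<CodecType, EncodeFn>([
  ["g711", (pcm) => Uint8Array.from(pcm, linearToMulaw)],
]);
```

A silent sample encodes to 0xFF in μ-law, so `encodeVoiceFile(new Int16Array([0]), "g711", encoders)` would yield the header byte followed by 0xFF.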
Specifically, performing adaptive decoding and decoding packaging on the file to be decoded using the voice processing component and the decoding type to obtain the voice decoded file includes:
selecting the voice decoding component from the voice processing component, and taking the decoding unit corresponding to the decoding type in the voice decoding component as a target decoding unit;
extracting the voice data to be decoded from the file to be decoded;
decoding the voice data to be decoded using the target decoding unit to obtain a decoded voice file and decoded voice frame features;
and performing struct assembly on the decoded voice file and the decoded voice frame features to obtain the voice decoded file.
Specifically, the decoded voice frame features include the number of channels, the sampling rate and the number of bits per sample; the transcoding type is the voice type that the file to be transcoded should have after transcoding, and the initial type is the current voice encoding type of the file to be transcoded.
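The decoding type (and, for transcoding, the initial type) can be recognized from header bytes before a decoding unit is selected. The sketch below uses well-known container signatures: the "#!AMR" storage magic of RFC 4867, the 12-bit ADTS sync word of AAC, and the "OggS" capture pattern followed by an "OpusHead" or "Speex" identification packet. Raw G.711 has no header, so its type must be supplied separately. The decoder map, the `fallback` parameter and the decoded struct with frame features are hypothetical shapes.

```typescript
type CodecType = "speex" | "amr" | "aac" | "g711" | "opus";

// Read up to `len` bytes starting at `start` as a Latin-1 string.
function ascii(bytes: Uint8Array, start: number, len: number): string {
  let out = "";
  for (let i = start; i < Math.min(bytes.length, start + len); i++) out += String.fromCharCode(bytes[i]);
  return out;
}

// Identify the decoding (or initial) type from well-known header signatures.
function detectCodec(bytes: Uint8Array): CodecType | null {
  if (ascii(bytes, 0, 5) === "#!AMR") return "amr"; // AMR storage magic (RFC 4867), NB or WB
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xf6) === 0xf0) return "aac"; // ADTS sync word, layer 0
  if (ascii(bytes, 0, 4) === "OggS") {
    const head = ascii(bytes, 0, 96); // the first Ogg page carries the identification packet
    if (head.indexOf("OpusHead") >= 0) return "opus";
    if (head.indexOf("Speex   ") >= 0) return "speex";
  }
  return null; // e.g. raw G.711 carries no header
}

// Decoded voice frame features and the decoded struct (hypothetical shapes).
interface FrameFeatures { channels: number; sampleRate: number; bitsPerSample: number; }
interface VoiceDecodedFile { codec: CodecType; pcm: Int16Array; features: FrameFeatures; }
type DecodeFn = (data: Uint8Array) => { pcm: Int16Array; features: FrameFeatures };

function decodeVoiceFile(data: Uint8Array, decoders: Map<CodecType, DecodeFn>, fallback?: CodecType): VoiceDecodedFile {
  const codec = detectCodec(data) ?? fallback; // fall back to an out-of-band type
  if (!codec) throw new Error("cannot identify the decoding type");
  const unit = decoders.get(codec); // target decoding unit
  if (!unit) throw new Error(`no decoding unit for ${codec}`);
  const { pcm, features } = unit(data);
  return { codec, pcm, features }; // struct assembly with frame features
}
```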
Specifically, performing adaptive transcoding and transcoding-packaging operations on the file to be transcoded, using the voice processing component, the transcoding type and the initial type, to obtain a voice transcoded file includes:
selecting a voice transcoding component from the voice processing component;
taking the transcoding unit in the voice transcoding component that corresponds to the transcoding type as the target transcoding unit;
extracting the voice data to be transcoded from the file to be transcoded;
transcoding the voice data to be transcoded with the target transcoding unit and the initial type to obtain a transcoded voice file;
performing struct assembly on the transcoded voice file according to the transcoding type to obtain the voice transcoded file.
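The transcoding steps can be sketched in TypeScript under the following assumptions: the initial type is read from a hypothetical one-byte identification header (0xF0 for G.711 μ-law, 0xFF for raw 16-bit PCM), the voice data is decoded to 16-bit PCM with the initial type and re-encoded into the transcoding type, and a new header is attached. Routing every conversion through PCM is a choice made for this sketch; a dedicated unit per (initial type, transcoding type) pair would be equally consistent with the text.

```typescript
type VoiceType = "pcmu" | "pcm16";

// Hypothetical identification headers: 0xF0 = mu-law, 0xFF = 16-bit PCM.
const TYPE_HEADER: Record<VoiceType, number> = { pcmu: 0xf0, pcm16: 0xff };

// G.711 mu-law compression of one 16-bit sample.
function linearToMulaw(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0;
  const magnitude = Math.min(Math.abs(sample), 32635) + 0x84;
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) exponent--;
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// G.711 mu-law expansion of one 8-bit code.
function mulawToLinear(code: number): number {
  const u = ~code & 0xff;
  const magnitude = ((((u & 0x0f) << 3) + 0x84) << ((u >> 4) & 0x07)) - 0x84;
  return u & 0x80 ? -magnitude : magnitude;
}

// Each type converts to and from 16-bit PCM, the assumed pivot format.
const toPcm: Record<VoiceType, (data: Uint8Array) => Int16Array> = {
  pcmu: (data) => Int16Array.from(data, mulawToLinear),
  pcm16: (data) => new Int16Array(data.slice().buffer),
};
const fromPcm: Record<VoiceType, (pcm: Int16Array) => Uint8Array> = {
  pcmu: (pcm) => Uint8Array.from(pcm, linearToMulaw),
  pcm16: (pcm) => new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength).slice(),
};

function transcodeVoiceFile(file: Uint8Array, transcodingType: VoiceType): Uint8Array {
  // Initial type: identified from the header byte.
  const initialType = (Object.keys(TYPE_HEADER) as VoiceType[]).find((t) => TYPE_HEADER[t] === file[0]);
  if (initialType === undefined) throw new Error("unknown identification header");
  const voiceData = file.subarray(1); // voice data to be transcoded
  // Target transcoding unit: decode with the initial type, re-encode with the transcoding type.
  const converted = fromPcm[transcodingType](toPcm[initialType](voiceData));
  // Struct assembly according to the transcoding type: new header + converted data.
  const out = new Uint8Array(converted.length + 1);
  out[0] = TYPE_HEADER[transcodingType];
  out.set(converted, 1);
  return out;
}
```

μ-law is lossy, so a PCM to μ-law to PCM round trip returns approximations of the original samples rather than the exact values.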
In this embodiment of the invention, the voice processing component performs voice encoding, voice decoding and voice transcoding on the pre-acquired voice file to obtain the voice processing file. Because the voice processing component contains multiple encoding, decoding and transcoding units of different types, each voice file can be handled by a matching unit, which improves voice processing efficiency.
S4, acquiring the real-time graphics card load through the browser, and generating the graphics card occupancy rate from the graphics card computing power and the real-time graphics card load.
In this embodiment of the invention, the real-time graphics card load is the computing performance of the graphics card device currently being consumed by the voice processing component, and the graphics card occupancy rate is the real-time utilization of the graphics card's performance.
In this embodiment of the present invention, acquiring the real-time graphics card load through the browser includes: creating a graphics context object through the browser; acquiring the graphics card adapter corresponding to the graphics context object; acquiring a standard graphics card computing power interface in real time according to the graphics card adapter; and performing a real-time load query on the browser through the standard graphics card computing power interface to obtain the real-time graphics card load.
Specifically, the adapter.requestDevice() function may be used to create the graphics context object, the navigator.gpu.requestAdapter() function may be used to obtain the graphics card adapter corresponding to the graphics context object, the WebGLRenderingContext interface may be used to obtain the standard graphics card computing power interface, and its getParameter method may be used to perform the real-time load query.
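The WebGPU calls named above might be used as in the following TypeScript sketch. In the WebGPU API as specified, navigator.gpu.requestAdapter() is called first and adapter.requestDevice() then creates a device from the returned adapter. Neither WebGPU nor WebGL's getParameter exposes a live GPU utilization counter, so the sketch rests on two labeled assumptions: the product of two adapter limits stands in for a static "computing power" figure, and the time for device.queue.onSubmittedWorkDone() to resolve serves as a rough load proxy. The structural types are declared locally so the sketch compiles without @webgpu/types; in a browser, navigator.gpu would be passed as the gpu argument. Function and type names are hypothetical.

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
interface GpuLimitsLike {
  maxComputeInvocationsPerWorkgroup: number;
  maxComputeWorkgroupsPerDimension: number;
}
interface GpuDeviceLike {
  queue: { onSubmittedWorkDone(): Promise<void> };
}
interface GpuAdapterLike {
  limits: GpuLimitsLike;
  requestDevice(): Promise<GpuDeviceLike>;
}
interface GpuLike {
  requestAdapter(): Promise<GpuAdapterLike | null>;
}

interface GpuHandle {
  device: GpuDeviceLike;
  capacity: number; // assumed static "computing power" figure
}

async function acquireGpu(gpu: GpuLike): Promise<GpuHandle> {
  // The adapter is requested first; the device is then created from it.
  const adapter = await gpu.requestAdapter();
  if (adapter === null) throw new Error("no WebGPU adapter available");
  const device = await adapter.requestDevice();
  // Assumption: invocations per workgroup x workgroups per dimension as a capacity proxy.
  const { maxComputeInvocationsPerWorkgroup, maxComputeWorkgroupsPerDimension } = adapter.limits;
  return { device, capacity: maxComputeInvocationsPerWorkgroup * maxComputeWorkgroupsPerDimension };
}

// Assumed load proxy: how long the queue takes to finish the work already submitted.
async function queueDrainMs(device: GpuDeviceLike, now: () => number = Date.now): Promise<number> {
  const start = now();
  await device.queue.onSubmittedWorkDone();
  return now() - start;
}
```

A longer drain time for a fixed probe workload suggests a busier GPU, but the mapping from latency to a load figure is left to the implementation.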
Specifically, generating the graphics card occupancy rate from the graphics card computing power and the real-time graphics card load means dividing the real-time graphics card load by the graphics card computing power.
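As a worked example of this ratio, write $C_{\mathrm{GPU}}$ for the graphics card computing power and $L_{\mathrm{rt}}$ for the real-time graphics card load, both in the same implementation-defined unit (the symbols are introduced here only for illustration):

$$
\eta_{\mathrm{GPU}} = \frac{L_{\mathrm{rt}}}{C_{\mathrm{GPU}}} \times 100\%
$$

With hypothetical values $C_{\mathrm{GPU}} = 200$ and $L_{\mathrm{rt}} = 170$, $\eta_{\mathrm{GPU}} = 170/200 \times 100\% = 85\%$, which exceeds the example primary load threshold of 80% given for step S5 but not the example secondary load threshold of 90%.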
In detail, acquiring the real-time graphics card load through the browser and generating the graphics card occupancy rate from the graphics card computing power and that load enables real-time monitoring of the performance load of the graphics card device, reveals the working state of the graphics card, and supports the subsequent dynamic adjustment of the voice processing component.
S5, performing dynamic task adjustment on the voice processing component according to the graphics card occupancy rate.
In this embodiment of the invention, dynamic task adjustment means adjusting the parallel computing mode or the task processing priority within the voice processing component.
In this embodiment of the present invention, performing dynamic task adjustment on the voice processing component according to the graphics card occupancy rate includes:
determining whether the graphics card occupancy rate is greater than a preset primary load threshold;
if not, returning to the step of acquiring the real-time graphics card load through the browser;
if so, determining whether the graphics card occupancy rate is greater than a preset secondary load threshold;
if not, obtaining the real-time processing tasks of the voice processing component and performing graphics card parallel computing on them to obtain parallel processing tasks;
if so, obtaining the real-time processing tasks of the voice processing component and setting their priorities to obtain priority processing tasks.
Specifically, the primary load threshold may be 80% and the secondary load threshold may be 90%. Graphics card parallel computing means using the graphics card available to the browser to split a real-time processing task into multiple parallel work subtasks and processing all of those subtasks in parallel on the graphics card. Priority setting means assigning each task a priority according to its urgency or importance.
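The two-threshold decision of step S5 can be sketched in TypeScript as follows. The VoiceTask shape, the split of each task into a fixed number of contiguous frame ranges (parts = 4), and the numeric urgency score used for priority are assumptions made for illustration; the thresholds default to the example values of 80% and 90%, with occupancy expressed as a fraction in [0, 1].

```typescript
// Hypothetical task model: frames to process and an urgency score (higher = more urgent).
interface VoiceTask {
  id: string;
  frames: number;
  urgency: number;
}

type Adjustment =
  | { kind: "monitor" } // occupancy <= primary: keep polling the load
  | { kind: "parallel"; subtasks: VoiceTask[] } // primary < occupancy <= secondary
  | { kind: "prioritized"; queue: VoiceTask[] }; // occupancy > secondary

// Split one task into up to `parts` contiguous frame ranges for parallel GPU work.
function splitTask(task: VoiceTask, parts: number): VoiceTask[] {
  const size = Math.max(1, Math.ceil(task.frames / parts));
  const subtasks: VoiceTask[] = [];
  for (let start = 0, i = 0; start < task.frames; start += size, i++) {
    subtasks.push({ id: `${task.id}#${i}`, frames: Math.min(size, task.frames - start), urgency: task.urgency });
  }
  return subtasks;
}

// Occupancy is a fraction in [0, 1]; thresholds default to the example 80% and 90%.
function adjustTasks(occupancy: number, tasks: VoiceTask[], primary = 0.8, secondary = 0.9, parts = 4): Adjustment {
  if (occupancy <= primary) return { kind: "monitor" };
  if (occupancy <= secondary) {
    return { kind: "parallel", subtasks: tasks.reduce<VoiceTask[]>((acc, t) => acc.concat(splitTask(t, parts)), []) };
  }
  // Copy before sorting so the caller's task list is left untouched.
  return { kind: "prioritized", queue: tasks.slice().sort((a, b) => b.urgency - a.urgency) };
}
```

Returning a tagged result instead of mutating the component keeps the decision logic testable on its own; the caller decides how to dispatch subtasks or reorder its queue.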
In this embodiment of the invention, dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate improves the working efficiency of the voice processing component and hence the efficiency of voice encoding, decoding and transcoding.
In summary, the invention obtains the graphics card computing power through the browser, which determines the maximum computing power of the graphics card device and makes it possible to determine the graphics card occupancy rate in real time later. Performing encoding initialization, decoding initialization and transcoding initialization operations on the preset primary processing component yields a voice processing component built from multi-stage encoding, decoding and transcoding units, which meets the voice processing requirements of different application scenarios. Performing anomaly monitoring on the voice processing component to obtain a fault-tolerant anomaly, and performing a snapshot restart recovery operation according to that anomaly, ensures fault tolerance while the voice processing component is working and improves its working efficiency. Performing voice encoding, voice decoding and voice transcoding on the pre-acquired voice file with the voice processing component yields the voice processing file; because the component contains multiple encoding, decoding and transcoding units of different types, voice files can be processed efficiently, which further improves processing efficiency.
Acquiring the real-time graphics card load through the browser and generating the graphics card occupancy rate from the graphics card computing power and the real-time load enables real-time monitoring of the performance load of the graphics card device, so the working state of the graphics card is known and the voice processing component can subsequently be adjusted dynamically. Dynamically adjusting the tasks of the voice processing component according to the graphics card occupancy rate then improves the working efficiency of the component and the efficiency of voice encoding, decoding and transcoding. The method for implementing voice encoding, decoding and transcoding with WebGPU can therefore solve the problem of low efficiency when the browser performs voice processing.
Fig. 4 is a functional block diagram of an apparatus for implementing voice encoding, decoding and transcoding by using WebGPU according to an embodiment of the present invention.
The apparatus 100 for implementing voice encoding, decoding and transcoding with WebGPU according to the present invention may be installed in an electronic device. Depending on the functions implemented, the apparatus 100 may include an initialization module 101, an anomaly monitoring module 102, a voice processing module 103, a load calculation module 104 and a dynamic adjustment module 105. A module in the present invention, which may also be called a unit, is a series of computer program segments that are stored in the memory of the electronic device and can be executed by the processor of the electronic device to perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the initialization module 101 is configured to obtain the graphics card computing power through a browser, and to perform encoding initialization, decoding initialization and transcoding initialization operations on a preset primary processing component to obtain a voice processing component;
the anomaly monitoring module 102 is configured to perform anomaly monitoring on the voice processing component to obtain a fault-tolerant anomaly, and to perform a snapshot restart recovery operation on the voice processing component according to the fault-tolerant anomaly;
the voice processing module 103 is configured to perform voice encoding, voice decoding and voice transcoding operations on a pre-acquired voice file using the voice processing component to obtain a voice processing file;
the load calculation module 104 is configured to acquire the real-time graphics card load through the browser, and to generate the graphics card occupancy rate from the graphics card computing power and the real-time graphics card load;
the dynamic adjustment module 105 is configured to perform dynamic task adjustment on the voice processing component according to the graphics card occupancy rate.
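How modules 101 to 105 cooperate can be illustrated with a small TypeScript sketch of their interfaces and one processing pass. The interface and method names are hypothetical, and the strict sequential order is an assumption made for clarity; since the text describes the load check as returning to the load-acquisition step, a real deployment would repeat the load calculation and dynamic adjustment calls periodically.

```typescript
// Hypothetical interfaces for modules 101-105 of apparatus 100.
interface InitializationModule { initialize(): number } // returns graphics card computing power
interface AnomalyMonitoringModule { monitorAndRecover(): void } // snapshot restart on fault-tolerant anomaly
interface VoiceProcessingModule { process(files: Uint8Array[]): Uint8Array[] }
interface LoadCalculationModule { occupancy(computingPower: number): number }
interface DynamicAdjustmentModule { adjust(occupancy: number): void }

interface VoiceApparatus {
  init: InitializationModule;
  monitor: AnomalyMonitoringModule;
  voice: VoiceProcessingModule;
  load: LoadCalculationModule;
  adjust: DynamicAdjustmentModule;
}

// One pass through the modules in the order of steps S1 to S5 (assumed ordering).
function runOnce(app: VoiceApparatus, files: Uint8Array[]): { outputs: Uint8Array[]; occupancy: number } {
  const computingPower = app.init.initialize(); // module 101
  app.monitor.monitorAndRecover(); // module 102
  const outputs = app.voice.process(files); // module 103
  const occupancy = app.load.occupancy(computingPower); // module 104
  app.adjust.adjust(occupancy); // module 105
  return { outputs, occupancy };
}
```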
In detail, each module of the apparatus 100 for implementing voice encoding, decoding and transcoding with WebGPU in this embodiment uses the same technical means as the method described with reference to Figs. 1 to 3 and produces the same technical effects, which are not repeated here.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The embodiment of the application can acquire and process the related data based on the artificial intelligence technology. Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. The various units or means recited in the apparatus embodiments may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.
Claims (10)
1. A method for implementing voice encoding, decoding and transcoding with WebGPU, the method comprising:
performing encoding initialization, decoding initialization and transcoding initialization operations, respectively, on a preset primary processing component according to the graphics card computing power obtained through a browser, to obtain a voice processing component;
performing anomaly monitoring on the voice processing component to obtain a fault-tolerant anomaly, and performing a snapshot restart recovery operation on the voice processing component according to the fault-tolerant anomaly;
performing voice encoding, voice decoding and voice transcoding operations on a pre-acquired voice file using the voice processing component to obtain a voice processing file;
acquiring a real-time graphics card load through the browser, and generating a graphics card occupancy rate from the graphics card computing power and the real-time graphics card load; and
performing dynamic task adjustment on the voice processing component according to the graphics card occupancy rate.
2. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein obtaining the graphics card computing power through the browser comprises:
acquiring a graphics card computing power interface through the browser, and acquiring a graphics card adapter through the graphics card computing power interface;
making a device request to the browser according to the graphics card adapter to obtain a graphics card device name; and
querying the browser for computing power according to the graphics card device name to obtain the graphics card computing power.
3. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein performing the encoding initialization, decoding initialization and transcoding initialization operations, respectively, on the preset primary processing component to obtain the voice processing component comprises:
performing a multi-stage encoding unit initialization operation on the preset primary processing component to obtain a voice encoding component;
performing a multi-stage decoding unit initialization operation on the primary processing component to obtain a voice decoding component;
performing a multi-stage transcoding unit initialization operation on the primary processing component to obtain a voice transcoding component; and
integrating the voice encoding component, the voice decoding component and the voice transcoding component into the voice processing component.
4. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein performing the snapshot restart recovery operation on the voice processing component according to the fault-tolerant anomaly comprises:
determining whether the fault-tolerant anomaly indicates an abnormal state;
if not, returning to the step of performing anomaly monitoring on the voice processing component to obtain the fault-tolerant anomaly;
if so, saving a working snapshot of the voice processing component to obtain a real-time working snapshot; and
restarting the voice processing component, and performing a snapshot recovery operation on the restarted voice processing component using the real-time working snapshot.
5. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein performing voice encoding, voice decoding and voice transcoding operations on the pre-acquired voice file using the voice processing component to obtain the voice processing file comprises:
grouping the pre-acquired voice files by processing requirement to obtain a file to be encoded, a file to be decoded and a file to be transcoded;
extracting an encoding type from the file to be encoded;
performing adaptive encoding and encoding-packaging operations on the file to be encoded using the voice processing component and the encoding type to obtain a voice encoded file;
identifying the identification header of the file to be decoded to obtain a decoding type;
performing adaptive decoding and decoding-packaging operations on the file to be decoded using the voice processing component and the decoding type to obtain a voice decoded file;
extracting a transcoding type from the file to be transcoded, and identifying the identification header in the file to be transcoded to obtain an initial type;
performing adaptive transcoding and transcoding-packaging operations on the file to be transcoded using the voice processing component, the transcoding type and the initial type to obtain a voice transcoded file; and
integrating the voice encoded file, the voice decoded file and the voice transcoded file into the voice processing file.
6. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 5, wherein performing adaptive encoding and encoding-packaging operations on the file to be encoded using the voice processing component and the encoding type to obtain the voice encoded file comprises:
selecting a voice encoding component from the voice processing component, and taking the encoding unit in the voice encoding component that corresponds to the encoding type as a target encoding unit;
extracting the voice data to be encoded from the file to be encoded;
encoding the voice data to be encoded using the target encoding unit to obtain an encoded voice file;
adding an encoding-type identification header to the encoded voice file according to the encoding type to obtain a standard encoded file; and
performing struct assembly on the standard encoded file to obtain the voice encoded file.
7. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 5, wherein performing adaptive decoding and decoding-packaging operations on the file to be decoded using the voice processing component and the decoding type to obtain the voice decoded file comprises:
selecting a voice decoding component from the voice processing component, and taking the decoding unit in the voice decoding component that corresponds to the decoding type as a target decoding unit;
extracting the voice data to be decoded from the file to be decoded;
decoding the voice data to be decoded using the target decoding unit to obtain a decoded voice file and decoded voice frame features; and
performing struct assembly on the decoded voice file and the decoded voice frame features to obtain the voice decoded file.
8. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 5, wherein performing adaptive transcoding and transcoding-packaging operations on the file to be transcoded using the voice processing component, the transcoding type and the initial type to obtain the voice transcoded file comprises:
selecting a voice transcoding component from the voice processing component;
taking the transcoding unit in the voice transcoding component that corresponds to the transcoding type as a target transcoding unit;
extracting the voice data to be transcoded from the file to be transcoded;
transcoding the voice data to be transcoded using the target transcoding unit and the initial type to obtain a transcoded voice file; and
performing struct assembly on the transcoded voice file according to the transcoding type to obtain the voice transcoded file.
9. The method for implementing voice encoding, decoding and transcoding with WebGPU according to claim 1, wherein performing dynamic task adjustment on the voice processing component according to the graphics card occupancy rate comprises:
determining whether the graphics card occupancy rate is greater than a preset primary load threshold;
if not, returning to the step of acquiring the real-time graphics card load through the browser;
if so, determining whether the graphics card occupancy rate is greater than a preset secondary load threshold;
if not, obtaining the real-time processing tasks of the voice processing component and performing graphics card parallel computing on them to obtain parallel processing tasks; and
if so, obtaining the real-time processing tasks of the voice processing component and setting their priorities to obtain priority processing tasks.
10. An apparatus for implementing voice encoding, decoding and transcoding with WebGPU, the apparatus comprising:
an initialization module, configured to obtain the graphics card computing power through a browser and to perform encoding initialization, decoding initialization and transcoding initialization operations, respectively, on a preset primary processing component to obtain a voice processing component;
an anomaly monitoring module, configured to perform anomaly monitoring on the voice processing component to obtain a fault-tolerant anomaly, and to perform a snapshot restart recovery operation on the voice processing component according to the fault-tolerant anomaly;
a voice processing module, configured to perform voice encoding, voice decoding and voice transcoding operations on a pre-acquired voice file using the voice processing component to obtain a voice processing file;
a load calculation module, configured to acquire a real-time graphics card load through the browser and to generate a graphics card occupancy rate from the graphics card computing power and the real-time graphics card load; and
a dynamic adjustment module, configured to perform dynamic task adjustment on the voice processing component according to the graphics card occupancy rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410248002.6A CN117854518A (en) | 2024-03-05 | 2024-03-05 | Method and device for realizing voice encoding and decoding and transcoding by WebGPU |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117854518A true CN117854518A (en) | 2024-04-09 |
Family
ID=90534842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410248002.6A Pending CN117854518A (en) | 2024-03-05 | 2024-03-05 | Method and device for realizing voice encoding and decoding and transcoding by WebGPU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117854518A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1791119A (en) * | 2005-12-12 | 2006-06-21 | 中兴通讯股份有限公司 | Tracking method for mobile communication system signaling message |
CN101827242A (en) * | 2010-05-10 | 2010-09-08 | 南京邮电大学 | Method for realizing video phone system based on IPTV set-top box |
CN102279752A (en) * | 2011-08-31 | 2011-12-14 | 北京华电万通科技有限公司 | Device and method for rendering ultra-large scene in real time based on Web three-dimension (3D) |
CN108733423A (en) * | 2018-05-16 | 2018-11-02 | 福建天晴数码有限公司 | A kind of method and terminal of determining mobile device model |
CN109819057A (en) * | 2019-04-08 | 2019-05-28 | 科大讯飞股份有限公司 | A kind of load-balancing method and system |
CN111432262A (en) * | 2020-02-24 | 2020-07-17 | 杭州海康威视数字技术股份有限公司 | Page video rendering method and device |
CN112988364A (en) * | 2021-05-20 | 2021-06-18 | 西安芯瞳半导体技术有限公司 | Dynamic task scheduling method, device and storage medium |
CN113485841A (en) * | 2021-07-28 | 2021-10-08 | 腾讯科技(深圳)有限公司 | Data processing method and device based on edge calculation and readable storage medium |
CN115881142A (en) * | 2022-11-29 | 2023-03-31 | 北京百瑞互联技术股份有限公司 | Training method and device for bone conduction speech coding model and storage medium |
CN116974872A (en) * | 2023-08-03 | 2023-10-31 | 北京知道创宇信息技术股份有限公司 | GPU card performance testing method and device, electronic equipment and readable storage medium |
CN117130749A (en) * | 2023-08-30 | 2023-11-28 | 合肥善达信息科技有限公司 | Method for improving hardware decoding capability of Web player based on WebGPU |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111382844A (en) | Deep learning model training method and device | |
KR20220130630A (en) | Image processing method, face recognition model training method, device and equipment | |
JP2017126332A (en) | Systems and methods for efficient generation of stochastic spike patterns in core-based neuromorphic systems | |
CN113889076B (en) | Speech recognition and coding/decoding method, device, electronic equipment and storage medium | |
CN114333862B (en) | Audio encoding method, decoding method, device, equipment, storage medium and product | |
CN110807111A (en) | Three-dimensional graph processing method and device, storage medium and electronic equipment | |
CN112181307A (en) | Block chain based distributed data redundancy storage method and electronic equipment | |
CN115116454A (en) | Audio encoding method, apparatus, device, storage medium, and program product | |
CN114710667A (en) | Rapid prediction method and device for CU partition in H.266/VVC screen content frame | |
CN117854518A (en) | Method and device for realizing voice encoding and decoding and transcoding by WebGPU | |
CN111083408B (en) | Method, system and equipment for processing video storage service | |
CN113409803A (en) | Voice signal processing method, device, storage medium and equipment | |
US11792408B2 (en) | Transcoder target bitrate prediction techniques | |
CN117149399A (en) | Data processing method, device, equipment and readable storage medium | |
CN114842857A (en) | Voice processing method, device, system, equipment and storage medium | |
CN112511706A (en) | Voice stream obtaining method and system suitable for non-invasive bypass telephone | |
CN106302573B (en) | Method, system and device for processing data by adopting erasure code | |
CN117708568B (en) | Feature extraction method and device for large language model, computer equipment and medium | |
CN113762510B (en) | Data processing method and device for target model, electronic equipment and medium | |
US11734012B2 (en) | Systems and methods for efficient transfer of log data | |
CN113628215B (en) | Image processing method, system, device and storage medium | |
CN118247363A (en) | Image compression method, device, electronic equipment and storage medium | |
CN110933444B (en) | Bit width value storage method and device | |
CN116543756A (en) | Speech recognition model training method, device and related equipment | |
CN111143314A (en) | Log analysis method and system based on high-speed streaming processing technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||