CN114171034A - High-definition set top box voice data coding and decoding system and method - Google Patents

Info

Publication number
CN114171034A
Authority
CN
China
Prior art keywords
coding
signal
top box
voice data
set top
Legal status
Granted
Application number
CN202111460183.1A
Other languages
Chinese (zh)
Other versions
CN114171034B (en)
Inventor
张鹏
刘琴
Current Assignee
Shenzhen Expressway Da Industrial Co ltd
Shenzhen Gaosuda Technology Co ltd
Original Assignee
Shenzhen Gaosuda Technology Co ltd
Application filed by Shenzhen Gaosuda Technology Co ltd
Priority to CN202111460183.1A
Publication of CN114171034A
Application granted
Publication of CN114171034B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0017: Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16: Vocoder architecture
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47: End-user applications
    • H04N21/472: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a high-definition set-top box voice data coding and decoding system and method. The system comprises: a processing module, configured to acquire voice data to be coded from a preset server according to an audio request sent by a target set-top box and to process the voice data; a coding module, configured to code the processed voice data to obtain a coding result; a generating module, configured to generate a check code according to the characteristic parameters of the voice data to be coded; and a verification module, configured to verify the coding result with the check code and, after verification, transmit the coding result to the target set-top box so that the target set-top box decodes it with its own decoding function. Verifying the coded voice data against the characteristic parameters of the voice data ensures the accuracy of the coding result: it greatly reduces the probability of misrecognition and erroneous coding, improves accuracy, keeps the audio and video on the set-top box in correspondence, and improves the viewing experience.

Description

High-definition set top box voice data coding and decoding system and method
Technical Field
The invention relates to the technical field of data processing, in particular to a high-definition set top box voice data coding and decoding system and method.
Background
With the rapid advance of three-network convergence, people can enjoy rich and diverse information services more quickly, conveniently and promptly. Three-network convergence means that, through technical upgrades, the telecommunication network, the computer network and the cable television network can each provide comprehensive multimedia communication services covering voice, data, images and the like. If the terminal equipment of these three network systems retained only its original functions, it could not meet consumer demand. The set-top box is a digital television device widely used in households; with the development of electronic technology, its performance and functions have improved greatly in recent years, and more powerful, feature-rich set-top boxes are now used in many fields. A television set-top box works by receiving a digital television signal over the cable television network and converting it into an analog signal, which is then played on a television as a voice and video signal. However, the voice analog signal cannot be received and played directly, because the analog signal has neither reception nor coding capability. The received voice analog signal must therefore be coded into a data signal by external means; the set-top box receives this data signal, decodes it, and converts it back into an analog signal to play the audio and video. Existing coding techniques recognize the speech signal to obtain a pronunciation spectrum and then code the speech data on the basis of that spectrum. This approach has the following problem: because the pronunciation spectrum is identified automatically by the system, misidentification is inevitable, so the coding result may not match the actual audio, the audio and video on the set-top box no longer correspond, and the viewing experience suffers.
Disclosure of Invention
To address the problems above, the invention provides a high-definition set-top box voice data coding and decoding system and method. They solve the problem that, because the voice spectrum is identified automatically by the system, misidentification is difficult to avoid, so the coding result does not match the actual audio, the audio and video on the set-top box do not correspond, and the viewing experience suffers.
A high definition set top box voice data coding and decoding system comprises:
the processing module is used for acquiring voice data to be coded from a preset server according to an audio request sent by the target set top box and processing the voice data;
the coding module is used for coding the processed voice data to obtain a coding result;
the generating module is used for generating a check code according to the characteristic parameters of the voice data to be coded;
and the verification module is used for verifying the coding result by using the check code and transmitting the coding result to the target set top box after verification so as to enable the target set top box to decode the coding result by using the decoding function of the target set top box.
Preferably, the processing module includes:
the detection submodule is used for detecting whether the target set-top box sends an audio request or not, and if so, generating an acquisition instruction;
the first acquisition submodule is used for acquiring a voice analog signal of target voice data corresponding to the audio request from the preset server according to the acquisition instruction;
the extraction submodule is used for extracting a noise signal in the voice analog signal;
and the processing submodule is used for eliminating the noise signal from the voice analog signal to obtain a processed voice analog signal, and taking the processed voice analog signal as the voice data to be coded.
Preferably, the encoding module includes:
the first determining submodule is used for determining the signal frequency corresponding to the frame signal of each frame in the voice data to be coded;
the classification submodule is used for dividing the voice analog signal corresponding to the voice data to be coded into a long frame signal and a short frame signal according to the signal frequency of each frame signal;
the calculation submodule is used for calculating the transformation coefficient of each frame signal in the long frame signal category and the short frame signal category respectively and determining the target coding mode of the voice data to be coded according to the transformation coefficient;
and the coding submodule is used for coding the voice data to be coded by utilizing the target coding mode.
Preferably, the generating module includes:
the second obtaining submodule is used for obtaining the characteristic parameters of the voice data to be coded;
the second determining submodule is used for determining a signal coding sequence of the voice data to be coded according to the characteristic parameters of the voice data to be coded;
the generation submodule is used for generating a standard error correcting code of each sequence factor according to the state parameter of the sequence factor in the signal coding sequence;
and the merging submodule is used for merging all standard error correcting codes to obtain the check code.
Preferably, the verification module includes:
the verification submodule is used for verifying the coding frame corresponding to each sequence factor in the coding result by using the standard error correcting code of each sequence factor to obtain a verification result;
the screening submodule is used for screening target coding frames which fail to pass the verification according to the verification result;
the feedback sub-module is used for feeding the target coding frame back to the coding module for recoding;
the replacing submodule is used for replacing the target coding frame after recoding with the original target coding frame in the coding result;
and the transmission submodule is used for generating a decoding instruction and transmitting the decoding instruction and the replaced coding result to the target set top box so as to enable the target set top box to decode the coding result by utilizing the decoding function of the target set top box.
Preferably, before the coding sub-module uses the target coding method to code the voice data, the coding sub-module is further configured to:
determining the search difficulty of the current code search strategy based on the target coding mode;
determining, based on the performance index of the system and the complexity of the voice data to be coded, whether the search difficulty of the current code search strategy meets the coding requirement; if it does, performing no subsequent operation; otherwise, determining that the code search strategy needs to be replaced;
detecting the current bandwidth of an input signal of the target set top box, and selecting N first code search strategies in a preset strategy library according to the current bandwidth;
and selecting a second code search strategy which is most matched with a target coding mode from the N first code search strategies, and replacing the current code search strategy by using the second code search strategy.
Preferably, the system further comprises:
the decoding evaluation module is used for calculating the maximum decoding amount of the target set top box before the target set top box utilizes the decoding function of the target set top box to decode the coding result, and the calculating step comprises the following steps:
detecting the maximum capacity of the data buffer in the backend of the target set top box;
constructing, according to the maximum capacity, a conversion matrix for the graded operation of the backend data buffer of the target set top box;
calculating the maximum length of the single storage data according to the maximum capacity of the data buffer area;
determining the current length of the voice data according to the coding result, comparing the current length with the maximum length, if the current length is less than or equal to the maximum length, determining that the target set top box can directly decode the voice data, and if the current length is greater than the maximum length, extracting a gain factor of the voice data;
determining the current grading condition of the voice data according to the gain factor and the conversion matrix of the grading work of the data buffer area;
and determining the maximum decoding amount corresponding to the current grading situation as the maximum decoding amount of the target set top box.
Preferably, the step of eliminating the noise signal in the voice analog signal by the processing sub-module includes:
acquiring a first amplitude spectrum of the voice analog signal;
comparing the first amplitude spectrum with a second amplitude spectrum of the standard voice signal to obtain a comparison result;
determining the corresponding amplitude spectrum masks according to the comparison result;
determining a noise signal in the voice analog signal according to the amplitude spectrum mask;
calculating the average value of the amplitude spectrum mask and carrying out error elimination processing on the average value;
constructing a noise signal elimination model for the voice analog signal according to the noise signal and the processed mean of the amplitude spectrum mask;
acquiring a signal amplitude attenuation coefficient of the noise signal in the voice analog signal;
determining a loss function of the noise signal in the elimination process according to the signal amplitude attenuation coefficient and the probability distribution of the noise signal in the voice analog signal;
incorporating the loss function of the noise signal in the elimination process into the noise signal elimination model to refine the model;
acquiring a standard voice signal and a first signal frequency thereof;
training the refined noise signal elimination model with the standard voice signal and its first signal frequency to obtain a trained noise signal elimination model;
inputting a second signal frequency corresponding to the voice analog signal into the trained noise signal elimination model to obtain a third signal frequency corresponding to the noise signal;
and eliminating a target signal corresponding to a fourth signal frequency which is the same as the third signal frequency in the voice analog signal by using a preset noise elimination method, and obtaining the voice analog signal with the noise signal eliminated.
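The elimination procedure above relies on a trained model whose structure is not specified. As a much simpler illustration of the amplitude-spectrum-mask idea only, the following Python sketch compares a frame's DFT amplitudes with those of a reference "standard voice" frame, flags bins whose amplitude is far above the reference as noise, and zeroes them before transforming back. The `ratio` threshold, the binary mask, and all function names are assumptions; no model training, loss function, or attenuation coefficient is modeled.

```python
import cmath
import math
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: List[complex]) -> List[float]:
    """Inverse DFT; keeps the real part because the frames are real-valued."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def noise_mask(observed: List[float], reference: List[float], ratio: float = 2.0) -> List[bool]:
    """Binary amplitude-spectrum mask: True marks a bin treated as noise.

    A bin is flagged when its observed amplitude exceeds `ratio` times the
    reference (standard voice) amplitude; the 1e-9 floor ignores rounding noise.
    """
    return [o > ratio * r + 1e-9 for o, r in zip(observed, reference)]


def suppress_noise(frame: List[float], reference_frame: List[float],
                   ratio: float = 2.0) -> List[float]:
    """Zero the frequency bins flagged as noise and return the cleaned frame."""
    spec = dft([complex(v) for v in frame])
    ref_amp = [abs(c) for c in dft([complex(v) for v in reference_frame])]
    mask = noise_mask([abs(c) for c in spec], ref_amp, ratio)
    cleaned = [0j if is_noise else c for c, is_noise in zip(spec, mask)]
    return idft(cleaned)


# Usage: a bin-2 "voice" tone plus a bin-5 "noise" tone in a 16-sample frame.
voice = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
noisy = [v + 0.5 * math.sin(2 * math.pi * 5 * t / 16) for t, v in enumerate(voice)]
print(max(abs(a - b) for a, b in zip(suppress_noise(noisy, voice), voice)))  # a value close to 0
```

In practice a clean reference frame is not available at run time, which is presumably why the described method learns the noise frequencies with a trained model instead of comparing spectra directly.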
A high-definition set top box voice data coding and decoding method comprises the following steps:
acquiring voice data to be encoded from a preset server according to an audio request sent by a target set top box and processing the voice data;
coding the processed voice data to obtain a coding result;
generating a check code according to the characteristic parameters of the voice data to be coded;
and verifying the coding result by using the check code, and transmitting the coding result to the target set top box after verification so as to decode the coding result by using the decoding function of the target set top box.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a schematic structural diagram of a high-definition set-top box voice data encoding and decoding system provided by the present invention;
fig. 2 is a schematic structural diagram of a processing module in a high-definition set-top box voice data encoding and decoding system provided by the present invention;
fig. 3 is a schematic structural diagram of an encoding module in a high-definition set-top box voice data encoding and decoding system provided by the present invention;
fig. 4 is a flowchart of a method for encoding and decoding voice data of a high-definition set-top box according to the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
To solve the problem described in the Background, namely that automatic identification of the pronunciation spectrum inevitably produces misidentifications, so that the coding result does not match the actual audio and the audio and video on the set-top box no longer correspond, this embodiment discloses a high-definition set-top box voice data encoding and decoding system.
A high definition set top box voice data coding and decoding system, as shown in fig. 1, the system includes:
the processing module 101 is configured to obtain voice data to be encoded from a preset server according to an audio request sent by a target set top box and process the voice data;
the encoding module 102 is configured to encode the processed voice data to obtain an encoding result;
the generating module 103 is configured to generate a check code according to the characteristic parameter of the voice data to be encoded;
and the verification module 104 is configured to verify the coding result by using the check code, and after the verification is completed, transmit the coding result to the target set top box so that the target set top box decodes the coding result by using its own decoding function.
The working principle of the technical scheme is as follows: first, the processing module detects whether the target set-top box has sent an audio request; if so, it acquires the voice data to be coded from the preset server and processes it. Next, the coding module codes the processed voice data, and the generating module generates a check code from the characteristic parameters of the voice data to be coded. Finally, the verification module verifies the coded voice data; once verification passes, the coded voice data is deemed qualified and transmitted to the target set-top box, which decodes it with its own decoding function, converting the digital signal of the voice data into an analog signal for playback.
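The module flow above can be sketched end to end. The following Python example is a minimal illustration under stated assumptions, not the patented implementation: clipping stands in for the processing step, little-endian 16-bit PCM stands in for the unspecified coding method, and a CRC-32 computed from the processed voice data (rather than from the encoder output) stands in for the check code. All function names are hypothetical.

```python
import struct
import zlib
from typing import Callable, List


def process(samples: List[float]) -> List[float]:
    """Processing-module stand-in: clip samples to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s)) for s in samples]


def pcm16(samples: List[float]) -> bytes:
    """Quantize to little-endian 16-bit PCM (the assumed coding format)."""
    q = [int(round(s * 32767)) for s in samples]
    return struct.pack(f"<{len(q)}h", *q)


def encode(samples: List[float]) -> bytes:
    """Coding-module stand-in."""
    return pcm16(samples)


def check_code(samples: List[float]) -> int:
    """Generating-module stand-in: CRC-32 of the expected quantized parameters,
    derived from the voice data to be coded, not from the encoder output."""
    return zlib.crc32(pcm16(samples))


def run_pipeline(samples: List[float],
                 encoder: Callable[[List[float]], bytes] = encode) -> bytes:
    processed = process(samples)
    encoded = encoder(processed)
    if zlib.crc32(encoded) != check_code(processed):  # verification module
        raise ValueError("coding result failed verification")
    return encoded  # would be transmitted to the target set-top box for decoding


print(run_pipeline([0.0, 0.5, -0.5, 2.0]).hex())
```

Passing a faulty `encoder` (for example one that flips a bit) makes `run_pipeline` raise instead of transmitting, which mirrors the "verify before transmit" ordering described above.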
The beneficial effects of the above technical scheme are as follows: verifying the coded voice data against the characteristic parameters of the voice data greatly reduces the probability of misrecognition and erroneous coding, improves accuracy, keeps the audio and video on the set-top box in correspondence, and improves the viewing experience. Furthermore, verification with a check code prevents misrecognition caused by sound-quality problems, further improving coding efficiency and accuracy. This solves the prior-art problem that, because the pronunciation spectrum is recognized automatically by the system, misrecognition is difficult to avoid, the audio and video on the set-top box fail to correspond, and the viewing experience suffers.
In one embodiment, as shown in fig. 2, the processing module includes:
the detection submodule 1011 is used for detecting whether the target set top box sends an audio request or not, and if so, generating an acquisition instruction;
a first obtaining sub-module 1012, configured to obtain, according to the obtaining instruction, a voice analog signal of target voice data corresponding to the audio request from the preset server;
an extracting sub-module 1013 for extracting a noise signal from the voice analog signal;
the processing sub-module 1014 is configured to perform elimination processing on the noise signal in the voice analog signal, obtain a processed voice analog signal, and determine the processed voice analog signal as voice data to be encoded.
The beneficial effects of the above technical scheme are as follows: eliminating the noise signal from the voice analog signal further ensures the coding precision of the voice data and removes interference from useless signals, which further improves coding efficiency.
In one embodiment, as shown in fig. 3, the encoding module includes:
a first determining submodule 1021, configured to determine a signal frequency corresponding to a frame signal of each frame in the speech data to be encoded;
the classification submodule 1022 is configured to divide the voice analog signal corresponding to the voice data to be encoded into a long frame signal and a short frame signal according to the signal frequency of each frame signal;
the calculating submodule 1023 is used for calculating the transformation coefficient of each frame signal in the long frame signal category and the short frame signal category respectively and determining the target coding mode of the voice data to be coded according to the transformation coefficient;
and the encoding submodule 1024 is configured to encode the voice data to be encoded by using the target encoding manner.
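The framing, classification, and mode-selection steps above can be illustrated with a small Python sketch. It fills in details the text leaves open, all of which are assumptions: frame frequency is estimated from the zero-crossing rate, frames below a hypothetical 1 kHz split form the long-frame class, the DCT-II supplies the transform coefficients, and the coding mode for each class is chosen by a hypothetical energy-compaction rule ("transform" if most energy sits in the lowest quarter of coefficients, otherwise "pcm").

```python
import math
from typing import Dict, List, Tuple

Frame = List[float]


def zero_crossing_hz(frame: Frame, sample_rate: float) -> float:
    """Estimate a frame's frequency from its zero crossings (two per cycle)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings * sample_rate / (2 * len(frame))


def dct_ii(frame: Frame) -> List[float]:
    """Unnormalized DCT-II: the transform coefficients of one frame."""
    n = len(frame)
    return [sum(x * math.cos(math.pi / n * (t + 0.5) * k) for t, x in enumerate(frame))
            for k in range(n)]


def compaction(coeffs: List[float], keep_fraction: float = 0.25) -> float:
    """Share of the frame's energy held by its lowest `keep_fraction` of coefficients."""
    total = sum(c * c for c in coeffs) or 1.0
    k = max(1, int(len(coeffs) * keep_fraction))
    return sum(c * c for c in coeffs[:k]) / total


def choose_modes(frames: List[Frame], sample_rate: float = 8000.0,
                 split_hz: float = 1000.0,
                 threshold: float = 0.9) -> Dict[str, Tuple[float, str]]:
    """Classify frames as 'long' (below split_hz) or 'short', then pick a mode per class."""
    classes: Dict[str, List[Frame]] = {"long": [], "short": []}
    for f in frames:
        classes["long" if zero_crossing_hz(f, sample_rate) < split_hz else "short"].append(f)
    modes: Dict[str, Tuple[float, str]] = {}
    for name, group in classes.items():
        if not group:
            continue  # no frames in this class
        score = sum(compaction(dct_ii(f)) for f in group) / len(group)
        modes[name] = (score, "transform" if score >= threshold else "pcm")
    return modes


# Usage: a 250 Hz frame and a 2000 Hz frame, 64 samples each at 8 kHz.
low = [math.sin(2 * math.pi * 250 * t / 8000 + 0.3) for t in range(64)]
high = [math.sin(2 * math.pi * 2000 * t / 8000 + 0.3) for t in range(64)]
print(choose_modes([low, high]))
```

The low-frequency frame concentrates its energy in the first few DCT coefficients, while the 2 kHz frame's energy sits near the middle of the coefficient range, so the two classes receive different modes under this rule.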
The beneficial effects of the above technical scheme are as follows: framing the voice analog signal allows voice signals with different frame rates to be processed appropriately, which helps ensure the coding efficiency of the voice data; further, selecting a suitable target coding mode allows the voice data to be coded according to its own parameters, which also supports coding efficiency.
In one embodiment, the generating module includes:
the second obtaining submodule is used for obtaining the characteristic parameters of the voice data to be coded;
the second determining submodule is used for determining a signal coding sequence of the voice data to be coded according to the characteristic parameters of the voice data to be coded;
the generation submodule is used for generating a standard error correcting code of each sequence factor according to the state parameter of the sequence factor in the signal coding sequence;
and the merging submodule is used for merging all standard error correcting codes to obtain the check code.
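The generating module's steps can be sketched as follows. This Python example assumes that each "sequence factor" is one frame, that its "state parameters" are the frame's 16-bit quantized samples, and that merging means concatenating the per-factor codes in order. CRC-32 stands in for the unspecified "standard error correcting code"; note that CRC-32 only detects errors and cannot correct them. All names are hypothetical.

```python
import struct
import zlib
from typing import List, Sequence


def coding_sequence(frames: Sequence[Sequence[float]]) -> List[bytes]:
    """Hypothetical signal coding sequence: one factor per frame, holding the
    frame's 16-bit quantized samples (its state parameters), little-endian."""
    factors = []
    for frame in frames:
        q = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in frame]
        factors.append(struct.pack(f"<{len(q)}h", *q))
    return factors


def factor_code(factor: bytes) -> int:
    """Per-factor code; CRC-32 is a stand-in that detects but cannot correct errors."""
    return zlib.crc32(factor)


def generate_check_code(frames: Sequence[Sequence[float]]) -> bytes:
    """Merging submodule: concatenate the 32-bit per-factor codes in order."""
    return b"".join(struct.pack("<I", factor_code(f)) for f in coding_sequence(frames))


# Usage: two four-sample frames yield an 8-byte check code.
print(generate_check_code([[0.0] * 4, [0.5] * 4]).hex())
```

Keeping one code per factor, rather than a single checksum over everything, is what lets the verification module locate exactly which coding frames failed.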
The beneficial effects of the above technical scheme are: by generating the standard error correcting code of each sequence factor, an accurate check code can be obtained according to the state parameter of each voice signal in the voice data, a reference basis is provided for the verification of the subsequent coding result, and the verification accuracy and precision are improved.
In one embodiment, a verification module, comprising:
the verification submodule is used for verifying the coding frame corresponding to each sequence factor in the coding result by using the standard error correcting code of each sequence factor to obtain a verification result;
the screening submodule is used for screening target coding frames which fail to pass the verification according to the verification result;
the feedback sub-module is used for feeding the target coding frame back to the coding module for recoding;
the replacing submodule is used for replacing the target coding frame after recoding with the original target coding frame in the coding result;
and the transmission submodule is used for generating a decoding instruction and transmitting the decoding instruction and the replaced coding result to the target set top box so as to enable the target set top box to decode the coding result by utilizing the decoding function of the target set top box.
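The verify, screen, feed back, and replace steps above can be sketched as a loop. This Python example assumes the coding frames line up one-to-one with the per-factor codes, uses CRC-32 as a stand-in for the standard error-correcting code, and models the coding module's re-encoding as a hypothetical `reencode(index)` callback. The retry bound is an added assumption, since the text does not say what happens if re-encoding keeps failing.

```python
import zlib
from typing import Callable, List, Tuple


def verify_frames(frames: List[bytes], codes: List[int]) -> List[bool]:
    """Verification submodule: check each coding frame against its per-factor code."""
    return [zlib.crc32(f) == c for f, c in zip(frames, codes)]


def verify_and_repair(frames: List[bytes], codes: List[int],
                      reencode: Callable[[int], bytes],
                      max_rounds: int = 3) -> Tuple[List[bytes], List[int]]:
    """Screen failing frames, feed them back for re-encoding, and replace them.

    Returns the (possibly repaired) frames and the indices that still fail.
    """
    result = list(frames)
    for _ in range(max_rounds):
        failed = [i for i, ok in enumerate(verify_frames(result, codes)) if not ok]
        if not failed:
            return result, []
        for i in failed:
            result[i] = reencode(i)  # replace the original target coding frame
    failed = [i for i, ok in enumerate(verify_frames(result, codes)) if not ok]
    return result, failed


# Usage: the second frame was mis-coded; re-encoding restores it.
good = [b"frame-0", b"frame-1"]
codes = [zlib.crc32(f) for f in good]
received = [b"frame-0", b"frame-X"]
repaired, still_failing = verify_and_repair(received, codes, lambda i: good[i])
print(repaired == good, still_failing)  # True []
```

Only when `still_failing` is empty would the transmission submodule send the replaced coding result, together with a decoding instruction, to the target set-top box.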
The beneficial effects of the above technical scheme are as follows: each coding frame is verified individually, which improves verification precision; moreover, re-encoding the target coding frames that fail verification corrects the erroneously coded portions of the speech signal and ensures the accuracy of the final coding result.
In one embodiment, before the encoding sub-module encodes the speech data by using the target encoding method, the encoding sub-module is further configured to:
determining the search difficulty of the current code search strategy based on the target coding mode;
determining, based on the performance index of the system and the complexity of the voice data to be coded, whether the search difficulty of the current code search strategy meets the coding requirement; if it does, performing no subsequent operation; otherwise, determining that the code search strategy needs to be replaced;
detecting the current bandwidth of an input signal of the target set top box, and selecting N first code search strategies in a preset strategy library according to the current bandwidth;
and selecting a second code search strategy which is most matched with a target coding mode from the N first code search strategies, and replacing the current code search strategy by using the second code search strategy.
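The strategy-replacement steps above can be sketched in Python. The text does not define how search difficulty, the difficulty requirement, bandwidth suitability, or "best match" are measured, so every field and rule here is a hypothetical stand-in: a numeric `difficulty`, a budget of performance index divided by data complexity, a bandwidth range per strategy, and a per-mode match score.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchStrategy:
    name: str
    difficulty: float                      # hypothetical relative search cost
    min_kbps: float                        # bandwidth range the strategy suits
    max_kbps: float
    mode_match: Dict[str, float] = field(default_factory=dict)  # 0..1 match per coding mode


def difficulty_budget(performance_index: float, complexity: float) -> float:
    """Hypothetical rule: a more capable system tolerates harder searches,
    and more complex voice data lowers the tolerable difficulty."""
    return performance_index / max(complexity, 1e-9)


def select_strategy(current: SearchStrategy, library: List[SearchStrategy], mode: str,
                    performance_index: float, complexity: float,
                    bandwidth_kbps: float, n: int = 3) -> SearchStrategy:
    if current.difficulty <= difficulty_budget(performance_index, complexity):
        return current                     # requirement met: no further operation
    # N "first" strategies: those suited to the current bandwidth, easiest first.
    first = sorted((s for s in library if s.min_kbps <= bandwidth_kbps <= s.max_kbps),
                   key=lambda s: s.difficulty)[:n]
    if not first:
        return current                     # nothing suitable; keep the current one
    # "Second" strategy: the best match for the target coding mode.
    return max(first, key=lambda s: s.mode_match.get(mode, 0.0))


library = [
    SearchStrategy("A", 2.0, 0, 500, {"transform": 0.9}),
    SearchStrategy("B", 3.0, 0, 500, {"transform": 0.5}),
    SearchStrategy("C", 1.0, 600, 1000, {"transform": 1.0}),
]
current = SearchStrategy("full", 10.0, 0, 1000)
print(select_strategy(current, library, "transform", 4.0, 1.0, 300.0).name)  # A
```

Strategy C matches the mode best but is excluded by the 300 kbps bandwidth filter, illustrating why the bandwidth screening happens before the mode matching.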
The beneficial effects of the above technical scheme are as follows: setting a suitable code search strategy for the target coding mode keeps the coding of the voice data smooth and efficient; moreover, selecting the second code search strategy that best matches the target coding mode yields the optimal code search strategy for the characteristics of that mode, which greatly reduces the error rate and further improves coding efficiency.
In one embodiment, the system further comprises:
the decoding evaluation module is used for calculating the maximum decoding amount of the target set top box before the target set top box utilizes the decoding function of the target set top box to decode the coding result, and the calculating step comprises the following steps:
detecting the maximum capacity of the data buffer in the backend of the target set top box;
constructing, according to the maximum capacity, a conversion matrix for the graded operation of the backend data buffer of the target set top box;
calculating the maximum length of the single storage data according to the maximum capacity of the data buffer area;
determining the current length of the voice data according to the coding result, comparing the current length with the maximum length, if the current length is less than or equal to the maximum length, determining that the target set top box can directly decode the voice data, and if the current length is greater than the maximum length, extracting a gain factor of the voice data;
determining the current grading condition of the voice data according to the gain factor and the conversion matrix of the grading work of the data buffer area;
and determining the maximum decoding amount corresponding to the current grading situation as the maximum decoding amount of the target set top box.
The beneficial effects of the above technical scheme are: calculating the maximum decoding amount of the target set top box keeps the set top box operating stably and reduces its load, which ensures stable and efficient decoding of the coded voice data, allows audio and video to play back smoothly, and thus improves the user experience.
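The decoding-evaluation steps can be sketched as below, assuming details the text leaves open. The even split of the buffer across tiers, the `tier_table` of (gain upper bound, maximum decoding amount) pairs used in place of the conversion matrix, and tier selection by the mean gain factor are all hypothetical choices for illustration only.

```python
from typing import Optional, Sequence, Tuple


def max_single_length(buffer_capacity: int, num_tiers: int) -> int:
    # Assumption: the buffer is split evenly across tiers and a single write
    # may fill at most one tier.
    return buffer_capacity // num_tiers


def plan_decoding(encoded_len: int, buffer_capacity: int,
                  tier_table: Sequence[Tuple[float, int]],
                  gain_factors: Sequence[float]) -> Tuple[bool, Optional[int]]:
    """Return (decode_directly, max_decoding_amount).

    tier_table is a hypothetical stand-in for the conversion matrix: a list of
    (gain_upper_bound, max_decoding_amount) pairs sorted by gain_upper_bound.
    """
    limit = max_single_length(buffer_capacity, len(tier_table))
    if encoded_len <= limit:
        return True, None  # fits in one write: decode directly
    # Assumption: the tier is selected by the mean extracted gain factor.
    mean_gain = sum(gain_factors) / len(gain_factors)
    for upper, amount in tier_table:
        if mean_gain <= upper:
            return False, amount
    return False, tier_table[-1][1]  # above every bound: use the top tier
```

Returning `None` for the direct case reflects that the text only states the set top box can decode directly, without assigning a maximum decoding amount.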
In one embodiment, the step of extracting the gain factor of the speech data comprises:
dividing the voice data into preset partitions;
determining the data length and the data integration degree of each partition;
and calculating a gain threshold from the data length and the data integration degree of each partition:
[Equation: gain threshold S (image BDA0003389615060000111)]
where S is the gain threshold, M is the number of preset partitions, i denotes the i-th partition, a_i is the data length in the i-th partition, e is the natural constant (approximately 2.72), a_j is the data length in the j-th partition, b_i is the data integration degree in the i-th partition, and p is a preset reference mean integration degree;
and extracting the first gain factor of each segment of the voice data, and taking those first gain factors that are greater than or equal to the gain threshold (the second gain factors) as the gain factors of the voice data.
The beneficial effects of the above technical scheme are: calculating the gain threshold screens out the effective gain factors in the voice signal, which reduces the number of samples to be processed and improves working efficiency.
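The partition-and-screen steps can be sketched as follows. Because the formula for S is only available as an image, the sketch takes the threshold as an input rather than computing it. It also assumes equal-length contiguous partitions and approximates each segment's "first gain factor" by its RMS amplitude; both are illustrative assumptions, not definitions from the source.

```python
import math
from typing import List, Sequence


def partition(samples: Sequence[float], m: int) -> List[List[float]]:
    """Split samples into m contiguous partitions of near-equal length."""
    size, extra = divmod(len(samples), m)
    parts, start = [], 0
    for i in range(m):
        end = start + size + (1 if i < extra else 0)
        parts.append(list(samples[start:end]))
        start = end
    return parts


def segment_gain(segment: Sequence[float]) -> float:
    # Assumption: the "first gain factor" of a segment is approximated by its
    # RMS amplitude; the source does not define how it is measured.
    if not segment:
        return 0.0
    return math.sqrt(sum(x * x for x in segment) / len(segment))


def screen_gain_factors(samples: Sequence[float], m: int,
                        threshold: float) -> List[float]:
    """Keep the per-partition gain factors that reach the gain threshold S."""
    gains = [segment_gain(p) for p in partition(samples, m)]
    return [g for g in gains if g >= threshold]
```

For instance, `screen_gain_factors([1.0] * 4 + [0.1] * 4 + [2.0] * 4, 3, 0.5)` keeps the gains of the first and last partitions (1.0 and 2.0) and drops the quiet middle one.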
In one embodiment, the step in which the processing sub-module removes the noise signal from the voice analog signal comprises:
acquiring a first amplitude spectrum of the voice analog signal;
comparing the first amplitude spectrum with a second amplitude spectrum of a standard voice signal to obtain a comparison result;
determining the corresponding amplitude spectrum masks according to the comparison result;
determining the noise signal in the voice analog signal according to the amplitude spectrum masks;
calculating the mean of the amplitude spectrum masks and performing error-elimination processing on the mean;
constructing a noise signal elimination model of the voice analog signal according to the noise signal and the processed mean of the amplitude spectrum masks;
acquiring a signal amplitude attenuation coefficient of the noise signal in the voice analog signal;
determining a loss function for eliminating the noise signal according to the signal amplitude attenuation coefficient and the probability distribution of the noise signal in the voice analog signal;
inputting the loss function into the noise signal elimination model to refine the model;
acquiring a standard voice signal and its first signal frequency;
training the refined noise signal elimination model with the standard voice signal and its first signal frequency to obtain a trained noise signal elimination model;
inputting a second signal frequency corresponding to the voice analog signal into the trained noise signal elimination model to obtain a third signal frequency corresponding to the noise signal;
and eliminating, by a preset noise elimination method, the target signal in the voice analog signal whose fourth signal frequency is the same as the third signal frequency, thereby obtaining the voice analog signal with the noise signal removed.
The beneficial effects of the above technical scheme are: constructing the noise signal elimination model allows the frequency of the noise signal in the voice analog signal to be determined accurately, which improves precision, ensures that the noise signal is removed completely, and improves stability.
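The amplitude-spectrum comparison at the start of this procedure can be illustrated with a fixed binary spectral mask. This sketch deliberately swaps in a simple rule (zero any frequency bin whose magnitude exceeds `ratio` times the reference magnitude) in place of the trained elimination model, loss function and attenuation coefficient described above, which the text does not specify in enough detail to reproduce. The `reference` frame stands in for the standard voice signal, and `ratio` is a hypothetical parameter.

```python
import cmath
from typing import List, Sequence, Tuple


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT, keeping only the real part of the result."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def mask_noise(noisy: Sequence[float], reference: Sequence[float],
               ratio: float = 2.0) -> Tuple[List[float], List[bool]]:
    """Binary amplitude-spectrum mask against a clean reference frame.

    A bin is kept when its magnitude does not exceed `ratio` times the
    reference magnitude (plus a tiny tolerance for rounding); otherwise it is
    treated as noise and zeroed before transforming back.
    """
    noisy_spec = dft(noisy)
    ref_spec = dft(reference)
    mask = [abs(x) <= ratio * abs(r) + 1e-9 for x, r in zip(noisy_spec, ref_spec)]
    cleaned = [x if keep else 0j for x, keep in zip(noisy_spec, mask)]
    return idft(cleaned), mask
```

With an 8-sample cosine at bin 1 as the reference and an added cosine at bin 3 as noise, bins 3 and 5 are masked and the inverse transform recovers the reference frame to within rounding error. A production system would use an FFT library and overlapping windows instead of a naive DFT on one frame.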
The embodiment also discloses a method for encoding and decoding voice data of a high-definition set top box which, as shown in Fig. 4, comprises the following steps:
Step S401: acquiring voice data to be coded from a preset server according to an audio request sent by a target set top box, and processing the voice data;
Step S402: coding the processed voice data to obtain a coding result;
Step S403: generating a check code according to the characteristic parameters of the voice data to be coded;
Step S404: verifying the coding result with the check code and, after verification, transmitting the coding result to the target set top box so that the target set top box decodes it with its own decoding function.
The working principle and beneficial effects of the above technical solution have been described for the system and are not repeated here.
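Steps S403 and S404, together with the per-frame verification and re-encoding of claims 4 and 5, can be sketched as follows. This sketch swaps in CRC-32 (Python's `zlib.crc32`), which only detects errors, for the "standard error correcting code" of the text. It also assumes that the per-frame check values are computed over reference encodings of each frame, and that `reencode` is a hypothetical callback into the encoder; neither detail is given in the source.

```python
import zlib
from typing import Callable, List, Sequence, Tuple


def make_check_code(frames: Sequence[bytes]) -> bytes:
    """One CRC-32 per frame, merged into a single check code (4 bytes each)."""
    return b"".join(zlib.crc32(f).to_bytes(4, "big") for f in frames)


def split_check_code(check: bytes) -> List[int]:
    """Recover the per-frame CRC values from a merged check code."""
    return [int.from_bytes(check[i:i + 4], "big") for i in range(0, len(check), 4)]


def verify_and_repair(encoded: Sequence[bytes], check: bytes,
                      reencode: Callable[[int], bytes]) -> Tuple[List[bytes], List[int]]:
    """Verify each coded frame; re-encode and substitute any frame that fails."""
    repaired = list(encoded)
    failed = []
    for i, (frame, expected) in enumerate(zip(encoded, split_check_code(check))):
        if zlib.crc32(frame) != expected:
            failed.append(i)
            repaired[i] = reencode(i)  # feed back to the encoder, then replace
    return repaired, failed
```

For example, if frames `[b"f0", b"f1", b"f2"]` produce the check code and the transmitted result is `[b"f0", b"fX", b"f2"]`, frame 1 is flagged and replaced by its re-encoded version before the result is sent to the set top box.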
Those skilled in the art will understand that the terms "first" and "second" in the present invention merely distinguish different stages of application.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. A high-definition set top box voice data coding and decoding system, characterized by comprising:
the processing module is used for acquiring voice data to be coded from a preset server according to an audio request sent by the target set top box and processing the voice data;
the coding module is used for coding the processed voice data to obtain a coding result;
the generating module is used for generating a check code according to the characteristic parameters of the voice data to be coded;
and the verification module is used for verifying the coding result with the check code and, after verification, transmitting the coding result to the target set top box so that the target set top box decodes the coding result with its own decoding function.
2. The system of claim 1, wherein the processing module comprises:
the detection submodule is used for detecting whether the target set top box has sent an audio request and, if so, generating an acquisition instruction;
the first acquisition submodule is used for acquiring a voice analog signal of target voice data corresponding to the audio request from the preset server according to the acquisition instruction;
the extraction submodule is used for extracting a noise signal in the voice analog signal;
and the processing submodule is used for eliminating the noise signal from the voice analog signal to obtain a processed voice analog signal, and designating the processed voice analog signal as the voice data to be coded.
3. The system of claim 1, wherein the encoding module comprises:
the first determining submodule is used for determining the signal frequency corresponding to the frame signal of each frame in the voice data to be coded;
the classification submodule is used for dividing the voice analog signal corresponding to the voice data to be coded into long-frame signals and short-frame signals according to the signal frequency of each frame signal;
the calculation submodule is used for calculating the transformation coefficient of each frame signal in the long-frame and short-frame categories, and determining the target coding mode of the voice data to be coded according to the transformation coefficients;
and the coding submodule is used for coding the voice data to be coded by utilizing the target coding mode.
4. The system of claim 1, wherein the generating module comprises:
the second obtaining submodule is used for obtaining the characteristic parameters of the voice data to be coded;
the second determining submodule is used for determining a signal coding sequence of the voice data to be coded according to the characteristic parameters of the voice data to be coded;
the generation submodule is used for generating a standard error correcting code of each sequence factor according to the state parameter of the sequence factor in the signal coding sequence;
and the merging submodule is used for merging all standard error correcting codes to obtain the check code.
5. The voice data encoding and decoding system of claim 4, wherein the verification module comprises:
the verification submodule is used for verifying the coding frame corresponding to each sequence factor in the coding result by using the standard error correcting code of each sequence factor to obtain a verification result;
the screening submodule is used for screening target coding frames which fail to pass the verification according to the verification result;
the feedback sub-module is used for feeding the target coding frame back to the coding module for recoding;
the replacing submodule is used for replacing the original target coding frame in the coding result with the re-encoded target coding frame;
and the transmission submodule is used for generating a decoding instruction and transmitting the decoding instruction and the replaced coding result to the target set top box so as to enable the target set top box to decode the coding result by utilizing the decoding function of the target set top box.
6. The system for encoding and decoding speech data of a high-definition set-top box according to claim 3, wherein before the encoding sub-module encodes the speech data by using the target encoding method, the system is further configured to:
determining the searching difficulty of the current code searching strategy based on the target coding mode;
determining, based on the performance indexes of the system and the complexity of the voice data to be coded, whether the searching difficulty of the current code searching strategy meets the coding requirement; if so, performing no further operation; otherwise, determining that the code searching strategy needs to be replaced;
detecting the current bandwidth of the input signal of the target set top box, and selecting N first code search strategies from a preset strategy library according to the current bandwidth;
and selecting, from the N first code search strategies, the second code search strategy that best matches the target coding mode, and replacing the current code search strategy with the second code search strategy.
7. The system for encoding and decoding speech data of a high definition set top box according to claim 1, further comprising:
the decoding evaluation module is configured to calculate the maximum decoding amount of the target set top box before the target set top box decodes the coding result with its own decoding function, the calculation comprising the following steps:
detecting the maximum capacity of the back-end data buffer of the target set top box;
constructing, according to the maximum capacity, a conversion matrix for the tiered operation of the back-end data buffer of the target set top box;
calculating, from the maximum capacity of the data buffer, the maximum length of data that can be stored in a single write;
determining the current length of the voice data from the coding result and comparing it with the maximum length: if the current length is less than or equal to the maximum length, determining that the target set top box can decode the voice data directly; if the current length is greater than the maximum length, extracting the gain factor of the voice data;
determining the current tier of the voice data according to the gain factor and the conversion matrix for the tiered operation of the data buffer;
and determining the maximum decoding amount corresponding to the current tier as the maximum decoding amount of the target set top box.
8. The speech data coding and decoding system of claim 2, wherein the processing sub-module removes the noise signal from the voice analog signal through the following process:
acquiring a first amplitude spectrum of the voice analog signal;
comparing the first amplitude spectrum with a second amplitude spectrum of a standard voice signal to obtain a comparison result;
determining the corresponding amplitude spectrum masks according to the comparison result;
determining the noise signal in the voice analog signal according to the amplitude spectrum masks;
calculating the mean of the amplitude spectrum masks and performing error-elimination processing on the mean;
constructing a noise signal elimination model of the voice analog signal according to the noise signal and the processed mean of the amplitude spectrum masks;
acquiring a signal amplitude attenuation coefficient of the noise signal in the voice analog signal;
determining a loss function for eliminating the noise signal according to the signal amplitude attenuation coefficient and the probability distribution of the noise signal in the voice analog signal;
inputting the loss function into the noise signal elimination model to refine the model;
acquiring a standard voice signal and its first signal frequency;
training the refined noise signal elimination model with the standard voice signal and its first signal frequency to obtain a trained noise signal elimination model;
inputting a second signal frequency corresponding to the voice analog signal into the trained noise signal elimination model to obtain a third signal frequency corresponding to the noise signal;
and eliminating, by a preset noise elimination method, the target signal in the voice analog signal whose fourth signal frequency is the same as the third signal frequency, thereby obtaining the voice analog signal with the noise signal removed.
9. A high-definition set top box voice data coding and decoding method is characterized by comprising the following steps:
acquiring voice data to be encoded from a preset server according to an audio request sent by a target set top box and processing the voice data;
coding the processed voice data to obtain a coding result;
generating a check code according to the characteristic parameters of the voice data to be coded;
and verifying the coding result with the check code and, after verification, transmitting the coding result to the target set top box so that the target set top box decodes it with its own decoding function.
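The frame classification of claim 3 can be sketched as follows, under assumptions the claim does not settle. Each frame's "signal frequency" is estimated from its zero-crossing count, frames at or above a hypothetical `cutoff_hz` are treated as short frames, and an unnormalized DCT-II serves as one possible transform for the transformation coefficients. The rule for choosing the target coding mode from those coefficients is not specified, so the sketch stops at computing them.

```python
import math
from typing import List, Sequence, Tuple


def estimate_frequency(frame: Sequence[float], sample_rate: float) -> float:
    """Rough frequency estimate from the zero-crossing count of a frame."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings * sample_rate / (2 * len(frame))


def dct_ii(frame: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II coefficients of one frame."""
    n = len(frame)
    return [sum(x * math.cos(math.pi / n * (t + 0.5) * k) for t, x in enumerate(frame))
            for k in range(n)]


def classify_frames(frames: Sequence[Sequence[float]], sample_rate: float,
                    cutoff_hz: float) -> Tuple[List[int], List[int]]:
    """Split frame indices into (long_frames, short_frames).

    Assumption: frames whose estimated frequency is at or above cutoff_hz are
    treated as short frames; all others as long frames.
    """
    long_idx, short_idx = [], []
    for i, frame in enumerate(frames):
        if estimate_frequency(frame, sample_rate) >= cutoff_hz:
            short_idx.append(i)
        else:
            long_idx.append(i)
    return long_idx, short_idx
```

At an 8 kHz sampling rate, an 80-sample frame of a 100 Hz tone stays in the long-frame group and a 1 kHz tone moves to the short-frame group when `cutoff_hz` is 500.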
CN202111460183.1A 2021-12-02 2021-12-02 High-definition set top box voice data encoding and decoding system and method Active CN114171034B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111460183.1A CN114171034B (en) 2021-12-02 2021-12-02 High-definition set top box voice data encoding and decoding system and method

Publications (2)

Publication Number Publication Date
CN114171034A 2022-03-11
CN114171034B 2024-05-14

Family

ID=80482423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111460183.1A Active CN114171034B (en) 2021-12-02 2021-12-02 High-definition set top box voice data encoding and decoding system and method

Country Status (1)

Country Link
CN (1) CN114171034B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01257999A (en) * 1988-04-08 1989-10-16 Nec Corp Voice signal encoding and decoding method, voice signal encoder and voice signal decoder
CN101165778A (en) * 2006-10-18 2008-04-23 宝利通公司 Dual-transform coding of audio signals
KR100917990B1 (en) * 2009-02-16 2009-09-18 주식회사 빅슨 Settop-box having voice recognition function of singing room and method thereof
CN101854308A (en) * 2010-06-09 2010-10-06 武汉必联网络技术有限公司 Self-adaptation realizing method of high-tone quality service network of VoIP system
CN103247293A (en) * 2013-05-14 2013-08-14 中国科学院自动化研究所 Coding method and decoding method for voice data
CN103377653A (en) * 2012-04-20 2013-10-30 展讯通信(上海)有限公司 Method and device for searching algebraic code table in speech coding, and speech coding method
RU2019122302A (en) * 2008-01-04 2021-01-18 Долби Интернэшнл Аб AUDIO CODER AND DECODER

Also Published As

Publication number Publication date
CN114171034B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
US9456273B2 (en) Audio mixing method, apparatus and system
JP4053424B2 (en) Robust checksum
CN1494712A (en) Distributed voice recognition system using acoustic feature vector modification
CN106937121A (en) Image decoding and coding method, decoding and code device, decoder and encoder
DE60000087T2 (en) Reliability evaluation of decoded signal blocks for speech recognition on wireless transmission channels
CN111371534B (en) Data retransmission method and device, electronic equipment and storage medium
CN111541900B (en) Security and protection video compression method, device, equipment and storage medium based on GAN
CN111464262B (en) Data processing method, device, medium and electronic equipment
US11871017B2 (en) Video data processing
CN112767955B (en) Audio encoding method and device, storage medium and electronic equipment
CN102376306B (en) Method and device for acquiring level of speech frame
WO2021028236A1 (en) Systems and methods for sound conversion
Grassucci et al. Generative AI meets semantic communication: Evolution and revolution of communication tasks
CN114171034B (en) High-definition set top box voice data encoding and decoding system and method
CN113823303A (en) Audio noise reduction method and device and computer readable storage medium
CN112104863A (en) Method and related device for training video quality evaluation model and evaluating video quality
CN106937127B (en) Display method and system for intelligent search preparation
CN115379291B (en) Code table updating method, device, equipment and storage medium
JP2002530931A (en) Method and apparatus for processing received data in a distributed speech recognition process
CN110958417A (en) Method for removing compression noise of video call video based on voice clue
CN115273828A (en) Training method and device of voice intention recognition model and electronic equipment
CN116074574A (en) Video processing method, device, equipment and storage medium
CN112435675A (en) FEC-based audio coding method, device, equipment and medium
Yang et al. Cascaded trellis-based rate-distortion control algorithm for MPEG-4 advanced audio coding
CN113936669A (en) Data transmission method, system, device, computer readable storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240416

Address after: 518000 5th floor, building a, 15 shapuwei Chuangye Industrial Zone, Songgang street, Bao'an District, Shenzhen City, Guangdong Province

Applicant after: SHENZHEN GAOSUDA TECHNOLOGY Co.,Ltd.

Country or region after: China

Applicant after: Shenzhen Expressway Da Industrial Co.,Ltd.

Address before: 518000 5th floor, building a, 15 shapuwei Chuangye Industrial Zone, Songgang street, Bao'an District, Shenzhen City, Guangdong Province

Applicant before: SHENZHEN GAOSUDA TECHNOLOGY Co.,Ltd.

Country or region before: China

GR01 Patent grant