CN111294642B - Video stream playing method and device - Google Patents

Video stream playing method and device

Info

Publication number
CN111294642B
CN111294642B (application number CN201811504930.5A)
Authority
CN
China
Prior art keywords
voiceprint data
audio signal
video stream
voiceprint
played
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811504930.5A
Other languages
Chinese (zh)
Other versions
CN111294642A (en)
Inventor
姚玉兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811504930.5A priority Critical patent/CN111294642B/en
Priority to PCT/CN2019/124395 priority patent/WO2020119692A1/en
Publication of CN111294642A publication Critical patent/CN111294642A/en
Application granted granted Critical
Publication of CN111294642B publication Critical patent/CN111294642B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3226 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3226 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
    • H04L9/3231 Biological data, e.g. fingerprint, voice or retina
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioethics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An embodiment of the present application provides a video stream playing method and apparatus. The method includes the following steps: acquiring voiceprint data of an input audio signal as verification voiceprint data; determining whether the verification voiceprint data matches authorized voiceprint data carried by a video stream to be played, where the authorized voiceprint data is voiceprint data of an audio signal collected when the video stream to be played was encrypted; and if the verification voiceprint data matches the authorized voiceprint data, playing the video stream to be played. The method can use the authorized voiceprint data carried in the video stream to be played to verify the identity of the person who is about to view it. Because a voiceprint is independent of the specific spoken content and differs from person to person, an authorized person can view the video stream to be played normally without memorizing any key, while unrelated persons cannot.

Description

Video stream playing method and device
Technical Field
The present application relates to the field of video security technologies, and in particular, to a method and an apparatus for playing a video stream.
Background
In practice, it may be undesirable for some video streams to be viewed by unrelated persons. Such video streams may be encrypted so that unrelated persons cannot play them normally. In the related art, a video stream may be encrypted with an AES (Advanced Encryption Standard) algorithm using an encryption key, and the encrypted video stream can only be played normally after being decrypted by the AES algorithm with that same key.
However, a simple encryption key is easily cracked, while a complex one is difficult for the user to remember.
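For illustration only, the related-art approach described above might look like the following sketch, which encrypts a buffer of stream data with AES-128-CBC through OpenSSL's EVP interface; the function name, key handling, and cipher mode are illustrative assumptions rather than anything specified by this application.

#include <openssl/evp.h>

/* Related-art sketch: AES encryption of a stream buffer with a shared key.
 * The caller must size 'cipher' at least plain_len + one AES block (16 bytes). */
static int aes_encrypt_buffer(const unsigned char *plain, int plain_len,
                              const unsigned char key[16], const unsigned char iv[16],
                              unsigned char *cipher, int *cipher_len)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len = 0, total = 0;

    if (ctx == NULL)
        return -1;
    /* Anyone who wants to play the stream later must hold this same key. */
    if (EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv) != 1 ||
        EVP_EncryptUpdate(ctx, cipher, &len, plain, plain_len) != 1) {
        EVP_CIPHER_CTX_free(ctx);
        return -1;
    }
    total = len;
    if (EVP_EncryptFinal_ex(ctx, cipher + total, &len) != 1) {
        EVP_CIPHER_CTX_free(ctx);
        return -1;
    }
    *cipher_len = total + len;
    EVP_CIPHER_CTX_free(ctx);
    return 0;
}

Decryption mirrors this call sequence with the EVP_Decrypt* functions, which is exactly where the key-management burden described above arises.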
Disclosure of Invention
An object of the embodiments of the present application is to provide a video stream playing method, so that unrelated persons cannot play the video stream normally and the user does not need to memorize a key. The specific technical solution is as follows:
in a first aspect of an embodiment of the present application, a method for playing a video stream is provided, where the method includes:
acquiring voiceprint data of an input audio signal as verification voiceprint data;
determining whether the verification voiceprint data is matched with authorized voiceprint data carried by a video stream to be played, wherein the authorized voiceprint data is voiceprint data of an audio signal acquired when the video stream to be played is encrypted;
and if the verification voiceprint data is matched with the authorized voiceprint data, playing the video stream to be played.
With reference to the first aspect, in a first possible implementation manner, the authorized voiceprint data is stored in the video stream to be played in the following manner:
storing the bit length of the authorized voiceprint data by using a preset number of bytes from a first storage position in the video stream to be played;
storing the authorized voiceprint data by using the bit-length bytes starting from a second storage position in the video stream to be played;
before the determining whether the verification voiceprint data is matched with authorized voiceprint data carried by a video stream to be played, the method further includes:
reading the preset number of bytes from the first storage position of the video stream to be played to obtain the bit length of the authorized voiceprint data;
and reading the bit length bytes starting from the second storage position of the video stream to be played to obtain the authorized voiceprint data.
With reference to the first aspect, in a second possible implementation manner, before the acquiring the voiceprint data of the input audio signal as the verification voiceprint data, the method further includes:
displaying a first dynamic verification code, wherein the first dynamic verification code is dynamically generated;
carrying out voice recognition on an input audio signal to obtain the voice content of the audio signal;
determining whether the voice content matches the first dynamic verification code;
if the voice content is matched with the dynamic verification code, executing the step of acquiring the voiceprint data of the input audio signal as verification voiceprint data;
and if the voice content is not matched with the dynamic verification code, refusing to play the video stream to be played.
With reference to the first aspect, in a third possible implementation manner, the authorized voiceprint data is obtained by:
displaying a plurality of second dynamic verification codes, wherein the second dynamic verification codes are dynamically generated;
acquiring an audio signal input for each second dynamic verification code;
performing voice recognition on each input audio signal to obtain the voice content of the audio signal;
and if the voice content of each input audio signal is matched with the second dynamic verification code corresponding to the audio signal, acquiring the voiceprint data of all the audio signals as authorized voiceprint data.
With reference to the first aspect, in a fourth possible implementation manner, the acquiring voiceprint data of an input audio signal as verification voiceprint data includes:
carrying out voice activity detection on an input audio signal, and extracting a human voice signal in the audio signal;
and acquiring the voiceprint data of the human voice signal as verification voiceprint data.
With reference to the first aspect, in a fifth possible implementation manner, the acquiring voiceprint data of an input audio signal as verification voiceprint data includes:
performing voiceprint modeling on an input audio signal to obtain voiceprint model data of the audio signal, wherein the voiceprint model data is used as verification voiceprint data;
and the authorized voiceprint data is voiceprint model data of the audio signal acquired when the video stream to be played is encrypted.
In a second aspect of the embodiments of the present application, there is provided a video stream playing apparatus, including:
the voice print identification module is used for acquiring voice print data of the input audio signal as verification voice print data;
the voiceprint comparison module is used for determining whether the verification voiceprint data is matched with authorized voiceprint data carried by the video stream to be played, wherein the authorized voiceprint data is voiceprint data of an audio signal acquired when the video stream to be played is encrypted;
and the code stream playing module is used for playing the video stream to be played if the verification voiceprint data is matched with the authorized voiceprint data.
With reference to the second aspect, in a first possible implementation manner, the authorized voiceprint data is stored in the video stream to be played by:
storing the bit length of the authorized voiceprint data by using a preset number of bytes from a first storage position in the video stream to be played;
storing the authorized voiceprint data by using the bit-length bytes starting from a second storage position in the video stream to be played;
the device further comprises a voiceprint analysis module, configured to read the preset number of bytes starting from the first storage location of the video stream to be played to obtain a bit length of authorized voiceprint data before determining whether the verified voiceprint data matches authorized voiceprint data carried by the video stream to be played;
and reading the bit length bytes starting from the second storage position of the video stream to be played to obtain the authorized voiceprint data.
With reference to the second aspect, in a second possible implementation manner, the apparatus further includes a dynamic verification module, configured to display a first dynamic verification code before the voiceprint data of the input audio signal is obtained and used as verification voiceprint data, where the first dynamic verification code is dynamically generated;
carrying out voice recognition on an input audio signal to obtain the voice content of the audio signal;
determining whether the voice content matches the first dynamic verification code;
if the voice content is not matched with the dynamic verification code, refusing to play the video stream to be played;
the voiceprint recognition module is specifically configured to execute the step of acquiring the voiceprint data of the input audio signal as verification voiceprint data if the voice content matches the dynamic verification code.
With reference to the second aspect, in a third possible implementation manner, the authorized voiceprint data is obtained by:
displaying a plurality of second dynamic verification codes, wherein the second dynamic verification codes are dynamically generated;
acquiring an audio signal input for each second dynamic verification code;
performing voice recognition on each input audio signal to obtain the voice content of the audio signal;
and if the voice content of each input audio signal is matched with the second dynamic verification code corresponding to the audio signal, acquiring the voiceprint data of all the audio signals as authorized voiceprint data.
With reference to the second aspect, in a fourth possible implementation manner, the voiceprint recognition module is specifically configured to perform voice activity detection on an input audio signal, and extract a human voice signal in the audio signal;
and acquiring the voiceprint data of the human voice signal as verification voiceprint data.
With reference to the second aspect, in a fifth possible implementation manner, the voiceprint recognition module is specifically configured to perform voiceprint modeling on an input audio signal to obtain voiceprint model data of the audio signal, where the voiceprint model data is used as verification voiceprint data;
and the authorized voiceprint data is voiceprint model data of the audio signal acquired when the video stream to be played is encrypted.
In a third aspect of embodiments of the present application, there is provided an electronic device, including:
a memory for storing a computer program;
and the processor is used for realizing any video stream playing method when executing the program stored in the memory.
In a fourth aspect of embodiments of the present application, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the video stream playing methods described above.
With the video stream playing method and apparatus provided by the embodiments of the present application, the identity of a person who is about to watch a video stream can be verified using the authorized voiceprint data carried in the video stream to be played. Because a voiceprint is independent of the specific spoken content and differs from person to person, an authorized person can watch the video stream to be played normally without memorizing any key, while unrelated persons cannot. Of course, not all of the advantages described above need to be achieved at the same time when practicing any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a video stream playing method according to an embodiment of the present application;
fig. 2 is another schematic flow chart of a video stream playing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of an authorized voiceprint data acquisition method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of an authorized voiceprint data reading method according to an embodiment of the present application;
fig. 5a is a schematic structural diagram of a video stream playing apparatus according to an embodiment of the present application;
fig. 5b is a schematic structural diagram of a video stream playing apparatus according to an embodiment of the present application;
fig. 5c is a schematic structural diagram of a video stream playing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart of a video stream playing method provided in an embodiment of the present application, and the method may include:
s101, acquiring voiceprint data of the input audio signal as verification voiceprint data.
The human voice signal may be obtained by performing VAD (Voice Activity Detection) processing on the input audio signal to remove the noise and silence in the audio signal. The human voice signal is then input into a pre-trained voiceprint neural network to obtain the verification voiceprint data.
The verification voiceprint data may be different types of data according to actual requirements, and for example, in this embodiment of the present application, the verification voiceprint data may be an unsigned char array with a maximum of 128 bits.
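As a minimal sketch of S101 (not the application's own implementation), the flow below runs voice activity detection followed by a voiceprint model over the input audio and returns a fixed-size voiceprint array. The helpers vad_extract_voice and voiceprint_model_extract are hypothetical stand-ins for the VAD step and the pre-trained voiceprint neural network, and the 128-element bound mirrors the maximum mentioned above.

#include <stddef.h>

#define VOICEPRINT_MAX_SIZE 128  /* assumed bound, mirroring the maximum of 128 mentioned above */

/* Hypothetical helpers standing in for the VAD stage and the pre-trained
 * voiceprint neural network; both return the number of elements written. */
size_t vad_extract_voice(const short *pcm, size_t n_samples, short *voice_out);
size_t voiceprint_model_extract(const short *voice, size_t n_samples,
                                unsigned char out[VOICEPRINT_MAX_SIZE]);

/* S101 sketch: acquire verification voiceprint data from the input audio signal.
 * 'scratch' must hold at least n_samples shorts; returns the number of
 * voiceprint bytes written to verify_print, or 0 if no human voice was found. */
size_t acquire_verification_voiceprint(const short *pcm, size_t n_samples,
                                       short *scratch,
                                       unsigned char verify_print[VOICEPRINT_MAX_SIZE])
{
    /* VAD: remove noise and silence, keeping only the human voice signal. */
    size_t n_voice = vad_extract_voice(pcm, n_samples, scratch);
    if (n_voice == 0)
        return 0;
    /* Feed the human voice signal to the pre-trained voiceprint model. */
    return voiceprint_model_extract(scratch, n_voice, verify_print);
}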
S102, determining whether the verification voiceprint data is matched with authorized voiceprint data carried by the video stream to be played.
The authorized voiceprint data is the voiceprint data of the audio signal collected when the video stream to be played is encrypted. The method of obtaining authorized voiceprint data may be the same as the method of obtaining verified voiceprint data.
The video stream to be played may carry only one authorized voiceprint data or may carry a plurality of authorized voiceprint data. For example, in some application scenarios, a user may want a video stream to be played to be viewed only by himself, and then only his own voice may be included in an audio signal input when the video stream to be played is encrypted. In other application scenarios, a user may want a video stream to be played to be viewed only by designated people including the user, for example, family members of the user may include sounds of the family members in an audio signal input when the video stream to be played is encrypted.
If the video stream to be played carries only one piece of authorized voiceprint data, the similarity between the verification voiceprint data and the authorized voiceprint data can be calculated. If the similarity is higher than a preset similarity threshold, the verification voiceprint data is determined to match the authorized voiceprint data; if the similarity is lower than the preset similarity threshold, the verification voiceprint data is determined not to match the authorized voiceprint data.
If the video stream to be played carries a plurality of pieces of authorized voiceprint data, the similarity between the verification voiceprint data and each piece of authorized voiceprint data can be calculated separately. If the maximum of these similarities is greater than the preset similarity threshold, the verification voiceprint data is determined to match the authorized voiceprint data; otherwise, the verification voiceprint data is determined not to match the authorized voiceprint data.
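A minimal sketch of this matching step follows, assuming voiceprints are compared by cosine similarity over their byte values and laid out as fixed 128-byte arrays; both are assumptions, since the application does not prescribe how similarity is computed.

#include <math.h>

/* Cosine similarity between two voiceprint byte arrays of equal length. */
static double voiceprint_similarity(const unsigned char *a, const unsigned char *b, int len)
{
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (int i = 0; i < len; i++) {
        dot += (double)a[i] * b[i];
        na  += (double)a[i] * a[i];
        nb  += (double)b[i] * b[i];
    }
    if (na == 0.0 || nb == 0.0)
        return 0.0;
    return dot / (sqrt(na) * sqrt(nb));
}

/* S102 sketch: the verification voiceprint matches if its highest similarity to
 * any authorized voiceprint carried by the stream exceeds the preset threshold. */
static int voiceprint_matches(const unsigned char *verify, int len,
                              const unsigned char (*authorized)[128], int n_authorized,
                              double threshold)
{
    double best = 0.0;
    for (int i = 0; i < n_authorized; i++) {
        double s = voiceprint_similarity(verify, authorized[i], len);
        if (s > best)
            best = s;
    }
    return best > threshold;
}

With a single piece of authorized voiceprint data the loop degenerates to one comparison, matching the single-voiceprint case described above.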
S103, if the verification voiceprint data matches the authorized voiceprint data, playing the video stream to be played.
A voiceprint is a biometric characteristic of a person, and the voiceprints of different people can be considered different; therefore, provided the voiceprint model is built accurately enough, the voiceprint data corresponding to different people's voiceprints will also differ.
If the verification voiceprint data matches the authorized voiceprint data, the input audio signal and the audio signal input when the video stream to be played was encrypted can be considered to contain the voice of the same person. Since the person whose voice is contained in the audio signal input when the video stream was encrypted can be considered a person permitted to watch the video stream to be played (hereinafter referred to as an authorized person), the fact that the two audio signals contain the voice of the same person indicates that the person who is about to watch the video stream (hereinafter referred to as the person to be verified) is an authorized person; therefore, the video stream to be played can be played normally in this case. Conversely, if the person to be verified is not an authorized person (i.e., the person to be verified is an unrelated person), the verification voiceprint data will not match the authorized voiceprint data, and the video stream to be played cannot be played normally.
With this embodiment, the authorized voiceprint data is carried in the video stream to be played, and the video stream is played only when the verification voiceprint data matches the carried authorized voiceprint data, so that authorized persons can watch the video stream to be played normally without memorizing a key, while unrelated persons cannot.
Referring to fig. 2, fig. 2 is a schematic flow chart of another video stream playing method provided in the embodiment of the present application, which may include:
s201, displaying the first dynamic verification code.
The first dynamic verification code is dynamically generated, for example based on a chaotic equation or a random number table. Taking the case where the method is executed by a user terminal (such as a mobile phone) as an example, the first dynamic verification code may be generated randomly by the user terminal according to a preset random algorithm, or the user terminal may send a verification request to a server and receive verification information from the server, where the verification information contains the first dynamic verification code dynamically generated by the server. According to actual requirements, the first dynamic verification code may be a character string containing one or more kinds of characters such as numbers, Chinese characters, and letters. In this embodiment of the present application, the first dynamic verification code may be a character string containing only numbers, for example a string of 8 digits such as "59347826", which is easy for the user to recognize.
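As an illustration of the case where the user terminal generates the code itself, the sketch below produces an 8-digit first dynamic verification code; the seeding choice and display wording are assumptions, and a server-generated code would simply replace generate_dynamic_code.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sketch: dynamically generate an 8-digit verification code such as "59347826". */
static void generate_dynamic_code(char out[9])
{
    srand((unsigned)time(NULL));          /* illustrative seeding; any dynamic source works */
    for (int i = 0; i < 8; i++)
        out[i] = (char)('0' + rand() % 10);
    out[8] = '\0';
}

int main(void)
{
    char code[9];
    generate_dynamic_code(code);
    printf("Please read out: %s\n", code); /* display the first dynamic verification code */
    return 0;
}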
S202, carrying out voice recognition on the input audio signal to obtain the voice content of the audio signal.
The audio signal may be input into a pre-trained speech recognition neural network to obtain the voice content of the audio signal. The voice content of the audio signal refers to the content corresponding to the human voice signal in the audio signal. Illustratively, assuming the input audio signal comes from the sound recorded by the audio input device while the user says "ABC", the voice content of the audio signal is "ABC", speech-recognition errors aside.
S203, determining whether the voice content is matched with the first dynamic verification code, if the voice content is matched with the first dynamic verification code, executing S204, and if the voice content is not matched with the first dynamic verification code, executing S207.
The matching between the voice content and the first dynamic verification code may mean that the similarity between the voice content and the first dynamic verification code is higher than a preset threshold. Since the first dynamic verification code is dynamically generated, the first dynamic verification code cannot be known in advance theoretically. If the voice content is matched with the first dynamic verification code, the input audio signal can be considered as the audio signal recorded by the audio input device after the person to be verified observes the displayed first dynamic verification code. In this case, therefore, it can be considered that the voice of the person to be authenticated is entered in the input audio signal.
If the voice content does not match the first dynamic verification code, the input audio signal is not considered to be an audio signal recorded through the audio input device after the person to be verified observed the displayed first dynamic verification code. In this case, the voice of the person to be verified can be considered not to be recorded in the input audio signal. For example, suppose the person to be verified is an unrelated person, the method is executed by a user terminal, and a recording of an authorized person speaking has been stored in a portable audio player in advance; the input audio signal may then be that pre-recorded speech of the authorized person, played back by the person to be verified on the portable audio player and input to the user terminal. Since neither the authorized person nor the person to be verified can know the first dynamic verification code in advance, the authorized person theoretically cannot have spoken the first dynamic verification code, and the person to be verified cannot induce the authorized person to say it; therefore, the voice content of such an audio signal theoretically cannot match the first dynamic verification code.
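One way to realize this comparison, sketched below, is to accept the voice content when its edit distance to the displayed code is within a small bound; the use of Levenshtein distance and the error bound are assumptions, since the application only requires that the similarity exceed a preset threshold.

#include <string.h>

/* Levenshtein edit distance between two short strings (recognized content vs. code). */
static int edit_distance(const char *a, const char *b)
{
    int la = (int)strlen(a), lb = (int)strlen(b);
    int d[64][64];                       /* plenty for short verification codes */
    if (la >= 64 || lb >= 64)
        return 64;                       /* treat overly long input as a non-match */
    for (int i = 0; i <= la; i++) d[i][0] = i;
    for (int j = 0; j <= lb; j++) d[0][j] = j;
    for (int i = 1; i <= la; i++) {
        for (int j = 1; j <= lb; j++) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            int del = d[i - 1][j] + 1, ins = d[i][j - 1] + 1, sub = d[i - 1][j - 1] + cost;
            d[i][j] = del < ins ? del : ins;
            if (sub < d[i][j]) d[i][j] = sub;
        }
    }
    return d[la][lb];
}

/* S203 sketch: the voice content matches the first dynamic verification code when
 * the recognition result is close enough to the displayed code. */
static int content_matches_code(const char *recognized, const char *code, int max_errors)
{
    return edit_distance(recognized, code) <= max_errors;
}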
And S204, acquiring the voiceprint data of the input audio signal as verification voiceprint data.
The step is the same as S101, and reference may be made to the foregoing description about S101, which is not described herein again.
S205, determining whether the verification voiceprint data is matched with authorized voiceprint data carried by the video stream to be played, if the verification voiceprint data is matched with the authorized voiceprint data, executing S206, and if the verification voiceprint data is not matched with the authorized voiceprint data, executing S207.
The step is the same as S102, and reference may be made to the foregoing description about S102, which is not repeated herein.
S206, playing the video stream to be played.
This step is the same as S103, and reference may be made to the foregoing description about S103, which is not described herein again.
And S207, refusing to play the video stream to be played.
In other optional embodiments, the identity of the person to be verified may also be verified by a preset identity verification method; if the person to be verified is verified as an authorized person, the video stream to be played is played.
As analyzed above, if the voice content does not match the first dynamic verification code, the voice of the person to be verified can be considered not to be recorded in the input audio signal. In this case, even if the voiceprint data of the input audio signal were acquired as verification voiceprint data, that verification voiceprint data could not effectively verify whether the person to be verified is an authorized person, so playing of the video stream to be played may simply be refused in order to protect the user's privacy.
Still taking the example given under S203: if the voiceprint data of the input audio signal were nevertheless acquired as the verification voiceprint data, that verification voiceprint data would actually represent the voiceprint of the authorized person rather than the voiceprint of the person to be verified, and the person to be verified might be wrongly determined to be an authorized person. In other words, an unrelated person could impersonate an authorized person by using pre-recorded audio of the authorized person's speech in order to watch the video stream to be played. With this embodiment, requiring the person to be verified to read out the dynamically generated first dynamic verification code effectively prevents an unrelated person from successfully impersonating an authorized person with pre-recorded audio, further improving the security of the video stream to be played.
The video stream playing method provided by the embodiment of the present application determines whether the person to be verified is an authorized person by checking whether the verification voiceprint data matches the authorized voiceprint data. Therefore, the accuracy of the authorized voiceprint data directly affects the accuracy of the video stream playing method provided by the embodiment of the present application. For example, if the voiceprint represented by the authorized voiceprint data is far from the actual voiceprint of the authorized person, the authorized person may be unable to watch the video stream to be played normally. To this end, an embodiment of the present application provides an authorized voiceprint data acquisition method, shown in fig. 3, which includes:
s301, displaying a plurality of second dynamic verification codes.
The number of displayed second dynamic verification codes can be set according to actual requirements; for example, 5 second dynamic verification codes may be displayed. The second dynamic verification codes may be obtained in the same manner as the first dynamic verification code or in a different manner. In this embodiment, each second dynamic verification code may be a character string containing only numbers; in other alternative embodiments, it may also contain one or more kinds of characters such as letters and Chinese characters.
S302, an audio signal input for each second dynamic verification code is acquired.
Taking the case where the method is executed by a user terminal as an example, the user terminal may start microphone (mic) audio acquisition and separately record the user reading out each second dynamic verification code. In other embodiments, S302 may also be performed alternately with S301, for example: display the first second dynamic verification code and acquire the audio signal input for it, then display the second second dynamic verification code and acquire the audio signal input for it, and so on, until the audio signal input for the last second dynamic verification code is acquired.
S303, perform speech recognition on each input audio signal to obtain the speech content of the audio signal.
For the voice content and the obtaining method, reference may be made to the relevant description in S202, and details are not repeated here. In other alternative embodiments, S303 may also be performed alternately with S302, or may also be performed alternately with S301 and S302. For example, after an audio signal input for a certain second dynamic verification code is acquired each time, voice recognition may be performed on the audio signal to obtain voice content of the audio signal, and then an audio signal input for a next second dynamic verification code may be acquired.
S304, if the voice content of each input audio signal is matched with the second dynamic verification code corresponding to the audio signal, acquiring the voiceprint data of the plurality of audio signals as authorized voiceprint data.
If the voice content of an audio signal matches the second dynamic verification code for which the audio signal was input, the audio signal contains the user's voice reading out that second dynamic verification code. If the voice content of every input audio signal matches its corresponding second dynamic verification code, then each audio signal contains the user's voice reading out the corresponding code. In this case, a sufficient number of voice samples of the user can be considered to have been recorded in the plurality of audio signals, and voiceprint data acquired from this sufficient number of voice samples is more accurate.
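Putting S301 to S304 together, an enrollment sketch under the same assumptions as the earlier snippets could look like the following; record_audio and speech_recognize are hypothetical capture and speech-recognition helpers, and the buffer sizes are illustrative.

#include <stdio.h>
#include <stddef.h>

/* Helpers reused from the earlier sketches (all hypothetical). */
void   generate_dynamic_code(char out[9]);
int    content_matches_code(const char *recognized, const char *code, int max_errors);
size_t vad_extract_voice(const short *pcm, size_t n_samples, short *voice_out);
size_t voiceprint_model_extract(const short *voice, size_t n_samples, unsigned char out[128]);
size_t record_audio(short *pcm, size_t max_samples);                               /* hypothetical capture */
void   speech_recognize(const short *pcm, size_t n, char *text, size_t text_size); /* hypothetical ASR */

/* Enrollment sketch (S301-S304): every audio signal must match its own second
 * dynamic verification code; the human-voice samples of all signals are then
 * combined and turned into the authorized voiceprint data. Returns the number
 * of authorized voiceprint bytes, or 0 if enrollment fails. */
size_t enroll_authorized_voiceprint(int n_codes, unsigned char authorized[128])
{
    static short pcm[16000 * 10];        /* up to 10 s of 16 kHz audio per code */
    static short all_voice[16000 * 60];  /* accumulated voice samples (sized for about 6 codes) */
    size_t total_voice = 0;

    for (int i = 0; i < n_codes; i++) {
        char code[9], recognized[64];

        generate_dynamic_code(code);     /* display the i-th second dynamic verification code */
        printf("Please read out: %s\n", code);

        size_t n = record_audio(pcm, sizeof pcm / sizeof pcm[0]);
        speech_recognize(pcm, n, recognized, sizeof recognized);
        if (!content_matches_code(recognized, code, 1))
            return 0;                    /* any mismatch aborts enrollment */

        if (total_voice + n > sizeof all_voice / sizeof all_voice[0])
            return 0;                    /* sketch buffer exhausted */
        total_voice += vad_extract_voice(pcm, n, all_voice + total_voice);
    }
    /* Voiceprint data of all the collected voice samples becomes the authorized data. */
    return voiceprint_model_extract(all_voice, total_voice, authorized);
}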
In some application scenarios, there may be both video streams to be played that are not intended to be viewed by unrelated persons and video streams to be played that may be viewed by everyone. In the embodiment of the present application, an identifier may be set in the header of the encapsulation format to distinguish between the two, and authorized voiceprint data for verifying the identity of the person to be verified is carried only in the video streams to be played that are not intended to be viewed by unrelated persons. In order to read the authorized voiceprint data from the video stream to be played accurately, refer to fig. 4, which is a schematic flowchart of a voiceprint data reading method provided by an embodiment of the present application and may include:
s401, reading a preset number of bytes from a first storage position of a video stream to be played to obtain a bit length of authorized voiceprint data.
The preset number may be set according to actual requirements; for example, in this embodiment of the present application, a four-byte unsigned integer (unsigned int) may be used to store the bit length of the authorized voiceprint data. In other embodiments, other data formats may be used to store the bit length of the authorized voiceprint data.
S402, reading the bit length byte starting from the second storage position of the video stream to be played to obtain authorized voiceprint data.
Illustratively, assuming the bit length is 126, the first 126 bytes starting from the second storage location are read to obtain the authorized voiceprint data. The first storage location and the second storage location may be selected according to actual requirements. In this embodiment, it is assumed that the authorized voiceprint data does not exceed 128 bytes; in order to avoid colliding with other data in the video stream to be played, the start of 4 consecutive reserved bytes may be selected as the first storage location, and the start of 128 consecutive reserved bytes may be selected as the second storage location.
The storage method of the authorized voiceprint data should be consistent with the reading method of the authorized voiceprint data, and will not be described herein again. In the embodiment of the present application, the authorized voiceprint data and the bit length may be stored in the form of a structure, and the format of the structure may be as follows:
{
    unsigned int nModelSize;                 /* bit length of the authorized voiceprint data */
    unsigned char strModel[MODEL_MAX_SIZE];  /* the authorized voiceprint data itself */
}
where nModelSize is unsigned integer data representing the bit length, strModel is an unsigned char array holding the authorized voiceprint data, and MODEL_MAX_SIZE is the maximum length of the authorized voiceprint data.
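Assuming the 4-byte and 128-byte reserved regions described above and the machine's native byte order (neither of which the application mandates), reading the authorized voiceprint data back out of a stream buffer might look like the following sketch; the struct tag and function name are illustrative.

#include <stddef.h>
#include <string.h>

#define MODEL_MAX_SIZE 128  /* assumed maximum size of the authorized voiceprint data */

struct voiceprint_record {
    unsigned int nModelSize;                 /* length of the authorized voiceprint data (read as a byte count, as in the example above) */
    unsigned char strModel[MODEL_MAX_SIZE];  /* the authorized voiceprint data itself */
};

/* Sketch of S401-S402: 'first_pos' points at the 4 reserved bytes holding the length,
 * 'second_pos' at the 128 reserved bytes holding the data. Native byte order is assumed. */
static int read_authorized_voiceprint(const unsigned char *stream, size_t stream_len,
                                      size_t first_pos, size_t second_pos,
                                      struct voiceprint_record *rec)
{
    if (first_pos + sizeof(unsigned int) > stream_len)
        return -1;
    /* S401: read the preset number of bytes (4) to obtain the length. */
    memcpy(&rec->nModelSize, stream + first_pos, sizeof(unsigned int));
    if (rec->nModelSize > MODEL_MAX_SIZE || second_pos + rec->nModelSize > stream_len)
        return -1;
    /* S402: read that many bytes starting from the second storage location. */
    memcpy(rec->strModel, stream + second_pos, rec->nModelSize);
    return 0;
}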
In other application scenarios, if the video stream to be played carries a plurality of pieces of authorized voiceprint data, a plurality of first storage locations and a plurality of second storage locations may be set correspondingly. The preset number of bytes starting from each first storage location is read to obtain the bit length of the corresponding authorized voiceprint data, and for each piece of authorized voiceprint data, that many bytes are read starting from the corresponding second storage location to obtain the authorized voiceprint data.
Referring to fig. 5a, fig. 5a is a schematic structural diagram of a video stream playing apparatus according to an embodiment of the present application, which may include:
a voiceprint recognition module 501, configured to acquire voiceprint data of an input audio signal, where the voiceprint data is used as verification voiceprint data;
a voiceprint comparison module 502, configured to determine whether the verification voiceprint data is matched with authorized voiceprint data carried by the video stream to be played, where the authorized voiceprint data is voiceprint data of an audio signal acquired when the video stream to be played is encrypted;
and the code stream playing module 503 is configured to play the video stream to be played if it is verified that the voiceprint data matches the authorized voiceprint data.
In an alternative embodiment, the authorized voiceprint data is stored in the video stream to be played by:
storing the bit length of the authorized voiceprint data by using a preset number of bytes from a first storage position in the video stream to be played;
storing authorized voiceprint data by using the bit length bytes starting from the second storage position in the video stream to be played;
as shown in fig. 5b, the apparatus may further include a voiceprint parsing module 504, configured to, before determining whether the voiceprint data is matched with authorized voiceprint data carried by the video stream to be played, read a preset number of bytes from a first storage location of the video stream to be played, so as to obtain a bit length of the authorized voiceprint data;
and reading the bit length bytes starting from the second storage position of the video stream to be played to obtain authorized voiceprint data.
In an alternative embodiment, as shown in fig. 5c, the apparatus may further include a dynamic verification module 505, configured to display a first dynamic verification code before the voiceprint data of the input audio signal is acquired as verification voiceprint data, where the first dynamic verification code is dynamically generated;
carrying out voice recognition on an input audio signal to obtain the voice content of the audio signal;
determining whether the voice content matches the first dynamic verification code;
if the voice content is not matched with the dynamic verification code, refusing to play the video stream to be played;
and the voiceprint recognition module is specifically configured to perform the step of acquiring the voiceprint data of the input audio signal as verification voiceprint data if the voice content matches the dynamic verification code.
In an alternative embodiment, the authorized voiceprint data is obtained by:
displaying a plurality of second dynamic verification codes, wherein the second dynamic verification codes are dynamically generated;
acquiring an audio signal input for each second dynamic verification code;
performing voice recognition on each input audio signal to obtain the voice content of the audio signal;
and if the voice content of each input audio signal is matched with the second dynamic verification code corresponding to the audio signal, acquiring the voiceprint data of all the audio signals as authorized voiceprint data.
In an alternative embodiment, the voiceprint recognition module 501 is specifically configured to perform voice activity detection on an input audio signal, and extract a human voice signal in the audio signal;
and acquiring voiceprint data of the human voice signal as verification voiceprint data.
In an optional embodiment, the voiceprint recognition module 501 is specifically configured to perform voiceprint modeling on an input audio signal to obtain voiceprint model data of the audio signal, where the voiceprint model data is used as verification voiceprint data;
the authorized voiceprint data is voiceprint model data of an audio signal acquired when the video stream to be played is encrypted.
An embodiment of the present application further provides an electronic device, as shown in fig. 6, including:
a memory 601 for storing a computer program;
the processor 602 is configured to implement the following steps when executing the program stored in the memory 601:
acquiring voiceprint data of an input audio signal as verification voiceprint data;
determining whether the verification voiceprint data is matched with authorized voiceprint data carried by the video stream to be played, wherein the authorized voiceprint data is voiceprint data of an audio signal collected when the video stream to be played is encrypted;
and if the voiceprint data is matched with the authorized voiceprint data through verification, playing the video stream to be played.
In an alternative embodiment, the authorized voiceprint data is stored in the video stream to be played by:
storing the bit length of the authorized voiceprint data by using a preset number of bytes from a first storage position in the video stream to be played;
storing authorized voiceprint data by using the bit length bytes starting from the second storage position in the video stream to be played;
before determining whether the voiceprint data is matched with authorized voiceprint data carried by the video stream to be played, the method further comprises the following steps:
reading a preset number of bytes from a first storage position of a video stream to be played to obtain the bit length of authorized voiceprint data;
and reading the bit length bytes starting from the second storage position of the video stream to be played to obtain authorized voiceprint data.
In an alternative embodiment, before obtaining the voiceprint data of the input audio signal as the verification voiceprint data, the method further comprises:
displaying a first dynamic verification code, wherein the first dynamic verification code is dynamically generated;
carrying out voice recognition on an input audio signal to obtain the voice content of the audio signal;
determining whether the voice content matches the first dynamic verification code;
if the voice content matches the dynamic verification code, executing the step of acquiring voiceprint data of the input audio signal as verification voiceprint data;
and if the voice content is not matched with the dynamic verification code, refusing to play the video stream to be played.
In an alternative embodiment, the authorized voiceprint data is obtained by:
displaying a plurality of second dynamic verification codes, wherein the second dynamic verification codes are dynamically generated;
acquiring an audio signal input for each second dynamic verification code;
performing voice recognition on each input audio signal to obtain the voice content of the audio signal;
and if the voice content of each input audio signal is matched with the second dynamic verification code corresponding to the audio signal, acquiring the voiceprint data of all the audio signals as authorized voiceprint data.
In an alternative embodiment, acquiring voiceprint data of an input audio signal as verification voiceprint data includes:
carrying out voice activity detection on the input audio signals, and extracting human voice signals in the audio signals;
and acquiring voiceprint data of the human voice signal as verification voiceprint data.
In an alternative embodiment, acquiring the voiceprint data of the input audio signal as verification voiceprint data includes:
performing voiceprint modeling on an input audio signal to obtain voiceprint model data of the audio signal, wherein the voiceprint model data is used as verification voiceprint data;
the authorized voiceprint data is voiceprint model data of an audio signal acquired when the video stream to be played is encrypted.
The memory mentioned in the above electronic device may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In another embodiment provided by the present application, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to execute any one of the video stream playing methods in the foregoing embodiments.
In another embodiment provided by the present application, there is also provided a computer program product containing instructions, which when run on a computer, causes the computer to execute any one of the video stream playing methods in the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the electronic device, the computer-readable storage medium, and the computer program product, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A method for playing a video stream, the method comprising:
acquiring voiceprint data of an input audio signal as verification voiceprint data;
determining whether the verification voiceprint data is matched with authorized voiceprint data carried by a video stream to be played, wherein the authorized voiceprint data is voiceprint data of an audio signal acquired when the video stream to be played is encrypted;
if the verification voiceprint data is matched with the authorized voiceprint data, playing the video stream to be played;
before the acquiring the voiceprint data of the input audio signal as the verification voiceprint data, the method further comprises:
displaying a first dynamic verification code, wherein the first dynamic verification code is dynamically generated;
carrying out voice recognition on an input audio signal to obtain the voice content of the audio signal;
determining whether the voice content matches the first dynamic verification code;
if the voice content is matched with the dynamic verification code, executing the step of acquiring the voiceprint data of the input audio signal as verification voiceprint data;
and if the voice content is not matched with the dynamic verification code, refusing to play the video stream to be played.
2. The method according to claim 1, wherein the authorized voiceprint data is stored in the video stream to be played by:
storing the bit length of the authorized voiceprint data using a preset number of bytes starting from a first storage position in the video stream to be played;
storing the authorized voiceprint data using the bit-length number of bytes starting from a second storage position in the video stream to be played;
before determining whether the verification voiceprint data matches the authorized voiceprint data carried by the video stream to be played, the method further includes:
reading the preset number of bytes starting from the first storage position of the video stream to be played to obtain the bit length of the authorized voiceprint data;
and reading the bit-length number of bytes starting from the second storage position of the video stream to be played to obtain the authorized voiceprint data.
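As an illustration of the length-prefixed layout recited in claim 2, the sketch below embeds and reads back the authorized voiceprint data. The offsets FIRST_POS and SECOND_POS, the 4-byte length field, and the big-endian encoding are assumptions chosen for this example, not values fixed by the claim.

```python
# Illustrative length-prefixed layout: [length field][voiceprint bytes].
FIRST_POS = 0                        # where the length field starts (assumed)
LEN_BYTES = 4                        # "preset number of bytes" holding the bit length (assumed)
SECOND_POS = FIRST_POS + LEN_BYTES   # where the voiceprint data starts (assumed)

def embed_voiceprint(container: bytearray, voiceprint: bytes) -> None:
    """Write the bit length, then the voiceprint data, at the two storage positions."""
    bit_length = len(voiceprint) * 8
    container[FIRST_POS:FIRST_POS + LEN_BYTES] = bit_length.to_bytes(LEN_BYTES, "big")
    container[SECOND_POS:SECOND_POS + len(voiceprint)] = voiceprint

def read_voiceprint(container: bytes) -> bytes:
    """Read the bit length first, then read that many bits' worth of voiceprint bytes."""
    bit_length = int.from_bytes(container[FIRST_POS:FIRST_POS + LEN_BYTES], "big")
    byte_length = bit_length // 8
    return container[SECOND_POS:SECOND_POS + byte_length]
```

In this sketch the container must be pre-allocated large enough to hold the length field plus the voiceprint; a real implementation would carry the fields inside the stream's own container format.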
3. The method of claim 1, wherein the authorized voiceprint data is obtained by:
displaying a plurality of second dynamic verification codes, wherein the second dynamic verification codes are dynamically generated;
acquiring an audio signal input for each second dynamic verification code;
performing voice recognition on each input audio signal to obtain the voice content of the audio signal;
and if the voice content of each input audio signal matches the second dynamic verification code corresponding to that audio signal, acquiring the voiceprint data of all the audio signals as the authorized voiceprint data.
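The enrollment sketch below is a hedged illustration of claim 3: capture_audio, recognize_speech, and extract_voiceprint are placeholder callables supplied by the caller, and averaging the per-utterance feature vectors is only one possible way to combine the voiceprint data of all the audio signals.

```python
# Illustrative enrollment sketch for claim 3 (hypothetical helpers passed in as callables).
import secrets
import numpy as np

def enroll(capture_audio, recognize_speech, extract_voiceprint, num_codes: int = 3):
    """Return aggregated authorized voiceprint data, or None if any code is misread."""
    voiceprints = []
    for _ in range(num_codes):
        code = "".join(secrets.choice("0123456789") for _ in range(6))
        print(f"Please read aloud: {code}")        # display one second dynamic verification code
        audio = capture_audio()                    # audio signal input for this code
        if recognize_speech(audio).strip() != code:
            return None                            # any mismatch aborts enrollment
        voiceprints.append(extract_voiceprint(audio))
    # One simple aggregation choice: average the per-utterance feature vectors.
    return np.mean(np.stack(voiceprints), axis=0)
```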
4. The method according to claim 1, wherein the obtaining of the voiceprint data of the input audio signal as the verification voiceprint data comprises:
performing voice activity detection on an input audio signal, and extracting a human voice signal from the audio signal;
and acquiring the voiceprint data of the human voice signal as verification voiceprint data.
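One simple, purely illustrative way to realize the voice activity detection of claim 4 is an RMS-energy gate over fixed-length frames; a production system would more likely use a dedicated VAD, and the frame length and threshold below are arbitrary example values that assume normalized floating-point samples.

```python
# Illustrative energy-based voice activity detection (not the specific VAD of the claim).
import numpy as np

def extract_voice_frames(samples: np.ndarray, frame_len: int = 400, threshold: float = 0.02) -> np.ndarray:
    """Keep only frames whose RMS energy exceeds a threshold (assumed to contain human voice)."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    voiced = [f for f in frames if np.sqrt(np.mean(f.astype(np.float64) ** 2)) > threshold]
    return np.concatenate(voiced) if voiced else np.array([], dtype=samples.dtype)
```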
5. The method according to claim 1, wherein the obtaining of the voiceprint data of the input audio signal as the verification voiceprint data comprises:
performing voiceprint modeling on an input audio signal to obtain voiceprint model data of the audio signal, wherein the voiceprint model data is used as verification voiceprint data;
and the authorized voiceprint data is voiceprint model data of the audio signal acquired when the video stream to be played is encrypted.
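Claim 5 does not prescribe a particular voiceprint model; as one common illustrative choice, the sketch below builds voiceprint model data from the time-averaged MFCC features of an utterance (librosa is used only for this example).

```python
# One possible (illustrative) voiceprint-modeling choice: a mean MFCC feature vector.
import numpy as np
import librosa

def voiceprint_model(samples: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Return a fixed-length voiceprint model vector for an audio signal."""
    mfcc = librosa.feature.mfcc(y=samples.astype(np.float32), sr=sample_rate, n_mfcc=20)
    return mfcc.mean(axis=1)   # average over time -> 20-dimensional voiceprint model data
```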
6. A video stream playback apparatus, characterized in that the apparatus comprises:
the voiceprint recognition module is used for acquiring voiceprint data of an input audio signal as verification voiceprint data;
the voiceprint comparison module is used for determining whether the verification voiceprint data matches authorized voiceprint data carried by the video stream to be played, wherein the authorized voiceprint data is voiceprint data of an audio signal acquired when the video stream to be played is encrypted;
the code stream playing module is used for playing the video stream to be played if the verification voiceprint data is matched with the authorized voiceprint data;
the device also comprises a dynamic verification module, which is used for displaying a first dynamic verification code before the voiceprint data of the input audio signal is acquired and used as verification voiceprint data, wherein the first dynamic verification code is dynamically generated;
carrying out voice recognition on an input audio signal to obtain the voice content of the audio signal;
determining whether the voice content matches the first dynamic verification code;
if the voice content does not match the first dynamic verification code, refusing to play the video stream to be played;
and the voiceprint recognition module is specifically configured to perform the step of acquiring the voiceprint data of the input audio signal as verification voiceprint data if the voice content matches the first dynamic verification code.
7. The apparatus according to claim 6, wherein the authorized voiceprint data is stored in the video stream to be played by:
storing the bit length of the authorized voiceprint data by using a preset number of bytes from a first storage position in the video stream to be played;
storing the authorized voiceprint data by using the bit-length bytes starting from a second storage position in the video stream to be played;
the device further comprises a voiceprint analysis module, configured to read the preset number of bytes starting from the first storage location of the video stream to be played to obtain the bit length of the authorized voiceprint data before determining whether the verification voiceprint data matches the authorized voiceprint data carried by the video stream to be played;
and reading the bit length bytes starting from the second storage position of the video stream to be played to obtain the authorized voiceprint data.
8. The apparatus of claim 6, wherein the authorized voiceprint data is obtained by:
displaying a plurality of second dynamic verification codes, wherein the second dynamic verification codes are dynamically generated;
acquiring an audio signal input for each second dynamic verification code;
performing voice recognition on each input audio signal to obtain the voice content of the audio signal;
and if the voice content of each input audio signal matches the second dynamic verification code corresponding to that audio signal, acquiring the voiceprint data of all the audio signals as the authorized voiceprint data.
9. The apparatus according to claim 6, wherein the voiceprint recognition module is specifically configured to perform voice activity detection on an input audio signal, and extract a human voice signal from the audio signal;
and acquiring the voiceprint data of the human voice signal as verification voiceprint data.
10. The apparatus according to claim 6, wherein the voiceprint recognition module is specifically configured to perform voiceprint modeling on an input audio signal to obtain voiceprint model data of the audio signal as verification voiceprint data;
and the authorized voiceprint data is voiceprint model data of the audio signal acquired when the video stream to be played is encrypted.
CN201811504930.5A 2018-12-10 2018-12-10 Video stream playing method and device Active CN111294642B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811504930.5A CN111294642B (en) 2018-12-10 2018-12-10 Video stream playing method and device
PCT/CN2019/124395 WO2020119692A1 (en) 2018-12-10 2019-12-10 Video stream playing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811504930.5A CN111294642B (en) 2018-12-10 2018-12-10 Video stream playing method and device

Publications (2)

Publication Number Publication Date
CN111294642A CN111294642A (en) 2020-06-16
CN111294642B true CN111294642B (en) 2021-06-18

Family

ID=71029831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811504930.5A Active CN111294642B (en) 2018-12-10 2018-12-10 Video stream playing method and device

Country Status (2)

Country Link
CN (1) CN111294642B (en)
WO (1) WO2020119692A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114121050A (en) * 2021-11-30 2022-03-01 云知声智能科技股份有限公司 Audio playing method and device, electronic equipment and storage medium
CN117241253B (en) * 2023-11-13 2024-01-19 成都智科通信技术股份有限公司 Multi-audio self-adaptive switching play processing method, system, terminal and medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4351659B2 (en) * 2005-07-28 2009-10-28 積體數位股▲分▼有限公司 Voiceprint password key system
JP4931924B2 (en) * 2006-07-19 2012-05-16 パナソニック株式会社 Media data processing apparatus and media data processing method
CN103679046B (en) * 2012-09-17 2017-07-25 联想(北京)有限公司 Data creation method and apparatus and data access method and device
CN104935429B (en) * 2014-03-17 2019-06-04 Tcl集团股份有限公司 A kind of data processing method and its system using multi-enciphering
CN105279447A (en) * 2014-06-23 2016-01-27 中兴通讯股份有限公司 Method and device for data encryption, and method and device for data decryption
CN104573550A (en) * 2014-12-27 2015-04-29 小米科技有限责任公司 Method and device for protecting data
KR102640423B1 (en) * 2017-01-31 2024-02-26 삼성전자주식회사 Voice input processing method, electronic device and system supporting the same
CN109219003B (en) * 2018-08-22 2022-02-22 Oppo广东移动通信有限公司 Information encryption method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111294642A (en) 2020-06-16
WO2020119692A1 (en) 2020-06-18

Similar Documents

Publication Publication Date Title
US10446134B2 (en) Computer-implemented system and method for identifying special information within a voice recording
US10223512B2 (en) Voice-based liveness verification
US10650827B2 (en) Communication method, and electronic device therefor
CN108702354B (en) Liveness determination based on sensor signals
US8862888B2 (en) Systems and methods for three-factor authentication
CN108417216B (en) Voice verification method and device, computer equipment and storage medium
US9262615B2 (en) Methods and systems for improving the security of secret authentication data during authentication transactions
CN104331652A (en) Dynamic cipher generation method for electronic equipment for fingerprint and voice recognition
CN103678977A (en) Method and electronic device for protecting information security
WO2018129869A1 (en) Voiceprint verification method and apparatus
JP6220304B2 (en) Voice identification device
CN111294642B (en) Video stream playing method and device
CN102377729A (en) User registration and logon method by combining speaker speech identity authentication and account code protection in network games
KR20170003366A (en) Communication method, apparatus and system based on voiceprint
CN110379433A (en) Method, apparatus, computer equipment and the storage medium of authentication
KR102248687B1 (en) Telemedicine system and method for using voice technology
US20220272131A1 (en) Method, electronic device and system for generating record of telemedicine service
CN111785280A (en) Identity authentication method and device, storage medium and electronic equipment
CN107276962A (en) A kind of dynamic password voice Verification System and method with reference to any gesture
CN112149081A (en) Identity authentication method and device
JP7254316B1 (en) Program, information processing device, and method
EP4170526A1 (en) An authentication system and method
US20240127825A1 (en) Authentication method and system
CN111882740A (en) Entrance guard verification method, entrance guard device, server and system
CN112735426A (en) Voice verification method and system, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant