CN115119009B - Video alignment method, video encoding device and storage medium - Google Patents


Info

Publication number
CN115119009B
CN115119009B (application CN202210759941.8A)
Authority
CN
China
Prior art keywords
image group
time
video
time stamp
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210759941.8A
Other languages
Chinese (zh)
Other versions
CN115119009A (en)
Inventor
王健
王龙君
王�忠
何广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202210759941.8A priority Critical patent/CN115119009B/en
Publication of CN115119009A publication Critical patent/CN115119009A/en
Application granted granted Critical
Publication of CN115119009B publication Critical patent/CN115119009B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to a video alignment method, a video encoding device, and a storage medium. During encoding, the standard index number of an image group is corrected based on the group's encoding delay, and the corrected index number is greater than the standard index number. When slices are generated, the encoding delay can therefore be tracked through the corrected index number. This avoids the situation in which a playback end, downloading slices from the generation server according to standard-frame-rate timing, requests a slice that has not yet been produced (i.e., the live cloud exhibits slice-production delay), and thereby accurately aligns live code streams of different definitions in a live-broadcast scenario.

Description

Video alignment method, video encoding device and storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a video alignment method, a video encoding device, and a storage medium.
Background
Currently, the HLS (HTTP Live Streaming) protocol is the dominant technology for implementing live-streaming services. Delivering live video over HLS can mitigate playback stuttering: for example, in scenarios with pronounced network fluctuation, such as riding a subway, live video frequently stutters, and switching to a lower-definition multimedia stream largely avoids it.
However, since slices of multimedia streams of different definitions are generally produced on different servers, or in different task processes of the same server, switching between streams of different definitions in a live-video scenario raises the problem that the slices and audio/video timestamps of the different-definition streams cannot be aligned accurately.
Disclosure of Invention
The present application provides a video alignment method, a video encoding device, and a storage medium to solve the problem that slices and audio/video timestamps of multimedia streams of different definitions cannot be aligned accurately.
In a first aspect, a video encoding method is provided, including:
for any one of N live code streams, acquiring the encoding time, standard index number, and start reference time of any image group in that live code stream; the start reference time is the encoding completion time, on an encoding-production virtual time axis, of the first video frame among the N live code streams;
calculating the encoding delay of the image group based on the encoding time, the standard index number, and the start reference time;
updating the standard index number according to the encoding delay to obtain the index number of the image group; when the encoding delay is greater than the standard image-group duration, the index number is greater than the standard index number.
Optionally, calculating the encoding delay of the image group based on the encoding time, the standard index number, and the start reference time includes:
calculating the product of the standard index number and the standard image-group duration to obtain the encoding-time offset of the image group on the encoding-production virtual time axis;
calculating the sum of the encoding-time offset and the start reference time to obtain a summation result;
and calculating the difference between the encoding time and the summation result to obtain the encoding delay.
Optionally, updating the standard index number according to the encoding delay to obtain the index number of the image group includes:
determining whether the encoding delay is greater than N standard image-group durations;
if so, determining the index number of the image group to be the standard index number plus N;
if not, updating N = N - 1 and returning to the determination step. Ultimately, if the encoding delay is greater than one standard image-group duration, the index number of the image group is the standard index number plus 1; otherwise it is the standard index number.
In a second aspect, a video alignment method is provided, including:
acquiring any image group in one live code stream;
parsing the image group to acquire its alignment parameters, where the alignment parameters include a start reference time, the index number of the image group, and the timestamp offset of the first video frame in the image group; the start reference time is the encoding completion time, on an encoding-production virtual time axis, of the first video frame among the N live code streams, and the index number of the image group is related to the group's encoding delay; the timestamp offset is the offset of the encoding timestamp of the first video frame relative to the first video frame of a target image group, the target image group being the first image group processed when the slicing task that slices this image group was started;
determining the slicing position of the image group in the live code stream according to its index number;
and determining the audio playing time and video playing time of the image group based on the start reference time and the timestamp offset.
Optionally, determining the slicing position of the image group in the live code stream according to its index number includes:
calculating the product of the index number and the standard image-group duration to obtain the standard encoding-time offset of the image group on the encoding-production virtual time axis;
obtaining the quotient and remainder of dividing the standard encoding-time offset by a preset slicing-period duration;
determining the serial number of the slice to which the image group belongs based on the quotient, and determining the position of the image group within that slice based on the remainder;
and taking the slice serial number and the position within the slice as the slicing position.
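The quotient/remainder computation above can be sketched as follows. This is a minimal illustration in Python; the function and parameter names (`slicing_position`, `gop_index`, and so on) are assumptions for clarity, not identifiers from the patent.

```python
def slicing_position(gop_index: int, gop_time_ms: int, slice_period_ms: int):
    """Locate a group of pictures within the slice sequence.

    gop_index: (corrected) index number of the GOP
    gop_time_ms: standard image-group duration, e.g. 2000 ms
    slice_period_ms: preset slicing-period duration, e.g. 6000 ms
    """
    # Standard encoding-time offset of the GOP on the virtual time axis.
    offset_ms = gop_index * gop_time_ms
    # Quotient -> serial number of the slice the GOP belongs to;
    # remainder -> the GOP's offset inside that slice.
    slice_no, within_ms = divmod(offset_ms, slice_period_ms)
    return slice_no, within_ms // gop_time_ms  # position as a GOP ordinal

# A 2 s GOP with index 4 under a 6 s slice period lands in slice 1, position 1.
print(slicing_position(4, 2000, 6000))  # → (1, 1)
```

Because every slicing server derives the position purely from the index number carried in the stream, streams of different definitions land their GOPs in identically numbered slices.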
Optionally, determining the audio playing time and the video playing time of the image group based on the start reference time and the timestamp offset includes:
calculating the sum of the start reference time and the timestamp offset to obtain the time offset of the first video frame of the image group on the encoding-production virtual time axis;
for each video frame in the image group, correcting its video encoding timestamp and video display timestamp with the time offset to obtain a corrected video encoding timestamp and a corrected video display timestamp; and correcting the audio encoding timestamp and audio display timestamp of each audio frame with the time offset to obtain a corrected audio encoding timestamp and a corrected audio display timestamp;
taking the corrected video encoding timestamp and corrected video display timestamp as the video timestamps of each video frame, and the corrected audio encoding timestamp and corrected audio display timestamp as the audio timestamps of each audio frame;
taking the playing time indicated by the video timestamps of all video frames in the image group as the video playing time of the image group, and the playing time indicated by the audio timestamps of all audio frames as the audio playing time of the image group.
Optionally, correcting the video encoding timestamp and video display timestamp of each video frame with the time offset includes: calculating the sum of the video encoding timestamp and the time offset to obtain a first summation result, and calculating the sum of the video display timestamp and the time offset to obtain a second summation result; the first summation result serves as the corrected video encoding timestamp, and the second summation result as the corrected video display timestamp.
Optionally, correcting the audio encoding timestamp and audio display timestamp of each audio frame with the time offset proceeds analogously: the sum of the audio encoding timestamp and the time offset serves as the corrected audio encoding timestamp, and the sum of the audio display timestamp and the time offset as the corrected audio display timestamp.
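As a sketch, the per-frame correction described above is a single addition applied uniformly to the encoding (DTS) and display (PTS) timestamps of every video and audio frame. The `Frame` structure and field names below are illustrative assumptions, not structures from the patent.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    dts: int  # encoding timestamp, ms
    pts: int  # display timestamp, ms

def correct_timestamps(frames: list[Frame], start_ref_ms: int,
                       ts_offset_ms: int) -> list[Frame]:
    # Time offset of the GOP's first frame on the encoding-production
    # virtual time axis: start reference time + timestamp offset.
    time_offset = start_ref_ms + ts_offset_ms
    # First summation result -> corrected encoding timestamp;
    # second summation result -> corrected display timestamp.
    return [Frame(f.dts + time_offset, f.pts + time_offset) for f in frames]

frames = [Frame(0, 40), Frame(40, 80)]
print(correct_timestamps(frames, start_ref_ms=1000, ts_offset_ms=500))
```

Since all streams share the same start reference time and each slicing task computes its offset against its own first processed GOP, the corrected timestamps of different-definition streams fall on one common time axis.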
In a third aspect, there is provided a video encoding apparatus comprising:
a first acquisition module configured to acquire, for any one of N live code streams, the encoding time, standard index number, and start reference time of any image group in that live code stream; the start reference time is the encoding completion time, on an encoding-production virtual time axis, of the first video frame among the N live code streams;
a calculation module configured to calculate the encoding delay of the image group based on the encoding time, the standard index number, and the start reference time;
an updating module configured to update the standard index number according to the encoding delay to obtain the index number of the image group; when the encoding delay is greater than the standard image-group duration, the index number is greater than the standard index number.
In a fourth aspect, there is provided a video alignment apparatus comprising:
a second acquisition module configured to acquire any image group in one live code stream;
a parsing module configured to parse the image group and acquire its alignment parameters, where the alignment parameters include a start reference time, the index number of the image group, and the timestamp offset of the first video frame in the image group; the start reference time is the encoding completion time, on an encoding-production virtual time axis, of the first video frame among the N live code streams, the index number is related to the group's encoding delay, and the timestamp offset is the offset of the encoding timestamp of the first video frame relative to the first video frame of a target image group, the target image group being the first image group processed when the slicing task that slices this image group was started;
a first determining module configured to determine the slicing position of the image group in the live code stream according to its index number;
and a second determining module configured to determine the audio playing time and video playing time of the image group based on the start reference time and the timestamp offset.
In a fifth aspect, a video alignment system is provided, comprising:
an encoding server and a slicing server;
the encoding server is configured to acquire, for any one of N live code streams, the encoding time, standard index number, and start reference time of any image group in that live code stream, the start reference time being the encoding completion time, on an encoding-production virtual time axis, of the first video frame among the N live code streams; to calculate the encoding delay of the image group based on the encoding time, the standard index number, and the start reference time; and to update the standard index number according to the encoding delay to obtain the index number of the image group, where, when the encoding delay is greater than the standard image-group duration, the index number is greater than the standard index number;
the slicing server is configured to acquire any image group in one live code stream; to parse the image group and acquire its alignment parameters, where the alignment parameters include a start reference time, the index number of the image group, and the timestamp offset of the first video frame in the image group, the start reference time being the encoding completion time, on the encoding-production virtual time axis, of the first video frame among the N live code streams, the index number being related to the group's encoding delay, and the timestamp offset being the offset of the encoding timestamp of the first video frame relative to the first video frame of a target image group, the target image group being the first image group processed when the slicing task that slices this image group was started; to determine the slicing position of the image group in the live code stream according to its index number; and to determine the audio playing time and video playing time of the image group based on the start reference time and the timestamp offset.
In a sixth aspect, an electronic device is provided, including a processor, a memory, and a communication bus, where the processor and the memory communicate with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to execute the program stored in the memory, to implement the video encoding method according to the first aspect or the video alignment method according to the second aspect.
In a seventh aspect, a computer readable storage medium is provided, in which a computer program is stored, where the computer program, when executed by a processor, implements the video encoding method according to the first aspect or the video alignment method according to the second aspect.
Compared with the prior art, the technical solution provided by the embodiments of the present application has the following advantage: because the standard index number of an image group is corrected during encoding based on the group's encoding delay, and the corrected index number is greater than the standard index number, the encoding delay can be tracked through the corrected index number when slices are generated. This avoids the situation in which a playback end, downloading slices from the generation server according to standard-frame-rate timing, requests a slice that has not yet been produced (i.e., the live cloud exhibits slice-production delay), and thereby accurately aligns live code streams of different definitions in a live scenario.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a video encoding method according to an embodiment of the present application;
FIG. 2 is a flow chart of a video alignment method according to an embodiment of the application;
FIG. 3 is a schematic diagram of a video encoding apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a video alignment apparatus according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a video alignment system according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the related art, slice alignment for the standard signal source stream of a live transcode is generally performed assuming a constant frame rate. Test observation shows that this slice-alignment production method performs normally in the ideal case; however, when the transcoded signal source stream does not arrive at the standard constant frame rate, a playback delay is created for the playback end.
For example, at a standard frame rate of 25 fps with a group of pictures (GOP) of 2 s, each GOP contains 50 frames. Slicing is performed per GOP, so a 6 s slice should contain 3 GOPs, i.e., 150 video frames. If the frame rate is insufficient, for example 24.9 fps, the time difference between two consecutive video frames is slightly more than 1000/25 = 40 ms. Each GOP still consists of 50 video frames, but its actual duration is slightly longer than 2 s, which in turn makes each live slice slightly longer than 6 s, so slice production slightly "lags", i.e., producing each slice takes slightly longer than at the standard frame rate. Each slice is off by a little, and over time the deviation accumulates and grows. When the playback end downloads slices from the generation server according to standard-frame-rate timing, the corresponding slice has not yet been produced, and the live cloud exhibits slice-production delay.
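The accumulation described above is easy to reproduce numerically. The short sketch below (Python; the figures come from the example in the text, not from any measurement) shows how far slice production falls behind standard-frame-rate timing after a given number of 6 s slices at 24.9 fps:

```python
def slice_lag_ms(actual_fps: float, frames_per_slice: int, n_slices: int,
                 standard_fps: float = 25.0) -> float:
    """Cumulative lag of slice production versus standard-frame-rate timing."""
    actual_ms = n_slices * frames_per_slice * 1000.0 / actual_fps
    standard_ms = n_slices * frames_per_slice * 1000.0 / standard_fps
    return actual_ms - standard_ms

# One 6 s slice (150 frames) at 24.9 fps is ~24 ms late; after an hour
# of streaming (600 slices) the accumulated deviation reaches ~14.5 s.
print(round(slice_lag_ms(24.9, 150, 1)))            # → 24
print(round(slice_lag_ms(24.9, 150, 600) / 1000, 1))  # → 14.5
```

A per-slice error of a few tens of milliseconds is harmless in isolation; it is the unbounded accumulation that eventually makes the playback end request slices that do not yet exist.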
To solve the problems in the related art, this embodiment provides a video encoding method, which is applicable to an encoding server.
as shown in fig. 1, the method may include the steps of:
step 101, for any one live code stream in the N live code streams, acquiring the encoding time, standard index number and initial reference time of any one image group in any one live code stream; the initial reference time is the encoding completion time of the first frame video frame in the N paths of live broadcast code streams on the encoding production virtual time axis.
It should be understood that, in the N live code streams, multiple live code streams with different resolutions are obtained by transcoding the same signal source stream for the encoding server. Each live code stream is composed of a plurality of GOPs.
In this embodiment, the encoding time of an image group is the time at which the encoding server actually finishes encoding that image group. It should be appreciated that when the time intervals between video frames are unstable, so that the frame rate of the signal source stream falls below the standard frame rate, the encoding time of an image group is later than it would be at the standard frame rate.
It should be understood that in this embodiment the standard index number of an image group is the index number the group would be assigned if the frame rate of the signal source stream were the standard frame rate. Accordingly, the standard index number indicates the order in which the image group is encoded at the standard frame rate; for example, a standard index number of 2 indicates that the image group is the second encoded image group in the live code stream to which it belongs.
In application, the encoding server may start encoding the N live code streams synchronously or one after another. In this embodiment, to align the multiple live code streams, the encoding completion time, on the encoding-production virtual time axis, of the first video frame among the N live code streams is taken as the start reference time; that is, every one of the N live code streams is regarded as being encoded from that first video frame, with the frame's encoding completion time as the common origin.
In this embodiment, the encoding-production virtual time axis coincides with the encoding server's local time; that is, a time on the encoding-production virtual time axis is actually the encoding server's local time.
Step 102: calculate the encoding delay of the image group based on the encoding time, the standard index number, and the start reference time.
It should be appreciated that the encoding delay indicates how long the encoding of the image group lags behind the encoding of the signal source stream at the standard frame rate.
In an alternative embodiment, the calculation of the encoding delay may proceed as follows: calculate the product of the standard index number and the standard image-group duration to obtain the encoding-time offset of the image group on the encoding-production virtual time axis; calculate the sum of the encoding-time offset and the start reference time to obtain a summation result; and calculate the difference between the encoding time and the summation result to obtain the encoding delay.
It should be understood that the standard image-group duration is the preset duration of one image group under the assumption that the frame rate of the signal source stream is the standard frame rate. For example, when the standard frame rate is 25 fps and one image group contains 50 video frames, the standard image-group duration is (1/25 s) × 50 = 2 s.
In application, each image group consists of one key frame and several non-key frames. The key frame is typically the first frame in the group and carries the group's index number. The encoding-time offset in this embodiment indicates how much later the first video frame of the image group is encoded than the first video frame among the N live code streams.
In application, the encoding server transcodes the signal source stream GOP by GOP and, during transcoding, encapsulates the index number as parameter information into an SEI data unit attached to the key frame of the image group. It should be understood that SEI (supplemental enhancement information) is a data unit for information transfer defined in the H.264/H.265 video coding standards.
It should be understood that the summation result calculated from the encoding-time offset and the start reference time represents the time at which the image group would be encoded if the frame rate of the signal source stream were the standard frame rate.
In this embodiment, the formula used for calculating the encoding delay is:
delay = current_ntp_time - (start_ntp_time + gop_index * gop_time)
where delay is the encoding delay, current_ntp_time is the encoding time, start_ntp_time is the start reference time, gop_index is the standard index number of the image group, and gop_time is the standard image-group duration.
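The formula translates directly into code. A minimal sketch in Python (millisecond units assumed; the variable names follow the patent's formula, while the example values are invented for illustration):

```python
def encoding_delay_ms(current_ntp_time: int, start_ntp_time: int,
                      gop_index: int, gop_time: int) -> int:
    """delay = current_ntp_time - (start_ntp_time + gop_index * gop_time)"""
    # start_ntp_time + gop_index * gop_time is when this GOP would have
    # finished encoding if the source ran at the standard frame rate;
    # the difference from the actual encoding time is the delay.
    return current_ntp_time - (start_ntp_time + gop_index * gop_time)

# GOP 3 finished encoding at t = 10_150 ms against a start reference of
# 4_000 ms; with a 2 s standard GOP duration it is 150 ms late.
print(encoding_delay_ms(10_150, 4_000, 3, 2_000))  # → 150
```

A delay of zero (or a negative value) means encoding is keeping pace with the standard frame rate, so the standard index number needs no correction.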
Step 103: update the standard index number according to the encoding delay to obtain the index number of the image group; when the encoding delay is greater than the standard image-group duration, the index number is greater than the standard index number.
This embodiment tracks the encoding delay by increasing the index number. In a specific implementation, how much the standard index number is increased is determined by the relation between the encoding delay and the standard image-group duration.
In an alternative embodiment: determine whether the encoding delay is greater than N standard image-group durations; if so, the index number of the image group is the standard index number plus N; if not, update N = N - 1 and return to the determination step. Ultimately, if the encoding delay is greater than one standard image-group duration, the index number is the standard index number plus 1; otherwise it is the standard index number.
In practical application, in order to offset the coding delay in time and avoid skipping slices, N is preferably set to 1.
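The index-update procedure above, which decrements N until the delay exceeds N standard GOP durations, can be sketched as follows; this is a hypothetical illustration with the patent's preferred N=1 as the default upper bound:

```python
def updated_index(standard_index: int, delay: float, gop_time: float,
                  n_max: int = 1) -> int:
    """Return the corrected index number: the standard index plus the
    largest n (n <= n_max) such that the coding delay exceeds n
    standard GOP durations, or the standard index unchanged."""
    for n in range(n_max, 0, -1):   # mirrors the N = N - 1 iteration
        if delay > n * gop_time:
            return standard_index + n
    return standard_index

print(updated_index(5, 2.3, 2.0))            # → 6 (delay > one GOP)
print(updated_index(5, 1.5, 2.0))            # → 5 (no jump)
print(updated_index(5, 7.0, 2.0, n_max=3))   # → 8 (delay > three GOPs)
```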
In the technical scheme provided by this embodiment, the standard index number of an image group is corrected based on the coding delay of that image group during encoding, and the corrected index number is greater than the standard index number. When slices are generated, the coding delay can therefore be tracked through the corrected index number. This avoids the "slice production delay" problem in which the playing end, downloading slices from the generating server at the pace of the standard frame rate, requests a slice that the live-broadcast cloud has not yet produced, and thereby accurately aligns live code streams of different definitions in a live scene.
The embodiment provides a video alignment method, which can be applied to a slicing server. It should be understood that live code streams with different definitions can be sliced in different threads of the same process, in which case they are sliced on the same slicing server; of course, live code streams with different definitions may also be sliced on different slicing servers, which is not limited in this embodiment.
As shown in fig. 2, the method may include the steps of:
step 201, obtaining any one image group in a live code stream;
Step 202, analyzing any one image group, and acquiring an alignment parameter of any one image group, wherein the alignment parameter comprises a start reference time, an index number of any one image group and a timestamp offset of a first video frame in any one image group, and the start reference time is the encoding completion time of the first video frame in the N paths of live code streams on an encoding production virtual time axis, and the index number of any one image group is related to encoding delay of any one image group; the time stamp offset is the offset of the encoding time stamp of the first video frame relative to the first video frame in the target image group, and the target image group is the first image group processed when the slicing task of slicing any one image group is started;
step 203, determining the segmentation position of any one image group in a live code stream according to the index number of any one image group;
step 204, determining the audio playing time and the video playing time of any one image group based on the initial reference time and the timestamp offset.
In the application, the slice server can pull the stream from the encoding server or from a transit server to acquire any one image group in one live code stream. When the stream is pulled from the transit server, the encoding server transcodes the signal source stream and uploads the transcoding result to the transit server, so that the slice server can pull the stream from the transit server.
It should be understood that the slicing server may perform slicing processing on one live code stream or multiple live code streams to obtain slice data that satisfies the HLS live rules. When the slicing server performs slicing processing on multiple live code streams, different processes in the slicing server, or different threads in the same process, respectively perform slicing processing on the different live code streams.
In this embodiment, the slice server performs slicing processing on the live code stream with the GOP (group of pictures) as the minimum unit. In order to realize data alignment across the different live code streams and avoid download delay at the playing end, each image group in each of the N live code streams carries an index number (gop_index), so that the slicing server can determine the slicing position of a GOP based on its index number after pulling it. It should be understood that, since the gop_index carried in an image group is related to the coding delay of that image group on the encoding server side, aligning image groups based on gop_index eliminates the "slice production delay" problem caused by coding delay during slicing.
In a specific implementation, in an alternative embodiment: calculate the product of the index number and the standard image group duration to obtain the standard coding time offset of any one image group on the encoding production virtual time axis; obtain the quotient value and the remainder value of dividing the standard coding time offset by a preset slicing period duration; determine the serial number of the slice to which any one image group belongs based on the quotient value, and determine the position of any one image group within that slice based on the remainder value; the serial number of the slice and the position within the slice are taken as the slicing position.
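The quotient/remainder computation above can be sketched as follows, assuming for simplicity that one slicing period contains a single slice (as in the worked example later in this section); names and values are hypothetical:

```python
def slicing_position(gop_index: int, gop_time: float,
                     period_time: float) -> tuple:
    """Map a GOP to its (slice serial number, position in slice),
    both 1-based, from its standard offset on the virtual time axis."""
    offset = gop_index * gop_time            # standard coding time offset
    quotient, remainder = divmod(offset, period_time)
    slice_no = int(quotient) + 1             # complete periods before it
    position = int(remainder / gop_time) + 1
    return slice_no, position

# With a 2 s GOP and a 6 s slicing period, index 2 lands at
# offset 4 s: the 1st slice, 3rd GOP of that slice.
print(slicing_position(2, 2.0, 6.0))  # → (1, 3)
```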
It should be understood that the standard image group duration here refers to the duration of one image group when the frame rate of the signal source stream is the standard frame rate. For example, if the standard frame rate is 25fps, then for an image group of 50 video frames the standard image group duration should be (1/25)s x 50 = 2s. It should be noted that the signal source stream usually carries a frame rate, and the carried frame rate is the standard frame rate; owing to various uncertainties in practical application, however, the signal source stream often does not transmit at exactly the carried standard frame rate, and the time difference between two video frames is slightly greater than 1/25 s = 40 ms.
In order to improve the alignment efficiency of video data, the slice server calculates the standard group of pictures duration (gop_time) of the first GOP after downloading it, and caches the first GOP. Whenever the standard group of pictures duration needs to be acquired, the gop_time of the first GOP is used as the standard group of pictures duration. It should be understood that the first GOP here refers to the first GOP downloaded after the slicing server starts; when the slice server has not failed, gop_index=0 for the first GOP, but if the slicing server fails and is restarted during stream pulling and slicing, the gop_index of the first GOP is not necessarily 0.
In this embodiment, the slicing period duration reflects the slicing rule. The slicing period duration is the least common multiple of the slice duration and the standard image group duration. For example, if the slice duration segment_time is 5s and gop_time is 2s, then the slicing period duration is lcm(5, 2) = 10s.
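Assuming integer-second durations, the slicing period of the example can be computed with Python's `math.lcm` (available from Python 3.9):

```python
import math

segment_time = 5   # slice duration in seconds (hypothetical setting)
gop_time = 2       # standard image group duration in seconds
period_time = math.lcm(segment_time, gop_time)
print(period_time)  # → 10
```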
In this embodiment, the slice duration is preset manually according to the service requirement. For example, the slice duration may be set to 5s or 6s, etc. The slice durations of the different paths (different definitions) are the same.
Taking a slice duration of 5s and a standard image group duration of 2s as an example, the slicing rule reflected by the slicing period duration is described. Since each slice must contain complete GOPs, in this example it can only be guaranteed that the average slice duration within one slicing period is 5s. The resulting slicing rule may therefore be 6s, 4s: the 6s slice contains 3 GOPs and the 4s slice contains 2 GOPs; the first pair of 6s and 4s slices makes up the first slicing period, and the second pair makes up the second slicing period.
In this embodiment, when the coding delay of any one image group is greater than the standard image group duration, the standard index number of that image group is increased, so that its index number jumps relative to the index number of the adjacent previous image group. When the slicing positions are determined based on such index numbers, some slices may lack image groups.
In one example, segment_time=6s and gop_time=2s; the least common multiple of segment_time and gop_time is 6s, so the slicing period duration is determined to be 6s.
In the first case, one live code stream includes 5 image groups whose index numbers are 0, 2, 3, 4 and 5 respectively. Following the above calculation of the slicing position, for index number 2: the standard coding time offset is gop_index x gop_time = 2 x 2s = 4s; dividing 4s by the 6s slicing period gives a quotient of 0, so the image group belongs to the 1st slice (0 complete slicing periods precede it), and a remainder of 4, so it is the 3rd (4/2+1) image group of the 1st slice. Similarly, the image group with index number 0 is the 1st image group of the 1st slice, the image group with index number 3 is the 1st image group of the 2nd slice, the image group with index number 4 is the 2nd image group of the 2nd slice, and the image group with index number 5 is the 3rd image group of the 2nd slice.
That is, in this case, the jumped GOP with index number 2 still belongs to the first slice, and the first slice lacks a middle GOP.
In the second case, the index numbers of the 5 image groups included in one live code stream are 0, 1, 3, 4 and 5 respectively. Here the image group with index number 3 is the jumped one, and according to the above calculation of the slicing position it is the 1st image group of the second slice, so the first slice lacks one GOP, namely its last GOP.
In the third case, the index numbers of the 5 image groups included in one live code stream are 0, 1, 2, 4 and 5 respectively. Here the image group with index number 4 is the jumped one, and according to the above calculation of the slicing position it is the 2nd image group of the second slice, so the second slice lacks one GOP, namely its first GOP.
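The three cases can be reproduced with a small sketch (a hypothetical helper that only computes the slice serial number, using the parameters of the example: 2 s GOPs and a 6 s slicing period):

```python
def slice_of(gop_index: int, gop_time: float = 2.0,
             period_time: float = 6.0) -> int:
    # 1-based serial number of the slice a GOP falls into
    return int(gop_index * gop_time // period_time) + 1

# Case 1: index 1 skipped — the 1st slice lacks a middle GOP
print([slice_of(i) for i in (0, 2, 3, 4, 5)])  # → [1, 1, 2, 2, 2]
# Case 2: index 2 skipped — the 1st slice lacks its last GOP
print([slice_of(i) for i in (0, 1, 3, 4, 5)])  # → [1, 1, 2, 2, 2]
# Case 3: index 3 skipped — the 2nd slice lacks its first GOP
print([slice_of(i) for i in (0, 1, 2, 4, 5)])  # → [1, 1, 1, 2, 2]
```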
In this embodiment, the effect of tracking time is achieved through the image group whose index number jumps. Taking the third case as an example: on the encoding server side the delay is tracked and a GOP index is skipped, so the GOP index jumps from 2 to 4 with no index 3; on the slicing side, GOPs 4 and 5 are grouped into one slice, i.e. the 2nd slice has only 2 GOPs. The reason is that GOP 0, GOP 1 and GOP 2 accumulate delay: the actual duration of each of those GOPs is slightly longer than the standard image group duration (2s), so the total duration of the first slice is also longer than 6s. When the delay is detected on the encoding server side, the jump of gop_index produces a 2nd slice with only 2 GOPs (possibly slightly longer than 4s but shorter than 6s), which compensates the delay back.
That is, whenever a jump of the GOP index (+1) occurs, there must be a nearby slice with one fewer GOP; in other words, the preceding slices ran slightly late, and the lost time is compensated back in the form of a slice missing one GOP.
In this embodiment, since gop_index is based on a unified encoding-production virtual time axis, the segmentation rule determined from gop_time and segment_time also has cross-task consistency, so that multi-path slice data alignment can be ensured. Even if the one-in-multiple-out coding task is interrupted, or a certain slicing task is interrupted, the slicing position can still be located from the gop_index carried in the data stream after restart. In addition, because the index number of an image group takes the coding delay into account, slice production delay can be avoided on the basis of ensuring multi-path slice data alignment.
In this embodiment, aligning the video data involves not only slicing any one image group at the correct position but also correcting the video playing time and the audio playing time within that image group, which is described below.
In an alternative embodiment, calculating the sum of the initial reference time and the timestamp offset to obtain the time offset of the first video frame in any one image group on the coding production virtual time axis; for each video frame in any image group, correcting the video coding time stamp and the video display time stamp of each video frame by adopting a time offset to obtain a corrected video coding time stamp corresponding to the video coding time stamp and a corrected video display time stamp corresponding to the video display time stamp; and respectively correcting the audio coding time stamp and the audio display time stamp of each audio frame by adopting the time offset to obtain a corrected audio coding time stamp corresponding to the audio coding time stamp and a corrected audio display time stamp corresponding to the audio display time stamp; the corrected video coding time stamp and the corrected video display time stamp are used as the video time stamp of each video frame; and taking the corrected audio coding time stamp and the corrected audio display time stamp as the audio time stamp of each audio frame; taking the playing time indicated by the video time stamps of all video frames in any one image group as the video playing time of any one image group; and taking the playing time indicated by the audio time stamps of all the audio frames in any one image group as the audio playing time of any one image group.
The formula for calculating the time offset in this embodiment may be:
video_ts_offset=start_ntp_time+first_gop_dts_drift;
wherein video_ts_offset is the time offset of the first video frame in any one image group on the encoding production virtual time axis, start_ntp_time is the start reference time, and first_gop_dts_drift is the timestamp offset.
In this embodiment, the calculation formulas for respectively correcting the video encoding time stamp and the video display time stamp of each video frame by using the time offset may be:
video_packet_dts’=video_packet_dts+video_ts_offset;
video_packet_pts’=video_packet_pts+video_ts_offset;
wherein video_packet_dts is a video encoding time stamp, video_packet_pts is a video display time stamp, video_packet_dts 'is a corrected video encoding time stamp, and video_packet_pts' is a corrected video display time stamp.
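The video timestamp correction can be sketched as follows (an illustrative sketch; the function name is hypothetical, and the timestamps in the example are arbitrary units on the same clock as start_ntp_time):

```python
def correct_video_ts(video_packet_dts: int, video_packet_pts: int,
                     start_ntp_time: int, first_gop_dts_drift: int):
    """Shift a video frame's DTS/PTS onto the unified
    encoding-production virtual time axis."""
    video_ts_offset = start_ntp_time + first_gop_dts_drift
    return (video_packet_dts + video_ts_offset,
            video_packet_pts + video_ts_offset)

# Hypothetical frame with DTS 40 and PTS 80, offset 1000 + 6:
print(correct_video_ts(40, 80, 1000, 6))  # → (1046, 1086)
```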
It should be understood that the two values video_packet_dts and video_packet_pts, namely the video coding time stamp and the video display time stamp of each live code stream, are automatically acquired when the slice server pulls the stream.
In this embodiment, the alignment of the video time stamps of the multiple slices is ensured by converting the video time stamps of the video frames to a unified code production time axis.
In this embodiment, the calculation formulas for respectively correcting the audio coding time stamp and the audio display time stamp of each audio frame by using the time offset may be:
audio_packet_dts’=audio_packet_dts+video_ts_offset;
audio_packet_pts’=audio_packet_pts+video_ts_offset;
wherein audio_packet_dts is the audio coding time stamp, audio_packet_pts is the audio display time stamp, audio_packet_dts’ is the corrected audio coding time stamp, and audio_packet_pts’ is the corrected audio display time stamp.
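The audio correction mirrors the video correction with the same offset; a hypothetical sketch (note that the corrected PTS is derived from the audio PTS, not the DTS):

```python
def correct_audio_ts(audio_packet_dts: int, audio_packet_pts: int,
                     video_ts_offset: int):
    # Both audio stamps shift by the same video_ts_offset so the audio
    # and video of one GOP share the encoding-production time axis.
    return (audio_packet_dts + video_ts_offset,
            audio_packet_pts + video_ts_offset)

print(correct_audio_ts(40, 40, 1006))  # → (1046, 1046)
```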
It should be understood that the two values audio_packet_dts and audio_packet_pts, namely the audio coding time stamp and the audio display time stamp of each live code stream, are automatically acquired when the slice server pulls the stream.
In this embodiment, the alignment of the multi-slice audio time stamps is ensured by converting the audio time stamps of the audio frames to a unified encoding production time axis.
In the technical scheme provided by the embodiment of the application, the segmentation position, the audio playing time and the video playing time of any one image group in one live code stream are determined based on the alignment parameters carried in that image group, so that video data alignment is realized. Because the encoding-production virtual time axis on the encoding server serves as the basis for both the segmentation position and timestamp production, and every image group carries an index number, the position of any image group on the virtual time axis can be perceived in real time whenever slicing starts, even after transcoding or any one slicing task is interrupted and restarted; the segmentation position is thus determined synchronously across multiple paths and the timestamps are corrected, giving the method strong robustness and interference resistance. Meanwhile, since the index number of any image group is related to the coding delay, slice production delay can be avoided on the basis of ensuring multi-path slice data alignment.
In this embodiment, transcoding and the multiple slicing tasks can be distributed on different servers, making task deployment more flexible. For example, an rtmp stream can be produced from the encoded data, the rtmp server then serves as a transit server for stream pulling, and multiple slices are produced from it, so that the rtmp data stream and the hls slice data stream are obtained at the same time, reducing the number of tasks and saving computing resources.
Based on cross-task slice alignment, this embodiment can be combined with adaptive bitrate technology to reduce playback stalling through automatic bitrate adjustment in an HLS-slice live scene.
Based on cross-task slice alignment, this embodiment can also be extended to server-side ad insertion (SSAI), which requires alignment between source stream slices and personalized ad slices.
The cross-task slice alignment of this embodiment can also ensure that slice data remains valid, optimizing playback and clipping by, for example, filling data when coding efficiency is insufficient.
Based on the same concept, the embodiment of the present application provides a video encoding device, and the specific implementation of the device may be referred to the description of the embodiment of the method, and the repetition is omitted, as shown in fig. 3, where the device mainly includes:
the first obtaining module 301 is configured to obtain, for any one live code stream of the N live code streams, a coding time, a standard index number, and a start reference time of any one image group in any one live code stream; the initial reference time is the encoding completion time of the first frame video frame in the N paths of live broadcast code streams on an encoding production virtual time axis;
The calculating module 302 is configured to calculate a coding delay of any one image group based on the coding time, the standard index number and the initial reference time;
an updating module 303, configured to update the standard index number according to the coding delay, so as to obtain an index number of any one image group; when the coding delay is longer than the standard image group duration, the index number is larger than the standard index number.
The calculation module 302 is configured to:
calculating the product of the standard index number and the time length of the standard image group to obtain the coding time offset of any one image group on the coding production virtual time axis;
calculating the sum of the code moment offset and the initial reference time to obtain a summation result;
and calculating the difference between the coding time and the summation result to obtain the coding delay.
The update module 303 is configured to:
judging whether the coding delay is greater than N standard image group durations;
if the coding delay is greater than N standard image group durations, determining that the index number of any one image group is the standard index number plus N;
if the coding delay is not greater than N standard image group durations, updating N=N-1 and returning to the step of judging whether the coding delay is greater than N standard image group durations; when the coding delay is greater than 1 standard image group duration, the index number of any one image group is determined to be the standard index number plus 1, and when the coding delay is not greater than 1 standard image group duration, the index number of any one image group is determined to be the standard index number.
Based on the same conception, the embodiment of the present application provides a video alignment device, and the specific implementation of the device may be referred to the description of the embodiment of the method, and the repetition is omitted, as shown in fig. 4, where the device mainly includes:
a second obtaining module 401, configured to obtain any one image group in one live code stream;
the parsing module 402 is configured to parse any one of the image groups, and obtain an alignment parameter of any one of the image groups, where the alignment parameter includes a start reference time, an index number of any one of the image groups, and a timestamp offset of a first video frame in any one of the image groups, where the start reference time is a coding completion time of the first video frame in the N-path live code stream on a coding production virtual time axis, and the index number of any one of the image groups is related to a coding delay of any one of the image groups; the time stamp offset is the offset of the encoding time stamp of the first video frame relative to the first video frame in the target image group, and the target image group is the first image group processed when the slicing task of slicing any one image group is started;
a first determining module 403, configured to determine a segmentation position of any one image group in a live code stream according to an index number of any one image group;
A second determining module 404, configured to determine an audio playing time and a video playing time of any one image group based on the start reference time and the timestamp offset.
The first determining module 403 is configured to:
calculating the product of the index number and the time length of the standard image group to obtain the standard coding time offset of any one image group on the coding production virtual time axis;
obtaining a quotient value and a remainder value obtained by dividing a standard coding time offset by a preset slicing period duration;
determining a serial number of a slice to which any one image group belongs based on the quotient value, and determining a position of any one image group in the slice to which the any one image group belongs based on the remainder value;
the serial number of the slice and the position in the slice to which the slice belongs are taken as slicing positions.
The second determining module 404 is configured to:
calculating the sum of the initial reference time and the timestamp offset to obtain the time offset of the first video frame in any one image group on the coding production virtual time axis;
for each video frame in any image group, correcting the video coding time stamp and the video display time stamp of each video frame by adopting a time offset to obtain a corrected video coding time stamp corresponding to the video coding time stamp and a corrected video display time stamp corresponding to the video display time stamp; and respectively correcting the audio coding time stamp and the audio display time stamp of each audio frame by adopting the time offset to obtain a corrected audio coding time stamp corresponding to the audio coding time stamp and a corrected audio display time stamp corresponding to the audio display time stamp;
The corrected video coding time stamp and the corrected video display time stamp are used as the video time stamp of each video frame; and taking the corrected audio coding time stamp and the corrected audio display time stamp as the audio time stamp of each audio frame;
taking the playing time indicated by the video time stamps of all video frames in any one image group as the video playing time of any one image group; and taking the playing time indicated by the audio time stamps of all the audio frames in any one image group as the audio playing time of any one image group.
The second determining module 404 is configured to:
calculating the sum of the video coding time stamp and the time offset of each video frame to obtain a first summation result; calculating the sum of the video display time stamp and the time offset of each video frame to obtain a second summation result;
the first summation result is taken as a corrected video coding time stamp, and the second summation result is taken as a corrected video display time stamp.
The second determining module 404 is configured to:
calculating the sum of the audio coding time stamp and the time offset of each audio frame to obtain a first summation result; calculating the sum of the audio display time stamp and the time offset of each audio frame to obtain a second summation result;
The first summation result is taken as a corrected audio coding time stamp, and the second summation result is taken as a corrected audio display time stamp.
Based on the same concept, the embodiment of the present application provides a video alignment system, and the specific implementation of the system may be referred to the description of the embodiment of the method, and the details are not repeated, as shown in fig. 5, where the system mainly includes:
an encoding server 501 and a slice server 502;
the encoding server 501 is configured to obtain, for any one live code stream in the N live code streams, an encoding time, a standard index number, and a start reference time of any one image group in the any one live code stream; the initial reference time is the encoding completion time of the first frame video frame in the N paths of live broadcast code streams on an encoding production virtual time axis; calculating the coding delay of any one image group based on the coding time, the standard index number and the initial reference time; updating the standard index number according to the coding delay to obtain the index number of any image group; when the coding delay is longer than the standard image group duration, the index number is larger than the standard index number;
the slice server 502 is configured to obtain any one image group in one live code stream; analyzing any one image group, and acquiring an alignment parameter of any one image group, wherein the alignment parameter comprises a start reference time, an index number of any one image group and a timestamp offset of a first video frame in any one image group, and the start reference time is the encoding completion time of the first video frame in the N paths of live code streams on an encoding production virtual time axis, and the index number of any one image group is related to encoding delay of any one image group; the time stamp offset is the offset of the encoding time stamp of the first video frame relative to the first video frame in the target image group, and the target image group is the first image group processed when the slicing task of slicing any one image group is started; determining the segmentation position of any one image group in one live code stream according to the index number of any one image group; an audio playing time and a video playing time of any one of the image groups are determined based on the start reference time and the timestamp offset.
Based on the same concept, the embodiment of the application also provides an electronic device, as shown in fig. 6, where the electronic device mainly includes: processor 601, memory 602, and communication bus 603, wherein processor 601 and memory 602 accomplish communication with each other through communication bus 603. The memory 602 stores a program executable by the processor 601, and the processor 601 executes the program stored in the memory 602 to implement the following steps:
for any one live code stream in the N live code streams, acquiring the encoding time, the standard index number and the initial reference time of any one image group in any one live code stream; the initial reference time is the encoding completion time of the first frame video frame in the N paths of live broadcast code streams on an encoding production virtual time axis; calculating the coding delay of any one image group based on the coding time, the standard index number and the initial reference time; updating the standard index number according to the coding delay to obtain the index number of any image group; when the coding delay is longer than the standard image group duration, the index number is larger than the standard index number;
or alternatively, the first and second heat exchangers may be,
any one image group in one live code stream is obtained; analyzing any one image group, and acquiring an alignment parameter of any one image group, wherein the alignment parameter comprises a start reference time, an index number of any one image group and a timestamp offset of a first video frame in any one image group, and the start reference time is the encoding completion time of the first video frame in the N paths of live code streams on an encoding production virtual time axis, and the index number of any one image group is related to encoding delay of any one image group; the time stamp offset is the offset of the encoding time stamp of the first video frame relative to the first video frame in the target image group, and the target image group is the first image group processed when the slicing task of slicing any one image group is started; determining the segmentation position of any one image group in one live code stream according to the index number of any one image group; an audio playing time and a video playing time of any one of the image groups are determined based on the start reference time and the timestamp offset.
The communication bus 603 mentioned in the above electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, abbreviated PCI) bus, an extended industry standard architecture (Extended Industry Standard Architecture, abbreviated EISA) bus, or the like. The communication bus 603 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean there is only one bus or only one type of bus.
The memory 602 may include random access memory (Random Access Memory, simply RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Alternatively, the memory may be at least one memory device located remotely from the aforementioned processor 601.
The processor 601 may be a general-purpose processor including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), a digital signal processor (Digital Signal Processing, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a Field programmable gate array (Field-Programmable Gate Array, FPGA), or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present application, there is also provided a computer-readable storage medium having stored therein a computer program which, when run on a computer, causes the computer to perform the video encoding method or the video alignment method described in the above embodiments.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, by a wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, microwave, etc.) means from one website, computer, server, or data center to another. The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape, etc.), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to that process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing describes only specific embodiments of the invention, presented to enable those skilled in the art to understand or practice it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A video encoding method, applied to an encoding server, comprising:
for any one of N live code streams, acquiring the encoding time, the standard index number, and the start reference time of any image group in that live code stream; wherein the start reference time is the time at which encoding of the first video frame among the N live code streams is completed on an encoding-production virtual time axis, and the standard index number indicates the encoding order of the image group when the frame rate of the signal source stream is the standard frame rate;
calculating the coding delay of the image group based on the encoding time, the standard index number, and the start reference time;
updating the standard index number according to the coding delay to obtain the index number of the image group, wherein when the coding delay is greater than the standard image group duration, the index number is greater than the standard index number;
wherein updating the standard index number according to the coding delay to obtain the index number of the image group comprises: judging whether the coding delay is greater than N standard image group durations; if so, determining the index number of the image group to be the standard index number plus N; if not, updating N = N - 1 and returning to the step of judging whether the coding delay is greater than N standard image group durations, until N is updated to 1; if the coding delay is greater than 1 standard image group duration, determining the index number of the image group to be the standard index number plus 1; and if the coding delay is not greater than 1 standard image group duration, determining the index number of the image group to be the standard index number.
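The iterative update recited above can be sketched in a few lines. The names `std_index`, `coding_delay`, `gop_duration`, and `n` are illustrative only; the patent specifies no code, so this is a minimal reading of the claimed loop:

```python
def update_index(std_index: int, coding_delay: float,
                 gop_duration: float, n: int) -> int:
    """Return the index number of a group of pictures (GOP).

    Walks n down toward 1 and adds k to the standard index for the
    largest k (1 <= k <= n) such that the coding delay exceeds k
    standard GOP durations; otherwise keeps the standard index.
    """
    while n >= 1:
        if coding_delay > n * gop_duration:
            return std_index + n
        n -= 1
    return std_index
```

For example, with a 2-second standard GOP duration and a 5.5-second delay across 3 streams, the index advances by 2 GOPs, since 5.5 exceeds 2 x 2.0 but not 3 x 2.0.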
2. The method of claim 1, wherein calculating the coding delay of the image group based on the encoding time, the standard index number, and the start reference time comprises:
calculating the product of the standard index number and the standard image group duration to obtain the encoding time offset of the image group on the encoding-production virtual time axis;
calculating the sum of the encoding time offset and the start reference time to obtain a summation result;
and calculating the difference between the encoding time and the summation result to obtain the coding delay.
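The delay computation of claims 1 and 2 reduces to one subtraction: the GOP's expected completion time on the virtual time axis is the start reference time plus the index-scaled offset, and the delay is the actual encoding time minus that. A sketch under the same assumed names as above:

```python
def coding_delay(encode_time: float, std_index: int,
                 start_ref: float, gop_duration: float) -> float:
    # Encoding time offset of this GOP on the encoding-production
    # virtual time axis (standard index number x standard GOP duration).
    offset = std_index * gop_duration
    # Delay = actual encoding time minus the expected time.
    return encode_time - (start_ref + offset)
```

So a GOP with standard index 3 that finishes encoding at t = 107 s, with a start reference time of 100 s and a 2-second standard GOP duration, has a coding delay of 1 second.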
3. A video alignment method, applied to a slicing server, comprising:
acquiring any image group in one live code stream;
parsing the image group to obtain its alignment parameters, wherein the alignment parameters comprise a start reference time, the index number of the image group, and the timestamp offset of the first video frame in the image group; the start reference time is the time at which encoding of the first video frame among N live code streams is completed on an encoding-production virtual time axis, and the index number of the image group is updated by the video encoding method of claim 1 or 2; the timestamp offset is the offset of the encoding timestamp of the first video frame relative to the first video frame in a target image group, where the target image group is the first image group processed when the slicing task that slices the image group is started;
determining the slicing position of the image group in the live code stream according to the index number of the image group, wherein the slicing position represents the serial number of the slice and the position within the slice;
determining the audio playing time and the video playing time of the image group based on the start reference time and the timestamp offset;
wherein determining the audio playing time and the video playing time of the image group based on the start reference time and the timestamp offset comprises: calculating the sum of the start reference time and the timestamp offset to obtain the time offset of the first video frame in the image group on the encoding-production virtual time axis; for each video frame in the image group, correcting the video encoding timestamp and the video display timestamp of the video frame with the time offset to obtain a corrected video encoding timestamp and a corrected video display timestamp; for each audio frame, correcting the audio encoding timestamp and the audio display timestamp of the audio frame with the time offset to obtain a corrected audio encoding timestamp and a corrected audio display timestamp; taking the corrected video encoding timestamp and the corrected video display timestamp as the video timestamps of each video frame, and the corrected audio encoding timestamp and the corrected audio display timestamp as the audio timestamps of each audio frame; taking the playing time indicated by the video timestamps of all video frames in the image group as the video playing time of the image group; and taking the playing time indicated by the audio timestamps of all audio frames in the image group as the audio playing time of the image group.
4. The method of claim 3, wherein determining the slicing position of the image group in the live code stream according to the index number of the image group comprises:
calculating the product of the index number and the standard image group duration to obtain the standard coding time offset of the image group on the encoding-production virtual time axis;
obtaining the quotient and remainder of dividing the standard coding time offset by a preset slicing period duration;
determining the serial number of the slice to which the image group belongs based on the quotient, and determining the position of the image group within that slice based on the remainder;
and taking the serial number of the slice and the position within the slice as the slicing position.
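The quotient/remainder mapping of claim 4 is a single `divmod`. This sketch uses assumed names and returns the in-slice position as a time offset, since the claim does not fix its units:

```python
def slice_position(index: int, gop_duration: float,
                   slice_period: float) -> tuple[int, float]:
    """Map a GOP index number to (slice serial number, position in slice)."""
    # Standard coding time offset on the encoding-production virtual time axis.
    time_offset = index * gop_duration
    # The quotient selects the slice; the remainder locates the GOP inside it.
    quotient, remainder = divmod(time_offset, slice_period)
    return int(quotient), remainder
```

With 2-second GOPs and a 6-second slicing period, the GOP with index 7 sits 14 seconds into the stream, i.e. 2 seconds into slice number 2, so all streams that assign the same index to a GOP slice it identically.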
5. The method of claim 3, wherein, for each video frame in the image group, correcting the video encoding timestamp and the video display timestamp of the video frame with the time offset to obtain a corrected video encoding timestamp and a corrected video display timestamp comprises:
calculating the sum of the video encoding timestamp of the video frame and the time offset to obtain a first summation result; and calculating the sum of the video display timestamp of the video frame and the time offset to obtain a second summation result;
and taking the first summation result as the corrected video encoding timestamp and the second summation result as the corrected video display timestamp.
6. The method of claim 3, wherein correcting the audio encoding timestamp and the audio display timestamp of each audio frame with the time offset to obtain a corrected audio encoding timestamp and a corrected audio display timestamp comprises:
calculating the sum of the audio encoding timestamp of the audio frame and the time offset to obtain a first summation result; and calculating the sum of the audio display timestamp of the audio frame and the time offset to obtain a second summation result;
and taking the first summation result as the corrected audio encoding timestamp and the second summation result as the corrected audio display timestamp.
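Claims 5 and 6 apply the same additive correction to both timestamps of every frame, video and audio alike. A hedged sketch (`frames` as (encoding timestamp, display timestamp) pairs, roughly DTS/PTS; all names assumed):

```python
def correct_timestamps(frames, start_ref, ts_offset):
    """Shift each frame's encoding and display timestamps by the GOP's
    time offset on the encoding-production virtual time axis."""
    # Time offset of the GOP's first video frame on the virtual time axis.
    time_offset = start_ref + ts_offset
    # Identical additive correction for encoding (DTS) and display (PTS) stamps.
    return [(dts + time_offset, pts + time_offset) for dts, pts in frames]
```

Because every stream derives the correction from the shared start reference time, corrected timestamps from different bitrates of the same content land on the same clock, which is what makes the slices alignable.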
7. A video encoding apparatus, applied to an encoding server, comprising:
a first acquisition module, configured to acquire the encoding time, the standard index number, and the start reference time of any image group in any one of N live code streams; wherein the start reference time is the time at which encoding of the first video frame among the N live code streams is completed on an encoding-production virtual time axis, and the standard index number indicates the encoding order of the image group when the frame rate of the signal source stream is the standard frame rate;
a calculating module, configured to calculate the coding delay of the image group based on the encoding time, the standard index number, and the start reference time;
an updating module, configured to update the standard index number according to the coding delay to obtain the index number of the image group, by: judging whether the coding delay is greater than N standard image group durations; if so, determining the index number of the image group to be the standard index number plus N; if not, updating N = N - 1 and returning to the step of judging whether the coding delay is greater than N standard image group durations, until N is updated to 1; if the coding delay is greater than 1 standard image group duration, determining the index number of the image group to be the standard index number plus 1; and if the coding delay is not greater than 1 standard image group duration, determining the index number of the image group to be the standard index number; wherein when the coding delay is greater than the standard image group duration, the index number is greater than the standard index number.
8. A video alignment apparatus, applied to a slicing server, comprising:
a second acquisition module, configured to acquire any image group in one live code stream;
a parsing module, configured to parse the image group and obtain its alignment parameters, wherein the alignment parameters comprise a start reference time, the index number of the image group, and the timestamp offset of the first video frame in the image group; the start reference time is the time at which encoding of the first video frame among N live code streams is completed on an encoding-production virtual time axis, and the index number of the image group is updated by the video encoding apparatus of claim 7; the timestamp offset is the offset of the encoding timestamp of the first video frame relative to the first video frame in a target image group, where the target image group is the first image group processed when the slicing task that slices the image group is started;
a first determining module, configured to determine the slicing position of the image group in the live code stream according to the index number of the image group, wherein the slicing position represents the serial number of the slice and the position within the slice;
a second determining module, configured to determine the audio playing time and the video playing time of the image group based on the start reference time and the timestamp offset, by: calculating the sum of the start reference time and the timestamp offset to obtain the time offset of the first video frame in the image group on the encoding-production virtual time axis; for each video frame in the image group, correcting the video encoding timestamp and the video display timestamp of the video frame with the time offset to obtain a corrected video encoding timestamp and a corrected video display timestamp; for each audio frame, correcting the audio encoding timestamp and the audio display timestamp of the audio frame with the time offset to obtain a corrected audio encoding timestamp and a corrected audio display timestamp; taking the corrected video encoding timestamp and the corrected video display timestamp as the video timestamps of each video frame, and the corrected audio encoding timestamp and the corrected audio display timestamp as the audio timestamps of each audio frame; taking the playing time indicated by the video timestamps of all video frames in the image group as the video playing time of the image group; and taking the playing time indicated by the audio timestamps of all audio frames in the image group as the audio playing time of the image group.
9. A video alignment system, comprising:
an encoding server and a slicing server;
the encoding server is configured to: acquire the encoding time, the standard index number, and the start reference time of any image group in any one of N live code streams, wherein the standard index number indicates the encoding order of the image group when the frame rate of the signal source stream is the standard frame rate, and the start reference time is the time at which encoding of the first video frame among the N live code streams is completed on an encoding-production virtual time axis; calculate the coding delay of the image group based on the encoding time, the standard index number, and the start reference time; and update the standard index number according to the coding delay to obtain the index number of the image group, by: judging whether the coding delay is greater than N standard image group durations; if so, determining the index number of the image group to be the standard index number plus N; if not, updating N = N - 1 and returning to the step of judging whether the coding delay is greater than N standard image group durations, until N is updated to 1; if the coding delay is greater than 1 standard image group duration, determining the index number of the image group to be the standard index number plus 1; and if the coding delay is not greater than 1 standard image group duration, determining the index number of the image group to be the standard index number; wherein when the coding delay is greater than the standard image group duration, the index number is greater than the standard index number;
the slicing server is configured to: acquire any image group in one live code stream; parse the image group to obtain its alignment parameters, wherein the alignment parameters comprise a start reference time, the index number of the image group, and the timestamp offset of the first video frame in the image group; the start reference time is the time at which encoding of the first video frame among the N live code streams is completed on the encoding-production virtual time axis, and the index number of the image group is related to the coding delay of the image group; the timestamp offset is the offset of the encoding timestamp of the first video frame relative to the first video frame in a target image group, where the target image group is the first image group processed when the slicing task that slices the image group is started; determine the slicing position of the image group in the live code stream according to the index number of the image group, wherein the slicing position represents the serial number of the slice and the position within the slice; and determine the audio playing time and the video playing time of the image group based on the start reference time and the timestamp offset, by: calculating the sum of the start reference time and the timestamp offset to obtain the time offset of the first video frame in the image group on the encoding-production virtual time axis; for each video frame in the image group, correcting the video encoding timestamp and the video display timestamp of the video frame with the time offset to obtain a corrected video encoding timestamp and a corrected video display timestamp; for each audio frame, correcting the audio encoding timestamp and the audio display timestamp of the audio frame with the time offset to obtain a corrected audio encoding timestamp and a corrected audio display timestamp; taking the corrected video encoding timestamp and the corrected video display timestamp as the video timestamps of each video frame, and the corrected audio encoding timestamp and the corrected audio display timestamp as the audio timestamps of each audio frame; taking the playing time indicated by the video timestamps of all video frames in the image group as the video playing time of the image group; and taking the playing time indicated by the audio timestamps of all audio frames in the image group as the audio playing time of the image group.
10. An electronic device, comprising: a processor, a memory, and a communication bus, wherein the processor and the memory communicate with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to execute the program stored in the memory to implement the video encoding method of any one of claims 1-2 or the video alignment method of any one of claims 3-6.
11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video encoding method of any one of claims 1-2 or the video alignment method of any one of claims 3-6.
CN202210759941.8A 2022-06-29 2022-06-29 Video alignment method, video encoding device and storage medium Active CN115119009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210759941.8A CN115119009B (en) 2022-06-29 2022-06-29 Video alignment method, video encoding device and storage medium


Publications (2)

Publication Number Publication Date
CN115119009A (en) 2022-09-27
CN115119009B (en) 2023-09-01

Family

ID=83330401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210759941.8A Active CN115119009B (en) 2022-06-29 2022-06-29 Video alignment method, video encoding device and storage medium

Country Status (1)

Country Link
CN (1) CN115119009B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115834921A (en) * 2022-11-17 2023-03-21 北京奇艺世纪科技有限公司 Video processing method, video processing apparatus, video processing server, storage medium, and program product
CN116629844B (en) * 2023-07-26 2023-09-26 山东碧汀智能科技有限公司 Automatic dispatch method and system for operation and maintenance of drinking water equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008053557A1 (en) * 2006-11-02 2008-05-08 Pioneer Corporation Moving picture re-encoding device, moving picture re-encoding method, moving picture re-encoding program, and recording medium containing the moving picture re-encoding program
CN101466041A (en) * 2009-01-16 2009-06-24 清华大学 Task scheduling method for multi-eyepoint video encode of multi-nuclear processor
WO2012085504A1 (en) * 2010-12-23 2012-06-28 British Telecommunications Public Limited Company A method for delivering video content encoded at one or more quality levels over a data network
CN104506866A (en) * 2014-11-28 2015-04-08 北京奇艺世纪科技有限公司 Video coding processing method suitable for multiple code streams and video coder
WO2016011823A1 (en) * 2014-07-22 2016-01-28 中兴通讯股份有限公司 Method for acquiring live video slice, server, and storage medium
WO2018042036A1 (en) * 2016-09-05 2018-03-08 Nanocosmos Informationstechnologien Gmbh Method for transmitting real-time-based digital video signals in networks
CN111031385A (en) * 2019-12-20 2020-04-17 北京爱奇艺科技有限公司 Video playing method and device
CN112040233A (en) * 2020-11-04 2020-12-04 北京金山云网络技术有限公司 Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium
CN113163222A (en) * 2021-03-31 2021-07-23 杭州奥点科技股份有限公司 Video frame synchronization method, system, equipment and readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7212726B2 (en) * 2000-09-15 2007-05-01 International Business Machines Corporation System and method of processing MPEG streams for file index insertion
US8355433B2 (en) * 2009-08-18 2013-01-15 Netflix, Inc. Encoding video streams for adaptive video streaming
US20170289599A1 (en) * 2016-03-30 2017-10-05 Le Holdings(Beijing)Co., Ltd. Live broadcast delaying method and apparatus


Also Published As

Publication number Publication date
CN115119009A (en) 2022-09-27

Similar Documents

Publication Publication Date Title
US10820065B2 (en) Service signaling recovery for multimedia content using embedded watermarks
CN115119009B (en) Video alignment method, video encoding device and storage medium
DK2941892T3 (en) LIVE TIMING FOR DYNAMIC ADAPTIVE STREAMING OVER HTTP (DASH)
EP2952006B1 (en) Determining available media data for network streaming
CN107147919B (en) Live broadcast quick starting method and system
US10638192B2 (en) Live streaming quick start method and system
CN110933449B (en) Method, system and device for synchronizing external data and video pictures
CN106470352B (en) Live channel playing method, device and system
CN115134622B (en) Video data alignment method, device, equipment and storage medium
CN111031385B (en) Video playing method and device
CN108174229B (en) Live broadcast delay determination method and device
EP3318034B1 (en) Low latency media streaming
CN111447455A (en) Live video stream playback processing method and device and computing equipment
US20180146230A1 (en) Content item aggregation method, related apparatus, and communications system
CN113852824A (en) Video transcoding method and device, electronic equipment and storage medium
US20230123256A1 (en) No reference realtime video quality assessment
US9319731B2 (en) Methods and apparatuses for resuming paused media
US11451879B2 (en) Controlling playout of advertisement content during video-on-demand video streaming on an end-user terminal
US10536755B1 (en) System for unified ad delivery to consumer devices within service provider networks
CN113727199A (en) HLS slice rapid playing starting method
CN112218128B (en) Advertisement video playing method, playing client and readable storage medium
CN112584088B (en) Method for transmitting media stream data, electronic device and storage medium
CN110072123B (en) Video recovery playing method, video playing terminal and server
WO2016090916A1 (en) Code stream transmission method and device
US20230300430A1 (en) Method and system to highlight video segments in a video stream

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant