CN117793529A - Extension camera with multipath audio and video coding capability and control method thereof - Google Patents


Info

Publication number: CN117793529A
Application number: CN202311339551.6A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 刘建松, 赵兴国
Current Assignee: Shanghai Sailian Information Technology Co ltd
Original Assignee: Shanghai Sailian Information Technology Co ltd
Application filed by Shanghai Sailian Information Technology Co ltd
Priority to CN202311339551.6A
Publication of CN117793529A

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides an extension camera with multi-channel audio and video coding capability and a control method thereof. The camera includes: an acquisition module for acquiring original audio and video signals; an encoding module for encoding the original audio and video signals into at least two code streams with different resolutions, frame rates and/or code rates; an encapsulation module for encapsulating the at least two code streams into one data stream according to a private protocol; and a sending module for sending the data stream to a host side. The extension camera and its control method can simultaneously encode and output multiple code streams with different parameters according to actual conditions, adaptively providing optimal encoding quality and delivering a high-quality real-time conference experience across different networks, devices and scenes, thereby meeting diversified conference requirements, improving communication efficiency and providing a better user experience.

Description

Extension camera with multipath audio and video coding capability and control method thereof
Technical Field
The invention relates to the technical field of video communication, and in particular to an extension camera with multi-channel audio and video coding capability and a control method thereof.
Background
Internet-based video communication technology is widely used in both work and life scenarios. The camera that collects the original audio and video signals plays an extremely important role: it is a basic component of numerous applications such as video communication, conferencing, monitoring and entertainment.
A general-purpose camera usually does not have complete encoding and decoding capabilities; it outputs the original audio and video signals to a host side, which then performs the encoding. Some advanced or special-purpose cameras may integrate the encoding function.
In the prior art, the protocol connecting a camera with audio and video coding capability to the host side is a standard protocol (such as the UVC protocol), which limits the camera to outputting only one fixed set of encoding parameters at a time. The camera can therefore only encode a code stream with a fixed frame rate, code rate and resolution, and cannot simultaneously encode code streams with different frame rates, code rates and resolutions.
This weakens the traditional camera's flexibility in adapting to different application demands: it cannot dynamically adjust and output code streams with different parameters according to actual conditions, so its flexibility and adaptability are insufficient. For example, in a large video conference, participants may be located in different geographic locations, use different network bandwidths and use different video devices, so the code stream requirements of different receiving ends differ. A conventional camera, due to its encoding limitations, cannot simultaneously encode code streams with different frame rates, code rates and resolutions to meet these requirements. A fixed set of encoding parameters may cause waste of resources at receiving ends on high-bandwidth networks, and video quality degradation or transmission interruption at receiving ends on low-bandwidth networks.
Disclosure of Invention
The invention provides an extension camera with multi-channel audio and video coding capability and a control method thereof. The camera's code streams can be dynamically adjusted and switched in code rate, frame rate and resolution according to actual requirements, meeting the need to switch between different coding capabilities in a real-time conference and greatly improving the flexibility and adaptivity of the user's video conference experience.
In a first aspect, the present invention provides an extension camera with multi-channel audio and video coding capability, where the camera includes:
the acquisition module is used for acquiring original audio and video signals;
the encoding module is used for encoding the original audio and video signals to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates;
the encapsulation module is used for encapsulating the at least two paths of code streams with different resolutions, frame rates and/or code rates into one path of data stream according to a private protocol;
and the sending module is used for sending the data stream to a host side.
In a second aspect, the present invention further provides a control method for an extension camera with multi-channel audio and video coding capability, the method including:
collecting original audio and video signals;
encoding the original audio and video signals to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates;
packaging the at least two paths of code streams with different resolutions, frame rates and/or code rates into one path of data stream according to a private protocol;
and sending the data stream to a host side.
The extension camera with multi-channel audio and video coding capability provided by the invention has encoding submodules for simultaneously encoding multiple code streams with different parameters, and an encapsulation module for encapsulating the multiple code streams according to a private protocol. With the ability to dynamically adjust code rate, frame rate and resolution according to actual requirements, it can simultaneously encode and output multiple code streams with different parameters to meet the differing code stream requirements of real-time conferences, providing a better conference experience in four ways. First, it optimizes bandwidth utilization: in a real-time conference, the network environments and bandwidth conditions of different participants may differ, and the camera can dynamically adjust the encoding code rate in real time according to the actual bandwidth of each receiving end; if network bandwidth is limited, the camera can automatically reduce the code rate to ensure stable video transmission. Second, it ensures a smooth experience: a real-time conference requires smooth video transmission to guarantee communication among participants, and the camera can automatically adjust the frame rate and resolution according to each receiving end's network conditions and device performance, ensuring smooth video display and avoiding picture stutter or delay. Third, it provides optimal image quality: different receiving ends may use different display devices, such as a large-screen display, a smartphone or a tablet computer, and the camera dynamically adjusts the resolution of each device's code stream in real time according to that device's resolution and display capability, keeping images clear and well adapted. Fourth, it simplifies user operation: a camera that can dynamically adjust encoding parameters to satisfy different receiving ends relieves participants of the burden of manually adjusting resolution, frame rate or code rate; the camera adjusts automatically on demand, providing the best video communication experience.
In short, a camera with dynamic code rate, frame rate and resolution adjustment capability can simultaneously encode and output multiple code streams with different parameters according to actual conditions, adaptively provide optimal encoding quality, and deliver a high-quality real-time conference experience across different networks, devices and scenes, meeting diversified conference requirements, improving communication efficiency and providing a better user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed for the embodiments or for the description of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; other drawings may be obtained from them without inventive effort by a person skilled in the art.
Fig. 1 is a schematic block diagram of an extension camera with multi-channel audio and video coding capability provided by an embodiment of the present invention;
fig. 2 is a flowchart of a control method of an extension camera with multi-channel audio and video coding capability according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Summary of the Invention
As described above, the present invention provides a flexible extension camera with multi-channel audio and video coding capability and a control method thereof, which can simultaneously encode and output multiple code streams with different parameters according to the actual conditions of different receiving ends, adaptively provide optimal coding quality, and deliver a high-quality real-time conference experience across different networks, devices and scenes, thereby meeting diversified conference requirements, improving communication efficiency and providing a better user experience.
Exemplary System
Fig. 1 is a block diagram of an extension camera with multi-channel audio/video coding capability according to an embodiment of the present invention. As shown in fig. 1, the extension camera 100 provided in this embodiment includes an acquisition module 101, an encoding module 102, an encapsulation module 103, a sending module 104 and a receiving module 105.
The acquisition module 101 is used for acquiring original audio and video signals.
Specifically, the original audio signal is an analog sound signal, and the original video signal is a continuous frame. Wherein the acquisition module 101 comprises a microphone and an image sensor.
A microphone senses ambient sound and converts it into an analog electrical signal; this is the capture of the original audio signal. The analog sound signal is continuous, i.e. it varies continuously in time, like the actual sound we hear.
An image sensor is an electronic device for capturing an optical image, analogous to a camera's film: it captures an image by converting light into an electrical signal. Common image sensors are CCD and CMOS sensors. The image sensor captures images continuously at a given frame rate (frames per second), and these images constitute the successive frames of the video; i.e., the raw video data captured by the image sensor is a sequence of frames, each recording the distribution of light across the sensor.
The receiving module 105 is configured to receive an instruction from a host.
The host side is the transmitting end, and its instructions carry the receiving ends' requirements for code stream resolution, frame rate and/or code rate.
Specifically, the code stream includes a video stream and/or an audio stream.
Resolution refers to the density of pixels on an image, display, or image-capture device, typically expressed as the number of horizontal pixels times the number of vertical pixels. Different devices and applications may use different resolutions, including but not limited to 360p, 480p, 720p, 1080p, 2K, 4K, 8K and 10K. The higher the resolution, the sharper the image generally is, but more computing resources and storage space are needed to process and store it. Different application areas may use different resolutions; when selecting one, image quality and computational resources must be balanced against device, application, content and performance requirements.
The frame rate is the number of image frames displayed per unit time, generally expressed in frames per second (fps). The frame rate affects the smoothness and dynamics of the video, and different applications and scenes may require different frame rates, such as 24fps, 30fps, 60fps, 120fps and 240fps. 24fps is the frame rate commonly used for movies; it is considered to give footage a cinematic texture while maintaining smoothness, and many movies and television shows are shot at 24fps. 30fps is a standard frame rate common in video and television broadcasts; it provides a relatively smooth picture and is considered sufficient in most cases. 60fps is a common high-frame-rate (HFR) option; in many video games, sports and action scenes it provides a smoother picture and renders fast-moving objects and actions better. 120fps and 240fps are commonly used in professional photography and film production and are also supported on some high-end televisions and displays; they suit scenes that demand a highly fluent and accurate sense of motion, such as slow motion and special effects.
Different applications may require different frame rates depending on the needs and target audience. Higher frame rates may generally provide a smoother visual experience, but may also occupy more computing resources and bandwidth. The frame rate is selected taking into account the device capabilities, the target media type, and the desires of the viewer.
Code rate refers to the amount of data transmitted or processed per unit time, typically expressed in bits per second (bps), and includes the audio code rate and the video code rate. In the multimedia field, the code rate measures the transmission rate and compression quality of audio and video data; different types of media content require different code rates to maintain adequate quality and fluency. Typical audio code rates include 128kbps, 192kbps, 256kbps and 320kbps; typical video code rates include 1Mbps, 2Mbps, 4Mbps, 8Mbps and 10Mbps.
The choice of code rate depends on a number of factors including the type of media, the target audience, bandwidth limitations, and quality requirements. Higher code rates may generally provide better quality but may occupy more bandwidth and memory. Conversely, a lower code rate may save bandwidth, but may sacrifice some image and sound quality. In selecting the code rate, a trade-off between quality and resources is required.
In an actual scenario, different receiving ends may have different requirements on the resolution, frame rate and/or code rate of the code stream.
For example, a video conference may have three different kinds of receiving ends: desktop computers, notebook computers and smartphones. Desktop computers typically have larger displays, so a higher resolution is desirable to see details such as facial expressions and presentation content in a meeting; smooth animation and voice synchronization also matter, so a higher frame rate is required. To obtain high-quality images on larger displays, a higher resolution (e.g., 1080p) and frame rate (e.g., 60fps) should be chosen, with correspondingly larger bandwidth and higher code rate. Notebook computers have smaller displays but still require clear images; because they are usually mobile and must accommodate different network environments, an appropriate bandwidth and a fluent experience are needed, so a moderate resolution (e.g., 720p) and frame rate (e.g., 30fps) should be selected to balance image quality against bandwidth consumption. Smartphones have limited screen space but excel in flexibility; users want a clear image on a small screen while maintaining smoothness and stability, so a moderate resolution (e.g., 720p) should be selected, with frame rate and code rate chosen according to device performance and network conditions to ensure a smooth experience.
In summary, the requirements of different receiving ends for resolution, frame rate and code rate are different, and a trade-off needs to be made between image quality and bandwidth limitation to achieve a satisfactory videoconference experience.
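The trade-offs just described can be sketched as a small selection routine. This is an illustrative sketch only: the function name `select_stream_params`, the device classes and the bandwidth thresholds are assumptions for demonstration, not values taken from the invention.

```python
# Hypothetical mapping from a receiving end's device class and measured
# bandwidth to requested code-stream parameters. Thresholds are assumed.

def select_stream_params(device: str, bandwidth_kbps: int) -> dict:
    """Pick resolution / frame rate / code rate for one receiving end."""
    if device == "desktop" and bandwidth_kbps >= 8000:
        # Large display, ample bandwidth: favor image quality.
        return {"resolution": "1080p", "fps": 60, "kbps": 8000}
    if bandwidth_kbps >= 2000:
        # Laptops and phones on moderate links: balance quality and bandwidth.
        return {"resolution": "720p", "fps": 30, "kbps": 2000}
    # Constrained links: trade image quality for a stable, smooth stream.
    return {"resolution": "480p", "fps": 30, "kbps": 1000}
```

A host side could evaluate such a routine once per receiving end and fold the results into the instruction it sends to the camera.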
In addition, to enable bidirectional communication between the camera 100 and the host side (for example, the receiving module 105 receiving instructions from the host, and the sending module 104 sending the data stream to the host), a communication connection must be established between them. There are several options, including a network (Ethernet) port, USB, Thunderbolt, PCI Express and wireless connections; the choice generally depends on the device type, the use case and the interfaces supported.
A network-port (Ethernet) connection suits applications requiring high bandwidth and stability, such as monitoring cameras and video conferencing equipment.
USB is the most common connection, applicable to many types of cameras, from ordinary webcams to high-performance photography and video equipment.
Thunderbolt is a high-speed data transmission interface suitable for professional photography and video production equipment.
PCI Express is suitable for high-performance professional cameras, and can connect the cameras to expansion slots on the host side.
A wireless connection lets the camera 100 interact with the host remotely without a physical connection, reducing wiring and physical-connection problems, improving the reliability and stability of the overall connection, and increasing device flexibility and convenience. Wireless communication protocols include, but are not limited to, Wi-Fi, Bluetooth, ZigBee and LoRa; the appropriate protocol depends on application requirements such as transmission distance, data rate and power consumption (Bluetooth, for example, suits short-range communication). The communication connection between the camera 100 and the host side generally involves a pairing process, authentication and secure communication setup to ensure security and reliability; once the connection is established, the camera 100 and the host side can communicate bidirectionally.
The same camera 100 may be connected to one host side or to multiple host sides, i.e. multiple host sides may share one camera 100. For example, suppose host sides a, b and c are connected to the camera 100 and all three need the same audio and video content at different resolutions. The camera 100 receives a request for two code streams from host a, three from host b and two from host c; it encodes seven different code streams, encapsulates them into one data stream, and sends that data stream to hosts a, b and c. Each host side unpacks the data stream to recover the pre-encapsulation code streams, picks out the code streams it needs, and sends them to the corresponding receiving ends.
As another example, if host sides a, b and c are connected to the camera 100 but need different audio/video content at different resolutions, a management module must allocate use of the camera 100. The management module can run an automatic algorithm, such as a greedy algorithm or dynamic programming, and automatically compute a time-slot allocation scheme from constraints such as receiving-end requirements, maximum resource utilization, minimum delay and fairness. In short, the management module can integrate various allocation strategies to maximize utilization of the camera 100's service capacity, meet requirements in different scenes, effectively share the camera 100 among different host sides, and greatly reduce duplicated development and manufacturing costs.
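As one concrete (and deliberately simplified) illustration of such an allocation strategy, the greedy sketch below ranks host requests and hands out camera time slots round-robin over the ranked list. The scoring rule and all names are assumptions for demonstration; the text above only requires that some automatic algorithm computes the allocation.

```python
# Hypothetical greedy time-slot allocator for sharing one camera among hosts.

def allocate_slots(requests, num_slots):
    """requests: list of (host_id, priority, num_streams) tuples.
    Returns a dict mapping slot index -> host_id."""
    # Greedy ranking: higher priority first; among equal priorities,
    # requests with fewer streams (cheaper to serve) come first.
    ranked = sorted(requests, key=lambda r: (-r[1], r[2]))
    # Hand out slots round-robin over the ranked hosts.
    return {slot: ranked[slot % len(ranked)][0] for slot in range(num_slots)}
```

A real management module would also weigh delay and fairness constraints, but the ranking-then-assignment shape would be similar.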
The encoding module 102 is configured to encode the original audio/video signal to obtain at least two code streams with different resolutions, frame rates and/or code rates. Specifically, the encoding module 102 encodes the original audio/video signal according to the instruction of the host side to obtain the at least two code streams. The encoding module 102 is provided with a plurality of encoding submodules and can therefore encode multiple code streams with different resolutions, frame rates and/or code rates simultaneously.
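The parallel submodule arrangement might be sketched as follows. Real hardware or codec encoders are replaced here by a stub that merely tags frames with its parameters; the function names and structure are illustrative assumptions, not the invention's implementation.

```python
# Hypothetical sketch: one encoder submodule per parameter set, all
# consuming the same raw frames concurrently.
from concurrent.futures import ThreadPoolExecutor

def encode_stream(raw_frames, params):
    """Stub for one encoding submodule producing one code stream."""
    return [(params["resolution"], params["fps"], frame) for frame in raw_frames]

def encode_all(raw_frames, param_sets):
    """Encode the same raw signal into several code streams at once."""
    with ThreadPoolExecutor(max_workers=len(param_sets)) as pool:
        futures = [pool.submit(encode_stream, raw_frames, p) for p in param_sets]
        return [f.result() for f in futures]
```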
Typically, the original audio signal needs to be encoded for transmission, storage, and playback on digital devices and networks. Encoding is a process of converting an original audio signal, i.e., an analog sound signal, into digital data so that audio information can be efficiently processed and transmitted.
The main purpose of digital audio coding is to reduce the amount of data while maintaining sound quality as much as possible. The encoding process splits a continuous analog audio signal into discrete samples and converts the samples into digital representations, typically a series of binary digits. Different audio coding algorithms use different techniques to compress the audio data so as to occupy less space in transmission and storage.
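The sampling-and-quantization step described above can be illustrated with a minimal sketch: a continuous signal (modelled here by a sine function) is sampled at a fixed rate and each sample rounded to a signed 8-bit integer code. The frequency, sample rate and bit depth are arbitrary illustrative choices, not parameters of the invention.

```python
# Minimal PCM-style sampling and quantization of a model "analog" signal.
import math

def sample_and_quantize(freq_hz=440.0, sample_rate=8000, n_samples=8, bits=8):
    levels = 2 ** (bits - 1) - 1  # max magnitude of a signed code at this depth
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                              # sample instant
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # continuous value in [-1, 1]
        samples.append(round(amplitude * levels))        # discrete integer code
    return samples
```

A real encoder would then compress these discrete samples with a codec such as those listed below.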
Common audio encoding formats include:
MP3: a popular lossy audio coding format is used to reduce file size while maintaining relatively high sound quality.
AAC: is also a lossy audio coding format, commonly used for music and video streaming media.
FLAC: a lossless audio coding format capable of maintaining high sound quality, but the file size is generally large.
WAV: a lossless audio format is typically used to store uncompressed audio data.
The encoded audio files may be played on a variety of devices, as well as transmitted over a network for sharing and playing via online streaming media, music platforms, and other applications. It is noted that lossy coding formats may lose sound quality to some extent, especially at low bit rates, whereas lossless formats may preserve the original audio quality more accurately. The selection of the appropriate audio encoding format depends on the application requirements, storage resources and audio quality requirements.
The original video signal also needs to be encoded for transmission, storage, and playback on digital devices and networks. Video coding is the process of converting a continuous sequence of images into digital data in order to efficiently process and transmit video information.
The main purpose of video coding is to reduce the amount of data while maintaining image quality as much as possible. The encoding process breaks the video frame into a series of image blocks, which are then converted into digital representations, typically a series of binary digits, using different compression techniques. Different video coding algorithms use different techniques to compress video data so as to occupy less space in transmission and storage.
Common video encoding formats include:
H.264/AVC: a widely used lossy video coding format is commonly used for applications such as online video streaming, video conferencing, and digital television.
H.265/HEVC: a more efficient lossy video coding format that reduces the data volume further at the same picture quality, suitable for high-resolution scenes such as 4K and 8K video.
VP9: an open video coding format developed by Google is used to transmit high quality video content over the Web.
AV1: an open, efficient video coding format is intended to provide high quality video compression.
The encoded video file may be played on a variety of devices, as well as transmitted over a network for sharing and playing via online streaming media, video platforms, and other applications. It should be noted that video coding can lead to some degree of picture quality loss, especially at lower bit rates. The selection of the appropriate video encoding format depends on the application requirements, storage resources, bandwidth and video quality requirements.
For example, suppose receiving end A in a video conference is on a high-speed wired network with large bandwidth; receiving end B uses a mobile network with relatively low bandwidth; and receiving end C is on ordinary home broadband with moderate bandwidth. According to each receiving end's network environment, the camera 100 encodes for A at high resolution, high frame rate and high code rate; for B at low resolution, low frame rate and low code rate; and for C at medium resolution, medium frame rate and medium code rate. In the same time period, A obtains a high-quality video experience with a clear and smooth picture; B, despite lower definition and smoothness, keeps the conference real-time; and C obtains relatively good video quality unaffected by network bottlenecks. When receiving end C switches to a lower-bandwidth mobile network in the next time period, the camera 100 adjusts C's code stream to low resolution, low frame rate and low code rate in real time. That is, when network bandwidth is limited, the camera 100 can automatically reduce the code rate to ensure stable video transmission.
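The per-period downgrade in this example can be sketched as a tier table consulted each time a receiving end reports its bandwidth. The three tiers loosely mirror the high / medium / low profiles above; the thresholds and parameter values are assumptions for illustration.

```python
# Hypothetical bandwidth tiers: (min_kbps, resolution, fps, kbps).
TIERS = [
    (6000, "1080p", 60, 6000),  # high-speed wired link
    (2500, "720p", 30, 2500),   # ordinary home broadband
    (0, "480p", 15, 800),       # constrained mobile link
]

def adjust_stream(bandwidth_kbps):
    """Re-select a stream's parameters from its currently reported bandwidth."""
    for min_bw, res, fps, kbps in TIERS:
        if bandwidth_kbps >= min_bw:
            return {"resolution": res, "fps": fps, "kbps": kbps}
```

Re-running this selection each period yields exactly the behavior in the example: when receiving end C's reported bandwidth drops, its stream falls to the lowest tier.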
In summary, compared with prior-art cameras that can provide only one code stream, the camera 100 can simultaneously serve the differing code stream requirements of different receiving ends. Participants need not manually adjust resolution, frame rate or code rate; the camera 100 automatically adjusts the encoding parameters for each receiving end in real time while satisfying the high-quality video conference experience of every participant.
The encapsulation module 103 is configured to encapsulate the at least two paths of code streams with different resolutions, frame rates and/or code rates into one path of data stream according to a private protocol.
Specifically, the encapsulation module 103 is further configured to encapsulate at least two code stream fragments of the same time slice into the same data slice in the data stream according to a private protocol.
The fields of the private protocol carry the position, resolution, frame rate and code rate of each code stream encapsulated in the data slice. Correspondingly, the data slice includes marking information recording the position, resolution, frame rate and code rate of each encapsulated code stream.
The sending module 104 is configured to send the data stream to a host.
And the host side unpacks the data stream according to the marking information to obtain a pre-packaged code stream and sends the pre-packaged code stream to a corresponding receiving end.
For example, the receiving module 105 receives instructions from the host side: receiving end a needs code stream a with a resolution of 1080p and a frame rate of 60fps; receiving end b needs code stream b with a resolution of 720p and a frame rate of 30fps; and receiving end c needs code stream c with a resolution of 480p and a frame rate of 60fps. The encoding module 102 encodes the original audio/video signal according to these instructions to obtain the three code streams a, b and c. The encapsulation module 103 encapsulates the code stream fragments of the same time slice (e.g., the first 5 ms of all three streams) into the same data slice in the data stream according to the private protocol. The private protocol format is as follows:
[ Start flag ] [ data slice Length ] [ code stream a information ] [ code stream b information ] [ code stream c information ] [ checksum ] [ end flag ]
Specific fields are described below:
Start flag: one byte, identifying the start of a data slice. For example, the start flag may be the hexadecimal value 0xAA.
Data slice length: one byte, indicating the length of the entire data slice, including all subsequent fields.
Code stream information fields: each field contains the information of one code stream fragment, comprising:
Code stream identifier: one byte identifying the code stream. Let 0x01 denote code stream a, 0x02 code stream b, and 0x03 code stream c.
Start time: one byte indicating the start time of the time slice in milliseconds. Let 0x05 denote the first 5 milliseconds.
Resolution: one byte. Let 0x01 denote 1080p, 0x02 denote 720p, and 0x03 denote 480p.
Frame rate: one byte. Let 0x01 denote 60fps and 0x02 denote 30fps.
Checksum: one byte for checking the integrity of the data slice; a simple XOR check or a more complex algorithm may be used.
End flag: one byte, identifying the end of a data slice. For example, the end flag may be the hexadecimal value 0x55.
The data slice is: [0xAA] [0x17] [0x01 0x05 0x01 0x01] [0x02 0x05 0x02 0x02] [0x03 0x05 0x03 0x03] [0x2F] [0x55]
Wherein:
0xAA is the start flag.
0x17 indicates that the slice length is 23 bytes.
0x01 0x05 0x01 0x01 is the information of code stream a, where 0x01 denotes code stream a, 0x05 denotes the first 5 milliseconds, 0x01 denotes 1080p resolution, and 0x01 denotes a 60 fps frame rate.
0x02 0x05 0x02 0x02 is the information of code stream b, where 0x02 denotes code stream b, 0x05 denotes the first 5 milliseconds, 0x02 denotes 720p resolution, and 0x02 denotes a 30 fps frame rate.
0x03 0x05 0x03 0x03 is the information of code stream c, where 0x03 denotes code stream c, 0x05 denotes the first 5 milliseconds, 0x03 denotes 480p resolution, and 0x03 denotes a 60 fps frame rate.
0x2F is the checksum, which may be calculated using an XOR algorithm or the like.
0x55 is an end flag.
The sending module 104 then sends each data slice of the data stream to the host side in real time.
The host side parses and unpacks each data slice of the data stream in real time according to the marking information in the slice, recovering the pre-encapsulation code streams: code stream a: [0x01 0x05 0x01 0x01]; code stream b: [0x02 0x05 0x02 0x02]; code stream c: [0x03 0x05 0x03 0x03]. It then transmits code stream a to receiving end a, code stream b to receiving end b, and code stream c to receiving end c.
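The host-side unpacking just described can likewise be sketched. The fixed four-byte stream-information records and the XOR checksum are assumptions for illustration; the demo slice below uses a recomputed length and checksum for the header-only bytes shown (the patent's example values 0x17 and 0x2F presumably also cover encoded payload that is omitted here).

```python
from functools import reduce
from operator import xor

def parse_slice(slice_: bytes) -> dict:
    """Split one data slice back into per-stream records keyed by code stream ID."""
    assert slice_[0] == 0xAA and slice_[-1] == 0x55, "framing error"
    assert slice_[1] == len(slice_), "length mismatch"
    body = slice_[2:-2]                       # strip start, length, checksum, end
    assert slice_[-2] == reduce(xor, body), "checksum mismatch"
    records = {}
    for i in range(0, len(body), 4):          # fixed 4-byte stream-information records
        sid, start_ms, res, fps = body[i:i + 4]
        records[sid] = {"start_ms": start_ms, "resolution": res, "frame_rate": fps}
    return records

# Header-only demo slice for code streams a, b, c of the first 5 ms.
demo = bytes([0xAA, 0x10,
              0x01, 0x05, 0x01, 0x01,
              0x02, 0x05, 0x02, 0x02,
              0x03, 0x05, 0x03, 0x03,
              0x05, 0x55])
records = parse_slice(demo)
```

Each recovered record can then be forwarded to its receiving end: the record keyed 0x01 to receiving end a, 0x02 to b, and 0x03 to c.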
Exemplary method
Correspondingly, an embodiment of the invention further provides a control method of the extension camera with multi-channel audio and video coding capability. Fig. 2 is a flowchart of a control method of an extension camera with multi-channel audio and video coding capability according to an embodiment of the present invention; the method includes the following steps:
S101: collecting original audio and video signals;
S102: encoding the original audio and video signals to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates;
S103: encapsulating the at least two paths of code streams with different resolutions, frame rates and/or code rates into one path of data stream according to a private protocol;
S104: sending the data stream to a host side.
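Steps S101 to S104 can be wired together as in the following sketch. `camera_pipeline` and its callables are hypothetical placeholders for the acquisition, encoding, encapsulation and sending modules, not an API defined by the patent.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class StreamSpec:
    """Per-stream parameters requested by a receiving end."""
    resolution: str
    frame_rate: int
    bitrate_kbps: int

def camera_pipeline(capture: Callable, encode: Callable, encapsulate: Callable,
                    send: Callable, specs: List[StreamSpec]) -> None:
    raw = capture()                                   # S101: collect raw AV signal
    streams = [encode(raw, spec) for spec in specs]   # S102: one stream per spec
    data_stream = encapsulate(streams)                # S103: mux into one data stream
    send(data_stream)                                 # S104: send to the host side

# Demo with stub callables standing in for the real modules.
sent = []
camera_pipeline(
    capture=lambda: "raw",
    encode=lambda raw, spec: f"{raw}@{spec.resolution}/{spec.frame_rate}fps",
    encapsulate=lambda streams: "|".join(streams),
    send=sent.append,
    specs=[StreamSpec("1080p", 60, 4000), StreamSpec("720p", 30, 2000)],
)
```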
The method also includes receiving an instruction from the host side.
The step of encoding the original audio/video signal to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates comprises the following steps: and encoding the original audio and video signals according to the instruction of the host side to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates.
The instructions of the host include the requirements of the receiving end on the resolution, the frame rate and/or the code rate of the code stream.
The resolutions include 360p, 480p, 720p, 1080p, 2k, 4k, 8k, and 10k;
the frame rate includes 24fps, 30fps, 60fps, 120fps, and 240fps;
the code rate comprises an audio code rate and a video code rate;
the audio code rate includes 128kbps, 192kbps, 256kbps and 320kbps;
the video code rate includes 1Mbps, 2Mbps, 4Mbps, 8Mbps, and 10Mbps.
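A hypothetical helper on the camera side could check a receiving end's requirement against these menus before encoding begins; the dictionary keys and function name below are illustrative, not from the patent.

```python
# Supported parameter menus, as enumerated in the description above.
SUPPORTED = {
    "resolution": {"360p", "480p", "720p", "1080p", "2k", "4k", "8k", "10k"},
    "frame_rate_fps": {24, 30, 60, 120, 240},
    "audio_rate_kbps": {128, 192, 256, 320},
    "video_rate_mbps": {1, 2, 4, 8, 10},
}

def validate_requirement(req: dict) -> bool:
    """True if every requested parameter is one of the supported values."""
    return all(value in SUPPORTED[key] for key, value in req.items())

ok = validate_requirement({"resolution": "720p", "frame_rate_fps": 30})
bad = validate_requirement({"resolution": "540p", "frame_rate_fps": 30})
```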
The step of encapsulating the at least two paths of code streams with different resolutions, frame rates and/or code rates into one path of data stream according to a private protocol comprises the following steps: and encapsulating at least two code stream fragments of the same time slice into the same data slice in the data stream according to a private protocol.
The method further comprises adding, in the fields of the private protocol, the position, resolution, frame rate and code rate information of the code streams encapsulated in the data slice.
The data slice comprises marking information for marking the position, resolution, frame rate and code rate information of the code stream package.
And the host side unpacks the data stream according to the marking information to obtain a pre-packaged code stream and sends the pre-packaged code stream to a corresponding receiving end.
The communication connection between the camera and the host side comprises a network port, USB, Thunderbolt, PCI Express, or a wireless connection.
It should be noted that while several devices, units, or modules of an extended camera with multiple audio and video encoding capabilities are mentioned in the detailed description above, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more modules described above may be embodied in one module in accordance with embodiments of the present invention. Conversely, the features and functions of one module described above may be further divided into a plurality of modules to be embodied.
Further, although the operations of the control method of an extended camera with multi-channel audio and video coding capability of the present invention are depicted in a particular order in the figures, this is not a requirement or suggestion that these operations must be performed in that particular order or that all of the illustrated operations must be performed in order to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
While the spirit and principles of the present invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects imply that features in those aspects cannot be combined to advantage; such division is merely for convenience of description. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
The invention provides:
1. An extension camera with multi-channel audio and video coding capability, characterized in that the camera comprises:
the acquisition module is used for acquiring original audio and video signals;
the coding module is used for coding the original audio and video signals to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates;
the encapsulation module is used for encapsulating the at least two paths of code streams with different resolutions, frame rates and/or code rates into one path of data stream according to a private protocol;
and the sending module is used for sending the data stream to a host side.
2. The expansion camera according to claim 1, further comprising a receiving module for receiving an instruction from the host side.
3. The expansion camera according to claim 2, wherein the encoding module is further configured to encode the original audio/video signal according to the instruction of the host side to obtain the at least two code streams with different resolutions, frame rates and/or code rates.
4. The expansion camera according to claim 2 or 3, wherein the instruction of the host side includes a requirement of a receiving side for a code stream resolution, a frame rate and/or a code rate.
5. The extension camera of any one of claims 1-4, wherein the resolution comprises 360p, 480p, 720p, 1080p, 2k, 4k, 8k, and 10k;
the frame rate includes 24fps, 30fps, 60fps, 120fps, and 240fps;
the code rate comprises an audio code rate and a video code rate;
the audio code rate includes 128kbps, 192kbps, 256kbps and 320kbps;
the video code rate includes 1Mbps, 2Mbps, 4Mbps, 8Mbps, and 10Mbps.
6. The expansion camera of any of claims 1-5, wherein the encapsulation module is further configured to encapsulate at least two code stream fragments of a same time slice into a same data slice in the data stream according to a proprietary protocol.
7. The extension camera of claim 6, wherein a field of the private protocol carries the position, resolution, frame rate and code rate information of the code streams encapsulated in the data slice.
8. The extension camera according to claim 6 or 7, wherein the data slice includes marking information for marking the position, resolution, frame rate, and code rate information of the code stream package.
9. The expansion camera according to claim 8, wherein the host side unpacks the data stream according to the tag information to obtain a pre-package code stream, and sends the pre-package code stream to a corresponding receiving side.
10. The expansion camera of any of claims 1-9, wherein the communication connection between the camera and the host side comprises a network port, USB, Thunderbolt, PCI Express, or a wireless connection.
11. A control method of an extension camera with multi-channel audio and video coding capability, characterized by comprising the following steps:
collecting original audio and video signals;
encoding the original audio and video signals to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates;
packaging the at least two paths of code streams with different resolutions, frame rates and/or code rates into one path of data stream according to a private protocol;
and sending the data stream to a host side.
12. The control method according to claim 11, characterized in that the method further comprises receiving an instruction from the host side.
13. The method according to claim 12, wherein the step of encoding the original audio/video signal to obtain at least two code streams with different resolutions, frame rates and/or code rates comprises: and encoding the original audio and video signals according to the instruction of the host side to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates.
14. The control method according to claim 12 or 13, wherein the instruction of the host side includes a requirement of the receiving side for a code stream resolution, a frame rate and/or a code rate.
15. The control method according to any one of claims 11 to 14, characterized in that the resolution includes 360p, 480p, 720p, 1080p, 2k, 4k, 8k, and 10k;
the frame rate includes 24fps, 30fps, 60fps, 120fps, and 240fps;
the code rate comprises an audio code rate and a video code rate;
the audio code rate includes 128kbps, 192kbps, 256kbps and 320kbps;
the video code rate includes 1Mbps, 2Mbps, 4Mbps, 8Mbps, and 10Mbps.
16. The control method according to any one of claims 11-15, wherein the step of encapsulating the at least two code streams with different resolutions, frame rates and/or code rates into one data stream according to a proprietary protocol specifically comprises: and encapsulating at least two code stream fragments of the same time slice into the same data slice in the data stream according to a private protocol.
17. The control method according to claim 16, further comprising adding the position, resolution, frame rate and code rate information of the encapsulation of the stream in the data slice in the field of the private protocol.
18. The control method according to claim 16 or 17, characterized in that the data slice includes marking information for marking the position, resolution, frame rate and code rate information of the code stream package.
19. The control method according to claim 18, wherein the host side unpacks the data stream according to the tag information to obtain a pre-package code stream, and sends the pre-package code stream to a corresponding receiving side.
20. The control method according to any one of claims 11 to 19, characterized in that the communication connection between the camera and the host side comprises a network port, USB, Thunderbolt, PCI Express, or a wireless connection.

Claims (10)

1. An extension camera with multi-channel audio and video coding capability, characterized in that the camera comprises:
the acquisition module is used for acquiring original audio and video signals;
the coding module is used for coding the original audio and video signals to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates;
the encapsulation module is used for encapsulating the at least two paths of code streams with different resolutions, frame rates and/or code rates into one path of data stream according to a private protocol;
and the sending module is used for sending the data stream to a host side.
2. The expansion camera of claim 1, further comprising a receiving module for receiving instructions from the host side.
3. The expansion camera according to claim 2, wherein the encoding module is further configured to encode the original audio/video signal according to the instruction of the host side to obtain the at least two code streams with different resolutions, frame rates and/or code rates.
4. The extension camera according to claim 2 or 3, wherein the instruction of the host side comprises the requirements of the receiving side on the code stream resolution, frame rate and/or code rate.
5. The extension camera of any one of claims 1-4, wherein the resolution comprises 360p, 480p, 720p, 1080p, 2k, 4k, 8k, and 10k;
the frame rate includes 24fps, 30fps, 60fps, 120fps, and 240fps;
the code rate comprises an audio code rate and a video code rate;
the audio code rate includes 128kbps, 192kbps, 256kbps and 320kbps;
the video code rate includes 1Mbps, 2Mbps, 4Mbps, 8Mbps, and 10Mbps.
6. The expansion camera of any of claims 1-5, wherein the encapsulation module is further configured to encapsulate at least two code stream fragments of a same time slice into a same data slice in the data stream according to a proprietary protocol.
7. The extension camera of claim 6, wherein a field of the private protocol carries the position, resolution, frame rate and code rate information of the code streams encapsulated in the data slice.
8. The extension camera according to claim 6 or 7, wherein the data slice includes marking information for marking the position, resolution, frame rate and code rate information of the code stream package.
9. The expansion camera according to claim 8, wherein the host side unpacks the data stream according to the marking information to obtain a pre-package code stream, and sends the pre-package code stream to a corresponding receiving side.
10. A control method of an extension camera with multi-channel audio and video coding capability, characterized by comprising the following steps:
collecting original audio and video signals;
encoding the original audio and video signals to obtain at least two paths of code streams with different resolutions, frame rates and/or code rates;
packaging the at least two paths of code streams with different resolutions, frame rates and/or code rates into one path of data stream according to a private protocol;
and sending the data stream to a host side.
CN202311339551.6A 2023-10-16 2023-10-16 Extension camera with multipath audio and video coding capability and control method thereof Pending CN117793529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311339551.6A CN117793529A (en) 2023-10-16 2023-10-16 Extension camera with multipath audio and video coding capability and control method thereof

Publications (1)

Publication Number Publication Date
CN117793529A true CN117793529A (en) 2024-03-29

Family

ID=90395209




Legal Events

Date Code Title Description
PB01 Publication