CN112712525A - Multi-party image interaction system and method - Google Patents
- Publication number
- CN112712525A (application number CN202011537099.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- acquisition object
- acquisition
- portrait
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
Abstract
The application discloses a multi-party image interaction system and method, comprising: an acquisition module for acquiring an image stream; a processing module for performing at least one of segmentation, adjustment, replacement, or synthesis on the acquired image stream; and an interaction module for transmitting the images processed by the processing module. A video-conference-system synthesis scheme and a local synthesis scheme are also provided, selected according to whether the user permits the generated video image stream to be stored in the video conference system, thereby meeting users' requirements for personal privacy protection.
Description
Technical Field
The application relates to the field of electronics, in particular to a multi-party image interaction system.
Background
With the rapid development of the Internet and 5G, online image interaction has become increasingly common; working remotely over video can solve the problems of centralized office work.
In the course of implementing the prior art, the inventors found that:
When office workers work from home, the backgrounds of their video feeds are not uniform and may disclose personal privacy; a replaceable background has therefore become an urgent requirement for online conferences.
When multiple office workers are in different regions, a real-time conference in a multi-person virtual scene can be realized through virtual background replacement and image synthesis techniques.
Therefore, it is desirable to provide a multi-party image interaction system.
Disclosure of Invention
The embodiments of the application provide a multi-party image interaction system and method, used to solve the technical problem of enabling conference participants to work in a virtual scene.
The application provides a multi-party image interaction system, which comprises:
an acquisition module for acquiring an image stream;
the processing module is used for performing at least one of segmentation, adjustment, replacement or synthesis on the acquired image stream;
and the interaction module is used for transmitting the image processed by the processing module.
Further, the processing module comprises:
a segmentation sub-module for segmenting at least one acquisition object from the acquired image stream;
an adjustment sub-module for adjusting image parameters of the acquisition object based on preset parameters;
a replacement sub-module for replacing the background of the adjusted acquisition object;
and a synthesis sub-module for synthesizing the adjusted, background-replaced acquisition objects to generate a composite image.
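The four sub-modules above can be sketched as a minimal pipeline. In this illustrative sketch, grayscale frames are plain nested lists and the function names (`segment`, `adjust`, `replace_and_synthesize`) are assumptions for exposition, not the patent's actual implementation:

```python
# A minimal sketch of the processing-module pipeline described above
# (segment -> adjust -> replace -> synthesize). The threshold-based
# "segmentation" stands in for the portrait segmentation model.

def segment(frame, threshold=128):
    """Return a binary foreground mask: 1 where a pixel exceeds the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in frame]

def adjust(frame, brightness=0):
    """Adjust a preset image parameter (here: brightness), clamped to 0..255."""
    return [[max(0, min(255, px + brightness)) for px in row] for row in frame]

def replace_and_synthesize(frame, mask, background):
    """Paste masked foreground pixels over the preset background image."""
    return [[frame[y][x] if mask[y][x] else background[y][x]
             for x in range(len(frame[0]))] for y in range(len(frame))]

frame      = [[200, 10], [220, 30]]   # toy 2x2 "captured" frame
background = [[99, 99], [99, 99]]     # toy preset background
mask = segment(frame)
composite = replace_and_synthesize(adjust(frame, 10), mask, background)
print(composite)                       # foreground kept, rest replaced
```

In a real deployment each stage would operate on the full video image stream; the sketch only shows how the sub-modules compose.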
Further, the segmentation sub-module segments the image stream using a portrait segmentation model.
Further, the portrait segmentation model comprises: an extraction layer, a CBR layer, an up-sampling bilinear interpolation layer, and a classification layer.
Further, the preset parameters in the adjusting submodule include parameters of at least one of size, brightness, contrast, saturation, exposure, highlight, shadow, color temperature, hue and sharpening.
Further, the adjusting submodule sets the position of the acquisition object based on preset parameters.
Further, the replacement background in the replacement sub-module is a preset image.
The multi-party image interaction method provided by the application comprises the following steps:
acquiring an image stream;
segmenting at least one acquisition object from the acquired image stream;
sending the acquisition object to a server;
obtaining at least one acquisition object from the server;
and processing the acquisition objects to generate a composite image.
The multi-party image interaction method provided by the application comprises the following steps:
obtaining an acquisition object from a client;
processing the acquisition object to generate a composite image;
sending the composite image to the client;
wherein the client segments at least one acquisition object from its acquired image stream.
The multi-party image interaction method provided by the application comprises the following steps:
adjusting at least one of the image size, brightness, contrast, saturation, exposure, highlight, shadow, color temperature, hue, and sharpening parameters of the acquisition object;
replacing the background image of the acquisition object;
adjusting the relative coordinate position of the acquisition object and the background image until the acquisition object sits at the preset coordinate position;
and generating a composite image based on the position-adjusted acquisition object and the replaced background image.
The embodiment provided by the application has at least the following beneficial effects:
the multi-party image interaction system provides a remote video mode which protects personal privacy and unifies picture formats for a remote online real-time conference through a virtual background replacement technology and an image synthesis technology.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic block diagram of a multi-party image interaction system according to an embodiment of the present disclosure.
Fig. 2 is a client processing flow of the multi-party image interaction method according to the embodiment of the present application.
Fig. 3 is a server processing flow of the multi-party image interaction method according to the embodiment of the present application.
Reference numerals:
1000 multi-party image interaction system
1100 acquisition module
1200 processing module
1300 interaction module
1210 segmentation sub-module
1220 adjustment sub-module
1230 replacement sub-module
1240 synthesis sub-module
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To meet the need for remote online meetings that are not limited by time and place, a multi-party image interaction system 1000 is provided.
The multi-party image interaction system 1000 separates the portraits of conference participants from the video image stream using a lightweight portrait instance segmentation neural network, adjusts the portraits so that their framing is uniform, replaces the background with a courtroom background, and finally displays the composite image stream to the conference participants in real time, completing an online conference video based on a virtual background.
Referring to fig. 1, the present application provides a multi-party image interaction system 1000, including:
an acquisition module 1100 for acquiring an image stream;
a processing module 1200, configured to perform at least one of segmentation, adjustment, replacement, or synthesis on the acquired image stream;
and the interaction module 1300 is configured to transmit the image processed by the processing module.
The processing module 1200 is composed of a segmentation sub-module 1210, an adjustment sub-module 1220, a replacement sub-module 1230, and a synthesis sub-module 1240.
Further, considering that multiple persons may appear in the same acquisition region during a conference, the segmentation sub-module 1210 may segment at least one acquisition object from the acquired image stream.
For example, an acquisition object may be the portrait of a conference participant such as a judge, a member of the collegial panel, a court clerk, a party, or a lawyer.
In a preferred embodiment provided by the present application, to ensure that the client device can run the deep learning model, the segmentation sub-module 1210 is a stable, general-purpose lightweight portrait segmentation model that supports CPU-level deployment.
Further, to ensure low latency for the online conference, the lightweight portrait segmentation model comprises: an extraction layer, a CBR layer, an up-sampling bilinear interpolation layer, and a classification layer.
In a preferred embodiment provided by the present application, the lightweight portrait segmentation model is a 9-layer model comprising:
Three feature extraction (down-sampling) layers: encoder.level1 (feature map is 1/2 of the original), encoder.level2 (1/4 of the original), and encoder.level3 (1/8 of the original).
Up3, an up-sampling bilinear interpolation layer implemented with UpsamplingBilinear2d (scale_factor=2); and Bn3, a batch normalization layer implemented with BatchNorm2d (stability parameter eps=1e-3, momentum=0.1).
A CBR layer: the basic combination of convolution (Conv) + batch normalization (BN) + PReLU activation, implemented with Conv2d, BatchNorm2d, and PReLU respectively.
Up2, an up-sampling bilinear interpolation layer implemented with UpsamplingBilinear2d (scale_factor=2); and Bn2, a batch normalization layer implemented with BatchNorm2d (eps=1e-3, momentum=0.1).
A classification layer (classifier): a Sequential block comprising an UpsamplingBilinear2d up-sampling bilinear interpolation layer and a Conv2d convolutional layer.
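The Up2/Up3 layers perform bilinear interpolation that doubles the feature-map height and width (scale_factor=2). The following pure-Python sketch mimics PyTorch's UpsamplingBilinear2d (which uses align_corners=True); it is illustrative only, not the patent's implementation:

```python
# Bilinear 2x upsampling of a 2-D feature map, align_corners=True style:
# corner values are preserved, interior values are interpolated.

def upsample_bilinear_2x(fm):
    h, w = len(fm), len(fm[0])
    H, W = 2 * h, 2 * w
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            # align_corners=True source coordinates
            sy = y * (h - 1) / (H - 1) if H > 1 else 0.0
            sx = x * (w - 1) / (W - 1) if W > 1 else 0.0
            y0, x0 = int(sy), int(sx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = sy - y0, sx - x0
            top = fm[y0][x0] * (1 - dx) + fm[y0][x1] * dx
            bot = fm[y1][x0] * (1 - dx) + fm[y1][x1] * dx
            out[y][x] = top * (1 - dy) + bot * dy
    return out

fm = [[0.0, 3.0], [6.0, 9.0]]     # toy 2x2 feature map
up = upsample_bilinear_2x(fm)     # becomes 4x4
print(len(up), len(up[0]))        # 4 4
```

This is the step that restores the 1/8-resolution encoder output toward the input resolution before classification.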
It is to be understood that the lightweight portrait segmentation models described herein are all used to segment acquisition objects from an acquired image stream; the network may therefore be designed with different structures, and the number of layers may be modified as appropriate. The described embodiments are only some embodiments of the present application, not all of them; all other embodiments derived by a person skilled in the art without creative effort shall fall within the protection scope of the present application.
Further, in a specific embodiment provided by the present application, the segmented acquisition object is sent to the client through the transmission module.
In a preferred embodiment provided by the present application, in order to make the portraits of the conference participants natural and neat, the adjustment sub-module 1220 may adjust the portrait segmentation model threshold based on preset parameters.
Further, the preset parameters include at least one of size, brightness, contrast, saturation, exposure, highlight, shadow, color temperature, hue and sharpening.
In a preferred embodiment provided by the present application, in order to unify the image backgrounds of conference participants and protect personal privacy, the replacement sub-module 1230 may replace the background of the acquired image based on preset requirements and generate an image to be synthesized.
Further, the background may be set based on preset requirements.
Further, the background picture is placed on the bottom layer of the image to be synthesized based on preset requirements.
For example, the background image here may be a picture of a courtroom background wall.
Further, considering that when there are multiple conference-participant portraits their positions should follow a certain layout, the multi-party image interaction system may adjust the relative coordinate position of each acquisition object and the background image through the adjustment sub-module 1220 until the acquisition object sits at its preset coordinate position.
For example, the preset coordinate position may be the corresponding position of each conference participant, such as a judge, a member of the collegial panel, a court clerk, a party, or a lawyer, in the picture of the courtroom background wall.
Further, a composite image may be generated by the synthesis sub-module 1240 based on the acquisition objects after background replacement.
Further, a composite image may be generated by the synthesis sub-module 1240 based on the position-adjusted acquisition objects.
Further, a composite image may be generated by the synthesis sub-module 1240 based on the position-adjusted acquisition objects and the replaced background image.
Further, in one embodiment provided herein, the composite image is sent to the client via the transmission module.
Furthermore, the composite images can be assembled into a video stream, realizing an online real-time conference.
The following provides an implementation of a client.
Referring to fig. 2, the present application provides a multi-party image interaction method, including:
s110: an image stream is acquired.
The image stream is the real-time image data of the conference participants; the image data is transmitted over the network in segments, as a stream.
Further, in one embodiment provided herein, the image stream may be the real-time image data of a judge, a member of the collegial panel, a court clerk, a party, a lawyer, or another conference participant.
Acquisition of the image stream data can be realized by hardware with a real-time image capture function, such as a terminal camera, a digital camera, or a video camera, recording the conference participants in real time.
Further, in a specific embodiment provided in the present application, the image stream data obtained from this hardware is acquired and stored by the acquisition module 1100 of the client.
S120: at least one acquisition object is segmented from the stream of acquisition images.
The acquisition objects are the real-time portrait faces, portrait outlines, portrait dress, and portrait spatial positions of the conference participants.
Further, in one embodiment provided by the present application, an acquisition object may be the real-time portrait face, outline, dress, and spatial position of a judge, a member of the collegial panel, a court clerk, a party, a lawyer, or another conference participant.
The acquisition object is segmented by the segmentation sub-module 1210 of the client, which separates the portrait face, outline, dress, and spatial position in the acquired image stream from the image background.
Further, in a specific embodiment provided by the present application, the segmentation sub-module 1210 is a lightweight portrait segmentation model that optimizes attribute contribution coefficients with a convolutional neural network algorithm and, based on the deep learning model, can segment the portrait faces, outlines, dress, and spatial positions of the conference participants from the real-time image stream.
Further, the principle application of the artificial intelligence deep learning convolutional neural network algorithm in the embodiment of the present application is as follows:
a certain setting of the contribution coefficients of the attributes of a single task may be considered to form an array. Different sets of attribute contribution coefficients of a single task correspondingly present different arrays. A reasonable or preferred array may make the computation relatively reasonable or relatively accurate in obtaining the workload of a single task.
The convolutional neural network algorithm is a process of continuously training through historical samples to obtain a reasonable or preferred array, namely, a reasonable setting of each attribute contribution coefficient.
The convolutional neural network algorithm may include an input layer, a convolutional layer, a pooling layer, and one or more re-convolutional layers, re-pooling layers, fully-connected layers, and output layers.
The input layer is used for inputting training samples of a single task; the output layer is used for outputting the workload of the single task. The number of feature detectors can be set in the convolutional neural network algorithm. The feature detector can detect the feature of the training sample of the attribute feature value and the attribute contribution coefficient and the feature of the sample to be classified. The convolutional neural network algorithm can combine the identified primary features step by step into high-level features through multiple training. The convolutional layer is used for identifying the fitness of the training sample and the sample to be classified and the feature detector and outputting features or feature combinations. The pooling layer is used for de-detail or background de-noising to enhance the identified features. And the full connection layer and the output layer are used for outputting the calculation result of the workload of the single task. The fully-connected layer may be provided with several layers of neurons, and the first layer of neurons is connected with the identified features or feature combinations, each layer of neurons is connected with neurons between adjacent layers, and the last layer of neurons is connected with the output layer. The weight, or probability of occurrence, of a feature or combination of features is optimized by a back-propagation mechanism. I.e. the attribute contribution coefficients are optimized to obtain a reasonable or relatively accurate workload of the individual tasks.
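The convolution-then-pooling behavior described above can be shown with a toy forward pass. The kernel and input values here are arbitrary illustrative numbers, not the patent's trained parameters:

```python
# A toy forward pass: a 3x3 feature detector slides over the input
# (valid convolution), then 2x2 max pooling removes detail to
# strengthen the detected feature.

def conv2d_valid(img, k):
    kh, kw = len(k), len(k[0])
    return [[sum(img[y + i][x + j] * k[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(len(img[0]) - kw + 1)]
            for y in range(len(img) - kh + 1)]

def max_pool_2x2(fm):
    return [[max(fm[y][x], fm[y][x + 1], fm[y + 1][x], fm[y + 1][x + 1])
             for x in range(0, len(fm[0]) - 1, 2)]
            for y in range(0, len(fm) - 1, 2)]

img = [[1, 0, 0, 1, 0],
       [0, 1, 0, 0, 1],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 1, 0],
       [0, 1, 0, 0, 1]]
edge = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # diagonal-line feature detector
fm = conv2d_valid(img, edge)               # 3x3 feature map of responses
pooled = max_pool_2x2(fm)                  # strongest response survives pooling
print(fm[0][0], pooled)
```

The strong response where the diagonal pattern matches, and its survival through pooling, is exactly the "identify a feature, then de-detail" step the text describes.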
Further, in a specific embodiment provided by the present application, the segmentation sub-module 1210 employs 9 layers of lightweight portrait segmentation models, each layer is introduced as follows:
the system comprises three feature extraction layers, namely down-sampling layers, namely an encoder.level1 (the feature map is 1/2 of the original), an encoder.level2 (the feature map is 1/4 of the original) and an encoder.level2 (the feature map is 1/8).
Up3 is an Up-sampling bilinear interpolation layer, and is realized by adopting UpsamplingBilinear2d, an amplification scale factor scale _ factor is 2, Bn3 is a batch normalization layer and is realized by adopting BatchNorm2d, a stability parameter eps is set to be 1 e-3, and a moment parameter is set to be 0.1.
A CBR layer: the basic convolution CONV + batch normalized BN + activation function PRELU combination, implemented by CONV2d, BatchNorm2d and preelutide, respectively.
Up2 is an Up-sampling bilinear interpolation layer, and is realized by adopting UpsamplingBilinear2d, an amplification scale factor scale _ factor is 2, Bn2 is a batch normalization layer and is realized by adopting BatchNorm2d, a stability parameter eps is set to be 1 e-3, and a moment parameter is set to be 0.1.
Classification layer classifier: consists of a Sequential sequence block, including an upsampling bilinear interpolation layer of upsampling bilinear2d and a Conv2d convolutional layer.
S130: and sending the acquisition object to a server.
Sending the acquisition object to the server can be understood as the client transmitting the acquisition object to the server through the interaction module 1300.
Further, the client can access the server through the access port, so that information interaction is performed.
S140: at least one acquisition object is obtained from a server.
Obtaining at least one acquisition object from the server can be understood as the client obtaining the acquisition objects of other clients from the server through the interaction module 1300.
Further, the client can access the server through the access port, so that information interaction is performed.
It can be understood that the information interaction described in the present application is used to transmit the acquired image stream, the segmented portrait images, the background-replaced images, the composite images, and the video stream; many transmission schemes are therefore possible. The described embodiments are only some embodiments of the present application, not all of them; all other embodiments derived by a person skilled in the art without creative effort shall fall within the protection scope of the present application.
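One possible transmission format for such information interaction is sketched below: the segmented acquisition object (raw pixel bytes plus metadata) is packed into JSON with base64 for exchange over the access port. The field names and the JSON-over-base64 choice are assumptions for illustration, not the patent's protocol:

```python
# Pack/unpack a segmented acquisition object for client/server exchange.

import base64
import json

def pack_acquisition_object(pixels: bytes, width: int, height: int) -> str:
    """Serialize raw pixel bytes plus dimensions into a JSON message."""
    return json.dumps({
        "width": width,
        "height": height,
        "pixels": base64.b64encode(pixels).decode("ascii"),
    })

def unpack_acquisition_object(payload: str):
    """Recover the pixel bytes and dimensions from a JSON message."""
    obj = json.loads(payload)
    return base64.b64decode(obj["pixels"]), obj["width"], obj["height"]

raw = bytes([200, 10, 220, 30])                  # toy 2x2 grayscale portrait
msg = pack_acquisition_object(raw, 2, 2)
pixels, w, h = unpack_acquisition_object(msg)    # round-trips losslessly
print(pixels == raw, w, h)
```

Any of the transmission schemes the text alludes to (sockets, HTTP, a streaming protocol) could carry such a payload.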
S150: adjusting image parameters of the acquisition object.
Further, the image parameters are at least one of size, brightness, contrast, saturation, exposure, highlight, shadow, color temperature, hue, and sharpening.
The image size comprises a length and a width; the size of the image can be adjusted based on its pixels.
Furthermore, the size of the acquisition object's image can be adjusted based on the difference between its image pixels and the preset pixel parameters, ensuring a portrait image that conforms to the layout format.
Image brightness is the lightness of the image. It can likewise be adjusted based on the difference between the acquisition object's image pixels and the preset pixel parameters, ensuring a portrait image that approximates the preset color values.
Image contrast is the light-dark range of the image, image saturation is its color intensity, and image exposure is its light flux; each can be adjusted in the same way, based on the difference between the acquisition object's image pixels and the preset pixel parameters, to approximate the preset color values.
Image highlights are the brighter-toned pixels of the image and image shadows the darker-toned ones; both can be adjusted based on the same pixel-parameter difference to approximate the preset color values.
Image color temperature is the color cast of the image light, image hue is the lightness of the image's primary colors, and image sharpening compensates the image contours; each can be adjusted based on the difference between the acquisition object's image pixels and the preset pixel parameters to approximate the preset color values.
The portrait image parameters of the acquisition object are adjusted with these parameters through the adjustment sub-module 1220 of the client.
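Two of the parameter adjustments above can be sketched concretely. The formulas here (additive brightness, contrast scaled about mid-gray 128) are common image-processing conventions assumed for illustration, not the patent's specific method:

```python
# Brightness and contrast adjustment on a toy grayscale image.

def adjust_brightness(img, delta):
    """Add delta to every pixel, clamped to the 0..255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

def adjust_contrast(img, factor):
    """Scale each pixel's distance from mid-gray (128) by factor."""
    return [[max(0, min(255, round(128 + (p - 128) * factor)))
             for p in row] for row in img]

img = [[100, 150], [120, 180]]
print(adjust_brightness(img, 20))   # every pixel shifted up by 20
print(adjust_contrast(img, 2.0))    # values spread away from 128
```

In the described system, the delta and factor would be derived from the difference between the acquisition object's pixels and the preset pixel parameters.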
S160: the background image of each acquisition object is replaced.
Further, in order to obtain an image to be synthesized with a uniform background and a virtual environment, the replacement background image is an image based on preset requirements.
Further, in a specific embodiment provided by the present application, the preset image is a picture of a courtroom background wall.
Further, the preset image replaces the background of the acquisition object through the replacement sub-module 1230 of the client.
S170: and adjusting the relative coordinate position of the acquired object and the background image.
Further, the coordinate position of the acquisition object can be calculated based on a coordinate system generated from the background image.
To obtain an image to be synthesized that conforms to the layout format, the coordinate position of the acquisition object may be changed based on preset parameters.
Further, the coordinate position of the background image can be calculated based on a coordinate system generated from the acquisition object.
To obtain an image to be synthesized that conforms to the layout format, the coordinate position of the background image may be changed based on preset parameters.
The relative coordinate position of the acquisition object and the background image is adjusted by the adjustment sub-module 1220 of the client.
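The coordinate adjustment of S170 can be sketched as follows. The anchor convention (top-left origin, centering the object on a preset point in the background's coordinate system) is an assumption for illustration:

```python
# Compute the top-left placement offset that puts an acquisition object
# at its preset coordinate position in the background image.

def placement_offset(obj_w, obj_h, preset_x, preset_y):
    """Top-left offset that centers an obj_w x obj_h object on the preset point."""
    return preset_x - obj_w // 2, preset_y - obj_h // 2

# e.g. center a 100x160 portrait on a hypothetical preset seat at (320, 120)
ox, oy = placement_offset(100, 160, 320, 120)
print(ox, oy)  # 270 40
```

Repeating this per participant, with one preset point per seat in the layout, yields the uniform multi-person arrangement the text describes.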
S180: and generating a composite image based on the position-adjusted acquisition object and the replaced background image.
Wherein the composite image is synthesized by the synthesis sub-module 1240 of the client.
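The synthesis step of S180 can be sketched as a masked paste: the position-adjusted acquisition object is copied over the replaced background wherever its segmentation mask marks foreground. Pure-Python nested lists stand in for image buffers; this is illustrative only:

```python
# Composite a masked acquisition object onto a background at offset (ox, oy).

def composite(background, obj, mask, ox, oy):
    out = [row[:] for row in background]          # start from the background
    for y in range(len(obj)):
        for x in range(len(obj[0])):
            if mask[y][x]:                        # copy foreground pixels only
                out[oy + y][ox + x] = obj[y][x]
    return out

bg   = [[9] * 4 for _ in range(3)]                # 4x3 "background wall"
obj  = [[1, 2], [3, 4]]                           # 2x2 portrait crop
mask = [[1, 0], [1, 1]]                           # its segmentation mask
print(composite(bg, obj, mask, 1, 1))
```

Running this once per participant produces the final composite frame of the conference scene.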
In another preferred embodiment provided by the present application, the multi-party image interaction method includes the following steps:
s110: an image stream is acquired.
The image stream is the real-time image data of the conference participants; the image data is transmitted over the network in segments, as a stream.
Further, in one embodiment provided herein, the image stream may be the real-time image data of a judge, a member of the collegial panel, a court clerk, a party, a lawyer, or another conference participant.
Acquisition of the image stream data can be realized by hardware with a real-time image capture function, such as a terminal camera, a digital camera, or a video camera, recording the conference participants in real time.
Further, in a specific embodiment provided in the present application, the image stream data obtained from this hardware is acquired and stored by the acquisition module 1100 of the client.
S120: at least one acquisition object is segmented from the stream of acquisition images.
The acquisition objects are real-time portrait faces, portrait outlines, portrait dresses and portrait spatial positions of conference participants.
Further, in one embodiment provided by the present application, the collection object may be a real-time portrait face, portrait outline, portrait dress, and portrait spatial location of a judge, a co-pending group, a bookkeeper, a party, a lawyer, and the like participating in a conference.
Wherein the segmentation sub-module 1210 of the client segments the acquisition object by separating the portrait face, the portrait outline, the portrait dress, and the portrait spatial position in the acquired image stream from the image background.
Further, in a specific embodiment provided by the present application, the segmentation sub-module 1210 is a lightweight portrait segmentation model that optimizes its attribute contribution coefficients using a convolutional neural network algorithm and, based on the deep learning model, can segment the portrait faces, portrait outlines, portrait dress, and portrait spatial positions of the conference participants from the real-time image stream.
The application of the convolutional neural network algorithm of artificial intelligence deep learning is explained in detail above, and is not described in detail here.
Further, in a specific embodiment provided by the present application, the segmentation sub-module 1210 employs 9 layers of lightweight portrait segmentation models, each layer is introduced as follows:
The model comprises three feature extraction layers, i.e. down-sampling (encoder) layers: encoder.level1 (the feature map is 1/2 of the original), encoder.level2 (1/4 of the original), and encoder.level3 (1/8 of the original).
Up3 is an up-sampling bilinear interpolation layer implemented with UpsamplingBilinear2d, with the amplification scale factor scale_factor set to 2; Bn3 is a batch normalization layer implemented with BatchNorm2d, with the stability parameter eps set to 1e-3 and the momentum parameter set to 0.1.
A CBR layer: the basic combination of convolution (Conv) + batch normalization (BN) + PReLU activation, implemented with Conv2d, BatchNorm2d, and PReLU, respectively.
Up2 is an up-sampling bilinear interpolation layer implemented with UpsamplingBilinear2d, with the amplification scale factor scale_factor set to 2; Bn2 is a batch normalization layer implemented with BatchNorm2d, with the stability parameter eps set to 1e-3 and the momentum parameter set to 0.1.
Classification layer (classifier): consists of a Sequential block comprising an UpsamplingBilinear2d up-sampling bilinear interpolation layer and a Conv2d convolutional layer.
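The layer description above can be sketched in PyTorch. This is a hypothetical reconstruction, not the patented model itself: the layer types and the eps, momentum, and scale_factor values follow the text, while the channel widths, kernel sizes, and strides are assumptions.

```python
import torch
import torch.nn as nn

class LightPortraitSeg(nn.Module):
    """Illustrative sketch of the 9-layer lightweight portrait
    segmentation model (channel widths/kernels are assumptions)."""

    def __init__(self, ch=16, num_classes=2):
        super().__init__()
        # three down-sampling feature extraction layers: 1/2, 1/4, 1/8
        self.level1 = nn.Conv2d(3, ch, 3, stride=2, padding=1)
        self.level2 = nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1)
        self.level3 = nn.Conv2d(2 * ch, 2 * ch, 3, stride=2, padding=1)
        # Up3 + Bn3: bilinear x2 up-sampling, then batch normalization
        self.up3 = nn.UpsamplingBilinear2d(scale_factor=2)
        self.bn3 = nn.BatchNorm2d(2 * ch, eps=1e-3, momentum=0.1)
        # CBR layer: Conv2d + BatchNorm2d + PReLU
        self.cbr = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch, eps=1e-3, momentum=0.1),
            nn.PReLU(),
        )
        # Up2 + Bn2: second bilinear x2 up-sampling and normalization
        self.up2 = nn.UpsamplingBilinear2d(scale_factor=2)
        self.bn2 = nn.BatchNorm2d(ch, eps=1e-3, momentum=0.1)
        # classifier: Sequential(UpsamplingBilinear2d, Conv2d)
        self.classifier = nn.Sequential(
            nn.UpsamplingBilinear2d(scale_factor=2),
            nn.Conv2d(ch, num_classes, 1),
        )

    def forward(self, x):
        x = self.level3(self.level2(self.level1(x)))  # down to 1/8
        x = self.bn3(self.up3(x))                     # back to 1/4
        x = self.cbr(x)
        x = self.bn2(self.up2(x))                     # back to 1/2
        return self.classifier(x)                     # full resolution

mask = LightPortraitSeg()(torch.randn(1, 3, 64, 64))  # (1, 2, 64, 64)
```

The output is a per-pixel score map (portrait vs. background) at the input resolution, from which the binary segmentation mask is derived.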
S130: and sending the acquisition object to a server.
As for the method of sending the collection object to the server, it can be understood that the client transmits the collection object to the server through the interactive module 1300.
Further, the client can access the server through the access port, so that information interaction is performed.
S140: at least one acquisition object is obtained from a server.
For the method of obtaining at least one collection object from the server, it can be understood that the client obtains collection objects of other clients from the server through the interaction module 1300.
Further, the client can access the server through the access port, so that information interaction is performed.
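The information interaction of S130/S140 can be illustrated with a minimal serialization sketch. The JSON-plus-Base64 wire format, the function names, and the `client_id` field are assumptions for illustration only; the patent does not specify the interaction module's protocol:

```python
import base64
import json

def pack_acquisition_object(client_id: str, png_bytes: bytes) -> str:
    """Serialize a segmented portrait (e.g. PNG bytes) with its
    originating client's identifier for transmission to the server."""
    return json.dumps({
        "client_id": client_id,
        "portrait": base64.b64encode(png_bytes).decode("ascii"),
    })

def unpack_acquisition_object(payload: str):
    """Recover the client identifier and portrait bytes on the other side."""
    msg = json.loads(payload)
    return msg["client_id"], base64.b64decode(msg["portrait"])

payload = pack_acquisition_object("judge-01", b"\x89PNG...")
assert unpack_acquisition_object(payload) == ("judge-01", b"\x89PNG...")
```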
It can be understood that the information interaction described in the present application is used to transmit the acquired image stream, the segmented portrait image, the image after background replacement, the composite image, and the video stream; a number of transmission schemes are therefore possible. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
S160: the background image of each acquisition object is replaced.
Further, in order to obtain an image to be synthesized with a uniform background and a virtual environment, the replacement background image is an image based on a preset requirement.
Further, in a specific embodiment provided by the present application, the preset image is a picture of a court background wall.
Further, the replacement sub-module 1230 of the client replaces the background of the acquisition object with the preset image.
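The background replacement of S160 can be sketched as a mask-based composite, assuming the segmentation step yields a binary portrait mask; the `replace_background` function and the stand-in arrays are illustrative, not the replacement sub-module's actual implementation:

```python
import numpy as np

def replace_background(frame, mask, background):
    """Keep portrait pixels where mask == 1; fill the rest with the
    preset background (e.g. a picture of a court background wall)."""
    mask3 = mask[..., None].astype(frame.dtype)  # broadcast over RGB
    return frame * mask3 + background * (1 - mask3)

frame = np.full((4, 4, 3), 200, dtype=np.uint8)  # captured frame
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1                               # portrait region
wall = np.full((4, 4, 3), 50, dtype=np.uint8)    # preset court wall
replaced = replace_background(frame, mask, wall)
```

After this step, every participant's portrait sits in front of the same uniform virtual environment regardless of their real surroundings.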
S170: and adjusting the relative coordinate position of the acquired object and the background image.
Further, the coordinate position of the acquisition object can be calculated in a coordinate system generated from the background image.
To obtain an image to be synthesized that conforms to the layout format, the coordinate position of the acquisition object may be changed based on preset parameters.
Further, the coordinate position of the background image can be calculated in a coordinate system generated from the acquisition object.
To obtain an image to be synthesized that conforms to the layout format, the coordinate position of the background image may be changed based on preset parameters.
Wherein the relative coordinate position of the acquisition object and the background image is adjusted by the adjustment sub-module 1220 of the client.
S180: and generating a composite image based on the position-adjusted acquisition object and the replaced background image.
Wherein the composite image is synthesized by the synthesis sub-module 1240 of the client.
In another preferred embodiment provided by the present application, the multi-party image interaction method includes the following steps:
s110: an image stream is acquired.
The image stream is real-time image data of the conference participants, and the image data is transmitted over the network as a stream of segments.
Further, in one embodiment provided herein, the image stream may be real-time image data of a judge, a co-auditor, a bookmarker, a party, a lawyer, etc. participating in the conference.
The image stream data can be acquired by hardware with a real-time image capturing function, such as a terminal camera, a digital camera, or a video camera, so that conference personnel can be recorded in real time.
Further, in a specific embodiment provided in the present application, the image stream data obtained based on the hardware is collected and stored by the collection module 1100 of the client.
S120: at least one acquisition object is segmented from the stream of acquisition images.
The acquisition objects are real-time portrait faces, portrait outlines, portrait dresses and portrait spatial positions of conference participants.
Further, in one embodiment provided by the present application, the collection object may be a real-time portrait face, portrait outline, portrait dress, and portrait spatial location of a judge, a co-pending group, a bookkeeper, a party, a lawyer, and the like participating in a conference.
Wherein the segmentation sub-module 1210 of the client segments the acquisition object by separating the portrait face, the portrait outline, the portrait dress, and the portrait spatial position in the acquired image stream from the image background.
Further, in a specific embodiment provided by the present application, the segmentation sub-module 1210 is a lightweight portrait segmentation model that optimizes its attribute contribution coefficients using a convolutional neural network algorithm and, based on the deep learning model, can segment the portrait faces, portrait outlines, portrait dress, and portrait spatial positions of the conference participants from the real-time image stream.
The application of the convolutional neural network algorithm of artificial intelligence deep learning is explained in detail above, and is not described in detail here.
Further, in a specific embodiment provided by the present application, the segmentation sub-module 1210 employs 9 layers of lightweight portrait segmentation models, each layer is introduced as follows:
The model comprises three feature extraction layers, i.e. down-sampling (encoder) layers: encoder.level1 (the feature map is 1/2 of the original), encoder.level2 (1/4 of the original), and encoder.level3 (1/8 of the original).
Up3 is an up-sampling bilinear interpolation layer implemented with UpsamplingBilinear2d, with the amplification scale factor scale_factor set to 2; Bn3 is a batch normalization layer implemented with BatchNorm2d, with the stability parameter eps set to 1e-3 and the momentum parameter set to 0.1.
A CBR layer: the basic combination of convolution (Conv) + batch normalization (BN) + PReLU activation, implemented with Conv2d, BatchNorm2d, and PReLU, respectively.
Up2 is an up-sampling bilinear interpolation layer implemented with UpsamplingBilinear2d, with the amplification scale factor scale_factor set to 2; Bn2 is a batch normalization layer implemented with BatchNorm2d, with the stability parameter eps set to 1e-3 and the momentum parameter set to 0.1.
Classification layer (classifier): consists of a Sequential block comprising an UpsamplingBilinear2d up-sampling bilinear interpolation layer and a Conv2d convolutional layer.
S130: and sending the acquisition object to a server.
As for the method of sending the collection object to the server, it can be understood that the client transmits the collection object to the server through the interactive module 1300.
Further, the client can access the server through the access port, so that information interaction is performed.
S140: at least one acquisition object is obtained from a server.
For the method of obtaining at least one collection object from the server, it can be understood that the client obtains collection objects of other clients from the server through the interaction module 1300.
Further, the client can access the server through the access port, so that information interaction is performed.
It can be understood that the information interaction described in the present application is used to transmit the acquired image stream, the segmented portrait image, the image after background replacement, the composite image, and the video stream; a number of transmission schemes are therefore possible. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
S150: adjusting image parameters of the acquisition object.
Further, the image parameter is at least one of size, brightness, contrast, saturation, exposure, highlight, shadow, color temperature, hue and sharpening.
The image size includes length and width; the size of an image can be adjusted by adjusting its pixels.
Further, the image size of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the layout format is obtained.
The image brightness is the lightness of the image. Further, the image brightness of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image contrast is the light-dark difference of the image. Further, the image contrast of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image saturation is the color intensity of the image. Further, the image saturation of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image exposure is the amount of light received by the image. Further, the image exposure of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image highlights are the brighter-toned pixels of the image. Further, the image highlights of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image shadows are the darker-toned areas of the image. Further, the image shadows of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image color temperature is the color tendency of the image light. Further, the image color temperature of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image hue is the overall tendency of the image colors. Further, the image hue of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image sharpening is a compensation of the image contours. Further, the image sharpening of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
Wherein the image parameters of the acquisition object are adjusted according to the adjustment parameters by the adjustment sub-module 1220 of the client.
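The parameter adjustment of S150 can be illustrated for the brightness case under an assumed rule: shift the portrait by the difference between its mean pixel value and a preset target, so that all participants' portraits converge to approximately the same color values. The `match_brightness` function and the preset value 128 are illustrative assumptions:

```python
import numpy as np

def match_brightness(portrait, preset_mean=128.0):
    """Shift brightness by the difference between the portrait's mean
    pixel value and the preset pixel value, then clip to valid range."""
    diff = preset_mean - portrait.mean()   # parameter difference
    out = portrait.astype(np.float32) + diff
    return np.clip(out, 0, 255).astype(np.uint8)

dark = np.full((4, 4, 3), 60, dtype=np.uint8)  # under-lit portrait
adjusted = match_brightness(dark)              # mean raised to 128
```

Analogous difference-based rules can drive the contrast, saturation, exposure, and other adjustments listed above.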
S160: the background image of each acquisition object is replaced.
Further, in order to obtain an image to be synthesized with a uniform background and a virtual environment, the replacement background image is an image based on a preset requirement.
Further, in a specific embodiment provided by the present application, the preset image is a picture of a court background wall.
Further, the replacement sub-module 1230 of the client replaces the background of the acquisition object with the preset image.
S180: and generating a composite image based on the acquired object after replacing the background.
Wherein the composite image is synthesized by the synthesis sub-module 1240 of the client.
In another preferred embodiment provided by the present application, the multi-party image interaction method includes the following steps:
s110: an image stream is acquired.
The image stream is real-time image data of the conference participants, and the image data is transmitted over the network as a stream of segments.
Further, in one embodiment provided herein, the image stream may be real-time image data of a judge, a co-auditor, a bookmarker, a party, a lawyer, etc. participating in the conference.
The image stream data can be acquired by hardware with a real-time image capturing function, such as a terminal camera, a digital camera, or a video camera, so that conference personnel can be recorded in real time.
Further, in a specific embodiment provided in the present application, the image stream data obtained based on the hardware is collected and stored by the collection module 1100 of the client.
S120: at least one acquisition object is segmented from the stream of acquisition images.
The acquisition objects are real-time portrait faces, portrait outlines, portrait dresses and portrait spatial positions of conference participants.
Further, in one embodiment provided by the present application, the collection object may be a real-time portrait face, portrait outline, portrait dress, and portrait spatial location of a judge, a co-pending group, a bookkeeper, a party, a lawyer, and the like participating in a conference.
Wherein the segmentation sub-module 1210 of the client segments the acquisition object by separating the portrait face, the portrait outline, the portrait dress, and the portrait spatial position in the acquired image stream from the image background.
Further, in a specific embodiment provided by the present application, the segmentation sub-module 1210 is a lightweight portrait segmentation model that optimizes its attribute contribution coefficients using a convolutional neural network algorithm and, based on the deep learning model, can segment the portrait faces, portrait outlines, portrait dress, and portrait spatial positions of the conference participants from the real-time image stream.
The application of the convolutional neural network algorithm of artificial intelligence deep learning is explained in detail above, and is not described in detail here.
Further, in a specific embodiment provided by the present application, the segmentation sub-module 1210 employs 9 layers of lightweight portrait segmentation models, each layer is introduced as follows:
The model comprises three feature extraction layers, i.e. down-sampling (encoder) layers: encoder.level1 (the feature map is 1/2 of the original), encoder.level2 (1/4 of the original), and encoder.level3 (1/8 of the original).
Up3 is an up-sampling bilinear interpolation layer implemented with UpsamplingBilinear2d, with the amplification scale factor scale_factor set to 2; Bn3 is a batch normalization layer implemented with BatchNorm2d, with the stability parameter eps set to 1e-3 and the momentum parameter set to 0.1.
A CBR layer: the basic combination of convolution (Conv) + batch normalization (BN) + PReLU activation, implemented with Conv2d, BatchNorm2d, and PReLU, respectively.
Up2 is an up-sampling bilinear interpolation layer implemented with UpsamplingBilinear2d, with the amplification scale factor scale_factor set to 2; Bn2 is a batch normalization layer implemented with BatchNorm2d, with the stability parameter eps set to 1e-3 and the momentum parameter set to 0.1.
Classification layer (classifier): consists of a Sequential block comprising an UpsamplingBilinear2d up-sampling bilinear interpolation layer and a Conv2d convolutional layer.
S130: and sending the acquisition object to a server.
As for the method of sending the collection object to the server, it can be understood that the client transmits the collection object to the server through the interactive module 1300.
Further, the client can access the server through the access port, so that information interaction is performed.
S140: at least one acquisition object is obtained from a server.
For the method of obtaining at least one collection object from the server, it can be understood that the client obtains collection objects of other clients from the server through the interaction module 1300.
Further, the client can access the server through the access port, so that information interaction is performed.
It can be understood that the information interaction described in the present application is used to transmit the acquired image stream, the segmented portrait image, the image after background replacement, the composite image, and the video stream; a number of transmission schemes are therefore possible. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
S160: the background image of each acquisition object is replaced.
Further, in order to obtain an image to be synthesized with a uniform background and a virtual environment, the replacement background image is an image based on a preset requirement.
Further, in a specific embodiment provided by the present application, the preset image is a picture of a court background wall.
Further, the replacement sub-module 1230 of the client replaces the background of the acquisition object with the preset image.
S180: and generating a composite image based on the acquired object after replacing the background.
Wherein the composite image is synthesized by the synthesis sub-module 1240 of the client.
The following provides an implementation of a server.
Referring to fig. 3, the present application provides a multi-party image interaction method, including:
s210: and acquiring the acquisition object from the client.
For the method of acquiring the acquisition object from the client, it can be understood that the server accesses the client through the access port of the interaction module 1300, thereby performing information interaction.
Further, the interactive information is an acquisition object, and the acquisition object is a portrait image obtained by acquiring and segmenting an image stream by a client.
S220: adjusting image parameters of the acquisition object.
Further, the image parameter is at least one of size, brightness, contrast, saturation, exposure, highlight, shadow, color temperature, hue and sharpening.
The image size includes length and width; the size of an image can be adjusted by adjusting its pixels.
Further, the image size of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the layout format is obtained.
The image brightness is the lightness of the image. Further, the image brightness of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image contrast is the light-dark difference of the image. Further, the image contrast of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image saturation is the color intensity of the image. Further, the image saturation of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image exposure is the amount of light received by the image. Further, the image exposure of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image highlights are the brighter-toned pixels of the image. Further, the image highlights of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image shadows are the darker-toned areas of the image. Further, the image shadows of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image color temperature is the color tendency of the image light. Further, the image color temperature of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image hue is the overall tendency of the image colors. Further, the image hue of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
The image sharpening is a compensation of the image contours. Further, the image sharpening of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, to ensure that a portrait image conforming to the approximate color values is obtained.
Wherein the image parameters of the acquisition object are adjusted according to the adjustment parameters by the adjustment sub-module 1220 in the server.
S230: the background image of each acquisition object is replaced.
Further, in order to obtain an image to be synthesized with a uniform background and a virtual environment, the replacement background image is an image based on a preset requirement.
Further, in a specific embodiment provided by the present application, the preset image is a picture of a court background wall.
Further, the replacement sub-module 1230 of the server replaces the background of the acquisition object with the preset image.
S240: and adjusting the relative coordinate position of the acquired object and the background image.
Further, the coordinate position of the acquisition object can be calculated in a coordinate system generated from the background image.
To obtain an image to be synthesized that conforms to the layout format, the coordinate position of the acquisition object may be changed based on preset parameters.
Further, the coordinate position of the background image can be calculated in a coordinate system generated from the acquisition object.
To obtain an image to be synthesized that conforms to the layout format, the coordinate position of the background image may be changed based on preset parameters.
Wherein the relative coordinate position of the acquisition object and the background image is adjusted by the adjustment sub-module 1220 of the server.
S250: and generating a composite image based on the position-adjusted acquisition object and the replaced background image.
Wherein the composite image is synthesized by the synthesis sub-module 1240 of the server.
S260: and sending the processed composite image to a client.
As for the method of transmitting the processed composite image to the client, it can be understood that the server transmits the composite image to the client through the interactive module 1300.
Further, the server can access the client through the access port, so as to perform information interaction.
It can be understood that the information interaction described in the present application is used to transmit the acquired image stream, the segmented portrait image, the image after background replacement, the composite image, and the video stream; a number of transmission schemes are therefore possible. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
In another preferred embodiment provided by the present application, the multi-party image interaction method includes the following steps:
s210: and acquiring the acquisition object from the client.
For the method of acquiring the acquisition object from the client, it can be understood that the server accesses the client through the access port of the interaction module 1300, thereby performing information interaction.
Further, the interactive information is an acquisition object, and the acquisition object is a portrait image obtained by acquiring and segmenting an image stream by a client.
S220: adjusting image parameters of the acquisition object.
Further, the image parameter is at least one of size, brightness, contrast, saturation, exposure, highlight, shadow, color temperature, hue and sharpening.
The image size includes a length and a width, and the size of the image can be adjusted based on the pixels of the adjusted image.
Furthermore, the size of the image of the acquisition object can be adjusted based on the parameter difference between the image pixel of the acquisition object and the preset pixel, so as to ensure that the portrait image conforming to the layout format is obtained.
Image brightness is the overall lightness of the image. Further, the brightness of the image of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, so as to obtain a portrait image that approximates the preset color values.
Image contrast is the difference between the light and dark areas of the image. Further, the contrast of the image of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, so as to obtain a portrait image that approximates the preset color values.
Image saturation is the color intensity of the image. Further, the saturation of the image of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, so as to obtain a portrait image that approximates the preset color values.
Image exposure is the amount of light received by the image. Further, the exposure of the image of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, so as to obtain a portrait image that approximates the preset color values.
Image highlights are the brighter-toned pixels of the image. Further, the highlights of the image of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, so as to obtain a portrait image that approximates the preset color values.
Image shadows are the darker-toned pixels of the image. Further, the shadows of the image of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, so as to obtain a portrait image that approximates the preset color values.
Image color temperature is the warm or cold tendency of the image light. Further, the color temperature of the image of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, so as to obtain a portrait image that approximates the preset color values.
Image hue is the tendency of the image toward its primary colors. Further, the hue of the image of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, so as to obtain a portrait image that approximates the preset color values.
Image sharpening is the compensation of the image contours (edges). Further, the sharpening of the image of the acquisition object can be adjusted based on the parameter difference between the image pixels of the acquisition object and the preset pixels, so as to obtain a portrait image that approximates the preset color values.
The above adjustment parameters are used by the adjustment submodule 1220 in the server to adjust the image parameters of the acquisition object.
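The parameter-difference idea above can be illustrated with a short sketch. This is a minimal illustration, not the patent's implementation: it assumes the preset pixel parameters are given as a target mean (brightness) and a target spread (contrast), and the function and parameter names are hypothetical.

```python
import numpy as np

def adjust_to_preset(image, preset_mean, preset_std, eps=1e-6):
    """Shift the brightness (pixel mean) and contrast (pixel spread) of an
    image toward preset pixel statistics, based on the difference between
    the image's own statistics and the preset values."""
    img = image.astype(np.float64)
    mean, std = img.mean(), img.std()
    # Contrast: rescale the deviation from the mean to the preset spread.
    img = (img - mean) * (preset_std / max(std, eps))
    # Brightness: shift the pixel values onto the preset mean.
    img = img + preset_mean
    return np.clip(img, 0, 255).astype(np.uint8)
```

For example, `adjust_to_preset(frame, 128.0, 20.0)` would pull a too-dark or too-flat portrait toward a mid-gray average with moderate contrast; per-channel or other parameters (saturation, hue, etc.) would each need their own analogous mapping.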
S230: replacing the background image of each acquisition object.
Further, in order to obtain an image to be synthesized with a uniform background and a virtual environment, the replacement background image is an image based on preset requirements.
Further, in a specific embodiment provided by the present application, the preset image is a picture of a courtroom background wall.
Further, the preset image replaces the background of the acquisition object through the replacement submodule 1230 of the server.
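The background-replacement step amounts to a mask composite. The sketch below is a minimal illustration, assuming the segmentation step has already produced a binary portrait mask (1 inside the acquisition object, 0 in the background); the function name is illustrative, not from the patent.

```python
import numpy as np

def replace_background(portrait, mask, preset_bg):
    """Composite the segmented portrait over a preset background image.

    portrait:  (H, W, 3) uint8 image containing the acquisition object
    mask:      (H, W) binary mask, 1 where the portrait is
    preset_bg: (H, W, 3) uint8 replacement background (e.g. a courtroom wall)
    """
    m = mask.astype(bool)[..., None]  # add channel axis to broadcast over RGB
    return np.where(m, portrait, preset_bg)
```

A production pipeline would typically use a soft (alpha) mask from the segmentation model rather than a hard binary mask, to avoid jagged edges around the portrait.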
S250: generating a composite image based on the position-adjusted acquisition object and the replaced background image.
Wherein the composite image is synthesized by the synthesis submodule 1240 of the server.
S260: sending the processed composite image to the client.
As for the method of sending the processed composite image to the client, it can be understood that the server sends the composite image to the client through the interaction module 1300.
Further, the server can access the client through the access port, so as to perform information interaction.
It can be understood that the information interaction described in the present application is used to transmit the acquired image stream, the segmented portrait image, the image after background replacement, the composite image, and the video stream; there are therefore a number of possible transmission schemes. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
In another preferred embodiment provided by the present application, the multi-party image interaction method includes the following steps:
S210: acquiring the acquisition object from the client.
For the method of acquiring the acquisition object from the client, it can be understood that the server accesses the client through the access port of the interaction module 1300, thereby performing information interaction.
Further, the interaction information is the acquisition object, and the acquisition object is a portrait image obtained by the client through acquiring and segmenting an image stream.
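The patent does not fix how the segmented acquisition object travels between client and server. As one hedged illustration, the portrait could be serialized with its dimensions in a small header before transmission; the wire format and names below are purely hypothetical.

```python
import struct
import numpy as np

def pack_portrait(portrait):
    """Serialize an RGBA portrait (H, W, 4) into bytes: a 4-byte header
    with height and width, followed by the raw pixel buffer. The alpha
    channel is assumed to carry the segmentation mask."""
    h, w, c = portrait.shape
    assert c == 4, "expects RGBA with the segmentation mask in alpha"
    return struct.pack(">HH", h, w) + portrait.tobytes()

def unpack_portrait(payload):
    """Inverse of pack_portrait: recover the (H, W, 4) uint8 array."""
    h, w = struct.unpack(">HH", payload[:4])
    pixels = np.frombuffer(payload[4:], dtype=np.uint8)
    return pixels.reshape(h, w, 4)
```

In practice a video-capable codec (e.g. a compressed stream with an alpha plane) would replace this raw format, but the round-trip shape is the same: the client packs the segmented portrait, the server unpacks it for compositing.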
S230: replacing the background image of each acquisition object.
Further, in order to obtain an image to be synthesized with a uniform background and a virtual environment, the replacement background image is an image based on preset requirements.
Further, in a specific embodiment provided by the present application, the preset image is a picture of a courtroom background wall.
Further, the preset image replaces the background of the acquisition object through the replacement submodule 1230 of the server.
S240: adjusting the relative coordinate position of the acquisition object and the background image.
Further, the coordinate position of the acquisition object can be calculated based on a coordinate system generated from the background image.
To obtain an image to be synthesized that conforms to the layout format, the coordinate position of the acquisition object may be changed based on preset parameters.
Further, the coordinate position of the background image can be calculated based on a coordinate system generated from the acquisition object.
To obtain an image to be synthesized that conforms to the layout format, the coordinate position of the background image may be changed based on preset parameters.
Wherein, the relative coordinate position of the acquisition object and the background image is adjusted by the adjustment submodule 1220 of the server.
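The coordinate adjustment and the subsequent compositing can be sketched together: the acquisition object is pasted at a preset coordinate position expressed in the background image's coordinate system. This is a minimal sketch with illustrative names, not the patent's implementation.

```python
import numpy as np

def place_object(background, obj, mask, top_left):
    """Paste a segmented acquisition object into the background at a preset
    coordinate position (y, x), given in the background's coordinate system.

    background: (H, W, 3) uint8 replaced background image
    obj:        (h, w, 3) uint8 segmented portrait
    mask:       (h, w) binary mask, 1 where the portrait is
    top_left:   (y, x) preset placement coordinates
    """
    out = background.copy()
    y, x = top_left
    h, w = obj.shape[:2]
    region = out[y:y + h, x:x + w]
    m = mask.astype(bool)[..., None]  # broadcast mask over RGB channels
    out[y:y + h, x:x + w] = np.where(m, obj, region)
    return out
```

Calling this once per participant, each with its own preset `top_left`, yields a multi-party layout (e.g. several portraits arranged in front of one courtroom wall) ready for the synthesis step.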
S250: generating a composite image based on the position-adjusted acquisition object and the replaced background image.
Wherein the composite image is synthesized by the synthesis submodule 1240 of the server.
S260: sending the processed composite image to the client.
As for the method of sending the processed composite image to the client, it can be understood that the server sends the composite image to the client through the interaction module 1300.
Further, the server can access the client through the access port, so as to perform information interaction.
In another preferred embodiment provided by the present application, the multi-party image interaction method includes the following steps:
S210: acquiring the acquisition object from the client.
For the method of acquiring the acquisition object from the client, it can be understood that the server accesses the client through the access port of the interaction module 1300, thereby performing information interaction.
Further, the interaction information is the acquisition object, and the acquisition object is a portrait image obtained by the client through acquiring and segmenting an image stream.
S230: replacing the background image of each acquisition object.
Further, in order to obtain an image to be synthesized with a uniform background and a virtual environment, the replacement background image is an image based on preset requirements.
Further, in a specific embodiment provided by the present application, the preset image is a picture of a courtroom background wall.
Further, the preset image replaces the background of the acquisition object through the replacement submodule 1230 of the server.
S250: generating a composite image based on the position-adjusted acquisition object and the replaced background image.
Wherein the composite image is synthesized by the synthesis submodule 1240 of the server.
S260: sending the processed composite image to the client.
As for the method of sending the processed composite image to the client, it can be understood that the server sends the composite image to the client through the interaction module 1300.
Further, the server can access the client through the access port, so as to perform information interaction.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (10)
1. A multi-party image interaction system, comprising:
an acquisition module for acquiring an image stream;
a processing module for performing at least one of segmentation, adjustment, replacement, or synthesis on the acquired image stream;
and an interaction module for transmitting the image processed by the processing module.
2. The multi-party image interaction system of claim 1, wherein the processing module comprises:
a segmentation sub-module for segmenting at least one acquisition object from the acquired image stream;
an adjustment sub-module for adjusting image parameters of the acquisition object based on preset parameters;
a replacement sub-module for replacing the background of the adjusted acquisition object;
and a synthesis sub-module for synthesizing the acquisition object after adjustment and background replacement to generate a composite image.
3. The multi-party image interaction system of claim 2, wherein the segmentation sub-module segments the image stream using a portrait segmentation model.
4. The multi-party image interaction system of claim 3, wherein the portrait segmentation model comprises: the system comprises an extraction layer, a CBR layer, an up-sampling bilinear interpolation layer and a classification layer.
5. The multi-party image interaction system according to claim 2, wherein the preset parameters in the adjustment sub-module include at least one of size, brightness, contrast, saturation, exposure, highlight, shadow, color temperature, hue, and sharpening parameters.
6. The multi-party image interaction system of claim 2, wherein the adjustment sub-module sets the coordinate position of the acquisition object based on preset parameters.
7. The multi-party image interaction system of claim 2, wherein the replacement background in the replacement sub-module is a preset image.
8. A multi-party image interaction method is characterized by comprising the following steps:
collecting an image stream;
segmenting at least one acquisition object from the acquired image stream;
sending the acquisition object to a server;
acquiring at least one acquisition object from the server;
and processing the acquisition object to generate a composite image.
9. A multi-party image interaction method is characterized by comprising the following steps:
acquiring an acquisition object from a client;
processing the acquisition object to generate a composite image;
sending the processed composite image to a client;
wherein the client segments at least one acquisition object from the acquired image stream.
10. A multi-party image interaction method is characterized by comprising the following steps:
adjusting at least one of the image size, brightness, contrast, saturation, exposure, highlight, shadow, color temperature, hue, and sharpening parameters of the acquisition object;
replacing the background image of the acquisition object;
adjusting the relative coordinate position of the acquisition object and the background image until the coordinates of the acquisition object are the preset coordinate position;
and generating a composite image based on the position-adjusted acquisition object and the replaced background image.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202011537099.0A (published as CN112712525A) | 2020-12-23 | 2020-12-23 | Multi-party image interaction system and method |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN112712525A | 2021-04-27 |
Family ID: 75543634
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202011537099.0A (pending, published as CN112712525A) | Multi-party image interaction system and method | 2020-12-23 | 2020-12-23 |
Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN112712525A |
Citations (2)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
CN110047053A (en) * | 2019-04-26 | 2019-07-23 | 腾讯科技(深圳)有限公司 | Portrait Picture Generation Method, device and computer equipment |
CN111292337A (en) * | 2020-01-21 | 2020-06-16 | 广州虎牙科技有限公司 | Image background replacing method, device, equipment and storage medium |
Non-Patent Citations (1)

| Title |
| --- |
| 邹斌 (Zou Bin) et al., "Obstacle detection for driverless vehicles based on a semantic segmentation method" (无人驾驶车辆基于语义分割方法障碍物检测), Journal of Guangxi University (Natural Science Edition) (《广西大学学报(自然科学版)》) |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
CN113065534A (en) * | 2021-06-02 | 2021-07-02 | 全时云商务服务股份有限公司 | Method, system and storage medium based on portrait segmentation precision improvement |
CN113065534B (en) * | 2021-06-02 | 2021-09-03 | 全时云商务服务股份有限公司 | Method, system and storage medium based on portrait segmentation precision improvement |
CN115119054A (en) * | 2022-06-27 | 2022-09-27 | 平安银行股份有限公司 | Video virtual dressing and background processing method and device based on IOS (input/output system) |
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210427 |