CN114842384B - 6G-oriented haptic modal signal reconstruction method

6G-oriented haptic modal signal reconstruction method

Info

Publication number
CN114842384B
Authority
CN
China
Prior art keywords
signal
haptic
video
module
modal
Legal status
Active
Application number
CN202210476817.0A
Other languages
Chinese (zh)
Other versions
CN114842384A (en)
Inventor
周亮
李昂
李沛林
陈顺
曹宇
楼婧蕾
倪守祥
陈亚男
陈建新
魏昕
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202210476817.0A priority Critical patent/CN114842384B/en
Publication of CN114842384A publication Critical patent/CN114842384A/en
Application granted granted Critical
Publication of CN114842384B publication Critical patent/CN114842384B/en

Classifications

    • G06V20/40: Scenes; scene-specific elements in video content (G06V: image or video recognition or understanding)
    • G06N3/045: Combinations of networks (G06N3: computing arrangements based on biological models; neural networks)
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • Y02T10/40: Engine management systems (Y02T: climate change mitigation technologies related to transportation)


Abstract

The invention discloses a 6G-oriented haptic modal signal reconstruction method, which comprises: acquiring data samples and constructing a data set containing video and haptic modal signals; utilizing the semantic correlation between the two modal signals to construct, based on deep learning, a cross-modal signal reconstruction model driven by internal semantic correlation; and training the cross-modal signal reconstruction model with the data set until the quality of the reconstructed signal meets the requirement or the deviation can no longer be reduced. In the invention, a multimodal dataset, VisTouch, containing video and haptic signals is constructed for 6G cross-modal application scenarios; video modal signals with semantic correlation are reconstructed into haptic modal signals based on deep learning; and, to improve the signal reconstruction quality, two loss functions, an adversarial loss and a mean square error loss, are used as objective functions, training is carried out on VisTouch, and the accuracy of the reconstruction method is verified.

Description

6G-oriented haptic modal signal reconstruction method
Technical Field
The invention relates to the technical field of cross-modal communication, and in particular to a 6G-oriented haptic modal signal reconstruction method.
Background
In the 6G era, conventional multimedia applications centered on audio and video can no longer satisfy users' demand for immersive experience, so new sensory interactions, such as touch, must be introduced into novel multimedia applications to bring users a truly immersive experience. However, introducing new modal signals poses a great challenge to existing multimedia systems: under the requirement of cooperative transmission of multi-dimensional sensory information, the required maximum network throughput is expected to multiply. Therefore, to balance user experience and communication quality, a cross-modal signal reconstruction scheme is urgently needed to reduce the amount of transmitted data and thereby support 6G immersive multimedia applications.
Research has shown that when multimodal applications combine haptic signals with traditional audio and video signals, users obtain a more immersive experience through touch and interactive behavior. For multimodal applications in the 6G era, an audio-visual-haptic cross-modal communication architecture has been proposed that fully mines the correlations between different modal signals to address three key scientific problems: efficient haptic signal coding, heterogeneous code-stream transmission, and modal information reconstruction. A cross-modal communication architecture empowered by artificial intelligence has further been proposed, which uses techniques such as reinforcement learning and transfer learning to tackle the technical challenges of cross-modal communication. Since signal transmission and reception are accompanied by losses of varying degrees, discovering the inherent correlations among voice, video and haptic signals and using them to reconstruct one modal signal accurately and in real time is one of the focal points of 6G cross-modal communication research, and is also regarded as a key technology for greatly improving the user's immersive experience. In potential 6G immersive application scenarios (such as immersive cloud XR, holographic communication and sensory interconnection), cross-modal reconstruction can recover the haptic signal of an object from its existing video and audio signals, and the newly generated haptic signal can in turn reconstruct the original audio and video signals at super resolution, largely satisfying the communication needs of people, objects and environments, while the millisecond-level latency of 6G provides users with a better connection experience.
For deep learning models that realize cross-modal reconstruction, performance depends on the quality and scale of the data set: in principle, the larger the data volume and the higher the annotation quality, the more closely a deep model can approach or even surpass human performance; for example, image models such as AlexNet, VGG and ResNet trained on the large-scale ImageNet data set approach human recognition accuracy. At present, audio-visual data sets are plentiful, so existing work has mainly focused on exploring the semantic relations between audio and video with deep models. To meet the 6G immersive-experience requirement, a large-scale, high-quality audio-visual-haptic data set is urgently needed to help deep learning complete tasks such as cross-modal coding, transmission and signal processing. In addition, most research has concentrated on restoration and reconstruction between audio and video, while research on reconstructing haptic signals from audio and video is still in its infancy. Meanwhile, because haptic signals collected by different sensors differ in structure and content, how to characterize haptic signals of different forms semantically and how to design a universal, robust cross-modal signal reconstruction framework have become difficulties in realizing 6G cross-modal applications.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract and in the title, and these may not be used to limit the scope of the invention.
The present invention has been made in view of the problems described above.
It is therefore an object of the invention to provide a 6G-oriented haptic modal signal reconstruction method that solves the problem that existing video modal signals cannot be converted into haptic modal signals.
To solve the above technical problems, the invention provides the following technical scheme: a 6G-oriented haptic modal signal reconstruction method, comprising:
S1: acquiring data samples, and constructing a data set containing video and haptic modal signals;
S2: utilizing the semantic correlation between the two modal signals, constructing, based on deep learning, a cross-modal signal reconstruction model driven by internal semantic correlation;
S3: training the cross-modal signal reconstruction model with the data set until the quality of the reconstructed signal meets the requirement or the deviation can no longer be reduced.
As a preferred scheme of the 6G-oriented haptic modal signal reconstruction method of the invention: the data sample collection comprises selecting samples to be collected and classifying them; selecting acquisition devices and synchronizing them; and setting an acquisition mode so that the video signals and haptic signals of different samples in different states are acquired by the acquisition devices.
As a preferred scheme of the 6G-oriented haptic modal signal reconstruction method of the invention: the cross-modal signal reconstruction model comprises a feature extraction module, a signal reconstruction module, a signal discrimination module and a loss optimization module, wherein the feature extraction module processes the video frames of the video signal and extracts video semantic features; the video semantic features are input into the signal reconstruction module, and the reconstructed haptic signal is obtained after reconstruction processing; the real haptic signal and the reconstructed haptic signal are input into the signal discrimination module to discriminate real from fake; and the mean square error loss and the generative adversarial loss of the reconstructed haptic signal against the real haptic signal are calculated, the loss values being used to update the module parameters by gradient descent through a back-propagation algorithm, so as to optimize the model to generate reconstructed signals of higher accuracy.
As a preferred scheme of the 6G-oriented haptic modal signal reconstruction method of the invention: in the feature extraction module, in the 3D-CNN-based semantic feature extraction for the video signal, each video frame is first scaled and cropped; the video frames are then input into a 3D ResNet50, and the video semantic features are output after multi-layer 3D convolution.
As a preferred scheme of the 6G-oriented haptic modal signal reconstruction method of the invention: after acquisition, the real haptic signal must be preprocessed, which comprises, for a haptic signal in time-series form, obtaining its spectrum with the STFT and separating the real and imaginary parts of the complex numbers in the complex matrix to obtain the real haptic spectrum S.
As a preferred scheme of the 6G-oriented haptic modal signal reconstruction method of the invention: the signal reconstruction module reconstructs the spectrum of the haptic signal from the output video semantic features through the processing of deconvolution layers, batch normalization layers and activation functions, and obtains the reconstructed haptic signal in the time domain through the inverse Fourier transform.
As a preferred scheme of the 6G-oriented haptic modal signal reconstruction method of the invention: in the reconstruction processing, the input video semantic features are processed sequentially by three deconvolution groups, each comprising a deconvolution layer, a batch normalization layer and a ReLU activation function, and then by a convolution group comprising a deconvolution layer, a batch normalization layer and a Tanh activation function. The deconvolution layer is expressed as k = (k_h, k_w), p = (p_h, p_w), s, where k = (k_h, k_w) denotes the convolution kernel size, p = (p_h, p_w) denotes the zero padding, and s denotes the sliding stride of the convolution kernel; the ReLU activation function is y = max(0, x), and the Tanh activation function is y = (e^x - e^(-x)) / (e^x + e^(-x)), where x denotes the output of the batch normalization layer in the deconvolution group.
As a preferred scheme of the 6G-oriented haptic modal signal reconstruction method of the invention: the signal discrimination module comprises two convolution groups, a fully connected layer and a Sigmoid activation function, where each convolution group comprises a 3×3 convolution layer, a batch normalization layer, a ReLU activation function and a max-pooling layer.
The Sigmoid activation function is y = 1 / (1 + e^(-x)); taking the output of the fully connected layer as the input x, it outputs the probability that a signal is a real signal.
As a preferred scheme of the 6G-oriented haptic modal signal reconstruction method of the invention: the loss optimization module optimizes the parameters of the feature extraction module, the signal reconstruction module and the signal discrimination module using a combination of a generative adversarial loss function and a mean square error loss function, wherein
the generative adversarial loss function is:
L_adv = E_{S~P_data(S)}[log D(S)] + E_{F_R~P_data(F_R)}[log(1 - D(G(F_R)))]
where E(·) is the expectation function, G(·) and D(·) denote the haptic signal generation network and the haptic signal discrimination network respectively, and P_data(·) denotes the data distribution.
The mean square error loss function is expressed as:
L_MSE = (1/n) Σ_{i=1}^{n} (s_i - ŝ_i)^2
where s_i and ŝ_i denote the elements at the i-th position of the real haptic spectrum S and the reconstructed haptic spectrum Ŝ respectively, and n denotes the number of elements in the spectrum.
As a preferred scheme of the 6G-oriented haptic modal signal reconstruction method of the invention: training uses stochastic gradient descent for 70 epochs, with an initial learning rate of 0.001 continuously adjusted by a cosine annealing scheduler, and a batch size of 6.
The invention has the following beneficial effects:
A multimodal dataset, VisTouch, containing video and haptic signals is constructed for 6G cross-modal application scenarios; video modal signals with semantic correlation are reconstructed into haptic modal signals based on deep learning; and, to improve the signal reconstruction quality, two loss functions, an adversarial loss and a mean square error loss, are used as objective functions, training is carried out on VisTouch, and the accuracy of the reconstruction method is verified.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art could obtain other drawings from them without inventive effort. In the drawings:
Fig. 1 is the VisTouch data acquisition diagram of the 6G-oriented haptic modal signal reconstruction method of the invention.
Fig. 2 is a diagram of the video-assisted haptic signal reconstruction model of the 6G-oriented haptic modal signal reconstruction method of the invention.
Detailed Description
So that the above objects, features and advantages of the present invention can be more readily understood, specific embodiments of the invention are described in detail below with reference to the appended drawings.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; however, the invention may also be practiced in ways other than those described here, and persons skilled in the art will appreciate that the invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Further, where embodiments of the invention are described in detail, any cross-sectional views of device structures are schematic only, may be partially enlarged and are not drawn to scale, and should not limit the scope of protection of the invention; in actual fabrication, the three-dimensional dimensions of length, width and depth should be included.
Example 1
Referring to figs. 1-2, a first embodiment of the present invention provides a 6G-oriented haptic modal signal reconstruction method, the reconstruction method comprising:
S1: acquiring data samples, and constructing the VisTouch dataset containing video and haptic modal signals.
Specifically, the data sample collection includes the following steps:
S11: selecting samples to be collected, and classifying them.
Materials that are common in daily life and of high practical value are selected and classified, 47 material categories in total, to serve as the sample categories of the constructed VisTouch dataset, as shown in Table 1. During sample collection it can be observed that samples of the same material may differ in color due to dyeing, processing and so on; for example, glass divides into ordinary glass and quartz glass, and into colored glass and transparent glass, which poses a certain challenge to cross-modal information processing. For this reason, multiple colors are collected for each sample type where possible: for synthetic textiles, samples in four colors such as red, yellow, blue and white; for glass, colored, transparent and frosted samples; this reduces the influence of color on the experimental results.
Table 1: Sample categories contained in the VisTouch dataset
S12: selecting acquisition devices, and synchronizing them.
To collect the video and haptic signals simultaneously, a suitable camera and haptic sensor must be selected. Table 2 gives the specific parameters (sampling rate, resolution, etc.) of the acquisition devices used for the VisTouch dataset.
Table 2: Acquisition device information
S13: setting the acquisition mode, and acquiring the video signals and haptic signals of different samples in different states with the acquisition devices.
For haptic data acquisition, a manipulator is controlled to slide over and touch the various materials; the sliding friction force generated between the fingertip and the material during the sliding touch is recorded as the haptic signal, the video signal is captured with a high-definition camera, and the two signals are synchronized by timestamp.
In addition, to ensure accurate, low-noise acquisition of the haptic signal, two measures are taken: (1) the sample is placed on a desktop, and the sensor mounted at the end of the manipulator is given a constant driving force directed downward, perpendicular to the desktop; (2) the collected materials are sheet-shaped, which keeps the driving force normal to the contact surface, thereby reducing the influence of the material's shape on the acquired signal.
Three sliding-touch trajectories are used (linear, curved and zigzag) together with three constant normal driving forces (3 N, 6 N and 9 N); crossing the forces with the trajectories yields 9 sliding modes in total (e.g., zigzag sliding touch under a 3 N driving force).
S2: utilizing the semantic correlation between the two modal signals, a cross-modal signal reconstruction model driven by internal semantic correlation is constructed based on deep learning.
Furthermore, the cross-modal signal reconstruction model comprises a feature extraction module, a signal reconstruction module, a signal discrimination module and a loss optimization module.
S21: the feature extraction module extracts video semantic features after processing video frames of the video signal.
Specifically, in extracting semantic features from the video signal with a 3D CNN (three-dimensional convolutional neural network), each video frame is first scaled and cropped; the video frames are then input into a 3D ResNet50 (three-dimensional residual network), and the video semantic features F_R are output after multi-layer 3D convolution. Thanks to its residual design, the 3D ResNet50 converges quickly during learning, avoids the vanishing-gradient problem, and balances model size against accuracy.
Let the input video signal be a 5-dimensional tensor I ∈ R^{N×T×C×H×W}, where N is the batch size, T is the number of video frames, C is the number of image channels (C = 3 for RGB images), and H and W are the height and width of the image. Each video frame image is scaled and cropped so that the image size is unified to 224×224, i.e., H = W = 224. Next, I is input into the 3D ResNet50 and, after multi-layer 3D convolution, a feature map F ∈ R^{N'×T'×C'×H'×W'} is output; for the 3D ResNet50, T' = 2, C' = 2048, and H' = W' = 7. To ease the processing of the subsequent haptic signal reconstruction module, the reconstruction method reshapes F into a four-dimensional tensor F_R ∈ R^{N'×T'C'×H'×W'}, where T'C' = 2×2048 = 4096; F_R represents the video semantic features.
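To make these shape conventions concrete, the following PyTorch sketch traces a video batch through a stand-in backbone. The `backbone` here is a hypothetical placeholder (the patent's 3D ResNet50 layer list is not reproduced in this text); only the input and output shapes match the description above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the 3D ResNet50 backbone: it only needs to map
# (N, C, T, H, W) video clips to a (N, C'=2048, T'=2, H'=7, W'=7) feature map.
backbone = nn.Sequential(
    nn.Conv3d(3, 2048, kernel_size=3, stride=(8, 32, 32), padding=1),  # toy reduction
    nn.AdaptiveAvgPool3d((2, 7, 7)),                                   # force T'=2, H'=W'=7
)

N, T, C, H, W = 6, 16, 3, 224, 224          # batch size 6, as in the training setup
I = torch.randn(N, T, C, H, W)              # input clip, I ∈ R^{N×T×C×H×W}
x = I.permute(0, 2, 1, 3, 4)                # PyTorch Conv3d expects (N, C, T, H, W)
F = backbone(x)                             # F: (6, 2048, 2, 7, 7)
F_R = F.permute(0, 2, 1, 3, 4).reshape(N, 2 * 2048, 7, 7)  # fold T' into channels
print(F_R.shape)                            # torch.Size([6, 4096, 7, 7])
```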
For the haptic signals, the real haptic signal must be preprocessed after acquisition: for a haptic signal in time-series form, its spectrum is obtained with the STFT (short-time Fourier transform). In the STFT, the sampling frequency is set to 1000 Hz and the window width to 50, yielding a complex matrix of size 26×41; separating the real and imaginary parts of the complex numbers gives a real haptic spectrum S of size 2×26×41.
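A minimal preprocessing sketch that reproduces these sizes, assuming a one-second (1000-sample) signal and the SciPy default of 50% window overlap (the patent does not state the overlap):

```python
import numpy as np
from scipy.signal import stft

fs = 1000                                # sampling frequency (Hz), as in the patent
x = np.random.randn(fs)                  # placeholder 1-second haptic time series
f, t, Zxx = stft(x, fs=fs, nperseg=50)   # window width 50 -> 26 frequency bins
print(Zxx.shape)                         # (26, 41): 41 frames for a 1000-sample signal
S = np.stack([Zxx.real, Zxx.imag])       # separate real and imaginary parts
print(S.shape)                           # (2, 26, 41): the real haptic spectrum S
```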
S22: inputting the video semantic features into a signal reconstruction module, and obtaining a reconstructed touch signal after reconstruction processing.
Specifically, this embodiment uses combinations of deconvolution, batch normalization and activation functions to realize the cross-modal signal mapping from small feature maps to large ones and from high-level semantic features to the target domain. From the output video semantic features, the spectrum of the haptic signal is reconstructed through the processing of deconvolution layers, batch normalization layers and activation functions, and the haptic modal signal in the time domain is obtained through the inverse Fourier transform.
The reconstruction module has five layers of sub-modules: the first layer is the input layer; the second to fourth layers are combinations of a deconvolution layer, a batch normalization layer and an activation function, used to reconstruct the height and width of the spectrogram; and the fifth layer is a convolution group used to reconstruct the channel dimension of the spectrogram.
In the reconstruction process, the input video semantic features are processed sequentially by the three deconvolution groups (the second to fourth layers), each comprising a deconvolution layer, a batch normalization layer and a ReLU activation function, as shown in Table 3, and then by the convolution group (the fifth layer).
The deconvolution layer is expressed as k = (k_h, k_w), p = (p_h, p_w), s. The activation functions enhance the nonlinear representation capability of the module: the ReLU function is placed at the end of each of the three deconvolution groups, with x denoting the output of the batch normalization layer in the group, and the Tanh function is placed at the end of the whole module to generate a reconstructed haptic spectrum consistent with the distribution range of the real spectrum.
Table 3: Structure of the signal reconstruction module
where k = (k_h, k_w) denotes the convolution kernel size, p = (p_h, p_w) denotes the zero padding, and s denotes the sliding stride of the convolution kernel; the ReLU activation function is y = max(0, x) and the Tanh activation function is y = (e^x - e^(-x)) / (e^x + e^(-x)), where x denotes the output of the batch normalization layer in the deconvolution group.
The signal reconstruction module in this embodiment specifically includes the following parameters:
Table 4: Network parameters of the haptic signal generation network (batch dimension N omitted)
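Because Tables 3 and 4 are not reproduced in this text, the exact kernel sizes, paddings and strides are unknown. The PyTorch sketch below therefore uses illustrative, hypothetical parameters, chosen only so that the 4096×7×7 feature map F_R is upsampled to the 2×26×41 spectrum described above; asymmetric (k_h, k_w) kernels are what let the square 7×7 map grow into the rectangular 26×41 spectrum.

```python
import torch
import torch.nn as nn

def deconv_group(c_in, c_out, k, s, p):
    """Deconvolution group: deconvolution + batch normalization + ReLU."""
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=k, stride=s, padding=p),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# Hypothetical layer parameters: the spatial path runs 7x7 -> 7x11 -> 13x21 -> 26x41.
reconstructor = nn.Sequential(
    deconv_group(4096, 512, k=(3, 5), s=(1, 1), p=(1, 0)),      # -> (512, 7, 11)
    deconv_group(512, 128, k=(3, 3), s=(2, 2), p=(1, 1)),       # -> (128, 13, 21)
    deconv_group(128, 32, k=(4, 3), s=(2, 2), p=(1, 1)),        # -> (32, 26, 41)
    # Fifth layer: convolution group reconstructing the channel dimension,
    # ending in Tanh to match the real spectrum's value range.
    nn.ConvTranspose2d(32, 2, kernel_size=3, stride=1, padding=1),  # -> (2, 26, 41)
    nn.BatchNorm2d(2),
    nn.Tanh(),
)

F_R = torch.randn(6, 4096, 7, 7)       # video semantic features from step S21
S_hat = reconstructor(F_R)             # reconstructed haptic spectrum
print(S_hat.shape)                     # torch.Size([6, 2, 26, 41])
```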
S23: inputting the real haptic signal and the reconstructed haptic signal into the signal discrimination module to discriminate real from fake.
Specifically, the signal discrimination module has two convolution groups, a fully connected layer and a Sigmoid activation function, where each convolution group comprises a 3×3 convolution layer, a batch normalization layer, a ReLU activation function and a max-pooling layer.
The Sigmoid activation function is y = 1 / (1 + e^(-x)); taking the output of the fully connected layer as the function input x, it outputs the probability that a signal is a real signal.
Further, the real haptic spectrum S and the reconstructed haptic spectrum Ŝ generated by the signal reconstruction module are taken as the inputs of the signal discrimination module; after processing by the two convolution groups, the discrimination vectors v and v̂ corresponding to S and Ŝ are obtained. Then v and v̂ are each input into the fully connected layer and the Sigmoid function, which output the probabilities that S and Ŝ are real signals. During network training, S should be judged real as far as possible, i.e., with probability as close to 1 as possible, while Ŝ should be judged fake as far as possible, i.e., with probability as close to 0 as possible, realizing binary real/fake discrimination.
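A sketch of such a discriminator follows; the channel widths 16 and 32 and the flattening before the fully connected layer are assumptions, since the patent fixes only the structure (two convolution groups, a fully connected layer and a Sigmoid output):

```python
import torch
import torch.nn as nn

def conv_group(c_in, c_out):
    """Convolution group: 3x3 convolution + batch norm + ReLU + 2x2 max pooling."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(conv_group(2, 16), conv_group(16, 32))
        self.fc = nn.Linear(32 * 6 * 10, 1)   # (2,26,41) -> (16,13,20) -> (32,6,10)
        self.sigmoid = nn.Sigmoid()

    def forward(self, spec):
        v = self.features(spec).flatten(1)    # discrimination vector
        return self.sigmoid(self.fc(v))       # probability the spectrum is real

D = Discriminator()
print(D(torch.randn(6, 2, 26, 41)).shape)     # torch.Size([6, 1])
```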
S24: calculating the mean square error loss and the generative adversarial loss of the reconstructed haptic signal against the real haptic signal, and using the loss values to update the module parameters by gradient descent through a back-propagation algorithm, so as to optimize the model to generate reconstructed signals of higher accuracy.
Specifically, the loss optimization module optimizes the parameters of the feature extraction module, the signal reconstruction module and the signal discrimination module using a combination of a generative adversarial loss function and a mean square error loss function, wherein
the generative adversarial loss function is:
L_adv = E_{S~P_data(S)}[log D(S)] + E_{F_R~P_data(F_R)}[log(1 - D(G(F_R)))]
where E(·) is the expectation function, G(·) and D(·) denote the haptic signal generation network and the haptic signal discrimination network respectively, and P_data(·) denotes the data distribution.
The mean square error loss function is expressed as:
L_MSE = (1/n) Σ_{i=1}^{n} (s_i - ŝ_i)^2
where s_i and ŝ_i denote the elements at the i-th position of the real haptic spectrum S and the reconstructed haptic spectrum Ŝ respectively, and n denotes the number of elements in the spectrum.
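A compact sketch of how the two objectives could be combined in PyTorch. The non-saturating generator form and the weight `lam` between the adversarial and MSE terms are assumptions; the patent states only that the two losses are used together:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()   # implements the log terms of the adversarial loss
mse = nn.MSELoss()   # mean square error over all spectrum elements

def discriminator_loss(D, S_real, S_fake):
    # D maximizes log D(S) + log(1 - D(G(F_R))): label real as 1, fake as 0.
    real = D(S_real)
    fake = D(S_fake.detach())                 # do not backprop into the generator
    return bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))

def generator_loss(D, S_real, S_fake, lam=1.0):
    # G tries to fool D while also matching the real spectrum element-wise;
    # lam is a hypothetical weighting between the two terms.
    fake = D(S_fake)
    return bce(fake, torch.ones_like(fake)) + lam * mse(S_fake, S_real)
```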
The evaluation module judges whether the reconstructed signal is consistent with the real signal. Meanwhile, during training, the deviation between the reconstructed signal and the real signal is back-propagated as gradients to adjust the training parameters of the feature extraction module and the reconstruction module, until the quality of the reconstructed signal meets the requirement or the deviation can no longer be reduced. The whole reconstruction model thus mines the inherent semantic correlations among the multimodal signals and finally generates accurate, low-noise reconstructed signals.
S3: training the cross-modal signal reconstruction model with the data set until the quality of the reconstructed signal meets the requirement or the deviation can no longer be reduced.
Specifically, training uses stochastic gradient descent for 70 epochs, with an initial learning rate of 0.001 continuously adjusted by a cosine annealing scheduler, and a batch size of 6. Further, the 3D CNN input size is 224×224, and the whole model is implemented with the PyTorch deep learning framework. For hardware, a single RTX 2080 Ti graphics card is used for model training until the two loss functions converge simultaneously.
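The optimizer setup might look as follows; stepping the scheduler once per epoch and setting T_max to the number of epochs are assumptions, since the patent specifies only SGD, 70 epochs, an initial learning rate of 0.001 and cosine annealing:

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

# model = the composed feature extractor + reconstructor (generator side);
# a separate optimizer for the discriminator would be set up the same way.
model = torch.nn.Linear(8, 8)          # placeholder module for illustration
EPOCHS = 70
optimizer = SGD(model.parameters(), lr=0.001)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # ... iterate over VisTouch mini-batches of size 6, compute the two losses,
    # call loss.backward() and optimizer.step() here ...
    scheduler.step()                   # cosine-anneal the learning rate
```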
Example 2
To verify and explain the technical effect of the reconstruction method: since this method is the first to perform haptic reconstruction on the VisTouch dataset, there is no published reference model, so this embodiment ablates the proposed video-assisted haptic reconstruction model to obtain the following two models as comparison baselines:
Model 1: the model structure is unchanged, and the model is trained with the generative adversarial loss function only;
Model 2: the haptic signal discrimination network is removed, and the model is trained with the mean square error loss function only.
After determining the comparison baselines, evaluation indices are needed to test the output results; this embodiment uses two evaluation indices, mean absolute error (MAE) and accuracy (ACC).
MAE: since haptic signals are represented as time series, this index starts from the signal itself. Let the real haptic time signal be T and the reconstructed haptic time signal be T̂; with sample capacity M, the MAE is calculated as:
MAE = (1/M) Σ_{i=1}^{M} |T_i - T̂_i|
MAE is used to evaluate the absolute deviation of the reconstructed signal from the true signal.
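As a sketch (assuming both signals are given as equal-length tensors):

```python
import torch

def mae(T_real: torch.Tensor, T_hat: torch.Tensor) -> torch.Tensor:
    """Mean absolute error between real and reconstructed haptic time signals."""
    return (T_real - T_hat).abs().mean()
```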
ACC: a sample-category classifier is first pre-trained on the real signals; after training, the reconstructed signals are input and it is checked whether the classifier's category decision on a reconstructed signal agrees with the real sample category, from which the accuracy ACC is computed. In this embodiment the classifier is implemented as a multi-layer perceptron.
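A sketch of the ACC computation; the hidden width of 256 and the flattened-spectrum input are assumptions (the patent states only that the classifier is a multi-layer perceptron, and the dataset has 47 categories):

```python
import torch
import torch.nn as nn

# Hypothetical MLP classifier over flattened haptic spectra (2*26*41 inputs),
# assumed already pre-trained on real signals as described above.
classifier = nn.Sequential(
    nn.Linear(2 * 26 * 41, 256), nn.ReLU(),
    nn.Linear(256, 47),                  # 47 material categories in VisTouch
)

def acc(recon_specs: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of reconstructed spectra the classifier assigns to the true category."""
    preds = classifier(recon_specs.flatten(1)).argmax(dim=1)
    return (preds == labels).float().mean().item()
```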
The statistical results of the model comparison experiment are shown in Table 5. It can be seen that, owing to its structure and loss-function design, the reconstruction accuracy of the proposed reconstruction model is clearly higher than that of models 1 and 2.
Table 5: Model comparison experiment results
It should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently substituted without departing from its spirit and scope, and such modifications are intended to be covered by the scope of the claims of the invention.

Claims (6)

1. A 6G-oriented haptic modal signal reconstruction method, characterized by comprising:
Acquiring data samples, and constructing a data set containing video and haptic modal signals;
Utilizing the semantic correlation between the two modal signals, constructing, based on deep learning, a cross-modal signal reconstruction model driven by internal semantic correlation;
The cross-modal signal reconstruction model comprises a feature extraction module, a signal reconstruction module, a signal discrimination module and a loss optimization module, wherein,
The feature extraction module processes the video frames of the video signal and extracts video semantic features;
The video semantic features are input into the signal reconstruction module, and the reconstructed haptic signal is obtained after reconstruction processing;
The real haptic signal and the reconstructed haptic signal are input into the signal discrimination module to discriminate real from fake;
The mean square error loss and the generative adversarial loss of the reconstructed haptic signal against the real haptic signal are calculated, and the loss values are used to update the module parameters by gradient descent through a back-propagation algorithm, so as to optimize the model to generate reconstructed signals of higher accuracy;
The processing in the signal reconstruction module comprises:
Reconstructing the spectrum of the haptic signal from the output video semantic features through the processing of a deconvolution layer, a batch normalization layer and an activation function, and obtaining the reconstructed haptic signal in the time domain through the inverse Fourier transform;
In the reconstruction process,
The input video semantic features are processed sequentially by three deconvolution groups, each deconvolution group comprising a deconvolution layer, a batch normalization layer and a ReLU activation function;
The output is then produced after processing by a convolution group, the convolution group comprising a deconvolution layer, a batch normalization layer and a Tanh activation function;
The deconvolution layer is expressed as: k = (k_h, k_w), p = (p_h, p_w), s;
where k = (k_h, k_w) denotes the convolution kernel size, p = (p_h, p_w) denotes the zero padding, s denotes the sliding stride of the convolution kernel, the ReLU activation function is y = max(0, x), the Tanh activation function is y = (e^x - e^(-x)) / (e^x + e^(-x)), and x is the output of the batch normalization layer in the deconvolution group;
The signal discrimination module comprises two convolution groups, a fully connected layer and a Sigmoid activation function, wherein each convolution group comprises a 3×3 convolution layer, a batch normalization layer, a ReLU activation function and a max-pooling layer;
The Sigmoid activation function is y = 1 / (1 + e^(-x)); taking the output of the fully connected layer as the function input x, the probability that the signal is a real signal is output;
Training the cross-modal signal reconstruction model with the data set until the quality of the reconstructed signal meets the requirement or the deviation can no longer be reduced.
2. The 6G-oriented haptic modal signal reconstruction method of claim 1, wherein the acquiring of data samples comprises:
Selecting samples to be collected, and classifying them;
Selecting acquisition devices, and synchronizing them;
Setting an acquisition mode, and acquiring the video signals and haptic signals of different samples in different states with the acquisition devices.
3. The 6G-oriented haptic modal signal reconstruction method of claim 2, wherein, in the feature extraction module:
In the 3D-CNN-based semantic feature extraction for the video signal, each video frame is first scaled and cropped; the video frames are then input into a 3D ResNet50, and the video semantic features are output after multi-layer 3D convolution.
4. The 6G-oriented haptic modal signal reconstruction method of claim 3, wherein the real haptic signal is preprocessed after acquisition, comprising:
For the haptic signal in time-series form, obtaining its spectrum using the STFT, and separating the real and imaginary parts of the complex numbers in the complex matrix to obtain the real haptic spectrum S.
5. The 6G-oriented haptic modal signal reconstruction method of claim 4, wherein the loss optimization module optimizes the parameters of the feature extraction module, the signal reconstruction module and the signal discrimination module using a combination of a generative adversarial loss function and a mean square error loss function, wherein
The generative adversarial loss function is:
L_adv = E_{S~P_data(S)}[log D(S)] + E_{F_R~P_data(F_R)}[log(1 - D(G(F_R)))]
where E(·) is the expectation function, G(·) and D(·) denote the haptic signal generation network and the haptic signal discrimination network respectively, and P_data(·) denotes the data distribution;
The mean square error loss function is expressed as:
L_MSE = (1/n) Σ_{i=1}^{n} (s_i - ŝ_i)^2
where s_i and ŝ_i denote the elements at the i-th position of the real haptic spectrum S and the reconstructed haptic spectrum Ŝ respectively, and n denotes the number of elements in the spectrum.
6. The 6G-oriented haptic modal signal reconstruction method of claim 5, wherein the training uses stochastic gradient descent for 70 epochs, with an initial learning rate of 0.001 continuously adjusted by a cosine annealing scheduler, and a batch size of 6.
CN202210476817.0A 2022-04-30 2022-04-30 6G-oriented haptic modal signal reconstruction method Active CN114842384B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210476817.0A (CN114842384B, en) | 2022-04-30 | 2022-04-30 | 6G-oriented haptic modal signal reconstruction method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210476817.0A (CN114842384B, en) | 2022-04-30 | 2022-04-30 | 6G-oriented haptic modal signal reconstruction method

Publications (2)

Publication Number | Publication Date
CN114842384A (en) | 2022-08-02
CN114842384B (en) | 2024-05-31

Family

ID=82568112

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210476817.0A (Active, CN114842384B, en) | 6G-oriented haptic modal signal reconstruction method | 2022-04-30 | 2022-04-30

Country Status (1)

Country Link
CN (1) CN114842384B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905838A (en) * 2022-11-18 2023-04-04 南京邮电大学 Audio-visual auxiliary fine-grained tactile signal reconstruction method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627482A (en) * 2021-07-09 2021-11-09 南京邮电大学 Cross-mode image generation method and device based on audio-tactile signal fusion
CN113628294A (en) * 2021-07-09 2021-11-09 南京邮电大学 Image reconstruction method and device for cross-modal communication system
CN113642604A (en) * 2021-07-09 2021-11-12 南京邮电大学 Audio and video auxiliary tactile signal reconstruction method based on cloud edge cooperation
WO2022047625A1 (en) * 2020-09-01 2022-03-10 深圳先进技术研究院 Image processing method and system, and computer storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461235B (en) * 2020-03-31 2021-07-16 合肥工业大学 Audio and video data processing method and system, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN114842384A (en) 2022-08-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant