CN116778058B - Intelligent interaction system of intelligent exhibition hall - Google Patents

Intelligent interaction system of intelligent exhibition hall

Info

Publication number
CN116778058B
Authority
CN
China
Prior art keywords: module, observer, unit, interaction, intelligent
Legal status: Active
Application number: CN202311052551.8A
Other languages: Chinese (zh)
Other versions: CN116778058A (en)
Inventors: 王卫文 (Wang Weiwen), 张勇 (Zhang Yong), 钟玉 (Zhong Yu), 钟林 (Zhong Lin), 陈军 (Chen Jun)
Current Assignee: Shenzhen Kesai Logo Intelligent Technology Co., Ltd.; Shenzhen University
Original Assignee: Shenzhen Kesai Logo Intelligent Technology Co., Ltd.; Shenzhen University
Application filed by Shenzhen Kesai Logo Intelligent Technology Co., Ltd. and Shenzhen University
Priority to CN202311052551.8A
Publication of CN116778058A
Application granted
Publication of CN116778058B

Landscapes: Processing Or Creating Images (AREA)

Abstract

The invention relates to the technical field of projection, and in particular to an intelligent interaction system for an intelligent exhibition hall. To address the poor viewing experience caused by a single interaction mode, the adopted scheme comprises a voice interaction module, a somatosensory interaction module and a 3D display module: the voice interaction module enables an observer to interact with an exhibit through their own voice, the somatosensory interaction module enables the observer to interact with the exhibit through limb movements, and the 3D display module provides the observer with a three-dimensional image of the exhibit. This scheme combines interaction modes such as sound, image and motion, making it easy to capture the observer's viewing behavior and interaction requests, giving the observer a personalized viewing experience and a stronger sense of participation, providing three-dimensional images with a higher reconstruction speed, and offering a more intelligent, convenient and humanized viewing experience.

Description

Intelligent interaction system of intelligent exhibition hall
Technical Field
The invention relates to the technical field of projection, in particular to an intelligent interaction system of an intelligent exhibition hall.
Background
At present, many intelligent exhibition hall interaction systems support only a single interaction mode, such as a touch screen or voice control, and most interaction technologies have limitations; for example, visual recognition is sensitive to lighting and viewing angle, and voice recognition places high demands on the acoustic environment. Supporting only a single interaction mode therefore not only makes it difficult for observers to choose an interaction mode suited to their own needs, but also makes it difficult to capture their interaction requests quickly and accurately, resulting in a poor observer experience.
The existing interactive systems therefore still need improvement to solve the problem that a single interaction mode leads to a poor viewing experience.
Disclosure of Invention
The main object of the present invention is to provide an intelligent interaction system for an intelligent exhibition hall that solves the problem of the poor viewing experience caused by a single interaction mode.
To achieve the above object, the present invention provides an intelligent interaction system for an intelligent exhibition hall, comprising:
the voice interaction module is used for enabling observers to interact with the exhibits through their own voice;
the somatosensory interaction module is used for enabling observers to interact with the exhibits through limb movements;
the 3D display module is used for providing a three-dimensional image of the exhibit to the observer, and comprises a parallax-based stereoscopic three-dimensional reconstruction module;
wherein the parallax-based stereoscopic three-dimensional reconstruction module comprises:
a first acquisition unit, used for acquiring three-dimensional point cloud data of the exhibit based on a LiDAR depth sensor;
a first generation unit, used for generating a reference image and a source image from the three-dimensional point cloud data;
a first determination unit, used for determining key frames and geometric metadata of the reference image and the source image based on a multi-view depth estimator;
an integration unit, used for integrating the key frames and the geometric metadata into a matching cost volume based on a multi-layer perceptron;
an output unit, used for outputting a first feature based on a first network and a second feature based on a second network; the first network is a two-dimensional encoder-decoder convolutional network, the second network is a neural network, and the input of both networks is the matching cost volume;
a fusion unit, used for fusing the first feature and the second feature to obtain a third feature;
and an integration unit, used for performing depth-plane integration according to the third feature to obtain a three-dimensionally reconstructed exhibit image.
The intelligent interaction system provided by the invention comprises a voice interaction module, a somatosensory interaction module and a 3D display module. Specifically:
(1) Through the voice interaction module, observers can interact with the exhibits through their own voice;
(2) Through the somatosensory interaction module, observers can interact with the exhibits through limb movements. The somatosensory interaction module can be designed for desktop, floor or wall interaction: desktop interaction fuses the virtual and the real, projecting the exhibits or an interface system onto the desktop and driving various content displays through touch and gestures; wall interaction can be divided into single-point, multi-point and induction wall interaction, among others; and floor interaction can bring an immersive interaction experience;
(3) Through the 3D display module, the observer can see a three-dimensional image of an exhibit. The 3D display module comprises a parallax-based stereoscopic three-dimensional reconstruction module, which in turn comprises a first acquisition unit, a first generation unit, a first determination unit, an integration unit, an output unit, a fusion unit and an integration unit. More particularly: (3.1) the first acquisition unit acquires three-dimensional point cloud data of the exhibit using radar technology, in preparation for generating a reference image and a source image; (3.2) the first generation unit generates a reference image and a source image from the three-dimensional point cloud data, for the subsequent determination of their key frames and geometric metadata; (3.3) the first determination unit determines key frames and geometric metadata of the reference image and the source image based on a multi-view depth estimator, in preparation for integrating them into a matching cost volume (the cost volume measures the similarity of the left and right views in binocular matching and constitutes the left-right disparity search space in the stereo matching problem); (3.4) the integration unit integrates the key frames and the geometric metadata into the matching cost volume based on a multi-layer perceptron, so that features can later be output from the cost volume; (3.5) the output unit outputs the first feature based on the first network and the second feature based on the second network, for subsequent feature fusion; (3.6) the fusion unit fuses the first feature and the second feature to obtain a third feature for the subsequent depth-plane integration, feature fusion allowing more representative features to be learned; and (3.7) the integration unit performs depth-plane integration according to the third feature to obtain a three-dimensionally reconstructed exhibit image. Compared with the traditional three-dimensional reconstruction method based on binocular or multi-view vision (which finds matching points in rectified images and then restores the three-dimensional image of the exhibit by geometric principles), the parallax-based stereoscopic three-dimensional reconstruction module has clear advantages in depth estimation and a very high reconstruction speed.
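For illustration only, the following minimal PyTorch sketch mirrors the data flow just described: a matching cost volume is fed to a 2D encoder-decoder network and to a second network (here assumed to be a per-pixel multi-layer perceptron along the depth axis), the two features are fused, and depth-plane integration is performed as a soft-argmin over depth hypotheses. All class names, layer sizes, the fusion-by-addition choice and the depth range are assumptions of this sketch, not the patent's disclosed implementation.

```python
# Illustrative sketch only: cost volume -> two networks -> fusion -> depth-plane
# integration. Shapes and hyperparameters are assumptions, not the patent's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

D, H, W = 32, 64, 80  # assumed: number of depth planes, image size

class TwoDEncoderDecoder(nn.Module):
    """First network: 2D encoder-decoder CNN over the cost volume."""
    def __init__(self, d=D):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(d, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(64, d, 4, stride=2, padding=1), nn.ReLU())
    def forward(self, cost):             # cost: (B, D, H, W)
        return self.dec(self.enc(cost))  # first feature: (B, D, H, W)

class MLPBranch(nn.Module):
    """Second network (assumed): a per-pixel MLP along the depth axis."""
    def __init__(self, d=D):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, d))
    def forward(self, cost):                   # (B, D, H, W) -> (B, D, H, W)
        x = cost.permute(0, 2, 3, 1)           # move depth to the last dim
        return self.mlp(x).permute(0, 3, 1, 2)

def depth_plane_integration(prob_volume, depth_values):
    """Soft-argmin over depth planes: expected depth per pixel."""
    p = F.softmax(prob_volume, dim=1)                       # (B, D, H, W)
    return (p * depth_values.view(1, -1, 1, 1)).sum(dim=1)  # (B, H, W)

cost_volume = torch.rand(1, D, H, W)        # stand-in matching cost volume
f1 = TwoDEncoderDecoder()(cost_volume)      # first feature
f2 = MLPBranch()(cost_volume)               # second feature
f3 = f1 + f2                                # feature fusion (assumed: addition)
depth_planes = torch.linspace(0.5, 5.0, D)  # assumed depth hypotheses (metres)
depth_map = depth_plane_integration(f3, depth_planes)
print(depth_map.shape)                      # torch.Size([1, 64, 80])
```

The soft-argmin step is one common way to realize depth-plane integration in learned multi-view stereo; the patent does not fix a particular formulation.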
As described above, the intelligent interaction system combines interaction modes such as sound, image and motion, making it easier to capture the observers' viewing behavior and interaction requests, giving observers a personalized viewing experience and a stronger sense of participation, reconstructing three-dimensional images faster, and providing a more intelligent, convenient and humanized viewing experience.
Drawings
FIG. 1 is a block diagram of an intelligent interaction system of an intelligent exhibition hall according to a first embodiment of the present invention;
FIG. 2 is a block diagram of an intelligent interaction system of an intelligent exhibition hall according to a second embodiment of the present invention;
FIG. 3 is a block diagram of an intelligent interaction system of an intelligent exhibition hall according to a third embodiment of the present invention;
FIG. 4 is a block diagram of an intelligent interaction system of an intelligent exhibition hall according to a fourth embodiment of the present invention;
FIG. 5 is a block diagram of an intelligent interaction system of an intelligent exhibition hall according to an eighth embodiment of the present invention;
FIG. 6 is a block diagram of an intelligent interaction system of an intelligent exhibition hall according to a tenth embodiment of the present invention.
The objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
The embodiment of the invention provides an intelligent interaction system of an intelligent exhibition hall, as shown in fig. 1, comprising:
the voice interaction module 10, used for enabling observers to interact with the exhibits through their own voice;
the somatosensory interaction module 11, used for enabling observers to interact with the exhibits through limb movements;
a 3D display module 12 for providing a three-dimensional image of an exhibit to a viewer, comprising a parallax-based stereoscopic three-dimensional reconstruction module 13;
wherein the parallax-based stereoscopic three-dimensional reconstruction module 13 includes:
a first acquisition unit 14, configured to acquire three-dimensional point cloud data of an exhibit based on a LiDAR depth sensor;
a first generation unit 15 for generating a reference image and a source image from the three-dimensional point cloud data;
a first determining unit 16 for determining key frames and geometric metadata of the reference image and the source image based on the multi-view depth estimator;
an integration unit 17, used for integrating the key frames and the geometric metadata into a matching cost volume based on the multi-layer perceptron;
an output unit 18, used for outputting a first feature based on a first network and a second feature based on a second network; the first network is a two-dimensional encoder-decoder convolutional network, the second network is a neural network, and the input of both networks is the matching cost volume;
a fusion unit 19, configured to perform feature fusion on the first feature and the second feature to obtain a third feature;
and an integration unit 20, used for performing depth-plane integration according to the third feature to obtain a three-dimensionally reconstructed exhibit image.
The intelligent interaction system provided in this embodiment comprises a voice interaction module 10, a somatosensory interaction module 11 and a 3D display module 12. Specifically:
(1) Through the voice interaction module 10, observers can interact with the exhibits through their own voice;
(2) Through the somatosensory interaction module 11, observers can interact with the exhibits through limb movements. The somatosensory interaction module 11 can be designed for desktop, floor or wall interaction: desktop interaction fuses the virtual and the real, projecting the exhibits or an interface system onto the desktop and driving various content displays through touch and gestures; wall interaction can be divided into single-point, multi-point and induction wall interaction, among others; and floor interaction can bring an immersive interaction experience;
(3) Through the 3D display module 12, the observer can see a three-dimensional image of an exhibit. The 3D display module 12 comprises a parallax-based stereoscopic three-dimensional reconstruction module 13, which in turn comprises a first acquisition unit 14, a first generation unit 15, a first determination unit 16, an integration unit 17, an output unit 18, a fusion unit 19 and an integration unit 20. More specifically: (3.1) the first acquisition unit 14 acquires three-dimensional point cloud data of the exhibit using radar technology, in preparation for generating a reference image and a source image; (3.2) the first generation unit 15 generates a reference image and a source image from the three-dimensional point cloud data, for the subsequent determination of their key frames and geometric metadata; (3.3) the first determination unit 16 determines key frames and geometric metadata of the reference image and the source image based on a multi-view depth estimator, in preparation for integrating them into a matching cost volume (the cost volume measures the similarity of the left and right views in binocular matching and constitutes the left-right disparity search space in the stereo matching problem); (3.4) the integration unit 17 integrates the key frames and the geometric metadata into the matching cost volume based on the multi-layer perceptron, so that features can later be output from the cost volume; (3.5) the output unit 18 outputs the first feature based on the first network and the second feature based on the second network, for subsequent feature fusion; (3.6) the fusion unit 19 fuses the first feature and the second feature to obtain a third feature for the subsequent depth-plane integration, feature fusion allowing more representative features to be learned; and (3.7) the integration unit 20 performs depth-plane integration according to the third feature to obtain a three-dimensionally reconstructed exhibit image. Compared with the traditional three-dimensional reconstruction method based on binocular or multi-view vision (which finds matching points in rectified images and then restores the three-dimensional image of the exhibit by geometric principles), the parallax-based stereoscopic three-dimensional reconstruction module 13 has clear advantages in depth estimation and a very high reconstruction speed.
As described above, the intelligent interaction system combines interaction modes such as sound, image and motion, making it easier to capture the observers' viewing behavior and interaction requests, giving observers a personalized viewing experience and a stronger sense of participation, reconstructing three-dimensional images faster, and providing a more intelligent, convenient and humanized viewing experience.
Optionally, the intelligent interaction system comprises a database. (1) For data storage and management, the database needs to store a large amount of exhibit data (such as historical background and scientific and technical information), model files (such as 3D models), parameter information and the like; storing and managing the exhibit data well is the basis of exhibit information display, so suitable table structures and fields need to be designed according to the requirements and data types. (2) For backup and recovery, factors such as the importance of the data and the required recovery speed need to be considered, and backups should be performed regularly to prevent data loss. (3) For performance optimization, factors such as data access speed and query efficiency need to be considered; targeted indexing and partitioning of the database improve query efficiency, and the database should be monitored and tuned to keep system performance stable. (4) After the system has run for a period of time, observers' interaction data can be collected, and their behaviors and preferences can be analyzed with big-data techniques to provide better references and decision support for the managers of the intelligent exhibition hall.
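As a concrete illustration of points (1) and (3), the sketch below sets up such a table structure with Python's built-in sqlite3, including a targeted index. The schema, field names and sample row are assumptions for illustration; the patent does not specify a database design.

```python
# A minimal sketch of the exhibit database described above, using sqlite3.
# Table and column names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exhibit (
    exhibit_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    background  TEXT,   -- historical background
    technology  TEXT,   -- scientific/technical notes
    model_file  TEXT,   -- path to the 3D model file
    parameters  TEXT    -- JSON-encoded parameter information
);
-- Targeted index to speed up the name lookups used during interaction.
CREATE INDEX idx_exhibit_name ON exhibit(name);

CREATE TABLE interaction_log (  -- raw material for later big-data analysis
    log_id      INTEGER PRIMARY KEY,
    exhibit_id  INTEGER REFERENCES exhibit(exhibit_id),
    observer_id TEXT,
    mode        TEXT,   -- 'voice' | 'gesture' | 'typing' | ...
    ts          DATETIME DEFAULT CURRENT_TIMESTAMP
);
""")
conn.execute("INSERT INTO exhibit (name, background) VALUES (?, ?)",
             ("Bronze Ding", "Cast in the Western Zhou dynasty"))  # sample row
print(conn.execute("SELECT name FROM exhibit").fetchall())
```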
Optionally, in addition to bidirectional interaction, unidirectional pushing and multi-user cooperative interaction are also possible; through unidirectional pushing, display information is pushed to observers at random.
Optionally, the intelligent interaction system further includes a scene restoration module, where the scene restoration module includes:
a scene creation unit for creating a scene in the virtual exhibition hall;
a tenth acquisition unit for acquiring real data of a real exhibition hall; the real data comprise scene data and exhibit data;
the preprocessing unit is used for preprocessing the real data and obtaining fourth data corresponding to the real data;
a seventh generating unit for generating a three-dimensional model from the fourth data; wherein the three-dimensional model corresponds to a scene or an exhibit;
the rendering unit is used for rendering the three-dimensional model according to the real data;
an adding unit for adding picture control logic and a viewing path for a viewer in the virtual exhibition hall;
and the display unit is used for displaying the virtual exhibition hall.
Optionally, (1) the scene creation unit creates the scene with Unity 3D software and adds elements such as light sources, shadows and the ground; development can be combined with an image rendering library such as OpenGL (Open Graphics Library) to make the exhibition hall scene and display effect more stereoscopic and realistic; (2) the tenth acquisition unit is at least one of a laser radar, a camera and a scanner; (3) the preprocessing unit processes the acquired real data with three-dimensional reconstruction software; (4) the seventh generation unit generates the three-dimensional model with three-dimensional reconstruction software; (5) the rendering unit performs 3D rendering with Unity 3D software, designs textures and materials for the three-dimensional model, simulates lighting and environment, and renders a realistic 3D scene; (6) the picture control logic added by the adding unit consists of scripts and animations, so that during display an exhibit can be rotated, zoomed and moved to show its different angles and characteristics; the viewing path is fixed, a moving camera shoots along it, the display footage is played continuously as film clips, and the moving camera's lens is then controlled to switch between scenes; (7) the display unit presents the virtual exhibition hall as a roaming animation.
Example two
The embodiment of the invention also provides an intelligent interaction system of the intelligent exhibition hall, which is the same as the first embodiment and is not repeated here; the difference, as shown in fig. 2, is that the intelligent interaction system further comprises: a second acquisition unit 21, used for acquiring first data, where the first data assist in judging the observers' interaction tendency and are at least one of exhibit information, observer information and exhibition hall environment information.
In addition to interactive control based on data from the voice interaction module 10, the somatosensory interaction module 11 and the 3D display module 12, the second acquisition unit 21 can acquire the first data; combined with the first data, the states of the observers, the exhibits and the exhibition hall can be perceived more comprehensively, enabling more intelligent interaction. The exhibition hall environment information includes, but is not limited to, temperature, humidity, lighting and ambient volume; the observer information includes, but is not limited to, the observer's moving speed.
Example III
The embodiment of the invention also provides an intelligent interaction system of the intelligent exhibition hall, which is the same as the first embodiment and is not repeated, and the difference is as shown in fig. 3, and the intelligent interaction system of the intelligent exhibition hall further comprises:
a typed-command interaction module 22, used by the observer to interact with the exhibits by manually typing commands;
an immersive interaction module 23 for providing an immersive interaction with the exhibit to the observer;
a master control module 24;
a communication module 25;
the voice interaction module 10, the somatosensory interaction module 11, the 3D display module 12, the typed-command interaction module 22 and the immersive interaction module 23 all communicate with the master control module 24 through the communication module 25; the voice interaction module 10, the somatosensory interaction module 11, the typed-command interaction module 22 and the immersive interaction module 23 are all used for receiving an interaction request from an observer, forwarding it to the master control module 24, receiving a feedback instruction, and feeding the content corresponding to that instruction back to the observer; the feedback instruction is generated and forwarded by the master control module 24 in response to the interaction request.
Specifically, (1) the typed-command interaction module 22 allows the observer to interact with the exhibits by manually entering commands, including but not limited to mouse clicks and keyboard presses; it is not easily disturbed by the environment or external signals, performs stably, and can provide the interaction function over long periods; (2) the immersive interaction module 23 can present the exhibits in digital form for observers to play interactive games; providing immersive interaction raises observer participation, gives observers a personalized viewing experience, and improves satisfaction; (3) the master control module 24 receives an interaction request, generates a feedback instruction according to it, and forwards the instruction so that each module responds to the observer in time; (4) the communication module 25 provides communication support for all modules, enabling them to cooperate in delivering a more intelligent interaction function.
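A minimal sketch of this request/feedback loop is given below. The class and method names (InteractionRequest, MasterControl, CommunicationModule.forward) are illustrative assumptions, not names used in the patent.

```python
# Hedged sketch of the flow: interaction module -> communication module ->
# master control module -> feedback instruction -> interaction module.
from dataclasses import dataclass

@dataclass
class InteractionRequest:
    source: str   # e.g. 'voice', 'somatosensory', 'typed-command', 'immersive'
    payload: str  # recognized command text

class MasterControl:
    def handle(self, req: InteractionRequest) -> str:
        # Generate a feedback instruction in response to the request.
        return f"show_info({req.payload})"

class CommunicationModule:
    """Carries requests to the master control and instructions back."""
    def __init__(self, master: MasterControl):
        self.master = master
    def forward(self, req: InteractionRequest) -> str:
        return self.master.handle(req)

bus = CommunicationModule(MasterControl())
# A module receives an observer's request, forwards it, and feeds back
# the content corresponding to the returned instruction.
instruction = bus.forward(InteractionRequest("voice", "bronze ding"))
print(instruction)  # show_info(bronze ding)
```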
Example IV
The embodiment of the present invention further provides an intelligent interaction system of an intelligent exhibition hall, which is the same as the first embodiment, and details are not repeated, wherein the difference is that, as shown in fig. 4, the 3D display module 12 further includes a holographic projection module 26, and the holographic projection module 26 is a film holographic projection module or a digital holographic projection module.
Specifically, the holographic projection technique used by the holographic projection module 26 records an image as an interference pattern using the interference principle and then projects that pattern into space with a laser or another light source to form a stereoscopic three-dimensional projection. It can record and reproduce all the three-dimensional information of an object, including color, texture, shape and depth, so holographic projection can provide a realistic and vivid display effect.
The film holographic projection module uses a laser to record the light from the photographed object on a glass plate coated with a photosensitive film; the recorded light is then projected by laser to form a stereoscopic image. The film holographic projection module has high resolution and can project high-resolution stereoscopic images whose details are clearer and more vivid. It also has strong reproducibility: with the photosensitive film on the glass plate as the recording medium, the recorded light information can be stored for a long time and repeatedly read and reproduced without degrading image quality.
The digital holographic projection module generates a three-dimensional image in advance with a computer and projects it by laser onto a specially made transparent medium to form a stereoscopic image; it supports real-time interaction and dynamic change.
Example five
The embodiment of the invention also provides an intelligent interaction system of the intelligent exhibition hall, which is the same as the first embodiment and is not repeated, and the difference is that the intelligent interaction system of the intelligent exhibition hall further comprises:
a physical tag (not shown in the figures), corresponding to an exhibit and arranged at a preset distance from it, on which a bar code or two-dimensional code is provided for recognition by the mobile terminal held by the observer; the bar code or two-dimensional code stores the exhibit information.
After the physical tag is added, an observer can scan its bar code or two-dimensional code with a mobile terminal to view the exhibit's information, such as text introductions, photos and videos; for some exhibits a three-dimensional model reconstruction can be viewed, so the exhibit's overall appearance can be checked on the mobile terminal simply and conveniently. The observer can also control the exhibit on the mobile terminal with gestures, such as zooming, sliding and rotating, and obtain more detail by changing the displayed position and scale, which improves the viewing experience.
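The tag-side workflow can be sketched as follows, assuming the third-party Python package qrcode (pip install qrcode[pil]); the encoded URL scheme is hypothetical.

```python
# Sketch of producing the physical tag's two-dimensional code.
import qrcode

exhibit_info_url = "https://example.com/exhibit/42"  # hypothetical deep link
img = qrcode.make(exhibit_info_url)                  # build the QR image
img.save("exhibit_42_tag.png")                       # print and mount near the exhibit
```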
Example six
The embodiment of the present invention further provides an intelligent interaction system for an intelligent exhibition hall, which is the same as the first embodiment, and is not described in detail, and the difference is that the voice interaction module 10 includes:
a voice detection module for detecting a sound signal of an observer based on the microphone array;
the noise reduction module is used for carrying out noise reduction processing on the sound signals and obtaining noise reduction signals;
the text generation module is used for generating text content corresponding to the noise reduction signal based on the automatic voice recognition tool;
the voice command analysis module is used for analyzing text content based on a TCResNet network and generating a voice command corresponding to the text content;
and the voice navigation module is used for providing navigation content for observers, wherein the navigation content comprises feedback content corresponding to the voice command.
Specifically: (1) The microphone array adopted by the voice detection module is an array of multiple omnidirectional microphones at different spatial positions, regularly arranged in a certain shape (such as an arc or a line), and used for detecting sound signals propagating in space. Spatial position information can be obtained from the detected signals: the array computes the time differences of arrival at different array elements to estimate the direction of arrival of the sound source, so the source position is detected automatically, and the localization information can be used both for intelligent interaction and for subsequent speech enhancement in the source direction. For output, microphone arrays commonly limit output with automatic gain control, which adjusts speech loudness by changing the input-output compression ratio and automatically controlling the gain amplitude; more real and natural voice interaction can thus be realized on the basis of the microphone array. (2) Because the sound signals detected by the microphone array contain noise, such as non-human sounds, background music, speech from non-interacting visitors, reverberation and echo, noise reduction processing is needed. (3) The automatic speech recognition tool adopted by the text generation module supports the recognition and conversion of multiple languages at high speed. (4) The TCResNet network used by the voice command parsing module is a temporal convolutional neural network capable of real-time keyword detection. It applies temporal convolution (one-dimensional convolution along the time dimension), uses mel-frequency cepstral coefficients (MFCCs, which convert the raw speech into a time-frequency representation) as input channels, and, assuming a stride of 1, applies zero padding to match input and output resolutions. Because convolutional neural networks typically use small kernels, a relatively shallow network has difficulty capturing informative features from both low and high frequencies; transforming the two-dimensional data into one dimension, and treating the per-frame mel-frequency coefficients as a time series rather than an intensity or grayscale image, enlarges the receptive field over the audio features. Exploiting temporal convolution in this way improves the accuracy of on-device keyword detection models and reduces latency. (5) Through the voice navigation module, reliable guidance can be provided to observers at the user-interface level, improving the operation and browsing of the interface and the display of exhibits; reliable voice prompts can be given in the virtual or real exhibition hall, together with more accurate navigation routes and time estimates, optimizing the observers' navigation experience and improving viewing efficiency.
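The following PyTorch sketch illustrates the TCResNet idea as described in point (4): per-frame MFCC coefficients become input channels, and one-dimensional convolutions with stride 1 and zero padding run along the time axis. The block layout and sizes are simplified assumptions, not the exact TCResNet architecture.

```python
# Minimal temporal-convolution keyword spotter over MFCC features.
import torch
import torch.nn as nn

class TemporalResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv1d(ch, ch, kernel_size=9, padding=4)  # zero padding keeps length
        self.conv2 = nn.Conv1d(ch, ch, kernel_size=9, padding=4)
        self.relu = nn.ReLU()
    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class TinyTCNet(nn.Module):
    def __init__(self, n_mfcc=40, n_keywords=10):
        super().__init__()
        self.stem = nn.Conv1d(n_mfcc, 64, kernel_size=3, padding=1)  # MFCCs as channels
        self.block = TemporalResBlock(64)
        self.head = nn.Linear(64, n_keywords)
    def forward(self, mfcc):             # mfcc: (batch, n_mfcc, frames)
        h = self.block(self.stem(mfcc))
        return self.head(h.mean(dim=2))  # pooled over time -> keyword logits

logits = TinyTCNet()(torch.randn(1, 40, 101))  # 101 frames of 40 MFCCs
print(logits.shape)                            # torch.Size([1, 10])
```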
As a supplementary explanation, the main purpose of keyword detection is to identify whether, and where, specific words appear in a sound signal. A traditional keyword detection system generally comprises an acoustic feature extraction module, an acoustic model, a language model and a decoder: the sound signal is preprocessed by the feature extraction module, and the decoder then compares data against the acoustic and language models to obtain a result. Such a system must be designed as separate components, which is relatively complex and increases recognition delay, while the models limit the range of speech that can be recognized.
Alternatively, the voice is played using a cross-platform audio library OpenAL (Open Audio Library).
Example seven
The embodiment of the invention also provides an intelligent interaction system of the intelligent exhibition hall, which is the same as the sixth embodiment and is not repeated, and the difference is that the noise reduction module comprises:
the first extraction unit is used for analyzing the intensity and the frequency spectrum distribution of the sound of the observer and the noise in the sound signal based on the digital filter to obtain the characteristic information of the sound of the observer and the noise;
the noise suppression unit is used for comparing the spectrum distribution of the sound and the noise of the observer, suppressing the spectrum of the noise according to the comparison result and obtaining a noise reduction signal;
The text generation module comprises:
the first conversion unit, used for performing analog-to-digital conversion, framing, windowing, Fourier transformation, spectrum calculation and feature extraction on the noise reduction signal to obtain a feature matrix;
the second generation unit is used for receiving the feature matrix input by the first conversion unit and generating a corresponding pinyin tag sequence;
and the second conversion unit, used for converting the pinyin tag sequence into text content.
Specifically, the noise reduction module adopts an automatic noise suppression technique: (1) the first extraction unit performs feature extraction with a digital filter, providing accurate basic data for the subsequent noise suppression; (2) the noise suppression unit compares the spectral distributions of the observer's voice and the noise and suppresses the noise spectrum according to the comparison result to obtain the noise reduction signal, effectively reducing the influence of noise on the observer's voice and improving the clarity and audibility of the audio.
In the text generation module, (1) the first conversion unit applies analog-to-digital conversion, framing, windowing, Fourier transformation, spectrum calculation and feature extraction to the noise reduction signal, converting it into a feature matrix that provides the feature information for the subsequent sequence generation; (2) the second generation unit receives the feature matrix from the first conversion unit and generates the corresponding pinyin tag sequence, accurately marking the speech content in the audio in preparation for text conversion; (3) the second conversion unit converts the pinyin tag sequence into text content, completing the speech-to-text conversion and facilitating the understanding and use of the audio information.
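For illustration, the numpy sketch below strings together the two stages just described: a spectral-subtraction style suppression that compares the signal's spectrum with a noise estimate, followed by the framing/windowing/Fourier chain that yields a feature matrix. Frame sizes and the noise-estimation rule are assumptions; the module's exact processing is not disclosed.

```python
# Minimal noise suppression + feature-matrix extraction sketch.
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def denoise_and_features(x, noise_clip):
    frames = frame_signal(x) * np.hanning(400)  # framing + windowing
    spec = np.fft.rfft(frames, axis=1)          # Fourier transform
    noise_mag = np.abs(np.fft.rfft(
        frame_signal(noise_clip) * np.hanning(400), axis=1)).mean(axis=0)
    mag = np.abs(spec)
    clean_mag = np.maximum(mag - noise_mag, 0.0)  # suppress the noise spectrum
    power = clean_mag ** 2                        # spectrum calculation
    return np.log1p(power)                        # simple log-power feature matrix

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # stand-in for 1 s of audio at 16 kHz
noise = 0.1 * rng.standard_normal(16000)
feats = denoise_and_features(speech + noise, noise)
print(feats.shape)                   # (frames, bins), e.g. (98, 201)
```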
Example eight
The embodiment of the invention also provides an intelligent interaction system of the intelligent exhibition hall, which is the same as the first embodiment and is not repeated, and the difference is as shown in fig. 5, and the intelligent interaction system of the intelligent exhibition hall further comprises:
an exhibit identification module 27 for identifying the exhibit and obtaining an exhibit identification result;
the gesture recognition module 28 is configured to recognize a gesture of the observer and obtain a gesture recognition result;
a human body posture estimation module 29, configured to perform human body posture estimation on the observer, and obtain a human body posture estimation result;
the track generation module 30 is configured to generate a motion track of the observer within a preset period.
Specifically, (1) the exhibit identification module 27 identifies exhibits using ultrasonic detection or image recognition, and the identification result, once obtained, can trigger corresponding actions; (2) the gesture recognition module 28 recognizes the observer's gestures using image recognition, and the recognition result can likewise trigger corresponding actions; (3) the human body posture estimation module 29 estimates the observer's posture using image recognition or radar detection, and the estimation result can trigger corresponding actions; (4) the track generation module 30 generates the observer's motion track within a preset period using sensor detection or image recognition, and the track, once obtained, can trigger corresponding actions.
This scheme integrates multiple monitoring modes, such as gesture, posture and position, and monitors the observers' state through multiple channels, so observers can choose different interaction modes according to the display and interaction requirements of different exhibits, achieving a more natural, realistic and immersive interaction experience.
Example nine
The embodiment of the invention also provides an intelligent interaction system of the intelligent exhibition hall, which is the same as the eighth embodiment and is not repeated, the difference is that,
the exhibit identification module 27 includes:
the third acquisition unit is used for acquiring auxiliary identification data; the auxiliary identification data comprise second data and/or third data, wherein the second data are three-dimensional coordinate information of the exhibited item, the second data are acquired through a depth camera, and the third data are distance information of the exhibited item, and the third data are acquired through an ultrasonic sensor;
the exhibit identification unit is used for carrying out exhibit identification based on the Yolo v5 algorithm and the auxiliary identification data to obtain an exhibit identification result;
the gesture recognition module 28 includes:
the fourth acquisition unit is used for acquiring gesture pictures of observers; wherein, the collection is carried out by a depth camera;
the second extraction unit is used for extracting the hand 3D key points in the gesture picture; wherein, extracting by a HandPointNet algorithm;
The recognition unit is used for carrying out gesture recognition of the observer according to the hand 3D key points to obtain a gesture recognition result;
the track generation module 30 either comprises:
the fifth acquisition unit is used for acquiring the posture and motion information of the observer in a preset period; wherein, the acquisition is carried out by a gyroscope or an accelerometer;
a second determining unit for determining the position of the observer in a preset period according to the posture and the motion information of the observer;
the third generation unit is used for generating a motion track of the observer in a preset period according to the position of the observer in the preset period;
or comprises:
a sixth acquisition unit for acquiring an image of the observer within a preset period;
a third determining unit for determining a position of the observer within a preset period according to the image of the observer;
a fourth generation unit, configured to generate a motion trail of the observer in a preset period according to the position of the observer in the preset period;
or comprises:
the seventh acquisition unit is used for acquiring wifi signal intensity information of the mobile terminal held by the observer in a preset period;
a fourth determining unit, configured to determine a position of the observer within a preset period according to wifi signal strength information; the position of the observer in the preset period is the position of the mobile terminal held by the observer;
A fifth generation unit, configured to generate a motion trail of the observer in a preset period according to the position of the observer in the preset period;
the human body posture estimation module 29 either comprises:
an eighth acquisition unit for acquiring depth images of the observers; wherein, the collection is carried out by a depth camera;
the third extraction unit is used for extracting 2D human body key point coordinates in the depth image; extracting by a 2D CNN algorithm;
the third conversion unit is used for carrying out coordinate conversion according to the depth information corresponding to the depth image and the 2D human body key point coordinates to obtain 3D human body key point coordinates;
the first posture estimation unit is used for estimating the human body posture of the observer based on a 3D posture regression algorithm and 3D human body key point coordinates to obtain a human body posture estimation result;
or comprises:
a ninth acquisition unit for acquiring three-dimensional coordinate information of the observer; wherein, the acquisition is carried out by a laser radar;
a sixth generation unit for generating a depth image according to the three-dimensional coordinate information of the observer;
a fourth extraction unit, configured to extract coordinates of 2D human body key points in the depth image; extracting by a 2D CNN algorithm;
The fourth conversion unit is used for carrying out coordinate conversion according to the depth information corresponding to the depth image and the 2D human body key point coordinates to obtain 3D human body key point coordinates;
the second posture estimation unit is used for estimating the human body posture of the observer based on the 3D posture regression algorithm and the 3D human body key point coordinates to obtain a human body posture estimation result.
Specifically:
In the exhibit identification module 27, (1) the third acquisition unit acquires auxiliary identification data using ultrasonic and/or imaging technology, for the subsequent exhibit identification; (2) the exhibit identification unit adopts the Yolo v5 algorithm, which offers high precision, good real-time behavior, high recognition speed, light weight and strong flexibility; combining the auxiliary identification data during recognition improves the accuracy of exhibit identification, and the identification result, once obtained, can trigger corresponding actions.
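The sketch below is a hedged illustration of the YOLOv5 step, using the public ultralytics/yolov5 torch.hub entry point (weights are downloaded on first use); gating detections with the ultrasonic distance is an assumed way of combining the auxiliary identification data, and the image path is a placeholder.

```python
# Exhibit detection sketch with YOLOv5 via torch.hub.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
results = model("exhibit_photo.jpg")  # placeholder: path, URL or array
detections = results.xyxy[0]          # (n, 6): x1, y1, x2, y2, conf, class
ultrasonic_distance_m = 1.8           # assumed auxiliary reading
for *box, conf, cls in detections.tolist():
    # Auxiliary data can gate detections, e.g. only accept nearby exhibits.
    if conf > 0.5 and ultrasonic_distance_m < 3.0:
        print(model.names[int(cls)], [round(v, 1) for v in box])
```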
In the gesture recognition module 28, (1) the fourth acquisition unit acquires the gesture image of the observer by using an image technology, so that the subsequent acquisition of the 3D key points of the hand is facilitated; (2) The HandPointNet algorithm adopted by the second extraction unit has good real-time performance, robustness and accuracy; (3) The recognition unit performs gesture recognition of the observer according to the hand 3D key points, and after a gesture recognition result is obtained, the gesture recognition unit can be used for triggering corresponding actions.
When the track generation module 30 includes the fifth acquisition unit, the second determination unit and the third generation unit, (1) the fifth acquisition unit acquires the observer's posture and motion information using micro-electro-mechanical system (MEMS) sensors, for the subsequent determination of the observer's position; (2) the second determination unit determines the observer's position within the preset period from the posture and motion information, for the subsequent generation of the motion track; (3) the third generation unit generates the observer's motion track within the preset period from those positions, and once obtained, different motion tracks can trigger different actions. This variant is suited to being carried in mobile terminals such as mobile phones and watches; it is small, low-power and quick to respond, and can capture motion tracks accurately.
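The sketch below shows the simplest form of this idea, double-integrating gravity-compensated accelerometer samples into a position track; a real MEMS pipeline would also fuse gyroscope data and correct drift, and the sample rate and input values are assumptions.

```python
# Dead-reckoning trajectory sketch from accelerometer samples.
import numpy as np

def trajectory(accel, dt=0.01, v0=np.zeros(3), p0=np.zeros(3)):
    """accel: (n, 3) gravity-compensated accelerations in m/s^2."""
    vel = v0 + np.cumsum(accel * dt, axis=0)  # integrate once -> velocity
    pos = p0 + np.cumsum(vel * dt, axis=0)    # integrate twice -> position
    return pos                                # (n, 3) track over the period

steps = np.tile([0.2, 0.0, 0.0], (100, 1))    # assumed gentle forward push
print(trajectory(steps)[-1])                  # end point of the motion track
```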
When the track generation module 30 includes the sixth acquisition unit, the third determination unit and the fourth generation unit, (1) the sixth acquisition unit acquires images of the observer using imaging technology, for the subsequent determination of the observer's position; (2) the third determination unit determines the observer's position within the preset period from the images, for the subsequent generation of the motion track; (3) the fourth generation unit generates the observer's motion track within the preset period from those positions, and once obtained, different motion tracks can trigger different actions. This variant is particularly effective in indoor environments and can capture more detailed observer state information.
When the track generation module 30 includes the seventh acquisition unit, the fourth determination unit and the fifth generation unit, (1) the seventh acquisition unit collects wifi signal strength information using network transmission technology, for the subsequent determination of the observer's position; (2) the fourth determination unit determines the observer's position within the preset period from the wifi signal strength information, for the subsequent generation of the motion track; (3) the fifth generation unit generates the observer's motion track within the preset period from those positions, and once obtained, different motion tracks can trigger different actions. This variant has wide coverage and suits both indoor and outdoor environments. The wifi positioning is based on the wireless signal strengths between the mobile terminal and three wireless access points; a differential algorithm performs relatively accurate triangulation of the observer, and since wifi positioning can reach metre-level accuracy, it is suitable for positioning and navigating small objects. Optionally, inside-out image recognition positioning and visible-light image recognition can be combined: the camera of a head-mounted device photographs the surroundings, feature points of the indoor scenery are recognized in the images, their displacement is obtained by comparison with the previously captured feature points, and, with auxiliary data from a gyroscope, the displacement is estimated by triangulation to realize positioning of the head-mounted device. Inside-out image recognition positioning achieves spatial positioning in a virtual scene with the device alone, without external sensors, making it convenient to provide richer human-computer interaction based on the positioning information.
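As an illustration of the wifi positioning just described, the sketch below converts RSSI readings from three access points into distances with a log-distance path-loss model and solves the resulting trilateration by least squares; the AP positions, transmit power and path-loss exponent are assumed values.

```python
# RSSI-based trilateration sketch with three access points.
import numpy as np

def rssi_to_distance(rssi, tx_power=-40.0, n=2.5):
    """Log-distance path-loss model; tx_power and exponent n are assumed."""
    return 10 ** ((tx_power - rssi) / (10 * n))

def trilaterate(aps, dists):
    """aps: (3, 2) AP positions; dists: (3,) ranges. Returns (x, y)."""
    (x1, y1), (x2, y2), (x3, y3) = aps
    # Linearize |p - a_i|^2 = d_i^2 by subtracting the first equation.
    A = 2 * np.array([[x2 - x1, y2 - y1], [x3 - x1, y3 - y1]])
    b = np.array([
        dists[0]**2 - dists[1]**2 + x2**2 - x1**2 + y2**2 - y1**2,
        dists[0]**2 - dists[2]**2 + x3**2 - x1**2 + y3**2 - y1**2,
    ])
    return np.linalg.lstsq(A, b, rcond=None)[0]

aps = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 8.0]])  # assumed AP layout (m)
rssi = np.array([-55.0, -62.0, -60.0])                 # readings from the terminal
print(trilaterate(aps, rssi_to_distance(rssi)))        # estimated observer position
```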
When the human body posture estimation module 29 includes the eighth acquisition unit, the third extraction unit, the third conversion unit and the first posture estimation unit, (1) the eighth acquisition unit acquires a depth image of the observer using imaging technology, making the subsequent extraction of keypoint coordinates more accurate and stable; (2) the 2D CNN algorithm adopted by the third extraction unit is highly efficient at image processing and parallelizes well, making it suitable for parallel computing devices such as graphics processing units (GPUs); (3) the third conversion unit performs coordinate conversion from the depth information of the depth image and the 2D human-body keypoint coordinates to obtain the 3D human-body keypoint coordinates for the subsequent posture estimation; (4) the 3D posture regression algorithm adopted by the first posture estimation unit can directly predict the observer's three-dimensional posture and is more robust to occlusion and viewpoint change. This variant provides rich depth information: the depth camera directly acquires the depth of different parts of the observer's body, supplying rich three-dimensional spatial information that helps estimate the human posture more accurately; meanwhile, the depth image is insensitive to occlusion and viewpoint change and is therefore stable.
When the human body posture estimation module 29 includes the ninth acquisition unit, the sixth generation unit, the fourth extraction unit, the fourth conversion unit and the second posture estimation unit, (1) the ninth acquisition unit acquires the observer's three-dimensional coordinate information using radar technology, for the subsequent generation of the depth image; (2) the sixth generation unit generates a depth image from the observer's three-dimensional coordinate information, for the subsequent keypoint coordinate extraction; (3) the fourth extraction unit functions identically to the third extraction unit; (4) the fourth conversion unit functions identically to the third conversion unit; (5) the second posture estimation unit functions identically to the first posture estimation unit. The three-dimensional coordinate information acquired by this variant is accurate and stable, yielding an accurate human posture estimation result.
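The coordinate conversion performed by the third and fourth conversion units can be sketched as a pinhole back-projection: each 2D keypoint is lifted to 3D camera coordinates using the depth at that pixel and the camera intrinsics. The intrinsic values and inputs below are assumptions for illustration.

```python
# 2D keypoints + depth -> 3D keypoints via pinhole back-projection.
import numpy as np

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5  # assumed depth-camera intrinsics

def keypoints_2d_to_3d(kps_uv, depth_map):
    """kps_uv: (k, 2) pixel coords; depth_map: (H, W) metres -> (k, 3)."""
    out = []
    for u, v in kps_uv:
        z = depth_map[int(v), int(u)]  # depth at the keypoint
        out.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.array(out)

depth = np.full((480, 640), 2.0)       # flat 2 m scene stand-in
kps = np.array([[320, 240], [400, 300]])  # e.g. shoulder, wrist pixels
print(keypoints_2d_to_3d(kps, depth))  # 3D inputs for the posture regression
```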
Example ten
The embodiment of the invention also provides an intelligent interaction system of the intelligent exhibition hall, which is the same as the first embodiment and is not repeated, and the difference is that the intelligent interaction system of the intelligent exhibition hall further comprises an interface module 31 as shown in fig. 6; the interface module 31 includes:
the configuration unit, used for allowing observers to customize a viewing file and for loading the viewing file; the viewing file's content comprises a viewing route and the exhibit information along that route;
the exception detection unit, used for detecting whether the customized content is correct when the viewing file is customized, and whether the loading process is correct when the viewing file is loaded.
Specifically, (1) through the configuration unit, an observer can customize the viewing file, interactively adding exhibit information or a viewing route for a more personalized experience, or select the system's default viewing route and roam the whole exhibition hall along it from a first-person perspective; (2) the exception detection unit detects whether the customized content is correct during customization and whether the loading process is correct during loading, so that exceptions can be found and eliminated in time.
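A minimal sketch of the configuration unit together with the exception detection unit is given below; the JSON layout and field names are assumptions for illustration.

```python
# Load an observer-defined viewing file and verify it before use.
import json

def load_viewing_file(path):
    try:
        with open(path, encoding="utf-8") as f:
            profile = json.load(f)  # detect loading errors
    except (OSError, json.JSONDecodeError) as e:
        raise RuntimeError(f"viewing file failed to load: {e}")
    route = profile.get("route", [])
    exhibits = profile.get("exhibits", {})
    # Detect incorrect custom content: every stop must name a known exhibit.
    missing = [stop for stop in route if stop not in exhibits]
    if missing:
        raise ValueError(f"route references unknown exhibits: {missing}")
    return profile

# Usage (hypothetical file): profile = load_viewing_file("my_tour.json")
```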
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article or method that comprises it.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and drawings of the present invention or direct or indirect application in other related technical fields are included in the scope of the present invention.

Claims (8)

1. An intelligent interactive system for an intelligent exhibition hall, comprising:
the voice interaction module is used for enabling observers to interact with the exhibits through their own voice;
the somatosensory interaction module is used for enabling observers to interact with the exhibits through limb movements;
the 3D display module is used for providing a three-dimensional image of the exhibit to the observer, and comprises a parallax-based stereoscopic three-dimensional reconstruction module;
wherein the parallax-based stereoscopic three-dimensional reconstruction module comprises:
the first acquisition unit is used for acquiring three-dimensional point cloud data of the exhibit based on the LiDAR depth sensor;
the first generation unit is used for generating a reference image and a source image according to the three-dimensional point cloud data;
a first determination unit for determining key frames and geometric metadata of the reference image and the source image based on the multi-view depth estimator;
the integration unit is used for integrating the key frames and the geometric metadata into a matching cost volume based on the multi-layer perceptron;
the output unit is used for outputting a first feature based on a first network and a second feature based on a second network; the first network is a two-dimensional encoder-decoder convolutional network, the second network is a neural network, and the input of both networks is the matching cost volume;
the fusion unit is used for carrying out feature fusion on the first feature and the second feature to obtain a third feature;
the integration unit is used for performing depth-plane integration according to the third feature to obtain a three-dimensionally reconstructed exhibit image;
the intelligent interaction system further comprising:
an exhibit identification module for identifying exhibits and obtaining an exhibit identification result;
a gesture recognition module for recognizing the observer's gestures and obtaining a gesture recognition result;
a human body posture estimation module for estimating the observer's body posture and obtaining a posture estimation result;
a track generation module for generating the observer's motion track within a preset period;
wherein the exhibit identification module comprises:
a third acquisition unit for acquiring auxiliary identification data, the auxiliary identification data comprising second data and/or third data, the second data being three-dimensional coordinate information of the exhibit acquired by a depth camera, and the third data being distance information of the exhibit acquired by an ultrasonic sensor;
an exhibit identification unit for identifying the exhibit based on the YOLOv5 algorithm and the auxiliary identification data to obtain the exhibit identification result;
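A sketch of how YOLOv5 detections might be combined with the auxiliary depth or ultrasonic range data; the hub model below is the public ultralytics/yolov5 release (COCO-trained; a detector fine-tuned on exhibit classes would load the same way), while the depth_at callback and the 3 m threshold are assumptions:

```python
import torch

# YOLOv5 proposes boxes; auxiliary depth/ultrasound readings (assumed already
# aligned to the image) keep only detections within interaction range.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def identify_exhibits(image, depth_at, max_range_m: float = 3.0):
    """image: RGB array; depth_at(x, y) -> metres, from the depth camera
    or ultrasonic sensor of the third acquisition unit."""
    results = model(image)
    hits = []
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        if depth_at(cx, cy) <= max_range_m:      # auxiliary-data filter
            hits.append({"label": model.names[int(cls)],
                         "confidence": conf,
                         "distance_m": depth_at(cx, cy)})
    return hits
```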
the gesture recognition module comprises:
a fourth acquisition unit for acquiring a gesture picture of the observer, captured by a depth camera;
a second extraction unit for extracting 3D hand key points from the gesture picture with the HandPointNet algorithm;
a recognition unit for recognizing the observer's gesture from the 3D hand key points to obtain the gesture recognition result;
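The recognition unit could, for example, map the 3D hand key points produced by a HandPointNet-style estimator to a few commands using simple finger-extension geometry; the 21-joint indexing and the 1.3 extension ratio below are assumptions for illustration:

```python
import numpy as np

# kp: (21, 3) hand key points in camera coordinates, wrist at index 0.
# Fingertip/knuckle indices follow the common 21-joint convention.
TIPS, KNUCKLES = [4, 8, 12, 16, 20], [2, 5, 9, 13, 17]

def recognize_gesture(kp: np.ndarray) -> str:
    wrist = kp[0]
    # A finger counts as extended when its tip is notably farther from
    # the wrist than its knuckle.
    extended = [np.linalg.norm(kp[t] - wrist) > 1.3 * np.linalg.norm(kp[k] - wrist)
                for t, k in zip(TIPS, KNUCKLES)]
    n = sum(extended)
    if n == 0:
        return "fist"          # e.g. grab / select exhibit
    if n == 5:
        return "open_palm"     # e.g. stop / release
    if extended[1] and n == 1:
        return "point"         # e.g. choose a menu item
    return "unknown"
```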
the track generation module comprises:
a fifth acquisition unit for acquiring the observer's posture and motion information within a preset period with a gyroscope or an accelerometer;
a second determination unit for determining the observer's position within the preset period from the posture and motion information;
a third generation unit for generating the observer's motion track within the preset period from the observer's position;
or comprises:
a sixth acquisition unit for acquiring images of the observer within a preset period;
a third determination unit for determining the observer's position within the preset period from the images;
a fourth generation unit for generating the observer's motion track within the preset period from the observer's position;
or comprises:
a seventh acquisition unit for acquiring wifi signal strength information of a mobile terminal held by the observer within a preset period;
a fourth determination unit for determining the observer's position within the preset period from the wifi signal strength information, the observer's position being taken as the position of the mobile terminal the observer holds;
a fifth generation unit for generating the observer's motion track within the preset period from the observer's position;
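For the wifi-based variant, a common realization is a log-distance path-loss model followed by least-squares trilateration against access points at known positions; sampling the result over the preset period yields the motion track. The tx_power and path-loss exponent below are assumed values, not from the patent:

```python
import numpy as np

def rssi_to_distance(rssi_dbm, tx_power=-40.0, n=2.5):
    # Log-distance path-loss model; tx_power is the assumed RSSI at 1 m.
    return 10 ** ((tx_power - rssi_dbm) / (10 * n))

def locate(aps: np.ndarray, rssi: np.ndarray) -> np.ndarray:
    """aps: (k, 2) access-point coordinates; rssi: (k,) readings -> (x, y)."""
    d = rssi_to_distance(rssi)
    # Linearise ||p - ap_i||^2 = d_i^2 against the last AP as reference.
    a = 2 * (aps[:-1] - aps[-1])
    b = (d[-1] ** 2 - d[:-1] ** 2
         + np.sum(aps[:-1] ** 2, axis=1) - np.sum(aps[-1] ** 2))
    pos, *_ = np.linalg.lstsq(a, b, rcond=None)
    return pos

if __name__ == "__main__":
    ap_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    print(locate(ap_xy, np.array([-52.0, -60.0, -60.0])))  # one track point
```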
the human body posture estimation module comprises:
an eighth acquisition unit for acquiring a depth image of the observer with a depth camera;
a third extraction unit for extracting 2D human body key point coordinates from the depth image with a 2D CNN algorithm;
a third conversion unit for performing coordinate conversion using the depth information of the depth image and the 2D human body key point coordinates to obtain 3D human body key point coordinates;
a first posture estimation unit for estimating the observer's body posture based on a 3D posture regression algorithm and the 3D human body key point coordinates to obtain the posture estimation result;
or comprises:
a ninth acquisition unit for acquiring three-dimensional coordinate information of the observer with a laser radar;
a sixth generation unit for generating a depth image from the observer's three-dimensional coordinate information;
a fourth extraction unit for extracting 2D human body key point coordinates from the depth image with a 2D CNN algorithm;
a fourth conversion unit for performing coordinate conversion using the depth information of the depth image and the 2D human body key point coordinates to obtain 3D human body key point coordinates;
a second posture estimation unit for estimating the observer's body posture based on the 3D posture regression algorithm and the 3D human body key point coordinates to obtain the posture estimation result.
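Both posture-estimation variants share the coordinate-conversion step (2D key points plus depth to 3D key points), which under a pinhole camera model might look like the sketch below; the intrinsics fx, fy, cx, cy are assumed calibration values:

```python
import numpy as np

def lift_keypoints(kp_2d: np.ndarray, depth: np.ndarray,
                   fx=600.0, fy=600.0, cx=320.0, cy=240.0) -> np.ndarray:
    """kp_2d: (J, 2) pixel coordinates from the 2D CNN; depth: (H, W) in
    metres -> (J, 3) points in camera coordinates."""
    pts = []
    for u, v in kp_2d.astype(int):
        z = depth[v, u]                          # depth sampled at the keypoint
        pts.append(((u - cx) * z / fx,           # pinhole back-projection
                    (v - cy) * z / fy,
                    z))
    return np.asarray(pts)
```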
2. The intelligent interaction system of claim 1, further comprising:
a second acquisition unit for acquiring first data, the first data assisting in judging the observer's interaction tendency and being at least one of exhibit information, observer information, and exhibition hall environment information.
3. The intelligent interaction system of claim 1, further comprising:
a typed command interaction module for enabling the observer to interact with the exhibits through manually typed commands;
an immersive interaction module for providing the observer with immersive interaction with the exhibits;
a master control module; and
a communication module;
wherein the voice interaction module, the somatosensory interaction module, the 3D display module, the typed command interaction module, and the immersive interaction module all communicate with the master control module through the communication module; each of the voice, somatosensory, typed command, and immersive interaction modules receives an interaction request from the observer, forwards it to the master control module, receives a feedback instruction, and feeds the content corresponding to that instruction back to the observer; and the master control module generates and forwards the feedback instruction according to the interaction request.
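A minimal sketch of this hub-and-spoke wiring, with the communication module modelled as a queue and hypothetical handler names; the patent does not specify the message format:

```python
from queue import Queue

# Each interaction module pushes requests over the communication module
# (modelled as a queue); the master control module answers with feedback
# instructions. Handler names and the dict schema are illustrative.
bus: Queue = Queue()

HANDLERS = {
    "voice":   lambda req: f"narrate:{req['target']}",
    "gesture": lambda req: f"rotate:{req['target']}",
    "typed":   lambda req: f"search:{req['target']}",
}

def master_control_step():
    req = bus.get()                               # request forwarded by a module
    instr = HANDLERS.get(req["module"], lambda r: "unsupported")(req)
    req["reply"](instr)                           # feedback instruction to module

bus.put({"module": "voice", "target": "bronze_ware", "reply": print})
master_control_step()                             # -> narrate:bronze_ware
```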
4. The intelligent interaction system of claim 1, wherein the 3D display module further comprises a holographic projection module, the holographic projection module being a thin-film holographic projection module or a digital holographic projection module.
5. The intelligent interaction system of claim 1, further comprising:
a physical tag corresponding to an exhibit and placed at a preset distance from it, the tag bearing a bar code or two-dimensional code that stores the exhibit information and can be recognized by a mobile terminal held by the observer.
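Authoring such a tag could be as simple as serializing the exhibit record into a QR code, e.g. with the Python qrcode package; the record fields below are illustrative assumptions:

```python
import json
import qrcode   # pip install qrcode[pil]

# Bake a hypothetical exhibit record into a QR image that a visitor's
# mobile terminal can scan to retrieve the exhibit information.
record = {"id": "EX-0042", "name": "Bronze Ware",
          "audio": "https://example.org/ex42.mp3"}
qrcode.make(json.dumps(record, ensure_ascii=False)).save("exhibit_0042.png")
```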
6. The intelligent interaction system of claim 1, wherein the voice interaction module comprises:
a voice detection module for detecting the observer's sound signal with a microphone array;
a noise reduction module for denoising the sound signal to obtain a noise-reduced signal;
a text generation module for generating, with an automatic speech recognition tool, the text content corresponding to the noise-reduced signal;
a voice command analysis module for analyzing the text content with a TCResNet network and generating the corresponding voice command;
a voice navigation module for providing the observer with navigation content, the navigation content comprising the feedback content corresponding to the voice command.
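Read as a pipeline, claim 6 chains four stages; the sketch below keeps each stage abstract (injected as a callable), since the patent does not fix the ASR engine or the TCResNet checkpoint:

```python
# The stage names passed in (spectral_subtract, whisper_asr, etc.) are
# placeholders for whatever concrete models the system deploys.
def voice_pipeline(raw_audio, denoise, asr, parse_command, nav_lookup):
    denoised = denoise(raw_audio)        # noise reduction module
    text = asr(denoised)                 # text generation module (ASR tool)
    command = parse_command(text)        # voice command analysis (TCResNet-style)
    return nav_lookup(command)           # voice navigation feedback content

# Hypothetical wiring, assuming the stages exist elsewhere:
# reply = voice_pipeline(frames, spectral_subtract, whisper_asr,
#                        tcresnet_keywords, guide_content.get)
```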
7. The intelligent interaction system of claim 6, wherein the noise reduction module comprises:
a first extraction unit for analyzing, with a digital filter, the intensity and spectral distribution of the observer's voice and of the noise in the sound signal to obtain characteristic information of both;
a noise suppression unit for comparing the spectral distributions of the observer's voice and the noise and suppressing the noise spectrum according to the comparison result to obtain the noise-reduced signal;
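One classical way to realize the noise suppression unit is spectral subtraction: estimate the noise spectrum from a lead-in segment assumed to be speech-free, then subtract it frame by frame. The frame length, overlap, and spectral floor below are assumptions:

```python
import numpy as np

def spectral_subtract(x: np.ndarray, sr: int, noise_ms: int = 300,
                      frame: int = 512) -> np.ndarray:
    """x: float waveform (e.g. float32 in [-1, 1]); sr: sample rate."""
    # Noise spectrum estimated from the (assumed speech-free) lead-in.
    noise = x[: sr * noise_ms // 1000]
    noise_mag = np.abs(np.fft.rfft(noise[:frame] * np.hanning(frame)))
    out = np.zeros_like(x)
    win = np.hanning(frame)
    for i in range(0, len(x) - frame, frame // 2):       # 50% overlap
        spec = np.fft.rfft(x[i:i + frame] * win)
        # Subtract the noise magnitude, keeping a small spectral floor.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.05 * np.abs(spec))
        out[i:i + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
    return out
```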
and wherein the text generation module comprises:
a first conversion unit for performing analog-to-digital conversion, framing, windowing, Fourier transformation, spectrum calculation, and feature extraction on the noise-reduced signal to obtain a feature matrix;
a second generation unit for receiving the feature matrix from the first conversion unit and generating a corresponding pinyin label sequence;
a second conversion unit for converting the pinyin label sequence into text content.
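The first conversion unit's chain (framing, windowing, Fourier transform, spectrum calculation, feature extraction) might be sketched as follows, with log magnitude spectra standing in for the unspecified features; the frame and hop lengths are assumed values:

```python
import numpy as np

def feature_matrix(x: np.ndarray, frame: int = 400, hop: int = 160) -> np.ndarray:
    """x: float waveform -> (T, frame // 2 + 1) feature matrix."""
    win = np.hamming(frame)
    frames = [x[i:i + frame] * win                       # framing + windowing
              for i in range(0, len(x) - frame, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1))  # Fourier transform
    return np.log1p(spec)                                 # spectrum -> features
```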
8. The intelligent interaction system of claim 1, further comprising an interface module, the interface module comprising:
a configuration unit for enabling the observer to define a viewing file and to load it, the viewing file comprising a viewing route and information on the exhibits along that route;
an exception detection unit for detecting whether the custom content is correct when the viewing file is defined and whether the loading process is correct when it is loaded.
CN202311052551.8A 2023-08-21 2023-08-21 Intelligent interaction system of intelligent exhibition hall Active CN116778058B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311052551.8A CN116778058B (en) 2023-08-21 2023-08-21 Intelligent interaction system of intelligent exhibition hall

Publications (2)

Publication Number Publication Date
CN116778058A CN116778058A (en) 2023-09-19
CN116778058B true CN116778058B (en) 2023-11-07

Family

ID=87986288


Country Status (1)

Country Link
CN (1) CN116778058B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117519480A (en) * 2023-11-20 2024-02-06 广东鸿威国际会展集团有限公司 Exhibition information interaction and processing method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009252117A (en) * 2008-04-09 2009-10-29 Yahoo Japan Corp Auction web page conversion server device, and auction web page converting method
CN208013692U (en) * 2017-12-05 2018-10-26 厦门日华科技股份有限公司 A kind of wisdom exhibition room control system based on interactive voice mode
CN109949801A (en) * 2019-01-10 2019-06-28 百度在线网络技术(北京)有限公司 A kind of smart home device sound control method and system based on earphone
CN115619858A (en) * 2021-07-15 2023-01-17 华为技术有限公司 Object reconstruction method and related equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10684350B2 (en) * 2000-06-02 2020-06-16 Tracbeam Llc Services and applications for a communications network
US7402743B2 (en) * 2005-06-30 2008-07-22 Body Harp Interactive Corporation Free-space human interface for interactive music, full-body musical instrument, and immersive media controller
US10438393B2 (en) * 2017-03-16 2019-10-08 Linden Research, Inc. Virtual reality presentation of body postures of avatars
US10970856B2 (en) * 2018-12-27 2021-04-06 Baidu Usa Llc Joint learning of geometry and motion with three-dimensional holistic understanding
US10904706B2 (en) * 2019-04-10 2021-01-26 Here Global B.V. Method and apparatus for providing contextual content for an end-to-end seamless experience during an autonomous vehicle trip



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant