CN115810099A - Image fusion equipment for virtual immersion type depression treatment system - Google Patents

Image fusion equipment for virtual immersion type depression treatment system

Info

Publication number
CN115810099A
CN115810099A (application CN202310054924.9A)
Authority
CN
China
Prior art keywords
image
fused
fusion
physiological data
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310054924.9A
Other languages
Chinese (zh)
Other versions
CN115810099B (en)
Inventor
严龙生
林友辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yi'an Intelligent Technology Co ltd
Original Assignee
Xiamen Yi'an Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yi'an Intelligent Technology Co ltd filed Critical Xiamen Yi'an Intelligent Technology Co ltd
Priority to CN202310054924.9A priority Critical patent/CN115810099B/en
Publication of CN115810099A publication Critical patent/CN115810099A/en
Application granted granted Critical
Publication of CN115810099B publication Critical patent/CN115810099B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention provides an image fusion device for a virtual immersive depression treatment system, where the system comprises a camera, a VR device, a physiological data sensor and a server. The device comprises an acquisition unit, a selection unit, a fusion unit and a playing unit that cooperate with one another: the camera captures video images of the user and sends them to the server, which extracts a key image sequence; the user's physiological data collected by the physiological data sensor are sent to the server, which then selects a scene video from a scene database; the server uses a multi-parameter generative adversarial neural network model to fuse each frame of the key frame image sequence into the scene video based on the physiological data, generating a fused scene video image that is sent to the VR device for playback. Fusing the patient's own image with frames of the virtual scene enhances the patient's sense of immersion.

Description

Image fusion equipment for virtual immersion type depression treatment system
Technical Field
The invention relates to the technical field of artificial intelligence and medical data processing, and in particular to an image fusion device for a virtual immersive depression treatment system.
Background
Chinese patent application 2017103003414 discloses a virtual scene system based on VR technology for the adjuvant therapy of depressive disorder. The system includes a virtual scene interaction module, a data acquisition module and a data analysis module that are connected to one another; the virtual scene interaction module includes VR equipment and is connected to the data acquisition module through the VR equipment, and the data analysis module includes a database. The virtual scene interaction module simulates the story-telling techniques of psychological counselling used by a doctor, provides a variety of virtual scenes with rich storylines by means of the VR equipment, supports interactive behaviour, and creates an immersive experience environment for the user; it serves as the trigger source for mood and physiological data during the scene-building stage and, once the scene has been optimized, relieves depressive mood during the use stage. The data acquisition module collects the user's feedback data in the virtual scene interaction module and sends it to the data analysis module. The data analysis module, combining statistical theory, converts the feedback data in the database into adjustment information for the various values required by the virtual scene interaction module. The virtual scene interaction module further comprises an environment module, a role module and a plot module, which together construct a plurality of virtual scenes and form a virtual scene story system along a time axis. The virtual scenes are initially created from practical experience in relieving depressive mood and the professional guidance of psychologists; in later use, the environment, role and plot modules are adjusted according to the feedback data from the data analysis module to obtain the virtual scene with the optimal value.
However, the virtual scenes used by such a system generally substitute cartoon characters for the patient, so the patient's sense of immersion is weak and the therapeutic effect is poor.
In the prior art, image fusion is generally performed with wavelet transforms or generative adversarial neural networks. In a virtual environment for depression therapy, however, performing image fusion based on the patient's physiological parameters remains a technical challenge: how to make the person in the fused video match the patient's physical condition while also improving the fusion speed.
Disclosure of Invention
The present invention proposes the following technical solutions to address one or more technical defects in the prior art.
An image fusion device for a virtual immersive depression treatment system, the virtual immersive depression treatment system comprising a camera, a VR device, a physiological data sensor, and a server, the device comprising:
an acquisition unit, configured to have the camera capture video images of the user and send them to the server; after receiving the video images, the server extracts a key image sequence from them and stores it in a first cache queue;
a selection unit, configured to send the user's physiological data collected by the physiological data sensor to the server, which selects a scene video from a scene database based on the physiological data;
a fusion unit, configured to have the server fuse each frame image of the key frame image sequence in the first cache queue into the scene video based on the physiological data, using a multi-parameter generative adversarial neural network model, to generate a fused scene video image, wherein the multi-parameter generative adversarial neural network model is obtained by training with an optimized loss function;
and a playing unit, configured to send the fused scene video image to the VR device and play it to the user on the display of the VR device.
Further, the physiological data comprise at least body temperature, brain wave, blood pressure, heart rate, electrocardiogram (ECG) and electromyogram (EMG) data.
Further, the key image sequence is extracted from the video images as follows: each frame of the video images is input into a first convolutional neural network for processing to obtain the key frame image sequence.
Further, the fusion unit operates as follows: the physiological data Pi corresponding to a key frame image Mi in the key frame image sequence is obtained from the server, where Pi = (Ti, NEi, BPi, HRi, HEi, MEi); the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi; the fusion coordinates of the key frame image Mi within the image frame Ni to be fused are determined; the key frame image Mi, the image frame Ni to be fused, the physiological data Pi and the fusion coordinates are input into the multi-parameter generative adversarial neural network model to generate a fused image frame; after all key frame images in the key frame image sequence have been processed, the generated fused image frames are combined into the fused scene video image, where n ≥ i ≥ 0, n is the total number of frames in the key frame image sequence, and Ti, NEi, BPi, HRi, HEi and MEi respectively denote the corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data.
Further, the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi as follows: the similarity between the posture of the virtual character in each frame of the scene video and the posture of the user in the key frame image Mi is evaluated, and the frame with the greatest similarity is taken as the image frame Ni to be fused.
Further, the generative adversarial neural network model includes a fused-image generator G and a fused-image discriminator D. The generator G and the discriminator D are trained alternately: each training sample in the sample set is input into the fused-image generator G to generate a fused image, and the fused-image discriminator D is used to discriminate the difference between the generated fused image and the real image.
Further, each training sample Sj in the sample set comprises a user image Uj, user physiological data Pj, a background image Bj to be fused, a fusion coordinate Cj and a real image RMj, where Pj = (Tj, NEj, BPj, HRj, HEj, MEj), m ≥ j ≥ 0, m is the number of training samples, and Tj, NEj, BPj, HRj, HEj and MEj respectively denote the corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data in the user physiological data Pj of the training sample Sj.
Further, the optimized loss function includes a loss function LossG of the fused image generator G and a loss function LossD of the fused image discriminator D, wherein:
[The expressions for LossG and LossD are given as equation images in the original publication and are not reproduced here.]
In these expressions, one symbol denotes the fused image generated by the fused-image generator G for a given input training sample; another denotes the recognition result of the fused-image discriminator D on that generated fused image; a third denotes the difference between the generated fused image and the real image RMj; and a fourth denotes the recognition result of the fused-image discriminator D on RMj. The trained adversarial network is obtained after several rounds of iterative training.
The invention also provides an image fusion method for the virtual immersive depression treatment system, wherein the virtual immersive depression treatment system comprises a camera, a VR device, a physiological data sensor and a server, and the method comprises the following steps:
the method comprises the steps of collecting a video image of a user by a camera and sending the video image to a server, extracting a key image sequence from the video image after the server receives the video image, and storing the key image sequence in a first cache queue;
a selecting step, namely sending the physiological data of the user, which is acquired by the physiological data sensor, to the server, and selecting a scene video from a scene database by the server based on the physiological data;
a fusion step, wherein the server fuses each frame image in the key frame image sequence in the first cache queue into the scene video to generate a fusion scene video image based on the physiological data by using a multi-parameter confrontation generation neural network model, wherein the multi-parameter confrontation generation neural network model is obtained by adopting optimized loss function training;
and a playing step, namely sending the fusion scene video image to the VR equipment, and playing the fusion scene video image to the user in a display device of the VR equipment.
Still further, the physiological data includes at least body temperature, brain waves, blood pressure, heart rate, electrocardiogram, and electromyogram data.
Further, the key image sequence is extracted from the video images as follows: each frame of the video images is input into a first convolutional neural network for processing to obtain the key frame image sequence.
Further, the fusing step operates as follows: the physiological data Pi corresponding to a key frame image Mi in the key frame image sequence is obtained from the server, where Pi = (Ti, NEi, BPi, HRi, HEi, MEi); the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi; the fusion coordinates of the key frame image Mi within the image frame Ni to be fused are determined; the key frame image Mi, the image frame Ni to be fused, the physiological data Pi and the fusion coordinates are input into the multi-parameter generative adversarial neural network model to generate a fused image frame; after all key frame images in the key frame image sequence have been processed, the generated fused image frames are combined into the fused scene video image, where n ≥ i ≥ 0, n is the total number of frames in the key frame image sequence, and Ti, NEi, BPi, HRi, HEi and MEi respectively denote the corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data.
Further, the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi as follows: the similarity between the posture of the virtual character in each frame of the scene video and the posture of the user in the key frame image Mi is evaluated, and the frame with the greatest similarity is taken as the image frame Ni to be fused.
Further, the multi-parameter generative adversarial neural network model includes a fused-image generator G and a fused-image discriminator D. The generator G and the discriminator D are trained alternately: each training sample in the sample set is input into the fused-image generator G to generate a fused image, and the fused-image discriminator D is used to discriminate the difference between the generated fused image and the real image.
Furthermore, each training sample Sj in the sample set comprises a user image Uj, user physiological data Pj, a background image Bj to be fused, a fusion coordinate Cj and a real image RMj, where Pj = (Tj, NEj, BPj, HRj, HEj, MEj), m ≥ j ≥ 0, m is the number of training samples, and Tj, NEj, BPj, HRj, HEj and MEj respectively denote the corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data in the user physiological data Pj of the training sample Sj.
Further, the optimized loss function includes a loss function LossG of the fused image generator G and a loss function LossD of the fused image discriminator D, wherein:
[The expressions for LossG and LossD are given as equation images in the original publication and are not reproduced here.]
In these expressions, one symbol denotes the fused image generated by the fused-image generator G for a given input training sample; another denotes the recognition result of the fused-image discriminator D on that generated fused image; a third denotes the difference between the generated fused image and the real image RMj; and a fourth denotes the recognition result of the fused-image discriminator D on RMj. The trained adversarial network is obtained after several rounds of iterative training.
The invention also proposes an electronic device comprising a processor and a memory connected to the processor, the memory storing program code; when the processor executes the program code in the memory, any of the methods mentioned above is carried out.
The present invention also proposes a computer-readable storage medium having stored thereon computer program code which, when executed by a computer, performs the method of any of the above.
The technical effects of the invention are as follows. The invention relates to an image fusion method, an electronic device and a storage medium for a virtual immersive depression treatment system, where the system comprises a camera, a VR device, a physiological data sensor and a server, and the method comprises: an acquisition step S101, in which the camera captures video images of the user and sends them to the server, which extracts a key image sequence from them and stores it in a first cache queue; a selection step S102, in which the user's physiological data collected by the physiological data sensor is sent to the server, which selects a scene video from a scene database based on the physiological data; a fusion step S103, in which the server fuses each frame image of the key frame image sequence in the first cache queue into the scene video based on the physiological data, using a multi-parameter generative adversarial neural network model trained with an optimized loss function, to generate a fused scene video image; and a playing step S104, in which the fused scene video image is sent to the VR device and played to the user on the display of the VR device. In other words, video images of the user (i.e. a depression patient) are collected, a key image sequence is extracted from them and stored in a first cache queue; the user's physiological data collected by the physiological data sensor are sent to the server, which selects a scene video from the scene database based on the physiological data; a multi-parameter generative adversarial neural network model, trained with an optimized loss function, fuses each frame image of the key frame image sequence in the first cache queue into the scene video based on the physiological data to generate a fused scene video image, which is then sent to the VR device and played to the user on its display.
Because the method extracts a key frame sequence from the patient's own images and then fuses it with frame images of the virtual scene, the patient's sense of immersion is enhanced, which solves the technical problem identified in the background art that cartoon characters in the scene video look unrealistic. In the invention, the similarity between the posture of the virtual character in each frame of the scene video and the posture of the user in the key frame image Mi is evaluated, and the frame with the greatest similarity is taken as the image frame Ni to be fused; in this way the user's motion is matched to the motion of the virtual human, so that the user replaces the virtual human in the video. The fusion coordinates of the key frame image Mi are then determined within the image frame Ni to be fused; because the fusion coordinates are fixed in advance, the amount of computation of the generative adversarial network model during fusion is reduced and the fusion speed is improved. The key frame image Mi, the image frame Ni to be fused, the physiological data Pi and the fusion coordinates are then input into the generative adversarial neural network model to generate a fused image frame, and after all key frame images in the key frame image sequence have been processed, the generated fused image frames are combined into the fused scene video image. Because the fusion coordinates are determined in advance, the computation of the generative adversarial network model is reduced, the fusion speed is improved, and the generated fused scene video better matches the user's motion, so the patient obtains a more vivid sense of immersion, which aids recovery from depression. The generative adversarial neural network model of the invention generates images from multiple parameters, part of which are the patient's physiological data, so that the user's image in the fused video better reflects the patient's condition.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
Fig. 1 is a flow diagram of an image fusion method for a virtual immersive depression treatment system according to an embodiment of the present invention.
Fig. 2 is a block diagram of an image fusion device for a virtual immersive depression treatment system according to an embodiment of the present invention.
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an image fusion method of the present invention for a virtual immersive depression treatment system including a camera, a VR device, a physiological data sensor, and a server, the method including:
the method comprises the following steps that S101, a camera collects video images of a user and sends the video images to a server, the server extracts a key image sequence from the video images after receiving the video images, and the key image sequence is stored in a first cache queue;
a selecting step S102, in which the physiological data of the user collected by the physiological data sensor is sent to the server, and the server selects a scene video from a scene database based on the physiological data;
a fusion step S103, fusing each frame image in the key frame image sequence in the first cache queue into the scene video by the server based on the physiological data by using a multi-parameter confrontation generation neural network model to generate a fusion scene video image, wherein the multi-parameter confrontation generation neural network model is obtained by adopting optimized loss function training;
and a playing step S104, sending the fusion scene video image to the VR equipment, and playing the fusion scene video image to the user in a display device of the VR equipment.
In this way, video images of the user (i.e. a depression patient) are collected, a key image sequence is extracted from them and stored in a first cache queue; the user's physiological data collected by the physiological data sensor are sent to the server, which selects a scene video from a scene database based on the physiological data; a multi-parameter generative adversarial neural network model, trained with an optimized loss function, fuses each frame image of the key frame image sequence in the first cache queue into the scene video based on the physiological data to generate a fused scene video image, which is sent to the VR device and played to the user on its display. Because a key frame sequence is extracted from the patient's own images and then fused with frame images of the virtual scene, the patient's sense of immersion is enhanced, which solves the technical problem identified in the background art that cartoon characters in the scene video look unrealistic.
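As a concrete illustration of steps S101 to S104, the following is a minimal server-side pipeline sketch in Python. The helper callables (extract_key_frames, select_scene_video, fuse_frame, play) are hypothetical stand-ins for the first convolutional neural network, the scene-database lookup, the multi-parameter generative adversarial model and the VR playback; none of these names come from the patent itself.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence

def run_session(
    camera_frames: Iterable,
    physiological_data: Sequence[float],
    extract_key_frames: Callable[[Iterable], List],        # first CNN (step S101)
    select_scene_video: Callable[[Sequence[float]], List],  # scene-database lookup (step S102)
    fuse_frame: Callable,                                    # multi-parameter GAN generator (step S103)
    play: Callable[[List], None],                            # VR playback (step S104)
) -> List:
    """Illustrative sketch of steps S101-S104; all callables are assumed stand-ins."""
    # S101: extract the key image sequence and store it in a first cache queue.
    key_frame_queue = deque(extract_key_frames(camera_frames))

    # S102: select a scene video from the scene database based on the physiological data.
    scene_video = select_scene_video(physiological_data)

    # S103: fuse every key frame of the queue into the scene video, conditioned on the physiology.
    fused_frames = [fuse_frame(m_i, scene_video, physiological_data) for m_i in key_frame_queue]

    # S104: send the fused scene video image to the VR device for playback.
    play(fused_frames)
    return fused_frames
```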
In a further embodiment, the physiological data comprise at least body temperature, brain wave, blood pressure, heart rate, electrocardiogram (ECG) and electromyogram (EMG) data, which are acquired by the physiological data sensors; the sensors may be arranged at different locations on the patient (user) to acquire these data.
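For concreteness, the per-frame physiological tuple Pi = (Ti, NEi, BPi, HRi, HEi, MEi) can be carried as a small record such as the sketch below; the field names and example units are illustrative assumptions, not prescribed by the patent.

```python
from dataclasses import dataclass

@dataclass
class PhysiologicalData:
    """Per-frame physiological tuple Pi = (Ti, NEi, BPi, HRi, HEi, MEi); units are assumptions."""
    body_temperature: float   # Ti, e.g. degrees Celsius
    brain_wave: float         # NEi, e.g. dominant EEG band power
    blood_pressure: float     # BPi, e.g. mean arterial pressure in mmHg
    heart_rate: float         # HRi, beats per minute
    ecg: float                # HEi, electrocardiogram feature
    emg: float                # MEi, electromyogram feature

    def as_tuple(self):
        return (self.body_temperature, self.brain_wave, self.blood_pressure,
                self.heart_rate, self.ecg, self.emg)
```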
In a further embodiment, the key image sequence is extracted from the video images as follows: each frame of the video images is input into a first convolutional neural network for processing to obtain the key frame image sequence. Neural network techniques for key frame extraction are mature; key frames can be extracted by constructing a convolutional neural network model and training it with a corresponding training sample set.
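The patent does not specify the architecture of the first convolutional neural network. The sketch below assumes one common approach: a small CNN scores each frame and frames whose score exceeds a threshold are kept as key frames. The network layout and the threshold are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class KeyFrameScorer(nn.Module):
    """Illustrative 'first convolutional neural network': scores how key-frame-like a frame is."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):                                   # x: (N, 3, H, W), values in [0, 1]
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.head(h)).squeeze(1)       # per-frame score in [0, 1]

def extract_key_frames(frames, scorer, threshold=0.5):
    """Keep frames whose score exceeds the threshold (the threshold value is an assumption)."""
    with torch.no_grad():
        scores = scorer(torch.stack(frames))
    return [f for f, s in zip(frames, scores) if s.item() > threshold]
```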
In a further embodiment, the fusing step operates as follows: the physiological data Pi corresponding to a key frame image Mi in the key frame image sequence is obtained from the server, where Pi = (Ti, NEi, BPi, HRi, HEi, MEi); the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi; the fusion coordinates of the key frame image Mi within the image frame Ni to be fused are determined; the key frame image Mi, the image frame Ni to be fused, the physiological data Pi and the fusion coordinates are input into the generative adversarial neural network model to generate a fused image frame; after all key frame images in the key frame image sequence have been processed, the generated fused image frames are combined into the fused scene video image, where n ≥ i ≥ 0, n is the total number of frames in the key frame image sequence, and Ti, NEi, BPi, HRi, HEi and MEi respectively denote the corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data.
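Expressed as a Python sketch, the per-frame fusion described above might look as follows. The callables find_frame_to_fuse, find_fusion_coords and generator stand in for the pose-similarity matching, the coordinate determination and the multi-parameter GAN generator; they are assumptions passed in by the caller rather than implementations prescribed by the patent.

```python
def fuse_key_frames(key_frames, physiological, scene_frames,
                    find_frame_to_fuse, find_fusion_coords, generator):
    """Illustrative per-frame fusion loop; the three callables are hypothetical stand-ins.

    key_frames:    [M_0 .. M_n]  key frame images
    physiological: [P_0 .. P_n]  per-frame tuples (T, NE, BP, HR, HE, ME)
    scene_frames:  frames of the selected scene video
    """
    fused_video = []
    for m_i, p_i in zip(key_frames, physiological):
        n_i = find_frame_to_fuse(m_i, scene_frames)            # pose-similarity match
        coords = find_fusion_coords(m_i, n_i)                  # where M_i is placed inside N_i
        fused_video.append(generator(m_i, n_i, p_i, coords))   # multi-parameter GAN generator
    return fused_video
```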
In a further embodiment, the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi as follows: the similarity between the posture of the virtual character in each frame of the scene video and the posture of the user in the key frame image Mi is evaluated, and the frame with the greatest similarity is taken as the image frame Ni to be fused.
In the invention, the similarity between the posture of the virtual character in each frame of the scene video and the posture of the user in the key frame image Mi is evaluated, and the frame with the greatest similarity is taken as the image frame Ni to be fused; in this way the user's motion is matched to the motion of the virtual human, so that the user replaces the virtual human in the video. The fusion coordinates of the key frame image Mi are then determined within the image frame Ni to be fused; because the fusion coordinates are fixed in advance, the amount of computation of the generative adversarial network model during fusion is reduced and the fusion speed is improved. The key frame image Mi, the image frame Ni to be fused, the physiological data Pi and the fusion coordinates are then input into the generative adversarial neural network model to generate a fused image frame, and after all key frame images in the key frame image sequence have been processed, the generated fused image frames are combined into the fused scene video image. Because the fusion coordinates are determined in advance, the computation of the generative adversarial network model is reduced, the fusion speed is improved, and the generated fused scene video better matches the user's motion, so the patient obtains a more vivid sense of immersion, which aids recovery from depression; this is another important inventive point of the invention.
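The patent does not fix a particular pose representation or similarity measure. The sketch below assumes each posture is given as an array of 2-D joint coordinates (for example from an off-the-shelf pose estimator) and uses the negative mean joint distance as the similarity, which is one common choice among many; both the representation and the measure are assumptions.

```python
import numpy as np

def pose_similarity(pose_a: np.ndarray, pose_b: np.ndarray) -> float:
    """Similarity of two postures given as (num_joints, 2) arrays; higher is more similar.
    Negative mean joint distance is an assumption, not the patented measure."""
    return -float(np.mean(np.linalg.norm(pose_a - pose_b, axis=1)))

def find_frame_to_fuse(user_pose: np.ndarray, scene_poses: list) -> int:
    """Return the index of the scene frame N_i whose virtual-character posture is most similar
    to the user's posture in the key frame image M_i."""
    scores = [pose_similarity(user_pose, p) for p in scene_poses]
    return int(np.argmax(scores))
```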
In a further embodiment, the multi-parameter generative adversarial neural network model includes a fused-image generator G and a fused-image discriminator D. The generator G and the discriminator D are trained alternately: each training sample of the sample set is input into the fused-image generator G to generate a fused image, and the fused-image discriminator D is used to discriminate the difference between the generated fused image and the real image.
In a further embodiment, each training sample Sj in the sample set includes a user image Uj, user physiological data Pj, a background image Bj to be fused, a fusion coordinate Cj and a real image RMj, wherein Pj = (Tj, NEj, BPj, HRj, HEj, MEj), wherein m ≧ j ≧ 0, m is the number of training samples, and Tj, NEj, BPj, HRj, HEj, MEj respectively represent corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data in the user physiological data Pj in the training sample Sj.
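The alternating training of the fused-image generator G and discriminator D can be sketched in PyTorch as below. Only the alternation of discriminator and generator updates and the sample structure (Uj, Pj, Bj, Cj, RMj) come from the description; the use of binary cross-entropy, the L1 reconstruction term and the way the conditioning inputs are passed to G are illustrative assumptions (the patented loss itself is discussed in the next paragraph).

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, batch):
    """One alternating update of the fused-image discriminator D and generator G.

    batch holds the fields of a training sample Sj:
      'U' user image, 'P' physiological data, 'B' background image to be fused,
      'C' fusion coordinates, 'RM' real (ground-truth) fused image.
    """
    U, P, B, C, RM = batch['U'], batch['P'], batch['B'], batch['C'], batch['RM']
    fake = G(U, P, B, C)                                   # generated fused image

    # Discriminator step: push real images toward 1 and generated images toward 0.
    opt_D.zero_grad()
    d_real = D(RM)
    d_fake = D(fake.detach())
    loss_D = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    loss_D.backward()
    opt_D.step()

    # Generator step: fool the discriminator and stay close to the real fused image RMj.
    opt_G.zero_grad()
    d_fake_for_g = D(fake)
    loss_G = (F.binary_cross_entropy(d_fake_for_g, torch.ones_like(d_fake_for_g)) +
              F.l1_loss(fake, RM))
    loss_G.backward()
    opt_G.step()
    return loss_G.item(), loss_D.item()
```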
In a further embodiment, the optimized loss function comprises a loss function LossG of the fused image generator G and a loss function LossD of the fused image discriminator D, wherein:
[The expressions for LossG and LossD are given as equation images in the original publication and are not reproduced here.]
In these expressions, one symbol denotes the fused image generated by the fused-image generator G for a given input training sample; another denotes the recognition result of the fused-image discriminator D on that generated fused image; a third denotes the difference between the generated fused image and the real image RMj; and a fourth denotes the recognition result of the fused-image discriminator D on RMj. The trained adversarial network is obtained after several rounds of iterative training.
The generative adversarial neural network model of the invention generates images from multiple parameters, part of which are the patient's physiological data, so that the user's image in the fused video better reflects the patient's condition. The invention also improves the loss function of the generative adversarial neural network model: the generator loss is modified based on a comparison between the mean and the maximum of the physiological parameters, and the discriminator loss is modified based on a comparison between the minimum and the mean of the physiological parameters. Through these improvements the loss function reflects the multi-parameter character of the model, so the neural network trains faster and the generated fused image comes closer to the patient's real condition; the improved loss function is therefore another important inventive point of the invention.
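The exact expressions for LossG and LossD appear only as equation images in the original publication and cannot be reproduced here. Purely as an illustration of the stated idea — the generator loss adjusted by comparing the mean and the maximum of the physiological parameters, the discriminator loss adjusted by comparing the minimum and the mean — standard GAN losses could be weighted along the following lines. Every formula in this sketch is an assumption, not the patented loss.

```python
import numpy as np

def physio_weight(p, mode="generator"):
    """Illustrative physiological weighting term (assumption, not the patented formula).

    p: 1-D array of physiological parameters (T, NE, BP, HR, HE, ME), normalized to [0, 1].
    Generator side compares the mean with the maximum; discriminator side compares the
    minimum with the mean, as the description suggests.
    """
    p = np.asarray(p, dtype=float)
    if mode == "generator":
        return 1.0 + (p.max() - p.mean())
    return 1.0 + (p.mean() - p.min())

def loss_g(d_fake, g_out, real, p, lam=1.0):
    """Generator loss: adversarial term plus a difference term against the real image RMj,
    scaled by the generator-side physiological weight."""
    adv = -np.log(d_fake + 1e-8)                 # fool the discriminator
    rec = np.mean(np.abs(g_out - real))          # difference to the real image
    return physio_weight(p, "generator") * (adv + lam * rec)

def loss_d(d_real, d_fake, p):
    """Discriminator loss: binary cross-entropy scaled by the discriminator-side weight."""
    bce = -np.log(d_real + 1e-8) - np.log(1.0 - d_fake + 1e-8)
    return physio_weight(p, "discriminator") * bce
```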
Fig. 2 shows an image fusion device of the present invention for a virtual immersive depression treatment system including a camera, a VR device, a physiological data sensor, and a server, the device including:
an acquisition unit 201, through which the camera captures video images of the user and sends them to the server; after receiving the video images, the server extracts a key image sequence from them and stores it in a first cache queue;
a selection unit 202, through which the user's physiological data collected by the physiological data sensor is sent to the server, and the server selects a scene video from a scene database based on the physiological data;
a fusion unit 203, in which the server fuses each frame image of the key frame image sequence in the first cache queue into the scene video based on the physiological data, using a multi-parameter generative adversarial neural network model trained with an optimized loss function, to generate a fused scene video image;
and a playing unit 204, which sends the fused scene video image to the VR device and plays it to the user on the display of the VR device.
In this way, video images of the user (i.e. a depression patient) are collected, a key image sequence is extracted from them and stored in a first cache queue; the user's physiological data collected by the physiological data sensor are sent to the server, which selects a scene video from a scene database based on the physiological data; a multi-parameter generative adversarial neural network model, trained with an optimized loss function, fuses each frame image of the key frame image sequence in the first cache queue into the scene video based on the physiological data to generate a fused scene video image, which is sent to the VR device and played to the user on its display. Because a key frame sequence is extracted from the patient's own images and then fused with frame images of the virtual scene, the patient's sense of immersion is enhanced, which solves the technical problem identified in the background art that cartoon characters in the scene video look unrealistic.
In a further embodiment, the physiological data comprise at least body temperature, brain wave, blood pressure, heart rate, electrocardiogram (ECG) and electromyogram (EMG) data, which are acquired by the physiological data sensors; the sensors may be arranged at different locations on the patient (user) to acquire these data.
In a further embodiment, the key image sequence is extracted from the video images as follows: each frame of the video images is input into a first convolutional neural network for processing to obtain the key frame image sequence. Neural network techniques for key frame extraction are mature; key frames can be extracted after a convolutional neural network model has been constructed and trained with a corresponding training sample set.
In a further embodiment, the fusion unit operates as follows: the physiological data Pi corresponding to a key frame image Mi in the key frame image sequence is obtained from the server, where Pi = (Ti, NEi, BPi, HRi, HEi, MEi); the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi; the fusion coordinates of the key frame image Mi within the image frame Ni to be fused are determined; the key frame image Mi, the image frame Ni to be fused, the physiological data Pi and the fusion coordinates are input into the generative adversarial neural network model to generate a fused image frame; after all key frame images in the key frame image sequence have been processed, the generated fused image frames are combined into the fused scene video image, where n ≥ i ≥ 0, n is the total number of frames in the key frame image sequence, and Ti, NEi, BPi, HRi, HEi and MEi respectively denote the corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data.
In a further embodiment, the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi as follows: the similarity between the posture of the virtual character in each frame of the scene video and the posture of the user in the key frame image Mi is evaluated, and the frame with the greatest similarity is taken as the image frame Ni to be fused.
In the invention, the similarity between the posture of the virtual character in each frame of the scene video and the posture of the user in the key frame image Mi is evaluated, and the frame with the greatest similarity is taken as the image frame Ni to be fused; in this way the user's motion is matched to the motion of the virtual human, so that the user replaces the virtual human in the video. The fusion coordinates of the key frame image Mi are then determined within the image frame Ni to be fused; because the fusion coordinates are fixed in advance, the amount of computation of the generative adversarial network model during fusion is reduced and the fusion speed is improved. The key frame image Mi, the image frame Ni to be fused, the physiological data Pi and the fusion coordinates are then input into the generative adversarial neural network model to generate a fused image frame, and after all key frame images in the key frame image sequence have been processed, the generated fused image frames are combined into the fused scene video image. Because the fusion coordinates are determined in advance, the computation of the generative adversarial network model is reduced, the fusion speed is improved, and the generated fused scene video better matches the user's motion, so the patient obtains a more vivid sense of immersion, which aids recovery from depression; this is another important inventive point of the invention.
In a further embodiment, the multi-parameter generative adversarial neural network model includes a fused-image generator G and a fused-image discriminator D. The generator G and the discriminator D are trained alternately: each training sample of the sample set is input into the fused-image generator G to generate a fused image, and the fused-image discriminator D is used to discriminate the difference between the generated fused image and the real image.
In a further embodiment, each training sample Sj in the sample set includes a user image Uj, user physiological data Pj, a background image Bj to be fused, a fusion coordinate Cj and a real image RMj, wherein Pj = (Tj, NEj, BPj, HRj, HEj, MEj), wherein m ≧ j ≧ 0, m is the number of training samples, and Tj, NEj, BPj, HRj, HEj, MEj respectively represent corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data in the user physiological data Pj in the training sample Sj.
In a further embodiment, the optimized loss function comprises a loss function LossG of the fused image generator G and a loss function LossD of the fused image discriminator D, wherein:
[The expressions for LossG and LossD are given as equation images in the original publication and are not reproduced here.]
In these expressions, one symbol denotes the fused image generated by the fused-image generator G for a given input training sample; another denotes the recognition result of the fused-image discriminator D on that generated fused image; a third denotes the difference between the generated fused image and the real image RMj; and a fourth denotes the recognition result of the fused-image discriminator D on RMj. The trained adversarial network is obtained after several rounds of iterative training.
The generative adversarial neural network model of the invention generates images from multiple parameters, part of which are the patient's physiological data, so that the user's image in the fused video better reflects the patient's condition. The invention improves the loss function of the generative adversarial neural network model: the traditional generator loss is modified based on a comparison between the mean and the maximum of the physiological parameters, and the traditional discriminator loss is modified based on a comparison between the minimum and the mean of the physiological parameters. Through these improvements the loss function reflects the multi-parameter character of the model, so the neural network trains faster and the generated fused image comes closer to the patient's real condition.
Fig. 3 shows an electronic device of the invention comprising a processor and a memory coupled to the processor, the memory storing program code; when the processor executes the program code in the memory, any of the above-mentioned methods is performed. The electronic device may be any of a variety of computers, handheld devices, distributed computers, and the like.
For convenience of description, the above device is described as being divided into various units. Of course, when implementing the present application, the functionality of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present application, or the portions thereof contributing to the prior art, may be embodied in the form of a software product stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disk, which includes a number of instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in parts of the embodiments of the present application.
Finally, it should be noted that although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalents may be made without departing from the spirit and scope of the invention, and the claims are intended to cover any such modifications and equivalents.

Claims (8)

1. An image fusion device for a virtual immersive depression treatment system, the virtual immersive depression treatment system including a camera, a VR device, a physiological data sensor, and a server, the device comprising:
an acquisition unit, configured to have the camera capture video images of the user and send them to the server, wherein after receiving the video images the server extracts a key image sequence from them and stores it in a first cache queue;
a selection unit, configured to send the user's physiological data collected by the physiological data sensor to the server, wherein the server selects a scene video from a scene database based on the physiological data;
a fusion unit, configured to have the server fuse each frame image of the key frame image sequence in the first cache queue into the scene video based on the physiological data, using a multi-parameter generative adversarial neural network model, to generate a fused scene video image, wherein the multi-parameter generative adversarial neural network model is obtained by training with an optimized loss function;
and a playing unit, configured to send the fused scene video image to the VR device and play it to the user on the display of the VR device.
2. The apparatus of claim 1, wherein the physiological data includes at least body temperature, brain waves, blood pressure, heart rate, electrocardiogram and electromyogram data.
3. The device according to claim 2, characterized in that the key image sequence is extracted from said video images as follows: each frame of the video images is input into a first convolutional neural network for processing to obtain the key frame image sequence.
4. The apparatus of claim 3, wherein the fusion unit operates as follows: the physiological data Pi corresponding to a key frame image Mi in the key frame image sequence is obtained from the server, where Pi = (Ti, NEi, BPi, HRi, HEi, MEi); the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi; the fusion coordinates of the key frame image Mi within the image frame Ni to be fused are determined; the key frame image Mi, the image frame Ni to be fused, the physiological data Pi and the fusion coordinates are input into the multi-parameter generative adversarial neural network model to generate a fused image frame; and after all key frame images in the key frame image sequence have been processed, the generated fused image frames are combined into the fused scene video image, wherein n ≥ i ≥ 0, n is the total number of frames in the key frame image sequence, and Ti, NEi, BPi, HRi, HEi and MEi respectively denote the corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data.
5. The device according to claim 4, wherein the corresponding image frame Ni to be fused in the scene video is determined based on the key frame image Mi as follows: the similarity between the posture of the virtual character in each frame of the scene video and the posture of the user in the key frame image Mi is evaluated, and the frame with the greatest similarity is taken as the image frame Ni to be fused.
6. The apparatus of claim 5, wherein the multi-parameter generative adversarial neural network model comprises a fused image generator G and a fused image discriminator D, wherein the fused image generator G and the fused image discriminator D are trained alternately, wherein each training sample in the sample set is input to the fused image generator G to generate a fused image, and wherein the fused image discriminator D is configured to discriminate a difference between the generated fused image and a real image.
7. The apparatus according to claim 6, wherein each training sample Sj in the sample set comprises a user image Uj, user physiological data Pj, a background image Bj to be fused, a fusion coordinate Cj and a real image RMj, wherein Pj = (Tj, NEj, BPj, HRj, HEj, MEj), wherein m ≧ j ≧ 0, m is the number of training samples, and Tj, NEj, BPj, HRj, HEj, MEj respectively represent corresponding body temperature, brain wave, blood pressure, heart rate, electrocardiogram and electromyogram data in the user physiological data Pj in the training sample Sj.
8. The apparatus of claim 7, wherein the optimized loss function comprises a loss function LossG of the fused image generator G and a loss function LossD of the fused image discriminator D, wherein:
[The expressions for LossG and LossD are given as equation images in the original publication and are not reproduced here.]
In these expressions, one symbol denotes the fused image generated by the fused image generator G for a given input training sample; another denotes the recognition result of the fused image discriminator D on that generated fused image; a third denotes the difference between the generated fused image and the real image RMj; and a fourth denotes the recognition result of the fused image discriminator D on RMj, the trained adversarial network being obtained after a plurality of rounds of iterative training.
CN202310054924.9A 2023-02-03 2023-02-03 Image fusion device for virtual immersion type depression treatment system Active CN115810099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310054924.9A CN115810099B (en) 2023-02-03 2023-02-03 Image fusion device for virtual immersion type depression treatment system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310054924.9A CN115810099B (en) 2023-02-03 2023-02-03 Image fusion device for virtual immersion type depression treatment system

Publications (2)

Publication Number Publication Date
CN115810099A true CN115810099A (en) 2023-03-17
CN115810099B CN115810099B (en) 2023-05-16

Family

ID=85487809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310054924.9A Active CN115810099B (en) 2023-02-03 2023-02-03 Image fusion device for virtual immersion type depression treatment system

Country Status (1)

Country Link
CN (1) CN115810099B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106303289A (en) * 2015-06-05 2017-01-04 福建凯米网络科技有限公司 Method, apparatus and system for fused display of a real object and a virtual scene
CN107463780A (en) * 2017-08-04 2017-12-12 南京乐朋电子科技有限公司 3D virtual autism treatment system and treatment method
CN111027425A (en) * 2019-11-28 2020-04-17 深圳市木愚科技有限公司 Intelligent expression synthesis feedback interaction system and method
US20210243383A1 (en) * 2019-03-06 2021-08-05 Tencent Technology (Shenzhen) Company Limited Video synthesis method, model training method, device, and storage medium


Also Published As

Publication number Publication date
CN115810099B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
Altaheri et al. Deep learning techniques for classification of electroencephalogram (EEG) motor imagery (MI) signals: A review
CN108446020B (en) Motor imagery idea control method fusing visual effect and deep learning and application
WO2021043118A1 (en) Motor imagery electroencephalogram signal processing method, device, and storage medium
CN107492099B (en) Medical image analysis method, medical image analysis system, and storage medium
Piana et al. Adaptive body gesture representation for automatic emotion recognition
Zhang Automated biometrics: Technologies and systems
CN107485844A (en) A kind of limb rehabilitation training method, system and embedded device
CN110298286B (en) Virtual reality rehabilitation training method and system based on surface myoelectricity and depth image
CN111785366B (en) Patient treatment scheme determination method and device and computer equipment
CN111881838A (en) Dyskinesia assessment video analysis method and equipment with privacy protection function
CN114998983A (en) Limb rehabilitation method based on augmented reality technology and posture recognition technology
CN117438048B (en) Method and system for assessing psychological disorder of psychiatric patient
CN111389008A (en) Face generation method of virtual character, automatic face pinching method and device
CN112101424A (en) Generation method, identification device and equipment of retinopathy identification model
CN113703574A (en) VR medical learning method and system based on 5G
CN115227234A (en) Cardiopulmonary resuscitation pressing action evaluation method and system based on camera
CN115101191A (en) Parkinson disease diagnosis system
CN113593671B (en) Automatic adjustment method and device of virtual rehabilitation game based on Leap Motion gesture recognition
CN113749656B (en) Emotion recognition method and device based on multidimensional physiological signals
CN115311737A (en) Method for recognizing hand motion of non-aware stroke patient based on deep learning
CN111312363B (en) Double-hand coordination enhancement system based on virtual reality
CN116525061B (en) Training monitoring method and system based on remote human body posture assessment
CN112215962A (en) Virtual reality emotional stimulation system and creating method thereof
CN115810099B (en) Image fusion device for virtual immersion type depression treatment system
CN111991808A (en) Face model generation method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant