CN109816630A - FMRI visual coding model building method based on transfer learning - Google Patents


Info

Publication number
CN109816630A
Authority
CN
China
Prior art keywords
model
response
visual
voxel
fmri
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811570733.3A
Other languages
Chinese (zh)
Other versions
CN109816630B (en)
Inventor
闫镔
张驰
于子雅
段晓菡
童莉
王林元
高辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN201811570733.3A
Publication of CN109816630A
Application granted
Publication of CN109816630B
Active legal status
Anticipated expiration legal status


Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of visual information processing, and in particular relates to a transfer-learning-based method for constructing an fMRI visual encoding model. The method comprises: acquiring visual stimulus image data and the corresponding fMRI data set, taking the stimulus images as the input of the encoding model and the visual-area fMRI responses as its output; extracting CNN features of the stimulus images with a deep convolutional neural network model and constructing a visual response encoding model for each visual area; and training the fully connected response model within the visual encoding model by constructing a dynamic loss function, then selecting the optimal visual encoding model for each voxel of the cerebral visual areas according to its response pattern. The invention extracts features of fMRI visual-experiment stimuli with a deep neural network model pre-trained on a large data set, maps the extracted features to the responses of individual voxels through an effective and reasonable nonlinear mapping, and thereby constructs a higher-precision visual encoding model that improves the accuracy of predicting cerebral visual-area voxel responses.

Description

FMRI visual coding model building method based on transfer learning
Technical field
The invention belongs to the technical field of visual information processing, and in particular relates to a transfer-learning-based fMRI visual encoding model construction method.
Background technique
The human brain is the highest-level part of the nervous system, and the cerebral cortex is its most developed component: it is the organ of thought and the material basis of higher neural activity. Among the many kinds of external information the brain receives, visual information is one of the main channels for perceiving and understanding the world, and the brain processes visual information with high efficiency and robustness. fMRI-based visual encoding research takes the visual information processing mechanism of the human brain as its theoretical basis: it establishes a computable encoding model to simulate brain activity actually measured by fMRI, thereby predicting brain functional activity, further elucidating the human visual processing mechanism, and moving toward "brain-like" computation. Deep neural networks grew out of early explorations of network structures modeled on the visual system. Neuroscience studies have shown that deep neural networks process visual information in a manner similar to the human visual system: when processing an input image, they learn the hierarchy of relationships implicit in the data by mapping low-level signals to high-level features. How to extract features of fMRI visual-experiment stimuli with a deep neural network model pre-trained on a large data set, map the extracted features to the responses of individual voxels through an effective and reasonable nonlinear mapping, and thereby construct a higher-precision visual encoding model, is of great importance for simulating the human visual pathway and exploring visual mechanisms.
Summary of the invention
To this end, the present invention provides a transfer-learning-based fMRI visual encoding model construction method that builds a nonlinear mapping between feature space and response space and improves the prediction accuracy of the visual encoding model.
According to the design scheme provided by the present invention, a transfer-learning-based fMRI visual encoding model construction method comprises the following:
acquiring visual stimulus image data and the corresponding fMRI data set, taking the stimulus images as the input of the encoding model and the visual-area fMRI responses as its output;
extracting CNN features of the stimulus images with a deep convolutional neural network model, and constructing a visual response encoding model for each visual area;
for the training data set, training the fully connected response model within the visual encoding model by constructing a dynamic loss function, and selecting the optimal visual encoding model for each voxel of the cerebral visual areas according to its visual response pattern.
In the above method, for CNN feature extraction, an AlexNet network model is first established with its model parameters fixed; the CNN features of the stimulus images are extracted from different layers, and two fully connected layers are appended to construct the visual encoding model.
In the above method, training the fully connected response model within the visual encoding model by constructing a dynamic loss function comprises: for the training data set, establishing a nonlinear mapping response model between feature space and response space, and constructing a dynamic loss function to select voxels, so as to better train the response model within the visual encoding model.
Preferably, the dynamic loss function is expressed as:
Loss = -r1*R + r2*L2(W)
where R is the correlation between the predicted and true voxel responses, W denotes the network-layer parameters, L2(W) is the L2 regularization function, and r2 is the regularization coefficient. The coefficient r1 is set dynamically according to R0, the correlation between predicted and true voxel responses on the validation set: r1 = 1 when R0 ≥ l, r1 varies linearly with R0 when 0 ≤ R0 < l, and r1 = 0 when R0 < 0, where l is a significance threshold obtained in advance by a permutation test.
In the above method, the stimulus images are randomly divided into a training set and a validation set; the two fully connected layers of the response model are trained with the loss function, and after each epoch on the training set the parameter r1 is updated on the validation set. Through several rounds of parameter-update iterations the loss function is minimized and the response model is obtained.
Preferably, with the loss function as the constraint, the trained response model and the effectively encoded voxels are obtained: when the correlation between a predicted voxel response and the true voxel response exceeds l, the encoding is judged significant and effective at significance level m, where m is a value derived from l.
In the above method, for the training set, a transfer-learning encoding model is constructed with the pre-trained neural network; with the dynamic loss function as the constraint, the trained response model and the effectively encoded voxels are obtained; for each effective voxel, the model with the highest prediction accuracy is chosen from the trained encoding models built on each CNN layer's features, as the optimal visual encoding model for the corresponding visual-area voxel; finally, the encoding models are evaluated on the test set.
Beneficial effects of the present invention:
Addressing the situation that the available fMRI data volume is too small to train a deep neural network suitable for feature extraction, and that linearly mapping the extracted features to predicted responses yields insufficient prediction accuracy in higher visual areas, the present invention extracts CNN features of the stimulus images with a deep convolutional neural network model pre-trained by transfer learning; for the training data set, the fully connected response model within the neural network model is obtained through the loss function, and the optimal visual encoding model is chosen for each voxel of the cerebral visual areas according to its visual response pattern. By extracting features of fMRI visual-experiment stimuli with a deep neural network model pre-trained on a large data set and mapping the extracted features to the responses of individual voxels through an effective and reasonable nonlinear mapping, a higher-precision visual encoding model is constructed, which is of great importance for simulating the human visual pathway and exploring visual mechanisms.
Description of the drawings:
Fig. 1 is a flow diagram of the model construction method in the embodiment;
Fig. 2 is a flow chart of visual encoding model training in the embodiment;
Fig. 3 is a schematic diagram of the loss-function parameter value variation in the embodiment;
Fig. 4 is a histogram comparing encoding accuracy with the GWP model for voxels in different visual areas in the embodiment;
Fig. 5 is a histogram comparing encoding accuracy with the CNN linear model for voxels in different visual areas in the embodiment.
Specific embodiments:
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and the technical solutions.
Studying the visual function of the human brain, elucidating its visual information processing mechanism, and simulating the human visual information processing procedure with computational methods has become one of the hot topics of cognitive neuroscience research; it can provide new ideas for improving machine intelligence and problem-solving ability. The computational model commonly used to simulate human visual information processing is called an encoding model; an encoding model explicitly transforms complex stimuli into neural responses. In recent years, owing to the rapid development of neuroimaging, research on human visual function using non-invasive means such as magnetoencephalography (MEG), electroencephalography/event-related potentials (EEG/ERP), functional near-infrared spectroscopy (fNIRS), and functional magnetic resonance imaging (fMRI) has achieved a series of major results. Functional magnetic resonance imaging, because it offers both relatively high temporal resolution and high spatial resolution, has become the best means of observing brain functional activity and exploring the secrets of vision.
A visual information encoding model can be abstracted as a two-step transformation. The first step is a nonlinear transformation from stimulus space to feature space, i.e., a feature-extraction process. The second step is a transformation from feature space to voxel space. Specifically, an encoding model mainly consists of four parts. The first part is the series of visual stimuli or task conditions used in the experiment. The second part is the set of features extracted by means such as Gabor filters, dictionaries, or neural networks, which describes the abstract relationship between visual stimuli and brain functional responses. The third part is one or more regions of interest in the brain, from which the voxels needed to build the model can be chosen. The last part is the algorithm that estimates the encoding-model parameters from the data. The second-step transformation of an encoding model is usually assumed to be linear; this assumption presumes a good linear-fit relationship between the features extracted in the first step and the actual voxel responses, but in practice a linear mapping is often a compromise forced by sample size and computation. Establishing an effective and reasonable nonlinear mapping to improve the prediction accuracy of the encoding model is therefore of significant research value. In 2014, Pulkit Agrawal et al. first proposed a convolutional-neural-network-based model that explains high-level visual function directly from low-level visual input (i.e., pixels); the results showed that the encoding model accurately predicts brain activity in many low- and high-level visual areas. In 2015, van Gerven et al. experimentally verified the similarity in structure and information processing between deep neural networks and the ventral and dorsal streams of brain vision. A high-performance deep neural network requires massive data to train; however, fMRI imaging is not fast, its signal-to-noise ratio is relatively low, experiments place high demands on subjects and last a long time, so the amount of valid data obtained falls far short of what training a high-performance deep neural network demands. For this purpose, the embodiment of the present invention proposes an fMRI visual encoding model based on transfer learning. Transfer learning, put simply, applies what has been learned in a source domain to a new domain; it can address the small-data problem by transferring a model built for big data to small data. As shown in Fig. 1, an embodiment of the present invention provides a transfer-learning-based fMRI visual encoding model construction method comprising the following:
S101: acquire visual stimulus image data and the corresponding fMRI data set, taking the stimulus images as the input of the encoding model and the visual-area fMRI responses as its output;
S102: extract CNN features of the stimulus images with a deep convolutional neural network model, and construct a visual response encoding model for each visual area;
S103: train the fully connected response model within the visual encoding model by constructing a dynamic loss function, and select the optimal visual encoding model for each voxel of the cerebral visual areas according to its visual response pattern.
For CNN feature extraction, an AlexNet network model is first established with its model parameters fixed; the CNN features of the stimulus images are extracted from different layers, and two fully connected layers are appended to construct the visual encoding model. Preferably, training the fully connected response model within the visual encoding model by constructing a dynamic loss function comprises: for the training data set, establishing a nonlinear mapping response model between feature space and response space, and constructing a dynamic loss function to select voxels, so as to better train the response model within the visual encoding model.
In the above, the loss function is expressed as: Loss = -r1*R + r2*L2(W), where R is the correlation between the predicted and true voxel responses, W denotes the network-layer parameters, L2(W) is the L2 regularization function, and r2 is the regularization coefficient. The coefficient r1 is set dynamically according to R0, the correlation between predicted and true voxel responses on the validation set: r1 = 1 when R0 ≥ l, r1 varies linearly with R0 when 0 ≤ R0 < l, and r1 = 0 when R0 < 0, where l is a significance threshold obtained in advance by a permutation test.
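The dynamic loss Loss = -r1*R + r2*L2(W) could be computed as in the sketch below, assuming a PyTorch implementation with one correlation term per voxel; the value of r2 and the per-voxel handling of r1 are illustrative assumptions, as the patent leaves them unspecified.

```python
import torch

def dynamic_loss(pred, target, fc_weights, r1, r2=1e-4):
    """Loss = -r1*R + r2*L2(W).

    pred, target: (n_samples, n_voxels) tensors; fc_weights: iterable of the
    fully-connected-layer parameters W; r1: per-voxel dynamic coefficient.
    r2 = 1e-4 is an assumed value, not given in the patent."""
    p = pred - pred.mean(dim=0)
    t = target - target.mean(dim=0)
    # per-voxel Pearson correlation R between predicted and true responses
    R = (p * t).sum(dim=0) / (p.norm(dim=0) * t.norm(dim=0) + 1e-8)
    l2 = sum((w ** 2).sum() for w in fc_weights)   # L2(W) regularization term
    return -(r1 * R).sum() + r2 * l2
```

Maximizing correlation (rather than minimizing squared error) directly targets the Pearson-correlation metric on which the encoding models are later evaluated.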
In the above, the stimulus images are randomly divided into a training set and a validation set; the two fully connected layers of the response model are trained with the loss function, and after each epoch on the training set the parameter r1 is updated on the validation set. Through several rounds of parameter-update iterations the loss function is minimized, obtaining the response model.
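The alternating scheme described above — one pass over the training set, then an r1 update on the validation set — could look like the following sketch. PyTorch and full-batch training are assumed; the optimizer, learning rate, and the linear r1 rule R0/l on the middle segment are illustrative assumptions.

```python
import torch
import torch.nn as nn

def pearson_per_voxel(pred, target, eps=1e-8):
    """Per-voxel Pearson correlation between predicted and true responses."""
    p = pred - pred.mean(dim=0)
    t = target - target.mean(dim=0)
    return (p * t).sum(dim=0) / (p.norm(dim=0) * t.norm(dim=0) + eps)

def train_response_model(model, x_tr, y_tr, x_val, y_val,
                         l=0.27, r2=1e-4, epochs=20, lr=1e-3):
    """Alternate one epoch of training with an r1 refresh on the validation set."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    r1 = torch.ones(y_tr.shape[1])                    # r1 initialised to 1
    for _ in range(epochs):
        opt.zero_grad()
        R = pearson_per_voxel(model(x_tr), y_tr)      # training-set correlation
        l2 = sum((w ** 2).sum() for w in model.parameters())
        loss = -(r1 * R).sum() + r2 * l2              # Loss = -r1*R + r2*L2(W)
        loss.backward()
        opt.step()
        with torch.no_grad():                         # update r1 from validation R0
            R0 = pearson_per_voxel(model(x_val), y_val)
            r1 = torch.clamp(R0 / l, min=0.0, max=1.0)
    return model, r1
```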
Preferably, with the loss function as the constraint, the trained response model and the effectively encoded voxels are obtained: when the correlation between a predicted voxel response and the true voxel response exceeds l, the encoding is judged significant and effective at significance level m, where m is a value derived from l. For the training set, a transfer-learning encoding model is constructed with the pre-trained neural network; with the dynamic loss function as the constraint, the trained response model and the effectively encoded voxels are obtained; for each effective voxel, the model with the highest prediction accuracy is chosen from the trained encoding models built on each CNN layer's features, as the optimal visual encoding model for the corresponding visual-area voxel; finally, the encoding models are evaluated on the test set.
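The per-voxel selection of the best per-layer encoding model can be sketched as follows; the function names are hypothetical, and `l = 0.27` matches the permutation-test threshold used later in the embodiment.

```python
import torch

def pearson_per_voxel(pred, target, eps=1e-8):
    """Per-voxel Pearson correlation between predicted and true responses."""
    p = pred - pred.mean(dim=0)
    t = target - target.mean(dim=0)
    return (p * t).sum(dim=0) / (p.norm(dim=0) * t.norm(dim=0) + eps)

def select_best_models(layer_preds, y_true, l=0.27):
    """For each voxel, keep the per-CNN-layer encoding model with the highest
    prediction correlation; voxels whose best correlation does not exceed the
    permutation threshold l are marked as ineffectively encoded.

    layer_preds: list of (n_samples, n_voxels) prediction tensors, one per
    CNN-layer-based encoding model (8 in the embodiment)."""
    corrs = torch.stack([pearson_per_voxel(p, y_true) for p in layer_preds])
    best_r, best_layer = corrs.max(dim=0)
    return best_layer, best_r, best_r > l
```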
The present invention addresses the problems that fMRI data have a low SNR and a small valid data volume and thus make it difficult to train a deep neural network (feature model): first, features are extracted from the stimulus pictures with an AlexNet network on the basis of transfer learning; then, two fully connected layers are trained by minimizing a custom loss function, realizing the nonlinear mapping from feature space to response space and improving the accuracy of predicting cerebral visual-area voxel responses.
To further verify the validity of the technical solution of the present invention, it is further explained below through a specific simulation example:
As shown in Fig. 2, the procedure comprises: transfer-learning-based CNN feature extraction; definition of a custom loss function; nonlinear-mapping training based on the custom loss function; and selection of the optimal encoding model for each voxel. With transfer learning, the AlexNet network serves as the feature-extraction model; a two-layer fully connected network model is then established between feature space and response space, and a new loss function is defined to train this nonlinear mapping, so as to improve the prediction accuracy of the encoding model and build an encoding model that better conforms to human visual mechanisms. For the transfer-learning-based CNN feature extraction, the AlexNet network is first selected, its parameters are fixed, and the CNN features of the different layers of the stimulus images are extracted.
The experimental data come from the data set published by the Gallant team in 2008. The data are available online and comprise data from two subjects, divided into training and test sets. The training set contains the voxel responses corresponding to 1750 natural pictures, and the test set consists of the voxel responses corresponding to 120 natural pictures. The CNN network has 5 convolutional layers and 3 fully connected layers in total; the input is the stimulus image and the output is 1024 features. The threshold l in the loss function is obtained by a permutation test and set to 0.27 in the experiment, meaning that when the correlation between a predicted voxel response and the true voxel response exceeds 0.27, the encoding is significant at the 0.1% significance level. In the loss function, when the correlation R is greater than 0.27, r1 is set to 1; when R is between 0 and 0.27, r1 is set to a value varying linearly with the correlation; when R is less than 0, r1 is 0, indicating ineffective encoding. Using the custom loss function as the constraint, the fully connected layer parameters are trained on the training set for each CNN layer's features. The 1750 training images are randomly divided into two parts: 1630 images for training and 120 images for validation. Training first initializes r1 to 1 and obtains the parameters W of each layer on the 1630-image training set; the correlation R0 is then obtained on the 120 validation images and used to update r1; r1 is then fixed while training continues on the 1630 images, after which R0 is computed again on the 120 images to update r1. Through several rounds of such iteration, the loss function is minimized, yielding the trained two-layer fully connected network (the nonlinear mapping) for each CNN layer's features, while simultaneously selecting the effectively encoded voxels. The output of the encoding model is the responses of the selected voxels; for each selected voxel, the model with the highest prediction accuracy among the 8 trained per-CNN-layer encoding models is chosen as that voxel's encoding model.
In the embodiment of the present invention, the transfer-learning-based neural network, with its nonlinear mapping trained by the custom loss function, is used as the visual encoding model to predict voxel responses. It is compared with the receptive-field model built on a Gabor wavelet pyramid (GWP) and with a CNN linear model. The GWP model uses Gabor filters as the feature extractor; its predicted voxel response is a linear combination of many Gabor basis functions of different spatial frequencies, orientations, and locations. The CNN linear model uses the same AlexNet network for feature extraction, and performs the linear mapping from feature space to response space with the Regularized Orthogonal Matching Pursuit (ROMP) algorithm to obtain the voxel responses. The results are shown in Figs. 4 and 5 respectively. Fig. 3 shows how the value of r1 in the loss function varies with the correlation coefficient R. Fig. 4 is a histogram of the average Pearson correlation coefficient, in each visual area, between the predicted and the actually measured responses, for the GWP model and the model proposed in the embodiment of the present invention; a higher correlation coefficient indicates higher encoding-model prediction accuracy. As can be seen from the figure, the GWP model is inferior to the transfer-learning model overall, and the gap grows increasingly significant from low-level to high-level visual areas; evidently, the Gabor features extracted in the GWP model are features more relevant to low-level visual areas, lacking high-level visual features. Fig. 5 is the corresponding histogram of average Pearson correlation coefficients for the CNN linear model versus the proposed model. It shows that in low-level visual areas the encoding performance of the transfer-learning model is inferior to the CNN linear model, while in high-level visual areas the transfer-learning model performs better. This indicates that CNN features can predict low-level visual-area responses well through a linear mapping, but that a nonlinear mapping is needed to better predict high-level visual-area responses — showing that CNN features still leave room for further improvement in simulating the information processing mechanism of the visual pathway.
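The evaluation quantity behind Figs. 4 and 5 — the average Pearson correlation between predicted and measured responses, grouped by visual area — can be sketched as follows. NumPy is assumed, and the area-label representation is hypothetical.

```python
import numpy as np

def mean_pearson_by_area(pred, true, voxel_area, eps=1e-8):
    """Average Pearson correlation of predicted vs. measured voxel responses,
    grouped by visual area (the quantity plotted in Figs. 4 and 5).

    pred, true: (n_images, n_voxels) arrays; voxel_area: length-n_voxels
    array of area labels such as 'V1', 'V4' (a hypothetical representation)."""
    p = pred - pred.mean(axis=0)
    t = true - true.mean(axis=0)
    r = (p * t).sum(axis=0) / (np.linalg.norm(p, axis=0) * np.linalg.norm(t, axis=0) + eps)
    return {area: float(r[voxel_area == area].mean()) for area in np.unique(voxel_area)}
```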
In the embodiment of the present invention, features are extracted from the stimulus pictures with the AlexNet network on the basis of transfer learning, and two fully connected layers then build the nonlinear mapping relationship from feature space to response space; through several rounds of iterative training, the loss function is minimized and the trained two-layer fully connected network is obtained. The specific experimental data further verify that the encoding model of the embodiment, used to predict the voxel responses of different visual areas on the test set and compared with the prediction performance of the GWP model and the CNN linear model, differs from them significantly and improves the accuracy of predicting cerebral visual-area voxel responses.
Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the invention.
Based on the above method, an embodiment of the present invention further provides a server, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above method.
Based on the above method, an embodiment of the present invention further provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the above method.
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as the preceding method embodiments; for brevity, where the device embodiment is silent, reference may be made to the corresponding content in the preceding method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems and devices described above may refer to the corresponding processes in the preceding method embodiments, and are not repeated here.
In all examples shown and described herein, any specific value should be interpreted as merely illustrative and not as a limitation; other examples of the exemplary embodiments may therefore have different values.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that shown in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It is also noted that each box in the block diagrams and/or flowcharts, and combinations of boxes therein, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and in actual implementation there may be other ways of division; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, used to illustrate rather than limit its technical solution, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or replace some of the technical features with equivalents; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A transfer-learning-based fMRI visual encoding model construction method, characterized by comprising:
acquiring visual stimulus image data and the corresponding fMRI data set, taking the stimulus images as the input of the encoding model and the visual-area fMRI responses as its output;
extracting CNN features of the stimulus images with a deep convolutional neural network model, and constructing a visual response encoding model for each visual area;
training the fully connected response model within the visual encoding model by constructing a dynamic loss function, and selecting the optimal visual encoding model for each voxel of the cerebral visual areas according to its visual response pattern.
2. The fMRI visual encoding model construction method based on transfer learning according to claim 1, characterized in that, for CNN feature extraction, an AlexNet network model is first established and its model parameters are fixed; CNN features of the stimulus image data are extracted from different layers, and two fully connected layers are appended after the features to construct the visual encoding model.
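The architecture of claim 2 — frozen AlexNet features feeding two trainable fully connected layers that map to voxel responses — can be sketched as follows. This is an illustrative sketch only: the pretrained AlexNet backbone is stood in for by a fixed random projection, and all dimensions (image size, feature width, hidden width, voxel count) are assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen AlexNet features: in the patent, CNN features are
# extracted from a fixed, pretrained AlexNet layer; here a fixed random
# projection plays that role (an assumption made purely for illustration).
n_images, n_pixels, n_features, n_voxels = 8, 1024, 256, 50
W_frozen = rng.normal(size=(n_pixels, n_features))  # fixed, never trained

def extract_features(images):
    """Frozen 'CNN' feature extractor (placeholder for an AlexNet layer)."""
    return np.maximum(images @ W_frozen, 0.0)  # ReLU nonlinearity

# Two trainable fully connected layers: features -> hidden -> voxel responses
n_hidden = 64
W1 = rng.normal(scale=0.01, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_voxels))
b2 = np.zeros(n_voxels)

def response_model(features):
    """Nonlinear mapping from feature space to per-voxel fMRI responses."""
    hidden = np.maximum(features @ W1 + b1, 0.0)
    return hidden @ W2 + b2

images = rng.normal(size=(n_images, n_pixels))
predicted = response_model(extract_features(images))
print(predicted.shape)  # → (8, 50): one response per voxel per stimulus
```

Only W1, b1, W2, b2 would be updated during training; the backbone stays fixed, which is the transfer-learning aspect of the construction.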
3. The fMRI visual encoding model construction method based on transfer learning according to claim 1, characterized in that training the fully connected response model within the visual encoding model through a constructed dynamic loss function comprises: for the training data set, establishing a nonlinear mapping response model between the feature space and the response space, and realizing voxel selection by constructing a dynamic loss function, so as to better train the response model within the visual encoding model.
4. The fMRI visual encoding model construction method based on transfer learning according to claim 3, characterized in that the dynamic loss function is expressed as:
Loss = -r1*R + r2*L2(W)
where R denotes the correlation between the predicted voxel responses and the true voxel responses, W denotes the network layer parameters, L2(W) denotes the L2 regularization function, r2 is the regularization coefficient, and r1 is a coefficient set according to the relationship between R0 and l; here R0 is the correlation between the predicted and true voxel responses on the validation set, and l is a significance threshold obtained in advance by permutation testing.
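A minimal numeric reading of the dynamic loss above. The exact expression for r1 is not given here, so the sketch assumes a simple gate: the correlation term is active only when the validation correlation R0 exceeds the permutation threshold l. That gating rule, and all numbers, are assumptions for illustration.

```python
import numpy as np

def pearson_r(pred, true):
    """Correlation between predicted and true voxel responses."""
    p = pred - pred.mean()
    t = true - true.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))

def dynamic_loss(pred, true, W, r0, l, r2=1e-3):
    """Loss = -r1*R + r2*L2(W).

    r1 is gated by the validation-set correlation r0 against the
    permutation-test threshold l (an assumed form of the rule; the
    patent's exact expression for r1 is not reproduced here).
    """
    r1 = 1.0 if r0 > l else 0.0
    R = pearson_r(pred, true)
    l2 = float(np.sum(W ** 2))
    return -r1 * R + r2 * l2

W = np.array([0.5, -0.5])
true = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 1.9, 3.2, 3.8])                     # near-perfect prediction
loss_kept = dynamic_loss(pred, true, W, r0=0.4, l=0.2)    # voxel passes threshold
loss_gated = dynamic_loss(pred, true, W, r0=0.1, l=0.2)   # voxel fails threshold
print(loss_kept < loss_gated)  # → True: only kept voxels earn the reward
```

Minimizing this loss maximizes prediction correlation for voxels that survive the significance gate, while the L2 term keeps the layer weights small.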
5. The fMRI visual encoding model construction method based on transfer learning according to claim 4, characterized in that the stimulus image data are randomly divided into a training set and a validation set, and the two-layer fully connected response model is trained on the basis of the loss function; after each round of training on the training set, the parameter r1 is updated on the validation set, and the parameters are updated iteratively over several rounds to minimize the loss function and obtain the response model.
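Under the same assumed gating rule for r1, the training procedure of claim 5 can be sketched as gradient descent on Loss = -r1*R + r2*L2(W) for a single linear response layer, with r1 re-evaluated on the validation split after every epoch. The analytic gradient of the Pearson correlation, the learning rate, the threshold value, and the synthetic data are all illustrative and not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stimulus features and one voxel's responses (assumed data)
n_train, n_val, n_feat = 80, 20, 10
w_true = rng.normal(size=n_feat)
X = rng.normal(size=(n_train + n_val, n_feat))
y = X @ w_true + 0.1 * rng.normal(size=n_train + n_val)
Xtr, ytr, Xva, yva = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def pearson_r(p, t):
    p = p - p.mean(); t = t - t.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))

w = rng.normal(scale=0.01, size=n_feat)
l, r2, lr = 0.2, 1e-3, 0.5   # permutation threshold l is an assumed value
r1 = 1.0                     # correlation term starts active

r_before = pearson_r(Xtr @ w, ytr)
for epoch in range(50):
    # Analytic gradient of R = corr(Xw, y) w.r.t. predictions p = Xw
    p = Xtr @ w
    pc, tc = p - p.mean(), ytr - ytr.mean()
    denom = np.linalg.norm(pc) * np.linalg.norm(tc)
    R = float(pc @ tc / denom)
    dR_dp = tc / denom - R * pc / (pc @ pc)
    grad = Xtr.T @ (-r1 * dR_dp) + 2.0 * r2 * w  # d(Loss)/dw
    w -= lr * grad
    # After each epoch, update r1 from the validation-set correlation R0
    R0 = pearson_r(Xva @ w, yva)
    r1 = 1.0 if R0 > l else 0.0  # assumed gating rule, cf. claim 4
r_after = pearson_r(Xtr @ w, ytr)
print(r_before, "->", r_after)
```

Because the correlation term is scale-invariant, the regularizer is what pins down the magnitude of w; the epoch-wise r1 update is what makes the loss "dynamic".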
6. The fMRI visual encoding model construction method based on transfer learning according to claim 4, characterized in that, with the loss function serving as a constraint, the trained response model and the effectively encoded voxels are obtained; when the correlation between a predicted voxel response and the true voxel response is greater than l, the encoding is judged to be significantly effective at significance level m, where m is a value derived from l.
7. The fMRI visual encoding model construction method based on transfer learning according to claim 5, characterized in that, for the training set, the transfer-learning encoding model is constructed with the pretrained neural network; with the dynamic loss function serving as a constraint, the trained response model and the effectively encoded voxels are obtained; for each effective voxel, the model with the highest prediction accuracy is selected among the encoding models trained on the CNN features of each layer, as the optimal visual encoding model for the corresponding voxel of the brain's visual areas; finally, the performance of the encoding model is assessed on the test set.
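The per-voxel selection in claim 7 reduces to an argmax over a layers-by-voxels table of prediction accuracies, with the permutation threshold l deciding which voxels count as effectively encoded. A sketch with made-up correlation values (the layer names and the threshold are assumptions):

```python
import numpy as np

# Prediction accuracy (correlation) of each layer-specific encoding model:
# one row per AlexNet layer, one column per voxel (values are made up).
layer_names = ["conv1", "conv2", "conv3", "conv4", "conv5"]
accuracy = np.array([
    [0.41, 0.12, 0.08, 0.30],
    [0.35, 0.28, 0.11, 0.33],
    [0.22, 0.44, 0.15, 0.25],
    [0.18, 0.39, 0.19, 0.47],
    [0.10, 0.31, 0.17, 0.52],
])
l = 0.20  # permutation-test significance threshold (assumed value)

# A voxel is effectively encoded only if its best model beats the threshold
best_layer = accuracy.argmax(axis=0)
best_score = accuracy.max(axis=0)
effective = best_score > l

for v, (layer, score, ok) in enumerate(zip(best_layer, best_score, effective)):
    if ok:
        print(f"voxel {v}: optimal model uses {layer_names[layer]} (r={score:.2f})")
    else:
        print(f"voxel {v}: no significant encoding model")
```

This reflects the idea that different voxels are best predicted by different depths of the feature hierarchy, so each voxel keeps its own optimal layer-specific model.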
CN201811570733.3A 2018-12-21 2018-12-21 fMRI visual coding model construction method based on transfer learning Active CN109816630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811570733.3A CN109816630B (en) 2018-12-21 2018-12-21 fMRI visual coding model construction method based on transfer learning

Publications (2)

Publication Number Publication Date
CN109816630A (en) 2019-05-28
CN109816630B CN109816630B (en) 2020-09-04

Family

ID=66602225

Country Status (1)

Country Link
CN (1) CN109816630B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783949A (en) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Deep neural network training method and device based on transfer learning
CN112006678A (en) * 2020-09-10 2020-12-01 齐鲁工业大学 Electrocardiogram abnormity identification method and system based on combination of AlexNet and transfer learning
CN112232378A (en) * 2020-09-23 2021-01-15 中国人民解放军战略支援部队信息工程大学 Zero-order learning method for fMRI visual classification
CN112233199A (en) * 2020-09-23 2021-01-15 中国人民解放军战略支援部队信息工程大学 fMRI visual reconstruction method based on discrete characterization and conditional autoregression
CN112633099A (en) * 2020-12-15 2021-04-09 中国人民解放军战略支援部队信息工程大学 Gabornet-based brain low-level visual area signal processing method and system
CN112686098A (en) * 2020-12-15 2021-04-20 中国人民解放军战略支援部队信息工程大学 Method and system for processing high-level visual area signals in brain based on Shape-Resnet
CN117152752A (en) * 2023-10-30 2023-12-01 之江实验室 Visual depth feature reconstruction method and device with self-adaptive weight

Citations (7)

Publication number Priority date Publication date Assignee Title
WO2007068983A1 (en) * 2005-12-15 2007-06-21 Isis Innovation Ltd New treatment
CN103440513A (en) * 2013-09-17 2013-12-11 西安电子科技大学 Method for determining specific visual cognition state of brain based on sparse nonnegative tensor factorization (SNTF)
CN104508710A (en) * 2012-03-29 2015-04-08 皇家飞利浦有限公司 Visual suppression of selective tissue in image data
CN106056602A (en) * 2016-05-27 2016-10-26 中国人民解放军信息工程大学 CNN (convolutional neural network)-based fMRI (functional magnetic resonance imaging) visual function data object extraction method
CN107248180A (en) * 2017-05-08 2017-10-13 西安交通大学 A kind of fMRI natural image coding/decoding methods based on hidden state model
US20170330068A1 (en) * 2016-05-16 2017-11-16 Canon Kabushiki Kaisha Devices, systems, and methods for feature encoding
CN108573512A (en) * 2018-03-21 2018-09-25 电子科技大学 A kind of complicated visual pattern reconstructing method based on depth encoding and decoding veneziano model

Non-Patent Citations (4)

Title
GUCLU UMUT,VAN GERVEN MARCEL A. J.: "Modeling the Dynamics of Human Brain Activity with Recurrent Neural Networks", 《FRONTIERS IN COMPUTATIONAL NEUROSCIENCE》 *
JAMES HUGHES: "Finding nonlinear relationships in Functional Magnetic Resonance Imaging Data with Genetic Programming", 《ELECTRONIC THESIS AND DISSERTATION REPOSITORY》 *
ZHENG ZAIZHOU: "Research on brain-computer interactive natural image retrieval technology based on real-time functional magnetic resonance imaging", 《China Masters' Theses Full-text Database, Information Science and Technology》 *
LEI YU: "Research on fMRI-based visual information encoding technology", 《China Masters' Theses Full-text Database, Information Science and Technology》 *

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN111783949A (en) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Deep neural network training method and device based on transfer learning
CN112006678A (en) * 2020-09-10 2020-12-01 齐鲁工业大学 Electrocardiogram abnormity identification method and system based on combination of AlexNet and transfer learning
CN112232378A (en) * 2020-09-23 2021-01-15 中国人民解放军战略支援部队信息工程大学 Zero-order learning method for fMRI visual classification
CN112233199A (en) * 2020-09-23 2021-01-15 中国人民解放军战略支援部队信息工程大学 fMRI visual reconstruction method based on discrete characterization and conditional autoregression
CN112233199B (en) * 2020-09-23 2024-02-06 中国人民解放军战略支援部队信息工程大学 fMRI vision reconstruction method based on discrete characterization and conditional autoregressive
CN112633099A (en) * 2020-12-15 2021-04-09 中国人民解放军战略支援部队信息工程大学 Gabornet-based brain low-level visual area signal processing method and system
CN112686098A (en) * 2020-12-15 2021-04-20 中国人民解放军战略支援部队信息工程大学 Method and system for processing high-level visual area signals in brain based on Shape-Resnet
CN117152752A (en) * 2023-10-30 2023-12-01 之江实验室 Visual depth feature reconstruction method and device with self-adaptive weight
CN117152752B (en) * 2023-10-30 2024-02-20 之江实验室 Visual depth feature reconstruction method and device with self-adaptive weight

Similar Documents

Publication Publication Date Title
CN109816630A (en) FMRI visual coding model building method based on transfer learning
Fleming et al. Learning to see stuff
Storrs et al. Deep learning for cognitive neuroscience
CA2880758C (en) Method and computing system for modelling a primate brain
Ozcelik et al. Reconstruction of perceived images from fmri patterns and semantic brain exploration using instance-conditioned gans
Qiang et al. Modeling task-based fMRI data via deep belief network with neural architecture search
Dong et al. Modeling hierarchical brain networks via volumetric sparse deep belief network
Cui et al. Identifying brain networks at multiple time scales via deep recurrent neural network
CN111714118A (en) Brain cognition model fusion method based on ensemble learning
Cui et al. EEG source localization using spatio-temporal neural network
CN106909938A (en) Viewing angle independence Activity recognition method based on deep learning network
Wang et al. Performance enhancement of P300 detection by multiscale-CNN
Ojeda et al. Bridging M/EEG source imaging and independent component analysis frameworks using biologically inspired sparsity priors
Yang et al. Brain-inspired models for visual object recognition: an overview
Qiang et al. Learning brain representation using recurrent Wasserstein generative adversarial net
Vu et al. 3D convolutional neural network for feature extraction and classification of fMRI volumes
Kauppi et al. Decoding magnetoencephalographic rhythmic activity using spectrospatial information
Cui et al. Modeling brain diverse and complex hemodynamic response patterns via deep recurrent autoencoder
Al-Khazraji et al. The Effect of Changing Targeted Layers of the Deep Dream Technique Using VGG-16 Model.
Chen et al. A survey of macroscopic brain network visualization technology
Zhang et al. Wavelet transform and texture recognition based on spiking neural network for visual images
Li et al. Fusion of ANNs as decoder of retinal spike trains for scene reconstruction
Zhang et al. A two-stage DBN-based method to exploring functional brain networks in naturalistic paradigm fMRI
Li et al. Latent source mining of fMRI data via deep belief network
Liang et al. Electromagnetic source imaging with a combination of sparse Bayesian learning and deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant