Thursday, May 26, 2016

multi-voxel pattern analysis


A commonly-held idea is that Multi-Voxel Pattern Analysis (MVPA) has something to do with patterns over voxels.

The flames of this misconception are probably fanned by images like this:

[http://www.cogsci.mq.edu.au/research/projects/thebrainthatadapts/]

Because we never saw this kind of image in review papers about single-voxel fMRI analysis, we tend to think MVPA is special in being able to detect patterns like the one colorfully shown.

What do we mean by a "pattern", exactly? In particular, people often get the idea that MVPA is special because it's sensitive to cases where nearby voxels might encode the stimulus in opposite directions. This intuitively fits with the image above.

The truth is, analyses that treat each voxel independently are perfectly happy to tell you about nearby voxels encoding a stimulus in opposite directions. Suppose you have two stimulus conditions, like face vs. house. At each voxel independently, you can perform an ANOVA against these category labels. If two adjacent voxels encode the categories in opposite directions, the F-statistics at these voxels will both be large.
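To make this concrete, here's a minimal sketch of the voxel-by-voxel F-test on simulated data (the shapes and effect sizes are made up for illustration):

```python
# Mass-univariate analysis: an independent F-test at every voxel.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n_trials = 40  # trials per condition

# Two adjacent voxels that encode face vs. house in OPPOSITE directions:
# voxel 1 responds more to faces, voxel 2 responds more to houses.
faces = np.column_stack([rng.normal(+1, 1, n_trials),
                         rng.normal(-1, 1, n_trials)])
houses = np.column_stack([rng.normal(-1, 1, n_trials),
                          rng.normal(+1, 1, n_trials)])

for v in range(2):
    F, p = f_oneway(faces[:, v], houses[:, v])
    print(f"voxel {v + 1}: F = {F:.1f}, p = {p:.2g}")
# Both F-statistics come out large: the single-voxel analysis happily
# detects both voxels, despite their opposite preferences.
```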

You can spatially smooth these F-statistics and align them between subjects, and get a statistical map of where in the brain information about faces and houses is encoded. (Of course, ANOVA isn't the only option. For example, you can use support vector machines or other methods to classify the one-dimensional response in a single voxel, and report classification accuracies.)
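Here's a sketch of that single-voxel SVM idea using scikit-learn, again with simulated data (a real analysis would cross-validate across runs rather than trials):

```python
# Classifying the one-dimensional response of a single voxel with an SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 40  # trials per condition
X = np.concatenate([rng.normal(+1, 1, n),            # face trials
                    rng.normal(-1, 1, n)])[:, None]  # house trials
y = np.array([1] * n + [0] * n)

acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
print(f"single-voxel classification accuracy: {acc.mean():.2f}")
```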

Thus, even considering one voxel at a time, you can still pick up *the same pattern* of positive and negative encoding shown in the colorful image above.

What MVPA does is to trade spatial resolution for statistical sensitivity. By combining the reports of multiple (usually adjacent) voxels, more information about the variable of interest (e.g., faces vs. houses) is pooled together. The tradeoff is that we don't know which of our voxels really contain information about faces and houses.

To make this more concrete, consider multiple regression, a powerful tool for multi-voxel analysis. Our prediction of the response y (e.g., faceness vs. houseness) is related to:

beta_1 * voxel_1 + beta_2 * voxel_2 + ... + beta_n * voxel_n

In other words, our prediction is roughly the average (or sum) of the predictions we would have made from individual GLMs run separately on those n voxels. [This is not completely true because the actual betas we estimate would be different in single regressions compared to multiple regression. In particular, a good multi-voxel method would weight the contributions of different voxels according to how predictive they are.]
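Here's a sketch of that comparison on simulated data (the signal strengths are arbitrary, chosen just to make the point):

```python
# Multi-voxel regression vs. the average of single-voxel regressions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_trials, n_voxels = 200, 5
y = rng.normal(size=n_trials)  # the variable of interest, e.g. faceness
weights = rng.uniform(0.2, 1.0, n_voxels)  # how strongly each voxel encodes y
X = y[:, None] * weights + rng.normal(size=(n_trials, n_voxels))

multi = LinearRegression().fit(X, y).predict(X)

# One GLM per voxel, then average the single-voxel predictions.
single = np.mean([LinearRegression().fit(X[:, [v]], y).predict(X[:, [v]])
                  for v in range(n_voxels)], axis=0)

print(f"multi-voxel r = {np.corrcoef(multi, y)[0, 1]:.2f}, "
      f"averaged single-voxel r = {np.corrcoef(single, y)[0, 1]:.2f}")
# Similar predictions; multiple regression just weights the voxels by how
# predictive they are (and accounts for their correlations).
```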

A weighted average can be much better than a prediction from a single voxel. But there's no extra "pattern" information here!

We said above that we can use support vector machines on single voxels. We can go further and use most of the multi-voxel toolkit to analyze single voxels, including representational similarity analysis. For example, here's a representational dissimilarity matrix constructed from a single MEG sensor (since I have these data handy):
[figure: representational dissimilarity matrix computed from a single MEG sensor]
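For anyone who wants to play with the idea, here's a sketch of how a single-sensor RDM can be built, with simulated condition responses standing in for real MEG data:

```python
# An RDM from a single sensor/voxel: the response to each condition is a
# single number, so a natural dissimilarity is the absolute difference.
import numpy as np

rng = np.random.default_rng(3)
n_conditions = 8
mean_response = rng.normal(size=n_conditions)  # one scalar per condition

rdm = np.abs(mean_response[:, None] - mean_response[None, :])
print(rdm.round(2))  # an 8 x 8 representational dissimilarity matrix
```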
However, it's worth noting two things. First, doing multi-voxel analysis inspired this way of thinking about brain data. Second, many of these tools work much better on multi-voxel data than on single-voxel or single-sensor data, because much more information is being pooled together (again, at the cost of spatial specificity).

The picture changes if we permit nonlinear effects in our multi-voxel model. In the nonlinear case, our multi-voxel analysis can detect encodings that are completely invisible to single-voxel analyses, like this one:
[figure: an encoding across voxels that no single-voxel analysis can detect]
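The classic example of such an encoding is an XOR arrangement across two voxels; here's a simulated sketch of it (my stand-in for the figure, which may have shown something different):

```python
# An encoding invisible voxel-by-voxel but detectable by a nonlinear
# multi-voxel model: an XOR arrangement across two voxels (simulated data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 50  # trials per cluster
centers = [(0, 0), (1, 1), (0, 1), (1, 0)]  # mean (voxel 1, voxel 2) responses
labels = [1, 1, 0, 0]                       # the label depends on BOTH voxels
X = np.vstack([rng.normal(c, 0.1, (n, 2)) for c in centers])
y = np.repeat(labels, n)

for name, data in [("voxel 1 alone", X[:, [0]]),
                   ("voxel 2 alone", X[:, [1]]),
                   ("both voxels", X)]:
    acc = cross_val_score(SVC(kernel="rbf"), data, y, cv=5).mean()
    print(f"{name}: accuracy = {acc:.2f}")
# Each voxel alone is at chance (~0.5); the nonlinear two-voxel model is not.
```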
In many practical cases, nonlinear analyses underperform linear ones, maybe because of overfitting.
[Technical side note - linear classification, unlike linear regression, could perform above chance on these example data, by drawing a boundary that puts all the blue points on one side and half of the red points on the other. This works because classification introduces a nonlinearity of its own: under 0-1 loss, the incorrectly classified red points are penalized the same no matter how far they sit from the boundary.]
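Reusing X and y from the XOR sketch above (and assuming blue means label 1 and red means label 0), such a hand-drawn boundary looks like this:

```python
# A fixed linear boundary on the XOR data above: all the label-1 ("blue")
# clusters land on one side, and one of the two label-0 ("red") clusters
# lands on the other.
pred = (X[:, 0] - X[:, 1] < 0.5).astype(int)  # side of the line v1 - v2 = 0.5
print((pred == y).mean())  # ~0.75 under 0-1 loss: above chance
```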


PS. I definitely don't claim to be saying anything remotely new in this blog post. This has all been said many times before.

PPS. Thanks to Archy for the inspiration to blog about science!