What is Multiple Imputation and why does it work?

Shredded Paper Multiple Imputation

Imagine that you’re an investigative journalist, and you have a shredded document that you need to reconstruct for a story that you’re writing. You only have some of the slices of the document. After putting the slices of paper back together, you are left with a series of sentences, each with some letters missing. Let’s consider the following incomplete sentence from the document, which comes from a fictional company, Corporation X:

We are involved in ____ing toxic envi____ental waste in a local river.

If you’re skilled at Wheel of Fortune-style word games, you might be able to fill in the missing letters in some of these words. First, let us consider the word “envi____ental.” You might guess that this word is “environmental” (I can’t think of any other words that could fill in here, can you?). However, two words earlier, we have “____ing.” Is it “dumping”, “cleaning”, “jumping” or any of a number of other words? How would you figure out the meaning of the paper here?

In practice, you would likely use context. In addition to the information in the sentence above, the sentences just before and just after it might suggest what that word is. Without context, you might simply pick the most common word ending in -ing in the English language.

One approach is to pick the single most likely word; let’s say that is “dumping” in this case. This would be big news! The headline might read: Leaked internal documents from Corporation X say that it dumps toxic environmental waste in a local river!

However, another option for reconstructing the paper would be to try several possibilities for what the missing information could be, rather than just a single one. For example, we could say that, given the information in the rest of the document, there’s a 46% chance that the word is “dumping” or a synonym of it, a 44% chance that the word is “cleaning” or some synonym, and a 10% chance that the word has some other meaning. We could then describe the possible meanings of the document based on the different possible imputations, or fillings-in, of the missing words. This would give us a much better understanding of the document than looking at just one possibility.
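To make the idea concrete, here is a minimal Python sketch of filling in the missing word several times instead of once. The 46%, 44% and 10% figures come from the example above; the function names, the placeholder label for “some other meaning” and the number of draws are assumptions made purely for illustration.

```python
import random
from collections import Counter

# Probabilities from the example in the text; the last label is a stand-in
# for "some other meaning" and is purely illustrative.
CANDIDATES = {"dumping": 0.46, "cleaning": 0.44, "[some other word]": 0.10}

TEMPLATE = "We are involved in {} toxic environmental waste in a local river."


def impute_word(rng):
    """Draw one plausible fill-in for the missing word."""
    words = list(CANDIDATES)
    weights = list(CANDIDATES.values())
    return rng.choices(words, weights=weights, k=1)[0]


def reconstruct(n_imputations, seed=0):
    """Return several completed versions of the sentence, one per imputation."""
    rng = random.Random(seed)
    return [TEMPLATE.format(impute_word(rng)) for _ in range(n_imputations)]


def summarize(n_imputations=5000, seed=0):
    """Share of imputations in which each candidate word was chosen."""
    rng = random.Random(seed)
    draws = Counter(impute_word(rng) for _ in range(n_imputations))
    return {word: count / n_imputations for word, count in draws.items()}


for sentence in reconstruct(5):
    print(sentence)
print(summarize())
```

With many draws, the shares in the summary settle close to the probabilities we started with, so no single reading of the sentence is presented as certain.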

Multiple imputation is a statistical method that uses the principle described above to help make accurate estimates and inferences when some of the data are missing. Just as the journalist uses context to fill in the missing letters, in multiple imputation we use both prior knowledge and information that we learn from the data set. Furthermore, we consider a number of possibilities for the missing data rather than a single one. In many applications of multiple imputation, we impute numbers rather than letters, but the principle is the same. Once we impute the missing data, we can complete a statistical analysis using any one of a variety of statistical methods, such as linear regression, with only minor modifications.
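For readers who like to see numbers, here is a rough Python sketch of the same idea applied to numeric data. It is a deliberately simplified illustration, not how any particular statistics package does it: the function name and the example values are made up, the missing values are assumed to be missing completely at random, and the imputations are drawn from a normal distribution fitted to a resampled copy of the observed values. The “minor modification” to the usual analysis is the pooling step at the end, commonly known as Rubin's rules, which combines the estimates from the separately completed data sets.

```python
import random
import statistics


def multiply_impute_mean(values, m=20, seed=0):
    """Estimate the mean of `values` (None marks a missing entry).

    Returns the pooled estimate and its total variance.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n = len(values)
    estimates, within_vars = [], []
    for _ in range(m):
        # Resample the observed values so that each imputation also reflects
        # our uncertainty about the mean and spread themselves.
        boot = rng.choices(observed, k=len(observed))
        mu, sd = statistics.mean(boot), statistics.stdev(boot)
        # Fill in each missing entry with a random draw, keep observed ones.
        completed = [v if v is not None else rng.gauss(mu, sd) for v in values]
        # Ordinary complete-data analysis: the mean and its variance.
        estimates.append(statistics.mean(completed))
        within_vars.append(statistics.variance(completed) / n)
    # Pooling: average the estimates, then add the between-imputation
    # variance (inflated by 1 + 1/m) to the average within-imputation variance.
    pooled = statistics.mean(estimates)
    within = statistics.mean(within_vars)
    between = statistics.variance(estimates)
    total_var = within + (1 + 1 / m) * between
    return pooled, total_var


# Made-up measurements with three missing entries.
heights = [150.0, 162.0, None, 171.0, 158.0, None, 166.0, 175.0, 160.0, None]
estimate, variance = multiply_impute_mean(heights)
print(estimate, variance ** 0.5)
```

The between-imputation term is what a single “best guess” fill-in would leave out: it measures how much the answer changes depending on how the gaps are filled.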

Multiple imputation and other good methods for handling missing data in statistical analysis allow us to make accurate inferences in situations where cruder approaches (like ignoring the missing data) yield inaccurate ones. If you hadn’t considered other possibilities, you might wrongly conclude that Corporation X is dumping waste, when, based on the document, there’s an almost equal chance that it is cleaning waste. While the details of multiple imputation can get a bit technical, you should now understand the concept behind it. For those interested in further reading about some of the more technical aspects of the method, you can see this site: The Multiple Imputation FAQ Page.

Amit Chowdhry is an MD-PhD student at the University of Rochester, currently working toward his PhD in statistics. His research focuses on making accurate inferences when combining the results of multiple studies (meta-analysis). In his free time, he enjoys cooking, reading about other branches of science, and volunteering at a student-run free clinic.
