A reader asked how to choose a sample for a survey. Good question.
My daughters are both taking statistics (one in college, one in high school) and this question has come up more than once. So I’ll give you my take on sampling. There are a lot of resources out there (you know, references and other sources). My favorite is Dillman, 3rd edition, page 57.
Sampling is easier than most folks make it out to be. Most of the time you are dealing with an entire population. What, you ask, how can that be?
You are dealing with an entire population when you survey the audience of a workshop (population 20, or 30, or 50). You are dealing with a population when you deal with a series of workshops (anything under 100). Typically, workshops are small, happen only once or twice, and rarely include participants who are there because they have to be. If you have under 100, you have an entire population. They can all be surveyed.
Now if your workshop is a repeating event with different folks at each offering, then you will have the opportunity to sample your population because it is over 100 (see Dillman, 3rd edition, page 57). If you have over 100 people to survey AND you have contact information for them, then you want to randomly sample from that population. Random selection (another name for random sampling) is very different from random assignment; I’m talking about random sampling.
Random sampling is a process where everyone gets a sequential identification number (and an equal chance to be selected); so 1 to 100. Then find a random number table; they are usually in the back of statistics books. Close your eyes and let your hand drop onto a number. Let’s say that number is 56997.
You know you need numbers between 1 and 100. According to Dillman, for a 95% confidence level with a plus or minus 3% margin of error you will need at least 92 cases (participants) for a 50/50 split, OR 87 cases (participants) for an 80/20 split.
So you look at the number and decide which two-digit number you will select (56, 69, 99, or 97). That is your first number. Let us say you chose 99, the third two-digit number found in the above random number (56 and 69 being the first two). So participant 99 will be on the randomly selected (random sampling) list. (Two-digit numbers only run from 00 to 99, so treat 00 as participant 100.)
Now you can go down the list, up the list, or to the left or the right of the list and identify the next two-digit number in the same position. For this example, I am using the random numbers table from my old Minium stat book (for which I couldn’t find a picture since it is OLD); the table was copied from the Rand Corporation, A million random digits with 100,000 normal deviates, Glencoe, IL: The Free Press, 1955. The number going right is 41534, so I would choose participant number 53. Continuing right, with the number 01953, I would choose participant number 95, etc. If you come across a number that you have already chosen, go to the next number.
Do this process until you get the required number of cases (either 92 or 87). You can select fewer if you want a plus or minus 10% margin of error (49, 38) or a plus or minus 5% margin of error (80, 71). (I always go for the least margin of error, though.)
Once you have identified the required number of participants, drafted the survey, and secured IRB approval, you can send out the survey. We will talk about response rates next week.
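
If you would rather let a computer stand in for the random number table, here is a minimal sketch in Python. It is only an illustration, not Dillman's procedure or the table method described above. The function names (`required_sample_size`, `draw_random_sample`) are made up for this example. The sample size calculation assumes the standard finite-population formula with z = 1.96 for 95% confidence, rounded to the nearest whole case; under that assumption it reproduces the six figures quoted above for a population of 100.

```python
import random


def required_sample_size(population, margin, share=0.5, z=1.96):
    """Cases needed for a given margin of error (assumed formula).

    population: number of people you could survey
    margin:     margin of error as a proportion (0.03 means plus or minus 3%)
    share:      expected split (0.5 for 50/50, 0.2 for 80/20)
    z:          1.96 corresponds to a 95% confidence level
    """
    variance = share * (1 - share)
    cases = (population * variance) / (
        (population - 1) * (margin / z) ** 2 + variance
    )
    # Rounding to the nearest whole case is an assumption of this sketch.
    return round(cases)


def draw_random_sample(population, sample_size, seed=None):
    """Pick participant numbers 1..population without repeats.

    Every participant has an equal chance of selection. Passing a seed
    makes the draw repeatable so the list can be documented.
    """
    rng = random.Random(seed)
    participant_ids = range(1, population + 1)
    return sorted(rng.sample(participant_ids, sample_size))


needed = required_sample_size(100, 0.03)             # 50/50 split
chosen = draw_random_sample(100, needed, seed=2024)  # seed is arbitrary
print(needed, "participants selected; first few:", chosen[:5])
```

Because `random.sample` never returns the same number twice, there is no need for the "skip a number you have already chosen" step from the table method.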
Hi Molly, thank you for your insight on drawing samples for surveys. I do have a follow-up question: in your opinion, how important is obtaining data from people in different geographical locations? I’m assuming the importance of gathering data from various locations depends on what type of information your survey is trying to correlate, but I am wondering if you feel data retrieved from a specific city or state might lose credibility if the results are supposed to reflect an entire country?
Thanks for your efforts on surveys and sampling. Really appreciated.