Having just read Harold Jarche’s April 27, 2014 blog, making sense of the network era, about personal knowledge mastery (PKM), I am once again reminded about the challenge of evaluation. I am often asked, “Do you have a form I could use about…?” My nutrition and exercise questions notwithstanding (I do have notebooks of those), this makes evaluation sound like it is routine, standardized, or prepackaged rather than individualized, customized, or specific. For me, evaluation is about the exceptions to the rule; how the evaluation this week may have similarities to something I’ve done before (after all this time, I would hope so…), yet is so different; unique, specific.

You can’t expect to find a pre-made form for your individual program (unless, of course, you are replicating a previously established program). Evaluations are unique, and the evaluation approach needs to match the program’s uniqueness. Whether the evaluation uses a survey, a focus group, or an observation (or any other data gathering approach), that approach to gathering data needs to focus on the evaluation question you want answered. You can start with “What difference did the program make?” Only you, the evaluator, can determine if you have enough resources to conduct the evaluation to answer the specific questions that follow from “What difference did the program make?” You probably do not have enough resources to determine if the program led your target audience to world peace; you might have enough resources to determine if the intention to do something different is there. You probably have enough resources to decide how to use your findings. It is so important that the findings be used; use may be how world peace is accomplished.

There are a few commonalities in data collection; those are the demographics, the data that tell you what your target audience looks like: things like gender, age, marital status, education level, SES, and probably a few other things depending on the program. Make sure when you ask for demographic information that a “choose not to answer” option is provided in the survey. Sometimes you have to ask; observations don’t always provide the answer. You need to make sure you include demographics in your survey as most journals want to know what the target audience looked like.

Readers, what makes your evaluations different, unique, special? I’d like to hear about that. Oh and while you are at it…like and share this post, if you do.


I had a topic all ready to write about then I got sick.  I’m sitting here typing this trying to remember what that topic was, to no avail. That topic went the way of much of my recent memory; another day, perhaps.

I do remember the conversation with my daughter about correlation.  She had a correlation of .3 something with a probability of 0.011 and didn’t understand what that meant.  We had a long discussion of causation and attribution and correlation.
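To put rough numbers on that conversation, here is a minimal sketch in Python. It assumes the correlation was exactly 0.30 (the actual value was only ".3 something"), so treat the output as illustrative. Squaring r gives the share of variance the two variables have in common, which is one way to judge how much the relationship matters in practice.

```python
# Assumed value: the post only says ".3 something", so 0.30 is illustrative.
r = 0.30
p_value = 0.011  # as reported in the post

# Squaring r gives the proportion of variance the two variables share.
shared_variance = r ** 2

print(f"r = {r:.2f}, r squared = {shared_variance:.2f}")
print(f"About {shared_variance:.0%} of the variance is shared; "
      f"about {1 - shared_variance:.0%} is not.")
print(f"p = {p_value} is below .05: statistically significant, "
      "but that alone does not make the relationship strong or causal.")
```

The p-value says a correlation this size would be unlikely if there were truly no relationship; it says nothing about what causes what.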

We had another long conversation about practical v. statistical significance, something her statistics professor isn’t teaching.  She isn’t learning about data management in her statistics class either.  Having dealt with both qualitative and quantitative data for a long time, I have come to realize that data management needs to be understood long before you memorize the formulas for the various statistical tests you wish to perform.  What if the flood happens?

So today I’m telling you about data management as I understand it, because the flood did actually happen and, fortunately, I didn’t lose my data.  I had a data dictionary.

Data dictionary.  The first step in data management is a data dictionary.   There are other names for this, which escape me right now…know that a hard copy of how and what you have coded is critical.  Yes, make a back up copy on your hard drive…have a hard copy because the flood might happen. (It is raining right now and it is Oregon in November.)

Take a hard copy of your survey, evaluation form, or qualitative data coding sheet and mark on it what every code notation you used means.  I’d show you an example of what I do, only they are at the office and I am home sick without my files.  So, I’ll show you a clip art instead.  No, I don’t use cards any more for my data (I did once…most of you won’t remember that time…); I do make a hard copy with clear notations.  I find myself doing that with other things to make sure I code the response the same way.  That is what a data dictionary allows you to do–check yourself.

Then I run a frequencies and percentages analysis.  I use SPSS (because that is what I learned first).  I look for outliers, variables that are miscoded, and system-generated missing data that aren’t missing.  I look for any anomaly in the data, any human error (i.e., my error).  Then I fix it.  Then I run my analyses.
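The same idea can be sketched outside SPSS. The example below is a minimal illustration in Python, not the workflow described above: the variable names, codes, and responses are all hypothetical. It stores the data dictionary as a lookup table, runs frequencies and percentages, and flags any code that the dictionary does not define.

```python
from collections import Counter

# Hypothetical data dictionary: variable -> {code: meaning}.
DATA_DICTIONARY = {
    "gender": {1: "female", 2: "male", 9: "chose not to answer"},
    "satisfied": {1: "strongly disagree", 2: "disagree",
                  3: "agree", 4: "strongly agree"},
}

# Hypothetical coded responses; None marks an item left blank.
responses = [
    {"gender": 1, "satisfied": 4},
    {"gender": 2, "satisfied": 3},
    {"gender": 1, "satisfied": 44},   # a keying error: 44 is not a valid code
    {"gender": 9, "satisfied": None},
]

def frequencies(rows, variable):
    """Count how often each code appears for one variable."""
    return Counter(row.get(variable) for row in rows)

def miscoded(rows, variable, dictionary):
    """Return codes that are neither blank nor defined in the dictionary."""
    valid = dictionary[variable]
    return sorted({row[variable] for row in rows
                   if row.get(variable) is not None
                   and row[variable] not in valid})

for variable in DATA_DICTIONARY:
    counts = frequencies(responses, variable)
    total = sum(counts.values())
    for code, n in counts.items():
        if code is None:
            label = "missing"
        else:
            label = DATA_DICTIONARY[variable].get(code, "NOT IN DICTIONARY")
        print(f"{variable}: {code} ({label}) n={n} {n / total:.0%}")
    print(f"{variable}: miscoded ->",
          miscoded(responses, variable, DATA_DICTIONARY))
```

Keeping the dictionary in one place means the printed hard copy and the check in the code come from the same source.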

There are probably more steps than I’ve covered today.  These are the first steps that absolutely must be done BEFORE you do any analyses.  Then you have a good chance of keeping your data safe.

“In reality, winning begins with accountability. You cannot sustain success without accountability. It is an absolute requirement!” (from walkthetalk.com.)

I’m quoting here.  I wish I had thought of this before I read it.  It is important in everyone’s life, and especially when evaluating.


Webster’s defines accountability as “…the quality or state of being accountable; an obligation (emphasis added) or willingness to accept responsibility for one’s actions.”  The business dictionary goes a little further and defines accountability as “…The obligation of an individual (or organization) (parentheses added) to account for its activities, accept responsibility for them, and to disclose the results in a transparent manner.”

It’s that last part to which evaluators need to pay special attention; the “disclose results in a transparent manner” part.  There is no one looking over your shoulder to make sure you do “the right thing”; that you read the appropriate document; that you report the findings you found not what you know the client wants to hear.  If you maintain accountability, you are successful; you will win.

AEA has adopted a set of Guiding Principles for the organization and its members.  The principles are 1) Systematic inquiry; 2) Competence; 3) Integrity/Honesty; 4) Respect for people; and 5) Responsibilities for the General and Public Welfare.  I can see where accountability lies within each principle.  Can you?

AEA has also endorsed the Program Evaluation Standards, of which there are five as well.  They are:  1) Utility, 2) Feasibility, 3) Propriety, 4) Accuracy, and 5) Evaluation accountability.  Here, the developers were very specific and made accountability its own category.  The Standard specifically states, “The evaluation accountability standards encourage adequate documentation of evaluations and a metaevaluative perspective focused on improvement and accountability for evaluation processes and products.”

You may be wondering about the impetus for this discussion of accountability (or, not…).  I have been reminded recently that only the individual can be accountable.  No outside person can do it for him or her.  If there is an assignment, it is the individual’s responsibility to complete the assignment in the time required.  If there is a task to be completed, it is the individual’s responsibility (and Webster’s would say obligation) to meet that responsibility.  It is the evaluator’s responsibility to report the results in a transparent manner–even if it is not what was expected or wanted.  As evaluators, we are adults (yes, some evaluation is completed by youth; they are still accountable) and, therefore, responsible, obligated, accountable.  We are each one responsible–not the leader, the organizer, the boss.  Each of us.  Individually.  When you are in doubt about your responsibility, it is your RESPONSIBILITY to clarify that responsibility however works best for you.  (My rule to live by number 2:  Ask.  If you don’t ask, you won’t get; if you do, you might not get.)

Remember, only you are accountable for your behavior–No. One. Else.  Even in an evaluation; especially in an evaluation.




You implement a program.  You think it is effective; that it makes a difference; that it has merit and worth.  You develop a survey to determine the merit and worth of the program.  You send the survey out to the target audience, which is an intact population–that is, all of the participants are in the target audience for the survey.  You get less than a 40% response rate.  What does that mean?  Can you use the results to say that the participants saw merit in the program?  Do the results indicate that the program has value, that it made a difference, if only 40% let you know what they thought?

I went looking for some insights on non-responses and non-responders.  Of course, I turned to Dillman (my go to book for surveys).  His bottom line: “…sending reminders is an integral part of minimizing non-response error” (pg. 360).

Dillman (of course) has a few words of advice.  For example, on page 360, he says, “Actively seek means of using follow-up reminders in order to reduce non-response error.”  How do you not burden the target audience with reminders, which are “…the most powerful way of improving response rate…” (Dillman, pg. 360)?  When reminders are sent, they need to be carefully worded and relate to the survey being sent.  Reminders stress the importance of the survey and the need for responding.

Dillman also says (on page 361) to “…provide all selected respondents with similar amounts and types of encouragement to respond.”  Since most of the time incentives are not an option for you the program person, you have to encourage the participants in other ways.  So we are back to reminders again.

To explore the topic of non-response further, there is a book (Groves, Robert M., Don A. Dillman, John Eltinge, and Roderick J. A. Little (eds.). 2002. Survey Nonresponse. Wiley-Interscience: New York) that deals with the topic. I don’t have it on my shelf, so I can’t speak to it.  I found it while I was looking for information on this topic.

I also went on line to EVALTALK and found this comment which is relevant to evaluators attempting to determine if the program made a difference:  “Ideally you want your non-response percents to be small and relatively even-handed across items. If the number of nonresponds is large enough, it does raise questions as to what is going for that particular item, for example, ambiguous wording or a controversial topic. Or, sometimes a respondent would rather not answer a question than respond negatively to it. What you do with such data depends on issues specific to your individual study.”  This comment was from Kathy Race of Race & Associates, Ltd.,  September 9, 2003.
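To see whether non-response is "small and relatively even-handed across items," it helps to count it. The sketch below is a minimal Python illustration with made-up data: the item names, the ten invited participants, and the 50% flagging threshold are all assumptions chosen for the example, not rules from Dillman or from the EVALTALK comment.

```python
# Hypothetical returned surveys; None means the item was left blank.
surveys = [
    {"q1": 4, "q2": 3, "q3": None},
    {"q1": 3, "q2": None, "q3": None},
    {"q1": 4, "q2": 2, "q3": 1},
    {"q1": None, "q2": 3, "q3": None},
]
invited = 10  # hypothetical size of the intact population

def response_rate(returned, invited):
    """Share of invited participants who returned a survey."""
    return returned / invited

def item_nonresponse(rows):
    """Share of returned surveys that left each item blank."""
    items = rows[0].keys()
    return {item: sum(row[item] is None for row in rows) / len(rows)
            for item in items}

rates = item_nonresponse(surveys)
print(f"Unit response rate: {response_rate(len(surveys), invited):.0%}")
for item, rate in rates.items():
    # The 0.5 cutoff is arbitrary; pick one that suits your study.
    flag = "  <- look at wording or topic" if rate > 0.5 else ""
    print(f"{item}: {rate:.0%} blank{flag}")
```

An item that stands out from the others is the one to reread for ambiguous wording or a touchy topic.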

A bottom line I would draw from all this is respond…if it was important to you to participate in the program then it is important for you to provide feedback to the program implementation team/person.




Miscellaneous thought 1.

Yesterday, I had a conversation with a long time friend of mine.  When we stopped and calculated (which we don’t do very often), we realized that we have known each other since 1981.  We met at the first AEA (only it wasn’t AEA then) conference in Austin, TX.  I was a graduate student; my friend was a practicing professional/academic.  Although we were initially talking about other things evaluation, I asked my friend to look at an evaluation form I was developing.  I truly believe that having other eyes (a pilot if you will) view the document helps.  It certainly did in this case.  I feel really good about the form.  In the course of the conversation, my friend advocated strongly for odd-numbered scales.  My friend had good reasons, specifically:

1) It tends to force more comparisons on the respondents; and

2) If you haven’t given me a neutral point, I tend to mess up the scale on purpose because you are limiting my ability to tell you what I am thinking.

I, of course, had an opposing view (rule number 8–question authority).  I said, “My personal preference is an even number scale to avoid a mid-point.  This is important because I want to know if the framework (of the program in question) I provided worked well with the group and a mid-point would provide the respondent with a neutral point of view, not a working or not working opinion.  An even number (in my case four points) can be divided into working and not working halves.  When I’m offered a middle point, I tend to circle that because folks really don’t want to know what I’m thinking.  By giving me an opt out/neutral/neither for or against option they are not asking my opinion or view point.”
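The "working and not working halves" argument is easy to see in a few lines. This is a minimal Python sketch with hypothetical ratings; the function name and labels are invented for illustration. It collapses an even-numbered scale at its middle and refuses an odd-numbered one, since a midpoint belongs to neither half.

```python
from collections import Counter

# Hypothetical ratings on a 4-point scale: 1-2 = "not working", 3-4 = "working".
ratings = [4, 3, 2, 4, 1, 3, 3, 4]

def split_halves(scores, points=4):
    """Collapse an even-numbered scale into its lower and upper halves."""
    if points % 2:
        raise ValueError("an odd-numbered scale has a midpoint "
                         "and cannot be split into two equal halves")
    cutoff = points // 2
    return Counter("working" if s > cutoff else "not working" for s in scores)

halves = split_halves(ratings)
print(halves["working"], "working;", halves["not working"], "not working")
```

With a five-point scale you would have to decide what to do with the 3s, which is exactly the disagreement described above.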

Recently, I came across an aea365 post on just this topic.  Although this specific post was talking about Likert scales, it applies to all scaling that uses a range of numbers (as my friend pointed out).  The authors sum up their views with this comment, “There isn’t a simple rule regarding when to use odd or even, ultimately that decision should be informed by (a) your survey topic, (b) what you know about your respondents, (c) how you plan to administer the survey, and (d) your purpose. Take time to consider these four elements coupled with the advantages and disadvantages of odd/even, and you will likely reach a decision that works best for you.”  (Certainly knowing my friend like I do, I would be suspicious of responses that my friend submitted.)  Although they list advantages and disadvantages for odd and even responses, I think there are other advantages and disadvantages that they did not mention yet are summed up in their concluding sentence.

Miscellaneous thought 2.

I’m reading the new edition of Qualitative Data Analysis (QDA).  This has always been my go to book for QDA and I was very sad when I learned that both of the original authors had died.  The new author, Johnny Saldana (who is also the author of The Coding Manual for Qualitative Researchers), talks (in the third person plural, active voice) about being a pragmatic realist.  That is an interesting concept.  They (because the new author includes the previous authors in his statement) say “that social phenomena exist not only in the mind but also in the world–and that some reasonably stable relationships can be found among the idiosyncratic messiness of life.”  Although I had never used those exact words before, I agree.  It is nice to know the label that applies to my world view.  Life is full of idiosyncratic messiness; probably why I think systems thinking is so important.  I’m reading this volume because I’ve been asked to write the review of one of my favorite books.  We will see if I can get through it between now and July 1 when the draft of the review is due.  Probably ought to pair it with Saldana’s other book; won’t happen between now and July 1.

I was reminded recently about the 1992 AEA meeting in Seattle, WA.  That seems like so long ago.  The hot topic of that meeting was whether qualitative data or quantitative data were best.  At the time I was a nascent evaluator, having been in the field less than 10 years, and absorbed debates like this as a dry sponge does water.  It was interesting; stimulating; exciting.  It felt cutting edge.

Now, 20+ years later, I wonder what all the hype was about.  Now, there can be rigor in whatever data are collected, regardless of type (numbers or words); language has been developed to look at that rigor.  (Rigor can also escape the investigator regardless of the data collected; another post, another day.)  Words are important for telling stories (and there is a wealth of information on how story can be rigorous) and numbers are important for counting (and numbers have a long history of use–Thanks Don Campbell).  Using both (that is, mixed methods) makes really good sense when conducting an evaluation in community environments, work that I’ve done for most of my career (community-based work).

I was reading another evaluation blog (ACET) and found the following bit of information that I thought I’d share as it is relevant to looking at data.  This particular post (July, 2012) was a reflection of the author. (I quote from that blog).

  • Utilizing both quantitative and qualitative data. Many of ACET’s evaluations utilize both quantitative (e.g., numerical survey items) and qualitative (e.g., open-ended survey items or interviews) data to measure outcomes. Using both types of data helps triangulate evaluation findings. I learned that when close-ended survey findings are intertwined with open-ended responses, a clearer picture of program effectiveness occurs. Using both types of data also helps to further explain the findings. For example, if 80% of group A “Strongly agreed” to question 1, their open-ended responses to question 2 may explain why they “Strongly agreed” to question 1.

Triangulation was a new (to me at least) concept in 1981 when a whole chapter was devoted to the topic in a volume dedicated to Donald Campbell, titled Scientific Inquiry and the Social Sciences.  I have no doubt that this concept was not new; Crano, the author of this chapter titled “Triangulation and Cross-Cultural Research”, has three and one half pages of references listed that support the premise put forth in the chapter.  Mainly, that using data from multiple different sources may increase the understanding of the phenomena under investigation.  That is what triangulation is all about–looking at a question from multiple points of view; bringing together the words and the numbers and then offering a defensible explanation.

I’m afraid that many beginning evaluators forget that words can support numbers and numbers can support words.

The topic of survey development seems to be popping up everywhere–AEA365, Kirkpatrick Partners, eXtension Evaluation Community of Practice, among others.  Because survey development is so important to Extension faculty, I’m providing links and summaries.


 AEA365 says:

“… it is critical that you pre-test it with a small sample first.”  Real time testing helps eliminate confusion, improve clarity, and assures that you are asking a question that will give you an answer to what you want to know.  This is so important today when many surveys are electronic.

It is also important to “Train your data collection staff…Data collection staff are the front line in the research process.”  Since they are the people who will be collecting the data, they need to understand the protocols, the rationales, and the purposes of the survey.

Kirkpatrick Partners say:

“Survey questions are frequently impossible to answer accurately because they actually ask more than one question. ”  This is the biggest problem in constructing survey questions.  They provide some examples of asking more than one question.


Michael W. Duttweiler, Assistant Director for Program Development and Accountability at Cornell Cooperative Extension stresses the four phases of survey construction:

  1. Developing a Precise Evaluation Purpose Statement and Evaluation Questions
  2. Identifying and Refining Survey Questions
  3. Applying Golden Rules for Instrument Design
  4. Testing, Monitoring and Revising

He then indicates that the next three blog posts will cover points 2, 3, and 4.

Probably my favorite post on survey recently was one that Jane Davidson did back in August, 2012 in talking about survey response scales.  Her “boxers or briefs” example captures so many issues related to survey development.

Writing survey questions that give you usable data that answer your questions about your program is a challenge; it is not impossible.  Dillman wrote the book on surveys; it should be on your desk.

Here is the Dillman citation:
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2009).  Internet, mail, and mixed-mode surveys: The tailored design method.  Hoboken, NJ: John Wiley & Sons, Inc.

What is the difference between need to know and nice to know?  How does this affect evaluation?  I got a post this week on a blog I follow (Kirkpatrick) that asks how much data a trainer really needs.  (Remember that Don Kirkpatrick developed and established an evaluation model for professional training back in 1954 that still holds today.)

Most Extension faculty don’t do training programs per se, although there are training elements in Extension programs.  Extension faculty are typically looking for program impacts in their program evaluations.  Program improvement evaluations, although necessary, are not sufficient.  Yes, they provide important information to the program planner; they don’t necessarily give you information about how effective your program has been (i.e., outcome information). (You will note that I will use the term “impacts” interchangeably with “outcomes” because most Extension faculty parrot the language of reporting impacts.)

OK.  So how much data do you really need?  How do you determine what is nice to have and what is necessary (need) to have?  How do you know?

  1. Look at your logic model.  Do you have questions that reflect what you expect to have happen as a result of your program?
  2. Review your goals.  Review your stated goals, not the goals you think will happen because you “know you have a good program”.
  3. Ask yourself, How will I USE these data?  If the data will not be used to defend your program, you don’t need it.
  4. Does the question describe your target audience?  Although not demonstrating impact, knowing what your target audience looks like is important.  Journal articles and professional presentations want to know this.
  5. Finally, ask yourself, Do I really need to know the answer to this question or will it burden the participant?  If it is a burden, your participants will tend to not answer, and then you have a low response rate; not something you want.

Kirkpatrick also advises avoiding redundant questions.  That means questions asked in a number of ways that give you the same answer; questions written in positive and negative forms.  The other question that I always include, because it will give me a way to determine how my program is making a difference, is a question on intention including a time frame.  For example, “In the next six months do you intend to try any of the skills you learned today?  If so, which one?”  Mazmanian has identified that the best predictor of behavior change (a measure of making a difference) is stated intention to change.  Telling someone else makes the participant accountable.  That seems to make the difference.



Mazmanian, P. E., Daffron, S. R., Johnson, R. E., Davis, D. A., & Kantrowitz, M. P. (1998).  Information about barriers to planned change: A randomized controlled trial involving continuing medical education lectures and commitment to change.  Academic Medicine, 73(8).


P.S.  No blog next week; away on business.




Quantitative data analysis is typically what happens to data that are numbers (although qualitative data can be reduced to numbers, I’m talking here about data that start as numbers).  Recently, a library colleague sent me an article that was relevant to what evaluators often do–analyze numbers.

So why, you ask, am I talking about an article that is directed to librarians?  Although that article is directed at librarians, it has relevance to Extension.  Extension faculty (like librarians), more often than not, use surveys to determine the effectiveness of their programs.  Extension faculty are always looking to present the most powerful survey conclusions (yes, I lifted from the article title), and no, you don’t need to have a doctorate in statistics to understand these analyses.  The other good thing about this article is that it provides you with a link to an online, survey-specific calculator (Raosoft’s calculator at http://www.raosoft.com/samplesize.html).

This article refers specifically to three metrics that are often overlooked by Extension faculty:  margin of error (MoE), confidence level (CL), and cross-tabulation analysis.   These are three statistics which will help you in your work. The article also does a nice job of listing the eight recommended best practices which I’ve appended here with only some of the explanatory text.


Complete List of Best Practices for Analyzing Multiple Choice Surveys

1. Inferential statistical tests. To be more certain of the conclusions drawn from survey data, use inferential statistical tests.

2. Confidence Level (CL). Choose your desired confidence level (typically 90%, 95%, or 99%) based upon the purpose of your survey and how confident you need to be of the results. Once chosen, don’t change it unless the purpose of your survey changes. Because the chosen confidence level is part of the formula that determines the margin of error, it’s also important to document the CL in your report or article where you document the margin of error (MoE).

3. Estimate your ideal sample size before you survey. Before you conduct your survey use a sample size calculator specifically designed for surveys to determine how many responses you will need to meet your desired confidence level with your hypothetical (ideal) margin of error (usually 5%).

4. Determine your actual margin of error after you survey. Use a margin of error calculator specifically designed for surveys (you can use the same Raosoft online calculator recommended above).

5. Use your real margin of error to validate your survey conclusions for your larger population.

6. Apply the chi-square test to your crosstab tables to see if there are relationships among the variables that are not likely to have occurred by chance.

7. Reading and reporting chi-square tests of cross-tab tables.

  • Use the .05 threshold for your chi-square p-value results in cross-tab table analysis.
  • If the chi-square p-value is larger than the threshold value, no relationship between the variables is detected. If the p-value is smaller than the threshold value, there is a statistically valid relationship present, but you need to look more closely to determine what that relationship is. Chi-square tests do not indicate the strength or the cause of the relationship.
  • Always report the p-value somewhere close to the conclusion it supports (in parentheses after the conclusion statement, or in a footnote, or in the caption of the table or graph).

8. Document any known sources of bias or error in your sampling methodology and in your survey design in your report, including but not limited to how your survey sample was obtained.
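For readers who want to see what such calculators are doing, here is a minimal sketch in Python of practices 3, 4, and 6. It uses the standard textbook formulas for a proportion with a finite population correction, which is an assumption on my part and may differ in detail from the Raosoft tool or from the article. The population of 500 and the crosstab counts are hypothetical, and the chi-square p-value shortcut shown works only for a 2x2 table (one degree of freedom).

```python
import math
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)

def sample_size(population, margin=0.05, confidence_level=0.95, p=0.5):
    """Responses needed for a target margin of error (practice 3)."""
    z = z_score(confidence_level)
    n0 = z ** 2 * p * (1 - p) / margin ** 2          # infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite correction

def margin_of_error(responses, population, confidence_level=0.95, p=0.5):
    """Actual margin of error once the responses are in (practice 4)."""
    z = z_score(confidence_level)
    fpc = math.sqrt((population - responses) / (population - 1))
    return z * math.sqrt(p * (1 - p) / responses) * fpc

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 crosstab."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # exact for 1 degree of freedom
    return stat, p_value

# Hypothetical program with 500 participants.
needed = sample_size(500)
print("Responses needed:", needed)
print(f"Margin of error with {needed} responses: "
      f"{margin_of_error(needed, 500):.1%}")

# Hypothetical crosstab: rows = two groups, columns = yes / no.
stat, p_value = chi_square_2x2([[30, 20], [15, 35]])
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

As practice 7 says, a small p-value only tells you a relationship is there; you still have to read the table to see what it is.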


Bottom line:  read the article.

Hightower, C. & Kelly, S. (2012, Spring).  Infer more, describe less: More powerful survey conclusions through easy inferential tests.  Issues in Science and Technology Librarianship. DOI:10.5062/F45H7D64. [Online]. Available at: http://www.istl.org/12-spring/article1.html

Evaluation costs:  A few weeks ago, I posted a summary about evaluation costs. A recent AEA LinkedIn discussion was on the same topic (see this link).  If you have not linked to other evaluators, there are other groups besides AEA that have LinkedIn groups.  You might want to join one that is relevant.

New topic:  The video on surveys posted last week generated a flurry of comments (though not on this blog).  I think it is probably appropriate to revisit the topic of surveys.  As I decided to revisit this topic,  an AEA 365 post from the Wilder Research group talked about data coding related to longitudinal data.

Now, many surveys, especially Extension surveys, focus on cross sectional data not on longitudinal data.  They may, however, involve a large number of participants and the hot tips that are provided apply to coding surveys.  Whether the surveys Extension professionals develop involve 30, 300, or 3000 participants, these tips are important especially if the participants are divided into groups on some variable.  Although the hot tips in the Wilder post talk about coding, not surveys specifically, they are relevant to surveys and I’m repeating them here.   (I’ve also adapted the original tip to Extension use).

  • Anticipate different groups.  If you do this ahead of time, and write it down in a data dictionary or coding guide, your coding will be easier.  If the raw data are dropped, or for some other reason scrambled (like a flood, hurricane, or a sleepy night), you will be able to make sense out of the data quicker.
  • Sometimes there is preexisting identifying information (like location of the program) that has a logical code.  Use that code.
  • Precoding by the location sites helps keep the raw data organized and enables coding.
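The tips above can be sketched in a few lines of Python. Everything here is hypothetical: the site names, the three-letter codes, and the ID format are made up to show the idea of writing the coding guide down first and building the site into every record’s ID.

```python
# Hypothetical coding guide: program location -> site code.
SITE_CODES = {"Benton": "BEN", "Linn": "LIN", "Lane": "LAN"}

def participant_id(site, sequence):
    """Build an ID such as 'BEN-007' so every record carries its site."""
    return f"{SITE_CODES[site]}-{sequence:03d}"

def group_by_site(ids):
    """Regroup a scrambled pile of IDs using the site prefix."""
    groups = {}
    for pid in ids:
        groups.setdefault(pid.split("-")[0], []).append(pid)
    return {site: sorted(members) for site, members in groups.items()}

# A scrambled stack of records can be sorted back into sites.
scrambled = [
    participant_id("Linn", 2),
    participant_id("Benton", 7),
    participant_id("Linn", 1),
]
print(group_by_site(scrambled))
```

If the raw data get dropped or shuffled, the prefix alone is enough to put each record back with its group.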

Over the rest of the year, I’ll be revisiting survey on a regular basis.  Survey is often used by Extension.  Developing a survey that provides you with information you want, can use, and makes sense is a useful goal.

New topic:  I’m thinking of varying the format of the blog or offering alternative formats with evaluation information.  I’m curious as to what would help you do your work better.  Below are a few options.  Let me know what you’d like.

  • Videos in blogs
  • Short concise (i.e., 10-15 minute) webinars
  • Guest writers/speakers/etc.
  • Other ideas