Recently I’ve needed to plot compositional data by one or more groups. These are usually in the form of a categorical variable (ordered or not) and a binary variable to distinguish two groups; e.g., minority status or poverty (0/1). I was struggling to plot the categorical variable across the two groups so that the bars sum to 100% for each group. Let’s start with a simple example.
Data is from IPUMS. My data is (here) with setup (here). We have an exhaustive five-category grouping of family structure: (1) two adults, no working woman; (2) two adults with working woman; (3) single woman, not working; (4) single woman, working; and (5) single male.
Let’s say we want to examine poverty status across family structure. Poverty is measured using the US Census official poverty measure. We want to analyze the family structure compositions of the poor versus non-poor.
Let’s begin with a descriptive barchart of family structure using the catplot package from Nicholas J. Cox.
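Since catplot is a user-written command rather than part of official Stata, it has to be installed once before any of the code in this post will run. Assuming an internet connection, it should be available from SSC:

```stata
* One-time install of the user-written catplot command from SSC
ssc install catplot

* Optional: open the help file to see the available options
help catplot
```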
catplot fams, percent ///
ytitle("Percent") ///
title("Poverty and family structure") ///
subtitle("") ///
note("Source: IPUMS ACS") ///
ysize(3) blabel(bar, format(%9.1f))
graph export catplot1.png, replace
Now let’s add the poverty status.
catplot povstat fams, percent asyvars ///
ytitle("Percent") ///
legend(label(1 "Non-poor") label(2 "Poor")) ///
title("Poverty and family structure") ///
subtitle("") ///
note("Source: IPUMS ACS") ///
ysize(3) blabel(bar, format(%9.1f))
graph export catplot2.png, replace
This is not bad. However, the percentages are computed over the entire sample, so all of the bars together sum to 100%. We are usually looking for compositions within each group so that we can compare across groups.
catplot povstat fams, percent(povstat) ///
ytitle("Percent") ///
legend(label(1 "Non-poor") label(2 "Poor")) ///
title("Poverty and family structure") ///
subtitle("") ///
note("Source: IPUMS ACS") ///
ysize(3) blabel(bar, format(%9.1f))
graph export catplot3.png, replace
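One way to sanity-check the bar labels is to compare them against a plain two-way table. This is only a sketch, assuming the same unweighted fams and povstat variables used in the plots: cell percentages should match the whole-sample version, and column percentages should match the within-group version.

```stata
* Cell percentages: each cell as a share of the whole sample
* (all cells together sum to 100)
tabulate fams povstat, cell nofreq

* Column percentages: family structure shares within each poverty group
* (each povstat column sums to 100)
tabulate fams povstat, column nofreq
```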
Again, this is an improvement. But having to read the 0 and 1 labels next to each other is still a burden on the viewer. Can we get rid of those and add some color to facilitate comparison?
The option asyvars helps here.
catplot povstat fams, percent(povstat) asyvars ///
ytitle("Percent") ///
legend(label(1 "Non-poor") label(2 "Poor")) ///
title("Poverty and family structure") ///
subtitle("") ///
note("Source: IPUMS ACS") ///
ysize(3) blabel(bar, format(%9.1f))
graph export catplot4.png, replace
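A complementary approach to the 0/1 problem is to attach value labels to the variable itself, so that graphs and tables pick up the text rather than the codes. The sketch below assumes povstat is coded 0 for non-poor and 1 for poor, as described above; the label name povlbl is made up for illustration.

```stata
* Assumption: povstat is coded 0 = non-poor, 1 = poor
* povlbl is a hypothetical label name chosen for this example
label define povlbl 0 "Non-poor" 1 "Poor", replace
label values povstat povlbl

* Confirm that the labels are attached
tabulate povstat
```

With value labels in place, the legend(label(...)) option may no longer be needed, since the legend entries should come from the labels.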
David,
This was a good example of how scientists analyze and visualize their data. It’s also a great example of how social science is applied to understand real-world data. I got here starting with this article. Since you were quoted, I googled your name and found this blog. 🙂 I used to work as a research associate in Biochemistry at OSU and now I teach computer science after school to middle and high school students. I really appreciate how you showed your thinking step by step about how to solve this problem. It’s the kind of thinking I try to encourage in my students. Also, I’m always on the lookout for examples of real problems to which they can apply their budding coding and design skills.
Looks like I messed up my HTML tags in my comment above. Should have been: Nonprofit wants to help East Portland residents build wealth through strip mall. 😉