**Question Asked**

At this stage I asked several questions about the spatial distribution of population characteristics across all Oregon counties in 2014: What are the county-level spatial patterns of reported age-adjusted *Salmonella* rates? Of the proportion of females? Of median age? Of the proportion of infants and young children aged 0-4 years?

To answer these questions I used several datasets. The first is a collection of all reported *Salmonella* cases in Oregon from 2008-2017, which includes sex, age group, the county in which the case was reported, and onset of illness. This dataset was deidentified by the Oregon Health Authority. The second is a set of Oregon population estimates over the same period, with sex- and age-group-specific populations for each county. I also obtained county-level median ages from American FactFinder. The last dataset is a shapefile from the Oregon Spatial Data Library containing polygons for all Oregon counties.

**Names of analytical tools/approaches used**

I used a **direct age adjustment** (using the 2014 statewide population as the standard population) to obtain county-level age-adjusted *Salmonella* rates. After calculating county-level summary data (proportion of females, proportion of children aged 0-4, median age, and age-adjusted *Salmonella* rates), I merged this information with a spatial dataframe containing polygon data for every county in Oregon. I then ran both **local** (between 0-150 km) and **global** (statewide) **spatial autocorrelation** to get a **Moran’s I statistic** for each of the population variables listed above. I also produced **choropleth maps** of each variable for Oregon. Finally, I produced a **heatmap** of county-level age-adjusted *Salmonella* rates using a **Getis-Ord Gi*** **local statistic** to evaluate statistically significant clustering of high/low rates of reported *Salmonella* cases.

**Description of the analytical process**

After extensive reformatting, I organized the 2014 *Salmonella* cases by age group and by county, and then formatted the 2014 county-level population estimates the same way. Dividing the case dataframe by the population dataframe gave rates for each age group in each county. To get county age-adjusted rates I created a “standard population”; in this case I used Oregon’s statewide population broken down into the same age groups as above. I then multiplied each county’s age-specific rates by the matching age groups of the standard population to create a dataframe of hypothetical cases. This dataframe represents the number of cases we would expect in each county if it had the same population and age distribution as Oregon as a whole. I summed the expected *Salmonella* cases by county and divided this number by the 2014 statewide population, which yielded age-adjusted reported *Salmonella* rates by county.
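The direct adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the age groups, counts, county names, and the per-100,000 scaling are all made-up assumptions, and the helper `age_adjusted_rate` is hypothetical.

```python
# Made-up standard population by age group (stands in for the statewide counts).
standard_pop = {"0-4": 1000, "5-19": 3000, "20-59": 5000, "60+": 1000}

# Made-up county cases and populations by age group (not the Oregon data).
county_cases = {
    "County A": {"0-4": 2, "5-19": 3, "20-59": 5, "60+": 1},
    "County B": {"0-4": 0, "5-19": 1, "20-59": 2, "60+": 3},
}
county_pop = {
    "County A": {"0-4": 100, "5-19": 300, "20-59": 500, "60+": 100},
    "County B": {"0-4": 50, "5-19": 100, "20-59": 400, "60+": 450},
}


def age_adjusted_rate(cases, pop, standard, per=100_000):
    """Directly age-adjusted rate for one county.

    Each age-specific rate is applied to the standard population to get
    expected cases, which are summed and divided by the standard total.
    The `per` scaling is an assumption; the source does not state one.
    Assumes every age group has a non-zero county population.
    """
    expected_cases = 0.0
    for group, standard_n in standard.items():
        age_specific_rate = cases[group] / pop[group]
        expected_cases += age_specific_rate * standard_n
    return expected_cases / sum(standard.values()) * per


adjusted = {
    county: age_adjusted_rate(county_cases[county], county_pop[county], standard_pop)
    for county in county_cases
}
crude = {
    county: sum(county_cases[county].values()) / sum(county_pop[county].values()) * 100_000
    for county in county_cases
}
```

In this toy data, County A has the same age distribution as the standard population, so its adjusted and crude rates coincide. County B is older than the standard, so its adjusted rate differs from its crude rate.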

Because the population data were broken down by age group and by sex at the county level, I was able to calculate the proportion of each county’s population that was female, and the proportion that was young children aged 0-4 years, by dividing those group populations by the total county population.

I then performed local and global spatial autocorrelation with Moran’s I on county-level median age, proportion of children, proportion of females, and age-adjusted *Salmonella* rates, each associated with the centroid point of its county. The global Moran’s I was calculated over the entire extent of the state, and the local Moran’s I was calculated by limiting the analysis to locations within 150 km of each centroid. Both statistics were calculated using the Monte Carlo method with 599 simulations.
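To make the Monte Carlo step concrete, here is a minimal pure-Python sketch of Moran’s I with a permutation test. It assumes binary distance-band weights (1 if two centroids are within the threshold, 0 otherwise), which the write-up does not specify. The 4 x 4 grid of points spaced 100 km apart and the function names are hypothetical stand-ins for the county centroids and the software actually used.

```python
import math
import random


def distance_band_weights(points, threshold):
    """Binary weights: w[i][j] = 1 when j is within `threshold` of i (and i != j)."""
    n = len(points)
    return [
        [1.0 if i != j and math.dist(points[i], points[j]) <= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def morans_i(values, w):
    """Moran's I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def moran_mc(values, w, nsim=599, seed=0):
    """Permutation test: shuffle values across locations and recompute I.

    Returns the observed I and a one-sided pseudo p-value for clustering,
    (count of simulated I >= observed I, plus 1) / (nsim + 1).
    """
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(nsim):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (nsim + 1)


# Hypothetical centroids on a 4 x 4 grid, 100 km apart, with a west-east gradient.
points = [(100.0 * col, 100.0 * row) for row in range(4) for col in range(4)]
values = [float(col) for row in range(4) for col in range(4)]

weights_150km = distance_band_weights(points, threshold=150.0)
observed_i, p_value = moran_mc(values, weights_150km, nsim=599)
```

With 599 simulations the smallest attainable pseudo p-value is 1/600, so the number of simulations sets the resolution of the reported p-values.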

Finally, I completed a hot spot analysis using Getis-Ord Gi* to look for statistically significant hot or cold spots in Oregon. This was done only for the age-adjusted *Salmonella* rates, using the same county centroid points as above. I used a local weights matrix with queen adjacency for neighbor connectivity, and the weights were row-standardized so that each county’s neighbor weights sum to 1.
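The Gi* calculation can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: it uses the standardized (z-score) form of Gi*, counts each unit as its own neighbor (as the star variant does by definition), and row-standardizes after adding the focal unit. A hypothetical 5 x 5 grid with made-up values stands in for the county polygons, and both function names are invented for the example.

```python
import math


def queen_neighbors(nrows, ncols):
    """Queen contiguity on a regular grid: cells sharing an edge or a corner."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            neighbors[r * ncols + c] = [
                rr * ncols + cc
                for rr in range(max(r - 1, 0), min(r + 2, nrows))
                for cc in range(max(c - 1, 0), min(c + 2, ncols))
                if (rr, cc) != (r, c)
            ]
    return neighbors


def gi_star(values, neighbors):
    """Standardized Getis-Ord Gi* with row-standardized weights.

    Assumes no unit neighbors every other unit (that would make the
    denominator zero) and that the values are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        members = set(neighbors[i]) | {i}   # Gi* includes the focal unit
        w = 1.0 / len(members)              # each row of weights sums to 1
        sum_w = w * len(members)
        sum_w_sq = w * w * len(members)
        weighted_sum = sum(w * values[j] for j in members)
        spread = s * math.sqrt((n * sum_w_sq - sum_w ** 2) / (n - 1))
        scores.append((weighted_sum - mean * sum_w) / spread)
    return scores


# Made-up rates: a block of high values in one corner of the grid.
grid_values = [10.0 if (i // 5 < 2 and i % 5 < 2) else 1.0 for i in range(25)]
scores = gi_star(grid_values, queen_neighbors(5, 5))

# Conventional two-sided normal cutoffs: |z| > 1.96 (95%) and |z| > 1.645 (90%).
hot_spots_95 = [i for i, z in enumerate(scores) if z > 1.96]
```

Because the scores are read as z-scores, a unit is flagged when its neighborhood average is far from the overall mean relative to what its number of neighbors would allow by chance.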

**Brief description of results you obtained**

**Choropleth Maps of Oregon:**

From the median age map, we can see that there are some clusters of older counties in the northeastern portion of the state and along the west coast. Overall, the western portion of Oregon is younger than the eastern portion of the state.

From the proportion of children map, we can see a few clusters of counties in the northern portion of the state with high proportions of children. Overall, the counties surrounding the Portland metro area have higher proportions of children than the rest of the state.

From the proportion of females map, we can see that the counties with the highest proportion of females are clustered in the western portion of the state.

Finally, from the age-adjusted county *Salmonella* rates map, we can see that the highest rates of *Salmonella* occur mostly in the western portion of the state, with a few counties in the northeast having high rates as well. Overall, the counties surrounding Multnomah County have the highest rates of *Salmonella*.

**The global Moran’s I statistics:**

- County proportions of females: 0.053 with a p-value of 0.15. This suggests slight clustering that is not statistically significant.
- County median age: 0.175 with a p-value of 0.02. This provides evidence of mild, statistically significant clustering.
- County proportions of children: 0.117 with a p-value of 0.05. This provides evidence of mild, statistically significant clustering.
- County age-adjusted *Salmonella* rates: -0.007 with a p-value of 0.32. This value is close to zero and not statistically significant, so it gives no evidence of either clustering or dispersion.

**Local Moran’s I Statistics:**

- County proportions of females: 0.152 with a p-value of 0.02. This suggests mild, statistically significant clustering.
- County median age: 0.110 with a p-value of 0.07. This suggests mild clustering that is not statistically significant.
- County proportions of children: 0.052 with a p-value of 0.1617. This suggests slight clustering that is not statistically significant.
- County age-adjusted *Salmonella* rates: -0.032 with a p-value of 0.5083. This value is close to zero and not statistically significant, so it gives no evidence of either clustering or dispersion.
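One reference point may help when reading the slightly negative *Salmonella* values. Under the null hypothesis of no spatial autocorrelation, the expected value of Moran’s I is slightly below zero rather than exactly zero. Assuming all 36 Oregon counties were included in the analysis (so n = 36):

```latex
E[I] = -\frac{1}{n - 1} = -\frac{1}{35} \approx -0.029
```

Both the global (-0.007) and local (-0.032) statistics for age-adjusted rates sit close to this expectation, which is consistent with their large p-values.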

**Getis-Ord Gi*:**

- The heatmap shows a significant hot spot (with 95% confidence) in Clackamas County and another hot spot (with 90% confidence) in Hood River County. Three cold spots (with 90% confidence) are seen in Malheur, Crook, and Morrow counties.

**Critique of Methods**

The choropleth maps were very useful for showing areas with high/low values; however, this method could not detect counties with significantly different values compared with their neighbors. Overall, it was useful as an exploratory tool. The global and local Moran’s I calculations were able to detect whether high/low values were more clustered or more dispersed than expected. However, I am unsure whether this method was completely appropriate given the coarseness of the county-level data. At the local scale, only the proportion of females showed significant clustering, and globally median age and proportion of children showed some significant clustering. Given that most of the Moran’s I statistics were not statistically significant, I don’t believe this method highlighted a particularly meaningful spatial pattern in my data. The heatmap provided evidence of some significant hot and cold spots in Oregon; however, this was based on immediate-neighbor weights, and perhaps global weights would be more appropriate. Overall, this tool was very useful for detecting significantly high/low *Salmonella* rates.

jonesju — April 27, 2019 @ 5:16 pm

Seth, good progress. The Moran’s I and hotspot analysis draw our attention to the Portland area for higher adjusted rates of Salmonella. As you note, the ability of spatial pattern analysis to reveal patterns in county-level data is limited because of its coarse spatial resolution. Perhaps for Ex 2 you would like to start looking at maps of change over time? Or do you have access to finer-scale data of some of your possible driver variables, so that you could ask how the spatial pattern of certain factors might influence Salmonella cases? Do you have data on income? Or the proportion of times a family eats out each week? What hypotheses do you have about the factors that cause Salmonella?