- Question Asked
At this stage I asked several questions regarding the spatial distribution of population characteristics in all counties in Oregon in 2014: What are the county level spatial patterns of reported age-adjusted Salmonella rates within Oregon in 2014? County level spatial patterns of proportions of females? Median Age? Proportion of infants/young children aged 0-4 years?
To answer these questions I used several different datasets. The first dataset used is a collection of all reported Salmonella cases in Oregon from 2008-2017 which includes information like sex, age group, county in which the case was reported, and onset of illness. The information in this dataset was deidentified by Oregon Health Authority. The second dataset used was a collection of Oregon population estimates over the same time period. This dataset includes sex and age group specific county level population information. I also obtained county level median ages from AmericanFactFinder. The last dataset used is a shapefile from the Oregon Spatial Data Library containing polygon information of all Oregon counties.
- Names of analytical tools/approaches used
I used a direct age adjustment (using the 2014 statewide population as the standard population) to obtain county level age-adjusted Salmonella rates. After calculating county level summary data e.g. proportion of females, proportion of children aged 0-4, median age, and age-adjusted Salmonella rates, I merged this information with a spatial dataframe containing polygonal data of every county in Oregon. After doing this I did both local (between 0-150 km) and global (statewide) spatial autocorrelation to get a Moran’s I statistic for each of the population variables listed above. I produced choropleth maps of each of the variables for Oregon as well. Finally, I produced a heatmap for county-level age-adjusted Salmonella rates using a Getis-Ord Gi* local statistic to evaluate statistically significant clustering of high/low rates of reported Salmonella cases.
- Description of the analytical process
After extensive reformatting, I was able to organize cases of Salmonella by age group and by county for the year 2014. After this I formatted 2014 county level population estimates in the same way. I then divided the Salmonella case dataframe by the population estimate dataframe to get rates by the different age groups. To get county age-adjusted rates I created a “standard population”, in this case I used Oregon’s statewide population broken down into the same age groups as above. I then multiplied the each of the county’s age-specific rates by the standard population’s matching age groups to create a dataframe of hypothetical cases. This dataframe represents the number of cases we would expect in each of the counties if they had the same population and age distribution as Oregon as a whole. I summed the expected Salmonella cases by county and divided this number by the 2014 statewide population. This yielded age-adjusted reported Salmonella rates by county.
Given that the population data contained county level populations broken down by age group and by sex I was able to calculate proportions of county populations which were female, and which were young children aged 0-4 years by dividing those respective group populations by the total county population.
After this I performed local and global spatial autocorrelation with Moran’s I using the county level median age, proportion of children, proportion of females, and age adjusted Salmonella rates which were associated with centroid points for each county. The global Moran’s I was calculated using the entire extent of the state and the local Moran’s I was calculated by limiting analysis to locations within 150 km of the centroid. Both global and local Moran’s I statistics were calculated using the Monte-Carlo method with 599 simulations.
Finally, I completed a Hot Spot Analysis using Getis-Ord Gi* to assess for any statistically significant hot or cold spots in Oregon. This was only done for the age-adjusted Salmonella rates. This was completed using the same county centroid points as above. I completed this analysis with a local weights matrix using Queen Adjacency for neighbor connectivity. The weighting scheme was set to where all neighbor weights when added together equaled 1.
- Brief description of results you obtained
Choropleth Maps of Oregon:
From the median age map, we can see that there are some clusters of older counties in the northeastern portion of the state and along west coast. Overall, the western portion of Oregon is younger than the eastern portion of the state.
From the proportion of children map there are a few clusters of counties in the northern portion of the state with high proportions of children compared to the rest of the state. Overall, the counties surrounding the Portland metro area have higher proportions of children compared to the rest of the state.
From the proportion of females map, we can see that the counties with the highest proportion of females are clustered in the western portion of the state.
Finally, from the age-adjusted county Salmonella rates map we can see that the highest rates of Salmonella occur mostly in the western portion of the state with a few counties in the northeast having high rates as well. Overall, the counties surrounding Multnomah county have the highest rates of Salmonella.
The global Moran’s I statistics:
- County proportions of females: 0.053 with a p-value of 0.15. This suggests insignificant amounts of slight clustering.
- County median age: 0.175 with a p-value of 0.02. This provides evidence of some significant mild clustering.
- County proportions of children: 0.117 with a p-value of 0.05. This provides evidence of significant mild clustering
- County age-adjusted Salmonella rates: -0.007 with a p-value of 0.32. This suggests insignificant amounts of higher dispersal than would be expected.
Local Moran’s I Statistics:
- County proportions of females: 0.152 with a p-value of 0.02. This suggests significant amounts of mild clustering.
- County median age: 0.110 with a p-value of 0.07. This provides evidence of some insignificant mild clustering.
- County proportions of children: 0.052 with a p-value of 0.1617. This provides evidence of insignificant slight clustering
- County age-adjusted Salmonella rates: -0.032 with a p-value of 0.5083. This suggests insignificant amounts of higher dispersal than would be expected.
- The heatmap shows a significant hotspot (with 95% confidence) in Clackamas county with another hotspot (with 90% confidence) in Hood River County. Three cold spots (with 90% confidence) are seen in Malheur, Crook, and Morrow counties.
- Critique of Methods
The choropleth maps were very useful at showing areas with high/values however this method was not able to detect counties with significantly different values compared their neighbors. Overall, it was useful as an exploratory tool. The global and local Moran’s I calculations were able to detect if high/low values were closely clustered or more dispersed than what is expected. However, I am unsure if this method was completely appropriate given the coarseness of this county level data. At a local scale, only the proportion of women showed a significant amount of clustering, and globally median age and proportion of children showed some amount of significant clustering. Given that most of the Moran’s I statistics were not associated with significant values, I don’t believe this analytical method highlighted a particularly meaningful spatial pattern in my data. The heatmap provided evidence of some significant hot and cold spots in Oregon, however this was based on immediate neighbor weights and perhaps global weights would be more appropriate. Overall, this tool was very useful in detecting significantly high/low Salmonella rates.