**1. Research question.**

For this problem I want to answer:

How do socio-demographic and spatial variables explain patterns of survey responses about attitudes toward flood safety in neighborhoods?

**2. Description of the dataset.**

The dataset for the study has the following characteristics:

- Data were obtained from a voluntary household survey.
- Convenience sampling of households located within FEMA’s 100-year and 500-year flood hazard map, all surveyed at once.
- Each participant answers a questionnaire of socio-demographic questions and predefined questions about intentions.
- A printed, coded survey questionnaire was mailed out to residents. The code is used to identify resident addresses.
- 103 variables have been collected.
- Most of the variables are categorical.

An example of a typical variable to be analyzed:

**Variable:** Suppose your current home was to flood, how confident are you in the following possible conditions? – I will be able to evacuate my home before flooding begins (this variable is expressed in 5 **categories** of **discrete values without hierarchy**).

**Categories:**

- Very confident
- Confident
- Neutral
- Somewhat confident
- Not confident at all

The spatial data consist of a map identifying land properties within the boundaries of FEMA’s 100-year and 500-year flood hazard map, as shown in Figure 1. The survey was mailed out to randomly selected properties within these map boundaries.

Figure 1. Map of affected South Corvallis properties according to FEMA’s 100-year and 500-year flood categories.

**3. Hypotheses:**

Attitudes regarding flood-safety in neighborhoods are clustered according to socio-demographic and spatial factors.

**4. Approaches:**

After testing various geospatial approaches, such as hotspot analysis, kriging, and others, I ended up clustering my data using spline interpolation. For clustering, the category values of each variable were taken as the “Z” values, and each variable was interpolated independently.
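The idea of using category codes as “Z” values can be illustrated with a small sketch. It fits a plain thin-plate spline with NumPy, which is not necessarily the spline variant the GIS tool applies, and it uses made-up coordinates and codes. The function names `tps_kernel`, `fit_tps`, and `predict_tps` are hypothetical.

```python
import numpy as np


def tps_kernel(r):
    """Thin-plate spline radial basis r^2 * log(r), defined as 0 at r = 0."""
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out


def fit_tps(xy, z):
    """Solve for radial weights plus an affine term so the surface passes through every point."""
    n = len(xy)
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    P = np.hstack([np.ones((n, 1)), xy])      # affine part: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = tps_kernel(dists)
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.concatenate([z, np.zeros(3)])
    return np.linalg.solve(A, b)


def predict_tps(xy, coef, query):
    """Evaluate the fitted surface at the query locations."""
    n = len(xy)
    dists = np.linalg.norm(query[:, None, :] - xy[None, :, :], axis=2)
    return tps_kernel(dists) @ coef[:n] + coef[n] + query @ coef[n + 1:]


# Made-up household locations and answer codes (the "Z" values).
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
               [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
z = np.array([1.0, 2.0, 4.0, 5.0, 3.0, 4.0])

coef = fit_tps(xy, z)

# Interpolate onto a regular 5 x 5 grid covering the study area.
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
grid = np.column_stack([gx.ravel(), gy.ravel()])
surface = predict_tps(xy, coef, grid).reshape(gx.shape)
print(np.round(surface, 2))
```

The interpolated surface takes fractional values between category codes, so reading it as a map of categories requires rounding or classifying the result into bands.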

Later, Principal Component Analysis (PCA) and Factor Analysis (FA) are applied to the collected data in order to find multivariate clustering.

Figure 2. Principal component (http://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues).

**5. Results:**

I found statistical relationships among the categorical survey variables, according to their spatial location, that define pattern formation. I also produced maps of these relationships within the FEMA 100-year and 500-year flood hazard map of Figure 1.

Bootstrapped confidence intervals for raw and composite correlations were computed using R. The result of this analysis using the Spearman correlation is shown in Figure 3.
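The bootstrap idea can be sketched for a single pair of variables. The analysis itself was done in R. This Python version is only an illustration on made-up answer codes, it covers raw pairwise correlations only, and it uses a percentile interval, which is an assumption about the interval type. The function names are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the Spearman correlation."""
    rng = np.random.default_rng(seed)
    n = len(x)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample respondents with replacement
        stats[b] = spearman(x[idx], y[idx])
    # nanpercentile skips any resample in which one variable was constant.
    lo, hi = np.nanpercentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi


# Made-up answer codes (1 to 5) for two related questions, 60 respondents.
rng = np.random.default_rng(42)
x = rng.integers(1, 6, size=60)
y = np.clip(x + rng.integers(-1, 2, size=60), 1, 5)

rho = spearman(x, y)
lo, hi = bootstrap_ci(x, y)
print(f"Spearman rho = {rho:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Resampling whole respondents, not individual answers, keeps each pair of answers together, which is what preserves the correlation structure in every bootstrap sample.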

Figure 3. Bootstrapped confidence intervals for raw and composite correlations for 108 variables

Using spline interpolation, different clusters were found by analyzing each variable independently. The category values of each variable were taken as the “Z” values for the interpolation. For instance, the “Perception of flooding extent” figure below shows four categories interpolated spatially. The other variables were studied similarly.

Statistical analysis of the collected data was performed using R. The figure “Self-confidence in being flood-ready” below compares people’s answers. Five variables are analyzed, with answer categories from 1 to 5, where 1 is very confident and 5 is not confident at all. People are more confident about evacuating their homes in time, and they have low confidence in flood insurance coverage. The other variables were studied similarly.

A preliminary examination of the collected data indicates that 27 variables are suitable for multivariate clustering. Principal Component Analysis (PCA) and Factor Analysis (FA) are applied to analyze the collected data. The figure “Eigenvalues of principal factors with 27 variables” shows factors with variances (eigenvalues) greater than 1. This indicates that multivariate clustering can be performed. A two-factor analysis gives the loadings for the clustered variables.
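The eigenvalue-greater-than-1 rule (often called the Kaiser criterion) can be illustrated with a short sketch. The analysis itself was done in R on the 27 survey variables. This Python version uses synthetic continuous data with six variables driven by two made-up latent factors, so the numbers it prints say nothing about the survey.

```python
import numpy as np

# Synthetic data: 200 respondents, 6 variables, 2 hidden factors (assumption).
rng = np.random.default_rng(0)
n_respondents = 200
latent = rng.normal(size=(n_respondents, 2))
noise = 0.5 * rng.normal(size=(n_respondents, 6))
X = np.hstack([
    np.repeat(latent[:, [0]], 3, axis=1),   # variables 1-3 follow factor 1
    np.repeat(latent[:, [1]], 3, axis=1),   # variables 4-6 follow factor 2
]) + noise

# PCA on the correlation matrix: eigenvalues are the component variances.
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]           # sort from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep components whose eigenvalue exceeds 1, i.e. those that explain
# more variance than a single standardized variable.
n_retained = int(np.sum(eigvals > 1.0))

# Unrotated loadings: eigenvectors scaled by the square root of the eigenvalue.
loadings = eigvecs[:, :n_retained] * np.sqrt(eigvals[:n_retained])

print("Eigenvalues:", np.round(eigvals, 2))
print("Components retained:", n_retained)
print("Loadings:\n", np.round(loadings, 2))
```

Variables that load strongly on the same component tend to move together, which is the sense in which the loadings group variables into clusters. Survey answers are ordinal, so treating their codes as continuous, as this sketch does, is itself an assumption.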

**6. Significance:**

This research will contribute to policy and decision making for neighborhood adaptation to climate change. Adaptation policies can be better addressed by identifying residents’ attitude patterns. Identifying patterns of attitudes regarding flood safety in neighborhoods is important for planning adaptation to different flooding scenarios in order to minimize personal risks.

**7. Learning about geospatial software:**

I have improved my knowledge of ArcGIS, ModelBuilder and/or GIS programming in Python, all for geospatial processing. In addition, I have improved my knowledge of R for statistical analysis.

**8. Learning about statistics:**

I have added Principal Component Analysis (PCA) and Factor Analysis (FA) to my current statistical knowledge.

**9. Answer to comments:**

Primarily, I have modified my project title. This change was based on the change in my research question. I have also tried different methods in order to make my categorical variables suitable for clustering.

__References__

Getis, A., & Ord, J. K. (1992). The Analysis of Spatial Association by Use of Distance Statistics. *Geographical Analysis*, *24*(3), 189–206. https://doi.org/10.1111/j.1538-4632.1992.tb00261.x

IBM Knowledge Center – Estimation Methods for Replacing Missing Values. (n.d.). Retrieved May 8, 2017, from https://www.ibm.com/support/knowledgecenter/en/SSLVMB_20.0.0/com.ibm.spss.statistics.help/replace_missing_values_estimation_methods.htm