__Background:__

The emission of Carbon Dioxide (CO_{2}) is one of the greatest factors causing climate change. Forests, as the largest terrestrial carbon reservoir have an important role to reduce CO_{2} emissions and prevent climate change. Considering the important role of forests to reduce CO_{2} emissions and prevent climate change, it is necessary to manage forests sustainably. In sustainable forest management, forest manager must frequently monitor and measure forest structure parameters, such as number of trees, tree locations, tree heights, crown width, biomass, and forest carbon. However, these forest structure parameter are usually measured through conventional field measurements, which are costly and labor intensive. New technology, such as Unmanned Aircraft Systems (UAS) can be used as alternative methods to monitor and assess forest structure parameters. There are some programs designed to process UAS visual imagery data and measure forest structure parameters from the data, for example Agisoft PhotoScan, Fusion, ArcMap, and RStudio.

__Research question:__

Considering the statements on the background, I dicided to assess forest structure parameters (crown width, height, and number of trees) in a particular part of Siberut National Park based on UAS visual imagery data. The research questions are: (1). How the crown width and height vary in space in the study area, and are they autocorrelated? (2). Are Crown Width and Tree Height linearly related in space in the study area?

__Dataset:__

This project used Unmanned Aircraft System (UAS) visual imagery data that was taken from Sony camera NEX-5R mounted in a fix wing UAS. The data was gathered in Siberut National Park in West Sumatra Province, Indonesia. It was a part of my Advisor Professor Michael Wing’s project. There are 3 flights with 300 images in each flight. The image resolution is 350 x 350 inch. The images were taken in a flight with 80% sideward overlap and 80% forward overlap. I focused on the UAS visual imagery data in the first flight. The UAV visual images were processed through some software. Agisoft photoscan was used to make a 3D point cloud based on the UAS visual images. Then, the 3D point cloud was used in Fusion software to derive forest structure measurements, such as number of trees, tree positions, tree height, and crown width. One of my labmate did this data processing. The result of this data processing is a data set (points) consisting of 804 detected trees with their height and crown width (see figure 1). Unfortunately, because the images were not georeferenced yet for this process, the height in this project is the sum of tree height and elevation (ground height).

__Hypotheses:__

I hypothesize that crown width and height vary in space in the study area. There are trees that have small crown and some other that have bigger crown because differences in species and ages. In this project, the variation of the height will be affected mostly due to the elevation which is included in the height. Furthermore, I hypothesize that crown width is not autocorrelated, while height is autocorrelated since height is the total of tree height and elevation. I also hypothesize that crown width and height are linearly related. The linear relationship can be positive or negative due to some factors, such as differences in elevation and species.

__Approaches:__

In general, I use modeling approach to answer the research questions. To answer the first research question, I did variogram analysis in RStudio. Furthermore, I did Geographically Weighted Regression (GWR) in ArcMap to answer the second research question. Description of the analysis or tool and the steps are given below:

**Variogram Analysis**

I use variogram analysis to assess how the height and crown width vary in space in the study area, and whether they are autocorrelated or not. The variogram is defined as the variance of the difference between field values at two locations x and y across realizations of the field (Cressie 1993).

The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. Most variograms are defined through several parameters; namely, the nugget effect, sill, and range.

- Nugget effect represents micro-scale variation or measurement error. It is estimated from the empirical variogram as the value of y(h) for h = 0.
- Sill representing the variance of the random field.
- Range is the distance (if any) at which data are no longer autocorrelated.

In Rstudio, I used the R codes that are stated below to do the variogram analysis. First, I need to install the “gstat” package and load the package using the following R codes:

> install.packages(“gstat”)

> library(gstat)

Next, to load my data into the RStudio, I have to set my working directory. It can be done through the “session” toolbar. Then, I just need to write some R codes to do variogram analysis. The first variogram analysis is for crown width and space (x and y). The second variogram analysis is for height and location (x and y). All the R codes are given below.

Treelist1 <- read.table(“D:/Oregon State University/Term/Spring 2017/GEOG 566/Exercises/Exercise 2/treelist1.csv”, sep=”,”, header=T)

g1 <- gstat(id = “Max.crown.width”, formula = Max.crown.width~1, locations = ~X+Y, + data = Treelist1)

> plot(variogram(g1))

g2 <- gstat(id = “Height”, formula = Height~1, locations = ~X+Y, + data = Treelist1)

> plot(variogram(g2))

**Geographically Weighted Regression (GWR) Analysis**

I use Geographically Weighted Regression (GWR) tool in ArcMap to assess the linear relationship between crown width and height in space in the study area. Since the data has not been georeferenced, the Height is the sum of the tree height and the elevation (ground height).

GWR constructs a separate equation for every feature in the dataset incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The shape and extent of the bandwidth is dependent on user input for the Kernel type, Bandwidth method, Distance, and Number of neighbors’ parameters with one restriction: when the number of neighboring features would exceed 1000, only the closest 1000 are incorporated into each local equation. GWR should be applied to datasets with several hundred features for best results. It is not an appropriate method for small datasets.

The first step is load the dataset into the ArcMap. The data set is in shape file (points) consisting of 804 detected trees with their location, height and crown width (see figure 3). As stated before, because the images were not georeferenced yet, the height is the sum of tree height and elevation.

The second step is open the Geographically Weighted Regression (GWR) tool. It will appear like the figure 3 below. Then, we need to choose the Input features, dependent variable, and explanatory variable. We can choose more than one explanatory variable, but for this project I only choose one. My dependent variable is Height (Ground Height + Tree Height), and my explanatory variable is crown width. Both are continuous numerical varibales.

The next step, we can choose the kernel type, bandwidth method, and number of neighbors. By default, the kernel type is “Fixed”; the Bandwidth method is AICc; Number of neighbors is 30. I decided to use the default format because I think it is already appropriate for my dataset. As it is for my dataset, it is especially important to select Fixed for Kernel type whenever distance is a component of the analysis. To finish the analysis just click “Ok”.

__Results:__

As given below, the results of the variogram analysis and geographically weighted regression (GWR) analysis are shown in the graphs, tables (statistical relationship), and new layers (shape file). To better understand the results, the description is given below:

**The Result Description of Variogram Analysis**

Based on the variogram graph of crown width (figure 3), in general crown width is not autocorrelated although there are some patches with different nugget, sill, and range. In the first part of the graph, there is a patch with 25 range (distance) which is relatively larger than the other range. I think it can be happened due to different size of crown width in the study area. It seems like there are some trees with big crown and some other with smaller crown that are clustered. Although quite rare, 25 m which indicates the size of the crown width is reasonable because there are some tree species in tropical forest that has even more than 25 m crown width.

Moreover, based on the variogram graph of height (see figure 4), it can be seen that the semivariance is incremental with the increase of range (distance). It means that height is autocorrelated. I think that can happen because of the height data which is the sum of tree height and elevation. Since the elevation is included in the height data, the elevation or the topography of the study area will influence the variance and the variogram analysis. Therefore, it seems like the study area is on a slope. As known, an autocorrelated data is not good for some statistical analysis that requires independent assumption. I can do some effort to ensure the independent assumption of the data or do some adjustment before doing further statistical analysis (e.g. calculating confidence interval). The effort that I can do is “adjusting the standard errors”. This is really important for autocorrelated data because the usual standard error will overestimate the precision when there is positive autocorrelation.

** **

**The Result Description of Geographically Weighted Regression (GWR)**

We can see the whole result by looking at the attribute table which shows the statistical relationship between crown width and height in space in the study area (see figure 5). By default ArcMap will show the Residual Standard Deviation (RSD) for each feature. It is shown in a new layer with different colors which show the level or range of the RSD (see figure 6).

The attribute table consists of all the value related to regression analysis. There are observed and predicted value, coefficient of the explanatory variable, intercept, standard error, residual standard deviation, etc. From the result, in general, Crown width and Height have positive linear relationship. Which means there is an increase in height for every one unit increase in Crown width. In other words, the bigger the crown the higher the tree will be.

Figure 5. Attribute table of Geographically Weighed Regression result.

Figure 6. Distribution of Residual Standard Deviation in study area.

However, it is more interesting and useful to see the Coefficient of the explanatory variable instead of Residual Standard Deviation (RSD). Therefore, we need to make some changes in the layer properties. To open the layer properties, right click on the layer and choose “properties”. In the layer properties, particularly in the “Symbology”, we can change the field value from Residual Standard Deviation (RSD) into the Coefficient of the explanatory variable (see figure 7). We also can change the number of classes and color. In my case, I chose three classes to distinguish features (trees) that have positive linear relationship, negative linear relationship, and trees that have coefficient (linear relationship) close to zero. The layer after adjustment can be seen in the figure 8.

Figure 7. Layer properties:Symbology.

Figure 8. Distribution of coefficient of the explanatory variable.

However, if we see the result for each individual feature (tree), some of the trees have positive linear relationship between crown width and height (Ground height + Tree height), and some other trees have negative linear relationship. The distribution of trees that have positive and negative linear relationship can be seen in Figure 8. The red points indicate trees that have negative linear relationships between Crown width and Height, which means trees with big crown will have lower height. On the other hand, blue points indicate trees that have positive linear relationship, which means trees with big crown will have higher height. While the white points indicate trees that have either positive or negative linear relationship, but their coefficients of the explanatory variable are close to 0.

Differences in linear relationship (positive and negative linear relationship) in this case might be happened due to some factors, such as different trees species, different elevation, or error factor from the Fusion software analysis. Tropical forest has hundreds different trees species that have different characteristic. Some of the trees have big crown and high trunk, and some other have big crown and short trunk. In addition, different elevation can give significant effect because the height data used in this project is the total of ground height (elevation) and tree height. Trees with positive linear relationships might be distributed in the area with higher elevation (hill, mount, or peak). On the other hand, trees with negative linear relationship might be distributed in the area with lower elevation (watershed or valley). Trees with coefficient close to zero might be occurred because of the data that has not been georeferenced and the algorithm used in Geographically Weighted Regression analysis that included data from the neighbor features (trees). That can affect the value of coefficient in linear relationship.

In general, the R-squared is quite low, with most of the features (trees) have R-squared lower than 0.2. To improve the regression analysis, I think I need to georeference the data. The Height which is the sum of the ground height (elevation) and tree height can affect the regression model (intercept, coefficient, etc) between Crown width and Height. I also can add additional explanatory variables like “tree species” to increase the accuracy of the linear model.

** **

__Significance:__

I learned from the results of this project that Unmanned Aircraft System (UAS) visual imagery data can be used as a low cost alternative method for measuring forest structure parameters after being processed through some software and analysis. Forest structure parameters are usually measured by forest managers using conventional methods of field ground measurements. These conventional methods are costly and labor intensive. By using UAS visual imagery data that have been processed in some software, such as Agisoft, Fusion, ArcMap, and RStudio, forest managers will be able to monitor forest structure parameters in a faster and cheaper way.

Forest manager can do point pattern analysis, variogram analysis, geographically weighted regression (GWR) analysis, or other spatial and statistical analysis to assess the structure of a particular forest. For example, I found out from this project that we can assess the linear relationship between forest structure parameters in space in the particular forest. In my case, there is linear relationship between crown width and tree height, so we can predict the growth of the trees from this linear relationship. Therefore, generally speaking, forest manager can make a linear regression model using geographically weighted regression (GWR) tool in ArcMap between two or more variables. Then, forest managers can estimate or predict the forest growth based on the linear model in a faster and cheaper way. That is why this new method will be very useful, especially when it is used in remote areas with limited accessibility, where it is almost impossible to do field ground measurements.

__Things that I learned: Software__

From this class and this project, I learned some new and useful codes and packages in RStudio. I learned how to do some spatial and statistical analysis that I have never done before, for example variogram analysis and geographically weighted regression (GWR) analysis. Even though, I did not include the geographically weighted regression (GWR) that I did in RStudio in this project, I already shared the link that has all of the codes in the Geog 566 blog post. In addition, I realized that besides table format data (.csv) I also can use raster data as an input in RStudio. Although my experience with raster data did not go well since there was not enough memory in RStudio, I did well with my table format data (.csv). I definitely will use RStudio and some of the packages and codes in my future research.

Moreover, In ArcMap, I learned how to do point pattern analysis (spatial autocorrelation) and raster clustering analysis (unsupervised classification) in exercise 1. And I learned geographically weighted regression (GWR) analysis in exercise 3 and this final project. All of those tools will be very useful for me in assessing forest structure parameters of tropical forest. At first I only do regression analysis in RStudio, and the analysis does not consider the spatial information (location of the features) in its equation. Then, I learned from this project that I can do regression analysis in ArcMap which includes the spatial information (location of the features) in the analysis. Although I have not figured out the whole formula used in this analysis, I am satisfied that I was able to do this analysis with my data set.

__Things that I learned: Statistic__

I learned some new statistical analysis in this class and this project. The statistical analysis that I did is Spatial Autocorrelation (Global Moran’s I), Variogram, and Geographically Weighted Regression (GWR). First, I did spatial autocorrelation (Global Moran’s I) in ArcMap. As we know, the Spatial Autocorrelation (Global Moran’s I) tool measures spatial autocorrelation based on both feature locations and feature values simultaneously. Given a set of features and an associated attribute, it evaluates whether the pattern expressed is clustered, dispersed, or random. The tool calculates the Moran’s I Index value and both a z-score and p-value to evaluate the significance of that Index.

The next statistical analysis that I did is Variogram. I did this analysis in RStudio. As mentioned before, the variogram is defined as the variance of the difference between field values at two locations x and y across realizations of the field (Cressie 1993). The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. For some statistical analysis, independent assumption is really important. Therefore, it is necessary to check the independency of the variables by doing statistical analysis like variogram.

The third statistical analysis that I did is geographically weighted regression (GWR). I did this analysis in ArcMap. GWR constructs a separate equation for every feature in the dataset incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. In this project, I was able to automatically make linear regression model for all 804 trees in my data set. For my linear model, I choose height as the response variable and crown width as the explanatory variable. GWR analysis requires all the variables as continuous numerical variables. One thing that is interesting for me is that GWR uses spatial information (tree location) in its statistical equation. This is quite new for me since all the regression models that I have learned before only use regular data set and do not put into account the spatial information (e.g. tree location). Hence, GWR will be very useful to analyze linear relationship between forest structure variables of tropical forest which have spatial information attached.

__Responses to Comments__

Most of the responses that I got concerned about my dataset which is not georeferenced yet. I am also thinking the same, and I wish I can georefence the data. However, since it will need more data (DEM of the particular study area), time, and processing through some software which I am still learning how to use, I cannot georeference the data at this point. Nevertheless, I am working on it, and I hope I will have the georeferenced data as soon as possible. Once I have the georeferenced data, I will be able to correctly interpret the results from spatial statistical analysis (e.g. variogram and geographically weighted regression (GWR)).

I also got another comment from Dr. Jones to elaborate more in my explanation of the result, particularly explanation about study area that is probably located in a slope based on variogram analysis of height, and explanation about variogram analysis of crown width which has about 25 m in size. In response, I have tried to revise and elaborate more my explanation in this final project. Moreover, I got commend that is asking about the precision of crown width measurements in topical forest based on UAS imagery data. I think this question can be answered if I have ground truth data, which I do not have, so I can compare the UAS and ground measurements. Otherwise, to get an idea about the precision of UAS measurement compared to ground measurements, we can read some papers that assess this topic. I have read some papers, and it appears that UAS measurement is quite accurate compared to ground measurement and LiDAR measurement.

**References
**

http://faculty.washington.edu/edford/Variogram.pdf

https://en.wikipedia.org/wiki/Variogram

http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-statistics-toolbox/geographically-weighted-regression.htm