Introduction

mgwrsar package estimates linear and local linear models with spatial autocorrelation of the following forms:

\[y=\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (GWR)\]

\[y=\beta_c X_c+\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR)\]

\[y=\lambda Wy+\beta_c X_c+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(0,k,0))\]

\[y=\lambda Wy+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(0,0,k))\]

\[y=\lambda Wy+\beta_c X_c+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(0,k_c,k_v))\]

\[y=\lambda(u_i,v_i) Wy+\beta_c X_c+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(1,k,0))\]

\[y=\lambda(u_i,v_i)Wy+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(1,0,k))\]

\[y=\lambda(u_i,v_i)Wy+\beta_cX_c+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(1,k_c,k_v))\]

For a deeper understanding of the estimation method, please refer to Geniaux, G. and Martinetti, D. (2017). A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Regional Science and Urban Economics. (https://doi.org/10.1016/j.regsciurbeco.2017.04.001)

The estimation of the aforementioned model can be performed using the MGWRSAR wrapper function, which necessitates a dataframe and a matrix of coordinates. It’s important to note:

  • Both 2SLS and Best 2SLS methods are applicable. In cases where the model involves local regressions, the specification requires a bandwidth and a kernel type.
  • Optimal bandwidth estimation can be achieved using the golden_search_bandwidth function, considering either AICc or Leave-One-Out Cross Validation, applicable to all models, whether they involve spatial autocorrelation or not.
  • There are three modes of predictions available for spatially varying models (refer to Geniaux, 2024, “Speeding up estimation of spatially varying coefficients models,” forthcoming in the Journal of Geographical Systems).
  • Predictions for models incorporating spatial autocorrelation can be conducted using the Best Linear Unbiased Prediction method proposed by Goulard et al. (2017) (Goulard, M., Laurent, T., & Thomas-Agnan, C., 2017, “About predictions in spatial autoregressive models: Optimal and almost optimal strategies,” published in Spatial Economic Analysis, 12(2-3), 304-325).
  • model specifications and spatial stationarity of coefficients can be tested using bootstrap method.

In addition to enabling the consideration of spatial autocorrelation in GWR/MGWR-like models, the MGWRSAR function introduces several useful techniques for estimating local regression with space and space-time coordinates:

  • It employs RCCP and RCCPeigen code to expedite computations and facilitates parallel computing using the doParallel package.
  • It offers a method for selecting an optimal set of target points (based on a specified size) to enhance computational efficiency.
  • It accommodates the consideration of space-time kernels (including asymmetric time kernels and asymmetric time cycling kernels) with appropriate diagnostics (such as AICc) for determining optimal bandwidths.

Please note:

  • When spatial autocorrelation is implied in the model, a row-normalized spatial weight matrix must be provided.
  • When the model entails mixed GWR regression, the names of stationary covariates must be provided.

Estimation of GWR/MGWR with spatial dependance.

The MGWRSAR function requires a dataframe and a matrix of coordinates, and eventually a spatial weight matrix if model include spatiale dependence. The package includes a simulated data example which is used for this vignette.

library(mgwrsar)
#> Loading required package: sp
#> Loading required package: Matrix
#> Registered S3 method overwritten by 'future':
#>   method               from      
#>   all.equal.connection parallelly
## loading data example
data(mydata)
coord=as.matrix(mydata[,c("x","y")])
head(coord)
#>             x       y
#> [1,] 872142.8 6234075
#> [2,] 872894.3 6233366
#> [3,] 878615.1 6229884
#> [4,] 869936.0 6228408
#> [5,] 866441.8 6231471
#> [6,] 866952.9 6222411

head(mydata)
#>   Y_mgwrsar_1_0_kv X0       X1        X2        X3       Beta0      Beta1
#> 1       0.44146014  1 2.396274 0.4763857 0.6854775  0.11599551 0.26922206
#> 2       0.02223952  1 1.684776 0.3303000 0.8994048  0.11533789 0.25784575
#> 3       3.41607993  1 2.369278 1.5117796 0.5808293  0.20973774 0.55723064
#> 4       0.28497090  1 4.208743 1.0916590 1.4058201 -0.08706293 0.06795084
#> 5      -0.79417336  1 3.109543 0.6487661 1.6977387 -0.08209832 0.21763086
#> 6       6.78538775  1 7.284831 0.8828887 1.5085908 -0.28330886 0.52468859
#>       Beta2      Beta3    lambda     Y_ols    Y_sar       Y_gwr      Y_mgwr
#> 1 0.6772792 -1.1106223 0.2408643 0.9890454 2.010387  0.32246490  0.39829402
#> 2 0.6751576 -1.0326101 0.2273072 0.2732833 1.539485 -0.15597959 -0.12664995
#> 3 0.5623073 -0.5412975 0.3384363 2.1155894 3.637831  2.06565538  1.79922754
#> 4 1.0796267 -0.9258913 0.3543732 1.7902102 3.201578  0.07587226 -0.02831127
#> 5 1.1019372 -1.2759568 0.3772657 0.5057989 1.460611 -0.85670748 -0.38820500
#> 6 1.5488474 -0.7649646 0.4933481 3.0167134 4.654380  3.75240014  3.39782791
#>   Y_mgwrsar_0_0_kv Y_mgwrsar_0_kc_kv Y_mgwrsar_1_kc_kv Y_mgwr_outlier X2dummy
#> 1        0.5673431         0.7137761        0.55150700      0.3982940   FALSE
#> 2        0.2031676         0.2611103        0.06388119     -0.1266500    TRUE
#> 3        3.8183989         3.1882294        2.86908705      1.7992275    TRUE
#> 4        0.3314998         0.1832260        0.14519215     -0.0188881    TRUE
#> 5       -0.7858941        -0.1114446       -0.13694311     -0.3882050   FALSE
#> 6        5.8227160         5.2859121        6.16333174      3.3978279    TRUE
#>   Y_mgwr_X2dummy        x       y
#> 1     0.07564794 872142.8 6234075
#> 2    -0.34965450 872894.3 6233366
#> 3     0.94914284 878615.1 6229884
#> 4    -1.20689547 869936.0 6228408
#> 5    -1.10310449 866441.8 6231471
#> 6     2.03036799 866952.9 6222411
## plot
ggplot(mydata,aes(x,y))+geom_point(aes(color=Beta0))+ ggtitle('Spatial Pattern of coefficient Beta0')+scale_colour_gradientn(colours = terrain.colors(10))

ggplot(mydata,aes(x,y))+geom_point(aes(color=Beta1))+ ggtitle('Spatial Pattern of coefficient Beta1')+scale_colour_gradientn(colours = terrain.colors(10))

ggplot(mydata,aes(x,y))+geom_point(aes(color=lambda))+ ggtitle('Spatial Pattern of spatial autocorrelation coefficient (lambda)')+scale_colour_gradientn(colours = terrain.colors(10))


W=kernel_matW(H=4,kernels='rectangle',coords=coord,NN=5,adaptive=TRUE,diagnull=TRUE)

Note that in this package there are two ways to express a bandwidth \(H\). It can be a distance (adaptive = F) or a number of neighbors (adaptive = T).

Estimation

Estimation of a mgwrsar(1,0,kv) model with all parameter spatially varying using an adapative gaussian kernel (based on 20 nearest neighbors):



res_mgwrsar_1_0_kv<- search_bandwidths(formula ='Y_mgwrsar_1_0_kv~X1+X2+X3',data = mydata,coords = coord,kernels = c("gauss"),Model = "MGWRSAR_1_0_kv",control = list( criterion = "AICc", adaptive = c(TRUE), Type = "GD", verbose = FALSE,W=W),hs_range = c(6, nrow(coord)),ht_range =NULL,n_seq = 10,ncore = 1,show_progress = F,n_rounds = 2,refine=T,tol = c(1),verbose = TRUE)

H_opt=res_mgwrsar_1_0_kv$best$h_s

mgwrsar_1_0_kv<-MGWRSAR(formula = 'Y_mgwrsar_1_0_kv~X1+X2+X3', data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'),H=H_opt, Model = 'MGWRSAR_1_0_kv',control=list(SE=T,adaptive=T,W=W))
summary(mgwrsar_1_0_kv)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_1_0_kv~X1+X2+X3", data = mydata, 
#>     coords = coord, fixed_vars = NULL, kernels = c("gauss"), 
#>     H = H_opt, Model = "MGWRSAR_1_0_kv", control = list(SE = T, 
#>         adaptive = T, W = W))
#> 
#> Model: MGWRSAR_1_0_kv 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 7 
#> ------------------------------------------------------
#> Model Settings:
#>    Method for spatial autocorrelation: 2SLS 
#>    Computation time: 0.207 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1000 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Varying Parameters]
#>    Intercept              X1                X2               X3          
#>  Min.   :-7.72674   Min.   :0.05003   Min.   :-1.758   Min.   :-2.01204  
#>  1st Qu.:-1.75755   1st Qu.:0.35654   1st Qu.: 0.422   1st Qu.:-1.24745  
#>  Median :-0.62264   Median :0.55660   Median : 1.516   Median :-0.99979  
#>  Mean   :-0.89440   Mean   :0.51027   Mean   : 1.655   Mean   :-0.98199  
#>  3rd Qu.: 0.08947   3rd Qu.:0.64788   3rd Qu.: 2.769   3rd Qu.:-0.68450  
#>  Max.   : 2.02044   Max.   :1.05498   Max.   : 6.594   Max.   :-0.07195  
#>      lambda       
#>  Min.   :-0.3523  
#>  1st Qu.: 0.1821  
#>  Median : 0.3848  
#>  Mean   : 0.3726  
#>  3rd Qu.: 0.5621  
#>  Max.   : 1.2224  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 704.39 
#>    AICc: 3711.12 
#>    Residual sum of squares: 1031.78 
#>    RMSE: 1.0158 
#> ------------------------------------------------------

Estimation of a mgwrsar(0,0,kv) model with stationnary spatial multiplier using an adapative gaussian kernel (based on 20 nearest neighbors):

res_mgwrsar_0_0_kv<- search_bandwidths(formula ='Y_mgwrsar_0_0_kv~X1+X2+X3',data = mydata,coords = coord,kernels = c("gauss"),Model = "MGWRSAR_0_0_kv",control = list( criterion = "AICc", adaptive = c(TRUE), Type = "GD", verbose = FALSE,W=W),hs_range = c(6, nrow(coord)),ht_range =NULL,n_seq = 10,ncore = 1,show_progress = F,n_rounds = 1,refine=T,tol = c(1),verbose = TRUE)

H_opt=res_mgwrsar_0_0_kv$best$h_s

mgwrsar_0_0_kv<-MGWRSAR(formula = 'Y_mgwrsar_0_0_kv~X1+X2+X3', data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'),H=H_opt, Model = 'MGWRSAR_0_0_kv',control=list(SE=T,adaptive=T,W=W))
summary(mgwrsar_0_0_kv)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_0_0_kv~X1+X2+X3", data = mydata, 
#>     coords = coord, fixed_vars = NULL, kernels = c("gauss"), 
#>     H = H_opt, Model = "MGWRSAR_0_0_kv", control = list(SE = T, 
#>         adaptive = T, W = W))
#> 
#> Model: MGWRSAR_0_0_kv 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 6 
#> ------------------------------------------------------
#> Model Settings:
#>    Method for spatial autocorrelation: 2SLS 
#>    Computation time: 0.168 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1000 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Constant Parameters]
#> [1] 0.377765
#> 
#>    [Varying Parameters]
#>    Intercept             X1                 X2                X3          
#>  Min.   :-5.5933   Min.   :-0.02637   Min.   :-2.7354   Min.   :-1.99933  
#>  1st Qu.:-1.1951   1st Qu.: 0.36239   1st Qu.: 0.3739   1st Qu.:-1.25872  
#>  Median :-0.3825   Median : 0.55045   Median : 1.3259   Median :-0.98323  
#>  Mean   :-0.5167   Mean   : 0.50539   Mean   : 1.3059   Mean   :-0.98406  
#>  3rd Qu.: 0.2199   3rd Qu.: 0.64008   3rd Qu.: 2.6088   3rd Qu.:-0.70394  
#>  Max.   : 2.9978   Max.   : 0.97632   Max.   : 5.1919   Max.   :-0.09413  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 710.04 
#>    AICc: 2369.83 
#>    Residual sum of squares: 275.99 
#>    RMSE: 0.5254 
#> ------------------------------------------------------

Estimation of a mgwrsar(0,kc,kv) model with stationary X2 and stationnary spatial multiplier using an adapative gaussian kernel (based on 20 nearest neighbors):

res_mgwrsar_0_kc_kv<- search_bandwidths(formula ='Y_mgwrsar_0_kc_kv~X1+X2+X3',data = mydata,coords = coord,kernels = c("gauss"),fixed_vars='X2',Model = "MGWRSAR_0_kc_kv",control = list( criterion = "AICc", adaptive = c(TRUE), Type = "GD", verbose = FALSE,W=W),hs_range = c(6, nrow(coord)),ht_range =NULL,n_seq = 10,ncore = 1,show_progress = F,n_rounds = 2,refine=T,tol = c(1),verbose = TRUE)

H_opt=res_mgwrsar_0_kc_kv$best$h_s
mgwrsar_0_kc_kv<-MGWRSAR(formula = 'Y_mgwrsar_0_kc_kv~X1+X2+X3', data = mydata,coords=coord, fixed_vars='X2',kernels=c('gauss'),H=H_opt, Model = 'MGWRSAR_0_kc_kv',control=list(SE=T,adaptive=T,W=W))
summary(mgwrsar_0_kc_kv)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_0_kc_kv~X1+X2+X3", data = mydata, 
#>     coords = coord, fixed_vars = "X2", kernels = c("gauss"), 
#>     H = H_opt, Model = "MGWRSAR_0_kc_kv", control = list(SE = T, 
#>         adaptive = T, W = W))
#> 
#> Model: MGWRSAR_0_kc_kv 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 6 
#> ------------------------------------------------------
#> Model Settings:
#>    Method for spatial autocorrelation: 2SLS 
#>    Computation time: 0.149 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1000 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Constant Parameters]
#>        X2      PhWy 
#> 0.7132556 0.4151707 
#> 
#>    [Varying Parameters]
#>    Intercept             X1                X3         
#>  Min.   :-1.6122   Min.   :-0.0617   Min.   :-1.3988  
#>  1st Qu.:-0.3765   1st Qu.: 0.3556   1st Qu.:-1.0765  
#>  Median : 0.1363   Median : 0.5511   Median :-0.9934  
#>  Mean   : 0.1990   Mean   : 0.5056   Mean   :-0.9949  
#>  3rd Qu.: 0.6953   3rd Qu.: 0.6453   3rd Qu.:-0.9114  
#>  Max.   : 2.4668   Max.   : 0.9752   Max.   :-0.6007  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 731.12 
#>    AICc: 2120.19 
#>    Residual sum of squares: 233.26 
#>    RMSE: 0.483 
#> ------------------------------------------------------

Estimation of a mgwrsar(1,kc,kv) model with stationary X2 using an adapative gaussian kernel (based on 20 nearest neighbors):

res_mgwrsar_1_kc_kv<- search_bandwidths(formula ='Y_mgwrsar_1_kc_kv~X1+X2+X3',data = mydata,coords = coord,kernels = c("gauss"),fixed_vars='X2',Model = "MGWRSAR_1_kc_kv",control = list( criterion = "AICc", adaptive = c(TRUE), Type = "GD", verbose = FALSE,W=W),hs_range = c(6, nrow(coord)),ht_range =NULL,n_seq = 10,ncore = 1,show_progress = F,n_rounds = 2,refine=T,tol = c(1),verbose = TRUE)

H_opt=res_mgwrsar_0_kc_kv$best$h_s
mgwrsar_1_kc_kv<-MGWRSAR(formula = 'Y_mgwrsar_1_kc_kv~X1+X2+X3', data = mydata,coords=coord, fixed_vars='X2',kernels=c('gauss'),H=7, Model = 'MGWRSAR_1_kc_kv',control=list(SE=T,adaptive=T,W=W))
summary(mgwrsar_1_kc_kv)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_1_kc_kv~X1+X2+X3", data = mydata, 
#>     coords = coord, fixed_vars = "X2", kernels = c("gauss"), 
#>     H = 7, Model = "MGWRSAR_1_kc_kv", control = list(SE = T, 
#>         adaptive = T, W = W))
#> 
#> Model: MGWRSAR_1_kc_kv 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 7 
#> ------------------------------------------------------
#> Model Settings:
#>    Method for spatial autocorrelation: 2SLS 
#>    Computation time: 0.171 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1000 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Constant Parameters]
#>       X2 
#> 1.956348 
#> 
#>    [Varying Parameters]
#>    Intercept              X1                  X3              lambda       
#>  Min.   :-3.23744   Min.   :0.0003216   Min.   :-1.2686   Min.   :0.03101  
#>  1st Qu.:-1.69589   1st Qu.:0.3576303   1st Qu.:-1.0586   1st Qu.:0.42899  
#>  Median :-1.33853   Median :0.5394726   Median :-1.0081   Median :0.57111  
#>  Mean   :-1.36110   Mean   :0.4988842   Mean   :-1.0026   Mean   :0.55998  
#>  3rd Qu.:-1.00408   3rd Qu.:0.6349332   3rd Qu.:-0.9474   3rd Qu.:0.68432  
#>  Max.   : 0.02671   Max.   :0.9032831   Max.   :-0.7237   Max.   :1.43057  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 738.59 
#>    AICc: -708.39 
#>    Residual sum of squares: 14.17 
#>    RMSE: 0.119 
#> ------------------------------------------------------

Estimation of a mgwrsar(1,kc,0) model with stationary X1 using an adapative gaussian kernel (based on 20 nearest neighbors):

res_mgwrsar_1_kc_0<- search_bandwidths(formula ='Y_mgwrsar_1_kc_kv~X1+X2+X3',data = mydata,coords = coord,kernels = c("gauss"),fixed_vars='X2',Model = "MGWRSAR_1_kc_0",control = list( criterion = "AICc", adaptive = c(TRUE), Type = "GD", verbose = FALSE,W=W),hs_range = c(6, nrow(coord)),ht_range =NULL,n_seq = 10,ncore = 1,show_progress = F,n_rounds = 2,refine=T,tol = c(1),verbose = TRUE)


mgwrsar_1_kc_0<-MGWRSAR(formula = 'Y_mgwrsar_1_kc_kv~X1+X2+X3', data = mydata,coords=coord, fixed_vars=c('Intercept','X1','X2','X3'),kernels=c('gauss'),H=7, Model = 'MGWRSAR_1_kc_0',control=list(SE=T,adaptive=T,W=W))
summary(mgwrsar_1_kc_0)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_1_kc_kv~X1+X2+X3", data = mydata, 
#>     coords = coord, fixed_vars = c("Intercept", "X1", "X2", "X3"), 
#>     kernels = c("gauss"), H = 7, Model = "MGWRSAR_1_kc_0", control = list(SE = T, 
#>         adaptive = T, W = W))
#> 
#> Model: MGWRSAR_1_kc_0 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 7 
#> ------------------------------------------------------
#> Model Settings:
#>    Method for spatial autocorrelation: 2SLS 
#>    Computation time: 0.108 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1000 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Constant Parameters]
#>  Intercept         X1         X2         X3 
#> -1.2829925  0.5155295  1.5111313 -1.0299700 
#> 
#>    [Varying Parameters]
#>      lambda       
#>  Min.   :-2.2507  
#>  1st Qu.: 0.4532  
#>  Median : 0.5995  
#>  Mean   : 0.4923  
#>  3rd Qu.: 0.7094  
#>  Max.   : 0.9941  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 917.89 
#>    AICc: 1509.35 
#>    Residual sum of squares: 221.19 
#>    RMSE: 0.4703 
#> ------------------------------------------------------

Plot and summary

Summary and plot of mgwrsar object using leaflet:


mgwrsar_1_0_kv<-MGWRSAR(formula = 'Y_mgwrsar_1_0_kv~X1+X2+X3', data = mydata,coords=coord,fixed_vars=NULL,kernels=c('gauss'),H=7, Model = 'MGWRSAR_1_0_kv',control=list(SE=T,adaptive=T,W=W))

summary(mgwrsar_1_0_kv)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_1_0_kv~X1+X2+X3", data = mydata, 
#>     coords = coord, fixed_vars = NULL, kernels = c("gauss"), 
#>     H = 7, Model = "MGWRSAR_1_0_kv", control = list(SE = T, adaptive = T, 
#>         W = W))
#> 
#> Model: MGWRSAR_1_0_kv 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 7 
#> ------------------------------------------------------
#> Model Settings:
#>    Method for spatial autocorrelation: 2SLS 
#>    Computation time: 0.287 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1000 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Varying Parameters]
#>    Intercept              X1                X2               X3          
#>  Min.   :-7.72674   Min.   :0.05003   Min.   :-1.758   Min.   :-2.01204  
#>  1st Qu.:-1.75755   1st Qu.:0.35654   1st Qu.: 0.422   1st Qu.:-1.24745  
#>  Median :-0.62264   Median :0.55660   Median : 1.516   Median :-0.99979  
#>  Mean   :-0.89440   Mean   :0.51027   Mean   : 1.655   Mean   :-0.98199  
#>  3rd Qu.: 0.08947   3rd Qu.:0.64788   3rd Qu.: 2.769   3rd Qu.:-0.68450  
#>  Max.   : 2.02044   Max.   :1.05498   Max.   : 6.594   Max.   :-0.07195  
#>      lambda       
#>  Min.   :-0.3523  
#>  1st Qu.: 0.1821  
#>  Median : 0.3848  
#>  Mean   : 0.3726  
#>  3rd Qu.: 0.5621  
#>  Max.   : 1.2224  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 704.39 
#>    AICc: 3711.12 
#>    Residual sum of squares: 1031.78 
#>    RMSE: 1.0158 
#> ------------------------------------------------------
p<-plot(mgwrsar_1_0_kv,type='B_coef',var='X2',crs=2154)
#> Warning in plot.mgwrsar(x = x, type = type, var = var, crs = crs, mypalette =
#> mypalette, : n_time_steps ignored (Model is not Spatio-Temporal GDT).
p<-plotly::partial_bundle(p)
p

Prediction

In this example, we utilize an in-sample of size 800 for model estimation and an out-sample of size 200 for prediction. It’s worth noting that for models with spatial autocorrelation, you must provide a global weight matrix of size 1000, ordered by in-sample and then out-sample locations.

For GWR and MGWR_1_0_kv models, where all coefficients vary in space, predictions are made using the corresponding model estimate with the out-sample locations as target points. Hence, the estimated coefficients are not utilized; only the call and the data of the estimated model are employed (method_pred = ‘TP’, default value). However, for mixed models that include some constant coefficients (such as MGWR, MGXWR_0_0_kv, MGXWR_1_kc_kv, MGXWR_1_kc_0), the estimated coefficients are extrapolated using a weighting matrix. By default, the weighting matrix for prediction is derived from the expected weights of out-sample data if they were added to in-sample data to estimate the corresponding MGWRSAR model: method_pred=‘tWtp_model’ (refer to Geniaux, 2024, for further details). Alternatively, the user can choose to extrapolate the model coefficients using a Shepard kernel with a custom number of neighbors (method_pred=‘shepard’,k_extra=12) or using the same kernel and bandwidth as the estimated model (method_pred=‘model’). In case of extrapolation, predition method without the best linear unbiased predictor (‘type = YTC’) and with the best linear unbiased predictor (‘type = BPN’) are available.

Prediction of MGWRSAR_1_0_kv model using shepard kernel with 4 neighbors and Best Linear Unbiased Predictor (BLUP) :


### Global Spatial Weight matrix W should be ordered by in_sample (S) then out_sample

# in_sample and out_sample folds
length_out=800
index_in=sample(1:1000,length_out)
index_out=(1:1000)[-index_in]

#W=kernel_matW(H=4,kernels='rectangle',coords=coord,NN=4,adaptive=TRUE,diagnull=TRUE)
W_in_out=kernel_matW(H=4,kernels='rectangle',coords=rbind(coord[index_in,],coord[index_out,]),NN=4,adaptive=TRUE,diagnull=TRUE)
W_in=W_in_out[1:length(index_in),1:length(index_in)]
W_in=mgwrsar::normW(W_in)
newdata=mydata[index_out,]
newdata_coords=coord[index_out,]


### SAR Model
model_SAR_insample<-MGWRSAR(formula = 'Y_sar~X1+X2+X3', data = mydata[index_in,],coords=coord[index_in,], fixed_vars=NULL,kernels=NULL,H=NULL, Model = 'SAR',control=list(W=W_in))
model_SAR_insample@RMSE
#> [1] 0.1325775
summary(model_SAR_insample)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "Y_sar~X1+X2+X3", data = mydata[index_in, ], 
#>     coords = coord[index_in, ], fixed_vars = NULL, kernels = NULL, 
#>     H = NULL, Model = "SAR", control = list(W = W_in))
#> 
#> Model: SAR 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : NA (Fixed / Distance) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#> ------------------------------------------------------
#> Model Settings:
#>    Method for spatial autocorrelation: 2SLS 
#>    Computation time: 0.003 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 800 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Constant Parameters]
#>  Intercept         X1         X2         X3     lambda 
#>  0.2014254  0.5020017  1.1437199 -1.0141987  0.3035669 
#> 
#> ------------------------------------------------------
#> Diagnostics:
#>    AICc:  
#>    Residual sum of squares: 14.06 
#>    RMSE: 0.1326 
#> ------------------------------------------------------

## SAR Model: predictions
## without Best Linear Unbiased Predictor
# prediction YTC
Y_pred10=predict(model_SAR_insample, newdata=newdata, newdata_coords=newdata_coords,W=W_in_out,type='YTC')
head(Y_pred10)
#> [1] 5.011485 3.237824 3.492985 2.679169 1.862020 2.766853
RMSE_YTC=sqrt(mean((mydata$Y_sar[index_out]-Y_pred10)^2))
RMSE_YTC 
#> [1] 0.08964377

## Using Best Linear Unbiased Predictor
Y_pred11=predict(model_SAR_insample, newdata=newdata, newdata_coords=newdata_coords,W=W_in_out,type='BPN')
head(Y_pred11)
#> [1] 5.042327 3.230759 3.513008 2.660348 1.888774 2.704777
RMSE_BPN=sqrt(mean((mydata$Y_sar[index_out]-Y_pred11)^2))
RMSE_BPN 
#> [1] 0.06277489


#### MGWRSAR_1_0_kv model
model_MGWRSAR_1_0_kv_insample<-MGWRSAR(formula = 'Y_mgwrsar_1_0_kv~X1+X2+X3', data = mydata[index_in,],coords=coord[index_in,], fixed_vars=NULL,kernels='gauss',H=20, Model = 'MGWRSAR_1_0_kv',control=list(W=W_in,adaptive=TRUE,isgcv=F))
model_MGWRSAR_1_0_kv_insample@RMSE
#> [1] 0.8045282
summary(model_MGWRSAR_1_0_kv_insample)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_1_0_kv~X1+X2+X3", data = mydata[index_in, 
#>     ], coords = coord[index_in, ], fixed_vars = NULL, kernels = "gauss", 
#>     H = 20, Model = "MGWRSAR_1_0_kv", control = list(W = W_in, 
#>         adaptive = TRUE, isgcv = F))
#> 
#> Model: MGWRSAR_1_0_kv 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 20 
#> ------------------------------------------------------
#> Model Settings:
#>    Method for spatial autocorrelation: 2SLS 
#>    Computation time: 0.118 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 800 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Varying Parameters]
#>    Intercept             X1               X2               X3         
#>  Min.   :-4.4001   Min.   :0.1075   Min.   :-1.268   Min.   :-1.8995  
#>  1st Qu.:-1.6387   1st Qu.:0.3755   1st Qu.: 0.964   1st Qu.:-1.3126  
#>  Median :-0.8543   Median :0.5274   Median : 1.708   Median :-1.0658  
#>  Mean   :-0.8422   Mean   :0.4971   Mean   : 2.155   Mean   :-1.0343  
#>  3rd Qu.:-0.2260   3rd Qu.:0.6201   3rd Qu.: 3.494   3rd Qu.:-0.7195  
#>  Max.   : 2.2572   Max.   :0.8375   Max.   : 7.352   Max.   :-0.2145  
#>      lambda       
#>  Min.   :-0.1958  
#>  1st Qu.: 0.1621  
#>  Median : 0.2907  
#>  Mean   : 0.2789  
#>  3rd Qu.: 0.4109  
#>  Max.   : 0.7153  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 704.98 
#>    AICc: 2139.4 
#>    Residual sum of squares: 517.81 
#>    RMSE: 0.8045 
#> ------------------------------------------------------

#### MGWRSAR_1_0_kv model: predictions

## Using Best Linear Unbiased Predictor with method_pred='model' 
Y_pred12=predict(model_MGWRSAR_1_0_kv_insample, newdata=newdata, newdata_coords=newdata_coords,W=W_in_out,type='BPN',method_pred='model' )
head(Y_pred12)
#> [1] 1.1404331 6.6056158 2.4251779 2.9401019 1.2551243 0.9482162
RMSE_model_BPN=sqrt(mean((mydata$Y_mgwrsar_1_0_kv[index_out]-Y_pred12)^2))
RMSE_model_BPN 
#> [1] 0.4050825

# prediction with method_pred='tWtp'
Y_pred13=predict(model_MGWRSAR_1_0_kv_insample, newdata=newdata, newdata_coords=newdata_coords,W=W_in_out,type='BPN',method_pred='tWtp_model')
head(Y_pred13)
#> [1] 1.128355 6.561310 2.489386 3.025743 1.259118 1.018711
RMSE_tWtp_BPN=sqrt(mean((mydata$Y_mgwrsar_1_0_kv[index_out]-Y_pred13)^2))
RMSE_tWtp_BPN 
#> [1] 0.3991092

## Using Best Linear Unbiased Predictor with method_pred='shepard' 
W_out=kernel_matW(H=4,kernels='rectangle',coords=coord[index_out,],NN=4,adaptive=TRUE,diagnull=TRUE)
Y_pred14=predict(model_MGWRSAR_1_0_kv_insample, newdata=newdata, newdata_coords=newdata_coords,W=W_in_out,type='BPN',method_pred='shepard',k_extra=8)
head(Y_pred14)
#> [1] 0.6940118 6.8240480 2.5281222 3.1481212 1.2232821 1.0784311
RMSE_shepard_BPN=sqrt(mean((mydata$Y_mgwrsar_1_0_kv[index_out]-Y_pred14)^2))
RMSE_shepard_BPN
#> [1] 0.4430776

Prediction of MGWRSAR_1_kc_kv model using shepard kernel with 4 neighbors and Best Linear Unbiased Predictor (BLUP) :


### Global Spatial Weight matrix W should be ordered by in_ sample (S) then out_sample

# in_sample and out_sample folds
length_out=800
index_in=sample(1:1000,length_out)
index_out=(1:1000)[-index_in]

W=kernel_matW(H=4,kernels='rectangle',coords=rbind(coord[index_in,],coord[index_out,]),NN=4,adaptive=TRUE,diagnull=TRUE)
W_in=W[index_in,index_in]
W_in=mgwrsar::normW(W_in)

### model_MGWRSAR_1_kc_kv
model_MGWRSAR_1_kc_kv_insample<-MGWRSAR(formula = 'Y_mgwrsar_1_kc_kv~X1+X2+X3', data = mydata[index_in,],coords=coord[index_in,], fixed_vars='X3',kernels=c('gauss'),H=11, Model = 'MGWRSAR_1_kc_kv',control=list(W=W_in,adaptive=TRUE,isgcv=F))
model_MGWRSAR_1_kc_kv_insample@RMSE
#> [1] 0.3379756
summary(model_MGWRSAR_1_kc_kv_insample)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_1_kc_kv~X1+X2+X3", data = mydata[index_in, 
#>     ], coords = coord[index_in, ], fixed_vars = "X3", kernels = c("gauss"), 
#>     H = 11, Model = "MGWRSAR_1_kc_kv", control = list(W = W_in, 
#>         adaptive = TRUE, isgcv = F))
#> 
#> Model: MGWRSAR_1_kc_kv 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 11 
#> ------------------------------------------------------
#> Model Settings:
#>    Method for spatial autocorrelation: 2SLS 
#>    Computation time: 0.116 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 800 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Constant Parameters]
#>       X3 
#> -1.05894 
#> 
#>    [Varying Parameters]
#>    Intercept               X1                X2              lambda         
#>  Min.   :-10.82461   Min.   :0.07481   Min.   :-0.7886   Min.   :-0.157793  
#>  1st Qu.: -1.91473   1st Qu.:0.37015   1st Qu.: 1.9299   1st Qu.:-0.028099  
#>  Median : -0.89254   Median :0.53437   Median : 2.8884   Median : 0.001659  
#>  Mean   : -1.12218   Mean   :0.51080   Mean   : 3.3694   Mean   : 0.003490  
#>  3rd Qu.: -0.03437   3rd Qu.:0.65271   3rd Qu.: 5.3199   3rd Qu.: 0.036549  
#>  Max.   :  1.79588   Max.   :0.97652   Max.   :11.9464   Max.   : 0.157537  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 668.12 
#>    AICc: 852.14 
#>    Residual sum of squares: 91.38 
#>    RMSE: 0.338 
#> ------------------------------------------------------

### model_MGWRSAR_1_kc_kv: predictions
## without Best Linear Unbiased Predictor
newdata=mydata[index_out,]
newdata_coords=coord[index_out,]
newdata$Y_mgwrsar_1_kc_kv=0

Y_pred15=predict(model_MGWRSAR_1_kc_kv_insample, newdata=newdata, newdata_coords=newdata_coords,W=W,type='YTC',method_pred='shepard')
head(Y_pred15)
#> [1] 0.5695360 4.5434772 2.5779640 2.7029408 0.9313297 2.2977041
RMSE_YTC=sqrt(mean((mydata$Y_mgwrsar_1_kc_kv[index_out]-Y_pred15)^2))
RMSE_YTC
#> [1] 0.4463266

## Using Best Linear Unbiased Predictor
Y_pred16=predict(model_MGWRSAR_1_kc_kv_insample, newdata=newdata, newdata_coords=newdata_coords,W=W,type='BPN',method_pred='shepard')
head(Y_pred16)
#> [1] 0.5864393 4.5429482 2.5936091 2.7319570 0.9341798 2.3247283
RMSE_BPN=sqrt(mean((mydata$Y_mgwrsar_1_kc_kv[index_out]-Y_pred16)^2))
RMSE_BPN 
#> [1] 0.4305838

References

Brunsdon, C., Fotheringham, A., Charlton, M., 1996. Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr. Anal. 28 (4), 281–298.

Friedman, J. H. 1984. A variable span smoother. Laboratory for Computational Statistics, Department of Statistics, Stanford University: Technical Report(5).

Geniaux, G. and Martinetti, D. 2018. A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Regional Science and Urban Economics, 72, 74-85. https://doi.org/10.1016/j.regsciurbeco.2017.04.001

Geniaux, G. and Martinetti, D. 2017b. R package mgwrsar: Functions for computing (mixed) geographycally weighted regression with spatial autocorrelation,. https://CRAN.R-project.org/package=mgwrsar.

Geniaux, G. 2024. Speeding up estimation of spatially varying coefficients models. Jounal of Geographical System 26, 293–327. https://doi.org/10.1007/s10109-024-00442-3

Loader, C. 1999. Local regression and likelihood, volume 47. springer New York.

McMillen, D. P. 1996. One hundred fifty years of land values in chicago: A nonparametric approach. Journal of Urban Economics, 40(1):100 – 124.